.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. currentmodule:: pyarrow

.. _orc:

Reading and Writing the Apache ORC Format
=========================================

The `Apache ORC <http://orc.apache.org/>`_ project provides a
standardized open-source columnar storage format for use in data analysis
systems. It was originally created for use in `Apache Hadoop
<http://hadoop.apache.org/>`_, with systems like `Apache Drill
<http://drill.apache.org>`_, `Apache Hive <http://hive.apache.org>`_, `Apache
Impala <http://impala.apache.org>`_, and `Apache Spark
<http://spark.apache.org>`_ adopting it as a shared standard for
high-performance data IO.

Apache Arrow is an ideal in-memory representation layer for data that is
being read or written with ORC files.

Obtaining pyarrow with ORC Support
----------------------------------

If you installed ``pyarrow`` with pip or conda, it should be built with ORC
support bundled:

.. code-block:: python

    >>> from pyarrow import orc

If you are building ``pyarrow`` from source, you must use
``-DARROW_ORC=ON`` when compiling the C++ libraries and enable the ORC
extensions when building ``pyarrow``. See the :ref:`Python Development
<python-development>` page for more details.

Reading and Writing Single Files
--------------------------------

The functions :func:`~.orc.read_table` and :func:`~.orc.write_table`
read and write the :ref:`pyarrow.Table <data.table>` object, respectively.

Let's look at a simple table:

.. code-block:: python

    >>> import numpy as np
    >>> import pyarrow as pa

    >>> table = pa.table(
    ...     {
    ...         'one': [-1, np.nan, 2.5],
    ...         'two': ['foo', 'bar', 'baz'],
    ...         'three': [True, False, True]
    ...     }
    ... )

We write this to ORC format with ``write_table``:

.. code-block:: python

    >>> from pyarrow import orc
    >>> orc.write_table(table, 'example.orc')

This creates a single ORC file. In practice, an ORC dataset may consist
of many files in many directories. We can read a single file back with
``read_table``:

.. code-block:: python

    >>> table2 = orc.read_table('example.orc')

You can pass a subset of columns to read, which can be much faster than reading
the whole file (due to the columnar layout):

.. code-block:: python

    >>> orc.read_table('example.orc', columns=['one', 'three'])
    pyarrow.Table
    one: double
    three: bool
    ----
    one: [[-1,nan,2.5]]
    three: [[true,false,true]]

We need not use a string to specify the origin of the file. It can be any of:

* A file path as a string
* A Python file object
* A pathlib.Path object
* A :ref:`NativeFile <io.native_file>` from PyArrow

In general, a Python file object will have the worst read performance, while a
string file path or an instance of :class:`~.NativeFile` (especially memory
maps) will perform the best.

We can also read partitioned datasets with multiple ORC files through the
:mod:`pyarrow.dataset <dataset>` interface.

.. seealso::
    :ref:`Documentation for datasets <dataset>`.

ORC file writing options
~~~~~~~~~~~~~~~~~~~~~~~~

:func:`~pyarrow.orc.write_table()` has a number of options to
control various settings when writing an ORC file.

* ``file_version``, the ORC format version to use. ``'0.11'`` ensures
  compatibility with older readers, while ``'0.12'`` is the newer one.
* ``stripe_size``, to control the approximate size of data within a column
  stripe. This currently defaults to 64MB.

See the :func:`~pyarrow.orc.write_table()` docstring for more details.

Finer-grained Reading and Writing
---------------------------------

``read_table`` uses the :class:`~.ORCFile` class, which has other features:

.. code-block:: python

    >>> orc_file = orc.ORCFile('example.orc')
    >>> orc_file.metadata
    <BLANKLINE>
    -- metadata --
    >>> orc_file.schema
    one: double
    two: string
    three: bool
    >>> orc_file.nrows
    3

See the :class:`~pyarrow.orc.ORCFile` docstring for more details.

As described in the `Apache ORC format specification
<https://orc.apache.org/specification/>`_, an ORC file consists of
multiple stripes. ``read_table`` will read all of the stripes and
concatenate them into a single table. You can read individual stripes with
``read_stripe``:

.. code-block:: python

    >>> orc_file.nstripes
    1
    >>> orc_file.read_stripe(0)
    pyarrow.RecordBatch
    one: double
    two: string
    three: bool
    ----
    one: [-1,nan,2.5]
    two: ["foo","bar","baz"]
    three: [true,false,true]

We can write an ORC file using ``ORCWriter``:

.. code-block:: python

    >>> with orc.ORCWriter('example2.orc') as writer:
    ...     writer.write(table)

Compression
-----------

The data within a column stripe can be compressed after the encoding
passes (dictionary, RLE encoding). In PyArrow we don't use compression
by default, but Snappy, ZSTD, Zlib, and LZ4 are also supported:

.. code-block:: python

    >>> orc.write_table(table, 'example.orc', compression='uncompressed')
    >>> orc.write_table(table, 'example.orc', compression='zlib')
    >>> orc.write_table(table, 'example.orc', compression='zstd')
    >>> orc.write_table(table, 'example.orc', compression='snappy')

Snappy generally results in better performance, while Zlib may yield smaller
files.

Reading from cloud storage
--------------------------

In addition to local files, pyarrow supports other filesystems, such as cloud
filesystems, through the ``filesystem`` keyword:

.. code-block:: python

    >>> from pyarrow import fs

    >>> s3 = fs.S3FileSystem(region="us-east-2")  # doctest: +SKIP
    >>> table = orc.read_table("bucket/object/key/prefix", filesystem=s3)  # doctest: +SKIP

.. seealso::
    :ref:`Documentation for filesystems <filesystem>`.