.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. _getstarted:

Getting Started
===============

Arrow manages data in arrays (:class:`pyarrow.Array`), which can be
grouped in tables (:class:`pyarrow.Table`) to represent columns of
tabular data.

Arrow also supports various formats for getting tabular data in and
out of disk and networks. The most commonly used formats are
Parquet (:ref:`parquet`) and the IPC format (:ref:`ipc`).

Creating Arrays and Tables
--------------------------

Arrays in Arrow are collections of data of uniform type. That allows
Arrow to use the best-performing implementation to store the data and
perform computations on it. So each array is meant to have both data
and a type

.. code-block:: python

    >>> import pyarrow as pa
    >>> days = pa.array([1, 12, 17, 23, 28], type=pa.int8())

Multiple arrays can be combined in tables to form the columns
of tabular data when attached to a column name

.. code-block:: python

    >>> months = pa.array([1, 3, 5, 7, 1], type=pa.int8())
    >>> years = pa.array([1990, 2000, 1995, 2000, 1995], type=pa.int16())
    >>> birthdays_table = pa.table([days, months, years],
    ...                            names=["days", "months", "years"])
    >>> birthdays_table
    pyarrow.Table
    days: int8
    months: int8
    years: int16
    ----
    days: [[1,12,17,23,28]]
    months: [[1,3,5,7,1]]
    years: [[1990,2000,1995,2000,1995]]

See :ref:`data` for more details.

Saving and Loading Tables
-------------------------

Once you have tabular data, Arrow provides out-of-the-box
features to save and restore that data in common formats
like Parquet:

.. code-block:: python

    >>> import pyarrow.parquet as pq
    >>> pq.write_table(birthdays_table, 'birthdays.parquet')

Once you have your data on disk, loading it back is a single function call,
and Arrow is heavily optimized for memory and speed, so loading
data will be as quick as possible

.. code-block:: python

    >>> reloaded_birthdays = pq.read_table('birthdays.parquet')
    >>> reloaded_birthdays
    pyarrow.Table
    days: int8
    months: int8
    years: int16
    ----
    days: [[1,12,17,23,28]]
    months: [[1,3,5,7,1]]
    years: [[1990,2000,1995,2000,1995]]

Saving and loading back data in Arrow is usually done through the
:ref:`Parquet <parquet>`, :ref:`IPC <ipc>` (:ref:`feather`),
:ref:`CSV <py-csv>` or :ref:`Line-Delimited JSON <json>` formats.

Performing Computations
-----------------------

Arrow ships with a set of compute functions that can be applied
to its arrays and tables, so through the compute functions
it's possible to apply transformations to the data

.. code-block:: python

    >>> import pyarrow.compute as pc
    >>> pc.value_counts(birthdays_table["years"])
    <pyarrow.lib.StructArray object at ...>
    -- is_valid: all not null
    -- child 0 type: int16
      [
        1990,
        2000,
        1995
      ]
    -- child 1 type: int64
      [
        1,
        2,
        2
      ]

See :ref:`compute` for a list of available compute functions and
how to use them.

Working with large data
-----------------------

Arrow also provides the :class:`pyarrow.dataset` API to work with
large data, which handles partitioning your data into
smaller chunks for you

.. code-block:: python

    >>> import pyarrow.dataset as ds
    >>> ds.write_dataset(birthdays_table, "savedir", format="parquet",
    ...                  partitioning=ds.partitioning(
    ...                      pa.schema([birthdays_table.schema.field("years")])
    ...                  ))

Loading back the partitioned dataset will detect the chunks

.. code-block:: python

    >>> birthdays_dataset = ds.dataset("savedir", format="parquet", partitioning=["years"])
    >>> birthdays_dataset.files
    ['savedir/1990/part-0.parquet', 'savedir/1995/part-0.parquet', 'savedir/2000/part-0.parquet']

and will lazily load chunks of data only when iterating over them

.. code-block:: python

    >>> current_year = 2025
    >>> for table_chunk in birthdays_dataset.to_batches():
    ...     print("AGES", pc.subtract(current_year, table_chunk["years"]))
    AGES [
      35
    ]
    AGES [
      30,
      30
    ]
    AGES [
      25,
      25
    ]

For further details on how to work with big datasets, how to filter them,
how to project them, etc., refer to the :ref:`dataset` documentation.

Continuing from here
--------------------

For digging further into Arrow, you might want to read the
:doc:`PyArrow Documentation <./index>` itself or the
`Arrow Python Cookbook <https://arrow.apache.org/cookbook/py/>`_.