.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. currentmodule:: pyarrow
.. _data:

Data Types and In-Memory Data Model
===================================

Apache Arrow defines columnar array data structures by composing type metadata
with memory buffers, like the ones explained in the documentation on
:ref:`Memory and IO <io>`. These data structures are exposed in Python through
a series of interrelated classes:

* Type Metadata: Instances of ``pyarrow.DataType``, which describe the
  type of an array and govern how its values are interpreted
* Schemas: Instances of ``pyarrow.Schema``, which describe a named
  collection of types. These can be thought of as the column types in a
  table-like object.
* Arrays: Instances of ``pyarrow.Array``, which are atomic, contiguous
  columnar data structures composed from Arrow Buffer objects
* Record Batches: Instances of ``pyarrow.RecordBatch``, which are a
  collection of Array objects with a particular Schema
* Tables: Instances of ``pyarrow.Table``, a logical table data structure in
  which each column consists of one or more ``pyarrow.Array`` objects of the
  same type.

We will examine these in the sections below in a series of examples.
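
As a quick orientation, the following sketch builds one object of each kind
listed above. The column names ``id`` and ``name`` and the sample values are
made up for illustration; they are not part of the original guide.

.. code-block:: python

   import pyarrow as pa

   # Type metadata: a DataType instance describing how values are interpreted.
   int_type = pa.int32()

   # Schema: a named collection of types (hypothetical column names).
   schema = pa.schema([("id", int_type), ("name", pa.string())])

   # Arrays: contiguous columnar data of a single type.
   ids = pa.array([1, 2, 3], type=int_type)
   names = pa.array(["a", "b", "c"])

   # Record batch: equal-length arrays bound to a schema.
   batch = pa.RecordBatch.from_arrays([ids, names], schema=schema)

   # Table: each column may consist of one or more arrays (chunks).
   table = pa.Table.from_batches([batch, batch])

   print(batch.num_rows)                  # 3
   print(table.num_rows)                  # 6
   print(table.column("id").num_chunks)   # 2

Building the table from two batches is only meant to show that a table column
can span several arrays, while a record batch column is always a single array.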

.. _data.types:

Type Metadata
-------------

Apache Arrow defines language-agnostic column-oriented data structures for
array data. These include:

* Fixed-length primitive types: numbers, booleans, dates and times, fixed
  size binary, decimals, and other values that fit into a given number
  of bits
* Variable-length primitive types: binary, string
* Nested types: list, map, struct, and union
* Dictionary type: An encoded categorical type (more on this later)

Each data type in Arrow has a corresponding factory function for creating
an instance of that type object in Python:

.. code-block:: python

   >>> import pyarrow as pa
   >>> t1 = pa.int32()
   >>> t2 = pa.string()
   >>> t3 = pa.binary()
   >>> t4 = pa.binary(10)
   >>> t5 = pa.timestamp('ms')
   >>> t1
   DataType(int32)
   >>> print(t1)
   int32
   >>> print(t4)
   fixed_size_binary[10]
   >>> print(t5)
   timestamp[ms]

.. note::
   Different data types may share the same physical storage. For example,
   ``int64``, ``float64``, and ``timestamp[ms]`` all occupy 64 bits per value.
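
The storage width mentioned in the note can be checked from Python. This is a
small sketch, assuming the ``bit_width`` attribute that PyArrow exposes on
fixed-width types; the ``widths`` dictionary is only a helper for this example.

.. code-block:: python

   import pyarrow as pa

   # Three distinct data types that share a 64-bit value width.
   candidates = [pa.int64(), pa.float64(), pa.timestamp("ms")]

   # Map each type's string form to its width in bits.
   widths = {str(data_type): data_type.bit_width for data_type in candidates}

   for name, width in widths.items():
       print(name, width)

Variable-length types such as ``pa.string()`` have no fixed width, so this
check only makes sense for fixed-width types.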

These objects are ``metadata``; they are used for describing the data in arrays,
schemas, and record batches. In Python, they can be used in functions where the
input data (e.g. Python objects) may be coerced to more than one Arrow type.
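
For instance, a list of small Python integers can be stored under several
Arrow types. The sketch below, with made-up sample values, passes a type
object to ``pa.array`` to choose between them:

.. code-block:: python

   import pyarrow as pa

   values = [1, 2, 3]

   # Without an explicit type, Python integers are inferred as int64.
   inferred = pa.array(values)

   # Passing a DataType selects a different Arrow type for the same input.
   narrow = pa.array(values, type=pa.int8())
   as_float = pa.array(values, type=pa.float64())

   print(inferred.type)   # int64
   print(narrow.type)     # int8
   print(as_float.type)   # double

The Python values are identical in all three calls; only the type metadata
passed in changes how they are stored.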

The :class:`~pyarrow.Field` type is a type plus a name and optional
user-defined metadata:

.. code-block:: python

   >>> f0 = pa.field('int32_field', t1)
   >>> f0
   pyarrow.Field<int32_field: int32>
   >>> f0.name
   'int32_field'
   >>> f0.type
   DataType(int32)
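
The optional metadata, and the nullability flag that a field also carries, can
be set when the field is created. In this sketch the field name ``distance``
and the ``unit`` metadata key are hypothetical:

.. code-block:: python

   import pyarrow as pa

   # A non-nullable field with user-defined key/value metadata.
   f1 = pa.field("distance", pa.float64(), nullable=False,
                 metadata={"unit": "m"})

   print(f1.nullable)   # False
   print(f1.metadata)   # {b'unit': b'm'}

   # Fields are immutable; with_name returns a renamed copy.
   f2 = f1.with_name("length")
   print(f2.name)       # length

Note that metadata keys and values are returned as ``bytes``, even when they
were supplied as ``str``.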

Arrow supports nested value types like list, map, struct, and union. When
creating these, you must pass types or fields to indicate the data types of the
types' children. For example, we can define a list of int32 values with:

.. code-block:: python

   >>> t6 = pa.list_(t1)
   >>> t6
   ListType(list<item: int32>)
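
Other nested types take their child types in the same way. The following
sketch, which is not part of the original guide, shows a fixed-size list and a
map type and reads the child types back from the resulting objects:

.. code-block:: python

   import pyarrow as pa

   # Variable-length list of int32 values.
   list_type = pa.list_(pa.int32())

   # Passing a list size yields a fixed-size list type instead.
   fixed_list_type = pa.list_(pa.int32(), 3)

   # A map type needs a key type and an item type.
   map_type = pa.map_(pa.string(), pa.int32())

   print(list_type.value_type)        # int32
   print(fixed_list_type.list_size)   # 3
   print(map_type.key_type)           # string
   print(map_type.item_type)          # int32

The child types remain accessible as attributes, which is convenient when
inspecting a type received from elsewhere.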

A ``struct`` is a collection of named fields:

.. code-block:: python

   >>> fields = [
   ...     pa.field('s0', t1),
   ...     pa.field('s1', t2),
   ...     pa.field('s2', t4),
   ...     pa.field('s3', t6),
   ... ]
   >>> t7 = pa.struct(fields)
   >>> print(t7)
   struct<s0: int32, s1: string, s2: fixed_size_binary[10], s3: list<item: int32>>

ARROW-3094: [Python] Easier construction of schemas and struct types Allow calling `pa.schema()` and `pa.struct()` with a list of tuples, or a mapping of strings to datatypes, instead of having to call `pa.field()` explicitly for each field. The latter is still possible if e.g. wanting to pass metadata. Author: Antoine Pitrou <antoine@python.org> Closes #2450 from pitrou/ARROW-3094-easier-struct-schema-construction and squashes the following commits: 4b275417 <Antoine Pitrou> Use shorthand notation more often in docs 12151239 <Antoine Pitrou> Fix for Python 2.7 39a21df7 <Antoine Pitrou> ARROW-3094: Easier construction of schemas and struct types 2018-08-21 11:51:23 -04:00			For convenience, you can pass ``(name, type)`` tuples directly instead of
			:class:`~pyarrow.Field` instances:

.. code-block:: python

   >>> t8 = pa.struct([('s0', t1), ('s1', t2), ('s2', t4), ('s3', t6)])
   >>> print(t8)
   struct<s0: int32, s1: string, s2: fixed_size_binary[10], s3: list<item: int32>>
   >>> t8 == t7
   True
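
The same shorthand should also work with a mapping of names to types. The
following is a minimal, self-contained sketch: the name ``t9`` is illustrative,
and the four field types are restated from the printed output above rather than
reusing ``t1`` to ``t6``.

.. code-block:: python

   import pyarrow as pa

   # Same four fields as above, given as a name -> type mapping
   # (dict insertion order becomes the field order).
   t9 = pa.struct({
       's0': pa.int32(),
       's1': pa.string(),
       's2': pa.binary(10),           # fixed_size_binary[10]
       's3': pa.list_(pa.int32()),
   })

   # Fields can be looked up by position, and positions by name.
   first = t9.field(0)
   idx = t9.get_field_index('s2')
   print(t9.num_fields, first.name, first.type, idx)  # 4 s0 int32 2
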

See :ref:`Data Types API <api.types>` for a full listing of data type
functions.

.. _data.schema:

Schemas
-------

The :class:`~pyarrow.Schema` type is similar to the ``struct`` array type; it
defines the column names and types in a record batch or table data
structure. The :func:`pyarrow.schema` factory function makes new Schema objects in
Python:

.. code-block:: python

   >>> my_schema = pa.schema([('field0', t1),
   ...                        ('field1', t2),
   ...                        ('field2', t4),
   ...                        ('field3', t6)])
   >>> my_schema
   field0: int32
   field1: string
   field2: fixed_size_binary[10]
   field3: list<item: int32>
     child 0, item: int32

In some applications, you may never create schemas directly and instead only
use the ones embedded in :ref:`IPC messages <ipc>`.
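
When a schema arrives from elsewhere, a common first step is to inspect it.
The sketch below builds a small stand-in schema (its field names and types are
illustrative only) and reads back its names, types and individual fields.

.. code-block:: python

   import pyarrow as pa

   # Stand-in for a schema received from some other source.
   schema = pa.schema([
       ('field0', pa.int32()),
       ('field1', pa.string()),
   ])

   names = schema.names                    # column names, in order
   types = schema.types                    # matching DataType objects
   f1 = schema.field('field1')             # look a field up by name
   pos = schema.get_field_index('field1')  # position of a named field
   print(names, f1.type, pos)  # ['field0', 'field1'] string 1
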

Schemas are immutable, which means you can't update an existing schema, but you
can create a new one with updated values using :meth:`Schema.set`.

.. code-block:: python

   >>> updated_field = pa.field('field0_new', pa.int64())
   >>> my_schema2 = my_schema.set(0, updated_field)
   >>> my_schema2
   field0_new: int64
   field1: string
   field2: fixed_size_binary[10]
   field3: list<item: int32>
     child 0, item: int32

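
Other schema methods follow the same pattern of returning a new object. As a
hedged sketch with an illustrative two-field schema (the names ``base``,
``appended``, ``removed`` and ``inserted`` are not from the original text),
``append``, ``remove`` and ``insert`` each leave the original untouched:

.. code-block:: python

   import pyarrow as pa

   base = pa.schema([('a', pa.int32()), ('b', pa.string())])

   # Each call returns a new Schema; ``base`` itself never changes.
   appended = base.append(pa.field('c', pa.float64()))
   removed = base.remove(0)
   inserted = base.insert(1, pa.field('x', pa.bool_()))

   print(base.names)      # ['a', 'b']
   print(appended.names)  # ['a', 'b', 'c']
   print(removed.names)   # ['b']
   print(inserted.names)  # ['a', 'x', 'b']
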
.. _data.array:

Arrays
------

For each data type, there is an accompanying array data structure for holding
memory buffers that define a single contiguous chunk of columnar array
data. When you are using PyArrow, this data may come from IPC tools, though it
can also be created from various types of Python sequences (lists, NumPy
arrays, pandas data).

A simple way to create arrays is with ``pyarrow.array``, which is similar to
the ``numpy.array`` function. By default PyArrow will infer the data type
for you:

.. code-block:: python

   >>> arr = pa.array([1, 2, None, 3])
   >>> arr
   <pyarrow.lib.Int64Array object at ...>
   [
     1,
     2,
     null,
     3
   ]

But you may also pass a specific data type to override type inference:

.. code-block:: python

   >>> pa.array([1, 2], type=pa.uint16())
   <pyarrow.lib.UInt16Array object at ...>
   [
     1,
     2
   ]

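
An explicit type matters most when the input carries no type information of
its own. The following sketch (variable names are illustrative) contrasts an
all-``None`` list, which is inferred as the ``null`` type, with the same input
given an explicit string type:

.. code-block:: python

   import pyarrow as pa

   # With nothing but None values, inference falls back to the null type.
   inferred = pa.array([None, None])

   # Passing a type yields a typed array whose entries are all null.
   typed = pa.array([None, None], type=pa.string())

   print(inferred.type, typed.type, typed.null_count)  # null string 2
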
The array's ``type`` attribute is the corresponding piece of type metadata:

.. code-block:: python

   >>> arr.type
   DataType(int64)
			`Each in-memory array has a known length and null count (which will be 0 if`
			`there are no null values):`

			`.. code-block:: python`

			`>>> len(arr)`
			`4`
			`>>> arr.null_count`
			`1`

			Scalar values can be selected with normal indexing. ``pyarrow.array`` converts
			``None`` values to Arrow nulls; selecting a null returns a scalar of the array's
			type whose value is ``None``:

			`.. code-block:: python`

			`>>> arr[0]`
			`<pyarrow.Int64Scalar: 1>`
			`>>> arr[2]`
			`<pyarrow.Int64Scalar: None>`

			`Arrow data is immutable, so values can be selected but not assigned.`

			`Arrays can be sliced without copying:`

			`.. code-block:: python`

			`>>> arr[1:3]`
			`<pyarrow.lib.Int64Array object at ...>`
			`[`
			`2,`
			`null`
			`]`

			`None values and NAN handling`
			`~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

			As mentioned in the above section, the Python object ``None`` is always
			converted to an Arrow null element on conversion to ``pyarrow.Array``. A float
			NaN value, represented either by the Python object ``float('nan')`` or by
			``numpy.nan``, is normally kept as a valid float value during the conversion.
			If integer input supplied to ``pyarrow.array`` contains ``np.nan``,
			``ValueError`` is raised.

			`For better compatibility with pandas, we support interpreting NaN values as`
			null elements. This is enabled automatically on all ``from_pandas`` functions and
			can be enabled on the other conversion functions by passing ``from_pandas=True``
			`as a function parameter.`

			`List arrays`
			`~~~~~~~~~~~`

			``pyarrow.array`` is able to infer the type of simple nested data structures
			`like lists:`

			`.. code-block:: python`

			`>>> nested_arr = pa.array([[], None, [1, 2], [None, 1]])`
			`>>> print(nested_arr.type)`
			`list<item: int64>`

			`ListView arrays`
			`~~~~~~~~~~~~~~~`

			``pyarrow.array`` can create an alternate list type called ListView:

			`.. code-block:: python`

			`>>> nested_arr = pa.array([[], None, [1, 2], [None, 1]], type=pa.list_view(pa.int64()))`
			`>>> print(nested_arr.type)`
			`list_view<item: int64>`

			`ListView arrays have a different set of buffers than List arrays. A ListView array`
			`has both an offsets buffer and a sizes buffer, while a List array has only an offsets`
			`buffer. This allows ListView arrays to specify out-of-order offsets:`

			`.. code-block:: python`

			`>>> values = [1, 2, 3, 4, 5, 6]`
			`>>> offsets = [4, 2, 0]`
			`>>> sizes = [2, 2, 2]`
			`>>> arr = pa.ListViewArray.from_arrays(offsets, sizes, values)`
			`>>> arr`
			`<pyarrow.lib.ListViewArray object at ...>`
			`[`
			`[`
			`5,`
			`6`
			`],`
			`[`
			`3,`
			`4`
			`],`
			`[`
			`1,`
			`2`
			`]`
			`]`

			See the format specification for more details on :ref:`listview-layout`.

			`Struct arrays`
			`~~~~~~~~~~~~~`

			``pyarrow.array`` can infer the schema of a struct type from a sequence of
			`Python dicts:`

			`.. code-block:: python`

			`>>> pa.array([{'x': 1, 'y': True}, {'z': 3.4, 'x': 4}])`
			`<pyarrow.lib.StructArray object at ...>`
			`-- is_valid: all not null`
			`-- child 0 type: int64`
			`[`
			`1,`
			`4`
			`]`
			`-- child 1 type: bool`
			`[`
			`true,`
			`null`
			`]`
			`-- child 2 type: double`
			`[`
			`null,`
			`3.4`
			`]`

			`Struct arrays can be initialized from a sequence of Python dicts or tuples. For tuples,`
			`you must explicitly pass the type:`

			`.. code-block:: python`

			`>>> ty = pa.struct([('x', pa.int8()),`
			`... ('y', pa.bool_())])`
			`>>> pa.array([{'x': 1, 'y': True}, {'x': 2, 'y': False}], type=ty)`
			`<pyarrow.lib.StructArray object at ...>`
			`-- is_valid: all not null`
			`-- child 0 type: int8`
			`[`
			`1,`
			`2`
			`]`
			`-- child 1 type: bool`
			`[`
			`true,`
			`false`
			`]`
			`>>> pa.array([(3, True), (4, False)], type=ty)`
			`<pyarrow.lib.StructArray object at ...>`
			`-- is_valid: all not null`
			`-- child 0 type: int8`
			`[`
			`3,`
			`4`
			`]`
			`-- child 1 type: bool`
			`[`
			`true,`
			`false`
			`]`

			`When initializing a struct array, nulls are allowed both at the struct`
			`level and at the individual field level. If initializing from a sequence`
			`of Python dicts, a missing dict key is handled as a null value:`

			`.. code-block:: python`

			`>>> pa.array([{'x': 1}, None, {'y': None}], type=ty)`
			`<pyarrow.lib.StructArray object at ...>`
			`-- is_valid:`
			`[`
			`true,`
			`false,`
			`true`
			`]`
			`-- child 0 type: int8`
			`[`
			`1,`
			`0,`
			`null`
			`]`
			`-- child 1 type: bool`
			`[`
			`null,`
			`false,`
			`null`
			`]`

			`You can also construct a struct array from existing arrays for each of the`
			`struct's components. In this case, data storage will be shared with the`
			`individual arrays, and no copy is involved:`

			`.. code-block:: python`

			`>>> xs = pa.array([5, 6, 7], type=pa.int16())`
			`>>> ys = pa.array([False, True, True])`
			`>>> arr = pa.StructArray.from_arrays((xs, ys), names=('x', 'y'))`
			`>>> arr.type`
			`StructType(struct<x: int16, y: bool>)`
			`>>> arr`
			`<pyarrow.lib.StructArray object at ...>`
			`-- is_valid: all not null`
			`-- child 0 type: int16`
			`[`
			`5,`
			`6,`
			`7`
			`]`
			`-- child 1 type: bool`
			`[`
			`false,`
			`true,`
			`true`
			`]`

			`Map arrays`
			`~~~~~~~~~~`

			`Map arrays can be constructed from lists of lists of tuples (key-item pairs), but only if`
			the type is explicitly passed into :meth:`array`:

			`.. code-block:: python`

			`>>> data = [[('x', 1), ('y', 0)], [('a', 2), ('b', 45)]]`
			`>>> ty = pa.map_(pa.string(), pa.int64())`
			`>>> pa.array(data, type=ty)`
			`<pyarrow.lib.MapArray object at ...>`
			`[`
			`keys:`
			`[`
			`"x",`
			`"y"`
			`]`
			`values:`
			`[`
			`1,`
			`0`
			`],`
			`keys:`
			`[`
			`"a",`
			`"b"`
			`]`
			`values:`
			`[`
			`2,`
			`45`
			`]`
			`]`

			`MapArrays can also be constructed from offset, key, and item arrays. The offsets give the`
			`starting position of each map in the flattened key and item arrays, with a final entry`
			marking the end of the last map. Note that the :attr:`MapArray.keys` and :attr:`MapArray.items`
			`properties give the flattened keys and items. To keep the keys and items associated with`
			their row, use the :meth:`ListArray.from_arrays` constructor with the
			:attr:`MapArray.offsets` property.

			`.. code-block:: python`

			`>>> arr = pa.MapArray.from_arrays([0, 2, 3], ['x', 'y', 'z'], [4, 5, 6])`
			`>>> arr.keys`
			`<pyarrow.lib.StringArray object at ...>`
			`[`
			`"x",`
			`"y",`
			`"z"`
			`]`
			`>>> arr.items`
			`<pyarrow.lib.Int64Array object at ...>`
			`[`
			`4,`
			`5,`
			`6`
			`]`
			`>>> pa.ListArray.from_arrays(arr.offsets, arr.keys)`
			`<pyarrow.lib.ListArray object at ...>`
			`[`
			`[`
			`"x",`
			`"y"`
			`],`
			`[`
			`"z"`
			`]`
			`]`
			`>>> pa.ListArray.from_arrays(arr.offsets, arr.items)`
			`<pyarrow.lib.ListArray object at ...>`
			`[`
			`[`
			`4,`
			`5`
			`],`
			`[`
			`6`
			`]`
			`]`

			`Union arrays`
			`~~~~~~~~~~~~`

			`The union type represents a nested array type where each value can be one`
			`(and only one) of a set of possible types. There are two possible`
			`storage types for union arrays: sparse and dense.`

			`In a sparse union array, each of the child arrays has the same length`
			as the resulting union array. They are accompanied by an ``int8`` "types"
			`array that tells, for each value, from which child array it must be`
			`selected:`

			`.. code-block:: python`

			`>>> xs = pa.array([5, 6, 7])`
			`>>> ys = pa.array([False, False, True])`
			`>>> types = pa.array([0, 1, 1], type=pa.int8())`
			`>>> union_arr = pa.UnionArray.from_sparse(types, [xs, ys])`
			`>>> union_arr.type`
			`SparseUnionType(sparse_union<0: int64=0, 1: bool=1>)`
			`>>> union_arr`
			`<pyarrow.lib.UnionArray object at ...>`
			`-- is_valid: all not null`
			`-- type_ids: [`
			`0,`
			`1,`
			`1`
			`]`
			`-- child 0 type: int64`
			`[`
			`5,`
			`6,`
			`7`
			`]`
			`-- child 1 type: bool`
			`[`
			`false,`
			`false,`
			`true`
			`]`

			In a dense union array, you pass, in addition to the ``int8`` "types"
			array, an ``int32`` "offsets" array that tells, for each value, at
			`which offset in the selected child array it can be found:`

			`.. code-block:: python`

			`>>> xs = pa.array([5, 6, 7])`
			`>>> ys = pa.array([False, True])`
			`>>> types = pa.array([0, 1, 1, 0, 0], type=pa.int8())`
			`>>> offsets = pa.array([0, 0, 1, 1, 2], type=pa.int32())`
			`>>> union_arr = pa.UnionArray.from_dense(types, offsets, [xs, ys])`
			`>>> union_arr.type`
			`DenseUnionType(dense_union<0: int64=0, 1: bool=1>)`
			`>>> union_arr`
			`<pyarrow.lib.UnionArray object at ...>`
			`-- is_valid: all not null`
			`-- type_ids: [`
			`0,`
			`1,`
			`1,`
			`0,`
			`0`
			`]`
			`-- value_offsets: [`
			`0,`
			`0,`
			`1,`
			`1,`
			`2`
			`]`
			`-- child 0 type: int64`
			`[`
			`5,`
			`6,`
			`7`
			`]`
			`-- child 1 type: bool`
			`[`
			`false,`
			`true`
			`]`

			`.. _data.dictionary:`

			`Dictionary Arrays`
			`~~~~~~~~~~~~~~~~~`

			`The Dictionary type in PyArrow is a special array type that is similar to a`
			factor in R or a ``pandas.Categorical``. It enables one or more record batches
			`in a file or stream to transmit integer indices referencing a shared`
			`dictionary containing the distinct values in the logical array. This is`
			`often used with strings to save memory and improve performance.`

			`The way dictionaries are handled in the Apache Arrow format differs slightly`
			`from the way they appear in C++ and Python. We define a special`
			:class:`~.DictionaryArray` type with a corresponding dictionary type. Let's
			`consider an example:`

			`.. code-block:: python`

			`>>> indices = pa.array([0, 1, 0, 1, 2, 0, None, 2])`
			`>>> dictionary = pa.array(['foo', 'bar', 'baz'])`
			`>>>`
			`>>> dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)`
			`>>> dict_array`
			`<pyarrow.lib.DictionaryArray object at ...>`
			`...`
			`-- dictionary:`
			`[`
			`"foo",`
			`"bar",`
			`"baz"`
			`]`
			`-- indices:`
			`[`
			`0,`
			`1,`
			`0,`
			`1,`
			`2,`
			`0,`
			`null,`
			`2`
			`]`

			`Here we have:`

GH-28859: [Doc][Python] Use only code-block directive and set up doctest for the python user guide (#48619) ### Rationale for this change In many places in the Python User Guide the code exampels are written with IPython directive (elsewhere code-block is used). IPython directives are converted to IPython format (`In` and `Out` during the doc build). This can lead to slower builds. ### What changes are included in this PR? IPython directives are converted to runnable code-block (with `>>>` and `...`) and pytest doctest support for `.rst` files is added to the `conda-python-docs` CI job. This means the code in the Python User Guide is tested separately to the building of the documentation. ### Are these changes tested? Yes, with the CI. ### Are there any user-facing changes? Changes to the Python User Guide examples will have to be tested with `pytest --doctest-glob='.rst' docs/source/python/file.rst` GitHub Issue: #28859 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: tadeja <tadeja@users.noreply.github.com> Signed-off-by: AlenkaF <frim.alenka@gmail.com> 2026-01-30 09:09:33 +01:00			`.. code-block:: python`

			`>>> print(dict_array.type)`
			`dictionary<values=string, indices=int64, ordered=0>`
			`>>> dict_array.indices`
			`<pyarrow.lib.Int64Array object at ...>`
			`[`
			`0,`
			`1,`
			`0,`
			`1,`
			`2,`
			`0,`
			`null,`
			`2`
			`]`
			`>>> dict_array.dictionary`
			`<pyarrow.lib.StringArray object at ...>`
			`[`
			`"foo",`
			`"bar",`
			`"baz"`
			`]`

			When using :class:`~.DictionaryArray` with pandas, the analogue is
			``pandas.Categorical`` (more on this later):

			`.. code-block:: python`

			`>>> dict_array.to_pandas()`
			`0 foo`
			`1 bar`
			`2 foo`
			`3 bar`
			`4 baz`
			`5 foo`
			`6 NaN`
			`7 baz`
			`dtype: category`
GH-49150: [Doc][CI][Python] Doctests failing on rst files due to pandas 3+ (#49088) Fixes: #49150 See https://github.com/apache/arrow/pull/48619#issuecomment-3823269381 ### Rationale for this change Fix CI failures ### What changes are included in this PR? Tests are made more general to allow for Pandas 2 and Pandas 3 style string types ### Are these changes tested? By CI ### Are there any user-facing changes? No * GitHub Issue: #49150 Authored-by: Rok Mihevc <rok@mihevc.org> Signed-off-by: Rok Mihevc <rok@mihevc.org> 2026-02-05 01:06:04 +01:00			`Categories (3, str): ['foo', 'bar', 'baz']`
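To make the relationship between the indices and the dictionary concrete, here is a minimal pure-Python sketch of how a dictionary-encoded sequence maps back to its logical values. It uses plain lists instead of pyarrow objects, and the helper name `decode_dictionary` is hypothetical; it illustrates the encoding only and is not how pyarrow implements it.

```python
def decode_dictionary(indices, dictionary):
    """Return the logical values of a dictionary-encoded sequence.

    Each non-null index selects an entry of ``dictionary``; a null
    index (``None``) stays null in the result.
    """
    return [None if i is None else dictionary[i] for i in indices]


# Same data as in the DictionaryArray example above, as plain lists.
indices = [0, 1, 0, 1, 2, 0, None, 2]
dictionary = ['foo', 'bar', 'baz']

values = decode_dictionary(indices, dictionary)
print(values)
```

The decoded values line up row by row with the categorical series shown above, with the null index becoming a missing value.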

			`.. _data.record_batch:`

			`Record Batches`
			`--------------`

			`A Record Batch in Apache Arrow is a collection of equal-length array`
			`instances. Let's consider a collection of arrays:`

			`.. code-block:: python`

			`>>> data = [`
			`...     pa.array([1, 2, 3, 4]),`
			`...     pa.array(['foo', 'bar', 'baz', None]),`
			`...     pa.array([True, None, False, True])`
			`... ]`

			`A record batch can be created from this list of arrays using`
			``RecordBatch.from_arrays``:

			`.. code-block:: python`

			`>>> batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])`
			`>>> batch.num_columns`
			`3`
			`>>> batch.num_rows`
			`4`
			`>>> batch.schema`
			`f0: int64`
			`f1: string`
			`f2: bool`
			`>>>`
			`>>> batch[1]`
			`<pyarrow.lib.StringArray object at ...>`
			`[`
			`"foo",`
			`"bar",`
			`"baz",`
			`null`
			`]`
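The defining property of a record batch, one name per column and the same length for every column, can be sketched in plain Python. This is a conceptual model only: the helper `make_batch` is hypothetical, the columns are ordinary lists, and nothing here reflects pyarrow's internals.

```python
def make_batch(arrays, names):
    """Pair equal-length columns with field names (conceptual only)."""
    if len(arrays) != len(names):
        raise ValueError("need exactly one name per column")
    lengths = {len(column) for column in arrays}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    return dict(zip(names, arrays))


data = [
    [1, 2, 3, 4],
    ['foo', 'bar', 'baz', None],
    [True, None, False, True],
]
batch = make_batch(data, ['f0', 'f1', 'f2'])

num_columns = len(batch)        # one entry per field name
num_rows = len(batch['f0'])     # every column has this length
print(num_columns, num_rows)
```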

			`Like an array, a record batch can be sliced without copying memory:`

			`.. code-block:: python`

			`>>> batch2 = batch.slice(1, 3)`
			`>>> batch2[1]`
			`<pyarrow.lib.StringArray object at ...>`
			`[`
			`"bar",`
			`"baz",`
			`null`
			`]`
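Note that the two arguments to `slice` are an offset and a length, not a start and a stop. As a rough analogy for what "without copying" means, the sketch below uses only the standard library: a `memoryview` over an `array.array` exposes a window onto the same buffer. Arrow arrays are immutable, so the mutation here exists purely to demonstrate that the window shares memory with the original; it is not something the pyarrow API is assumed to allow.

```python
from array import array

values = array('q', [1, 2, 3, 4])   # 'q' = signed 64-bit integers
view = memoryview(values)

offset, length = 1, 3
window = view[offset:offset + length]   # a view: no elements are copied

# Change the underlying buffer; the window observes the change.
values[2] = 30
print(window.tolist())
```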

			`.. _data.table:`

			`Tables`
			`------`

			The PyArrow :class:`~.Table` type is not part of the Apache Arrow
			`specification, but is rather a tool to help with wrangling multiple record`
			`batches and array pieces as a single logical dataset. As a relevant example, we`
			`may receive multiple small record batches in a socket stream, then need to`
			`concatenate them into contiguous memory for use in NumPy or pandas. The Table`
			`object makes this efficient without requiring additional memory copying.`

			`Considering the record batch we created above, we can create a Table containing`
			one or more copies of the batch using ``Table.from_batches``:

			`.. code-block:: python`

			`>>> batches = [batch] * 5`
			`>>> table = pa.Table.from_batches(batches)`
			`>>> table`
			`pyarrow.Table`
			`f0: int64`
			`f1: string`
			`f2: bool`
			`----`
			`f0: [[1,2,3,4],[1,2,3,4],...,[1,2,3,4],[1,2,3,4]]`
			`f1: [["foo","bar","baz",null],...,["foo","bar","baz",null]]`
			`f2: [[true,null,false,true],...,[true,null,false,true]]`
			`>>> table.num_rows`
			`20`
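One way to picture why assembling a table from batches needs no copying is to model each table column as a list of references to the batches' existing columns. The following is a plain-Python sketch under that assumption; the dictionaries and lists are stand-ins for illustration and do not describe pyarrow's actual data structures.

```python
# A "batch" modelled as a mapping from field name to a column of values.
batch = {'f0': [1, 2, 3, 4], 'f1': ['foo', 'bar', 'baz', None]}
batches = [batch] * 5

# One list of chunks per column; each chunk is a reference to the
# batch's existing column, not a copy of it.
table = {name: [b[name] for b in batches] for name in batch}

num_rows = sum(len(chunk) for chunk in table['f0'])
shares_memory = all(chunk is batch['f0'] for chunk in table['f0'])
print(num_rows, shares_memory)
```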
ARROW-5893: [C++][Python][GLib][Ruby][MATLAB][R] Remove arrow::Column class This completely removes `arrow::Column` from the C++ library and adapts the Python bindings to follow, to help assist with mailing list discussion. There are other places that this would touch: - [x] GLib - [x] Ruby - [x] MATLAB - [x] R You can see the evident reduction of boilerplate and simplification in many places -- it is particularly pronounced in parquet/arrow/arrow-reader-writer-test.cc. This refactor also exposed a bug where a Column's data type did not match its Field type. Closes #4841 from wesm/remove-column-class and squashes the following commits: b389d4278 <Wes McKinney> Fix up Python/C++ docs to remove references to arrow::Column cc34548f0 <Wes McKinney> Fix non-deterministic Table.from_pydict behavior. Add unicode/utf8 test for pyarrow.table. Fix usages of ChunkedArray.data and add unit test 4be8d5a6d <Wes McKinney> UTF-8 py2/3 compatibility issues 9b5a61e58 <Wes McKinney> Try to fix unicode issue a5ee7b68c <Wes McKinney> Fix MATLAB code (I hope) 2c7c51ec5 <Wes McKinney> code review comments 020f52a86 <Wes McKinney> Re-render R README fd4473e59 <Wes McKinney> Fix up R C++ code and unit tests 244d2a79c <Wes McKinney> Begin refactoring R library. 
Change Feather to return ChunkedArray from GetColumn d30a365af <Sutou Kouhei> Remove unused variable d89366a9f <Sutou Kouhei> Follow API change 983d378e3 <Sutou Kouhei> Follow API change 3099328c6 <Sutou Kouhei> Follow API change f092f1b88 <Sutou Kouhei> Add missing available annotations 41dd1fe78 <Sutou Kouhei> Revert needless style change f120a7de8 <Sutou Kouhei> Suppress a warning with MSVC 62f364985 <Sutou Kouhei> Remove unused lambda captures 00900518e <Sutou Kouhei> Add API index for 1.0.0 bd6002a79 <Sutou Kouhei> Remove entries 335d7b69d <Sutou Kouhei> Implement Arrow::Column in Ruby fb5ad1cc8 <Sutou Kouhei> Add garrow_chunked_array_get_n_rows() for consistency 60dfb0cf4 <Sutou Kouhei> Add garrow_schema_get_field_index() c4ff97201 <Sutou Kouhei> Remove backward compatible column API from GArrowRecordBatch 179aa67a3 <Sutou Kouhei> Use "column_data" instead of "column" 795b6c12c <Sutou Kouhei> Follow arrow::Column remove ac6f3728b <Kevin Gurney> Remove use of arrow::Column from feather_reader.cc. Remove use of deprecated StatusCode enum class values in handle_status.cc fcaaf8ce6 <Wes McKinney> Add pyarrow.ChunkedArray.flatten method. Remove Column from glib, but haven't fixed unit tests yet f0d48ccc9 <Wes McKinney> Adapt Python bindings c161a9ac8 <Wes McKinney> Fix up Parquet, too 93e3cade2 <Wes McKinney> arrow-tests all passing again a7344a811 <Wes McKinney> Stage 1 of cutting away Column Lead-authored-by: Wes McKinney <wesm+git@apache.org> Co-authored-by: Sutou Kouhei <kou@clear-code.com> Co-authored-by: Kevin Gurney <kevin.p.gurney@gmail.com> Signed-off-by: Wes McKinney <wesm+git@apache.org> 2019-07-16 23:00:05 -05:00			The table's columns are instances of :class:`~.ChunkedArray`, which is a
			`container for one or more arrays of the same type.`

			`.. code-block:: python`

			`>>> c = table[0]`
			`>>> c`
			`<pyarrow.lib.ChunkedArray object at ...>`
			`[`
			`[`
			`1,`
			`2,`
			`3,`
			`4`
			`],`
			`...`
			`[`
			`1,`
			`2,`
			`3,`
			`4`
			`]`
			`]`
			`>>> c.num_chunks`
			`5`
			`>>> c.chunk(0)`
			`<pyarrow.lib.Int64Array object at ...>`
			`[`
			`1,`
			`2,`
			`3,`
			`4`
			`]`
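Because a chunked column is logically a single sequence, a row number has to be translated into a chunk and a position within that chunk. The sketch below shows one common way to do that lookup with cumulative chunk lengths and a binary search. The helper `locate` is hypothetical and the chunks are plain lists; this illustrates the idea and is not a description of pyarrow's implementation.

```python
from bisect import bisect_right
from itertools import accumulate

chunks = [[1, 2, 3, 4]] * 5

# Cumulative end offset of each chunk: [4, 8, 12, 16, 20].
ends = list(accumulate(len(chunk) for chunk in chunks))


def locate(row):
    """Map a logical row number to (chunk index, offset within chunk)."""
    if not 0 <= row < ends[-1]:
        raise IndexError(row)
    chunk_index = bisect_right(ends, row)
    start = ends[chunk_index - 1] if chunk_index else 0
    return chunk_index, row - start


chunk_index, offset = locate(9)
value = chunks[chunk_index][offset]
print(chunk_index, offset, value)
```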
ARROW-2869: [Python] Add documentation for Array.to_numpy Author: Antoine Pitrou <antoine@python.org> Closes #2351 from pitrou/ARROW-2869-document-numpy and squashes the following commits: 2792dc84 <Antoine Pitrou> Fix renamed reference 8cb89989 <Antoine Pitrou> Revert "Capitalize Pandas" 34d8c36e <Antoine Pitrou> Capitalize Pandas 395231e0 <Antoine Pitrou> Address review comments 347ca4e7 <Antoine Pitrou> ARROW-2869: Add documentation for Array.to_numpy 2018-08-05 16:09:58 -04:00			As you'll see in the :ref:`pandas section <pandas_interop>`, we can convert
			`these objects to contiguous NumPy arrays for use in pandas:`

			`.. code-block:: python`

			`>>> c.to_pandas()`
			`0 1`
			`1 2`
			`2 3`
			`3 4`
			`4 1`
			`5 2`
			`6 3`
			`7 4`
			`8 1`
			`9 2`
			`10 3`
			`11 4`
			`12 1`
			`13 2`
			`14 3`
			`15 4`
			`16 1`
			`17 2`
			`18 3`
			`19 4`
			`Name: f0, dtype: int64`
ARROW-2181: [PYTHON][DOC] Add doc on usage of concat_tables Adding Python API doc on usage of pa.concat_tables. Author: Bryan Cutler <cutlerb@gmail.com> Closes #1733 from BryanCutler/doc-python-concat_table-ARROW-2181 and squashes the following commits: 2f8d9f80 <Bryan Cutler> use full pyarrow in api ref 95087090 <Bryan Cutler> added doc on usage for concat_tables 2018-03-10 13:34:22 -05:00			`Multiple tables can also be concatenated together to form a single table using`
			``pyarrow.concat_tables``, if the schemas are equal:

			`.. code-block:: python`

			`>>> tables = [table] * 2`
			`>>> table_all = pa.concat_tables(tables)`
			`>>> table_all.num_rows`
			`40`
			`>>> c = table_all[0]`
			`>>> c.num_chunks`
			`10`

			This is similar to ``Table.from_batches``, but uses tables as input instead of
			`record batches. Record batches can be made into tables, but not the other way`
			`around, so if your data is already in table form, then use`
			``pyarrow.concat_tables``.

			`Custom Schema and Field Metadata`
			`--------------------------------`

			`Arrow supports both schema-level and field-level custom key-value metadata,`
			`allowing systems to attach their own application-defined metadata to`
			`customize behavior.`

			Schema-level custom metadata can be accessed through :attr:`Schema.metadata`
			and field-level custom metadata through :attr:`Field.metadata`.

			Note that this metadata is preserved in :ref:`ipc` processes.

			`To customize the schema metadata of an existing table you can use`
			:meth:`Table.replace_schema_metadata`:

			`.. code-block:: python`

			`>>> table.schema.metadata`
			`>>> table = table.replace_schema_metadata({"f0": "First dose"})`
			`>>> table.schema.metadata`
			`{b'f0': b'First dose'}`

			`To customize the metadata of the field from the table schema you can use`
			:meth:`Field.with_metadata`:

			`.. code-block:: python`

			`>>> field_f1 = table.schema.field("f1")`
			`>>> field_f1.metadata`
			`>>> field_f1 = field_f1.with_metadata({"f1": "Second dose"})`
			`>>> field_f1.metadata`
			`{b'f1': b'Second dose'}`

			`Both options create a shallow copy of the data and do not modify the original`
			`schema, which is immutable. When we changed the schema metadata of the table,`
			:meth:`Table.replace_schema_metadata` returned a new table object.

			`To change the metadata of the field in the schema we would need to define`
			`a new schema and cast the data to this schema:`

			`.. code-block:: python`

			`>>> my_schema2 = pa.schema([`
			`...     pa.field('f0', pa.int64(), metadata={"name": "First dose"}),`
			`...     pa.field('f1', pa.string(), metadata={"name": "Second dose"}),`
			`...     pa.field('f2', pa.bool_())],`
			`...     metadata={"f2": "booster"})`
			`>>> t2 = table.cast(my_schema2)`
			`>>> t2.schema.field("f0").metadata`
			`{b'name': b'First dose'}`
			`>>> t2.schema.field("f1").metadata`
			`{b'name': b'Second dose'}`
			`>>> t2.schema.metadata`
			`{b'f2': b'booster'}`

			Metadata key and value pairs are ``std::string`` objects in the C++ implementation
			and so they are bytes objects (``b'...'``) in Python.

			`Record Batch Readers`
			`--------------------`

			Many functions in PyArrow either return or take as an argument a :class:`RecordBatchReader`.
			`It can be used like any iterable of record batches, but it also exposes their common`
			`schema without having to read any of the batches.`

			`.. code-block:: python`

			`>>> schema = pa.schema([('x', pa.int64())])`
			`>>>`
			`>>> def iter_record_batches():`
			`...     for i in range(2):`
			`...         yield pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)`
			`>>>`
			`>>> reader = pa.RecordBatchReader.from_batches(schema, iter_record_batches())`
			`>>> print(reader.schema)`
			`x: int64`
			`>>> for batch in reader:`
			`...     print(batch)`
			`pyarrow.RecordBatch`
			`x: int64`
			`----`
			`x: [1,2,3]`
			`pyarrow.RecordBatch`
			`x: int64`
			`----`
			`x: [1,2,3]`

			It can also be sent between languages using the :ref:`C stream interface <c-stream-interface>`.

			`Conversion of RecordBatch to Tensor`
			`-----------------------------------`

			Each array of the ``RecordBatch`` has its own contiguous memory that is not necessarily
			`adjacent to other arrays. A different memory structure that is used in machine learning`
			`libraries is a two dimensional array (also called a 2-dim tensor or a matrix) which takes`
			`only one contiguous block of memory.`

			For this reason there is a function ``pyarrow.RecordBatch.to_tensor()`` available
			`to efficiently convert tabular columnar data into a tensor.`

			`Data types supported in this conversion are unsigned integer, signed integer, and float`
			`types. Currently only column-major conversion is supported.`

			`.. code-block:: python`

			`>>> arr1 = [1, 2, 3, 4, 5]`
			`>>> arr2 = [10, 20, 30, 40, 50]`
			`>>> batch = pa.RecordBatch.from_arrays(`
GH-40841: [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842) ### Rationale for this change The work on the conversion from `Table`/`RecordBatch` to `Tensor` is progressing and we have to make sure to add information to the documentation. ### What changes are included in this PR? I propose to add - new page (`converting_recordbatch_to_tensor.rst`) in the `cpp/examples` section, - added section (Conversion of RecordBatch do Tensor) in the `docs/source/python/data.rst` the content above would be updated as the features are added in the future (row-major conversion, `Table::ToTensor`, DLPack support for `Tensor` class, etc.) ### Are these changes tested? It will be tested with the crossbow preview-docs job. ### Are there any user-facing changes? No, just documentation. * GitHub Issue: #40841 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> 2024-03-29 08:29:28 +01:00			`... [`
			`... pa.array(arr1, type=pa.uint16()),`
			`... pa.array(arr2, type=pa.int16()),`
			`... ], ["a", "b"]`
			`... )`
GH-28859: [Doc][Python] Use only code-block directive and set up doctest for the python user guide (#48619) ### Rationale for this change In many places in the Python User Guide the code exampels are written with IPython directive (elsewhere code-block is used). IPython directives are converted to IPython format (`In` and `Out` during the doc build). This can lead to slower builds. ### What changes are included in this PR? IPython directives are converted to runnable code-block (with `>>>` and `...`) and pytest doctest support for `.rst` files is added to the `conda-python-docs` CI job. This means the code in the Python User Guide is tested separately to the building of the documentation. ### Are these changes tested? Yes, with the CI. ### Are there any user-facing changes? Changes to the Python User Guide examples will have to be tested with `pytest --doctest-glob='.rst' docs/source/python/file.rst` GitHub Issue: #28859 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: tadeja <tadeja@users.noreply.github.com> Signed-off-by: AlenkaF <frim.alenka@gmail.com> 2026-01-30 09:09:33 +01:00			`>>> batch.to_tensor()`
GH-40841: [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842) ### Rationale for this change The work on the conversion from `Table`/`RecordBatch` to `Tensor` is progressing and we have to make sure to add information to the documentation. ### What changes are included in this PR? I propose to add - new page (`converting_recordbatch_to_tensor.rst`) in the `cpp/examples` section, - added section (Conversion of RecordBatch do Tensor) in the `docs/source/python/data.rst` the content above would be updated as the features are added in the future (row-major conversion, `Table::ToTensor`, DLPack support for `Tensor` class, etc.) ### Are these changes tested? It will be tested with the crossbow preview-docs job. ### Are there any user-facing changes? No, just documentation. * GitHub Issue: #40841 Lead-authored-by: AlenkaF <frim.alenka@gmail.com> Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> 2024-03-29 08:29:28 +01:00			`<pyarrow.Tensor>`
			`type: int32`
			`shape: (5, 2)`
			`strides: (8, 4)`
			`>>> batch.to_tensor().to_numpy()`
			`array([[ 1, 10],`
			`[ 2, 20],`
			`[ 3, 30],`
			`[ 4, 40],`
			`[ 5, 50]], dtype=int32)`
			
			Setting ``null_to_nan`` to ``True`` also allows converting data that
			contains nulls. Each null is converted to ``NaN``:

			`.. code-block:: python`

			`>>> batch = pa.record_batch(`
			`... [`
			`... pa.array([1, 2, 3, 4, None], type=pa.int32()),`
			`... pa.array([10, 20, 30, 40, None], type=pa.float32()),`
			`... ], names=["a", "b"]`
			`... )`
			`>>> batch.to_tensor(null_to_nan=True).to_numpy()`
			`array([[ 1., 10.],`
			`[ 2., 20.],`
			`[ 3., 30.],`
			`[ 4., 40.],`
			`[nan, nan]])`