ARROW-10209: [Python] Support positional options in compute functions
This makes compute functions easier to use. For example, the required "pattern" option no longer needs to be passed by name:
```
>>> pc.split_pattern("abacab", "a")
<pyarrow.ListScalar: ['', 'b', 'c', 'b']>
```
It also produces the following doc at the prompt:
```
split_pattern(strings, /, pattern, *, max_splits=-1, reverse=False, options=None, memory_pool=None)
Split string according to separator.
Split each string according to the exact `pattern` defined in
SplitPatternOptions. The output for each string input is a list
of strings.
The maximum number of splits and direction of splitting
(forward, reverse) can optionally be defined in SplitPatternOptions.
Parameters
----------
strings : Array-like or scalar-like
Argument to compute function
pattern : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `pattern` can be passed, but not both at the same time.
max_splits : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `max_splits` can be passed, but not both at the same time.
reverse : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `reverse` can be passed, but not both at the same time.
options : pyarrow.compute.SplitPatternOptions, optional
Parameters altering compute function semantics.
memory_pool : pyarrow.MemoryPool, optional
If not passed, will allocate memory from the default memory pool.
```
Closes #11955 from pitrou/ARROW-10209-compute-pos-args
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
2021-12-16 19:49:10 +01:00
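The signature above mixes three kinds of parameters: positional-only data arguments (before the `/`), option values that may be passed positionally or by name, and a keyword-only tail (after the `*`). As a rough illustration of how such a signature can be assembled dynamically — a minimal stdlib sketch, where `make_compute_signature` is a hypothetical name and not pyarrow's actual wrapper machinery:

```python
import inspect


def make_compute_signature(arg_names, option_names):
    # Data arguments are positional-only (rendered before "/").
    params = [inspect.Parameter(name, inspect.Parameter.POSITIONAL_ONLY)
              for name in arg_names]
    # Option values may be passed positionally or by keyword.
    params += [inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD)
               for name in option_names]
    # memory_pool stays keyword-only (rendered after "*").
    params.append(inspect.Parameter("memory_pool",
                                    inspect.Parameter.KEYWORD_ONLY,
                                    default=None))
    return inspect.Signature(params)


print(make_compute_signature(["strings"], ["pattern"]))
# -> (strings, /, pattern, *, memory_pool=None)
```

A signature built this way can be attached to a generated wrapper via its `__signature__` attribute so that `help()` and IDEs show the layout above.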
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Custom documentation additions for compute functions.
"""

function_doc_additions = {}
function_doc_additions["filter"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> arr = pa.array(["a", "b", "c", None, "e"])
    >>> mask = pa.array([True, False, None, False, True])
    >>> arr.filter(mask)
    <pyarrow.lib.StringArray object at ...>
    [
      "a",
      "e"
    ]
    >>> arr.filter(mask, null_selection_behavior='emit_null')
    <pyarrow.lib.StringArray object at ...>
    [
      "a",
      null,
      "e"
    ]
    """
function_doc_additions["mode"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> modes = pc.mode(arr, 2)
    >>> modes[0]
    <pyarrow.StructScalar: [('mode', 2), ('count', 5)]>
    >>> modes[1]
    <pyarrow.StructScalar: [('mode', 1), ('count', 2)]>
    """
function_doc_additions["min"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.min(arr1)
    <pyarrow.Int64Scalar: 1>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.min(arr2)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.min(arr3)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.min(arr4)
    <pyarrow.StringScalar: 'x'>
    """
function_doc_additions["max"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.max(arr1)
    <pyarrow.Int64Scalar: 3>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.max(arr2)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.max(arr3)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.max(arr4)
    <pyarrow.StringScalar: 'z'>
    """
function_doc_additions["min_max"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.min_max(arr1)
    <pyarrow.StructScalar: [('min', 1), ('max', 3)]>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.min_max(arr2)
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr2, skip_nulls=False)
    <pyarrow.StructScalar: [('min', None), ('max', None)]>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.min_max(arr3)
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.StructScalar: [('min', None), ('max', None)]>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.min_max(arr4)
    <pyarrow.StructScalar: [('min', 'x'), ('max', 'z')]>
    """
function_doc_additions["first"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.first(arr1)
    <pyarrow.Int64Scalar: 1>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([None, 1.0, 2.0, 3.0])
    >>> pc.first(arr2)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.first(arr3)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    See Also
    --------
    pyarrow.compute.first_last
    pyarrow.compute.last
    """
function_doc_additions["last"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.last(arr1)
    <pyarrow.Int64Scalar: 2>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, 2.0, 3.0, None])
    >>> pc.last(arr2)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.last(arr3)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    See Also
    --------
    pyarrow.compute.first
    pyarrow.compute.first_last
    """
function_doc_additions["first_last"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.first_last(arr1)
    <pyarrow.StructScalar: [('first', 1), ('last', 2)]>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([None, 2.0, 3.0, None])
    >>> pc.first_last(arr2)
    <pyarrow.StructScalar: [('first', 2.0), ('last', 3.0)]>
    >>> pc.first_last(arr2, skip_nulls=False)
    <pyarrow.StructScalar: [('first', None), ('last', None)]>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.first_last(arr3)
    <pyarrow.StructScalar: [('first', 1.0), ('last', 3.0)]>
    >>> pc.first_last(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.StructScalar: [('first', 1.0), ('last', 3.0)]>
    >>> pc.first_last(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.StructScalar: [('first', None), ('last', None)]>

    See Also
    --------
    pyarrow.compute.first
    pyarrow.compute.last
    """