ARROW-10209: [Python] Support positional options in compute functions
This makes compute functions easier to use. For example, the required "pattern" option no longer needs to be passed by name:
```
>>> pc.split_pattern("abacab", "a")
<pyarrow.ListScalar: ['', 'b', 'c', 'b']>
```
It also produces the following doc at the prompt:
```
split_pattern(strings, /, pattern, *, max_splits=-1, reverse=False, options=None, memory_pool=None)
Split string according to separator.
Split each string according to the exact `pattern` defined in
SplitPatternOptions. The output for each string input is a list
of strings.
The maximum number of splits and direction of splitting
(forward, reverse) can optionally be defined in SplitPatternOptions.
Parameters
----------
strings : Array-like or scalar-like
Argument to compute function
pattern : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `pattern` can be passed, but not both at the same time.
max_splits : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `max_splits` can be passed, but not both at the same time.
reverse : optional
Parameter for SplitPatternOptions constructor. Either `options`
or `reverse` can be passed, but not both at the same time.
options : pyarrow.compute.SplitPatternOptions, optional
Parameters altering compute function semantics.
memory_pool : pyarrow.MemoryPool, optional
If not passed, will allocate memory from the default memory pool.
```
Closes #11955 from pitrou/ARROW-10209-compute-pos-args
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
2021-12-16 19:49:10 +01:00
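The signature above mixes three kinds of parameters: positional-only data arguments (before the `/`), option values that may be passed positionally or by name, and a keyword-only tail (after the `*`). As a rough illustration of how such a signature can be assembled dynamically — a minimal stdlib sketch, where `make_compute_signature` is a hypothetical name and not pyarrow's actual wrapper machinery:

```python
import inspect


def make_compute_signature(arg_names, option_names):
    # Data arguments are positional-only (rendered before "/").
    params = [inspect.Parameter(name, inspect.Parameter.POSITIONAL_ONLY)
              for name in arg_names]
    # Option values may be passed positionally or by keyword.
    params += [inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD)
               for name in option_names]
    # memory_pool stays keyword-only (rendered after "*").
    params.append(inspect.Parameter("memory_pool",
                                    inspect.Parameter.KEYWORD_ONLY,
                                    default=None))
    return inspect.Signature(params)


print(make_compute_signature(["strings"], ["pattern"]))
# -> (strings, /, pattern, *, memory_pool=None)
```

A signature built this way can be attached to a generated wrapper via its `__signature__` attribute so that `help()` and IDEs show the layout above.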
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Custom documentation additions for compute functions.
"""

function_doc_additions = {}
function_doc_additions["filter"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> arr = pa.array(["a", "b", "c", None, "e"])
    >>> mask = pa.array([True, False, None, False, True])
    >>> arr.filter(mask)
    <pyarrow.lib.StringArray object at ...>
    [
      "a",
      "e"
    ]
    >>> arr.filter(mask, null_selection_behavior='emit_null')
    <pyarrow.lib.StringArray object at ...>
    [
      "a",
      null,
      "e"
    ]
    """
function_doc_additions["mode"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> modes = pc.mode(arr, 2)
    >>> modes[0]
    <pyarrow.StructScalar: [('mode', 2), ('count', 5)]>
    >>> modes[1]
    <pyarrow.StructScalar: [('mode', 1), ('count', 2)]>
    """
function_doc_additions["min"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.min(arr1)
    <pyarrow.Int64Scalar: 1>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.min(arr2)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.min(arr3)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.min(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.min(arr4)
    <pyarrow.StringScalar: 'x'>
    """
function_doc_additions["max"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.max(arr1)
    <pyarrow.Int64Scalar: 3>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.max(arr2)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.max(arr3)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.max(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.max(arr4)
    <pyarrow.StringScalar: 'z'>
    """
function_doc_additions["min_max"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.min_max(arr1)
    <pyarrow.StructScalar: [('min', 1), ('max', 3)]>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, None, 2.0, 3.0])
    >>> pc.min_max(arr2)
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr2, skip_nulls=False)
    <pyarrow.StructScalar: [('min', None), ('max', None)]>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.min_max(arr3)
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.StructScalar: [('min', 1.0), ('max', 3.0)]>
    >>> pc.min_max(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.StructScalar: [('min', None), ('max', None)]>

    This function also works with string values.

    >>> arr4 = pa.array(["z", None, "y", "x"])
    >>> pc.min_max(arr4)
    <pyarrow.StructScalar: [('min', 'x'), ('max', 'z')]>
    """
function_doc_additions["first"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.first(arr1)
    <pyarrow.Int64Scalar: 1>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([None, 1.0, 2.0, 3.0])
    >>> pc.first(arr2)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.first(arr3)
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 1.0>
    >>> pc.first(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    See Also
    --------
    pyarrow.compute.first_last
    pyarrow.compute.last
    """
function_doc_additions["last"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.last(arr1)
    <pyarrow.Int64Scalar: 2>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([1.0, 2.0, 3.0, None])
    >>> pc.last(arr2)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr2, skip_nulls=False)
    <pyarrow.DoubleScalar: None>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.last(arr3)
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.DoubleScalar: 3.0>
    >>> pc.last(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.DoubleScalar: None>

    See Also
    --------
    pyarrow.compute.first
    pyarrow.compute.first_last
    """
function_doc_additions["first_last"] = """
    Examples
    --------
    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr1 = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
    >>> pc.first_last(arr1)
    <pyarrow.StructScalar: [('first', 1), ('last', 2)]>

    Using ``skip_nulls`` to handle null values.

    >>> arr2 = pa.array([None, 2.0, 3.0, None])
    >>> pc.first_last(arr2)
    <pyarrow.StructScalar: [('first', 2.0), ('last', 3.0)]>
    >>> pc.first_last(arr2, skip_nulls=False)
    <pyarrow.StructScalar: [('first', None), ('last', None)]>

    Using ``ScalarAggregateOptions`` to control minimum number of non-null values.

    >>> arr3 = pa.array([1.0, None, float("nan"), 3.0])
    >>> pc.first_last(arr3)
    <pyarrow.StructScalar: [('first', 1.0), ('last', 3.0)]>
    >>> pc.first_last(arr3, options=pc.ScalarAggregateOptions(min_count=3))
    <pyarrow.StructScalar: [('first', 1.0), ('last', 3.0)]>
    >>> pc.first_last(arr3, options=pc.ScalarAggregateOptions(min_count=4))
    <pyarrow.StructScalar: [('first', None), ('last', None)]>

    See Also
    --------
    pyarrow.compute.first
    pyarrow.compute.last
    """