COMMITS
python/pyarrow/dataset.py
October 13, 2025
June 6, 2025
GH-46729: [Python] Allow constructing InMemoryDataset from RecordBatchReader (#46731)
Bryce Mecum committed
May 14, 2025
GH-26818: [C++][Python] Preserve order when writing dataset multi-threaded (#44470)
Enrico Minack committed
May 12, 2025
GH-45619: [Python] Use f-string instead of string.format (#45629)
ChiLin Chiu committed
November 18, 2024
GH-43410: [Python] Support Arrow PyCapsule stream objects in write_dataset (#43771)
Joris Van den Bossche committed
February 27, 2024
GH-40142: [Python] Allow FileInfo instances to be passed to dataset init (#40143)
Florian Jetter committed
January 22, 2024
GH-39579: [Python] fix raising ValueError on _ensure_partitioning (#39593)
0x0000ffff committed
December 1, 2023
GH-38944: [Python] Fix spelling (#38945)
Josh Soref committed
October 11, 2023
July 8, 2023
GH-36553: [Python] Improve error message if certain submodule (cython or cpp) is not built (#36554)
Joris Van den Bossche committed
April 12, 2023
GH-34216: [Python] Support for reading JSON Datasets With Python (#34586)
Junming Chen committed
April 5, 2023
GH-33825: [Python] Expose pyarrow.dataset.get_partition_keys publicly (get key/value from partition expression) (#33862)
Joris Van den Bossche committed
January 6, 2023
ARROW-15006: [Python][Doc] Add five more numpydoc checks to CI (#15214)
Bryce Mecum committed
December 12, 2022
ARROW-16616: [Python] Add lazy Dataset.filter() method (#13409)
Alessandro Molina committed
August 31, 2022
August 2, 2022
June 9, 2022
ARROW-16761: [C++][Python] Track bytes written in dataset (#13338)
Will Jones committed
June 1, 2022
ARROW-14632: [Python] Make write_dataset arguments keyword-only
Austin Dickey committed
May 27, 2022
ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning
Sanjiban Sengupta committed
May 25, 2022
ARROW-16018: [Doc][Python] Run doctests on Python docstring examples (--doctest-modules)
Alenka Frim committed
April 25, 2022
ARROW-15892: [C++] Dataset APIs require s3:ListBucket Permissions
Sanjiban Sengupta committed
March 30, 2022
ARROW-14612: [C++] Support for filename-based partitioning
Sanjiban Sengupta committed
January 25, 2022
ARROW-15433: [Doc] Fix warnings when building
Antoine Pitrou committed
January 19, 2022
ARROW-14738: [Python][Doc] Make return types clickable
Alessandro Molina committed
January 14, 2022
ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module
Joris Van den Bossche committed
January 10, 2022
ARROW-13554: [C++] Remove deprecated Scanner::Scan
Weston Pace committed
December 20, 2021
ARROW-15019: [Python] Add bindings for new dataset writing options
Vibhatha Abeykoon committed
December 13, 2021
ARROW-14625: [Python][CI] Enable Python test on s390x
Kazuaki Ishizaki committed
December 2, 2021
ARROW-14931: [Python] csv/orc format strings missing from some dataset docs
Weston Pace committed
November 8, 2021
October 4, 2021
ARROW-13650: [C++] Create dataset writer to encapsulate dataset writer logic
Weston Pace committed
September 28, 2021
ARROW-13572: [C++][Datasets] Add ORC support to Datasets API
Joris Van den Bossche committed
September 21, 2021
ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names
Alessandro Molina committed
July 14, 2021
ARROW-12364: [Python] [Dataset] Add metadata_collector option to ds.write_dataset()
Weston Pace committed
May 21, 2021
ARROW-12468: [Python][R] Expose ScannerBuilder::UseAsync to Python & R
Weston Pace committed
May 6, 2021
ARROW-12231: [C++][Python][Dataset] Isolate one-shot data to scanner
David Li committed
May 4, 2021
ARROW-12631: [Python] Accept Scanner in pyarrow.dataset.write_dataset
Joris Van den Bossche committed
April 28, 2021
ARROW-12407: [Python][Dataset] Remove ScanTask bindings
David Li committed
April 15, 2021
ARROW-12188: [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
Joris Van den Bossche committed
ARROW-11797: [C++][Dataset] Provide batch stream Scanner methods
Benjamin Kietzman committed
April 12, 2021
April 6, 2021
ARROW-10882: [Python] Allow writing dataset from iterator of batches
David Li committed
March 23, 2021
March 15, 2021
January 25, 2021
ARROW-10370: [Python] Clean-up filesystem handling in write_dataset
Joris Van den Bossche committed
January 15, 2021
ARROW-10247: [C++][Dataset] Support writing datasets partitioned on dictionary columns
Benjamin Kietzman committed
January 14, 2021
ARROW-10264: [Python] Fix failing hdfs test
Antoine Pitrou committed
December 2, 2020
ARROW-10644: [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs
Joris Van den Bossche committed
October 8, 2020
ARROW-9782: [C++][Dataset] More configurable Dataset writing
Benjamin Kietzman committed
September 28, 2020
ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans
Benjamin Kietzman committed