
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

ARROW-1214: [Python/C++] Add C++ functionality to more easily handle encapsulated IPC messages, Python bindings

This patch does a bunch of things:

* Decouples the RecordBatchStreamReader from the actual message iteration (which is handled by a new `arrow::ipc::MessageReader` interface)
* Enables `arrow::ipc::Message` to hold all of the memory for a complete unit of data: metadata plus body
* Renames some IPC methods for better consistency (GetNextRecordBatch -> ReadNextRecordBatch)
* Adds a function to serialize a complete encapsulated message to an `arrow::io::OutputStream*`
* Adds Python bindings for all of the above, introducing `pyarrow.Message` and `pyarrow.MessageReader`. Adds `read_message` and `Message.serialize` functions for efficient memory round trips
* Adds `pyarrow.read_record_batch` for reading a single record batch given a message and a known schema

Later we will want to add `pyarrow.read_schema`, but it seemed like a bit of work to make it work for dictionaries. This implements the C++ analogue to ARROW-1047, which was for Java. Not sure why I didn't create a JIRA about this. cc @icexelloss

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #839 from wesm/ARROW-1214 and squashes the following commits:

07f1820a [Wes McKinney] Refactor to introduce MessageReader abstract type, use unique_ptr for messages instead of shared_ptr. First cut at Message, MessageReader Python API. Add read_message, C++/Python machinery for message roundtrips to Buffer, comparison. Add function to read RecordBatch from encapsulated message given schema.
2017-07-15 16:51:51 -04:00
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# ---------------------------------------------------------------------
ARROW-5510: [C++][Python][R][GLib] Implement Feather "V2" using Arrow IPC file format

This is based on top of ARROW-7979, so I will need to rebase once that is merged. Excluding the changes from ARROW-7979, this patch is a substantial code reduction in Feather-related code. I removed a lot of cruft from the V1 implementation and made things a lot simpler without altering the user-facing functionality. To summarize:

* V2 is exactly the Arrow IPC file format, with the option for the experimental "trivial" body buffer compression implemented in ARROW-7979. `read_feather` functions distinguish the files based on the magic bytes at the beginning of the file ("FEA1" versus "ARROW1")
* An `ipc::feather::WriteProperties` struct has been introduced to allow setting the file version, as well as the chunksize (since large tables are broken up into smaller chunks when writing), compression type, and compression level (compressor-specific)
* LZ4 and ZSTD are the only codecs intended to be supported (also in line with mailing list discussion about IPC compression). The default is LZ4 unless -DARROW_WITH_LZ4=OFF, in which case it's uncompressed
* Unit tests in Python now test both versions
* R tests are only running the V2 version. I'll need some help adding options to set the version as well as the compression type and compression level

Since 0.17.0 is likely to be released without formalizing IPC compression, I will plan to support an "ARROW:experimental_compression" metadata member in 0.17.0 Feather files.

Other notes:

* Column decompression is currently serial. I'll work on making this parallel ASAP as it will impact benchmarks significantly.
* Compression (both chunk-level and column-level) is serial. Write performance would be much improved, especially at higher compression levels, by compressing in parallel at least at the column level
* Write performance could be improved by compressing chunks and writing them to disk concurrently. It's done serially at the moment, so I will open a follow-up JIRA about this

Closes #6694 from wesm/feather-v2

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
2020-03-29 19:05:36 -05:00
# Implement Feather file format
# cython: profile=False
# distutils: language = c++
# cython: language_level=3
from cython.operator cimport dereference as deref
from pyarrow.includes.common cimport *
from pyarrow.includes.libarrow cimport *
from pyarrow.includes.libarrow_feather cimport *
from pyarrow.lib cimport (check_status, Table, _Weakrefable,
                          get_writer, get_reader, pyarrow_wrap_table)
from pyarrow.lib import tobytes
class FeatherError(Exception):
    pass
def write_feather(Table table, object dest, compression=None,
                  compression_level=None, chunksize=None, version=2):
    cdef shared_ptr[COutputStream] sink
    get_writer(dest, &sink)
    cdef CFeatherProperties properties
    if version == 2:
        properties.version = kFeatherV2Version
    else:
        properties.version = kFeatherV1Version
    if compression == 'zstd':
        properties.compression = CCompressionType_ZSTD
    elif compression == 'lz4':
        properties.compression = CCompressionType_LZ4_FRAME
    else:
        properties.compression = CCompressionType_UNCOMPRESSED
    if chunksize is not None:
        properties.chunksize = chunksize
    if compression_level is not None:
        properties.compression_level = compression_level
with nogil:
check_status(WriteFeather(deref(table.table), sink.get(),
properties))
cdef class FeatherReader(_Weakrefable):
cdef:
shared_ptr[CFeatherReader] reader
def __cinit__(self, source, c_bool use_memory_map, c_bool use_threads):
cdef:
shared_ptr[CRandomAccessFile] reader
CIpcReadOptions options = CIpcReadOptions.Defaults()
options.use_threads = use_threads
get_reader(source, use_memory_map, &reader)
with nogil:
self.reader = GetResultValue(CFeatherReader.Open(reader, options))

@property
def version(self):
return self.reader.get().version()
def read(self):
cdef shared_ptr[CTable] sp_table
with nogil:
check_status(self.reader.get()
.Read(&sp_table))
return pyarrow_wrap_table(sp_table)
def read_indices(self, indices):
cdef:
shared_ptr[CTable] sp_table
vector[int] c_indices
for index in indices:
c_indices.push_back(index)
with nogil:
check_status(self.reader.get()
.Read(c_indices, &sp_table))
return pyarrow_wrap_table(sp_table)
def read_names(self, names):
cdef:
shared_ptr[CTable] sp_table
vector[c_string] c_names
for name in names:
c_names.push_back(tobytes(name))
with nogil:
check_status(self.reader.get()
.Read(c_names, &sp_table))
return pyarrow_wrap_table(sp_table)