Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# distutils: language = c++
# cython: language_level = 3
from libcpp cimport bool as c_bool
from libc.string cimport const_char
from libcpp.vector cimport vector as std_vector
from pyarrow.includes.common cimport *
from pyarrow.includes.libarrow cimport (CArray, CSchema, CStatus,
                                        CResult, CTable, CMemoryPool,
                                        CKeyValueMetadata,
                                        CRecordBatch,
                                        CCompressionType,
                                        CRandomAccessFile, COutputStream,
                                        TimeUnit)


# Enums and option types declared in arrow/adapters/orc/options.h.
cdef extern from "arrow/adapters/orc/options.h" \
        namespace "arrow::adapters::orc" nogil:
    cdef enum CompressionStrategy \
            " arrow::adapters::orc::CompressionStrategy":
        _CompressionStrategy_SPEED \
            " arrow::adapters::orc::CompressionStrategy::kSpeed"
        _CompressionStrategy_COMPRESSION \
            " arrow::adapters::orc::CompressionStrategy::kCompression"

    cdef enum WriterId" arrow::adapters::orc::WriterId":
        _WriterId_ORC_JAVA_WRITER" arrow::adapters::orc::WriterId::kOrcJava"
        _WriterId_ORC_CPP_WRITER" arrow::adapters::orc::WriterId::kOrcCpp"
        _WriterId_PRESTO_WRITER" arrow::adapters::orc::WriterId::kPresto"
        _WriterId_SCRITCHLEY_GO \
            " arrow::adapters::orc::WriterId::kScritchleyGo"
        _WriterId_TRINO_WRITER" arrow::adapters::orc::WriterId::kTrino"
        _WriterId_UNKNOWN_WRITER" arrow::adapters::orc::WriterId::kUnknown"

    cdef enum WriterVersion" arrow::adapters::orc::WriterVersion":
        _WriterVersion_ORIGINAL \
            " arrow::adapters::orc::WriterVersion::kOriginal"
        _WriterVersion_HIVE_8732 \
            " arrow::adapters::orc::WriterVersion::kHive8732"
        _WriterVersion_HIVE_4243 \
            " arrow::adapters::orc::WriterVersion::kHive4243"
        _WriterVersion_HIVE_12055 \
            " arrow::adapters::orc::WriterVersion::kHive12055"
        _WriterVersion_HIVE_13083 \
            " arrow::adapters::orc::WriterVersion::kHive13083"
        _WriterVersion_ORC_101" arrow::adapters::orc::WriterVersion::kOrc101"
        _WriterVersion_ORC_135" arrow::adapters::orc::WriterVersion::kOrc135"
        _WriterVersion_ORC_517" arrow::adapters::orc::WriterVersion::kOrc517"
        _WriterVersion_ORC_203" arrow::adapters::orc::WriterVersion::kOrc203"
        _WriterVersion_ORC_14" arrow::adapters::orc::WriterVersion::kOrc14"
        _WriterVersion_MAX" arrow::adapters::orc::WriterVersion::kMax"

    cdef cppclass FileVersion" arrow::adapters::orc::FileVersion":
        FileVersion(uint32_t major_version, uint32_t minor_version)
        uint32_t major_version()
        uint32_t minor_version()
        c_string ToString()

    # Writer configuration passed to ORCFileWriter.Open().
    cdef struct WriteOptions" arrow::adapters::orc::WriteOptions":
        int64_t batch_size
        FileVersion file_version
        int64_t stripe_size
        CCompressionType compression
        int64_t compression_block_size
        CompressionStrategy compression_strategy
        int64_t row_index_stride
        double padding_tolerance
        double dictionary_key_size_threshold
        std_vector[int64_t] bloom_filter_columns
        double bloom_filter_fpp


cdef extern from "arrow/adapters/orc/adapter.h" \
        namespace "arrow::adapters::orc" nogil:

    # Reads ORC files into Arrow tables and record batches.
    cdef cppclass ORCFileReader:
        @staticmethod
        CResult[unique_ptr[ORCFileReader]] Open(
            const shared_ptr[CRandomAccessFile]& file,
            CMemoryPool* pool)

        CResult[shared_ptr[const CKeyValueMetadata]] ReadMetadata()

        CResult[shared_ptr[CSchema]] ReadSchema()

        CResult[shared_ptr[CRecordBatch]] ReadStripe(int64_t stripe)
        CResult[shared_ptr[CRecordBatch]] ReadStripe(
            int64_t stripe, std_vector[c_string])

        CResult[shared_ptr[CTable]] Read()
        CResult[shared_ptr[CTable]] Read(std_vector[c_string])

        int64_t NumberOfStripes()
        int64_t NumberOfRows()
        FileVersion GetFileVersion()
        c_string GetSoftwareVersion()
        CResult[CCompressionType] GetCompression()
        int64_t GetCompressionSize()
        int64_t GetRowIndexStride()
        WriterId GetWriterId()
        int32_t GetWriterIdValue()
        WriterVersion GetWriterVersion()
        int64_t GetNumberOfStripeStatistics()
        int64_t GetContentLength()
        int64_t GetStripeStatisticsLength()
        int64_t GetFileFooterLength()
        int64_t GetFilePostscriptLength()
        int64_t GetFileLength()
        c_string GetSerializedFileTail()

    # Writes Arrow tables to an ORC output stream.
    cdef cppclass ORCFileWriter:
        @staticmethod
        CResult[unique_ptr[ORCFileWriter]] Open(
            COutputStream* output_stream, const WriteOptions& writer_options)

        CStatus Write(const CTable& table)

        CStatus Close()