mirror of
https://github.com/apache/arrow.git
synced 2026-04-08 05:27:07 +00:00
### Rationale for this change
Performance is important in Apache Arrow, so benchmarks are useful when developing an Apache Arrow implementation.
### What changes are included in this PR?
* Add benchmarks for file and streaming readers.
* Add support for `mmap` in the streaming reader.
Here are benchmark results from my environment.
The pure Ruby implementation is about 5-6x slower than the release-build C++ implementation but slightly faster than the debug-build C++ implementation.
Release build C++/GLib:
File format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table.load 11.207k i/s - 12.188k times in 1.087487s (89.23μs/i)
Arrow::RecordBatchFileReader 19.724k i/s - 21.296k times in 1.079727s (50.70μs/i)
ArrowFormat::FileReader 3.555k i/s - 3.883k times in 1.092223s (281.28μs/i)
Calculating -------------------------------------
Arrow::Table.load 11.483k i/s - 33.622k times in 2.928024s (87.09μs/i)
Arrow::RecordBatchFileReader 19.673k i/s - 59.170k times in 3.007729s (50.83μs/i)
ArrowFormat::FileReader 3.574k i/s - 10.665k times in 2.984214s (279.81μs/i)
Comparison:
Arrow::RecordBatchFileReader: 19672.6 i/s
Arrow::Table.load: 11482.8 i/s - 1.71x slower
ArrowFormat::FileReader: 3573.8 i/s - 5.50x slower
```
Streaming format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table.load 11.360k i/s - 12.485k times in 1.099067s (88.03μs/i)
Arrow::RecordBatchStreamReader 20.180k i/s - 21.857k times in 1.083126s (49.56μs/i)
ArrowFormat::StreamingReader 3.398k i/s - 3.400k times in 1.000479s (294.26μs/i)
Calculating -------------------------------------
Arrow::Table.load 11.397k i/s - 34.078k times in 2.990170s (87.74μs/i)
Arrow::RecordBatchStreamReader 20.039k i/s - 60.538k times in 3.020964s (49.90μs/i)
ArrowFormat::StreamingReader 3.340k i/s - 10.195k times in 3.052059s (299.37μs/i)
Comparison:
Arrow::RecordBatchStreamReader: 20039.3 i/s
Arrow::Table.load: 11396.7 i/s - 1.76x slower
ArrowFormat::StreamingReader: 3340.4 i/s - 6.00x slower
```
Debug build C++/GLib:
File format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table.load 2.175k i/s - 2.200k times in 1.011375s (459.72μs/i)
Arrow::RecordBatchFileReader 3.129k i/s - 3.421k times in 1.093397s (319.61μs/i)
ArrowFormat::FileReader 3.384k i/s - 3.430k times in 1.013625s (295.52μs/i)
Calculating -------------------------------------
Arrow::Table.load 2.145k i/s - 6.525k times in 3.041760s (466.17μs/i)
Arrow::RecordBatchFileReader 3.020k i/s - 9.386k times in 3.108456s (331.18μs/i)
ArrowFormat::FileReader 3.368k i/s - 10.151k times in 3.013576s (296.87μs/i)
Comparison:
ArrowFormat::FileReader: 3368.4 i/s
Arrow::RecordBatchFileReader: 3019.5 i/s - 1.12x slower
Arrow::Table.load: 2145.1 i/s - 1.57x slower
```
Streaming format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table.load 2.115k i/s - 2.140k times in 1.011815s (472.81μs/i)
Arrow::RecordBatchStreamReader 3.052k i/s - 3.355k times in 1.099273s (327.65μs/i)
ArrowFormat::StreamingReader 3.283k i/s - 3.290k times in 1.002016s (304.56μs/i)
Calculating -------------------------------------
Arrow::Table.load 2.198k i/s - 6.345k times in 2.886603s (454.94μs/i)
Arrow::RecordBatchStreamReader 3.105k i/s - 9.156k times in 2.948523s (322.03μs/i)
ArrowFormat::StreamingReader 3.225k i/s - 9.850k times in 3.054339s (310.09μs/i)
Comparison:
ArrowFormat::StreamingReader: 3224.9 i/s
Arrow::RecordBatchStreamReader: 3105.3 i/s - 1.04x slower
Arrow::Table.load: 2198.1 i/s - 1.47x slower
```
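For reference, the "x slower" figures in the `Comparison` sections appear to be the fastest entry's i/s divided by each entry's i/s. The following is a minimal sketch of that arithmetic using the numbers reported above. The `slowdown` helper is hypothetical and is not part of benchmark-driver.

```ruby
# Hypothetical helper (not part of benchmark-driver): ratio of the fastest
# throughput to a given throughput, rounded to two decimal places.
def slowdown(fastest_ips, ips)
  (fastest_ips / ips).round(2)
end

# Release build: i/s values copied from the "Comparison" sections above.
# The C++/GLib record batch readers are the fastest entries here.
release_file_ratio = slowdown(19672.6, 3573.8)
release_streaming_ratio = slowdown(20039.3, 3340.4)

# Debug build: the pure Ruby readers are the fastest entries here, so the
# ratio says how much slower the C++/GLib record batch readers are.
debug_file_ratio = slowdown(3368.4, 3019.5)
debug_streaming_ratio = slowdown(3224.9, 3105.3)

puts format("release, file format:      %.2fx", release_file_ratio)
puts format("release, streaming format: %.2fx", release_streaming_ratio)
puts format("debug, file format:        %.2fx", debug_file_ratio)
puts format("debug, streaming format:   %.2fx", debug_streaming_ratio)
```

The release-build ratios give the "about 5-6x slower" summary, and the debug-build ratios show the pure Ruby readers ahead by a small margin.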
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* GitHub Issue: #49544
Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
439 lines
14 KiB
YAML