Sutou Kouhei 10eaafd2b4 GH-49544: [Ruby] Add benchmark for readers (#49545)
### Rationale for this change

Performance is important in Apache Arrow, so benchmarks are useful when developing an Apache Arrow implementation.

### What changes are included in this PR?

* Add benchmarks for file and streaming readers.
* Add support for `mmap` in streaming reader. 

Here are the benchmark results in my environment.

The pure Ruby implementation is about 5.5-6x slower than the release build of the C++ implementation but slightly faster than the debug build of the C++ implementation.

Release build C++/GLib:

File format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
           Arrow::Table.load    11.207k i/s -     12.188k times in 1.087487s (89.23μs/i)
Arrow::RecordBatchFileReader    19.724k i/s -     21.296k times in 1.079727s (50.70μs/i)
     ArrowFormat::FileReader     3.555k i/s -      3.883k times in 1.092223s (281.28μs/i)
Calculating -------------------------------------
           Arrow::Table.load    11.483k i/s -     33.622k times in 2.928024s (87.09μs/i)
Arrow::RecordBatchFileReader    19.673k i/s -     59.170k times in 3.007729s (50.83μs/i)
     ArrowFormat::FileReader     3.574k i/s -     10.665k times in 2.984214s (279.81μs/i)

Comparison:
Arrow::RecordBatchFileReader:     19672.6 i/s
           Arrow::Table.load:     11482.8 i/s - 1.71x  slower
     ArrowFormat::FileReader:      3573.8 i/s - 5.50x  slower

```

Streaming format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table.load    11.360k i/s -     12.485k times in 1.099067s (88.03μs/i)
Arrow::RecordBatchStreamReader    20.180k i/s -     21.857k times in 1.083126s (49.56μs/i)
  ArrowFormat::StreamingReader     3.398k i/s -      3.400k times in 1.000479s (294.26μs/i)
Calculating -------------------------------------
             Arrow::Table.load    11.397k i/s -     34.078k times in 2.990170s (87.74μs/i)
Arrow::RecordBatchStreamReader    20.039k i/s -     60.538k times in 3.020964s (49.90μs/i)
  ArrowFormat::StreamingReader     3.340k i/s -     10.195k times in 3.052059s (299.37μs/i)

Comparison:
Arrow::RecordBatchStreamReader:     20039.3 i/s
             Arrow::Table.load:     11396.7 i/s - 1.76x  slower
  ArrowFormat::StreamingReader:      3340.4 i/s - 6.00x  slower

```

Debug build C++/GLib:

File format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
           Arrow::Table.load     2.175k i/s -      2.200k times in 1.011375s (459.72μs/i)
Arrow::RecordBatchFileReader     3.129k i/s -      3.421k times in 1.093397s (319.61μs/i)
     ArrowFormat::FileReader     3.384k i/s -      3.430k times in 1.013625s (295.52μs/i)
Calculating -------------------------------------
           Arrow::Table.load     2.145k i/s -      6.525k times in 3.041760s (466.17μs/i)
Arrow::RecordBatchFileReader     3.020k i/s -      9.386k times in 3.108456s (331.18μs/i)
     ArrowFormat::FileReader     3.368k i/s -     10.151k times in 3.013576s (296.87μs/i)

Comparison:
     ArrowFormat::FileReader:      3368.4 i/s
Arrow::RecordBatchFileReader:      3019.5 i/s - 1.12x  slower
           Arrow::Table.load:      2145.1 i/s - 1.57x  slower

```

Streaming format:

```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-reader.yaml
ruby 4.1.0dev (2026-02-19T09:04:23Z master 6bb0b6b16c) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table.load     2.115k i/s -      2.140k times in 1.011815s (472.81μs/i)
Arrow::RecordBatchStreamReader     3.052k i/s -      3.355k times in 1.099273s (327.65μs/i)
  ArrowFormat::StreamingReader     3.283k i/s -      3.290k times in 1.002016s (304.56μs/i)
Calculating -------------------------------------
             Arrow::Table.load     2.198k i/s -      6.345k times in 2.886603s (454.94μs/i)
Arrow::RecordBatchStreamReader     3.105k i/s -      9.156k times in 2.948523s (322.03μs/i)
  ArrowFormat::StreamingReader     3.225k i/s -      9.850k times in 3.054339s (310.09μs/i)

Comparison:
  ArrowFormat::StreamingReader:      3224.9 i/s
Arrow::RecordBatchStreamReader:      3105.3 i/s - 1.04x  slower
             Arrow::Table.load:      2198.1 i/s - 1.47x  slower

```
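
As a quick sanity check on the summary, the ratios can be recomputed from the i/s figures in the "Comparison" sections above. This is only an illustrative sketch: the `RESULTS` table and the `relative_speed` helper are hypothetical names introduced here, not part of the benchmark files, and the numbers are copied by hand from the output shown in this description.

```ruby
# Throughput in iterations per second, copied from the "Comparison" output above.
# :cpp is Arrow::RecordBatchFileReader / Arrow::RecordBatchStreamReader (C++/GLib),
# :ruby is ArrowFormat::FileReader / ArrowFormat::StreamingReader (pure Ruby).
RESULTS = {
  "release, file format"      => {cpp: 19672.6, ruby: 3573.8},
  "release, streaming format" => {cpp: 20039.3, ruby: 3340.4},
  "debug, file format"        => {cpp: 3019.5,  ruby: 3368.4},
  "debug, streaming format"   => {cpp: 3105.3,  ruby: 3224.9},
}.freeze

# Hypothetical helper: how many times faster `faster` is than `slower`,
# rounded to two decimals like benchmark-driver's comparison column.
def relative_speed(faster, slower)
  (faster / slower).round(2)
end

RESULTS.each do |label, ips|
  if ips[:cpp] >= ips[:ruby]
    ratio = relative_speed(ips[:cpp], ips[:ruby])
    puts format("%-26s pure Ruby is %.2fx slower than C++", label, ratio)
  else
    ratio = relative_speed(ips[:ruby], ips[:cpp])
    puts format("%-26s pure Ruby is %.2fx faster than C++", label, ratio)
  end
end
```

The comparison here is against the dedicated C++ readers only; `Arrow::Table.load` is left out because it is slower than those readers in every run above.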

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #49544

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
2026-03-21 17:59:16 +09:00