# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import numpy as np
import pandas as pd
import pyarrow as pa

from . import common
from .common import KILOBYTE, MEGABYTE
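
# ``common.get_random_bytes`` comes from the sibling ``common`` module (not
# shown here). From its use below, it presumably returns ``n`` pseudo-random
# bytes from a generator seeded with ``seed``, making runs reproducible; that
# reading is an assumption about code outside this file.

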
def generate_chunks(total_size, nchunks, ncols, dtype=np.dtype('int64')):
    rowsize = total_size // nchunks // ncols
    # Each per-column chunk must hold a whole number of dtype items.
    assert rowsize % dtype.itemsize == 0

    def make_column(col, chunk):
        # Seed differently per (column, chunk) pair so that every chunk
        # carries distinct data (see the cache note in StreamReader.setup).
        return np.frombuffer(common.get_random_bytes(
            rowsize, seed=col + 997 * chunk)).view(dtype)

    return [pd.DataFrame({
        'c' + str(col): make_column(col, chunk)
        for col in range(ncols)})
        for chunk in range(nchunks)]
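
# A quick sanity sketch of generate_chunks() with illustrative sizes (not
# the benchmark's own parameters):
# >>> chunks = generate_chunks(16 * MEGABYTE, nchunks=4, ncols=8)
# >>> len(chunks), chunks[0].shape
# (4, (65536, 8))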


class StreamReader(object):
    """
    Benchmark in-memory streaming to a Pandas dataframe.
    """
    total_size = 64 * MEGABYTE
    ncols = 8
    chunk_sizes = [16 * KILOBYTE, 256 * KILOBYTE, 8 * MEGABYTE]

    # asv parametrization: each time_* method runs once per chunk size.
    param_names = ['chunk_size']
    params = [chunk_sizes]

    def setup(self, chunk_size):
        # Note we are careful to stream different chunks instead of
        # streaming the same chunk N times, so that we avoid operating
        # entirely out of L1/L2 cache.
        chunks = generate_chunks(self.total_size,
                                 nchunks=self.total_size // chunk_size,
                                 ncols=self.ncols)
        batches = [pa.RecordBatch.from_pandas(df)
                   for df in chunks]
        schema = batches[0].schema
        sink = pa.BufferOutputStream()
        stream_writer = pa.RecordBatchStreamWriter(sink, schema)
        for batch in batches:
            stream_writer.write_batch(batch)
        # Close the writer to flush the end-of-stream marker, then grab
        # the encoded stream as a pyarrow.Buffer.
        stream_writer.close()
        self.source = sink.getvalue()

    def time_read_to_dataframe(self, *args):
        reader = pa.RecordBatchStreamReader(self.source)
        table = reader.read_all()
        # The result is deliberately unused: the timed work is decoding
        # the stream and converting it to pandas.
        df = table.to_pandas()  # noqa
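

if __name__ == '__main__':
    # Minimal manual smoke run, as a sketch; the asv harness normally drives
    # setup()/time_read_to_dataframe(). Because of the relative imports
    # above, run this as a module (e.g. ``python -m benchmarks.streaming``);
    # the exact package name is an assumption about the surrounding layout.
    import timeit

    bench = StreamReader()
    for chunk_size in bench.chunk_sizes:
        bench.setup(chunk_size)
        elapsed = timeit.timeit(
            lambda: bench.time_read_to_dataframe(chunk_size), number=10)
        print('chunk_size=%d bytes: %.2f ms per read'
              % (chunk_size, elapsed / 10 * 1e3))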