Blame: python/paddle/profiler/timer.py - PaddlePaddle/Paddle

PaddlePaddle / Paddle UNCLAIMED

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice （『飞桨』核心框架，深度学习&机器学习高性能单机、分布式训练和跨平台部署）

23800 0 0 C++

Normal View History Raw

Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.`
【code format check upgrade】 step2：yapf (#42944) * use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed 2022-06-05 10:58:58 +08:00			`#`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# Licensed under the Apache License, Version 2.0 (the "License");`
			`# you may not use this file except in compliance with the License.`
			`# You may obtain a copy of the License at`
【code format check upgrade】 step2：yapf (#42944) * use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed 2022-06-05 10:58:58 +08:00			`#`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# http://www.apache.org/licenses/LICENSE-2.0`
【code format check upgrade】 step2：yapf (#42944) * use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed 2022-06-05 10:58:58 +08:00			`#`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# Unless required by applicable law or agreed to in writing, software`
			`# distributed under the License is distributed on an "AS IS" BASIS,`
			`# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.`
			`# See the License for the specific language governing permissions and`
			`# limitations under the License.`

			`import timeit`
			`from collections import OrderedDict`


[CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) * [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition * fix an increment 2022-11-08 11:29:41 +08:00			`class Stack:`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`
			`The stack in a Last-In/First-Out (LIFO) manner. New element is added at`
			`the end and an element is removed from that end.`
			`"""`

			`def __init__(self):`
			`self.items = []`

			`def push(self, item):`
			`self.items.append(item)`

			`def pop(self):`
			`return self.items.pop()`

			`def is_empty(self):`
			`return len(self.items) == 0`

			`def peek(self):`
			`if not self.is_empty():`
			`return self.items[len(self.items) - 1]`
			`else:`
			`return None`


[CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) * [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition * fix an increment 2022-11-08 11:29:41 +08:00			`class Event:`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`
			`A Event is used to record the cost of every step and the cost of`
			`the total steps except skipped steps.`
			`"""`

			`def __init__(self):`
			`self.reader_cost_averager = TimeAverager()`
			`self.batch_cost_averager = TimeAverager()`
			`self.total_samples = 0`
			`self.total_iters = 0`
			`self.skip_iter = 10`
[CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and unnecessary <list/tuple> passed to <list/tupule>() (#51928) * autofix * add select config * autofix C410 * add C410 select 2023-03-23 10:16:17 +08:00			`self.reader_records = {'max': 0, 'min': float('inf'), 'total': 0}`
			`self.batch_records = {'max': 0, 'min': float('inf'), 'total': 0}`
			`self.speed_records = {'max': 0, 'min': float('inf')}`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`self.reader = None`
			`self.need_record = True`
			`# The speed mode depends on the setting of num_samples, there`
			`# are 2 modes: steps/s(num_samples=None) or samples/s.`
			`self.speed_mode = 'samples/s'`
			`# The speed unit depends on the unit of samples that is`
			`# specified in step_info and only works in this speed_mode="samples/s".`
			`self.speed_unit = 'samples/s'`

			`def reset(self):`
			`self.reader_cost_averager.reset()`
			`self.batch_cost_averager.reset()`

			`def record_reader(self, usetime):`
			`self.reader_cost_averager.record(usetime)`
			`if self.total_iters >= self.skip_iter:`
			`self._update_records(usetime, self.reader_records)`

			`def record_batch(self, usetime, num_samples=None):`
			`if num_samples is None:`
			`self.speed_mode = "steps/s"`
			`self.speed_unit = "steps/s"`
			`self.batch_cost_averager.record(usetime, num_samples)`
			`self.total_iters += 1`

			`if self.total_iters >= self.skip_iter:`
			`self._update_records(usetime, self.batch_records)`
			`if self.speed_mode == "samples/s":`
			`current_speed = float(num_samples) / usetime`
			`self.total_samples += num_samples`
			`else:`
			`current_speed = 1.0 / usetime # steps/s`
			`self._update_records(current_speed, self.speed_records)`

			`def _update_records(self, current_record, records):`
			`if current_record > records['max']:`
			`records['max'] = current_record`
			`elif current_record < records['min']:`
			`records['min'] = current_record`
			`if 'total' in records.keys():`
			`records['total'] += current_record`

			`def reader_average(self):`
			`return self.reader_cost_averager.get_average()`

			`def batch_average(self):`
			`return self.batch_cost_averager.get_average()`

			`def speed_average(self):`
			`if self.speed_mode == "samples/s":`
			`return self.batch_cost_averager.get_ips_average()`
			`else:`
			`return self.batch_cost_averager.get_step_average()`

			`def get_summary(self):`
			`if self.total_iters <= self.skip_iter:`
			`return {}`

			`reader_avg = 0`
			`batch_avg = 0`
			`speed_avg = 0`

			`self.total_iters -= self.skip_iter`
			`reader_avg = self.reader_records['total'] / float(self.total_iters)`
			`batch_avg = self.batch_records['total'] / float(self.total_iters)`
			`if self.speed_mode == "samples/s":`
			`speed_avg = float(self.total_samples) / self.batch_records['total']`
			`else:`
			`speed_avg = float(self.total_iters) / self.batch_records['total']`

[CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and unnecessary <list/tuple> passed to <list/tupule>() (#51928) * autofix * add select config * autofix C410 * add C410 select 2023-03-23 10:16:17 +08:00			`reader_summary = {`
			`'max': self.reader_records['max'],`
			`'min': self.reader_records['min'],`
			`'avg': reader_avg,`
			`}`
			`batch_summary = {`
			`'max': self.batch_records['max'],`
			`'min': self.batch_records['min'],`
			`'avg': batch_avg,`
			`}`
			`ips_summary = {`
			`'max': self.speed_records['max'],`
			`'min': self.speed_records['min'],`
			`'avg': speed_avg,`
			`}`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`reader_ratio = (reader_avg / batch_avg) * 100`
[CodeStyle][C408][C409][C410] Fix unnecessary <dict/list/tuple> call and unnecessary <list/tuple> passed to <list/tupule>() (#51928) * autofix * add select config * autofix C410 * add C410 select 2023-03-23 10:16:17 +08:00			`summary = {`
			`'reader_summary': reader_summary,`
			`'batch_summary': batch_summary,`
			`'ips_summary': ips_summary,`
			`'reader_ratio': reader_ratio,`
			`}`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00
			`return summary`


[CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) * [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition * fix an increment 2022-11-08 11:29:41 +08:00			`class Hook:`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`
			`As the base class. All types of hooks should inherit from it.`
			`"""`

			`def begin(self, benchmark):`
			`pass`

			`def end(self, benchmark):`
			`pass`

			`def before_reader(self, benchmark):`
			`pass`

			`def after_reader(self, benchmark):`
			`pass`

			`def after_step(self, benchmark):`
			`pass`


			`class TimerHook(Hook):`
			`"""`
			`A hook for recording real-time performance and the summary`
[CodeStyle][W291] trim trailing whitespace in python file (#45937) * trim trailing whitespace * fix `.cmake-format.py` * revert npu ut changes, avoid npu ci error 2022-09-14 21:56:19 +08:00			`performance of total steps.`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`

			`def __init__(self):`
			`self.start_time = timeit.default_timer()`
			`self.start_reader = timeit.default_timer()`

			`def begin(self, benchmark):`
			`"""`
			`Create the event for timing and initialize the start time of a step.`
			This function will be called in `Profiler.start()`.
			`"""`

			`benchmark.events.push(Event())`
			`benchmark.current_event = benchmark.events.peek()`
			`self.start_time = timeit.default_timer()`

			`def before_reader(self, benchmark):`
			`"""`
			`Initialize the start time of the dataloader. This function will be`
Fix some typos in paddle/. (#42408) 2022-05-12 23:25:03 +08:00			called at the beginning of `next` method in `_DataLoaderIterMultiProcess` or
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`_DataLoaderIterSingleProcess`.

			`"""`

			`self.start_reader = timeit.default_timer()`

			`def after_reader(self, benchmark):`
			`"""`
			`Record the cost of dataloader for the current step. Since the skipped steps`
			`are 10, it will update the maximum, minimum and the total time from the step`
			11 to the current step. This function will be called at the end of `next`
			method in `_DataLoaderIterMultiProcess` or `_DataLoaderIterSingleProcess`.

			`"""`

			`reader_cost = timeit.default_timer() - self.start_reader`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`if (`
			`(benchmark.current_event is None)`
			`or (not benchmark.current_event.need_record)`
			`or (reader_cost == 0)`
			`):`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`return`
			`benchmark.current_event.record_reader(reader_cost)`

			`def after_step(self, benchmark):`
			`"""`
			`Record the cost for the current step. It will contain the cost of the loading`
			data if there is a dataloader. Similar to `after_reader`, it will also update
			`the maximum, minimum and the total time from the step 11 to the current step`
Fix some typos in paddle/. (#42408) 2022-05-12 23:25:03 +08:00			`as well as the maximum and minimum speed of the model. This function will`
			be called in `Profiler.step()`.
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00
			`"""`

[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`if (benchmark.current_event is None) or (`
			`not benchmark.current_event.need_record`
			`):`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`return`
			`batch_cost = timeit.default_timer() - self.start_time`
			`benchmark.current_event.record_batch(batch_cost, benchmark.num_samples)`
			`self.start_time = timeit.default_timer()`

			`def end(self, benchmark):`
			`"""`
			`Print the performance summary of the model and pop the current event`
			`from the events stack. Since there may be nested timing events, such`
			`as evaluation in the training process, the current event needs to be`
			`update to the event at the top of the stack.`

			`"""`

			`if benchmark.events.is_empty():`
			`return`
			`self._print_summary(benchmark)`
			`benchmark.events.pop()`
			`benchmark.current_event = benchmark.events.peek()`
			`self.start_time = timeit.default_timer()`

			`def _print_summary(self, benchmark):`
			`summary = benchmark.current_event.get_summary()`
			`if not summary:`
			`return`
			`print('Perf Summary'.center(100, '='))`
			`if summary['reader_ratio'] != 0:`
[CodeStyle][UP031] fix some `python/paddle/` - part 8 (#65552) --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com> 2024-06-29 02:48:21 +08:00			`print(`
			`'Reader Ratio: '`
			`+ '{:.3f}'.format(summary['reader_ratio'])`
			`+ '%'`
			`)`
			`print(f'Time Unit: s, IPS Unit: {benchmark.current_event.speed_unit}')`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`print(`
			`'\|',`
			`''.center(15),`
			`'\|',`
			`'avg'.center(15),`
			`'\|',`
			`'max'.center(15),`
			`'\|',`
			`'min'.center(15),`
			`'\|',`
			`)`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# if DataLoader is not called, reader_summary is unnecessary.`
			`if summary['reader_summary']['avg'] != 0:`
			`self._print_stats('reader_cost', summary['reader_summary'])`
			`self._print_stats('batch_cost', summary['batch_summary'])`
			`self._print_stats('ips', summary['ips_summary'])`

			`def _print_stats(self, item, message_dict):`
[CodeStyle][UP031] fix some `python/paddle/` - part 8 (#65552) --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com> 2024-06-29 02:48:21 +08:00			`avg_str = '{:.5f}'.format(message_dict['avg'])`
			`max_str = '{:.5f}'.format(message_dict['max'])`
			`min_str = '{:.5f}'.format(message_dict['min'])`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`print(`
			`'\|',`
			`item.center(15),`
			`'\|',`
			`avg_str.center(15),`
			`'\|',`
			`max_str.center(15),`
			`'\|',`
			`min_str.center(15),`
			`'\|',`
			`)`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00

[CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) * [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition * fix an increment 2022-11-08 11:29:41 +08:00			`class TimeAverager:`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`
			`Record the cost of every step and count the average.`
			`"""`

			`def __init__(self):`
			`self.reset()`

			`def reset(self):`
			`self._total_iters = 0`
			`self._total_time = 0`
			`self._total_samples = 0`

			`def record(self, usetime, num_samples=None):`
			`self._total_iters += 1`
			`self._total_time += usetime`
			`if num_samples:`
			`self._total_samples += num_samples`

			`def get_average(self):`
			`"""`
			`Get the average cost of loading data or a step.`
			`"""`

			`if self._total_iters == 0:`
			`return 0`
			`return self._total_time / float(self._total_iters)`

			`def get_ips_average(self):`
			`"""`
			`Get the average throughput when speed mode is "samples/s".`
			`"""`

			`if not self._total_samples or self._total_iters == 0:`
			`return 0`
			`return float(self._total_samples) / self._total_time`

			`def get_step_average(self):`
			`"""`
			`Get the average speed when speed mode is "step/s".`
			`"""`

			`if self._total_iters == 0:`
			`return 0`
			`return float(self._total_iters) / self._total_time`


[CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition (#47642) * [CodeStyle][py2][U004] unecessary explicit `object` inheritance in class definition * fix an increment 2022-11-08 11:29:41 +08:00			`class Benchmark:`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`"""`
			A tool for the statistics of model performance. The `before_reader`
			and `after_reader` are called in the DataLoader to count the cost
			of loading the data. The `begin`, `step` and `end` are called to
			`count the cost of a step or total steps.`
			`"""`

			`def __init__(self):`
			`self.num_samples = None`
			`self.hooks = OrderedDict(timer_hook=TimerHook())`
			`self.current_event = None`
			`self.events = Stack()`

			`def step(self, num_samples=None):`
			`"""`
			`Record the statistic for the current step. It will be called in`
			`Profiler.step()`.
			`"""`

			`self.num_samples = num_samples`
			`self.after_step()`

			`def step_info(self, unit):`
			`"""`
			`It returns the statistic of the current step as a string. It contains`
			`"reader_cost", "batch_cost" and "ips".`
			`"""`

			`message = ''`
			`reader_average = self.current_event.reader_average()`
			`batch_average = self.current_event.batch_average()`
			`if reader_average:`
[CodeStyle][UP031] fix some `python/paddle/` - part 8 (#65552) --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com> 2024-06-29 02:48:21 +08:00			`message += f' reader_cost: {reader_average:.5f} s'`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`if batch_average:`
			`if self.current_event.speed_mode == 'steps/s':`
			`self.current_event.speed_unit = 'steps/s'`
			`else:`
			`self.current_event.speed_unit = unit + '/s'`
[CodeStyle][UP030][UP031][UP032] using f-string (#52062) * autofix Co-authored-by: Liyulingyue <83450930+Liyulingyue@users.noreply.github.com> * revert changes in python/paddle/distributed/fleet/utils/hybrid_parallel_util.py * empty commit, trigger ci * fix test_slice --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com> 2023-03-31 10:11:56 +08:00			`message += ' {}: {:.5f} s'.format('batch_cost', batch_average)`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`speed_average = self.current_event.speed_average()`
			`if speed_average:`
[CodeStyle][task 1] enable Ruff UP032 rule with . except `python/paddle/base` (#57409) * update up032 * update up032 * Update api_gen.py * Update api_gen.py * Update sampcd_processor_utils.py 2023-09-22 10:14:38 +08:00			`message += (`
			`f' ips: {speed_average:.3f} {self.current_event.speed_unit}'`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`)`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`self.current_event.reset()`
			`return message`

			`def begin(self):`
			`for hook in self.hooks.values():`
			`hook.begin(self)`

			`def before_reader(self):`
			`for hook in self.hooks.values():`
			`hook.before_reader(self)`

			`def after_reader(self):`
			`for hook in self.hooks.values():`
			`hook.after_reader(self)`

			`def after_step(self):`
			`for hook in self.hooks.values():`
			`hook.after_step(self)`

			`def end(self):`
			`for hook in self.hooks.values():`
			`hook.end(self)`

			`def check_if_need_record(self, reader):`
			`if self.current_event is None:`
			`return`
			`if self.current_event.need_record:`
			`# set reader for the current event at the first iter`
			`if self.current_event.reader is None:`
			`self.current_event.reader = reader`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`elif (`
			`self.current_event.reader.__dict__['_dataset']`
			`!= reader.__dict__['_dataset']`
			`):`
Fix some typos (outpus, componnets, etc.) (#61767) 2024-02-19 16:36:13 +08:00			`# enter a new task but not calling begin() to record it.`
【code format check upgrade】 step2：yapf (#42944) * use yapf to format all python file * yapf exclude two unittests file for they rely on writing and reading file, and format will break them * disable diff_py_file because too many diff files cause command following failed 2022-06-05 10:58:58 +08:00			`# we pause the timer until the end of new task, so that`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`# the cost of new task is not added to the current event.`
Fix some typos in paddle/. (#42408) 2022-05-12 23:25:03 +08:00			`# eg. start evaluation in the training task`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`self.current_event.need_record = False`
			`else:`
			`# when the new task exits, continue timing for the current event.`
[CodeStyle][black] use black instead of yapf (#46014) * update config * re-blacken python code * temporarily disable date and diff_py_file * skip a format 2022-10-23 20:01:27 +08:00			`if (`
			`self.current_event.reader.__dict__['_dataset']`
			`== reader.__dict__['_dataset']`
			`):`
Add timer tool to Profiler (#40386) 2022-03-30 10:08:59 +08:00			`self.current_event.need_record = True`
			`self.hooks['timer_hook'].start_time = timeit.default_timer()`


			`_benchmark_ = Benchmark()`


			`def benchmark():`
			`return _benchmark_`