Blame: examples/cli/cli.cpp - ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

examples : refactor in order to reuse code and reduce duplication (#482) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib 2023-02-15 19:28:10 +02:00			`#include "common.h"`
common : separate whisper sources (#2846) * common : separate whisper sources * examples : add chrono * examples : add more headers 2025-02-27 12:50:32 +02:00			`#include "common-whisper.h"`
Initial release 2022-09-25 21:23:15 +03:00
examples : refactor in order to reuse code and reduce duplication (#482) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib 2023-02-15 19:28:10 +02:00			`#include "whisper.h"`
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`#include "grammar-parser.h"`
Fix bug in FFT The FFT routine does not work for odd N Solution is to add DFT and use it when N is odd 2022-10-02 17:46:21 +03:00
wip : experimental color coding of tokens based on probabilities 2022-10-21 17:33:59 +03:00			`#include <cmath>`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`#include <algorithm>`
ref #17 : add options to output result to file Support for: - plain text - VTT - SRT 2022-10-08 17:22:22 +03:00			`#include <fstream>`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`#include <cstdio>`
			`#include <string>`
			`#include <thread>`
			`#include <vector>`
main : add <cstring> header 2023-03-29 23:59:45 +03:00			`#include <cstring>`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`#include <cfloat>`
Fix bug in FFT The FFT routine does not work for odd N Solution is to add DFT and use it when N is odd 2022-10-02 17:46:21 +03:00
Fixes for Windows (#2790) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com> 2025-02-06 15:37:21 +08:00			`#if defined(_WIN32)`
whisper : enhance model download scripts functionality and resolve compiler warning (#2925) * whisper : improve whisper-cli executable path detection in model download shell scripts If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios. * whisper : enhance Windows model download batch script to have comparable functionality and behaviour as shell scripts * Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build) * Add model_path command line argument * If whisper-cli is found on the path, do not suggest invoking from build directory * whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code 2025-03-24 19:39:50 +11:00			`#ifndef NOMINMAX`
Fixes for Windows (#2790) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com> 2025-02-06 15:37:21 +08:00			`#define NOMINMAX`
whisper : enhance model download scripts functionality and resolve compiler warning (#2925) * whisper : improve whisper-cli executable path detection in model download shell scripts If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios. * whisper : enhance Windows model download batch script to have comparable functionality and behaviour as shell scripts * Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build) * Add model_path command line argument * If whisper-cli is found on the path, do not suggest invoking from build directory * whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code 2025-03-24 19:39:50 +11:00			`#endif`
Fixes for Windows (#2790) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com> 2025-02-06 15:37:21 +08:00			`#include <windows.h>`
			`#endif`
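On Windows, `<windows.h>` defines function-like `min` and `max` macros unless `NOMINMAX` is defined first. Those macros break calls such as `std::min`, which this file uses for the `n_threads` default. The Windows fixes in #2790 also switched console output to UTF-8.

The sketch below shows both points. `enable_utf8_console` and `smaller` are hypothetical helpers. `SetConsoleOutputCP` and `CP_UTF8` come from the Win32 API. On other platforms the helper does nothing.

```cpp
#if defined(_WIN32)
#ifndef NOMINMAX
#define NOMINMAX // keep <windows.h> from defining min/max macros
#endif
#include <windows.h>
#endif

#include <algorithm>

// Hypothetical helper: switch the console output code page to UTF-8 so that
// transcripts in non-Latin scripts print correctly. Returns true on success.
static bool enable_utf8_console() {
#if defined(_WIN32)
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true; // no console code page to set outside Windows
#endif
}

// Compiles on Windows only because NOMINMAX suppressed the `min` macro;
// otherwise `std::min(` would be macro-expanded into invalid code.
static int smaller(int a, int b) {
    return std::min(a, b);
}
```

Under this sketch, `enable_utf8_console()` would be called once at the start of `main`, before any transcript text is printed.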

whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`// helper function to replace substrings`
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static void replace_all(std::string & s, const std::string & search, const std::string & replace) {`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`for (size_t pos = 0; ; pos += replace.length()) {`
			`pos = s.find(search, pos);`
			`if (pos == std::string::npos) break;`
			`s.erase(pos, search.length());`
			`s.insert(pos, replace);`
			`}`
			`}`
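`replace_all` advances `pos` past each inserted replacement. Because of that, it terminates even when `replace` contains `search`, for example when doubling quote characters. It does not guard against an empty `search`, though. In that case `find` matches at every position and the loop never ends.

The sketch below adds that guard. It also includes a hypothetical `escape_csv_quotes` caller that doubles quotes the way RFC 4180 CSV fields do. This is an illustration only and is not taken from this file's CSV writer.

```cpp
#include <string>

// Same loop as replace_all above, plus a guard for an empty `search`,
// which would otherwise match at every position and loop forever.
static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return;
    }
    for (size_t pos = 0; ; pos += replace.length()) {
        pos = s.find(search, pos);
        if (pos == std::string::npos) break;
        s.erase(pos, search.length());
        s.insert(pos, replace);
        // the loop increment skips over the inserted text, so it is never rescanned
    }
}

// Hypothetical caller: escape double quotes inside a CSV field by doubling them.
static std::string escape_csv_quotes(std::string s) {
    replace_all(s, "\"", "\"\"");
    return s;
}
```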

Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`// command-line parameters`
			`struct whisper_params {`
whisper : token-level timestamps with DTW (#1485) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-20 13:25:26 -03:00			`int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());`
			`int32_t n_processors = 1;`
			`int32_t offset_t_ms = 0;`
			`int32_t offset_n = 0;`
			`int32_t duration_ms = 0;`
			`int32_t progress_step = 5;`
			`int32_t max_context = -1;`
			`int32_t max_len = 0;`
			`int32_t best_of = whisper_full_default_params(WHISPER_SAMPLING_GREEDY).greedy.best_of;`
			`int32_t beam_size = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH).beam_search.beam_size;`
			`int32_t audio_ctx = 0;`
Initial release 2022-09-25 21:23:15 +03:00
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`float word_thold = 0.01f;`
			`float entropy_thold = 2.40f;`
			`float logprob_thold = -1.00f;`
cli : add no_speech_thold (#2663) 2024-12-24 08:29:19 +01:00			`float no_speech_thold = 0.6f;`
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`float grammar_penalty = 100.0f;`
main : add options for temperature control (#2088) Add two options: ``` -tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1 -tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1 ``` The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at> 2024-05-13 13:59:44 +02:00			`float temperature = 0.0f;`
			`float temperature_inc = 0.2f;`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
whisper : significantly improve the inference quality (#1148) * Fix MSVC compile error C3688 Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC. * Significantly improve inference quality In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference. * Significantly improve inference quality At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue. * Addressed a few minor issues Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`. * Significantly improve inference quality Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information. 
* Add annotation and performance improvement * Calculate FFT only when fft_in are not all zero * Some minor performance improvement * Fixed a bug impacting inference quality * The first version after all the analysis is completed. * Fix some bugs and add debug mode * Fixed several bugs * Temporarily disable speed-up mode and add debug mode. * Add debug mode * Disable speed-up mode and add debug mode * Fix CI error (#1) * Fix error * Fix error * Fixed several bugs including [BLANK_AUDIO] problem * Remove Hard-coded hann window * Some Final Fix (#2) * Fix error * Fix error * Probably the last commit * Probably the last commit * whisper : minor coding style changes * whisper : remove debug from public API --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-08-28 00:51:33 +08:00			`bool debug_mode = false;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`bool translate = false;`
			`bool detect_language = false;`
			`bool diarize = false;`
			`bool tinydiarize = false;`
			`bool split_on_word = false;`
			`bool no_fallback = false;`
			`bool output_txt = false;`
			`bool output_vtt = false;`
			`bool output_srt = false;`
			`bool output_wts = false;`
			`bool output_csv = false;`
			`bool output_jsn = false;`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`bool output_jsn_full = false;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`bool output_lrc = false;`
main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00			`bool no_prints = false;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`bool print_special = false;`
			`bool print_colors = false;`
examples : add --print-confidence option to cli (#3150) * examples : add --print-confidence option to cli This commit adds a new command-line option `--print-confidence` to the whisper-cli. When enabled, this option prints the confidence level of each token in the transcribed text using ANSI formatting codes. The confidence levels are represented using different styles: ```console main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence) ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3135 2025-05-14 19:21:48 +02:00			`bool print_confidence= false;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`bool print_progress = false;`
			`bool no_timestamps = false;`
main : log probs to text file (#1205) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-08-27 18:09:06 +02:00			`bool log_score = false;`
whisper : add context param to disable gpu (#1293) * whisper : check state->ctx_metal not null * whisper : add whisper_context_params { use_gpu } * whisper : new API with params & deprecate old API * examples : use no-gpu param && whisper_init_from_file_with_params * whisper.objc : enable metal & disable on simulator * whisper.swiftui, metal : enable metal & support load default.metallib * whisper.android : use new API * bindings : use new API * addon.node : fix build & test * bindings : updata java binding * bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java * metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load * metal : move bundle var into block * metal : use SWIFT_PACKAGE instead of GGML_SWIFT * style : minor updates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-11-06 17:04:24 +08:00			`bool use_gpu = true;`
whisper : enable flash attention by default (#3441) 2025-09-30 15:47:20 +03:00			`bool flash_attn = true;`
examples : use -dev/--device and WHISPER_ARG_DEVICE (#3557) Align device selection naming with llama.cpp. 2026-01-21 04:40:30 -03:00			`int32_t gpu_device = 0;`
cli : add --suppress_nst support (#2664) 2024-12-24 08:30:07 +01:00			`bool suppress_nst = false;`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`bool carry_initial_prompt = false;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00
			`std::string language = "en";`
examples : small code cleanups (#322) - remove unnecessary initialization of string to "" - use empty() instead of checking size() - use emplace_back instead of push_back - use nullptr instead of NULL - remove unnecessary call to .data() on string - use character overload of find_first_of() instead of passing a string 2022-12-23 13:18:51 -05:00			`std::string prompt;`
qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569) 2023-03-06 09:18:11 -08:00			`std::string font_path = "/System/Library/Fonts/Supplemental/Courier New Bold.ttf";`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`std::string model = "models/ggml-base.en.bin";`
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`std::string grammar;`
			`std::string grammar_rule;`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00
			`// [TDRZ] speaker turn string`
			`std::string tdrz_speaker_turn = " [SPEAKER_TURN]"; // TODO: set from command line`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00
whisper : suppress tokens with a regex (#1997) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]\|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-04-09 08:27:28 -07:00			`// A regular expression that matches tokens to suppress`
			`std::string suppress_regex;`

whisper : add OpenVINO support (#1037) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-04 08:56:11 -04:00			`std::string openvino_encode_device = "CPU";`

whisper : token-level timestamps with DTW (#1485) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-20 13:25:26 -03:00			`std::string dtw = "";`

ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`std::vector<std::string> fname_inp = {};`
examples : refactor in order to reuse code and reduce duplication (#482) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib 2023-02-15 19:28:10 +02:00			`std::vector<std::string> fname_out = {};`
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00
			`grammar_parser::parse_state grammar_parsed;`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00
			`// Voice Activity Detection (VAD) parameters`
			`bool vad = false;`
			`std::string vad_model = "";`
			`float vad_threshold = 0.5f;`
			`int vad_min_speech_duration_ms = 250;`
			`int vad_min_silence_duration_ms = 100;`
			`float vad_max_speech_duration_s = FLT_MAX;`
			`int vad_speech_pad_ms = 30;`
			`float vad_samples_overlap = 0.1f;`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`};`
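`std::thread::hardware_concurrency()` may return 0 when the core count cannot be determined. In that case the `n_threads` initializer above evaluates to 0.

The sketch below computes a default that is clamped to at least one thread. `default_n_threads` is a hypothetical name, and the cap of 4 mirrors the struct's initializer.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>

// Hypothetical helper: same cap as whisper_params::n_threads, but never below 1.
static int32_t default_n_threads(int32_t cap = 4) {
    const int32_t hw = (int32_t) std::thread::hardware_concurrency(); // 0 if unknown
    return std::max<int32_t>(1, std::min(cap, hw));
}
```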
Initial release 2022-09-25 21:23:15 +03:00
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static void whisper_print_usage(int argc, char ** argv, const whisper_params & params);`
Initial release 2022-09-25 21:23:15 +03:00
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static char * whisper_param_turn_lowercase(char * in){`
examples : Auto lowercase language parameter in main.cpp (#1928) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> 2024-03-06 23:25:10 +01:00			`int string_len = strlen(in);`
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`for (int i = 0; i < string_len; i++){`
examples : Auto lowercase language parameter in main.cpp (#1928) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> 2024-03-06 23:25:10 +01:00			`*(in+i) = tolower((unsigned char)*(in+i));`
			`}`
			`return in;`
			`}`
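`whisper_param_turn_lowercase` lowercases a C string in place. The `unsigned char` cast matters because passing a negative `char` value to `tolower` is undefined behavior.

The sketch below does the same for a `std::string` value such as `params.language`, using `std::transform`. `to_lower_copy` is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lowercased copy; taking each char as unsigned char avoids UB
// for bytes outside the ASCII range.
static std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```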

cli : fix segfault on missing argument (#2700) 2025-01-04 09:47:41 +01:00			`static char * requires_value_error(const std::string & arg) {`
			`fprintf(stderr, "error: argument %s requires value\n", arg.c_str());`
			`exit(1);`
			`}`
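`requires_value_error` is the fallback that `ARGV_NEXT`, in `whisper_params_parse`, uses when a flag is the last argument. The values `ARGV_NEXT` returns, and the `WHISPER_ARG_DEVICE` value, are then passed to `std::stoi`/`std::stof`. Those functions have two weaknesses here:

- They throw on input such as `-t abc`.
- They silently accept trailing junk, so `8x` parses as 8.

The sketch below is a non-throwing alternative with hypothetical names (`next_arg`, `parse_int32`). On bad input it returns `false` instead of exiting, so the caller can print usage and exit with a non-zero status.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical: return the value following argv[i] and advance i, or nullptr if missing.
static const char * next_arg(int & i, int argc, char ** argv) {
    return (i + 1 < argc) ? argv[++i] : nullptr;
}

// Hypothetical: strict base-10 parse. Rejects missing, empty, partially numeric
// ("8x") and out-of-range input; leaves `out` untouched on failure.
static bool parse_int32(const char * s, int32_t & out) {
    if (s == nullptr || *s == '\0') {
        return false;
    }
    errno = 0;
    char * end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v < INT32_MIN || v > INT32_MAX) {
        return false;
    }
    out = (int32_t) v;
    return true;
}
```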

whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {`
examples : use -dev/--device and WHISPER_ARG_DEVICE (#3557) Align device selection naming with llama.cpp. 2026-01-21 04:40:30 -03:00			`if (const char * env_device = std::getenv("WHISPER_ARG_DEVICE")) {`
			`params.gpu_device = std::stoi(env_device);`
			`}`

Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`for (int i = 1; i < argc; i++) {`
			`std::string arg = argv[i];`
yt-wsp.sh : print help on empty args 2023-02-18 09:42:31 +02:00
main : fix std in input (#503) if we don't add this as an explicit check, then we get an "error: unknown argument: -" later on 2023-02-15 17:31:16 +00:00			`if (arg == "-"){`
			`params.fname_inp.push_back(arg);`
			`continue;`
			`}`
yt-wsp.sh : print help on empty args 2023-02-18 09:42:31 +02:00
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`if (arg[0] != '-') {`
			`params.fname_inp.push_back(arg);`
			`continue;`
			`}`

refactoring : more readable code 2022-11-25 19:08:51 +02:00			`if (arg == "-h" || arg == "--help") {`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`whisper_print_usage(argc, argv, params);`
			`exit(0);`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`}`
cli : fix segfault on missing argument (#2700) 2025-01-04 09:47:41 +01:00			`#define ARGV_NEXT (((i + 1) < argc) ? argv[++i] : requires_value_error(arg))`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`else if (arg == "-t" || arg == "--threads") { params.n_threads = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-p" || arg == "--processors") { params.n_processors = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-ot" || arg == "--offset-t") { params.offset_t_ms = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-on" || arg == "--offset-n") { params.offset_n = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-d" || arg == "--duration") { params.duration_ms = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-mc" || arg == "--max-context") { params.max_context = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-ml" || arg == "--max-len") { params.max_len = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-bo" || arg == "--best-of") { params.best_of = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-bs" || arg == "--beam-size") { params.beam_size = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-ac" || arg == "--audio-ctx") { params.audio_ctx = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-wt" || arg == "--word-thold") { params.word_thold = std::stof(ARGV_NEXT); }`
			`else if (arg == "-et" || arg == "--entropy-thold") { params.entropy_thold = std::stof(ARGV_NEXT); }`
			`else if (arg == "-lpt" || arg == "--logprob-thold") { params.logprob_thold = std::stof(ARGV_NEXT); }`
			`else if (arg == "-nth" || arg == "--no-speech-thold") { params.no_speech_thold = std::stof(ARGV_NEXT); }`
			`else if (arg == "-tp" || arg == "--temperature") { params.temperature = std::stof(ARGV_NEXT); }`
			`else if (arg == "-tpi" || arg == "--temperature-inc") { params.temperature_inc = std::stof(ARGV_NEXT); }`
			`else if (arg == "-debug"|| arg == "--debug-mode") { params.debug_mode = true; }`
			`else if (arg == "-tr" || arg == "--translate") { params.translate = true; }`
			`else if (arg == "-di" || arg == "--diarize") { params.diarize = true; }`
			`else if (arg == "-tdrz" || arg == "--tinydiarize") { params.tinydiarize = true; }`
			`else if (arg == "-sow" || arg == "--split-on-word") { params.split_on_word = true; }`
			`else if (arg == "-nf" || arg == "--no-fallback") { params.no_fallback = true; }`
			`else if (arg == "-otxt" || arg == "--output-txt") { params.output_txt = true; }`
			`else if (arg == "-ovtt" || arg == "--output-vtt") { params.output_vtt = true; }`
			`else if (arg == "-osrt" || arg == "--output-srt") { params.output_srt = true; }`
			`else if (arg == "-owts" || arg == "--output-words") { params.output_wts = true; }`
			`else if (arg == "-olrc" || arg == "--output-lrc") { params.output_lrc = true; }`
			`else if (arg == "-fp" || arg == "--font-path") { params.font_path = ARGV_NEXT; }`
			`else if (arg == "-ocsv" || arg == "--output-csv") { params.output_csv = true; }`
			`else if (arg == "-oj" || arg == "--output-json") { params.output_jsn = true; }`
			`else if (arg == "-ojf" || arg == "--output-json-full") { params.output_jsn_full = params.output_jsn = true; }`
			`else if (arg == "-of" || arg == "--output-file") { params.fname_out.emplace_back(ARGV_NEXT); }`
			`else if (arg == "-np" || arg == "--no-prints") { params.no_prints = true; }`
			`else if (arg == "-ps" || arg == "--print-special") { params.print_special = true; }`
			`else if (arg == "-pc" || arg == "--print-colors") { params.print_colors = true; }`
			`else if ( arg == "--print-confidence") { params.print_confidence= true; }`
			`else if (arg == "-pp" || arg == "--print-progress") { params.print_progress = true; }`
			`else if (arg == "-nt" || arg == "--no-timestamps") { params.no_timestamps = true; }`
			`else if (arg == "-l" || arg == "--language") { params.language = whisper_param_turn_lowercase(ARGV_NEXT); }`
			`else if (arg == "-dl" || arg == "--detect-language") { params.detect_language = true; }`
			`else if ( arg == "--prompt") { params.prompt = ARGV_NEXT; }`
			`else if ( arg == "--carry-initial-prompt") { params.carry_initial_prompt = true; }`
			`else if (arg == "-m" || arg == "--model") { params.model = ARGV_NEXT; }`
			`else if (arg == "-f" || arg == "--file") { params.fname_inp.emplace_back(ARGV_NEXT); }`
			`else if (arg == "-oved" || arg == "--ov-e-device") { params.openvino_encode_device = ARGV_NEXT; }`
			`else if (arg == "-dtw" || arg == "--dtw") { params.dtw = ARGV_NEXT; }`
			`else if (arg == "-ls" || arg == "--log-score") { params.log_score = true; }`
			`else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }`
examples : use -dev/--device and WHISPER_ARG_DEVICE (#3557) Align device selection naming with llama.cpp. 2026-01-21 04:40:30 -03:00			`else if (arg == "-dev" || arg == "--device") { params.gpu_device = std::stoi(ARGV_NEXT); }`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`else if (arg == "-fa" || arg == "--flash-attn") { params.flash_attn = true; }`
			`else if (arg == "-nfa" || arg == "--no-flash-attn") { params.flash_attn = false; }`
			`else if (arg == "-sns" || arg == "--suppress-nst") { params.suppress_nst = true; }`
			`else if ( arg == "--suppress-regex") { params.suppress_regex = ARGV_NEXT; }`
			`else if ( arg == "--grammar") { params.grammar = ARGV_NEXT; }`
			`else if ( arg == "--grammar-rule") { params.grammar_rule = ARGV_NEXT; }`
			`else if ( arg == "--grammar-penalty") { params.grammar_penalty = std::stof(ARGV_NEXT); }`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`// Voice Activity Detection (VAD)`
vad : remove shortform for --vad option in cli.cpp (#3145) This commit removes the shortform for the --vad option in cli.cpp. The motivation for this is that `-v` is often used for verbose or version is many tools and this might cause confusion. Refs: https://github.com/ggml-org/whisper.cpp/pull/3065#issuecomment-2873243334 2025-05-13 06:04:05 +02:00			`else if ( arg == "--vad") { params.vad = true; }`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`else if (arg == "-vm" || arg == "--vad-model") { params.vad_model = ARGV_NEXT; }`
			`else if (arg == "-vt" || arg == "--vad-threshold") { params.vad_threshold = std::stof(ARGV_NEXT); }`
cli : fix short name conflict for vad options [no ci] (#3247) This commit fixes a short name conflict whisper-cli for `--vad-min-speech-duration-ms` and `--vad-min-silence-duration-ms` which currently have the same short name `-vsd`. Refs: https://github.com/ggml-org/whisper.cpp/pull/3246#pullrequestreview-2923800114 2025-06-13 10:25:25 +02:00			`else if (arg == "-vspd" || arg == "--vad-min-speech-duration-ms") { params.vad_min_speech_duration_ms = std::stoi(ARGV_NEXT); }`
cli: Fix assignment for vad_min_silence_duration_ms (#3467) * cli: Fix assignment for vad_min_silence_duration_ms Found and fixed this simple copy/paste error * server : fix vad_min_silence_duration_ms assignment --------- Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com> 2025-10-10 15:21:03 +02:00			`else if (arg == "-vsd" || arg == "--vad-min-silence-duration-ms") { params.vad_min_silence_duration_ms = std::stoi(ARGV_NEXT); }`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`else if (arg == "-vmsd" || arg == "--vad-max-speech-duration-s") { params.vad_max_speech_duration_s = std::stof(ARGV_NEXT); }`
			`else if (arg == "-vp" || arg == "--vad-speech-pad-ms") { params.vad_speech_pad_ms = std::stoi(ARGV_NEXT); }`
			`else if (arg == "-vo" || arg == "--vad-samples-overlap") { params.vad_samples_overlap = std::stof(ARGV_NEXT); }`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`else {`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`whisper_print_usage(argc, argv, params);`
			`exit(0);`
Initial release 2022-09-25 21:23:15 +03:00			`}`
			`}`

			`return true;`
			`}`
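The parsing loop follows a few rules that are easier to see in isolation. The following is a reduced sketch, not the whisper.cpp code: `cli_params` and `parse_cli` are hypothetical, the environment value is passed in as an argument instead of being read with `std::getenv`, and errors throw `std::invalid_argument` instead of calling `exit()`. It keeps the original's precedence (the `WHISPER_ARG_DEVICE` value is applied before the loop, so a later `-dev`/`--device` wins), treats `-` and any argument not starting with `-` as an input file, and checks bounds before consuming a flag's value, as `ARGV_NEXT` does.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, reduced parameter set used only for this sketch.
struct cli_params {
    int n_threads  = 4;
    int gpu_device = 0;
    std::vector<std::string> fname_inp;
};

// env_device stands in for getenv("WHISPER_ARG_DEVICE"); it is applied first,
// so a --device flag on the command line overrides it.
cli_params parse_cli(const std::vector<std::string> & args,
                     std::optional<std::string> env_device = std::nullopt) {
    cli_params p;
    if (env_device) {
        p.gpu_device = std::stoi(*env_device);
    }
    for (std::size_t i = 0; i < args.size(); i++) {
        const std::string & arg = args[i];
        // Bounds-checked equivalent of ARGV_NEXT: consume the next argument
        // as this flag's value, or fail if there is none.
        auto next = [&]() -> const std::string & {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("error: argument " + arg + " requires value");
            }
            return args[++i];
        };
        if (arg == "-" || arg[0] != '-') {
            p.fname_inp.push_back(arg);  // stdin marker or positional input file
        } else if (arg == "-t" || arg == "--threads") {
            p.n_threads = std::stoi(next());
        } else if (arg == "-dev" || arg == "--device") {
            p.gpu_device = std::stoi(next());
        } else if (arg == "-f" || arg == "--file") {
            p.fname_inp.push_back(next());
        } else {
            throw std::invalid_argument("error: unknown argument: " + arg);
        }
    }
    return p;
}
```

Throwing makes the sketch testable; note that the original instead prints the error and calls `exit(0)`, so a shell sees a successful exit status even for a missing value or an unknown flag.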

whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & params) {`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`fprintf(stderr, "\n");`
examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759) 2025-02-27 12:06:54 +05:00			`fprintf(stderr, "usage: %s [options] file0 file1 ...\n", argv[0]);`
			`fprintf(stderr, "supported audio formats: flac, mp3, ogg, wav\n");`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`fprintf(stderr, "\n");`
			`fprintf(stderr, "options:\n");`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`fprintf(stderr, " -h, --help [default] show this help message and exit\n");`
			`fprintf(stderr, " -t N, --threads N [%-7d] number of threads to use during computation\n", params.n_threads);`
			`fprintf(stderr, " -p N, --processors N [%-7d] number of processors to use during computation\n", params.n_processors);`
			`fprintf(stderr, " -ot N, --offset-t N [%-7d] time offset in milliseconds\n", params.offset_t_ms);`
			`fprintf(stderr, " -on N, --offset-n N [%-7d] segment index offset\n", params.offset_n);`
			`fprintf(stderr, " -d N, --duration N [%-7d] duration of audio to process in milliseconds\n", params.duration_ms);`
			`fprintf(stderr, " -mc N, --max-context N [%-7d] maximum number of text context tokens to store\n", params.max_context);`
			`fprintf(stderr, " -ml N, --max-len N [%-7d] maximum segment length in characters\n", params.max_len);`
			`fprintf(stderr, " -sow, --split-on-word [%-7s] split on word rather than on token\n", params.split_on_word ? "true" : "false");`
			`fprintf(stderr, " -bo N, --best-of N [%-7d] number of best candidates to keep\n", params.best_of);`
			`fprintf(stderr, " -bs N, --beam-size N [%-7d] beam size for beam search\n", params.beam_size);`
			`fprintf(stderr, " -ac N, --audio-ctx N [%-7d] audio context size (0 - all)\n", params.audio_ctx);`
			`fprintf(stderr, " -wt N, --word-thold N [%-7.2f] word timestamp probability threshold\n", params.word_thold);`
			`fprintf(stderr, " -et N, --entropy-thold N [%-7.2f] entropy threshold for decoder fail\n", params.entropy_thold);`
			`fprintf(stderr, " -lpt N, --logprob-thold N [%-7.2f] log probability threshold for decoder fail\n", params.logprob_thold);`
			`fprintf(stderr, " -nth N, --no-speech-thold N [%-7.2f] no speech threshold\n", params.no_speech_thold);`
			`fprintf(stderr, " -tp, --temperature N [%-7.2f] The sampling temperature, between 0 and 1\n", params.temperature);`
			`fprintf(stderr, " -tpi, --temperature-inc N [%-7.2f] The increment of temperature, between 0 and 1\n",params.temperature_inc);`
			`fprintf(stderr, " -debug, --debug-mode [%-7s] enable debug mode (eg. dump log_mel)\n", params.debug_mode ? "true" : "false");`
			`fprintf(stderr, " -tr, --translate [%-7s] translate from source language to english\n", params.translate ? "true" : "false");`
			`fprintf(stderr, " -di, --diarize [%-7s] stereo audio diarization\n", params.diarize ? "true" : "false");`
			`fprintf(stderr, " -tdrz, --tinydiarize [%-7s] enable tinydiarize (requires a tdrz model)\n", params.tinydiarize ? "true" : "false");`
			`fprintf(stderr, " -nf, --no-fallback [%-7s] do not use temperature fallback while decoding\n", params.no_fallback ? "true" : "false");`
			`fprintf(stderr, " -otxt, --output-txt [%-7s] output result in a text file\n", params.output_txt ? "true" : "false");`
			`fprintf(stderr, " -ovtt, --output-vtt [%-7s] output result in a vtt file\n", params.output_vtt ? "true" : "false");`
			`fprintf(stderr, " -osrt, --output-srt [%-7s] output result in a srt file\n", params.output_srt ? "true" : "false");`
			`fprintf(stderr, " -olrc, --output-lrc [%-7s] output result in a lrc file\n", params.output_lrc ? "true" : "false");`
			`fprintf(stderr, " -owts, --output-words [%-7s] output script for generating karaoke video\n", params.output_wts ? "true" : "false");`
			`fprintf(stderr, " -fp, --font-path [%-7s] path to a monospace font for karaoke video\n", params.font_path.c_str());`
			`fprintf(stderr, " -ocsv, --output-csv [%-7s] output result in a CSV file\n", params.output_csv ? "true" : "false");`
			`fprintf(stderr, " -oj, --output-json [%-7s] output result in a JSON file\n", params.output_jsn ? "true" : "false");`
			`fprintf(stderr, " -ojf, --output-json-full [%-7s] include more information in the JSON file\n", params.output_jsn_full ? "true" : "false");`
			`fprintf(stderr, " -of FNAME, --output-file FNAME [%-7s] output file path (without file extension)\n", "");`
			`fprintf(stderr, " -np, --no-prints [%-7s] do not print anything other than the results\n", params.no_prints ? "true" : "false");`
			`fprintf(stderr, " -ps, --print-special [%-7s] print special tokens\n", params.print_special ? "true" : "false");`
			`fprintf(stderr, " -pc, --print-colors [%-7s] print colors\n", params.print_colors ? "true" : "false");`
			`fprintf(stderr, " --print-confidence [%-7s] print confidence\n", params.print_confidence ? "true" : "false");`
			`fprintf(stderr, " -pp, --print-progress [%-7s] print progress\n", params.print_progress ? "true" : "false");`
			`fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");`
			`fprintf(stderr, " -l LANG, --language LANG [%-7s] spoken language ('auto' for auto-detect)\n", params.language.c_str());`
			`fprintf(stderr, " -dl, --detect-language [%-7s] exit after automatically detecting language\n", params.detect_language ? "true" : "false");`
			`fprintf(stderr, " --prompt PROMPT [%-7s] initial prompt (max n_text_ctx/2 tokens)\n", params.prompt.c_str());`
			`fprintf(stderr, " --carry-initial-prompt [%-7s] always prepend initial prompt\n", params.carry_initial_prompt ? "true" : "false");`
			`fprintf(stderr, " -m FNAME, --model FNAME [%-7s] model path\n", params.model.c_str());`
			`fprintf(stderr, " -f FNAME, --file FNAME [%-7s] input audio file path\n", "");`
			`fprintf(stderr, " -oved D, --ov-e-device DNAME [%-7s] the OpenVINO device used for encode inference\n", params.openvino_encode_device.c_str());`
			`fprintf(stderr, " -dtw MODEL --dtw MODEL [%-7s] compute token-level timestamps\n", params.dtw.c_str());`
			`fprintf(stderr, " -ls, --log-score [%-7s] log best decoder scores of tokens\n", params.log_score?"true":"false");`
			`fprintf(stderr, " -ng, --no-gpu [%-7s] disable GPU\n", params.use_gpu ? "false" : "true");`
examples : use -dev/--device and WHISPER_ARG_DEVICE (#3557) Align device selection naming with llama.cpp. 2026-01-21 04:40:30 -03:00			`fprintf(stderr, " -dev N, --device N [%-7d] GPU device ID (default: 0)\n", params.gpu_device);`
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`fprintf(stderr, " -fa, --flash-attn [%-7s] enable flash attention\n", params.flash_attn ? "true" : "false");`
			`fprintf(stderr, " -nfa, --no-flash-attn [%-7s] disable flash attention\n", params.flash_attn ? "false" : "true");`
			`fprintf(stderr, " -sns, --suppress-nst [%-7s] suppress non-speech tokens\n", params.suppress_nst ? "true" : "false");`
			`fprintf(stderr, " --suppress-regex REGEX [%-7s] regular expression matching tokens to suppress\n", params.suppress_regex.c_str());`
			`fprintf(stderr, " --grammar GRAMMAR [%-7s] GBNF grammar to guide decoding\n", params.grammar.c_str());`
			`fprintf(stderr, " --grammar-rule RULE [%-7s] top-level GBNF grammar rule name\n", params.grammar_rule.c_str());`
			`fprintf(stderr, " --grammar-penalty N [%-7.1f] scales down logits of nongrammar tokens\n", params.grammar_penalty);`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`// Voice Activity Detection (VAD) parameters`
			`fprintf(stderr, "\nVoice Activity Detection (VAD) options:\n");`
vad : remove shortform for --vad option in cli.cpp (#3145) This commit removes the shortform for the --vad option in cli.cpp. The motivation for this is that `-v` is often used for verbose or version is many tools and this might cause confusion. Refs: https://github.com/ggml-org/whisper.cpp/pull/3065#issuecomment-2873243334 2025-05-13 06:04:05 +02:00			`fprintf(stderr, " --vad [%-7s] enable Voice Activity Detection (VAD)\n", params.vad ? "true" : "false");`
vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`fprintf(stderr, " -vm FNAME, --vad-model FNAME [%-7s] VAD model path\n", params.vad_model.c_str());`
			`fprintf(stderr, " -vt N, --vad-threshold N [%-7.2f] VAD threshold for speech recognition\n", params.vad_threshold);`
			`fprintf(stderr, " -vspd N, --vad-min-speech-duration-ms N [%-7d] VAD min speech duration in milliseconds\n", params.vad_min_speech_duration_ms);`
			`fprintf(stderr, " -vsd N, --vad-min-silence-duration-ms N [%-7d] VAD min silence duration (to split segments)\n", params.vad_min_silence_duration_ms);`
			`fprintf(stderr, " -vmsd N, --vad-max-speech-duration-s N [%-7s] VAD max speech duration (auto-split longer)\n", params.vad_max_speech_duration_s == FLT_MAX ?`
			`std::string("FLT_MAX").c_str() :`
			`std::to_string(params.vad_max_speech_duration_s).c_str());`
			`fprintf(stderr, " -vp N, --vad-speech-pad-ms N [%-7d] VAD speech padding (extend segments)\n", params.vad_speech_pad_ms);`
			`fprintf(stderr, " -vo N, --vad-samples-overlap N [%-7.2f] VAD samples overlap (seconds between segments)\n", params.vad_samples_overlap);`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`fprintf(stderr, "\n");`
ref #4 : added transcription timestamps Can be turned off with "-nt" argument. Performance has also improved. 2022-09-29 23:09:04 +03:00			`}`
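Each help line prints the current default in a left-justified, 7-character column (`%-7d`, `%-7s`, `%-7.2f`), and `--vad-max-speech-duration-s` prints the text `FLT_MAX` when the value is the "unlimited" sentinel. Below is a small sketch of that formatting with `snprintf`; the helper names are hypothetical.

```cpp
#include <cfloat>
#include <cstdio>
#include <string>

// Hypothetical helper: an int default, left-justified in a 7-character column.
std::string fmt_default_int(int value) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "[%-7d]", value);
    return buf;
}

// Hypothetical helper: a float default that uses FLT_MAX as an "unlimited"
// sentinel is shown as the text "FLT_MAX" rather than as a huge number.
std::string fmt_default_max(float value) {
    const std::string shown = (value == FLT_MAX) ? "FLT_MAX" : std::to_string(value);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "[%-7s]", shown.c_str());
    return buf;
}
```

Values wider than the column are not truncated; they push the closing bracket to the right. Passing `.c_str()` of a temporary `std::string` straight into `fprintf`, as the original does, is safe because the temporary lives until the end of the full expression.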

main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`struct whisper_print_user_data {`
			`const whisper_params * params;`

			`const std::vector<std::vector<float>> * pcmf32s;`
whisper : move progress calculation out of whisper.cpp (#1081) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step. 2023-07-25 21:23:34 +05:30			`int progress_prev;`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`};`

whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static std::string estimate_diarization_speaker(std::vector<std::vector<float>> pcmf32s, int64_t t0, int64_t t1, bool id_only = false) {`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`
			`const int64_t n_samples = pcmf32s[0].size();`

examples : clean up common code (#1871) move some utility functions into common.h 2024-02-19 09:50:15 +01:00			`const int64_t is0 = timestamp_to_sample(t0, n_samples, WHISPER_SAMPLE_RATE);`
			`const int64_t is1 = timestamp_to_sample(t1, n_samples, WHISPER_SAMPLE_RATE);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00
			`double energy0 = 0.0f;`
			`double energy1 = 0.0f;`

			`for (int64_t j = is0; j < is1; j++) {`
			`energy0 += fabs(pcmf32s[0][j]);`
			`energy1 += fabs(pcmf32s[1][j]);`
			`}`

			`if (energy0 > 1.1*energy1) {`
			`speaker = "0";`
			`} else if (energy1 > 1.1*energy0) {`
			`speaker = "1";`
			`} else {`
			`speaker = "?";`
			`}`

			`//printf("is0 = %lld, is1 = %lld, energy0 = %f, energy1 = %f, speaker = %s\n", is0, is1, energy0, energy1, speaker.c_str());`

			`if (!id_only) {`
			`speaker.insert(0, "(speaker ");`
			`speaker.append(")");`
			`}`

			`return speaker;`
			`}`
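`estimate_diarization_speaker` labels a segment by summing the absolute amplitude of each stereo channel over the segment and requiring one channel to be more than 10% louder than the other. The sketch below reproduces that heuristic under stated assumptions: segment timestamps are in 10 ms units, audio is 16 kHz, and both channels have the same length. The hypothetical `ts_to_sample` stands in for `timestamp_to_sample` from `common.h`, and its conversion is an assumption. `guess_speaker` takes the PCM by `const` reference, whereas the original takes it by value and copies both channels on every call.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Assumed conversion: t is in 10 ms units, so t * sample_rate / 100 samples,
// clamped to [0, n_samples - 1]. Hypothetical stand-in for timestamp_to_sample.
static std::int64_t ts_to_sample(std::int64_t t, std::int64_t n_samples,
                                 std::int64_t sample_rate = 16000) {
    return std::max<std::int64_t>(0, std::min<std::int64_t>(n_samples - 1, t * sample_rate / 100));
}

// Energy heuristic: "0" or "1" if that channel is >10% louder, else "?".
// Assumes pcm has two channels of equal length.
std::string guess_speaker(const std::vector<std::vector<float>> & pcm,
                          std::int64_t t0, std::int64_t t1, bool id_only = false) {
    const std::int64_t n   = (std::int64_t) pcm[0].size();
    const std::int64_t is0 = ts_to_sample(t0, n);
    const std::int64_t is1 = ts_to_sample(t1, n);

    double e0 = 0.0, e1 = 0.0;
    for (std::int64_t j = is0; j < is1; j++) {
        e0 += std::fabs(pcm[0][j]);
        e1 += std::fabs(pcm[1][j]);
    }

    std::string speaker = e0 > 1.1 * e1 ? "0" : (e1 > 1.1 * e0 ? "1" : "?");
    return id_only ? speaker : "(speaker " + speaker + ")";
}
```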
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00
			`static void whisper_print_progress_callback(struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, int progress, void * user_data) {`
whisper : move progress calculation out of whisper.cpp (#1081) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step. 2023-07-25 21:23:34 +05:30			`int progress_step = ((whisper_print_user_data *) user_data)->params->progress_step;`
			`int * progress_prev = &(((whisper_print_user_data *) user_data)->progress_prev);`
			`if (progress >= *progress_prev + progress_step) {`
			`*progress_prev += progress_step;`
			`fprintf(stderr, "%s: progress = %3d%%\n", __func__, progress);`
			`}`
			`}`
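The progress callback receives the `whisper_print_user_data` struct through `void * user_data` and reports only when progress has advanced at least `progress_step` past the last threshold. The sketch below isolates that rule; `progress_state` and `on_progress` are hypothetical, and the sketch records reported values instead of printing them.

```cpp
#include <vector>

// Hypothetical stand-in for whisper_print_user_data plus a log of reports.
struct progress_state {
    int progress_step = 5;
    int progress_prev = 0;
    std::vector<int> printed;
};

// Same throttling rule as whisper_print_progress_callback. The void* parameter
// mirrors the C callback convention: the caller's struct is cast back here.
static void on_progress(int progress, void * user_data) {
    auto * st = static_cast<progress_state *>(user_data);
    if (progress >= st->progress_prev + st->progress_step) {
        st->progress_prev += st->progress_step;  // advance by one step only
        st->printed.push_back(progress);         // the original prints to stderr here
    }
}
```

Because the threshold advances by exactly one step per report, a large jump (say, straight to 50%) is not reported once; it is reported again on each subsequent call until the threshold catches up.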
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static void whisper_print_segment_callback(struct whisper_context * ctx, struct whisper_state * /*state*/, int n_new, void * user_data) {`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`const auto & params = *((whisper_print_user_data *) user_data)->params;`
			`const auto & pcmf32s = *((whisper_print_user_data *) user_data)->pcmf32s;`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00
			`const int n_segments = whisper_full_n_segments(ctx);`

main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`std::string speaker = "";`

whisper : fix bug in prompt processing (close #705) Was dereferencing a dangling pointer 2023-04-14 19:16:34 +03:00			`int64_t t0 = 0;`
			`int64_t t1 = 0;`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`// print the last n_new segments`
			`const int s0 = n_segments - n_new;`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`if (s0 == 0) {`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00			`printf("\n");`
			`}`

whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`for (int i = s0; i < n_segments; i++) {`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`if (!params.no_timestamps || params.diarize) {`
			`t0 = whisper_full_get_segment_t0(ctx, i);`
			`t1 = whisper_full_get_segment_t1(ctx, i);`
			`}`
main : print colors + no timestamps 2022-10-22 21:09:30 +03:00
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`if (!params.no_timestamps) {`
			`printf("[%s --> %s] ", to_timestamp(t0).c_str(), to_timestamp(t1).c_str());`
			`}`
main : print colors + no timestamps 2022-10-22 21:09:30 +03:00
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`if (params.diarize && pcmf32s.size() == 2) {`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1);`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`}`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`if (params.print_colors) {`
			`for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {`
			`if (params.print_special == false) {`
			`const whisper_token id = whisper_full_get_token_id(ctx, i, j);`
			`if (id >= whisper_token_eot(ctx)) {`
			`continue;`
			`}`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`}`

main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`const char * text = whisper_full_get_token_text(ctx, i, j);`
			`const float p = whisper_full_get_token_p (ctx, i, j);`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`const int n_colors = (int) k_colors.size();`
			`int raw_col = (int) (std::pow(p, 3)*float(n_colors));`
			`if (raw_col < 0) raw_col = 0;`
			`if (raw_col > n_colors - 1) raw_col = n_colors - 1;`
			`const int col = raw_col;`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`printf("%s%s%s%s", speaker.c_str(), k_colors[col].c_str(), text, "\033[0m");`
			`}`
examples : add --print-confidence option to cli (#3150) * examples : add --print-confidence option to cli This commit adds a new command-line option `--print-confidence` to the whisper-cli. When enabled, this option prints the confidence level of each token in the transcribed text using ANSI formatting codes. The confidence levels are represented using different styles: ```console main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence) ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3135 2025-05-14 19:21:48 +02:00			`} else if (params.print_confidence) {`
			`for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {`
			`if (params.print_special == false) {`
			`const whisper_token id = whisper_full_get_token_id(ctx, i, j);`
			`if (id >= whisper_token_eot(ctx)) {`
			`continue;`
			`}`
			`}`

			`const char * text = whisper_full_get_token_text(ctx, i, j);`
			`const float p = whisper_full_get_token_p (ctx, i, j);`

			`int style_idx = 2; // High confidence - dim`
			`if (p < 0.33) {`
			`style_idx = 0; // Low confidence - inverse (highlighted)`
			`} else if (p < 0.66) {`
			`style_idx = 1; // Medium confidence - underlined`
			`}`
			`printf("%s%s%s%s", speaker.c_str(), k_styles[style_idx].c_str(), text, "\033[0m");`
			`}`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`} else {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`printf("%s%s", speaker.c_str(), text);`
			`}`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`if (params.tinydiarize) {`
			`if (whisper_full_get_segment_speaker_turn_next(ctx, i)) {`
			`printf("%s", params.tdrz_speaker_turn.c_str());`
			`}`
			`}`

main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00			`// with timestamps or speakers: each segment on new line`
			`if (!params.no_timestamps || params.diarize) {`
			`printf("\n");`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00			`}`
main : make whisper_print_segment_callback() more readable (close #371) 2023-01-05 21:45:05 +02:00
			`fflush(stdout);`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00			`}`
			`}`
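The callback above buckets each token probability `p` in two ways. With `--print-colors` it cubes `p`, scales by the palette size and clamps the result, so only high-probability tokens reach the top of the palette. With `--print-confidence` it uses fixed thresholds at 0.33 and 0.66. The following is a minimal standalone sketch of both mappings. The helper names `color_index` and `style_index` are hypothetical (not whisper.cpp functions), and the palette size is passed in rather than read from `k_colors`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring the --print-colors mapping:
// cube p, scale by the palette size, clamp to [0, n_colors - 1].
int color_index(float p, int n_colors) {
    assert(n_colors > 0);
    int raw_col = (int) (std::pow(p, 3) * float(n_colors));
    if (raw_col < 0)            raw_col = 0;
    if (raw_col > n_colors - 1) raw_col = n_colors - 1;
    return raw_col;
}

// Hypothetical helper mirroring the --print-confidence thresholds:
// 0 = low (inverse/highlighted), 1 = medium (underlined), 2 = high (dim).
int style_index(float p) {
    if (p < 0.33) return 0;
    if (p < 0.66) return 1;
    return 2;
}
```

Because of the cube, a token with `p = 0.5` lands in bucket 1 of a 10-color palette (0.125 × 10 = 1.25), close to the low-confidence end.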

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_txt(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`const int n_segments = whisper_full_n_segments(ctx);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`

			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1);`
			`}`

			`fout << speaker << text << "\n";`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`}`
			`}`

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_vtt(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`fout << "WEBVTT\n\n";`

			`const int n_segments = whisper_full_n_segments(ctx);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`

			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1, true);`
			`speaker.insert(0, "<v Speaker");`
			`speaker.append(">");`
			`}`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00
			`fout << to_timestamp(t0) << " --> " << to_timestamp(t1) << "\n";`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`fout << speaker << text << "\n\n";`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`}`
			`}`
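`output_vtt()` writes a `WEBVTT` header, then one cue per segment: a `t0 --> t1` line, the text (optionally prefixed with a WebVTT voice tag such as `<v Speaker0>` when stereo diarization is on), and a blank line. The sketch below assembles a single cue from already-formatted timestamps. `vtt_cue` is a hypothetical helper. Unlike the CLI, which always wraps the speaker id when diarization is enabled, it simply omits the tag when the id is empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: one WebVTT cue laid out like output_vtt().
// `from`/`to` are pre-formatted "HH:MM:SS.mmm" strings; `speaker_id` is the
// short id from the diarization estimate, or empty when diarization is off.
std::string vtt_cue(const std::string & from, const std::string & to,
                    const std::string & speaker_id, const std::string & text) {
    std::string speaker;
    if (!speaker_id.empty()) {
        // WebVTT voice span: <v Name>text
        speaker = "<v Speaker" + speaker_id + ">";
    }
    return from + " --> " + to + "\n" + speaker + text + "\n\n";
}
```

A full file would be `"WEBVTT\n\n"` followed by the concatenated cues.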

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_srt(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`const int n_segments = whisper_full_n_segments(ctx);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
ref #68, #79 : fix segment time output 2022-10-23 13:29:36 +03:00			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`

			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1);`
			`}`
ref #68, #79 : fix segment time output 2022-10-23 13:29:36 +03:00
			`fout << i + 1 + params.offset_n << "\n";`
main : fix SRT timestamp to use comma "," instead of dot "." 2022-10-24 18:28:23 +03:00			`fout << to_timestamp(t0, true) << " --> " << to_timestamp(t1, true) << "\n";`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`fout << speaker << text << "\n\n";`
main : refactor subtitle output 2022-10-22 20:42:11 +03:00			`}`
			`}`
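The SRT and VTT writers differ mainly in the millisecond separator: `to_timestamp(t, true)` is used for SRT (comma), plain `to_timestamp(t)` for VTT (dot). Segment times from `whisper_full_get_segment_t0/t1` are in 10 ms units, as the CSV writer's "multiply by 10" comment notes. The sketch below is an illustrative stand-in, not the `to_timestamp()` this file calls (which is defined outside this excerpt). It assumes an `HH:MM:SS,mmm` / `HH:MM:SS.mmm` layout and adds a hypothetical `srt_cue` helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative stand-in for to_timestamp(); `t` is in 10 ms units.
// comma == true -> "HH:MM:SS,mmm" (SRT); false -> "HH:MM:SS.mmm" (WebVTT).
std::string format_ts(int64_t t, bool comma) {
    assert(t >= 0); // negative times are not expected here
    int64_t msec = t * 10;
    const int64_t hr = msec / 3600000;
    msec -= hr * 3600000;
    const int64_t min = msec / 60000;
    msec -= min * 60000;
    const int64_t sec = msec / 1000;
    msec -= sec * 1000;

    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02d:%02d:%02d%s%03d",
                  (int) hr, (int) min, (int) sec, comma ? "," : ".", (int) msec);
    return buf;
}

// Hypothetical: one SRT cue as output_srt() lays it out. `index` plays the
// role of i + 1 + params.offset_n; a speaker prefix would go before `text`.
std::string srt_cue(int index, int64_t t0, int64_t t1, const std::string & text) {
    return std::to_string(index) + "\n" +
           format_ts(t0, true) + " --> " + format_ts(t1, true) + "\n" +
           text + "\n\n";
}
```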

whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static char * escape_double_quotes_and_backslashes(const char * str) {`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`if (str == NULL) {`
			`return NULL;`
			`}`

			`size_t escaped_length = strlen(str) + 1;`

			`for (size_t i = 0; str[i] != '\0'; i++) {`
main : update escape_double_quotes() function (#776) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769 2023-04-23 08:47:30 -05:00			`if (str[i] == '"' || str[i] == '\\') {`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`escaped_length++;`
			`}`
			`}`

whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`char * escaped = (char *)calloc(escaped_length, 1); // pre-zeroed`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`if (escaped == NULL) {`
			`return NULL;`
			`}`

			`size_t pos = 0;`
			`for (size_t i = 0; str[i] != '\0'; i++) {`
main : update escape_double_quotes() function (#776) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769 2023-04-23 08:47:30 -05:00			`if (str[i] == '"' || str[i] == '\\') {`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`escaped[pos++] = '\\';`
			`}`
main : update escape_double_quotes() function (#776) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769 2023-04-23 08:47:30 -05:00			`escaped[pos++] = str[i];`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`}`

			`// no need to set zero due to calloc() being used prior`

			`return escaped;`
			`}`
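`escape_double_quotes_and_backslashes()` does two passes: one to size the output, one to copy with a backslash before each `"` or `\`. Callers must `free()` the result, as `value_s` does. A `std::string` version of the same rule is sketched below. The name `escape_quotes_and_backslashes` is hypothetical here. Like the C function, it does not escape control characters such as `\n`, so it is not a complete JSON string escaper.

```cpp
#include <cassert>
#include <string>

// Hypothetical std::string counterpart: prefix every '"' and '\\' with a
// backslash. Control characters are passed through unchanged.
std::string escape_quotes_and_backslashes(const std::string & s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        if (c == '"' || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

Returning `std::string` removes the manual `free()` and the `NULL` checks for allocation failure.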

main : fix double quote escaping in csv output (#2090) 2024-05-13 16:55:32 +08:00			`// double quote should be escaped by another double quote. (rfc4180)`
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static char * escape_double_quotes_in_csv(const char * str) {`
main : fix double quote escaping in csv output (#2090) 2024-05-13 16:55:32 +08:00			`if (str == NULL) {`
			`return NULL;`
			`}`

			`size_t escaped_length = strlen(str) + 1;`

			`for (size_t i = 0; str[i] != '\0'; i++) {`
			`if (str[i] == '"') {`
			`escaped_length++;`
			`}`
			`}`

			`char * escaped = (char *)calloc(escaped_length, 1); // pre-zeroed`
			`if (escaped == NULL) {`
			`return NULL;`
			`}`

			`size_t pos = 0;`
			`for (size_t i = 0; str[i] != '\0'; i++) {`
			`if (str[i] == '"') {`
			`escaped[pos++] = '"';`
			`}`
			`escaped[pos++] = str[i];`
			`}`

			`// no need to set zero due to calloc() being used prior`

			`return escaped;`
			`}`
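CSV (RFC 4180) escapes differently from JSON. A double quote inside a quoted field is written as two double quotes, and backslashes carry no special meaning. The sketch below applies that rule and also adds the surrounding quotes, which `output_csv()` writes separately. `csv_quote` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap a field in double quotes and double any embedded '"'.
// Commas and newlines inside the field are then safe per RFC 4180.
std::string csv_quote(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') {
            out += '"';
        }
        out += c;
    }
    out += '"';
    return out;
}
```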

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_csv(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {`
main : escape quotes in csv output (#815) 2023-04-23 18:01:59 +02:00			`const int n_segments = whisper_full_n_segments(ctx);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`fout << "start,end,";`
			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`fout << "speaker,";`
			`}`
			`fout << "text\n";`

main : escape quotes in csv output (#815) 2023-04-23 18:01:59 +02:00			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
main : fix double quote escaping in csv output (#2090) 2024-05-13 16:55:32 +08:00			`char * text_escaped = escape_double_quotes_in_csv(text);`
main : escape quotes in csv output (#815) 2023-04-23 18:01:59 +02:00
			`//need to multiply times returned from whisper_full_get_segment_t{0,1}() by 10 to get milliseconds.`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`fout << 10 * t0 << "," << 10 * t1 << ",";`
			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`fout << estimate_diarization_speaker(pcmf32s, t0, t1, true) << ",";`
			`}`
			`fout << "\"" << text_escaped << "\"\n";`
			`free(text_escaped);`
main : escape quotes in csv output (#815) 2023-04-23 18:01:59 +02:00			`}`
			`}`
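`output_csv()` writes a header whose `speaker` column only appears for stereo diarization. Each row contains millisecond times (segment times × 10) and the quoted, already-escaped text. The following is a minimal sketch of that layout. `csv_header` and `csv_row` are hypothetical helpers, and `text_escaped` is assumed to have had its quotes doubled already.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical: header row, with the optional speaker column.
std::string csv_header(bool with_speaker) {
    return std::string("start,end,") + (with_speaker ? "speaker," : "") + "text\n";
}

// Hypothetical: one data row. t0/t1 are whisper segment times in 10 ms units,
// so multiplying by 10 yields milliseconds.
std::string csv_row(int64_t t0, int64_t t1, const std::string & speaker,
                    bool with_speaker, const std::string & text_escaped) {
    std::string row = std::to_string(10 * t0) + "," + std::to_string(10 * t1) + ",";
    if (with_speaker) {
        row += speaker + ",";
    }
    row += "\"" + text_escaped + "\"\n";
    return row;
}
```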

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_score(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & /*params*/, std::vector<std::vector<float>> /*pcmf32s*/) {`
main : log probs to text file (#1205) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-08-27 18:09:06 +02:00			`const int n_segments = whisper_full_n_segments(ctx);`
			`// fprintf(stderr,"segments: %d\n",n_segments);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const int n_tokens = whisper_full_n_tokens(ctx, i);`
			`// fprintf(stderr,"tokens: %d\n",n_tokens);`
			`for (int j = 0; j < n_tokens; j++) {`
			`auto token = whisper_full_get_token_text(ctx, i, j);`
			`auto probability = whisper_full_get_token_p(ctx, i, j);`
			`fout << token << '\t' << probability << std::endl;`
			`// fprintf(stderr,"token: %s %f\n",token,probability);`
			`}`
			`}`
			`}`
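`output_score()` emits one line per token in the form `token<TAB>probability`. A downstream tool could read that file back as sketched below. `parse_score_lines` is hypothetical. It splits on the last tab so a token containing a tab still parses, but a token whose text contains a newline would still break this line-based format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader for the "token\tprobability" lines written by output_score().
std::vector<std::pair<std::string, float>> parse_score_lines(std::istream & in) {
    std::vector<std::pair<std::string, float>> rows;
    std::string line;
    while (std::getline(in, line)) {
        const auto tab = line.rfind('\t');
        if (tab == std::string::npos) {
            continue; // skip malformed lines
        }
        rows.emplace_back(line.substr(0, tab), std::stof(line.substr(tab + 1)));
    }
    return rows;
}
```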

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_json(`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`struct whisper_context * ctx,`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`std::ofstream & fout,`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`const whisper_params & params,`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`std::vector<std::vector<float>> pcmf32s) {`
			`const bool full = params.output_jsn_full;`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`int indent = 0;`

			`auto doindent = [&]() {`
			`for (int i = 0; i < indent; i++) fout << "\t";`
			`};`

			`auto start_arr = [&](const char *name) {`
			`doindent();`
			`fout << "\"" << name << "\": [\n";`
			`indent++;`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto end_arr = [&](bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`indent--;`
			`doindent();`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`fout << (end ? "]\n" : "],\n");`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto start_obj = [&](const char *name) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`doindent();`
			`if (name) {`
			`fout << "\"" << name << "\": {\n";`
			`} else {`
			`fout << "{\n";`
			`}`
			`indent++;`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto end_obj = [&](bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`indent--;`
			`doindent();`
			`fout << (end ? "}\n" : "},\n");`
			`};`

			`auto start_value = [&](const char *name) {`
			`doindent();`
			`fout << "\"" << name << "\": ";`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto value_s = [&](const char *name, const char *val, bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_value(name);`
main : update escape_double_quotes() function (#776) Updated the escape_double_quotes() function such that the function now escapes both double quotes and backslashes in the input string. Changes Made: - Renamed the function to escape_quotes_and_backslashes - Modified the condition in the first loop to increment the value of 'escaped_length' for both double quotes and backslashes. - Modified the condition in second loop to add a backslash before the current character if it is a double quote or a backslash. Resolves: #769 2023-04-23 08:47:30 -05:00			`char * val_escaped = escape_double_quotes_and_backslashes(val);`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`fout << "\"" << val_escaped << (end ? "\"\n" : "\",\n");`
			`free(val_escaped);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto end_value = [&](bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`fout << (end ? "\n" : ",\n");`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto value_i = [&](const char *name, const int64_t val, bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_value(name);`
			`fout << val;`
			`end_value(end);`
			`};`

examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`auto value_f = [&](const char *name, const float val, bool end) {`
			`start_value(name);`
			`fout << val;`
			`end_value(end);`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`auto value_b = [&](const char *name, const bool val, bool end) {`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_value(name);`
			`fout << (val ? "true" : "false");`
			`end_value(end);`
			`};`

examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`auto times_o = [&](int64_t t0, int64_t t1, bool end) {`
			`start_obj("timestamps");`
			`value_s("from", to_timestamp(t0, true).c_str(), false);`
			`value_s("to", to_timestamp(t1, true).c_str(), true);`
			`end_obj(false);`
			`start_obj("offsets");`
			`value_i("from", t0 * 10, false);`
			`value_i("to", t1 * 10, true);`
			`end_obj(end);`
			`};`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`start_obj(nullptr);`
			`value_s("systeminfo", whisper_print_system_info(), false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_obj("model");`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`value_s("type", whisper_model_type_readable(ctx), false);`
			`value_b("multilingual", whisper_is_multilingual(ctx), false);`
			`value_i("vocab", whisper_model_n_vocab(ctx), false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_obj("audio");`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`value_i("ctx", whisper_model_n_audio_ctx(ctx), false);`
			`value_i("state", whisper_model_n_audio_state(ctx), false);`
			`value_i("head", whisper_model_n_audio_head(ctx), false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`value_i("layer", whisper_model_n_audio_layer(ctx), true);`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`end_obj(false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_obj("text");`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`value_i("ctx", whisper_model_n_text_ctx(ctx), false);`
			`value_i("state", whisper_model_n_text_state(ctx), false);`
			`value_i("head", whisper_model_n_text_head(ctx), false);`
main : fix typo in JSON output (#648) * typo in JSON output * fix double quotes in JSON output 2023-03-29 23:26:39 +03:00			`value_i("layer", whisper_model_n_text_layer(ctx), true);`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`end_obj(false);`
			`value_i("mels", whisper_model_n_mels(ctx), false);`
whisper : add integer quantization support (#540) * whisper : add integer quantization support * examples : add common-ggml + prepare to add "quantize" tool * whisper : quantization tool ready * whisper : fix F32 support * whisper : try to fix shared lib linkage * wasm : update quantized models to Q5 * bench.wasm : remove "medium" button * bench.wasm : fix custom model button * ggml : add Q5_0 and Q5_1 WASM SIMD * wasm : add quantized models to all WASM examples * wasm : bump DB version number to 2 * talk-llama : update example to latest llama.cpp * node : increase test timeout to 10s * readme : add information for model quantization * wasm : add links to other examples 2023-04-30 18:51:57 +03:00			`value_i("ftype", whisper_model_ftype(ctx), true);`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`end_obj(false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_obj("params");`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`value_s("model", params.model.c_str(), false);`
			`value_s("language", params.language.c_str(), false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`value_b("translate", params.translate, true);`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`end_obj(false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_obj("result");`
			`value_s("language", whisper_lang_str(whisper_full_lang_id(ctx)), true);`
ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`end_obj(false);`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`start_arr("transcription");`

			`const int n_segments = whisper_full_n_segments(ctx);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`

ggml : sync latest ggml lib 2023-06-25 14:22:21 +03:00			`start_obj(nullptr);`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`times_o(t0, t1, false);`
			`value_s("text", text, !params.diarize && !params.tinydiarize && !full);`

			`if (full) {`
			`start_arr("tokens");`
			`const int n = whisper_full_n_tokens(ctx, i);`
			`for (int j = 0; j < n; ++j) {`
			`auto token = whisper_full_get_token_data(ctx, i, j);`
			`start_obj(nullptr);`
			`value_s("text", whisper_token_to_str(ctx, token.id), false);`
			`if(token.t0 > -1 && token.t1 > -1) {`
			`// If we have per-token timestamps, write them out`
			`times_o(token.t0, token.t1, false);`
			`}`
			`value_i("id", token.id, false);`
whisper : token-level timestamps with DTW (#1485) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-20 13:25:26 -03:00			`value_f("p", token.p, false);`
			`value_f("t_dtw", token.t_dtw, true);`
examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`end_obj(j == (n - 1));`
			`}`
			`end_arr(!params.diarize && !params.tinydiarize);`
			`}`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00
			`if (params.diarize && pcmf32s.size() == 2) {`
			`value_s("speaker", estimate_diarization_speaker(pcmf32s, t0, t1, true).c_str(), true);`
			`}`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00
			`if (params.tinydiarize) {`
			`value_b("speaker_turn_next", whisper_full_get_segment_speaker_turn_next(ctx, i), true);`
			`}`
main : provide option for creating JSON output (#615) * examples : provide option for exporting also as JSON file (ggerganov/whisper.cpp#614) * main : remove leftovers --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 20:37:36 +01:00			`end_obj(i == (n_segments - 1));`
			`}`

			`end_arr(true);`
			`end_obj(true);`
			`}`
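In `output_json`, the writer helpers (`start_obj`, `end_obj`, `start_arr`, `end_arr`, `value_s`, `value_b`, `value_i`, `value_f`) all take a trailing `bool` that appears to mean "this is the last member of the enclosing object or array". A comma is written only when it is false. This is why `value_s("text", ...)` passes `!params.diarize && !params.tinydiarize && !full`: the text is the last member only when no `tokens` array or speaker field follows it. The helpers are defined earlier in the file and are not shown here. The sketch below is therefore a hypothetical stand-alone writer (`MiniJson`, `segments_to_json`) that follows the same convention. Its string escaping belongs to the sketch and is not taken from the listing.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal writer mirroring the "end" flag convention: each
// value/close call is told whether it is the last member of its parent and
// emits a trailing comma only when it is not.
class MiniJson {
public:
    explicit MiniJson(std::ostream & out) : out_(out) {}

    void start_obj(const char * name) { key(name); out_ << "{"; }
    void end_obj(bool end)            { out_ << (end ? "}" : "},"); }
    void start_arr(const char * name) { key(name); out_ << "["; }
    void end_arr(bool end)            { out_ << (end ? "]" : "],"); }

    void value_s(const char * name, const std::string & val, bool end) {
        key(name);
        out_ << '"' << escape(val) << '"' << (end ? "" : ",");
    }
    void value_i(const char * name, long long val, bool end) {
        key(name);
        out_ << val << (end ? "" : ",");
    }

private:
    // Array elements are opened with a null name, so no key is written.
    void key(const char * name) {
        if (name != nullptr) {
            out_ << '"' << name << "\":";
        }
    }
    static std::string escape(const std::string & s) {
        std::string r;
        for (char c : s) {
            switch (c) {
                case '"':  r += "\\\""; break;
                case '\\': r += "\\\\"; break;
                case '\n': r += "\\n";  break;
                default:   r += c;
            }
        }
        return r;
    }
    std::ostream & out_;
};

// Writes segments like the loop in the listing: only the last segment
// object closes without a trailing comma.
std::string segments_to_json(const std::vector<std::string> & texts) {
    std::ostringstream ss;
    MiniJson j(ss);
    j.start_obj(nullptr);
    j.start_arr("transcription");
    for (size_t i = 0; i < texts.size(); ++i) {
        j.start_obj(nullptr);
        j.value_i("index", (long long) i, false);
        j.value_s("text", texts[i], true);
        j.end_obj(i + 1 == texts.size());
    }
    j.end_arr(true);
    j.end_obj(true);
    return ss.str();
}
```

Passing the flag explicitly keeps the writer stateless. The trade-off is that every call site must know whether another member follows, which is exactly the bookkeeping visible in the conditions above.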

whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`// karaoke video generation`
			`// outputs a bash script that uses ffmpeg to generate a video with the subtitles`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`// TODO: font parameter adjustments`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static bool output_wts(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s, const char * fname_inp, float t_sec, const char * fname_out) {`
qual-bench.sh : add quality comparison tool, and update main.cpp to allow using a font file (#569) 2023-03-06 09:18:11 -08:00			`static const char * font = params.font_path.c_str();`

			`std::ifstream fin(font);`
			`if (!fin.is_open()) {`
			`fprintf(stderr, "%s: font not found at '%s', please specify a monospace font with -fp\n", __func__, font);`
			`return false;`
			`}`
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00
main : fix generated bash script 2022-11-04 18:30:38 +02:00			`fout << "#!/bin/bash" << "\n";`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`fout << "\n";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`fout << "ffmpeg -i " << fname_inp << " -f lavfi -i color=size=1200x120:duration=" << t_sec << ":rate=25:color=black -vf \"";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`for (int i = 0; i < whisper_full_n_segments(ctx); i++) {`
			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`const int n = whisper_full_n_tokens(ctx, i);`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`std::vector<whisper_token_data> tokens(n);`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`for (int j = 0; j < n; ++j) {`
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`tokens[j] = whisper_full_get_token_data(ctx, i, j);`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`if (i > 0) {`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`fout << ",";`
			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`// background text`
			`fout << "drawtext=fontfile='" << font << "':fontsize=24:fontcolor=gray:x=(w-text_w)/2:y=h/2:text='':enable='between(t," << t0/100.0 << "," << t0/100.0 << ")'";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`bool is_first = true;`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`

			`if (params.diarize && pcmf32s.size() == 2) {`
			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1);`
			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`for (int j = 0; j < n; ++j) {`
			`const auto & token = tokens[j];`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`if (tokens[j].id >= whisper_token_eot(ctx)) {`
			`continue;`
			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string txt_bg = "";`
			`std::string txt_fg = ""; // highlight token`
			`std::string txt_ul = ""; // underline`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`if (params.diarize && pcmf32s.size() == 2) {`
			`txt_bg = speaker;`
			`txt_fg = speaker;`
			`txt_ul = "\\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ ";`
			`}`

			`txt_bg.append("> ");`
			`txt_fg.append("> ");`
			`txt_ul.append("\\ \\ ");`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`{`
			`for (int k = 0; k < n; ++k) {`
			`const auto & token2 = tokens[k];`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`if (tokens[k].id >= whisper_token_eot(ctx)) {`
			`continue;`
			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`const std::string txt = whisper_token_to_str(ctx, token2.id);`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`txt_bg += txt;`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`if (k == j) {`
			`for (int l = 0; l < (int) txt.size(); ++l) {`
			`txt_fg += txt[l];`
			`txt_ul += "_";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`}`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`txt_fg += "|";`
			`} else {`
			`for (int l = 0; l < (int) txt.size(); ++l) {`
			`txt_fg += "\\ ";`
			`txt_ul += "\\ ";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`}`
			`}`
			`}`

unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00			`::replace_all(txt_bg, "'", "\u2019");`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`::replace_all(txt_bg, "\"", "\\\"");`
unicode : fix character replacement (thanks to @tamo) 2022-11-23 08:24:29 +02:00			`::replace_all(txt_fg, "'", "\u2019");`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`::replace_all(txt_fg, "\"", "\\\"");`
			`}`

whisper : token-level timestamp refactoring (#49, #120) This turned out pretty good overall. The algorithm has been moved from main.cpp to whisper.cpp and can be reused for all subtitles types. This means that now you can specify the maximum length of the generated lines. Simply provide the "-ml" argument specifying the max length in number of characters 2022-11-02 21:18:20 +02:00			`if (is_first) {`
			`// background text`
			`fout << ",drawtext=fontfile='" << font << "':fontsize=24:fontcolor=gray:x=(w-text_w)/2:y=h/2:text='" << txt_bg << "':enable='between(t," << t0/100.0 << "," << t1/100.0 << ")'";`
			`is_first = false;`
			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`// foreground text`
			`fout << ",drawtext=fontfile='" << font << "':fontsize=24:fontcolor=lightgreen:x=(w-text_w)/2+8:y=h/2:text='" << txt_fg << "':enable='between(t," << token.t0/100.0 << "," << token.t1/100.0 << ")'";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`// underline`
			`fout << ",drawtext=fontfile='" << font << "':fontsize=24:fontcolor=lightgreen:x=(w-text_w)/2+8:y=h/2+16:text='" << txt_ul << "':enable='between(t," << token.t0/100.0 << "," << token.t1/100.0 << ")'";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`}`
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`fout << "\" -c:v libx264 -pix_fmt yuv420p -y " << fname_inp << ".mp4" << "\n";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`fout << "\n\n";`
			`fout << "echo \"Your video has been saved to " << fname_inp << ".mp4\"" << "\n";`
			`fout << "\n";`
			`fout << "echo \" ffplay " << fname_inp << ".mp4\"\n";`
			`fout << "\n";`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
main : add some comments for the word-level timestamp algorithm 2022-11-01 22:35:21 +02:00			`fout.close();`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`fprintf(stderr, "# %s: run 'source %s' to generate karaoke video\n", __func__, fname_out);`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00
			`return true;`
			`}`
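For each token of a segment, `output_wts` emits three `drawtext` filters that are drawn on top of each other:

- the whole segment in gray;
- a light-green copy in which every other token is blanked out with backslash-space padding, so only the current token shows;
- an underline row of `_` characters under that token.

Each layer is enabled with `between(t, a, b)`, where `a` and `b` are the segment or token timestamps divided by 100, i.e. converted from 10 ms units to seconds. The layers only line up with a monospace font, which is why the error message asks for one via `-fp`. Below is a minimal sketch of the string construction for one highlighted token. It assumes the token texts have already been decoded. `build_karaoke_layers`, `KaraokeLayers` and `replace_all_sketch` are hypothetical names, and the diarization speaker prefix is omitted.

```cpp
#include <string>
#include <vector>

// Stand-in for the ::replace_all helper used in the listing. It resumes the
// search after the inserted text, so a replacement that contains the search
// string (e.g. " -> \") cannot loop forever.
static void replace_all_sketch(std::string & s, const std::string & search, const std::string & repl) {
    if (search.empty()) {
        return;
    }
    size_t pos = 0;
    while ((pos = s.find(search, pos)) != std::string::npos) {
        s.replace(pos, search.size(), repl);
        pos += repl.size();
    }
}

// The three overlaid text layers drawn while token j is active.
struct KaraokeLayers {
    std::string bg; // whole segment, drawn in gray
    std::string fg; // only token j visible, other tokens padded out
    std::string ul; // underscores under token j
};

// Hypothetical helper: `tokens` holds the text of each non-special token in
// one segment (the listing skips ids >= whisper_token_eot(ctx)).
KaraokeLayers build_karaoke_layers(const std::vector<std::string> & tokens, size_t j) {
    // Same prefixes as the listing; "\\ " is a backslash-space padding cell.
    KaraokeLayers r{"> ", "> ", "\\ \\ "};
    for (size_t k = 0; k < tokens.size(); ++k) {
        const std::string & txt = tokens[k];
        r.bg += txt;
        if (k == j) {
            r.fg += txt;
            r.ul += std::string(txt.size(), '_');
            r.fg += "|";
        } else {
            // One padding cell per byte, as in the listing.
            for (size_t l = 0; l < txt.size(); ++l) {
                r.fg += "\\ ";
                r.ul += "\\ ";
            }
        }
    }
    // Same sanitising as the listing: typographic apostrophe, escaped quotes.
    replace_all_sketch(r.bg, "'", "\u2019");
    replace_all_sketch(r.bg, "\"", "\\\"");
    replace_all_sketch(r.fg, "'", "\u2019");
    replace_all_sketch(r.fg, "\"", "\\\"");
    return r;
}
```

Padding is counted in bytes (`txt.size()`), as in the listing. A multi-byte UTF-8 token is therefore blanked with more cells than it occupies on screen. The sketch keeps that behaviour rather than guessing at a glyph count.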

cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`static void output_lrc(struct whisper_context * ctx, std::ofstream & fout, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {`
main : add lrc output support (#718) * add lrc output support. * fix wrong comment 2023-04-15 00:35:33 +08:00			`fout << "[by:whisper.cpp]\n";`

			`const int n_segments = whisper_full_n_segments(ctx);`
			`for (int i = 0; i < n_segments; ++i) {`
			`const char * text = whisper_full_get_segment_text(ctx, i);`
			`const int64_t t = whisper_full_get_segment_t0(ctx, i);`

			`int64_t msec = t * 10;`
			`int64_t min = msec / (1000 * 60);`
			`msec = msec - min * (1000 * 60);`
			`int64_t sec = msec / 1000;`
			`msec = msec - sec * 1000;`

			`char buf[16];`
			`snprintf(buf, sizeof(buf), "%02d:%02d.%02d", (int) min, (int) sec, (int) ( msec / 10));`
			`std::string timestamp_lrc = std::string(buf);`
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`std::string speaker = "";`

			`if (params.diarize && pcmf32s.size() == 2)`
			`{`
			`const int64_t t0 = whisper_full_get_segment_t0(ctx, i);`
			`const int64_t t1 = whisper_full_get_segment_t1(ctx, i);`
			`speaker = estimate_diarization_speaker(pcmf32s, t0, t1);`
			`}`
main : add lrc output support (#718) * add lrc output support. * fix wrong comment 2023-04-15 00:35:33 +08:00
main : add diarization support for all current output types (#1031) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-06-25 07:07:57 -05:00			`fout << '[' << timestamp_lrc << ']' << speaker << text << "\n";`
main : add lrc output support (#718) * add lrc output support. * fix wrong comment 2023-04-15 00:35:33 +08:00			`}`
			`}`
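`output_lrc` converts each segment start time into an LRC tag of the form `[mm:ss.xx]`. Segment timestamps are in 10 ms units (the listing multiplies by 10 to get milliseconds), and the fractional part of an LRC tag is in hundredths of a second. The millisecond detour therefore cancels out, and the tag can be derived directly with integer arithmetic. A minimal sketch; `lrc_timestamp` and `lrc_line` are hypothetical names:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// t_cs is a whisper segment timestamp in 10 ms units (centiseconds).
// Pure integer division, so no rounding takes place.
std::string lrc_timestamp(int64_t t_cs) {
    const int64_t min = t_cs / 6000;        // 6000 cs per minute
    const int64_t sec = (t_cs / 100) % 60;  // whole seconds within the minute
    const int64_t cs  = t_cs % 100;         // hundredths of a second
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02d:%02d.%02d", (int) min, (int) sec, (int) cs);
    return std::string(buf);
}

// Hypothetical helper: one LRC line laid out as output_lrc writes it.
std::string lrc_line(int64_t t0_cs, const std::string & speaker, const std::string & text) {
    return "[" + lrc_timestamp(t0_cs) + "]" + speaker + text;
}
```

As in the listing, minutes are not folded into hours. A segment that starts after 99:59.99 simply prints a three-digit minute field.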

main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00
whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci] 2024-06-26 19:34:09 +03:00			`static void cb_log_disable(enum ggml_log_level , const char * , void * ) { }`
main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00
Initial release 2022-09-25 21:23:15 +03:00			`int main(int argc, char ** argv) {`
whisper : remove whisper_load_backends function (#3196) * whisper : remove whisper_load_backends function This commit removes the `whisper_load_backends` function, which was used to load all GGML backends. The motivation for this change push the responsibility of loading backends to user applications to give them more control over which backends to load and when. See the references below for more context. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3182 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801778733 Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801928990 * ruby : add check for rwc is NULL This commit adds a check to ensure that the `rwc` pointer is not NULL before attempting to mark its members in the garbage collector. The motivation for this is an attempt to see if this fixed the CI build as I'm not able to reproduce the issue locally. Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15299612277/job/43036694928?pr=3196 2025-05-29 08:03:17 +02:00			`ggml_backend_load_all();`

Fixes for Windows (#2790) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com> 2025-02-06 15:37:21 +08:00			`#if defined(_WIN32)`
			`// Set the console output code page to UTF-8, while command line arguments`
			`// are still encoded in the system's code page. In this way, we can print`
			`// non-ASCII characters to the console, and access files with non-ASCII paths.`
			`SetConsoleOutputCP(CP_UTF8);`
			`#endif`

Initial release 2022-09-25 21:23:15 +03:00			`whisper_params params;`

main : allow a response-file as the sole parameter (#2019) * The "main" example now allows a response-file as the sole parameter. A response-file is a text file with command-line parameters, one per line. Prefix the name of the response-file with "@" to identify it as such. It's used under MS Windows to work around command-line length limits. It may be useful under other platforms to simplify character-escaping. * minor : style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-04-09 08:31:16 -07:00			`// If the only argument starts with "@", read arguments line-by-line`
			`// from the given file.`
			`std::vector<std::string> vec_args;`
			`if (argc == 2 && argv != nullptr && argv[1] != nullptr && argv[1][0] == '@') {`
			`// Save the name of the executable.`
			`vec_args.push_back(argv[0]);`

			`// Open the response file.`
			`char const * rspfile = argv[1] + sizeof(char);`
			`std::ifstream fin(rspfile);`
			`if (fin.is_open() == false) {`
			`fprintf(stderr, "error: response file '%s' not found\n", rspfile);`
			`return 1;`
			`}`

			`// Read the entire response file.`
			`std::string line;`
			`while (std::getline(fin, line)) {`
			`vec_args.push_back(line);`
			`}`

			`// Use the contents of the response file as the command-line arguments.`
			`argc = static_cast<int>(vec_args.size());`
			`argv = static_cast<char **>(alloca(argc * sizeof (char *)));`
			`for (int i = 0; i < argc; ++i) {`
			`argv[i] = const_cast<char *>(vec_args[i].c_str());`
			`}`
			`}`
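The response-file path treats a lone argument that starts with `@` as a file holding one argument per line, then rebuilds `argc`/`argv` from its contents. The sketch below implements the same idea but keeps the pointer array in a `std::vector` instead of using `alloca`. `read_response_file` and `make_argv` are hypothetical names. Stripping a trailing `\r` (for files with Windows line endings read on POSIX systems) is an addition not present in the listing.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one argument per line from `path`, with `exe`
// as argv[0]. Returns false if the file cannot be opened.
bool read_response_file(const char * exe, const std::string & path, std::vector<std::string> & args) {
    std::ifstream fin(path);
    if (!fin.is_open()) {
        return false;
    }
    args.clear();
    args.push_back(exe);
    std::string line;
    while (std::getline(fin, line)) {
        // Addition not in the listing: drop a CR left by CRLF line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        args.push_back(line);
    }
    return true;
}

// Builds a mutable argv array over `args`. The pointers stay valid only while
// `args` is alive and unmodified. The trailing nullptr mirrors argv[argc].
std::vector<char *> make_argv(std::vector<std::string> & args) {
    std::vector<char *> argv;
    argv.reserve(args.size() + 1);
    for (std::string & a : args) {
        argv.push_back(&a[0]);
    }
    argv.push_back(nullptr);
    return argv;
}
```

In use, call `read_response_file(argv[0], argv[1] + 1, args)` when `argc == 2` and `argv[1][0] == '@'`. Store `make_argv(args)` in a local vector, then pass its `data()` together with `static_cast<int>(args.size())` to the parser. The strings must outlive that array, the same lifetime constraint the listing has with `vec_args`.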

Initial release 2022-09-25 21:23:15 +03:00			`if (whisper_params_parse(argc, argv, params) == false) {`
main : gracefully exit when invalid params are passed (#1002) * Refactor whisper_params_parse to return false on failure * Updated help flag behavior 2023-06-25 18:51:59 +08:00			`whisper_print_usage(argc, argv, params);`
Initial release 2022-09-25 21:23:15 +03:00			`return 1;`
			`}`

main : check if input files exist before proceeding (#1872) Until the most recent commit (3d42463), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not. 2024-02-19 05:51:26 -03:00			`// remove non-existent files`
			`for (auto it = params.fname_inp.begin(); it != params.fname_inp.end();) {`
			`const auto fname_inp = it->c_str();`

main : fix file existence check in main.cpp (#1889) In commit dda4b0e of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused. 2024-02-22 10:01:08 -03:00			`if (*it != "-" && !is_file_exist(fname_inp)) {`
main : check if input files exist before proceeding (#1872) Until the most recent commit (3d42463), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not. 2024-02-19 05:51:26 -03:00			`fprintf(stderr, "error: input file not found '%s'\n", fname_inp);`
			`it = params.fname_inp.erase(it);`
			`continue;`
			`}`

			`it++;`
			`}`

ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`if (params.fname_inp.empty()) {`
			`fprintf(stderr, "error: no input files specified\n");`
			`whisper_print_usage(argc, argv, params);`
ref #17 : add options to output result to file Support for: - plain text - VTT - SRT 2022-10-08 17:22:22 +03:00			`return 2;`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`}`

whisper : language auto-detect (#59) 2022-12-17 17:58:08 +02:00			`if (params.language != "auto" && whisper_lang_id(params.language.c_str()) == -1) {`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`fprintf(stderr, "error: unknown language '%s'\n", params.language.c_str());`
			`whisper_print_usage(argc, argv, params);`
			`exit(0);`
			`}`

whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`if (params.diarize && params.tinydiarize) {`
			`fprintf(stderr, "error: cannot use both --diarize and --tinydiarize\n");`
			`whisper_print_usage(argc, argv, params);`
			`exit(0);`
			`}`

main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00			`if (params.no_prints) {`
			`whisper_log_set(cb_log_disable, NULL);`
			`}`

Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`// whisper init`
examples : initialize context params properly (#1852) 2024-02-11 16:39:12 +02:00			`struct whisper_context_params cparams = whisper_context_default_params();`
whisper : use flash attention (#2152) * whisper : use flash attention in the encoder * whisper : add kv_pad * whisper : remove extra backend instance (huh?) * whisper : use FA for cross-attention * whisper : use FA for self-attention * whisper : simplify encoder FA * whisper : add flash_attn runtime parameter * scripts : add bench log * scripts : add M1 Pro bench log 2024-05-15 09:38:19 +03:00
			`cparams.use_gpu = params.use_gpu;`
examples : use -dev/--device and WHISPER_ARG_DEVICE (#3557) Align device selection naming with llama.cpp. 2026-01-21 04:40:30 -03:00			`cparams.gpu_device = params.gpu_device;`
whisper : use flash attention (#2152) * whisper : use flash attention in the encoder * whisper : add kv_pad * whisper : remove extra backend instance (huh?) * whisper : use FA for cross-attention * whisper : use FA for self-attention * whisper : simplify encoder FA * whisper : add flash_attn runtime parameter * scripts : add bench log * scripts : add M1 Pro bench log 2024-05-15 09:38:19 +03:00			`cparams.flash_attn = params.flash_attn;`
whisper : add context param to disable gpu (#1293) * whisper : check state->ctx_metal not null * whisper : add whisper_context_params { use_gpu } * whisper : new API with params & deprecate old API * examples : use no-gpu param && whisper_init_from_file_with_params * whisper.objc : enable metal & disable on simulator * whisper.swiftui, metal : enable metal & support load default.metallib * whisper.android : use new API * bindings : use new API * addon.node : fix build & test * bindings : updata java binding * bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java * metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load * metal : move bundle var into block * metal : use SWIFT_PACKAGE instead of GGML_SWIFT * style : minor updates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-11-06 17:04:24 +08:00
whisper : token-level timestamps with DTW (#1485) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-20 13:25:26 -03:00			`if (!params.dtw.empty()) {`
			`cparams.dtw_token_timestamps = true;`
			`cparams.dtw_aheads_preset = WHISPER_AHEADS_NONE;`

			`if (params.dtw == "tiny") cparams.dtw_aheads_preset = WHISPER_AHEADS_TINY;`
			`if (params.dtw == "tiny.en") cparams.dtw_aheads_preset = WHISPER_AHEADS_TINY_EN;`
			`if (params.dtw == "base") cparams.dtw_aheads_preset = WHISPER_AHEADS_BASE;`
			`if (params.dtw == "base.en") cparams.dtw_aheads_preset = WHISPER_AHEADS_BASE_EN;`
			`if (params.dtw == "small") cparams.dtw_aheads_preset = WHISPER_AHEADS_SMALL;`
			`if (params.dtw == "small.en") cparams.dtw_aheads_preset = WHISPER_AHEADS_SMALL_EN;`
			`if (params.dtw == "medium") cparams.dtw_aheads_preset = WHISPER_AHEADS_MEDIUM;`
			`if (params.dtw == "medium.en") cparams.dtw_aheads_preset = WHISPER_AHEADS_MEDIUM_EN;`
			`if (params.dtw == "large.v1") cparams.dtw_aheads_preset = WHISPER_AHEADS_LARGE_V1;`
			`if (params.dtw == "large.v2") cparams.dtw_aheads_preset = WHISPER_AHEADS_LARGE_V2;`
			`if (params.dtw == "large.v3") cparams.dtw_aheads_preset = WHISPER_AHEADS_LARGE_V3;`
whisper : add dtw preset for large-v3-turbo (#2481) 2024-10-15 21:00:21 +03:00			`if (params.dtw == "large.v3.turbo") cparams.dtw_aheads_preset = WHISPER_AHEADS_LARGE_V3_TURBO;`
whisper : token-level timestamps with DTW (#1485) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-20 13:25:26 -03:00
			`if (cparams.dtw_aheads_preset == WHISPER_AHEADS_NONE) {`
			`fprintf(stderr, "error: unknown DTW preset '%s'\n", params.dtw.c_str());`
			`return 3;`
			`}`
			`}`
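The `if` chain above maps the `params.dtw` model name onto an alignment-heads preset. Anything it does not recognize leaves the preset at `WHISPER_AHEADS_NONE`, and the code reports that as an error. As a sketch only, the same mapping can be written as a lookup table. `AheadsPreset` and `dtw_preset_from_name` are hypothetical names that stand in for the `WHISPER_AHEADS_*` constants; they are not part of whisper.h.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the WHISPER_AHEADS_* constants; not the real whisper.h enum.
enum class AheadsPreset {
    None, Tiny, TinyEn, Base, BaseEn, Small, SmallEn,
    Medium, MediumEn, LargeV1, LargeV2, LargeV3, LargeV3Turbo
};

// Map a DTW model name to a preset. Unknown names yield None, which the
// caller treats as an error (cli.cpp prints a message and returns 3).
AheadsPreset dtw_preset_from_name(const std::string & name) {
    static const std::unordered_map<std::string, AheadsPreset> table = {
        {"tiny",           AheadsPreset::Tiny},
        {"tiny.en",        AheadsPreset::TinyEn},
        {"base",           AheadsPreset::Base},
        {"base.en",        AheadsPreset::BaseEn},
        {"small",          AheadsPreset::Small},
        {"small.en",       AheadsPreset::SmallEn},
        {"medium",         AheadsPreset::Medium},
        {"medium.en",      AheadsPreset::MediumEn},
        {"large.v1",       AheadsPreset::LargeV1},
        {"large.v2",       AheadsPreset::LargeV2},
        {"large.v3",       AheadsPreset::LargeV3},
        {"large.v3.turbo", AheadsPreset::LargeV3Turbo},
    };
    const auto it = table.find(name);
    return it != table.end() ? it->second : AheadsPreset::None;
}
```

With a table, adding a preset such as `large.v3.turbo` takes one new entry, and the fallback to `None` happens in a single place.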

whisper : add context param to disable gpu (#1293) * whisper : check state->ctx_metal not null * whisper : add whisper_context_params { use_gpu } * whisper : new API with params & deprecate old API * examples : use no-gpu param && whisper_init_from_file_with_params * whisper.objc : enable metal & disable on simulator * whisper.swiftui, metal : enable metal & support load default.metallib * whisper.android : use new API * bindings : use new API * addon.node : fix build & test * bindings : updata java binding * bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java * metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load * metal : move bundle var into block * metal : use SWIFT_PACKAGE instead of GGML_SWIFT * style : minor updates --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-11-06 17:04:24 +08:00			`struct whisper_context * ctx = whisper_init_from_file_with_params(params.model.c_str(), cparams);`
Initial release 2022-09-25 21:23:15 +03:00
refactoring : move main + stream in examples + other stuff 2022-10-25 19:13:08 +03:00			`if (ctx == nullptr) {`
			`fprintf(stderr, "error: failed to initialize whisper context\n");`
			`return 3;`
			`}`

whisper : minor OpenVINO refactoring (#1037) Hopefully I didn't break something - haven't tested 2023-07-04 20:28:27 +03:00			`// initialize openvino encoder. this has no effect on whisper.cpp builds that don't have OpenVINO configured`
whisper : add OpenVINO support (#1037) * openvino: use OpenVINO encoder inference * openvino: add python script for OpenVINO model generation * whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * whisper: Fix compilation error * whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures * cmake: Add openvino-encoder as separate object target * whisper : minor style fixes * minor : indentation fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-04 08:56:11 -04:00			`whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr);`

main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`if (!params.grammar.empty()) {`
			`auto & grammar = params.grammar_parsed;`
			`if (is_file_exist(params.grammar.c_str())) {`
			`// read grammar from file`
			`std::ifstream ifs(params.grammar.c_str());`
			`const std::string txt = std::string((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());`
			`grammar = grammar_parser::parse(txt.c_str());`
			`} else {`
			`// read grammar from string`
			`grammar = grammar_parser::parse(params.grammar.c_str());`
			`}`

			`// will be empty (default) if there are parse errors`
			`if (grammar.rules.empty()) {`
			`fprintf(stderr, "error: failed to parse grammar \"%s\"\n", params.grammar.c_str());`
			`return 4;`
			`} else {`
			`fprintf(stderr, "%s: grammar:\n", __func__);`
			`grammar_parser::print_grammar(stderr, grammar);`
			`fprintf(stderr, "\n");`
			`}`
			`}`
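The grammar option accepts either a path or the grammar text itself. The code reads the file when one exists and otherwise parses the argument directly. It then treats an empty rule set as a parse failure. Below is a minimal sketch of the first step. It assumes that a successful `std::ifstream` open is an acceptable stand-in for `is_file_exist()`. `load_grammar_text` is a hypothetical helper, and the `grammar_parser::parse` step is omitted.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: if `arg` names a readable file, return its contents;
// otherwise treat `arg` itself as inline grammar text.
std::string load_grammar_text(const std::string & arg) {
    std::ifstream ifs(arg);
    if (ifs.is_open()) {
        // Read the whole file into a string, as the code above does.
        return std::string((std::istreambuf_iterator<char>(ifs)),
                            std::istreambuf_iterator<char>());
    }
    return arg;
}
```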

ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`for (int f = 0; f < (int) params.fname_inp.size(); ++f) {`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`const auto & fname_inp = params.fname_inp[f];`
			`struct fout_factory {`
			`std::string fname_out;`
			`const size_t basename_length;`
			`const bool is_stdout;`
			`bool used_stdout;`
			`decltype(whisper_print_segment_callback) * const print_segment_callback;`
			`std::ofstream fout;`

			`fout_factory (const std::string & fname_out_, const std::string & fname_inp, whisper_params & params) :`
			`fname_out{!fname_out_.empty() ? fname_out_ : fname_inp},`
			`basename_length{fname_out.size()},`
			`is_stdout{fname_out == "-"},`
			`used_stdout{},`
			`print_segment_callback{is_stdout ? nullptr : whisper_print_segment_callback} {`
			`if (!print_segment_callback) {`
			`params.print_progress = false;`
			`}`
			`}`

			`bool open(const char * ext, const char * function) {`
			`if (is_stdout) {`
cli : avoid std::exchange ggml-ci 2025-05-07 13:22:47 +03:00			`if (used_stdout) {`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`fprintf(stderr, "warning: Not appending multiple file formats to stdout\n");`
			`return false;`
			`}`
cli : avoid std::exchange ggml-ci 2025-05-07 13:22:47 +03:00
			`used_stdout = true;`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`#ifdef _WIN32`
			`fout = std::ofstream{"CON"};`
			`#else`
			`fout = std::ofstream{"/dev/stdout"};`
			`#endif`
			`// Not using fprintf stderr here because it might equal stdout`
			`// Also assuming /dev is mounted`
			`return true;`
			`}`
cli : avoid std::exchange ggml-ci 2025-05-07 13:22:47 +03:00
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`fname_out.resize(basename_length);`
			`fname_out += ext;`
			`fout = std::ofstream{fname_out};`
			`if (!fout.is_open()) {`
			`fprintf(stderr, "%s: failed to open '%s' for writing\n", __func__, fname_out.c_str());`
			`return false;`
			`}`
			`fprintf(stderr, "%s: saving output to '%s'\n", function, fname_out.c_str());`
			`return true;`
			`}`
			`} fout_factory{f < (int) params.fname_out.size() ? params.fname_out[f] : "", fname_inp, params};`
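`fout_factory` derives every output path from one base name: the `-of` value for this input, or else the input file name. It restores that base before appending each extension. It also treats `-` as stdout, and only the first output format may claim stdout. The sketch below isolates that naming rule. `OutputNamer` is a hypothetical class, and it does not open any files.

```cpp
#include <string>

// Hypothetical sketch of fout_factory's naming rule (no file I/O).
class OutputNamer {
public:
    OutputNamer(const std::string & fname_out, const std::string & fname_inp)
        : name_(fname_out.empty() ? fname_inp : fname_out),
          basename_length_(name_.size()),
          is_stdout_(name_ == "-") {}

    // Returns the path to write for extension `ext`, "-" for stdout,
    // or an empty string if stdout was already used by an earlier format.
    std::string next(const char * ext) {
        if (is_stdout_) {
            if (used_stdout_) {
                return "";
            }
            used_stdout_ = true;
            return "-";
        }
        name_.resize(basename_length_); // drop the previous extension
        name_ += ext;
        return name_;
    }

private:
    std::string name_;
    std::string::size_type basename_length_;
    bool is_stdout_;
    bool used_stdout_ = false;
};
```

Because the extension is appended to the full input name, an input `samples/jfk.wav` yields `samples/jfk.wav.txt` rather than `samples/jfk.txt`.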
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00
examples : refactor in order to reuse code and reduce duplication (#482) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib 2023-02-15 19:28:10 +02:00			`std::vector<float> pcmf32; // mono-channel F32 PCM`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM`
main : fix dangling pointer when using stdin for input (#65) 2022-11-24 17:53:51 +02:00
examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759) 2025-02-27 12:06:54 +05:00			`if (!::read_audio_data(fname_inp, pcmf32, pcmf32s, params.diarize)) {`
			`fprintf(stderr, "error: failed to read audio file '%s'\n", fname_inp.c_str());`
examples : refactor in order to reuse code and reduce duplication (#482) * examples : refactor common code into a library * examples : refactor common SDL code into a library * make : update Makefile to use common libs * common : fix MSVC M_PI .. * addon.node : link common lib 2023-02-15 19:28:10 +02:00			`continue;`
Initial release 2022-09-25 21:23:15 +03:00			`}`

main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00			`if (!whisper_is_multilingual(ctx)) {`
			`if (params.language != "en" || params.translate) {`
			`params.language = "en";`
			`params.translate = false;`
			`fprintf(stderr, "%s: WARNING: model is not multilingual, ignoring language and translation options\n", __func__);`
			`}`
			`}`
			`if (params.detect_language) {`
			`params.language = "auto";`
			`}`
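When the model is English-only, a non-English language or a translation request is overridden to plain English transcription. After that check, `detect_language` switches the language to `auto` regardless of the model. The sketch below shows that normalization, using a hypothetical `LangOptions` struct in place of `whisper_params`.

```cpp
#include <string>

// Hypothetical subset of the CLI options touched by this check.
struct LangOptions {
    std::string language = "en";
    bool translate = false;
    bool detect_language = false;
};

// Mirrors the logic above. Returns true when the user's language/translate
// options were overridden (cli.cpp prints a warning in that case).
bool normalize_language(LangOptions & opt, bool model_is_multilingual) {
    bool overridden = false;
    if (!model_is_multilingual && (opt.language != "en" || opt.translate)) {
        opt.language = "en";
        opt.translate = false;
        overridden = true;
    }
    if (opt.detect_language) {
        opt.language = "auto";
    }
    return overridden;
}
```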

			`if (!params.no_prints) {`
			`// print system information`
Print system info at start of program 2022-10-27 17:22:10 +03:00			`fprintf(stderr, "\n");`
main : merge parallel example in main 2022-10-29 12:26:03 +03:00			`fprintf(stderr, "system_info: n_threads = %d / %d | %s\n",`
			`params.n_threads*params.n_processors, std::thread::hardware_concurrency(), whisper_print_system_info());`
Print system info at start of program 2022-10-27 17:22:10 +03:00
main : add cli option to disable system prints (#1740) 2024-01-08 16:41:28 +02:00			`// print some info about the processing`
ref #17 : print whisper logs to stderr Only the transcribed/translted text is printed to stdout. This way, one can redirect the result to a file. 2022-10-08 17:28:06 +03:00			`fprintf(stderr, "\n");`
whisper : add batched decoding (#1486) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove oboslete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes 2023-11-15 16:12:52 +02:00			`fprintf(stderr, "%s: processing '%s' (%d samples, %.1f sec), %d threads, %d processors, %d beams + best of %d, lang = %s, task = %s, %stimestamps = %d ...\n",`
main : merge parallel example in main 2022-10-29 12:26:03 +03:00			`__func__, fname_inp.c_str(), int(pcmf32.size()), float(pcmf32.size())/WHISPER_SAMPLE_RATE,`
whisper : add batched decoding (#1486) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove oboslete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes 2023-11-15 16:12:52 +02:00			`params.n_threads, params.n_processors, params.beam_size, params.best_of,`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`params.language.c_str(),`
			`params.translate ? "translate" : "transcribe",`
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`params.tinydiarize ? "tdrz = 1, " : "",`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`params.no_timestamps ? 0 : 1);`
ref #17 : add options to output result to file Support for: - plain text - VTT - SRT 2022-10-08 17:22:22 +03:00
cli : print color scheme info for --print-colors (#3141) This commit adds a description of the color scheme used in the CLI when the --print-colors option is enabled. The motivation for this is that it is not immediately clear what the color scheme is when using the CLI with the --print-colors option. Example output: ```console $ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors ... main: color scheme: red (low confidence), yellow (medium), green (high confidence) [00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. ``` The description will not be dispayed if the `--no-prints` options is set. Refs: https://github.com/ggml-org/whisper.cpp/issues/3135 2025-05-12 10:43:04 +02:00			`if (params.print_colors) {`
			`fprintf(stderr, "%s: color scheme: red (low confidence), yellow (medium), green (high confidence)\n", __func__);`
examples : add --print-confidence option to cli (#3150) * examples : add --print-confidence option to cli This commit adds a new command-line option `--print-confidence` to the whisper-cli. When enabled, this option prints the confidence level of each token in the transcribed text using ANSI formatting codes. The confidence levels are represented using different styles: ```console main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence) ``` Refs: https://github.com/ggml-org/whisper.cpp/issues/3135 2025-05-14 19:21:48 +02:00			`} else if (params.print_confidence) {`
			`fprintf(stderr, "%s: confidence: highlighted (low confidence), underlined (medium), dim (high confidence)\n", __func__);`
cli : print color scheme info for --print-colors (#3141) This commit adds a description of the color scheme used in the CLI when the --print-colors option is enabled. The motivation for this is that it is not immediately clear what the color scheme is when using the CLI with the --print-colors option. Example output: ```console $ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors ... main: color scheme: red (low confidence), yellow (medium), green (high confidence) [00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. ``` The description will not be dispayed if the `--no-prints` options is set. Refs: https://github.com/ggml-org/whisper.cpp/issues/3135 2025-05-12 10:43:04 +02:00			`}`
ref #17 : print whisper logs to stderr Only the transcribed/translted text is printed to stdout. This way, one can redirect the result to a file. 2022-10-08 17:28:06 +03:00			`fprintf(stderr, "\n");`
Flash + language support (ref #2) - Achieved big performance improvement + memory usage reduction - Can now translate / transcribe different languages 2022-09-28 20:46:05 +03:00			`}`

ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`// run the inference`
			`{`
ref #57, #62, #63 : remove unions in C-api + remove designated initializers We are not ready for designated initializers - many compilers do not support this C++ feature yet, so removing it's non-trivial usages. 2022-10-18 18:17:24 +03:00			`whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`const bool use_grammar = (!params.grammar_parsed.rules.empty() && !params.grammar_rule.empty());`
			`wparams.strategy = (params.beam_size > 1 || use_grammar) ? WHISPER_SAMPLING_BEAM_SEARCH : WHISPER_SAMPLING_GREEDY;`
Improve decoding (#291) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders 2023-01-15 11:29:57 +02:00
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`wparams.print_realtime = false;`
main : add option to print the progress (#276) 2022-12-16 20:20:43 +02:00			`wparams.print_progress = params.print_progress;`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`wparams.print_timestamps = !params.no_timestamps;`
			`wparams.print_special = params.print_special;`
			`wparams.translate = params.translate;`
			`wparams.language = params.language.c_str();`
whisper : add detect-language mode (#853) * add detectlanguage flag * renaming and help * no idea why that last one didn't commit * run language detection if dl is set * help message fix * various fixes * fix quitting * fix language being english on print 2023-05-02 11:51:52 -05:00			`wparams.detect_language = params.detect_language;`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`wparams.n_threads = params.n_threads;`
			`wparams.n_max_text_ctx = params.max_context >= 0 ? params.max_context : wparams.n_max_text_ctx;`
			`wparams.offset_ms = params.offset_t_ms;`
			`wparams.duration_ms = params.duration_ms;`

examples : Implement JSON output for Token-Level data in main (#1358) 2023-10-31 21:54:52 +02:00			`wparams.token_timestamps = params.output_wts || params.output_jsn_full || params.max_len > 0;`
refactoring : more readable code 2022-11-25 19:08:51 +02:00			`wparams.thold_pt = params.word_thold;`
			`wparams.max_len = params.output_wts && params.max_len == 0 ? 60 : params.max_len;`
whisper : add "split_on_word" flag when using using "max_len" option (#455) * Update whisper.cpp * fix: trim function * feat: added flag to split on word * fix: arguments for main 2023-02-05 13:44:23 +01:00			`wparams.split_on_word = params.split_on_word;`
examples : added audio_ctx argument to main and server (#1857) * added audio_ctx argument to main and server examples * Better default value Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better default value (again) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-02-12 02:19:07 -05:00			`wparams.audio_ctx = params.audio_ctx;`
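Three of the settings above are derived rather than copied:

- Beam search is selected when `beam_size > 1` or when a parsed grammar with a start rule is in use.
- Token-level timestamps are enabled whenever word-timestamp output, full JSON output, or a segment length limit needs them.
- `max_len` falls back to 60 when `output_wts` is set without an explicit limit.

The sketch restates those three rules as a pure function. `DecodeOptions`, `Strategy`, `DerivedSettings`, and `derive_settings` are hypothetical names, and the defaults in `DecodeOptions` are placeholders, not the CLI's real defaults.

```cpp
// Hypothetical subset of CLI options feeding the derived settings.
struct DecodeOptions {
    int  beam_size       = 1;
    bool have_grammar    = false; // parsed rules non-empty AND a start rule given
    bool output_wts      = false;
    bool output_jsn_full = false;
    int  max_len         = 0;
};

enum class Strategy { Greedy, BeamSearch };

struct DerivedSettings {
    Strategy strategy;
    bool     token_timestamps;
    int      max_len;
};

DerivedSettings derive_settings(const DecodeOptions & o) {
    DerivedSettings d{};
    // Grammar-constrained decoding always goes through beam search.
    d.strategy = (o.beam_size > 1 || o.have_grammar) ? Strategy::BeamSearch : Strategy::Greedy;
    // Outputs or limits that depend on per-token timing turn token timestamps on.
    d.token_timestamps = o.output_wts || o.output_jsn_full || o.max_len > 0;
    // Word-timestamp output without an explicit limit defaults to 60.
    d.max_len = (o.output_wts && o.max_len == 0) ? 60 : o.max_len;
    return d;
}
```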
refactoring : more readable code 2022-11-25 19:08:51 +02:00
whisper : significantly improve the inference quality (#1148) * Fix MSVC compile error C3688 Instead of simply using 'add_compile_options(/utf-8)' to address the MSVC compile error C3688, a better approach would be to handle it in a way that prevents passing '/utf-8' to NVCC. * Significantly improve inference quality In the function `log_mel_spectrogram_worker_thread`, there's an array out-of-bounds issue occurring during the calculation of complex number moduli. This issue is causing disruptions in the FFT spectrum, which, in turn, is reducing the quality of inference. * Significantly improve inference quality At last, I've pinpointed the actual source of the problem. Given that the frequency spectrum generated from real input data is symmetrical around the Nyquist frequency, there's a for-loop within the `log_mel_spectrogram_worker_thread` function that attempts to fold the frequency spectrum. Regrettably, a bug within this for-loop is causing a frame shift in the frequency spectrum. The previous attempt to remedy this, which involved using `fft_size + 1` when calculating the modulus, was merely a band-aid solution and did not address the underlying issue. * Addressed a few minor issues Fixed the issue of `fft_out` continuously expanding. Resolved the fallback caused by using 'break' instead of `fft_in[j] = 0`. * Significantly improve inference quality Thanks for your patience everyone. It's finally sorted out. Now, the right side of the FFT spectrum is being flipped over to the left, and the amplitudes at corresponding positions on the left and right are added together (the spectrum on the left needs to be shifted by one position), then the average is calculated. FFT_OUT[0] is no longer discarded, making full use of the limited space to pack in more information. 
* Add annotation and performance improvement * Calculate FFT only when fft_in are not all zero * Some minor performance improvement * Fixed a bug impacting inference quality * The first version after all the analysis is completed. * Fix some bugs and add debug mode * Fixed several bugs * Temporarily disable speed-up mode and add debug mode. * Add debug mode * Disable speed-up mode and add debug mode * Fix CI error (#1) * Fix error * Fix error * Fixed several bugs including [BLANK_AUDIO] problem * Remove Hard-coded hann window * Some Final Fix (#2) * Fix error * Fix error * Probably the last commit * Probably the last commit * whisper : minor coding style changes * whisper : remove debug from public API --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-08-28 00:51:33 +08:00			`wparams.debug_mode = params.debug_mode;`
whisper : add option to speed up the audio tempo by x2 Using a Phase Vocoder for speeding up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. The transcription accuracy is degraded, but for slow to normal speech - it seems to be still very good. I think this can find application for real-time transcription - i.e. the "stream" example. 2022-11-12 18:03:49 +02:00
whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize (#1058) * add HuggingFace mirror to download ggml model * support tdrz via simple hack overriding solm tokens * fix incorrect translate/transcribe token_ids that are not static const * add apollo 13 sample for tdrz demo * render [SPEAKER TURN] consistently in all terminal output using vocab.id_to_token * extend whisper_segment with speaker_turn_next field and save in json output * fix failing go build * slipped in some python syntax whoops * whisper : finalize tinydiarize support (add flag + fixes) * whisper : tdrz support for word-level timestamps (respect max_len) * java : try to fix tests after adding tdrz_enable flag * main : remove TODO leftover * java : fix params order list after adding "tdrz_enable" * whisper : fix solm and add nosp token * main : print tinydiarize help --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-07-03 23:45:00 -07:00			`wparams.tdrz_enable = params.tinydiarize; // [TDRZ]`

main : pass nullptr when regex is empty (#2070) 2024-04-17 12:23:47 +03:00			`wparams.suppress_regex = params.suppress_regex.empty() ? nullptr : params.suppress_regex.c_str();`
whisper : suppress tokens with a regex (#1997) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-04-09 08:27:28 -07:00
whisper : add support for --carry-initial-prompt (#3395) * Add support for --carry-initial-prompt * PR fixes for ruby and go * Refactoring for readability * WIP 1 * WIP 2 * PR fixes * More PR fixes * PR fix * Further simplification * d'oh * One more logic fix * Update src/whisper.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Truncate prompt_past0 upon initialization * Slight simplification --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-10-10 18:51:15 +02:00			`wparams.initial_prompt = params.prompt.c_str();`
			`wparams.carry_initial_prompt = params.carry_initial_prompt;`
whisper : reduce memory usage during inference (#431) * ggml : add "scratch" buffer support * ggml : support for scratch ring-buffer * ggml : bug fix in ggml_repeat() * ggml : error on scratch buffer overflow * whisper : use scratch buffers during inference (base model only) * whisper : update memory usage for all models * whisper : fix encoder memory usage * whisper : use whisper_context functions instead of macros * whisper : fix FF + remove it from README * ggml : reuse ggml_new_i32 * ggml : refactor the scratch buffer storage * whisper : reorder scratch buffers in the decoder * main : add option to disable temp fallback * Update README.md 2023-02-04 09:45:52 +02:00
Improve decoding (#291) * whisper : prepare infra for new decoding strategies * whisper : apply logit filters and compute logprobs * whisper : add whisper_get_logits() * whisper : separate self and cross attention memory Initial step needed for supporting parallel decoders * whisper : move probs_id buffer to whisper_context * whisper : refactor kv cache into separate struct * whisper : move self-attention kv cache to whisper_decoder * whisper : wip decoding parameters + strategies * whisper : wip decoding parameters + strategies (part 2) * whisper : wip decoding parameters + strategies (part 3) * whisper : wip decoding parameters + strategies (part 4) * whisper : fix prompt_past update to not include prompt_init * whisper : temperature + best_of support * whisper : support for compression_ration_threshold We actually use entropy, but it is similar * command : fix example to use logits instead of obsolete probs * whisper : handle empty sequence ranking * whisper : add WHISPER_DEBUG + diagnostic prints + new main args * whisper : minor fixes * whisper : add beam-search support * whisper : bug fix when there no previous context * whisper : add comments * stream : disable temperature fallback For real-time processing, we always want a single decoder running at T=0 * whisper.swiftui : update example - fix paths + add empty folders 2023-01-15 11:29:57 +02:00			`wparams.greedy.best_of = params.best_of;`
			`wparams.beam_search.beam_size = params.beam_size;`

main : add options for temperature control (#2088) Add two options: ``` -tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1 -tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1 ``` The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at> 2024-05-13 13:59:44 +02:00			`wparams.temperature_inc = params.no_fallback ? 0.0f : params.temperature_inc;`
			`wparams.temperature = params.temperature;`
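With `no_fallback` set, `temperature_inc` is forced to 0, which disables temperature fallback. The sketch below shows the schedule of temperatures this implies. It rests on an assumption taken from our reading of `whisper_full`, not from this file: decoding is retried from `temperature` in steps of `temperature_inc` up to 1.0. `temperature_schedule` is a hypothetical helper.

```cpp
#include <vector>

// ASSUMPTION: the decoder retries at temperature, temperature + inc, ...
// up to 1.0 (small epsilon for float rounding). An increment of 0, which
// no_fallback produces, means a single pass at the base temperature.
std::vector<float> temperature_schedule(float temperature, float temperature_inc) {
    std::vector<float> temps;
    if (temperature_inc > 0.0f) {
        for (float t = temperature; t < 1.0f + 1e-6f; t += temperature_inc) {
            temps.push_back(t);
        }
    } else {
        temps.push_back(temperature);
    }
    return temps;
}
```

With the documented default increment of 0.20 and a base temperature of 0, this yields six attempts (0.0 through 1.0). With `no_fallback`, it yields one.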

whisper : reduce memory usage during inference (#431) * ggml : add "scratch" buffer support * ggml : support for scratch ring-buffer * ggml : bug fix in ggml_repeat() * ggml : error on scratch buffer overflow * whisper : use scratch buffers during inference (base model only) * whisper : update memory usage for all models * whisper : fix encoder memory usage * whisper : use whisper_context functions instead of macros * whisper : fix FF + remove it from README * ggml : reuse ggml_new_i32 * ggml : refactor the scratch buffer storage * whisper : reorder scratch buffers in the decoder * main : add option to disable temp fallback * Update README.md 2023-02-04 09:45:52 +02:00			`wparams.entropy_thold = params.entropy_thold;`
			`wparams.logprob_thold = params.logprob_thold;`
cli : add no_speech_thold (#2663) 2024-12-24 08:29:19 +01:00			`wparams.no_speech_thold = params.no_speech_thold;`
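The #291 commit message above notes that whisper.cpp uses entropy in place of a compression-ratio threshold. The sketch below is for illustration only. It assumes Shannon entropy (natural log) over the token frequencies of a decoded sequence. Under that measure a repetitive output scores low and falls under `entropy_thold`. `token_entropy` and `looks_repetitive` are hypothetical names, and the library's actual token window and rejection conditions may differ.

```cpp
#include <cmath>
#include <map>
#include <vector>

// Hypothetical: Shannon entropy (in nats) of the token-id frequency distribution.
double token_entropy(const std::vector<int> & tokens) {
    if (tokens.empty()) {
        return 0.0;
    }
    std::map<int, int> counts;
    for (int t : tokens) {
        counts[t]++;
    }
    double entropy = 0.0;
    for (const auto & kv : counts) {
        const double p = double(kv.second) / double(tokens.size());
        entropy -= p * std::log(p);
    }
    return entropy;
}

// Hypothetical check: low entropy means the decoder kept repeating itself,
// which is the situation a temperature fallback is meant to escape.
bool looks_repetitive(const std::vector<int> & tokens, double entropy_thold) {
    return token_entropy(tokens) < entropy_thold;
}
```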
main : add "--prompt" command line argument (#90) This allows to provide an initial prompt to be used at the start of the processing. 2022-12-16 19:43:16 +02:00
params : don't compute timestamps when not printing them (#1755) 2024-01-12 11:24:38 +00:00			`wparams.no_timestamps = params.no_timestamps;`

cli : add --suppress_nst support (#2664) 2024-12-24 08:30:07 +01:00			`wparams.suppress_nst = params.suppress_nst;`

vad : add initial Voice Activity Detection (VAD) support (#3065) * vad : add initial Voice Activity Detection (VAD) support This commit add support for Voice Activity Detection (VAD). When enabled this feature will process the audio input and detect speech segments. This information is then used to reduce the number of samples that need to be processed by whisper_full. Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2025-05-12 16:10:11 +02:00			`wparams.vad = params.vad;`
			`wparams.vad_model_path = params.vad_model.c_str();`

			`wparams.vad_params.threshold = params.vad_threshold;`
			`wparams.vad_params.min_speech_duration_ms = params.vad_min_speech_duration_ms;`
			`wparams.vad_params.min_silence_duration_ms = params.vad_min_silence_duration_ms;`
			`wparams.vad_params.max_speech_duration_s = params.vad_max_speech_duration_s;`
			`wparams.vad_params.speech_pad_ms = params.vad_speech_pad_ms;`
			`wparams.vad_params.samples_overlap = params.vad_samples_overlap;`
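These `vad_params` fields tune how detected speech segments are shaped before transcription. The sketch below covers two of them and assumes segments are given as sample ranges at 16 kHz. Segments shorter than `min_speech_duration_ms` are dropped, and the rest are widened by `speech_pad_ms` on each side, clamped to the audio bounds. `Segment`, `ms_to_samples` and `shape_segments` are hypothetical, and this does not reproduce whisper.cpp's VAD implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int64_t kSampleRate = 16000; // assumed to match WHISPER_SAMPLE_RATE

struct Segment {
    int64_t start; // first sample (inclusive)
    int64_t end;   // last sample (exclusive)
};

int64_t ms_to_samples(int64_t ms) {
    return ms * kSampleRate / 1000;
}

// Hypothetical post-processing: drop too-short segments, then pad the survivors.
std::vector<Segment> shape_segments(const std::vector<Segment> & in,
                                    int64_t n_samples,
                                    int64_t min_speech_duration_ms,
                                    int64_t speech_pad_ms) {
    const int64_t min_len = ms_to_samples(min_speech_duration_ms);
    const int64_t pad     = ms_to_samples(speech_pad_ms);

    std::vector<Segment> out;
    for (const Segment & s : in) {
        if (s.end - s.start < min_len) {
            continue; // too short to count as speech
        }
        out.push_back({ std::max<int64_t>(0, s.start - pad),
                        std::min<int64_t>(n_samples, s.end + pad) });
    }
    return out;
}
```

Overlapping padded segments are not merged here. A real implementation would likely need to handle that case.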

whisper : move progress calculation out of whisper.cpp (#1081) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step. 2023-07-25 21:23:34 +05:30			`whisper_print_user_data user_data = { &params, &pcmf32s, 0 };`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00
main : add command-style grammar (#1998) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2024-03-28 03:02:10 -07:00			`const auto & grammar_parsed = params.grammar_parsed;`
			`auto grammar_rules = grammar_parsed.c_rules();`

			`if (use_grammar) {`
			`if (grammar_parsed.symbol_ids.find(params.grammar_rule) == grammar_parsed.symbol_ids.end()) {`
			`fprintf(stderr, "%s: warning: grammar rule '%s' not found - skipping grammar sampling\n", __func__, params.grammar_rule.c_str());`
			`} else {`
			`wparams.grammar_rules = grammar_rules.data();`
			`wparams.n_grammar_rules = grammar_rules.size();`
			`wparams.i_start_rule = grammar_parsed.symbol_ids.at(params.grammar_rule);`
			`wparams.grammar_penalty = params.grammar_penalty;`
			`}`
			`}`
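The grammar block enables constrained sampling only when the requested start rule exists. It checks the rule name with `find` before calling `at`, so a missing rule produces a warning instead of an exception. The sketch below shows the same lookup pattern, with a plain `std::map` standing in for `grammar_parsed.symbol_ids` (its real type is not shown in this file). `SymbolIds` and `resolve_start_rule` are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the parsed grammar's name -> symbol id table.
using SymbolIds = std::map<std::string, uint32_t>;

// Returns the start-rule id, or std::nullopt (after a warning) if the rule is absent.
std::optional<uint32_t> resolve_start_rule(const SymbolIds & ids, const std::string & rule) {
    const auto it = ids.find(rule);
    if (it == ids.end()) {
        std::fprintf(stderr, "warning: grammar rule '%s' not found - skipping grammar sampling\n", rule.c_str());
        return std::nullopt;
    }
    return it->second; // reuse the iterator instead of a second lookup via at()
}
```

Reusing the iterator saves the second lookup that `at()` performs. The original `find` plus `at` is equally correct.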

whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00			`// this callback is called on each new segment`
			`if (!wparams.print_realtime) {`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`wparams.new_segment_callback = fout_factory.print_segment_callback;`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`wparams.new_segment_callback_user_data = &user_data;`
whisper : add new-segment callback Can be used to process new segments as they are being generated. Sample usage in main, for printing the resulting segments during the inference. 2022-10-22 21:06:50 +03:00			`}`
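The segment callback follows the C convention of a function pointer plus an opaque `void *` user-data pointer, which is how `fout_factory.print_segment_callback` receives `&user_data`. The CLI installs it only when `print_realtime` is off, presumably so segments are not printed twice. The sketch below shows that convention in a self-contained form. `new_segment_callback_t`, `Params`, `fake_decode` and `PrintState` are hypothetical and do not mirror whisper.h's actual callback signature.

```cpp
#include <string>
#include <vector>

// Hypothetical callback type: called once per finalized segment with its text.
typedef void (*new_segment_callback_t)(const std::string & text, void * user_data);

// Hypothetical subset of a params struct: a function pointer plus opaque user data.
struct Params {
    new_segment_callback_t new_segment_callback = nullptr;
    void * new_segment_callback_user_data = nullptr;
};

// Hypothetical decoder loop: reports each segment as soon as it is produced.
void fake_decode(const Params & p, const std::vector<std::string> & segments) {
    for (const std::string & text : segments) {
        if (p.new_segment_callback != nullptr) {
            p.new_segment_callback(text, p.new_segment_callback_user_data);
        }
    }
}

// Caller-owned state, recovered from the opaque pointer inside the callback.
struct PrintState {
    int n_seen = 0;
    std::vector<std::string> texts;
};

void on_new_segment(const std::string & text, void * user_data) {
    auto * state = static_cast<PrintState *>(user_data);
    state->n_seen++;
    state->texts.push_back(text);
}
```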

whisper : move progress calculation out of whisper.cpp (#1081) Current `progress_step` was hardcoded into whisper.cpp, this resulted in bindings having to access progress only at that step even if progress callback was being called at every iteration. With this change we get greater granularity progress reporting from whisper.cpp and bindings/implementations can define their own progress step. 2023-07-25 21:23:34 +05:30			`if (wparams.print_progress) {`
			`wparams.progress_callback = whisper_print_progress_callback;`
			`wparams.progress_callback_user_data = &user_data;`
			`}`
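According to the #1081 commit message, the library now reports progress on every iteration and the caller chooses its own reporting step. The third field of `whisper_print_user_data` (initialized to `0` above) appears to hold the last reported value. The sketch below shows caller-side throttling under that assumption. `ProgressState` and `on_progress` are hypothetical, and the 5 % step is an arbitrary choice.

```cpp
#include <cstdio>

struct ProgressState {
    int step          = 5; // hypothetical: report every 5 percent
    int progress_prev = 0; // last value that was reported
    int n_printed     = 0;
};

// Called with progress in [0, 100]; prints only when at least `step` points have passed.
void on_progress(int progress, void * user_data) {
    auto * st = static_cast<ProgressState *>(user_data);
    if (progress >= st->progress_prev + st->step) {
        st->progress_prev = progress;
        st->n_printed++;
        std::fprintf(stderr, "progress = %3d%%\n", progress);
    }
}
```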

whisper : abort callback improvements (#1345) * whisper : initialize abort_callback to null * whisper : add example how to use abort_callback 2023-10-08 16:22:24 +02:00			`// examples for abort mechanism`
			`// in examples below, we do not abort the processing, but we could if the flag is set to true`

whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:28:36 +02:00			`// the callback is called before every encoder run - if it returns false, the processing is aborted`
			`{`
			`static bool is_aborted = false; // NOTE: this should be atomic to avoid data race`

whisper : add whisper_state + default state on the whisper_context (#523) * Added whisper state + default state on the whisper_context * Fixed some examples and bindings * Fixed whisper_n_len (which was used in some binding) and added whisper_n_len_from_state * Fixed comments * whisper : reuse kv_cache_free() and fix compiler warnings * whisper : clean-up the API comments --------- Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-03-05 20:42:19 +01:00			`wparams.encoder_begin_callback = [](struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, void * user_data) {`
whisper : add mechanism for aborting the whisper_full() computation 2022-11-27 20:28:36 +02:00			`bool is_aborted = *(bool*)user_data;`
			`return !is_aborted;`
			`};`
			`wparams.encoder_begin_callback_user_data = &is_aborted;`
			`}`

whisper : abort callback improvements (#1345) * whisper : initialize abort_callback to null * whisper : add example how to use abort_callback 2023-10-08 16:22:24 +02:00			`// the callback is called before every computation - if it returns true, the computation is aborted`
			`{`
			`static bool is_aborted = false; // NOTE: this should be atomic to avoid data race`

			`wparams.abort_callback = [](void * user_data) {`
			`bool is_aborted = *(bool*)user_data;`
			`return is_aborted;`
			`};`
			`wparams.abort_callback_user_data = &is_aborted;`
			`}`
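Both abort examples read a `bool` through the `void *` user-data pointer. Their NOTE comments point out that a plain `static bool` changed from another thread is a data race. The sketch below backs the same two callbacks with `std::atomic<bool>`, assuming the flag is set from a different thread (for example a signal or UI handler). `g_abort`, `encoder_begin` and `should_abort` are hypothetical. `encoder_begin` is simplified and omits the `ctx`/`state` parameters of the real encoder-begin callback. `should_abort` has the same `bool(void *)` shape as the `abort_callback` lambda above.

```cpp
#include <atomic>

// Shared flag; std::atomic makes cross-thread reads and writes well-defined.
static std::atomic<bool> g_abort{false};

// Encoder-begin style: return true to continue, false to stop.
bool encoder_begin(void * user_data) {
    const auto * flag = static_cast<const std::atomic<bool> *>(user_data);
    return !flag->load(std::memory_order_relaxed);
}

// Abort style: return true to stop the current computation.
bool should_abort(void * user_data) {
    const auto * flag = static_cast<const std::atomic<bool> *>(user_data);
    return flag->load(std::memory_order_relaxed);
}
```

Relaxed ordering is enough here because only the flag's own value is communicated and no other data is published through it.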

main : merge parallel example in main 2022-10-29 12:26:03 +03:00			`if (whisper_full_parallel(ctx, wparams, pcmf32.data(), pcmf32.size(), params.n_processors) != 0) {`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`fprintf(stderr, "%s: failed to process audio\n", argv[0]);`
main : add stereo-channel-based diarization (#64) Not tested - I don't have stereo dialog audio 2022-11-25 22:08:58 +02:00			`return 10;`
ref #22 : add option to provide multiple input .wav files 2022-10-05 23:44:10 +03:00			`}`
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`}`
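`whisper_full_parallel` spreads the audio across `params.n_processors` workers. The sketch below shows one way such a split can be computed. It assumes equal contiguous chunks, with the remainder going to the last chunk. The library also handles offsets and stitches the per-chunk results back together, which is omitted here. `split_chunks` is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: [begin, end) sample ranges for n_processors contiguous chunks.
std::vector<std::pair<size_t, size_t>> split_chunks(size_t n_samples, size_t n_processors) {
    std::vector<std::pair<size_t, size_t>> chunks;
    if (n_processors == 0) {
        return chunks;
    }
    const size_t per_chunk = n_samples / n_processors;
    for (size_t i = 0; i < n_processors; ++i) {
        const size_t begin = i * per_chunk;
        // the last chunk absorbs the remainder so no samples are lost
        const size_t end = (i + 1 == n_processors) ? n_samples : begin + per_chunk;
        chunks.emplace_back(begin, end);
    }
    return chunks;
}
```

Chunk boundaries can fall in the middle of a word, so text near a seam may be transcribed less reliably than in a single pass.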
Initial release 2022-09-25 21:23:15 +03:00
main : add option for word-leve timestamps (very experimental) 2022-10-30 10:05:58 +02:00			`// output stuff`
			`{`
cli : support "-" for stdout like stdin (#3050) This changes examples/cli/cli.cpp to be like examples/common-whisper.cpp. "-of -" can be specified (or this can be inferred from "-" as the input file) to output to stdout. This is useful for piping to other applications. Log fname_out consistently when not stdout - Terminals have stdout=stderr, so remove the message before successful output to ease copying - Don't affect actual error messages - Move opening the ofstream into the factory, fixing missing open and/or error messages in output_score/output_wts - Fix struct naming convention Closes #3048 2025-05-05 01:15:39 -04:00			`// macros to stringify function name`
			`#define output_func(func, ext, param, ...) if (param && fout_factory.open(ext, #func)) {\`
			`func(ctx, fout_factory.fout, params, __VA_ARGS__); \`
			`}`
			`#define output_ext(ext, ...) output_func(output_##ext, "." #ext, params.output_##ext, __VA_ARGS__)`

			`output_ext(txt, pcmf32s);`
			`output_ext(vtt, pcmf32s);`
			`output_ext(srt, pcmf32s);`
			`output_ext(wts, pcmf32s, fname_inp.c_str(), float(pcmf32.size() + 1000)/WHISPER_SAMPLE_RATE, fout_factory.fname_out.c_str());`
			`output_ext(csv, pcmf32s);`
			`output_func(output_json, ".json", params.output_jsn, pcmf32s);`
			`output_ext(lrc, pcmf32s);`
			`output_func(output_score, ".score.txt", params.log_score, pcmf32s);`

			`#undef output_ext`
			`#undef output_func`
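The `output_func` / `output_ext` macros rely on two preprocessor operators:

- `#` turns the function name into a string literal (`#func` becomes `"output_txt"`), which is passed to `fout_factory.open`, presumably for log messages.
- `##` pastes tokens (`output_##ext` becomes `output_txt`).

Separately, `"." #ext` yields two adjacent literals that the compiler concatenates into `".txt"`. The sketch below uses the same macro shapes with a hypothetical factory that only records what it was asked to open. `FakeFactory`, `FakeParams` and the stub output functions are invented for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for fout_factory: records each requested label/extension.
struct FakeFactory {
    std::vector<std::string> opened;
    bool open(const char * ext, const char * func_name) {
        opened.push_back(std::string(func_name) + ":" + ext);
        return true;
    }
};

struct FakeParams {
    bool output_txt = true;
    bool output_srt = false;
};

static int n_written = 0;
void output_txt(int /*n_segments*/) { ++n_written; }
void output_srt(int /*n_segments*/) { ++n_written; }

// Same shape as the CLI macros: stringify (#) the function, paste (##) the extension.
#define output_func(func, ext, param, ...) if (param && factory.open(ext, #func)) { func(__VA_ARGS__); }
#define output_ext(ext, ...) output_func(output_##ext, "." #ext, params.output_##ext, __VA_ARGS__)

void write_outputs(FakeFactory & factory, const FakeParams & params) {
    output_ext(txt, 3); // if (params.output_txt && factory.open(".txt", "output_txt")) { output_txt(3); }
    output_ext(srt, 3); // params.output_srt is false, so nothing is opened
}

#undef output_ext
#undef output_func
```

Like the original, the macro expands to a bare `if` rather than a `do { ... } while (0)` block. It is therefore only safe as a standalone statement: an `else` written after it would bind to the macro's hidden `if`.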

			`if (fout_factory.is_stdout && !fout_factory.used_stdout) {`
			`fprintf(stderr, "warning: '--output-file -' used without any other '--output-*'\n");`
main : log probs to text file (#1205) * token/probability file generated with -ls * code comment cleaning * main : indentations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 2023-08-27 18:09:06 +02:00			`}`
ref #4 : added transcription timestamps Can be turned off with "-nt" argument. Performance has also improved. 2022-09-29 23:09:04 +03:00			`}`
Initial release 2022-09-25 21:23:15 +03:00			`}`

main : dont print timings with --no-prints (#2108) Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at> 2024-05-13 14:00:19 +02:00			`if (!params.no_prints) {`
			`whisper_print_timings(ctx);`
			`}`
Initial C-style interface for whisper.cpp 2022-10-04 20:35:01 +03:00			`whisper_free(ctx);`
Initial release 2022-09-25 21:23:15 +03:00
			`return 0;`
			`}`