hnesk/whisper-wordtimestamps

openai/whisper with exposed settings for word_timestamps

Input

audio: Audio file

model: Choose a Whisper model.

language: language spoken in the audio; specify None to perform language detection
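With the upstream openai/whisper Python package, leaving language unset triggers detection on the first audio window; a minimal sketch (the checkpoint size and file name are placeholders):

```python
import whisper

model = whisper.load_model("small")

# language=None (the default) asks Whisper to detect the spoken language
# from the first 30-second window before transcribing.
result = model.transcribe("speech.mp3", language=None)

print(result["language"])  # detected language code, e.g. "en"
print(result["text"])
```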

patience: optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search

temperature: temperature to use for sampling

initial_prompt: optional text to provide as a prompt for the first window

suppress_tokens: comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations

word_timestamps: extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment
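A minimal sketch of reading these word-level timestamps with the upstream openai/whisper Python package (checkpoint size and file name are placeholders); with word_timestamps=True, each segment in the result carries a words list with per-word start and end times:

```python
import whisper

model = whisper.load_model("base")

# word_timestamps=True enables the cross-attention / DTW alignment described above,
# so each segment also carries per-word timestamps.
result = model.transcribe("speech.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["start"]:6.2f} -> {word["end"]:6.2f}  {word["word"]}')
```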

logprob_threshold: if the average log probability is lower than this value, treat the decoding as failed

append_punctuations: if word_timestamps is True, merge these punctuation symbols with the previous word

no_speech_threshold: if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence

prepend_punctuations: if word_timestamps is True, merge these punctuation symbols with the next word

condition_on_previous_text: if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop

compression_ratio_threshold: if the gzip compression ratio is higher than this value, treat the decoding as failed

temperature_increment_on_fallback: temperature to increase by when falling back, i.e. when the decoding fails to meet either the compression_ratio_threshold or logprob_threshold check above
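As a sketch of how these decoding safeguards fit together when calling the upstream openai/whisper Python package directly (the threshold values shown are the usual upstream defaults, not values prescribed by this page): the base temperature and the fallback increment expand into a schedule of temperatures, and a window is retried at the next temperature whenever a decode fails the log-probability or compression-ratio check.

```python
import numpy as np
import whisper

model = whisper.load_model("base")

temperature = 0.0  # temperature
increment = 0.2    # temperature_increment_on_fallback

# Expand into a schedule such as (0.0, 0.2, ..., 1.0); transcribe() retries a
# window at the next temperature whenever a decode fails either threshold below.
temperatures = tuple(np.arange(temperature, 1.0 + 1e-6, increment))

result = model.transcribe(
    "speech.mp3",
    temperature=temperatures,
    compression_ratio_threshold=2.4,  # fail if the gzip compression ratio is higher
    logprob_threshold=-1.0,           # fail if the average log probability is lower
    no_speech_threshold=0.6,          # treat failed, likely-silent windows as silence
    condition_on_previous_text=True,  # previous window's output prompts the next one
)
print(result["text"])
```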

Output
The transcription, including the timestamps for each word in each segment when word_timestamps is enabled.

