victor-upmeet/whisperx

Accelerated transcription, word-level timestamps and diarization with whisperX large-v3

Input
Configure the inputs for the AI model.

- Audio file: the audio to transcribe.
- Language: ISO code of the language spoken in the audio; specify None to perform language detection.
- Language detection min probability: if no language is specified, the language is detected recursively on different parts of the file until detection reaches the given probability.
- Language detection max retries: if no language is specified, detection follows the logic of the min-probability setting above, but stops after the given number of retries; if the limit is reached, the most probable language is kept.
- Initial prompt: optional text to provide as a prompt for the first window.
- Batch size: parallelization of the input audio transcription.
- Temperature: temperature to use for sampling.
- VAD onset: voice activity detection (VAD) onset threshold.
- VAD offset: voice activity detection (VAD) offset threshold.
- Align output: aligns the Whisper output to get accurate word-level timestamps.
- Diarization: assigns speaker ID labels.
- HuggingFace access token: required to enable diarization. Provide a token with read permissions, and accept the user agreement for the models specified in the README.
- Min speakers: minimum number of speakers if diarization is activated (leave blank if unknown).
- Max speakers: maximum number of speakers if diarization is activated (leave blank if unknown).
- Debug: print out compute/inference times and memory usage information.
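As a quick orientation, here is a minimal sketch of invoking the model through the Replicate Python client. The snake_case input names (audio_file, align_output, huggingface_access_token, and so on) are assumptions inferred from the field labels above, not a documented schema; check the model's API reference for the exact names and defaults.

```python
# Minimal sketch: transcription with alignment and diarization enabled.
# All input keys below are assumed from the field labels above.
import replicate

with open("interview.mp3", "rb") as audio:  # hypothetical local file
    output = replicate.run(
        "victor-upmeet/whisperx",  # resolves to the model's latest version
        input={
            "audio_file": audio,
            "language": "en",          # ISO code; omit to auto-detect
            "batch_size": 64,          # parallelization of transcription
            "temperature": 0.0,        # sampling temperature
            "align_output": True,      # word-level timestamps
            "diarization": True,       # assign speaker ID labels
            "huggingface_access_token": "hf_...",  # read token; model agreements accepted
            "min_speakers": 2,         # omit if unknown
            "max_speakers": 4,         # omit if unknown
            "debug": True,             # print timing/memory info
        },
    )
print(output)
```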

Output
The output contains the transcription with word-level timestamps and, when diarization is enabled, speaker labels.

