  • A powerful new "whisper" audio filter brings AI transcription to FFmpeg


    Karlston

    • 1.6k views
    • 2 minutes

    FFmpeg, an essential open-source media tool, now includes a new whisper audio filter (af_whisper in the source tree) that enables automatic speech recognition (ASR) directly within the FFmpeg ecosystem. It is built on the whisper.cpp library, a C/C++ port of OpenAI's Whisper speech recognition model, so transcription can run as part of an ordinary media processing workflow. This is a significant step for FFmpeg because it takes the software beyond traditional media processing into the world of AI.

     

    The new filter’s options allow for flexible transcription, including choosing the AI model, specifying the language, and setting the output format (text, SRT, or JSON). It can handle both pre-recorded files and live audio streams, and users can enable Voice Activity Detection (VAD) to improve transcription accuracy and efficiency.
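
As a rough illustration of how those options fit together, here is a minimal Python sketch that assembles an FFmpeg command line for the filter. The option names (`model`, `language`, `queue`, `destination`, `format`, `vad_model`) are taken from my reading of the FFmpeg filter documentation rather than from this article, so check them against `ffmpeg -h filter=whisper` on your build. The file names are hypothetical, and actually running the command assumes an FFmpeg build with whisper support enabled.

```python
import shlex
from typing import List, Optional

# Output formats the article mentions for the filter.
SUPPORTED_FORMATS = {"text", "srt", "json"}


def build_whisper_command(
    input_path: str,
    model_path: str,
    destination: str,
    language: str = "en",
    output_format: str = "srt",
    queue_seconds: int = 3,
    vad_model_path: Optional[str] = None,
) -> List[str]:
    """Return an ffmpeg argument list that transcribes input_path.

    Option names are assumptions (see the note above). Paths containing
    ':' or other filtergraph special characters would need escaping,
    which this sketch does not attempt.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")

    options = [
        f"model={model_path}",
        f"language={language}",
        f"queue={queue_seconds}",
        f"destination={destination}",
        f"format={output_format}",
    ]
    if vad_model_path is not None:
        options.append(f"vad_model={vad_model_path}")

    audio_filter = "whisper=" + ":".join(options)
    # -vn drops video; "-f null -" discards the media output so that only
    # the transcription written to `destination` is kept.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", audio_filter, "-f", "null", "-"]


if __name__ == "__main__":
    cmd = build_whisper_command("input.mp4", "ggml-base.en.bin", "output.srt")
    print(shlex.join(cmd))
```

The function only builds the argument list; passing it to `subprocess.run(cmd, check=True)` would execute it. Keeping the two steps separate makes the command easy to inspect or log before anything runs.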

     

    The filter buffers incoming audio in a queue before handing it to the model, and the size of that queue lets users balance transcription accuracy against processing speed and latency. It also supports GPU acceleration, which can significantly speed up the transcription process. For users, this feature replaces external, multi-step transcription processes, consolidating tasks into a single, efficient command-line workflow.
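
To make the queue trade-off concrete, the following Python sketch splits a recording into queue-sized windows. It is plain arithmetic for illustration, not the filter's actual implementation, and it ignores VAD, which can change where chunks are cut. The function name `queue_windows` is hypothetical. The assumption is that each window is handed to the model once it is full, so a shorter queue means lower latency but more model invocations, each with less context.

```python
import math
from typing import List, Tuple


def queue_windows(total_seconds: float, queue_seconds: float) -> List[Tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows.

    Each window is at most queue_seconds long; the last one may be shorter.
    """
    if queue_seconds <= 0:
        raise ValueError("queue_seconds must be positive")
    if total_seconds <= 0:
        return []

    count = math.ceil(total_seconds / queue_seconds)
    windows = []
    for i in range(count):
        start = i * queue_seconds
        end = min(start + queue_seconds, total_seconds)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Compare a short and a long queue on a hypothetical 60-second clip.
    for queue in (3, 20):
        chunks = queue_windows(60, queue)
        print(f"queue={queue}s -> {len(chunks)} chunks, "
              f"text available roughly every {queue}s")
```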

     

    The new filter can generate subtitle files, such as SRT files for videos and podcasts, and it also enables live transcription for streaming or other real-time applications. It can also attach the transcription as output metadata that can be used for further automation within FFmpeg. The new feature simplifies the process for content creators, archivists, and developers, and saves significant time and effort for anyone who wants to transcribe audio content.

     

    This integration sets a precedent for FFmpeg to add other AI and machine learning models in the future. It also solidifies FFmpeg’s position as an industry-standard media tool. While some people may have concerns about AI, it looks set to permeate most software going forward.

     

    Source



    Posted Thursday 14 August 2025 at 3:47 am AEST (my time).
