Date: 2024-11-26

NVIDIA researchers have introduced Fugatto, a new AI model that can modify or generate audio based on natural language input.

Unlike AI tools built for a single task, such as writing songs or changing voices, Fugatto (short for Foundational Generative Audio Transformer Opus 1) can handle many different audio tasks and accepts both text and audio as input. The model can create audio and music from text descriptions, change existing songs by adding or removing instruments, adjust the tone or emotion of voices, and even invent new sounds. It can also improve audio quality and act as a springboard for musical ideas.

Fugatto's design uses a method called ComposableART, which lets users mix different audio instructions while the model is generating audio. This means users can combine attributes such as voice accent and emotion, and control how strongly each one applies. The model can also create long, evolving audio scenes, such as a rainstorm that gradually changes into the sounds of a morning chorus.
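To make the idea of mixing instructions over time more concrete, here is a minimal Python sketch of a weighted blend between two instructions. It is not Fugatto's code, and NVIDIA has not published an API for the model. The function names `blend_weights` and `combine`, and the three-number "guidance vectors", are hypothetical stand-ins chosen only to illustrate a gradual shift from one instruction to another.

```python
# Illustrative only: a toy crossfade between two instructions.
# Nothing here is taken from Fugatto; all names and values are hypothetical.

def blend_weights(num_steps, start=(1.0, 0.0), end=(0.0, 1.0)):
    """Linearly interpolate per-instruction weights across num_steps steps."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    schedule = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
        schedule.append(tuple((1 - t) * s + t * e for s, e in zip(start, end)))
    return schedule


def combine(guidance_vectors, weights):
    """Weighted sum of equally sized vectors, one vector per instruction."""
    length = len(guidance_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, guidance_vectors))
        for i in range(length)
    ]


# Made-up stand-ins for what "rainstorm" and "morning chorus" might steer toward.
rain = [1.0, 0.0, 0.5]
birds = [0.0, 1.0, 0.5]

for weights in blend_weights(5):
    print(weights, combine([rain, birds], weights))
```

At the first step only the rain instruction contributes, at the last step only the birdsong instruction does, and the steps in between mix the two in changing proportions.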

The development of Fugatto took several years and involved a team of people from around the world. They used a large collection of audio samples, plus powerful DGX systems, to train the model, which has 2.5 billion parameters. One of the main challenges was creating a mixed dataset that would allow the model to handle a variety of tasks effectively. The team used different strategies to create and analyze data, which helped improve the model's capabilities while keeping its dataset as small as possible.

Fugatto is an exciting development in the world of audio creation, ideation, and editing, with clear potential to aid music and film production. That said, it is not a finished product. Fugatto is not available to install or test, and NVIDIA has not provided a timeline for the model's release. Fugatto may simply be a proof of concept.

Source: NVIDIA