Date: 2023-08-03

Meta has announced that it is open-sourcing AudioCraft, a collection of generative AI tools for creating music and audio from text prompts. With this release, content creators can generate intricate audio landscapes, compose melodies, and simulate virtual orchestras from simple text descriptions.

AudioCraft comprises three main components. AudioGen generates a wide range of audio effects and soundscapes. MusicGen creates musical compositions and melodies from textual descriptions. EnCodec, a neural-network-based audio compression codec, has recently been improved so that generated music has higher quality and fewer artifacts.

AudioGen can produce sound effects such as barking dogs, honking car horns, and footsteps on a wooden floor. MusicGen, meanwhile, can generate songs in different genres from scratch, based on descriptions like “Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach.”

Meta offers several audio samples on its website for evaluation. Although these samples showcase state-of-the-art capabilities, they might not match the professional quality of commercially produced audio effects or music. Noting that generative AI models for text and images have gained significant attention, Meta aims to give the broader community accessible tools for audio and musical experimentation by releasing the AudioCraft code under the MIT License.

AudioCraft is a significant step forward, but other groups have also been exploring AI-powered audio and music generation. OpenAI’s Jukebox, Google’s MusicLM, and a research team’s project called Riffusion are noteworthy attempts in this field.

Creating high-fidelity audio is a complex endeavor that requires modeling intricate signals and patterns. Music is particularly challenging because it combines local patterns with long-range structure. Symbolic representations like MIDI or piano rolls have been used, but they often fail to capture all the expressive and stylistic elements of music. Recent advances in self-supervised audio representation learning and hierarchical models have moved audio generation forward.
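
To illustrate the point about symbolic representations, here is a minimal sketch in Python, not taken from AudioCraft. It assumes a deliberately simplified binary piano roll (one on/off flag per pitch and time step); the `Note` class and `to_piano_roll` function are hypothetical names introduced only for this example. Two performances of the same note, one soft and one loud, collapse to the same roll, so the difference in expression is lost.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int       # MIDI note number, 0-127
    start: float     # onset in seconds
    duration: float  # length in seconds
    velocity: int    # MIDI key velocity, 1-127 (how hard the note is struck)


def to_piano_roll(notes, steps_per_second=4, total_seconds=2.0):
    """Return a dict mapping pitch -> list of 0/1 flags, one per time step.

    This simplified roll records only *when* each pitch sounds; velocity
    (and anything like timbre or articulation) is discarded.
    """
    n_steps = int(total_seconds * steps_per_second)
    roll = {}
    for note in notes:
        row = roll.setdefault(note.pitch, [0] * n_steps)
        first = int(note.start * steps_per_second)
        last = min(n_steps, int((note.start + note.duration) * steps_per_second))
        for step in range(first, last):
            row[step] = 1
    return roll


# The same middle C, played softly and played loudly.
soft = [Note(pitch=60, start=0.0, duration=1.0, velocity=30)]
loud = [Note(pitch=60, start=0.0, duration=1.0, velocity=120)]
```

Calling `to_piano_roll(soft)` and `to_piano_roll(loud)` gives identical results, even though the two performances would sound different. Richer symbolic formats do keep some of this information (MIDI stores velocity, for example), but details such as timbre and recording character still fall outside them.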

Meta discloses that MusicGen was trained on 20,000 hours of music that the company owns or has licensed specifically for this purpose. Using ethically sourced training material in this way may address concerns raised by critics of generative AI models.

The open-sourcing of AudioCraft lets open-source developers incorporate these audio models into their own projects, which could lead to fascinating and user-friendly generative audio tools in the future. The code for the AudioCraft tools is available on GitHub for those with coding expertise.
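
For readers who want to try the released code, the following is a rough sketch of text-to-music generation with MusicGen, based on the usage pattern in the AudioCraft repository. It assumes the `audiocraft` Python package is installed and that the `facebook/musicgen-small` checkpoint name is valid for the installed version. The `generate_clips` and `output_stem` helpers, the 8-second duration, and the output file names are illustrative choices, not part of AudioCraft.

```python
# Text prompt quoted from Meta's example description.
PROMPTS = [
    "Pop dance track with catchy melodies, tropical percussions, "
    "and upbeat rhythms, perfect for the beach",
]


def output_stem(index, prefix="musicgen_clip"):
    """Hypothetical helper: file name (without extension) for the index-th clip."""
    return f"{prefix}_{index:02d}"


def generate_clips(prompts, duration_s=8, checkpoint="facebook/musicgen-small"):
    """Generate one clip per prompt and write each one to a WAV file.

    The AudioCraft imports are deferred to call time so that this module
    can be loaded even on a machine without AudioCraft installed.
    """
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    # Assumption: this checkpoint name exists; weights download on first use.
    model = MusicGen.get_pretrained(checkpoint)
    model.set_generation_params(duration=duration_s)  # clip length in seconds

    waveforms = model.generate(prompts)  # one waveform tensor per prompt

    written = []
    for i, wav in enumerate(waveforms):
        stem = output_stem(i)
        # audio_write adds the ".wav" extension and normalizes loudness.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        written.append(stem + ".wav")
    return written
```

Calling `generate_clips(PROMPTS)` should write a file named `musicgen_clip_00.wav` to the working directory. Generation is considerably faster on a GPU, and larger checkpoints trade speed for quality.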