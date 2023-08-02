Tech giant Meta has unveiled AudioCraft, a new set of AI tools designed to generate lifelike audio and music from text input. The toolkit comprises three models: MusicGen, AudioGen, and EnCodec. MusicGen was trained on 400,000 recordings paired with text descriptions and metadata, and is meant to produce samples that remain coherent over long-term musical structure. Meta compares MusicGen to a synthesizer, stating that it has the potential to become a new type of instrument with enhanced controls.

Meta has released a video clip showcasing the music generated by MusicGen, featuring reggae riffs, ’80s electronic beats, jazz instrumentals, and mellow hip-hop. Meanwhile, AudioGen is trained on public sound effects and can generate environmental sounds such as dogs barking, cars honking, and footsteps on a wooden floor. Meta has also upgraded the EnCodec decoder to ensure higher-quality music generation with fewer artifacts.

Notably, Meta has made the AudioCraft models open source, enabling researchers and practitioners to train their own models on their own datasets. The move aims to advance the field of AI-generated audio and music. However, Meta acknowledges that the training datasets lack diversity: they contain a larger proportion of Western-style music, and the audio-text pairs are limited to text written in English. By sharing the code, Meta hopes to encourage other researchers to explore new approaches to limiting the bias and misuse of generative models.