Meta has unveiled AudioCraft, a new suite of AI models designed to generate music and audio from text prompts. The suite comprises three models, MusicGen, AudioGen, and EnCodec, and competes with Google’s MusicLM, a text-to-music generator launched earlier this year.

With AudioCraft, users can create music by entering prompts such as “soulful music for a dinner party” or “movie scene in a desert with percussion.” Meta views the technology as a new kind of instrument, much as synthesizers were seen when they first appeared.

MusicGen, one of the AudioCraft models, was trained on 20,000 hours of licensed and Meta-owned music. The announcement does not clarify how EnCodec was trained, including whether copyrighted material was used or whether it follows the same guidelines as MusicGen.

The training of AI models is a source of concern within the industry, since producing high-quality output often requires millions or billions of data points. Some companies have been criticized for training their models on copyrighted material without proper authorization or compensation. Meta has yet to address these questions about its own training process.

All three models in the AudioCraft suite will be released as open source, allowing researchers and practitioners to train them on their own datasets and develop the tools further. Meta says it aims to address concerns about bias in the models, particularly their inclination toward Western-style music, which makes up the majority of the training set.

Meta asserts that its family of models consistently produces high-quality audio and is easy to use.