Tools and Perspectives for Interaction with Neural Audio Synthesis

Axel Chemla--Romeu-Santos

Workshop Title

Description (up to 750 words)

The current state of the art in neural audio synthesis (the use of generative machine learning for sound generation) now provides reliable methods that are already used in numerous domains (speech and music generation, audio effects, synthesizers). Despite these promising advances, several constraints still prevent these techniques from being accessible and easily embeddable in most musical creation workflows, leaving the models inaccessible to musicians without expert programming skills. Indeed, the mathematical and computational knowledge required to use these techniques prevents most interested users from accessing the full potential of these musical tools. Moreover, few efforts have been made to embed these technologies into the existing interactive workspaces commonly used by composers and performers. This is due to the computational complexity of the models, which causes real-time issues in actual creative setups. Hence, the extent of the creative possibilities opened up by ML remains opaque to communities of practitioners, researchers, and users outside the realm of science or engineering.

This workshop intends to merge musical, technical and scientific perspectives in order to discuss new tools and prospects opened up by neural audio synthesis. To this end, we propose to split the workshop into three stages:

Introductory state-of-the-art: (~60 min) We will present technical and musical works produced in the last decade that leverage neural audio synthesis, and introduce critical issues raised by these techniques (e.g., model bias, data ownership, ecological impact). Rather than a keynote, we wish to provide space for participants to discuss the presented works, to ask questions about technical specifics, and to familiarise non-ML experts with the technical aspects.
Hands-on session: (~90 min) We will provide demos of neural audio synthesis tools developed in the ACIDS team at IRCAM, and assist participants in installing and exploring them. Before the workshop, participants will be invited to fill in a short questionnaire indicating the tools they would like to explore. Different frameworks, from high-level (Max, VSTs) to low-level (Python, Raspberry Pi, modular), may be explored depending on participants’ interests. Depending on the number of participants, groups may be created to focus on specific use cases of these tools.
Open discussion: (~90 min) We will invite participants to share their own experiences and critical perspectives on neural audio synthesis. We expect participants to contribute their own issues on the topic. If required, we envision approaching the following themes and issues to start the discussion:
- Creativity: What are the opportunities and caveats for musical expression raised by neural audio synthesis? (e.g., what can be defined as “new” sounds or “new” interactions with sound, and conversely, the risks of standardization in aesthetics)
- Design: How to include other communities of people in both interaction design and practice with neural audio synthesis-based tools? (e.g., non-ML experts, non-expert musicians, marginalised identities and communities)
- Critical perspectives: How to take responsibility for ethical issues raised by neural audio synthesis? (e.g., aesthetic biases in generative models, ownership of training datasets, human and computational infrastructures required for ML)

In addition to the workshop, a tutorial will be made publicly available on the workshop’s website so that NIME practitioners and researchers can prepare for their participation or catch up on what came out of the workshop.

This workshop aims to be open to both newcomers and experienced users, with particular attention to inclusivity and the artistic aspects of neural audio synthesis. A comprehensive list of articles, artistic tools, and tutorials will be provided on the support page for participants who wish to go further after the workshop.

Short Description (up to 70 words)

This workshop intends to merge technical, musical, and scientific perspectives to discuss new tools and prospects opened up by neural audio synthesis. It will include an introductory state-of-the-art, a hands-on session with generative tools developed by our team, and an open discussion on creativity, design, and critical perspectives on neural audio synthesis. A website will gather all presentations, tutorials, and outcomes of the workshop.

Organizers

List of organisers including short bios

Mattia Bergomi - VEOS Digital

Mattia is currently a principal investigator at Veos Digital, a company that operates in theoretical and applied artificial intelligence. Previously, he was a postdoctoral fellow in the Systems Neuroscience lab at the Champalimaud Centre for the Unknown, where he developed and applied machine learning algorithms. Mattia holds a Ph.D. in Computer Science with a specialization in Mathematics from the University of Milan and the Pierre et Marie Curie University in Paris.

Antoine Caillon - IRCAM

Antoine Caillon is a doctoral student in the ACIDS group. He specializes in neural audio synthesis and the real-time use of deep generative models inside regular digital audio workstations. He regularly collaborates with composers to integrate the latest advances in artificial intelligence into their pieces.

Axel Chemla--Romeu-Santos - IRCAM

Axel Chemla--Romeu-Santos is a postdoctoral researcher in the ACIDS group. He specializes in creative applications of audio machine learning models and in research & creation, with a strong focus on exploration and experimentation. He is also an electronic producer, composer, and instrumentalist.

Ninon Devis - IRCAM

Ninon Devis is a doctoral student in the ACIDS group and specializes in embedded neural audio synthesis devices. She notably works on the NeuroRack, a modular synthesis module that embeds neural generative models. She is also a DJ, electronic producer, and sound designer.

Constance Douwes - IRCAM

Constance Douwes is a doctoral student in the ACIDS group, as well as a DJ, musician, and electronic producer. She specializes in neural audio synthesis with a strong focus on ecology and ethics.

Philippe Esling - IRCAM

Philippe Esling is a professor at Sorbonne Université and leads the ACIDS group at IRCAM. He specializes in computer science and audio/symbolic machine learning, with a particular focus on computational creativity, real-time interaction, and embedded devices. He is also an electronic producer, performer, and sound designer.

Maxime Mantovani - IRCAM

Maxime Mantovani is a composer, performer, and improviser. He specializes in crafting innovative custom interfaces for audio synthesis, notably for using neural audio synthesis in real-time improvisation setups with instrumentalists.

Hugo Scurto - EUR ArTeC

http://hugoscurto.com

Hugo Scurto is a postdoctoral researcher, musician and designer. They completed a PhD in Machine Learning and Music Interaction at IRCAM, under the supervision of Frédéric Bevilacqua. They were also a visiting researcher at Goldsmiths, University of London, working with Rebecca Fiebrink on the Sound Control action research project. Hugo’s research employs art, design, and science to craft, prototype, and diffract machine learning in an ecology of music. Their practice consists of creating and performing with learning machines that reveal and reshape our musical entanglements with our environments.

Preferred Length of Workshop

The workshop will last half a day (4 hours). We propose two sessions to accommodate different time zones (see schedule below). We are open to suggestions from the NIME organising committee to adapt the proposed schedule and increase inclusiveness.

Time Zone         UTC-8 (Los Angeles)   UTC+1 (Paris)   UTC+13 (Auckland)
First Session     23:00-03:00           08:00-12:00     20:00-00:00
Second Session    11:00-15:00           20:00-00:00     08:00-12:00

Technical and Space Requirements

This workshop is designed to be fully online and therefore requires only a Zoom session (possibly with breakout rooms if separate groups are needed for the hands-on session).

Links to Supporting Media (optional)