Live-looping of distributed gesture-to-sound mappings

This thesis presents the development of a live-looping system for gesture-to-sound mappings built on a connectivity infrastructure for wireless embedded musical instruments using a distributed mapping and synchronization ecosystem. Following the recent trend of the Internet of Musical Things (IoMusT), I ported my ecosystem to an embedded platform and evaluated it in the context of the real-time constraints of music performance, such as low latency and low jitter. On top of the infrastructure, I developed a live-looping system through three iterations with example applications: 1) a wireless Digital Musical Instrument (DMI) with live-looping and flexible mapping capabilities, 2) an embedded loop synthesizer, and 3) a software harp synthesizer/looper with a graphical user interface. My final iteration is based on a novel approach to mapping, extrapolating from using Finite and Infinite Impulse Response filters (FIR and IIR) on gestural data to using delay-lines as part of the mapping of DMIs. The system features rhythmic time quantization and a flexible loop manipulation system for creative musical exploration. I release my tools as open-source libraries for building new DMIs, musical interactions, and interactive multimedia installations.


Introduction

Repetition as an aesthetic
In his paper (Reinecke, 2009), David Reinecke tells how the German music group Kraftwerk, dissatisfied with the lack of repeatable control over the timbre and volume of acoustic drums, started to use music sequencers to trigger drum sounds for their 1978 album, Die Mensch-Maschine. This music-making method became popularized through various genres within electronic dance music, such as house and techno. These genres are driven by electronic drums triggered by fixed-rate clocks, accompanied by short repeated phrases that are either synthesized or sampled (Wright, 2017). The timing accuracy and precision are aesthetically desirable, as Mark Butler writes (Butler, 2014): "[...] Mechanically precise repetition in this style is not a technologically induced aftereffect but rather a deliberately cultivated aesthetic strategy."
At the San Francisco Tape Music Center, composers Pauline Oliveros and Terry Riley also explored technology-driven repetition in music. In the 1950s, they did pioneering experiments with tape loop techniques and tape delay/feedback systems (Peters, 1996). These systems worked by stringing tape between two tape recorders and feeding the signal from the second machine back to the first, mixing incoming sound with the tape's previously recorded sound. Riley called this system the Time Lag Accumulator and used it in extended solo improvisations on the saxophone to create layers of short loops. Each layer would slowly fade away as the sound was attenuated before being re-recorded onto the tape. Later, digital looping devices reimplemented this concept. Digital memory replaced magnetic tape, and digital loopers are now available in much smaller form factors than magnetic tape recorders. These devices have become popular to form one-human-bands, i.e., a single musician takes on the role of a full band with drums, guitar, and vocals.

Digital Musical Instruments
The mapping between control and sound parameters is commonly divided into three basic strategies:
• One-to-one, a single control parameter is mapped to a single sound parameter.
• One-to-many (divergent), a single control parameter is mapped to multiple sound parameters.
• Many-to-one (convergent), multiple control parameters are mapped to a single sound parameter.
The mapping may be performed explicitly, where an instrument designer decides which control parameters should be mapped to which sound parameters. It may also be created with a generative mechanism such as a neural network that performs the mapping (Wanderley & Depalle, 2004).
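As a toy illustration, the three strategies above might be sketched as pure functions. All names and scaling choices here are hypothetical, not taken from any specific DMI:

```python
# Hypothetical sketches of the three basic mapping strategies.
# Parameter names and scalings are illustrative only.

def one_to_one(pressure):
    """A single control parameter drives a single sound parameter."""
    return {"amplitude": pressure}

def one_to_many(pressure):
    """Divergent: one control parameter drives several sound parameters."""
    return {"amplitude": pressure, "brightness": pressure ** 2}

def many_to_one(pressure, position):
    """Convergent: several control parameters drive one sound parameter."""
    return {"filter_cutoff": 200.0 + 5000.0 * pressure * position}
```

An explicit mapping layer is then simply a composition of such functions between the control and synthesis stages.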
A mapping may consist of multiple layers, in which processing or cross-coupling of the control parameters takes place.

Mapping and loop-based music
Mapping has been used in the context of synthesis engines, physical models (Wanderley & Depalle, 2004), and audio effects (Verfaille et al., 2006). In these contexts, mappings are mostly focused on facilitating what Malloch et al. categorize as skill-based performance: rapid, coordinated movements in response to continuous signals (Malloch et al., 2006). This type of performance often involves instruments with a high level of mapping transparency, i.e., the link between a performer's gesture and the resulting sound is clear to both the audience and the performer. As Fels states (Fels et al., 2002): "The more transparent a mapping is, the more expressive an instrument can be."
For musicians seeking the aesthetics of accurate and precise timing, this performance type requires a high skill level. On the other hand, existing tools for creating loop-based music such as music sequencers, samplers, and loopers offer beginners a low entry fee (Wessel & Wright, 2002). However, the control mapping of these tools is often opaque and challenging for the audience to infer. In this thesis, I will explore mapping in the context of loop-based music performance with the aspirational goal of creating instruments with a low entry fee and high mapping transparency.

Internet of Musical Things
The Internet of Musical Things (IoMusT) is an emerging research field bridging existing fields such as the internet of things, new interfaces for musical expression, and human-computer interaction (Turchet, Fischione, et al., 2018). An IoMusT ecosystem consists of three core components: 1) Musical Things, 2) Connectivity, and 3) Applications and services. The term 'Musical Thing' refers to a computing device on a network capable of sensing, acquiring, processing, actuating, and exchanging data serving a musical purpose (Turchet, Fischione, et al., 2018). The computing device may be in the form of a smart instrument, i.e., an instrument with embedded computational intelligence, wireless connectivity, embedded sound generation, and a system for feedback to the player (Turchet et al., 2017), a musical haptic wearable, or any other device serving a musical purpose. Connectivity refers to the wireless communication infrastructure, which should meet music performance constraints such as low latency, high reliability, and tight synchronization between devices. Applications and services refer to applications that can be built on top of the connectivity infrastructure. The applications may be targeted at both performers and the audience and may be either interactive or non-interactive. In their article, Turchet et al. present several scenarios this ecosystem could support, such as augmented and immersive concert experiences.
The audience can experience multi-modal concerts through devices such as vibrotactile wearables and synthesizers with loudspeakers embedded in clothes. Smart instruments can capture ancillary gestures of performers (Turchet, McPherson, et al., 2018), and these can be mapped in real-time to deliver tactile stimuli to audience members. The opportunities that arise from these technologies are manifold, and this thesis seeks to follow the trend by developing tools suited for IoMusT applications (Vieira et al., 2020) to open for new explorations in collaborative and participatory music interaction.

Structure of this thesis
This thesis has five chapters:
1. Introduction.
2. Literature review of gestural looping, where I review several looping tools and list the project's design requirements.
3. Design overview and connectivity infrastructure, where I describe the porting of a mapping and synchronization platform to an embedded device.
4. Implementation of gesture-to-sound live-looping, where I describe three iterations of implementing a looper application.
5. Conclusions and future work, where I discuss the perspectives on the work presented in this thesis.
Parts of this thesis are also presented in the forthcoming paper MapLooper: Live-looping of distributed gesture-to-sound mappings (Frisson et al., 2021).

Literature review of gesture-to-sound live-looping
In this chapter, several looping tools involving gesture-to-sound mappings are reviewed. The tools fall into two main categories: a) audio stream loopers with a mapping interface for controlling loop parameters, and b) control data stream loopers, where the looper itself can be considered a part of the mapping. The referenced tools exemplify different mapping strategies and gestural interfaces. A set of design requirements based on the review is listed at the end of the chapter.
[Figure: SoundCatcher (Vigliensoni & Wanderley, 2010). The performer holds the actuators, and the ultrasonic sensors are mounted to a microphone stand.]

SoundGrasp
SoundGrasp is a live-looping system, also with a mid-air gestural control interface (Mitchell & Heap, 2011). The interface consists of a glove, which controls the recording/playback state and parameters for two audio effects, reverb and echo. Postures are identified with an artificial neural network, and the identified postures are used as commands for controlling the looper. Sensor data is also mapped directly to audio effect parameters.

Control stream loopers
Looping devices based on control data streams are inserted between a control interface and a sound generator. From a mapping perspective, control stream loopers can therefore be seen as a mapping layer. Like audio stream loopers, control data is recorded into a buffer and played back in a loop. The control data could be in the form of MIDI or Open Sound Control (OSC) messages or analog control voltages (CV). Control stream loopers, being part of the mapping, have the advantage over audio stream loopers that mappings can be changed post-recording, making it possible to route the control data to different synthesis processes. In this section, a number of control stream loopers are reviewed: the open-source project MidiREX (Kvitek, 2014), the commercial product Midilooper (Instruments, 2020), the mobile app Ribn (Petrovic, 2018), the Eurorack module Tetrapad (Intellijel, 2018), and the virtual-reality system Drile (Berthaut et al., 2010).
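As a rough sketch of the idea, a minimal control stream looper might record messages into a buffer together with their position in the loop and replay them each cycle. Because only control data is stored, the output can be routed to a different synthesis process after recording. Class and method names below are illustrative, not taken from any of the reviewed tools:

```python
# Minimal sketch of a control stream looper (hypothetical names).
# Messages are stored with their loop position; playback replays them
# every cycle, and the `send` destination can be swapped post-recording.

class ControlLooper:
    def __init__(self, loop_length):
        self.loop_length = loop_length  # loop length in beats
        self.buffer = []                # (beat_position, value) pairs

    def record(self, beat, value):
        """Store a control value at its position within the loop."""
        self.buffer.append((beat % self.loop_length, value))

    def play(self, beat, send):
        """Replay every message stored at the current loop position."""
        pos = beat % self.loop_length
        for stored_pos, value in self.buffer:
            if stored_pos == pos:
                send(value)
```

Passing a different `send` callable re-routes the same recorded gesture to a different destination, which is exactly the post-recording flexibility the text describes.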

MidiREX and Midilooper
MidiREX is an open-source MIDI looper (Kvitek, 2014). Random modulation has become an increasingly popular feature of music sequencers as a tool for "humanization" (Rodgers, 2003). Cascone characterizes this trend as the era of "post-digital" music, defined by the aesthetics of failure, where musicians insert audible glitches into their music as a means of expression (Cascone, 2000). On the Midilooper, the random velocity feature, labeled "human velocity", can add dynamic variation to the recorded loops. Both MidiREX and the Midilooper have MIDI clock synchronization, and the Midilooper additionally features analog clock synchronization.

Ribn and Tetrapad
The two other looping tools, Ribn and Tetrapad, work differently, as the interface for control input is contained within the system; both devices have a touch interface. Ribn runs on iOS and uses the mobile device's touch screen as its gestural interface. Up to eight sliders can be added to the screen simultaneously, and each slider sends a single MIDI control change message. Recording starts when a slider is touched and ends when the slider is released. The gesture's playback starts immediately after recording, and loop lengths cannot be changed after recording.
Tetrapad is a Eurorack module with four dedicated touch interfaces that sense both position and pressure, allowing for two-dimensional gesture recordings. Loops can be synchronized externally by an analog clock signal. Tetrapad has eight control voltage outputs that can be patched to any parameter within a Eurorack system. With the Tête expander module, recorded sequences can be quantized in both time and value, with the possibility of quantizing control voltage outputs to a selection of musical scales.

Drile
Drile is a virtual reality-based live-looping system. A bi-manual 6-DoF controller is used to create loops and control audio effects in a 3D space. Like SoundCatcher, the controls are explicitly mapped. Unlike the other looping tools, Drile has both audio and control streams. The looping system is built on the concept of hierarchical live-looping, an alternative to traditional live-looping.
Hierarchical live-looping is based on a hierarchical tree structure for grouping loop layers, in contrast to traditional live-looping, where loop layers are arranged in a flat structure (see fig. 2.5).
This structure has the advantage that hierarchical musical structures can be represented and controlled in useful ways. For instance, a loop node can have children nodes for a bass-loop, a drum-loop, and a keyboard-loop. Triggering the root node causes all children nodes to start playback. Children nodes can be played independently to "solo" a particular node. Multiple trees of nodes can be created to represent different sections of a piece.
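The tree structure described above can be sketched as follows. This is a hypothetical minimal model of hierarchical live-looping, not Drile's actual implementation:

```python
# Sketch of hierarchical live-looping: loop layers are grouped in a
# tree, and triggering a node starts playback of its whole subtree.

class LoopNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.playing = False

    def trigger(self):
        """Start this node and, recursively, all of its children."""
        self.playing = True
        for child in self.children:
            child.trigger()

# One section of a piece: triggering the root starts bass, drums, and keys.
root = LoopNode("section-A",
                [LoopNode("bass"), LoopNode("drums"), LoopNode("keys")])
```

Triggering `root.children[0]` alone would "solo" the bass node, while a second tree could represent another section of the piece, mirroring the uses described above.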

Summary
The reviewed projects share several common properties.

Design requirements
The review above acted as a guide in determining the design requirements of this project. To summarize, the looper should:
• be able to record and loop control stream data from different gestural controllers.
• have an open-ended gestural interface for the loop controls.
• support time quantization.
• be able to manipulate loops by random modulation.
• support both explicit and machine learning mapping strategies.
• support both traditional and hierarchical live-looping.
• be able to run on a wireless embedded device.

Design overview and connectivity infrastructure
To build applications for live-looping satisfying the design requirements that I elicited in section 2.4, I first need to develop a connectivity infrastructure for wireless mapping and synchronization. In this chapter, I describe the process of porting existing libraries for mapping and synchronization to a wireless embedded platform. Before diving into the details of the infrastructure implementations, I give an overview of the final instrument's design.

Design overview
The instrument designed in this project consists of several modules which can be combined in different hardware and software configurations. The modules are:
• Sensor: acquires gesture input.
• MapLooper: makes mappings between sensor data and synthesis parameters (mapping). The module can play back earlier sensor data readings (looping). The playback is synchronized with other sequencers/loopers (synchronization).
• Synthesizer: produces sound output using a synthesis engine with input parameters.
The modules can be embedded within a single hardware device as seen in fig. 3.1. This is the stand-alone configuration of the instrument, where no external systems are required to produce sound. The synthesis engine can be moved to a desktop computer for more powerful audio processing. Additionally, the mapping module can be moved to the computer, allowing any sensor hardware to be used as the instrument's controller. The external synthesis and mapping configuration is seen in fig. 3.2. Finally, several instances of the instrument may be used together.
In this configuration, data (sensor data, sound parameters, and synchronization) is shared through the mapping module allowing for a modular approach to instrument building, where an instrument instance represents a subsystem of a complete instrument. The multi-instance configuration is illustrated in fig. 3.3. In all of these configurations, a transport to communicate data between modules is needed. The transport can be either wired or wireless. In this project, I chose a wireless transport to accommodate compatibility with the wireless DMI, T-Stick Sopranino (Nieva et al., 2018), and laptops without ethernet ports.

Mapping framework
To build a looper with advanced mapping capabilities, I chose the mapping software libmapper (Malloch et al., 2015) as the main building block. Among other software candidates, OSSIA (Celerier et al., 2015) also has advanced mapping capabilities, but OSSIA has more dependencies, making it more challenging to implement on an embedded device.
libmapper is an open-source, cross-platform software library for making connections between data signals on a shared network. libmapper builds on a distributed approach to mapping: devices declare their signals on the network, and maps between signals can be created and modified at run time by any peer.

Synchronization framework
For tight synchronization on wireless networks, I chose to use Ableton Link (Goltz, 2018). Ableton Link is an open-source C++ library for synchronizing tempo, beat, phase, and start/stop commands on wireless and wired networks. Like libmapper, Ableton Link is a peer-to-peer technology and uses multicast for the discovery of peers. The distributed approach allows everyone to change the tempo in a session, removing the main/secondary configuration step of technologies such as MIDI beat clock or OSC sync (Madgwick et al., 2015). Additionally, having phase synchronization makes Ableton Link more capable than the MIDI beat clock protocol, which only synchronizes tempo and sends start/stop messages. Finally, Turchet et al. mention Ableton Link as a candidate for becoming a standard for music synchronization for IoMusT devices (Turchet, Fischione, et al., 2018). Ableton Link is supported on Windows, macOS, and Linux.

Embedded platform
For the wireless embedded platform, I chose the ESP32 microcontroller. Another viable platform, the Raspberry Pi Zero W, could also have been used, but as the ESP32 is small, cheap, and powerful enough for digital signal processing (Michon et al., 2020), it was the chosen platform. Other projects from and with close collaborators, such as the DMI T-Stick Sopranino (Nieva et al., 2018), the 1-DOF rotary force-feedback device for DMIs TorqueTuner (Kirkegaard, Bredholt, Frisson, et al., 2020), and the algorithmic sequencer T-1, also employ an ESP32, making these projects compatible with the mapping and synchronization infrastructure developed in this thesis.

libmapper-arduino
To add libmapper support for the T-Stick Sopranino DMI, I implemented an Arduino version of the libmapper library. Arduino libraries are distributed as source and header files and a text file containing metadata for the library. The Arduino library specification ("Library Specification - Arduino CLI", 2020) has strict directives on structuring the library's source files. The source file structure of libmapper and liblo was not compatible with these directives. According to the specification, a library can alternatively be distributed as a precompiled static library. This solution was more flexible than the specification's directives on source code structuring, so I wrote a build system that generates the Arduino library as a collection of header files together with a precompiled static library.

Testing
After the porting was completed, I measured round-trip latency, jitter, and packet loss of signals sent using the library. The test setup consisted of an ESP32-WROVER-KIT development board (Espressif, 2020a) running the libmapper-esp library. In the firmware on the ESP32, an input and an output signal were created, and the input signal handler was set to forward incoming data to the output signal. Test software running on a MacBook Pro (16-inch, 2019) with macOS 10.15 sent a 100 Hz signal to the ESP32, and the time between sending and receiving the data was measured. The ESP32 was running in access-point mode, i.e., it created an access point, and the computer was connected to this access point through WiFi. When measuring the performance of WiFi applications, the results can be highly affected by the environment (Grigorik, 2013): WiFi provides no bandwidth or latency guarantees, and nearby WiFi traffic can have a high impact on performance (Grigorik, 2013). I did the test in a low-density rural area (Gudmindrup Strand, Denmark), where minimal disturbance could be expected. Three more measurements were taken at increasing rates to test latency, jitter, and packet loss at different signal rates. A histogram of the results is seen in fig. 3.8. I found that the system has significant packet loss for signals at 500 Hz; for signals at 1000 Hz, the packet loss is substantial, with 55% of packets being dropped. The jitter also increases with frequency, which can be observed in the increase of the standard deviation of the latency listed in table 3.3. There is no significant change in the mean latency between signals at 100 Hz and 200 Hz; for signals at 500 Hz, the latency increases by a factor of 3. Signals at 500 Hz and 1000 Hz had similar performance in terms of latency and jitter, but the packet loss increases from 7.8% at 500 Hz to 55% at 1000 Hz (cf. table 3.3).
I also found that the ESP32 has a WiFi power-saving feature enabled by default, and disabling this feature had a significant impact on low-latency performance. I did measurements with power saving both enabled and disabled; the mean round-trip latency with power saving enabled was 406 ms.
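The statistics reported above can be computed from matched send/receive timestamps roughly as follows. The function and the data below are made up for illustration; they are not the actual test software:

```python
# Sketch of the round-trip statistics used in the test: mean latency,
# jitter as the standard deviation of latency, and packet loss as the
# fraction of sent packets that never came back. Data is illustrative.
import statistics

def round_trip_stats(sent, received):
    """sent/received: dicts mapping packet id -> timestamp in ms."""
    latencies = [received[i] - sent[i] for i in sent if i in received]
    loss = 1.0 - len(latencies) / len(sent)
    return statistics.mean(latencies), statistics.stdev(latencies), loss

sent = {0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0}
received = {0: 4.5, 1: 15.0, 2: 25.5}          # packet 3 was dropped
mean, jitter, loss = round_trip_stats(sent, received)
```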

Result evaluation
This project's results for the embedded libmapper implementation were slightly better than previous studies by Wang et al. (Wang et al., 2020), who conducted tests of latency and jitter with OSC communication over WiFi using ESP32. The study measured a mean 6.62 ms round-trip latency, which is slightly higher than the 4.81 ms measured in this project. The difference can be attributed to the computer being connected through an access point created by the ESP32.
Wang et al. found that there are several issues with using WiFi for DMI performances. First of all, the mean end-to-end latency increases as the number of devices increases. For instance, it was found that for 12 devices sending messages at 100 Hz, the latency was 16.45 ms compared to 6.62 ms for a single device. For a single device, though, this is well below the upper bound of 10 milliseconds on the computer's audible reaction to gesture, proposed by Wessel and Wright (Wessel & Wright, 2002). Additionally, they found that the latency varies significantly with the amount of WiFi traffic in the environment.
They conclude that WiFi has been demonstrated to work well under certain conditions, but that for timing-critical applications such as DMI performance, wired connections are preferable to WiFi. This issue is further discussed in section 5.1.

Ableton Link
This section describes the Ableton Link library's technical details and the components of the Ableton Link port for the ESP32. Ableton Link works by establishing a global host time, which serves as a shared reference between all peers in a session. The global host time works as a shared timeline from which the number of elapsed beats can be calculated. Peers joining a session use ping-pong messaging to calculate their offset to the global host time. Throughout a session, peers measure their offset periodically to stay synchronized.
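The ping-pong offset estimate can be sketched with the general round-trip formula below (the same idea as NTP-style clock synchronization). Ableton Link's actual message format and filtering are not shown here; this is only the underlying arithmetic:

```python
# Sketch of ping-pong clock offset estimation. Assumes symmetric
# network delay on the ping and pong legs (the NTP-style assumption).

def clock_offset(t0, t1, t2, t3):
    """t0: ping sent (local clock), t1: ping received (remote clock),
    t2: pong sent (remote clock), t3: pong received (local clock).
    Returns the estimated offset of the remote clock relative to ours."""
    return ((t1 - t0) + (t2 - t3)) / 2.0
```

For example, with a remote clock 100 ms ahead and a 5 ms one-way delay, the two legs measure 105 ms and -95 ms, and the estimate recovers the 100 ms offset.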

Ableton Link API
The main component of the Ableton Link API is the SessionState. This component holds the global Ableton Link session state, i.e., the global timeline, the tempo, and the playback state.
When using Ableton Link within the DSP loop of music applications, real-time safety must be guaranteed to avoid audio buffer underruns. However, the system API calls through which Ableton Link connects to the network are not real-time safe. Therefore, a copy of the session state is saved such that the audio thread can access Ableton Link without waiting for the system calls.
Furthermore, the session state copy cannot be protected by locks, as this would also compromise real-time safety. Therefore, Ableton Link has two methods for capturing the session state: captureAudioSessionState() for the audio thread and captureAppSessionState() for the remaining threads.

Platform modules
Ableton Link relies on the C++ Standard Template Library (STL) and the Asio C++ Library (Kohlhoff, 2020) for cross-platform networking. ESP-IDF supports the STL, and Asio has been ported to ESP32 by Espressif (Cermak, 2020).

Context
The Context module allows for the asynchronous operation of Ableton Link. The module was implemented based on FreeRTOS tasks. The Context module implements a task that repeatedly calls the poll() function of an Asio io_service, which handles the network communication.
Other modules can call the async() function of the io_service, giving a function handle as the argument. The referenced function will be executed within the task of the Context module.
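The poll/async pattern can be sketched as below. This is a simplified single-threaded model of the Context module; the real implementation uses a FreeRTOS task repeatedly polling an Asio io_service:

```python
# Simplified model of the Context module's pattern: one task repeatedly
# polls a work queue, and other modules post handlers onto it.
from collections import deque

class Context:
    def __init__(self):
        self._queue = deque()

    def async_(self, handler):
        """Called from any module; the handler will run inside the
        context task on the next poll."""
        self._queue.append(handler)

    def poll(self):
        """Called repeatedly from the context task; drains the queue."""
        while self._queue:
            self._queue.popleft()()
```

This decouples the callers from the networking task: handlers are executed where the io_service runs, not where they were posted.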

Fig. 3.12: The ESP32 implementation of the LockFreeCallbackDispatcher module.

Random
The Random module generates a random identification string for the peer. The ESP-IDF provides the system API function esp_random() that samples noise from the WiFi radio to generate a truly random number. As the identification string needs to be unique for each reboot of the ESP32, this function provides a suitable implementation.

ScanIpIfAddrs
The ScanIpIfAddrs module retrieves information about the available network interfaces on the system. The ESP-IDF provides the ESP-NETIF API, an abstraction layer for network interfaces on the ESP32. This abstraction layer allows Ableton Link to work with WiFi, Ethernet, or a custom interface implementation (e.g., serial over USB) through the ESP-NETIF Custom I/O Driver. The implementation of ScanIpIfAddrs iterates through all available interfaces, checks whether each interface is enabled, retrieves the interface's IP address, and stores the address in a std::vector. In some cases, seemingly when the WiFi connection was weak, the retrieved IP address was "0.0.0.0", which caused the system to crash. I implemented a check so that this address is not accepted. With these modules implemented, Ableton Link compiled and ran successfully on the ESP32.
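The filtering step can be sketched as follows. This is a simplified model; the actual implementation iterates ESP-NETIF interface handles in C++:

```python
# Sketch of the interface scan described above: interfaces that are
# down, or that report the invalid address "0.0.0.0", are skipped.

def scan_addresses(interfaces):
    """interfaces: list of (enabled, ip_string) pairs.
    Returns the list of usable IP addresses."""
    return [ip for enabled, ip in interfaces
            if enabled and ip != "0.0.0.0"]
```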

Testing
To test the port of Ableton Link, I created a test setup for measuring the delay between peers. The test setup consisted of two computers (MacBook Pro (16-inch, 2019) and MacBook Pro (15-inch, 2018), both running macOS 10.15) and an ESP32, all connected to a RIGOL DS1054 oscilloscope.
Two probes were connected to an audio jack from the headphone output of each of the computers.
The probe for the ESP32 was connected to a GPIO. The computers synthesized a pulse signal through Ableton Live (Ableton, 2020), and the ESP32 ran test software outputting a pulse on a GPIO. All devices were connected through an Ableton Link session and output a periodic pulse on every quarter note at 120 BPM. A plot of the measurements is seen in fig. 3.15. I found that the ESP32 performs similarly to the two computers in terms of inter-onset delay. The minimum, maximum, and average delay between the ESP32 and Computer 1 is seen in table 3.6. The test lasted 10 minutes, and the average delay between Computer 1 and the ESP32 was 3.03 ms.
Fig. 3.15: Oscilloscope measurement of an Ableton Link session consisting of two computers and an ESP32. All peers output a pulse signal at every quarter note at 120 BPM.

Summary
This chapter described the porting of libmapper, a library for mapping, and Ableton Link, a library for synchronization, to the wireless microcontroller ESP32. Both libraries were tested on the microcontroller for performance, focusing on relevant parameters for real-time music performance.
For libmapper, a round-trip latency of 4.81 ms for signals at 100 Hz was measured. For Ableton Link, an average inter-onset delay of 3.03 ms was measured. The round-trip latency is well below the upper bound of 10 milliseconds on the computer's audible reaction to gesture, proposed by Wessel and Wright (Wessel & Wright, 2002). The inter-onset delay was comparable to devices using the existing Ableton Link implementation for desktop computers. The libraries formed the connectivity infrastructure for building the application described in the next chapter.

Implementation of gesture-to-sound live-looping
This chapter describes the implementation of our gesture-to-sound looper using the previous chapter's connectivity infrastructure. The looper was implemented through three iterations: two early prototypes, followed by the final implementation.

Early prototypes
This section describes the implementation of two early prototypes. The initial prototype extended the T-Stick DMI by integrating an MPE-based looping tool into the firmware. The looping tool was extended in the second prototype to handle arbitrary signal types.

T-Stick looper
The T-Stick (Malloch & Wanderley, 2007) is a family of DMIs consisting of a long plastic tube incorporating sensors for motion and touch. Specifically, a T-Stick has copper touch strips for capacitive touch sensing, an inertial measurement sensor (an accelerometer or, more recently, a 9-DOF IMU), a long force-sensing resistor, and a piezoelectric sensor. The T-Stick Sopranino is seen in fig. 4.1. Integrating the looping tool into the instrument made it straightforward to prototype mappings for a music performance with the looper (see fig. 4.3).
MPE represents the three dimensions of control with the following types of MIDI messages: pitch bend for pitch, channel pressure for pressure, and control change 74 for timbre. By assigning a separate MIDI channel to each note's Note on/off messages, the expression messages can be sent on that note's channel, thereby achieving polyphonic control of the pitch and timbre of individual notes.
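The per-note channel assignment can be sketched as below. This is a hypothetical minimal allocator, not the T-Stick looper's actual code; channel numbering follows the common MPE convention of one master channel plus member channels:

```python
# Sketch of MPE-style per-note channel assignment: each new note is
# given its own member channel so that expression messages (pitch bend,
# pressure, timbre) apply to that note alone. Channel 1 is assumed to
# be the master channel; 2-15 are member channels (illustrative).

class MPEAllocator:
    def __init__(self, member_channels=range(2, 16)):
        self.free = list(member_channels)
        self.active = {}                  # note number -> channel

    def note_on(self, note):
        """Allocate a free member channel for this note."""
        channel = self.free.pop(0)
        self.active[note] = channel
        return channel

    def note_off(self, note):
        """Release the note's channel back to the free pool."""
        channel = self.active.pop(note)
        self.free.append(channel)
        return channel
```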
The T-Stick looper sends MIDI messages to the VST-host over Bluetooth using the MIDI over Bluetooth Low Energy (BLE-MIDI) specification (The MIDI Manufacturers Association, 2015).
The BLE-MIDI implementation allowed the T-Stick looper to be used with both desktop and mobile platforms.

Implementation
The implementation of the T-Stick looper involved several steps. First, I ported libmapper to the ESP32 as described in section 3.5.1. The port allowed the integration of libmapper into the firmware for the T-Stick Sopranino (Nieva et al., 2018). Second, I implemented a looping module consisting of 3 loop layers, each with a 2-bar-long buffer. I integrated libmapper into the module and added an input signal for each layer. The input signals were sampled at a rate synchronized through Link. In the music performance, Link was used to synchronize the looping module with a pre-recorded drum sample played back on a computer. As the VST synthesizer that I used only created sound after receiving a MIDI Note on message, I added a note trigger to the module. The note trigger sent a Note on message at the beginning of each loop, re-triggering the synthesizer.

Performance mapping
For the performance (fig. 4.3), I created a mapping from the T-Stick looper to the VST synthesizer.

Advantages and limitations
The T-Stick looper worked well as a quick prototype system for testing different mappings to VST plugins. However, the MIDI resolution of the 7-bit message was limiting for musical expression, and the BLE-MIDI interface only allowed the output to be routed to a single device. Also, the looper was limited to 3 layers due to the adherence to the MPE standard. Several workarounds such as adding more voices or adding control change layers were considered, but it was deemed that a more general solution would integrate better with libmapper.

Hash table sequencer
To overcome the limitations of the T-Stick looper, I created a second prototype. This prototype was able to record any number of layers and supported 32-bit floating-point resolution. Additionally, the sequencer used a more open-ended data representation, inspired by the data translation approach of libmapper (Malloch et al., 2015). A Signal class was created as a simple wrapper around libmapper signals to be used as a reference during playback. Each recorded Frame stored the name of its signal as a string; when the sequencer encountered a Frame with a value that needed to be updated, the string was used to look up the right Signal and update its value through libmapper. As with the T-Stick looper, the input signal values were sampled at a rate synchronized with Link. In the block diagram fig. 4.4, this is represented by the Clock class.
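The lookup mechanism can be sketched as follows. This is a simplified model; in the real prototype, Signal wraps a libmapper signal and updates go through the libmapper API:

```python
# Sketch of the hash table sequencer's data representation: frames
# store a signal name, and playback resolves the name to the signal
# object whose value should be updated. Names are illustrative.

class Signal:
    def __init__(self, name):
        self.name = name
        self.value = None

class Sequencer:
    def __init__(self, signals):
        self.signals = {s.name: s for s in signals}  # the hash table
        self.frames = []                             # (tick, name, value)

    def record(self, tick, name, value):
        self.frames.append((tick, name, value))

    def play(self, tick):
        """Apply every frame recorded at this tick to its signal."""
        for frame_tick, name, value in self.frames:
            if frame_tick == tick:
                self.signals[name].value = value
```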

Advantages and limitations
The hash table sequencer added a more flexible data representation and better resolution than the T-Stick looper. The prototype's main limitation was that it did not support signal vectors and signal instances. Support for signal vectors could readily be added, but support for signal instances would require a new implementation of the functionality already existing in libmapper.

Final implementation
After developing the first two prototypes, I found that a more complete integration with libmapper was possible by implementing the loop mechanism within libmapper's map expression system.
The theory behind the final implementation, which I called MapLooper, is explained in the following sections. The MapLooper software is available on GitHub at https://github.com/mathiasbredholt/MapLooper.

Looping with a delay-line
As mentioned in section 3.2, libmapper has support for FIR and IIR filtering of signals. Discrete-time systems can be implemented as part of mappings by entering their difference equations into the map expression, allowing filtering techniques such as low- and high-pass filtering. A digital delay-line is a special case of an FIR filter; by adding feedback to the delay-line, the system becomes an IIR filter, and a digital looper can be built.
Extrapolating from the existing libmapper support for FIR and IIR filtering to delay-lines with feedback is the core idea behind the final implementation.
The discrete-time system implementing a digital looper can be expressed as a linear interpolation between an input x[n] and a delayed output term y[n − D], with the interpolation factor given by a record signal r[n]:

y[n] = r[n] · x[n] + (1 − r[n]) · y[n − D]

A block diagram of this system is seen in fig. 4.6. For most live-looping devices, the record/playback state is binary, and the signal r[n] is an integer, either 0 or 1. When r[n] = 0, only the delay-line output is passed to the system's output. When r[n] = 1, the input is passed directly to the output and into the delay-line, thereby being recorded. For 0 < r[n] < 1, a form of overdubbing can be achieved, as the input is mixed with the delayed output. However, at the time of implementation, delay-lines in libmapper were non-interpolating in terms of delay-length, and hence the delay-length was integer only. I solved this issue by sampling the input at a rate given by an integer subdivision of the tempo, ensuring that delay-lengths were always an integer multiple of the loop-length in beats. A sample-and-hold structure was added to the system to implement tempo-synchronized sampling: a clock signal c[n] synchronized with the tempo triggers a sampling of the input signal x[n]. The rate of the clock determines the quantization. For analog synchronization systems, this rate is commonly given in the unit pulses per quarter note (PPQN). In the final implementation, the clock is synchronized with Ableton Link. A block diagram of this system can be seen in fig. 4.7.
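A minimal Python sketch of this difference equation, y[n] = r[n]·x[n] + (1 − r[n])·y[n − D], operating at the quantized clock rate (this is an illustration, not the libmapper map expression itself):

```python
# Sketch of the delay-line looper. One call to process() corresponds to
# one clock pulse, so the delay D is an integer number of pulses.

class DelayLineLooper:
    def __init__(self, delay):
        self.D = delay
        self.y = [0.0] * delay    # circular buffer holding the last D outputs
        self.n = 0

    def process(self, x, r):
        """y[n] = r*x[n] + (1 - r)*y[n - D] for one clock pulse."""
        idx = self.n % self.D
        y_delayed = self.y[idx]           # y[n - D]
        y = r * x + (1.0 - r) * y_delayed
        self.y[idx] = y                   # store y[n] for reuse D pulses later
        self.n += 1
        return y
```

With r = 1 the input overwrites the buffer (recording); with r = 0 the buffer circulates unchanged (playback); intermediate values of r mix the input with the delayed output, giving the overdub-like behaviour described above.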

Loop manipulation
As specified by the design requirements (section 2.4), a system for manipulating recorded sequences was added. A simple modulation system based on the sample-and-hold structure was implemented. The modulation source was a uniform noise generator, sampled at the same rate as the input.
The modulation signal was added to the system within the feedback path, so that an input sequence could be recorded, after which modulation could be applied to make the sequence slowly 'evolve' over time. A block diagram of the system can be seen in fig. 4.8.
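One way to picture the modulated feedback path is the sketch below (pure Python, hypothetical class name): a sampled noise term m[n] is added to the delayed output inside the feedback, so that with r[n] = 0 and a small modulation amount, each pass around the loop drifts slightly.

```python
# Sketch of modulation in the feedback path:
#   y[n] = r[n]*x[n] + (1 - r[n]) * (y[n - D] + m[n])
# where m[n] is uniform noise sampled at the same rate as the input.

import random

class EvolvingLooper:
    def __init__(self, delay, mod_amount=0.0, seed=0):
        self.D = delay
        self.y = [0.0] * delay
        self.n = 0
        self.mod_amount = mod_amount
        self.rng = random.Random(seed)    # seeded for reproducibility

    def process(self, x, r):
        m = self.mod_amount * self.rng.uniform(-1.0, 1.0)
        idx = self.n % self.D
        y = r * x + (1.0 - r) * (self.y[idx] + m)  # noise inside feedback
        self.y[idx] = y
        self.n += 1
        return y
```

Because the perturbed value is written back into the delay-line, the deviations accumulate across cycles, which is what makes the sequence evolve rather than merely jitter.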

Implementation details
The record signal represents the r[t] signal in fig. 4.6. The length and division signals determine the length D of the delay-line in fig. 4.6 through the relation D = length × division. The modulation signal represents the m[t] signal in fig. 4.8. The mute signal was added to control whether the output from local/recv propagates to the output signal. When the Loop instance is initiated, a convergent map is created between the control signals, the local/send signal, and the local/recv signal. A map expression is created for the map, describing the system in fig. 4.8. In the Loop class update method, the input is sampled at a rate synchronized with Link. The sampled value is sent to the local/send signal, and the map expression is evaluated. Finally, if the Loop instance is not muted, the value of the local/recv signal is copied to the output signal. By mapping a gestural controller to the input signal and a sound generator to the output signal, a DMI with looping capabilities can be implemented.

Local signals
The local signals' purpose is to control the sampling, propagation, and life-time of the map. In libmapper, maps are updated when an input signal is updated. Without a local signal, the gestural controller would be responsible for controlling the sampling rate, which is undesirable as the looper should control the time quantization. Additionally, the local signals allow the Loop class to control whether the signal propagates to the output. Finally, in libmapper, maps are destroyed when one of their signals is removed, removing the map's buffer. The local signals prevent the map from being destroyed when devices go online and offline throughout a session.

Memory requirements
The libmapper API supports 3 data types for signals: 32-bit integers, 32-bit single-precision floats, and 64-bit double-precision floats.

Auto-mapping
During development, it proved useful to create default mappings for the example applications automatically. A simple auto-mapping system was created for setups where mappings between signals on multiple physical devices were needed. For each input and output signal of Loop, a function taking a string as an argument was created. The function creates a subscriber to the libmapper graph that looks for newly found signals. When a signal is found, its name is compared to the string provided as the argument. If the strings are equal, a map is created. This functionality makes it possible to create setups with default mappings to other libmapper-enabled devices.
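The matching logic can be sketched as follows (pure Python with hypothetical names; in the real system the callback would come from a libmapper graph subscription and the map would be a libmapper map):

```python
# Sketch of the auto-mapping idea: register a wanted signal name, and
# create a map when the graph reports a newly found signal whose name
# matches exactly.

class AutoMapper:
    def __init__(self):
        self.pending = []   # (wanted_name, local_signal) pairs
        self.maps = []      # created (remote_name, local_signal) maps

    def map_when_found(self, wanted_name, local_signal):
        """Request a default map to a signal that may appear later."""
        self.pending.append((wanted_name, local_signal))

    def on_signal_found(self, remote_name):
        """Graph subscriber callback for newly discovered signals."""
        for wanted, local in self.pending:
            if remote_name == wanted:
                self.maps.append((remote_name, local))
```

Exact string comparison keeps the behaviour predictable: a device only auto-maps to signals it was explicitly told to look for.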

Graphical user interface
For testing the implementation with a user interface, a cross-platform desktop GUI application was created. The GUI is based on the JUCE framework ("JUCE: class index", n.d.) and contains six sliders and a button, as seen in fig. 4. A musical demo of the GUI was made by mapping the output signal to a harp synthesizer implemented in SuperCollider. The harp synthesizer was based on a Karplus-Strong string model. It had two input signals, frequency and amplitude, which controlled the frequency and amplitude of the string model. The frequency input was quantized to a melodic scale within the synthesizer, and a slope detector on the quantized frequency triggered the string excitation. As a result, moving the input slider triggered melodic notes along the range of the slider. The interaction felt similar to sliding fingers over the strings of a harp. The SuperCollider code for the demo is seen in fig. 4.13.
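A rough Python sketch of the two ideas behind the demo (the actual demo is written in SuperCollider, so names and parameters here are illustrative): a Karplus-Strong string as a noise-seeded delay-line with an averaging filter in the feedback, and a slope detector that emits a trigger whenever the quantized frequency changes.

```python
# Sketch of the Karplus-Strong string model and the slope detector used
# to trigger excitations. Sample rate and damping are assumptions.

import random

def karplus_strong(freq, sample_rate=44100, n_samples=2000, seed=0):
    """Return n_samples of a decaying plucked-string tone near freq Hz."""
    rng = random.Random(seed)
    period = int(sample_rate / freq)
    buf = [rng.uniform(-1.0, 1.0) for _ in range(period)]  # noise excitation
    out = []
    for i in range(n_samples):
        j = i % period
        out.append(buf[j])
        # Two-point averaging in the feedback loop damps the string.
        buf[j] = 0.5 * (buf[j] + buf[(j + 1) % period])
    return out

def slope_detector(values, threshold=1e-9):
    """Return indices where the quantized input changes: note triggers."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > threshold]
```

Because the frequency is quantized to a scale before the slope detector, sweeping the slider produces a staircase of values, and each step of the staircase re-excites the string, which is what gives the harp-glissando feel.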

Embedded sound synthesis
I also created a proof-of-concept demo of using the looper with embedded sound synthesis, implementing the stand-alone configuration described in section 3.1. The demo was based on the ESP32 LyraT board (Espressif, 2020b), which contains an ESP32 WROVER module and an audio codec chip, along with 1/8-inch TRS connectors for headphones and auxiliary audio input. An image of the board can be seen in fig. 4.14. The demo project, which is available at https://github.com/mathiasbredholt/MapLooper-faust, uses Faust (Orlarey et al., 2009) for compiling a DSP program to the LyraT board, which is supported by the Faust compiler (Michon et al., 2020). The DSP program used in the demo is listed in fig. 4.15. The program generated pink noise and passed it through a Moog-style voltage-controlled filter emulation. The program had three parameters: the cutoff frequency and resonance of the filter, and the gain of the output. A Loop was created for each parameter, mapped to a libmapper signal that updated the parameter when receiving a value. A random number generator sent an input signal to each of the loop layers. The record signal was 1.0 when the program started and was set to 0.0 after 10 seconds. The program continued indefinitely, repeating the same 1-bar sequence. A block diagram of the demo program is seen in fig. 4.16.

Testing
For verifying and testing the final implementation, I created testing software for controlling and logging signals. The software created an instance of Loop with a loop-length of 2 beats. A test signal, a single ramp, was passed to the input of the loop. The record signal was set to 1.0 for the duration of the loop. At the beginning of the next cycle, the record signal was set to 0.0. After the next cycle, the modulation signal was set to 0.1, and the noise modulated the ramp. After two cycles, the modulation signal was set to 0.0, and the cycle repeated unchanged until the program ended. The test was run with two different time quantization levels, 16 PPQN and 2 PPQN. All signals were logged and saved to a file. Plots of the tests can be seen in fig. 4.17.

Advantages and limitations
The final implementation had several advantages over the early prototypes. First, the solution was more scalable, as the implementation through map expressions added support for signal vectors and signal instances. All libmapper signal types could also be used for control stream looping, allowing for looping integer sequences. Also, the map expression interface allowed flexible mapping configurations for loop manipulation. The random modulation implementation could be changed to use any modulation signal by merely changing the map expression.
One limitation compared to the hash table sequencer concerns timing: as the map expression delay references the previous sample, the map needs to be updated once for every time quantization step. If an update is skipped, the phase is skewed relative to the timeline of Link.
The phase skew could be solved by driving the read pointer with Link, but this is not currently possible with the libmapper API. Finally, using local maps is a workaround that might confuse users; ideally, all map updates could be controlled by a clock source such that updates happen at regular intervals. If maps could persist when their signals are removed, rerouting loops' outputs to new synthesis processes would be possible.

[Fig. 4.17 caption: Plot of signals from the test and verification software. The y-axis represents the unit-less signal value. The top plot has a time quantization of 16 PPQN; the bottom plot, 2 PPQN. After 4 beats, the modulation signal is set to 0.1, and the output signal is modulated by uniform noise. After 8 beats, the modulation signal is set to zero, and the output signal repeats.]

Summary
In this chapter, I have described the implementation of a gesture-to-sound looper built on the infrastructure presented in chapter 3. Two early prototypes were built, the first based on MPE message streams, and the second based on hash tables, before settling on a final implementation based on a delay-line model using libmapper map expressions. I have discussed the advantages and limitations of each iteration and verified the final implementation. Finally, I have presented several musical applications built with the tool: 1) A looper integrated into the T-Stick DMI, 2) an embedded synthesizer generating looping sequences, and 3) a SuperCollider harp synthesizer/looper with a graphical user interface.

Conclusions and future work
I have presented the development of a live-looping system for gesture-to-sound mappings built on a connectivity infrastructure for wireless embedded musical instruments using a distributed mapping and synchronization ecosystem. I ported my ecosystem to an embedded platform and evaluated it in the context of the real-time constraints of music performance, such as low latency and low jitter.
On top of the infrastructure, I developed a live-looping system through three iterations with example applications. This chapter will discuss perspectives on the work described in this thesis and comment on what could further improve the project.

Scalability of WiFi for music interaction
I have implemented a connectivity infrastructure and a gesture-to-sound looper application for wireless embedded devices. With these tools, DMI's and other musical applications for loop-based music can be created for expressive collaborative performances. The scalability of these applications depends on the scalability of WiFi for real-time musical applications.

Compensating latency
In the case of this project, some factors remedy latency issues. As gestures are recorded through libmapper, all samples are time-tagged, which means that latency could be subtracted during playback to achieve accurate timing. Such a system would require peers to continuously measure the latency between them, which could be implemented by periodically sending a heartbeat signal between peers and keeping a record of each peer's round-trip latency.
This idea is similar to how host time offsets are handled with Link (see section 3.5.2).
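The suggested scheme can be sketched as follows (hypothetical API, not an existing implementation): each heartbeat reply updates a smoothed round-trip time per peer, and half of that RTT is subtracted from time-tagged samples at playback.

```python
# Sketch of per-peer latency tracking for playback compensation. The
# smoothing factor and method names are assumptions for illustration.

class PeerLatencyTracker:
    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.rtt = {}               # peer -> smoothed round-trip time (s)

    def on_heartbeat_reply(self, peer, t_sent, t_received):
        """Update the smoothed RTT from one heartbeat round trip."""
        rtt = t_received - t_sent
        if peer in self.rtt:        # exponential moving average
            self.rtt[peer] += self.smoothing * (rtt - self.rtt[peer])
        else:
            self.rtt[peer] = rtt

    def playback_time(self, peer, time_tag):
        """Shift a time-tagged sample earlier by the one-way latency."""
        return time_tag - self.rtt.get(peer, 0.0) / 2.0
```

Using half the round-trip time assumes a symmetric network path, which is the same simplification commonly made in clock-offset estimation.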
In section 3.5.1, it was found that the libmapper implementation on the ESP32 can maintain a signal rate of 200 Hz, which Wanderley and Depalle describe as a typical gestural acquisition sampling frequency (Wanderley & Depalle, 2004). At frequencies above 500 Hz, the implementation had significant reliability issues. However, when using rhythmic time quantization as an aesthetic strategy, the need for high signal rates diminishes. Besides, each peer can acquire gestural data at a higher sampling rate for local usage while only sending quantized data to the network. Finally, for applications where these constraints are too limiting, a wired version of the mapping framework can be realized using ESP32 boards with an Ethernet connection, such as the ESP32-POE (Olimex, n.d.).
These boards also support Power over Ethernet (PoE), removing the need for a battery.

Visual and haptic feedback
When recording sequences in a looper, the instantaneous feedback is lost when the recording is finished, as the auditory feedback no longer corresponds to the physical gesture currently being held. For recordings of a single loop layer, the GUI implemented in section 4.2.6 provides visual feedback through a slider that displays the looper's current output value. For more complex mappings, such as the one made for the T-Stick looper in section 4.1.1, where three layers are being recorded simultaneously, the current system provides no feedback on what has been recorded.
For short loops, with a duration ranging from 200 milliseconds to six seconds, feedback may be of less importance, as precognitive sensory information is maintained in our echoic memory, which, for audio and visual stimuli, appears to last in this range (Brower, 1993). For longer loops, feedback could serve as a guide for the performer in expressing a musical idea. This feedback could take the form of visualization on a screen, displaying multiple recorded sequences simultaneously. With the distributed design of libmapper, such a system could be implemented by mapping the output of loop devices on the network to a computer running visualization software. This idea is similar to the workings of the mapping visualization tool WebMapper (see section 3.2). The loop visualization tool could even be developed as an extension of WebMapper, taking advantage of existing work and contributing to its development.
Additionally, feedback could be given in the form of haptic feedback. The TorqueTuner project, which I have contributed to, (Kirkegaard, Bredholt, Frisson, et al., 2020), uses libmapper for changing meta-parameters of haptic effects. A loop could be mapped to display a force related to a recorded sequence.
Also, the Vibropixels project (Hattwick et al., 2017), a wearable wireless vibrotactile display system, could be used to display a recorded sequence or to give cues on the loop-points similar to SoundCatcher (see section 2.1.1).

Improvements
Several things could be done to improve the project further and explore gesture-to-sound mapping and looping.

Mapping strategies
First, only explicit mapping strategies were explored in the example applications. Interesting loopers could be built using mapping strategies based on artificial neural networks and other machine learning algorithms. Also, hierarchical live-looping, as mentioned in the review of Drile (section 2.2.3), could be interesting to implement for scenarios with many loop layers. Here, a hierarchical mapping structure could help with controlling many parts simultaneously. Hierarchical live-looping would be straightforward to implement using the WebMapper node visualization seen in fig. 3.4.

Multiple read pointers
Other improvements to the looper, such as multiple variable-speed read pointers, could be implemented to explore new looping techniques inspired by multi-tap delays and granular time-stretching audio effects. Here, the support for signal instances could be used, such that each signal instance represents a read pointer. A single loop layer could control several voices by mapping the instances to voices on a polyphonic synthesizer, adding variations on a micro time-scale. Also, non-constant time quantization could add the shuffle effect popular on many drum machines and featured on the Midilooper (see section 2.2.1).
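A minimal sketch of the multiple-read-pointer idea (pure Python, hypothetical names): several taps scan one shared loop buffer, each with its own position and rate, with linear interpolation for fractional read positions.

```python
# Sketch of multiple variable-speed read pointers over one loop buffer,
# in the spirit of a multi-tap delay. Each tap is a [position, rate]
# pair; rates other than 1.0 give time-stretched readings of the loop.

class MultiTapLoop:
    def __init__(self, buffer):
        self.buffer = list(buffer)
        self.taps = []              # [position, rate] per read pointer

    def add_tap(self, position=0.0, rate=1.0):
        self.taps.append([position, rate])

    def read(self):
        """Return one interpolated value per tap, then advance each tap."""
        out = []
        n = len(self.buffer)
        for tap in self.taps:
            pos, rate = tap
            i = int(pos) % n
            frac = pos - int(pos)
            # Linear interpolation between adjacent samples (wrapping).
            value = (1 - frac) * self.buffer[i] + frac * self.buffer[(i + 1) % n]
            out.append(value)
            tap[0] = (pos + rate) % n
        return out
```

Mapped onto libmapper signal instances, each tap would become one instance of the loop's output signal, so a polyphonic synthesizer could assign one voice per tap.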

Run-time implementation
Several opportunities would arise if the suggested API changes to libmapper, allowing persistent maps and clock-synchronized signal updates, were implemented. For example, the looper presented here could be created with libmapper at run-time through map expressions.
Run-time creation would increase availability to users, as loopers could be created from different libmapper front-ends such as WebMapper. For users of the computer music systems Max/MSP, Pd, and SuperCollider, custom sequencers and loopers for digital laptop orchestras could be built using the existing libmapper bindings (and the SuperCollider extension developed in this project) to create distributed loops shared among orchestra members. For synchronization, Link could be added as a libmapper device and used as a clock source for loopers.

Availability
Another way to increase availability would be to package MapLooper and Link as an Arduino library for ESP32 similar to the libmapper Arduino library developed in this project. Tapping into Arduino's ecosystem would make the tool available to the maker community, allowing more people to use it.

Embedded platform advancements
Announced in 2019, the ESP32-S2 (Espressif, n.d.) is a new SoC from Espressif Systems that adds several features to the ESP32 platform, some of which are relevant to this project. The ESP32-S2 has USB connectivity, which would allow wired applications of libmapper using Ethernet over USB. In recent years, computer manufacturers have removed Ethernet ports in favor of USB ports; therefore, in situations where the constraints of WiFi are too limiting, having a USB solution for embedded libmapper would be an advantage for users. The chip manufacturer claims better WiFi performance and stability for the ESP32-S2 compared to the ESP32; this project's measurements should be repeated on the ESP32-S2 to verify these claims. Additionally, the ESP32-S2 has a feature for measuring the time of flight of WiFi packets. This technology can be used for indoor geolocation, which would be interesting to explore for mapping with dance performances, interactive installations, and participatory art.