Skip to main content
SearchLoginLogin or Signup

HyperModels - A Framework for GPU Accelerated Physical Modelling Sound Synthesis

Published onJan 22, 2022
HyperModels - A Framework for GPU Accelerated Physical Modelling Sound Synthesis
·

Abstract

Physical modelling sound synthesis methods generate vast and intricate sound spaces that are navigated using meaningful parameters. Numerical based physical modelling synthesis methods provide authentic representations of the physics they model. Unfortunately, the application of these physical models are often limited because of their considerable computational requirements. In previous studies, the CPU has been shown to reliably support two-dimensional linear finite-difference models in real-time with resolutions up to 64x64. However, the near-ubiquitous parallel processing units known as GPUs have previously been used to process considerably larger resolutions, as high as 512x512 in real-time. GPU programming requires a low-level understanding of the architecture, which often imposes a barrier for entry for inexperienced practitioners. Therefore, this paper proposes HyperModels, a framework for automating the mapping of linear finite-difference based physical modelling synthesis into an optimised parallel form suitable for the GPU. An implementation of the design is then used to evaluate the objective performance of the framework by comparing the automated solution to manually developed equivalents. For the majority of the extensive performance profiling tests, the auto-generated programs were observed to perform only 6\% slower but in the worst-case scenario it was 50\% slower. The initial results suggests that, in most circumstances, the automation provided by the framework avoids the low-level expertise required to manually optimise the GPU, with only a small reduction in performance. However, there is still scope to improve the auto-generated optimisations. When comparing the performance of CPU to GPU equivalents, the parallel CPU version supports resolutions of up to 128x128 whilst the GPU continues to support higher resolutions up to 512x512. To conclude the paper, two instruments are developed using HyperModels based on established physical model designs.

Author Keywords

NIME, GPU, High-Performance, Physical Modelling,

CCS Concepts

  • Computing methodologies → Computer graphics → Graphics systems and interfaces → Graphics processor

  • Applied computing → Arts and humanities → Sound and music computing

  • Computing methodologies → Modeling and simulation

Introduction

In the 1950s, digital audio synthesis was beginning to gain traction with components such as digital oscillators, filters and stored lookup tables [1], these were used to generate sound, and later, to build synthesis techniques including AM and FM [2]. These simple techniques are often highly computationally efficient [3] [4] in order for them to be used in real-time applications at times when the available computation was very limited by today's standards [5]. Furthermore, these methods are considered `abstract' as they do not directly associate with a physical interpretation. Physical modelling methods contrast with abstract synthesis as they are built on direct interpretation of physical phenomena. Although a broad field of physical modelling methods exists, the direct numerical physical models are the most authentic forms that directly simulate the vibrations through a discretised mathematical representation [6]. Direct numerical physical models simulate an environment by approximating vibration values through N-dimensional space and time. Increasing the resolution of the simulated space has several advantages, including: improved accuracy, more stable simulations and the space to create more sophisticated instruments. However, increasing the resolution proportionally increases the computation required to run simulation. Considering the strict real-time requirements of audio synthesis [7] [8], the usefulness of these methods has been heavily restricted. But with the modern advancements in computer systems, physical modelling synthesis is seeing a possible resurgence [9]. For example, many academics involved in the Next Generation Sound synthesis (NESS) collaboration project believe physical models will play an important part in the future developments of sound synthesis. In 2015 and 2016, the NESS project published dozens of papers and demonstrations related to physical modelling audio synthesis and processing [10], including thorough experimentation and discussions of utilising GPU acceleration for physical modelling sound synthesis [11][12][13]. The collective output from the NESS project highlights the benefit of increasing data throughput using the GPU. For example, in [14], GPU acceleration was used to improve offline processing of room acoustics and, was shown to improve performance by 46×46\times over the equivalent serial CPU version. However, they also address the issues of processing various physical modelling techniques in parallel - particularly the incompatibility of specific mathematical methods for parallelisation [15]. For instance, iterative methods like the Newton Raphson [16] for solving implicit schemes inherently involve serial stages that limits some of the parallel capability of the GPU. The considerable literature on GPU accelerated physical modelling is mainly used for advanced, non-linear systems that are processed offline. However, simple linear systems are well suited for the GPU parallel architecture and for meeting real-time performance. There is a gap in the literature here to comprehensively explore the application of GPUs for real-time physical modelling synthesis for linear systems. Some recent designs have been presented within the context of the CPU and implemented in real-time. For example [17][18][19][20][21][22]. However, these are often restricted to one-dimensional or low resolution two-dimensional models. Research such as [23] demonstrates that the GPU can be used to support higher resolution physical models in real-time with significantly greater scale than on the CPU [24]. Zappi’s proposed design requires extensive knowledge of not only physical modelling synthesis, but the graphics domain and GPUs. Therefore, removing the complications of the GPU architecture and automating the GPU acceleration of physical modelling could expand the designers’ accessible synthesis sound space. The GPU is a nearly-ubiquitous massively parallel processing unit that can now be accessibly programmed for general computation because of the modern general-purpose GPU (GPGPU) architecture [25]. GPGPU has been successfully used for processing a range of numerical and scientific computation techniques like molecular dynamics [26]. A well-known and successful example of this is the folding@home project [27], which reported a speedup of 20 - 30X when accelerated on the GPU. The linear finite-difference based numerical methods used for physical modelling synthesis are often described as "embarrassingly parallel" [28], meaning they can be efficiently mapped to the parallel GPU architecture. Previous investigations in [29] demonstrated that for a particular physical model on a modern desktop, the CPU could support two-dimensional models up to 64x64 in real-time. The equivalent approach on the GPU was shown to support resolutions as high as 512x512 in real-time.

In this paper, we present HyperModels, a framework for describing high-resolution, linear physical models that utilise GPU hardware acceleration for real-time synthesis. This framework aims to improve the accessibility of real-time physical modelling to developers for interaction and performance. This could lead to unforeseen musical expression that artists can explore beyond the technical achievements that non-linear physical models can achieve offline [30] [31]. Instruments are described using high-level descriptions of the physics equations and an instrument's shape, e.g. the strings or membrane, that are automatically translated into optimised low-level code that utilises the real time capabilities of modern GPUs. Our approach enables the instrument designer to focus on the sound design aspect of a new instrument, without necessarily requiring the advanced low-level architecture and programming knowledge often required to access parallel GPU programming. An example covered later in Section Instrument 1: Hyper Drumhead is a physically modelled drum instrument. Image 1 shows the screenshot of the GPU optimised application that is formed using the HyperModels framework when provided with the physics equations and vector graphic description in Image 2.

Image 1

Screenshot of Hyper Drumhead application.

Image 2

Vector graphics and physical model description for u and v in system a.










The rest of this paper provides details of the HyperModels framework as a design and implementation, including an analysis of how it performs compared to equivalent hand-written GPU simulations. To demonstrate the system in practice, two existing designs based on physical modelling instruments are implemented using HyperModels. The first is the Hyper Drumhead, previously demonstrated by Zappi et al. in hand optimised GPU code [23][29]. The second is a variant of Willemsen et al. hammered dulcimer [21], that originally operated with a plate model of 17x6 points running on the CPU, this has been ported to the GPU using HyperModels to support resolutions up to 256x256. The remaining sections of the paper are structured as follows:

Numerical Physical Modelling

One of the most foundational numerical methods used for approximating derivatives is the finite-difference method. Although it has some drawbacks compared to other methods, such as the difficulties handling curved boundaries [32], it is an effective method for digital audio synthesis and is inherently suited to parallelisation. These numerical methods date back further than the earliest examples of digital audio synthesis, with its origins in engineering and general physics [33]. The adoption and research of numerical methods for sound synthesis have only recently become practical outside of a theoretical context because of the abundance of computational power that is available in modern personal computing devices.

Simple Harmonic Oscillator

The simple harmonic oscillator is one of the core building blocks of digital audio synthesis. A simple harmonic oscillator can also be formed as a physical model using an ordinary differential equation (ODE). An ODE is a differential equation containing one or more functions of one independent variable and the derivatives of those functions [34]. The ODE for the simple harmonic oscillator is:

utt=ω02uu_{tt} = - \omega^{2}_{0}u(1)

where state variable u=u(t)u = u(t) describes the displacement of the mass from its equilibrium (in m), utt=2ut2u_{tt} = \frac{\partial^2 u}{\partial t^2} is its acceleration and ω0\omega_0 is the angular frequency of oscillation (in rad/s). To solve this using finite differences, the state of the mass as well as the derivatives must be approximated, or discretised. Using t=nTt=nT, with time index n=0,1,...n = 0, 1, ... and time step TT, the simple harmonic oscillator in Equation 1 can be discretised to:

un+12un+un1T2=ω02un\frac{u^{n+1} - 2 u^{n} + u^{n-1}}{T^2} = - \omega^{2}_{0}u^n(2)

Here, un+1u^{n+1} refers to the value of the oscillator at the next timestep n+1n+1. Using values of the current timestep unu^n and previous timestep un1u^{n-1} along with the size of the timestep TT, the value of the oscillator at the next timestep can be computed by rearranging Equation 2 to the following update equation:

un+1=(2ω02T2)unun1u^{n+1} = \left(2 - \omega^{2}_{0} T^2\right) u^n - u^{n-1}(3)

By stepping through time and recursively calculating Equation 3, the simple harmonic oscillator is simulated. Image 3 shows that at each timestep, the explicit scheme depends on unu^{n} and un1u^{n-1} to calculate un+1u^{n+1}. As un+2u^{n+2} depends on the result of un+1u^{n+1}, the scheme can only be processed serially by a single processor, as there is only one stream of values to simulate that depend on each other.

Image 3

Serial processing of ODE finite-difference schemes.

1D Wave Equation

Partial differential equations (PDE) are an extension to the concept of ODEs, like the simple harmonic oscillator discussed in Equation 1. Where ODEs involved a single independent variable, PDEs involve two or more independent variables [35]. The 1-dimensional wave equation is an example of a partial differential equation:

utt=c2uxxu_{tt} = c^2 u_{xx}(4)

where state variable u=u(x,t)u = u(x,t) is now dependent on time tt as well as spatial coordinate xx. Assuming a system of length LL (in m), the spatial domain becomes x[0,L]x\in [0, L]. Furthermore, cc is the wave speed (in m/s).

To approximate Equation 4, one can subdivide the spatial domain into NN intervals with equal length XX. The spatial coordinate can then be approximated according to x=lXx=lX with spatial index l={0,...,N}l = \{0, ..., N\}. Using the spatial index as well as the temporal index nn described above, one can define grid function ulnu_l^n which is a discrete approximation to u(x,t)u(x,t).

Approximating uttu_{tt} and uxxu_{xx} using finite-differences, the following recursively solvable explicit scheme can be formed:

uln+1=2uln+λ2(ul+1n2uln+ul1n)uln1u_{l}^{n+1} = 2 u_{l}^{n} + \lambda^2 (u_{l+1}^{n} - 2 u_{l}^{n} + u_{l-1}^{n}) - u_{l}^{n-1}(5)

where λ=cT/X\lambda = c T/X is the Courant number and needs to abide the condition for the scheme to be stable [36]: λ1\lambda \leq 1. To solve this scheme, the simulation now needs to calculate uln+1u_l^{n+1} not just for one position, but for N+1N+1 positions in the model. All of these positions will need to calculate Equation 5 and store the result in uln+1u^{n+1}_{l} for all ll positions as shown in Image 4. These calculations do not interfere or depend on each other, making each spatial point for the timestep solvable in parallel. Parallel streams of processing like those found on the GPU can each handle a single grid point and provided NN is big enough to fully utilize all parallel processors, speed up the calculation of the system for each timestep.

Image 4

Parallel processing of PDE finite-difference schemes.

2D Wave Equation

One can extend the 1-dimensional wave equation in Equation 4 to 2 dimensions. The 2-dimensional wave equation is defined as

vtt=c2(vxx+vyy)v_{tt} = c^2 (v_{xx} + v_{yy})(6)

with wave speed cc (in m/s) and where state variable v=v(x,y,t)v=v(x,y,t) is defined over two spatial dimensions xx and yy.1 For a rectangular system of side lengths LxL_x (in m) and LyL_y (in m), (x,y)D(x,y)\in \mathcal{D} where D[0,Lx]×[0,Ly]\mathcal{D} \in [0, L_x]\times[0, L_y].

Similar to before, Equation 6 this can be discretised using finite differences. Discretising time as usual (t=nTt = nT), space can be subdivided into NxN_x intervals in the xx-direction and NyN_y intervals in the yy-direction, yielding x=lXx = l X and y=mXy = mX,2 with l{0,...,Nx}l \in \{0, ..., N_x\} and m{0,...,Ny}m \in \{0, ..., N_y\}. Using these definitions, the state variable v(x,y,t)v(x,y,t) can then be approximated using grid function vl,mnv_{l,m}^n. Discretising Equation 6 and solving for vl,mn+1v_{l,m}^{n+1}, yields the following update equation:

vl,mn+1=2vl,mnvl,mn1+λ2(vl+1,mn+vl1,mn+vl,m+1n+vl,m1n4vl,mn)v^{n+1}_{l,m} = 2 v^{n}_{l,m} - v^{n-1}_{l,m} + \lambda^2 (v^{n}_{l+1,m} + v^{n}_{l-1,m} + v^{n}_{l,m+1} + v^{n}_{l,m-1} - 4 v^{n}_{l,m})(7)

where the following stablitlity condition must be satisfied [5]: λ12\lambda \leq \frac{1}{\sqrt{2}}.

GPU Acceleration

Modern computer systems are built as heterogeneous platforms [37], meaning systems utilise different processing devices simultaneously for maximum efficiency. The CPU and its typical 4-14 fast processors [38] are usually reserved for most general processing, while the hundreds of parallel processors on the GPU can be used to offload and accelerate suitable tasks such as graphics. Although the CPU and GPU share many similarities, there are key differences that need to be understood to efficiently target both devices. An abstract view of the modern GPGPU architecture is shown in Figure 5 [39]. Here, the CPU interfaces with the GPU unit across a bridge and a host interface loads instructions and programs onto the GPU [40]. The device then loads the programs across the compute devices; these manage the execution of instructions from the program across their numerous Processing Elements (PE). According to the program instructions, all the PEs then execute the same instructions simultaneously on different sections of data in memory by using the ID of the stream of execution to index into memory. This means that each compute unit supports the single-instruction multiple-data (SIMD) paradigm [41].

The finite-difference methods used in this paper require solving simple linear systems of equations that are entirely independent from one another. Therefore the entire state of the grid can be contained in GPU global memory such that each point on the grid can be accessed and processed concurrently, without any explicit synchronisation or communication between PEs. Further, scaling up the number of points does not impact on this advantage. For example, to calculate a point in space and time requires accessing neighbouring values, but the result of the calculation only requires writing to a single, mutually exclusive location. Therefore, when mapping numerical physical models to the GPU, a core can be allocated to solve each equation at each position in space and write the resulting value to memory without affecting other neighbouring cores. This mapping is the case for models based on linear systems, more sophisticated, non-linear variants can often not be solved as a recursively solvable explicit scheme.

Image 5

The modern GPU architecture.

HyperModels Framework

The HyperModels framework is outlined in Image 6. There are four distinct software components: physical-model-representation, svg-generator, svg-parser and then a gpu-interface. The physical-model-representation provides the mapping between the finite-difference schemes to parallel processing environment. The vector graphic representation of the physical model geometry is described using an annotated scalable vector graphics (SVG) [42] format in svg-generator. The SVG is then parsed in svg-parser to produce a two-dimensional grid representation of the models that is suitable for the finite-difference schemes. The physical model is captured inside a JSON object that the gpu-interface can load into the GPU and integrated into an application to create an instrument.

Image 6

Relationship diagram of the HyperModels GPU physical modelling tools.

Model to GPU Mapping

Mapping finite-difference equations into a parallel processing environment requires defining an appropriate representation and program structure. Here, we define the GPU as system aa with a grid of Cx×CyC_x \times C_y cores. A single core is denoted by acx,cya_{c_x, c_y} where cx{1,...,Cx}c_x\in\{1, ..., C_x\} and cy{1,...,Cy}c_y\in\{1, ..., C_y\} are the core indices in the horizontal and vertical direction of the GPU grid respectively. The update equation of one grid point can then be assigned to a single core, such that these can be executed in parallel. Considering the 2D system presented in Equation 7, mapping a grid point of a scheme with spatial index (l,m)(l,m) to a core with index (cx,cy)(c_x,c_y) is denoted by acx,cyvl,ma_{c_x, c_y}\Leftarrow v_{l,m}.

The cores of the GPU aa must then be processed by recursively traversing the two-dimensional system aa and calculating the values of an+1a^{n+1} at each position depending on what models are assigned at each position. This process is repeated recursively and once all of an+1a^{n+1} is calculated, time index nn can be incremented. A serial representation of this process would be formed using two nested for loops:

Image 7

Here, every position in the grid is visited by iterating over all possible values for x and y. These are first used to check if the position in the two-dimensional grid is inside the model uu. If it is, then Equation 7 is used to calculate the value at acx,cyn+1a^{n+1}_{c_x,c_y}. In the equation, components such as acx+1,cyna^{n}_{c_x+1,c_y} require accessing neighbouring values of the currently considered position at cxc_x and cyc_y by adding 11 to x, leading to a[n][x+1][cy]. This is done for all neighbouring values. This serial approach can be rewritten to the following, so that it is suitable for parallel processing:

Image 8

Here the nested for loops are implicit, with the loop’s indices represented through the identifiers getGridX() and getGridY(), respectively, where Cx×CyC_x \times C_y process streams are dispatched to the processor. Thus, instead of iterating over two nested for loops, the ID of each process stream is used to check which model’s equation to use, such as uu for accessing aa to calculate the next timestep an+1a^{n+1}. The state of the system aa is contained in global GPU memory as it is only directly accessed once by each PE.

Multiple models can be added to the simulation, where the equation at each position in the system aa depends on the cxc_x and cyc_y coordinates. For example, one can add the 1D wave equation described in Equation 5 as a second model to the GPU. This one-dimensional equation can be mapped into the system aa according to acx,cyula_{c_x, c_y} \Leftarrow u_l. The code then includes an additional conditional statement checking if the coordinate (cx,cy)(c_x, c_y) identifies uu:

Image 9

This mapping has been implemented using a domain specific language (DSL) [43] where finite-difference schemes can be defined. The DSL compiler generates a GPU program that captures the parallel representation. The details of this implementation will be covered in future work.

Vector Based Representation

A method for mapping models such as uu into the system aa is required for describing the shapes of the models. The SVG format is a compact and extendable vector graphics protocol that supports descriptions of shapes and their properties as XML tags and attribute values. To support the physical models, the standard SVG format is extended to capture important information associated with each shape. An example of an SVG containing a rectangle that includes physical modelling attributes is as follows:

<svg viewbox='0 0 12 8' width='12' height='8' 
    interface_device='custom' connections=''>
    <rect id='1' interface_osc_address='' interface_type='pad' 
        interface_osc_args=''
        width='2' height='10' x='4' y='2' 
        style='fill:rgb(88,111,124);'
        physics_program='...'/>
</svg>

Inside the SVG svg tags, the viewbox, width and height describe the resolution of the entire physical modelled environment. Defining these as viewbox=’0 0 12 8’ width=’12’ height=’8’ creates a 12x8 simulation environment. This dictates the possible space to work in and how detailed shapes will be in the simulation. The connections field contains a list of coordinates that are connected together between models. The physics_program contains the generated GPU program from the physical-model-representation stage. Each shape contains two additional attributes, id and physics_program. The id is the unique identifier for each shape, this is used to fill in the id of the shape when generating the 2-dimensional grid in the svg-parser.

Vector Parser

Simulating the physical models using finite-difference based methods requires a Cartisian grid of points to apply calculations to. Therefore, the SVG vector image format must be used to generate a 2-dimensional grid. The design presented here is based on the SVG parser in [44], with modifications making it appropriate for the physical modelling framework. Image 10 covers the stages used by the SVG parser.

Image 10

Interface tessellation and rasterization of single rectangle.


First, the SVG description is tessellated, this produces a set of triangles that when considered holistically, form the original shape. These triangles prepare the shape for rasterization, where all points inside the respective shape’s triangles are populated with the ID of the shape. Here, the rect with a height of 10 and width of 2 is tessellated and points inside the triangles are filled with the shape ID inside the 12x8 simulation grid. The output of the parser is then used to populate a JSON object with the following data:

{
    "models" : [
    {
        "id": 1,
        "rgb": "rgb(217,137,188)",
        "args": [
        0.39
        ]
    },
    . . .
    ],
    "environment": [
        [
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
        ]
    ] ,
    "physics_program": "...",
    "connections": ...
    "interface" : "custom"
}

Here the Cartisian grid that contains each model’s ID is contained in environment, while a list of models contains details for each model, particularly the id that indicates what equation will be executed for each point inside the GPU physics_program. The connections field contains a list of coordinates that are linked together between models.

GPU Interface

The whole physical model environment captured inside the JSON object can now be used by a GPU interfacing API to load the GPU program, prepare the memory modelling the environment and then execute it to generate audio samples. The following C++ function declarations are used in this implementation:


void createModel(const std::string aPath);

void step();
void fillBuffer(float* input, float* output, uint32_t numSteps);
void updateCoefficient(std::string aCoeff, uint32_t aIndex, float aValue);

void setInputPosition(int aInputs[]);
void setOutputPosition(uint32_t aOutputs);

createModel() takes a file path to the JSON file containing the physical model description. The GPU program for the target GPU interfacing API can then be loaded onto the GPU. step() is a private function that is used for advancing the physical model one timestep. This is used by fillBuffer to recursively advance the physical model and at each time step extract samples from the output position to fill the output buffer. is used to change the value of coefficients. The input and output positions for excitation and sample generation respectively can be moved using setInputPosition() & setoutputPosition().

The performance of this framework is evaluated by profiling an implementation of the HyperModels framework. This will provide details of the frameworks strengths and limitations that will lead into the development of two example instruments.

Performance Evaluation

The convenience provided by automating development is often contrast with a trade-off with performance. This section provides a brief evaluation of the performance by comparing the auto-generated GPU programs from HyperModels with manually developed equivalents. To create a controlled testing environment, only the GPU programs will be modified. This means the interfacing methods and physical model representation will be the same between versions. Further, measurements taken from equivalent parallel CPU versions have been included for comparison.

Comparative Tests

The benchmarking suite operates by following the real-time profiling technique used in [45]. This approach runs a collection of tests through the following template:

void realtimeTest() {
    if (isWarmup) {
        executeTest();
    }
    while (numSamplesComputed < sampleRate) {
        startTimer();
        executeTest();
        endTimer();
    
        numSamplesComputed += bufferLength;
        checkTestResults(testResults);
    }
    elapsedTimer();
    cleanup(hostVariables, deviceVariables);
}

Here, the executeTest function represents the physical modelling synthesis function that generates bufferLength samples. This function is repeatedly called until a seconds worth of samples has been computed at a specified sample rate. This template can then be repeated and the average time to process at the sample rate can be measured. The isWarmup flag is used to execute the test once without measuring performance as the first time a GPU program is executed it can be considerably slower than all subsequent executions as the GPU prepares and optimises on the first execution.

In this evaluation, the test will be considered when processing at the minimum acceptable sample rate of 44100Hz [7]. The tests will be repeated for a series of escalating grid dimension sizes, starting at 64x64 increasing by powers of 2 up to 1024x1024. Four tests have been designed to test the basic functionality of the physical models and reveal contextual differences in performance between the automatic and manually written GPU physical model programs. For conciseness in this paper, only the salient results from two of the tests 3 will be considered in this analysis and are as follows:

  • Simple Single Model - A single 2-dimensional wave equation square physical model. Utilisation=88%88\% (Image 11)

    Image 11

    SVG representation of Simple Single Model model geometry.

  • Complex Multiple Models - Two 2-dimensional square physical models connected by a single string. Utilisation=51%51\% (Image 12)

    Image 12

    SVG representation of Complex Multiple Models model geometry.

Manually writing the GPU program provides opportunities for a competent developer to exploit contextual elements using foresight that automated tools either have difficulty identifying or can not safely implement. Some of the optimisations exclusive to the manual version are full constant folding [46] in equations and grouping models with identical equations into one control dependency. Each of these tests have a utilisation measurement shown as a percentage. This is an important feature of each tests as it denotes how intensive the simulation is to process.

Results

The test results for simple single model are shown in Image 13. Considering the GPU auto-generated and manual versions, grid resolutions between Cx=Cy=64C_x = C_y = 64 and Cx=Cy=256C_x = C_y = 256 appear to show comparatively marginal difference. However, from Cx=Cy>256C_x = C_y > 256, the disparity begins to emerge, where the manually written program begins to perform increasingly better. With this projection, the manually written version can operate at higher resolutions up to around Cx=Cy=700C_x = C_y = 700, while the auto-generated version reaches Cx=Cy=512C_x = C_y = 512. The limited scope of the simple single model test means the only notable optimisation added to the manual version is the constant folding of all redundant calculations. This is where any constant values involved in the equations defined for the model are calculated once on the CPU and uploaded as one constant coefficient to the GPU, instead of redundantly calculating it on the GPU. This optimisation appears effective for this test as it is applied across 88%88\% of the simulated space. Therefore, as expected, optimisations that reduce computation in the model’s scheme are effective.

When considering the performance of the CPU against the CPU, the GPU can support far higher resolution models than the CPU. For this single model test, the manually written CPU version can almost support Cx=Cy=128C_x = C_y = 128 which would involve 16384 grid points. The auto-generated GPU version can support Cx=Cy=512C_x = C_y = 512 that involves a considerably higher 262144 grid points, 16X more than the CPU can support.

Image 13

Execution time for a second's worth of sample at 44.1KHz for the simple single model test.

Image 14 displays the results recorded for the Complex Multiple Models test. The results follow a similar trend as the Simple Single Model test, however, the differences between the auto-generated and manual versions are negligible. Both the GPU versions support resolutions up to around Cx=Cy<700C_x = C_y < 700. It appears for the test that involves advanced models and connections, the manual optimisations are less effective. This might be partly because the constant folding optimisation is less effective for the smaller grid utilisation of 51%51\%. This test involves two connection points between the three models yet the performance does not appear to be negatively effected by this. This suggests the technique used for the connections in the design is effective, at least for few connection points. However, it can not be assumed to extrapolate as more connections are added, comprehensive experimentation is needed to establish how this scales for hundreds of connections.

The results presented here support the effectiveness of the HyperModels auto-generated programs. For the majority of the tests, the performance difference was a negligible 6% in favour of the GPU manual version over the auto-generated equivalent. Although, under certain contexts the manual version was significantly faster. For example, the manual version of simple single model test at Cx=Cy=700C_x = C_y = 700 was 50% faster than the auto-generated program.

Image 14

Execution time for a second's worth of sample at 44.1KHz for the complex multiple model test.

Case Studies

Two instruments will be presented here to demonstrate the expressiveness of the HyperModels framework. Both instruments will operate within the system aa defined in HyperModels that will have a resolution of (Cx,Cy)(C_x, C_y) where models can be defined to operate at specific positions determined by the vector based description of the model shapes.

Instrument 1: Hyper Drumhead

Video 1

Instrument 1: Hyper Drumhead

Instrument 1 implements the Hyper Drumhead physical model from [23] [31]. In their implementation, the Hyper Drumhead is a two-dimensional drumhead simulation that has been accelerated on the GPU using the graphics pipeline. The Hyper Drumhead was reported to support resolutions up to 320x320 for most of the systems tested on. In their implementation, Zappi et al. mapped the audio domain of the physical model directly into the graphical domain using the OpenGL graphics rendering API. By conforming to the graphical domain, this design requires an additional field of knowledge and imposes some limitations. For Instrument 1, the Hyper Drumhead will be ported to the HyperModels framework with an extended resolution of Cx=Cy=512C_x = C_y = 512. The source code and recordings of this instrument are publicly available online 4.

The physical model PDE used in the Hyper Drumhead is based on the two-dimensional wave equations with a frequency independent damping component [47]. Equation 5 can be extended to include damping, and the recursively solvable explicit scheme used for the qqth model vqv_q where q=1,2q = 1, 2 will be defined as:

(1+σq)vq,l,mn+1=2vq,l,mn(1σq)vq,l,mn1+(λq)2(vq,l+1,mn+vq,l1,mn+vq,l,m+1n+vq,l,m1n4vq,l,mn)\begin{aligned} (1 + \sigma_{q})v^{n+1}_{q,l,m} &= 2 v^{n}_{q,l,m} - (1 - \sigma_q) v^{n-1}_{q,l,m} \\ & + (\lambda_{q})^2 (v^{n}_{q,l+1,m} + v^{n}_{q,l-1,m} + v^{n}_{q,l,m+1} + v^{n}_{q,l,m-1} - 4 v^{n}_{q,l,m}) \end{aligned}(8)

where the Courant number λq\lambda_q is defined as in Equation 5, and σq\sigma_q is the frequency independent damping coefficient (in s1^{-1}). To maintain stability inside the model, the conditions λ12\lambda \leq \frac{1}{\sqrt{2}} and 0<σ<10 < \sigma < 1 must be met. The mapping of vqv_q into the grid of cores in aa is illustrated as an SVG in Image 2. The physical model described so far is then contained in the instrument component in the application visualised in Image 15. Here, the CPU application program written in the JUCE5 audio framework interfaces with the GPU instrument program requesting 44100 samples per second. However, the input/output samples between CPU and GPU uses a buffering technique. Therefore, data transfers occur at a rate of fsbs\frac{f_s}{b_s}. So for a buffer length bs=256b_s = 256 at fs=44100f_s = 44100, there are 44100256=173\frac{44100}{256} = 173Hz data transfers per second. The state of the system aa stays in the GPU global memory and is not transferred back to the CPU. Instead, for the on-screen visualisation of the model an OpenGL graphics program is called at a rate of 15Hz. This efficiently maps the state of the instrument into coloured pixels that are then sent directly to a display device. Interactions with the instrument are controlled using two Sensel Morphs 6. The Sensel morph is a high-resolution pressure sensor that detects the position and amount of pressure of contacts. The two Sensel morph’s have been connected to the application, one mapping to model v1v_1 and the other to v2v_2 by adding excitation to the system acx,cya_{c_x,c_y} at a position cxc_{x} and cyc_{y} that is detected by the Sensels. The sensel’s are polled for contacts at a rate of 150Hz as this provides a maximum detection latency of 6.6ms [21]. A screenshot of Instrument 1’s application GUI is shown in Image 1.

Image 15

Application overview for Instrument 1: Hyper Drumhead.

Instrument 2: String-Plate Connections

Video 2

Instrument 2: String-Plate Connections

Instrument 2 aims to demonstrate that complex, interconnected models can be represented by the GPU accelerated framework. In [21], Willemsen et al. design instruments using strings and plate models that are connected together to form instruments such as the sitar, hammered dulcimer and Hurdy Gurdy. In their implementation, the instruments were executed on the CPU and operated for small resolutions, with strings involving a maximum of 50 points and plates of 20x10. The GPU accelerated implementation will support higher resolution models with multiple strings of 280 points and a plate of 236x121 inside an environment aa with Cx=Cy=256C_x = C_y = 256. A plate model vv and multiple strings uqu_q will be defined where the subscript qq is used to identify each string between q=1,,13q = 1, \dots, 13.

Image 16

Application overview for Instrument 2: String-Plate Connections application.

The string models are defined as using the stiff string equation [48] along with frequency independent and frequency dependant components [49]. Using finite-differences, the recursively solvable explicit form uq,lnu^{n}_{q,l} is formed:

(1+σq,0T)uq,ln+1=(22(λq)26(μq)24σq,1TX2)uq,ln((λq)2+4(μq)2+2σq,1TX2)(uq,l+1n+uq,l1n)(μq)2(uq,l+2n+uq,l2n)+(1+σq,0T+4σq,1TX2)uq,ln12σq,1TX2(uq,l+1n1+uq,l1n1)\begin{aligned} (1 + \sigma_{q,0} T) u^{n+1}_{q,l} &= \left(2 - 2 (\lambda_q)^2 - 6 (\mu_q)^2 \frac{4\sigma_{q,1} T}{X^2}\right) u^{n}_{q,l}\\ & \left((\lambda_q)^2 + 4(\mu_q)^2 + \frac{2\sigma_{q,1} T}{X^2}\right)(u^{n}_{q,l+1} + u^{n}_{q,l-1})\\ & - (\mu_q)^2 (u^{n}_{q,l+2} + u^{n}_{q,l-2})\\ & + \left(-1 + \sigma_{q,0} T + \frac{4 \sigma_{q,1} T}{X^2} \right) u^{n-1}_{q,l}\\ & - \frac{2 \sigma_{q,1} T}{X^2} \left(u^{n-1}_{q,l+1} + u^{n-1}_{q,l-1}\right) \end{aligned}(9)

with μq=κqTX2\mu_q = \frac{\kappa_q T}{X^2}, stiffness coefficient κq\kappa_q, frequency independent damping σq,0\sigma_{q,0} and frequency dependant damping σq,1\sigma_{q,1}.

The linear plate equation [50] uses a similar description of physics as the stiff string including frequency independent and dependant components, but is extended to operate across two-dimensions. By discretising the equation using finite-differences, the following recursively solvable explicit scheme is formed.

(1+σ0T)vl,mn+1=(220μ24S)vl,mn+(8μ2+S)(vl+1,mn+vl1,mn+vl,m+1n+vl,m1n)2μ2(vl+1,m+1n+vl+1,m1n+vl1,m+1n+vl1,m1n)μ2(vl+2,mn+vl2,mn+vl,m+2n+vl,m2n)+(σ0T1+4S)vl,mn1S(vl+1,mn1+vl1,mn1+vl,m+1n1+vl,m1n1)\begin{aligned} (1 + \sigma_0 T) v^{n+1}_{l,m} &= (2-20\mu^2-4 S) v^{n}_{l,m}\\ & + (8\mu^2 + S)(v^{n}_{l+1,m} + v^{n}_{l-1,m} + v^{n}_{l,m+1} + v^{n}_{l,m-1})\\ & - 2 \mu^2 (v^{n}_{l+1,m+1} + v^{n}_{l+1,m-1} + v^{n}_{l-1,m+1} + v^{n}_{l-1,m-1})\\ & - \mu^2 (v^{n}_{l+2,m} + v^{n}_{l-2,m} + v^{n}_{l,m+2} + v^{n}_{l,m-2})\\ & + (\sigma_0 T - 1 + 4S) v^{n-1}_{l,m}\\ & - S (v^{n-1}_{l+1,m} + v^{n-1}_{l-1,m} + v^{n-1}_{l,m+1} + v^{n-1}_{l,m-1}) \end{aligned}(10)

Where parameters are the same as in Equation 9 but with the additional coefficient S=2σ1TX2S = \frac{2 \sigma_1 T}{X^2}.

The geometry for instrument 2 is displayed in the SVG in Image 18. Again, a JUCE application running on the CPU interfaces with the instrument on the GPU at a samplerate of 44100Hz as shown in the instrument overview in Image 16. The Sensel Morphs are used as input, one being mapped to pluck across the strings uqu^q and the other to strike the plate vv. The key difference in this arrangement is that the visualisation of the instrument is processed on the CPU using the JUCE framework. Therefore, the state of the grid must be transferred from the GPU to the CPU to update the JUCE graphical components to then load onto the GPU at a frame rate of 15Hz. A screenshot of the Instrument 2 application is shown in Image 17 and the recordings and source code are available online 7.

Image 17

Screenshot of Plate-String Connections application.

Image 18

Vector graphics and physical model description for uq and v in system a.








Conclusion

This paper has presented an overview of the HyperModels framework for assisting the development of GPU accelerated physical model synthesisers. The performance of HyperModels has been evaluated and shown to outperform CPU versions for resolutions above 64x64, particularly at 512x512 where the auto-generated GPU code was approximately 4X faster than the manually written parallel CPU version. However, the manually written GPU code was consistently faster than the auto-generated equivalent, being around 6% slower for most tests and a maximum of 50% for a particular case. The design for HyperModels is still relatively restricted to support linear systems that can be simulated using recursively solvable explicit schemes. These linear systems although highly suited for parallel processing and meeting real-time requirements, are fundamentally limited. However, the framework could be extended to support certain non-linear systems using iterative methods such as Newton Raphson method [11]. Other future work involves optimising the HyperModels implementation further to match the manual program performance and to present the comprehensive technical details of the entire framework.

Acknowledgments

This work was supported by a grant from the HSA Foundation and the UWE 50-50 PhD fund and has been funded in part by the European Art-Science-Technology Network for Digital Creativity (EASTN-DC), project number 883023.

Ethics Statement

Being a technical paper that presents software designs, the research and contents of this paper involves no ethical considerations and/or implications.

Comments
0
comment

No comments here

Why not start the discussion?