Resampling impact on performance in embedded audio pipeline with Pipewire

emontmasson

TL;DR

  • Audio pipelines often use asynchronous devices, which must be synchronized with resampling.
  • Measurements show that the CPU load of resampling reaches almost 30% of a CPU core on an i.MX8M Nano.
  • A possible optimization is to use the feedback endpoint of the USB UAC2 audio gadget.
  • alsaloop already supports it, and so does Pipewire since version 0.3.72.

A sound server is software that lets multiple applications access audio devices. It also handles conversions, as devices can use different sample rates and formats. To do so, resampling is almost always performed to adapt sample rates.

In the Linux ecosystem, the most used sound servers at the time of writing are JACK and Pulseaudio. Pipewire is a more recent project that aims to offer a good alternative to both of these sound servers. It has compatibility with the JACK and Pulseaudio APIs and offers promising performance improvement. Due to this, it is gaining a lot of traction and is now available on multiple distributions.

This article focuses on the CPU usage of resampling on embedded devices. We use Pipewire together with asynchronous devices to simulate a complex use case. We expect resampling to cause a non-negligible CPU load, one that could be reduced on embedded devices.

For more information about Pipewire, you can read the previous two articles on the subject:

Experimental setup

Hardware and Linux distribution

The measurements were run on an i.MX 8M Nano UltraLite EVK with the following audio devices:

  • An AUD-EXP-42448 Audio Card integrating a cs42448 chip: 6 input channels, 8 output channels. We will refer to it as the cs42448 in this article.
  • A USB UAC2 gadget, receiving and transmitting audio from a host: 8 input channels, 8 output channels.

Figure 1: measurement environment

The Linux distribution used on the i.MX8 is built using the Yocto Project, Kirkstone version. It runs a Linux mainline kernel with RT patch, version 6.1.26-rt8, and a Pipewire daemon version 0.3.71.

Pipewire configuration

We want Pipewire to mix audio from sources with different sample rates. This requires all sources to be synchronized. Depending on the setup, we also change the rate of the processing graph to enable or disable resampling.

About the Pipewire configuration, we can note that:

  • Unless specified otherwise, the processing graph rate is 48 kHz.
  • The source and sink of a given device are configured with the same clock.name property. This tells Pipewire that the underlying hardware of each pair of nodes relies on the same clock. Pipewire can then skip unnecessary resampling, which helps us isolate the impact of resampling.
  • The resampler uses the default value of 4 for resample.quality. A higher value means better resampling and less aliasing, at the cost of a higher CPU load. According to the Pipewire documentation, 4 is a “good compromise between quality and performance”.

Setups

We have tested 4 different setups:

  • Crossed audio (44.1 kHz – 48 kHz): sources of USB are routed to sinks of cs42448, and vice versa. Sample rates are 44.1 kHz for the USB, 48 kHz for the cs42448, meaning Pipewire will have to perform resampling.
  • Crossed audio (48 kHz – 48 kHz): same as above, but USB also has a sample rate of 48 kHz. Resampling occurs as devices have different reference clocks. For the cs42448, it comes from its own audio PLL. For the USB gadget, its audio is synchronized with the USB host through the isochronous transfer.
  • CS42448 loopback (48 kHz): simple loopback for the cs42448, from its source to its sink. No resampling will be done as:
    • Sources and sinks rely on the same clock.
    • The processing graph and the device have the same rate.
  • CS42448 loopback (44.1 kHz): same as the previous setup, but the processing graph rate is set to 44.1 kHz. This forces resampling between the device and the processing graph.
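
To make the two crossed-audio cases more concrete, here is a minimal Python sketch of the arithmetic involved. The function names and the 100 ppm clock offset are illustrative assumptions, not values measured on the board.

```python
from fractions import Fraction


def resample_ratio(in_rate_hz: int, out_rate_hz: int) -> Fraction:
    """Exact output/input ratio a resampler must realize between two nominal rates."""
    return Fraction(out_rate_hz, in_rate_hz)


def drift_frames_per_second(nominal_rate_hz: int, offset_ppm: float) -> float:
    """Frames gained or lost each second between two clocks of equal nominal rate."""
    return nominal_rate_hz * offset_ppm / 1_000_000


# 44.1 kHz USB stream into the 48 kHz graph: a fixed rational ratio.
print(resample_ratio(44_100, 48_000))        # 160/147

# Two nominal 48 kHz devices whose clocks differ by a hypothetical 100 ppm.
print(drift_frames_per_second(48_000, 100))  # 4.8
```

Even a small offset like this would eventually overrun or underrun a buffer if nothing adapted the rate, which is why the 48 kHz – 48 kHz setup still needs resampling.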

To do the measurements, we use 2 well known tools:

  • htop: to get a global idea of the Pipewire process CPU load.
  • perf: to get more precise measures about the CPU load of the resampling.
    • We measured on the complete system, illustrating the impact in a potential use case. The command used was:
perf record -ag -- sleep 30

Measurements of the resampling CPU load

Table 1 presents the results for the 4 setups previously described. It shows the CPU load measured with htop, as well as the CPU load of the resampling deduced from the perf measurements. All measurements were done at least twice to ensure consistent results.

A few remarks regarding the configuration and the results:

  • The i.MX8M Nano has 4 CPU cores, so the theoretical maximum CPU load is 400%.
  • Measurements with htop are not precise, but a general idea of the CPU load is enough to show the impact of the resampling.
  • Each measurement ran for at least 20 seconds. Peaks outside of the reported intervals might happen for short periods of time. As we measure the whole system, we assume they are outliers.
  • Pipewire performs resampling with libspa-audioconvert.so, calling the functions below. <type> is the implementation used and may change depending on the platform.
    • do_resample_full_<type>()
    • do_resample_inter_<type>()
    • do_resample_copy_c()
  • For our measurements, <type> was neon.
  • The CPU load of the resampling was computed from perf and htop measurements. perf shows the percentage of samples spent in a function. Samples only represent the active load of the CPU. Therefore, the equation to compute the resampling CPU load is (resampling-perf-percentage / pipewire-perf-percentage) * pipewire-CPU-percentage.

Table 1: CPU load measurement results

| Setup | CPU load with htop | perf samples related to resampling | perf samples related to pipewire | Resampling CPU load |
|---|---|---|---|---|
| Crossed audio (44.1 kHz – 48 kHz) | 50% ~ 55% | 33.37% | 62.91% | 26.52% ~ 29.17% |
| Crossed audio (48 kHz – 48 kHz) | 24% ~ 30% | 14.38% | 31.65% | 7.60% ~ 9.50% |
| CS42448 loopback (48 kHz) | 9% ~ 12% | No sample | 24.26% | 0% |
| CS42448 loopback (44.1 kHz) | 29% ~ 31% | 25.73% | 50.43% | 14.80% ~ 15.82% |
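
As a sanity check, the equation given in the remarks can be applied to the first row of Table 1 with a few lines of Python. The function name is ours and only serves the illustration.

```python
def resampling_cpu_load(resampling_perf_pct: float,
                        pipewire_perf_pct: float,
                        pipewire_cpu_pct: float) -> float:
    """Share of the CPU spent resampling, in percent of one core.

    perf gives the fraction of active samples; scaling by the htop load of
    the Pipewire process converts it back to an absolute CPU load.
    """
    return resampling_perf_pct / pipewire_perf_pct * pipewire_cpu_pct


# Crossed audio (44.1 kHz – 48 kHz): htop reported 50% ~ 55%.
low = resampling_cpu_load(33.37, 62.91, 50.0)
high = resampling_cpu_load(33.37, 62.91, 55.0)
print(f"{low:.2f}% ~ {high:.2f}%")  # 26.52% ~ 29.17%
```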

We observe that Pipewire avoids useless resampling. With CS42448 loopback (48 kHz), all sample rates and clocks match, so no resampling is done. In CS42448 loopback (44.1 kHz), however, the cs42448 sample rate does not match the graph rate, so resampling is done.

In Crossed audio (48 kHz – 48 kHz), only the device clocks differ. Compared to Crossed audio (44.1 kHz – 48 kHz), Pipewire performs less resampling. Configuration is therefore an important step to reduce resampling and CPU load.

However, the impact is still significant, reaching almost 30% of CPU load in some cases. This can be a real problem for complex embedded audio applications. An external ASRC (Asynchronous Sample Rate Converter) could, for example, perform this task instead of the CPU, but it would increase hardware cost.

Adapting audio speed with USB feedback

USB feedback principle

With recent versions of Linux, the USB UAC2 gadget supports the feedback endpoint. It allows the USB device to ask the host to adapt the number of samples it sends over time. The USB device can then assume that the captured audio rate matches its own reference clock. Resampling is no longer needed, even if the hardware clocks are different.
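
As an illustration of what travels over the feedback endpoint, the sketch below encodes a sample rate the way the USB 2.0 specification describes for high-speed links: a number of samples per 125 µs microframe in 16.16 fixed point. The function names and the +100 ppm correction are assumptions made for the example; this is not code taken from the kernel gadget driver.

```python
MICROFRAMES_PER_SECOND = 8000  # high-speed USB: one microframe every 125 us


def encode_feedback_hs(rate_hz: float) -> int:
    """Samples per microframe, encoded as a 16.16 fixed-point integer."""
    return round(rate_hz / MICROFRAMES_PER_SECOND * (1 << 16))


def corrected_rate(nominal_rate_hz: float, correction_ppm: float) -> float:
    """Rate the device asks for after applying a small correction in ppm."""
    return nominal_rate_hz * (1 + correction_ppm / 1_000_000)


# Nominal 48 kHz: exactly 6 samples per microframe.
print(hex(encode_feedback_hs(48_000)))                  # 0x60000

# Device clock running a hypothetical 100 ppm fast: ask for slightly more samples.
print(encode_feedback_hs(corrected_rate(48_000, 100)))  # 393255
```

The host reads this value periodically and adjusts how many samples it packs into each transfer, so the stream follows the device clock.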

Figure 2: USB feedback principle

On the device side, alsaloop already implements the use of the feedback endpoint. With a USB UAC2 gadget as capture device in async mode, alsaloop skips resampling. This can be verified with perf by searching for calls into libsamplerate.so. If we connect the USB source to the cs42448 sink with alsaloop, we observe a reduction of CPU load: htop shows 38% CPU load without feedback and 10% with it. Once again, avoiding resampling saves a lot of CPU resources.

Pipewire implementation

Since version 0.3.72, Pipewire supports the feedback endpoint. We updated the stress test setup to use it. To achieve this, it is necessary to adapt the configuration. The sample rate of the USB device must match the sample rate of the cs42448. The USB gadget configuration must also set the capture sync type to “async”.
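
The gadget side of this configuration could look like the following Python sketch. It assumes the gadget is set up through configfs and relies on the c_srate, p_srate and c_sync attributes of the uac2 function; the helper name and the gadget path in the comment are hypothetical and depend on how the gadget was created.

```python
from pathlib import Path


def configure_uac2(function_dir: Path, rate_hz: int,
                   capture_sync: str = "async") -> None:
    """Write the sample rates and capture sync type of a uac2 gadget function.

    Attributes are normally written before the gadget is bound to a UDC.
    """
    if capture_sync not in ("async", "adaptive"):
        raise ValueError(f"unsupported capture sync type: {capture_sync}")
    # Same rate in both directions, matching the cs42448 (48 kHz here).
    (function_dir / "c_srate").write_text(f"{rate_hz}\n")
    (function_dir / "p_srate").write_text(f"{rate_hz}\n")
    # "async" enables the feedback endpoint on the capture (OUT) stream.
    (function_dir / "c_sync").write_text(f"{capture_sync}\n")


# Example call (hypothetical path, requires root on the target):
# configure_uac2(
#     Path("/sys/kernel/config/usb_gadget/g1/functions/uac2.usb0"), 48_000)
```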

Table 2 shows the results of the Crossed audio (48 kHz – 48 kHz) setup with USB gadget feedback support. The cs42448 and the USB audio gadget have the same sample rate, so USB feedback can be used. Results without feedback are copied from Table 1 for comparison.

Table 2: Resampling CPU load for Crossed audio (48 kHz – 48 kHz) setup with feedback support

| Use feedback endpoint | CPU load of pipewire with htop | perf samples related to resampling | perf samples related to pipewire | Resampling CPU load |
|---|---|---|---|---|
| No (USB gadget in adaptive mode) | 24% ~ 30% | 14.38% | 31.65% | 7.60% ~ 9.50% |
| Yes (USB gadget in async mode) | 20% ~ 23% | No sample | 35.25% | 0% |

With matching sample rates and the feedback endpoint enabled, Pipewire no longer resamples the streams. The CPU load reduction with the feedback is between 7% and 10%, a worthwhile gain that only requires some configuration.

Conclusion

Real use cases can be more complex or ask for heavier processing than the measurement setups. It is common to have other software on top of the audio pipeline, for example doing noise cancellation. In this case, 20% of CPU load for the resampling is not something to be neglected. But with the right tools and configuration it can be reduced.

The USB audio gadget is a good example. We showed that the USB feedback functionality helps reduce CPU usage. Other tasks could then run, or the setup complexity could be increased. The sample rate of the USB audio can be changed more easily than that of other devices. If needed, the USB host can perform resampling before sending audio over USB, which frees up resources on the device.

Other techniques could be explored to reduce the usage of resampling. In our case, the i.MX8 board provides a hardware asynchronous sample rate converter. Resampling could be offloaded to this module instead of running on CPU.
