Immersive audio and using the X.Y.Z syntax

Walt Zerbe, senior director of Technology & Standards at CEDIA EMEA, investigates the meaning of immersive audio.

“Immersive audio” is a term thrown around in most project briefing conversations, whether by the integrator or the client. But are both parties on the same page as to what they mean by it?

It’s our understanding that there may be quite a gap between integrator and client awareness and expectations when it comes to immersive audio. For example, some consumers may have what they deem incredibly immersive experiences with merely a TV and soundbar, if the programme they're watching is compelling enough and they haven't experienced anything better. A more experienced integrator, by contrast, may be so invested in the detailed technicalities of picture and sound quality in a reference-level system that their "suspension of disbelief" is impaired.

Understanding the background to immersive audio can help integrators facilitate the required client conversation to get both parties on the same page.


Meaning: Immersive

Originally, the term immersive was applied to audio when the sound field went from being a linear ring of sound around the listeners to becoming a dome of sound around and over them. However, it could also be regarded as something of a misnomer, depending on one's experience. After all, as Peter Aylett of Officina Acustica and Chair of the CEDIA/CTA-RP22, says: "Some of the most immersive experiences I've ever had were with mono audio." In the context of home AV systems, however, immersive audio is quite specific. The point here is not to lose sight of the combination of elements that can create an overall immersive experience.

In the research article from Frontiers, ‘Arousing the Sound’, the authors state that "sounds provide information about the geometry of the space we are in". While this study focused on children, the findings were unequivocal and can easily translate to adults – that "the use of 'emotionally marked' sound in the design and production of a sound story will generate richer and more detailed mental images in the listener than a soundtrack without this emotional intention," and "a soundtrack mixed [and reproduced] in 3D sound format will elicit a more intense emotional response in the listener compared to a soundtrack mixed in stereo".

Movies, TV shows, music, or games that utilise immersive audio all have one important thing in common – the content creators have chosen this as a powerful tool in expressing their creative intent. Integrators have a responsibility to reproduce this as faithfully as possible within the confines of the space and budget.


The composition of an immersive audio system is commonly described by the three-integer syntax "x.y.z", where each letter represents the channel count of a particular type, for example 5.1.4. While the syntax appears simple, it is all too easily misused, leading to misinterpretation.
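
As a minimal sketch of how such a label breaks down, the Python snippet below splits an x.y.z string into its three counts. It assumes the commonly used reading, which this article does not itself spell out: x as listener-level channels, y as low-frequency (subwoofer) channels, and z as overhead or height channels. The function name `parse_xyz` and the field names are hypothetical, chosen for illustration only.

```python
from typing import NamedTuple


class ChannelLayout(NamedTuple):
    listener_level: int  # x: assumed to be listener-level channels
    low_frequency: int   # y: assumed to be low-frequency (subwoofer) channels
    overhead: int        # z: assumed to be overhead/height channels


def parse_xyz(label: str) -> ChannelLayout:
    """Split a label such as '5.1.4' into three integer counts.

    A two-part label such as '5.1' is treated as having zero overhead channels.
    """
    parts = label.strip().split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not an x.y or x.y.z label: {label!r}")
    counts = [int(p) for p in parts]
    if len(counts) == 2:
        counts.append(0)  # no third number given, so no overhead channels
    return ChannelLayout(*counts)


layout = parse_xyz("5.1.4")
print(layout.listener_level, layout.low_frequency, layout.overhead)
```

Naming each position explicitly, rather than passing the bare string around, is one way to make sure everyone reading a specification interprets the three numbers the same way.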

Engineering an immersive audio system requires the designer to interpret product specifications that may include x.y.z capabilities, namely in the audio processor or AVR. In turn, the designer will ultimately produce a system specification that may be summarised using x.y.z that will need to be interpreted by others. If the numbers aren't clear in their meaning, then they are essentially useless. Using the numbers in a broadly accepted and consistent way makes using and interpreting them much simpler and more useful for all.

Immersive audio systems should be described simply by the number of discretely rendered channels, as a clear indication of their spatial imaging capability, and not necessarily by the number of speakers.

The relative inconsistency and lack of industry standards for how to use the x.y.z syntax are the very reasons why proper interpretation is needed. Where different terms are used to describe the same thing, none is necessarily right or wrong; the key is knowing what is meant and how to relate terms to applications. This also extends to the use of the syntax to describe an immersive audio system – where one format might say 9.1, another could present it as 5.1.4 or 7.1.2.
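
The arithmetic behind that last point can be sketched in a few lines of Python. This assumes the common reading in which the first number counts listener-level channels and the third counts overhead channels (an assumption, since the positions are not defined here). Folding the third number into the first then shows why 5.1.4 and 7.1.2 can both be reported as 9.1. The helper name `collapse_to_two_part` is hypothetical.

```python
def collapse_to_two_part(label: str) -> str:
    """Fold the third number of an x.y.z label into the first, giving '(x+z).y'."""
    x, y, z = (int(part) for part in label.split("."))
    return f"{x + z}.{y}"


for label in ("5.1.4", "7.1.2"):
    print(label, "->", collapse_to_two_part(label))
# 5.1.4 -> 9.1
# 7.1.2 -> 9.1
```

Both layouts collapse to the same two-number label, so a bare "9.1" does not reveal how the nine channels are divided between listener level and overhead.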

Support and further info

CEDIA has developed a white paper on this topic, entitled ‘Immersive Audio and Using the X.Y.Z. Syntax’. This resource explores the origins and development of immersive audio, clarifies some of the terminology and what the “x.y.z” syntax means, and how to interpret and use it in describing the spatial resolution of immersive audio systems.


