Audiophile matters: “Seeing the music”


Seeing the music: average note thickness, stage dimensions, and separation

In my previous ‘music lover vs audiophile’ post I addressed the relationship between stage dimensions and separation. This time I will go into a little more detail, and I have added some pictures to illustrate it. Of course it is important to note that the stage will ultimately depend on your own source. These representations are with the AK380cu. The AK’s stage is both wide and deep, creating an overall spacious and airy stage with a nice sense of three-dimensionality.

Other players might create a stage with more or less similar proportions (the Cowon Plenue S, for example), but in most cases the width and depth will vary greatly between players, and especially compared to a smartphone. The Lotoo Paw Gold (LPG), for instance, has a centrally-focused stage, resulting in a narrower but deep presentation. Besides overall larger dimensions, the AK has the further advantage of allowing more variance in the iem’s stage representation, while the LPG predominantly creates a cube-sized stage, so to speak. So keep in mind that these representations should be viewed in relation to each other, not as a portrayal of how a stage will ‘always’ sound when using one’s own source.

Stage dimensions

Let’s start with what I refer to as a classic stage; it has a nice bit of width and is noticeably wider than deep and tall, but the proportions are good, creating a realistic stage.


Next, we move on to a different type of stage, which I always refer to as a cube stage, as it has almost even proportions in width, depth and height. It is the most common type of stage besides the classic stage, at least for high-end monitors. While many iems can share a cube-sized stage, they can differ in their overall size. Some examples are the Zeus-XIV and Prelude. Below we see the S-EM9 and Vega, which have a similar shape, although Vega has slightly larger dimensions.


Separation: average note size and stage dimensions

The stage is where we set the music. However, the stage itself is only one of the factors determining how well the music is separated, or how well you can locate each individual tone in space. While it is very important, the average size of notes and their relationship with the stage are even more important. The average note size results from dips and peaks throughout the frequency range, from the mid-bass up to the treble. We often read that an iem can have a ‘full-bodied’ presentation, but this can result from two different and equally important constructs: midrange density and note thickness.

Midrange density
The first is midrange density, or the ‘body’ of midrange notes. This can be directly related to vocal depth and power. The midrange density is the result of the tuning of the midrange frequencies; manipulating different key frequencies will consequently affect the forwardness, size, density and warmth (and tone) of the midrange. Attenuating the midrange frequencies in favor of bass and treble, on the other hand, will result in a loss of density and focus (or intensity) of instruments, as well as a thinner vocal presentation.

Average note thickness
The average note thickness refers primarily to the size of notes, which can be large or lean. But an iem can have a thick note presentation without being overly dense, or being a so-called vocal specialist. This is because the average note thickness relies to a great extent on bass quantity, especially the mid- and upper bass. Bass-heavy monitors such as the Rhapsodio Solar, Heir 8.A, or Campfire’s Vega create a very full sound, despite having a V-shaped signature when their frequency response is plotted on a graph. The advantage is that thicker notes tend to have a more engaging quality due to their size and forwardness: guitars sound powerful, cellos sound full, and male vocals sound forward and warm. The average size of notes will further depend on dips and peaks throughout the frequency range; for instance, if an iem has either a laid-back mid-bass or an upper midrange dip, it might create leaner notes.

However, the average note thickness has an inverse relationship with separation, for the simple reason that thicker notes by nature tend to take up more space in the stage. So, the relationship between the note size and stage dimensions will determine the separation: thicker notes will require a proportionally larger stage to equal the separation of a leaner presentation. We will demonstrate this relationship by sticking with the S-EM9 and Vega. As you can see, the Vega has larger dimensions than the S-EM9, which we could score as a 9 for the Vega vs 8.5 for the S-EM9. However, due to the Vega’s powerful bass and enhanced mid- and upper bass, it has a thick note recreation, which gives it the engaging quality that we previously mentioned. The S-EM9, on the other hand, creates leaner instrument tones due to the tuning of its frequency response.
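For those who like to see the idea in numbers, here is a rough sketch of that inverse relationship. It is purely a toy model and not a measurement: the stage scores (9 and 8.5) come from the comparison above, while the note thickness values and both function names are made up for illustration.

```python
# Toy model only: separation is treated as "stage room per unit of note size".
# The function names and the thickness values are hypothetical.

def separation_index(stage_size: float, note_thickness: float) -> float:
    """A larger stage raises the index, thicker notes lower it."""
    if note_thickness <= 0:
        raise ValueError("note_thickness must be positive")
    return stage_size / note_thickness


def stage_needed(target_index: float, note_thickness: float) -> float:
    """Stage size required to reach a given index at a given note thickness."""
    return target_index * note_thickness


# Stage scores from the text; thickness values are assumed for illustration.
vega = separation_index(stage_size=9.0, note_thickness=1.1)
sem9 = separation_index(stage_size=8.5, note_thickness=1.0)

print(f"Vega: {vega:.2f}, S-EM9: {sem9:.2f}")
print(f"Stage Vega would need to match S-EM9: {stage_needed(sem9, 1.1):.2f}")
```

With these assumed values the leaner presentation ends up with the higher index despite the smaller stage, and the thicker presentation would need a proportionally larger stage to catch up.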

This is what I previously wrote in the Vega review:

“…The S-EM9 has a bump in the center midrange, surrounded by a dip in the upper bass as well as upper midrange. This gives it a nice vocal presentation, slightly warm and forward, but leaner and more delicate instrument notes resulting in a cleaner stage. Vega in turn creates thicker notes, and has a more forward instrument presentation due to the enhanced upper bass and upper midrange…

When it comes to separation, both are very good in different ways. Vega has an overall larger stage, with more depth. Because of its stage dimensions, the thicker notes are clearly separated. The S-EM9’s stage might be slightly smaller, the separation is very clean and effortless due to the leaner midrange notes.”

This is what it looks like:


We can see the different layers, and how depth determines this layering ability. It also demonstrates the effect of height on separation. The tones in the back are placed a bit higher, and will be easier to perceive than when the stage is flatter. While height might not be as important as width and depth, it still contributes to how easily we can see the overall picture. We can further see that while the S-EM9 has leaner midrange notes due to an upper midrange dip, it still has a slightly forward vocal presentation with a nice bit of body and density; it is more of a W-shape than a V-shape. Vega’s vocal presentation, by contrast, is a bit more laid-back and smaller in size, despite the thicker midrange notes. So the vocal presentation is independent of the average note thickness; either can be thin or thick.

We can further see that as a result of the thicker notes, the Vega’s notes take up relatively more space in the stage. Furthermore, by nature of their size, larger notes might have a tendency to obscure finer detail if the stage is too cramped, especially in depth. This is not necessarily the case with Vega, as its stage dimensions are proportionate to its thicker notes. However, due to the leaner body of its notes, the S-EM9’s separation is actually slightly better (e.g. 9 vs 8.5), or at least more effortless, despite the smaller stage dimensions. Finally, this further demonstrates that almost every tuning choice comes with an advantage and a disadvantage, and it is important to accurately mention both sides. Full-bodied or lean, warm or cold, wide or focused: there’s always an up and a down.

Resolution and separation

One of the most important factors contributing to the visual display of music is resolution. Resolution in audio is almost identical to visual resolution, like comparing a blurry TV with a 4K flatscreen. Higher resolution will create a more realistic and better-defined image. In doing so, it will also contribute to separation. With higher resolution, tones that were previously heard as one will often be better separated. Examples are background singers, violins playing simultaneously, or distinct treble tones. So as a ‘side effect’ of higher definition, a presentation will become more detailed due to better separation.

And this is what the stage might look like seen from above; due to better definition, finer detail in the periphery of the stage might be uncovered (the top part of the picture represents the rear end of the stage, with the vocal in the middle).


In my review series I will be including a 3D image of the stage dimensions and note presentation.

Related posts:
-Introduction to flinkenick’s 2017 flagship shootout: scoring signatures and technical properties
-Audiophile matters: Music lovers vs. audiophile approach
-Audiophile matters: Describing tonality


About Author

Nic is currently in pursuit of a PhD degree in social neuropsychology, while trying not to get too distracted by this hobby. In pursuit of theoretical knowledge by day, and audiophile excellence at night. Luckily for him, the two activities are not mutually exclusive, which helps to lighten the workload. Always on the go, Nic's enthusiasm for hi-fi is focused on every link in the portable chain: iems, cables and daps.


  1. Matt on

    An excellent article and one I think that is very much needed. I’ve often thought there should be some way to measure soundstage and wondered if it wasn’t the missing ingredient in so many objective tests; the frequency response and other stats only tell so much of the story. But I think this could be measured too and even depicted as you do it. (Note thickness and other matters may be more a relationship of the interaction of these other factors, such as soundstage and FR, and not need separate measurements). I’ll be sure to read this article again sometime and continue following your series… Thanks for writing it!

    • flinkenick on

      Thanks for the kind words Matt, much appreciated. You know, when we listen to equipment we always get this feeling that since there are basic physics involved, it should be able to be measured. I share your sentiment, but I think the reality is that the technology simply isn’t there, and there’s probably not enough economic incentive to develop it. It’s not only aspects such as stage; how about a measurement of how a DAP sounds (or even a cable)? The differences in sound between two music players can be quite large, but all we can do right now is measure their power output. Who knows, maybe in 5 or 10 years.

  2. Vel on

    I simply love reading how you communicate what you hear 🙂

    • flinkenick on

      Thanks Vel, that means a lot to me.

  3. Giorgi on

    I got no idea about any of this, but.. you can definitely add this phrase at the end: “Science, bitch!”
