#27: Providing the right aggregation level
If data cleaning is the most time-consuming part of a data analyst’s job, understanding the chart’s purpose is the most important one. Before we design the visuals, we must know what question the chart should answer and why that question is relevant. We need to know ‘What’ and ‘Why’ to choose ‘How’, that is, to select the most efficient chart for the job. Simple as it sounds, it is much more complicated than one might think.
I’ll use the Eurostat chart on EU trade in goods with Russia as an example. At first glance, the graph seems fine: the line chart shows change over time (which is correct), the design is neat (also desirable), and the layout supports the analysis. But after looking at it, we are somehow left thinking ‘wait, what?’. On closer examination, we notice that the two charts do not present the data optimally, because they show it at two different aggregation levels. That is like comparing apples to oranges, or rather, apples to apple pies. We can technically make the comparison, but only with a lot of additional, unnecessary work. Let’s see how a few small formatting changes make the data coherent and the comparison effortless.
Elements that work in this chart
Graph layout
The biggest advantage of this graph is its layout. Placing one chart above the other makes the analysis easier: we can compare the change over time for each dimension independently or analyze both simultaneously.
Different perspectives
Another advantage is that the graph includes both the percentage shares and the trade balance value, which enriches the analysis with a different angle. Thanks to the layout, we can analyze the two dimensions at once, comparing the percentage change with the change in value.
Elements that don’t work in this chart
Merging two scales
This graph is an extreme variant of a dual axis, where the two axes are… merged despite having different units. The upper part of the axis is in percent, whereas the lower part is in billions of euros. Such a solution is not only misleading but also unsustainable. Whenever exports exceed imports and the balance turns positive, the bar would have to extend above zero, into the part of the scale that holds a different unit. One alternative is to switch to a single unit and show imports and exports as values rather than shares. Implemented, it would look similar to the example published by the Federal Statistical Office of Germany (Destatis).
Comparing apples to oranges
There is another issue, harder to notice but just as impactful. The top chart compares the actual import and export shares over time. We know what the shares were at any given point, but we have to estimate the difference between them visually. Because we are comparing the distance between two changing lines, there is no common baseline, which makes the task challenging. The bottom chart, in contrast, is already aggregated: it shows the difference between the export and import values, which we can read without any complex visual assessment. Putting the two together means we are comparing apples to… apple pies.
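To make the mismatch in aggregation levels concrete, here is a minimal Python sketch with made-up numbers (not the Eurostat figures). The hypothetical `difference` helper computes, per period, the gap between two series. Applied to the shares, it produces what the top chart leaves the reader to estimate by eye; applied to the values, it produces the single series the bottom chart already shows.

```python
# Hypothetical illustration: all figures are invented, not Eurostat data.
# Top chart: two raw series (shares in %), one value per period.
export_share = {"t1": 4.0, "t2": 3.5, "t3": 2.5}
import_share = {"t1": 7.5, "t2": 9.0, "t3": 6.0}

# Bottom chart input: export and import values (EUR billion).
export_value = {"t1": 20.0, "t2": 18.0, "t3": 14.0}
import_value = {"t1": 40.0, "t2": 45.0, "t3": 30.0}


def difference(a, b):
    """Return a - b for every period present in both series."""
    return {period: a[period] - b[period] for period in a if period in b}


# What the reader must estimate from the distance between two lines (percentage points).
share_gap = difference(export_share, import_share)
# What the bottom chart already shows as one aggregated series (EUR billion).
trade_balance = difference(export_value, import_value)

for period in share_gap:
    print(f"{period}: share gap {share_gap[period]:+.1f} pp, "
          f"balance {trade_balance[period]:+.1f} bn EUR")
```

Showing both charts at the same aggregation level, either both as raw series or both as differences, removes the need for this mental subtraction.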
Step-by-step improvements
Remove dual axis
The first step should be separating the two charts. The easiest way to do this is to move the x-axis. Placing it between the charts splits them visually while emphasizing the dimension they share. The new placement also makes it easier to analyze each chart independently, because the scale sits close to both data series.
Adjust chart type
Even though using a line chart to show imports and exports is technically not a mistake, there is a more insightful way to present this data. The U.S. Energy Information Administration provides an interesting alternative that inspired this change. Instead of using two separate lines, we can switch to bars placed on both sides of the axis: exports above and imports below the zero line. This makes the logic behind the balance calculation self-explanatory: a negative balance means that more was imported than exported, which lines up with the chart above. Finally, narrowing the gaps between the bars makes scanning easier, and using the same bar width simplifies the layout.
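The diverging-bar idea can be sketched in a few lines of Python. This is an illustration under stated assumptions: the numbers are invented (EUR billion), and the hypothetical `diverging_bars` helper only prepares the bar heights rather than drawing anything. Exports become positive heights, imports negative ones, so each period's balance is simply the sum of its two bars.

```python
# Hypothetical values in EUR billion (illustrative only, not Eurostat data).
exports = [20.0, 18.0, 14.0]
imports = [40.0, 45.0, 30.0]


def diverging_bars(exports, imports):
    """Return (export_heights, import_heights, balances) for a diverging bar chart.

    Exports are drawn upwards (positive heights) and imports downwards
    (negative heights), so each period's balance is the sum of its two bars.
    """
    export_heights = [abs(x) for x in exports]
    import_heights = [-abs(m) for m in imports]
    balances = [up + down for up, down in zip(export_heights, import_heights)]
    return export_heights, import_heights, balances


up, down, balance = diverging_bars(exports, imports)
for u, d, b in zip(up, down, balance):
    # A negative balance means the import bar is longer than the export bar.
    side = "deficit (more imported)" if b < 0 else "surplus or even"
    print(f"export {u:+.1f}, import {d:+.1f}, balance {b:+.1f} -> {side}")
```

In matplotlib, for example, both height lists could be passed to `Axes.bar` with the same `width`, and the zero line drawn with `Axes.axhline(0)`; the `balance` list is what a net trade line on top of the bars would plot.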
Calming down colors
Switching from a line chart to a bar chart and widening the bars increases the data-ink and makes the data more prominent. We can calm the layout by reducing the color intensity: using a less saturated pink and removing the separate colors for imports and exports. That distinction is already encoded by position above or below the zero line, so we can use a neutral grey in two slightly different brightnesses, which also makes the division at the zero line more prominent. An additional benefit of this solution is that it leaves room to add information about net trade. Two light shades of grey provide enough contrast with the added line.
Working on formatting
The icing on the cake is removing the rotated axis labels and incorporating them into the chart. This cleans up the layout and makes reading easier, as we no longer have to twist our necks. Finally, we can create a visual hierarchy by making the scale less prominent (with a lighter shade), emphasizing the zero lines, and separating the two charts even further by coloring the x-axis.