ETX2250/ETF5922

The Good, the Bad and the Ugly of Data Visualisations

Lecturer: Kate Saunders

Department of Econometrics and Business Statistics


  • etx2250-etf5922.caulfield-x@monash.edu
  • Lecture 2
  • dvac.ss.numbat.space


Reminder

Why Data Visualisation?

It’s important to realise that there are different purposes for visualisation.

The two main purposes that we will use are:

  • visualisation for exploration (to understand and identify important features in the data), and

  • visualisation for communication (to communicate your understanding to others).

Today we will focus on visualisation for communication.

Today’s lecture

What you’ll learn

  • Tufte’s principles of graphical excellence

  • Human perception of visual elements

    • This includes colour scales
  • How to avoid bad plots

  • What makes a visualisation misleading

Principles of graphical excellence

Let’s start by looking at an example

Birth place from the 2021 Australian Census

Birth place                  Count     %
Australia               17,020,422  66.9
Not Stated               1,358,658   5.3
England                    927,490   3.6
Other                      759,173   3.0
India                      673,352   2.6
China                      549,618   2.2
New Zealand                530,492   2.1
Philippines                293,892   1.2
Vietnam                    257,997   1.0
South Africa               189,207   0.7
Malaysia                   165,616   0.7
Italy                      163,326   0.6
Sri Lanka                  131,904   0.5
Nepal                      122,506   0.5
Scotland                   118,496   0.5
Korea South                102,092   0.4
United States America      101,309   0.4
Germany                    101,255   0.4
Hong Kong                  100,148   0.4
Iraq                        92,922   0.4
Greece                      92,314   0.4
Pakistan                    89,633   0.4
Lebanon                     87,340   0.3
Indonesia                   87,075   0.3
Thailand                    83,779   0.3
Ireland                     80,927   0.3
Iran                        70,899   0.3
Fiji                        68,947   0.3
Netherlands                 66,481   0.3
Singapore                   61,056   0.2
Afghanistan                 59,797   0.2
Bangladesh                  51,491   0.2
Canada                      50,223   0.2
Taiwan                      49,511   0.2
Brazil                      46,720   0.2
Poland                      45,884   0.2
Japan                       45,267   0.2
Croatia                     43,302   0.2
Egypt                       43,213   0.2
North Macedonia             41,786   0.2
Zimbabwe                    39,714   0.2
Myanmar                     39,171   0.2
Cambodia                    39,043   0.2
Turkey                      38,568   0.2
France                      36,019   0.1
Malta                       35,413   0.1
Papua New Guinea            29,984   0.1
Chile                       29,860   0.1
Wales                       29,250   0.1
Samoa                       28,107   0.1
Bosnia Herzegov             26,171   0.1
Mauritius                   25,981   0.1

  • Tables can be useful

  • But quickly seeing differences between two large numbers is hard

  • Visualising this data well would make the differences clearer

Birth place from the 2021 Australian Census

Which birth place is the third largest among people in Australia?
(see related article in the Guardian)

Birth place from the 2021 Australian Census

Can you read the labels without tilting your head?

Birth place from the 2021 Australian Census


Even if we fix the labels, something is not working with this plot.


There is a lot of empty wasted space


Hard to tell what’s going on


Not ‘showing’ the data.

Graphical Principle: Data Density

Data Density

Definition: Data density is the ratio of how much information a visualisation communicates to the size of the whole visualisation.

Objective:

  • Use space effectively to communicate ideas

  • The data density should be such that patterns and trends are clear

Why:

  • Complicated graphics are hard to interpret and can be overwhelming

  • Under-dense graphics waste space and might not show enough detail

Example: In the previous example the data density is too low. Patterns in the data are not clear.

Mathematically



\[\mbox{Data density}=\frac{\mbox{Number of data points}}{\mbox{Area of graphic}}\]

  • Generally want a high data density

  • When the data set is small (only a few points), you may be better off using a table.

Top 5 countries of birth outside Australia

Improving the data-density makes for a much better plot!

The bar text shows the percentage of 25,422,788 Aussie residents born in that place.
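As a rough sketch of how the top five could be pulled out of the census table, the R code below re-types the first eight rows of the table above and uses the total of 25,422,788 quoted on the slide. The object names (`census`, `top5`, `plot_top5`) are made up for illustration, and base R graphics are an assumption here; this is not necessarily how the plot on the slide was produced.

```r
# First eight rows of the 2021 census table above (counts copied by hand).
census <- data.frame(
  birth_place = c("Australia", "Not Stated", "England", "Other", "India",
                  "China", "New Zealand", "Philippines"),
  count = c(17020422, 1358658, 927490, 759173, 673352,
            549618, 530492, 293892),
  stringsAsFactors = FALSE
)
total_residents <- 25422788  # total number of residents quoted on the slide

# Drop rows that are not a specific overseas country, then keep the five largest.
overseas <- subset(census, !birth_place %in% c("Australia", "Not Stated", "Other"))
top5 <- head(overseas[order(-overseas$count), ], 5)
top5$pct <- round(100 * top5$count / total_residents, 1)
print(top5)

# Hypothetical helper: horizontal bars with the percentage written on each bar.
plot_top5 <- function(d) {
  d <- d[order(d$count), ]             # barplot draws bottom-up, so the largest ends on top
  old_par <- par(mar = c(4, 8, 1, 1))  # widen the left margin for horizontal labels
  on.exit(par(old_par))
  mids <- barplot(d$count, names.arg = d$birth_place, horiz = TRUE, las = 1,
                  border = NA, col = "grey40", xlab = "Residents")
  text(d$count, mids, labels = paste0(d$pct, "%"), pos = 2, col = "white")
  invisible(d)
}
```

Calling `plot_top5(top5)` draws the chart. Showing only five bars, sorted by size and labelled horizontally, uses the space to show the differences rather than leaving most of it empty.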

Graphical Principle: Data to ink ratio

Data to ink ratio

Definition: The proportion of “ink” used to show the data compared with the total ink used in the graphic.

Objective:

  • Reduce non-essential elements that don’t convey data messages (like extra borders, colours, shading, or decorative graphics).
  • Want a clean and focused design.

Why: Want a data-to-ink ratio that promotes readability and clarity.

Example: In the previous example the colour looked cool, but it’s not adding anything to the data story.

Mathematically



\[\mbox{Data ink ratio}=\frac{\mbox{Ink used to display data}}{\mbox{Ink used in graphic}}\]

  • Generally want a high data to ink ratio

Important points

NO CHART JUNK !!!

Chart junk is adding decorative elements we just don’t need - think heavy gridlines, unnecessary text, extra pictures.

Think back to Lecture 1 - were there any visualisations with really bad chart junk?

Understanding the difference

Data-to-ink ratio is about how efficiently you use ink—avoiding unnecessary decoration so most ink directly represents data.

Data-density is about how much data you show in a given space—how tightly information is packed.

India is No. 3

Color now enhances the data story!

The top birth place in Australia is Australia, with 66.9%. Also note that 5.3% of Australian residents did not state their birth place.

Good practice in data visualisation

Tufte’s Principles of Graphical Excellence

  • Show the data
    • Maximise data ink
  • Avoid distorting what the data have to say
    • Do not create misleading visualisations
  • Present your data in a small space; efficiently with clarity.
    • Optimise data density
  • Make large data sets coherent
    • Use small multiples
  • Encourage the eye to compare different pieces of data
    • Group your data appropriately

See ‘The Visual Display of Quantitative Information’ by Edward Tufte.

Human Perception

What is human perception?

  • Human perception is the way our brains and sensory systems (especially vision) receive, process, and interpret information from the world.

  • It’s relevant in data visualisation as it relates to how people see and understand patterns, shapes, colours, positions, and sizes on a graph or chart.

Why is it important for us?

  • Data visualisations are only useful if people can quickly and accurately interpret them.

  • If a graph conflicts with how human perception works, it can confuse, mislead, or overwhelm the audience.

  • Good visualisation uses human perception!

Key elements of perception

These include:

  • Pre-attentive processing
  • Visual hierarchy
  • Encoding efficiency
  • Gestalt principles
  • Color perception
  • Pattern and trend recognition
  • Managing cognitive load
  • Cultural and contextual factors

Pre-attentive Processing

What is it?

It’s the stage of visual perception where the human brain detects certain visual properties almost instantly (within approx 200-250 milliseconds), before conscious attention is engaged.

Example

If I give you the following sequence, how quickly can you count the number of 7s?

35617482457135176492764527256772

What about now?

35617482457135176492764527256772

By using pre-attentive attributes we can help our audience to see what we want them to see before they even know they are seeing it!

Lots of ways to catch the eye

If used sparingly, preattentive attributes can be very useful

More examples

There is no pre-attentive attribute here.

I have used colour to get your attention.

I have used size to get your attention.

I have used enclosure to get your attention.

I have used weight to get your attention.

I have used italic to get your attention.

I have used position to get your attention.

I have used underline to get your attention.

What do you think?



Break-out Discussion

Discuss in your group which of these pre-attentive attributes are the most eye catching!

Decide your top 3.

Visual Hierarchy

What do we notice first?

  • Pre-attentive attributes grab our attention.

  • Some attributes draw your eyes with greater or weaker force than others.

  • NOTICE THAT LARGER or brighter elements are seen as more important

  • So beyond drawing our audience’s attention to where we want them to focus, we can use pre-attentive attributes to create visual hierarchy.

  • We can combine multiple pre-attentive attributes to make our visuals scannable, emphasising some components and de-emphasising others.

Encoding Efficiency

Judging differences

Our brains judge differences more accurately for some visual encodings than for others.

  • Most accurate encodings: Position, length (e.g., bar charts).

  • Less precise encodings: Area, color hue (e.g., pie charts).
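To see the difference for yourself, here is a small base R sketch that draws the same five values twice: once as bars (length and position) and once as a pie chart (angle and area). The counts are the 2021 top five from the census table earlier in the lecture; the function name `plot_encodings` is made up for illustration.

```r
# 2021 counts for the top five overseas birth places (from the census table).
top5 <- c(England = 927490, India = 673352, China = 549618,
          `New Zealand` = 530492, Philippines = 293892)

# Hypothetical helper: the same values side by side under two encodings.
plot_encodings <- function(values) {
  old_par <- par(mfrow = c(1, 2))  # two panels in one row
  on.exit(par(old_par))            # restore the previous layout afterwards
  barplot(values, las = 2, main = "Length and position")
  pie(values, main = "Angle and area")
  invisible(values)
}
```

Try `plot_encodings(top5)` and compare China with New Zealand in each panel. The two counts differ by fewer than 20,000 people, which is usually easier to spot from the bar heights than from the pie slices.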

Gestalt Principles

Gestalt Principle

  • “Gestalt” is German for form or shape.

  • A set of laws to address the natural compulsion to find order in disorder by perceiving a series of individual elements as a whole.

Different Principles

  • Proximity — things that are close together are perceived as belonging together.

  • Similarity — similar shapes, colours, or sizes are seen as part of the same group.

  • Enclosure — objects within a boundary are grouped together.

  • Closure — we mentally fill gaps to see complete shapes.

  • Continuity — we follow smooth, continuous lines or patterns.

  • Connection — elements connected by lines or arrows are seen as related.

  • Figure–ground — we separate objects (figure) from the background (ground).

Where do you see relationships?

Break-out Discussion

Where do you perceive objects to be grouped in this image?

Healy, Kieran. Data visualization: a practical introduction. Princeton University Press, 2024.

Let’s look at an example

Data in the News

India has overtaken China and New Zealand to become the third largest country of birth for Australian residents, 2021 census data has found.

– The Guardian

Birth place      Count     %  Census Year
England        907,570   3.9  2016
New Zealand    518,466   2.2  2016
China          509,555   2.2  2016
India          455,389   1.9  2016
Philippines    232,386   1.0  2016
England        927,490   3.6  2021
India          673,352   2.6  2021
China          549,618   2.2  2021
New Zealand    530,492   2.1  2021
Philippines    293,892   1.2  2021

Aussie residents 2016 compared with 2021

India has overtaken China and New Zealand to become the third largest country of birth for Australian residents.

Does this show that India overtook China and New Zealand?

Let’s use the Gestalt Principles

Proximity

  • Placing elements closer together makes comparisons easier.

Continuity and connection

  • We can easily track patterns following a continuous path
  • Connected items appear related

Changes in Aussie Residents

Better!

Clear that India has overtaken China and New Zealand to become the third largest country of birth for Australian residents.

But, should we show percentage instead of counts?

Changes in Aussie Residents

  • Note that whether the trend is up or down can change depending on whether we plot percentages or counts.

  • So should you use the percentage or the total number?

  • It depends on the question. For housing availability, for example, total numbers of new residents are more important.
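The point about trends changing direction can be checked directly from the 2016 and 2021 table shown earlier. The sketch below re-types that table and computes the change in counts and the change in percentage points for each birth place. The object and column names are made up for illustration.

```r
# 2016 and 2021 figures copied from the table in "Data in the News".
births <- data.frame(
  birth_place = rep(c("England", "New Zealand", "China", "India", "Philippines"), 2),
  year  = rep(c(2016, 2021), each = 5),
  count = c(907570, 518466, 509555, 455389, 232386,
            927490, 530492, 549618, 673352, 293892),
  pct   = c(3.9, 2.2, 2.2, 1.9, 1.0,
            3.6, 2.1, 2.2, 2.6, 1.2),
  stringsAsFactors = FALSE
)

# Both years list the birth places in the same order, so rows line up.
y2016 <- births[births$year == 2016, ]
y2021 <- births[births$year == 2021, ]

change <- data.frame(
  birth_place      = y2016$birth_place,
  count_change     = y2021$count - y2016$count,
  pct_point_change = round(y2021$pct - y2016$pct, 1)
)
print(change)
```

England is the clearest case: the count rises by 19,920 people while the share of residents falls by 0.3 percentage points, so the line goes up on a count plot and down on a percentage plot.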

Changes in Aussie Residents

Even Better!

By adding the names to the plot directly we can:

  1. make it clearer which line belongs to which country, and

  2. increase the data density.

We can also change the theme to reduce ‘chart junk’.

One more example

Data story

Census 2021 shows that far more women than men born in the Philippines and China have migrated to Australia, whilst more men than women born in India have migrated to Australia.

Graphical Principle: Similarity

Law of Similarity

  • When objects share similar attributes, they are perceived as being part of the same group.

Notice that the countries are colored by their continent (Europe, Asia, and Oceania).

Graphical Principle: Enclosure

Law of Enclosure

Objects collected within a boundary-like structure are perceived as a group.

You may also like to revisit this example of exit polls from Lecture 1.

Colour

Color Perception

Colour effectiveness

  • Contrast Sensitivity: Differences in luminance (brightness) are more noticeable than differences in hue

  • Colour scales: Different colour scales are better at representing different types of data

  • Colourblind Accessibility: Choose colour scales that are accessible (colour vision deficiency affects about 8% of men).

Qualitative palettes

Designed for a categorical variable with no particular ordering.

colorspace::hcl_palettes("Qualitative", plot = TRUE, n = 7)

Sequential palettes

Designed for ordered categorical variables, or numbers going from low to high (or vice-versa).

colorspace::hcl_palettes("Sequential", plot = TRUE, n = 7)

Diverging palettes

Designed for ordered categorical variables, or numbers going from low to high (or vice-versa), with a neutral value in between.

colorspace::hcl_palettes("Diverging", plot = TRUE, n = 7)

Colourblindness

Colourblindness affects roughly 1 in 12 men (about 8%).

Check your color choices using the colorblindr package or otherwise.
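One possible "otherwise" is the colorspace package already used for the palettes above, which can simulate how colours appear under common colour vision deficiencies. The sketch below is an assumption about workflow rather than the method used in this unit: it takes a five-colour qualitative palette (the palette name "Dark 3" is just an example) and simulates deuteranopia and protanopia.

```r
library(colorspace)

# An example qualitative palette with five colours.
pal <- qualitative_hcl(5, palette = "Dark 3")

# Simulate how the same colours appear with two common forms of
# red-green colour vision deficiency.
pal_deutan <- deutan(pal)
pal_protan <- protan(pal)

print(rbind(original = pal, deutan = pal_deutan, protan = pal_protan))
```

To compare the versions visually, pass them to `swatchplot()`, for example `swatchplot("Original" = pal, "Deutan" = pal_deutan, "Protan" = pal_protan)`. If two colours become hard to tell apart in the simulated rows, consider a different palette.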

More on Human Perception

Other aspects

Perception of Patterns and Trends

The brain is good at identifying linear relationships.

  • Anomalies and groupings are quickly noticed.

Managing Cognitive Load

Overly complex visuals overwhelm viewers.

  • Remove unnecessary elements (e.g., gridlines, 3D effects).

  • Focus on clarity and minimalism.

Other aspects ctd.

Cultural and Contextual Factors

  • Be aware of symbolism

    • e.g. I’ve had to tell my mum not to send certain emojis
  • Colours and shapes can have different meanings across cultures

    • e.g. Red can mean good luck and prosperity in China
  • Tailor visualisations to your audience’s cultural context.

    • e.g. When visualising election data, use the party colours
  • Be mindful of universal symbols and interpretations

    • e.g. red = hot, blue = cold, or

    • e.g. red = danger or hazard, orange/yellow = warning or watch out, green = safe

Human Perception Take Aways

Human perception impacts how we interpret visualisations.

Key takeaways:

  • Use pre-attentive attributes effectively.

  • Leverage Gestalt principles and visual hierarchy.

  • Prioritise accurate encoding methods.

  • Design for accessibility and simplicity.

  • Consider cultural and contextual nuances.

Bad Plots

What makes plots bad can be broadly put into three categories:

  • Taste (Aesthetic)
  • Perception
  • Data

Errors of perception

Note

  • Data visualisation is all about mapping data in such a way that the viewer can understand what’s going on.

  • The data should therefore not be displayed in an inaccurate or misleading way.

  • The following plots provide some examples of what can go wrong.

Beware 3D

Warning

  • Difficult to line up the heights of bars with the actual values.

  • Closer green bar (MSN) looks bigger.

  • On the pie chart, rendering in 3D makes the blue segment (Google) look the biggest.

  • Do not use three dimensions when two will work well.

Road miles (from Tufte)

Effects

Warning

  • The data says that mileage rose from 18 to 27.5 which is a 53% increase.

  • The line on the graph increases from 0.6 inches to 5.3 inches which is a 783% increase!

  • The lie factor is \(783/53 \approx 14.8\)

  • The size of the effect visually displayed is not the same as the effect size in the data

Here is another example where the size of the visual elements does not match the effect size in the data.

Lie factor

The lie factor is given by

\[\mbox{Lie factor}=\frac{\mbox{Size of effect in graph}}{\mbox{Size of effect in data}}\]

  • Ideally, the lie factor should be 1.

  • Tufte recommended \(0.95 < \text{lie factor} < 1.05\).
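As a quick check of the road miles example, the sketch below computes the lie factor in R. It assumes that "size of effect" means the relative change, (second value - first value) / first value, and the function name `lie_factor` is made up for illustration.

```r
# Lie factor = relative change shown in the graphic / relative change in the data.
lie_factor <- function(graph_start, graph_end, data_start, data_end) {
  effect_graph <- (graph_end - graph_start) / graph_start
  effect_data  <- (data_end - data_start) / data_start
  effect_graph / effect_data
}

# Road miles example: the line grows from 0.6 to 5.3 inches,
# while the data grow from 18 to 27.5.
road_miles <- lie_factor(graph_start = 0.6, graph_end = 5.3,
                         data_start = 18, data_end = 27.5)
round(road_miles, 1)  # about 14.8, far outside the 0.95 to 1.05 range
```

A graphic whose visual change matched the data exactly would return 1.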

Bad data

Note

  • Sometimes the problem lies not with the plot but with the data.

  • On the following slide is a plot comparing the cost of going to college in the US against the salaries of college graduates.

  • Can you find problems with this graph?

College cost

Problems

Warning

  • There is nothing incorrect about this graph.

  • However the message is misleading.

  • The income is a yearly income while the cost of college is over four years (and only paid once).

  • Also it does not show the income of people who are not college graduates.

  • Think carefully about comparisons on a plot.

  • Make sure your conclusions align with what is in the plot.

Misleading plots

Stock prices

From this graph we might conclude that Twitter stock prices increased dramatically on April 26.

A longer term view

Not that dramatic anymore. This is an example of using a misleading x-axis.

The y-axis

Note

  • Watch this video.

  • Are we interested in the size of the variable rather than changes in the variable?

  • Is zero a reasonable value for the variable to take?

  • Are we using a bar chart?

  • Answering yes to these questions means we should give more consideration to including zero on the y-axis.

Summary

What we’ve covered

  • We’ve seen that data visualisations can be good or bad.

  • We understand the principles of graphical excellence and how to apply them.

  • We’ve learnt how to use human perception in our visual designs.

  • We’ve learnt how to use colour to improve clarity and accessibility.

  • We can identify common pitfalls in bad plots.

  • We recognise how visuals can misrepresent data, and we are going to avoid this!