Lecturer: Kate Saunders
Department of Econometrics and Business Statistics
Infographics
Infographics are a powerful form of visual communication: they make complex data accessible, memorable, and engaging. They:
Are visual representations of data that combine graphics, text, and numbers
Go beyond just visualisation
Guide the reader to the key messages
Commonly use a simple, narrative form
Help reduce the cognitive load
And we’ve seen a few examples already
There is a big downward spike in 2020. Adding text here helps the audience understand why.
What is the difference?
They aren’t necessarily different from one another.
The key difference is that infographics usually contain additional text or other graphics, like icons.
Infographics are just a form of data visualisation that uses additional narrative elements.
Hang on
Isn’t adding extra elements a form of chart junk?
Some of the reasons
To deliver the message quickly
To explain a complex process
To alert the audience to an important part of our figure
To summarise a long report or blog succinctly
To create a visual that is easy to share
Plastic recycling can be a boring topic
This image is engaging and colourful
The text guides the viewer
Notice the use of human perception
Infographics
Infographics use many of the same principles of good data visualisation!
Here are more details on the key principles
Here is a more general guide from Monash library on infographics
Some more advanced examples of infographics in R
Guidance on creating graphical summaries for an Article or Report
Your turn
Let’s take some time to look at some examples of infographics and their design elements:
Note infographics are common on social media, but less so in technical reports.
Why do you think that is?
Messages
What messages are you trying to convey?
Where do you need to draw your reader’s eye?
What situational context can you provide to improve their understanding?
For example, an important event like COVID-19 hurts the economy.
Or, Apple announces a new product that drives up the stock price.
Text layer
You already know how to create a visualisation.
Now let’s add some extra narrative layers to our plot.
There are many ways to do this in ggplot
geom_text()
geom_label()
annotate()
geom_point()
geom_label() wraps the text inside a rectangle
Additional
Required aesthetics:
label: the text you want to display
Useful additional inputs:
nudge_x and nudge_y: shift the text along the x and y axes
The text can be added using a character string
The text can also be added using a data frame
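The two approaches above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the label wording, the text position, and the `mpg > 30` cut-off are assumptions chosen for the example, and `annotate("text", ...)` is used for the single string so that it is drawn only once.

```r
library(ggplot2)

# Approach 1: a single character string, placed by hand
# (position and wording are illustrative assumptions)
p_string <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotate("text", x = 4.5, y = 30, label = "Heavier cars use more fuel")

# Approach 2: labels stored in a data frame, one row per label
# (the mpg > 30 cut-off is an arbitrary choice for this sketch)
top_cars <- mtcars[mtcars$mpg > 30, ]
top_cars$car <- rownames(top_cars)

p_df <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # nudge_y lifts each label slightly above its point
  geom_text(data = top_cars, aes(label = car), nudge_y = 1, size = 3)
```

Printing `p_string` or `p_df` draws the plot. The data frame approach scales better when several points need labels, because each label's position comes from the data.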
The text can also be added as a label
library(tidyverse)

# Find the car with the highest fuel economy
most_efficient <- mtcars |>
  arrange(desc(mpg)) |>
  slice(1)

plot <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Use the one-row data frame so the label is drawn only once
  geom_label(
    data = most_efficient,
    aes(label = paste(rownames(most_efficient), "is the most efficient car")),
    size = 4,
    alpha = 0.8,
    hjust = -0.1
  ) +
  ggtitle("Lighter cars have better fuel economy") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per Gallon")
Note
The annotate() function is useful for adding small annotations (such as text labels).
Go to this link for more information.
mtcars |>
  ggplot(aes(x = wt, y = mpg)) +
  # Large orange point drawn behind the data to highlight one observation
  annotate("point", x = 2.2, y = 32.45, color = "orange", size = 10) +
  geom_point() +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  annotate("segment", xend = 2.25, yend = 32.5, x = 3, y = 32.5, color = "orange",
           arrow = arrow(length = unit(3, "mm")), linewidth = 2.5) +
  annotate("rect", xmin = 3.1, xmax = 4, ymin = 31, ymax = 34, fill = "blue") +
  annotate("text", x = 3.55, y = 32.5, label = "Here is some text", color = "white", size = 5)
stock <- read_csv("data/big-tech-stock-price.csv") |>
  mutate(date = ymd(date)) |>
  filter(stock_symbol == "AAPL",
         year(date) > 2016)
ggplot(stock) +
  geom_line(aes(x = date, y = close)) +
  geom_vline(xintercept = as.numeric(as.Date("2020-01-01")), linetype = "dashed",
             color = "red", alpha = 0.4) +
  annotate("segment", x = as.Date("2018-01-01"), xend = as.Date("2019-12-01"),
           y = 140, yend = 100, color = "red") +
  annotate("label", x = as.Date("2018-01-01"), y = 150,
           label = "COVID-19 Pandemic", color = "white", fill = "red") +
  labs(x = "Date", y = "Closing price",
       title = "Apple Inc stock price during the pandemic") +
  theme_bw() +
  theme(
    aspect.ratio = 0.5,
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )
Your turn
Copy the code provided.
Run each line to see what it does - check you understand.
Try it yourself: Add a vertical dashed line, label and segment for the other important events.
The Apple spring 2020 event on 2020-03-18
The M1 Chip announcement on 2020-11-10
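When there are several events to mark, one option is to store them in a small data frame and let ggplot draw every line and label in one go. The sketch below is only an illustration of that idea. The price series is a simulated placeholder (an assumption, so the code runs without the CSV file), and `prices`, `events` and `p_events` are hypothetical names. The event dates are the ones listed above.

```r
library(ggplot2)

# Placeholder series: simulated values, NOT real Apple prices
set.seed(1)
dates <- seq(as.Date("2019-06-01"), as.Date("2021-06-01"), by = "week")
prices <- data.frame(date = dates, close = 50 + cumsum(rnorm(length(dates))))

# One row per event, using the dates from the exercise
events <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-03-18", "2020-11-10")),
  label = c("COVID-19 Pandemic", "Apple spring event", "M1 Chip announcement")
)
# Put every label at the top of the plotted range
events$y <- max(prices$close)

p_events <- ggplot(prices, aes(x = date, y = close)) +
  geom_line() +
  # One dashed line per row of the events data frame
  geom_vline(data = events, aes(xintercept = date),
             linetype = "dashed", colour = "red", alpha = 0.4) +
  # One label per event, left-aligned at the event date
  geom_label(data = events, aes(y = y, label = label), hjust = 0, size = 3) +
  labs(x = "Date", y = "Closing price") +
  theme_bw()
```

With the real data, `prices` would be replaced by the `stock` data frame. Hand-placed `annotate()` calls still give finer control when labels overlap.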
Question:
Background: Lucy was a math major in college and got top marks on all her exams in probability and statistics. Which do you think is more likely:
That Lucy is a portrait artist? or
That Lucy is a portrait artist who plays poker.
if(!require(waffle))
  remotes::install_github("hrbrmstr/waffle")
library(waffle)
library(ggthemes)  # provides scale_fill_colorblind()

# One row per category; the counts add up to 20 * 20 = 400 squares
people_data <- data.frame(
  Person = c("Maths", "Math Artists", "Math Artists that Play Poker"),
  count = c(20*20 - 10, 7, 3))

ggplot(people_data) +
  geom_waffle(aes(fill = Person, values = count),
              n_rows = 20, size = 0.5, colour = "white"
  ) +
  scale_fill_colorblind() +
  coord_fixed() +
  # annotate() draws the label once instead of once per data row
  annotate("label", x = 30, y = 18,
           label = "The number of maths majors \n who are just artists \n is greater than the number \n that are artists and play poker!",
           color = "black", size = 4) +
  theme_minimal() +
  labs(title = "One Million Majors in Maths") +
  theme(axis.text = element_blank(),
        legend.position = "bottom",
        plot.title = element_text(size = 20),
        legend.text = element_text(size = 10)) +
  xlim(c(0,40))
Explaining Frequencies
When the problem is presented as before, people get the answer wrong 80% of the time.
However, if the same question is asked in terms of frequencies, this is reversed.
For example: Estimate out of 100 math majors how many are:
Read more about the Conjunctive Fallacy and Linda Problem here.
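The frequency view can be checked with simple arithmetic. The sketch below reuses the counts from the waffle example above (400 maths majors, 7 who are artists only, 3 who are artists and play poker). Those counts are illustrative values from the plot, not survey data, and the variable names are hypothetical.

```r
# Counts from the waffle example: 400 maths majors in total
n_total            <- 20 * 20
n_artist_only      <- 7
n_artist_and_poker <- 3

# Every artist who plays poker is also an artist
n_artist <- n_artist_only + n_artist_and_poker

p_artist           <- n_artist / n_total
p_artist_and_poker <- n_artist_and_poker / n_total

# The conjunction can never be more likely than either of its parts
conjunction_holds <- p_artist_and_poker <= p_artist
conjunction_holds
```

Counting squares makes it clear that the poker-playing artists are a subset of the artists, which is the step people tend to miss when the question is posed as a probability.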
Important
People often prefer certainty and struggle with probabilistic reasoning, preferring definitive answers instead of ranges or likelihoods.
People commonly overestimate rare events (e.g., plane crashes) and underestimate common ones (e.g., car accidents).
How information is presented (e.g., “10% failure” vs. “90% success”) influences interpretation and decision-making.
Understanding probabilities requires numerical literacy, which varies widely among the public.
Probabilities and uncertainties are often abstract when given as numbers alone.
Note
Like a pie chart, you see the data as part of a whole
Like a bar chart, you are also able to effectively compare the size of categories
Key difference: Bigger parts are broken down into the number of individual components instead of being shown in a solid colour.
This blogpost is a great resource for more details and examples.
You can also look at the Waffle plots on data-to-viz.
Uncertainty
Waffle plots are great for visualising uncertainty
For example, a 1 in 10 chance can be easily shown
This is one of the reasons they are so commonly used in infographics
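As a rough sketch of the 1 in 10 idea, the grid below colours 10 squares out of 100. To keep the example free of extra packages it uses plain ggplot2 with `geom_tile()` rather than `geom_waffle()`. The colours, category names and object names are assumptions made for illustration.

```r
library(ggplot2)

# A 10 x 10 grid: each square stands for 1 case out of 100
grid <- expand.grid(x = 1:10, y = 1:10)

# Mark the first 10 squares (the bottom row) as the event
grid$outcome <- ifelse(seq_len(nrow(grid)) <= 10,
                       "Event happens", "Event does not happen")

p_chance <- ggplot(grid, aes(x = x, y = y, fill = outcome)) +
  # White borders separate the individual squares
  geom_tile(colour = "white", linewidth = 1) +
  coord_fixed() +
  scale_fill_manual(values = c("Event happens" = "orange",
                               "Event does not happen" = "grey80")) +
  labs(title = "A 1 in 10 chance", fill = NULL) +
  theme_void() +
  theme(legend.position = "bottom")
```

Because every square is one countable case, the reader sees 10 out of 100 directly instead of having to interpret "10%".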
Waffles with Pictures
You can also make waffle plots using picture icons in R using geom_pictogram
However, it requires installing additional icons, which is tricky.
This is quite advanced for this unit.
Your turn
Try to recreate my waffle edits
Change the number of rows in the waffle to 10
Change the waffle background colour to “gray90”
Make the waffle size 1
Add a rectangle to show the math majors who are artists
Edit the label text
Visualisation from WWII by statistician Abraham Wald Source
This visualisation shows bullet holes
The pattern of damage shows locations where planes can sustain damage and still return home.
The missing areas show where the plane should be reinforced
Important
Understanding what data is not there and why is very important
Missing data or incomplete data can lead to a wrong conclusion
You should always consider the missing data
Visualising what is missing is important
For small datasets you can visualise missing data using the R package naniar.
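A minimal naniar sketch is shown below. It uses the built-in `airquality` dataset, which is an assumption chosen because it ships with R and contains missing values, not a dataset from this lecture. The object names are hypothetical.

```r
library(ggplot2)
library(naniar)

# Count and percentage of missing values in each variable
miss_summary <- miss_var_summary(airquality)
miss_summary

# Overview plot: one column per variable, one row per observation
p_miss <- vis_miss(airquality)

# Scatterplot that keeps rows with a missing value visible,
# placing them just below the observed range
p_scatter <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point()
```

`vis_miss()` gives a quick overall picture, while `geom_miss_point()` helps check whether missingness in one variable lines up with the values of another.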
Type 1: Missing completely at random (MCAR)
The cause of missingness is unrelated to both the independent variables and the dependent variables.
Example: A student’s car breaks down and they miss their exam.
Reason: The missingness (the student missing the exam) is due to an unpredictable, unrelated external event (a car breakdown). It is not related to any of the independent variables (like the student’s academic history) or the dependent variable (their potential exam score).
This is the easiest type to deal with: You can ignore the missing values or interpolate them.
Type 2: Missing at random (MAR)
The missingness can be explained by a variable in the dataset.
However, the missingness is not related to the dependent variables.
Example: Students in a group all catch COVID and miss the exam.
Reason: The missingness (students missing the exam) is related to an observed variable (belonging to a specific group). However, it is not directly related to the unobserved variable (their exam scores).
Here we are assuming the groups are not related to academic performance.
Depending on the data, this may require more sophisticated techniques to deal with.
Type 3: Missing not at random (MNAR)
This missingness should not be ignored
The cause of the missing data is related to the underlying variables.
Example: Students who fail the assignments are more likely to skip the exam.
Reason: The missingness (students who miss the exam) is directly related to the value of the missing data (their exam scores).
The exam scores are more likely to be missing because of the failed assignment grades.
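The three types can be compared with a small simulation. Everything in this sketch is an assumption made for illustration: the simulated exam scores, the two groups, and the missingness probabilities are invented, and the group is deliberately generated independently of the exam score.

```r
set.seed(2250)
n <- 1000

# Simulated students: group membership is unrelated to exam score
students <- data.frame(
  group = sample(c("A", "B"), n, replace = TRUE),
  exam  = rnorm(n, mean = 65, sd = 12)
)

# MCAR: every student has the same 10% chance of a missing score
miss_mcar <- runif(n) < 0.10

# MAR: missingness depends only on the observed group
miss_mar <- runif(n) < ifelse(students$group == "A", 0.25, 0.05)

# MNAR: missingness depends on the exam score itself
miss_mnar <- runif(n) < ifelse(students$exam < 50, 0.50, 0.05)

# Mean exam score among the students whose score is observed
mean_observed <- c(
  full = mean(students$exam),
  MCAR = mean(students$exam[!miss_mcar]),
  MAR  = mean(students$exam[!miss_mar]),
  MNAR = mean(students$exam[!miss_mnar])
)
round(mean_observed, 1)
```

Under MCAR and MAR (with a group unrelated to performance) the observed mean should stay close to the full mean, while under MNAR it is pushed upwards because low scores are the ones most likely to be missing.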
Your turn
What type of missingness is each of the following:
In a tobacco study:
Younger participants report their values less often (regardless of how much they smoke).
A survey participant unintentionally skips a question.
Participants who smoke intentionally withhold details about their smoking habits.
Summary
Learnt about infographics
Know how to combine text and other narrative elements to improve the communication of the key messages
Also learnt people aren’t great at understanding probabilities and uncertainty
Uncertainty is a challenge for visualisation and communication
Waffle plots are great at communicating chance
Discussed the importance of visualising missing data
ggplot(stock, aes(x = date, y = close)) +
  geom_line() +
  geom_vline(xintercept = as.numeric(as.Date("2020-01-01")), linetype = 2, color = "red", alpha = 0.4) +
  geom_vline(xintercept = as.numeric(as.Date("2020-03-18")), linetype = 2, color = "blue", alpha = 0.4) +
  geom_vline(xintercept = as.numeric(as.Date("2020-11-10")), linetype = 2, color = "blue", alpha = 0.4) +
  # spring
  annotate("rect", xmin = as.Date("2021-02-01"), xmax = as.Date("2022-12-01"), ymin = 20, ymax = 38, fill = "blue") +
  annotate("segment", x = as.Date("2020-03-30"), xend = as.Date("2021-03-01"), y = 40, yend = 30, color = "blue") +
  annotate("text", x = as.Date("2022-01-01"), y = 30, label = "Apple spring 2020 event", color = "white") +
  # m1
  annotate("rect", xmin = as.Date("2021-07-01"), xmax = as.Date("2023-01-01"), ymin = 70, ymax = 88, fill = "blue") +
  annotate("segment", x = as.Date("2020-11-30"), xend = as.Date("2021-08-01"), y = 90, yend = 80, color = "blue") +
  annotate("text", x = as.Date("2022-04-01"), y = 80, label = "M1 announcement", color = "white") +
  # covid
  annotate("rect", xmin = as.Date("2017-03-01"), xmax = as.Date("2018-11-01"), ymin = 140, ymax = 158, fill = "red") +
  annotate("segment", x = as.Date("2018-01-01"), xend = as.Date("2019-12-01"), y = 140, yend = 100, color = "red") +
  annotate("text", x = as.Date("2018-01-01"), y = 150, label = "COVID-19 Pandemic", color = "white") +
  labs(x = "Date", y = "Closing price USD", title = "Apple Inc stock price during the pandemic") +
  theme_bw() +
  theme(aspect.ratio = 0.5,
        plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )
library(waffle)
library(ggthemes)

people_data <- data.frame(
  Person = c("Maths", "Math Artists", "Math Artists that Play Poker"),
  count = c(20*20 - 10, 7, 3))

ggplot(people_data) +
  geom_waffle(aes(fill = Person, values = count),
              n_rows = 10, size = 1, colour = "gray90"
  ) +
  scale_fill_colorblind() +
  coord_fixed() +
  # Outline the last column, which holds all 10 maths majors who are artists
  annotate("rect", xmin = 39.5, xmax = 40.5, ymin = 0.5, ymax = 10.5,
           colour = "black", fill = NA) +
  # annotate() draws the label once instead of once per data row
  annotate("label", x = 50, y = 7,
           label = "The maths majors \n who are artists and play poker \n are also just \n maths majors who are artists",
           size = 4) +
  theme_void() +
  labs(title = "All the Maths Majors") +
  theme(axis.text = element_blank(),
        legend.position = "bottom",
        plot.title = element_text(size = 20),
        legend.text = element_text(size = 10)) +
  xlim(c(0,60))
Note
(MAR) In a tobacco study, younger participants report their values less often (regardless of how much they smoke).
(MCAR) A survey participant unintentionally skips a question.
(MNAR) In a tobacco study, participants who smoke intentionally withhold details about their smoking habits.
The other missingness examples can be found here
ETX2250/ETF5922