ETX2250/ETF5922

Visualisations in R: Part 2

Lecturer: Kate Saunders

Department of Econometrics and Business Statistics


  • etx2250-etf5922.caulfield-x@monash.edu
  • Lecture 6
  • dvac.ss.numbat.space


What we’ve covered

We’ve learnt …

  • We can map variables to different geometries (bar charts, line plots, scatter plots etc), and

  • We can map variables to different pre-attentive attributes (colour, shape, size, etc)

  • We also know small multiples can be highly effective for showing the data when there is too much information for a single plot

  • Breaking a plot into panels by category can help!

Today’s Lecture

Learning Objectives

  • Grouping
    • group aesthetic mapping
  • Small multiples in R, a.k.a. facetting
    • facet_wrap()
    • facet_grid()
  • Combining plots together
    • patchwork package

Groups

Example Data

library(tidyverse)
energy = read_csv("data/energydata.csv")
weather = read_csv("data/weather.csv")
energy_weather = full_join(energy, weather, by = c("Date", "State"))
head(energy_weather)
# A tibble: 6 × 9
  Date       Day   State Price Demand NetExport MaxTemp WindDir WindSpeed
  <date>     <chr> <chr> <dbl>  <dbl>     <dbl>   <dbl> <chr>       <dbl>
1 2018-07-15 Sun   NSW    51.7  7564.   -1231.     18   NW              2
2 2018-07-16 Mon   NSW    87.9  8966.     -18.6    18.5 W               7
3 2018-07-17 Tue   NSW    62.8  8050.    -643.     22.5 WNW             9
4 2018-07-18 Wed   NSW    54.5  7840.    -742.     20.8 SSW             1
5 2018-07-19 Thu   NSW    64.2  8168.     -40.6    20.8 NNW             1
6 2018-07-20 Fri   NSW    60.9  8254.     318.     15.5 W              13
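full_join() keeps every row from both tables and fills in NA wherever one side has no matching Date and State. That is one plausible source of the missing Day values that later slides filter out with !is.na(Day). A sketch on two tiny made-up tibbles (energy_toy and weather_toy are hypothetical, not the lecture data), comparing it with inner_join(), which keeps only the matching rows:

```r
library(dplyr)

# Hypothetical toy tables: energy_toy covers 15-16 July, weather_toy 16-17 July
energy_toy <- tibble(
  Date   = as.Date(c("2018-07-15", "2018-07-16")),
  State  = "NSW",
  Demand = c(100, 200)
)
weather_toy <- tibble(
  Date    = as.Date(c("2018-07-16", "2018-07-17")),
  State   = "NSW",
  MaxTemp = c(18, 20)
)

# full_join() keeps all three dates; columns with no match are filled with NA
full <- full_join(energy_toy, weather_toy, by = c("Date", "State"))

# inner_join() keeps only the date present in both tables (16 July)
inner <- inner_join(energy_toy, weather_toy, by = c("Date", "State"))

full
inner
```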

Let’s plot it

What is the relationship between Temperature and Demand?

ggplot(data = energy_weather,
       aes(x = MaxTemp, y = Demand)) + 
  geom_point() + 
  geom_smooth()

Is something weird going on?

  • geom_smooth() is meant to show a smoothed trend line
  • But this trend isn’t what we would expect for the relationship between temperature and demand

Latent Variable

Found it

The State variable wasn’t visualised! Show it using col

But we need a trend for each State

Code for previous plot

ggplot(data = energy_weather,
       aes(x = MaxTemp, y = Demand)) + 
  geom_point(aes(col = State)) + 
  geom_smooth()

Enter groups

ggplot(data = energy_weather, aes(x = MaxTemp, y = Demand)) + 
  geom_point(aes(col = State)) + 
  geom_smooth(aes(group = State))
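The group aesthetic tells a layer which rows belong together, so geom_smooth() fits one curve per group. Mapping a discrete variable to colour inside a layer also creates groups. A sketch using ggplot2’s built-in mpg data as a stand-in (displ for MaxTemp, hwy for Demand, drv for State), with method = "lm" chosen only to keep it quick; layer_data() shows how many groups the smoothing layer ends up with:

```r
library(ggplot2)

# Built-in mpg data as a stand-in: displ ~ MaxTemp, hwy ~ Demand, drv ~ State
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(col = drv))

# col is only mapped in geom_point(), so the smooth layer has no groups: one line
p_one <- scatter + geom_smooth(method = "lm", formula = y ~ x)

# group = drv in geom_smooth(): one line per drv, all drawn in the same colour
p_grouped <- scatter +
  geom_smooth(aes(group = drv), method = "lm", formula = y ~ x)

# A discrete colour mapping in the layer also defines groups: coloured lines
p_coloured <- scatter +
  geom_smooth(aes(col = drv), method = "lm", formula = y ~ x)

# layer_data(plot, 2) returns the computed data of the second (smooth) layer
length(unique(layer_data(p_one, 2)$group))       # 1
length(unique(layer_data(p_grouped, 2)$group))   # 3
length(unique(layer_data(p_coloured, 2)$group))  # 3
```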

Even Better

Your turn


  • Join the electricity and weather data together

  • Recreate the plot on the previous slide with coloured trend lines

  • Think about which aesthetic mappings go in the top ggplot() call and which in the geom layers

Take a moment

Reflect

  • Does this plot show our data well? Could we do better?

Some notes on geom_smooth()

  • geom_smooth() can imply a trend or relationship that isn’t really there

  • Only use it if smoothing is appropriate for your data!
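With the default method = NULL, geom_smooth() chooses its smoother from the size of the largest group: loess for fewer than 1,000 observations and mgcv::gam() otherwise, and it prints a message saying which it picked. Naming the method yourself makes the modelling assumption explicit. A short sketch comparing a straight-line fit with a loess fit on ggplot2’s built-in mpg data (not the lecture data):

```r
library(ggplot2)

# Straight-line fit: assumes the relationship is roughly linear
p_lm <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Local (loess) fit: flexible, but can chase noise where data are sparse;
# a smaller span gives a wigglier curve
p_loess <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75)

# se = TRUE stores the confidence band as ymin/ymax in the smooth layer's data
head(layer_data(p_lm, 2)[, c("x", "y", "ymin", "ymax")])
```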

Facetting


Back to Tufte’s Principles

  • Sometimes we cannot display everything on a single plot

  • Small multiples display a lot of information efficiently by repeating the same plot across several panels

In R

  • facet_wrap: wraps a one-dimensional sequence of panels, usually defined by one variable, into a two-dimensional layout

  • facet_grid: forms a matrix of panels defined by row and column variables.
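Both functions take the faceting variables wrapped in vars(). A minimal sketch on ggplot2’s built-in mpg data, used here because the energy data file isn’t bundled with the code: facet_wrap() makes one panel per class, while facet_grid() draws every drv × cyl combination (3 × 4 = 12 panels), including combinations with no data. layer_data() reports which panel each point lands in:

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# facet_wrap(): one panel per class, wrapped into rows and columns
p_wrap <- base_plot + facet_wrap(vars(class))

# facet_grid(): a matrix of panels, rows defined by drv and columns by cyl
p_grid <- base_plot + facet_grid(rows = vars(drv), cols = vars(cyl))

# PANEL in layer_data() records which panel each point is drawn in
length(unique(layer_data(p_wrap)$PANEL))  # 7: one per class
length(unique(layer_data(p_grid)$PANEL))  # 9: the drv x cyl cells that contain data
```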

Let’s see how this works

Groups or facets


Comparisons

  • facet_wrap is useful to look at trends/patterns in each category

  • The group aesthetic is useful for comparing trends/patterns across categories in a single plot

Example use:

energy_weather |> 
  ggplot(aes(x = MaxTemp, y = Demand)) +
  geom_point(alpha = 0.4) + 
  geom_smooth() + 
  facet_wrap(vars(State))

facet_wrap

Catch



The previous example isn’t a good use of data density: there’s lots of empty space.

We aren’t showing our data well either: the trends by state aren’t clear.

Fixed or Free Scales


Tip

  • We could let the scales differ, or be “free”

  • But… humans compare things well on a common scale

  • And… we can accidentally create misleading plots if we use different scales

It’s a balancing act!

Free Scales in facet_wrap

This is definitely better

Code

energy_weather |> 
  ggplot(aes(x = MaxTemp, y = Demand)) +
  geom_point(alpha = 0.4) + 
  geom_smooth() + 
  facet_wrap(vars(State), scales = "free") 
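A middle ground is to free only one axis. Here is a sketch using the built-in mpg data (displ and hwy stand in for MaxTemp and Demand, drv for State): with scales = "free_y" each panel gets its own y range while the x axis stays common, so positions along x remain comparable across panels. layer_scales() is used only to check which scales are shared:

```r
library(ggplot2)

# Free y scales only: each panel gets its own hwy range,
# but all panels share one displ axis
p_free_y <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  facet_wrap(vars(drv), scales = "free_y")

# layer_scales(plot, i, j) returns the x and y scales of the panel in row i, column j
left  <- layer_scales(p_free_y, 1, 1)
right <- layer_scales(p_free_y, 1, 2)

left$y$range$range   # hwy range in the first panel
right$y$range$range  # a different hwy range in the second panel
identical(left$x$range$range, right$x$range$range)  # shared x axis
```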

Your turn


Caution

  • Filter to the State of Victoria (VIC)

  • Plot the MaxTemp against Demand

  • But wrap by the Day of the week

What I want

Tricky!

Warning

You may find your plot looks different to mine.

  • The facet order is different

  • There is missing data

  • And a different number of rows

Generative AI can be really useful here to help you understand how to change each of these parts.
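Some of these differences come from how facet_wrap() orders and lays out panels. Panels follow the levels of the faceting variable, so a character column such as Day is ordered alphabetically unless you convert it to a factor with the levels in the order you want; ncol or nrow set the layout. Missing values in Day show up as an extra NA panel, which filtering with !is.na(Day) removes. A minimal sketch on a small made-up data frame (toy is hypothetical):

```r
library(ggplot2)

# Hypothetical toy data with the days out of order
toy <- data.frame(
  Day     = c("Sun", "Mon", "Fri", "Sun", "Mon", "Fri"),
  MaxTemp = c(15, 18, 20, 16, 17, 21),
  Demand  = c(5, 7, 6, 5, 8, 6)
)

# A character column is faceted in alphabetical order: "Fri" "Mon" "Sun"
sort(unique(toy$Day))

# Factor levels set the panel order instead
day_order <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
toy$Day <- factor(toy$Day, levels = day_order)

# ncol (or nrow) sets the wrapped layout; unused levels (Tue, ...) are dropped
p <- ggplot(toy, aes(x = MaxTemp, y = Demand)) +
  geom_point() +
  facet_wrap(vars(Day), ncol = 2)

levels(droplevels(toy$Day))  # "Mon" "Fri" "Sun"
```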

facet_grid


Sometimes you want to display facets using two categorical variables, e.g. State and Day.

Note: in R, ~ builds a formula relating variables. In facet_grid(), Day ~ State means rows ~ columns: one row of panels per Day and one column per State.
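facet_grid() accepts either the formula form or the rows/cols arguments with vars(). A sketch using the built-in mpg data as a stand-in for energy_weather, checking that both forms put each point in the same panel and showing the . placeholder for "no variable on this side":

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Formula interface: rows ~ columns
p_formula <- base_plot + facet_grid(drv ~ cyl)

# Equivalent vars() interface with named rows and cols
p_vars <- base_plot + facet_grid(rows = vars(drv), cols = vars(cyl))

# A dot means "no variable" on that side: one row of panels, one per cyl
p_cols_only <- base_plot + facet_grid(. ~ cyl)

# Both specifications assign every point to the same panel
identical(layer_data(p_formula)$PANEL, layer_data(p_vars)$PANEL)
```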

Example use:

energy_weather |> 
  filter(!is.na(Day)) |>
  ggplot(aes(x = MaxTemp, y = Demand)) +
  geom_point(alpha = 0.25) + 
  geom_smooth() +
  facet_grid(Day ~ State)

Demand by Day and State

Bit Much


But

We expect demand to differ depending on whether people are at home or at work

Let’s try another way to visualise this

Weekday vs Weekend

Let’s make a new variable for weekday or weekend.

Here I’m using if_else.

energy_weather |> 
  filter(!is.na(Day)) |>
  mutate(
    Day_type = if_else(Day %in% c("Sat", "Sun"),
                       "Weekend", "Weekday")
  ) 
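The filter(!is.na(Day)) step matters here: in R, NA %in% c("Sat", "Sun") is FALSE, so without the filter any row with a missing Day would quietly be labelled "Weekday". A sketch on a toy vector (days is made up) showing the problem and one way to keep missing days missing with case_when():

```r
library(dplyr)

# Toy vector of day labels, including a missing value
days <- c("Mon", "Sat", "Sun", "Wed", NA)

# %in% gives FALSE (not NA) for a missing value, so NA becomes "Weekday"
naive <- if_else(days %in% c("Sat", "Sun"), "Weekend", "Weekday")

# case_when() checks conditions in order, so missing days stay missing
safe <- case_when(
  is.na(days)               ~ NA_character_,
  days %in% c("Sat", "Sun") ~ "Weekend",
  TRUE                      ~ "Weekday"
)

naive  # "Weekday" "Weekend" "Weekend" "Weekday" "Weekday"
safe   # "Weekday" "Weekend" "Weekend" "Weekday" NA
```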

Weekday or Weekend Demand by State

Your turn


Caution

  • Run my code to create the new variable for weekday and weekend

  • Plot the MaxTemp against Demand

  • Create a grid that shows Day_type and State

Best Version


Why can’t we have both?

Using group and facet_wrap together seems to show:

  • the weekday/weekend relationship best, and

  • the differences between states

Subtle point

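scales = "free" allows different scales for different facets.

This doesn’t always mean each panel gets its own x and y scale. In facet_grid(), the x scale can only vary across columns and the y scale across rows, so panels in the same row share a y axis and panels in the same column share an x axis.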
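A sketch on the built-in mpg data with drv as rows and year as columns, using layer_scales() to pick out the scales of a single panel by row and column. By contrast, facet_wrap(..., scales = "free") does give every panel its own x and y scales:

```r
library(ggplot2)

# facet_grid() with scales = "free": rows = drv (3 levels), cols = year (1999, 2008)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  facet_grid(rows = vars(drv), cols = vars(year), scales = "free")

# Panels in the same row share a y scale ...
same_row_y <- identical(layer_scales(p, 1, 1)$y$range$range,
                        layer_scales(p, 1, 2)$y$range$range)
# ... and panels in the same column share an x scale
same_col_x <- identical(layer_scales(p, 1, 1)$x$range$range,
                        layer_scales(p, 2, 1)$x$range$range)
# Different rows can still have different y scales
diff_rows_y <- !identical(layer_scales(p, 1, 1)$y$range$range,
                          layer_scales(p, 2, 1)$y$range$range)

c(same_row_y, same_col_x, diff_rows_y)
```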

Higher Dimensions

Pairs plot


  • A pairs plot gives a matrix of plots showing each pair of variables.

  • This can be implemented using the ggpairs function in the GGally package.

  • You will get different plots depending on the type of variables in your data set

  • Look at the examples here

Example use:

library(GGally)
ggpairs(energy_weather)
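Passing a whole data frame to ggpairs() gives one row and one column of panels per variable, so a wide data set quickly gets crowded; the columns argument selects which variables to include. A sketch on ggplot2’s built-in mpg data, standing in for energy_weather:

```r
library(ggplot2)
library(GGally)

# mpg stands in for energy_weather; columns picks which variables to pair up
p <- ggpairs(mpg, columns = c("displ", "hwy", "drv"))

# Continuous pairs get scatter plots and correlations; pairs involving the
# discrete drv column get plots suited to categories (e.g. box plots, bar charts)
p
```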

Combining Plots

Patchwork


Sometimes we need plot panels that don’t share a common variable or geometry

For this we can use patchwork

Look at the patchwork documentation for more customisable layouts

Here are some basic ones

Some example plots

tas_data <- energy_weather |>
  filter(State == "TAS")

temp_vs_demand_plot <- ggplot(tas_data) + 
  geom_point(aes(x = MaxTemp, y = Demand)) 

demand_distrib_plot <- ggplot(tas_data) +
  geom_density(aes(y = Demand)) 

weather_distrib_plot <- ggplot(tas_data) +
  geom_density(aes(x = MaxTemp))

Example - side by side


library(patchwork)
temp_vs_demand_plot + demand_distrib_plot +
  plot_layout(widths = c(3, 1))

Example - top and bottom


temp_vs_demand_plot / weather_distrib_plot + 
  plot_layout(heights = c(3, 1))
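patchwork also provides | for placing plots side by side and / for stacking them, which can be nested with parentheses, and plot_annotation() for titles and panel tags. A sketch with three throwaway plots of the built-in mtcars data (the plot names are just for illustration):

```r
library(ggplot2)
library(patchwork)

# Three simple plots from the built-in mtcars data
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p_hist    <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
p_box     <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# | places plots side by side, / stacks them; parentheses control the nesting
combined <- (p_scatter | p_box) / p_hist +
  plot_layout(heights = c(2, 1)) +       # top row twice as tall as the bottom
  plot_annotation(tag_levels = "A")      # tag the panels A, B, C

combined
```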

Wrap Up

Summary

What we’ve covered

Today has been all about how to create panels of plots

  • We’ve learnt how to use the group aesthetic

  • We learnt how to create small multiples using facets

    • facet_wrap()
    • facet_grid()
  • We learnt how to combine multiple plots together with patchwork

Key Message

There are lots of ways to visualise data! Consider which comparisons are important.

Solutions

Answers

ggplot(data = energy_weather, 
       aes(x = MaxTemp, y = Demand, col = State, group = State)) + 
  geom_point() + 
  geom_smooth() 

Answers

energy_weather |> 
  filter(State == "VIC") |>
  filter(!is.na(Day)) |>
  mutate(Day = factor(Day, 
                      levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))) |>
  ggplot(aes(x = MaxTemp, y = Demand)) +
  geom_point(alpha = 0.25) + 
  facet_wrap(vars(Day), ncol = 2)

Answers

energy_weather |> 
  filter(!is.na(Day)) |>
  mutate(
    Day_type = if_else(Day %in% c("Sat", "Sun"),
                       "Weekend", "Weekday")
  ) |>
  ggplot(aes(x = MaxTemp, y = Demand)) +
  geom_point(alpha = 0.25) + 
  geom_smooth() +
  facet_grid(Day_type ~ State, scales = "free")

Answers

energy_weather |> 
  filter(!is.na(Day)) |>
  mutate(
    Day_type = if_else(Day %in% c("Sat", "Sun"),
                       "Weekend", "Weekday")
  ) |>
  ggplot(aes(x = MaxTemp, y = Demand, group = Day_type, col = Day_type)) +
  geom_point(alpha = 0.25) + 
  geom_smooth() +
  facet_wrap(vars(State), scales = "free")