Lecturer: Kate Saunders
Department of Econometrics and Business Statistics
Learning Objectives
Understand the importance of data visualisation
Become familiar with Power BI’s visualisation capabilities
Learn how to use Power BI’s visualisation pane
Learn best practices for effective data visualisation
What is Power BI?
It is a common tool used by business analysts.
Does the standard stuff:
What are the key advantages?
There are many options for business analysts
Data visualisation tools (in no particular order):
And many more …
We can’t teach them all, so we aim to give you transferable foundations.
Power BI
Power BI is ideal for day-to-day business reporting.
Pros:
Cons:
R
R is great for data scientists and analysts who need in-depth control over visualisations.
Pros:
Cons:
Important
If you master data visualisation in both Power BI and R, you’ll have a diverse skillset.
We expose you to both tools so you can see the full spectrum of data visualisation tools used in business analytics.
We know R is harder, but it is important to challenge yourselves!
Industry Applications of Power BI
There are many, many examples!
Tourism Australia Which country has the most flights each week into Melbourne Airport? NZ
Vic Health What was the most common infection in the Local Government area of Monash? Influenza
Recycling Victoria How many tonnes of paper were collected from Monash City Council in 2022? 52% 5860t
Notice these examples show mostly simple visualisations (e.g. map, line plot, bar charts) linked with tables. This is where Power BI excels.
Your turn
Take some time to explore these dashboards
Consider the ease of use and functionality
Form an opinion on whether these are good or bad plots
Try answering my questions from the previous slide
Please see the Moodle for a guide about how to install Power BI on your laptop! (You should have completed this before class in your own time.)
Walkthrough:
Introduce key components of the Power BI interface:
When you are happy with the plot you’ve chosen for your data it is time to polish the plot.
Steps
Choose an appropriate colour scale
Align your visual elements in your plot for clarity
Check the final plot conveys the message you intend
You may like to add text or colour to draw the eye to the parts of the plot you want to highlight
Your turn
Import the same data as me!
Try recreating the same plot in Power BI
Visualisation Pane
Helps users to create visualisations
Default visuals include: bar charts, line charts, tables, maps, pie charts, treemaps etc.
Users drag and drop fields into the “Values”, “Axis”, “Legend”, etc., depending on the type of visualisation chosen.
Users can also import a custom visual from a file or the marketplace by clicking on the “…” icon.
R and Python scripts can be integrated to create custom visuals and perform advanced data analytics.
About
This plot works best with variables of the type numeric and categorical.
Simple and easy to understand.
Each level of the categorical variable is represented as a bar.
The length of the bar represents its numeric value.
Ordering bars and providing clear annotation are often necessary.
These plots are commonly used to compare different categories such as sales performance across regions.
Particularly useful when there is a limited number of levels to compare.
Common Mistakes
Don’t confuse bar charts with histograms (we cover histograms later in the course).
Do you have long axis labels? Consider a horizontal version.
Do not overload the plot with too many levels!
Not sorting bars in a meaningful way
Use of 3D effects on charts to make them visually appealing.
Not including data labels, axis labels or a clear legend to explain the chart.
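The points about ordering bars and limiting the number of levels can be sketched in a few lines of Python. This is only an illustration of the data preparation, not something Power BI requires: the sales figures, the `prepare_bars` helper and the cut-off of four bars are all assumptions made up for this example.

```python
# Hypothetical sales totals per region (illustrative values only).
sales_by_region = {
    "North": 120,
    "South": 340,
    "East": 95,
    "West": 210,
    "Central": 40,
    "Islands": 15,
}


def prepare_bars(values, max_levels=4, other_label="Other"):
    """Sort categories by value (largest first) and lump the tail into one bar.

    `max_levels` is an arbitrary cut-off chosen for this sketch.
    """
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) <= max_levels:
        return ordered
    head = ordered[: max_levels - 1]
    tail_total = sum(value for _, value in ordered[max_levels - 1:])
    # The lumped bar is kept last, even if it is larger than some named bars.
    return head + [(other_label, tail_total)]


bars = prepare_bars(sales_by_region)
for label, value in bars:
    print(f"{label:<8}{value:>5}")
```

Sorting by value makes the ranking readable at a glance, and lumping the smallest levels keeps the total unchanged while reducing clutter.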
About
This plot works best with variables of the type categorical.
Simple to understand at a glance.
The circle is divided into slices that represent a category’s proportion of the whole.
Often used to show proportions, where the sections sum to one.
For example, how different products contribute to total sales.
It is most effective when used with a small number of categories.
Common Mistakes
Use sparingly.
Do not use 3D effects.
Do not use a legend; annotate each slice directly.
Make sure proportions add up to one.
Do not include too many slices.
Do not include slices that are very close in size.
Sometimes labelling the proportions is helpful.
Do not use similar or distracting colours for slices.
Displaying the slices in a random or alphabetical order.
Do not use several pie charts one beside each other to compare them.
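Several of these checks can be automated before a pie chart is drawn. The sketch below uses made-up product sales, and both the `pie_slices` helper and the limit of five slices are assumptions for illustration, not a fixed rule.

```python
# Hypothetical sales by product (illustrative values only).
sales_by_product = {
    "Laptops": 450,
    "Phones": 300,
    "Tablets": 150,
    "Accessories": 100,
}


def pie_slices(values, max_slices=5):
    """Return (label, proportion) pairs, largest first, for a pie chart.

    `max_slices` is an assumed threshold; pick one that suits your data.
    """
    if len(values) > max_slices:
        raise ValueError(f"{len(values)} slices is too many; consider a bar chart")
    total = sum(values.values())
    slices = [(label, value / total) for label, value in values.items()]
    # Order by size rather than randomly or alphabetically.
    return sorted(slices, key=lambda item: item[1], reverse=True)


slices = pie_slices(sales_by_product)
for label, share in slices:
    # Labels like these can be placed directly on each slice instead of a legend.
    print(f"{label}: {share:.0%}")
```

Dividing by the total guarantees that the proportions add up to one, up to floating-point rounding.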
About
This plot works best with variables of the type categorical.
It is very closely related to pie charts.
Therefore, it suffers from the same drawbacks as pie charts.
It is better to use them sparingly.
Alternatively, we can use bar plots or lollipop plots.
About
This plot works best with variables of the type numeric and categorical that have a nested structure.
Useful for visualising a large number of categories, as the plotting area is used efficiently
Also highly useful for hierarchical data
It displays data as a set of nested rectangles.
Each rectangle represents a category or subcategory within a larger data set.
The area of each rectangle represents a quantitative value, such as sales or revenue, with larger rectangles indicating larger values.
Colours can be used to represent another variable (or dimension) such as a performance metric.
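To make the idea of nested rectangles with area proportional to value concrete, here is a minimal sketch of a two-level "slice and dice" layout. It is a deliberately simple scheme and is not claimed to be the layout Power BI uses; the revenue figures and the `slice_layout` helper are hypothetical.

```python
def slice_layout(values, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to `values`.

    Returns {label: (x, y, width, height)}.
    """
    total = sum(values.values())
    rects = {}
    offset = 0.0  # share of the parent rectangle already used
    for label, value in values.items():
        share = value / total
        if vertical:  # strips placed side by side
            rects[label] = (x + offset * width, y, share * width, height)
        else:  # strips placed on top of each other
            rects[label] = (x, y + offset * height, width, share * height)
        offset += share
    return rects


# Hypothetical revenue by region (level 1) and product (level 2).
revenue = {
    "North": {"A": 30, "B": 10},
    "South": {"A": 40, "B": 20},
}

# Level 1: one rectangle per region inside a 100 x 50 plotting area.
region_totals = {region: sum(products.values()) for region, products in revenue.items()}
region_rects = slice_layout(region_totals, 0, 0, 100, 50, vertical=True)

# Level 2: split each region's rectangle among its products.
product_rects = {}
for region, products in revenue.items():
    rx, ry, rw, rh = region_rects[region]
    nested = slice_layout(products, rx, ry, rw, rh, vertical=False)
    for product, rect in nested.items():
        product_rects[(region, product)] = rect

for key, (px, py, pw, ph) in product_rects.items():
    print(key, "area =", round(pw * ph, 1))
```

Every unit of revenue receives the same amount of area, so rectangle sizes can be compared across the whole hierarchy.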
About
In business analytics, these plots are often used to represent the relative proportion of financial metrics such as sales revenue or profit across various categories such as regions, products or departments.
If we have many levels in the hierarchy (>2), it is recommended to build an interactive figure. For example, clicking on an upper level of the structure reveals the next level, and so on.
Treemaps can be cluttered and hard to interpret if there are too many categories or subcategories with very small values.
The area of the rectangle gives a visual sense of the magnitude of the proportion. However, it can be difficult to pinpoint the exact value without labels or tooltips.
Common Mistakes
Do not annotate more than three levels of the hierarchy as it makes the plot unreadable.
Prioritise the highest levels of the hierarchy, as they represent the broadest and most meaningful categories in the data.
About
Maps allow us to visualise geospatial data (it contains coordinate information such as latitude and longitude, which allows features to be drawn on a map).
Once the map is drawn, we can
colour each region (choropleth map)
add points or bubbles (bubble map)
reshape the region (cartogram)
show the connection between several regions (connection map)
Common Mistakes
Selecting the appropriate projection is important.
Indicate your source of information and the projection used.
Using colours without obvious purpose could inadvertently communicate something that is not intended, potentially leading to misinterpretation of the data.
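One way to give colour an obvious purpose in a choropleth map is to tie it to ordered classes of the variable being mapped. The sketch below assumes equal-width classes and a three-step light-to-dark palette; the region names, values, hex colours and the `choropleth_colours` helper are all invented for illustration.

```python
# Hypothetical rates per region (illustrative values only).
rate_by_region = {
    "Region A": 4.0,
    "Region B": 9.0,
    "Region C": 6.5,
    "Region D": 14.0,
}

# Sequential palette, light to dark, so darker always means "more".
PALETTE = ["#deebf7", "#9ecae1", "#3182bd"]


def choropleth_colours(values, palette):
    """Assign each region a colour from equal-width classes of its value."""
    low, high = min(values.values()), max(values.values())
    class_width = (high - low) / len(palette)
    colours = {}
    for region, value in values.items():
        if class_width == 0:
            index = 0  # all regions share one value
        else:
            # Clamp so the maximum value falls in the last class.
            index = min(int((value - low) / class_width), len(palette) - 1)
        colours[region] = palette[index]
    return colours


colours = choropleth_colours(rate_by_region, PALETTE)
for region, colour in colours.items():
    print(region, colour)
```

Because the palette is ordered, a reader can rank regions by shade alone; an unordered rainbow palette would not support that reading.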
Your turn
Try recreating some of the other plot types
What do you notice about the plots?
Which ones work best for the data and understanding the key messages?
About
Basic barplots can be extended to introduce a secondary categorical variable.
The levels in the secondary variable divide each of the levels in the primary categorical variable. e.g. 1. Country of Birth and 2. Census Year
If the length of the bar represents the frequency (or count) of the primary variable, the secondary categorical variable divides each bar’s length into sub-levels.
We can create three different types of plots:
Stacked barplots
Clustered (grouped) barplots
Percentage stacked barplots
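All three plot types start from the same two-way table of counts: one row per level of the primary variable and one column per level of the secondary variable. A minimal sketch of building that table is shown here, using invented records for Country of Birth and Census Year.

```python
from collections import Counter

# Hypothetical records: (country of birth, census year), one per resident.
records = [
    ("England", 2016), ("England", 2016), ("England", 2021),
    ("India", 2016), ("India", 2021), ("India", 2021), ("India", 2021),
    ("China", 2016), ("China", 2021),
]

counts = Counter(records)
countries = sorted({country for country, _ in records})  # primary variable
years = sorted({year for _, year in records})            # secondary variable

# table[country][year] is the number of records in that cell.
table = {c: {y: counts[(c, y)] for y in years} for c in countries}

for country, row in table.items():
    print(country, row)
```

Stacked, clustered and percentage stacked barplots are three ways of drawing the cells of this table.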
About
Each bar in a standard barplot is divided into a number of sub-bars stacked end-to-end, each corresponding to a level of the secondary categorical variable.
It allows us to see the composition of the total value for each level
For example, you may be comparing total sales across regions, but also want to know which categories make up those sales.
Use the domain knowledge or context to determine which variable will be the primary categorical variable and which will be the secondary categorical variable.
Good for comparing the total number of residents.
Be Mindful
Ordering of levels for both the primary and secondary categorical variables.
Choose appropriate colours to represent the levels of the secondary categorical variable.
As the bars are stacked, it can become difficult to compare individual segment sizes across bars.
One goal of a stacked barplot is to make relative judgements about the secondary categorical variable (making precise judgements is not as important).
If precise judgement is important, we can use clustered barplots.
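The difficulty of comparing segments across bars becomes clear once the segment positions are computed. In the sketch below (hypothetical resident counts, and a `stacked_segments` helper written for this example), only the first segment of each bar starts at zero; every later segment starts wherever the previous one ended.

```python
# Hypothetical resident counts in thousands (illustrative values only).
residents = {
    "England": {2016: 20, 2021: 18},
    "India": {2016: 12, 2021: 25},
    "China": {2016: 15, 2021: 17},
}


def stacked_segments(table):
    """Return {primary: [(secondary, start, end), ...]} for a stacked barplot."""
    layout = {}
    for primary, parts in table.items():
        start = 0
        segments = []
        for secondary, value in parts.items():
            segments.append((secondary, start, start + value))
            start += value  # the next segment begins where this one ends
        layout[primary] = segments
    return layout


segments = stacked_segments(residents)
for country, parts in segments.items():
    print(country, parts)
```

The end of the last segment is the bar total, which is why totals are easy to compare while the 2021 segments, each with a different starting point, are not.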
About
Bars are grouped by position for levels of the primary categorical variable.
The colours indicate the levels of the secondary categorical variable within each group.
It is used to look at how
the secondary categorical variable changes within each level of the primary categorical variable (within group).
the primary categorical variable changes across levels of the secondary variable (between group).
It is not suitable to compare totals across levels of individual categorical variables.
Better at comparing differences in the number of residents between census years for each country.
Be Mindful
Ordering of levels for both the primary and secondary categorical variables.
Choose appropriate colours to represent the levels of the secondary categorical variable.
If there are too many sub-categories or the categories themselves are too broad, these plots can become cluttered and difficult to interpret.
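A small calculation shows why clustered barplots get cluttered. Assuming each group is given a fixed width on the axis (0.8 units here, an arbitrary choice), every extra level of the secondary variable makes each bar thinner. The `clustered_positions` helper is hypothetical.

```python
def clustered_positions(primary_levels, secondary_levels, group_width=0.8):
    """Return {(primary, secondary): (x_centre, bar_width)} for a clustered barplot.

    Group i is centred at x = i and shares `group_width` among its bars.
    """
    bar_width = group_width / len(secondary_levels)
    positions = {}
    for i, primary in enumerate(primary_levels):
        left_edge = i - group_width / 2
        for j, secondary in enumerate(secondary_levels):
            centre = left_edge + (j + 0.5) * bar_width
            positions[(primary, secondary)] = (centre, bar_width)
    return positions


positions = clustered_positions(["England", "India"], [2016, 2021])
for key, (centre, width) in positions.items():
    print(key, round(centre, 2), round(width, 2))
```

With two census years each bar is 0.4 units wide; with five secondary levels the same call would give bars only 0.16 units wide.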
About
A variation of stacked barplots.
Each primary bar is scaled to have the same length.
It makes each sub-bar a percentage contribution to the whole at each primary level.
It allows us to perform a better analysis of the secondary groups’ relative distributions.
Be Mindful
It can be difficult to interpret small differences in the percentages between segments.
If there are too many sub categories, the plot can become visually cluttered, making it harder to distinguish between the individual segments.
Shows the percentage of residents from each census.
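The rescaling behind a percentage stacked barplot is a single division per bar. The sketch below uses hypothetical resident counts, and the `to_percentages` helper is written only for this example.

```python
# Hypothetical resident counts in thousands (illustrative values only).
residents = {
    "England": {2016: 20, 2021: 20},
    "India": {2016: 10, 2021: 30},
}


def to_percentages(table):
    """Rescale each primary level so that its segments sum to 100."""
    result = {}
    for primary, parts in table.items():
        total = sum(parts.values())
        result[primary] = {
            secondary: 100 * value / total for secondary, value in parts.items()
        }
    return result


percentages = to_percentages(residents)
for country, parts in percentages.items():
    print(country, {year: f"{share:.0f}%" for year, share in parts.items()})
```

Both bars have a total of 40 here, yet the rescaled bars would look the same for totals of 40 and 4,000: the bar totals are no longer visible, which is the price paid for easier comparison of relative distributions.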
About
Also known as faceted charts.
It allows us to display multiple visualisations (or “panels”) of the same chart type
Each panel represents a subset of data based on a specific category
Good for comparing patterns across different groups while keeping the visual structure consistent.
Best Practices
The dimension we use to break down the data should have a meaningful number of categories.
(Where appropriate) Ensure that all charts within the small multiples grid use the same axis scales (for both the \(x\)- and \(y\)-axes).
Use chart types that naturally lend themselves to small multiples, such as line charts, barplots, scatter plots, etc.
Keep the colours, labels, and design consistent across all small multiples to make the comparison easier and visually intuitive.
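The advice on shared axis scales can be sketched as two steps: split the data into one panel per category, then compute a single set of axis limits from all panels together. The monthly sales records and both helper functions below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (region, month, sales).
records = [
    ("North", 1, 10), ("North", 2, 14), ("North", 3, 12),
    ("South", 1, 30), ("South", 2, 28), ("South", 3, 35),
]


def split_into_panels(rows):
    """Group (month, sales) points by region: one panel per region."""
    panels = defaultdict(list)
    for region, month, sales in rows:
        panels[region].append((month, sales))
    return dict(panels)


def shared_limits(panels):
    """Compute x and y limits over all panels so every panel uses the same scale."""
    xs = [x for points in panels.values() for x, _ in points]
    ys = [y for points in panels.values() for _, y in points]
    return (min(xs), max(xs)), (min(ys), max(ys))


panels = split_into_panels(records)
x_limits, y_limits = shared_limits(panels)
print(sorted(panels), x_limits, y_limits)
```

If each panel chose its own y-axis, North and South would appear to have similar sales; with shared limits the difference in level is visible immediately.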
Your turn
Try these different types yourself
Again think about which works best for visualising your data
About
Other commonly used plot types are
Line charts
Area charts
Scatter plots
Bubble charts
Ribbon charts
Waterfall charts
and so on
Your turn
Explore more about these different plot types and try them yourself.
Note
Common plots that are not supported by the default visualisation pane:
Boxplots
Histograms
Density plots
Contour plots
Heatmaps
and so on
We’ll show you how to create some of these in R next week.
About
Custom visuals are particularly useful when we need a unique chart, graph, or visualisation that is not available in the standard Power BI visual library.
We can create custom visuals using R, Python, JavaScript and TypeScript.
To create a custom visual in Power BI, we need the Power BI Visuals Tools (pbiviz), which provide a framework for building, testing, and packaging custom visuals.
After creating the custom visual, you can import it into your Power BI reports.
What we have covered
An overview of visualisation in Power BI
Covered the standard types of visualisations available
Comfortable choosing the right visualisation for the data
Discussed best practices for effective visualisations in Power BI
Material developed by Dr. Kate Saunders with contributions from Dr. Shanika Wickramasuriya
ETX2250/ETF5922