Welcome to the IcyData VanHAC 2017 Talk blog post!
I’m going to go into more detail regarding my talk at the Vancouver Hockey Analytics Conference.
Contents
Required Materials
Here are some materials that you will need:
- Tableau Desktop
- The finished product
- The blank workbook (make sure you click the Download button on this one, and then click ‘Tableau Workbook’ so you can edit it!)
Once you’ve got all those continue on to the next step!
Looking At and Understanding the Finished Visualization
I believe one of the most common visualizations in hockey analytics is the corsi and fenwick of a game. This is the type of visualization we will be building today. Open the finished visualization. The right most tab on the bottom, ‘Dashboard’, contains our goal visualization.
It contains a Corsi and Fenwick graph for a specific game and includes goals scored. There’s also a Corsi Heartbeat viz depicting the frequency of Corsi events.
And finally there’s a scoreboard. There are three different visualizations we are going to create.
Getting Started
Connecting to a Datasource
The first portion of this tutorial is setting up the datasource. You won’t have to do this, but screenshots are included to see how I did it.
Feel free to skim this section and move onto ‘Visualizing Corsi’
The first thing you need to do to connect to a datasource. Tableau connects to a lot of them!
Once you’re connected, and chosen a datasource you’ll be able to begin adding data to the Tableau workbook with the data preparation tool. If you’re using a relational database, Tableau will attempt to resolve the foreign keys between tables.
Data Preparation
I also recommend adding the actual data you want to look at first. For instance, we want to see how many game events are there. So we add the game events table first. Not the games table and then join events to games. You want the highest granularity on the data you want to look at.
You can also inspect what Tableau is doing with the joins in case it does not properly pick up the foreign key restrictions (or there are no foreign keys).
For ease of use you can also rename tables that you add.
Extracts
Because I am connecting directly to a database, I don’t want to continually query the database so I will select ‘Extract’ in the top right.
This will create an internal database within Tableau that the application will use to prevent hitting your database to create the visualizations. This is particularly useful for sharing workbooks for tutorials online because you can share the workbook with the extract.
And lastly, you can add data source level filters to exclude data you don’t find useful. In this case, I only included regular season data from this year.
Visualizing Corsi
There’s a lot of data. To start off, I’m going to add a filter to only select one game. This will make understanding the data much easier.
Once, I’ve limited the sheet to one game (you can choose any ID you want!) we can begin to delve into the data. The ‘Number of Records’ measure is a simple field that always returns 1 for each row in your data set. When Tableau attempts to visualize this data it converts the measure to a calculation: SUM([Number of Records]) — this will return the number of records in my current view.
Drag and drop ‘Number of Records’ onto the rows shelf or double-click it to add it to our visualization. Tableau automatically converts it to aggregate by SUM — letting us know how many events there are in the game (remember we’re using a filter!). As an exercise to the reader remove the filter and see how many events are in our entire data set. (Ctrl+Z to undo your change!)
Under ‘Game Events’ we can add the ‘Event Type’ dimension to our columns shelf. Your visualization should now look like this:
So in the game we can see how many events there were of each event type. Of course, for Corsi, we only care about events with the type ‘BLOCK’, ‘MISS’, ‘SHOT’ or ‘GOAL’ so lets go ahead and add a filter for those events. You can either right-click Event Type and drag and drop it to the filters shelf or right-click the event types in the visualization and then click ‘Exclude’.
So now we can see all our corsi events! Next lets attribute them to a team! We need to do some scratch work for this. So, let’s duplicate our current sheet. Right-Click the sheet tab at the bottom and click duplicate.
Creating a Calculated Field
This data is scraped directly from the NHL.com website. It’s a little rough around the edges, and while the event description does have the team’s acronym (or abbreviation) we need to extract that information. Remove the SUM([Number of Records]) and add [Description] to the rows shelf.
Now we can see the descriptions for each event type. This will allow us to create a calculated field to extract the team name.
Note: because we are excluding events that are not Corsi (or Fenwick) you don’t see events that start without a team name. The calculated field I have created adjusts for this accordingly.
Right-Click somewhere in our data pane to and click ‘Create Calculated Field…’
A dialog will appear allowing us to create our calculated field. Use ‘Event Team’ as our calculated field’s name at the top of the dialog.
Here is a code snippet that I’ll explain.
IF LEFT([Description], 3) == [Acronym] THEN [Acronym] ELSEIF LEFT([Description], 3) == [Acronym (Nhl Teams)] THEN [Acronym (Nhl Teams)] ELSE "UNK" END
You’ll notice a pattern in the event descriptions: the first three characters are always the team acronym (or abbreviation). I have also added data about what teams are involved with each game. You can see those under ‘Home Team’ and ‘Visiting Team’. This calculated field will compare the first three characters to the acronym of each of the teams. If the first three characters don’t reference a team it will return ‘UNK’ for Unknown.
Now drag ‘Event Team’ to the columns shelf. You can compare the values for Event Team to the Descriptions and verify that we’re correctly extracting the team’s name. Now, let us go back to our first sheet!
Drag ‘Event Team’ onto the Color square under the Marks pane. Tableau will automatically add a color legend for the two different teams involved in the game!
Corsi Over Time
Now, we want to see the Corsi over the course of the game. Let’s drag and drop the ‘Period’ measure onto columns shelf and remove the ‘Event Type’.
You’ll notice we’re getting some pretty interesting values for ‘Period’ despite the fact there were only three periods in this game. Much like when we added the ‘Number of Records’ measure it automatically converted it to a SUM. We don’t want this for the ‘Period’. In fact, the Period isn’t really a measure at all. It’s a dimension. So let’s fix that: Right-Click our ‘Period’ measure and click ‘Convert to Dimension’. The ‘Period’ dimension will appear under ‘Game Events’. Replace our ‘Period’ measure (which is indicated by a green pill on the shelf) with our new ‘Period’ dimension (which are indicated by blue pills on the shelf).
We’re getting closer! We want to see a team’s Corsi over time with a bit more precision. So, under ‘Game Events’ we have a Time dimension. Drag that onto our columns shelf.
Now, remember when I mentioned that the ‘Number of Records’ measure just returned 1? That’s what we’re seeing now. For every event in the database (for this game!) we have the value of 1. Why is this important? Tableau is going to display it’s visualization on a row by row basis. So what we’re seeing now is that, for every Corsi event in the database, we have the value of 1.
Transforming Fields
Now, we’re going to see if we can get a better representation of time passing throughout the game. Right-Click our ‘Time’ dimension, click ‘Transform’ and then ‘Split’. Our time field is in this format: ##:## — Tableau will automatically split the minutes from the seconds for us by noticing patterns in the data.
It’s going to create two new dimensions. ‘Time – Split 1’ and ‘Time – Split 2’. Split 1 will be our minutes and Split 2 will be the seconds into the minute. Go ahead and rename those dimensions by Right-Clicking and clicking Rename.
I named them ‘MinutesIntoPeriod’ and ‘SecondsIntoMinute’. Add them to columns shelf.
Make sure you let Tableau know they should be continuous fields. Do the same for the ‘Period’.
So, we’re not there quite yet. We want to see a time scale from start to finish instead of three columns. Let’s create a calculated field that will signify how many seconds into the game we are!
I called this calculated field ‘SecondsIntoGame’
(([Period] - 1) * 60 * 20) + ([MinutesIntoPeriod] * 60) + [SecondsIntoMinute]
Let’s drag our new calculated field onto the columns shelf, and remove the others.
Now, Tableau is really going to show you some of its muscle. Right now, because of our Number of Records field we’re still showing only a value of 1 throughout the game. Right-Click our SUM([Number of Records]) in the rows shelf. Under ‘Quick Table Calculation’ select ‘Running Total’.
And just like that, our Corsi graph (minus some artistic improvements) is done. Let’s help our future reader out by adding some visual cues. Click the ‘Analytics’ tab near the top left and drag a ‘Constant Line’ onto the viz. Make sure you select ‘Seconds into Game’ for our constant line.
Enter a value for our constant line for the end of the first period. (1200 seconds)
Do this for the end of each period.
- 1200 seconds
- 2400 seconds
- 3600 seconds
- End of OT (3900 seconds)
You can also edit the labels of constant lines by Right-Clicking one and clicking ‘Edit’
Visualizing Fenwick
Duplicate your Corsi visualization and modify the Event Type filter to exclude ‘BLOCK’. (The screenshot shows ‘MISS’ but the operation is still the same)
Now we have two visualizations showing Corsi and Fenwick! But what if we want to plot them on the same visualization?
Dual Axis
Let’s create calculated fields for both Corsi and Fenwick.
Name one Corsi Number of Records and one Fenwick Number of Records.
IF [Event Type] == "SHOT" OR [Event Type] == "GOAL" OR [Event Type] == "MISS" OR [Event Type] == "BLOCK" THEN 1 END
And for Fenwick Number of Records:
IF [Event Type] == "SHOT" OR [Event Type] == "GOAL" OR [Event Type] == "BLOCK" THEN 1 END
Now in our original Corsi visualization remove everything from our rows shelf and add our new Corsi and Fenwick fields we created. Make sure you remember to add our Running Total table calculation like we did earlier!
Now Right Click the second measure in the rows shelf and select ‘Dual Axis’.
It’s hard to compare the two metrics if their axes are not using the same scale so now we will synchronize the axes by right clicking on the right axis and selecting ‘Synchronize Axis’.
We can rename our sheet now to something a little more descriptive.
As well as renaming the titles on our Axis to be more informative.
Showing the Score
Datasource Filters
To ensure we continue work on the same game on our next worksheet, we’re going to make the selected game filter apply to all worksheets. Right click the filter and then under the ‘Apply to Worksheets’ sub-menu select ‘All Using this Datasource’.
Create a new sheet. You can rename it to ‘Score’. Add an Event Type filter that only includes ‘GOAL’ and drag the Event Team and Number of Records to Text on the marks card..
Depending on the game you selected, you may only see one team here! This is because a team may have recorded a shutout and so the team that does not have a GOAL event will not be displayed. We still want to show both teams so we need to go about a different way of showing the score.
Much like the Corsi Number of Records we will create a Goal Number of Records with one addition!
IF [Event Type] == "GOAL" THEN 1 ELSE 0 END
This will add a record of 0 for a team that has not scored any goals.
Corsi Heartbeat
We’re going to get into some advanced functions here. Much like our Corsi and Fenwick viz we’ll want to have ‘Seconds Into Game’ on our columns shelf.
We’re also going to need a calculated field to calculate the time between each event. We’ll do this using the LOOKUP function. The LOOKUP function can actually lookup any value along the axis.
So, here’s what our calculated field might look like:
[Seconds into Game] - LOOKUP([Seconds into Game], -1)
The -1, means to lookup the previous value. You’ll also notice you get an error.
This is because the Seconds into Game measure is an aggregate. And we want the specific value on a specific row. We can do that using the ATTR function.
Now, we want to do the same thing with Corsi. We’re also going to create a calculated field to make our lives a little easier. Head back to our Corsi and Fenwick viz and double click on the Corsi Number of Records table calculation in our rows pane. Tableau will actually tell you what it’s doing. Copy and paste that value into a calculated field called Corsi. Then, we’ll use that new calculated field to create another calculated field that will tell us the Corsi between rows.
And finally, create a calculated field to calculate what the Corsi per second is:
IF NOT ISNULL([Time Since Last Event]) THEN FLOAT([Corsi Since Last Event) / [Time Since Last Event]) ELSE 0 END
Creating the Dashboard
Creating dashboards are really easy! Simply click the ‘New Dashboard’ button on the bottom.
Now, on the left you can see vizzes from your workbook you can drag into the dashboard. Try dragging the vizzes into formation.
We can also hide some quick filters we don’t need to see.
Hide some axes we don’t need to see.
And hide some indicators we don’t need to see.
And voila! Our dashboard is designed.
Only we’re missing the ability to change between games. Go back to our original sheet and create a calculated field called ‘Game Name’.
"Game No: " + STR([Number (Nhl Games)]) + " " + [Name (Nhl Teams)] + " @ " + [Name]
Now let’s add it to our sheet.
And add it to our dashboard. (Make sure you apply the filter to all worksheets like we did earlier!)
Extra Credit
Goal Events
In order to add goal events to our viz, we’re going to need to combine Corsi and Fenwick into it’s own measure on the viz so we can dual axis our goal events. Drag Fenwick onto the left axis.
Then, we drag the ‘Measure Names’ dimension onto detail. The detail card will let Tableau know not to display the detail in the viz, while still allowing differentiation between the measures.
Now, let’s dual axis with our ‘Goal Number of Records’
Hmm, not quite what we want. Time for a calculated field. Let’s think about what we want to do: we want some sort of indicator on our existing visualization for a goal and we want it to appear on the Corsi line. Create a calculated field called Goal Event.
IF ATTR([Goal Number of Records]) == 1 THEN RUNNING_SUM(SUM([Corsi Number of Records])) END
This calculated field has some important nuances. When there is a Goal Event we want to value to be equal to the Corsi value. Remember when we added the running sum for Corsi? It was actually just a quick shortcut to adding a function within Tableau that you can use in your calculated fields. This function is RUNNING_SUM. Secondly, if there is not a goal event, the field returns nothing, that is to say, it returns NULL. Dual Axis our new calculated field. But, uh oh! We can’t synchronize the axes because they’re not the same type. Quick fix:
Now let’s add some styling with a symbol and size adjustments.
Tooltips
Our tooltips aren’t super expressive at the moment, so let’s fix that. When you click on the ‘Tooltip’ card for ‘Goal Event’ we can see the fields we’re using. By default, any field used in the specific viz can be used in the tooltip. But we want to add some things that are not in the tooltip.
Change our tooltip:
<Event Team> @ <ATTR(Time)> of Period <ATTR(Period)> <ATTR(Description)>
You’ll notice that Tableau didn’t replace the values appropriately. This is because we need to add the fields we want to the ‘Tooltips’ card.
There. That’s better.
Help!
For questions and comments please feel free to reach out to me on Twitter (@willlang) or Email (william\at\icydata\dot\hockey)