What are Data Types when Analyzing Hockey Data?

In this Edition

  • What are the Common Data Types?
  • How do Data Types Map to Specific Analyses?
  • What are Examples of Different Data Types in Hockey Data?
  • How are these Data Types used in Hockey Analyses?

What are the Common Data Types?

In computer science and programming, a data type is a classification of data based on its characteristics, the values it can take, and the operations that can be performed on it. Data types define the representation, storage format, and range of values that a variable or piece of data can hold. Each data type has specific rules and constraints that determine how the data can be manipulated. Some common, general data types are string, integer, floating-point (decimal), and Boolean.

💡
Computer science defines abstract (or general) data types. Each programming language or tool may use its own names for them.

The following extract of NHL team stats from the 2022-2023 regular season shows five variables: ID, NAME, GAMES_PLAYED, WINS, and LOSSES. All but NAME are examples of numeric data types; NAME is a string. One rule that applies to these data types is that you can run mathematical operations on the numeric variables, but not on the string variable.

Another example is the following Python code snippet. It declares a string variable called player_name and assigns it the value "Connor McDavid". It then declares two numeric variables (goals and assists) and adds them together to create a third numeric variable (points). Because goals and assists share the same numeric data type, you can add them. Conversely, you cannot mathematically add player_name and points, because one is a string and the other is a number. This is another example of a constraint or rule.

# Variable of string data type.
player_name = "Connor McDavid" 

# Variables of numeric data type.
goals = 64
assists = 89

# Operation that adds two values together.
points = goals + assists

# Print function to print variables to the console.
print("Points:", player_name, " - ", points)
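The constraint described above can be seen directly in Python. The sketch below reuses the variables from the snippet; the try/except wrapper and the str() conversion are additions for illustration, not part of the original example.

```python
player_name = "Connor McDavid"
goals = 64
assists = 89
points = goals + assists

# Adding a string and an integer is not defined, so Python raises a TypeError.
try:
    total = player_name + points
except TypeError as err:
    error_message = str(err)
    print("Cannot add a string and a number:", error_message)

# Converting the number to a string allows concatenation (joining text),
# which is a string operation rather than arithmetic.
label = player_name + ": " + str(points)
print(label)
```

The first operation fails and the second succeeds, which is the rule in action: the data type decides which operations are allowed.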
💡
Don't worry if you've not programmed before. We will publish more lessons on Python and data analysis for hockey in our L200 and L300 series.

In the context of hockey data, data types play a crucial role in organizing and representing the different kinds of information collected and analyzed. The data type dictates the kind of analysis you can perform on the data. For example, as you've seen above, goals are represented as numeric data, so you can perform statistical and mathematical operations on them, such as calculating the average goals per team.

The most common data types in data analysis are as follows:

  • Numeric Data: Numeric data types include integers (whole numbers) and floating-point numbers (decimal numbers). Numeric data is commonly used for quantitative analysis, such as calculating averages, performing mathematical operations, and creating visualizations like line charts, bar charts, and scatter plots.
  • Categorical Data: Categorical data represents discrete values that fall into specific categories or groups. Examples include positions (e.g., right wing, center, etc.), marital status (single, married, divorced), or team names. Categorical data is often used for descriptive analysis, including frequency counts, cross-tabulations, and creating pie charts or stacked column charts to visualize the distribution of categories.
  • Textual Data: Textual data comprises unstructured text information, such as fan reviews, survey responses, or social media posts. Textual data analysis involves techniques like sentiment analysis, topic modeling, or natural language processing (NLP) to extract insights from the text. It can be used to understand customer sentiment, identify key themes, or conduct text-based clustering.
  • Date and Time Data: Date and time data represent specific points or intervals in time. They are used for time series analysis, trend analysis, and creating time-based visualizations like line charts or heat maps. Date and time data can be analyzed to identify patterns, seasonality, or correlations over different time periods.
  • Boolean Data: Boolean data has only two possible values: true or false, yes or no, or 0 or 1. Boolean data types are often used for binary analysis, where the presence or absence of a certain condition or characteristic is analyzed. Boolean data can be used for filtering data, performing logical operations, and conducting conditional analysis.
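As a rough illustration of the list above, the sketch below stores one game record using Python's built-in types. All field names and values are made up for this example. Note that plain Python has no dedicated categorical type, so the category is held as a string here.

```python
from datetime import date

# One made-up game record; every value is illustrative only.
game = {
    "goals_for": 4,                                # numeric (integer)
    "shooting_pct": 12.5,                          # numeric (floating point)
    "position": "center",                          # categorical, stored as a string
    "fan_comment": "What a finish in overtime!",   # textual
    "game_date": date(2023, 1, 15),                # date
    "is_win": True,                                # Boolean
}

# Look up the Python type name behind each field.
type_names = {field: type(value).__name__ for field, value in game.items()}

for field, name in type_names.items():
    print(f"{field:>12}: {name}")
```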

How do Data Types Map to Specific Analyses?

The choice of data type influences the type of analyses that can be performed.

💡
Assuming no data type conflicts, you can often translate (or re-cast) a variable of one data type into another data type if the need arises. In tools like Microsoft Excel, you can right-click a cell, select Format Cells, and choose another data type. Most programming languages have built-in functions that enable you to translate from one data type to another.
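Here is a minimal sketch of re-casting in Python using the built-in int() function. The column names follow the team stats extract mentioned earlier, but the row values are hypothetical, chosen only to show the conversion.

```python
# Values as they might arrive from a text file: everything is a string.
# The team name and numbers are made up for illustration.
raw_row = {"NAME": "Team A", "GAMES_PLAYED": "82", "WINS": "50", "LOSSES": "23"}

# Re-cast the numeric columns from string to integer.
games_played = int(raw_row["GAMES_PLAYED"])
wins = int(raw_row["WINS"])

# Division of two integers produces a floating-point number.
win_pct = wins / games_played

# Re-cast back to a string, formatted to three decimals for display.
win_pct_text = f"{win_pct:.3f}"
print("Win percentage:", win_pct_text)

# A data type conflict: text that is not a number cannot be cast to int.
try:
    int(raw_row["NAME"])
    name_is_numeric = True
except ValueError:
    name_is_numeric = False
```

The last step shows what a conflict looks like in practice: the conversion raises an error instead of silently producing a number.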

The sections below show how each data type is typically used in specific kinds of analysis.

Numeric Data

  • Summarization: Numeric data types allow for the calculation of summary statistics such as mean, median, standard deviation, or variance.
  • Regression Analysis: Numeric data is commonly used as dependent or independent variables in regression analysis to model relationships between variables.
  • Correlation Analysis: Numeric data enables the calculation of correlation coefficients to measure the strength and direction of relationships between variables.
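The summarization and correlation items above can be sketched with Python's standard statistics module. The shot and goal totals are made-up sample values, and the pearson helper is a hypothetical function written for this example rather than a library call.

```python
import statistics
from math import sqrt

# Made-up season totals for five players, for illustration only.
shots = [150, 200, 250, 300, 350]
goals = [15, 22, 24, 33, 36]

# Summary statistics for the goals column.
mean_goals = statistics.mean(goals)
median_goals = statistics.median(goals)
stdev_goals = statistics.stdev(goals)  # sample standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


r = pearson(shots, goals)
print("mean:", mean_goals, "median:", median_goals, "stdev:", round(stdev_goals, 2))
print("correlation between shots and goals:", round(r, 3))
```

A coefficient close to 1 would indicate that players who shoot more also tend to score more in this sample.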

Categorical Data

  • Frequency Analysis: Categorical data is used to calculate frequencies and proportions, providing insights into the distribution of categories.
  • Cross-Tabulation: Categorical data can be cross-tabulated to analyze relationships and dependencies between different categories.
  • Chi-Square Test: Categorical data is often employed in chi-square tests to determine if there is a significant association between two categorical variables.
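Frequency analysis and cross-tabulation can be sketched with collections.Counter from the standard library. The penalty log below is made up for illustration. A chi-square test would start from a table like this one, but it is not computed here.

```python
from collections import Counter

# Made-up penalty log: (player position, penalty type).
penalties = [
    ("forward", "hooking"),
    ("defenseman", "tripping"),
    ("forward", "tripping"),
    ("defenseman", "hooking"),
    ("forward", "hooking"),
    ("defenseman", "tripping"),
]

# Frequency analysis: how often does each penalty type occur?
type_counts = Counter(penalty for _, penalty in penalties)
total = sum(type_counts.values())
proportions = {penalty: count / total for penalty, count in type_counts.items()}

# Cross-tabulation: count each (position, penalty type) combination.
crosstab = Counter(penalties)

# Print the cross-tab as a small table.
positions = sorted({position for position, _ in penalties})
penalty_types = sorted(type_counts)
print("position".ljust(12) + "".join(t.ljust(10) for t in penalty_types))
for position in positions:
    row = "".join(str(crosstab[(position, t)]).ljust(10) for t in penalty_types)
    print(position.ljust(12) + row)
```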

Textual Data

  • Sentiment Analysis: Textual data is analyzed to determine sentiment polarity (positive, negative, neutral) using techniques like lexicon-based scoring or machine learning algorithms.
  • Topic Modeling: Textual data can be processed using topic modeling algorithms to identify key themes or topics within a collection of texts.
  • Text Classification: Textual data is used in text classification tasks, where texts are categorized into predefined classes or labels.
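To give a feel for sentiment polarity, here is a toy lexicon-based scorer. The word lists and fan comments are invented for this example, and real sentiment analysis relies on far larger lexicons or trained models, so treat this only as a sketch of the idea.

```python
import re

# Tiny hand-made word lists (assumptions for this sketch only).
POSITIVE = {"great", "amazing", "love", "win", "clutch"}
NEGATIVE = {"terrible", "awful", "hate", "loss", "sloppy"}


def polarity(text):
    """Label text as positive, negative, or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up fan comments.
comments = [
    "Amazing game, love this team!",
    "Sloppy defense and a terrible power play.",
    "Puck drops at seven tonight.",
]

labels = [polarity(comment) for comment in comments]
for comment, label in zip(comments, labels):
    print(f"{label:>8}: {comment}")
```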

Date and Time Data

  • Time Series Analysis: Date and time data is used to analyze trends, seasonality, and patterns over time using techniques like moving averages, autoregressive integrated moving average (ARIMA), or exponential smoothing.
  • Seasonal Decomposition: Date and time data can be decomposed into trend, seasonal, and residual components to understand different underlying patterns.
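Of the techniques named above, the moving average is the simplest to sketch. The example below uses Python's datetime.date for game dates; the dates, goal counts, and the moving_average helper are all made up for illustration.

```python
from datetime import date

# Made-up goals per game, deliberately out of chronological order.
games = [
    (date(2023, 1, 9), 4),
    (date(2023, 1, 3), 2),
    (date(2023, 1, 5), 3),
    (date(2023, 1, 7), 1),
    (date(2023, 1, 11), 5),
]

# Date values can be compared, so the games can be sorted chronologically.
games.sort(key=lambda game: game[0])


def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


goals = [goal_count for _, goal_count in games]
ma3 = moving_average(goals, 3)

# Subtracting two dates gives a duration.
days_spanned = (games[-1][0] - games[0][0]).days

print("3-game moving average:", [round(value, 2) for value in ma3])
print("days covered:", days_spanned)
```

Sorting and date subtraction only work because the dates are stored as a date type rather than as text.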

Boolean Data

  • Filtering: Boolean data can be used to filter datasets based on specific conditions or criteria, allowing for subsets of data to be analyzed.
  • Logical Operations: Boolean data enables logical operations like AND, OR, or NOT, which are useful for conditional analysis or combining multiple conditions.
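Filtering and logical operations can be sketched with Boolean fields and Python's and, or, and not operators. The game log below is made up, and the field names is_home and is_win are assumptions for this example.

```python
# Made-up game log with two Boolean fields per game.
games = [
    {"opponent": "Team A", "is_home": True, "is_win": True},
    {"opponent": "Team B", "is_home": False, "is_win": True},
    {"opponent": "Team C", "is_home": True, "is_win": False},
    {"opponent": "Team D", "is_home": False, "is_win": False},
]

# AND: keep games that were both at home and a win.
home_wins = [g for g in games if g["is_home"] and g["is_win"]]

# OR: keep games that were at home, a win, or both.
home_or_win = [g for g in games if g["is_home"] or g["is_win"]]

# NOT: keep games that were not at home.
road_games = [g for g in games if not g["is_home"]]

print("home wins:", [g["opponent"] for g in home_wins])
print("home or win:", [g["opponent"] for g in home_or_win])
print("road games:", [g["opponent"] for g in road_games])
```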

It's important to note that data analysis often involves a combination of different data types, as well as using advanced techniques that can handle multiple data types simultaneously (or convert the data from one data type to another – e.g., "Yes" to 1 and "No" to 0). The choice of data type and appropriate analysis techniques depend on the research questions, goals, and characteristics of the dataset at hand.


What are Examples of Different Data Types in Hockey Data?

Below are some examples of different data types you will come across in hockey data.

  • Numeric Data Types: Numeric data types, such as integers (whole numbers) and floating-point numbers (decimal numbers), are commonly used in hockey data. For example, counting statistics like goals, assists, and points are typically stored as integers, while rates like shooting percentage are stored as floating-point numbers. These data types allow for calculations, aggregations, and comparisons between different players or teams.
  • String Data Types: String data types are used to represent text or alphanumeric information in hockey data. For instance, player names, team names, or venue names are stored as string data types. String data types facilitate operations like string concatenation, searching, and formatting when dealing with textual information.
  • Boolean Data Types: Boolean data types have two possible values: true or false. In hockey data, boolean data types might be used to represent binary information such as the result of a game (win or loss) or the outcome of a penalty (penalty taken or not). Boolean data types enable logical operations and condition checking in data analysis.
  • Date and Time Data Types: Date and time data types are crucial in hockey data analysis for representing game dates, start times, or durations. They allow for temporal calculations, date comparisons, and time-based analyses such as identifying trends over specific time periods.
  • Categorical Data Types: Categorical data types are used to represent data that falls into specific categories or groups. In hockey data, this could include variables like player positions (forward, defenseman, goalie), game outcomes (win, loss, tie), or penalty types (hooking, tripping). Categorical data types facilitate groupings, aggregations, and descriptive analysis based on specific categories.
  • Composite Data Types: Composite data types, such as arrays or structures, can be used in hockey data to group related pieces of information. For example, an array of player statistics or a structure containing player attributes (name, age, position, etc.) can provide a more organized and cohesive representation of player data.
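One way to sketch a composite data type in Python is a dataclass that groups player attributes, with a list holding per-game statistics. The Player class, its fields, and the roster values are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Player:
    """A structure grouping related player attributes of different types."""
    name: str
    age: int
    position: str
    goals_by_game: List[int] = field(default_factory=list)  # array of statistics

    @property
    def total_goals(self) -> int:
        return sum(self.goals_by_game)


# A made-up roster: a list (array) of Player structures.
roster = [
    Player("Player A", 26, "center", [1, 0, 2]),
    Player("Player B", 31, "goalie"),
]

for player in roster:
    print(player.name, player.position, player.total_goals)
```

Each Player combines string, integer, and list values in one unit, which is what makes the type composite.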

How are these Data Types used in Hockey Analyses?

The data types described earlier are used across many kinds of hockey analysis.

  • Player Performance Analysis: Integer and floating-point data types are used to represent player statistics such as goals, assists, points, shooting percentage, or time on ice. These numeric values are used to assess individual player performance, compare players, calculate averages, or identify outliers.
  • Team Analysis: Integer and floating-point data types enable the calculation of team-level metrics like goals scored, goals against, power-play conversion rate, penalty kill percentage, or save percentage. These metrics provide insights into team performance, strengths, weaknesses, and overall efficiency.
  • Player and Team Identification: String data types are used to store player names, team names, or venue names. These string values are crucial for identification purposes, grouping and filtering players or teams, and creating informative visualizations or reports.
  • Game Outcome Analysis: Boolean data types can represent the outcome of a game, such as win (true or 1) or loss (false or 0). Analyzing game outcomes using Boolean data types allows for calculations of winning percentages, win streaks, or examining performance in different game situations.
  • Time Series Analysis: Date and time data types are essential for analyzing hockey data over specific time periods. They facilitate time series analysis, trend identification, and seasonality assessments. Date and time data are used to study performance over different seasons, track player development, and evaluate team progress.
  • Position Analysis: Categorical data types represent player positions (forward, defenseman, goalie). Analyzing categorical data allows for comparisons between positions, identification of position-specific performance patterns, and understanding positional roles within a team.
  • Penalty Analysis: Categorical data types are used to categorize penalty types (hooking, tripping, slashing, etc.). Analyzing penalty data allows for assessments of penalty frequency, penalty kill effectiveness, or identification of players with high penalty minutes.
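Two of the analyses above can be sketched in a few lines: game outcome analysis with Boolean values, and position analysis with a categorical grouping. The results, player tuples, and the longest_streak helper are made up for illustration.

```python
from collections import defaultdict

# Game outcome analysis: made-up results, True for a win and False for a loss.
results = [True, True, False, True, True, True, False]

# In Python, True counts as 1 and False as 0, so sum() counts the wins.
win_pct = sum(results) / len(results)


def longest_streak(outcomes):
    """Length of the longest run of consecutive wins."""
    best = current = 0
    for won in outcomes:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


best_streak = longest_streak(results)

# Position analysis: made-up (name, position, points) records.
players = [
    ("Player A", "forward", 70),
    ("Player B", "forward", 50),
    ("Player C", "defenseman", 30),
    ("Player D", "defenseman", 40),
]

# Group the numeric points by the categorical position, then average each group.
points_by_position = defaultdict(list)
for _, position, points in players:
    points_by_position[position].append(points)

avg_points = {pos: sum(pts) / len(pts) for pos, pts in points_by_position.items()}

print("win percentage:", round(win_pct, 3))
print("longest win streak:", best_streak)
print("average points by position:", avg_points)
```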

These are just a few examples of how different data types are utilized in various hockey analyses. The specific data types used in each analysis depend on the research question, objectives, and the nature of the data being analyzed. By leveraging the appropriate data types, analysts can gain valuable insights into player performance, team dynamics, game outcomes, and various other aspects of hockey.

