Building an on-ball screen dataset using supervised and unsupervised learning

Date

2018-09-01

Abstract

Applications of statistics and machine learning in sports analytics have advanced significantly since the early days of Bill James' publication, Baseball Analyst. In that publication, analysts and researchers investigated questions such as how often a batter reached first base, in order to determine which players were the best to acquire. Data recording when players reached base were relatively simple and cheap to collect. Over time, more complex data became available in various professional sports leagues. In the National Basketball Association (NBA), cameras were installed in arenas to track player and ball movements. Movement-tracking data allowed analysts and researchers to examine player locations during in-game events rather than relying only on box score summaries and play-by-play data. Tracking data has also enabled teams to derive insights about events they find valuable, such as on-ball screens. However, detailed annotations of events such as on-ball screens are not recorded in standard data sets. To capture the time stamps of these unrecorded events, teams must employ analysts to record them manually and to add further labels to the data. As the available performance data expanded, so did the scientific methods and tools used to analyze it. Recent advances in machine learning have shown that neural network layers can extract abstract representations from large, complex data sets, and neural networks have consequently been applied to several complex problems in sports analytics. We propose using unsupervised learning methods to create detailed labels for identified on-ball screen instances. To build the initial set of on-ball screen instances and to help propose new ones, we use convolutional neural networks to classify instances as positive or negative screens.
Once the model is trained, we develop a framework that proposes new annotation times to expand the data set while minimizing the time analysts spend viewing game footage to verify new screens. Using the established set of screens, we then apply several unsupervised learning methods to add detailed labels to the identified screens. These labels provide additional insight into the identified screens and establish a framework for eliminating the costly process of creating multi-labelled on-ball screen instances through human annotation.
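The abstract does not name the specific unsupervised methods used. As one hedged illustration of how clustering can attach extra labels to already-identified screens, the sketch below runs plain k-means (Lloyd's algorithm) on hypothetical per-screen feature vectors, so that each cluster id could serve as an additional label. The features (distance of the screen from the basket in feet, seconds remaining on the shot clock), the function name `kmeans`, and the sample data are all assumptions for illustration, not the thesis's actual method or data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic initialisation (the first k points).

    Returns one cluster label per point and the final centroids.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each screen joins its nearest centroid
        # (ties go to the lower cluster index).
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = _mean(members)
    return labels, centroids


# Hypothetical screens: (distance from basket in ft, shot-clock seconds left).
screens = [
    (24.0, 18.0), (25.0, 17.0), (23.5, 19.0),  # far from the rim, early in the clock
    (12.0, 5.0), (11.0, 4.0), (13.0, 6.0),     # closer to the rim, late in the clock
]
labels, centroids = kmeans(screens, k=2)
print(labels, centroids)
```

On this toy data the first three screens end up in one cluster and the last three in the other, so the cluster id acts as a coarse "early/perimeter" versus "late/interior" label. The first-k initialisation is chosen only for reproducibility; in practice k-means++ initialisation (as in scikit-learn's `KMeans`) or a different clustering method would usually be preferred.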

Keywords

Basketball