Statistics, systems and stories
How more granular data changed how the NBA evaluates players
Most outcomes are best predicted in the context of a system and its incentives
To best understand outcomes, you need to understand the behaviors of actors
Better data helps you better explain behaviors and influence outcomes
In the absence of data, people rely on stories that may not be accurate
The more granular the data, the better the story you can tell about the system
Stories help our understanding, inform our strategy and influence outcomes
Stories can be helpful but are always incomplete
As covered in the last edition of Forestview, which described how all narratives are wrong but some are useful, stories are incredibly powerful but always incomplete. With the start of the NBA Finals this week between the Golden State Warriors and the Boston Celtics, fans of professional basketball around the globe will be watching, reading, and listening to stories about the series. I’ve followed basketball since I was young, when I read the sports pages of our local newspaper daily. We did not have cable TV, where most of the games were broadcast, so I relied on the game summary written by a reporter and the box score, the statistical summary of each game, to understand how my favorite team performed.
For those unfamiliar with a basketball box score, it is a table of statistics that shows how each player performed. You can add up the individual player statistics to generate team statistics, and both sets of numbers help coaches, scouts, executives, and fans understand what took place in the game. Of course, the box score is an abstraction of the actual game, but it is a useful summary. A complete description of every action by every player over the full 48 minutes would be more accurate, but it would not be easy to consume. Reducing the game to a set of statistics yields both power and perspective.
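To make the roll-up from player lines to team totals concrete, here is a minimal Python sketch that sums a few counting stats across players. The `PlayerLine` structure is a hypothetical stand-in for a box score row (not any official data format), and the player names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """One row of a simplified, hypothetical box score."""
    player: str
    minutes: int
    points: int
    rebounds: int
    assists: int


def team_totals(lines: list[PlayerLine]) -> dict[str, int]:
    """Add up each counting stat across all players on one team."""
    totals = {"minutes": 0, "points": 0, "rebounds": 0, "assists": 0}
    for line in lines:
        totals["minutes"] += line.minutes
        totals["points"] += line.points
        totals["rebounds"] += line.rebounds
        totals["assists"] += line.assists
    return totals


# Made-up lines for three players, for illustration only.
box_score = [
    PlayerLine("Player A", 36, 28, 5, 7),
    PlayerLine("Player B", 32, 14, 11, 2),
    PlayerLine("Player C", 24, 9, 3, 4),
]

print(team_totals(box_score))
# {'minutes': 92, 'points': 51, 'rebounds': 19, 'assists': 13}
```

The same summing works for any counting stat, which is exactly why team totals carry no more information than the individual lines they are built from.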
Historically, box scores contained what are known as “counting stats,” including points (based on the number of 2-point baskets, 3-point baskets, and free throws each player made), assists (how many passes a player made that directly led to a teammate making a basket), and rebounds (how many missed shots a player grabbed). Other statistics included the number of minutes a player was in the game and the number of fouls they committed. Players were generally evaluated on how many counting stats they accumulated, and scoring points was considered the most valuable trait. Unfortunately, most counting stats favored offensive play; few statistics adequately captured a player’s defensive ability to limit scoring by the other team. To evaluate defense and many other areas of the game, spectators had to rely on the “eye test”: what behaviors they observed, how they would describe them to someone else, and how effective those behaviors were in achieving the objective.
Because so few statistics were collected, the parts of the game that were hard to quantify usually ended up as narratives. These stories were necessarily incomplete, and they often brought in emotion and ascribed unseen motivations to players, who might be described as “having a will to win,” having “the heart of a champion,” or as someone who “would not be defeated.” Losing teams “lacked heart,” were “chokers,” or simply “disappeared.” These narratives fed our need for understanding, but they were at best incomplete and at worst inaccurate. Do players who have competed and succeeded at every level, from youth basketball through high school and college, to reach the world’s toughest professional league really lack “the will to win”? I doubt players on the losing team lift fewer weights, run fewer sprints, or practice less than their counterparts on the winning team.
So why do these narratives exist? They formed because our understanding of why one team wins and the other loses was incomplete, and it was incomplete because we lacked the statistics to fully comprehend what took place in the system known as a basketball game. A basketball game has well-established rules, and each player has a role the coach has asked them to perform, generally one designed to leverage their strengths and minimize their weaknesses. Each coach devises strategies to maximize the strengths of their team and exploit the weaknesses of the other team. The quality of each team’s strategy, and its ability to execute that strategy, determines which team wins.
Better numbers lead to better narratives
There has been a lot of talk about the analytics revolution in sports since the publication of Moneyball by Michael Lewis and the subsequent movie starring Brad Pitt. While Moneyball spotlighted baseball, each professional sport has seen the rise of its own analytics community, as have many other industries, as documented in Super Crunchers by Ian Ayres, among other books. The analytics revolution in the National Basketball Association (NBA) began in 1996-97 with the advent of play-by-play data and shot charts. Previously, radio and TV play-by-play announcers called the action, but this information was not captured digitally. The new play-by-play data digitally captured a subset of what announcers described verbally, including which players were on the floor at any given time, which shots were made and missed, and the location on the court where each attempt was taken. In addition to individual player statistics, team and game statistics were also captured, such as scoring runs (when a team scores several points in a row without the opponent scoring) and the number of ties and lead changes in a game. Importantly, this data was made available via API to services such as ESPN.com, which provided real-time updates of the game action on its website, as well as to a community of data scientists.
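Team-level stats such as scoring runs, ties, and lead changes can be derived directly from a digitized play-by-play log. The Python sketch below assumes a deliberately simplified, hypothetical log in which each entry records only which team scored and how many points; real play-by-play feeds carry far more fields. It uses simplifying definitions chosen for illustration: a run is consecutive points by one team with nothing from the opponent in between, a tie is counted each time a basket leaves the score level, and a lead change is counted when the leading team differs from the last team to hold the lead.

```python
def game_flow(plays: list[tuple[str, int]]) -> dict[str, int]:
    """Summarize a simplified play-by-play log of (team, points) scoring events."""
    score = {"HOME": 0, "AWAY": 0}
    last_leader = None  # team that most recently held the lead
    lead_changes = 0
    ties = 0
    run_team = None     # team currently on an unanswered run
    run_points = 0
    longest_run = 0

    for team, points in plays:
        score[team] += points

        # Extend the current run, or start a new one if the other team scored.
        if team == run_team:
            run_points += points
        else:
            run_team, run_points = team, points
        longest_run = max(longest_run, run_points)

        if score["HOME"] == score["AWAY"]:
            ties += 1
        else:
            leader = "HOME" if score["HOME"] > score["AWAY"] else "AWAY"
            if last_leader is not None and leader != last_leader:
                lead_changes += 1
            last_leader = leader

    return {"ties": ties, "lead_changes": lead_changes, "longest_run": longest_run}


# Hypothetical sequence of scoring events.
plays = [("HOME", 2), ("AWAY", 3), ("HOME", 2), ("HOME", 2),
         ("AWAY", 2), ("AWAY", 3), ("HOME", 2), ("HOME", 3)]

print(game_flow(plays))
# {'ties': 1, 'lead_changes': 4, 'longest_run': 5}
```

None of these numbers could be reconstructed from a traditional box score alone, because they depend on the order in which the points were scored.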
New metrics began to appear, such as plus-minus, which captured the net point differential while a player was on the court. For example, a plus-minus of +15 for an individual player meant that while that player was on the court during the game, his team scored 15 more points than the other team. The player may or may not have been directly responsible for some of those points by shooting or assisting on made baskets, but while he was on the court his team performed better. Plus-minus (and its many subsequent derivations) provided new insight into the effectiveness of individual players and was the first meaningful way of capturing the impact of players on the defensive side of the ball. While imperfect, plus-minus began to capture an impact that previously was apparent only to a keen observer using the “eye test.” Defense was still difficult to evaluate at the individual level; after all, each player works within a defensive scheme that the whole team executes simultaneously. Still, plus-minus went beyond traditional counting stats such as blocks and steals, each of which can reflect a good individual play but can also signal risk-taking and breakdowns in the team’s defensive structure.
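Once play-by-play data records who is on the floor, plus-minus is simple arithmetic. The Python sketch below assumes a hypothetical, simplified log in which each scoring event lists the scoring team, the points, and the players each team had on the court at that moment (trimmed to two per side here to keep the example short). Every player on the scoring team is credited with the points, and every player on the other team is debited the same amount.

```python
from collections import defaultdict


def plus_minus(events):
    """Compute each player's plus-minus from scoring events.

    Each event is (scoring_team, points, lineups), where lineups maps a
    team name to the set of players it had on the court when the points
    were scored.
    """
    totals = defaultdict(int)
    for scoring_team, points, lineups in events:
        for team, players in lineups.items():
            sign = 1 if team == scoring_team else -1
            for player in players:
                totals[player] += sign * points
    return dict(totals)


# Hypothetical events; lineups are trimmed to two players per side.
events = [
    ("HOME", 3, {"HOME": {"H1", "H2"}, "AWAY": {"A1", "A2"}}),
    ("AWAY", 2, {"HOME": {"H1", "H3"}, "AWAY": {"A1", "A2"}}),
    ("HOME", 2, {"HOME": {"H1", "H3"}, "AWAY": {"A1", "A3"}}),
]

print(sorted(plus_minus(events).items()))
# [('A1', -3), ('A2', -1), ('A3', -2), ('H1', 3), ('H2', 3), ('H3', 0)]
```

Note that H3 finishes at 0 without any record of who actually scored; like the real metric, the sketch measures what happened while a player was on the floor, not what that player did.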
This new digitally enabled play-by-play data unlocked other insights, including which players scored efficiently from which locations on the court. The conversation about a player was no longer simply about how many points he accumulated, but also about how he accumulated them. Slowly, broader insights took hold: shots right at the rim, such as dunks and layups, were converted at a higher percentage than “mid-range” shots taken from an intermediate distance. Three-point shots became more highly valued because, although they are converted at a lower percentage, the extra point made them more efficient than a “long 2” attempted from just inside the 3-point arc. Shot profiles assembled for individual players showed where they had the most successes and failures, influencing both offensive and defensive strategies. Perhaps the Houston Rockets best epitomized this new philosophy by adopting a style dubbed “Morey-ball,” after their general manager Daryl Morey, an MIT Sloan graduate, former consultant, and co-founder of the Sloan Sports Analytics Conference who never played professional basketball.
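The efficiency argument reduces to expected points per attempt: a shot’s point value multiplied by the probability it goes in. The conversion rates in this Python sketch are round, illustrative numbers chosen only to show the arithmetic; they are not measured league averages.

```python
# Illustrative (not measured) point values and conversion rates per shot zone.
SHOT_ZONES = {
    "at the rim": (2, 0.60),
    "long 2": (2, 0.40),
    "3-pointer": (3, 0.36),
}


def expected_points(value: int, make_rate: float) -> float:
    """Expected points per attempt = point value x probability of a make."""
    return value * make_rate


for zone, (value, rate) in SHOT_ZONES.items():
    print(f"{zone}: {expected_points(value, rate):.2f} points per attempt")
# at the rim: 1.20 points per attempt
# long 2: 0.80 points per attempt
# 3-pointer: 1.08 points per attempt
```

Under these assumed rates, a long 2 would have to go in 54% of the time (3 × 0.36 / 2) just to match a 36% three-pointer, which is the logic behind shifting attempts toward the rim and the arc.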
Granular behavioral data yields detailed insights
The NBA went a step further in data collection and analytics beginning in 2013-14, when player tracking cameras were installed in all arenas. The cameras use computer vision to identify each player and capture granular data about their behaviors, including speed and distance traveled, number of touches, number of passes, defensive impact based on being the closest defender to a player with the ball, rebounding opportunities, and much more. While the cameras did not capture every detail an observer might want (for example, they tracked each player’s center of mass but not the location of their arms and hands when defending a shot), the tracking data provided a treasure trove of granular data that data scientists could use to develop advanced statistics and predictive models.
The NBA is a $10 billion a year enterprise in which players can make up to $50 million annually in salary (an amount that rises every year), and even players who earn the minimum salary receive $925,000. Rosters consist of up to 15 players on each of the 30 teams, for a total of 450 players. With so much money at stake and teams competing to go deep into the playoffs and win a championship, a lot rides on evaluating players (and coaches, for that matter). Determining the value of a player is essential, so teams employ entire analytics departments alongside coaches and scouts. But how does a team determine which individual players contribute the most to team success? Win shares is one common metric that attempts to do just that. In practice, teams rely on a full set of metrics built on predictive algorithms, some of which are public but most of which are proprietary to each team.
The upshot of all this granular data, and of the advanced algorithms used to make sense of it, is that our traditional ways of evaluating players and strategies are evolving, and the stories we tell are changing too. Perhaps no player epitomizes this trend better than Russell Westbrook. Westbrook is a 9-time NBA All-Star who won the 2017 Most Valuable Player award, as selected by the media, in a season in which he became the first player since Oscar Robertson in 1962 to average double-digit totals in points, assists, and rebounds (a “triple-double”) for an entire season. Yet Westbrook’s selection as MVP was controversial and sparked heated debate in the media and among fans about how a player’s value should be measured: by traditional counting stats, on which Westbrook clearly excelled, or by newer metrics such as win shares per 48 minutes. In compiling his incredible MVP season, Westbrook was not a particularly efficient shooter, and the tracking data classified many of the rebounds he collected as “uncontested.” (A contested rebound, where players from both teams fight for possession of the ball after a missed shot, is much more valuable than an uncontested one, where the shooting team simply retreats to set up its defense and lets the other team collect the ball.)
Westbrook went on to average a triple-double in 3 of the next 4 seasons, but he never came close to winning MVP again. This past season, Westbrook joined LeBron James on the Los Angeles Lakers and had perhaps the worst season of his career, continuing to struggle with his shooting accuracy. What is notable about Westbrook’s struggles is that his play has always been characterized as high-energy and full of “heart,” qualities that were still present this season despite his disappointing performance.
See the system, then use stats to craft a story
Whether watching the NBA Finals over the next two weeks or observing any other phenomenon and trying to understand outcomes based on behaviors, it is critical to first “see the system”: understand its unique rules, logic, and incentives as a way to predict outcomes. For example, teenagers were long considered bad drivers, and males were considered far more prone to losses than females. Auto insurers would gather a driver’s age, gender, and marital status, along with counting stats such as the number of prior accidents and moving violations, to assess risk and set a premium. Today, telematics provides detailed information on behaviors such as hard braking, excessive speeding and acceleration, sharp turns, and more, giving a much more comprehensive picture of a driver’s true behavior and propensity to have a loss. This gives teenage drivers a greater ability to demonstrate their relative driving acumen and “earn” a lower premium rather than simply hoping to qualify for a good student discount.
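As a rough illustration of how behavioral data can stand in for demographic proxies, this Python sketch scores a driver by weighting counts of risky events and normalizing per 100 miles driven. The event names, the weights, and the idea of a single additive score are all hypothetical assumptions made for illustration; actual insurer rating models are proprietary and far more involved.

```python
# Hypothetical weights for each risky event type. These are NOT drawn from
# any real insurer's model; they exist only to illustrate the idea.
EVENT_WEIGHTS = {
    "hard_braking": 2.0,
    "speeding": 3.0,
    "rapid_acceleration": 1.5,
    "sharp_turn": 1.0,
}


def risk_score(event_counts: dict[str, int], miles: float) -> float:
    """Weighted risky events per 100 miles driven; lower means safer driving."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    weighted = sum(EVENT_WEIGHTS[event] * count
                   for event, count in event_counts.items())
    return weighted * 100 / miles


# Made-up event counts over 500 miles for two hypothetical drivers.
careful_teen = {"hard_braking": 2, "speeding": 0, "rapid_acceleration": 1, "sharp_turn": 1}
aggressive_adult = {"hard_braking": 9, "speeding": 6, "rapid_acceleration": 5, "sharp_turn": 4}

print(risk_score(careful_teen, 500))      # 1.3
print(risk_score(aggressive_adult, 500))  # 9.5
```

The point of the sketch is the shift in inputs: the score depends only on observed driving behavior, so a careful teenager can come out ahead of an aggressive adult regardless of age or gender.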
Collecting, processing, and carefully analyzing granular data can produce a more accurate description and understanding of the system and of the behaviors of the actors within it. In turn, this allows firms to develop better strategies, just as NBA teams can when playing in the Finals. Finally, when explaining outcomes, we can create narratives that provide greater insight and understanding than simply “she has the heart of a champion” or “he wanted it more,” even if that story is being written by a robot.
Are there behaviors or phenomena in your business that are hard to understand? Where do you tend to use narratives the most to describe areas that lack quantification? Are there new technologies available to capture more granular system data that you can use to generate new insights and influence outcomes to help achieve your long-term objectives?