Thursday, December 16, 2010

One of the more intriguing concepts to emerge in sports analysis lately is win probability: the chances of winning the game given where you are in that game. It's not a new concept, but it has experienced a wave of popularity in recent years, mostly in baseball but also in the NFL. Today, we'll see what we can learn by applying win probability to CIS volleyball.

First, some quick background on win probability in general. The idea is to look at the state of the game (which means the score, time remaining if applicable, field position if you're talking about football) and figure out who's more likely to win at that point, and how likely they are to win. This allows for two types of descriptive analysis: not only can we chart a team's win probability as the game goes on (e.g., last year's Super Bowl or a crazy baseball playoff game between the Giants and Braves), but we can also credit individual players for increasing the chances of a win for their team (or "blame" them for decreasing the chances).

The "credit" part of the analysis responds to a common criticism of most stats: they count the same whether they were accumulated in close games, blowouts, or anything in between. So if we incorporate win probability into volleyball, say, then a block at a crucial time is worth more. We'll come back to this idea later.

It turns out that win probability analysis is actually rather straightforward in volleyball, compared to other sports. Given any volleyball "game state" (like 0-0 in the first set, 20-0 in the second, or 10-10 in the fourth), there are only two outcomes: either one team wins the point, or the other team does. No touchdowns or field goals; no triples or strikeouts or double plays. So it's quite simple to figure out the odds of winning at each game state (well, simple if you know your conditional probability theory and your Markov chains).

We can do all the calculations and say that, for example, in a game between two evenly-matched teams, a team that is up 13-12 in the fifth set has a 69% chance of winning that match. Say they get a service ace, so they're now up 14-12--which corresponds to an 88% chance. If they then win another point, they've won the game. This, of course, means they have a 100% chance of winning the game.
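For the curious, here is a minimal Python sketch of one way to get numbers like these. It is not necessarily the exact calculation behind this post: it assumes every point is an independent coin flip that our team wins with probability `p` (0.5 for evenly matched teams) and it ignores who is serving. The name `set_win_prob` and its parameters are made up for illustration. In the fifth set, winning the set and winning the match are the same thing, so the set-level number is all we need here.

```python
from functools import lru_cache


def set_win_prob(a, b, target=25, p=0.5):
    """Chance that our team wins a set from score a-b (first to `target`, win by two).

    Assumption: each point is won independently with probability p.
    """
    q = 1.0 - p
    # From a late tie (24-24, 25-25, ...) a team must go two points clear.
    # Splitting the next two points just returns to a tie, so:
    tied = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def f(a, b):
        if a >= target and a - b >= 2:
            return 1.0                      # we have already won the set
        if b >= target and b - a >= 2:
            return 0.0                      # we have already lost the set
        if a >= target - 1 and b >= target - 1:
            # Late in the set only the margin matters: tied, up one, or down one.
            if a == b:
                return tied
            return p + q * tied if a > b else p * tied
        # Otherwise condition on who wins the next point.
        return p * f(a + 1, b) + q * f(a, b + 1)

    return f(a, b)


print(set_win_prob(13, 12, target=15))  # 0.6875, the 69% above
print(set_win_prob(14, 12, target=15))  # 0.875, the 88% above
```

Changing `p` away from 0.5 is the obvious way to model a lopsided matchup, though picking a sensible value is a separate problem.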

The eagle-eyed reader will notice in this example that the point that ends the game moved the chances of winning by 12 percentage points (88 to 100), whereas the previous point moved the chances by 19 (69 to 88).

"How can a point that wins the game be less important than a point that doesn't win the game?", you might ask. This makes sense when you think about it: being up 14-12 is "safer" than being up 13-12. Losing a point in the first situation and dropping to 14-13 still puts you in a pretty good position to win on the next play. However, losing a point at 13-12 brings you back to a 13-13 tie, and means you now need two points to win. Essentially, your odds of winning the game change more when you score that 14th point, even if you haven't actually won yet.

**

That's a lot of theoretical talk. Let's see what happens when we apply this to an actual volleyball game.

Take the 2010 Canada West men's final, a five-set win by Alberta over Trinity Western. We have the play-by-play, so we can enter the score after each point is scored and track the win probability at every point during the match.


Even in a five-set match, the Spartans' chances of winning were never that great; in fact, they were never better than 50% after their first point of the game. (By convention, the game begins with a 50-50 chance of either team winning, even though this is not strictly true if you have a lopsided matchup.) Alberta won the first two sets, as you can see from the gradual increase in win probability. Then they lost the next two, bringing it back to a 50-50 game. The fifth set was more or less all Alberta, as they quickly went up 10-4 and won 15-9.

This is not groundbreaking research; anyone watching that fifth set would tell you that Trinity, once down 10-4, probably didn't have much of a chance, and it's self-evident that going up two sets to none greatly increased Alberta's chances of winning the match. But it does provide a handy overview of how the game progressed, which is rather helpful if you didn't see the game yourself (as I didn't), and it puts a number on what you're observing as you watch. We know whether any given point is valuable, but "how valuable?" is the question we can now answer.

**

That's the first way we can use this analysis to describe what happened in a game. The second way, in which credit (or blame) for each play is assigned to players, allows us to see who was involved in the important plays. Since I'm working with play-by-play logs from several months ago, I have to assume each point is attributable to just one player (whoever's named in the play-by-play), which is not strictly true* but good enough for now. To that player, we assign the change in win probability for that point (often called Win Probability Added, or WPA).

* Just to clarify: ideally you would divide credit among, and assign blame to, several players on each play. No one player is solely responsible for scoring a point; successful kills are often the result of successful sets, of course. But the game was nine months ago, so I hope you'll accept a simplified version.

Let's look at the first point in the first set of that Canada West final. The point was awarded after an attack error by Jason DeRocco, which moved Alberta's chances of winning from 50% (the beginning of the game) to 48% (down 1-0 in the first set). So DeRocco is "credited" with minus-2 WPA for that play. On the next play, Paul Lindemulder made a service error, and is "credited" with minus-2 WPA (as the game went back from 48% to 50%). Assign the WPA for every play, and add up everyone's totals, and you get something like this:

WPA (2010 Canada West final)
Alberta
+29 Jarmoc
+29 DeRocco, J
+28 Merta
+27 Lidster
+4 Leiske
-2 Alberta TEAM
-15 DeRocco, M

Trinity
+23 Doornenbal
+22 Verhoeff
+18 Marshall
+10 Howatson
+0 Ball
-3 Kufske
-5 Offereins
-5 Lindemulder
-9 Schalk
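
The bookkeeping behind a table like this is simple enough to sketch in a few lines of Python. This is only an illustration, not the spreadsheet I actually used: the function name `assign_wpa` and the `(player, team, Alberta's win probability after the play)` layout are assumptions, and the sample data is just the two plays described above. Each player is credited with the change in his own team's chances, so a play that helps Alberta counts against a Trinity player.

```python
from collections import defaultdict


def assign_wpa(plays, start=50.0):
    """Sum Win Probability Added per player.

    `plays` is a list of (player, team, alberta_wp_after) tuples, where
    alberta_wp_after is Alberta's win probability (in percent) after the play.
    Assumption: exactly one player gets all the credit or blame for each point.
    """
    totals = defaultdict(float)
    previous = start                       # by convention the match starts at 50-50
    for player, team, alberta_wp_after in plays:
        delta = alberta_wp_after - previous
        # An Alberta player gets Alberta's change; a Trinity player gets the opposite.
        totals[player] += delta if team == "Alberta" else -delta
        previous = alberta_wp_after
    return dict(totals)


# The first two points of the match, as described above.
plays = [
    ("DeRocco, J", "Alberta", 48.0),   # attack error: Alberta drops from 50% to 48%
    ("Lindemulder", "Trinity", 50.0),  # service error: Alberta climbs back to 50%
]
totals = assign_wpa(plays)
print(totals)  # {'DeRocco, J': -2.0, 'Lindemulder': -2.0}
```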

What this means is that four Alberta players were involved in most of the plays that led to Alberta's win, since their contributions were all worth at least 25 points in WPA. This doesn't mean that each of those players is worth a quarter of a win, or even worth more than their teammates, because WPA depends on the situations in which players find themselves. It just means they were, on the whole, involved in plays that were very beneficial to their team.

There are some other interesting results. Mike DeRocco, despite being on the winning team, still cost Alberta 15 in WPA (so for him it's more like "Win Probability Subtracted"). There's never a good time to commit a service error (or four), but his errors came at crucial times: for example, when the Bears were down 21-20 in the third set. We can also confirm that Josh Doornenbal and Rudy Verhoeff performed well in the situations given to them, albeit in a losing cause.

This was also a game to forget for Mikiah Schalk: not only did his Spartans lose in five sets, but Schalk, through his minus-9, was involved in more plays that helped the Bears win than a couple of Bears were.

Also note that Alberta's total adds up to 100%, and TWU's adds to about 50% (the rounded figures above sum to 51), giving Alberta an edge of +50. This is not a coincidence: Alberta won the game, so the net edge in their favour has to be +50, the distance from the 50-50 starting point to the final result (once you win, you obviously have a 100% chance of winning!).

Again, though, this isn't a value judgment. It doesn't mean those players are worth a quarter or more of a win in this game, but that they were involved in plays that were worth that much.

**

We can do this sort of analysis for any game that has play-by-play (looking at you, OUA). And in fact, there's one more thing you can do with this analysis: figure out which situations are the most crucial--that is, when you really don't want to make a mistake--and by exactly how much. To do that, we simply figure out the change in win probability between winning and losing the next point.

For example, winning a point when it's 23-23 in the first set is 4.4 times as important as winning the first point in that set. We'll call that ratio the Leverage Index (LI), so this situation has an LI of 4.4. The LI for a 23-23 tie (or 24-24, or 25-25, and so on) in a 1-1 match is 5.8.

And if you're tied at 13 (or above) in the fifth set, winning that point is worth nearly twelve times as much (LI of 11.6). So I think we've handled the "points at crucial times should be worth more" requirement. (And for your enjoyment, here is the Leverage Index for all volleyball game situations.)
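
If you want to play with Leverage Index yourself, here is a rough Python sketch. It rests on assumptions that may not match my actual calculations exactly: both teams win every point with probability 0.5 regardless of serve, sets one to four go to 25 and the fifth to 15 (all win by two), and LI is the swing in match win probability on the next point divided by the swing on the very first point of the match. All function names are made up for illustration.

```python
from functools import lru_cache

P = 0.5      # assumed chance our team wins any single point
Q = 1.0 - P


@lru_cache(maxsize=None)
def set_win_prob(a, b, target):
    """Chance of winning the current set from a-b (first to `target`, win by two)."""
    if a >= target and a - b >= 2:
        return 1.0
    if b >= target and b - a >= 2:
        return 0.0
    if a >= target - 1 and b >= target - 1:
        # Late in the set only the margin matters, so 24-24, 25-25, ... are one state.
        tied = P * P / (P * P + Q * Q)
        if a == b:
            return tied
        return P + Q * tied if a > b else P * tied
    return P * set_win_prob(a + 1, b, target) + Q * set_win_prob(a, b + 1, target)


def set_target(sets_a, sets_b):
    """Points needed in the current set: 15 in a deciding fifth set, otherwise 25."""
    return 15 if sets_a + sets_b == 4 else 25


@lru_cache(maxsize=None)
def sets_win_prob(sets_a, sets_b):
    """Chance of winning a best-of-five match at the start of a set."""
    if sets_a == 3:
        return 1.0
    if sets_b == 3:
        return 0.0
    return match_win_prob(sets_a, sets_b, 0, 0)


def match_win_prob(sets_a, sets_b, a, b):
    """Chance of winning the match from a given set count and score."""
    s = set_win_prob(a, b, set_target(sets_a, sets_b))
    return (s * sets_win_prob(sets_a + 1, sets_b)
            + (1.0 - s) * sets_win_prob(sets_a, sets_b + 1))


def swing(sets_a, sets_b, a, b):
    """Difference in match win probability between winning and losing the next point."""
    return (match_win_prob(sets_a, sets_b, a + 1, b)
            - match_win_prob(sets_a, sets_b, a, b + 1))


def leverage_index(sets_a, sets_b, a, b):
    """Swing on the next point relative to the first point of the match."""
    return swing(sets_a, sets_b, a, b) / swing(0, 0, 0, 0)


print(round(leverage_index(0, 0, 23, 23), 1))  # 4.4: 23-23 in the first set
print(round(leverage_index(1, 1, 23, 23), 1))  # 5.8: 23-23 in a 1-1 match
print(round(leverage_index(2, 2, 13, 13), 1))  # 11.6: 13-13 in the fifth set
```

Under these assumptions the sketch lands on the same three values quoted above, which is reassuring, but treat it as a toy rather than the definitive version.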

