I would like to point out five common misconceptions about stats that I have noticed on Twitter. This is a mostly general discussion about how people can come to the wrong conclusion about stats.
1) How can a 4th liner have better numbers than a 1st line player?
Say 4th liner Bart Simpson has better shot and expected goals numbers than 1st liner Lisa Simpson. Some people will see this and immediately question the stat. “How can anyone take this stat seriously? I mean, bottom-6 Bart is better than top-6 Lisa? This is stupid.” This is one of the bigger misconceptions I notice among the general public. One sees that a 4th liner has better possession numbers than a 1st liner and infers that the numbers say the 4th liner is a better overall player than the 1st liner. This is false.
If a 4th liner has good possession numbers, it most likely suggests they are performing well in their 4th line role.
If a 1st liner has poor possession numbers, it suggests they are under-performing in their 1st line role.
This does not indicate the 4th liner is a better hockey player than the first liner. Instead, it suggests the 4th liner may be ready for a bigger role. Their good numbers could suggest more minutes or a bump up to the third line. Likewise, a 1st liner with poor numbers may simply require a lighter workload. Maybe less responsibility, or a 2nd line role would fit them better.
These were hypothetical scenarios. The point is, a player’s possession numbers mostly indicate how they are performing IN THEIR CURRENT ROLE.
2) If a player has better numbers in one stat, it means they are the better overall player.
This is something that mostly occurs when someone already has a biased opinion on a particular player. They like Bart Simpson, and they don’t like Lisa Simpson. When they see a stat tweeted showing Bart with a better number than Lisa, they quote that particular tweet with the words “See, Bart is much better than Lisa”.
The problem is, one stat does not tell the whole story on a player. To get a wider picture, one needs to analyze many stats: possession, shot generation, shot suppression, expected goals, goal production, assist production, even strength, power play, penalty kill, etc.
Even if Lisa were the far superior hockey player compared to Bart, I would still be able to find some stat in which Bart was better. One can always cherry-pick stats to make an inferior player appear to be the superior one.
3) CONTEXT – noun – The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed.
One should always try to add context to stats: how a coach uses the player, who they are playing with, who they are playing against, etc. A quick example would be Teuvo Teravainen last season. There was a two-month or so stretch where his offense dried up and his point production was low, but why? Well, he was put on a line with Andrew Desjardins and Phillip Danault and used in a more defensive role. Put any offensive player on a line with Desjardins and Danault, and their offensive production will more than likely start to dry up. This is an example of using context to analyze Teravainen’s drop in production.
In the above scenario, we identified that Teravainen’s scoring was down. Starting from that stat, we analyzed his usage and found he was playing with subpar offensive linemates. The conclusion would be to split up that line and put Teravainen back into a more offensive role with the team.
Currently, Patrick Kane and Jonathan Toews have poorer possession numbers than their career norms. On the surface, one could just make random assumptions based on this. “They’re in decline,” yells a rival fan. “Just look at their possession numbers!”
Okay, but let’s add more context to their poor numbers. They have been playing on the same line together, which is unusual. Their numbers together have been horrible. However, they have produced good possession numbers when they play apart from each other. Right now they look bad because Quenneville tried the nuclear option (putting Toews and Kane together) and there was a meltdown.
Not dogging Q, I would have tried the same exact thing. But we know as they play apart from each other, they should continually improve on their current numbers.
Context is key, especially when dealing with small sample sizes.
4) Five goals in their first 7 games, this guy is having a breakout season and is now a top-6 forward.
Sample size. Sample size. Sample size. These words are repeated over and over because early season stats are extremely volatile. If I go to a basketball court, take 10 three-point attempts, and drain 5 of them, I look like a pro. A 50% three-point shooter, I am good. But as I keep shooting and add to my sample size, I will regress back to my true talent level. Sadly, I am not the next Steve Kerr.
This is why one will always see the “beware the small sample size” tagline for early season stats.
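To put a rough number on the hot-shooting example, here is a minimal sketch in Python. It assumes a true three-point rate of 30% (a made-up figure, not taken from any data) and that every shot is independent. The function name `prob_at_least` is just for illustration.

```python
from math import comb


def prob_at_least(makes: int, attempts: int, true_rate: float) -> float:
    """Binomial probability of making at least `makes` of `attempts` shots."""
    return sum(
        comb(attempts, k) * true_rate**k * (1 - true_rate) ** (attempts - k)
        for k in range(makes, attempts + 1)
    )


# Assumption: a hypothetical 30% shooter, with every shot independent.
TRUE_RATE = 0.30

hot_10 = prob_at_least(5, 10, TRUE_RATE)     # 50% or better on 10 shots
hot_100 = prob_at_least(50, 100, TRUE_RATE)  # 50% or better on 100 shots

print(f"5+ makes out of 10:    {hot_10:.1%}")
print(f"50+ makes out of 100:  {hot_100:.4%}")
```

Under these assumptions, a 30% shooter still looks like a 50% shooter over 10 attempts about 15% of the time, while keeping that up over 100 attempts is far less than a 1% event.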
Richard Panik had 10 points in his first 10 games this season – *beware the small sample size*.
Richard Panik now has 10 points in 22 games.
A bigger sample size is always better. 22 games of data on Richard Panik is better than 10 games. 100 games is better than 22 games. The bigger the sample size, the less volatile the data.
Skaters, goalies, and teams can be streaky. When analyzing the first 10 games of the season, Richard Panik would appear to be a top-6 forward and a point-per-game player. But we have to add context.
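One rough way to see why more games help: if each game is treated as an independent draw (an assumption, since real games are not), the typical wobble in a per-game average shrinks with the square root of the number of games. The sketch below uses a hypothetical game-to-game standard deviation of 0.6 points, chosen purely for illustration.

```python
from math import sqrt


def standard_error(per_game_sd: float, games: int) -> float:
    """Typical error of a per-game average over `games` independent games."""
    return per_game_sd / sqrt(games)


# Assumption: a hypothetical game-to-game standard deviation of 0.6 points.
PER_GAME_SD = 0.6

for games in (10, 22, 100):
    error = standard_error(PER_GAME_SD, games)
    print(f"{games:>3} games: +/- {error:.2f} points per game")
```

Going from 10 games to 100 cuts the wobble by a factor of about three (the square root of 10), not ten, which is why the data settles down slowly.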
- It’s a small sample size.
- His career numbers are 57 points in 203 games, well below the point-per-game pace he started the season on.
- At age 25, he probably won’t improve much over his previous output. He has proved to be a quality bottom-6 forward.
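The paces in this example are easy to check. Here is a minimal sketch in Python that uses only the point and game totals quoted above. The helper name `points_per_game` is just for illustration.

```python
def points_per_game(points: int, games: int) -> float:
    """Scoring pace as points divided by games played."""
    return points / games


# (points, games) totals quoted in the text above.
stretches = {
    "first 10 games": (10, 10),
    "first 22 games": (10, 22),
    "career": (57, 203),
}

paces = {label: points_per_game(p, g) for label, (p, g) in stretches.items()}

for label, pace in paces.items():
    print(f"{label:<15} {pace:.2f} points per game")
```

That works out to about 1.00, 0.45, and 0.28 points per game, so the hot start was well over three times his career pace.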
One should never get too excited over a 10-game stretch.
5) Highly touted prospects aren’t allowed to have bad stats or be criticized.
There is a young prospect who has a promising future, but they are currently putting up poor numbers and hurting their NHL team. The general public normally does not like to be told the prospect is not playing well.
The best example of this is the Blackhawks’ Gustav Forsling. He is a 20-year-old defenseman that Q loves. Forsling has been playing a lot and it has been hurting the team so far. Opponents are out-shooting and out-chancing Chicago with Forsling on the ice. He’s the only Chicago defenseman with consistently negative possession stats across the board. Shots, scoring chances, expected goals, they are all poor. Opponents are controlling the game when Forsling is on the ice.
Some people will take the above paragraph as a personal attack on Forsling, but it’s not. He is playing poorly and hurting the team, but he’s only 20 years old. He could very well develop into a quality top-4 defenseman. But right now, he doesn’t appear to be ready for the NHL. Players like Erik Gustafsson, Trevor Van Riemsdyk, and possibly Ville Pokka would more than likely be outperforming Forsling’s current output. This doesn’t mean that those three players will be the better players long term. It just means they would more than likely do more to help the team win than Forsling can RIGHT NOW. In the future, obviously Forsling should be a much, much better player than TVR. But right now, even a guy like TVR put up better stats last season than Gustav Forsling is putting up this year.
People tend not to enjoy hearing negative comments about a prospect they like. They can sometimes jump to the conclusion that if a prospect has bad stats, the analytics person is saying they will never be good. This is false. A highly touted prospect may simply not be ready for the role they are currently in.
Gustav Forsling’s bad numbers don’t indicate “he’s a bad player”. Instead, they suggest that he’s not ready for his current role in the NHL. This wouldn’t matter too much if Chicago were a rebuilding team that could afford to have underperforming, developing players in the NHL. But they are actively trying to win a Stanley Cup. They need to field their best possible lineup in order to win.
Stats are an extremely valuable asset when analyzing hockey. However, jumping to the wrong conclusion about a certain stat renders it useless.
When you’re sick, you need to go to the doctor to get a proper diagnosis. You’ve probably studied up on the symptoms and already have a good idea what the problem is before visiting the doctor. And then there is your Uncle Jim. Jim hears about your symptoms and suggests leeches are the answer.
Stats are similar. A statistician can give one the proper diagnosis of the data. An informed fan should be able to decipher what’s meaningful and what isn’t. And then there is Uncle Jim, who sees a stat and jumps to a crazy conclusion. “More Hits,” yells Jim.
I wish I was better at analogies, but here we are.
Stats aren’t something one can automatically understand. One needs to read and educate themselves if they want to become a more informed fan. It takes time. It takes seeing a stat more and more and becoming more familiar with it. This isn’t for everyone. Uncle Jim doesn’t like stats, and that’s fine. Advanced stats can take some of the emotion out of the game. By attempting to quantify everything, some of the magic can disappear.
Hopefully, I wrote something today that made those who want to learn become more informed fans.