What Does Regression Mean?

When I was seven, I picked up my orange basketball and began the eight-block stroll to the local basketball court. After stretching, adjusting my headband, and testing for wind direction, I approached the three-point line. The imaginary game clock started ticking down in my head. 5… 4… 3… 2… I jumped and released my three-point attempt. The ball soared through the air, rotating towards the basket until, swish, nothing but net. I ran off the court in victory. One 3-point shot attempt. One 3-pointer made.

In my excitement, I started going door-to-door to inform the neighborhood that the world’s best 3-point shooter lived in their town. But the townsfolk didn’t take the news with wonderment and awe. No, they slammed the door in my face.

I told every person that I was a 100% three-point shooter. I made every three-pointer I attempted. Still, they laughed, except for Ms. Corsi, known around these parts as the “Crazy Cat Lady”.

She asked me how big the sample size was. How many three-pointers had I attempted in my lifetime to have made 100% of them?

“Well, just the one shot today”, I replied.

She motioned towards the basketball court and said, “Go back tomorrow, shoot some more 3-pointers, and then visit me again”.

The next day, I informed the crazy cat lady that I had made 4 of the 9 three-pointers I attempted. This brought my total to 5 made on 10 attempts, making me a 50% three-point shooter, which is better than Michael Jordan. I was destined for the NBA.

Again, Ms. Corsi told me to go back to the court the next day and each day after that.

I spent that whole summer shooting three-pointers. Each day I would usually miss more than I would make, and my three-point percentage continually dropped. I finished the summer having made only 100 of my 1,000 three-point attempts. That’s only 10% from three-point land. I went from future NBA star to hopeful towel boy. What happened?

Ms. Corsi informed me that as I shot more and increased my sample size, I regressed toward the mean. Being a 100% or 50% shooter was unsustainable for me.

Now that was a made-up story, but it provides a simplified example of regression. Sure, I can make 1 of 1, or 5 of 10 shots. But as I keep shooting more three-pointers, my percentage is going to move towards my actual talent level. That is regression toward the mean.

In statistics, regression toward the mean is defined as the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement. To avoid making incorrect inferences, regression toward the mean must be considered when interpreting data.
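To make that definition concrete, here is a minimal simulation sketch. It assumes a simplified model that is not part of the story: every shooter has the same true 10% talent, every shot is independent, and each shooter takes two separate 10-shot sessions. The function names (`two_measurements`, `average`) are hypothetical helpers invented for this illustration. It picks out the shooters whose first session was extreme (30% or better) and compares their first and second sessions.

```python
import random


def two_measurements(true_rate, shots, shooters, seed=0):
    """Simulate two independent shooting sessions per shooter.

    Assumption: every shot is an independent coin flip with the same
    probability `true_rate` of going in. Real shooters are messier.
    """
    rng = random.Random(seed)

    def session():
        # Fraction of shots made in one session.
        return sum(rng.random() < true_rate for _ in range(shots)) / shots

    return [(session(), session()) for _ in range(shooters)]


def average(values):
    return sum(values) / len(values)


pairs = two_measurements(true_rate=0.10, shots=10, shooters=20_000)

# Keep only the shooters whose FIRST measurement was extreme.
hot = [(first, second) for first, second in pairs if first >= 0.30]

print(f"hot starters: {len(hot)}")
print(f"first session average:  {average([f for f, _ in hot]):.1%}")
print(f"second session average: {average([s for _, s in hot]):.1%}")
```

Under these assumptions, the hot starters were selected for a lucky first session, so their second session tends to land back near the 10% they are actually capable of. Nothing about them changed between sessions; the first number was simply extreme.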

My first measurement of 1 shot made on 1 shot attempt was very extreme. When looking at that data, anyone can easily identify shooting 100% as unsustainable.

My second measurement was a total of 5 shots made on 10 shot attempts. Shooting 50% is more reliable, but it’s still extreme.

Every day I took a measurement, and I slowly regressed to a 10% 3-point shooter. 10% on 1000 shots is a much better representation of my true talent level than 100% on one shot or 50% on 10 shots.
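The effect of sample size can be sketched with a short simulation. This assumes a shooter whose true talent is exactly 10% and whose shots are independent, which is a simplification; the function name `running_percentage` is made up for this example, and a simulated summer will not reproduce the exact 1-for-1 and 5-for-10 start from the story.

```python
import random


def running_percentage(true_rate, attempts, seed=0):
    """Return the cumulative make rate after each simulated shot.

    Assumption: each shot goes in with fixed probability `true_rate`,
    independently of every other shot.
    """
    rng = random.Random(seed)
    made = 0
    rates = []
    for shot in range(1, attempts + 1):
        if rng.random() < true_rate:
            made += 1
        rates.append(made / shot)
    return rates


rates = running_percentage(true_rate=0.10, attempts=1000)
for n in (1, 10, 100, 1000):
    print(f"after {n:>4} shots: {rates[n - 1]:.1%}")
```

After one shot the percentage can only be 0% or 100%, and after ten it can still be far from 10%. The more shots that pile up, the less room a lucky or unlucky streak has to move the overall number.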

Let’s take a look at some hockey examples.

This season, Artem Anisimov has 6 goals on 16 shots on goal. That’s a 37.5% shooting percentage, meaning he is currently scoring on 37.5% of his shots on goal. Anisimov’s career shooting% is 12.4%. Just as my early basketball shooting% was unsustainable, so is Anisimov’s. Over the season, he will regress towards his 12.4% average.

Richard Panik has 6 goals on 12 shots on goal. That’s a 50% shooting%. His career shooting% is 13.6%. As he adds more games and more shots this season, he will be unable to sustain his current pace of scoring on every other shot on goal. He will begin regressing to the mean.

But it isn’t just about lucky players coming back down. It also works for underperforming players.

Patrick Kane has 2 goals on 29 shots on goal. That’s a 6.9% shooting%. For his career, Kane is a 12.4% shooter. Over the course of the season, Kane should score more goals and regress to the mean. In this scenario, regression to the mean is an increase in scoring.

Duncan Keith has zero goals on 24 shots on goal. His career shooting% is 4.8%, so one could expect Keith to light the lamp soon.
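The arithmetic behind these four examples is just goals divided by shots on goal. The sketch below recomputes each player’s current shooting% from the numbers quoted above and shows how many goals the same shots would have produced at his career rate. The helper names (`shooting_pct`, `goals_at_career_rate`) are hypothetical, and the “goals at career rate” figure is only a rough yardstick, not a forecast.

```python
# (player, goals, shots on goal, career shooting %) as quoted in this article
players = [
    ("Artem Anisimov", 6, 16, 12.4),
    ("Richard Panik", 6, 12, 13.6),
    ("Patrick Kane", 2, 29, 12.4),
    ("Duncan Keith", 0, 24, 4.8),
]


def shooting_pct(goals, shots):
    """Shooting percentage: goals per shot on goal, expressed as a percent."""
    return 100.0 * goals / shots


def goals_at_career_rate(shots, career_pct):
    """Goals the same number of shots would yield at the career shooting%."""
    return shots * career_pct / 100.0


for name, goals, shots, career in players:
    current = shooting_pct(goals, shots)
    expected = goals_at_career_rate(shots, career)
    # Above the career rate suggests a drift down; below suggests a drift up.
    direction = "down" if current > career else "up"
    print(
        f"{name:<15} {current:5.1f}% now vs {career:4.1f}% career "
        f"({goals} goals vs {expected:.1f} at career rate): expect drift {direction}"
    )
```

Anisimov and Panik sit well above their career rates, while Kane and Keith sit below theirs, which is the same split described in the paragraphs above.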

Lastly, let’s look at NHL teams’ even-strength shooting percentages so far this season. Now that we know about regression, we can see which teams are bound to come back to earth.

Above, you will see all 30 teams ranked by 5v5 shooting%. As a reference, I added three lines. The top black line marks the New York Rangers’ league-best 9.01% shooting% from last season. The middle black line is the 2015-16 NHL league average. The bottom line marks the NHL-worst Toronto Maple Leafs, who had a shooting% of 6.36% last season. Using these, we can guess that a team probably won’t finish this season too far outside either the top or bottom line.

The Minnesota Wild have been extremely lucky. There is a very slim chance they shoot around 14% for the whole season. The Wild’s true talent level probably isn’t being represented by their current record.

On the unlucky end are the Nashville Predators. They are shooting around 4%, which shouldn’t last long. Their current record of 2-5-1 doesn’t represent Nashville’s true talent level. The Predators will regress to the mean and start scoring more soon.

And this is why stat enthusiasts despise early-season hot takes, just like my neighbors grew angry when I knocked on their doors to tell them I was the world’s greatest three-point shooter.