Unconventional Wisdom: Pearson Correlation Coefficients part 2

Last time I took a look at the all situation data for the past nine NHL seasons to find out which statistics best correlate with team success. What we learned in that post was that the past three to four seasons have been significantly different from the prior five years. So going forward I decided to only look at the past three seasons, 2021-22 through 2023-24, since that is how long the league has had 32 teams. We are still looking at team data, but now we are breaking it down into even strength, power play, and penalty kill situations. In addition, I have included the all situation goaltending data from the past three seasons to try to see how an individual player impacts the numbers.

One thing I forgot to point out last time that we must keep in mind: correlation does not imply causation. Just because two values are correlated does not tell us for certain that one caused the other; they could both be influenced by some other factor that we are not analyzing. And even if one did cause the other, the correlation does not tell us which is the cause, just that both trend in a similar direction at the same time. Also, while we can see which of the expected goals models more closely correlates with actual goals, that does not necessarily mean that model is more accurate at predicting goals. It could be off by a significant percentage, but if it is consistently off by a similar amount then the correlation will still be strong, even though it doesn’t tell us who is best at estimating scoring. However, since our purpose in using the data is to get an idea of who is more likely to win, we don’t need to predict the exact number of goals but rather who is likely to score more of them.
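To make the difference between correlation and accuracy concrete, here is a minimal sketch in Python. The goal totals and both "models" are made-up numbers for illustration only, not values from any of the sites, and pearson_r is a hand-written helper for the standard Pearson formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def total_percent_error(predicted, actual):
    """How far the summed prediction is from the summed actual, in percent."""
    return 100.0 * (sum(predicted) - sum(actual)) / sum(actual)

# Hypothetical season goal totals for six teams (illustration only).
actual_goals = [150, 165, 172, 180, 195, 210]

# Model A is always 20% too low, but consistently so.
model_a = [g * 0.80 for g in actual_goals]

# Model B is close in total, but misses each team in a different direction.
model_b = [155, 160, 178, 176, 199, 205]

r_a = pearson_r(actual_goals, model_a)
r_b = pearson_r(actual_goals, model_b)
err_a = total_percent_error(model_a, actual_goals)
err_b = total_percent_error(model_b, actual_goals)

print(f"Model A: r = {r_a:.3f}, total error = {err_a:+.1f}%")
print(f"Model B: r = {r_b:.3f}, total error = {err_b:+.1f}%")
```

Model A misses every team by 20% yet correlates perfectly with actual goals, while model B lands within about 0.1% of the total but has the lower correlation (roughly 0.97). That is why the correlation tables and the over/under percentages in this post can rank the sites differently.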


5-on-5 Situations

The chart itself looks more daunting than it actually is, so let’s go over what it shows us. First, the core stats that we compare against the rest of the data: P is points in the standings, GF is goals for, GA is goals against, and GF% is goals for percent, which serves as our goal differential. Then offensive puck possession: SF is shots on goal for, FF is unblocked shot attempts for (Fenwick), and CF is all shot attempts for (Corsi). Offensive expected goals is still a puck possession metric, but we have four different sites’ values to compare: PA is Puckalytics xGF, MPF is MoneyPuck xGF, NSTF is Natural Stat Trick xGF, and EHF is Evolving Hockey xGF. Then we have the same thing for defensive puck possession: SA is shots on goal against, FA is unblocked shot attempts against, and CA is all shot attempts against. For defensive expected goals we likewise have PAA for Puckalytics xGA, MPA for MoneyPuck xGA, NSTA for Natural Stat Trick xGA, and EHA for Evolving Hockey xGA. Then puck possession differentials: SF% is shots for percent, FF% is Fenwick for percent, and CF% is Corsi for percent. Lastly the expected goal differential: PA% is Puckalytics xGF%, MP% is MoneyPuck xGF%, NST% is Natural Stat Trick xGF%, and EH% is Evolving Hockey xGF%. The rows are split by season, while the 3 year data covers 2021-22 through 2023-24 and the 2 year data covers 2022-23 and 2023-24.
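For anyone unfamiliar with the percent columns, here is a minimal sketch of how a "for percent" stat is usually calculated, assuming the common for / (for + against) definition. The post does not spell out the formula, and the team totals below are hypothetical.

```python
def for_percent(events_for, events_against):
    """Share of events that belong to one team, as a percentage.

    Works the same way for GF%, SF%, FF%, CF%, and xGF%.
    """
    total = events_for + events_against
    if total == 0:
        return None  # no events recorded, so the share is undefined
    return 100.0 * events_for / total

# Hypothetical 5-on-5 season totals for one team (illustration only).
team = {"GF": 180, "GA": 150, "SF": 2100, "SA": 1900}

gf_pct = for_percent(team["GF"], team["GA"])
sf_pct = for_percent(team["SF"], team["SA"])

print(f"GF% = {gf_pct:.1f}, SF% = {sf_pct:.1f}")
```

Anything above 50 means the team had more of that event than its opponents, which is why these columns work as differentials.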

Core Stats

As expected, goal differential has the strongest correlation with points in the standings. 5-on-5 GF% accounts for about 90% of the team’s points; whoever is outscoring their opponent tends to win the game. What is interesting though is that while goals for has a strong correlation with standings points, in the 2 and 3 year data there is a very strong correlation with goals against, particularly because of the 2022-23 season. We see this continue in the 5-on-5 GF%, where GA has a stronger correlation than GF. So just like last time, we learn that in recent years defense seems to matter more than offense. Not that offense isn’t important too, it’s just that we do see a benefit to focusing on preventing your opponent from scoring.
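A note on reading these percentages: if the chart values are Pearson r values written as percentages (an assumption, since the post does not say), then the share of the variation in one stat that a straight-line fit on the other explains is r squared, which is always smaller than r. A minimal sketch:

```python
def variance_explained(r):
    """Coefficient of determination for a simple linear fit: r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson correlation must lie between -1 and 1")
    return r * r

# The roughly 90% quoted for 5-on-5 GF% vs points, read as r (assumption).
r_gf_pct_points = 0.90
r2 = variance_explained(r_gf_pct_points)

print(f"r = {r_gf_pct_points:.2f} -> r squared = {r2:.2f}")
```

Under that reading, a correlation of 0.90 corresponds to about 81% of the variation in points. Since every stat in the chart is compared the same way, the ranking of the metrics does not change either way.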

Offensive Puck Possession

Over the past three seasons Fenwick for has been slightly better than the other offensive puck possession metrics at influencing points in the standings as well as goal differential. However, shots on goal for has a stronger correlation with goals for. So what this seems to tell us is that shots on goal more strongly influence goal scoring, which makes sense because a missed shot will never be a goal. Unblocked shot attempts, though, are more useful for determining the ability to outscore opponents and win games, as they suggest you have the puck in your opponent’s end more often and are thus slightly less likely to get scored on. They are close enough that it doesn’t make much difference, so just pick whichever you prefer, as long as you stay away from Corsi for.

Offensive Expected Goals

The most effective xGF over the past three seasons has been the one used by Puckalytics, which is also what we saw with the all situation data. It has the strongest correlations with points in the standings, goals for percent, goals for, and even some influence on goals against. The rest of them are pretty close though, so ultimately you can pick whichever you prefer. Over 3 years, MoneyPuck is slightly better at determining points whereas Evolving Hockey has the edge over 2 years for both points and goals for. However, MoneyPuck has a weaker correlation with GF% than the others, perhaps due to its lower correlation with goals against, so Natural Stat Trick comes out ahead over 3 years while Evolving Hockey is better over 2 years.

On the other hand, if we look at which of the sites is better at predicting goals, we find that Puckalytics tends to underestimate scoring whereas the others usually overestimate it. Over the past 3 seasons Natural Stat Trick was remarkably close, overestimating actual 5-on-5 goals by a mere +0.01%. They weren’t as accurate over a single season, ranging from -3.3% back in 2021-22 to an average of +1.8% these past two seasons, but over the three year span they were damn near spot on. MoneyPuck was at +1.2% and Evolving Hockey at +1.5% over the past three seasons, while Puckalytics was under at -1.6%. However, in 2021-22 everybody was under by a couple percent, so if we look at the 2-year data the most accurate was Puckalytics at -1.1%, then Natural Stat Trick at +1.8%, MoneyPuck at +2.5%, and Evolving Hockey at +3.7%. So it is interesting that, despite how well they correlate with scoring, they can be that far off at predicting actual goals.
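The over and under percentages above compare a site’s expected goals with what was actually scored. Here is a minimal sketch of that arithmetic, assuming the comparison is made on summed totals as (expected minus actual) divided by actual. The post does not state the exact formula, and the site names and totals below are hypothetical.

```python
def percent_error(expected_total, actual_total):
    """Signed error of an expected goals total relative to actual goals.

    Positive means the model overestimated scoring, negative means it
    underestimated.
    """
    if actual_total == 0:
        raise ValueError("actual_total must be non-zero")
    return 100.0 * (expected_total - actual_total) / actual_total

# Hypothetical league-wide 5-on-5 totals (illustration only).
actual_goals = 5000.0
site_xg = {"Site A": 5100.0, "Site B": 4925.0}

errors = {site: percent_error(xg, actual_goals) for site, xg in site_xg.items()}

# The most accurate site is the one with the smallest absolute error.
closest = min(errors, key=lambda site: abs(errors[site]))

for site, err in errors.items():
    print(f"{site}: {err:+.1f}%")
print("Closest to actual:", closest)
```

Because overestimates and underestimates cancel out when totals are summed, a model can look nearly perfect over three seasons while still being several percent off in any single season, which matches what we saw with Natural Stat Trick.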

Defensive Puck Possession

One thing we notice is that the defensive puck possession numbers actually have a strong correlation with points in the standings. This tracks with what we noticed about goals against being more important in recent years. We are in the era of defense and strong goaltending; you can no longer survive by being a defense optional team that relies on outshooting and outscoring your opponents. Over the past 2 to 3 seasons FA was better at predicting your ability to outscore opponents and win games, although SA had the edge in its ability to accurately predict GA. The least useful is CA, except in that it has the strongest correlation with GF, which tells us that if you face fewer attempts against, you likely have the puck more often than your opponent and thus are more likely to score. Ultimately you can use shots or Fenwick, whichever you prefer, or even continue to look at both.

Defensive Expected Goals

Over the past 2 to 3 seasons the xGA model used by Evolving Hockey has been the best at predicting team success via points in the standings or goal differential, although Puckalytics is slightly ahead over the 3 year span of xGA vs GF%. However, over 3 seasons it is MoneyPuck who is best able to predict actual GA, but they are less effective than everybody else at telling us who is going to score more and win games. That is most likely because their model doesn’t account for the small influence xGA has on a team’s offensive output, e.g. if your opponent doesn’t have possession it increases your likelihood of scoring. They are close enough though that it doesn’t make a big difference which one you choose to use, but if given the choice Evolving Hockey and Puckalytics may be your best bet.

Puck Possession Differential

What we see here is that SF% is the preferred metric for determining whether you outscore your opponent, holding its edge with both GF and GA, and for determining who wins the match. FF% is close, so of course there is nothing wrong with using unblocked shot attempts rather than shots on goal if you wish to have a larger sample of data points to work with. But we do see that, unlike in the early days of analytics, there is no good reason to use Corsi. Not that there ever was, as people misused it all the time back then as well, but now it has even lost its utility as a proxy for puck possession and zone time.

Expected Goal Differential

Over the past 2 years it is Puckalytics whose xGF% is best able to predict points in the standings. However, for the 3 season data it is Evolving Hockey who comes out ahead and also maintains an edge in predicting GF%. When we look at offense and defense though, Puckalytics does the best at predicting GF while Evolving Hockey is best at GA. MoneyPuck is the worst at GF and Natural Stat Trick is the worst at GA. Overall though they are all still so close together that you can use whichever you prefer, although if given the choice Evolving Hockey and Puckalytics may be best while Natural Stat Trick is worst.


5-on-4 Situations

One thing we notice with the power play is that, unsurprisingly, we are entirely focused on offense and do not care about defense. Goals against, as well as the defensive puck possession and expected goals against models, have a weak correlation with our PP data. The exception is that PP GA is very strongly correlated with GF%; however, as PP GF% has merely a moderate correlation with points in the standings, we don’t really care all that much. So all we really care about as far as the PP is concerned is whether we can score goals.

PP Offense

We do see that PP goals for has a moderate to strong correlation with points in the standings, roughly 60% over the past 2 to 3 seasons. This is expected though, as we saw firsthand what happens to the Pittsburgh Penguins when their power play struggles. Of the offensive puck possession models the strongest correlation was with SF, which if you think about it makes perfect sense. Only a shot on goal can score; if you miss the net or the puck is blocked then it is not going in. I think because of the way the PP works, being lopsided with extended O-zone time, we don’t need the advantage of making a proxy for puck possession but rather are better off focusing on who can get pucks through traffic to get a chance on net.

Over the past 2 to 3 seasons MoneyPuck had the clear advantage in PP xGF, with Puckalytics being the next best. This does track with what we saw in the All Situation data when I speculated those two sites focused more heavily on shots on goal and unblocked shot attempts. However, it is close enough that you won’t see a significant difference if you prefer using Natural Stat Trick or Evolving Hockey, even though they are clearly worse at predicting PP GF. If we instead compare how close each site came to actual PP GF, Evolving Hockey overestimated scoring by +9.9% over 2 years and +6.8% over 3 years. Natural Stat Trick overestimated by +5.2% over 2 years and +2.7% over 3 years. MoneyPuck actually underestimated by -3.0% over 2 years and -4.6% over 3 years. The best though was Puckalytics, who underestimated by a mere -0.3% over 2 years and -2.2% over 3 years.


4-on-5 Situations

The penalty kill in general seems to be less important to your place in the standings. Defense is of course most strongly correlated with success, but it is still just a moderate to weak correlation. We do see some benefit from shorthanded scoring though. You aren’t going to outscore your opponents on the PK, but you can add some offense that helps you outscore the opposition overall. Over 3 seasons Puckalytics was best at predicting SH GF at -1.7%, whereas for the past 2 years MoneyPuck was a mere +0.2% off of SH GF. Natural Stat Trick and Evolving Hockey were much further off, underestimating it by 4 to 5% over 3 years. So if you want to examine SH scoring then MoneyPuck and Puckalytics are your best bet for xGF, and shots on goal for puck possession.

PK Defense

We find that PK goals against has a moderate correlation with points in the standings, averaging about 54% over the past 2 to 3 seasons. If we think about it this makes sense: as good as your penalty kill unit is, they are still outmanned by the opponent and at a clear disadvantage, and even the best players can’t hold out forever. The most effective of the puck possession models actually appears to be SA. Of course this makes sense as the opponent has an advantage in PK situations, so it is more important to keep pucks from reaching the net than it is to keep the opponent from possessing the puck. The least effective was CA, which I speculate is because, contrary to popular opinion, blocking shots actually is good defense, especially on the PK. If you block the puck it did not go into the net, and it often means the opponent loses possession and you get a chance to clear the zone.

As for the expected goals models, the clear winner for PK xGA over the past 2 to 3 seasons was MoneyPuck. Puckalytics was close behind, while Natural Stat Trick and Evolving Hockey trail. It does seem to be a common theme that MoneyPuck and Puckalytics do best at special teams. They are all close enough that you can use whichever you prefer, but those two are going to do the best at predicting penalty kill defense. However, unlike what we saw in other situations, it seems that on the PK the expected goals models are merely as good as or slightly worse than shots on goal against. That suggests that when defending against the man advantage we care less about shot quality and more about keeping the puck from getting through traffic. The shot quality of the shooter still matters, but the defenders cannot significantly influence that, so for them it is more about stopping the puck from reaching the net.


Goaltenders

One thing we had to do differently with goaltenders was set a TOI cutoff to limit small sample size outliers. So what we have to work with is 226 goalies who have played over 500 minutes in a single season during the past three years. The core stats we are comparing to are win percent and goals against. The different save percent models are also included in this chart, although since goals against is part of their calculation they will by definition be very strongly correlated with it.
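The TOI cutoff itself is simple to apply. Here is a minimal sketch, assuming each row is one goalie in one season with time on ice in minutes. The field names and the sample rows are hypothetical and not the actual export format of any site.

```python
MIN_TOI_MINUTES = 500  # the cutoff described above

# Hypothetical goalie-season rows (illustration only).
goalie_seasons = [
    {"player": "Goalie A", "season": "2021-22", "toi_minutes": 3100.5},
    {"player": "Goalie B", "season": "2021-22", "toi_minutes": 120.0},
    {"player": "Goalie A", "season": "2022-23", "toi_minutes": 2890.0},
    {"player": "Goalie C", "season": "2023-24", "toi_minutes": 500.0},
]

def apply_toi_cutoff(rows, min_minutes=MIN_TOI_MINUTES):
    """Keep only rows with strictly more than min_minutes of ice time."""
    return [row for row in rows if row["toi_minutes"] > min_minutes]

qualified = apply_toi_cutoff(goalie_seasons)

for row in qualified:
    print(row["player"], row["season"], row["toi_minutes"])
```

In this sketch the same goalie appears once per qualifying season, and a goalie sitting at exactly 500 minutes is dropped because the cutoff is "over 500".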

Goalie Scoring

Goals against is strongly correlated with wins, which in turn is strongly correlated with points in the standings. The ability to limit goals against accounts for about 75% of the goalie’s ability to win games. Sometimes they lose even if they play well because the team in front of them performs poorly, and other times they can get credit for a win even if they have a bad game. But by and large a goalie wins by being able to limit the number of goals that get through to the net. Save percent is strongly correlated with goals against, but only moderately correlated with wins. That makes sense when you think about it, because even the best goalie in the league loses sometimes and even a mediocre backup can pull off a win. There is a very minor difference between the metrics, but the one that factors in unblocked shots against seems to be ever so slightly more useful. Blocked shots are not made by the goalie himself, so perhaps that is why Corsi is lowest, because it is reliant on the team in front of him rather than his own abilities.

Goalie Puck Possession

The goalie himself cannot directly influence puck possession against. To a very limited extent his positioning can impact missed shots, but not enough to make a statistically significant difference. So what we see here mostly tells us about the team in front of the goaltender. The metric that has the biggest impact on wins is unblocked shot attempts; teams that face fewer FA are just slightly more likely to win. However, it is SA which is most strongly correlated with GA: the fewer pucks that get through to the net, the fewer goals that can be scored. The least important is CA, which we can speculate has to do with the fact that blocked shots are actually a good thing defensively, and thus those pucks, having not gone anywhere near the net, are not a factor in determining goals against.

Goalie Expected Goals

As expected goals are a type of puck possession metric it is once again not something the goaltender himself has a direct influence on. To a certain extent his body positioning will influence shot quality as he can limit the angles a shot can be taken from, but it has more to do with the shooter and with the defenders. Puckalytics and Evolving Hockey are most effective at determining the netminder’s ability to win, with Evolving Hockey having an edge over 2 seasons but Puckalytics being better over 3 years. However, when it comes to correlation with actual goals against, it is MoneyPuck who is the best.

The one who is best at estimating actual GA is Puckalytics, whose xGA was off by a mere +0.03% over 3 years and -0.03% over 2 seasons. The next best is Natural Stat Trick who ranged from -1.3% to +0.5% and MoneyPuck who ranged from +0.7% to +1.9%. The worst though is Evolving Hockey who was off by +2.5% over 3 seasons and +4.9% over 2 years. So once again we see that having the best correlation does not mean being the best at predicting the number of goals scored. With all of this in mind it does suggest that we are better off using Puckalytics for evaluating goaltenders.


Conclusion

We mostly come to the same conclusions when looking at the team data broken down as we did when we looked at All Situations. The majority of what happens seems to be in 5-on-5 situations. This is useful to us as when we are evaluating individual skaters we want to be able to see how they perform at even strength because we know the all situation data skews more to making power play specialists look better as they get more goals for and fewer goals against than those who spend a lot of time on the penalty kill.

We also see that in the past few years the league has shifted so that success has more to do with playing effective defense than it does with being able to score. You need to be able to do both, of course, but as we see with the Pens, a team whose systems focus on all out offense while playing poorly in its own end tends to struggle. You can no longer get by on having a superior offense; you now need to play effectively in the D zone to come out ahead in the standings.

Puck possession is a major factor in scoring, both in getting your own goals as well as keeping the opponent’s pucks out of your net. However, we see that shot quality matters more than shot quantity; you have to make your chances count. So a team, like the Pens again for example, that focuses on flinging everything it can at the net will not do as well as a team that utilizes effective scoring chances. Shots on goal are most useful for determining goal scoring, but unblocked shot attempts are better for determining who outscores the opponent and who wins more games.

The different expected goals models all have positives and negatives: some are better at special teams, some more effective for offense, and others more effective for defense. But they are all close enough that you can probably just go ahead and use your favourite and not see a big difference in your evaluations. One thing that may account for differences in the xG metrics is that none of the sites can agree on what counts as 5-on-5. They have different TOI, different shots and attempts, and even differ in the number of goals scored by situation. So this will partly influence the way their proprietary metrics are calculated.

My next step will be to look at the skaters, separated out by forwards and defensemen. That is a bit of a daunting task as three seasons’ worth of data will be several thousand players and I need to make sure the rows line up to be able to compare the different sites’ xG models. I may only look at the 5-on-5 data because the sample size for special teams will be much smaller. We need to do a TOI cutoff like we did with the goalies, and there will be far fewer skaters who have over 500 PP or PK minutes in a single season. Maybe I could use 3-year combined data, but neither MoneyPuck nor Puckalytics allows me to combine multiple seasons, so I would be stuck manually going through and summing hundreds of entries. Still, I am curious to see how the skaters differ by position compared to looking at the team as a whole.
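Both of those chores, lining up rows across sites and summing seasons, could be scripted rather than done by hand. Here is a minimal sketch, assuming each site’s export can be loaded as rows with a player name, a season, and an xGF value. The field names and sample rows are hypothetical, and matching on a lowercased name is a simplification.

```python
from collections import defaultdict

def make_key(row):
    """Match rows on normalized player name plus season."""
    return (row["player"].strip().lower(), row["season"])

def align_sites(site_a_rows, site_b_rows):
    """Keep only player-seasons that appear in both sites, side by side."""
    b_by_key = {make_key(row): row for row in site_b_rows}
    aligned = []
    for a in site_a_rows:
        b = b_by_key.get(make_key(a))
        if b is not None:
            aligned.append({
                "player": a["player"],
                "season": a["season"],
                "xgf_a": a["xgf"],
                "xgf_b": b["xgf"],
            })
    return aligned

def sum_seasons(rows, field):
    """Add up one field across all seasons for each player."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["player"].strip().lower()] += row[field]
    return dict(totals)

# Hypothetical exports from two sites (illustration only).
site_a = [
    {"player": "Skater One", "season": "2022-23", "xgf": 10.5},
    {"player": "Skater One", "season": "2023-24", "xgf": 12.0},
    {"player": "Skater Two", "season": "2023-24", "xgf": 8.0},
]
site_b = [
    {"player": "skater one ", "season": "2022-23", "xgf": 11.0},
    {"player": "Skater One", "season": "2023-24", "xgf": 11.5},
]

aligned = align_sites(site_a, site_b)
combined_a = sum_seasons(aligned, "xgf_a")
combined_b = sum_seasons(aligned, "xgf_b")

print(len(aligned), "matched player-seasons")
print(combined_a, combined_b)
```

Real data would need more care than this: sites spell some names differently, two players can share a name, and traded players may be split across teams within a season, so a key built from name alone is only a starting point.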

Author: TKNoodle

I write about hockey, mostly focused on the Pittsburgh Penguins but will occasionally write about Vegas Golden Knights and Seattle Kraken. NHL, AHL (Wilkes-Barre/Scranton Penguins, Henderson Silver Knights, and Coachella Valley Firebirds), ECHL (Wheeling Nailers), and various prospects from my teams playing in Europe (SHL, Liiga, KHL, etc...), the Canadian Major Juniors (OHL, QMJHL, and WHL), and in the NCAA (and some in lower tier juniors prior to joining the NCAA).
