Team-by-team 2021/22 season preview series: How much to read into it all?
This is the last introductory piece to the whole season preview series, I swear. It’s coming eventually, I promise. I know this whole build-up has been excruciatingly long and exhausting; trust me, it feels that way for all of us. But while I mostly just begged you for your support initially, this second piece is actually an ultra-essential guide on what to make of what’s coming, and where the pluses and minuses of my approach lie.
I mentioned in the first introductory piece that my sort of a season preview will look to build on your existing knowledge instead of trying to instill it from scratch like some other season previews do. That’s not to say my season previews will be any better, they will just look to provide something different.
That also means my sort of a preview typically ignores all the readily available public data like goals, assists and clean sheets, i.e. simply the whole Fortuna:Liga database that’s just a few clicks away for all of us. This is, again, not to dismiss all the mainstream stats completely (though we’ll get to their natural limits in a bit); there’s just no point in banging on about them. Everybody knows (or can know if they care) how many points Sima has delivered, but do they know how often he stretches teams with a cross-field pass? That’s where I come in. You may not find such data interesting or even relevant, but you can’t argue against them bringing something different to the table, can you?
Another reason why I’m choosing to largely ignore these public stats (and this is where I lose some of you) is that they are too often out of one player’s control and thus have very limited value. During 2020/21 you could see me lament that a goalkeeper had cracked the official Team of the Week shortlist (being a Top 3 goalkeeper that round) despite not making a single save or making just one routine stop when facing a hopeful long-range shot. This would happen on numerous occasions, and the reason why he got in was plain and simple: a clean sheet. Damn clean sheet. Same with assists; can you really call Vakho Chanturishvili “unproductive” when he has produced multiple Grade A chances only to be let down by his teammates? I mean, you can (on some level he’s definitely not prolific), but you shouldn’t use it as a stick to beat him with. Again, this is a case of pure luck that’s out of one player’s control.
Now, I know that results ultimately matter the most (to owners anyway), which in turn makes the player’s end product matter more than anything else to owners and fans alike. But listen, players get hot for a while, they go cold for another while, and typically, during those hot stretches they rack up points while during the cold ones they don’t. You don’t need me to tell you this, obviously; it’s no secret or rocket science. But it’s worth repeating. And to be clear, there’s definitely some value in a player who’s able to navigate those cold stretches and keep them relatively short or productive. Consistent production is no small feat, but again: those players have typically posted strong underlying numbers without stopping, they just briefly ran out of luck.
The challenge, of course, is to come up with an alternative set of metrics that would largely take Lady Luck out of the equation. Firstly, is it even possible? Short answer: no. Slightly longer answer: it’s certainly worth trying.
Below are a few caveats that apply to metrics I use as much as to any other:
Beware the sample
I’m sure you’ve heard this one before, and I’m sure no one would dispute that sample size matters and that you can’t compare a 30-game mainstay with a guy who got to play a mere 200 minutes in the league. I do the most basic sample size control by drawing the line at 900 minutes (i.e. 10 full games) and I have a zero tolerance policy at that, so sincere apologies to Ladislav Kodad and Zdeněk Folprecht who, even including added time (my bit of benevolence to those *just below the line*), sadly topped out at 884 and 895 mins respectively and won’t feature. But even that is not enough to clear all doubts. You should still note the difference between 20 and 30 games; the difference between mostly starting and mostly coming off the bench to face tired opposition; the vast difference one favourable matchup vs Příbram can make to your attacking numbers as part of a smaller sample, etc.
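To make the cutoff concrete, here is a minimal sketch of that 900-minute filter. The two minute totals come from the paragraph above; the function name, the third player and his minutes are made up purely for illustration, and this is not the actual code behind the previews.

```python
MIN_MINUTES = 900  # the line drawn above: 10 full games


def qualifies(minutes: int, threshold: int = MIN_MINUTES) -> bool:
    """Zero tolerance: a player needs at least `threshold` minutes to feature."""
    return minutes >= threshold


# League minutes, added time included.
players = {
    "Ladislav Kodad": 884,
    "Zdeněk Folprecht": 895,
    "Hypothetical mainstay": 2700,  # invented entry, not a real player
}

# Keep only the players who clear the cutoff.
qualified = [name for name, minutes in players.items() if qualifies(minutes)]
print(qualified)
```

Only the invented mainstay survives the filter here; both real names fall just short, exactly as described.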
Perhaps the trickiest group to navigate is, as ever, the goalkeeping crop. While other defensive positions are largely untouched by this, goalkeepers’ numbers tend to suffer the bigger their sample gets. The more games you start, the more opportunity you have to rack up numbers, of course, but in the case of most goalkeepers, it works the other way, and they end up using those extra starts to fuck up their previously great numbers. Without delving too much into it (and spoiling the actual previews), this was the case with Florin Niță and, to a lesser extent, it has dented Ondřej Kolář’s stunning numbers too (his one crazy matchup with Příbram, in fact, worked heavily against him).
It’s perhaps no coincidence that 3 of the 6 top goalkeepers according to my model, which uses a weighted mix of advanced metrics, started 13, 12 and 12 games, whereas the noted workhorse Kolář fares surprisingly poorly and lands in the 27.6th percentile (i.e. he only graded out better than 27.6% of custodians with 900+ mins under their belt). Does that mean Kolář is a below-par goalkeeper? Obviously not. But does it mean my model is useless? I’d argue it does not either, because this is where we get to the second point…
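For anyone wondering how a weighted mix of metrics can turn into a percentile, here is a rough sketch. Everything in it is an assumption made for illustration: the metric names, the weights, the 0-100 scale and all the scores are invented, and counting the player himself in the pool is just one of several possible conventions. It is not the model’s real recipe.

```python
def weighted_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metric values (each assumed to sit on a 0-100 scale)."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * weight for name, weight in weights.items()) / total_weight


def percentile(score: float, pool: list[float]) -> float:
    """Share of the pool (in %) that scored strictly below `score`."""
    return 100 * sum(1 for other in pool if other < score) / len(pool)


# Invented weights: shot-stopping counts double.
weights = {"shot_stopping": 2, "sweeping": 1, "distribution": 1}
# Invented goalkeeper profile.
keeper = {"shot_stopping": 80, "sweeping": 60, "distribution": 40}

keeper_score = weighted_score(keeper, weights)

# Invented scores for every goalkeeper with 900+ minutes, our man included.
pool = [50.0, 60.0, keeper_score, 80.0]
keeper_percentile = percentile(keeper_score, pool)
print(keeper_score, keeper_percentile)
```

The takeaway is that a percentile only says how a player ranks within the qualifying pool; it says nothing about how big the gaps between players are.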
Beware the context
As with any arbitrary model that strives to be objective, mine has a significant drawback too in that it cannot capture the nuances and proverbial intangibles. It’s clear that starting in goal for Slavia is a very different job to just about any other goalkeeping gig in the Czech top flight, which shows here as well. While Slavia coaches may value sweeping and distributing qualities above everything else (or at least value them more than most coaches out there), I cannot possibly put more emphasis on these skills compared to the traditional goalkeeping skill of… you know, keeping balls out of the goal. Then again, you could well argue Kolář did that brilliantly as well (hence all the clean sheets and consecutive games without conceding), but he saw relatively little action, which is where the “sample argument” comes back into play.
It is a minefield when trying to objectively evaluate goalkeepers especially, trust me, which is why my model is just one of numerous tools I myself use. The so-called “eye test” (what you see live with your eyes) remains the most important tool all of us have at our disposal, and this whole piece is not trying to change that one bit; it’s more that different data sets, different analytical models, different metrics may point to an error in your judgement, or conversely, back up your arguments why the said player is better than anyone else in one particular thing. It’s all about interpretation because numbers themselves mean precious little.
So when you’re reading the previews, don’t focus on the overall percentiles too much. Those who come out on top should, in theory, have fewer holes in their game/be more effective in all phases of the game than anyone else in their position which, in practice, actually passes the eye test more often than not. But even when your favourite player doesn’t sit in the comfortable 90+ percentile, it doesn’t mean he’s useless. It might mean he’s a one-trick pony; but a very good one. It quite possibly means he’s not a two-way winger or a very constructive holding midfielder, but that doesn’t take anything away from the fact that he remains a bloody dangerous/defensively sound option.
Sometimes, the context also equals opportunity. Kolář not facing too many shots (hence not preventing too many goals in the strictest, comparative sense) is one thing; but him not leaving the goal-line too often to claim a cross is another such example. He’s dead last in the league in that respect, which isn’t just caused by his passivity (a legitimate concern at times, mind) but mostly by the fact his teammates simply don’t allow too many balls to enter his territory. Similarly, his passing accuracy over various distances is actually not as shiny as you’d expect given his pedigree, which is simply because he’s on the ball way more often than others. Kolář misplaces the most passes inside his own half per 90 mins, which knocks him down in the eyes of my model but not in the eyes of Slavia coaches who want him to be inventive, adventurous.
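A quick aside on “per 90 mins”, since the term keeps popping up: it simply rescales a raw count to a full match’s worth of playing time, so players with different minutes can be compared. A minimal sketch with invented numbers (the function name and both sets of figures are made up, not real data for any goalkeeper):

```python
def per_90(count: float, minutes: float) -> float:
    """Rescale a raw season count to a rate per 90 minutes played."""
    return count * 90 / minutes


# Invented example: 30 misplaced passes over 2700 minutes (30 full games).
busy_keeper = per_90(30, 2700)
# Invented example: 15 misplaced passes over 900 minutes (10 full games).
part_timer = per_90(15, 900)
print(busy_keeper, part_timer)
```

Note what the rate does not fix: a goalkeeper who is asked to play more passes will misplace more of them per 90 as well, which is exactly the context problem described above.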
So yeah, context matters a lot, and the context sometimes equals the club you play for simply because your surroundings ultimately provide you with the opportunities. It shows in other positions too; a Příbram centre back will always block more shots than a Slavia centre back, and a Sparta striker will always miss more high-danger chances than an Opava striker. Quite naturally.
I already have a few ideas for how to narrow the field, but as of now, for this pilot season, such caveats (and my interpretations below all the pizza charts) will have to do. Don’t believe anything you see straight away and never take any number at face value. That’s not how any stats, any data work, and my model, however meticulously constructed (and trust me, I’ve gone back and forth on many of these things), won’t be any different. Not now, not ever.
And the bottom line is: goals, assists and clean sheets are a product of circumstances, opportunity and sample just as much as any other stat, so it’s not like you could lean on them without giving a thought to all of the above.
Or at least you shouldn’t.