Numbers game: How stats could save us
Putting an exclamation point on an embarrassing run of poor form, Australia’s ineptitude in the second Test against South Africa raised enough questions to fill Bellerive Oval. Unfortunately, national selectors don’t have enough answers to fill out a decent starting XI.
Five consecutive Test losses, stretching back to the series opener against Sri Lanka in Kandy, haven’t made for pretty reading, with the Aussies now nine months removed from their most recent victory in the long form of the game.
But the Hobart horror show was a new low. Twenty Australian wickets fell for just 246 runs (yes, an average of 12.3 runs per wicket), while the bowlers conceded 326 in South Africa’s only innings, on the back of 242 and 8/540(dec) in Perth.
When the chips are down, few positions in our sporting ranks feel the wrath of armchair critics and keyboard warriors quite like Cricket Australia’s chairman of selectors.
On the surface, it seems a pretty easy gig. But when the task is to assess the entire nation’s cricket talent and narrow it down to just 11 blokes (and then hope they perform), you have to wonder how badly those picking the side are impeded by a lack of useful data.
Behind the numbers
One of the fun things about having an interest in multiple sports is being able to see how one could improve another.
Not in the way it was once suggested the AFL could be better if it adopted a basketball-style shot clock, of course. But wouldn’t it be nice to see cricket follow in the footsteps of the sports that have successfully embraced analytics? (And no, I don’t mean woeful ideas such as the Duckworth-Lewis method.)
While there is surely more number-crunching going on behind closed doors than the casual fan is privy to – at least, I’d like to think there is – cricket stats, by and large, are archaic. Traditionalists still rule the sport. (Or, in the case of implementing DRS or censoring media coverage, India still rules the sport.)
We have the technology and the capacity to scrutinise aspects of the game like never before, yet we still vastly oversimplify our analysis by continually spitting out the same old superficial numbers.
Selectors often seem swayed by small sample sizes, commentators spend an infuriating amount of time living in the past, and viewers at home are forced to settle for the sorts of bar graphs, tables and pie charts you’d expect to see in a high-school PowerPoint presentation.
In leaning on the most common stats, we are all too willing to forget that the opener who “rescued the innings with a brave century” was dropped on 14 and again in the 30s. Meanwhile, the teammate who was sent packing on 25 by a ripping delivery “again failed to convert a start into a big score”.
With each wicket so precious, and often only a centimetre or two deciding one’s fate, why not try to find numbers that help reflect the element of chance?
Surely it wouldn’t take much effort to keep track of, say, a play-and-miss percentage (let’s call it PAM%), which could be calculated using a pretty basic equation: (number of shots offered in which bat missed ball) ÷ (number of balls faced).
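To show how little bookkeeping PAM% would need, here’s a minimal Python sketch. It assumes a hypothetical ball-by-ball record with two yes/no flags, `offered_shot` and `missed`; those names are made up for illustration, as is the 40-ball innings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    """One delivery faced (hypothetical record format, invented for this sketch)."""
    offered_shot: bool  # did the batsman play at the ball?
    missed: bool        # did the bat miss the ball when a shot was offered?


def pam_percentage(balls: list[Ball]) -> float:
    """Play-and-miss percentage: shots offered that missed, divided by balls faced, x 100."""
    if not balls:
        raise ValueError("no balls faced")
    play_and_misses = sum(1 for b in balls if b.offered_shot and b.missed)
    return 100 * play_and_misses / len(balls)


# Invented innings of 40 balls: 30 shots offered (6 missed), 10 left alone.
innings = (
    [Ball(offered_shot=True, missed=True)] * 6
    + [Ball(offered_shot=True, missed=False)] * 24
    + [Ball(offered_shot=False, missed=False)] * 10
)
print(f"PAM%: {pam_percentage(innings):.1f}")  # PAM%: 15.0
```

Leaves stay in the denominator, as in the formula, so a batsman who shoulders arms a lot will post a lower PAM% than one who plays at everything.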
This formula could be used for everyone, offering a sense of whether batsmen were particularly lucky or unlucky (perhaps they really did lose their wicket on “the only bad shot they played all day”), and whether or not bowlers had been justly rewarded for effort.
Such a simple stat could help explain career arcs ("his PAM% has gone through the roof since he turned 32") or anticipate regression to the mean. Players who have a low PAM% but have been losing their wicket early might be closer to putting together a few decent innings than you'd expect from simply looking at their recent average. And vice versa, obviously.
Not all runs are created equal
In cricket, most of the numbers on which opinions are formed lack context. They tell only a small part of the overall narrative.
By traditional statistical measures, ODI and Twenty20 players are punished if they throw away their wickets trying to chase down a score in the late overs. Play it safe, don’t go out, and you increase your average; mishit one trying to make up for your teammates’ mistakes, and your average goes down.
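The arithmetic behind that quirk is simple: batting average is runs divided by dismissals, so a not-out adds runs without adding a dismissal. A quick sketch with invented scores from three hypothetical run chases shows the cautious batsman finishing with fewer runs but a far better average:

```python
def batting_average(innings: list[tuple[int, bool]]) -> float:
    """Runs per dismissal. Each innings is (runs, was_dismissed)."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, out in innings if out)
    if dismissals == 0:
        # Never dismissed: scorers conventionally show no average; inf is a stand-in.
        return float("inf")
    return runs / dismissals


# Invented scores from three hypothetical run chases.
safe = [(25, False), (20, True), (30, False)]       # 75 runs, out once
aggressive = [(36, True), (30, True), (45, True)]   # 111 runs, out every time

print(batting_average(safe))        # 75.0
print(batting_average(aggressive))  # 37.0
```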
Bowlers raise the ball for a five-wicket haul – whether their victims were five recognised batsmen or a string of tail-enders doesn’t matter in the record books. And we still judge them on economy and strike rate – with misfields and overthrows for some reason counting against them, penalising those surrounded by poor fielders more than others.
It speaks volumes that we still rank the best bowling performances purely on number of scalps. Stuart Broad ripped through the Australians at Trent Bridge last year, claiming figures of 8-15 in just 9.3 overs, yet conventional wisdom says Rangana Herath, who took 9-127 against Pakistan a year earlier, has the better tally. I know which I’d prefer.
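Even the measures we already have tell a clearer story than the wicket column alone. The sketch below applies the standard bowling average (runs per wicket) and strike rate (balls per wicket) to the figures quoted above, remembering that 9.3 overs in cricket notation means nine overs and three balls (57 deliveries), not 9.3 × 6. Herath’s overs aren’t quoted here, so only his average is calculated.

```python
def overs_to_balls(overs: str) -> int:
    """Convert overs notation ('9.3' = 9 overs and 3 balls) to balls, assuming 6-ball overs."""
    whole, _, part = overs.partition(".")
    extra = int(part) if part else 0
    if not 0 <= extra <= 5:
        raise ValueError(f"invalid overs notation: {overs}")
    return int(whole) * 6 + extra


def bowling_average(runs: int, wickets: int) -> float:
    """Runs conceded per wicket taken."""
    return runs / wickets


def strike_rate(balls: int, wickets: int) -> float:
    """Balls bowled per wicket taken."""
    return balls / wickets


broad_balls = overs_to_balls("9.3")
print(broad_balls)                         # 57
print(bowling_average(15, 8))              # 1.875  (Broad, 8-15)
print(strike_rate(broad_balls, 8))         # 7.125
print(round(bowling_average(127, 9), 2))   # 14.11  (Herath, 9-127)
```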
As for batsmen, Steve Smith scored an unbeaten 48 in Australia’s recent first-innings 85-run meltdown against the Proteas, in which debutant Joe Mennie (10) was his only teammate to reach double figures. I’d be willing to wager Smith is more proud of that gritty personal display than at least a couple of slightly higher scores he has under his belt, despite the fact he failed to record a half-century.
Let's put it in context
It's no secret that a 50 in one innings can be worth more than a 60 in another, but wouldn't it be nice to have numbers that help measure that context? Let's try one I'm calling Relative Run Production (RRP).
Using the equation for RRP, we could plug in the numbers for any given sample size to gauge how a player has performed relative to those around him. Here's the equation:
(player runs ÷ player dismissals) ÷ [(total runs – player runs – extras) ÷ (total dismissals – player dismissals)]
That's essentially just a complicated way of dividing a batsman's average over a given number of innings by the combined average of everyone else who batted in the same games, with extras stripped out.
If the result is more than 1.0, the player is outperforming his peers. Less than 1.0 and he's underperforming. Simple enough, right?
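To make RRP concrete, here's a minimal Python sketch of the equation. It assumes "total runs" and "total dismissals" cover every innings in the chosen sample (one reading of the formula), and the two matches in the example are invented: a 50 on a low-scoring pitch against a 60 on a road.

```python
def relative_run_production(
    player_runs: int,
    player_dismissals: int,
    total_runs: int,
    extras: int,
    total_dismissals: int,
) -> float:
    """RRP: player's average divided by the combined average of everyone else
    in the same games, extras excluded. Above 1.0 means outperforming peers."""
    if player_dismissals == 0:
        raise ValueError("player was never dismissed, so his average is undefined")
    others_runs = total_runs - player_runs - extras
    others_dismissals = total_dismissals - player_dismissals
    if others_dismissals == 0:
        raise ValueError("no other dismissals to compare against")
    return (player_runs / player_dismissals) / (others_runs / others_dismissals)


# Invented match A: low-scoring pitch, 460 runs (20 extras), 40 wickets; player makes 50.
print(relative_run_production(50, 1, 460, 20, 40))    # 5.0
# Invented match B: batting paradise, 1250 runs (30 extras), 30 wickets; player makes 60.
print(relative_run_production(60, 1, 1250, 30, 30))   # 1.5
```

One wrinkle the equation doesn't settle: a batsman with no dismissals in the sample (Smith's unbeaten 48 in Hobart, taken on its own) has no average to divide, so the sketch raises an error rather than guess at a fix.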
Suddenly, using a player’s RRP, a lower score in some parts of the world might start comparing favourably to a slightly higher one in other places where runs are easier to come by. This data wouldn't limit itself to just one innings or match – it could be expanded to cover any sample size, allowing us to compare guys on different continents, or from different eras, as easily as those playing on the same pitch. As always, the larger the sample size, the more reliable the reading.
Remember, most stats tell only part of the story – and RRP is clearly far from an exact science. Perhaps the numbers would serve to support an idea that has already been established. But maybe they'd offer a different way of thinking, or provide a launching pad for further analysis.
Looking for an edge
For those tasked with giving Australia the best chance at cricket success, a little extra information could go a long way. It goes without saying that any potential edge should be explored when competing at an elite level.
Isn't it about time we started thinking outside the box and using numbers to our advantage? The analytical possibilities are endless – with hundreds of deliveries a day and play constantly stopping and starting, cricket is precisely the kind of sport that should be making use of big data. But is CA open-minded enough to dedicate time and resources to the cause?
In horse racing, they grade the tracks to offer insight for performance – could we eventually see cricket pitches declared a “firm two” or a “heavy nine” to help us compare venues, evaluate players or predict what an acceptable score might be?
In baseball, they combine exit velocity and launch angle off the bat to put a percentage on how often a similarly hit ball safely clears the fence or finds the grass. They also flip this information to assess the defence, calculating how often a ball hit at a given speed and at a certain gradient is caught by a fielder who started a particular distance away.
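A crude version of that baseball idea can be built as a lookup table: sort past batted balls into exit-velocity and launch-angle buckets, then report the share in each bucket that became hits. This is a simplified illustration, not a reproduction of any official model; the bucket widths, the units (mph and degrees) and the six batted balls are all invented.

```python
from collections import defaultdict
from collections.abc import Iterable


def bucket(exit_velocity: float, launch_angle: float,
           ev_width: float = 5.0, angle_width: float = 10.0) -> tuple[int, int]:
    """Map a batted ball to a (speed bin, angle bin) pair; widths are arbitrary choices."""
    return int(exit_velocity // ev_width), int(launch_angle // angle_width)


def hit_probability_table(
    batted_balls: Iterable[tuple[float, float, bool]],
) -> dict[tuple[int, int], float]:
    """Share of past batted balls in each bin that became hits."""
    counts: dict[tuple[int, int], list[int]] = defaultdict(lambda: [0, 0])  # [hits, total]
    for exit_velocity, launch_angle, was_hit in batted_balls:
        tally = counts[bucket(exit_velocity, launch_angle)]
        tally[0] += int(was_hit)
        tally[1] += 1
    return {b: hits / total for b, (hits, total) in counts.items()}


# Invented history: (exit velocity in mph, launch angle in degrees, was it a hit?)
history = [
    (101.0, 27.0, True),
    (103.0, 25.0, True),
    (102.0, 28.0, False),
    (104.0, 22.0, True),
    (72.0, 45.0, False),
    (74.0, 48.0, False),
]
table = hit_probability_table(history)
print(table[bucket(100.5, 29.0)])  # 0.75
print(table[bucket(73.0, 41.0)])   # 0.0
```

In principle, the same table could be built for cricket (edges flying to the cordon, or lofted drives towards the rope) if ball-tracking data of that kind were made available.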
For a sport in which making the most of half-chances is so crucial, could cricket ever embrace such technology? Perhaps some of these metrics are unrealistic (for now, at least), but surely – surely – the time has come to start embracing the analytics movement.
Of course, this would all be great for public consumption as well. Fans crave information, and it’s not as if the five-day format in particular doesn’t allow time for the media to try out some new material on its audience.
But first things first. Let’s start by simply accepting the idea that traditional measures of success – while still useful – don't tell us nearly as much as we need to know, and that new data could offer valuable insights in terms of evaluating player performance and helping selectors get the best possible product on the field.