ACCUPREDICT METHOD

FORECASTING NASCAR

Jan 11, 2012 by Cliff DeJong

Thus far, we have looked at performance in past races in a number of ways:

  • Past finish positions of all recent races.
  • Races at the same track.
  • Races at similar tracks.

We have also made some new discoveries:

  • Regrouping tracks by similar statistics, rather than by similar physical characteristics, was assessed and found to be beneficial.
  • Practice speeds at the next-to-last practice are useful.
  • Starting position (or qualifying results) is also valuable.

CONSISTENTLY BETTER OR WORSE

Each season is a new start for drivers and their teams, and may bring a new crew chief or even a new team for some drivers.

I have observed that each year has drivers that seem to do consistently better or consistently worse than expected, based solely on their past performance from previous years.

As a consequence, another measure that I have found to be useful is the current year-to-date standings of the drivers.

I do not use year-to-date statistics until after four races have been run.

DRIVER POINTS STANDINGS VS. DRIVER RATING

I have used point standings and Driver Ratings in past years, and found that driver rankings based on Driver Ratings are a little better to use.

I have not done a formal analysis, but at the end of the 2010 season found that the Driver Rating was correlated to average finish with a value of 0.93 while the correlation of points (after taking out the Chase points adjustment) to average finish was 0.76.

These are correlations to the final values for the entire 2010 season and cannot be compared to other correlation measures in this report that are for single races.
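
As an illustration of the kind of season-level correlation quoted above, here is a minimal Python sketch. The article does not say which correlation formula was used, so the Pearson coefficient is an assumption, the function name pearson is only illustrative, and the numbers are made up for five hypothetical drivers rather than taken from 2010 data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up season summaries for five hypothetical drivers (not real data).
rating_rank = [1, 2, 3, 4, 5]                   # rank by season Driver Rating
average_finish = [9.8, 12.1, 11.5, 16.0, 18.3]  # season average finish

r = pearson(rating_rank, average_finish)
print(f"correlation of rating rank to average finish: {r:.2f}")
```

Using the rank of the rating, rather than the raw rating, keeps the sign positive: a low rank and a low average finish both mean a good season.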

Driver Rating, as defined by NASCAR loop data, combines several measures of driver performance, including green flag passes, green flag times passed, fast laps, laps led, and more.

I collected several of these measures during the 2011 season for assessment and will show these in the next section.

2011 SEASON ASSESSMENT OF SELECTED METRICS

For the 2011 season, a number of performance measures were collected for each race. Figure 16 shows those measures and their correlations with finishing position.

Here are the definitions of each measure:

  • L18-F: Average of the finishing position of the last 18 races
  • L4-DR: Ranking of average Driver Rating for the last 4 races
  • YTD-DR: Ranking of average Driver Rating for the year to date (Not used in the first four races)
  • SType-F: Average finish position for races at the same type track (This uses the traditional track groupings, before revisions above, and averages over 4-12 races for different tracks.)
  • SType-DR: Average Driver Rating for races at the same type track, as above
  • SType-4F: Average finish position for the last four races at the same type track
  • STrack-F: Average finish position for races at the same track, over 2-11 races
  • STrack-DR: Ranking of average Driver Ratings at the same track, over 2-11 races
  • STrack-Pwr: Ranking of average Driver Ratings at the same track over the past two years
  • Start: Start Position, defined as qualifying results
  • Practice: Ranking of fast speeds in the next to the last practice
  • Bonus Points: Average of bonus points earned
  • Pass Dif: Average of green flag passes, less green flag times passed, over the last 2-11 races
  • Laps Led: Number of Laps Led at the same track, averaged over the last 2-11 races
  • Fast Laps: Number of Fast Laps at the same track, averaged over the last 2-11 races

These metrics were developed as possibly useful in unpublished analyses of past seasons. I offer no rationale for their selection; they have evolved over time.

BEST AND WORST METRICS IN 2011

Some of the measures are much better than others. The average finish over the last 18 races is the best, with year-to-date Driver Rating second. Others, such as the green flag pass differential, carry little information and correlate relatively poorly with finishing position.

HOW TO BEST COMBINE METRICS?

Given these 15 measures of driver performance, how can they be combined to give the best estimate of finish position for each driver and race?

The answer is far from obvious, because all of these measures are heavily correlated with each other: a driver who has finished well in the last 18 races also tends to place highly in the year-to-date Driver Ratings, and so on. If two measures are highly correlated, the second adds little information beyond what the first already provides.

An additional complication is that not all measures are always available for each driver; Trevor Bayne, for example, had no prior Sprint Cup history at Daytona before the 2011 season opening race.

The desire is to find a simple method for combining selected measures.

First, all measures must be transformed mathematically so that a small number will indicate a likely good finish. The easy way to do this is to fit the various measures to the average finish and then use the curve fit data to represent the measure in question.
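
The transformation step might be sketched as follows. The article does not state the form of the curve, so a straight-line least-squares fit is assumed here, and the helper names fit_line and transform and the sample numbers are hypothetical. The example uses a measure where bigger is better (laps led) and shows that the fitted values run the other way, so that a small transformed value indicates a likely good finish.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def transform(values, slope, intercept):
    """Replace each raw measure by the finish position the fit predicts."""
    return [slope * v + intercept for v in values]

# Made-up history: average laps led at a track vs. average finish there.
laps_led = [0, 10, 20, 30, 40]
avg_finish = [30, 25, 20, 15, 10]

slope, intercept = fit_line(laps_led, avg_finish)
transformed = transform([5, 35], slope, intercept)
print(transformed)  # the driver who led more laps gets the smaller value
```

After this step every measure is expressed on the same scale, an estimated finish position, which is what makes a plain average of different measures meaningful.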

A score defined as a simple average of all the transformed measures gives a correlation of 0.538 to the actual finishes.

The standard deviation of the estimated finishes based on the simple average for 2011 is 9.45.

A large number of variations on the combination of measures were examined, and the best approach was to average:

  • L18-F
  • YTD-DR
  • SType-F
  • STrack-DR
  • Start
  • Practice

This gave a correlation for the score to actual finish of 0.550 and a standard deviation of 9.36 for the estimated finishes.

DETERMINING THE BEST POSSIBLE FIT FOR EACH METRIC

To determine the best possible fit of the metrics to the finishing positions, a multiple regression was calculated, using all 15 metrics as inputs. Regression, strictly speaking, is only valid for independent variables, and these are not independent. Still, in practice, regression can be very useful even here.

For data points without all 15 metrics, the simple averages of the best combinations in the previous paragraph were used. This gave a correlation of 0.559 and a standard deviation of 9.29.
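
A rough sketch of this regression-with-fallback idea is shown below, in plain Python with only two metrics instead of fifteen. Everything here is an assumption made for illustration: the function names, the use of the normal equations, and the made-up data. It is not the actual 2011 regression.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_regression(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, row, fallback):
    """Use the regression when every metric is present, else the fallback score."""
    if any(v is None for v in row):
        return fallback
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Made-up training data: two transformed metrics per driver and actual finish.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
finishes = [2.0, 2.25, 3.75, 3.75, 5.5]
coeffs = fit_regression(rows, finishes)

full = predict(coeffs, (2, 4), fallback=None)        # all metrics present
partial = predict(coeffs, (2, None), fallback=17.5)  # falls back to the average
```

The fallback argument stands in for the simple-average score of the best combination, which is what the text says was used whenever a driver lacked some of the metrics.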

The disadvantage of using this approach, however, is complexity and the fact that the regression is highly tuned to the data in the 2011 season.

The regression for 2012 data will almost certainly be different.

Plus, the 0.559 is not dramatically better than the 0.550 found by experiment.

OTHER 'BEST POSSIBLE FIT' APPROACHES

There are other approaches to maximize the correlation of a combination of correlated variables.

For statistics geeks, I tried Principal Component Analysis and a method in a paper by Keller and Olkin. The required assumptions are only partially met, and results were a correlation of 0.550-0.551, not quite as good as the regression results.

These have also been tried in earlier seasons, with similar results, and have the same drawbacks as the regression method.

RANKING DRIVERS FOR ALL METRICS

Another interesting approach that I tried was to look at each driver as ranked for all metrics. If a driver was ranked ahead of another driver on more of the metrics, then he was ranked higher in the combination.

This was only slightly different from the averages of the metrics, and performance was slightly worse.
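
One way to read this head-to-head idea is sketched below. The pairwise comparison follows the description above; how the pairwise results are turned into a single ordering is not stated, so counting head-to-head wins is an assumption, as are the function name and the made-up metric values (smaller is better for each).

```python
from itertools import combinations

def pairwise_order(metrics_by_driver):
    """Order drivers by how many rivals they beat on a majority of shared metrics."""
    wins = {driver: 0 for driver in metrics_by_driver}
    for a, b in combinations(metrics_by_driver, 2):
        ma, mb = metrics_by_driver[a], metrics_by_driver[b]
        shared = [name for name in ma if name in mb]
        a_ahead = sum(ma[name] < mb[name] for name in shared)
        b_ahead = sum(mb[name] < ma[name] for name in shared)
        if a_ahead > b_ahead:
            wins[a] += 1
        elif b_ahead > a_ahead:
            wins[b] += 1
    # Most head-to-head wins first; ties keep their input order.
    return sorted(wins, key=lambda driver: -wins[driver])

# Made-up values for three hypothetical drivers.
field = {
    "Driver A": {"L18-F": 8, "Start": 3, "Practice": 5},
    "Driver B": {"L18-F": 10, "Start": 1, "Practice": 9},
    "Driver C": {"L18-F": 20, "Start": 15, "Practice": 2},
}
order = pairwise_order(field)
print(order)
```

Only metrics that both drivers have are compared, so a driver with missing history is not penalized for the gap.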

ALL APPROACHES AVERAGED ABOUT 70%

For all approaches, the likelihood of a driver finishing ahead of a lower ranked driver was calculated. It varied by race, but all of the best approaches averaged about 70%.
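
The calculation can be illustrated with a short sketch that checks every pair of drivers in one race. The function name ahead_rate and the four-driver example are hypothetical, and the exact procedure used for the article (for instance, how ties or non-starters were handled) is not described, so this is one plausible reading.

```python
from itertools import combinations

def ahead_rate(predicted_rank, actual_finish):
    """Share of driver pairs where the higher-ranked driver finished ahead."""
    correct = total = 0
    for a, b in combinations(predicted_rank, 2):
        if predicted_rank[a] == predicted_rank[b]:
            continue  # no prediction for this pair
        total += 1
        if predicted_rank[a] < predicted_rank[b]:
            better, worse = a, b
        else:
            better, worse = b, a
        if actual_finish[better] < actual_finish[worse]:
            correct += 1
    return correct / total

# Made-up race: the top two predicted drivers finished in swapped order.
predicted = {"A": 1, "B": 2, "C": 3, "D": 4}
actual = {"A": 2, "B": 1, "C": 3, "D": 4}
rate = ahead_rate(predicted, actual)
print(f"{rate:.0%} of pairs ordered correctly")
```

A value of 50% would mean the ranking is no better than a coin flip, which puts the roughly 70% figure in context.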

The approach of a simple average of the metrics was chosen.

With the simple average, the best combination performs very poorly at the restrictor plate races, so those races were split out. The best combination for them was L18-F, YTD-DR and Practice. Correlations for the plate races improved from 0.243 to 0.318, and the finish standard deviation went from 11.5 to 11.2. This is still poor performance.

When this was combined with the non-plate races and their best combinations, the final correlation of score with finish is 0.554, and the standard deviation is 9.29.

Corresponding ACCUPREDICT results were 0.535 and 9.53.

FINAL ACCUPREDICT METHOD FOR 2012

The method proposed is somewhat better than the approach used in 2011.

  1. Each race, the top 35 drivers in points are identified.
  2. Their finishes in the last 15 races are averaged.
  3. Their performance in year-to-date driver ratings is ranked.
  4. Similar track types are identified, using the revised definitions in a previous section, and finishing position is averaged over the last eight races on those tracks.
  5. The average driver ratings at the last eight races at the same track are ranked.
  6. Practice speeds at the next to last practice are ranked.
  7. Start position is used.

These six performance measures, or whichever of them exist for each driver, are averaged, and the resulting score gives the expected finishing position by a simple curve fit to the 2011 data.

For restrictor plate races, the average of the last 15 races, year-to-date Driver Ratings, and the practice rankings are used.
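
The scoring step of the method could be sketched roughly as follows. The dictionary keys, function names, and sample values are hypothetical, each measure is assumed to be already transformed so that smaller is better, and the final curve fit from score to expected finish is left out because its form is not given.

```python
# Hypothetical keys for the six measures (steps 2-7) and the plate-race subset.
STANDARD = ("last15_finish", "ytd_rating_rank", "type_finish",
            "track_rating_rank", "practice_rank", "start")
PLATE = ("last15_finish", "ytd_rating_rank", "practice_rank")

def accupredict_score(measures, plate_race=False):
    """Average whichever of the selected measures exist for a driver."""
    names = PLATE if plate_race else STANDARD
    present = [measures[n] for n in names if measures.get(n) is not None]
    return sum(present) / len(present) if present else None

def rank_field(field, plate_race=False):
    """Order drivers from best (lowest score) to worst; skip unscored drivers."""
    scores = {d: accupredict_score(m, plate_race) for d, m in field.items()}
    scored = {d: s for d, s in scores.items() if s is not None}
    return sorted(scored, key=scored.get)

# Made-up values; Driver Y has no history at this track or track type.
field = {
    "Driver X": {"last15_finish": 10.0, "ytd_rating_rank": 4.0,
                 "type_finish": 12.0, "track_rating_rank": 6.0,
                 "practice_rank": 3.0, "start": 7.0},
    "Driver Y": {"last15_finish": 9.0, "ytd_rating_rank": 5.0,
                 "type_finish": None, "track_rating_rank": None,
                 "practice_rank": 12.0, "start": 1.0},
}
standard_order = rank_field(field)
plate_order = rank_field(field, plate_race=True)
```

In this made-up case the two drivers swap places between the standard and plate-race scoring, because the plate subset ignores start position, where Driver Y is strongest.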

Note that the proposed 2012 metrics are slightly different from the 2011 season metrics used to select the combinations.

The last 15 races are used rather than the last 18, and the track groupings into related track types have been revised. The number of races averaged for 2012 for the same-type and same-track metrics is fixed at eight, rather than the variable 2-14 races used in 2011. The rationale is that the numbers and groupings were changed to improve correlations, and the revised metrics still measure very similar information.

IMPROVED CORRELATIONS ON SAMPLES

This new approach was applied to five sample races from 2011, one from each track type, and correlations improved on average from 0.470 to 0.502.

ABOUT THE AUTHOR

Cliff DeJong

Cliff DeJong (pronounced De Young), the man behind AccuPredict, is a research scientist who has been crunching numbers his entire life. An avid NASCAR fan, Cliff was introduced to fantasy NASCAR by his brother (who beat him at just about everything).

Cliff put his Carnegie Mellon Computer Science degree and Iowa State University Mathematics degree to use creating successful methods to predict each Cup race based on NASCAR statistics.

It is an obsession that has consumed untold hours.

Cliff would love to hear your comments, questions and suggestions at accupredict@gmail.com
