Jan 11, 2012 by Cliff DeJong

**NOTE:** This article was written at the start of the 2012 NASCAR season to provide a better understanding of the algorithm used in our AccuPredict - NASCAR Driver Finish Predictions tool. The basis of the formula remains as described in this article, although some tweaks to the weighting of the formulas are made by Cliff after each NASCAR season.

AccuPredict is an exclusive on-line fantasy NASCAR statistics tool that uses traditional and Loop Data statistics to predict NASCAR driver finish positions. It is available to OWNER level subscribers of Fantasy Racing Cheat Sheet.

The AccuPredict method is an ** algorithm developed by research scientist Cliff DeJong**. This multi-page article by Cliff DeJong is

The entire article is also * available in PDF format for free download* for easier off-line reading.

This report examines several driver performance measures and develops a method for predicting the finishing order of NASCAR Sprint Cup races.

The plot in figure 1 shows one of the better measures that I have found for predictions. It plots each driver's actual finish in a race against his average finish over the 18 races prior to that race, for the 2011 season (1,260 data points). Only finishes of 35th and better are included. I have also shown the trendline as a summary of these data.

The spread of the data is amazing, and it is not obvious that this can be useful. Yet there are tendencies that are valuable since the data are clustered about the trendline.
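As a minimal sketch of the kind of measure plotted in figure 1, the average of a driver's prior finishes can be computed with a trailing window. The function name `trailing_average` and the handling of early races (averaging whatever prior races exist) are assumptions for illustration; the article does not say how short histories are treated.

```python
def trailing_average(finishes, window=18):
    """Average of up to `window` finishes BEFORE each race.

    finishes: one driver's finishing positions in race order.
    Returns a list the same length as `finishes`; the first entry is
    None because no prior race exists (an assumption of this sketch).
    """
    averages = []
    for i in range(len(finishes)):
        prior = finishes[max(0, i - window):i]  # races before race i only
        averages.append(sum(prior) / len(prior) if prior else None)
    return averages


# Hypothetical finishes, with a short window so the result is easy to follow.
example = trailing_average([5, 12, 3, 30, 8], window=3)
print(example)
```

Each value uses only races that came before the race being predicted, so the measure never sees the result it is being compared against.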

The important fact is that * the order of drivers in a specific race can be predicted in a meaningful way*.

- This paper will identify several metrics that will be used to forecast NASCAR outcomes.
- It will also address how to combine these to get the best forecast.

This paper is intended to show the methods used in general terms. The weekly implementation of this method is available as ACCUPREDICT to *OWNER* level subscribers on this site. The AccuPredict method was used without alteration to score *22nd overall in 2011 on nascar.com NASCAR Fantasy Live* out of several thousand competitors. Additionally, this site has received several great success stories from members who have used it in their particular fantasy NASCAR games.

NASCAR data from 1991 through 2011 are used to develop performance metrics.

The * key driver performance measures* identified here are:

- Average finish over the last 15 races
- Year-to-date driver rating
- Finishes at the last eight tracks of the same type
- Driver ratings at the same track for the last eight races
- Practice
- Starting position

Driver Rating is the NASCAR Loop Driver Rating, a formula that combines wins, finishes, green flag passes and several other driver performance measures.

In this paper, track types are examined and a regrouping of types is suggested by statistical considerations. Restrictor plate races are scored by a subset of the key measures listed above.

Driver scores based on the above measures are correlated with the actual finishes for the 2011 season with a value of 0.554. ** During the 2011 season, ACCUPREDICT achieved a correlation of 0.538**.

Predictions of almost anything are either ** historically based**, assuming the past repeats itself,

Predictions based on historical databases are looking for similarities with the past. If a situation has come up before:

- What has happened?
- How does that apply to this week's race?

In other words, *if a driver has done well at a particular track in the past, does this mean he will do well this weekend?*

Maybe...you can also consider how well he is doing ** this year**, and at

For NASCAR, there is a ** rich dataset of past races**: I have each race back to 1991 in my database with the finishing positions of each driver. This database is from the LeonardFrye web site, which is an excellent source for NASCAR statistics.

There are over 19000 data points. The database is in a computer-readable form, not scattered over various web sites, so it is relatively easy to process. Plus, each week in the season, and for past races, there are driver loop data, practice data and qualifying results, and other data such as bonus points earned, laps led, etc. My primary source for these data is FantasyRacingCheatSheet.com.

There are some very good fantasy NASCAR expert picks available on the web at no cost and some better ones that cost a subscription fee, including ACCUPREDICT, which is the result of this analysis.

Not all of these expert picks rank all the drivers; some only give a list of the top 5 or so drivers, and perhaps a dark horse or two.

Success in the fantasy leagues often depends on ** how well the low-ranked drivers do**. These drivers are necessary picks because of fantasy salary constraints. So, I wanted to be able to rank each driver...

A **metric** is a quantifiable measure of a driver's performance. Metrics available each week for each driver include:

- Performance in the last several races
- Performance at the same track
- Performance at the same type of track
- Practice
- Qualifying
- Expert opinions (cheat sheets)

** Performance** can be measured in two primary ways:

- Finishing position
- Loop Data Driver Rating

Lots of other data reflecting performance are also available, for example, ** Laps Led**,

These latter metrics are not as easy to process, but they are available on web sites such as fantasyracingcheatsheet.com, and will be addressed for the 2011 season only.

Cheat sheets are opinions of experts, often based on unspecified statistics, and not used in this analysis. I have found that other cheat sheets often do not score middle and lower ranked drivers.

DNFs or other major problems during a race can easily move a top ranked driver from a predicted top five to a finish of 40th. ** I define a DNF as finishing behind anyone who does not complete the race** - that is a clear indication of a major problem, not just poor performance.

Typical DNF rates are 15-20%. Since DNFs are unpredictable, there is no obvious way to include them. Their ** effects on finishing position** are in the database that is used.
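The DNF definition above can be sketched in a few lines. The input layout (a tuple of driver, finishing position, and a completed-race flag) and the name `flag_dnfs` are hypothetical, and this sketch assumes that the drivers who failed to complete the race are themselves counted as DNFs along with everyone who finished behind them.

```python
def flag_dnfs(results):
    """Return the set of drivers counted as DNFs.

    results: list of (driver, finish_position, completed_race) tuples.
    A driver is flagged if he finished at or behind the best-placed
    driver who did not complete the race.
    """
    non_finisher_positions = [pos for _, pos, completed in results if not completed]
    if not non_finisher_positions:
        return set()  # everyone completed the race
    cutoff = min(non_finisher_positions)  # best finish among non-finishers
    return {driver for driver, pos, _ in results if pos >= cutoff}


# Hypothetical race: D completed the race but finished behind C, who did not.
race = [("A", 1, True), ("B", 2, True), ("C", 3, False), ("D", 4, True), ("E", 5, False)]
print(sorted(flag_dnfs(race)))  # ['C', 'D', 'E']
```

Under this reading, driver D is flagged even though he was running at the finish, because finishing behind a car that dropped out signals a major problem.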

Combining the various metrics is a complex subject that takes serious effort but, as will be seen, provides little gain beyond simple measures. The **metrics are not independent**: a driver who has done well at a particular track has generally done well at the same track types, and he is likely to practice well and qualify well.

How do you ** measure the effectiveness of a metric or combinations of metrics?** There are two primary ways that I use:

- Correlation with the predicted finish
- Standard deviation of predicted finish

Less frequently, I also use the likelihood that a higher-ranked driver will finish ahead of a lower-ranked driver.

** Correlation** is a standard statistical measure that essentially plots one variable (the actual finish, for example) as a function of the other variable (the metric, practice speed, for example), and measures how well a straight line will fit the data.

** Correlation** ranges between -1 and 1, with the two extremes indicating a perfect fit.

A correlation of zero indicates that the result is independent of the metric. In other words, **a very low correlation indicates the metric is not a useful indicator of a driver's finishing position**. I will show some plots later to make this a lot clearer.

** Correlation** can also be expressed as a percentage: -100% to 100%.
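For readers who want to reproduce the measure, here is a minimal sketch of the standard Pearson correlation coefficient, which is the usual meaning of "correlation" in this sense. The function name `pearson` and the sample numbers are illustrative assumptions, not data from the article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metric values (e.g. average prior finish) and actual finishes.
metric = [4.0, 9.5, 15.0, 22.0, 30.0]
actual = [7, 3, 20, 18, 33]
r = pearson(metric, actual)
print(f"correlation = {r:.3f} ({r * 100:.1f}%)")
```

A perfectly linear relationship gives 1 (or -1 if one variable falls as the other rises), and multiplying by 100 gives the percentage form.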

Typically, in NASCAR, numbers range from 30 to 50%, that is, there is a lot of randomness in NASCAR. The ** data** shown in the introduction has about

A ** negative correlation** means that as the metric gets larger, the actual finish gets smaller.

- Correlations for NASCAR finishing positions are *positive* when past performance is measured by *Finishing Positions*, that is, a small actual finish is expected when the average finish over the last several races is good (or low).
- When performance is measured by *Loop Data Driver Rating*, correlations are *negative* since a high Driver Rating number implies a better driver and therefore a better predicted finish.

In this paper, I deal only with positive correlations by scaling the metrics - for example, the ** Driver Rating becomes a simple ranking of the drivers**, with the best driver scored a one, second best a two, etc.
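The rescaling described above can be sketched as a simple ranking step. The name `rating_to_rank`, the sample ratings, and the tie-breaking rule (ties keep their input order) are assumptions made for illustration.

```python
def rating_to_rank(ratings):
    """Convert Driver Ratings (higher is better) to ranks (1 is best).

    ratings: dict mapping driver name to Driver Rating.
    Ties are broken by input order, which is an assumption of this sketch.
    """
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {driver: rank for rank, driver in enumerate(ordered, start=1)}


# Hypothetical Driver Ratings for three drivers.
ranks = rating_to_rank({"A": 95.2, "B": 110.4, "C": 80.1})
print(ranks)  # {'B': 1, 'A': 2, 'C': 3}
```

After this step a low number means a better driver, just as with finishing positions, so the correlation with actual finish becomes positive.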

The ** Standard Deviation** of the predicted finish is a measure of how accurate the prediction is. In essence, it is a measure of

Almost 70% of the data are within plus or minus one standard deviation. It is larger than you might think: typical numbers are 9 to 10, showing, again, a lot of variability in NASCAR. This is not at all unreasonable if you think about a DNF rate of about 20%. A driver who finishes 1st, 2nd, 3rd, 4th and 35th (due to an accident) will average only a 9th place finish for these five races, despite four outstanding races.
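The arithmetic in the five-race example is easy to check, and the same numbers show how much a single bad race inflates the spread. This is only an illustration using Python's standard `statistics` module; it computes the spread of the five finishes themselves, not the prediction standard deviation reported in the article.

```python
import statistics

# The five finishes from the example above.
finishes = [1, 2, 3, 4, 35]

average = statistics.mean(finishes)   # (1 + 2 + 3 + 4 + 35) / 5 = 9
spread = statistics.pstdev(finishes)  # population standard deviation

print(f"average finish = {average}")
print(f"standard deviation = {spread:.2f}")

# Without the accident, the same driver looks very different.
clean = finishes[:4]
print(f"average without the 35th = {statistics.mean(clean)}")
```

The four good races alone average 2.5, so one accident moves the driver's average by more than six positions.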

The ** relative average finishing positions** among drivers is the important point.

Drivers will be ranked by a score, based on the metrics selected. The likelihood that a higher ranked driver finishes ahead of a lower ranked driver is calculated by comparing each driver with every other driver ranked below him. The ** percentage of correct rankings** is then calculated, and

This percentage is higher if the difference in rankings is large, and lower if the differences are small. This measure is not used often, since it is closely related to correlations.
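The pairwise comparison described above can be sketched as follows. The function name `pairwise_accuracy`, the dictionary inputs, and the choice to skip pairs with tied predicted ranks are assumptions; the article does not spell out these details.

```python
from itertools import combinations


def pairwise_accuracy(predicted_rank, actual_finish):
    """Fraction of driver pairs whose predicted order matches the actual order.

    Both arguments map driver name to position, where 1 is best.
    Pairs with identical predicted ranks are skipped (an assumption).
    """
    correct = 0
    total = 0
    for a, b in combinations(predicted_rank, 2):
        if predicted_rank[a] == predicted_rank[b]:
            continue
        total += 1
        predicted_a_ahead = predicted_rank[a] < predicted_rank[b]
        actual_a_ahead = actual_finish[a] < actual_finish[b]
        if predicted_a_ahead == actual_a_ahead:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical three-driver race: A and B swap places, C is predicted correctly.
predicted = {"A": 1, "B": 2, "C": 3}
actual = {"A": 2, "B": 1, "C": 3}
print(f"{pairwise_accuracy(predicted, actual):.1%}")  # 66.7%
```

Two of the three pairs are ordered correctly here, so the measure is two thirds even though only one driver's exact position was predicted.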

Cliff DeJong (pronounced *De Young*), the man behind AccuPredict, is a research scientist who has been crunching numbers his entire life. An avid NASCAR fan, Cliff was introduced to fantasy NASCAR by his brother (who beat him at just about everything).

Cliff put his *Carnegie Mellon* Computer Science degree and *Iowa State University* Mathematics degree to use creating successful methods to predict each Cup race based on NASCAR statistics.

It is an obsession that has consumed untold hours.

Cliff would love to hear your comments, questions and suggestions at moc.liamg@tciderpucca