Thursday, 22 January 2015

Assumptions to avoid when predicting MMA

Here's a quick checklist of things not to do when trying to predict mixed martial arts matches:

1. Do not assume that all fights are comparable and that information gathered from one fight is useful to another. Think about which information is transferable and select your study sample accordingly. Causes of systematic variation may include:
  • MMA organisations
  • Weight classes
  • Gender
  • Epoch
  • Title versus regular fights
  • Main events versus seed events (e.g. The Ultimate Fighter)
  • etc.

2. In MMA, later fights carry strong information about earlier ones, so do not sample at random when splitting data into training, validation and test sets. If you use future data to predict past fights, you are including significant predictive information that will not be available to you when forecasting. Split chronologically instead, so that every validation and test fight takes place after every training fight.
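As a rough illustration of point 2, here is a minimal sketch of a date-ordered split. The record layout (a dict with a `date` key), the function name `chronological_split` and the 60/20/20 proportions are assumptions made for the example, not a recommendation.

```python
from datetime import date

def chronological_split(fights, train_frac=0.6, val_frac=0.2):
    """Split fights into train/validation/test sets, oldest fights first."""
    ordered = sorted(fights, key=lambda f: f["date"])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

# Hypothetical records; a real dataset would carry many more fields.
fights = [
    {"date": date(2010 + i // 2, 1 + (i % 2) * 6, 1), "winner": "A" if i % 3 else "B"}
    for i in range(10)
]
train, val, test = chronological_split(fights)
```

With a split like this, nothing in the training set is dated after anything in the validation or test sets, which mirrors the situation you face when forecasting.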

3. Some information, such as height or reach, may safely be assumed to have stayed constant throughout a fighter's career. This is not the case for most other information: weight, weight class, striking/submission/takedown biases, associated gym, etc. Most of the information on a fighter's profile page cannot be used because it describes the present, so from the back-testing perspective it leaks knowledge of the future.

4. Some features have a value that decays with time. Including takedown statistics from ten years ago, for example, may be misleading.
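One possible way to handle points 3 and 4 together is to compute each statistic as of the fight date and to down-weight old observations. The sketch below uses exponential decay; the function name `decayed_average`, the two-year half-life and the sample numbers are all assumptions for illustration, and the right decay rate would need to be tested.

```python
from datetime import date

def decayed_average(observations, as_of, half_life_days=730.0):
    """Weighted mean of (date, value) pairs; a weight halves every half_life_days."""
    num = 0.0
    den = 0.0
    for when, value in observations:
        if when >= as_of:
            continue  # skip anything that was not yet known on the as_of date
        age_days = (as_of - when).days
        weight = 0.5 ** (age_days / half_life_days)
        num += weight * value
        den += weight
    return num / den if den > 0 else None

# Hypothetical per-fight takedown accuracy for one fighter.
takedown_accuracy = [(date(2005, 1, 1), 0.8), (date(2014, 6, 1), 0.4)]
recent_form = decayed_average(takedown_accuracy, as_of=date(2015, 1, 22))
```

Here the ten-year-old figure barely moves the result, so `recent_form` sits much closer to the 2014 value than a plain average would.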

5. If your goal is to win money on betting exchanges through superior prediction, then you must predict better than the odds-implied probability for whatever subset you're betting on. This requires evaluating how well you assign probabilities to outcomes, rather than just how often you pick the binary result correctly. Have a read of the background and evaluation sections here for more details on how.
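To make point 5 concrete, here is a minimal sketch that converts decimal odds into implied probabilities and scores both the market and a model with log loss. Normalising proportionally to remove the margin is just one simple convention, and the odds, model probabilities and results below are made up for the example.

```python
import math

def implied_probabilities(decimal_odds):
    """Convert decimal odds for each outcome into probabilities that sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # exceeds 1 when the prices include a margin
    return [r / total for r in raw]

def log_loss(probs, outcomes):
    """Mean negative log-likelihood of 0/1 outcomes; lower is better."""
    eps = 1e-12  # guards against log(0)
    losses = [
        -math.log(max(eps, p if y == 1 else 1.0 - p))
        for p, y in zip(probs, outcomes)
    ]
    return sum(losses) / len(losses)

# Hypothetical data: (odds on fighter A, odds on fighter B), model P(A wins), result.
odds = [(1.5, 2.8), (2.1, 1.8), (1.3, 3.9)]
model_probs = [0.70, 0.40, 0.75]
a_won = [1, 0, 1]

market_probs = [implied_probabilities(pair)[0] for pair in odds]
market_score = log_loss(market_probs, a_won)
model_score = log_loss(model_probs, a_won)
```

A model is only interesting for betting if `model_score` is reliably lower than `market_score` on fights it has not seen, and three fights are of course far too few to tell.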

6. Start with a theory, not a regression. For example, "I think that super heavyweight fights with guys from wrestling backgrounds tend to be decided by striking from a distance, so I think that comparing strike accuracy, strike output and knockouts on record will strongly bear on who would win the fight". I'd suggest watching several fights in the category, picking them at random, and loosely confirming the theory, before finally making it rigorous with carefully collected statistics and a regression model.

7. Linear regression isn't the only tool available! Try to be theory-led in your choice of tool, be familiar with a breadth of tools, and be clear on each tool's limits and on what relationship between variables it assumes.

8. Be wary of your cross-validation results. If your sample is relatively small, you may find that you have significant results by chance. Since fight results are binary for the most part, you can use the standard binomial significance test as a guide to how lucky you would have to have been. If you have enough data, you could also be more sophisticated and create a bootstrap distribution over n predictions to get a better sense of the variance that sampling contributes.
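Both checks in point 8 can be sketched with the standard library alone. The function names, the 50% chance baseline, the number of resamples and the 60-out-of-100 record are assumptions for the example; `math.comb` needs Python 3.8 or later.

```python
import math
import random

def binomial_p_value(correct, n, p_chance=0.5):
    """One-sided P(X >= correct) for X ~ Binomial(n, p_chance)."""
    return sum(
        math.comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(correct, n + 1)
    )

def bootstrap_accuracy(hits, n_resamples=2000, seed=0):
    """Resample 0/1 hits with replacement; return the 2.5% and 97.5% accuracy percentiles."""
    rng = random.Random(seed)
    n = len(hits)
    accs = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples))
    return accs[int(0.025 * n_resamples)], accs[int(0.975 * n_resamples)]

# Hypothetical record: 60 correct picks out of 100 fights.
hits = [1] * 60 + [0] * 40
p_value = binomial_p_value(sum(hits), len(hits))
low, high = bootstrap_accuracy(hits)
```

Bear in mind that 50% may be too generous a baseline: always picking the betting favourite usually beats a coin flip, so setting `p_chance` to the favourite's win rate in your sample is arguably the fairer comparison.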

9. Money where your mouth is! If you've got a model that cross-validates and you intend to make money on it, then forward test it, preferably with some actual bets! The human propensity for self-deception sometimes shocks me in myself, and you may surprise yourself with what you've overlooked when there is real money on the line.
