Patterns for Research in Machine Learning

This page lists a handful of code patterns that I wish I had been more aware of when I started my PhD. Each on its own may seem pointless, but collectively they go a long way towards making the typical research workflow more efficient. And an efficient workflow makes it just that little bit easier to ask the research questions that matter.

My guess is that these patterns will be useful not only for machine learning, but also for any other computational work that involves either a) processing large amounts of data, or b) algorithms that take a significant amount of time to execute.

Disclaimer: The ideas below have resulted from my experiences working with MATLAB. Other IDEs, languages or frameworks may have better solutions for the kinds of problems that I’m trying to address.

Here they are:

  1. Separate code from data.
  2. Separate input data, working data and output data.
  3. Save everything to disk frequently.
  4. Separate options from parameters.
  5. Do not use global variables.
  6. Record the options used to generate each run of the algorithm.
  7. Make it easy to sweep options.
  8. Make it easy to execute only portions of the code.
  9. Use checkpointing.
  10. Write demos and tests.
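
As a rough illustration, here is a minimal Python sketch (not MATLAB, and not taken from any particular codebase) of how patterns 3, 6, 7 and 9 might fit together. The function names `run_experiment` and `sweep`, the option names `step_size` and `n_iterations`, and the file names `options.json` and `checkpoint.pkl` are all made up for this example, and the "algorithm" is just a running sum standing in for real work.

```python
import itertools
import json
import pickle
import tempfile
from pathlib import Path


def run_experiment(options, out_dir):
    """Toy stand-in for a long-running algorithm: accumulate a running sum."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Pattern 6: record the options used for this run next to its outputs.
    (out_dir / "options.json").write_text(json.dumps(options, sort_keys=True))

    # Pattern 9: resume from the last checkpoint if one exists.
    checkpoint = out_dir / "checkpoint.pkl"
    if checkpoint.exists():
        state = pickle.loads(checkpoint.read_bytes())
    else:
        state = {"iteration": 0, "total": 0.0}

    while state["iteration"] < options["n_iterations"]:
        state["total"] += options["step_size"]
        state["iteration"] += 1
        # Pattern 3: save to disk frequently (here, after every iteration).
        checkpoint.write_bytes(pickle.dumps(state))

    return state


def sweep(grid, root_dir):
    """Pattern 7: run every combination of the option values in `grid`."""
    names = sorted(grid)
    results = []
    for i, values in enumerate(itertools.product(*(grid[n] for n in names))):
        options = dict(zip(names, values))
        # Each run gets its own output directory, kept apart from the code.
        state = run_experiment(options, Path(root_dir) / f"run_{i:03d}")
        results.append((options, state))
    return results
```

A sweep over two values of each option could then be started with `sweep({"step_size": [0.5, 1.0], "n_iterations": [2, 4]}, "output")`, which would create one directory per run under a hypothetical `output` folder. If the process is interrupted, calling it again with the same arguments picks each run up from its last saved state rather than starting over.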