ML Research Interview Handbook

October 13, 2019 | UPDATED February 9, 2020

What some of your interviews might feel like

Who is this for?

The following is the result of curating numerous ML questions from years of interviews, both where I’ve been the interviewee and where I’ve been the interviewer (combined with questions assembled by the diligent Subhrajit Roy). I created this post with three main audiences in mind: aspiring ML engineers, hiring managers and senior ML engineers, and people navigating ML research who don’t want to lose sight of first principles.

Aspiring ML Engineers

In keeping with the themes of previous posts of mine, the current state of Machine Learning interviewing is pretty pitiful. As you have probably observed, all the Kaggling and side projects may not have prepared you all that much for the subject matter of a machine learning interview. With all the resources out there pointing to a bunch of different textbooks and lectures, you probably want to spend less time consolidating and organizing notes and more time actually studying.

If you want to judge your skills, select a random question from the table of contents. DO NOT JUST LOOK AT THE ANSWER RIGHT AWAY. Write down or record your response, or better yet have someone else quiz you. If you truly want the benefits, copy JUST the questions from the table of contents, and try filling in your own answers before jumping straight to my solutions below.

I should also specify that for the most part this guide covers the ML-specific questions within interviews, but not the HackerRank-style whiteboarding algorithm questions. Studying for those is much more about memorization than most engineers are willing to acknowledge. While you can get ahead with a deep understanding and intuition of such algorithms, the sad truth is you’ll need to compete with a lot of people taking shortcuts. Your competitors for a position will range from the 20-year-old who memorized every LeetCode solution, to the cabal sharing interview solutions over WeChat, to the engineer who got good at these algorithm questions by accident because they refused to use any of the frameworks that would make their code maintainable. This algorithm part of the interview is the industry’s attempt at a standardized test, one that missed most of the benefits of the “standardization” part and instead became a cargo cult. With that rant out of the way, I put together a separate resource on learning the whiteboard algorithms you’ll find in about 95% of interviews. Not only does this contain many of the algorithms, but it will also help you get faster returns from your LeetCode/HackerRank practice.

Hiring managers and Senior ML Engineers

Aside from incredibly basic tests (e.g., seeing if someone knows what linear regression is), there are not a lot of resources for judging someone’s competence. If you’re interviewing someone for an ML Engineer or Data Scientist position, chances are you’re lucky if the cost of a false positive is only $12,000/month (not counting other damage done to your organization’s growth or continued functioning). When it comes to interviewing, most of the resources out there are for software engineers (i.e., the usual algorithm sites), but there is comparatively little for the researchers and designers of the algorithms.

If you want a quick repository of questions to ask someone to gauge where they fall on the novice-expert spectrum of machine learning, feel free to use this resource.

People Navigating ML research

Maybe you’re a grad student working on a research project. Maybe you’re an undergrad trying to get into a research lab. Maybe you’re out of school and you’re reading papers or talking about papers at some kind of ML meetup or reading club. At some point, you’ve probably had the nagging feeling that you’re forgetting some critical, basic information about the material you’re studying. You may be looking over a project (either yours or someone else’s) and thinking to yourself, “I have the feeling some other technique might be more justifiable in this case, but I just can’t put my finger on it”.

This list is a quick reference you can use to make sure you don’t lose sight of the important basics of a variety of subtopics in machine learning. It covers everything from mathematical basics, to Bayesian machine learning, to neural network theory, to niche implementations in areas like computational bio. You can bookmark this page as your go-to quick reference for the important basics.

Oh, there’s one last group this was made for…

The people who always repeat the phrase “let’s think about this from first principles” without actually going into more detail (you know who you are)

I’m not sure if this is a phrase that some professor kept repeating until it became a verbal tic for you, or if this was a tactic to make coworkers at a previous job stop asking questions for fear of looking dumb in a meeting, or if you just want to sound like Elon Musk.

Regardless of where this habit came from, here’s my rebuttal:

All of this verbal intelligence signalling is wasting everyone’s time, including yours. In fact, to call it signalling would be putting it too kindly, as it’s more akin to noise than actual signal.

In hopes of making up for lost productivity caused by this, here’s a resource with some ACTUALLY HELPFUL ADVICE.

…

Now that that’s out of the way, enjoy.

The Series

1. The Math Behind ML (the important stuff)

October 20, 2019 | UPDATED October 22, 2019

Important mathematical prerequisites for getting into Machine Learning, Deep Learning, or any of the other related spaces

2. Machine Learning Fundamentals

October 22, 2019 | UPDATED October 24, 2019

Fundamentals of all types of machine learning, deep learning or otherwise

3. Deep Learning Concepts every practitioner should know

October 24, 2019 | UPDATED October 26, 2019

A deep dive into the important 'deep' learning concepts

4. Deploying and Scaling ML

November 4, 2019 | UPDATED December 26, 2019

Practical considerations of scaling and implementing ML in the real world

Okay, I just went through these. Give me MORE!

Still eager to learn? Some more things you can do include:

Build your first neural network with Keras.
Apply neural networks to Visual Question Answering (VQA).
Experiment with bigger / better neural networks using proper machine learning libraries like TensorFlow, Keras, and PyTorch.
Try your hand at using Neural Networks to approach a Kaggle data science competition.
Review notes from Stanford’s famous CS231n course on CNNs.
Take one of many good Neural Networks courses on Coursera.

I plan on writing more about Neural Networks in the future, so subscribe to my newsletter if you want to get notified of new content.

Thanks for reading!

Cited as:

@article{mcateer2019mlrih,
  title   = "ML Research Interview Handbook",
  author  = "McAteer, Matthew",
  journal = "matthewmcateer.me",
  year    = "2019",
  url     = "https://matthewmcateer.me/series/ml-research-interview-handbook/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I would be very happy to correct them right away!

See you in the next post 😄

Matthew McAteer

Matthew McAteer @MatthewMcAteer0
