
During the spring and summer of 2021, I interviewed at four FAANGMULA companies for L5 or L6 data science positions. On top of having interviewed dozens of candidates at my previous firm, I learned a lot during the ~2 month process of interviewing, connecting with recruiters, completing take-home tests, and networking to learn more about the positions. This information has already been useful to folks I’ve chatted with: I’m informally mentoring some aspiring data scientists and working with undergrads in DS, Stats, or CS programs to help them understand the industry, and so far the interview process, especially at top companies, has been very mysterious to them. This post is directed at more junior folks who have no clue about the process, and at experienced folks who are not in big tech but want to learn more about it, especially if you come from consulting, nonprofit, or academia and aren’t sure how you’ll be evaluated in the tech space. I’ll talk about some of the things I learned in the process and share some general thoughts.

Disclaimers:

  • This post is a single data point, so my conclusions aren’t powered up. If you have more tools or recommendations to add in the comments, please do!
  • Sometimes posts like this become a pissing contest about who has gotten further in the industry in how many years, what your total compensation is, or whether you’re a real data scientist, blah blah, and I don’t give a shit about all that. I saw an opportunity to move roles and industries and I took it because it works for me, my goals, my family, etc. Leave that toxic stuff on Team Blind (though to be clear, while at times toxic, Team Blind can be extremely useful for getting insider knowledge or TC comparisons >:).

The positions I interviewed for included “Data Scientist”, “Senior Data Scientist”, “Staff Data Scientist”, and “Research Scientist”. There are major differences in what these titles mean across firms, so I found it was best to compare scope and compensation in order to compare more effectively. I also looked on LinkedIn to find how many people had a certain title and what their credentials were in order to find out the general profile of a person in that role. Levels.fyi can be helpful to understand how different firms treat titles and avg comp by region.

Of the four positions I interviewed for, I reached out / cold called for a referral for one, used my network for two more, and had a recruiter reach out for the fourth. For each of these connections (save the recruiter) I had a ~30 minute phone call with the person to chat about the organization, interview process, and any important names or resources I should look into before my first technical screen. I kept everything organized using Notion with different pages and sections for each of the companies, with prep materials and interviewer specific notes. I tried to find something to talk about to build rapport for each of my interviewers, and for one of the interviewers, the thing I had in common made a strong connection which used ~10-15 minutes of the 60 minute interview, and made an excellent impression.

The general format for the process at all these companies was the following:

  1. Recruiter Screen
  2. Technical Phone Screen
  3. Virtual Onsite made up of 5-6 sessions/modules/panels, each lasting between 45 minutes and 1 hour
  4. Team matching & offer negotiation

Optional Elements:

  • Take home test, usually between steps 1 and 2
  • Second technical phone screen between steps 2 and 3

The first phone screen aims to hit several areas, usually not in much depth. Think of this interviewer’s incentive as not letting anyone unqualified reach the onsite. Your goal here is to avoid any fatal mistakes and to have at least one area where you really shine. This will get you recommended to move on to the next round.

Leetcode medium-level coding questions: either SQL or Python, aka Leetcode databases and algorithms respectively. You should be able to do the following things:

  • Talk through at least one approach, or two with a brief discussion of the tradeoffs. I usually write this in steps in the comments, and then ask if they think it’s a reasonable approach before I jump into the code.
  • Answer the question and write code that can run without errors
  • Discuss the time and space complexity of your solution
  • Discuss approaches or modify your code with new constraints
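
As a rough illustration of that workflow, here is a minimal Python sketch for a typical medium-level problem. The problem (longest substring without repeating characters) is an illustrative choice, not a question from the interviews described here, and the function name is made up. The point is the shape: approaches and steps in the comments first, then code, then complexity.

```python
def longest_unique_substring(s: str) -> int:
    """Length of the longest substring of `s` with no repeated characters."""
    # Approach 1 (brute force): check every substring for duplicates.
    #   Simple to explain, but there are O(n^2) substrings to check.
    # Approach 2 (sliding window): keep a window [left, right] with no
    #   repeats and remember where each character was last seen.
    #   Steps:
    #   1. Walk `right` across the string.
    #   2. If s[right] was already seen inside the window, move `left`
    #      just past its previous position.
    #   3. Track the longest window seen so far.
    last_seen = {}  # character -> index of its most recent occurrence
    left = 0
    best = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best


# Time: O(n), since each character is visited once.
# Space: O(min(n, k)) for the dictionary, where k is the alphabet size.
print(longest_unique_substring("abcabcbb"))  # 3
```

A natural "new constraint" follow-up would be returning the substring itself rather than its length, which only requires remembering the `left` index of the best window.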

To study for this, I do recommend doing as many leetcode questions as you can, and looking in the “discussion” tab to find how others have approached the problem. If you can understand the pros and cons that are discussed in that section, you’ll be in good shape. If the DS role you’re applying for is more of a software engineering heavy role, you might need to be more familiar with algorithm names, data structures, and be able to answer more technical questions.

Past projects you have worked on: you should have 2-3 projects fully prepared to talk about during any one of your interviews. You should have the following elements prepared:

  • A brief synopsis of your project: describe the problem, why you or your team chose the approach you did, how you made certain decisions along the way, and a quantifiable impact.
  • Answers to more specific questions about model selection, or interpretation of output / coefficients
  • How did you benchmark your results against reality? Do you have ground truth data, and how did you compare your model against it, for instance by evaluating its confusion matrix?
  • What is the future of the project? For engineering heavy projects, this is a maintenance question. For research projects, who can pick up your work and what would they do? A common question I got was “If you had unlimited data and resources, how would you have continued the project?”
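
If the ground truth question comes up, it helps to be able to reproduce the basic bookkeeping from memory. Below is a minimal sketch, using only the standard library, of comparing binary predictions against ground truth labels. The labels and helper names are made up for illustration and do not come from any real project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted)
    return {
        "tp": counts[(1, 1)],
        "fp": counts[(0, 1)],
        "fn": counts[(1, 0)],
        "tn": counts[(0, 0)],
    }


def precision_recall(cm):
    """Precision and recall from confusion counts; None when undefined."""
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else None
    recall = cm["tp"] / actual_pos if actual_pos else None
    return precision, recall


# Made-up ground truth labels and model predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(cm)
print(cm)                 # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(precision, recall)  # 0.75 0.75
```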

Why you want the job / why are you transitioning from your current role? Have a very brief statement about what you’re looking for, and keep it focused on you looking for new challenges. Examples:

  • “In my current position, I’ve been able to build out the department and coach younger data scientists and learn about [field of current company]. However, I’m not spending as much time as I want to on the research aspect, which I’m really passionate about. I know from [connection / referral] that [company / team] is working on [challenge X] and has more robust systems already in place, which would allow me to explore more of the technical and research components of the role, rather than the administrative or fundamental aspects of data science.”
  • “In my current role, I’m spending 90% of my time coding and delivering results, but there aren’t as many opportunities to communicate my results and work cross functionally. I saw in the job description for this role that this team works heavily with the business side and I’m excited for the opportunity to work in a more dynamic space.”

The case study is one of the harder interviews to study for. You have to be familiar with the business problems that the company is facing, and usually need some exposure to issues and terms that are specific to the industry or even the team you’re interviewing for. This is where chatting with a connection or someone from the company beforehand can give you a leg up on the lingo.

Format: The interviewer will usually introduce some hypothetical problem by saying “a PM comes to you and says…” or “someone from our communications department comes to you and wants to know…” or something equivalent. The question is usually intentionally vague, and it’s your task to answer it and talk through a process you might use to make progress on the business problem.

Steps to answering the question:

  1. Make sure you understand what the goal is. Reiterate what they have said and say things like “just to make sure I’m understanding–the deliverable is an estimate of ad traffic for the month of May?” or “how will the deliverable be used by this fictional marketing team?” This will ensure that you don’t go off on the wrong path or answer a question that wasn’t asked.
  2. List the most important factors that you’re considering. I usually take the naive approach first, and iterate later instead of trying to come up with something particularly sophisticated off the bat.
  3. Determine your most important metric or metrics, and explain why before you go on to estimate them. Example: “I’ll use average number of users who have ‘checked in’ with the app at this restaurant over the last 14 days to determine if our UI change had an impact.”
  4. Establish what data is available and briefly describe the features. Even better if you already know what the data that is actually used internally looks like, or what columns those tables contain. Example: “Given a table that contains user_id, time_of_check_in, col3, col4, I would want to compute [metric] from [subset of users] in the following way…”
  5. Select a model that you could use to estimate the metric you’re interested in, or a model that you could use to compare the two groups in an A/B test if that is part of the problem’s premise. Again, I usually mention a simple linear or logistic regression, or tree based model as a first pass in order to move the problem along. Usually the follow up questions are things like “how would you handle high leverage points / outliers?” or “how would you interpret the coefficient on [covariate X_j] in the model output?” or “can your model handle categorical features? How do you interpret the output in that case?” or “if [modification to the original question], how will that change your estimate?”
  6. A final recommendation. How will you decide how to proceed? Usually there is a clear “yes” or “no” answer that will be set up for you, along with some explanation of the tradeoffs involved.
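
To make steps 3 to 5 concrete, here is a minimal standard-library sketch under loud assumptions. The check-in rows, the counts for the two A/B groups, and both function names are hypothetical. The two-proportion z-test is just one simple first-pass way to compare groups, not a claim about what any particular company uses.

```python
from datetime import date, timedelta
from math import sqrt
from statistics import NormalDist


def avg_daily_users(check_ins, end_day, window_days=14):
    """Average number of distinct users checking in per day over the window.

    `check_ins` is an iterable of (user_id, day) pairs. Days with no
    check-ins count as zero toward the average.
    """
    start_day = end_day - timedelta(days=window_days - 1)
    users_by_day = {}
    for user_id, day in check_ins:
        if start_day <= day <= end_day:
            users_by_day.setdefault(day, set()).add(user_id)
    return sum(len(users) for users in users_by_day.values()) / window_days


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical (user_id, time_of_check_in) rows, truncated to dates.
rows = [
    (1, date(2021, 5, 1)),
    (2, date(2021, 5, 1)),
    (1, date(2021, 5, 1)),  # repeat check-in, same user and day
    (1, date(2021, 5, 2)),
    (3, date(2021, 4, 1)),  # outside the 14-day window
]
metric = avg_daily_users(rows, end_day=date(2021, 5, 14))

# Hypothetical A/B counts: users who checked in at least once / users exposed.
z, p_value = two_proportion_z_test(success_a=100, n_a=1000,
                                   success_b=130, n_b=1000)
print(metric, z, p_value)
```

In an interview, the follow-up questions would probably push on the assumptions here: independence between users, whether 14 days covers weekly seasonality, and whether a regression with covariates would give a tighter estimate.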

How can you practice for the case study?

I think one helpful strategy is to look at the app or product and determine everything that could possibly go wrong, or think of features you wish the app / product had and think of ways to implement it. For instance, when you go to a maps app, it might say “this location is somewhat busy at this time”. How do they compute that? What level of certainty do they have?
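
As a practice exercise for the “how busy is this place” example, here is one naive first-pass sketch. It is purely an assumption about how one might reason through it, not a description of how any real maps app works. The idea is to compare the current visitor count for this hour against historical counts for the same hour, and to report how uncertain the “typical” value is. The data, function name, and label thresholds are all made up.

```python
from math import sqrt
from statistics import mean, stdev


def busyness_estimate(historical_counts, current_count):
    """Compare a current visitor count to historical counts for this hour.

    Returns (typical, std_error, label), where `std_error` is the
    uncertainty in the typical count and `label` is a coarse description
    based on how many standard deviations the current count sits from it.
    """
    typical = mean(historical_counts)
    spread = stdev(historical_counts)  # needs at least two observations
    if spread == 0:
        raise ValueError("historical counts have no variation")
    std_error = spread / sqrt(len(historical_counts))
    z_score = (current_count - typical) / spread
    if z_score > 1:
        label = "busier than usual"
    elif z_score < -1:
        label = "less busy than usual"
    else:
        label = "about as busy as usual"
    return typical, std_error, label


# Made-up visitor counts for the same weekday and hour over five past weeks.
history = [40, 50, 45, 55, 60]
print(busyness_estimate(history, current_count=70))
```

From here the interesting case-study questions are the ones the naive version ignores: where the counts come from, how to handle locations with very little data, and how holidays or weather shift the baseline.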

Check in with your interviewer to confirm: “is this what you had in mind? I want to make sure I’m answering the question and not going off on a tangent.” Experienced or kind interviewers will give you hints or keep you on track instead of trying to make a “gotcha” moment where they expose how much you suck.

Give a “first pass” first and let the interviewer ask follow up questions. The longer you talk uninterrupted, the more opportunities you have to get off track, which you don’t want.

Treat the time like a collaboration where you’re going back and forth with the interviewer. Some of them have been explicitly trained to measure your ability to go back and forth or “jam” and it may be part of your evaluation.

Give your first impression of the problem and its initial tradeoffs before you dive into solving it.

Ask how many questions they are hoping to get through in the time!! I got mildly burned by this in one of my interviews because I thought there was just one question remaining, when in fact the interviewer wanted to get through 2-3 more. I looked at the clock and saw we had time so decided to explore and really flesh out the problem, and it ended up with me passing the interview but with lukewarm reviews :(.

There are much better guides out there for salary stuff, but basically, your total compensation is made up of base salary, stock options or RSUs, an annual bonus (usually paid out a few months into the next calendar year), and a signing bonus. These numbers tend to be based on: number of years of experience, interview performance, what level you will be joining at, whether you have a competing offer, and certain idiosyncratic factors such as whether you have research or product experience in the same area as your position (though this is often endogenous to interview performance, since case studies are usually domain specific). Broadly, you can think of these numbers as a way for firms to incentivize you to stay longer (depending on the stock vesting structure; see Amazon) or as a way to get you on board ASAP (see Uber’s 35% vest in the 1st year).

The lowest salary I was quoted during this time period was for an L4 equivalent position (I ended up getting levelled at L5 for this firm but I think the recruiter wasn’t in the loop so she quoted me on L3 and L4 numbers), and it was 140k base, 250k stock /4 years, 20k bonus. The highest salary I was quoted during my time was for an L6 position: 250k base, 800k stock / 4 years, and 50k bonus (they adjusted it down when I didn’t perform well on one of the interviews). I almost spit out my coffee when the recruiter said those numbers; it would be almost double my previous TC and it comes out to an even 500k/year, which is insane. I ended up with an L5 ish position with some leadership responsibilities at around 400k TC at a different firm. It was less than another offer by ~20k but it was much better in terms of team, growth potential, and domain.
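
For anyone new to reading offers, the TC arithmetic behind those quotes can be sketched in a few lines of Python. This assumes the stock grant vests evenly over four years, which is a simplification (real schedules can be front- or back-loaded), and it leaves out signing bonuses because they are one-time payments. The function name is made up for illustration.

```python
def annual_total_comp(base, stock_grant, bonus, vest_years=4):
    """Annualized total compensation, in the same units as the inputs.

    Assumes the stock grant vests evenly over `vest_years`.
    """
    return base + stock_grant / vest_years + bonus


# The two quotes from the paragraph above, in thousands of dollars.
low_offer = annual_total_comp(base=140, stock_grant=250, bonus=20)
high_offer = annual_total_comp(base=250, stock_grant=800, bonus=50)
print(low_offer)   # 222.5
print(high_offer)  # 500.0
```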

Interviews can often be a sort of shibboleth test: they’re looking for some special password, or checking that you already know their approaches and methods and have memorized some equation or rule, rather than testing your critical thinking. At one of the firms I interviewed with, I went through two cycles for two different positions. I failed one and then passed the second one to get to the offer stage. However, I recycled a lot of information from the first interview process and was seen as “insightful” or “impressive” when really I just had some insider knowledge from asking one of my favorite questions back to the interviewer at the end: “How could I have approached this problem in a way that would have really impressed you?” or some variant like “what components or angles on this problem did I neglect during our time?”. In the case that you fail the interview, you might still get a shot down the road, so it’s good to keep notes like this to have some “ground truth” benchmarks.

Take Home test: mine was about the length of an undergraduate statistics or statistical computing homework assignment. Explore some data, share some results. Very broadly, I made 2-3 neat figures showing trends over time using a time series model for one insight and a regression for another. Then there were some probability and SQL questions which were relatively simple. The whole thing is supposed to take no more than 2-4 hours but I really wanted to go overboard and show my skills so I ended up spending closer to 6-8 hours. I know, we all hate take home tests but I took it as an opportunity to shine and challenge myself so it ended up being a bit fun in that regard.

Good Data Scientists != Good Interviewers: I had a hard time connecting with or understanding several of my interviewers. Some of them seemed to be skeptical straight off the bat. You could call this an “idiosyncratic match effect” that takes into account your fit with the interviewer, and it’s impossible to optimize for every possible interviewer; sometimes you just have to really shine in every other possible way and hope they don’t write you a bad review.

  • One of them was in a lower position, but we graduated from the same program (she finished a few years before me) and when I tried to connect with her about that, she asked “wait, are you applying for [position above her]??” and then got super weird. I should have not connected with her about that…
  • Another interviewer really attacked my first approach and grilled me with questions instead of encouraging or collaborating with me.
  • I had only one interview that was extremely well done, and I think it’s because the person had teaching experience in the past and was also disillusioned with the interview process in tech in general, so they did an excellent job of setting me up for success and explaining / keeping the time more collaborative.
  • One of the interviewers had not practiced the interview before administering it, and they said it was only their 2nd time going through it. We had technology issues and then his feedback (according to the recruiter) was that I didn’t get enough done in the time. Wtf? I was pissed about that and sent an email to the recruiter to forward to the hiring manager, but I don’t think that feedback does anything but make me look like a Karen, lol. Managers always trust their people on the inside more than some random candidate.

Levelling: In one of my interviews, I really kicked ass. They were so impressed that they wanted to level me up, and I swear that for the week leading up to my onsite the recruiter was treating me like royalty, quoting these massive numbers for TC and mentioning that if I performed the same way in the onsite, they could have the comp team work on a special package for me, lol! I ended up doing well in the onsite, but not knocking it out of the park, and I could feel the tone change back to a normal requisition process haha. Moral of the story: treat every interview like it’s an opportunity to get levelled up, because there’s a chance that you can get one of those comp packages that you’ve always dreamed about.

Interviewing as a classification problem: it’s not great. So just don’t take it too personally when you don’t score well in an interview or don’t have the right answer. It’s unfair in a lot of ways, and biased towards a certain genre of candidate. Just always assume you’re a Type II error (a false negative: a qualified candidate who got rejected) and channel that energy into learning more, practicing more, and polishing your interview skills!
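
To put a rough number on the classification framing, here is a small Bayes’ rule sketch. Every rate in it is a made-up assumption for illustration, not data from any company. It asks how likely a rejected candidate is to have been qualified, given an interview that is a noisy classifier.

```python
def p_qualified_given_rejected(base_rate, sensitivity, specificity):
    """P(candidate is qualified | interview says reject), via Bayes' rule.

    base_rate:   P(qualified) among candidates who reach the interview
    sensitivity: P(pass | qualified)
    specificity: P(reject | not qualified)
    """
    false_negatives = base_rate * (1 - sensitivity)    # qualified, rejected
    true_negatives = (1 - base_rate) * specificity     # unqualified, rejected
    return false_negatives / (false_negatives + true_negatives)


# Hypothetical rates: half the pool is qualified, the interview passes 60%
# of qualified candidates and rejects 90% of unqualified ones.
share = p_qualified_given_rejected(base_rate=0.5, sensitivity=0.6,
                                   specificity=0.9)
print(round(share, 3))  # 0.308
```

Under those invented rates, roughly three in ten rejections would be Type II errors, which is one way to see why a single rejection says little about you.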

  • Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing - and Emma Ding’s YouTube channel where she walks through a lot of excellent examples.
  • An Introduction to Statistical Learning (ISL) - especially the chapters on linear models. I prefer ISL to ESL (The Elements of Statistical Learning) because it reads easier and gives some great examples that help you develop the language to talk about the problems confidently.
  • Leetcode (I got a premium membership for the months I was interviewing)
  • interviewing.io’s blog was extremely useful for my prep and confidence. Aline is taking on a huge challenge in trying to reform the overall big tech interview ecosystem and I love her content.
  • Engineering blogs / research output from the department / team you’re interested in. I swear I read more research & technical blog posts than I ever have in those two months and it helped a lot to develop intuition on each company’s problems.