Data Scientist applicants have rated the difficulty of the interview process at Google 4 out of 5 (where 5 is the most difficult) and assessed their interview experience as 100% positive. For comparison, the company average is 68.7% positive. These figures are based on Glassdoor user ratings.
I applied through an employee referral. I interviewed at Google (Mumbai) in Jan 2020
Interview
It was a unique experience of the kind you may find at FAANG companies; they want to understand your way of thinking and your problem-solving skills.
I think that if you know the mathematics behind all the concepts, you are good to go.
Interview questions [1]
Question 1
General Statistics
In what situation would you consider mean over median?
For sample size n, the margin of error is 3. How many more samples do we need to make the margin of error 0.3?
What is the assumption of error in linear regression?
Given data from two product campaigns, how could you do an A/B test if we see a 3% increase for one product?
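The margin-of-error question above can be worked through under the usual assumption that the margin of error scales as 1/sqrt(n). The following is a minimal sketch under that assumption only; the function name is illustrative and not part of the original question.

```python
from fractions import Fraction

def sample_size_multiplier(moe_now, moe_target):
    """Factor by which n must grow, assuming margin of error scales as 1/sqrt(n)."""
    return (Fraction(moe_now) / Fraction(moe_target)) ** 2

# Margin of error 3 -> 0.3 is a tenfold reduction.
multiplier = sample_size_multiplier(Fraction(3), Fraction(3, 10))
# Additional samples needed, expressed as a multiple of the original n.
extra_multiple_of_n = multiplier - 1
print(multiplier, extra_multiple_of_n)
```

Under this assumption the total sample size has to be 100n, so 99n additional samples are needed.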
Statistical Probability
I have a deck of cards and draw one at random. What is the probability that you guess it correctly?
Explain a probability distribution that is not normal and how you would apply it.
Given X and Y, both uniformly distributed with mean 0 and standard deviation 1, what’s the probability that 2X > Y?
There are four people in an elevator and four floors in a building. What’s the probability that each person gets off on a different floor?
Make an unfair coin fair.
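Two of the probability questions above lend themselves to a quick check in code. The sketch below assumes that each elevator passenger picks one of the four floors independently and uniformly, and it uses von Neumann's trick as one well-known way to get fair flips from a biased coin. The function names and the 0.8 bias are illustrative assumptions.

```python
import random
from fractions import Fraction
from itertools import product

def prob_all_different_floors(people=4, floors=4):
    """Exact probability by enumeration, assuming independent uniform floor choices."""
    outcomes = list(product(range(floors), repeat=people))
    distinct = sum(1 for o in outcomes if len(set(o)) == people)
    return Fraction(distinct, len(outcomes))

def fair_flip(biased_flip):
    """Von Neumann's trick: flip twice, keep the first flip only if the two differ."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a

elevator_prob = prob_all_different_floors()

rng = random.Random(0)                   # fixed seed, for reproducibility
biased = lambda: rng.random() < 0.8      # hypothetical coin that lands heads 80% of the time
flips = [fair_flip(biased) for _ in range(20000)]
fair_share = sum(flips) / len(flips)
print(elevator_prob, round(fair_share, 3))
```

The enumeration gives 4!/4^4 = 3/32. The coin trick works because heads-then-tails and tails-then-heads are equally likely for any fixed bias, as long as the flips are independent.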
Machine Learning
If the labels are known in a clustering project, how would you evaluate the performance of the model?
Why use feature selection?
If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of the coefficients?
What is the difference between K-means and EM?
When using a Gaussian mixture model, how do you know it is applicable?
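For the question about evaluating a clustering when the labels are known, one possible answer is an external measure such as purity. The sketch below is an illustration only: the `purity` helper and the toy data are assumptions, not something taken from the interview.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority true label of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster[cluster][label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)

# Toy data: cluster 0 holds two "a" and one "b"; cluster 1 holds three "b".
example_purity = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(example_purity)
```

Purity tends to rise as the number of clusters grows, so chance-adjusted measures such as the adjusted Rand index are a common alternative.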
I applied online. The process took 2 weeks. I interviewed at Google in Aug 2021
Interview
There was one coding round and one round for role-related knowledge. The first round went well, but the second one was harder. The interviewers were friendly. Questions covered Python, systems design, DBMS, SQL, and architecture. The basics of DL were also asked.
I applied through an employee referral. The process took 5 weeks. I interviewed at Google (Mountain View, CA) in Aug 2021
Interview
A very standard Google DS interview process: 2 technical sessions, 1 behavioral (BQ) session, then another 2 technical sessions.
My experience was very standard, except that 2 of the 4 technical sessions were ruined by the interviewers asking inappropriate questions.
I complained and was given a backup session as a final round, but it was useless and could not change my overall review.
Difficulty is average, as the hard questions were either wrong or beyond the interviewers' control (even the interviewers solved them wrong!).
Interview questions [2]
Question 1
R1: You need to diagnose an error in the program:
The Google Maps team wants to understand whether dismiss rate is a reasonable metric for the user experience of a button in the app. The hypothesis is that the higher the dismiss rate, the worse the user experience. Hence, they run a simulation of an A/A comparison. In the simulation, signal = all interactions with the button (click, dismiss, ignore, ...), and negative signal = dismiss.
The pseudocode is as follows. Note that we might refer to the numerator and denominator often in later discussions.
result_pval = []
for replica in (1:1000):
    # the number of overall signals follows a roughly bell shaped distribution
    num_signal_control = round(random.normal(150, std = 30))
    num_signal_treatment = round(random.normal(150, std = 30))
    # given the number of overall signals, the number of negative signals follows a binomial distribution
    num_negative_signal_control = random.binomial(num_signal_control, 0.5)
    num_negative_signal_treatment = random.binomial(num_signal_treatment, 0.5)
    # define numerator and denominator of the test statistics
    # the idea of the denominator is: we use Normal approximation to estimate the variance of the numerator
    p_hat_control = num_negative_signal_control / num_signal_control
    p_hat_treatment = num_negative_signal_treatment / num_signal_treatment
    numerator = p_hat_treatment - p_hat_control
    denominator = sqrt(
        p_hat_treatment*(1-p_hat_treatment)/num_signal_treatment
        + p_hat_control*(1-p_hat_control)/num_signal_control
    )
    testing_statistics = numerator / denominator
    # calculate p value and append to the result vector
    p_value = 2*std_normal_area_under_curve(
        lower = abs(testing_statistics), upper = infinity)
    result_pval = append(result_pval, p_value)
plot_histogram(result_pval)
The histogram of the p-values is skewed to the right on [0,1]. In other words, there are more p-values < 0.5 than p-values > 0.5.
Q1: Is such a distribution of p-value expected?
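To experiment with Q1, the pseudocode above can be transcribed into runnable Python. The version below is a sketch using only the standard library. The fixed seed, the guards against non-positive counts and a zero denominator, and the use of `math.erfc` for the normal tail area are assumptions added here and are not part of the interview material. It does not plot anything; it only reports the share of p-values below 0.5.

```python
import math
import random

def simulate_aa_pvalues(n_replicas=1000, seed=0):
    """Return one two-sided p-value per simulated A/A comparison."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    p_values = []
    for _ in range(n_replicas):
        # Total signals per arm: roughly bell-shaped around 150.
        # max(1, ...) is an added guard against a non-positive draw.
        n_control = max(1, round(rng.gauss(150, 30)))
        n_treatment = max(1, round(rng.gauss(150, 30)))
        # Negative signals: Binomial(n, 0.5), drawn as a sum of Bernoulli trials.
        neg_control = sum(rng.random() < 0.5 for _ in range(n_control))
        neg_treatment = sum(rng.random() < 0.5 for _ in range(n_treatment))
        p_control = neg_control / n_control
        p_treatment = neg_treatment / n_treatment
        numerator = p_treatment - p_control
        denominator = math.sqrt(
            p_treatment * (1 - p_treatment) / n_treatment
            + p_control * (1 - p_control) / n_control
        )
        if denominator == 0:
            continue  # added guard: skip degenerate replicas
        z = numerator / denominator
        # 2 * P(Z > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))
    return p_values

p_values = simulate_aa_pvalues()
share_below_half = sum(p < 0.5 for p in p_values) / len(p_values)
print(f"share of p-values below 0.5: {share_below_half:.3f}")
```

Changing the seed or the number of replicas makes it easy to see how stable the shape of the p-value distribution is.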
R4:
Assume the distribution of children per family is given by:
# children | 0   | 1    | 2   | 3    | 4   | >=5
p          | 0.3 | 0.25 | 0.2 | 0.15 | 0.1 | 0
Consider a random girl in the population of children. What's the probability that she has a sister?
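One way to approach this question is to note that a randomly chosen child is more likely to come from a larger family, so family sizes must be weighted by the number of children. The sketch below assumes that each child is a girl with probability 1/2, independently of siblings and of family size. Those assumptions and the function name are not stated in the question.

```python
from fractions import Fraction

# Distribution of children per family, taken from the table above.
family_size_pmf = {
    0: Fraction(3, 10),
    1: Fraction(1, 4),
    2: Fraction(1, 5),
    3: Fraction(3, 20),
    4: Fraction(1, 10),
}

def prob_random_girl_has_sister(pmf, p_girl=Fraction(1, 2)):
    """P(at least one sister) for a girl drawn uniformly from all girls."""
    # A family with k children contributes weight proportional to k * P(size = k);
    # the common factor p_girl cancels between numerator and denominator.
    total_weight = sum(k * p for k, p in pmf.items())
    # Given k children, she has no sister only if the other k - 1 are all boys.
    with_sister = sum(
        k * p * (1 - (1 - p_girl) ** (k - 1))
        for k, p in pmf.items()
        if k >= 1
    )
    return with_sister / total_weight

sister_prob = prob_random_girl_has_sister(family_size_pmf)
print(sister_prob, float(sister_prob))
```

Under these assumptions the result is 71/120, or about 0.59.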