Data Scientist Stage Interview Questions

40,227 data scientist stage interview questions shared by candidates

1) Provided a table with user_id and dates they visited platform, find the top 100 users with the longest continuous streak of visiting the platform as of yesterday. 2) Provided a table with page_id, event timestamp and a flag for a state (which is on/off), find the number of pages that are currently on.
avatar

Data Scientist

Interviewed at Meta

3.6
Apr 29, 2020

1) Provided a table with user_id and dates they visited platform, find the top 100 users with the longest continuous streak of visiting the platform as of yesterday. 2) Provided a table with page_id, event timestamp and a flag for a state (which is on/off), find the number of pages that are currently on.

There is a table that tracks every time a user turns a feature on or off, with columns user_id, action ("on" or "off), date, and time. How many users turned the feature on today? How many users have ever turned the feature on? In a table that tracks the status of every user every day, how would you add today's data to it?
avatar

Data Scientist

Interviewed at Meta

3.6
Mar 29, 2017

There is a table that tracks every time a user turns a feature on or off, with columns user_id, action ("on" or "off), date, and time. How many users turned the feature on today? How many users have ever turned the feature on? In a table that tracks the status of every user every day, how would you add today's data to it?

They asked probability question: 1) The probability that item an item at location A is 0.6 , and 0.8 at location B. What is the probability that item would be found on Amazon website. 2). I have table 1, with 1million records, with ID, AGE (column names) , Table 2 with 100 records with ID and Salary then the interviewer gave me the following SQL script SELECT A.ID,A.AGE,B.SALARY FROM TABLE 1 A LEFT JOIN TABLE 2 B ON A.ID = B.ID + WHERE B.SALARY > 50000 ( HE ASKED TO MODIFY THIS LINE OF QUERY) How many records would be returned? 3. Give a csv file with ID, and Quantity columns, 50million records and size of data is 2gig, write a program in any language of your choice to aggregate the QUANTITY column.
avatar

Data Scientist

Interviewed at Amazon

3.5
Oct 27, 2016

They asked probability question: 1) The probability that item an item at location A is 0.6 , and 0.8 at location B. What is the probability that item would be found on Amazon website. 2). I have table 1, with 1million records, with ID, AGE (column names) , Table 2 with 100 records with ID and Salary then the interviewer gave me the following SQL script SELECT A.ID,A.AGE,B.SALARY FROM TABLE 1 A LEFT JOIN TABLE 2 B ON A.ID = B.ID + WHERE B.SALARY > 50000 ( HE ASKED TO MODIFY THIS LINE OF QUERY) How many records would be returned? 3. Give a csv file with ID, and Quantity columns, 50million records and size of data is 2gig, write a program in any language of your choice to aggregate the QUANTITY column.

Given the following data: Table: searches Columns: date STRING date of the search, search_id INT the unique identifier of each search, user_id INT the unique identifier of the searcher, age_group STRING ('<30', '30-50', '50+'), search_query STRING the text of the search query Sample Rows: date | search_id | user_id | age_group | search_query -------------------------------------------------------------------- '2020-01-01' | 101 | 9991 | '<30' | 'justin bieber' '2020-01-01' | 102 | 9991 | '<30' | 'menlo park' '2020-01-01' | 103 | 5555 | '30-50' | 'john' '2020-01-01' | 104 | 1234 | '50+' | 'funny cats' Table: search_results Columns: date STRING date of the search action, search_id INT the unique identifier of each search, result_id INT the unique identifier of the result, result_type STRING (page, event, group, person, post, etc.), clicked BOOLEAN did the user click on the result? Sample Rows: date | search_id | result_id | result_type | clicked -------------------------------------------------------------------- '2020-01-01' | 101 | 1001 | 'page' | TRUE '2020-01-01' | 101 | 1002 | 'event' | FALSE '2020-01-01' | 101 | 1003 | 'event' | FALSE '2020-01-01' | 101 | 1004 | 'group' | FALSE Over the last 7 days, how many users made more than 10 searches? You notice that the number of users that clicked on a search result about a Facebook Event increased 10% week-over-week. How would you investigate? How do you decide if this is a good thing or a bad thing? The Events team wants to up-rank Events such that they show up higher in Search. How would you determine if this is a good idea or not?
avatar

Data Scientist

Interviewed at Meta

3.6
Apr 23, 2021

Given the following data: Table: searches Columns: date STRING date of the search, search_id INT the unique identifier of each search, user_id INT the unique identifier of the searcher, age_group STRING ('<30', '30-50', '50+'), search_query STRING the text of the search query Sample Rows: date | search_id | user_id | age_group | search_query -------------------------------------------------------------------- '2020-01-01' | 101 | 9991 | '<30' | 'justin bieber' '2020-01-01' | 102 | 9991 | '<30' | 'menlo park' '2020-01-01' | 103 | 5555 | '30-50' | 'john' '2020-01-01' | 104 | 1234 | '50+' | 'funny cats' Table: search_results Columns: date STRING date of the search action, search_id INT the unique identifier of each search, result_id INT the unique identifier of the result, result_type STRING (page, event, group, person, post, etc.), clicked BOOLEAN did the user click on the result? Sample Rows: date | search_id | result_id | result_type | clicked -------------------------------------------------------------------- '2020-01-01' | 101 | 1001 | 'page' | TRUE '2020-01-01' | 101 | 1002 | 'event' | FALSE '2020-01-01' | 101 | 1003 | 'event' | FALSE '2020-01-01' | 101 | 1004 | 'group' | FALSE Over the last 7 days, how many users made more than 10 searches? You notice that the number of users that clicked on a search result about a Facebook Event increased 10% week-over-week. How would you investigate? How do you decide if this is a good thing or a bad thing? The Events team wants to up-rank Events such that they show up higher in Search. How would you determine if this is a good idea or not?

Viewing 11 - 20 interview questions

Glassdoor has 40,227 interview questions and reports from Data scientist stage interviews. Prepare for your interview. Get hired. Love your job.