Data Engineer Interview Questions

20,965 data engineer interview questions shared by candidates

#
#      sales                                products
#      +------------------+---------+       +---------------------+---------+
#      | product_id       | INTEGER |>------| product_id          | INTEGER |
#      | store_id         | INTEGER |  +---<| product_class_id    | INTEGER |
#      | customer_id      | INTEGER |  |    | brand_name          | VARCHAR |
# +---<| promotion_id     | INTEGER |  |    | product_name        | VARCHAR |
# |    | store_sales      | DECIMAL |  |    | is_low_fat_flg      | TINYINT |
# |    | store_cost       | DECIMAL |  |    | is_recyclable_flg   | TINYINT |
# |    | units_sold       | DECIMAL |  |    | gross_weight        | DECIMAL |
# |    | transaction_date | DATE    |  |    | net_weight          | DECIMAL |
# |    +------------------+---------+  |    +---------------------+---------+
# |                                    |
# |    promotions                      |    product_classes
# |    +------------------+---------+  |    +---------------------+---------+
# +----| promotion_id     | INTEGER |  +----| product_class_id    | INTEGER |
#      | promotion_name   | VARCHAR |       | product_subcategory | VARCHAR |
#      | media_type       | VARCHAR |       | product_category    | VARCHAR |
#      | cost             | DECIMAL |       | product_department  | VARCHAR |
#      | start_date       | DATE    |       | product_family      | VARCHAR |
#      | end_date         | DATE    |       +---------------------+---------+
#      +------------------+---------+
# */
# Question 1:
# -- What percent of all products in the grocery chain's catalog
# -- are both low fat and recyclable?

Data Engineer

Interviewed at Meta

3.6
Jun 8, 2020
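One way to approach Question 1 above is a single conditional aggregate over the products table. The sketch below runs it against an in-memory SQLite database; the sample rows and the helper names (`pct_low_fat_and_recyclable`, `demo_connection`) are made up for illustration, and it assumes the two flag columns hold 1 for "yes" and 0 for "no".

```python
import sqlite3


def pct_low_fat_and_recyclable(conn):
    """Return the percent of catalog products flagged both low fat and recyclable."""
    row = conn.execute(
        """
        SELECT 100.0 * SUM(CASE WHEN is_low_fat_flg = 1 AND is_recyclable_flg = 1
                                THEN 1 ELSE 0 END) / COUNT(*)
        FROM products
        """
    ).fetchone()
    return row[0]


def demo_connection():
    """Build a tiny in-memory products table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "product_id INTEGER, product_name VARCHAR, "
        "is_low_fat_flg TINYINT, is_recyclable_flg TINYINT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(1, "milk", 1, 1), (2, "chips", 0, 1), (3, "yogurt", 1, 0), (4, "water", 1, 1)],
    )
    return conn


if __name__ == "__main__":
    # Two of the four sample products carry both flags.
    print(pct_low_fat_and_recyclable(demo_connection()))
```

Multiplying by 100.0 before dividing keeps the arithmetic in floating point, which avoids the integer division that some SQL engines would otherwise apply.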


First round (written): three SQL questions and one on what you would do to speed up inserts into a huge table.
Second round: get the players with the highest streak; get the details of the employee who has the most members in a team; Python: return the numbers that occur most often in a list.
Third round: behavioral questions and one question on Python lists. From two lists, get the numbers they have in common and return them in the following way: [1,2,3,3,1,1,1], [1,1,2,2,3] -> return [1,1,2,3]

Data Engineer

Interviewed at Amazon

3.5
Apr 8, 2021
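The two Python list questions above can both be sketched with `collections.Counter`. This is one reading of the prompts rather than the interviewer's expected answer: "maximum count" is taken to mean every value tied for the highest frequency, and "common numbers" is taken as a multiset intersection (each value kept as many times as it appears in both lists), which matches the posted example. The function names are illustrative.

```python
from collections import Counter


def most_frequent(nums):
    """Return every value tied for the highest frequency, in first-seen order."""
    if not nums:
        return []
    counts = Counter(nums)
    top = max(counts.values())
    return [n for n, c in counts.items() if c == top]


def common_elements(a, b):
    """Multiset intersection: keep each value min(count in a, count in b) times."""
    shared = Counter(a) & Counter(b)
    return sorted(shared.elements())


if __name__ == "__main__":
    print(most_frequent([1, 2, 2, 3, 3]))
    print(common_elements([1, 2, 3, 3, 1, 1, 1], [1, 1, 2, 2, 3]))
```

The `&` operator on two counters keeps the minimum count per key, which is what turns four 1s and two 1s into exactly two 1s in the result.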


Python question: given a two-dimensional list, for example [[2,3],[3,4],[5]], where person 2 is friends with person 3 and so on, find how many friends each person has. Note that one person has no friends.
SQL question: find the top 10 colleges/companies that an average social person interacts with, or something along those lines. I split the query in two. I was not able to finish coding, but I was able to explain and write both parts; I didn't have time to test them.
There were also data modeling questions about a social network website. I can't give details.

Data Engineer

Interviewed at Meta

3.6
Nov 16, 2020
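A minimal sketch of the friend-count question above, under stated assumptions: each two-element entry is treated as a mutual friendship, and a one-element entry is treated as a person with no friends. The post does not spell out the input rules, so both are guesses, and `friend_counts` is an illustrative name.

```python
def friend_counts(pairs):
    """Map each person to the number of distinct friends they have."""
    friends = {}
    for entry in pairs:
        # Register everyone mentioned, so people without friends still appear.
        for person in entry:
            friends.setdefault(person, set())
        # A two-element entry is assumed to be a mutual friendship.
        if len(entry) == 2:
            a, b = entry
            if a != b:
                friends[a].add(b)
                friends[b].add(a)
    return {person: len(others) for person, others in friends.items()}


if __name__ == "__main__":
    print(friend_counts([[2, 3], [3, 4], [5]]))
```

Storing friends in sets means a pair that is listed twice is only counted once.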


Python

# 1. Return the number of times a given character occurs in a given string.
s1 = 'missisipi'
count = 0
for ch in s1:
    if ch == 's':
        count += 1
print(count)  # same result as s1.count('s')

# 2. Replace each None with the previous non-None value:
#    [1, None, 1, 2, None] --> [1, 1, 1, 2, 2]
arr = [1, None, 1, 2, None]
new_l = []
last = None  # most recent non-None value seen so far
for x in arr:
    if x is not None:
        last = x
    new_l.append(last)
print(new_l)

# 3. Given two sentences, construct an array that has the words
#    that appear in one sentence and not the other.
A = "Geeks for Geeks"
B = "Learning from Geeks for Geeks"
words_a = set(A.split())
words_b = set(B.split())
# Walk the words in first-seen order; keep those found in exactly one sentence.
unmatched = [w for w in dict.fromkeys(A.split() + B.split())
             if (w in words_a) != (w in words_b)]
print(unmatched)

# 4. Sort a dictionary's items by value, descending.
d = {"a": 4, "c": 3, "b": 12}
print(sorted(d.items(), key=lambda kv: kv[1], reverse=True))
# [('b', 12), ('a', 4), ('c', 3)]

SQL

#
#      sales                                products
#      +------------------+---------+       +---------------------+---------+
#      | product_id       | INTEGER |>------| product_id          | INTEGER |
#      | store_id         | INTEGER |  +---<| product_class_id    | INTEGER |
#      | customer_id      | INTEGER |  |    | brand_name          | VARCHAR |
# +---<| promotion_id     | INTEGER |  |    | product_name        | VARCHAR |
# |    | store_sales      | DECIMAL |  |    | is_low_fat_flg      | TINYINT |
# |    | store_cost       | DECIMAL |  |    | is_recyclable_flg   |…

-- 1. Find the top 5 products by sales among sales that had a promotion.
SELECT p.product_id, p.product_name, p.brand_name, SUM(s.store_sales) AS total_sales
FROM products p
INNER JOIN sales s ON p.product_id = s.product_id
WHERE s.promotion_id IS NOT NULL
GROUP BY p.product_id, p.product_name, p.brand_name
ORDER BY total_sales DESC
LIMIT 5;

-- 2. Of sales that had a valid promotion, the VP of marketing
--    wants to know what % of transactions occur on either
--    the very first day or the very last day of a promotion campaign.

-- Option A: one combined percentage.
SELECT SUM(CASE WHEN s.transaction_date IN (p.start_date, p.end_date) THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS percentage
FROM sales s
JOIN promotions p ON s.promotion_id = p.promotion_id;

-- Option B: first-day and last-day shares reported separately.
SELECT SUM(CASE WHEN s.transaction_date = p.start_date THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS first_day_sales,
       SUM(CASE WHEN s.transaction_date = p.end_date THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS last_day_sales
FROM sales s
JOIN promotions p ON s.promotion_id = p.promotion_id;

-- Option C: MySQL shorthand, where a boolean expression evaluates to 0 or 1.
SELECT AVG(transaction_date IN (p.start_date, p.end_date)) * 100 AS first_last_pct
FROM sales s
JOIN promotions p USING (promotion_id);

Data Engineer

Interviewed at Meta

3.6
Aug 25, 2020
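To sanity-check the first-day/last-day promotion query above, it can be run against a small in-memory SQLite database. This is only a sketch: the sample rows are invented, dates are stored as ISO-8601 text, and "valid promotion" is assumed to mean that the sale's promotion_id matches a row in promotions (the inner join enforces that). `first_last_day_pct` and `demo_connection` are illustrative names.

```python
import sqlite3

QUERY = """
SELECT 100.0 * SUM(CASE WHEN s.transaction_date IN (p.start_date, p.end_date)
                        THEN 1 ELSE 0 END) / COUNT(*)
FROM sales s
JOIN promotions p ON s.promotion_id = p.promotion_id
"""


def first_last_day_pct(conn):
    """Percent of promoted sales that fall on a campaign's first or last day."""
    return conn.execute(QUERY).fetchone()[0]


def demo_connection():
    """Build tiny sales and promotions tables (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE promotions (promotion_id INTEGER, start_date DATE, end_date DATE)")
    conn.execute("CREATE TABLE sales (product_id INTEGER, promotion_id INTEGER, transaction_date DATE)")
    conn.executemany(
        "INSERT INTO promotions VALUES (?, ?, ?)",
        [(1, "2020-01-01", "2020-01-10"), (2, "2020-02-01", "2020-02-05")],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (10, 1, "2020-01-01"),     # first day of promotion 1
            (11, 1, "2020-01-05"),     # mid-campaign
            (12, 1, "2020-01-10"),     # last day of promotion 1
            (13, 2, "2020-02-03"),     # mid-campaign
            (14, None, "2020-01-01"),  # no promotion; dropped by the join
        ],
    )
    return conn


if __name__ == "__main__":
    print(first_last_day_pct(demo_connection()))
```

The sale with no promotion is excluded by the join, so the denominator counts only promoted transactions.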


