Serialize and deserialize a binary tree.
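Below is a minimal sketch of one common approach to the tree question above, written in Python. It does a pre-order traversal and writes a marker for every missing child, so the token stream alone is enough to rebuild the tree. The `TreeNode` class, the `#` marker, the comma separator, and integer node values are illustrative assumptions that are not part of the original question.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class TreeNode:
    val: int
    left: Optional[TreeNode] = None
    right: Optional[TreeNode] = None


NULL = "#"  # assumed marker for a missing child


def serialize(root: Optional[TreeNode]) -> str:
    """Pre-order traversal; every missing child is written as NULL."""
    out: list[str] = []

    def walk(node: Optional[TreeNode]) -> None:
        if node is None:
            out.append(NULL)
            return
        out.append(str(node.val))
        walk(node.left)
        walk(node.right)

    walk(root)
    return ",".join(out)


def deserialize(data: str) -> Optional[TreeNode]:
    """Rebuild the tree by consuming tokens in the same pre-order."""
    tokens: Iterator[str] = iter(data.split(","))

    def build() -> Optional[TreeNode]:
        tok = next(tokens)
        if tok == NULL:
            return None
        node = TreeNode(int(tok))
        node.left = build()
        node.right = build()
        return node

    return build()
```

For the tree `1(2, 3(4, 5))`, this produces `"1,2,#,#,3,4,#,#,5,#,#"`. Both functions run in O(n). The recursion depth equals the tree height, so a very deep (degenerate) tree could hit Python's default recursion limit and would need an explicit stack instead.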
Big Data Engineer Interview Questions
1,228 big data engineer interview questions shared by candidates
1. SQL:

**d_customers**

| customer_id | membership_start_date | membership_end_date |
|-------------|-----------------------|---------------------|
| 114 | 2015-01-01 | 2015-02-15 |
| 116 | 2015-02-01 | 2015-03-15 |
| 120 | 2015-02-15 | 2015-04-01 |
| 221 | 2015-03-15 | 2015-10-01 |
| 120 | 2015-05-15 | 2015-07-01 |

**d_shipments**

| shipment_id | ship_date | receiving_customer_id | quantity |
|-------------|------------|-----------------------|----------|
| 1 | 2015-02-13 | 114 | 2 |
| 2 | 2015-03-01 | 116 | 4 |
| 2 | 2015-03-01 | 116 | 1 |
| 3 | 2015-06-01 | 116 | 1 |
| 4 | 2015-03-01 | 120 | 6 |
| 5 | 2015-10-01 | 120 | 3 |
| 6 | 2015-03-01 | 321 | 10 |

Populate **a_shipments**:

| ship_date | customer_id | is_member | quantity |
|-----------|-------------|-----------|----------|

The column [is_member] is 'Y' if [ship_date] is between [membership_start_date] and [membership_end_date], else 'N'.

Sample output:

| ship_date | customer_id | is_member | quantity |
|------------|-------------|-----------|----------|
| 2015-03-01 | 116 | Y | 5 |
| 2015-06-01 | 116 | N | 1 |

2. Coding task: check whether a string is a palindrome. I was asked to code both an iterative and a recursive solution.

3. Big Data questions:
3.1. Which Hadoop file formats do you know? What is the difference between the Avro and Parquet formats?
3.2. How is compression used in the Avro and Parquet formats?
3.3. What are the most difficult big data performance challenges you have faced and resolved?
3.4. Spark optimization: Spark's cost-based optimizer.
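Here is one possible answer to the a_shipments SQL task above, wrapped in Python's `sqlite3` module so that it runs end to end. This is a sketch under several assumptions. Dates are stored as ISO-8601 text, which compares correctly as strings. BETWEEN is treated as inclusive. Shipment lines are summed per (ship_date, customer_id), as the sample output implies: shipment 2's two lines give 4 + 1 = 5. Membership is tested with `EXISTS` rather than a join, which avoids double counting when a customer has several membership periods (customer 120). Other SQL dialects may need small syntax changes.

```python
import sqlite3

# Rows copied from the d_customers / d_shipments tables in the question.
D_CUSTOMERS = [
    (114, "2015-01-01", "2015-02-15"),
    (116, "2015-02-01", "2015-03-15"),
    (120, "2015-02-15", "2015-04-01"),
    (221, "2015-03-15", "2015-10-01"),
    (120, "2015-05-15", "2015-07-01"),
]
D_SHIPMENTS = [
    (1, "2015-02-13", 114, 2),
    (2, "2015-03-01", 116, 4),
    (2, "2015-03-01", 116, 1),
    (3, "2015-06-01", 116, 1),
    (4, "2015-03-01", 120, 6),
    (5, "2015-10-01", 120, 3),
    (6, "2015-03-01", 321, 10),
]

# is_member is decided per shipment line with EXISTS, then lines are summed
# per (ship_date, customer_id). A customer missing from d_customers gets 'N'.
POPULATE_SQL = """
INSERT INTO a_shipments (ship_date, customer_id, is_member, quantity)
SELECT ship_date, customer_id, is_member, SUM(quantity)
FROM (
    SELECT s.ship_date,
           s.receiving_customer_id AS customer_id,
           s.quantity,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM d_customers c
                    WHERE c.customer_id = s.receiving_customer_id
                      AND s.ship_date BETWEEN c.membership_start_date
                                          AND c.membership_end_date
                ) THEN 'Y' ELSE 'N' END AS is_member
    FROM d_shipments s
) AS t
GROUP BY ship_date, customer_id, is_member
"""


def populate_a_shipments() -> list:
    """Create the tables in an in-memory DB, populate a_shipments, return its rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(
            """
            CREATE TABLE d_customers (customer_id INTEGER, membership_start_date TEXT,
                                      membership_end_date TEXT);
            CREATE TABLE d_shipments (shipment_id INTEGER, ship_date TEXT,
                                      receiving_customer_id INTEGER, quantity INTEGER);
            CREATE TABLE a_shipments (ship_date TEXT, customer_id INTEGER,
                                      is_member TEXT, quantity INTEGER);
            """
        )
        con.executemany("INSERT INTO d_customers VALUES (?, ?, ?)", D_CUSTOMERS)
        con.executemany("INSERT INTO d_shipments VALUES (?, ?, ?, ?)", D_SHIPMENTS)
        con.execute(POPULATE_SQL)
        return con.execute(
            "SELECT * FROM a_shipments ORDER BY ship_date, customer_id"
        ).fetchall()
    finally:
        con.close()
```

The result includes both sample rows from the question: `('2015-03-01', 116, 'Y', 5)` and `('2015-06-01', 116, 'N', 1)`.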
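For the palindrome task above, here is a sketch of both requested styles. The iterative version moves two pointers toward the middle. The recursive version compares the outer pair of characters and recurses on the inner range. It passes indices instead of slicing, so no substring copies are made. Both versions assume the string is compared exactly as given (case- and whitespace-sensitive). Normalizing first, for example with `s.lower()`, would be an extra assumption about what the interviewer wants.

```python
from typing import Optional


def is_palindrome_iterative(s: str) -> bool:
    """Two pointers move toward the middle; stop at the first mismatch."""
    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True


def is_palindrome_recursive(s: str, lo: int = 0, hi: Optional[int] = None) -> bool:
    """Compare the outer pair, then recurse on s[lo + 1 .. hi - 1]."""
    if hi is None:
        hi = len(s) - 1
    if lo >= hi:  # empty or single-character range
        return True
    if s[lo] != s[hi]:
        return False
    return is_palindrome_recursive(s, lo + 1, hi - 1)
```

Both versions are O(n) time. The recursive one also uses O(n) stack, so with Python's default recursion limit it only suits strings up to a couple of thousand characters. That trade-off is often the point of asking for both versions.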
SQL question: how to retrieve data using a join condition together with window functions.
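The join-plus-window question is open-ended. One typical instance is "latest order per customer": join two tables, number each customer's rows with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)`, and keep row 1. The `customers` and `orders` tables and their data here are hypothetical. The query runs through Python's `sqlite3` module, which assumes the linked SQLite library is 3.25 or newer, the first release with window functions.

```python
import sqlite3

# Hypothetical tables for illustration only.
CUSTOMERS = [(1, "alice"), (2, "bob")]
ORDERS = [  # (order_id, customer_id, order_date, amount)
    (10, 1, "2024-01-05", 30),
    (11, 1, "2024-02-10", 20),
    (12, 2, "2024-01-20", 50),
]

# Join first, then rank each customer's orders newest-first and keep rank 1.
LATEST_ORDER_SQL = """
SELECT name, order_id, order_date, amount
FROM (
    SELECT c.name, o.order_id, o.order_date, o.amount,
           ROW_NUMBER() OVER (
               PARTITION BY o.customer_id
               ORDER BY o.order_date DESC
           ) AS rn
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
) AS ranked
WHERE rn = 1
ORDER BY name
"""


def latest_order_per_customer() -> list:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        con.execute(
            "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
            "order_date TEXT, amount INTEGER)"
        )
        con.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
        con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
        return con.execute(LATEST_ORDER_SQL).fetchall()
    finally:
        con.close()
```

A window function cannot be filtered directly in `WHERE`, which is why the ranking happens in a subquery. If ties on `order_date` should all be kept, `RANK()` instead of `ROW_NUMBER()` returns every tied row.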
Talk about your personality, your strengths, and the skills you have acquired from prior experience.
What have you done with Hive?
- What is combineByKey?
- SCD Type 1 logic
- Difference between an edge node and a data node
- Where is the code deployed (edge node or cluster)?
- YARN architecture
- Which Spark versions have you worked with?
- Difference between SchemaRDD and DataFrame
- Different ways to create a DataFrame
- What is a bundle in Oozie?
- What is the fork action in Oozie?
- The distcp command
- How do you decide the number of mappers in a Sqoop job?
- What is the optimal number of mappers, assuming there is no restriction on the number of connections to the database?
- How do you pull CLOB/BLOB data types from Oracle into HDFS?
- Semi-join and anti-join in Scala
- Difference between the logical plan and the physical plan
- Where can we see the logical plan?
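Spark's `combineByKey` takes three functions. `createCombiner` turns the first value seen for a key within a partition into an accumulator. `mergeValue` folds further values for that key into the accumulator within the same partition. `mergeCombiners` merges accumulators for the same key coming from different partitions after the shuffle. The sketch below is a plain-Python model of those semantics, not PySpark code: partitions are simulated as lists, and the per-key average is just one example use.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def combine_by_key(
    partitions: List[Iterable[Tuple[Hashable, Any]]],
    create_combiner: Callable[[Any], Any],
    merge_value: Callable[[Any, Any], Any],
    merge_combiners: Callable[[Any, Any], Any],
) -> Dict[Hashable, Any]:
    """Plain-Python model of combineByKey semantics (not Spark itself)."""
    # Map side: one accumulator per key inside each partition.
    partial: List[Dict[Hashable, Any]] = []
    for part in partitions:
        acc: Dict[Hashable, Any] = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)  # first value for this key here
        partial.append(acc)

    # Reduce side (after the shuffle): merge accumulators for the same key.
    result: Dict[Hashable, Any] = {}
    for acc in partial:
        for key, combiner in acc.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result


# Example use: per-key average via (sum, count) accumulators.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 6)]]
sums_counts = combine_by_key(
    partitions,
    create_combiner=lambda v: (v, 1),
    merge_value=lambda c, v: (c[0] + v, c[1] + 1),
    merge_combiners=lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),
)
averages = {k: s / n for k, (s, n) in sums_counts.items()}
```

The accumulator type can differ from the value type, here `(sum, count)` built from plain numbers. That flexibility is what distinguishes `combineByKey` from `reduceByKey`, whose accumulator has the same type as the values.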
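SCD Type 1 keeps no history. When a dimension attribute changes, the existing row is overwritten, and unseen keys are inserted; in other words, it is an upsert. Here is a minimal plain-Python sketch. It assumes the dimension is a dict keyed by a hypothetical `customer_id` business key. In Hive or a warehouse, the same logic is usually expressed as a MERGE or an overwrite of the joined old and new data.

```python
from typing import Any, Dict, List


def apply_scd1(
    dimension: Dict[Any, Dict[str, Any]],
    incoming: List[Dict[str, Any]],
    key: str = "customer_id",  # hypothetical business-key column
) -> Dict[Any, Dict[str, Any]]:
    """SCD Type 1: overwrite changed attributes, insert new keys, keep no history."""
    result = {k: dict(attrs) for k, attrs in dimension.items()}  # copy; input untouched
    for row in incoming:
        business_key = row[key]
        attrs = {col: val for col, val in row.items() if col != key}
        if business_key in result:
            result[business_key].update(attrs)  # old values are lost (Type 1)
        else:
            result[business_key] = attrs  # new dimension member
    return result


dim = {1: {"city": "Pune"}, 2: {"city": "Delhi"}}
updates = [{"customer_id": 2, "city": "Mumbai"}, {"customer_id": 3, "city": "Chennai"}]
new_dim = apply_scd1(dim, updates)
```

After the update, customer 2's previous city is gone. Keeping it as a separate row with validity dates is what an SCD Type 2 design would add.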
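A left semi-join returns the left rows that have at least one match on the right. It keeps only the left columns and never duplicates a left row, even if it matches several right rows. A left anti-join returns the left rows with no match. The question asks about Scala, but to keep all examples here in one language, this sketch shows the semantics in plain Python. In Spark (Scala or Python), the same operations are the `"left_semi"` and `"left_anti"` join types of `join`. The sample `customers` and `orders` rows are hypothetical.

```python
from typing import Any, Callable, Iterable, List


def left_semi_join(left: Iterable[Any], right: Iterable[Any],
                   left_key: Callable[[Any], Any],
                   right_key: Callable[[Any], Any]) -> List[Any]:
    """Left rows with at least one match on the right; each kept at most once."""
    right_keys = {right_key(r) for r in right}
    return [row for row in left if left_key(row) in right_keys]


def left_anti_join(left: Iterable[Any], right: Iterable[Any],
                   left_key: Callable[[Any], Any],
                   right_key: Callable[[Any], Any]) -> List[Any]:
    """Left rows with no match on the right."""
    right_keys = {right_key(r) for r in right}
    return [row for row in left if left_key(row) not in right_keys]


customers = [(1, "alice"), (2, "bob"), (3, "carol")]  # (customer_id, name)
orders = [(10, 1), (11, 1), (12, 3)]                  # (order_id, customer_id)
with_orders = left_semi_join(customers, orders, lambda c: c[0], lambda o: o[1])
without_orders = left_anti_join(customers, orders, lambda c: c[0], lambda o: o[1])
```

Customer 1 has two orders but appears once in `with_orders`, which is the key difference from an inner join. One caveat: this sketch treats `None` keys as equal, whereas in SQL and Spark equi-joins a NULL key never matches.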
State the use cases for batch processing and stream processing.
Describe your previous work experiences.
The interview questions were mostly based on my experience and were easy.
How would you implement AES-256 encryption? They were dealing with this for one of their current clients and wanted to know how I had handled it in previous projects.