# Interview Questions - Senior Data Engineer

Senior Data Engineer interview questions shared by candidates

## Top interview questions

### How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 answers↳

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), it is very likely that the questions were answered at random.
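A minimal sketch of the calculation mentioned above, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the row-per-respondent layout are assumptions for illustration, not something specified in the answer.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one numeric score per survey item). Layout is an assumption."""
    k = len(responses[0])  # number of survey items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Perfectly consistent answers versus answers with no shared pattern.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]]
scattered = [[1, 5, 3], [5, 1, 2], [3, 3, 5], [2, 4, 1], [4, 2, 4]]
print(cronbach_alpha(consistent), cronbach_alpha(scattered))
```

Alpha is a property of a group of respondents, so in practice it would be computed on the subset suspected of answering at random and compared with the rest.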

↳

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

↳

We need to build a histogram for each question in the survey to see the distribution of answers per question. If the selections are truthful, the question histograms will likely follow a normal distribution. If a response has more than half of its answers located outside the 95% confidence interval of the corresponding histograms, the response will be categorized as random fall out of mean plus tw…

### How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 answers↳

1. Develop a list of shows/movies that are representative of different taste categories (more on this below).
2. Obtain a ranking of the items in the list from the 2 users.
3. Use Spearman's rho (or another test that works with rankings) to assess dependence/congruence between the 2 people's rankings.

To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
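The Spearman step can be sketched as follows, under the assumption that both users rank the same titles and that there are no ties, in which case rho reduces to 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The function name and the dict layout (title mapped to rank) are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same titles.
    rank_a, rank_b: dicts mapping title -> rank (1 = favourite)."""
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both users must rank the same titles")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least 2 titles")
    # Sum of squared rank differences across titles.
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


user_1 = {"A": 1, "B": 2, "C": 3, "D": 4}
user_2 = {"A": 4, "B": 3, "C": 2, "D": 1}
print(spearman_rho(user_1, user_1))  # identical rankings -> 1.0
print(spearman_rho(user_1, user_2))  # fully reversed rankings -> -1.0
```

With tied ranks this shortcut formula no longer applies, and rho would instead be computed as the Pearson correlation of the (averaged) ranks.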

↳

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. InterviewQuery.com has a more in-depth answer.

↳

It's essential to demonstrate that you can really go deep... there are plenty of follow-up questions and (sometimes tangential) angles to explore. There are a lot of Senior Data Scientist experts who've worked at Netflix and who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully. prepfully.com/practice-interviews

### 1. Given the first sample below, pull the unique statuses that show up consecutively 3 times, e.g. from the sample, the output would be 'active', 'expired'. 2. Given the second sample below, determine which employees are in the building at 10:30.

Sample 1:

| id | status |
|----|---------|
| 1 | active |
| 2 | active |
| 3 | active |
| 4 | pending |
| 5 | expired |
| 6 | expired |
| 7 | expired |
| 8 | pending |

Sample 2:

| employee | in_out | time |
|----------|--------|-------|
| A | IN | 6:00 |
| B | IN | 7:00 |
| A | OUT | 8:00 |
| C | IN | 9:30 |
| A | IN | 9:00 |
| A | OUT | 10:00 |
| B | OUT | 11:00 |
| C | OUT | 10:00 |

4 answers↳

I was perturbed since I thought this was going to be a Behavioral Interview. I could not answer.

↳

select distinct status from (select *, case when status = lead(status, 1) over (order by id) and status = lead(status, 2) over (order by id) then 1 else 0 end as consecutive from tab) t where consecutive = 1
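For comparison, the same idea can be sketched outside SQL by grouping consecutive rows once they are ordered by id. This is an illustrative alternative, not the approach from the answer above; the function name `statuses_with_run` and the tuple layout are assumptions.

```python
from itertools import groupby


def statuses_with_run(rows, run_length=3):
    """rows: iterable of (id, status) tuples. Returns the statuses that
    appear at least run_length times in a row when ordered by id."""
    ordered = sorted(rows)  # tuples sort by id first
    found = []
    # groupby yields one group per run of consecutive equal statuses.
    for status, group in groupby(ordered, key=lambda row: row[1]):
        if sum(1 for _ in group) >= run_length and status not in found:
            found.append(status)
    return found


sample = [
    (1, "active"), (2, "active"), (3, "active"), (4, "pending"),
    (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending"),
]
print(statuses_with_run(sample))  # ['active', 'expired']
```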

↳

with cte as (select *, dense_rank() over (partition by employee order by time) as rnk from tab) select distinct a.employee from cte as a join cte as b on a.employee = b.employee and a.rnk = b.rnk - 1 where a.in_out = 'IN' and b.in_out = 'OUT' and a.time <= '10:30' and b.time > '10:30' (assuming time is stored as a TIME column)
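One way to sanity-check the expected result is to take each employee's latest event at or before 10:30 and keep those whose latest event is an IN. This is a sketch under the assumptions that times are 'H:MM' strings on a single day and that an event exactly at the cutoff counts; the function name `in_building` is hypothetical.

```python
def in_building(events, at):
    """events: iterable of (employee, 'IN' or 'OUT', 'H:MM').
    Returns the sorted employees whose latest event at or before `at` is IN."""

    def minutes(t):
        hours, mins = t.split(":")
        return int(hours) * 60 + int(mins)

    cutoff = minutes(at)
    latest = {}  # employee -> (minutes, in_out) of latest event so far
    for employee, in_out, t in events:
        ts = minutes(t)
        if ts <= cutoff and (employee not in latest or ts >= latest[employee][0]):
            latest[employee] = (ts, in_out)
    return sorted(e for e, (_, state) in latest.items() if state == "IN")


sample = [
    ("A", "IN", "6:00"), ("B", "IN", "7:00"), ("A", "OUT", "8:00"),
    ("C", "IN", "9:30"), ("A", "IN", "9:00"), ("A", "OUT", "10:00"),
    ("B", "OUT", "11:00"), ("C", "OUT", "10:00"),
]
print(in_building(sample, "10:30"))  # ['B']
```

Unlike pairing each IN with the following OUT, this also covers an employee who has entered but has no later OUT row.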

### Given a list, create a new list that does not include the duplicates of the original list.

3 answers↳

With a as the old list and b as the new list: b = list(set(a)). Note that a set does not preserve the original order of the elements.
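If the order of first appearance should be kept, a possible variant relies on dicts preserving insertion order (guaranteed since Python 3.7). The function name `dedupe` is hypothetical, and the elements are assumed to be hashable.

```python
def dedupe(items):
    """Return a new list without duplicates, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct element, in insertion order.
    return list(dict.fromkeys(items))


print(dedupe([3, 1, 3, 2, 1]))  # [3, 1, 2]
```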

↳

Maybe they were asking to do it in-place. In that case, switch the duplicate elements to the end.

↳

Python, 4 lines of code.

### The percentage of female customer base

3 answers↳

Wrote the SQL query to answer this question

↳

Do you have any details on Python questions?

↳

You need demographics data for this. The query would be fairly simple.

### If you can build a perfect (100% accuracy) classification model to predict some customer behavior, what will be the problem in application?

3 answers

### Technical case interview which is a mix of modelling skills + classical case interview structure

3 answers↳

Hi there, Thank you for sharing your experience. Just a quick question - do you remember how long you waited till you heard back after the business case interview stage? Thanks!

↳

Hi there, Sorry you had a bad experience with this interviewer - do you mind giving us the first name of this interviewer? Or at least first and last initials? I'll be sure to contact this employee and point them to training resources at BCG. Thanks.

↳

Wow, sorry to hear that. Of all of Gamma's shortcomings, lack of common courtesy/EQ would not be on my radar's radar.

### Imagine you have N pieces of rope in a bucket. You reach in and grab one end-piece, then reach in and grab another end-piece, and tie those two together. What is the expected value of the number of loops in the bucket?

3 answers↳

Do the question and answer make sense? I thought the answer was 1/(2n-1). I don't understand why the solution adds the probabilities for all cases from 1 to N together. For the 2-rope case, p(1 loop) = 1/3, so the expected number of loops is also 1/3. Why, then, is the answer 1 + 1/3 = 4/3? Am I missing something?

↳

You are right, the long answer fails a simple boundary condition: if you tie only once after picking two ends, the maximum number of loops is one! So p(n) is in [0,1], lol

↳

I got the correct answer, but the mathematician yelled at me for arriving too slowly at such an "easy" answer.
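The two figures debated in this thread can be reconciled with a short sketch, assuming the classic reading of the puzzle in which tying is repeated until no free ends remain. When k untied strands are left there are 2k ends; after picking one end, exactly one of the other 2k-1 ends closes a loop, so that step adds a loop with probability 1/(2k-1), and either way one strand fewer remains. By linearity of expectation the total is the sum of 1/(2k-1) for k = 1..N, which gives 1 + 1/3 = 4/3 for N = 2. Under the single-tie reading, the expected count is just the first step, 1/(2N-1). The function name below is hypothetical.

```python
from fractions import Fraction


def expected_loops(n):
    """Expected number of loops when ends are tied until none remain,
    starting from n ropes (classic reading of the puzzle, an assumption)."""
    # A step with k strands left closes a loop with probability 1/(2k-1).
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))


print(expected_loops(2))  # 4/3
print(expected_loops(3))  # 23/15
```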

### If you take 3 consecutive numbers (n, n+1, n+2) and know that n and n+2 are prime numbers, can you prove that n+1 is always divisible by 6?

3 answers↳

3, 4, 5. 3 and 5 are prime. 4 is not divisible by 6.

↳

n+1 will be divisible by 2, since n and n+2 are prime (and therefore odd when n > 2). Now, one of n, n+1, n+2 must be divisible by 3; n and n+2 are prime, so for n > 3 it is n+1 that must be divisible by 3. Hence proved.

↳

1. If n and n+2 are prime, then n+1 is divisible by 2. 2. One of the three consecutive numbers (n, n+1, n+2) must be divisible by 3; because n and n+2 are prime, n+1 is divisible by 3 (for n > 3). So n+1 is divisible by 6.
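A quick brute-force check of the claim, as a sketch: it lists n+1 for every pair of primes (n, n+2) below a limit. The pair (3, 5) is the only exception, as the first answer notes; every later middle value is a multiple of 6. Function names are hypothetical.

```python
def is_prime(m):
    """Trial-division primality test, fine for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def twin_prime_middles(limit):
    """Return n+1 for every n < limit where n and n+2 are both prime."""
    return [n + 1 for n in range(2, limit) if is_prime(n) and is_prime(n + 2)]


print(twin_prime_middles(20))  # [4, 6, 12, 18]
print(all(m % 6 == 0 for m in twin_prime_middles(1000) if m > 4))  # True
```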

### Describe the metrics one would use to evaluate a binary classifier.

2 answers↳

Precision, Recall, F-score, Accuracy, ROC
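As an illustration of how the first four of these are computed from a confusion matrix, here is a minimal sketch. It assumes labels are 0/1 with 1 as the positive class, and returns 0.0 when a denominator is zero (a convention chosen here, not a standard). ROC is not covered because it needs predicted scores rather than hard labels. The function name is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```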

↳

Through questions like this, interviewers are mostly trying to test your skillset (and its relevance to the role) as robustly as possible, so be prepared for multiple offshoots and follow-ups. It could be a useful exercise to do mocks with friends or colleagues at Bumble to get a real sense of what the interview is actually like. Alternatively, Prepfully has a ton of Bumble Senior Data Scientist experts who provide mock interviews for a pretty reasonable amount. prepfully.com/practice-interviews