Interview questions - Senior data engineer


Senior Data Engineer interview questions shared by candidates

Top interview questions

Senior Data Scientist was asked...October 21, 2014

How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 answers

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below .5), it is very likely that the questions were answered at random.
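As a rough illustration of the answer above, here is a minimal sketch of Cronbach's alpha using only the standard library. The function name `cronbach_alpha` and the input layout (one row per respondent, one numeric score per item) are assumptions made for this example. Note that alpha is computed over a group of respondents, so it flags an inconsistent scale or subgroup rather than a single individual.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent).

    Each row is a list of numeric item scores; all rows have the same length.
    Layout and name are assumptions for this sketch.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Respondents who answer every item the same way give perfectly consistent data.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(consistent))
```

One way to use this for the interview question would be to compute alpha with and without a suspect group of respondents and compare the two values.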

I would design the survey so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

We need to build a histogram for each question in the survey to see the distribution of answers to that question. The histograms will likely follow a normal distribution if the selections are truthful. If, for one respondent, more than half of their answers fall outside the 95% confidence interval of the corresponding histograms, that response will be categorized as random (falling outside the mean plus tw...


How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 answers

1) Develop a list of shows/movies that are representative of different taste categories (more on this later).
2) Obtain a ranking of the items in the list from the 2 users.
3) Use Spearman's rho (or another test that works with rankings) to assess dependence/congruence between the 2 people's rankings.
* To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
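Step 3 above can be sketched in a few lines. This is a minimal illustration that assumes both users ranked exactly the same items and that there are no ties, which is the case where the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies. The function name and the dict-based input are choices made for this example.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, without ties.

    rank_a and rank_b map each item to its rank (1 = favourite).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both users must rank the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least 2 items")
    # Sum of squared rank differences over all items.
    d2 = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical titles, used only to show the call.
user_1 = {"show_a": 1, "show_b": 2, "show_c": 3, "show_d": 4}
user_2 = {"show_a": 2, "show_b": 1, "show_c": 3, "show_d": 4}
print(spearman_rho(user_1, user_2))
```

A value near 1 means the two users order the items almost identically, a value near -1 means they order them in reverse, and a value near 0 means there is little monotonic relationship.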

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. ... has a more in-depth answer.

It's essential to demonstrate that you can really go deep... there are plenty of follow-up questions and (sometimes tangential) angles to explore. There are many Senior Data Scientist experts who've worked at Netflix and who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully.


1. Given the sample:

id, status
1, active
2, active
3, active
4, pending
5, expired
6, expired
7, expired
8, pending

Pull the unique statuses that show up consecutively 3 times, e.g. from the sample, the output would be 'active', 'expired'.

2. Given the sample:

employee, in_out, time
A, IN, 6:00
B, IN, 7:00
A, OUT, 8:00
C, IN, 9:30
A, IN, 9:00
A, OUT, 10:00
B, OUT, 11:00
C, OUT, 10:00

Determine which employees are in the building at 10:30.

4 answers

I was perturbed since I thought this was going to be a Behavioral Interview. I could not answer.

select distinct status
from (
  select *,
         case when status = lead(status, 1) over (order by id)
               and status = lead(status, 2) over (order by id)
              then 1 else 0 end as consecutive
  from tab
) t
where consecutive = 1
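For comparison, the first part of the question (statuses that appear at least three times in a row) can be sketched in Python with `itertools.groupby`. The function name and the `(id, status)` tuple layout are assumptions for this example. Like the SQL above, it treats "consecutive" as adjacent when ordered by id.

```python
from itertools import groupby


def statuses_with_run(rows, run_length=3):
    """Return statuses that occur at least run_length times in a row, ordered by id."""
    ordered = sorted(rows, key=lambda row: row[0])
    result = []
    # groupby yields one group per run of identical adjacent statuses.
    for status, run in groupby(ordered, key=lambda row: row[1]):
        if len(list(run)) >= run_length and status not in result:
            result.append(status)
    return result


sample = [
    (1, "active"), (2, "active"), (3, "active"), (4, "pending"),
    (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending"),
]
print(statuses_with_run(sample))  # ['active', 'expired']
```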

-- Assumes time is stored as a TIME column. An employee is in the building
-- if their latest event at or before 10:30 is an IN.
with cte as (
  select *,
         row_number() over (partition by employee order by time desc) as rn
  from tab
  where time <= '10:30'
)
select employee
from cte
where rn = 1 and in_out = 'IN'
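The second part of the question can be checked by hand with a short Python sketch: an employee is inside at a given time if their most recent event at or before that time is an IN. The function names and the `(employee, in_out, time)` tuple layout are assumptions for this example, and times are assumed to be "H:MM" strings within a single day.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_building_at(events, when):
    """Return the sorted list of employees whose latest event at or before `when` is IN."""
    cutoff = to_minutes(when)
    latest = {}  # employee -> (minutes, in_out) of the latest event seen so far
    for employee, in_out, time in events:
        t = to_minutes(time)
        if t <= cutoff and (employee not in latest or t >= latest[employee][0]):
            latest[employee] = (t, in_out)
    return sorted(e for e, (_, state) in latest.items() if state == "IN")


sample = [
    ("A", "IN", "6:00"), ("B", "IN", "7:00"), ("A", "OUT", "8:00"),
    ("C", "IN", "9:30"), ("A", "IN", "9:00"), ("A", "OUT", "10:00"),
    ("B", "OUT", "11:00"), ("C", "OUT", "10:00"),
]
print(in_building_at(sample, "10:30"))  # ['B']
```

On the sample, A and C both badge out at 10:00, while B badges in at 7:00 and only leaves at 11:00, so B is the only employee inside at 10:30.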


Given a list, create a new list that does not include the duplicates of the original list.

3 answers

a = old list, b = new list. Code: a = set(a); b = list(a)

Maybe they were asking to do it in-place. In that case, switch the duplicate elements to the end.

Python, 4 lines of code.
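The answers above can be sketched as follows. Converting to a `set` removes duplicates but does not preserve the original order, so this sketch uses `dict.fromkeys`, which keeps the first occurrence of each element in order. The in-place variant is one possible reading of the follow-up: it compacts the unique elements to the front and truncates the list, rather than moving duplicates to the end. Both function names are illustrative, and both assume the elements are hashable.

```python
def dedupe(items):
    """Return a new list with duplicates removed, keeping first occurrences in order."""
    return list(dict.fromkeys(items))


def dedupe_in_place(items):
    """Remove duplicates from `items` itself, keeping first occurrences in order."""
    seen = set()
    write = 0  # next position to write a unique element to
    for value in items:
        if value not in seen:
            seen.add(value)
            items[write] = value
            write += 1
    # Everything from `write` onward is leftover and can be dropped.
    del items[write:]


original = [3, 1, 3, 2, 1]
print(dedupe(original))  # [3, 1, 2]
dedupe_in_place(original)
print(original)  # [3, 1, 2]
```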


The percentage of female customer base

3 answers

Wrote the SQL query to answer this question

Do you have any details on Python questions?

You need demographics data for this. Query would be fairly simple


If you can build a perfect (100% accuracy) classification model to predict some customer behavior, what will be the problem in application?

3 answers

Distribution shift. You can never guarantee your train or test distribution covers future observations.

Then we have a deterministic problem, so what is the point of building a model at all?


Boston Consulting Group

Technical case interview which is a mix of modelling skills + classical case interview structure

3 answers

Hi there, Thank you for sharing your experience. Just a quick question - do you remember how long you waited till you heard back after the business case interview stage? Thanks!

Hi there, Sorry you had a bad experience with this interviewer - do you mind giving us the first name of this interviewer? Or at least first and last initials? I'll be sure to contact this employee and point them to training resources at BCG. Thanks.

Wow, sorry to hear that. Of all of Gamma's shortcomings, lack of common courtesy/EQ would not be on my radar's radar.


Imagine you have N pieces of rope in a bucket. You reach in and grab one end-piece, then reach in and grab another end-piece, and tie those two together. What is the expected value of the number of loops in the bucket?

3 answers

Do the question and answer make sense? I thought the answer was 1/(2n-1). I don't understand why the solution adds the probabilities from the 1 to N cases together. For the 2-rope case, p(1 loop) = 1/3, so the expected number of loops is also 1/3. Why is the answer 1 + 1/3 = 4/3? Am I missing something?

You are right, the long answer fails a simple boundary condition: if you tie only once after picking two ends, the maximum number of loops is one! So p(n) is in [0,1], lol

I got the correct answer, but the mathematician yelled at me for arriving too slowly at such an "easy" answer.
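The disagreement in this thread seems to come from two readings of the question. If only a single knot is tied, the chance that it closes a loop is 1/(2N-1). If, as this puzzle is usually posed, you keep tying until no free ends remain, then with k open ropes left each knot closes a loop with probability 1/(2k-1) and always leaves k-1 open ropes, so by linearity of expectation the expected number of loops is the sum of 1/(2k-1) for k = 1..N. The sketch below assumes that second reading; the function names are illustrative.

```python
from fractions import Fraction
import random


def expected_loops(n):
    """Exact expected number of loops when tying ends until none remain."""
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))


def simulate_loops(n, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for k in range(n, 0, -1):
            # With k open ropes, the second end is one of 2k - 1 candidates,
            # exactly one of which belongs to the same rope as the first end.
            if rng.randrange(2 * k - 1) == 0:
                total += 1
    return total / trials


print(expected_loops(2))  # 4/3
print(simulate_loops(2))
```

For N = 2 this gives 1/3 for the first knot plus 1 for the last knot, which is where 4/3 comes from.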


If you take 3 consecutive numbers (n, n+1, n+2) and know that n and n+2 are prime numbers, can you prove that n+1 is always divisible by 6?

3 answers

3, 4, 5. 3 and 5 are prime. 4 is not divisible by 6.

n+1 is divisible by 2, since n and n+2 are prime and therefore odd. One of n, n+1, n+2 must be divisible by 3; n and n+2 are prime (and, for n > 3, not equal to 3), so n+1 must be divisible by 3. Hence n+1 is divisible by 6.

1. If n and n+2 are prime, then n+1 is divisible by 2. 2. One of the three consecutive numbers (n, n+1, n+2) must be divisible by 3; because n and n+2 are prime, n+1 is divisible by 3. So n+1 is divisible by 6 (the only exception is n = 3, where n itself is the multiple of 3).
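The claim and its one exception are easy to check numerically. The sketch below is a brute-force check, not a proof: it lists every n below a limit for which n and n+2 are both prime but n+1 is not a multiple of 6. The function names are chosen for this example.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def twin_prime_exceptions(limit):
    """Return every n < limit with n and n+2 prime but n+1 not divisible by 6."""
    return [
        n for n in range(2, limit)
        if is_prime(n) and is_prime(n + 2) and (n + 1) % 6 != 0
    ]


print(twin_prime_exceptions(10000))  # [3]
```

The only counterexample found is the pair (3, 5), which matches the first answer; for every other pair the parity and divisibility-by-3 argument applies.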


Describe the metrics one would use to evaluate a binary classifier.

2 answers

Precision, Recall, F-score, Accuracy, ROC
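As a quick illustration of the threshold-based metrics in this list, the sketch below derives precision, recall, F1 and accuracy from the four confusion-matrix counts. It assumes labels are encoded as 0 and 1 with 1 as the positive class, and it returns 0.0 when a denominator is zero, which is a convention chosen for this example. ROC is not covered here because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

In this example there are 3 true positives, 2 false positives, 1 false negative and 4 true negatives, so precision is 0.6, recall is 0.75 and accuracy is 0.7. Which metric matters most depends on the relative cost of false positives and false negatives.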

Through questions like this, interviewers are mostly trying to test your skillset (and its relevance to the role) as robustly as possible, so be prepared for multiple offshoots and follow-ups. It could be a useful exercise to do mocks with friends or colleagues at Bumble to get a real sense of what the interview is actually like. Alternatively, Prepfully has a ton of Bumble Senior Data Scientist experts who provide mock interviews for a pretty reasonable amount.

