Data Scientist Interview
You're about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle?
Interview Answers
Bayesian stats: you should estimate the prior probability that it's raining on any given day in Seattle. If you mention this or ask, the interviewer will tell you to use 25%. Then it's straightforward:
P(raining | Yes,Yes,Yes) = Prior(raining) * P(Yes,Yes,Yes | raining) / P(Yes,Yes,Yes)
P(Yes,Yes,Yes) = P(raining) * P(Yes,Yes,Yes | raining) + P(not-raining) * P(Yes,Yes,Yes | not-raining) = 0.25*(2/3)^3 + 0.75*(1/3)^3 = 0.25*(8/27) + 0.75*(1/27)
P(raining | Yes,Yes,Yes) = 0.25*(8/27) / ( 0.25*8/27 + 0.75*1/27 )
**Bonus points if you notice that you don't need a calculator, since all the 27's cancel out and you can multiply top and bottom by 4.
P(raining | Yes,Yes,Yes) = 8 / ( 8 + 3 ) = 8/11
But honestly, you're going to Seattle, so the answer should always be: "YES, I'm bringing an umbrella!" (yeah yeah, unless your friends mess with you ALL the time ;)
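The arithmetic in this answer can be checked exactly with rational numbers. The snippet below is only a sketch of that check: the 25% prior is the interviewer's suggested figure from the answer, and the variable names are made up for illustration.

```python
from fractions import Fraction

prior_rain = Fraction(1, 4)   # assumed prior: the 25% suggested by the interviewer
p_truth = Fraction(2, 3)      # each friend tells the truth with probability 2/3

# Likelihood of hearing "yes" three times under each state of the weather
like_rain = p_truth ** 3        # raining: all three told the truth
like_dry = (1 - p_truth) ** 3   # not raining: all three lied

# Total probability of three "yes" answers, then Bayes' rule
evidence = prior_rain * like_rain + (1 - prior_rain) * like_dry
posterior_rain = prior_rain * like_rain / evidence

print(posterior_rain)  # 8/11
```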
Anonymous user
Answer from a frequentist perspective: Suppose there was one person. A truthful YES (probability 2/3) is twice as likely as a lying YES (probability 1/3), so P(raining) is 2/3. If instead n people all say YES, then they are either all telling the truth or all lying. The outcome that they are all telling the truth is (2/3)^n / (1/3)^n = 2^n times as likely as the outcome that they are all lying. Thus P(raining | ALL YES) = 2^n / (2^n + 1) = 8/9 for n=3. Notice that this corresponds exactly to the Bayesian answer when prior(raining) = 1/2.
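A quick way to see that the 2^n / (2^n + 1) figure is the Bayesian posterior under a 50/50 prior is to compute both for several n. This is a sketch only; the function name and its defaults are hypothetical, and the prior of 1/2 is exactly the assumption the answer points out.

```python
from fractions import Fraction

def posterior_all_yes(n, prior=Fraction(1, 2), p_truth=Fraction(2, 3)):
    """P(rain | n friends all say yes), assuming answers are independent given the weather."""
    all_truthful = prior * p_truth ** n            # raining and everyone truthful
    all_lying = (1 - prior) * (1 - p_truth) ** n   # dry and everyone lying
    return all_truthful / (all_truthful + all_lying)

for n in (1, 2, 3, 5):
    # Bayesian posterior next to the closed form 2^n / (2^n + 1)
    print(n, posterior_all_yes(n), Fraction(2 ** n, 2 ** n + 1))
```

With any other prior the two columns no longer agree, which is why the 8/9 answer silently depends on assuming rain and no rain are equally likely.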
DS Junkie
26/27 is incorrect. That is the fraction of times that at least one friend would tell you the truth (i.e., 1 - the probability that they would all lie: 1/27). What you have to figure out is the odds of it raining | (i.e., given) all 3 friends told you the same thing. Because they all say the same thing, they must either all be lying or all be telling the truth. What are the odds that they would all lie or all tell the truth? In 1/27 of cases they would all lie, and in 8/27 of cases they would all tell the truth. So there are 9 ways in which all your friends would tell you the same thing, and in 8 of them (8 out of 9) they would be telling you the truth.
Anonymous user
I thought about this a little differently, from a non-Bayes perspective. It's raining if any ONE of the friends is telling the truth, because if they are telling the truth then it is raining. If all of them are lying, then it isn't raining, because they told you that it was raining. So what you want is the probability that any one person is telling the truth, which is simply 1 - Pr(all lie) = 26/27. Anyone let me know if I'm wrong here!
nub data scientist
I flagged Nub data scientist's answer as useful, because it shows an interesting flaw in reasoning. The 3 random variables are not to be treated as intrinsically independent. Only conditioned on the truth (raining/not raining) are they independent.
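The point about conditional versus intrinsic independence can be checked numerically. The sketch below assumes a 25% prior for rain (the figure used in the Bayesian answer, not part of the question itself) and uses illustrative names.

```python
from fractions import Fraction

prior_rain = Fraction(1, 4)   # assumed prior, taken from the Bayesian answer
p_truth = Fraction(2, 3)

def p_all_yes(k):
    """Marginal probability that k friends all say yes, summing over rain / no rain."""
    return prior_rain * p_truth ** k + (1 - prior_rain) * (1 - p_truth) ** k

p_one = p_all_yes(1)   # one friend says yes
p_two = p_all_yes(2)   # two friends both say yes

print(p_one, p_two, p_one ** 2)   # 5/12 7/36 25/144
print(p_two == p_one ** 2)        # False: the answers are not marginally independent
```

Given the weather, the joint probability does factor (for example (2/3)^2 when it is raining); it is only after averaging over the weather that the product rule fails.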
This problem requires the marginal probability of rain to solve, following Interview Candidate's answer. M.B. provides the rationale behind why the Bayes approach is necessary: if pr(rain) = 0, then pr(rain|y, y, y) = 0 (maybe it is July in Seattle).
A few conceptual problems in many answers that I want to point out:
1) There is lots of conflation between Pr(truth) and Pr(Y). Pr(truth) = Pr(Y|R) does not equal Pr(Y).
2) Consider there is only a single friend and they say yes. The logical conclusion from a lot of these answers is that Pr(Rain|Yes) = Pr(Yes|Rain) = 2/3, which is not correct. Bayes' rule is very clear in this simpler case.
3) The friends' answers are conditionally independent assuming no collusion. The combinations of their honesty/lying add no additional information. The marginal probabilities are not independent: pr(y,y,y) does not equal pr(y)^3, it equals pr(y,y,y,rain) + pr(y,y,y,no rain), the integration of the joint space over rain. Using conditional independence and Bayes' rule, this becomes: pr(y|rain)^3*pr(rain) + pr(y|no rain)^3*(1 - pr(rain)).
A more general solution using Pr(rain) = r:
pr(rain|y,y,y) = pr(y,y,y|rain)*pr(rain)/pr(y,y,y) #Bayes' formula
pr(y,y,y|rain) = pr(y|rain)^3 = (2/3)^3 #conditional independence
pr(y,y,y) = pr(y|rain)^3*pr(rain) + pr(y|no rain)^3*pr(no rain) #by definition, see point 3
The answer: r*(2/3)^3 / [r*(2/3)^3 + (1 - r)*(1/3)^3]
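As a worked simplification of the general expression in r (a sketch; the two plugged-in priors are simply the ones that appear elsewhere in this thread), the common factor of 1/27 cancels:

```latex
\Pr(\text{rain} \mid y,y,y)
  = \frac{r\,(2/3)^3}{r\,(2/3)^3 + (1-r)\,(1/3)^3}
  = \frac{8r}{8r + (1 - r)}
  = \frac{8r}{7r + 1}

r = \tfrac{1}{4}:\quad \frac{8 \cdot \tfrac14}{7 \cdot \tfrac14 + 1} = \frac{2}{11/4} = \frac{8}{11}
\qquad
r = \tfrac{1}{2}:\quad \frac{8 \cdot \tfrac12}{7 \cdot \tfrac12 + 1} = \frac{4}{9/2} = \frac{8}{9}
```

So the 8/11 and 8/9 answers are the same formula evaluated at two different priors.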
Let Y denote rain, N denote no rain.

Actual   Answer probability
------------------------------------------
Y =>     8/27 YYY, 1/27 NNN, 12/27 YYN, 6/27 YNN
N =>     1/27 YYY, 8/27 NNN, 6/27 YYN, 12/27 YNN

So, P(Y|YYY) = 8/(8+1) = 8/9
Anonymous user
Interview Candidate solves this problem using Bayesian stats despite the fact that not enough information is given to do Bayesian probability analysis, i.e. he had to pull the probability of it raining in Seattle out of thin air when it was not given in the interview question. With only the information from the interview question, we have to assume that the friends are either all lying or all telling the truth. Let truth=T and lie=L. P(TTT)=8/27, P(LLL)=1/27, P(TLL)=2/27, P(TTL)=4/27. But we know that they all had the same answer, so we must compare P(TTT) to P(LLL). TTT is 8 times more likely than LLL, so we have P(TTT|All same answers)=8/9, P(LLL|All same answers)=1/9. Therefore the solution given ONLY THE INFORMATION GIVEN is P(Rain)=8/9, P(Dry)=1/9.
There is an obvious conceptual reason as to why several answers here (ones that don't use Bayes' formula) are incorrect. The probability in question has to depend on the probability of rain in Seattle. If, for the sake of discussion, it ALWAYS rains in Seattle, i.e. P(rain)=1, then the required prob. is always 1 as well. Likewise if it's a place where it never rains, or if the question asks about the prob. of it raining elephants given the 3 friends said yes, it'd be still 0. I believe this is a std. textbook example of the Bayes' formula, anything short of that I don't think will work out.
This can easily be solved without Bayes. There are two cases:
Case 1: It is raining and all friends are telling the truth: 0.25*(2/3)^3 = 1/4*8/27
Case 2: It is not raining and all friends are lying: 0.75*(1/3)^3 = 3/4*1/27
Probability: P(E) = Case1 / (Case1+Case2) = (1/4*8/27) / (3/4*1/27 + 1/4*8/27) = 2 / (11/4) = 8/11
The rule of conditional probability states P(A|B) = P(A & B) / P(B). Reformulating to this case:
P(Rain | 3Y) = P(R & 3Y) / P(3Y)
P(R & 3Y) = (2/3)^3 (if it is raining, then they must all speak the truth) = 8/27 (one could multiply by the probability of rain here; I assumed it as prior)
P(3Y) = all truth or all lie = (2/3)^3 + (1/3)^3 = 9/27
Hence P(R | 3Y) = 8/9
Let X be the probability it's raining. Obviously we want P(X|all three say yes). Now let Y be the probability at least one of them is lying. If Y = 0 it's easy to solve, if not then not so easy. Now you keep going.
Obviously, Bayesian is the way to go...
There is a way to easily confirm the right answer. Just write a computer simulation and run it a few million times, which I did. If the long term chance of rain in Seattle is 25%, the chance it is raining now, given the YYY answers and the 2/3 truth 1/3 lying, is 73% (rounded to whole number), which is the same as 8/11, so the reasoning with the Bayesian math is correct.
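A simulation along these lines might look like the sketch below. It is not the commenter's program, which was not posted; the function name, seed, and trial count are illustrative, and the 25% prior is the assumption stated in the comment.

```python
import random

def simulate(trials=1_000_000, prior_rain=0.25, p_truth=2 / 3, seed=0):
    """Estimate P(rain | all three friends say yes) by repeated random trials."""
    rng = random.Random(seed)
    all_yes = 0
    rain_and_all_yes = 0
    for _ in range(trials):
        raining = rng.random() < prior_rain
        # A friend says "yes" when truthful and it rains, or when lying and it is dry
        answers = [(rng.random() < p_truth) == raining for _ in range(3)]
        if all(answers):
            all_yes += 1
            rain_and_all_yes += raining
    return rain_and_all_yes / all_yes

estimate = simulate()
print(f"simulated: {estimate:.3f}, exact 8/11: {8 / 11:.3f}")
```

Only about one trial in ten ends with three "yes" answers, so the run needs a large trial count before the estimate settles near 8/11.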
Most of the answers/comments made all unconditional assumptions except a few reasonings that lead to the 8/9 probability. Note that the question states that "Each of your friends has a 2/3 chance of telling you the truth". This essentially means P(raining, yes) + P (non-raining, no) = 2/3. Any attempts to interpret this as conditional probability P(raining | yes) = 2/3 or P(yes | raining) = 2/3 are making other assumptions.
Anonymous user
In the absence of further information, the only correct answer is that the posterior probability of rain p is in the interval [0, 1]. In the absence of further information any prior is as good as any other, so by implication the posterior can take any value as well. The interval for p can be restricted to (0, 1) on the assumption that the question to the friends would not be posed if the prior were absolute certainty about whether it will rain or not. With the further assumption that the prior probability is measured with limited precision (e.g. rounded to a percentage point), the posterior would be in the interval (0.075, 1). If the alternative assumption is made that information from the friends will be requested only if it had any chance to move the posterior below or above 0.5, the posterior interval for the probability is (0.5, 1). Any more precise answer than that requires further information about the prior, which is not supplied in the original problem formulation. Also note that even a precise answer about the probability of rain is not sufficient to answer the question of whether an umbrella should be brought or not.
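The bounds quoted here can be reproduced by evaluating the posterior at the edges of the assumed prior range. The sketch below takes "rounded to a percentage point" to mean priors between 1% and 99%, which is an interpretation rather than something the comment states; the function name is hypothetical.

```python
from fractions import Fraction

def posterior_rain(prior, p_truth=Fraction(2, 3), n_yes=3):
    """P(rain | n_yes unanimous 'yes' answers) for a given prior probability of rain."""
    like_rain = p_truth ** n_yes
    like_dry = (1 - p_truth) ** n_yes
    return prior * like_rain / (prior * like_rain + (1 - prior) * like_dry)

low = posterior_rain(Fraction(1, 100))     # smallest non-zero prior at 1% precision
high = posterior_rain(Fraction(99, 100))   # largest prior short of certainty
print(low, float(low))     # 8/107, roughly 0.075
print(high, float(high))   # 792/793, just under 1

# Prior at which three "yes" answers leave the posterior at exactly 1/2
print(posterior_rain(Fraction(1, 9)))  # 1/2
```

The last line shows that any prior above 1/9 already puts the posterior above 0.5 once all three friends say yes.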
The probability that each of the friends says "YES" is 2/3 * 2/3 * 2/3 = 8/27. Now the probability that it is actually raining in Seattle depends on how I select them to phone. There are only three ways to select and phone them. So, the probability that it is actually raining in Seattle is 3 * (8/27) = 8/9.
Anonymous user
At least a 66.6% chance it is raining, so leave your umbrella, because nobody uses an umbrella in Seattle.
Matthew Seed
The probability of rain in Seattle would likely be 2 out of 3.
Taylor Ingles
If all three say "Yes, it is raining" then it is actually raining.
I think there are two possible ways of approaching this question:
- All of your friends should be giving you the same answer, which makes the probability that it is raining in Seattle 8/9.
- At least one of your friends tells you it is raining. Considering that the surface area of Seattle is 217 km², it could be raining in some part of the city where one of your friends lives, but not where another friend lives. This makes the probability 26/27.
I guess both answers could be correct depending on how you understand/interpret the question.
Wait, why can't we just check the Seattle weather on our phone?
Anonymous user
The actual chance of rain in the context of Seattle's climate is irrelevant, because the core of this question is the chance that the friends are telling the truth; they don't decrease their likelihood of lying if there is an increased likelihood of rain, and vice versa. So, if they all say it's raining and it is in fact raining, then TRUTH = RAIN and LIE = NOT RAIN. Even if it's a 1% chance of rain, it's either raining or it isn't, so each friend is either telling the truth or they are not. This % chance does not affect whether or not they lie; hence, their probability of truth-telling is not conditional on whether or not it is raining. Think of it this way: if I tell you I saw Halley's Comet or rode a dolphin or (insert any assertion here), I am either telling the truth, or not. It either happened or it didn't. If we make the chance of lying dependent or conditional upon the assertion, then there would be a statement like "The friends tell the truth 2/3 of the time when there is a __% chance of rain" etc. All this to say, the overall probability that the friends are telling the truth is the product of their individual chances of telling the truth, or (2/3)^3 = 8/27, if they say it's raining AND it actually is raining (TRUTH = RAIN). Then again, I'm just a business student with an undergraduate degree in English, and there are many things in this universe that confound me, so I could be wrong.
YES = yes,yes,yes
T = truth,truth,truth
L = lie,lie,lie
P(Rain|YES) = P(Rain|YES,T)*P(T) + P(Rain|YES,L)*P(L)
P(Rain|YES,L) = 0 ==> what's the probability of rain given we know that they are lying and they've told us it is raining.
P(Rain|YES) = P(Rain|YES,T)*P(T)
P(Rain|YES,T) = 1 ==> what's the probability of it raining given that they are telling the truth and have told us it's raining.
Then P(T) = (2/3)^3.
It's obvious. Why in the world would I do Bayesian methods when it's certain?
to do
Here's another perspective on how to answer a question like this: Bring an umbrella. It's Seattle - if it's not raining right now, it probably will be by the time you get there.
another candidate
Please correct me if incorrect, but I would just prefer to condition. Either they are all telling the truth and it is raining, or they are all lying and it is not raining. P(rain) = P(rain|truth,truth,truth)*P(truth,truth,truth) + P(rain|lie,lie,lie)*P(lie,lie,lie). Notice that truth does not mean "yes, it is raining"; it simply corresponds to them telling the truth. Since they said yes, IF they were lying and we knew they were lying, then the probability of rain would be zero, thus eliminating the second term. P(rain) = P(rain|3xtruth)*P(3xtruth), and the probability of the truth is (2/3)^3, and the probability of rain if they are telling the truth is 1. I did a little skipping of steps, since truth doesn't equal yes, but I just sort of meshed it together towards the end.
I agree with TLP and nub scientist. For me, the question is really (1 - the odds that all three of your friends are lying to you) Clearly 1 - 1/3 * 1/3 * 1/3. It's convenient that they all gave the same answer, otherwise it would be more difficult.
8/27 is not the answer. For the weather to be nice in this case, all 3 of your friends NEED to have lied to you. Therefore the odds are 1/27.
Anonymous user
It should be (2/3)^3; I think zen and todo are correct.
The probability of raining is that they are all telling the truth, therefore, (2/3)^3.
I agree with (2/3)^3.
TLP and nub data scientists, Your answers include possibilities which are not feasible; we cannot have any combination of 2/3 and 1/3 together... what about (2/3)^3?
Isn't the answer 2/3? The key thing is that they are ALL saying "Yes". You can't have all 3 say yes and have some people lying and some people telling the truth. It either is raining or it isn't. Not both. They either are all lying or all telling the truth. Since they are all in agreement (all lying or all truthful), they are essentially voting as one person. What is the probability that one person is telling the truth? 2/3
Wannabe data scientist