In this post you will learn how to efficiently solve the celebrity problem.
Problem Statement
Given a room full of people you are tasked to find the celebrity. Person X
is the celebrity if
- everyone else in the room knows
X
- and
X
does not know anyone in the room.
How can you find the celebrity with the least amount of questions?
Visualizing The Problem
We can visualize the problem with a graph where each node represents a person. A directed edge from D
to A
means that D
knows A
.
Who is the celebrity in the graph below?
There is an edge going to A
from every other node. There is no edge going from A
to any node. A
is the celebrity.
How can we solve this problem in a general way?
First Approach: Brute Force
In our first approach we loop through every person and every question. We ask A
if it knows B
, C
, D
, and E
. We then do the same with B
, C
, D
, and E
. We can call this the brute force approach.
Let’s see it in action. Click “Next” to step through the brute force approach. Compare the changes in the matrix with the highlighted nodes and edges.
In the matrix we write 1 if the person given by the row knows the person given by the column and we write 0 otherwise.
In our example with 5 people we have to ask 20 questions. It’s 5 * 4 and not 5 * 5 because we can assume that everyone knows themselves. This fact is illustrated in the diagonal line of 1s in the matrix. (The reflexive edges (= loops) are omitted from the graph.)
A celebrity can be identified in the matrix by finding a row with all 0s (excluding the entry for itself) for which there is a corresponding column with all 1s. In the matrix above this is row and column A
.
Can we reliably identify the celebrity with less questions?
Divide and Conquer: First Attempt
We try to simplify the problem by spliting it into smaller subproblems. How many questions do we have to ask if there are only 2 people in the room?
Let’s say A
and B
are the two people.
We have to ask A
if she knows B
and B
if she knows A
. We need to ask 2 questions to determine that A
is the celebrity in the room.
For more than two people we can solve the problem recursively.
1. One by one we send a person out of the room until only 2 are left.
2. We find the suspected celebrity by asking both about the other.
3. One by one we invite a person back into the room. For each person
we bring back, we have to find out if (a) the entering person is
the celebrity or if (b) the celebrity is a person in the room. We
use the following steps to establish this:
3a. We ask each person in the room if they know the entering person.
3b. We ask the entering person the same about each person in the room.
Click "Next" to see the algorithm in action
If you do the math, you realize that we are asking the same amount of questions as with the brute force approach. 😕
Why go through the trouble then?
The algorithm above does have one nice property: At the end of each iteration in step 3 we know who the current celebrity is in the room. This is called an invariant.
We can improve our algorithm, if we can reduce the questions while preserving this property.
Divide and Conquer: Second Attempt
We can reduce the number of questions we need to ask in step 3 if we already know something about the person we are inviting back. Our goal is to isolate the celebrity. Ideally we only send out people that we know are not the celebrity.
Let us assume we know how to never send out the celebrity. How would this improve our algorithm?
Let’s walk through it:
- We send a person out one by one, always making sure that we don’t send out the celebrity.
- Finally we are left with 2 people. We know for a fact that – if there is a celebrity – it must be the person we have identified now.
- Then we have to verify that everybody we have previously sent out knows the suspected celebrity and that the suspected celebrity doesn’t know anyone. That’s only two questions per person entering, a big improvement to our previous algorithm.
The only thing left for us to do is to figure out how to never send out the celebrity.
It turns out, we can do this with just one question! 🎉
Here is how: We can ask X
if she knows Y
. If she does, we know that X
is not the celebrity. If she doesn’t, we know that Y
is not the celebrity. Either way we can be absolutely sure that we are not sending out the celebrity.
With everything figured out, we can now update our initial algorithm:
1. One by one we send a person out of the room. To choose who we send out
we ask X if she knows Y. If she does, we send out X otherwise we send out Y.
2. With 2 left we identify the suspected celebrity by asking each about the other.
3. One by one we invite a person back into the room. For each person we bring back,
we have to ask the suspected celebrity about the person entering and the
person entering about the suspected celebrity.
Click "Next" to see the algorithm in action
We have successfully identified the celebrity if each person entering the room knows the suspected celebrity and the assumed celebrity never knows the person entering.
Time Complexity
Now, let’s do the math. How many questions are we asking with our improved algorithm?
We simply follow along with the explanation above to do the calculation:
- First, we are asking one question per person we send out. If there are
n
people in the room in the beginning, then we send outn - 2
people. - Second, we are asking the two remaining people whether they know each other. That’s
2
questions. - Third, we are inviting back
n - 2
people. We ask each about the suspected celebrity and we ask the suspected celebrity about the entering person. That’s2 * (n - 2) = 2n - 4
questions.
Adding everything together we ask a total of (n - 2) + 2 + (2n - 4) = 3n - 4
questions.
While it may not seem like a big improvement for our example (11 questions instead of 20 questions) it is in fact a huge improvement. The brute force approach is in the order of n^2
whereas the improved version is in the order of n
questions.
The larger the number of people the bigger the difference between the two approaches gets. If there are 100 people in the room, the brute force approach requires 100 * 99 = 9900
questions! With our improved algorithm we only have to ask 3 * 100 - 4 = 296
questions.