Monday, March 30, 2009

Trivial Statistics

So we got the new Scene It? Box Office Smash for the Xbox 360.
We really like trivia games and we especially enjoy the Scene It? series (we also have Lights, Camera, Action).
We like the subject, but much more than that we enjoy the presentation: questions like anagrams, pictograms, skewed images that gradually clear, sound bites that require you to pay attention to the details, etc. All in all it's a well-made game with original question presentation. It also has the advantage of being one of the too few 4-player games (on the same Xbox).

However, there is a serious downside to the game and it has to do with the way questions are selected.
In Scene It? Lights, Camera, Action there were 1,800 new questions (for some reason this information on Box Office Smash is hard to track down). Unfortunately we never got to experience all these questions, because relatively quickly we started noticing questions repeating. This got me thinking: there are 1,800 questions and there's no way we played long enough to see all of them, so why are we seeing repeats?

The cause, as it turns out, is statistics. Specifically, a phenomenon similar to the birthday paradox.
Quick review for those who are not familiar with the birthday paradox and can't be bothered to read the wiki page:
If you sit in a room with a group of 23 randomly selected people (like a classroom) there's about a 50 percent chance that at least 2 people in the room have the same birthday.
For 57 people there's a >99% chance that two people have the same birthday.
To get the general idea of why your intuition (that this sounds wrong) is wrong, think of the 2 kids in your elementary school or high school class that had the same birthday.
Oh, and think of how many people you know that also had 2 kids with the same birthday in their class (hint: you went to school with them :) )*.
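
For anyone who wants to check the birthday numbers rather than take them on faith, here is a minimal Python sketch. The function name is made up for illustration, and it assumes birthdays are independent and spread evenly over 365 days (no leap years, no seasonal bumps).

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for k in range(people):
        # Person k+1 has to avoid the k birthdays that are already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for group_size in (23, 57):
        print(group_size, shared_birthday_probability(group_size))
```

The trick is to compute the chance that everyone's birthday is different and subtract it from 1, which is much easier than counting matches directly.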

How is this the same for a game that randomly selects questions?
Without going into the math too much, let's assume you played the game for a little while and you saw 180 questions (10%).
At this point, the probability that the next question is one you already answered is ~1/10. This is already high enough to be discouraging. Worse yet, by the time you have answered 180 questions, the expected number of repeats among them is ~8.7 (the same effect that drives the birthday paradox).
Now let's assume you played long enough to answer 600 questions (~33% of the total). Out of those 600 questions, ~90(!!) are expected to be repeats. Needless to say, once you have seen a third of the questions (you need to answer more than 600 questions for that), one in every three questions will be one that you saw already.
At 1/10 it's annoying. At 1/3 the game is unplayable. So everybody loses. You enjoy only about a third of the value you thought you were getting from the game, and the game developer works very hard to create 1,200 more questions that you will never see.
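
A quick way to check these repeat estimates is the sketch below. It assumes the game picks each question uniformly at random from the full pool of 1,800, independently of what it picked before; that is my guess at the behavior, not something the game documents. The function names are made up for illustration.

```python
def expected_repeats(pool: int, draws: int) -> float:
    """Expected number of draws that show an already-seen question,
    assuming each draw is uniform over `pool` and independent."""
    # A given question is missed by all `draws` picks with probability
    # (1 - 1/pool) ** draws, so this is the expected count of distinct
    # questions seen so far.
    expected_distinct = pool * (1.0 - (1.0 - 1.0 / pool) ** draws)
    return draws - expected_distinct


def expected_draws_to_see(pool: int, distinct: int) -> float:
    """Expected number of draws needed to see `distinct` different questions."""
    # With k questions already seen, a new one shows up with probability
    # (pool - k) / pool, so the wait for it averages pool / (pool - k) draws.
    return sum(pool / (pool - k) for k in range(distinct))


if __name__ == "__main__":
    for answered in (180, 600):
        print(answered, "answered ->", expected_repeats(1800, answered), "expected repeats")
    print("draws to see a third:", expected_draws_to_see(1800, 600))
```

The second function is the reason seeing a third of the pool takes noticeably more than 600 answers: the more you have seen, the longer each new question takes to turn up.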

The good news is that there is a pretty simple solution for this problem that works out well for everyone.
You see, the problem stems from the method of randomly selecting the questions. Fortunately there is a different method that guarantees both randomly selected questions and no repetitions. Instead of "rolling the dice" every time we need to select a question, we select a permutation of the sequence of questions (in simple terms - we randomly rearrange the sequence of questions) and save this permutation. The game then walks through the saved sequence in order, which guarantees that you will see all 1,800 questions before you see a question you know.
Since 1,800 is a decent number of questions there's a good chance that you will forget the earlier questions in the sequence. But even if you don't - at least you get to enjoy all of the questions in the game.
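
The shuffle-and-save idea fits in a few lines. This is only a sketch under my own assumptions: the class and method names are hypothetical, questions are represented by plain integer ids, and "saving" is reduced to returning a dictionary that a real game would write to its save file.

```python
import random


class QuestionDeck:
    """Hands out every question once, in random order, before any repeat."""

    def __init__(self, question_ids, seed=None):
        self._rng = random.Random(seed)
        self._order = list(question_ids)
        self._rng.shuffle(self._order)  # the saved permutation
        self._position = 0              # index of the next question to ask

    def next_question(self):
        if self._position == len(self._order):
            # Everything has been shown: reshuffle and start a new pass.
            self._rng.shuffle(self._order)
            self._position = 0
        question = self._order[self._position]
        self._position += 1
        return question

    def remaining(self):
        """How many questions are left in the current pass."""
        return len(self._order) - self._position

    def save_data(self):
        """State to persist between sessions (assumed save format)."""
        return {"order": list(self._order), "position": self._position}


if __name__ == "__main__":
    deck = QuestionDeck(range(1800))
    asked = [deck.next_question() for _ in range(1800)]
    print(len(set(asked)), "distinct questions out of", len(asked))
```

The important part is that the order and the position survive between sessions. If the deck is reshuffled every time the console is turned on, the repeats come right back.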

BTW: The feature listing for the game on xbox.com mentions a "minimal repeat" feature, which keeps track of questions answered to minimize repeats.
I don't know how that feature is implemented, but I can tell you from experience that it doesn't seem to work very well, as we saw plenty of repeats. Besides, there's no reason the number of repeats should be anything other than zero.

Oh, and as for the issues of online play, playing against different players, etc.: there are solutions to all of these problems (generate a permutation of the subgroup of questions that is unseen on all players' lists; you can also add a timestamp to each question to ensure a 'long' time between repeats). I know it's not ideal and I know that there are details to work out. But I believe that this is a core issue for any trivia game, and a great game needs to give the very best possible solution.
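
To make the multi-player idea a bit more concrete, here is one way it could look. Everything in it is an assumption on my part: each player's history is a plain dictionary from question id to the time they last saw it, and the function name is invented. Picking at random from the still-unseen subgroup and then marking the pick as seen gives the same result as walking a permutation of that subgroup.

```python
import random


def pick_question(all_ids, seen_by_player, now, rng=random):
    """Pick a question for a group of players.

    seen_by_player: one dict per player, mapping question id -> last-seen time.
    Prefers a question nobody in the group has seen. If there is none, falls
    back to the question whose most recent viewing by anyone is the oldest.
    """
    unseen = [q for q in all_ids
              if all(q not in seen for seen in seen_by_player)]
    if unseen:
        choice = rng.choice(unseen)
    else:
        def most_recent_viewing(q):
            return max(seen.get(q, float("-inf")) for seen in seen_by_player)
        choice = min(all_ids, key=most_recent_viewing)
    for seen in seen_by_player:
        seen[choice] = now  # timestamp the question for everyone playing
    return choice


if __name__ == "__main__":
    alice = {0: 1.0, 1: 2.0}
    bob = {2: 3.0}
    for turn in range(3):
        print(pick_question(range(5), [alice, bob], now=10.0 + turn))
```

In the example, questions 3 and 4 come out first (nobody has seen them), and only then does the game fall back to question 0, the one with the longest time since anyone last saw it.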

One final point, and I may be way off on this one. I think if I knew that I was running out of questions, or if there was some way of indicating to me that I had seen most of what's available, it would encourage me to get the question packs. It might be nice if this was done automatically ("You only have 200 questions left, why not try the XYZ expansion?").

* - yes, yes, I know *technically* it's a statistical lie. But what better way to fight faulty statistical intuition than faulty statistical intuition?
