Hello all and welcome back to the Excel Tip of the Week! This week we have a Developer-level post in which we are going to take a particular complicated scenario we want to model, and we’re going to use it to get deeper into understanding how loops and variables can help us when we’re working in VBA.
The basic problem
Imagine you roll a normal die and get a 5. Next time, you get a 4, and then another 5. How many rolls should you expect to need before you have seen all six sides of the die?
This version is fairly simple to solve explicitly mathematically. The first roll will give us a never-before-seen face for sure, so that will take 1 roll. Then 5/6 of the faces are novel, so we should expect to need another 6/5 rolls to see one of them. Then we have 4 new faces to find, which on average will take 6/4 rolls, and so on. That gives us:
6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 14.7
So on average we should expect to need 14.7 rolls.
This same logic applies for dice with different numbers of faces, or other similar situations. For example, if you are trying to collect a complete set of eleven football stickers, and they are equally likely to show up, then the same calculation tells you to expect to need to buy 33.2 packets of stickers on average.
In Excel 365 this calculation can be easily written for a number of events as:
Taking it to the limit
Now we are going to move to the more complicated version of our situation that we will be exploring via modelling with VBA. We imagine we have a question bank for an exam (or any other similar situation). The exam consists of a certain number of question slots, and there are multiple potential questions for each slot. How many times would somebody have to take the exam to see every question? This scenario is analogous to plenty of other situations with more complicated random behaviour.
Now getting an exact answer mathematically is much less approachable. Instead, we will simulate the process of taking and retaking the exam until every question has been seen, and then automate re-running that simulation many times, until we have enough data to approximate the result.
Let’s look at how to simulate this all in Excel. We’ll start by clearing the output range of anything from previous times the macro has been run and then defining our variables, and asking the user to set the values of the ones that are user-defined:
Dim ExamSize As Integer
ExamSize = InputBox("How many questions in the exam?")
Dim PoolDepth As Integer
PoolDepth = InputBox("How many possibilities for each question?")
Dim SimulationSize As Integer
SimulationSize = InputBox("How many simulations to run?")
Dim QuestionsSeen As New Collection
Dim InCollection As Boolean
Dim LoopCounter As Integer
Dim QuestionValue As String
[code goes here]
The first three variables are easily understandable by the InputBox statements that define them. QuestionsSeen is a collection of all the questions seen in the current run, InCollection is a true/false that we’ll use to check if a new question needs to be added to the collection or has been seen before, LoopCounter will track how many exams we have to take until our collection is complete, and QuestionValue is the ID of the current question we’re looking at.
Now we need a For statement that will run our simulation the required number of times:
For i = 1 To SimulationSize
LoopCounter = 0
Do While QuestionsSeen.Count > 1
[code to run a simulation goes here]
Cells(i, 1) = LoopCounter
We start by setting LoopCounter to 0, and clearing out anything in the QuestionsSeen collection so we can start afresh. We then run the process of taking exams until we’ve seen all the questions, counting how many times we do that by incrementing LoopCounter, and then finally output the number of attempts it took us in column A.
Let’s dive deeper and see how we’re actually simulating taking the exam many times:
Do While QuestionsSeen.Count < ExamSize * PoolDepth
LoopCounter = LoopCounter + 1
For j = 1 To ExamSize
InCollection = False
QuestionValue = j & "/" & WorksheetFunction.RandBetween(1, PoolDepth)
For k = 1 To QuestionsSeen.Count
If QuestionsSeen(k) = QuestionValue Then
InCollection = True
If InCollection Then
Breaking this down into English:
- We are going to run this process until the number of items in our QuestionsSeen collection is the total size of the question pool, aka the number of question slots multiplied by the depth of the question pool for each slot
- We increment the number of loops by 1 (we start at 0 so we do this for every loop)
- We run a for-loop that is the length of the exam, simulating a single taking of the test
- For each question, we generate a random candidate in a format such as 1/2 – the question slot / the specific option that showed up this time
- We then check through the QuestionsSeen collection to see if we have seen this one before
- If we haven’t, we add the question to QuestionsSeen
- We continue until all questions have been seen
The final output of this code will be a column listing how many exams had to be taken to exhaust the question bank for each simulation. From this we can get a sense of our expectation. For example, if we have a 30-question slot exam with four candidate questions for each slot, on average we should expect a test-taker to need to take a little over 19 exams to see all the questions. But more importantly, we can see how to use collections, For loops, Do While loops, and a few variables to model a complex scenario.
You can check out the full code in the attached file.
This article is brought to you by the Excel Community where you can find additional extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Excel with Business.