Supermarket chain Edeka currently runs this sticker collecting racket – which used to be smurf collecting – where you get four stickers with animal pictures for every €10 spent. The kids are collecting like crazy, I am told, even though the pictures are mediocre at best. Anyway, as geeks we don’t care much about the pictures but rather the combinatorics and probability laws that make this scheme so successful.

#### Expectations

The simplest quantity to calculate is the expected number of stickers needed to have a full collection of all $m$ stickers (for the current Edeka campaign there are $m$ = 180 different stickers). Presumably, the stickers are randomly drawn from a set $M=\bigl\lbrace 1,2,\ldots,m\bigr\rbrace$ with uniform probability (each sticker is equally probable), where we simply assign a number for each unique sticker instead of writing out the animal names. We also neglect the fact that the stickers come in packs of four – this will affect the probabilities slightly as the packs are randomly mixed but will likely not contain doubles.

The probability distribution of the number of (Bernoulli) draws necessary to obtain a “success” given success probability $p_i$ is given by the geometric distribution [1]. Here, success is defined as obtaining a new sticker not yet within the collection. This probability $p_i$ reduces with each new sticker obtained. It is obviously 1 in the first draw, but the second sticker drawn will be new only with probability $m-1\big/m$ since there are only $m-1$ stickers new out of the $m$ total. Generally,

$$p_i = \frac{m-\bigl(i-1\bigr)}{m} \quad \text{for} \quad i\in\bigl\lbrace 1,2,…,m \bigr\rbrace \label{geometric-probability}$$

The total number of stickers drawn, $n_\Sigma$, is then just the sum of the number $n_i$ of stickers drawn until success $i$:

$$n_\Sigma = \sum\limits_{i=1}^{m} n_i \label{n_Sigma}$$

The expected value of the geometric distribution is the inverse of $p_i$, and since the expectation of a sum is just the sum of expectations,

$$\mathcal{E}\Bigl[n_\Sigma\Bigr] = \sum\limits_{i=1}^{m} \frac{1}{p_i} = \sum\limits_{i=1}^{m} \frac{m}{m-\bigl(i-1\bigr)} \label{expected-necessary-draws}$$

For the $m$ = 180 animal stickers this means that **we need to collect approximately 1039 stickers on average to have the full set**.

Deriving the distribution of $N$ is somewhat more difficult, since it involves the convolution of $m$ probability distributions (alternatively, multiplication of $m$ probability-generating functions with subsequent inverse z-transform of the result).

#### Combinatorics

A different way to calculate the various probabilities and their distributions is from a combinatorical point-of-view. To find the probability of $j$ stickers missing (out of $m$ possible) in a random draw of $n$ stickers – or alternatively, the probability of $r=m-j$ different stickers being present in a drawn sample – it is sufficient to find the number of different draws of $n$ stickers in which exactly $j$ stickers are missing and normalize it by the total number of possible samples. The latter is simply $m^n$.

To find the number of possible draws with exactly $j$ stickers missing, we can determine the number of possible draws of $n$ stickers which contain all stickers from a reduced set of size $m-j$ and multiply then this by the number of combinations of the $j$ missing stickers (since we do not care which stickers are missing). This problem is very closely related to the *birthday problem* in which we’d like to know how many different birthdays occur on average in a group of $n$ people. For the birthday problem, $m=365$ discounting leap years. The birthday problem is very well explained and its combinatorics derived in [2] (see item 13 for the derivation of the most interesting quantity). The probability of exactly $j$ stickers missing out of a set $M$ of size $m$ in a random draw of $n$ stickers is$^1$

$$P_\text{missing}(j,m,n) = \binom{m}{j} \sum\limits_{k=0}^{m-j} (-1)^k \binom{m-j}{k}\left(1 - \frac{j + k}{m}\right)^n \label{P_missing}$$

with

$$j \in \Bigl\lbrace\max\Bigl[m-n,0\Bigr], \ldots, m-1\Bigr\rbrace \notag$$

since there have to be at least $m-n$ stickers missing if $n\lt m$. The probability mass function for the number $r=m-j$ of collected stickers out of $m$ total after $n$ draws (which is also given by $P_\text{missing}(j,m,n)$) is shown below.

Interestingly, the spread of this distribution is not very large; the number of unique stickers one can expect to own at pretty much any one point does not vary much. By calculating the distribution for every $n$ we can plot the expected number of unique stickers over $n$:

#### A Full Set

So, then, how many stickers to we need to collect to achieve $j=0$ (a full set) with some probability $p$? This can be determined with the help of $\eqref{P_missing}$ by noting that in order to complete the collection on the $n$th sticker, there must have been one sticker missing on the $n-1$th draw. The probability of this case is $P_\text{missing}(1,m,n-1)$ and the probability of getting the missing sticker on the $n$th draw is $1/m$ as in $\eqref{geometric-probability}$, so that the probability of having the full set of stickers after exactly $n$ draws is

$$P_\text{full}(m,n) = \sum_{k=0}^{m-1} (-1)^k \binom{m - 1}{k} \left(1 - \frac{1+k}{m}\right)^{n-1}$$

with

$$n \in \Bigl\lbrace m, m + 1, \ldots \Bigr\rbrace\notag$$

This is the probability mass function (PMF) for $n_\Sigma$ as defined earlier in $\eqref{n_Sigma}$. The probability of having a complete set after $n_\Sigma$ draws also includes the probabilities of having completed the set on an earlier draw, so we simply integrate the PMF to obtain the cumulative density function:

Fortunately, there are swap meets and places like eBay, or a lot of mommies would need to spend loads of money at Edeka (which I assume is the whole purpose of this exercise) – to get the 1039 stickers needed (on average) means spending about €2,600 in the supermarket chain.

#### Other Cases

Unfortunately, it is not reasonably possible to make $m$ or $n$ arbitrarily variable within this page, as browser mathematics is too limited to correctly compute the desired quantities on-the-fly (a lot of almost equal numbers are added and subtracted within the sums which poses great problems for floating point calculators). So we are limited to choosing from a few pre-computed cases. Click on the buttons below to see the graphics above change accordingly.

Choose a value for $m$ (number of different items available):

Footnotes:

**1** As of October 5, 2012 the formula given in [2] has the index of the sum running from 1 instead of 0, which I believe is an error.

References:

[1] Geometric Distribution [Wikipedia]

[2] The Birthday Problem [University of Alabama]

last posts in mathematics:

At REWE you geht 5 Sticker for 10€, but you need 210 for a full set.

I added 210 to the set of options. You’ll need 1245 stickers on average, but since you get them at 5/4 the EDEKA rate, you’ll only need to spend roughly €2500.

Thanks.

We get them with much less money to spend. - fortunately