\newcommand{\ra}{\rightarrow} \newcommand{\la}{\leftarrow} \newcommand{\Ra}{\Rightarrow} \newcommand{\La}{\Leftarrow} \newcommand{\Prob}[1]{\textbf{Pr}\!\left[#1\right]} \newcommand{\Pr}[1]{\textbf{Pr}\!\left[#1\right]} \newcommand{\prob}[1]{\textbf{Pr}[#1]} \newcommand{\pr}[1]{\textbf{Pr}[#1]} \newcommand{\Expect}[1]{\textbf{E}\!\left[#1\right]} \newcommand{\expect}[1]{\textbf{E}[#1]} \newcommand{\EE}[1]{\Expect{#1}} \newcommand{\Ee}[1]{\expect{#1}} \newcommand{\E}[1]{\expect{#1}} \newcommand{\var}[1]{\textbf{Var}[#1]} \newcommand{\Var}[1]{\textbf{Var}\!\left[#1\right]} \renewcommand{\geq}{\geqslant} \renewcommand{\ge}{\geqslant} \renewcommand{\leq}{\leqslant} \renewcommand{\le}{\leqslant} \newcommand{\set}[1]{\{ #1 \}} \newcommand{\Set}[1]{\left\{ #1 \right\}} \newcommand{\brackets}[1]{\left( #1 \right)} \newcommand{\lb}{\left} \newcommand{\rb}{\right} \newcommand{\br}[1]{\brackets{#1}} \newcommand{\range}{\textsf{range}} \newcommand{\deg}{\textsf{deg}} \newcommand{\bad}{\text{Bad}}

Roadmap

    Concentration Bounds:
    • Markov's Inequality
    • Chebyshev's Inequality
    • Chernoff-Hoeffding Bounds
    • ...and lots of applications where these are useful!

Concentration bounds

  • Key Idea: very often, most of the probability mass is concentrated in a small proportion of the sample space
  • Bound probability that random variable with bell-shaped distribution takes value in tails of the distribution, i.e., far away from the mean.

Markov's Inequality

"No more than 1/5 of the population can have more than 5 times the average income."
  • The most basic concentration bound
  • Requires (almost) no knowledge about the properties of the random variable
  • Consequently, the bound isn't very tight
  • Nevertheless, it has many applications
Theorem: If X is a nonnegative (discrete) RV and a > 0 , then: \Pr{ X \ge a \cdot \EE{X}} \le \frac{1}{a}

Proof by Picture! :-)

  • \color{blue}{g(x)} = x
  • \color{green}{f(x)} = \begin{cases} 0 &\text{$x < a\Ee{X}$} \\ a\Ee{X} &\text{$x\ge a\Ee{X}$} \end{cases}
  • \Ra \forall x\colon f(x) \le g(x)
  • Define p(x) = \Pr{ X = x}
\Ra\ \fragment{a\Ee{X}\cdot \Pr{X\!\ge\! a \Ee{X}} =} \sum_{x} f(x)p(x) \le \sum_x g(x)p(x) \fragment{= \Ee{X}}

Proof by Algebra:

  • \begin{align} (a\Ee{X}) \cdot \Pr{X \ge a\cdot \EE{X}\ } &= (a\Ee{X})\cdot \sum_{k\ge a \Ee{X}} \Pr{X=k}\\ &\fragment{ = \sum_{k\ge a\Ee{X}} (a\Ee{X})\cdot \Pr{X=k} }\\ &\fragment{ \le \sum_{k\ge a\Ee{X}} \color{red}{k} \cdot \Pr{X=k} }\\ &\fragment{ \le \sum_{k\in\range(X)} k \cdot \Pr{X=k} } \fragment{ = \Ee{X} } \end{align}
  • \Ra \Pr{ X \ge a \cdot \EE{X}} \le \tfrac{\Ee{X}}{a\Ee{X}}
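A quick empirical sanity check (an added illustration, not from the original slides): simulate a nonnegative RV, here the number of fair-coin flips until the first heads (so \EE{X}=2 ), and compare the observed tail to the \tfrac{1}{a} bound.

```python
import random

# Simulate X = number of fair-coin flips until the first heads (E[X] = 2),
# then compare the empirical tail Pr[X >= a*E[X]] to Markov's bound 1/a.
random.seed(0)
trials = 100_000
samples = []
for _ in range(trials):
    flips = 1
    while random.random() < 0.5:  # tails: flip again
        flips += 1
    samples.append(flips)

mean = sum(samples) / trials
for a in (2, 4, 8):
    tail = sum(x >= a * mean for x in samples) / trials
    print(f"Pr[X >= {a}*E[X]] = {tail:.4f}   (Markov bound: {1/a:.4f})")
```

The empirical tail lands far below \tfrac{1}{a} here, illustrating that Markov's bound is valid but loose.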

Revisiting Distributed MIS Algorithm

  • So far, we obtained O(\log n) expected number of rounds.
  • Recall:
Lemma: After iteration m=\lceil 4\log_2 n\rceil , it holds: \expect{ L_m } \le \tfrac{10}{n^2}

Idea: Apply Markov's Inequality to RV L_m , since L_m \ge 0 .

Lemma: If X is a nonnegative (discrete) RV and a > 0 , then: \Pr{ X \ge a \cdot \EE{X}} \le \frac{1}{a}
  • \Pr{ L_m \ge 1 } \le \EE{ L_ m } \le \tfrac{10}{n^2}
  • Algorithm terminates within \lceil 4\log_2 n\rceil iterations with probability \ge 1 - \tfrac{10}{n^2} : For n=20 , this is \ge 0.975

Revisiting Byzantine Agreement

Lemma: The Randomized Byzantine Agreement Algorithm terminates in 3 rounds in expectation.
  • Let T be the RV counting the number of rounds of the algorithm.
  • \Ra \Pr{ T\ge x } \le \frac{\Ee{T}}{x}
  • To guarantee termination within x rounds with probability at least p , choose x so that the right-hand side is at most 1-p :
  • \tfrac{\Ee{T}}{x} \le 1-p

Example:

  • For p=0.9 : x = \Ee{T}/0.1 = 30
  • For p=0.99 : x = \Ee{T}/0.01 = 300

Expectation Alone Might Not Be Sufficient

Consider the following coin-flipping games:

  1. You flip one fair coin. You win $1 if heads; you lose $1 if tails.
  2. You flip 10 fair coins. You win $1000 if all heads; else you lose $1

Let's compute expected value of winnings X :

  • Game 1: \EE{X} = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot (-1) = 0
  • Game 2: \EE{X} = \tfrac{1}{2^{10}}\cdot 1000 + (1-\tfrac{1}{2^{10}})\cdot (-1) = -\tfrac{23}{1024} \approx -0.02
  • Which game would you prefer?
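To make the contrast concrete, a small simulation (an added illustration; the payouts follow the two games above):

```python
import random
import statistics

random.seed(1)
trials = 200_000

# Game 1: one fair coin; win $1 on heads, lose $1 on tails.
game1 = [1 if random.random() < 0.5 else -1 for _ in range(trials)]

# Game 2: ten fair coins; win $1000 only if all ten are heads, else lose $1.
game2 = [1000 if all(random.random() < 0.5 for _ in range(10)) else -1
         for _ in range(trials)]

for name, winnings in (("Game 1", game1), ("Game 2", game2)):
    print(name, " mean:", round(statistics.mean(winnings), 3),
          " variance:", round(statistics.variance(winnings), 1))
```

Both games have expected winnings near 0, but Game 2's variance is roughly three orders of magnitude larger; this is exactly the quantity the next slides introduce.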

Variance

Definition: Consider RV X . The variance \var{X} is the expectation of squared deviation of RV X from \Ee{X} : \var{X} = \EE{(X - \EE{X})^2}
  • Red distribution has mean 100 and variance 100 ( SD=10 )
  • Blue distribution has mean 100 and variance 2500 ( SD=50 )
(source: Wikipedia)

Example: Variance of Fair Coin Flip

  • Let X be the indicator random variable of a fair coin flip. Then:
  • \Pr{X=0} = \Prob{X=1} = \tfrac{1}{2} .
  • \Ee{X} = \tfrac{1}{2} .
  • What about Variance? \begin{align} \var{X} &= \EE{(X - \EE{X})^2} \\ &\fragment{ = \sum_{k\in\range{(X-\Ee{X})}} k^2\cdot \Prob{(X-\Ee{X})=k} }\\ &\fragment{ = \sum_{k\in\range{(X-\color{red}{\tfrac{1}{2}})}} k^2\cdot \Prob{(X-\color{red}{\tfrac{1}{2}})=k} }\\ &\fragment{ = \brackets{1 - \tfrac{1}{2}}^2\cdot\tfrac{1}{2} \fragment{+ \brackets{0 - \tfrac{1}{2}}^2\cdot\tfrac{1}{2}} } \fragment{ = \tfrac{1}{4} } \end{align}

Expectation of Product of Independent RVs

Lemma: If X and Y are independent RVs, then: \EE{X \cdot Y} = \EE{X} \cdot \EE{Y}.

Proof:

  • \begin{align} \EE{X Y } &= \sum_{i} \sum_j i \cdot j \cdot \Pr{X = i, Y = j} \\ &\fragment{ = \sum_{i} \sum_j i \cdot j \cdot \Pr{X = i} \cdot \Pr{Y = j} \\ }\\ &\fragment{ = \sum_{i} i \cdot \Pr{X = i} \cdot \sum_j j \cdot \Pr{Y = j} \\ }\\ &\fragment{ = \EE{X} \cdot \EE{Y} } \end{align}

Linearity of Variance of Independent RVs

Lemma: If X and Y are independent RVs, then: \Var{X + Y} = \Var{X} + \Var{Y}

Proof:

  • \begin{align} \Var{X + Y } &= \EE{(X + Y - \EE{X+Y})^2}\\ &\fragment{ = \EE{((X - \EE{X}) + (Y - \EE{Y}))^2} }\\ &\fragment{ = \EE{(X - \EE{X})^2 + (Y - \EE{Y})^2 + 2(X - \EE{X})(Y-\EE{Y})} }\\ \end{align}
  • But \EE{X - \EE{X}} = \fragment{\EE{X} - \EE{\EE{X}} = \EE{X} - \fragment{\EE{X} = 0.}}
  • Therefore: \Var{X + Y } = \EE{(X - \EE{X})^2 + (Y - \EE{Y})^2} = \Var{X} + \Var{Y}
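Both lemmas are easy to check empirically; here is a small sketch (an added illustration) with two independent fair dice:

```python
import random
import statistics

# Two independent fair dice: E[X] = E[Y] = 3.5, Var[X] = Var[Y] = 35/12.
random.seed(2)
trials = 500_000
xs = [random.randint(1, 6) for _ in range(trials)]
ys = [random.randint(1, 6) for _ in range(trials)]

# E[XY] should match E[X]*E[Y] = 12.25 ...
print("E[X]*E[Y] =", round(statistics.mean(xs) * statistics.mean(ys), 3))
print("E[XY]     =", round(statistics.mean(x * y for x, y in zip(xs, ys)), 3))

# ... and Var[X+Y] should match Var[X] + Var[Y] = 35/6 = 5.833...
print("Var[X]+Var[Y] =", round(statistics.variance(xs) + statistics.variance(ys), 3))
print("Var[X+Y]      =", round(statistics.variance([x + y for x, y in zip(xs, ys)]), 3))
```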

Plan for Today

  • Bernoulli & Binomial RVs
  • Algorithm for Computing Median
  • Chernoff Bounds (maybe Tuesday)

Chebyshev's Inequality

Theorem: For any random variable X and any a>0 : \Pr{|X - \Ee{X}|\! \ge\! a} \le \frac{\var{X}}{a^2}

Proof:

  • Just use Markov's Inequality:
  • Define helper RV Y = (X - \Ee{X})^2
  • \begin{align} \Pr{|X - \Ee{X}|\! \ge\! a} &\fragment{\, = \Pr{(X - \Ee{X})^2\! \ge\! a^2} }\\ &\fragment{\, = \Pr{Y \ge a^2} \le \tfrac{\expect{Y}}{a^2} } \fragment{ = \frac{\Var{X}}{a^2} } \end{align}

Bernoulli Random Variables

Indicator RV X is a Bernoulli RV with parameter p , denoted X \sim \text{Brn}(p) , if \Pr{X=1} = p and \Pr{X=0}=1-p .
  • Example: A fair coin flip is a Bernoulli RV with parameter \tfrac{1}{2} .

General Properties:

  • Expectation: \EE{X} = \Pr{X=1} = {p}
  • Variance calculated similarly as for the fair coin flip: \begin{align} \Var{X} &= \EE{(X-p)^2} \\ &= (0-p)^2\cdot (1-p) + (1-p)^2 \cdot p = p(1-p). \end{align}

Binomial Random Variables

Let X_1,\dots,X_n be n independent Bernoulli RVs, each with parameter p , and define X = \sum_{i=1}^n X_i . Then X is a Binomial RV with parameters n and p , denoted X \sim \text{Binom}(n,p) , and: \Pr{X \!=\! k} = {n \choose k} p^k (1 - p)^{n-k}

General Properties:

  • Computing expectation and variance is straightforward!
  • \EE{X} = \EE{\sum_{i=1}^n X_i} = \sum_{i=1}^n\EE{X_i} = \sum_{i=1}^n p = n p
  • \Var{X} = \Var{\sum_{i=1}^n X_i} = \sum_{i=1}^n \Var{X_i} = \sum_{i=1}^n p(1-p) = n p(1-p)
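Since \EE{X} and \Var{X} are explicit, Chebyshev's Inequality gives a concrete tail bound for Binomial RVs; the following snippet (an added illustration) compares it to the exact tail for n=100 , p=\tfrac{1}{2} :

```python
import math

# X ~ Binom(100, 1/2): E[X] = 50, Var[X] = 25.
n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)

def binom_pmf(k):
    # Pr[X = k] = C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

for a in (5, 10, 15):
    exact = sum(binom_pmf(k) for k in range(n + 1) if abs(k - mean) >= a)
    print(f"Pr[|X - {mean:.0f}| >= {a}] = {exact:.4f}   (Chebyshev: {var / a**2:.4f})")
```

Chebyshev is much better than Markov here, but still far from the exact tail; the Chernoff bounds at the end of this section close most of that gap.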

Fast Computation of the Median

  • We're given some unsorted data set: a list of n integers
  • The median is the value separating the higher half from the lower half of the data set
  • m is the median, if...
    1. at least \lfloor n/2 \rfloor elements are \le m , and
    2. at least \lfloor n/2 \rfloor + 1 elements are \ge m .
  • Goal: Output the median element in time O(n) (in RAM model)
  • (figure: example array 4 1 3 9 8 4 2 7 8 0 6 9 6; its median is 6)
  • Easy way to compute the median in time O(n\log n) ?
  • Sort the list and output the middle element in sorted order

Why the median is important

  • Consider sample of 1000 household incomes
  • Let's assume all except 1 sample earn \$80,000.
  • One sample earns \$158.2 billion (e.g., the CEO of Amazon.com)
  • Value of Mean? \approx \$158,000,000
  • Value of Median? \$80,000
  • Median is more robust to outliers!

Randomized Algorithm for Median - High Level

  • Input: List S of n = 2k+1 integers; n odd
  • Output: median of S , i.e.: (k+1) -th element in sorted order of S
  • Main Idea:

    • Sampling of small subsets
    • ...and sorting of small subsets

Randomized Algorithm for Computing Median

Input: Unsorted array S of n integers;

Output: median m of S

  1. Sample \lceil n^{3/4} \rceil elements u.a.r from S . Store them in array R .
  2. R' \gets \text{Merge-Sort}(R)
  3. a \gets R'\!\left[\lfloor \tfrac{1}{2}n^{3/4} - \sqrt{n}\rfloor\right]
  4. b \gets R'\!\left[\lfloor \tfrac{1}{2}n^{3/4} + \sqrt{n}\rfloor\right]
  5. C \gets \set{ x \in S\colon a \le x \le b }
  6. \alpha \gets |\set{x \in S\colon x < a}|
  7. \beta \gets |\set{x \in S\colon x > b}|
  8. if (\alpha > n/2) or (\beta > n/2) then: output fail and terminate
  9. if |C| > 4n^{3/4} then: output fail and terminate
  10. C' \gets \text{Merge-Sort}(C)
  11. output C'\!\left[\lfloor n/2 \rfloor - \alpha + 1\right]
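A direct transcription of the pseudocode into Python (a sketch under mild assumptions: sampling is done with replacement, matching the independence used in the analysis below, and n is large enough, say n \ge 50 , that the indices in steps 3-4 are valid; the pseudocode's arrays are 1-indexed, so the Python indices are shifted by one):

```python
import math
import random

def randomized_median(S):
    """One attempt of the sampling-based median algorithm.
    Returns the median of S, or None in place of 'fail'.
    Assumes len(S) is odd (n = 2k + 1) and reasonably large."""
    n = len(S)
    # Steps 1-2: sample ceil(n^(3/4)) elements u.a.r. (with replacement), sort them.
    R = sorted(random.choice(S) for _ in range(math.ceil(n ** 0.75)))
    # Steps 3-4: a and b straddle the middle of the sorted sample
    # (1-indexed on the slide, hence the extra -1 here).
    lo = math.floor(0.5 * n ** 0.75 - math.sqrt(n))
    hi = math.floor(0.5 * n ** 0.75 + math.sqrt(n))
    a, b = R[lo - 1], R[hi - 1]
    # Steps 5-7: one pass over S.
    C = [x for x in S if a <= x <= b]
    alpha = sum(1 for x in S if x < a)
    beta = sum(1 for x in S if x > b)
    # Steps 8-9: fail if the median may lie outside [a, b] or C is too big.
    if alpha > n / 2 or beta > n / 2 or len(C) > 4 * n ** 0.75:
        return None
    # Steps 10-11: sort the small middle part and read off the median, which
    # sits at (1-indexed) position floor(n/2) - alpha + 1 of sorted C.
    C.sort()
    return C[n // 2 - alpha]
```

For example, on a shuffled list of 101 distinct integers, a successful run returns the same value as sorted(S)[len(S) // 2].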
Intuition

Find a and b such that:

  1. a \le m \le b
  2. |C| = O(n/\log n)
Lemma: The algorithm terminates in time O(n) and outputs either fail or the median.

Proof:

Time Complexity:

  • Sorting R takes O(|R|\log |R|) = O(n^{3/4}\log(n^{3/4})) = o(n) .
  • Computing C , \alpha , \beta , takes O(n) , i.e., each time 1 pass through S .
  • Sorting C takes O(|C|\log |C|) = O(n^{3/4}\log(n^{3/4})) = o(n) .
  • \Ra O(n) time in total.

Correctness:

  • Algorithm is correct if it outputs either fail or the median.
  • Suppose algorithm outputs m' . Is m' median?
  • We know a \le m' \le b ; otherwise algorithm would have output fail
  • \Ra m' \in C
  • \Ra m' = C'\!\left[ \lfloor n/2\rfloor - \alpha + 1\right] (this is only element that is output)
  • Exactly \alpha elements of S are smaller than a (by definition of \alpha )
  • \Ra C' doesn't contain smallest \alpha elements of S
  • \Ra element m' at position \lfloor n/2\rfloor - \alpha + 1 of C' is median of S .

Analysis of Success Probability

We need to avoid 3 bad events...
  • \bad_1 : occurs if \alpha > \tfrac{n}{2}
  • \bad_2 : occurs if \beta > \tfrac{n}{2}
  • \bad_3 : occurs if |C| > 4n^{3/4}
Lemma: \Pr{\bad_1} = \Pr{ \alpha\!>\! \tfrac{n}{2} } < \frac{1}{4n^{1/4}} .

Proof:

  • \bad_1 occurs...
    • \Leftrightarrow a > m
    • \Leftrightarrow |\set{r \in R\colon r\le m}| < \tfrac{1}{2}n^{3/4} - \sqrt{n}
    • \phantom{\Leftrightarrow} because a is the \lfloor\tfrac{1}{2}n^{3/4}\!-\!\sqrt{n}\rfloor -th smallest element of R , so a > m exactly when fewer than that many samples are \le m
  • Define X_i = \begin{cases} 1 & \text{if $i$-th sample of $R$ is $\le m$}\\ 0 & \text{otherwise} \end{cases}
  • \Ra \E{X_i} = \Pr{X_i = 1} = \tfrac{\text{\# elem. in $S$ that are $\le m$}}{n} = \tfrac{(n-1)/2+1}{n} = \frac{1}{2}+\frac{1}{2n}
  • Define Y = \sum_{i=1}^{|R|} X_i
  • \Ra \Pr{\bad_1} = \Pr{ Y < \tfrac{1}{2} n^{3/4} - \sqrt{n}}
  • Idea: Use Chebyshev's Inequality!
  • First determine \E{Y} and \Var{Y} ...
  • Since Y \sim \text{Binom}(|R|,\tfrac{1}{2} + \tfrac{1}{2n}) :
    • \E{Y} = \sum_{i=1}^{|R|} \E{X_i} = n^{3/4}\lb( \tfrac{1}{2} + \tfrac{1}{2n}\rb) = \tfrac{1}{2}n^{3/4} + \tfrac{1}{2n^{1/4}} > \tfrac{1}{2}n^{3/4}
    • \Var{Y} = n^{3/4}\lb( \tfrac{1}{2} + \tfrac{1}{2n}\rb) \lb(1 - \tfrac{1}{2} - \tfrac{1}{2n}\rb) = \tfrac{1}{4}n^{3/4} - \tfrac{1}{4n^{5/4}} < \tfrac{1}{4}n^{3/4}
  • Now let's apply Chebyshev's Inequality:
\begin{align} \Pr{ Y < \tfrac{1}{2} n^{3/4} - \sqrt{n}} &= \Pr{ Y - \tfrac{1}{2} n^{3/4} < - \sqrt{n}} \\ &\fragment{\, = \Pr{ \tfrac{1}{2} n^{3/4} - Y > \sqrt{n}} }\\ &\fragment{\, \le \Pr{ |\tfrac{1}{2} n^{3/4} - Y| > \sqrt{n}} }\\ &\fragment{\, \le \frac{\var{Y}}{(\sqrt{n})^2} } \fragment{\, < \frac{\tfrac{1}{4}n^{3/4}}{n} } \fragment{\, = \frac{1}{4n^{1/4}} } \end{align}
Lemma: \Pr{\bad_2} = \Pr{ \beta\!>\! \tfrac{n}{2} } < \frac{1}{4n^{1/4}} .

Proof:

  • Same analysis as for \Pr{\bad_1} .
Lemma: \Pr{ \bad_3 } = \Pr{ |C| > 4n^{3/4}} < \frac{1}{2n^{1/4}}

Proof:

  • Define 2 events:
    1. Event A : at least 2n^{3/4} elements of C are \ge m
    2. Event B : at least 2n^{3/4} elements of C are \le m
  • If |C| > 4n^{3/4} , then at least |C|/2 > 2n^{3/4} elements of C lie on one side of m , so at least one of A or B occurs.
  • Focus on event A ; the argument for B is symmetric.
  • Recall that b \ge x for all x \in C .
  • \Ra event A implies that position of b in \text{Sorted}(S) is \ge \tfrac{1}{2}n + 2n^{3/4}
  • Define X_i = \begin{cases} 1 & \text{$i$-th sample is among $\tfrac{1}{2}n-2n^{3/4}$ largest of $S$}\\ 0 & \text{otherwise} \end{cases}
  • \EE{X_i} = \Pr{X_i=1} = \tfrac{(1/2)n - 2n^{3/4}}{n} = \tfrac{1}{2} - \tfrac{2}{n^{1/4}}
  • Define X = \sum_{i=1}^{|R|} X_i . Then X \sim \text{Binom}(n^{3/4},\tfrac{1}{2} - \tfrac{2}{n^{1/4}}) and thus:
    • \EE{X} = n^{3/4}\brackets{\tfrac{1}{2} - \tfrac{2}{n^{1/4}}} = \tfrac{1}{2}n^{3/4} - 2\sqrt{n}
    • \Var{X} = n^{3/4} \brackets{\tfrac{1}{2} - \tfrac{2}{n^{1/4}}} \brackets{1 - \brackets{\tfrac{1}{2} - \tfrac{2}{n^{1/4}}}} = \frac{1}{4}n^{3/4} - 4n^{1/4} < \tfrac{1}{4} n^{3/4}
  • Since b = R'\!\left[\lfloor \tfrac{1}{2}n^{3/4} + \sqrt{n}\rfloor\right] , at least \tfrac{1}{2}n^{3/4} - \sqrt{n} samples are \ge b , and under event A each of them is among the \tfrac{1}{2}n-2n^{3/4} largest elements of S ; hence A implies X \ge \tfrac{1}{2}n^{3/4} - \sqrt{n} .
  • Let's apply Chebyshev's Inequality:
  • \begin{align} \Pr{A} \le \Pr{ X \ge \tfrac{1}{2}n^{3/4} - \sqrt{n}} &\fragment{ = \Pr{ X \ge \tfrac{1}{2}n^{3/4} - \color{red}{2}\sqrt{n} + \color{red}{\sqrt{n}}} }\\ &\fragment{ = \Pr{ X - \EE{X} \ge \sqrt{n}} }\\ &\fragment{ \le \Pr{ |X - \E{X}| \ge \sqrt{n}} }\\ &\fragment{ \le \frac{\var{X}}{n} \fragment{<\tfrac{1}{4n^{1/4}}} } \end{align}
  • Recall that event \bad_3 happens only if event A or B occurs.
  • So far, we've shown that \Pr{A} \le \tfrac{1}{4n^{1/4}} .
  • Symmetric argument shows that \Pr{B} \le \tfrac{1}{4n^{1/4}} .
  • \Ra \Pr{\bad_3} \le \Pr{A} + \Pr{B} \le \tfrac{1}{2n^{1/4}} .

Combining All the Pieces

  • Algorithm outputs median unless \bad_1 or \bad_2 or \bad_3 happen.
  • \begin{align} \Ra \Pr{\texttt{fail}} &\le \Pr{\bad_1} + \Pr{\bad_2} + \Pr{\bad_3} \\ &\fragment{ \le 2 \cdot \tfrac{1}{4n^{1/4}} + \tfrac{1}{2n^{1/4}} }\\ &\fragment{ = \tfrac{1}{n^{1/4}} } \end{align}
Theorem: There exists a randomized Monte Carlo algorithm that runs in time O(n) and outputs the median for a given input array of n integers with probability at least 1 - \tfrac{1}{n^{1/4}} .

Can we make this a Las Vegas algorithm?
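One natural answer (sketched here; a standard conversion, not spelled out on the slides): since fail is detectable, simply rerun until success. Correctness is then guaranteed, and because \Pr{\texttt{fail}} \le \tfrac{1}{n^{1/4}} , the expected number of attempts is at most \tfrac{1}{1 - n^{-1/4}} = O(1) , so the expected running time remains O(n) .

```python
def median_las_vegas(S):
    """Las Vegas wrapper around the Monte Carlo sketch above:
    always correct; expected O(1) attempts, hence expected O(n) time."""
    while True:
        result = randomized_median(S)  # returns None on fail
        if result is not None:
            return result
```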

Interlude: Probability vs Logical Implications

  • Consider events A and B .
  • " A \Ra B " implies \Pr{A} \le \Pr{B} .
  • Why?
    • Recall: an event is set of outcomes
    • So A \Ra B really means, for all outcomes x \in A , also x \in B
    • Thus A \subseteq B
    • Probability of set B cannot be smaller than probability of set A .
  • Example: {x-y > d} \Ra {|x-y|>d} , so \Pr{ x-y > d} \le \Pr{ |x-y| > d}

Chernoff Bounds

  • Tail inequality for random variables
  • Gives much tighter bounds than Markov's or Chebyshev's Ineq.
  • Extremely common tool when analyzing stochastic processes and algorithms

General Version of Chernoff Bound

Theorem (General Version): Let X_1,\dots,X_n be independent Bernoulli RVs with parameter p and define X = \sum_{i=1}^{n}X_i .
For any \delta > 0 : \Pr{X \ge (1+\delta) \EE{X} } \le \br{ \frac{e^\delta}{(1+\delta)^{1+\delta}}}^{\E{X}}

For any \delta \in (0,1) : \Pr{X \le (1-\delta) \EE{X} } \le \br{ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}}}^{\E{X}}
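A small numeric comparison (an added illustration) of the upper-tail Chernoff bound against Chebyshev for X \sim \text{Binom}(1000, \tfrac{1}{2}) :

```python
import math

# X ~ Binom(1000, 1/2): E[X] = 500, Var[X] = 250.
n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)

for delta in (0.1, 0.2, 0.3):
    # Upper-tail Chernoff bound: (e^delta / (1+delta)^(1+delta))^E[X].
    chernoff = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
    # Chebyshev bounds the two-sided event Pr[|X - mu| >= delta*mu].
    chebyshev = var / (delta * mu) ** 2
    print(f"delta = {delta}: Chebyshev <= {chebyshev:.2e}, Chernoff <= {chernoff:.2e}")
```

Already at \delta = 0.2 the Chernoff bound is orders of magnitude smaller, and the gap widens exponentially in \EE{X} .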