## Roadmap

Concentration Bounds:
• Markov's Inequality
• Chebyshev's Inequality
• Chernoff-Hoeffding Bounds
• ...and lots of applications where these are useful!

## Concentration bounds

• Key Idea: very often, most of the probability mass is concentrated in a small portion of the sample space
• Goal: bound the probability that a random variable with a bell-shaped distribution takes a value in the tails of the distribution, i.e., far away from its mean.

## Markov's Inequality

"No more than 1/5 of the population can have more than 5 times the average income."
• The most basic concentration bound
• Requires (almost) no knowledge about properties of random variable
• Consequently, the bound isn't very tight
• Nevertheless many applications
Theorem: If $X$ is a nonnegative (discrete) RV and $a > 0$, then:
$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}$$

Proof by Picture! :-)

• Define the indicator RV $I = 1$ if $X \ge a$, and $I = 0$ otherwise. Since $X$ is nonnegative, $I \le X/a$, and hence $\Pr[X \ge a] = \mathbb{E}[I] \le \mathbb{E}[X]/a$.

Proof by Algebra:
$$\mathbb{E}[X] = \sum_{x} x \Pr[X = x] \ge \sum_{x \ge a} x \Pr[X = x] \ge a \sum_{x \ge a} \Pr[X = x] = a \Pr[X \ge a]$$
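Beyond the formal proof, the bound is easy to sanity-check numerically. A minimal Python sketch (the choice of RV, a binomial with mean 5, is illustrative):

```python
import random

# Markov's inequality: for a nonnegative RV X and any a > 0,
#   Pr[X >= a] <= E[X] / a.
# Empirical check for X = number of heads in 10 fair coin flips (E[X] = 5).

random.seed(0)
trials = 100_000
samples = [sum(random.randint(0, 1) for _ in range(10)) for _ in range(trials)]

expectation = sum(samples) / trials              # close to 5
a = 8
empirical_tail = sum(1 for x in samples if x >= a) / trials
markov_bound = expectation / a                   # roughly 5/8

assert empirical_tail <= markov_bound
```

Note how loose the bound is here: the true tail probability is far below $\mathbb{E}[X]/a$, as expected from a bound that uses (almost) no knowledge about the RV.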

## Revisiting Distributed MIS Algorithm

• So far, we have only bounded the expected number of rounds.
• Recall:
Lemma: After iteration $i$, the expected number of edges remaining in the graph is at most $m/2^i$.

Idea: Apply Markov's Inequality to the RV $X_i$ = number of edges remaining after iteration $i$, since $\mathbb{E}[X_i] \le m/2^i$.

Lemma (Markov): If $X$ is a nonnegative (discrete) RV and $a > 0$, then $\Pr[X \ge a] \le \mathbb{E}[X]/a$.
• The algorithm terminates within $O(\log n)$ iterations with probability $1 - 1/n$: $\Pr[X_i \ge 1] \le \mathbb{E}[X_i] \le m/2^i$. For $i = 3 \log_2 n$, this is at most $m/n^3 \le 1/n$, since $m \le n^2$.

## Revisiting Byzantine Agreement

Lemma: The Randomized Byzantine Agreement Algorithm terminates in $O(1)$ rounds in expectation.
• Let $R$ be the RV counting the number of rounds of the algorithm.
• To bound the probability that $R$ is large, apply Markov's Inequality $\Pr[R \ge a] \le \mathbb{E}[R]/a$; to make the right-hand side equal to a target probability $p$, solve for $a$: $a = \mathbb{E}[R]/p$.

Example:

• For $p = 1/100$: the algorithm terminates within $100 \cdot \mathbb{E}[R]$ rounds with probability at least $99/100$.
• For $p = 1/n$: the algorithm terminates within $n \cdot \mathbb{E}[R]$ rounds with probability at least $1 - 1/n$.

## Expectation alone might not be Sufficient

Consider the following coin flipping games:

1. You flip one fair coin. You win $1 if heads; you lose $1 if tails.
2. You flip 10 fair coins. You win $1000 if all 10 come up heads; otherwise you lose $1.

Let's compute the expected value of the winnings $W$:

• Game 1: $\mathbb{E}[W] = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot (-1) = 0$
• Game 2: $\mathbb{E}[W] = \frac{1}{1024} \cdot 1000 + \frac{1023}{1024} \cdot (-1) = -\frac{23}{1024} \approx -0.02$
• Which game would you prefer?
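To help answer that, here is a quick exact computation of both expectations in Python (using rationals to avoid rounding):

```python
from fractions import Fraction

# Exact expected winnings for the two coin flipping games.

# Game 1: one fair coin, win $1 on heads, lose $1 on tails.
e_game1 = Fraction(1, 2) * 1 + Fraction(1, 2) * (-1)     # = 0

# Game 2: ten fair coins, win $1000 only on all heads, else lose $1.
p_all_heads = Fraction(1, 2) ** 10                       # = 1/1024
e_game2 = p_all_heads * 1000 + (1 - p_all_heads) * (-1)  # = -23/1024

assert e_game1 == 0
assert e_game2 == Fraction(-23, 1024)
```

Game 2 has negative expected winnings, so a risk-neutral player should prefer Game 1 despite the chance at $1000 — expectation alone does not capture the very different shapes of the two distributions.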

## Variance

Definition: Consider RV $X$. The variance $\mathrm{Var}[X]$ is the expectation of the squared deviation of the RV from its mean $\mathbb{E}[X]$:
$$\mathrm{Var}[X] = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$
(Figure: two bell-shaped distributions, red and blue, with the same mean but different variances; source: Wikipedia)

## Example: Variance of Fair Coin Flip

• Let $X$ be the indicator random variable of a fair coin flip. Then:
• $\mathbb{E}[X] = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 0 = \frac{1}{2}$.
• $\mathbb{E}[X^2] = \frac{1}{2} \cdot 1^2 + \frac{1}{2} \cdot 0^2 = \frac{1}{2}$.
• What about the variance? $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$.

## Expectation of Product of Independent RVs

Lemma: If $X$ and $Y$ are independent RVs, then: $\mathbb{E}[X \cdot Y] = \mathbb{E}[X] \cdot \mathbb{E}[Y]$

Proof:
$$\mathbb{E}[XY] = \sum_{x} \sum_{y} x y \Pr[X = x, Y = y] = \sum_{x} \sum_{y} x y \Pr[X = x] \Pr[Y = y] = \Big(\sum_{x} x \Pr[X = x]\Big)\Big(\sum_{y} y \Pr[Y = y]\Big) = \mathbb{E}[X]\,\mathbb{E}[Y]$$
where the second equality uses independence.

## Linearity of Variance of Independent RVs

Lemma: If $X$ and $Y$ are independent RVs, then: $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$

Proof:

• $\mathrm{Var}[X+Y] = \mathbb{E}[(X+Y)^2] - \mathbb{E}[X+Y]^2 = \mathbb{E}[X^2] + 2\,\mathbb{E}[XY] + \mathbb{E}[Y^2] - \mathbb{E}[X]^2 - 2\,\mathbb{E}[X]\,\mathbb{E}[Y] - \mathbb{E}[Y]^2$
• But $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ by the previous lemma (independence), so the mixed terms cancel.
• Therefore: $\mathrm{Var}[X+Y] = \big(\mathbb{E}[X^2] - \mathbb{E}[X]^2\big) + \big(\mathbb{E}[Y^2] - \mathbb{E}[Y]^2\big) = \mathrm{Var}[X] + \mathrm{Var}[Y]$
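A small exact check of this lemma in Python; the Bernoulli parameters $p = 1/3$ and $q = 1/4$ are arbitrary illustrative choices:

```python
from fractions import Fraction
from itertools import product

# Check Var[X + Y] = Var[X] + Var[Y] exactly for two independent
# Bernoulli RVs with illustrative parameters p = 1/3 and q = 1/4.

p, q = Fraction(1, 3), Fraction(1, 4)
dist_x = {0: 1 - p, 1: p}
dist_y = {0: 1 - q, 1: q}

def var(dist):
    mean = sum(v * pr for v, pr in dist.items())
    return sum((v - mean) ** 2 * pr for v, pr in dist.items())

# Distribution of X + Y under independence: multiply the marginals.
dist_sum = {}
for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
    dist_sum[x + y] = dist_sum.get(x + y, 0) + px * py

assert var(dist_sum) == var(dist_x) + var(dist_y)
```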

## Plan for Today

• Bernoulli & Binomial RVs
• Algorithm for Computing Median
• Chernoff Bounds (maybe Tuesday)

## Chebyshev's Inequality

Theorem: For any random variable $X$ and any $a > 0$:
$$\Pr\big[|X - \mathbb{E}[X]| \ge a\big] \le \frac{\mathrm{Var}[X]}{a^2}$$

Proof:

• Define the helper RV $Y = (X - \mathbb{E}[X])^2$; note that $Y \ge 0$ and $\mathbb{E}[Y] = \mathrm{Var}[X]$
• Just use Markov's Inequality: $\Pr\big[|X - \mathbb{E}[X]| \ge a\big] = \Pr[Y \ge a^2] \le \frac{\mathbb{E}[Y]}{a^2} = \frac{\mathrm{Var}[X]}{a^2}$
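A numerical illustration in Python, checking the inequality against an exact binomial tail (the parameters $n = 100$ and $a = 10$ are arbitrary):

```python
from math import comb

# Chebyshev's inequality: Pr[|X - E[X]| >= a] <= Var[X] / a^2.
# Exact check for X ~ Bin(n, 1/2), where E[X] = n/2 and Var[X] = n/4.

n, a = 100, 10
mean, variance = n / 2, n / 4

# Exact two-sided tail probability of the binomial.
tail = sum(comb(n, k) for k in range(n + 1) if abs(k - mean) >= a) / 2 ** n
chebyshev_bound = variance / a ** 2   # = 0.25

assert tail <= chebyshev_bound
```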

## Bernoulli Random Variables

Indicator RV $X$ is a Bernoulli RV with parameter $p$, denoted $X \sim \mathrm{Bernoulli}(p)$, if $\Pr[X = 1] = p$ and $\Pr[X = 0] = 1 - p$.
• Example: a fair coin flip is a Bernoulli RV with parameter $p = \frac{1}{2}$. Bernoulli RVs are exactly the indicator RVs!

General Properties:

• Expectation: $\mathbb{E}[X] = p \cdot 1 + (1 - p) \cdot 0 = p$
• Variance, calculated as for the fair coin flip: $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = p - p^2 = p(1 - p)$

## Binomial Random Variables

Let $X_1, \dots, X_n$ be independent Bernoulli RVs with parameter $p$ and define $X = \sum_{i=1}^{n} X_i$. Then $X$ is a Binomial RV with parameters $n$ and $p$, denoted $X \sim \mathrm{Bin}(n, p)$, such that:
$$\Pr[X = k] = \binom{n}{k} p^k (1 - p)^{n - k} \quad \text{for } k = 0, 1, \dots, n$$

General Properties:

• Computing expectation and variance is straightforward: by linearity of expectation, $\mathbb{E}[X] = \sum_i \mathbb{E}[X_i] = np$; by independence, $\mathrm{Var}[X] = \sum_i \mathrm{Var}[X_i] = np(1 - p)$.
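These properties can also be verified directly from the pmf. A small Python sketch (the parameters $n = 20$ and $p = 3/10$ are illustrative):

```python
from math import comb
from fractions import Fraction

# Binomial pmf: Pr[X = k] = C(n, k) * p^k * (1-p)^(n-k).
# Verify E[X] = n*p and Var[X] = n*p*(1-p) directly from the pmf.

n, p = 20, Fraction(3, 10)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mean = sum(k * pr for k, pr in pmf.items())
variance = sum((k - mean) ** 2 * pr for k, pr in pmf.items())

assert sum(pmf.values()) == 1
assert mean == n * p                  # = 6
assert variance == n * p * (1 - p)    # = 21/5
```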

## Fast Computation of the Median

• We're given an unsorted data set: a list of $n$ integers
• The median is the value separating the higher half from the lower half of the data set
• $m$ is a median, if...
1. at least $\lceil n/2 \rceil$ elements are $\le m$, and
2. at least $\lceil n/2 \rceil$ elements are $\ge m$.
• Goal: Output the median element in time $O(n)$ (in the RAM model)
• Example: the median of $(2, 7, 1, 9, 5)$ is $5$
• Easy way to compute the median in time $O(n \log n)$? Sort the list and output the middle element in sorted order

## Why the median is important

• Consider a sample of $n$ household incomes
• Let's assume all but one of the sampled households earn an ordinary income
• One sample earns 158.2 billion USD (e.g., the CEO of Amazon.com)
• Value of the mean? Dominated by the single outlier: at least $\frac{158.2 \text{ billion}}{n}$ USD.
• Value of the median? Still an ordinary income, unaffected by the outlier.
• The median is more robust to outliers!

## Randomized Algorithm for Median - High Level

• Input: List $S$ of $n$ integers; $n$ odd
• Output: The median of $S$, i.e., the $\lceil n/2 \rceil$-th element in the sorted order of $S$
• Main Idea:

• Sampling of small subsets
• ...and sorting of small subsets

## Randomized Algorithm for Computing Median

Input: Unsorted array $S$ of $n$ integers; $n$ odd

Output: The median of $S$

1. Sample $n^{3/4}$ elements u.a.r. (with replacement) from $S$ and store them in array $R$. Sort $R$; let $d$ be the $(\frac{1}{2}n^{3/4} - \sqrt{n})$-th and $u$ the $(\frac{1}{2}n^{3/4} + \sqrt{n})$-th smallest element of $R$. Compute $C = \{x \in S : d \le x \le u\}$, $\ell_d = |\{x \in S : x < d\}|$ and $\ell_u = |\{x \in S : x > u\}|$.
2. if $\ell_d > n/2$ or $\ell_u > n/2$ then: output fail and terminate
3. if $|C| > 4n^{3/4}$ then: output fail and terminate
4. output the $(\lfloor n/2 \rfloor - \ell_d + 1)$-th smallest element of $C$
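The steps above can be sketched in Python. This follows the standard textbook presentation of the sampling-based median algorithm (as in Mitzenmacher–Upfal); `None` stands in for fail, and the index clamping on the ranks in `r` is a defensive addition:

```python
import random

def randomized_median(s):
    n = len(s)                         # n is assumed to be odd
    r_size = int(n ** 0.75)
    sqrt_n = int(n ** 0.5)

    # Step 1: sample n^(3/4) elements u.a.r. (with replacement), sort them,
    # and read off the bracketing elements d and u.
    r = sorted(random.choice(s) for _ in range(r_size))
    d = r[max(r_size // 2 - sqrt_n, 0)]
    u = r[min(r_size // 2 + sqrt_n, r_size - 1)]

    # One linear pass: C and the counts of elements below d / above u.
    c = [x for x in s if d <= x <= u]
    less_d = sum(1 for x in s if x < d)
    greater_u = sum(1 for x in s if x > u)

    # Steps 2-3: fail if the median fell outside [d, u] or C is too large.
    if less_d > n // 2 or greater_u > n // 2:
        return None
    if len(c) > 4 * r_size:
        return None

    # Step 4: the median of S sits at rank n//2 - less_d (0-indexed) in C.
    c.sort()
    return c[n // 2 - less_d]

random.seed(1)
data = random.sample(range(10_000), 1001)
result = randomized_median(data)
assert result is None or result == sorted(data)[len(data) // 2]
```

If the sample was unlucky the function returns `None`; whenever it does return a value, that value is the true median.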
Intuition

Find $d$ and $u$ such that the median $m$ lies between them ($d \le m \le u$), while $C = \{x \in S : d \le x \le u\}$ stays small enough to be sorted in $o(n)$ time.

Lemma: The algorithm terminates in time $O(n)$ and outputs either fail or the median.

Proof:

Time Complexity:

• Sorting $R$ takes $O(n^{3/4} \log n) = o(n)$.
• Computing $C$, $\ell_d$ and $\ell_u$ takes $O(n)$, i.e., each time one pass through $S$.
• Sorting $C$ takes $O(n^{3/4} \log n) = o(n)$, since the algorithm fails if $|C| > 4n^{3/4}$.
• $O(n)$ time in total.

Correctness:

• The algorithm is correct if it either outputs fail or the median.
• Suppose the algorithm outputs some element $m'$. Is $m'$ the median?
• We know $\ell_d \le n/2$ and $\ell_u \le n/2$; otherwise the algorithm would have output fail
• ($m'$, the $(\lfloor n/2 \rfloor - \ell_d + 1)$-th smallest element of $C$, is the only element that is ever output)
• Exactly $\ell_d$ elements of $S$ are smaller than $d$, i.e., smaller than every element of $C$
• $C$ doesn't contain the $\ell_d$ smallest elements of $S$
• Hence the element at position $\lfloor n/2 \rfloor - \ell_d + 1$ of sorted $C$ has rank $\lfloor n/2 \rfloor + 1 = \lceil n/2 \rceil$ in $S$, i.e., it is the median of $S$.

## Analysis of Success Probability

We need to avoid 3 bad events...
• $\mathcal{E}_1$: occurs if $Y_1 := |\{r \in R : r \le m\}| < \frac{1}{2}n^{3/4} - \sqrt{n}$ (then $d > m$, so $\ell_d > n/2$)
• $\mathcal{E}_2$: occurs if $Y_2 := |\{r \in R : r \ge m\}| < \frac{1}{2}n^{3/4} - \sqrt{n}$ (then $u < m$, so $\ell_u > n/2$)
• $\mathcal{E}_3$: occurs if $|C| > 4n^{3/4}$
Lemma: $\Pr[\mathcal{E}_1] \le \frac{1}{4} n^{-1/4}$.

Proof:

• $\mathcal{E}_1$ occurs exactly when $d > m$...
• ...because $d$ cannot be greater than $m$ unless fewer than $\frac{1}{2}n^{3/4} - \sqrt{n}$ of the samples fall among the $\frac{n+1}{2}$ smallest elements of $S$
• Define $X_i = 1$ if the $i$-th sample is $\le m$, and $X_i = 0$ otherwise; then $\Pr[X_i = 1] = \frac{(n+1)/2}{n} \ge \frac{1}{2}$
• Define $Y_1 = \sum_{i=1}^{n^{3/4}} X_i$, the number of samples that are $\le m$
• Idea: Use Chebyshev's Inequality!
• First determine $\mathbb{E}[Y_1]$ and $\mathrm{Var}[Y_1]$...
• Since $Y_1 \sim \mathrm{Bin}\big(n^{3/4}, \frac{(n+1)/2}{n}\big)$: $\mathbb{E}[Y_1] \ge \frac{1}{2}n^{3/4}$ and $\mathrm{Var}[Y_1] \le \frac{1}{4}n^{3/4}$
• Now let's apply Chebyshev's Inequality:
$$\Pr[\mathcal{E}_1] = \Pr\Big[Y_1 < \tfrac{1}{2}n^{3/4} - \sqrt{n}\Big] \le \Pr\Big[|Y_1 - \mathbb{E}[Y_1]| > \sqrt{n}\Big] \le \frac{\mathrm{Var}[Y_1]}{n} \le \frac{n^{3/4}}{4n} = \frac{1}{4}n^{-1/4}$$
Lemma: $\Pr[\mathcal{E}_2] \le \frac{1}{4} n^{-1/4}$.

Proof:

• Same analysis as for $\mathcal{E}_1$, with $Y_2 = |\{r \in R : r \ge m\}|$.
Lemma: $\Pr[\mathcal{E}_3] \le \frac{1}{2} n^{-1/4}$.

Proof:

• Define 2 events:
1. Event $\mathcal{E}_{3,1}$: at least $2n^{3/4}$ elements of $C$ are greater than the median $m$
2. Event $\mathcal{E}_{3,2}$: at least $2n^{3/4}$ elements of $C$ are smaller than the median $m$
• If $|C| > 4n^{3/4}$, then at least one of $\mathcal{E}_{3,1}$ or $\mathcal{E}_{3,2}$ occurs.
• Focus on event $\mathcal{E}_{3,1}$: at least $2n^{3/4}$ elements of $C$ are $> m$. (Argument for $\mathcal{E}_{3,2}$ is symmetric.)
• Recall that $x \le u$ for all $x \in C$.
• Event $\mathcal{E}_{3,1}$ implies that the position of $u$ in the sorted order of $S$ is at least $\frac{1}{2}n + 2n^{3/4}$, so $R$ contains at least $\frac{1}{2}n^{3/4} - \sqrt{n}$ samples from the $\frac{1}{2}n - 2n^{3/4}$ largest elements of $S$
• Define $X_i = 1$ if the $i$-th sample is among the $\frac{1}{2}n - 2n^{3/4}$ largest elements of $S$, else $X_i = 0$; then $\Pr[X_i = 1] = \frac{1}{2} - 2n^{-1/4}$
• Define $X = \sum_{i=1}^{n^{3/4}} X_i$. Then $\mathbb{E}[X] = \frac{1}{2}n^{3/4} - 2\sqrt{n}$ and $\mathrm{Var}[X] = n^{3/4}\big(\tfrac{1}{2} - 2n^{-1/4}\big)\big(\tfrac{1}{2} + 2n^{-1/4}\big) < \frac{1}{4}n^{3/4}$
• Let's apply Chebyshev's Inequality:
$$\Pr[\mathcal{E}_{3,1}] \le \Pr\Big[X \ge \tfrac{1}{2}n^{3/4} - \sqrt{n}\Big] \le \Pr\Big[|X - \mathbb{E}[X]| \ge \sqrt{n}\Big] \le \frac{\mathrm{Var}[X]}{n} < \frac{1}{4}n^{-1/4}$$
• Recall that event $\mathcal{E}_3$ happens only if event $\mathcal{E}_{3,1}$ or $\mathcal{E}_{3,2}$ occurs
• So far, we've shown that $\Pr[\mathcal{E}_{3,1}] \le \frac{1}{4}n^{-1/4}$.
• A symmetric argument shows that $\Pr[\mathcal{E}_{3,2}] \le \frac{1}{4}n^{-1/4}$.
• By the union bound: $\Pr[\mathcal{E}_3] \le \Pr[\mathcal{E}_{3,1}] + \Pr[\mathcal{E}_{3,2}] \le \frac{1}{2}n^{-1/4}$.
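The first lemma can be illustrated with a quick Monte Carlo experiment in Python: repeatedly draw the sample $R$ and count how often $Y_1$ falls below the threshold $\frac{1}{2}n^{3/4} - \sqrt{n}$ (the concrete set $S = \{0, \dots, n-1\}$ and the trial count are illustrative choices):

```python
import random

# Monte Carlo estimate of Pr[E_1]: the sample R contains fewer than
# (1/2)n^(3/4) - sqrt(n) elements that are <= the median m. The lemma
# bounds this probability by (1/4)n^(-1/4); the event should be very rare.
# Illustrative setup: S = {0, ..., n-1}, so m = n // 2.

random.seed(2)
n = 10_001
median = n // 2
r_size = int(n ** 0.75)
threshold = r_size / 2 - n ** 0.5

trials, bad = 2_000, 0
for _ in range(trials):
    # Y_1 = number of sampled elements that are <= m.
    y1 = sum(1 for _ in range(r_size) if random.randrange(n) <= median)
    if y1 < threshold:
        bad += 1

bound = 0.25 * n ** -0.25   # about 0.025
assert bad / trials <= bound
```

In practice the empirical frequency is far below the Chebyshev-based bound, which is typical: Chebyshev is loose for sums of many independent indicators.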

## Combining All the Pieces

• The algorithm outputs the median unless $\mathcal{E}_1$, $\mathcal{E}_2$, or $\mathcal{E}_3$ happens. By the union bound: $\Pr[\mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3] \le \frac{1}{4}n^{-1/4} + \frac{1}{4}n^{-1/4} + \frac{1}{2}n^{-1/4} = n^{-1/4}$.
Theorem: There exists a randomized Monte Carlo algorithm that runs in time $O(n)$ and outputs the median of a given input array of $n$ integers with probability at least $1 - n^{-1/4}$.

Can we make this a Las Vegas algorithm?

## Interlude: Probability vs Logical Implications

• Consider events $\mathcal{E}_1$ and $\mathcal{E}_2$.
• "$\mathcal{E}_1$ implies $\mathcal{E}_2$" implies $\Pr[\mathcal{E}_1] \le \Pr[\mathcal{E}_2]$.
• Why?
• Recall: an event is a set of outcomes
• So "$\mathcal{E}_1$ implies $\mathcal{E}_2$" really means: every outcome $\omega \in \mathcal{E}_1$ also satisfies $\omega \in \mathcal{E}_2$
• Thus $\mathcal{E}_1 \subseteq \mathcal{E}_2$
• The probability of the set $\mathcal{E}_2$ cannot be smaller than the probability of the set $\mathcal{E}_1$.
• Example: $\mathcal{E}_1 \subseteq \mathcal{E}_1 \cup \mathcal{E}_2$, so $\Pr[\mathcal{E}_1] \le \Pr[\mathcal{E}_1 \cup \mathcal{E}_2]$

## Chernoff Bounds

• Tail inequality for sums of independent random variables
• Gives much tighter bounds than Markov's or Chebyshev's Inequality
• Extremely common tool when analyzing stochastic processes and algorithms

## General Version of Chernoff Bound

Theorem (General Version): Let $X_1, \dots, X_n$ be independent Bernoulli RVs with parameter $p$ and define $X = \sum_{i=1}^{n} X_i$, with $\mu = \mathbb{E}[X] = np$.
For any $\delta > 0$:
$$\Pr[X \ge (1 + \delta)\mu] \le \left(\frac{e^{\delta}}{(1 + \delta)^{1 + \delta}}\right)^{\mu}$$
For any $0 < \delta < 1$:
$$\Pr[X \le (1 - \delta)\mu] \le \left(\frac{e^{-\delta}}{(1 - \delta)^{1 - \delta}}\right)^{\mu}$$
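To see how much tighter the Chernoff bound is, the following Python sketch compares the upper-tail bound with the exact binomial tail and with the Markov and Chebyshev bounds at the same threshold (the parameters $n = 100$, $p = 1/2$, $\delta = 0.4$ are illustrative):

```python
from math import comb, e

# Compare the Chernoff upper-tail bound with the exact tail of
# X ~ Bin(n, 1/2) and with the Markov and Chebyshev bounds at the
# same threshold (1 + delta) * mu.

n, p, delta = 100, 0.5, 0.4
mu = n * p                       # E[X] = 50
threshold = (1 + delta) * mu     # = 70

exact = sum(comb(n, k) for k in range(int(threshold), n + 1)) / 2 ** n
markov = mu / threshold                              # Pr[X >= a] <= E[X]/a
chebyshev = (n * p * (1 - p)) / (delta * mu) ** 2    # two-sided Chebyshev
chernoff = (e ** delta / (1 + delta) ** (1 + delta)) ** mu

assert exact <= chernoff <= chebyshev <= markov
```

For this deviation the ordering is exact tail < Chernoff < Chebyshev < Markov; the gap grows rapidly as $\delta$ or $\mu$ increases, since the Chernoff bound decays exponentially in $\mu$.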