The theory of probability

  

 aniblack01_right.gif Introduction
 aniblack01_right.gif Foundation
 aniblack01_right.gif Calculating
 aniblack01_right.gif Presize definition
 aniblack01_right.gif Binomial distribution
 aniblack01_right.gif Hypergeometric distribution
 aniblack01_right.gif Combined events
 aniblack01_right.gif Inseparable events
 aniblack01_right.gif 'Ion Saliu's Paradox'
 aniblack01_right.gif Combinatorics
 aniblack01_right.gif Markov Chains
 aniblack01_right.gif Random Walks
 aniblack01_right.gif Contacts

 

Markov Chains

Most of our study of probability has dealt with independent trials processes. These processes are the basis of classical probability theory and much of statistics. We have discussed two of the principal theorems for these processes: the Law of Large Numbers and the Central Limit Theorem. We have seen that when a sequence of chance experiments forms an independent trials process, the possible outcomes for each experiment are the same and occur with the same probability. Further, knowledge of the outcomes of the previous experiments does not influence our predictions for the outcomes of the next experiment. The distribution for the outcomes of a single experiment is su±cient to construct a tree and a tree measure for a sequence of n experiments, and we can answer any probability question about these experiments by using this tree measure. Modern probability theory studies chance processes for which the knowledge of previous outcomes in°uences predictions for future experiments. In principle, when we observe a sequence of chance experiments, all of the past outcomes could in°uence our predictions for the next experiment. For example, this should be the case in predicting a student's grades on a sequence of exams in a course. But to allow this much generality would make it very di±cult to prove general results. In 1907, A. A. Markov began the study of an important new type of chance process. In this process, the outcome of a given experiment can affect the outcome of the next experiment. This type of process is called a Markov chain.

We describe a Markov chain as follows: We have a set of states, S = fs1; s2; : : : ; srg. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state si, then it moves to state sj at the next step with a probability denoted by pij , and this probability does not depend upon which states the chain was in before the current state. The probabilities pij are called transition probabilities. The process can remain in the state it is in, and this occurs with probability pii. An initial probability distribution, de¯ned on S, speci¯es the starting state. Usually this is done by specifying a particular state as the starting state. R. A. Howard1 provides us with a picturesque description of a Markov chain as a frog jumping on a set of lily pads. The frog starts on one of the pads and then jumps from lily pad to lily pad with the appropriate transition probabilities.

Example According to Kemeny, Snell, and Thompson,2 the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day. With this information we form a Markov chain as follows. We take as states the kinds of weather R, N, and S. From the above information we determine the transition probabilities. These are most conveniently represented in a square array as
img6 

We now consider the long-term behavior of a Markov chain when it starts in a state chosen by a probability distribution on the set of states, which we will call a probability vector. A probability vector with r components is a row vector whose entries are non-negative and sum to 1. If u is a probability vector which represents the initial state of a Markov chain, then we think of the ith component of u as representing the probability that the chain starts in state si. With this interpretation of random starting states, it is easy to prove the following theorem.

img7gif

Theorem Let P be the transition matrix of a Markov chain, and let u be the probability vector which represents the starting distribution. Then the probability that the chain is in state si after n steps is the ith entry in the vector
img8.gif