“The future depends only on the present, not on how we got here.”
This is the Markov property. Surprisingly powerful for modeling sequences, from weather to text to molecules.
Definition
A sequence of states where:
$$P(X_{t+1} | X_t, X_{t-1}, …, X_0) = P(X_{t+1} | X_t)$$
History doesn’t matter. Only current state matters.
Transition matrix
Define P[i,j] = probability of going from state i to state j.
Example (weather):
Sunny Rainy
Sunny 0.8 0.2
Rainy 0.4 0.6
If sunny today, 80% chance sunny tomorrow.
Watch state transitions: Markov Chain Animation
Computing future states
If $\pi_t$ is distribution over states at time t:
$$\pi_{t+1} = \pi_t \cdot P$$
After n steps: $$\pi_n = \pi_0 \cdot P^n$$
import numpy as np
P = np.array([[0.8, 0.2],
[0.4, 0.6]])
# Start sunny
pi = np.array([1, 0])
# After 5 steps
for _ in range(5):
pi = pi @ P
print(pi) # [0.667, 0.333]
Stationary distribution
As n → ∞, $\pi_n$ converges to stationary distribution $\pi^*$:
$$\pi^* = \pi^* \cdot P$$
The distribution that doesn’t change after transitions.
For our weather example: [0.667, 0.333] Long-run: 2/3 sunny, 1/3 rainy (regardless of starting state).
Finding it: solve $\pi P = \pi$ with $\sum_i \pi_i = 1$
Applications
Text generation
States = words. Transitions = what word follows what.
# Build from corpus
transitions = defaultdict(Counter)
for i in range(len(words) - 1):
transitions[words[i]][words[i+1]] += 1
# Generate text
def generate(start, length):
current = start
result = [current]
for _ in range(length):
next_words = transitions[current]
current = random.choices(
list(next_words.keys()),
list(next_words.values())
)[0]
result.append(current)
return ' '.join(result)
Simple but captures some structure. Early language models.
MCMC (Markov Chain Monte Carlo)
Sample from complex distributions by constructing Markov chain whose stationary distribution is your target.
Metropolis-Hastings, Gibbs sampling - fundamental in Bayesian inference.
Hidden Markov Models
States are hidden. We only see emissions from states.
Used for:
- Speech recognition (states = phonemes, emissions = audio)
- Part-of-speech tagging (states = POS tags, emissions = words)
- Biological sequences
Limitations
First-order Markov: only one step of history. Often too restrictive.
Fixes:
- Higher-order Markov (last n states)
- Hidden Markov Models (hidden states capture history)
- Replace with RNN/Transformer (learned representations)