Ask HN: Best place to start learning about Markov Chains?

Just pick a random place to start, read some stuff, and then take a guess as to which direction to go in next, based on what’s probably a good next thing to read. Then keep repeating the process over and over again.

It’s also important that you base your guess of what’s probably good to read next only on the previous thing you read. Forget everything that came before that.

If there’s a finite amount of literature on the subject, this advice will send the OP in circles with probability one.

It’s hilarious because it’s also a “semi”-decent method for learning new topics in general.

You can then model the probability that you’ll end up at any given place after n steps as a Markov chain.
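The comment above can be taken literally: with a toy (entirely made-up) transition matrix over a few "topics", the distribution over where you end up after n steps is just the start distribution times the n-th matrix power. A minimal sketch:

```python
import numpy as np

# Hypothetical 3-topic reading list: row i gives the probability of what
# you read next, given only the topic you just read (the Markov property).
P = np.array([
    [0.1, 0.6, 0.3],   # after "basics"
    [0.4, 0.2, 0.4],   # after "theory"
    [0.5, 0.3, 0.2],   # after "applications"
])

start = np.array([1.0, 0.0, 0.0])  # begin at "basics"

# Distribution over topics after n steps is start @ P^n.
n = 10
dist = start @ np.linalg.matrix_power(P, n)
print(dist)  # converges toward the chain's stationary distribution
```

For large n the rows of P^n become identical, which is the "you go in circles with probability one" joke made precise.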

Markov chains are simple in essence. Instead of detouring to read all the theory, I’d recommend doing it on an as-needed basis. Learn as you go. So pick up a problem and move ahead. I don’t think it is fruitful to learn everything about Markov chains just for the sake of it.

Markov Chain Monte Carlo to sample from probability distributions is a good start – if you are into sampling.
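If sampling is the entry point, the core MCMC idea fits in a few lines: build a Markov chain whose stationary distribution is the one you want to sample from. A minimal random-walk Metropolis sketch (toy target, all parameters assumed):

```python
import math
import random

random.seed(0)  # for reproducibility

def metropolis(log_p, x0, step=1.0, n=10000):
    """Random-walk Metropolis: a Markov chain whose stationary
    distribution is proportional to exp(log_p(x))."""
    x, samples = x0, []
    for _ in range(n):
        prop = x + random.gauss(0.0, step)
        # Accept with probability min(1, p(prop)/p(x)).
        if math.log(random.random()) < log_p(prop) - log_p(x):
            x = prop
        samples.append(x)
    return samples

# Sample from a standard normal, given only its log-density up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)
```

The point is that `log_p` only needs to be known up to an additive constant, which is exactly why MCMC is useful for awkward distributions.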

1. Elementary probability theory.

2. Poisson processes.

3. The Markov property.

4. Stochastic processes.

5. Realise that you’re missing a background in analysis, so you don’t know sh?t about measure theory, but you actually need it to go any deeper. Wonder to yourself whether you really want to spend the next 3 years getting a maths background you don’t have.

6. Convince yourself that it’s all just engineering and muddle through by picking a project involving a non-trivial Markov chain.

7. Go back, spend 3 years doing foundational maths, then repeat points 1-5.

Poisson processes are continuous-time, though. If you’re interested in Markov chains, you only need the discrete-time theory.

In discrete time and discrete space, it mostly just reduces to linear algebra.
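To make "reduces to linear algebra" concrete: the stationary distribution of a finite chain is just a left eigenvector of the transition matrix with eigenvalue 1. A sketch with a toy two-state chain (numbers made up):

```python
import numpy as np

# Toy two-state chain (say "Sunny"/"Rainy"); rows sum to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The stationary distribution pi solves pi P = pi, i.e. pi is a left
# eigenvector of P (a right eigenvector of P^T) with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print(pi)  # -> approximately [0.833, 0.167]
```

Hitting probabilities and expected hitting times likewise come out of solving small linear systems, which is why the discrete/finite case feels so tame.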

Unless you’re interested in continuous-time Markov chains, infinite state spaces, renewal theory, excessive functions, and so on.

While I agree with the progression of knowledge listed here, I don’t think it requires 3 years of mathematical foundations. If you already have a basic understanding of math, you should be able to pick up the theory fairly well in a couple of months of research and application.

You don’t need much math to pick up the very basic theory, but after a certain point you’re going to hit a hard wall unless you have a strong background in analysis.

I think once you get past the basic linear algebra and calculus prerequisites and into the analysis and measure theory prerequisites, nothing takes a few months anymore 🙂

That whole math sequence was part of my MBA program, which culminated in Markov chains for synthetic options pricing after, like, 9 months. And this was for business school students, not engineers 🙂

Sure, and for any given application it’ll be possible to explain Markov chains as they apply to it. I recently did a financial valuation course where we did an “intuitive derivation of Itô’s formula” so that we could skip the measure theory prerequisites. We also skipped talking about Riemann integrals and just accepted that sums become integrals in the limit … we also glossed over the separating hyperplane theorem so that we could say “no arbitrage iff a risk-neutral measure exists”, and so on.

However, if you actually want a background in the theory of Markov chains, I don’t think this approach works.

The wikipedia page for Markov chains is really one of the best wikipedia pages I’ve ever seen for a technical topic.

Covers a ton of ground, and gives concrete examples to motivate the ideas.

If you are not already intimately familiar with them, learn about FSAs (finite state automata), also known as FSMs (finite state machines).

Most interesting facts about Markov chains (e.g. the Stationary Distribution Theorem) really are probabilistic generalisations of simpler facts about FSAs (e.g. that FSAs cannot be used to “count”). In my experience, understanding them first for FSAs and then seeing how they generalise to the probabilistic case is a good way of approaching this subject.
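The generalisation is easy to see in code: a DFA has a transition *function* delta(state, symbol) -> state, and a Markov chain replaces it with a transition *matrix* of probabilities per state. A toy sketch (the automaton and the 0.3 symbol probability are both made up):

```python
# DFA over {a, b} that tracks the parity of 'a's seen (states 0 and 1):
delta = {(0, 'a'): 1, (0, 'b'): 0,
         (1, 'a'): 0, (1, 'b'): 1}

def run_dfa(word, start=0):
    """Deterministic transitions: each symbol picks exactly one next state."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state

print(run_dfa("aab"))  # even number of 'a's -> state 0

# Probabilistic generalisation: if 'a' arrives with probability 0.3
# at each step, the induced Markov chain on the same two states is
P = [[0.7, 0.3],
     [0.3, 0.7]]
```

Statements like “this DFA cannot count” become statements about what distributions the corresponding chain can converge to.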

Personally, I started with Eugene Charniak’s Statistical Language Learning [1] then continued with Manning and Schütze’s Foundations of Statistical Natural Language Processing [2] and Speech and Language Processing by Jurafsky and Martin [3].

The Charniak book is primarily about HMMs and quite short, so it’s the best introduction to the subject. Manning and Schütze and Jurafsky and Martin are much more extensive and cover pretty much all of statistical NLP up to their publication date (so no LSTMs if I remember correctly) but they are required reading for an in-depth approach.

You will definitely want to go beyond HMMs at some point, so you will probably want the other two books. But, if you really just want to know about HMMs, then start with the Charniak.

Highly recommended. My preferred way to learn is to grasp an intuitive understanding before diving deep into theory, and this visual explainer is a great first step.

These are my favorite lecture notes: they assume almost no a priori knowledge (with an awesome review of basic probability), and yet they don’t shy away from explaining all the rigorous math.

If you have time to read step-by-step derivations and want to understand the fundamentals, I think this is an excellent self-contained resource.

“No prior knowledge” and “explaining all the rigorous maths” are mutually exclusive, in my opinion. I stress this as honest advice to anyone reading.

Rigorous maths is akin to trying to explain to your non-technical friends what you do in devops: colloquialise it all you want, it’ll always be a shallow story.

— Markov Decision Processes

There is a lot of info out there about Markov chains, but very little about Markov decision processes (MDPs).

How popular are MDPs? What are their strengths? Weaknesses?

— Kalman Filters vs HMM (Hidden Markov Model):

“In both models, there’s an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar.” – THISISDAVE
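The HMM side of that comparison fits in a short filtering loop: maintain a belief over the hidden states, push it through the transition matrix, then reweight by the observation likelihood. A minimal sketch with toy numbers (transition and emission matrices entirely assumed):

```python
import numpy as np

T = np.array([[0.8, 0.2],      # hidden-state transition matrix
              [0.3, 0.7]])
E = np.array([[0.9, 0.1],      # E[s, o] = P(observation o | hidden state s)
              [0.2, 0.8]])

def forward_filter(obs, prior):
    """Discrete analogue of Kalman filtering: predict, then update."""
    belief = prior
    for o in obs:
        belief = belief @ T          # predict: move the state forward in time
        belief = belief * E[:, o]    # update: weight by observation likelihood
        belief = belief / belief.sum()
    return belief

belief = forward_filter([0, 0, 1], prior=np.array([0.5, 0.5]))
print(belief)  # posterior over the two hidden states
```

Swap the discrete belief vector for a Gaussian mean and covariance and the same predict/update loop is a Kalman filter, which is the "underlying idea is very similar" point above.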


“Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to “CTC” (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on “nnet3”, which moves to CTC as well. Speech was one of the places where HMMs were _huge_, so that’s kind of a big deal.” -PRACCU

“HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference.” – NEXTOS

“IMO, anything that can be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs.” – SHERJILOZAIR

How about a textbook, maybe? There aren’t always easy alternatives out there; sometimes you have to bite the bullet and do the work.

If the “motivation-theorem-proof” style appeals to you, find a copy of Finite Markov Chains by Kemeny and Snell (ISBN 0442043287).

Do you have an application in mind to help guide suggestions?

As others have said, if you don’t know probability, start there.

You can find a copy of “Markov Chains and Mixing Times” online, which is good and relatively accessible.
