d538: How to read Machine Learning Papers

“How to read Machine Learning Papers” from Reddit by Schmook:

A “math heavy paper” could mean: a paper with long equations, lots of algebra and manipulation of complicated equations.

When you read a paper, you never read it only once. You read the title first, then you decide if you should read the abstract. You read the abstract and decide if you will skim through the results. You do that and decide if you’ll skim through the whole text. Etc, etc. Life’s short and there are too many damn articles to read.

The secret to reading algebra-heavy papers is NOT trying to follow the algebra on the first read. This is a mistake most students make. You don’t need to understand all the steps of a long calculation on the first read. You skim through the algebra and assume it is correct, taking a closer look at the key steps along the way. Read the English written between the equations. Read the results. Read the conclusion. When you have made sense of what this fucking paper is talking about in general, then you decide if you’re going to waste your time with the algebra. Don’t get bogged down in the steps you don’t understand. Assume they are correct and carry on. Go back to them later. Repeat until you get it all.

When you mature as an “applied mathematician” you develop this ability to skim through algebra and understand more or less what this guy is trying to do, where he wants to get to and what are more or less the steps required to do so. Nobody can read long manipulations of complicated equations fast. That’s why you don’t do that in the first read. You read in a coarse grained way, paying attention to finer and finer detail at each new read.

Also, you should pay attention to the fact that A LOT of times there are mistakes in the calculations. And finding them on the first read is impossible. Most of the time those mistakes are irrelevant to the point the article wants to make, but they can confuse you and get in the way of understanding the algebra. If you already understand at a general level what’s being done, those mistakes are much more easily spotted.

Also, when you look at the equations make sure you understand what they actually mean. I’m sure you know the math of that equation, but do you know the physics of that equation? (Sorry, I’m a physicist, so that’s the only analogy I know). Do you know how to explain to me, in English, what that equation says about what that particular system is doing? Can you say something like “when you maximize the ELBO, the approximate posterior will be as similar to the prior as the data in the likelihood term allows”? That’s the “physics” of that nasty looking ELBO expression. When you get to that point, reasoning about long algebraic manipulations gets easier. How to get to that point? Read lots of theory papers and do a lot of algebra. There’s no other way.
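
For reference, here is one common way to write the ELBO that makes the sentence in quotes easy to check. This is only a sketch: the symbols (x for the data, z for the latent variable, q for the approximate posterior, p(z) for the prior) are assumptions chosen for illustration and do not come from any particular paper.

```latex
\log p(x) \;\ge\; \mathcal{L}(q)
  \;=\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]}_{\text{likelihood term: fit the data}}
  \;-\; \underbrace{\mathrm{KL}\big(q(z)\,\|\,p(z)\big)}_{\text{penalty: stay close to the prior}}
```

Maximizing the right-hand side over q pulls q toward the prior (the KL term is smallest when q equals p(z)) and lets it move away only as far as the likelihood term pays for it.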

Another way a paper can be called “math heavy” is when it uses very formal mathematical lingo and relies (sometimes, excessively and unnecessarily) on many formal mathematical concepts. It invokes Lebesgue measures, Radon-Nikodym derivatives, sigma algebras, etc.

Those are way more difficult to read for me because they confuse my internal bullshit detector. All the formal talk looks important. But the technique is the same: skim through first. This is not the time to go to Wikipedia to try to remember what a Borel hierarchy is. Save that for later; you might not even read this article another time.

Also, it helps to mentally replace the formal concept with a special case in a simple scenario. Many times when people use formal math, it is because they’re trying to be safe and not have weird corner cases fuck up their reasoning. Things like the smartass math PhD candidate in the room asking “Oh yeah, what if this function is continuous everywhere but is not differentiable anywhere? Does your thing still work?”. So, what you can do is to assume there’s no such smartass and mentally replace all Radon-Nikodym derivatives by ratios, all measures by simple functions with good old Riemann integrals, and assume this author is just showing off and that you don’t need this fancy talk to understand what he’s talking about.
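
As a concrete instance of that substitution, here is a sketch of the simple special case, assuming two distributions P and Q on the real line that have ordinary densities p and q, with q(x) > 0 wherever p(x) > 0. The symbols P, Q, p, q and f are illustrative assumptions, not taken from the original comment.

```latex
% Radon-Nikodym derivative reduces to a ratio of densities
\frac{dP}{dQ}(x) = \frac{p(x)}{q(x)}

% Integral against a measure reduces to an ordinary integral
\mathbb{E}_{P}[f] = \int f \, dP = \int f(x)\, p(x)\, dx
  = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx

% So a formal-looking KL divergence is just
\mathrm{KL}(P \,\|\, Q) = \int \log\frac{dP}{dQ}\, dP
  = \int p(x) \log\frac{p(x)}{q(x)}\, dx
```

If the paper’s argument still makes sense after this replacement, the measure-theoretic language was most likely there to cover corner cases rather than to carry the main idea.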

Sometimes this fails and there’s a paper you really should read that is riddled with formal math and the math is really there for a reason. Put on your Bourbaki hat and good luck. If you’re like me and formal maths isn’t your strongest skill, you’re in for a long and difficult read :).