

By Michael Nielsen, January 2019

What does it mean to understand a piece of mathematics? Naively, we perhaps think of this in relatively black and white terms: initially you don’t understand a piece of mathematics, then you go through a brief grey period where you’re learning it, and with some luck and hard work you emerge out the other side “understanding” the mathematics.


In reality, mathematical understanding is much more nuanced. My experience is that it’s nearly always possible to deepen one’s understanding of any piece of mathematics. This is even true – perhaps especially true – of what appear to be very simple mathematical ideas.

I first really appreciated this after reading an essay by the mathematician Andrey Kolmogorov. You might suppose a great mathematician such as Kolmogorov would be writing about some very complicated piece of mathematics, but his subject was the humble equals sign: what made it a good piece of notation, and what its deficiencies were. Kolmogorov discussed this in loving detail, and made many beautiful points along the way, e.g., that the invention of the equals sign helped make possible notions such as equations (and algebraic manipulations of equations).

Prior to reading the essay I thought I understood the equals sign. Indeed, I would have been offended by the suggestion that I did not. But the essay showed convincingly that I could understand the equals sign much more deeply.

This experience suggested three broader points. First, it’s possible to understand other pieces of mathematics far more deeply than I assumed. Second, mathematical understanding is an open-ended process; it’s nearly always possible to go deeper. Third, even great mathematicians – perhaps, especially, great mathematicians – thought it worth their time to engage in such deepening.

(I found Kolmogorov’s essay in my University library as a teenager. I’ve unsuccessfully tried to track it down several times in the intervening years. If anyone can identify the essay, I’d appreciate it. I’ve put enough effort into tracking it down that I must admit I’ve sometimes wondered if I imagined the essay. If so, I have no idea where the above story comes from.)

How can we make actionable this idea that it’s possible to deepen our mathematical understanding in an open-ended way? What heuristics can we use to deepen our understanding of a piece of mathematics?

Over the years I’ve collected many such heuristics. In these notes I describe a heuristic I stumbled upon a year or so ago that I’ve found especially helpful (albeit time intensive). I’m still developing the heuristic, and my articulation will therefore be somewhat stumbling. I’m certain it can still be much improved upon! But perhaps it will already be of interest to others.

One caveat is that I’m very uncertain how useful the heuristic will be to people with backgrounds different to my own. And so it’s perhaps worth saying a little about what that background is. I’m not a professional mathematician, but I was trained and worked as a professional theoretical physicist for many years. As such, I’ve written dozens of research papers proving mathematical theorems, mostly in the field of quantum information and computation. Much of my life has been spent doing mathematics for many hours each day. It’s possible someone with a different background would find the heuristic I’m about to describe much less useful. This applies to people with both much less and much more mathematical background than I have.

It’s also worth noting that my work mostly involves mathematics only incidentally these days. I still do some mathematics as a hobby, and occasionally as part of other research projects. But it’s no longer a central focus of my life in the way it once was. I suspect the heuristic I will describe would have been tremendously useful to me when mathematics was a central focus. But I’m honestly not sure.

The heuristic involves the use of spaced-repetition memory systems. The system I use is a flashcard program called Anki. You enter flashcards with a question on one (virtual) side of the card, and the answer on the other. Anki then repeatedly tests you on the questions. The clever thing Anki does is to manage the schedule. If you get a question right, Anki increases the time interval until you’re tested again. If you get a question wrong, the interval is decreased. The effect of this schedule management is to limit the total time required to learn the answer to the question. Typically, I estimate total lifetime study for a card to be in the range 5-10 minutes.
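To make the scheduling idea concrete, here is a toy sketch of an expanding-interval rule. It is not Anki's actual algorithm: the function names and the multipliers `grow=2.5` and `shrink=0.5` are assumptions chosen purely for illustration.

```python
def next_interval(interval_days, correct, grow=2.5, shrink=0.5, minimum=1.0):
    """Toy rule (assumed, not Anki's): lengthen the gap after a correct
    answer, shorten it after a miss, never going below `minimum` days."""
    factor = grow if correct else shrink
    return max(minimum, interval_days * factor)


def simulate(reviews, first_interval=1.0):
    """Return the interval (in days) scheduled after each review outcome."""
    intervals = []
    interval = first_interval
    for correct in reviews:
        interval = next_interval(interval, correct)
        intervals.append(interval)
    return intervals


if __name__ == "__main__":
    # Three correct answers, one miss, then a correct answer again.
    print(simulate([True, True, True, False, True]))
```

Under this toy rule the gaps grow geometrically while answers are correct, which is why the total review time per card stays small.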

I’ve described many elements of my Anki practice in a separate essay. Reading that essay isn’t necessary to understand what follows, but will shed additional light on some of the ideas. Note that that essay describes a set of heuristics for reading papers – indeed, of syntopically reading entire literatures – that are largely orthogonal to the heuristic I’m about to describe. I find the heuristics in that essay useful for rapidly getting a broad picture of a subject, while the heuristics in this essay are for drilling down deeply.

To explain the heuristic, I need a piece of mathematics to use as an example. The piece I will use is a beautiful theorem of linear algebra. The theorem states that a complex normal matrix is always diagonalizable by a unitary matrix. The converse is also true (and is much easier to prove, so we won’t be concerned with it): a matrix diagonalizable by a unitary matrix is always normal.

Unpacking that statement, recall that a matrix $M$ is said to be normal if $MM^\dagger = M^\dagger M$, where $M^\dagger$ is the conjugate transpose, $M^\dagger := (M^*)^T$. And a matrix is diagonalizable by a unitary matrix if there exists a unitary matrix $U$ such that $M = U D U^\dagger$, where $D$ is a diagonal matrix.

(From now on I will use “diagonalizable” as shorthand for “diagonalizable by a unitary matrix”.)

What’s lovely about this theorem is that the condition $MM^\dagger = M^\dagger M$ can be checked by simple computation. By contrast, whether $M$ is diagonalizable seems a priori much harder to check, since there are infinitely many possible choices of $U$ and $D$. But the theorem shows that the two conditions are equivalent. So it converts what seems like a search over an infinite space into simply checking a small number of algebraic conditions. Furthermore, working with diagonalizable matrices is often much easier than working with general matrices, and so it’s extremely useful to have an easy way of checking whether a matrix is diagonalizable.
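The "simple computation" can be sketched in a few lines of plain Python. This is a minimal illustration, assuming matrices are stored as lists of rows. The helper names (`dagger`, `matmul`, `is_normal`) and the tolerance are hypothetical choices, not from the essay.

```python
def dagger(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_normal(m, tol=1e-12):
    """True when M M^dagger and M^dagger M agree entry by entry (up to tol)."""
    left, right = matmul(m, dagger(m)), matmul(dagger(m), m)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(len(m)) for j in range(len(m)))


if __name__ == "__main__":
    hermitian = [[2, 1j], [-1j, 3]]  # equals its own dagger, hence normal
    shear = [[1, 1], [0, 1]]         # a standard non-normal example
    print(is_normal(hermitian), is_normal(shear))
```

For an $n \times n$ matrix this checks $n^2$ entries, in contrast to searching over all possible $U$ and $D$.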

Let me explain the proof. I shall explain it at about the level of detail I would use with a colleague who is a mathematician or quantum information theorist; people less comfortable with linear algebra may need to unpack the proof somewhat.

There are two ideas in the proof.

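The first idea is to observe that $MM^\dagger = M^\dagger M$ means the length of the $j$th row of $M$ is equal to the length of the $j$th column. It’s easiest to see this for the first row and first column. Suppose we write $M$ as

$$M = \begin{bmatrix} r \\ M' \end{bmatrix},$$

where $r$ is the first row and $M'$ is the remainder of the matrix. Then the top-left entry in $MM^\dagger$ is:

$$MM^\dagger = \begin{bmatrix} r \\ M' \end{bmatrix} \begin{bmatrix} r^\dagger & M'^\dagger \end{bmatrix} = \begin{bmatrix} \|r\|^2 & \cdot \\ \cdot & \cdot \end{bmatrix}.$$

Similarly, suppose we write $M$ as:

$$M = \begin{bmatrix} c & M'' \end{bmatrix},$$

where $c$ is the first column and $M''$ is the remainder of the matrix. Then the top-left entry in $M^\dagger M$ is:

$$M^\dagger M = \begin{bmatrix} c^\dagger \\ M''^\dagger \end{bmatrix} \begin{bmatrix} c & M'' \end{bmatrix} = \begin{bmatrix} \|c\|^2 & \cdot \\ \cdot & \cdot \end{bmatrix}.$$

The normalcy condition $MM^\dagger = M^\dagger M$ then implies that $r r^\dagger = c^\dagger c$, and thus the length of the first row $r$ must be the same as the length of the first column $c$.

The second idea in the proof is to observe that since $M$ is over the algebraically closed field of complex numbers, the characteristic equation $|M - \lambda I| = 0$ has at least one solution $\lambda$. So there is an eigenvalue $\lambda$ and an orthonormal basis, with a corresponding eigenvector as its first element, in which $M$ can be written:

$$M = \begin{bmatrix} \lambda & r \\ 0 & M' \end{bmatrix},$$

where $r$ now denotes the remainder of the first row. But we just saw that normalcy implies the length of the first column is equal to the length of the first row, so the remaining entries of the first row must be zero:

$$M = \begin{bmatrix} \lambda & 0 \\ 0 & M' \end{bmatrix}.$$

Recursively applying this to the bottom-right block in the matrix we can diagonalize $M$. That completes the proof.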

Alright, so that’s the proof. But that’s not the end of the process. I then use Anki to go much deeper into the proof; I’ll call this the (deep) Ankification process. This Ankification process works in (roughly) two phases.

Phase I: understanding the proof: This involves multiple passes over the proof. Initially, it starts out with what I think of as grazing, picking out single elements of the proof and converting to Anki cards. For instance, for the above proof, I have Anki cards like the following:

Q: If $M$ is a complex matrix, how is the top-left entry of $MM^\dagger$ related to the first row $r$ of the matrix $M$?

A: It’s the length squared, $\|r\|^2$.

Q: If $M$ is a complex matrix, how is the top-left entry of $M^\dagger M$ related to the first column $c$ of the matrix $M$?

A: It’s the length squared, $\|c\|^2$.

I work hard to restate ideas in multiple ways. For instance, here’s a restatement of the first question above:

Q: If $M$ is a complex matrix, why is the top-left entry of $MM^\dagger$ equal to the length squared $\|r\|^2$ of the first row?

A: $$\begin{bmatrix} r \\ \cdot \end{bmatrix} \begin{bmatrix} r^\dagger & \cdot \end{bmatrix} = \begin{bmatrix} \|r\|^2 & \cdot \\ \cdot & \cdot \end{bmatrix}$$

Indeed, I worked hard to simplify both questions and answers – the just given question-and-answer pair started out somewhat more complicated. Part of this was some minor complexity in the question, which I gradually trimmed down. The answer I’ve stated above, though, is much better than in earlier versions. Earlier versions mentioned $M$ explicitly (unnecessary), had more blocks in the matrices, used $\cdots$ rather than $\cdot$, and so on. You want to aim for the minimal answer, displaying the core idea as sharply as possible. Indeed, if it was easy to do I’d de-emphasize the matrix brackets, and perhaps find some way of highlighting the $r$, $r^\dagger$ and $\|r\|^2$ entries. Those are the things that really matter.

I can’t emphasize enough the value of finding multiple different ways of thinking about the “same” mathematical ideas. Here’s a couple more related restatements:

Q: What’s a geometric interpretation of the diagonal entries in the matrix $MM^\dagger$?

A: The lengths squared of the respective rows.

Q: What’s a geometric interpretation of the diagonal entries in the matrix $M^\dagger M$?

A: The lengths squared of the respective columns.

Q: What do the diagonal elements of the normalcy condition $MM^\dagger = M^\dagger M$ mean geometrically?

A: The corresponding row and column lengths are the same.

What you’re trying to do at this stage is learn your way around the proof. Every piece should become a comfortable part of your mental furniture, ideally something you start to really feel. That means understanding every idea in multiple ways, and finding as many connections between different ideas as possible.

People inexperienced at mathematics sometimes memorize proofs as linear lists of statements. A more useful way is to think of proofs as interconnected networks of simple observations. Things are rarely true for just one reason; finding multiple explanations for things gives you an improved understanding. This is in some sense “inefficient”, but it’s also a way of deepening understanding and improving intuition. You’re building out the network of the proof, making more connections between nodes.

One way of doing this is to explore minor variations. For instance, you might wonder what the normalcy condition $MM^\dagger = M^\dagger M$ means on the off-diagonal elements. This leads to questions like (again, it’s useful to enter many different variations of this question, I’ll just show a couple):

Q: What does the normalcy condition $MM^\dagger = M^\dagger M$ mean for the $jk$th component, in terms of the rows $r_j$ and columns $c_j$ of the matrix $M$?

A: The inner product $r_k \cdot r_j = c_j \cdot c_k$.

Q: The normalcy condition $MM^\dagger = M^\dagger M$ implies $r_k \cdot r_j = c_j \cdot c_k$ for rows and columns. What does this mean for row and column lengths?

A: They must be the same.
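The entrywise reading of these cards can be checked numerically. The sketch below assumes the inner product is conjugate-linear in its first argument (the physics convention), which is what makes $(MM^\dagger)_{jk} = r_k \cdot r_j$ and $(M^\dagger M)_{jk} = c_j \cdot c_k$ come out as written. All function names are hypothetical.

```python
def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument (assumed)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def rows(m):
    return [list(r) for r in m]


def cols(m):
    return [list(c) for c in zip(*m)]


def m_mdagger_entry(m, j, k):
    """(M M^dagger)_{jk} straight from the definition of the product."""
    return sum(m[j][l] * m[k][l].conjugate() for l in range(len(m[0])))


def mdagger_m_entry(m, j, k):
    """(M^dagger M)_{jk} straight from the definition of the product."""
    return sum(m[l][j].conjugate() * m[l][k] for l in range(len(m)))


if __name__ == "__main__":
    m = [[1, 2j], [3, 4]]  # any square matrix works for these identities
    r, c = rows(m), cols(m)
    for j in range(2):
        for k in range(2):
            assert m_mdagger_entry(m, j, k) == inner(r[k], r[j])
            assert mdagger_m_entry(m, j, k) == inner(c[j], c[k])
    print("entries match the row and column inner products")
```

Setting $j = k$ recovers the diagonal statement: the entries become the squared lengths of row $j$ and column $j$.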

(By the way, it’s questions like these that make me think it helps to be fairly mathematically experienced in carrying this Ankification process out. For someone who has done a lot of linear algebra these are very natural observations to make, and questions to ask. But I’m not sure they would be so natural for everyone. The ability to ask the “right” questions – insight-generating questions – is a limiting part of this whole process, and requires some experience.)

I’ve been describing the grazing process, aiming to thoroughly familiarize yourself with every element of the proof. This is useful, but is also a rather undirected process, with no clear end point, and not necessarily helping you understand the broader structure of the proof. I also impose on myself a set of aspirational goals, all variations on the idea of distilling the entire proof to one question and (simple) answer. The aim is to fill in the answers to questions having forms like:

Q: In one sentence, what is the core reason a (complex) normal matrix is diagonalizable?

And:

Q: What is a simple visual representation of the proof that (complex) normal matrices are diagonalizable?


I think of these question templates as boundary conditions or forcing functions. They’re things to aim for, and I try to write questions that will help me move toward answers. That starts with grazing, but over time moves to more structural questions about the proof, and about how elements fit together. For instance:

Q: How many key ideas are there in the proof that complex normal matrices are diagonalizable?

A: Two.

Q: What are the two key ideas in the proof that complex normal matrices $M$ are diagonalizable?

A: (1) Write $M$ in a basis where the first column is all zeroes except the first entry; and (2) use the normalcy condition to argue that row lengths are equal to column lengths.

The second card here is, in fact, too complicated – it’d be better to refactor into two or more cards, separating the two ideas, and sharpening the answers. In general, it’s helpful to make both questions and answers as atomic as possible; it seems to help build clarity. That atomicity doesn’t mean the questions and answers can’t involve quite sophisticated concepts, but they ideally express a single idea.

In practice, as I understand the proof better and better the aspirational goal cards change their nature somewhat. Here’s a good example of such an aspirational card:

Q: What is a simple visual representation of the reason that (complex) normal matrices are diagonalizable?

A: [The answer on this card is an image, not reproduced here.]

This is pretty good – certainly, there’s a sense in which it’s much better than the original proof! But it’s still somewhat complicated. What you really want is to feel every element (and the connections between them) in your bones. Some substantial part of that feeling comes by actually constructing the cards. That’s a feeling you can’t get merely by reading an essay, it can only be experienced by going through the deep Ankification process yourself. Nonetheless, I find that process, as described up to now, is also not quite enough. You can improve upon it by asking further questions elaborating on different parts of the answer, with the intent of helping you understand the answer better. I haven’t done this nearly as much as I would like. In part, it’s because the tools I have aren’t well adapted. For instance, I’d love to have an easy way of highlighting (say, in yellow) the crucial rows and columns that are multiplied in the matrices above, and then connecting them to the crucial inference on the right. But while I can easily imagine multiple ways of doing that, in practice it’s more effort than I’m willing to put in.

Another helpful trick is to have multiple ways of writing these top-level questions. Much of my thinking is non-verbal (especially in subjects I’m knowledgeable about), but I still find it useful to force a verbal question-and-answer:

Q: In one sentence, what is the core reason a (complex) normal matrix is diagonalizable?

A: If an eigenvalue $\lambda$ is in the top-left of $M$, then normalcy means $|\lambda|^2 + \|r\|^2 = |\lambda|^2$, and so the remainder $r$ of the first row vanishes.
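Spelled out, the one-sentence answer compresses the following short computation. This is a sketch in which the block form of $M$ is the one assumed in the proof: an orthonormal basis whose first vector is an eigenvector, with $r$ the remainder of the first row.

```latex
\begin{align*}
M &= \begin{bmatrix} \lambda & r \\ 0 & M' \end{bmatrix}
  && \text{(first column is zero below } \lambda\text{)} \\
(MM^\dagger)_{11} &= |\lambda|^2 + \|r\|^2
  && \text{(squared length of the first row)} \\
(M^\dagger M)_{11} &= |\lambda|^2
  && \text{(squared length of the first column)} \\
|\lambda|^2 + \|r\|^2 &= |\lambda|^2
  \;\Longrightarrow\; \|r\|^2 = 0 \;\Longrightarrow\; r = 0
  && \text{(normalcy equates the two entries)}
\end{align*}
```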

As described, this deep Ankification process can feel rather wasteful. Inevitably, over time my understanding of the proof changes. When that happens it’s often useful to rewrite (and sometimes discard or replace) cards to reflect my improved understanding. And some of the cards written along the way have the flavor of exhaust, bad cards that seem to be necessary to get to good cards. I wish I had a good way of characterizing these, but I haven’t gone through this often enough to have more than fuzzy ideas about it.

A shortcoming of my description of the Ankification process is that I cheated in an important way. The proof I wrote above was written after I’d already gone through the process, and was much clearer than any proof I could have written before going through the process. And so part of the benefit is hidden: you refactor and improve your proof along the way. Indeed, although I haven’t been in the habit of rewriting the refactored proof after the Ankification process (this essay is the first time I’ve done it), I suspect it would be a good practice.

The inner experience of mathematics: As I reread the description of Phase I just given, it is rather unsatisfactory in that it conveys little of the experience of mathematics one is trying to move toward. Let me try to explain this in the context not of Anki, but rather of an experience I’ve sometimes had while doing research, an experience I dub “being inside a piece of mathematics”.

Typically, my mathematical work begins with paper-and-pen and messing about, often in a rather ad hoc way. But over time if I really get into something my thinking starts to change. I gradually internalize the mathematical objects I’m dealing with. It becomes easier and easier to conduct (most of) my work in my head. I will go on long walks, and simply think intensively about the objects of concern. Those are no longer symbolic or verbal or visual in the conventional way, though they have some secondary aspects of this nature. Rather, the sense is somehow of working directly with the objects of concern, without any direct symbolic or verbal or visual referents. Furthermore, as my understanding of the objects changes – as I learn more about their nature, and correct my own misconceptions – my sense of what I can do with the objects changes as well. It’s as though they sprout new affordances, in the language of user interface design, and I get much practice in learning to fluidly apply those affordances in multiple ways.

This is a very difficult experience to describe in a way that I’m confident others will understand, but it really is central to my experience of mathematics – at least, of mathematics that I understand well. I must admit I’ve shared it with some trepidation; it seems to be rather unusual for someone to describe their inner mathematical experiences in these terms (or, more broadly, in the terms used in this essay).

If you don’t do mathematics, I expect this all sounds rather strange. When I was a teenager I vividly recall reading a curious letter Albert Einstein wrote to the mathematician Jacques Hadamard, describing his (Einstein’s) thought processes. I won’t quote the whole letter, but here’s some of the flavor:

The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought. The psychical entities which seem to serve as elements in thought are certain signs and more or less clear images which can be “voluntarily” reproduced and combined… The above-mentioned elements are, in my case, of visual and some of muscular type. Conventional words or other signs have to be sought for laboriously only in a secondary stage, when the mentioned associative play is sufficiently established and can be reproduced at will.

When I first read this, I had no idea what Einstein was talking about. It was so different from my experience of physics and mathematics that I wondered if I was hopelessly unsuited to do work in physics or mathematics. But if you’d asked me about Einstein’s letter a decade (of intensive work on physics and mathematics) later, I would have smiled and said that while my internal experience wasn’t the same as Einstein’s, I very much empathized with his description.

In retrospect, I think that what’s going on is what psychologists call chunking. People who intensively study a subject gradually start to build mental libraries of “chunks” – large-scale patterns that they recognize and use to reason. This is why some grandmaster chess players can remember thousands of games move for move. They’re not remembering the individual moves – they’re remembering the ideas those games express, in terms of larger patterns. And they’ve studied chess so much that those ideas and patterns are deeply meaningful, much as the phrases in a lover’s letter may be meaningful. It’s why top basketball players have extraordinary recall of games. Experts begin to think, perhaps only semi-consciously, using such chunks. The conventional representations – words or symbols in mathematics, or moves on a chessboard – are still there, but they are somehow secondary.

So, my informal pop-psychology explanation is that when I’m doing mathematics really well, in the deeply internalized state I described earlier, I’m mostly using such higher-level chunks, and that’s why it no longer seems symbolic or verbal or even visual. I’m not entirely conscious of what’s going on – it’s more a sense of just playing around a lot with the various objects, trying things out, trying to find unexpected connections. But, presumably, what’s underlying the process is these chunked patterns.

Now, the only way I’ve reliably found to get to this point is to get obsessed with some mathematical problem. I will start out thinking symbolically about the problem as I become familiar with the relevant ideas, but eventually I internalize those ideas and their patterns of use, and can carry out a lot (not all) of operations inside my head.

What’s all this got to do with the Ankification process? Well, I said that the only reliable way I’ve found to get to this deeply internalized state is to obsess over a problem. But I’ve noticed that when I do the Ankification process, I also start to think less and less in terms of the conventional representations. The more questions I write, the more true this seems to be. And so I wonder if the Ankification process can be used as a kind of deterministic way of attaining that type of state. (Unfortunately, I can’t get obsessed with a problem on demand; it’s a decidedly non-deterministic process!)

One consequence of this for the Ankification process is that over time I find myself more and more wanting to use blank answers: I don’t have a conventional symbolic or visual representation for the answer. Instead, I have to bring to mind the former experience of the answer. Or, I will sometimes use an answer that would be essentially unintelligible to anyone else, relying on my internal representation to fill in the blanks. This all tends to occur pretty late in the process.

Now, unfortunately, this transition to the chunked, deeply-internalized state isn’t as thorough when I’m Ankifying as it is when obsessively problem solving. However, I suspect it greatly enables such a transition. (I rarely obsessively problem solve these days, so I haven’t yet had a chance to see this happen.) And I do wonder if there are types of question I can ask that will help me get more fully to the deeply-internalized state. What seems to be lacking is a really strongly-felt internalization of the meaning of answers like that shown above:

A: [The answer on this card is an image, not reproduced here.]

That type of strongly-felt meaning can, however, be built by using such representations in many different ways as part of problem-solving; it builds fluency and familiarity. But I haven’t actually done it.

Phase II: variations, pushing the boundaries: Let’s get back to details of how the Ankification process works. One way of deepening your understanding further is to find ways of pushing the boundaries of the proof and of the theorem. I find it helpful to consider many different ways of changing the assumptions of the theorem, and to ask how it breaks down (or generalizes). For instance:

Q: Why does the proof that complex normal matrices are diagonalizable fail for real matrices?

A: It may not be possible to find an eigenvector for the matrix, since the real numbers aren’t algebraically closed.

Q: What’s an example of a real normal matrix that isn’t diagonalizable by a real orthogonal matrix?

A:
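One standard example that answers this question (not necessarily the one on the original card) is the 90-degree rotation matrix. It commutes with its transpose, but its characteristic polynomial $\lambda^2 + 1$ has no real root, so it has no real eigenvector to start the proof with. The sketch below checks both facts for $2 \times 2$ real matrices. The helper names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_normal_real(m):
    """For a real matrix the dagger is just the transpose."""
    return matmul(m, transpose(m)) == matmul(transpose(m), m)


def discriminant_2x2(m):
    """Discriminant of the characteristic polynomial x^2 - trace*x + det.
    A negative value means the matrix has no real eigenvalue."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    return trace * trace - 4 * det


ROTATION = [[0, -1], [1, 0]]  # rotation of the plane by 90 degrees

if __name__ == "__main__":
    print(is_normal_real(ROTATION), discriminant_2x2(ROTATION))
```

Over the complex numbers the same matrix has eigenvalues $\pm i$ and is diagonalizable by a unitary matrix, as the theorem requires.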

As per usual, these questions can be extended and varied in many ways.

Another good strategy is to ask if the conditions can be weakened. For instance, you might have noticed that we only seemed to use the normality condition on the diagonal. Can we get away with requiring $M^\dagger M = MM^\dagger$ just on the diagonal? In fact, some reflection shows that the answer is no: we need it to be true in a basis which includes an eigenvector of $M$. So we can add questions like this:

Q: In the proof that normalcy implies diagonalizability, why does it not suffice to require that $M^\dagger M = MM^\dagger$ only on the diagonal?

A: Because we need this to be true in a particular basis, and we cannot anticipate in advance what that basis will be.
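A small concrete case shows that the diagonal-only condition in a fixed basis really is weaker. The matrix below is my own illustrative choice, not from the essay: in the standard basis its row and column lengths match, yet it is not normal. By the converse direction of the theorem it therefore cannot be diagonalized by a unitary matrix.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diagonals_agree(m):
    """Diagonal-only version of the normalcy condition (real matrices)."""
    left, right = matmul(m, transpose(m)), matmul(transpose(m), m)
    return all(left[i][i] == right[i][i] for i in range(len(m)))


def is_normal_real(m):
    """Full normalcy condition for a real matrix."""
    return matmul(m, transpose(m)) == matmul(transpose(m), m)


EXAMPLE = [[1, 1], [-1, 2]]  # illustrative matrix (an assumption, see above)

if __name__ == "__main__":
    print(matmul(EXAMPLE, transpose(EXAMPLE)))   # M M^T
    print(matmul(transpose(EXAMPLE), EXAMPLE))   # M^T M
    print(diagonals_agree(EXAMPLE), is_normal_real(EXAMPLE))
```

The two products share the diagonal $(2, 5)$ but differ in sign off the diagonal, so equality of row and column lengths in one basis says nothing about the basis the proof actually needs.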

Or we can try to generalize:

Q: For which fields is it possible to generalize the result that complex normal matrices are diagonalizable?

A: [I haven’t checked this carefully!] For algebraically closed fields.

(My actual Anki card doesn’t have the annotation in the last answer. But it’s true: I haven’t checked the proof carefully. Still, answering the question helped me understand the original proof and the result better.)

This second phase really is open-ended: we can keep putting in variations essentially ad infinitum. The questions are no longer directly about the proof, but rather are about poking it in various ways, and seeing what happens. The further I go, and the more I connect to other results, the better.

“The” proof? Having described the two phases in this Ankification process, let me turn to a few miscellaneous remarks. One complication is that throughout I’ve referred to “the” proof. Of course, mathematical theorems often have two or more proofs. Understanding multiple proofs and how they relate is a good way of deepening one’s understanding further. It does raise an issue, which is that some of the Anki questions refer to “the” proof of a result. I must admit, I don’t have an elegant way of addressing this! But it’s something I expect I’ll need to address eventually.

A related point is how much context-setting to do in the questions – do we keep referring, over and over, to “the proof that $MM^\dagger = M^\dagger M$ implies diagonalizability”, or to “if $M$ is a complex matrix” (and so on)? In my Anki cards I do (note that I’ve elided this kind of stuff in some of the questions above), but frankly find it a bit irritating. However, since the cards are studied at unknown times in the future, and I like to mix all my cards up in a single deck, some context-setting is necessary.

What have I used this to do? I’ve used this process on three-and-a-half theorems so far:

  • Complex normal matrices are diagonalizable.
  • Euler’s theorem that $a^{\phi(n)} \equiv 1 \pmod{n}$ for any integer $a$ coprime to the positive integer $n$, where $\phi(n)$ is Euler’s totient function.
  • Lagrange’s theorem (used in the proof of Euler’s theorem) that the order of a subgroup of a finite group must divide the order of the entire group.
  • I’ve started the process for the fundamental theorem of algebra, stating that every non-constant polynomial has a zero in the complex plane. I was interrupted (I don’t recall why), and never finished it.
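As a quick sanity check on the statement of Euler’s theorem in the list above, here is a brute-force sketch. The names `totient` and `euler_holds` are hypothetical, and the totient is computed by direct counting rather than by any efficient method.

```python
from math import gcd


def totient(n):
    """Euler's totient: how many k in 1..n are coprime to n (brute force)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def euler_holds(a, n):
    """Check a**phi(n) == 1 (mod n); `1 % n` handles the edge case n == 1."""
    return pow(a, totient(n), n) == 1 % n


if __name__ == "__main__":
    # Check every a coprime to n, for all n below 50.
    ok = all(euler_holds(a, n)
             for n in range(1, 50)
             for a in range(1, n + 1) if gcd(a, n) == 1)
    print(totient(10), pow(3, totient(10), 10), ok)
```

For example, $\phi(10) = 4$ and $3^4 = 81 \equiv 1 \pmod{10}$.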

It’s quite time-intensive. I don’t have any easy way to count the number of questions I’ve added for each of these theorems, but I guess on the order of dozens of cards for each. It takes a few hours typically, though I expect I could easily add many more questions.

[Note added: in the initial version of this essay I wrote “100 cards for each”. I looked, and in fact there are fewer – on the order of dozens, well short of 100. This surprised me – if anything, I’d have guessed my error was in underestimation. The card-adding process was intense, however, which perhaps accounts for my badly mistaken impression.]

Seeing through a piece of mathematics: This is all a lot of work! The result, though, has been a considerable deepening in my understanding of all these results. There’s a sense of being able to “see through” the result. Formerly, while I could have written down a proof that normal matrices are diagonalizable, it was all a bit murky. Now, it appears almost obvious, I can very nearly see directly that it’s true. The reason, of course, is that I’m far more familiar with all the underlying objects, and the relationships between them.

My research experience has been that this ability to see through a piece of mathematics isn’t just enjoyable, it’s absolutely invaluable; it can give you a very rare level of understanding of (and flexibility in using) a particular set of mathematical ideas.

Discovering alternate proofs: After going through the Ankification process described above I had a rather curious experience. I went for a multi-hour walk along the San Francisco Embarcadero. I found that my mind simply and naturally began discovering other facts related to the result. In particular, I found a handful (perhaps half a dozen) of different proofs of the basic theorem, as well as noticing many related ideas. This wasn’t done especially consciously – rather, my mind simply wanted to find these proofs.

At the time these alternate proofs seemed crystalline, almost obvious. I didn’t bother writing them down in any form, or adding them to Anki; they seemed sufficiently clear that I assumed I’d remember them forever. I regret that, for later I did not recall the proofs at all.

Curiously, however, in the process of writing these notes I have recalled the ideas for two of the proofs. One was something like the following: apply the condition $M^\dagger M = MM^\dagger$ directly to the upper triangular form $M = D+T$ where $D$ is diagonal and $T$ is strictly upper triangular; the result drops out by considering the diagonal elements. And another was to apply the normalcy condition to the singular value decomposition for the matrix $M$; the proof drops out immediately when the singular values are distinct, and can be recovered with a little work when the singular values are not.
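One way the first of these recalled ideas might be filled in is sketched below. It is a reconstruction under stated assumptions, not necessarily the argument originally found: it takes $M = D + T$ to be the upper triangular (Schur) form of $M$ in some orthonormal basis, with diagonal entries $d_j$ and strictly upper triangular entries $t_{jk}$ for $j < k$.

```latex
\begin{align*}
(MM^\dagger)_{11} &= |d_1|^2 + \sum_{k>1} |t_{1k}|^2,
&
(M^\dagger M)_{11} &= |d_1|^2
&&\Longrightarrow\; t_{1k} = 0 \text{ for all } k > 1, \\
(MM^\dagger)_{22} &= |d_2|^2 + \sum_{k>2} |t_{2k}|^2,
&
(M^\dagger M)_{22} &= |t_{12}|^2 + |d_2|^2 = |d_2|^2
&&\Longrightarrow\; t_{2k} = 0 \text{ for all } k > 2.
\end{align*}
% Continuing down the diagonal in the same way gives T = 0, so M = D is diagonal.
```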

Simplicity of the theorems: The three-and-a-half theorems mentionedabove are all quite elementary mathematics. What about using thisAnkification process to deepen my understanding of more advancedmathematical ideas? I’ll certainly try it at some point, and amcurious about the effect. I’m also curious to try the process withnetworks of related theorems – I suspect there will be somesurprising mutual benefits in at least some cases. But I don’t yetknow.

In what sense is this really about Anki flashcards? There’s very little in the above process that explicitly depended on me using Anki’s spaced-repetition flashcards. Rather, what I’ve described is a general process for pulling apart the proof of a theorem and making much more sense of it, essentially by atomizing the elements. There’s no direct connection to Anki at all – you could carry out the process using paper and pencil.

Nonetheless, something I find invaluable is the confidence Anki brings that I will remember what I learn from this process. It’s not so much any single fact, but rather a sense of familiarity and fluency with the underlying objects, an ability to simply see relationships between them. That sense does fade with time, but far less rapidly than if I simply didn’t think about the proof again. That’s a large payoff, and one that I find makes me far more motivated to go through the process. Perhaps other people, with different motivations, would find Anki superfluous.

That said, I do have some sense that, as mentioned earlier, some of the cards I generate are a type of exhaust, and would be better off excluded from the process. This is especially true of many of the cards generated early in the process, when I’m still scratching around, trying to get purchase on the proof. Unfortunately, also as mentioned above, I don’t yet have much clarity on which cards are exhaust, and which are crucial.

Can I share my deck? When I discuss Anki publicly, some people always ask if I can share my deck. The answer is no, for reasons I’ve explained elsewhere. I must admit, in the present case, I don’t really understand why you’d want to use a shared deck. In part, that’s because so much of the value is in the process of constructing the cards. But even more important: I suspect a deck of 100+ of my cards on the proof above would be largely illegible to anyone else – keep in mind that you’d see the cards in a randomized order, and without the benefit of any of the context above. It’d be an incomprehensible mess.

Discovery fiction: I’ve described this Ankification process as a method for more deeply understanding mathematics. Of course, it’s just one approach to doing that! I want to briefly mention one other process I find particularly useful for understanding. It’s to write what I call discovery fiction. Discovery fiction starts with the question “how would I have discovered this result?” And then you try to make up a story about how you might have come to discover it, following simple, almost-obvious steps.

Two examples of discovery fiction are my essay explaining how you might have come to invent Bitcoin, and my essay explaining how you might have invented an advanced data structure (the Bloom filter).

Writing discovery fiction can be tough. For the theorem considered in this essay, it’s not at all clear how you would have come to the result in the first place. But maybe you started out already interested in $M^\dagger$, and in the question of when two matrices $A$ and $B$ commute. So you ask yourself: “Hmm, I wonder what it might mean that $M$ and $M^\dagger$ commute?” If you’re willing to grant that as a starting point, then with some work you can probably find a series of simple, “obvious” steps whereby you come to wonder if maybe $M$ is diagonalizable, and then discover a proof.

Any such “discovery fiction” proof will be long – far longer than the proof above. Even a cleaned-up version will be – should be! – messy and contain false turns. But I wanted to mention discovery fiction as a good example of a process which gives rise to a very different kind of understanding than the Ankification process.

What about other subjects? Mathematics is particularly well suited to deep Ankification, since much of it is about precise relationships between precisely-specified objects. Although I use Anki extensively for studying many other subjects, I haven’t used it at anything like this kind of depth. In the near future, I plan to use a similar process to study some of the absolute core results about climate change, and perhaps also to study some of the qualities of good writing (e.g., I can imagine using a similar process to analyze the lead sentences from, say, 30 well-written books). I don’t know how this will go, but am curious to try. I’m a little leery of coming to rely too much on the process – creative work also requires many skills at managing uncertainty and vagueness. But as a limited-use cognitive tool, deep Ankification seems potentially valuable in many areas.

Acknowledgments

Many thanks to everyone who has talked with me about spaced-repetition memory systems. Especial thanks to Andy Matuschak, whose conversation has deeply influenced how I think about nearly all aspects of spaced repetition. And thanks to Kevin Simler for additional initial encouragement to write about my spaced repetition practice.

Citation and licensing

In academic work, please cite this as: Michael A. Nielsen, “Using spaced repetition systems to see through a piece of mathematics”, http://cognitivemedium.com/srs-mathematics, 2019.

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. This means you’re free to copy, share, and build on this essay, but not to sell it. If you’re interested in commercial use, please contact me.

If you start a spaced-repetition collection and diligently study every day, you might soon face an unusual question: what should you learn? Sometimes, in your particular circumstances, the answer is obvious – the material you need to pass the bar exam, for example. But sooner or later, if you like learning things, or you just don’t have that much new information you obviously need to learn on a daily basis, you may find yourself with more study time than material. Or you may simply be impressed with the power at your fingertips and feel like you should be doing something to take advantage of it.

I suspect that choosing learning material poorly is the primary reason that many people fail to see the full benefits of spaced repetition. To answer the question, I want to take a trip down a line of inquiry that might get dangerously close to epistemology for a practical blog. Hold onto your hat – we’ll return to goals soon enough.

Why is knowledge valuable?

Given that you’re reading a series of long blog posts on memory, I think it’s safe to assume that you like knowing things and appreciate the value of knowing things; but you probably have never examined exactly why. You may have your own answer, and you could spend hours thinking about it if you cared to, but I submit that knowledge per se has no value at all. Knowledge is instead valuable because of the expanded possibilities it offers: when we know a certain piece of information, we can now do something, or think something, that we couldn’t before. Think of Hermann Ebbinghaus’ self-research on the forgetting curve using nonsense syllables: quite aside from the scientific information gained by his process, Ebbinghaus certainly gained no personal, professional, or spiritual benefits from learning strings of nonsense syllables, because nonsense syllables don’t have any meaning and consequently don’t help us do or think anything.

If you doubt that nonsense syllables constitute knowledge at all, consider a more reasonable example, in which a student learns some material by rote without thinking about its meaning. Again, because the student doesn’t understand it, it doesn’t allow him to do or think anything new, and it has little value. Or at an even broader level, imagine a student who learns about the history of China, fully understanding it, but never once has a chance to apply any of that information: she never reads a book where that history provides a useful background, thinks about the lessons the events could offer her life today, brings any of it up in conversation, compares the history to current events or to another civilization’s history, or the like. Did she benefit from learning this history?

Of course, in real life, anything we learn will have some benefit, provided what we learn is more or less accurate; even without any kind of spaced-repetition study or mnemonics, we can’t learn the history of an entire civilization and never use any of that knowledge, even unintentionally. Rather, it’s a matter of degree. Some knowledge will prove practically useful, or change our patterns of thought in ways that make us more interesting or better people, far more than other knowledge will. And the difference can be drastic. Learning the Big Five keyboard shortcuts will make any office worker’s life easier every day. Learning who the 27th president of the United States was will only on rare occasions give you any useful insight at all.

Goal-driven learning

I want to dwell on that last example for a moment. What exactly makes the difference between learning that the 27th president of the United States was William Howard Taft and learning that the keyboard shortcut to cut text is Ctrl-X? We shouldn’t make the mistake of saying it’s practicality; it’s enjoyable and meaningful to know what happens in famous works of literature, or how a beautiful scientific theory works that you’ll never use in real life, or some random facts that seem striking to you, but none of these things are practical. It’s also not immediacy; it’s valuable to know what to do if you get caught outside in a tornado, even though most people are even less likely to use that piece of information than who the 27th president was, and they certainly won’t need it anytime soon. I’m inclined to think the real difference is emotional connection. Many American citizens might feel it would be great to learn the presidents of their country, out of some kind of sense of civic duty. But duty is the right word here; are you really excited about learning a list of stuffy old white guys (with the occasional character)? Maybe some people are, but I’m willing to bet most aren’t.

I see this as the tragedy of learning, especially when described as “memorizing,” in the twenty-first century. When we think “ooh, I can learn anything!”, we think of learning things like lists of people we’ve barely even heard of (unless you’re a United States historian, do you know anything about Taft at all?). But look around you – everything you do and think relies on and is shaped by your memory. What things do you want to be able to do? What ideas do you want to develop? Who do you want to be? Choose the right things to remember, and you’re well on your way there.

This brings us to goal-driven learning: choose things to learn with Anki that will help you accomplish some goal you have. If you want to become a better computer programmer, add cards that call out features you rarely use in your favorite language, or cards that remind you of the mistakes you’ve made in the past and how to avoid them. If you want to improve your vocabulary, add cards for new words you’ve learned recently.

The word “goal” might smack a bit of business and the inhuman side of ruthless efficiency for some people, so let’s be clear: as in the set of examples offered earlier, your goal doesn’t have to be “practical” in any sense, much less something you’ll immediately do at your job or around the house. You might learn literary vocabulary and outlines of famous books because you want to be a well-read person. You might learn scientific theories because you want to understand how the world around you works (to the greatest extent any human can). You might learn random facts because you like playing trivia games or collecting anecdotes. The point is that you feel you want to learn the information and you can explain why you want to learn it, even if the reason is just that it’s fascinating and you want to think about it again in the future.

Lastly, lest I be too hard on the presidents of the United States, I do not mean to argue knowing who has governed your country is useless knowledge. I do think you might be better off learning about the presidents in the context of the events surrounding their presidencies, or of their accomplishments or lives, rather than by number or date, in a list; it’ll be easier to remember them and more rewarding that way. You’ll also be more likely to recall the information in a relevant context because you’ve added more hooks, so it will be more meaningful as well. But even if you just want to learn the list, there’s nothing wrong with that (so long as you understand the information you’re learning). Memorizing dumps of information, especially with mnemonic techniques, can be legitimately fun once in a while, not to mention useful. However, this kind of learning ought to be the exception rather than the rule. And if the prospect of learning the list doesn’t excite you and you don’t have a practical, immediate reason you need to know it (e.g., an upcoming exam), absolutely don’t try – you’re more likely to kill your motivation for learning than to get anything meaningful out of it.

Lazy learning

Suppose you agree with me, and you now know what kinds of things you want to learn: things that you care about and are interested in learning because they have some practical or intellectual value to you. How do you go about seeking out and collecting the specific facts and ideas to learn? How do you filter the things you find?

I favor a method I have colorfully termed lazy learning, by analogy with the software-engineering practices of lazy loading and lazy evaluation. Say you have a task you know will take a little while and consume some resources, like downloading 50 megabytes of data from the Internet. But you also know that you only need to download those 50 megabytes 0.01% of the time, when the user chooses an obscure menu option. It often makes sense to only start the task once you know you’re going to use that data; sure, the user will have to wait a minute for it to download the first time they click the button, but it’s worse to waste everyone else’s bandwidth and hard drive space on the off chance they need the option.
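
For readers who haven’t met lazy loading before, here is a minimal Python sketch of the idea. It is only an illustration under stated assumptions: the class name `RarelyUsedResource` and the function `fake_download` are hypothetical, and a tiny byte string stands in for the 50-megabyte download. The expensive fetch is deferred until the data is first requested, and the result is then cached so it never has to be fetched again.

```python
from functools import cached_property


class RarelyUsedResource:
    """Hypothetical wrapper around data that is costly to fetch and seldom needed."""

    def __init__(self, fetch):
        self._fetch = fetch      # any zero-argument callable, e.g. a real download
        self.fetch_count = 0     # tracked only so the laziness is observable

    @cached_property
    def data(self):
        # Runs on first access only; cached_property stores the result on the instance.
        self.fetch_count += 1
        return self._fetch()


def fake_download():
    """Stand-in (an assumption) for the 50-megabyte download described in the text."""
    return b"x" * 50


resource = RarelyUsedResource(fake_download)   # nothing has been fetched yet
if resource.fetch_count == 0:
    print("no download until the data is needed")

payload = resource.data         # first use triggers the fetch
payload_again = resource.data   # later uses read the cached value
print("fetches performed:", resource.fetch_count)
```

The caching step is the part that carries over to spaced repetition: you pay the cost once, at the moment the information proves useful, and then keep the result.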

Similarly, lazy learning allows you to skip deciding what information might be meaningful to learn altogether. Instead, you learn information once you know it’s meaningful to you, because you’ve already used it or organically come across it. Information you’ve used or encountered once is far more likely to be useful again than information you’ve never used before, for the same reason that lightning often strikes the same place repeatedly (sorry proverb, you’re dead wrong). If something’s been struck once, chances are it’s in a favorable place for lightning strikes, so there’s a good chance it will be struck again sooner or later. In the context of spaced repetition, after we learn a piece of information once, if it seems at all plausible we’ll want it again and we stand to lose more than a few minutes or miss an insightful connection by having to re-learn it, that’s our cue to create cards to permanently remember that information.

Lazy learning is not only valuable for efficiency reasons. New Anki users have a tendency to cram heaps of information that they feel “might be useful someday!” into their collections, simply because they can. Aside from the very real chance it will never be useful and the time spent learning it will go to waste, people seldom have a strong emotional connection with this material: after all, they don’t know yet if it will help them reach their goals. It becomes a weight they supposedly have to overcome to understand their chosen topic, instead of a friendly reminder of things they already know are valuable. Learning things you don’t care about makes doing reviews feel like a chore, and if you can’t get around that, chances are you will sooner or later get demotivated and stop using Anki.

“Using” a piece of information should be defined loosely. Perhaps you had to search the web or a reference book to find the solution to a problem; that obviously qualifies as using that information. But it might also mean that you were reading a book or just having a conversation and encountered an intriguing idea or a fact you think could be useful in a future project. This kind of information is arguably even more valuable than information you’ve used to perform a task; it’s the information that will lead you to creative insights, teach you where you need to look next to solve a problem or learn more, and keep you interested in learning. In Why You Still Need to Know Things, I pointed out that sometimes you don’t know you need a particular piece of information in a given situation unless you’ve encountered something similar before; this kind of information helps you spot these connections. So into your collection it goes.

The lazy-learning policy demands one exception. Namely, it doesn’t work at all if, by the time you discover you would find the information useful, you no longer have the opportunity to learn it – e.g., after you’re in the exam room, or after you’ve been hired for a new job where you claimed to have the experience already. In this case, you may be forced to learn ahead, without knowing for sure whether it will help you meet your practical or intellectual goals. In the Internet age, however, these situations are few and far between. Even when doing something classically reliant on memory like speaking a foreign language, rare is the scenario where you can’t talk your way around not knowing a word or phrase if it suddenly comes up unexpectedly, then learn it afterwards for next time.

Side note: A popular trend in the business world is “just-in-time” education: instead of teaching something in a classroom and then not using it for months (or maybe never using it at all), employees take the course right before they need to know the information in it. The JIT education movement starts out in line with the lazy-learning method, but then misses the most important point: once you’ve learned the information, you should not forget it again! Too many people cite JIT as useful primarily because you’ll just forget the information if you take the course too early, but if you use spaced repetition, you will only need to lazy-learn material once.

Miscellaneous tips

A few thoughts on what to learn that didn’t fit anywhere else:

  • Even when lazy learning, make sure you understand how much you can learn in a given period of time, so you don’t become overwhelmed with reviews. If you haven’t done so already, check out How Much to Learn with Anki for some estimates. You’re much more likely to become overwhelmed with shared decks that you use to learn a body of material that might be useful someday, though, since you get all the content at the start instead of gradually adding it over time.

  • When you’re learning a new topic or doing a lot of reading, you can make your life a lot easier by putting all the basic terms and concepts you don’t know into Anki. As Michael Nielsen has explained in his fantastic article on spaced repetition, a surprising amount of difficulty in assimilating new material is lack of understanding of the basics. As he says, it’s hard to follow an explanation or argument when it’s full of words you don’t understand!

  • Regardless of what broad subjects they’re interested in, most people can benefit from a “miscellaneous” deck or tag, in which they place stupid items they can never seem to remember, like which key opens which door or what street the dentist is on or what the phone number of the company helpdesk is. Be aware that these questions can be harder to remember than you might expect, perhaps because they’re not connected to any other knowledge and thus offer fewer memory cues and routes of access, or perhaps because you only think to add them when you struggle to remember them in the first place; it may be helpful to intentionally use mnemonics on these from the start, and pay special attention to making the cards precise.

  • It’s normal and totally fine to learn something and then later decide it’s more burdensome than helpful and you no longer want to maintain it. In real life, this happens naturally: we call it “forgetting.” You stop caring about it and using it, and eventually you forget it and you’re free. Since Anki artificially controls the process of forgetting, we have to artificially decide what we no longer care about, by deleting or suspending the material. (If you might want the cards again later, you might prefer to suspend them – then you’ll still be able to search for them and can quickly pull them back into review if you decide you made a mistake.)

  • Last but not least, there’s so much out there to learn, and life is so short, that it’s a terrible shame to waste it learning things you don’t care about. If you’re not thoughtful about it, it’s surprisingly easy to choose to learn things you don’t care about. As I’ve said before, choose wisely!