### 2.2. Safe Use of Infinitesimals

The idea of infinitesimally small numbers has always irked purists. One prominent critic of the calculus was Newton's contemporary George Berkeley, the Bishop of Cloyne. Although some of his complaints are clearly wrong (he denied the possibility of the second derivative), there was clearly something to his criticism of the infinitesimals. He wrote sarcastically, “They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them ghosts of departed quantities?”

Figure F. Bishop George Berkeley (1685-1753)

Infinitesimals seemed scary, because if you mishandled them, you could prove absurd things. For example, let du be an infinitesimal. Then 2du is also infinitesimal. Therefore both 1/du and 1/(2du) equal infinity, so 1/du = 1/(2du). Multiplying by du on both sides, we have a proof that 1=1/2.

In the eighteenth century, the use of infinitesimals became like adultery: commonly practiced, but shameful to admit to in polite circles. Those who used them learned certain rules of thumb for handling them correctly. For instance, they would identify the flaw in my proof of 1=1/2 as my assumption that there was only one size of infinity, when actually 1/du should be interpreted as an infinity twice as big as 1/(2du). The use of the symbol ∞ played into this trap, because the use of a single symbol for infinity implied that infinities only came in one size. However, the practitioners of infinitesimals had trouble articulating a clear set of principles for their proper use, and couldn't prove that a self-consistent system could be built around them.

By the twentieth century, when I learned calculus, a clear consensus had formed that infinite and infinitesimal numbers weren't numbers at all. A notation like dx/dt, my calculus teacher told me, wasn't really one number divided by another; it was merely a symbol for something called a limit,

$\lim_{\Delta t\rightarrow 0} \frac{\Delta x}{\Delta t} ,$

where Δ x and Δ t represented finite changes. I'll give a formal definition (actually two different formal definitions) of the term “limit” in section 3.2, but intuitively the concept is that we can get as good an approximation to the derivative as we like, provided that we make Δ t small enough.
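The intuitive idea can be checked numerically. The following is a minimal sketch, assuming the illustrative choice x = t^2 evaluated at t = 3; the helper names `difference_quotient` and `square` are made up for this example. Exact fractions are used so that rounding error doesn't muddy the picture.

```python
from fractions import Fraction


def difference_quotient(x, t, delta_t):
    """Return (x(t + delta_t) - x(t)) / delta_t for a finite, nonzero delta_t."""
    return (x(t + delta_t) - x(t)) / delta_t


def square(t):
    """Illustrative function x(t) = t^2 (an assumption of this sketch)."""
    return t * t


t = Fraction(3)
for k in range(1, 6):
    delta_t = Fraction(1, 10**k)          # finite changes 1/10, 1/100, ...
    q = difference_quotient(square, t, delta_t)
    # Print the step, the quotient, and its distance from 2t.
    print(delta_t, q, q - 2 * t)
```

For this particular function the quotient differs from 2t by exactly Δ t, so making Δ t smaller makes the approximation as good as we like.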

That satisfied me until we got to a certain topic (implicit differentiation) in which we were encouraged to break the dx away from the dt, leaving them on opposite sides of the equation. I buttonholed my teacher after class and asked why he was now doing what he'd told me you couldn't really do, and his response was that dx and dt weren't really numbers, but most of the time you could get away with treating them as if they were, and you would get the right answer in the end. Most of the time!? That bothered me. How was I supposed to know when it wasn't “most of the time”?

Figure G. Abraham Robinson (1918-1974)

But unknown to me and my teacher, the mathematician Abraham Robinson had already shown in the 1960s that it was possible to construct a self-consistent number system that included infinite and infinitesimal numbers. He called it the hyperreal number system, and it included the real numbers as a subset [3].

[3] The main text of this book treats infinitesimals with the minimum fuss necessary in order to avoid the common goofs. More detailed discussions are often relegated to the back of the book, as in Example 11. The reader who wants to learn even more about the hyperreal system should consult the list of further reading on page 201 (pdf version).

Moreover, the rules for what you can and can't do with the hyperreals turn out to be extremely simple. Take any true statement about the real numbers. Suppose it's possible to translate it into a statement about the hyperreals in the most obvious way, simply by replacing the word “real” with the word “hyperreal.” Then the translated statement is also true. This is known as the transfer principle.

Let's look back at my bogus proof of 1=1/2 in light of this simple principle. The final step of the proof, for example, is perfectly valid: multiplying both sides of the equation by the same thing. The following statement about the real numbers is true:

For any real numbers a, b, and c, if a=b, then ac=bc.

This can be translated in an obvious way into a statement about the hyperreals:

For any hyperreal numbers a, b, and c, if a=b, then ac=bc.

However, what about the statement that both 1/du and 1/(2du) equal infinity, so they're equal to each other? This isn't the translation of a statement that's true about the reals, so there's no reason to believe it's true when applied to the hyperreals --- and in fact it's false.
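As a sketch of how the argument might be repaired, start instead from a statement that is true of the reals and can be translated word for word: for any real number x, with x≠ 0, 1/x = 2·1/(2x). Applying the transferred statement to the hyperreal du, and then multiplying both sides by du, gives

$\frac{1}{du} = 2\cdot\frac{1}{2\,du} \quad\Longrightarrow\quad du\cdot\frac{1}{du} = 2\cdot du\cdot\frac{1}{2\,du} \quad\Longrightarrow\quad 1 = 2\cdot\frac{1}{2} ,$

which is true. The infinite number 1/du is twice as big as 1/(2du), and keeping track of that factor of 2 is what removes the paradox.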

What the transfer principle tells us is that the real numbers as we normally think of them are not unique in obeying the ordinary rules of algebra. There are completely different systems of numbers, such as the hyperreals, that also obey them.

How, then, are the hyperreals even different from the reals, if everything that's true of one is true of the other? But recall that the transfer principle doesn't guarantee that every statement about the reals is also true of the hyperreals. It only works if the statement about the reals can be translated into a statement about the hyperreals in the most simple, straightforward way imaginable, simply by replacing the word “real” with the word “hyperreal.” Here's an example of a true statement about the reals that can't be translated in this way:

For any real number a, there is an integer n that is greater than a.

This one can't be translated so simplemindedly, because it refers to a subset of the reals called the integers. It might be possible to translate it somehow, but it would require some insight into the correct way to translate that word “integer.” The transfer principle doesn't apply to this statement, which indeed is false for the hyperreals, because the hyperreals contain infinite numbers that are greater than all the integers. In fact, the negation of this statement can be taken as a definition of what makes the hyperreals special, and different from the reals: we assume that there is at least one hyperreal number, H, which is greater than all the integers.

As an analogy from everyday life, consider the following statements about the student body of the high school I attended:

1. Every student at my high school had two eyes and a face.
2. Every student at my high school who was on the football team was a jerk.

Let's try to translate these into statements about the population of California in general. The student body of my high school is like the set of real numbers, and the present-day population of California is like the hyperreals. Statement 1 can be translated mindlessly into a statement that every Californian has two eyes and a face; we simply substitute “every Californian” for “every student at my high school.” But statement 2 isn't so easy, because it refers to the subset of students who were on the football team, and it's not obvious what the corresponding subset of Californians would be. Would it include everybody who played high school, college, or pro football? Maybe it shouldn't include the pros, because they belong to an organization covering a region bigger than California. Statement 2 is the kind of statement that the transfer principle doesn't apply to [4].

[4] For a slightly more precise and formal statement of the transfer principle, see page 143 (pdf version).

#### Example 14

As a nontrivial example of how to apply the transfer principle, let's consider how to handle expressions like the one that occurred when we wanted to differentiate $t^2$ using infinitesimals:

$\frac{d t^2}{dt} = 2t+dt .$
I argued earlier that 2t+dt is so close to 2t that for all practical purposes, the answer is really 2t. But is it really valid in general to say that 2t+dt is the same hyperreal number as 2t? No. We can apply the transfer principle to the following statement about the reals:

For any real numbers a and b, with b≠ 0, a + b ≠ a.

Since dt isn't zero, 2t + dt ≠ 2t.

More generally, Example 14 leads us to visualize every number as being surrounded by a “halo” of numbers that don't equal it, but differ from it by only an infinitesimal amount. Just as a magnifying glass would allow you to see the fleas on a dog, you would need an infinitely strong microscope to see this halo. This is similar to the idea that every integer is surrounded by a bunch of fractions that would round off to that integer. We can define the standard part of a finite hyperreal number, which means the unique real number that differs from it infinitesimally. For instance, the standard part of 2t+dt, notated st(2t+dt), equals 2t. The derivative of a function should actually be defined as the standard part of dx/dt, but we often write dx/dt to mean the derivative, and don't worry about the distinction.
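The bookkeeping behind the standard part can be sketched in code. What follows is only a toy model, not the hyperreal number system: it assumes every quantity is a polynomial in a single infinitesimal symbol dt with rational coefficients, and the class name `Poly` and its methods are invented for this illustration.

```python
from fractions import Fraction


class Poly:
    """Toy stand-in for a finite hyperreal: c[0] + c[1]*dt + c[2]*dt**2 + ...

    Only polynomials in one symbol dt are handled (an assumption of this
    sketch); this is not a construction of the hyperreals.
    """

    def __init__(self, *coeffs):
        self.c = [Fraction(x) for x in coeffs] or [Fraction(0)]

    def __add__(self, other):
        n = max(len(self.c), len(other.c))
        a = self.c + [Fraction(0)] * (n - len(self.c))
        b = other.c + [Fraction(0)] * (n - len(other.c))
        return Poly(*[x + y for x, y in zip(a, b)])

    def __sub__(self, other):
        return self + Poly(*[-x for x in other.c])

    def __mul__(self, other):
        out = [Fraction(0)] * (len(self.c) + len(other.c) - 1)
        for i, x in enumerate(self.c):
            for j, y in enumerate(other.c):
                out[i + j] += x * y
        return Poly(*out)

    def div_by_dt(self):
        """Divide by dt; refused unless the real (dt**0) part is zero."""
        if self.c[0] != 0:
            raise ValueError("quotient would be infinite")
        return Poly(*self.c[1:])

    def st(self):
        """Standard part: keep the real part, drop every term containing dt."""
        return self.c[0]


t = Poly(3)        # the real number t = 3
dt = Poly(0, 1)    # the infinitesimal dt
quotient = ((t + dt) * (t + dt) - t * t).div_by_dt()
print(quotient.c)      # coefficients of 2t + dt
print(quotient.st())   # standard part, i.e. 2t
```

In this toy model the quotient 2t+dt and its standard part 2t are different objects, which is the distinction drawn above: `st` simply discards the infinitesimal halo.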

One of the things Bishop Berkeley disliked about infinitesimals was the idea that they existed in a kind of hierarchy, with $dt^2$ being not just infinitesimally small, but infinitesimally small compared to the infinitesimal dt. If dt is the flea on a dog, then $dt^2$ is a submicroscopic flea that lives on the flea, as in Swift's doggerel: “Big fleas have little fleas/ On their backs to ride 'em,/ and little fleas have lesser fleas,/And so, ad infinitum.” Berkeley's criticism was off the mark here: there is such a hierarchy. Our basic assumption about the hyperreals was that they contain at least one infinite number, H, which is bigger than all the integers. If this is true, then 1/H must be less than 1/2, less than 1/100, less than 1/1,000,000 --- less than 1/n for any positive integer n. Therefore the hyperreals are guaranteed to include infinitesimals as well, and so we have at least three levels to the hierarchy: infinities comparable to H, finite numbers, and infinitesimals comparable to 1/H. If you can swallow that, then it's not too much of a leap to add more rungs to the ladder, like extra-small infinitesimals that are comparable to $1/H^2$. If this seems a little crazy, it may comfort you to think of statements about the hyperreals as descriptions of limiting processes involving real numbers. For instance, in the sequence of numbers $1.1^2=1.21$, $1.01^2=1.0201$, $1.001^2=1.002001$, ..., it's clear that the number represented by the digit 1 in the final decimal place is getting smaller faster than the contribution due to the digit 2 in the middle.
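The sequence of squares can be taken apart term by term. This is a minimal sketch using only real numbers, assuming we write each base as 1+h with h = 1/10, 1/100, and so on; the function name `square_terms` is made up for the example.

```python
from fractions import Fraction


def square_terms(k):
    """Split (1 + h)**2, with h = 10**-k, into its contributions 1, 2h, h**2."""
    h = Fraction(1, 10**k)
    return Fraction(1), 2 * h, h * h


for k in range(1, 5):
    one, middle, last = square_terms(k)
    total = one + middle + last           # equals (1 + h)**2 exactly
    # The ratio last/middle is h/2, so it shrinks along with h.
    print(float(total), "last/middle =", last / middle)
```

The middle term 2h plays the role of a quantity comparable to dt, and the last term plays the role of $dt^2$: both shrink, but the last one shrinks faster, so their ratio itself goes to zero.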

One subtle issue here, which I avoided mentioning in the differentiation of the sine function on page 28, is whether the transfer principle is sufficient to let us define all the functions that appear as familiar keys on a calculator: $x^2$, $\sqrt{x}$, $\sin x$, $\cos x$, $e^x$, and so on. After all, these functions were originally defined as rules that would take a real number as an input and give a real number as an output. It's not trivially obvious that their definitions can naturally be extended to take a hyperreal number as an input and give back a hyperreal as an output. Essentially the answer is that we can apply the transfer principle to them just as we would to statements about simple arithmetic, but I've discussed this a little more on page 149 (pdf version).