Deriving Euler’s Identity

Euler’s Identity has been called “the most beautiful equation” in mathematics. It neatly encapsulates five key values and three operators into a true equation: \[e^{\pi i} + 1 = 0\]

But why is it true? In this entry, I’m going to take it apart. Fully understanding the equation involves looking at various parts of algebra and calculus. The identity hides a jigsaw puzzle of important concepts.


A function is a machine that takes some sort of input and creates a predictable output. Although functions don’t need to be limited like this, we usually talk about functions in terms of taking a single number and creating a new single number.

We learn certain functions years before we learn the term “function”. For instance, when we’re told that 3 + 4 = 7, that’s a function. In its most general form, addition is a binary function: It takes two numeric inputs and creates a single numeric output: add(3, 4) = 7. We learn that addition doesn’t care what order it gets its inputs, while subtraction does: subtract(4, 3) = 1 but subtract(3, 4) = -1.

We can also look at specific addition functions, which take a single input: 3 + _ = 7 is solved by putting a 4 in the blank. We can use this to create a general function: 3 + _. We know that, no matter what number we put in the blank, we can figure out what the value of the sum will be.

In later years, we start using function notation. This is just a way of being formal about functions so that we can make general comments about them, and look at their behavior. In this case, \[f(x) = 3 + x\] says to add three to whatever number we pick. \(f(x)\) says that the machine is called “f” (for function), and the input is called “x” (just because). This does not mean “f times x”: f is not a variable or unknown value, it’s the name of the function machine.

With this particular function, if we add one to the input, the output will also increase by one. If we add ten to the input, the output will also increase by ten. That’s not true of all functions. For instance, for the function \[f(x) = x^2,\] the function increases by 3 when we go from inputting 1 to inputting 2, but by 5 when we go from inputting 2 to inputting 3. The change in the output isn’t the same, even though we’ve gone up by the same amount in each case.
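Here is a rough Python sketch of the two machines, just to make the comparison concrete. The names `add_three` and `square` are labels I made up for this illustration.

```python
def add_three(x):
    """The function f(x) = 3 + x."""
    return 3 + x


def square(x):
    """The function f(x) = x^2."""
    return x ** 2


# Step the input up by one each time and record how much the output moves.
add_three_steps = [add_three(x + 1) - add_three(x) for x in range(1, 4)]
square_steps = [square(x + 1) - square(x) for x in range(1, 4)]

print(add_three_steps)  # [1, 1, 1]
print(square_steps)     # [3, 5, 7]
```

The first machine moves by the same amount every time; the second one doesn’t.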


While most of this post can be understood with algebra, we do need one key idea from calculus. It has a strange name, which makes it more difficult to see what’s going on.

In the previous section, I explained that functions that look like \(f(x) = x + 3\) change at a rate that’s constant when the input changes at a constant rate. In this case, when the input increases by one, the output increases by one. There are other functions that act in a similar way: \(f(x) = 2x - 1\) has a two-to-one relationship. When the input increases by one, the output increases by two.

As a general rule, for functions that look like \(f(x) = a \cdot x + b\), where \(a\) and \(b\) are any numbers you feel like choosing, the output changes at a rate of \(a\) times the change in the input. This rate is a new function, even if it’s a silly one: If \(f(x) = ax + b\), then the rate is \(g(x) = a\). This new function, which tells us how quickly the output changes, based on the input, is called a derivative. We indicate that a function is derived from another function (that is, it’s a derivative) by using a prime symbol: \(f\prime(x)\) is the first derivative of \(f(x)\).

Notice that I say “first”. Because this is a new function, we can take its derivative as well. We’ll come back to this.

Let’s look at the function \(f(x) = x^2\). Here’s some input and output:

Input: 0 1 2 3 4 5 6 7 8 9 10
Output: 0 1 4 9 16 25 36 49 64 81 100
Change 1: 1 3 5 7 9 11 13 15 17 19
Change 2: 2 2 2 2 2 2 2 2 2

Notice that how much the output changes (change 1) from stepping up once depends on where we start: At low numbers, the output only changes a little bit. At larger numbers, the change is also larger. At the same time, though, notice that the change is increasing in a consistent way. Changing the input from 0 to 1 causes the output to go up by 1; changing the input from 1 to 2 causes the output to go up by 3. That’s an average change of 2 around an input of 1.

Once we notice that the average change around each input value is twice that input value, we can see that the derivative of \(f(x) = x^2\) is \(f\prime(x) = 2x\). And we know the derivative of this derivative (change 2): \(f\prime\prime(x) = 2\).

Before I generalize, let’s look at another function, \(g(x) = x^3\). I’ll use g so we can still talk about the previous function at the same time. Here’s some input and output:

Input: 0 1 2 3 4 5 6 7 8 9 10
Output: 0 1 8 27 64 125 216 343 512 729 1000
Change 1: 1 7 19 37 61 91 127 169 217 271
Change 2: 6 12 18 24 30 36 42 48 54
Change 3: 6 6 6 6 6 6 6 6

This time, we have to look at three levels of changes before we get to a constant. The second level of change seems to be behaving like our first level of change in \(f(x)=x^2\).
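The change rows in both tables can be reproduced with a few lines of Python. This is only a sketch; `differences` is a helper name I picked for the illustration.

```python
def differences(values):
    """Return the change between each neighbouring pair of values."""
    return [b - a for a, b in zip(values, values[1:])]


inputs = range(11)                  # 0 through 10, as in the tables
squares = [x ** 2 for x in inputs]
cubes = [x ** 3 for x in inputs]

square_change_1 = differences(squares)
square_change_2 = differences(square_change_1)

cube_change_1 = differences(cubes)
cube_change_2 = differences(cube_change_1)
cube_change_3 = differences(cube_change_2)

print(square_change_1)
print(square_change_2)
print(cube_change_1)
print(cube_change_2)
print(cube_change_3)
```

Each pass through `differences` shortens the list by one, which is why every change row in the tables has one fewer entry than the row above it.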

Can we write a function for the first level of change? Unfortunately, it’s not as easy as taking the average of the values. That only worked for \(f(x)=x^2\). We can tell that \(g\prime(1)\) (that is, the value of \(g\prime(x)\) when the input is 1) has to be between 1 and 7, but we don’t know what it is. Likewise, \(g\prime(2)\) has to be between 7 and 19, somewhere. We can, however, read off the second derivative of \(g(x)\) from the table, that is, \(g\prime\prime(x) = 6x\), and see that \(g\prime\prime\prime(x) = 6\). Can we work backwards from these to \(g\prime(x)\)?

But what happened to get the derivative of \(f(x)\)? We put the exponent out in front of the function, and decreased the exponent by 1. Let’s go backwards a second: If \(g\prime\prime(x) = 6x\), then could it be that \(g\prime(x) = 3x^2\)? That fits the data, and it also fits the pattern: If we go backwards again by increasing the exponent by 1 and dividing the coefficient by this new exponent, we get \(g(x)\). That is: \[g(x) = x^3 \Rightarrow g\prime(x) = 3x^2 \Rightarrow g\prime\prime(x) = 6x \Rightarrow g\prime\prime\prime(x)=6\]

This can be generalized as what’s called the Power Rule of Calculus: \[f(x) = ax^b \Rightarrow f\prime(x) = abx^{b-1}\]

This works with any power function, that is, a function where every term is written as \(ax^b\). This is absolutely key to understanding Euler’s identity.

Note that this means that the derivative of a constant is zero. That’s because constants never change, so their rate of change is zero.
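As a quick sanity check, here is a small Python sketch of the power rule next to a numerical estimate of the rate of change. Both `power_rule` and `slope_estimate` are names invented for this example, and the step size `h` is an arbitrary small number I chose.

```python
def power_rule(a, b):
    """Apply the power rule to a*x^b, returning the new (coefficient, exponent)."""
    return a * b, b - 1


def slope_estimate(f, x, h=1e-5):
    """Estimate the rate of change of f at x from two nearby inputs."""
    return (f(x + h) - f(x - h)) / (2 * h)


def g(x):
    return x ** 3


coefficient, exponent = power_rule(1, 3)   # g(x) = x^3
print(coefficient, exponent)               # 3 2, that is, 3x^2

# Compare the rule's answer with the numerical estimate at a few inputs.
for x in (1, 2, 3):
    print(x, coefficient * x ** exponent, slope_estimate(g, x))

# A constant is a*x^0, so the rule gives it a coefficient of zero.
print(power_rule(5, 0)[0])
```

The estimates should land very close to 3, 12, and 27, which sit inside the ranges the table of changes suggested.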

Taylor Series: Part One

Every function we’ll use in this post has a derivative, although some derivatives are hard to find or can’t be written neatly.

However, notice that the derivatives we’ve had so far have been different functions. So it’s a fair question: Is there a function that has itself as a derivative? That is, is it ever the case that \(f\prime(x) = f(x)\)?

Let’s assume there is, and let’s assume we can write it in the form of a polynomial. Call it \(r(x)\), just to give it a name. We might need a lot of terms, so let’s start with a bunch: \[r(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6 + a_7x^7 + a_8x^8\]

All those \(a\)s are coefficients. Our goal is to figure out what coefficients will make \(r(x)\) and all of its derivatives equal to each other.

To do this, let’s derive the function until we’re out of variables. We’re going to apply the power rule over and over. Remember: Multiply the coefficient by the exponent, then decrease the exponent by 1: \[r(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6 + a_7x^7 + a_8x^8 \\ r\prime(x) = a_1 + 2a_2x + 3a_3x^2 + 4a_4x^3 + 5a_5x^4 + 6a_6x^5 + 7a_7x^6 + 8a_8x^7 \\ r\prime\prime(x) = 2a_2 + 6a_3x + 12a_4x^2 + 20a_5x^3 + 30a_6x^4 + 42a_7x^5 + 56a_8x^6 \\  r\prime\prime\prime(x) = 6a_3 + 24a_4x + 60a_5x^2 + 120a_6x^3 + 210a_7x^4 + 336a_8x^5 \\ r\prime\prime\prime\prime(x) = 24a_4 + 120a_5x + 360a_6x^2 + 840a_7x^3 + 1680a_8x^4 \\ r\prime\prime\prime\prime\prime(x) = 120a_5 + 720a_6x + 2520a_7x^2 + 6720a_8x^3 \\ r\prime\prime\prime\prime\prime\prime(x) = 720a_6 + 5040a_7x + 20160a_8x^2 \\ r\prime\prime\prime\prime\prime\prime\prime(x) = 5040a_7 + 40320a_8x \\ r\prime\prime\prime\prime\prime\prime\prime\prime(x) = 40320a_8 \]

We need to figure out what values of \(a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\) will make all these functions equal to each other. This looks like a daunting task, but it becomes much easier when we realize that this function has to have a value at \(r(0)\). What is this value? Look at the first line: If \(x = 0\), then all those terms except the first one will be equal to zero. This means \(r(0) = a_0\). By the second line, it means that \(r\prime(0) = a_1\). Since \(r(0) = r\prime(0)\), we have \(a_0 = a_1\). By the same reasoning, we can conclude: \[a_0 = a_1 = 2a_2 = 6a_3 = 24a_4 = 120a_5 = 720a_6 = 5040a_7 =  40320a_8\]

Whatever \(a_0\) is equal to, all the rest of the coefficients can be figured out. You may notice that these coefficients are factorials: 1, 1, 2, 6, 24, 120, 720, 5040, 40320.

This means any function of the form \(r(x) = a_0 + a_0x + a_0x^2/2! + a_0x^3/3! + a_0x^4/4! + a_0x^5/5! + \cdots\), continued forever, will have itself as a derivative. The parent function of this has \(a_0 = 1\), so: \[r(x) = 1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5! + \cdots\]
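To see why the terms have to go on forever, here is a sketch that stores the first nine coefficients as exact fractions and applies the power rule to the whole list. The list representation and the `derivative` helper are my own choices for the illustration, and I’ve assumed \(a_0 = 1\).

```python
from fractions import Fraction
from math import factorial

a0 = Fraction(1)  # assumed starting value; any other a_0 just scales everything

# a_n = a_0 / n!, taken from the chain a_0 = a_1 = 2a_2 = 6a_3 = ...
coefficients = [a0 / factorial(n) for n in range(9)]


def derivative(coeffs):
    """Power rule on a coefficient list: the x^n term becomes n*a_n*x^(n-1)."""
    return [n * a for n, a in enumerate(coeffs)][1:]


# The derivative reproduces every coefficient except the last one,
# so a finite list always comes up one term short.
print(derivative(coefficients) == coefficients[:-1])  # True
```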

What is the value of \(r(1)\)? Looking at the first nine terms gives us 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + 1/720 + 1/5040 + 1/40320 = 109601/40320, which is about 2.71827876984.

This is pretty close to Euler’s number, e, which is a little more than 2.718281828459. In fact, if we keep adding terms to our polynomial, we get very close to this number very quickly.
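The arithmetic above can be checked exactly with Python’s `fractions` module. The function name `r_partial` is just a label for this sketch; it adds up however many terms of the polynomial we ask for.

```python
from fractions import Fraction
from math import e, factorial


def r_partial(x, terms):
    """Sum the first `terms` terms of 1 + x + x^2/2! + x^3/3! + ..."""
    return sum(Fraction(x) ** n / factorial(n) for n in range(terms))


nine_terms = r_partial(1, 9)
print(nine_terms)          # 109601/40320
print(float(nine_terms))

# The gap between the partial sum and e shrinks quickly as terms are added.
for terms in (5, 9, 13, 17):
    print(terms, abs(float(r_partial(1, terms)) - e))
```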

It turns out that \(r(x) = e^x\) when \(a_0 = 1\).

By the way, when we use an infinitely long, repeating polynomial to try to capture the behavior of a non-polynomial function, this is called a Taylor series. Let’s look at another one.

Taylor Series: Part Two

First, a bit of triangle talk. Take two lines, A and B, that intersect at some point o. Put a point (call it a) on one of these lines, A. Then draw a line from point a to point b on line B, so you form a right angle with B. You’ve just made a right triangle.

Call the distance from o to a the “hypotenuse”. Call the distance from a to b the “height”. Call the distance from o to b the “width”. These names are based on the object being a right triangle, but they’re only needed here so we have a common language. Don’t get caught up on the significance of the vocabulary.

As we move point a along line A, the size of the triangle will change, but the ratios between the three sides will stay the same. We’ll call the ratio between the height and the hypotenuse the “sine” of the angle between the two lines. We’ll call the ratio between the width and the hypotenuse the “cosine” of the angle. Writing these as functions (the names are always abbreviated) gives us \(\sin x\) and \(\cos x\).

What is the derivative of the sine? Remember, the derivative is how much the output changes when the input changes. In this case, the input is the size of the angle. Imagine that the angle gets bigger but point a stays the same distance from o. What happens? The height gets longer and the width gets shorter. Draw a line between where a was and where a is now, and create a new right triangle. The less we change the angle, the closer this triangle’s angles are to those of the original large right triangle, but it’s flipped. The proportions of the height and the hypotenuse of this tiny triangle match those of the width and the hypotenuse we had before, and vice versa.

That is to say: The derivative of the sine is the cosine, and the derivative of the cosine is the negation of the sine: \[\sin\prime x = \cos x \\ \cos\prime x = - \sin x\]
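The geometric argument can be checked numerically. This is a sketch, not a proof: `slope_estimate` is a helper I made up, the step size is an arbitrary small number, and Python’s `math.sin` and `math.cos` expect angles in radians (more on that unit near the end of this post).

```python
import math


def slope_estimate(f, x, h=1e-6):
    """Estimate the rate of change of f at x from two nearby inputs."""
    return (f(x + h) - f(x - h)) / (2 * h)


for angle in (0.0, 0.5, 1.0, 2.0):
    # Rate of change of the sine next to the cosine at the same angle.
    print(angle, slope_estimate(math.sin, angle), math.cos(angle))
    # Rate of change of the cosine next to the negated sine.
    print(angle, slope_estimate(math.cos, angle), -math.sin(angle))
```

Each pair of numbers on a line should agree to many decimal places.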

What is the sine when the angle is zero? The triangle has no height, so \(\sin 0 = 0\). What is the cosine? The width of the triangle is the same as the hypotenuse, so \(\cos 0 = 1\).

Now we have what we need to build the Taylor series for sine and cosine: We know what each derivative is, and we know what the value of each derivative at zero is. Here’s a table to summarize:

Function \(f\prime(x)\) \(f\prime\prime(x)\) \(f\prime\prime\prime(x)\) \(f\prime\prime\prime\prime(x)\)
\(\sin x\) \(\cos x\) \(-\sin x\) \(-\cos x\) \(\sin x\)
\(\cos x\) \(-\sin x\) \(-\cos x\) \(\sin x\) \(\cos x\)

Earlier we developed this: \(a_0 = r(0); a_1 = r\prime(0); a_2 = r\prime\prime(0)/2!; a_3 = r\prime\prime\prime(0)/3!; a_4 = r\prime\prime\prime\prime(0)/4!\)

When our angle is 0, our table gives us:

Function \(f(0)\) \(f\prime(0)\) \(f\prime\prime(0)\) \(f\prime\prime\prime(0)\) \(f\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime\prime\prime(0)\)
\(\sin x\) 0 1 0 -1 0 1 0 -1 0
\(\cos x\) 1 0 -1 0 1 0 -1 0 1
Sum 1 1 -1 -1 1 1 -1 -1 1

This gives us a way of calculating the value of the sine and cosine for any angle we like: \[\sin x = x - x^3/3! + x^5/5! - x^7/7! + \cdots \\ \cos x = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - \cdots\]
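Here is a rough Python version of these two series, compared against the library’s own sine and cosine. The names `sin_series` and `cos_series` and the default of ten terms are my own choices for the sketch.

```python
import math


def sin_series(x, terms=10):
    """x - x^3/3! + x^5/5! - ... using the first `terms` terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))


def cos_series(x, terms=10):
    """1 - x^2/2! + x^4/4! - ... using the first `terms` terms."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))


for x in (0.0, 0.5, 1.0, math.pi / 2):
    print(x, sin_series(x), math.sin(x), cos_series(x), math.cos(x))
```

For small inputs like these, ten terms are already enough to match the library values to many decimal places; larger inputs need more terms.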

Look at the third row of the table (“Sum”), too: Its pattern is very close to the Taylor series for \(e^x\). The problem is those negative signs: Pairs of them! If only we could adjust for those!

Complex Numbers

Mathematicians realized there were times when we needed to use the square root of negative numbers, even though there’s no real number which, when multiplied by itself, is negative. To get around this, mathematicians introduced the concept of i, the imaginary unit, which is equal to the square root of negative one. That is: \(i = \sqrt{-1}\).

This constant has a useful property. Let’s look at its value at certain powers:

Value \(x^0\) \(x^1\) \(x^2\) \(x^3\) \(x^4\) \(x^5\) \(x^6\) \(x^7\) \(x^8\)
\(i\) 1 \(i\) -1 -\(i\) 1 \(i\) -1 -\(i\) 1

Compare this to the previous table: The signs are identical to those of the last row (“Sum”), and the imaginary values line up with the terms for the sine. In other words, if we created the same table but multiplied the sine values by i, we’d get:

Function \(f(0)\) \(f\prime(0)\) \(f\prime\prime(0)\) \(f\prime\prime\prime(0)\) \(f\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime\prime(0)\) \(f\prime\prime\prime\prime\prime\prime\prime\prime(0)\)
\(i \sin x\) 0 i 0 -i 0 i 0 -i 0
\(\cos x\) 1 0 -1 0 1 0 -1 0 1
Sum 1 i -1 -i 1 i -1 -i 1
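The match between the powers of i and the combined row can be checked with Python’s built-in complex numbers, where the imaginary unit is written `1j`. This is only a sketch; the list names are mine, and the two lists of values at zero are copied from the sine and cosine rows of the earlier table.

```python
i = 1j  # Python's spelling of the imaginary unit

# Values of the sine and cosine and their first eight derivatives at 0.
sin_at_zero = [0, 1, 0, -1, 0, 1, 0, -1, 0]
cos_at_zero = [1, 0, -1, 0, 1, 0, -1, 0, 1]

# Build i^0 through i^8 by multiplying by i one step at a time.
powers_of_i = [1]
for _ in range(8):
    powers_of_i.append(powers_of_i[-1] * i)

# Multiply the sine values by i and add the cosine values.
sum_row = [c + i * s for s, c in zip(sin_at_zero, cos_at_zero)]

print(powers_of_i)
print(sum_row == powers_of_i)
```

The powers cycle through 1, i, -1, -i and then start over, and the combined row follows the same cycle.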

Now go back to our pattern for \(e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5! + …\). What happens when we replace \(x\) with \(xi\)?

We get \(e^{ix} = 1 + ix - x^2/2! - ix^3/3! + x^4/4! + ix^5/5! - x^6/6! - ix^7/7! + x^8/8! + \cdots\).

Which is the sum from the table above.

That is to say, \[e^{ix} = \cos x + i\sin x\]
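We can test this numerically by feeding a complex input into the series for \(e^x\) and comparing the result with the cosine plus i times the sine. In this sketch, `exp_series` and the choice of 30 terms are my own; `cmath.exp` is Python’s built-in complex exponential, included as a second opinion.

```python
import cmath
import math


def exp_series(z, terms=30):
    """1 + z + z^2/2! + z^3/3! + ..., which also accepts complex z."""
    return sum(z ** n / math.factorial(n) for n in range(terms))


for x in (0.5, 1.0, 2.0):
    from_series = exp_series(1j * x)
    from_trig = complex(math.cos(x), math.sin(x))  # cos x + i sin x
    print(x, from_series, from_trig, cmath.exp(1j * x))
```

All three values on each line should agree to within rounding error.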

Wrapping It Up

Okay, so now we have that. It’s very close to Euler’s identity. We have one last step.

I glossed over a detail about sine and cosine. It’s clear that these are functions, and that \(\sin 0 = 0\), but what is the value of the input when \(\sin x = 1\)? We begin by teaching that circles have 360 degrees, but this doesn’t work here. Because we’re relating distances, we need an angle measurement that also relates to length. If you have a circle with a radius of one, its circumference is \(2\pi\): This is a measure of length, not angle.

Our Taylor series also requires a length, not an angle. In order to use our Taylor series to calculate sines and cosines, we need to feed it lengths. Instead of saying that a circle has 360 degrees, we say that it has \(2\pi\) radians. A quarter turn around the circle has \(\pi/2\) radians, and \(\sin(\pi/2) = 1\).

Likewise, a half turn around the circle has \(\pi\) radians, and \(\sin \pi = 0\). Meanwhile, \(\cos \pi = -1\).
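Python’s `math` module works in radians too, so these values are easy to check. This is a small sketch; `math.radians` converts from degrees, and the variable names are mine.

```python
import math

quarter_turn = math.radians(90)    # pi/2 radians
half_turn = math.radians(180)      # pi radians

print(quarter_turn, math.pi / 2)
print(math.sin(quarter_turn))
print(math.sin(half_turn))   # very close to 0, not exactly 0, because pi is rounded
print(math.cos(half_turn))
```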

And there you have it: \(\cos \pi + i \sin \pi = -1 + 0 = -1\). Which gives us: \[e^{i\pi} = -1\]

Adding 1 to both sides gives us: \[e^{i\pi} + 1 = 0 \]
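As a last check, Python’s `cmath.exp` can evaluate the left-hand side directly. Floating-point numbers can only hold an approximation of \(\pi\), so this sketch tests that the result is within a tiny tolerance of zero (the tolerance is my own choice) instead of expecting an exact zero.

```python
import cmath
import math

value = cmath.exp(1j * math.pi) + 1

print(value)               # a complex number extremely close to 0
print(abs(value) < 1e-12)
```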

Apparently unrelated concepts (complex numbers, trigonometric ratios, and the natural base e) coalesce into a simple, beautiful equation.
