Mathematical terminology and notation through a linguistic lens
Introduction
The first time I attended graduate school was for Linguistics. My first year, I taught English as a Second Language. My most resistant students were Mathematics majors, because many of them held the opinion that mathematics is a universal language. Why bother getting fluent in English?
More recently, the idea that mathematics has cultural bias has become a lightning rod for conservative pundits, both in and out of mathematics, to mock social justice advocates.
We are taught, after all, that mathematics is the holder of objective truths. \(2 + 2 = 4\). This is an unmalleable truth. It is true no matter what language you speak, no matter what land you call home.
“Mathematics is the language of the universe.” It is a message repeated to students over and over.
Except… it’s more complicated than that.
The point of this piece is not to debate the objective truth of underlying mathematical facts. Whether mathematics exists independent of the human ability to perceive it is a valid and complex discussion, but that’s not the focus of this piece.
However, the belief that mathematical facts are objective has bled over into another claim, also widely held, also widely taught to students. This one is far more easily debunked: The idea that mathematical notation and terminology is logical, consistent, and culturally independent.
Mathematical facts may be logical, consistent, and culturally independent; that verdict is still out. The methods we use to communicate those facts to each other, though, are not.
Terminology
The thing that we explore in mathematics courses is not a single language. It is, by the broadest brush, at least three languages: Mathematical facts themselves, mathematical notation, and mathematical terminology.
Terminology refers to the way in which we discuss mathematics using natural language. Every technical field has its own jargon, and mathematics is no different. Indeed, I feel like mathematics is particularly notorious for taking common words and applying rigorous definitions to them.
“Line”, we tell our geometry students, does not refer to just anything we might write with a single stroke. No, it refers to the set of points that satisfies the equation \(Ax + By = C\), for some constant values of \(A\), \(B\), and \(C\) (with \(A\) and \(B\) not both zero). We praise mathematical language for its precision and rigor.
Except… not really. Even if we teachers somehow manage to maintain the fortress around “line” (instead of casually referring to curves as lines), there are other words we can’t agree on.
An equilateral triangle is isosceles, but is a parallelogram a trapezoid?
That question has been the cause of many an argument among US mathematicians, but I’ve seen it leave folks in the UK scratching their heads: In the UK, a parallelogram is definitely not a trapezoid, but that’s because “trapezoid” and “trapezium” are reversed there.
So here are the common US definitions of the terms:
· Trapezoid: A quadrilateral with one pair of parallel sides
· Trapezium: A quadrilateral with no pairs of parallel sides.
In UK English, the definitions are the other way around. For the duration of this passage, please use the US definitions.
Let’s take another look at that: A quadrilateral with one pair of parallel sides.
Does that mean exactly one pair, or at least one pair?
Both are meaningful mathematical objects: A quadrilateral with exactly one pair of parallel sides is the result of removing (truncating) a triangle from a larger, similar triangle. A quadrilateral with at least one pair of parallel sides has certain attributes which are then inherited by parallelograms.
I’m personally of the “at least one” opinion, but that’s not the point. The point is: Here we have a common term that does not have a definition that’s agreed upon by all English speakers, let alone globally.
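To make the two readings concrete, here is a minimal sketch in Python (my choice of language for illustration). The function names and the `inclusive` flag are hypothetical, and the sketch assumes integer coordinates so that the parallel test is exact.

```python
def parallel_pairs(quad):
    """Count the pairs of opposite sides that are parallel.

    quad is a list of four (x, y) vertices, listed in order around the shape.
    Assumes integer coordinates so the comparison with zero is exact.
    """
    def direction(p, q):
        return (q[0] - p[0], q[1] - p[1])

    def is_parallel(u, v):
        # Two vectors are parallel when their 2D cross product is zero.
        return u[0] * v[1] - u[1] * v[0] == 0

    a, b, c, d = quad
    opposite_sides = [
        (direction(a, b), direction(c, d)),
        (direction(b, c), direction(d, a)),
    ]
    return sum(is_parallel(u, v) for u, v in opposite_sides)


def is_trapezoid(quad, inclusive):
    """US 'trapezoid' under the at-least-one or exactly-one reading."""
    count = parallel_pairs(quad)
    return count >= 1 if inclusive else count == 1


parallelogram = [(0, 0), (4, 0), (5, 2), (1, 2)]
truncated_triangle = [(0, 0), (4, 0), (3, 2), (1, 2)]

print(is_trapezoid(parallelogram, inclusive=True))    # True
print(is_trapezoid(parallelogram, inclusive=False))   # False
print(is_trapezoid(truncated_triangle, inclusive=True))   # True
print(is_trapezoid(truncated_triangle, inclusive=False))  # True
```

The same shape gets a different answer depending on a single flag, which is exactly the kind of choice a definition has to state out loud.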
And this is hardly the only case, even within geometry. For instance, rectangles are often taught in elementary school as having a “short” side and a “long” side. When we speak of areas, we might say “length times width” or “base times height” or other similar pairs of terms.
And speaking of “times,” each basic operator has multiple words, often unrelated. When we multiply two numbers, we might say “What is two times four?” or “What is the product of two and four?” Times, product, and multiply are all jargon, they are conceptually related, and they are not linguistically similar at all.
This is not logical or consistent.
That may have felt like a tangent from the topic of quadrilaterals, which brings me to yet another example.
In Geometry class, we teach that the tangent of an angle is the ratio of the corresponding legs in a right triangle containing that angle. But we also teach that a tangent line is a line that touches a circle exactly once.
In Calculus, this apparently inconsistent double-duty use of a term is (hopefully) cleared up, but most students don’t go on to Calculus, so they’re left with two superficially unrelated uses of a single term.
In contrast, those students who do go on to Calculus learn that the tangent line represents the slope of the graph of a function at any given point… so why did we learn the word “slope” in Algebra I?
Terminological inconsistencies abound in mathematics.
Where the graph of a function meets the x-axis might be called a zero, a root, a solution, or an x-intercept, depending on the source and the purpose.
Whether zero is a counting number, a whole number, or a natural number depends on the source.
There is an important distinction between \(\frac{0}{0}\) and \(\frac{1}{0}\) that we generally ignore, treating both as “undefined” and leaving it at that.
Why aren’t triangles and quadrilaterals typically called trigons and tetragons, to make them consistent with the rest of the polygons?
When students first learn the number line, it starts at zero and includes the positive numbers: This is a ray, not a line.
The standard form of a linear equation is \(Ax + By = C\). The standard form of a polynomial is \(y = ax^n + bx^{n-1} + \cdots\), meaning that \(y = ax + b\) is the standard form of a first degree polynomial but the slope-intercept form of a linear equation.
Mathematical jargon is part of natural language, so it will share many characteristics of natural language. That’s fine. What’s misleading is the idea that the terms we use all have rigorous, firmly set definitions. Many do. Many do not.
This is not a trivial quibble; it is an often-lost opportunity. It doesn’t really matter whether a parallelogram is a trapezoid in all discussions; what matters is that we clearly define our terms so that, in the context of our local conversation, we agree about those terms.
Indeed, it is the very wibbly-wobbliness of natural language, including technical jargon, that justifies the use of mathematical notation.
Notation
Ah… notation! Language independent! I can write something in mathematical notation and someone who doesn’t understand a word of English can understand what I’ve written!
Well… sort of.
Mathematical jargon is the dialect of a language (English, in my case) that allows us to communicate about mathematics. Mathematical notation is the putatively language-independent symbolic system that allows us to communicate about mathematics.
The truth is, we could have a culturally-independent, internally-logically-consistent symbolic notation. There have been a handful of attempts to create this over the last few centuries. These have been about as successful as Benjamin Franklin’s attempt to regularize English spelling (“A Scheme for a new Alphabet…”).
Instead, our current system of notation is a hodgepodge of conveniences. Some have even arisen from academic feuds. A century ago, Florian Cajori wrote a two-volume overview called “A History of Mathematical Notations” that documents in 800 dense pages the major evolution of notation to date.
Some but not all of the notation decisions have been constrained by the practical limitations of moveable type. Between Gutenberg’s introduction of moveable type to Europe in the 15th Century and the ready availability of the computer in recent decades, those constraints created a bias towards reusing existing symbols rather than inventing new ones.
Even so, plenty of symbols specific to mathematics have developed.
Regardless, though, our use of symbols is often illogical and inconsistent, and varies by culture.
What does 4,500 represent? In much of the world, it represents four thousand five hundred. In much of the rest of the world, it represents four whole units and five hundred thousandths, what we might write as 4.500 or 4½.
Speaking of mixed numbers: When we juxtapose two mathematical objects, such as \(5x\) or \(4(3 + 1)\), the missing operator is multiplication. But \(4\frac{1}{2}\) doesn’t represent \(4 \cdot \frac{1}{2} = 2\); it represents \(4 + \frac{1}{2} = 4.5\). This is inconsistent and, it turns out, not globally true: Parts of Europe don’t write mixed numbers this way for this reason. Note also that \(-4\frac{1}{2}\) equals \(-(4 + \frac{1}{2}) = -4.5\), not \(-4 + \frac{1}{2} = -3.5\).
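Both ambiguities are easy to demonstrate. The following is a rough sketch in Python, not a robust parser: `read_number` and `mixed_number` are hypothetical helpers of my own, and `read_number` assumes that only the comma and the period are in play.

```python
from fractions import Fraction


def read_number(text, decimal_mark):
    """Interpret a numeral, given which character marks the decimal point.

    Hypothetical helper: handles only '.' and ',' and assumes the other
    character is a thousands separator.
    """
    grouping_mark = "," if decimal_mark == "." else "."
    cleaned = text.replace(grouping_mark, "").replace(decimal_mark, ".")
    return float(cleaned)


def mixed_number(whole, numerator, denominator):
    """Value of a mixed number such as 4 1/2 or -4 1/2.

    By convention the sign applies to the whole sum, not just the whole part.
    (A negative mixed number with whole part 0 cannot be expressed this way.)
    """
    sign = -1 if whole < 0 else 1
    return sign * (abs(whole) + Fraction(numerator, denominator))


print(read_number("4,500", decimal_mark="."))  # 4500.0
print(read_number("4,500", decimal_mark=","))  # 4.5

print(mixed_number(4, 1, 2))    # 9/2, juxtaposition read as addition
print(4 * Fraction(1, 2))       # 2, juxtaposition read as multiplication
print(mixed_number(-4, 1, 2))   # -9/2
print(-4 + Fraction(1, 2))      # -7/2, the reading we do not intend
```

In both cases the reader has to bring outside knowledge (a locale, a convention) that the symbols themselves do not carry.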
Of the major operators, our symbol for addition is the most well-behaved and the most globally consistent, which may be one reason why the conservative gatekeepers like to go to \(2 + 2 = 4\) as a universal mathematical truth.
Cajori writes: “The modern algebraic signs + and — came into use in Germany during the last twenty years of the fifteenth century” (sec. 201).
In general, the bulk of our modern mathematical notation was created in the last half-millennium. Even over such a comparatively short window of time, though, the standard way in which a symbol was accepted and standardized followed the path of natural language standardization: Someone started using it, other people decided they liked it, and eventually it took root.
I feel like mathematics teachers often act as if our symbols were the result of logical, informed debate, in which experts in the field conferred and a formal vote was taken. I don’t know if that has ever happened, but that’s not the usual path.
Earlier I mentioned the standard form of a first degree polynomial. I am teaching Algebra II this year from Pearson’s Algebra II Common Core text (2015); this is a typical US textbook.
p. 76: “The slope-intercept form of an equation of a line is \(y = mx + b\).”
p. 281: “A polynomial function \(P(x)\) in standard form is \(P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0\).”
p. 498: “As an equation, direct variation has the form \(y = kx\).”
I am using this book for illustration only because it is the one I am using: This pattern exists throughout multiple texts.
These are three passages from the same book which show the slope of a linear equation as either \(m\), \(a_1\), or \(k\), depending on whether we’re calling it a linear equation, a polynomial, or direct variation.
Meanwhile, I’ve been told that it’s common in the UK to write \(y = mx + c\) rather than \(y = mx + b\). For regressions, the standard US calculator (the TI-84) uses \(y = ax + b\).
This is clearly not consistent, for an object (a linear function) that students probably spend more time on in high school than any other.
One of the most consistent arguments on the internet comes from a misunderstanding of the order of operations, which in turn comes from our inconsistent and confusing use of operators.
In basic arithmetic, we have three tiers of operators. I will call these accumulation, bunching, and conglomerating.
Accumulating involves adding and subtracting amounts. We accumulate from left to right: \(4 - 2 + 3 = 2 + 3 = 5\) is standard convention; \(4 - 2 + 3 = 4 - 5 = -1\) is not.
Bunching involves creating globs by mashing things together or tearing them apart. We bunch from left to right as well. And if we have to both bunch and accumulate, we bunch before we accumulate unless told otherwise.
These rules about bunching before accumulating and going left to right are standard convention. They are part of the language of mathematical notation, not about any underlying objective mathematical truths. If mathematics itself is the language of the universe, mathematical notation is putatively the language we use to talk globally to each other.
That brings me to conglomerating. Conglomeration is a more difficult concept; it involves bunching globs together into megaglobs, or tearing them apart likewise. It feels like there ought to be more levels to do this, that is, of bunching megaglobs into megamegaglobs, and so on, and there is… but that gets really heavy quickly (both literally and figuratively), so we tend to stick only to these three tiers.
So if we accumulate left to right and bunch left to right, how do we conglomerate? In other words, what is \(2^{1^6}\) equal to?
It turns out we conglomerate from right to left. Since we use superscripts as our standard notation instead of the caret notation, and since we rarely conglomerate our conglomerations in high school (because the numbers get huge quickly), this inconsistency may well slip past most students’ radar. But it’s there.
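Programming languages have to commit to these conventions explicitly. As a quick check, here is a minimal sketch in Python, chosen only for illustration; its `**` operator plays the role of the caret, and the variable names are mine.

```python
# Addition and subtraction share a tier and are evaluated left to right.
left_to_right = 4 - 2 + 3        # parsed as (4 - 2) + 3
not_convention = 4 - (2 + 3)     # the right-to-left reading, forced with parentheses

# Exponentiation is evaluated right to left.
right_to_left = 2 ** 1 ** 6      # parsed as 2 ** (1 ** 6)
forced_left = (2 ** 1) ** 6      # the left-to-right reading, forced with parentheses

print(left_to_right, not_convention)  # 5 -1
print(right_to_left, forced_left)     # 2 64
```

Neither direction is more "true" than the other; the language designers simply adopted the written convention.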
Let’s talk about the notation for each tier now.
The accumulation tier has two operators. These are very consistent, although the second one isn’t technically needed. \(4 + 3\), \(5 - 2\). Not a lot of variation there. \(5 - 2 = 5 + (-2)\), and some elementary textbooks as well as some calculators use separate symbols for subtraction and negation, but the negation sign is always a variation on the minus sign. (Note: Accountants will sometimes use just the parentheses to indicate negative values.)
The bunching tier has two operators, but they’re inconsistent. The symbols for multiplication are famously inconsistent.
It can be represented by the times symbol ×. When typeset, we can make a clear difference between that and the variable. I have many students from Bangladesh, and some of them write the multiplication-sign-× and the variable-x with distinctly different strokes. But overall, we tend to abandon this symbol once variables are introduced.
Multiplication can also be represented by the central dot ·, but this can be confused with the decimal point. In the computer age, the asterisk has become common due to ease of typing. When variables are involved, we tend to just leave off the symbol entirely.
We can also place one or both of the multiplicands in parentheses. This causes problems when we introduce function notation, because \(f(x)\) could either be the function \(f\) applied to the variable \(x\) or the product of the variables \(f\) and \(x\).
Division is slightly better behaved, but still has three standard notations: The obelus (÷), which is usually abandoned in middle school; the vinculum, or fraction bar; and the slash (/), which is used to mimic the vinculum in single-line text.
The conglomeration tier has three basic operations, but these are infamously confusing for students.
Exponentiation involves a superscript. When superscripts are not practical, the caret (^) is used instead.
Radical notation uses a special symbol to the left of the value (√), usually with an attached vinculum (more on this below). If we want to take any root other than the square root, we have to put a little number in the radical sign (e.g., ∛). This can lead to errors in handwriting: \(x\sqrt[3]{y}\) is not the same as \(x^3\sqrt{y}\), but students often get them confused.
Logarithmic notation is… just weird, and a complete example in itself of the inconsistencies of our notation. It looks like a function, and there are indeed two versions that could rightly be called functions.
\(\ln(x)\) is the natural log of \(x\). Why \(\ln\) and not \(\text{nl}\)? Because ln is usually traced to the Latin logarithmus naturalis, where the adjective follows the noun. Because most calculators use sans serif fonts, though, it looks like “In” to far too many students.
\(\log(x)\) is either the common log or the natural log of \(x\), depending on the source. The common log has an important feature that no other logarithm does (\(\log 4515 = 3 + \log 4.515\), for instance, because \(4515 = 10^3 \cdot 4.515\)), so in an age before calculators, when logs had to be calculated by hand using tables, it was special. But… we have calculators now, and from a mathematical perspective, the natural log is generally more important than the common log.
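The special feature of the common log is that shifting the decimal point only changes the whole-number part of the logarithm. A small sketch in Python, using the standard `math` module (variable names are mine), checks this for 4515 and shows that the natural log has no such shortcut:

```python
import math

common_log = math.log10(4515)
shifted = 3 + math.log10(4.515)   # 4515 = 10**3 * 4.515

natural_log = math.log(4515)
natural_shifted = 3 + math.log(4.515)

# The decimal shift works in base ten...
print(math.isclose(common_log, shifted))           # True
# ...but not in base e, since e**3 is not 1000.
print(math.isclose(natural_log, natural_shifted))  # False
```

Note that the module itself takes a side in the naming dispute: `math.log` with one argument is the natural log, and the common log needs the separate name `math.log10`.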
It’s bad enough when our notation is illogical and inconsistent. It’s even worse when parts of our notation are actively evolving.
And if you want the log of any other base, you need to mark it with a subscript. But, because subscripts are used elsewhere, we can’t just write, say, \(100_{10} = 2\) as the inverse of \(10^2 = 100\). That would be the consistent thing to do, but a subscript on a numeral already marks the base it is written in, so \(100_{10} = 100\).
A very common source of mistakes: \(-1^2 = -1\). Because our order of operations says to evaluate conglomerations before accumulations, standard convention is to interpret \(-1^2\) as \(-(1^2)\). However, because students have internalized that -1 is a numeric value in its own right, they want \(-1^2 = 1\).
Our convention reveals that we don’t consider -1 to truly refer to the numeric value one less than zero… meaning that we don’t have a consistent symbol for this fairly rudimentary value.
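Python follows the written convention here, which makes for a compact demonstration. This is only an illustration of one language's parsing rules, with variable names of my own choosing:

```python
as_written = -1 ** 2      # parsed as -(1 ** 2): the exponent binds first
as_intended = (-1) ** 2   # parentheses make -1 a single value

print(as_written, as_intended)  # -1 1
```

The only way to get the value most students expect is to add grouping symbols that the notation itself does not require.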
There are several acronyms for the order of operations, including PEMDAS, BODMAS, and GEMA. In each case, the first letter refers to grouping rules. So let’s look at those.
Historically, two ways emerged to indicate that a set of symbols should be grouped to override the order of operations: Parentheses and the vinculum, or overbar. These were not the only symbols ever used, but they’re the ones that have been most successful.
While parentheses have generally prevailed, we still see the vinculum in at least four places: In repeating decimals, attached to the radical sign, when doing long division, and in fraction notation.
Because of this, I have many students who confuse radicals with long division. The actual historical division sign behind long division and the actual historical radical sign are distinctly different: ) vs √. The overbar attached to the radical sign is a historical oversight, a case where an otherwise fading symbol has held on.
In some sources, meanwhile, repeating decimals are indeed indicated with parentheses, so that \(\frac{1}{6} = 0.1(6)\) rather than \(0.1\overline{6}\). And long division in much of the world is done the other way around, with both the work and the quotient written under the dividend and the divisor.
In other words, I think it would be a practical improvement on mathematical notation to get rid of the vinculum in all places except rational/fractional notation, and use parentheses as grouping symbols throughout.
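One practical argument for the parenthesized form of repeating decimals is that it can be typed and generated on a single line. Here is a minimal sketch in Python that performs the long division and wraps the repeating block in parentheses; `repeating_decimal` is a hypothetical helper of mine and assumes a non-negative numerator and a positive denominator.

```python
def repeating_decimal(numerator, denominator):
    """Decimal expansion of a fraction, with the repetend in parentheses.

    Assumes numerator >= 0 and denominator > 0.
    """
    whole, remainder = divmod(numerator, denominator)
    digits = []
    seen = {}  # remainder -> position in digits where it first appeared
    while remainder and remainder not in seen:
        seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, denominator)
        digits.append(str(digit))
    if remainder:
        # A remainder has come back, so the digits from its first
        # appearance onward repeat forever.
        start = seen[remainder]
        fractional = "".join(digits[:start]) + "(" + "".join(digits[start:]) + ")"
    else:
        fractional = "".join(digits)
    return f"{whole}.{fractional}" if fractional else str(whole)


print(repeating_decimal(1, 6))  # 0.1(6)
print(repeating_decimal(1, 3))  # 0.(3)
print(repeating_decimal(1, 4))  # 0.25
print(repeating_decimal(1, 7))  # 0.(142857)
```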
These examples of inconsistencies have focused specifically on the basic operations, but there are other inconsistencies in K-12 mathematics.
One of the most infamous involves the use of -1 to indicate both an inverse function in general and a reciprocal specifically. A frustration here is that the notation could be consistent… but it’s not. It is common to want the inverse sine, so \(\sin^{-1} x\) is notation for that. Historically, high school students typically learned about all six basic trigonometric functions; \(\csc x\) basically rendered \(\frac{1}{\sin x}\) something peripheral. However, with calculators, cotangent, secant, and cosecant are drifting away (the TI-84 doesn’t even have them in the function library), so \((\sin x)^{-1}\) is more likely to occur.
Meanwhile, it’s not very common to need the sine of the sine of x. Standard notation for applying a function twice is \(f^2(x)\), but since \(\sin(\sin x)\) isn’t very common, \(\sin^2(x)\) means \((\sin x)^2\) instead.
This is a mess. It’s an easily fixable mess — clearly declare that \(\sin^2(x) = \sin(\sin x)\). It’s a well-known mess. And it’s a mess we have simply failed to fix.
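Code cannot hide behind the superscript, so each of the four meanings needs its own spelling. A short sketch in Python with the standard `math` module (the variable names and the sample value 0.5 are my own choices) lays them side by side:

```python
import math

x = 0.5

squared = math.sin(x) ** 2         # what sin^2(x) conventionally means
composed = math.sin(math.sin(x))   # what f^2(x) would suggest
inverse = math.asin(x)             # what sin^{-1}(x) conventionally means
reciprocal = 1 / math.sin(x)       # what an exponent of -1 would suggest (csc x)

# Four different numbers from two superscripts.
print(squared, composed)
print(inverse, reciprocal)
```

With the same superscript position carrying an exponent in one case and a function inverse in the other, the written notation asks students to memorize which reading applies where.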
There are copious examples of such inconsistencies, and most of my examples have been restricted to within US mathematics. For that matter, the astute reader may have noticed that I haven’t shortened the word “mathematics” anywhere in this piece. That’s because we can’t even agree, in English, on the “proper” shortening: “Math” in the US, “maths” in the UK, with ardent arguments on both sides.
I’ll wrap up with an example from Calculus. The origin of modern Calculus is normally attributed to two thinkers, working independently but with an overlap of circles: Isaac Newton, in England, and Gottfried Leibniz, in Germany.
Newton and Leibniz offered different notation for the derivative. Modern calculus has a variety of ways of indicating derivatives, but three are most common. One of these is Leibniz’s, but the others are not Newton’s (Newton’s is still used in physics, but is not common in mathematics).
Leibniz used \(\frac{dy}{dx}\) to indicate the change of \(y\) with respect to \(x\). Newton used \(\dot{y}\). The prime notation of \(f'(x)\) was introduced by Leonhard Euler and popularized by Joseph-Louis Lagrange. This has since been adapted to \(y'\).
On the one hand, there is functionality to having two notations, one that explicitly marks the independent variable and one that leaves it implied. The shorter notation allows for easier algebraic manipulation, even if it does blur the identity of the underlying variable. The longer notation reinforces that variable, and can avoid errors with implicit differentiation.
On the other hand, though, our notation concerning functions is confusing as it is; this is another level of frustration.
Conclusion
Mathematics is not a language: It is three languages.
When we speak of the objective, culture-free truths of mathematics (if such truly exist), we’re speaking of the way in which nature communicates to itself. When we communicate, we cannot communicate with each other directly in that language.
For much of the human experience, our communication about mathematics was in a jargonized version of natural language. Communicating in language reveals a cultural bias. The book that gives us the words “algebra” (from the title) and “algorithm” (from the author) was published in the Ninth Century, but its contents were opaque in Western Europe until it was translated in the Twelfth Century. This is a cultural bias.
It was common during large periods of academic thought to publish in Latin. The effect was to make materials more accessible to European academics regardless of their native language, but it also created access barriers and reflected a cultural bias.
The proof that the sum of the squares of the legs of a right triangle equals the square of the hypotenuse pre-dates the Pythagorean school by centuries or longer, but the theorem is widely associated with a philosopher who may never have personally known it. This is cultural bias.
Language-specific jargon creates barriers for people who don’t speak the language on two levels: The basic level of the language itself, plus the advanced level of the jargon. And when that jargon does not have the consistency of definitions that is claimed, it becomes that much more difficult.
Most of our current system of mathematical notation is less than half a millennium old. Cajori dates the modern equality sign (=) to as recently as 1557 (sec. 261)!
And the system is a hodgepodge of conveniences, acknowledged as such in the mutterings of mathematicians who are resistant to cleaning up the system because, once you’ve learned a language, its messiness becomes an occasional nuisance instead of an impenetrable fortress.
While I would be in favor of an overhaul of our notation, my point in this piece is only to offer a moment to reflect on our notational and terminological inconsistencies. Our choices for which functions get operators and which do not, our continuous mistreatment of logarithms, and suchlike create a hierarchy of mathematical ideas. The ideas themselves may not all be cultural, but the hierarchy is.
Mathematics itself is arguably universal, logical, and consistent, but our way of talking about it is not.