Thursday, May 2, 2013

High-level rats

In my previous post, I talked about integers in high-level languages.  In this post, I'm going to talk about rational numbers.

Like integers, you're already familiar with rational numbers, although you may refer to them as fractions.  They are numbers like 3/2, ratios of integers.

You know properties of rational numbers, such as: closed under addition, subtraction and multiplication.  And closed under division, except for division by zero.  And so on.  And, of course, the result of dividing two integers is a rational number, again excepting division by zero.  Some rational numbers are also integers; but many are not.

As with integers, another property is that rational numbers can be exactly represented on a computer in such a way that basic math can be performed efficiently.

And, in fact, many languages, in the same library as "bigint", have a "bigrat" or "rational" or "fraction" type.  But, even in most languages in which "int" is actually an integer, the rational type is usually buried.  The only exception I've found is Smalltalk.  In Smalltalk, as you would expect, the result of dividing two integers is a rational number.

Sadly, in almost every other language I've looked at, the result of dividing two integers is an integer; and therefore, is usually wrong.  (Obviously, this only applies to languages with any integer types.)  In cases where the dividend (i.e., the numerator) is actually divisible by the divisor (the denominator), the result is correct.  In other cases (again, ignoring division by zero), the result is one of the two integers nearest the correct answer; which one depends on the language.

The options seem to be: truncate toward zero, truncate down, or round.   I've never seen any language deciding that 31/10 = 4, although technically that's no worse than 39/10 = 3. Truncate toward zero and truncate down only differ for negative numbers; these seem much more common than rounding.  And rounding exact halves also has options: round down, round up, round toward zero, round away from zero, or round toward the even number.

There are two problems with getting the wrong answer when you divide two integers. The first, of course, is that you have the wrong answer.  The second is that you've confused the programmer.

You may remember the shock you felt when "integer division" in a programming language was first explained to you; or worse, when you accidentally discovered it on your own.  You may have said to yourself, "This is not my high-level language.  How did this integer get here?"  Or perhaps you have had the unpleasant and guilty experience of explaining these shenanigans to someone else.

Here's a general rule: if you're explaining to someone something which they already know, it should be pretty easy; or there should be a good reason it isn't (for example, my next post will discuss real numbers which are actually tricky on a computer).  People know how to divide integers before they start programming; all that you should need to explain to them is that in most programming languages you need to use the forward slash character to divide.  That should be it; they should already know the rest.  If you're explaining that the rules have changed (for no good reason), there's a problem.

Aside from producing wrong answers, it also makes it hard to reason about programs.  In normal math, a/b = (1/b)*a; and (a/b)*(c/d) = (c/b)*(a/d) = (a*c)/(b*d) etc.  This is not true in most programming languages; only Smalltalk gets this right.

Fix this, and a class of bugs goes away, a set of optimizations becomes available, and a pool of confusion evaporates.

So kudos to Smalltalk, and shame on all of the so-called high-level languages that get this wrong.

But even Smalltalk doesn't get it quite right.  We all know that 3.1 is the same as 31/10; that is, it's a rational number.  Or we used to know that before we learned how to program.  Because that's not true in any programming language I can find.  Most programming languages treat 3.1 as an inexact number, a floating point number with limited precision.  And usually that number is stored in binary (because of the bit-orientation of numeric types even in supposedly high-level languages).  And binary can't even exactly represent 1/10.

A few languages consider 3.1 to be a decimal floating point number; there are still issues with the limited precision once you start doing math, but at least 3.1 is equal to 31/10.

In languages where 3.1 is a binary floating point number, simple literal expression can generate incorrect results.  For instance, if I type "0.1 + 0.2 - 0.3" into my gnu-smalltalk, it tells me "5.551115123125783d-17", which isn't the correct answer.  But it gets "(1/10) + (2/10) - (3/10)" correct.  It prints "0", which also happens to be the number of languages I can find that get support for integers and rationals right, including literals.

In summary, high-level languages should have easy support for rational numbers, because they are easy to work with; and in particular, the quotient of two integers should be a rational number; and numeric literals with a decimal place should be rational numbers.  To me this all seems fairly basic, and pretty obvious, but it seems to have slipped past quite a few language designers.  Which doesn't seem rational.

Oh, and let's call this type "rat" so it's as easy to type as "int" and more fun to say.  "The function chooses the smallest of a set of rats."

In the next post, I'll talk about real numbers.  Really.  Sorry.

3 comments:

  1. > Which doesn't seem rational.

    Most language focus on CPU performance so it isn't that surprising..
    My take on the issue is that I like interval arithmetic (a la Frink) instead of the classic floating point computation so I wonder if rational are really needed when you have interval arithmetic..

    ReplyDelete
  2. Interval arithmetic is very good, although you still have to worry about representation of the endpoints.

    I don't know that performance would be that different on rational numbers than floating point for most usages.

    ReplyDelete
  3. In racket

    (/ 2 3) is equal to 2/3

    but this

    (= 3.1 (/ 31 10)) is false, yet this
    (= 3.1 (/ 31 10.0)) is true

    so put it in the Smalltalk bucket.

    ReplyDelete