Numbers, Characters, and Strings

Numbers, Characters, and Strings

10. Numbers, Characters, and Strings

While functions, variables, macros, and 25 special operators provide the basic building blocks of the language itself, the building blocks of your programs will be the data structures you use. As Fred Brooks observed in The Mythical Man-Month. Representation is the essence of programming. 1

Common Lisp provides built-in support for most of the data types typically found in modern languages: numbers (integer, floating point, and complex), characters, strings, arrays (including multidimensional arrays), lists, hash tables, input and output streams, and an abstraction for portably representing filenames. Functions are also a first-class data type in Lisp—they can be stored in variables, passed as arguments, returned as return values, and created at runtime.

And these built-in types are just the beginning. They’re defined in the language standard so programmers can count on them being available and because they tend to be easier to implement efficiently when tightly integrated with the rest of the implementation. But, as you’ll see in later chapters, Common Lisp also provides several ways for you to define new data types, define operations on them, and integrate them with the built-in data types.

For now, however, you can start with the built-in data types. Because Lisp is a high-level language, the details of exactly how different data types are implemented are largely hidden. From your point of view as a user of the language, the built-in data types are defined by the functions that operate on them. So to learn a data type, you just have to learn about the functions you can use with it. Additionally, most of the built-in data types have a special syntax that the Lisp reader understands and that the Lisp printer uses. That’s why, for instance, you can write strings as foo ; numbers as 123. 1/23. and 1.23 ; and lists as (a b c). I’ll describe the syntax for different kinds of objects when I describe the functions for manipulating them.

In this chapter, I’ll cover the built-in scalar data types: numbers, characters, and strings. Technically, strings aren’t true scalars—a string is a sequence of characters, and you can access individual characters and manipulate strings with a function that operates on sequences. But I’ll discuss strings here because most of the string-specific functions manipulate them as single values and also because of the close relation between several of the string functions and their character counterparts.

Numbers

Math, as Barbie says, is hard. 2 Common Lisp can’t make the math part any easier, but it does tend to get in the way a lot less than other programming languages. That’s not surprising given its mathematical heritage. Lisp was originally designed by a mathematician as a tool for studying mathematical functions. And one of the main projects of the MAC project at MIT was the Macsyma symbolic algebra system, written in Maclisp, one of Common Lisp’s immediate predecessors. Additionally, Lisp has been used as a teaching language at places such as MIT where even the computer science professors cringe at the thought of telling their students that 10/4 = 2. leading to Lisp’s support for exact ratios. And at various times Lisp has been called upon to compete with FORTRAN in the high-performance numeric computing arena.

One of the reasons Lisp is a nice language for math is its numbers behave more like true mathematical numbers than the approximations of numbers that are easy to implement in finite computer hardware. For instance, integers in Common Lisp can be almost arbitrarily large rather than being limited by the size of a machine word. 3 And dividing two integers results in an exact ratio, not a truncated value. And since ratios are represented as pairs of arbitrarily sized integers, ratios can represent arbitrarily precise fractions. 4

On the other hand, for high-performance numeric programming, you may be willing to trade the exactitude of rationals for the speed offered by using the hardware’s underlying floating-point operations. So, Common Lisp also offers several types of floating-point numbers, which are mapped by the implementation to the appropriate hardware-supported floating-point representations. 5 Floats are also used to represent the results of a computation whose true mathematical value would be an irrational number.

Finally, Common Lisp supports complex numbers—the numbers that result from doing things such as taking square roots and logarithms of negative numbers. The Common Lisp standard even goes so far as to specify the principal values and branch cuts for irrational and transcendental functions on the complex domain.

Numeric Literals

You can write numeric literals in a variety of ways; you saw a few examples in Chapter 4. However, it’s important to keep in mind the division of labor between the Lisp reader and the Lisp evaluator—the reader is responsible for translating text into Lisp objects, and the Lisp evaluator then deals only with those objects. For a given number of a given type, there can be many different textual representations, all of which will be translated to the same object representation by the Lisp reader. For instance, you can write the integer 10 as 10. 20/2. #xA. or any of a number of other ways, but the reader will translate all these to the same object. When numbers are printed back out—say, at the REPL—they’re printed in a canonical textual syntax that may be different from the syntax used to enter the number. For example:

The syntax for integer values is an optional sign ( + or — ) followed by one or more digits. Ratios are written as an optional sign and a sequence of digits, representing the numerator, a slash ( / ), and another sequence of digits representing the denominator. All rational numbers are canonicalized as they’re read—that’s why 10 and 20/2 are both read as the same number, as are 3/4 and 6/8. Rationals are printed in reduced form—integer values are printed in integer syntax and ratios with the numerator and denominator reduced to lowest terms.

It’s also possible to write rationals in bases other than 10. If preceded by #B or #b. a rational literal is read as a binary number with 0 and 1 as the only legal digits. An #O or #o indicates an octal number (legal digits 0 — 7 ), and #X or #x indicates hexadecimal (legal digits 0 — F or 0 — f ). You can write rationals in other bases from 2 to 36 with #nR where n is the base (always written in decimal). Additional digits beyond 9 are taken from the letters A — Z or a — z. Note that these radix indicators apply to the whole rational—it’s not possible to write a ratio with the numerator in one base and denominator in another. Also, you can write integer values, but not ratios, as decimal digits terminated with a decimal point. 6 Some examples of rationals, with their canonical, decimal representation are as follows:

You can also write floating-point numbers in a variety of ways. Unlike rational numbers, the syntax used to notate a floating-point number can affect the actual type of number read. Common Lisp defines four subtypes of floating-point number: short, single, double, and long. Each subtype can use a different number of bits in its representation, which means each subtype can represent values spanning a different range and with different precision. More bits gives a wider range and more precision. 7

The basic format for floating-point numbers is an optional sign followed by a nonempty sequence of decimal digits possibly with an embedded decimal point. This sequence can be followed by an exponent marker for computerized scientific notation. 8 The exponent marker consists of a single letter followed by an optional sign and a sequence of digits, which are interpreted as the power of ten by which the number before the exponent marker should be multiplied. The letter does double duty: it marks the beginning of the exponent and indicates what floating- point representation should be used for the number. The exponent markers s. f. d. l (and their uppercase equivalents) indicate short, single, double, and long floats, respectively. The letter e indicates that the default representation (initially single-float) should be used.

Numbers with no exponent marker are read in the default representation and must contain a decimal point followed by at least one digit to distinguish them from integers. The digits in a floating-point number are always treated as base 10 digits—the #B. #X. #O. and #R syntaxes work only with rationals. The following are some example floating-point numbers along with their canonical representation:

Finally, complex numbers are written in their own syntax, namely, #C or #c followed by a list of two real numbers representing the real and imaginary part of the complex number. There are actually five kinds of complex numbers because the real and imaginary parts must either both be rational or both be the same kind of floating-point number.

But you can write them however you want—if a complex is written with one rational and one floating-point part, the rational is converted to a float of the appropriate representation. Similarly, if the real and imaginary parts are both floats of different representations, the one in the smaller representation will be upgraded .

However, no complex numbers have a rational real component and a zero imaginary part—since such values are, mathematically speaking, rational, they’re represented by the appropriate rational value. The same mathematical argument could be made for complex numbers with floating-point components, but for those complex types a number with a zero imaginary part is always a different object than the floating-point number representing the real component. Here are some examples of numbers written the complex number syntax:

Basic Math

The basic arithmetic operations—addition, subtraction, multiplication, and division—are supported for all the different kinds of Lisp numbers with the functions +. -. *. and /. Calling any of these functions with more than two arguments is equivalent to calling the same function on the first two arguments and then calling it again on the resulting value and the rest of the arguments. For example, (+ 1 2 3) is equivalent to (+ (+ 1 2) 3). With only one argument, + and * return the value; - returns its negation and / its reciprocal. 9

If all the arguments are the same type of number (rational, floating point, or complex), the result will be the same type except in the case where the result of an operation on complex numbers with rational components yields a number with a zero imaginary part, in which case the result will be a rational. However, floating-point and complex numbers are contagious —if all the arguments are reals but one or more are floating-point numbers, the other arguments are converted to the nearest floating-point value in a largest floating-point representation of the actual floating-point arguments. Floating-point numbers in a smaller representation are also converted to the larger representation. Similarly, if any of the arguments are complex, any real arguments are converted to the complex equivalents.

Because / doesn’t truncate, Common Lisp provides four flavors of truncating and rounding for converting a real number (rational or floating point) to an integer: FLOOR truncates toward negative infinity, returning the largest integer less than or equal to the argument. CEILING truncates toward positive infinity, returning the smallest integer greater than or equal to the argument. TRUNCATE truncates toward zero, making it equivalent to FLOOR for positive arguments and to CEILING for negative arguments. And ROUND rounds to the nearest integer. If the argument is exactly halfway between two integers, it rounds to the nearest even integer.

Two related functions are MOD and REM. which return the modulus and remainder of a truncating division on real numbers. These two functions are related to the FLOOR and TRUNCATE functions as follows:

Thus, for positive quotients they’re equivalent, but for negative quotients they produce different results. 10

The functions 1+ and 1- provide a shorthand way to express adding and subtracting one from a number. Note that these are different from the macros INCF and DECF. 1+ and 1- are just functions that return a new value, but INCF and DECF modify a place. The following equivalences show the relation between INCF / DECF. 1+ / 1-. and + / - :

Numeric Comparisons

The function = is the numeric equality predicate. It compares numbers by mathematical value, ignoring differences in type. Thus, = will consider mathematically equivalent values of different types equivalent while the generic equality predicate EQL would consider them inequivalent because of the difference in type. (The generic equality predicate EQUALP. however, uses = to compare numbers.) If it’s called with more than two arguments, it returns true only if they all have the same value. Thus:

The /= function, conversely, returns true only if all its arguments are different values.

The functions . . =. and = order rationals and floating-point numbers (in other words, the real numbers.) Like = and /=. these functions can be called with more than two arguments, in which case each argument is compared to the argument to its right.

To pick out the smallest or largest of several numbers, you can use the function MIN or MAX. which takes any number of real number arguments and returns the minimum or maximum value.

Some other handy functions are ZEROP. MINUSP. and PLUSP. which test whether a single real number is equal to, less than, or greater than zero. Two other predicates, EVENP and ODDP. test whether a single integer argument is even or odd. The P suffix on the names of these functions is a standard naming convention for predicate functions, functions that test some condition and return a boolean.

Higher Math

The functions you’ve seen so far are the beginning of the built-in mathematical functions. Lisp also supports logarithms: LOG ; exponentiation: EXP and EXPT ; the basic trigonometric functions: SIN. COS. and TAN ; their inverses: ASIN. ACOS. and ATAN ; hyperbolic functions: SINH. COSH. and TANH ; and their inverses: ASINH. ACOSH. and ATANH. It also provides functions to get at the individual bits of an integer and to extract the parts of a ratio or a complex number. For a complete list, see any Common Lisp reference.

Characters

Common Lisp characters are a distinct type of object from numbers. That’s as it should be—characters are not numbers, and languages that treat them as if they are tend to run into problems when character encodings change, say, from 8-bit ASCII to 21-bit Unicode. 11 Because the Common Lisp standard didn’t mandate a particular representation for characters, today several Lisp implementations use Unicode as their native character encoding despite Unicode being only a gleam in a standards body’s eye at the time Common Lisp’s own standardization was being wrapped up.

The read syntax for characters objects is simple: # followed by the desired character. Thus, #x is the character x. Any character can be used after the #. including otherwise special characters such as . (. and whitespace. However, writing whitespace characters this way isn’t very (human) readable; an alternative syntax for certain characters is # followed by the character’s name. Exactly what names are supported depends on the character set and on the Lisp implementation, but all implementations support the names Space and Newline. Thus, you should write #Space instead of #. though the latter is technically legal. Other semistandard names (that implementations must use if the character set has the appropriate characters) are Tab. Page. Rubout. Linefeed. Return. and Backspace .

Character Comparisons

The main thing you can do with characters, other than putting them into strings (which I’ll get to later in this chapter), is to compare them with other characters. Since characters aren’t numbers, you can’t use the numeric comparison functions, such as and . Instead, two sets of functions provide character-specific analogs to the numeric comparators; one set is case-sensitive and the other case-insensitive.

The case-sensitive analog to the numeric = is the function CHAR=. Like =. CHAR= can take any number of arguments and returns true only if they’re all the same character. The case- insensitive version is CHAR-EQUAL .

The rest of the character comparators follow this same naming scheme: the case-sensitive comparators are named by prepending the analogous numeric comparator with CHAR ; the case-insensitive versions spell out the comparator name, separated from the CHAR with a hyphen. Note, however, that = and = are spelled out with the logical equivalents NOT-GREATERP and NOT-LESSP rather than the more verbose LESSP-OR-EQUALP and GREATERP-OR-EQUALP. Like their numeric counterparts, all these functions can take one or more arguments. Table 10-1 summarizes the relation between the numeric and character comparison functions.

Table 10-1. Character Comparison Functions


Leave a Reply