Many moons ago – before I even owned a computer – I bought a copy of The C Workbook by Sathis Menon. I read through the book, teaching myself C, and practicing at school using Turbo C++, probably version 3.0. It’s an excellent book, better than others I had seen, and I was able to get through a lot of it – except pointers, for which I had to beg my friends on IRC for help. (Thanks phadthai, lithium, and bline in particular…)
I don’t quite remember why or how, but I stumbled upon the PC-SIG Library CD-ROM at my local library (back when you had to insert CD-ROMs into a ‘caddy’ – remember those?). In Disk #1337 (!), I found a copy of PCC, the Personal C Compiler.
I’m not actually sure whether it was the 12th or 13th edition, but I’m fairly sure it was one of those, as my copy of PCC was distributed in ZIP format, rather than ARC as was done in previous PC-SIG editions. (For the relatively sad tale of the compression wars, see episode 8, “Compression”, of The BBS Documentary [direct YouTube link].)
The late 1980s were a time of transition for the C programming language from the K&R C ‘standard’ (basically what was in the original 1978 edition of The C Programming Language by Kernighan & Ritchie) to the then-new ANSI standard (C89). PCC version 1.2b (in PC-SIG 8th Edition) has a date of 1988; 1.2c has a date of June 1989; and 1.2d has a date of January 1993. The PCC manual [2], p. 7, says (emphasis mine):
This manual describes the C Ware Personal C Compiler for the IBM-PC personal computer and the other MS-DOS based personal computers. It is based on the DeSmet C Development Package. If you are unfamiliar with the C language or UNIX, the book The C Programming Language (First edition – the Second edition contains features and enhancements of ANSI C not found in PCC) by Brian Kernighan and Dennis Ritchie is available.
So while using DeSmet C / PCC might seem anachronistic now, at the time of its release, it was current. But in later years, it was eclipsed in standards support by other compilers of the era, such as Turbo C, Microsoft C, and others.
Probably the most obvious difference between K&R C and ANSI C is function definitions. In modern C, a typical function definition might look like:
/* ANSI C */
int
main(int argc, char **argv)
{
/* ... */
return 0;
}
In K&R C – the standard that DeSmet C and PCC follow – the equivalent would be:
/* K&R C */
main(argc, argv)
int argc;
char **argv;
{
/* ... */
return 0;
}
(Note that the return type is optional, and was not specified here; this is because int is implied. More on that in a later section.)
If you want a definition that works for both standards, you can do something like:
/* K&R + ANSI C compatible */
int
#ifdef __STDC__
main(int argc, char **argv)
#else
main(argc, argv)
int argc;
char **argv;
#endif
{
/* ... */
return 0;
}
The modern C compilers I use define __STDC__, so it’s possible to detect that (or others, such as __GNUC__). Alternatively you could #define __PCC__ (or pass -n__PCC__ on the PCC command line), and detect that.
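The same __STDC__ test can be stretched to cover declarations as well. One approach seen in code from that era is a wrapper macro that keeps the parameter list under ANSI C and drops it otherwise. The sketch below assumes a hypothetical macro name, PARAMS, and borrows multiply() purely for illustration; nothing here is provided by PCC or its headers.

```c
/* K&R + ANSI C compatible declaration and definition (a sketch) */

/*
 * PARAMS is a hypothetical helper macro: it keeps the parameter list
 * when the compiler defines __STDC__, and drops it otherwise.
 */
#ifdef __STDC__
#define PARAMS(args) args
#else
#define PARAMS(args) ()
#endif

/*
 * Expands to "long multiply(int a, int b);" under ANSI C,
 * and to "long multiply();" under K&R C.
 */
long multiply PARAMS((int a, int b));

long
#ifdef __STDC__
multiply(int a, int b)
#else
multiply(a, b)
int a;
int b;
#endif
{
    return (long) a * b;
}
```

Note the doubled parentheses at the point of use: the whole parameter list is passed to the macro as a single argument.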
Formally, the definition of a function looks like ([1], pp. 67, 204–205):
return type, optional for int
name(argument list, if any)
argument declarations, if any
{
declarations and statements, if any
}
This one is perhaps a bit trickier to understand for programmers who know ANSI C well. ANSI C supports function prototypes, for example:
/* ANSI C */ long multiply(int, int);
This prototype shows the function return type, and the function argument type(s). You can name the arguments if you like, but it’s not required, for example:
/* ANSI C */ long multiply(int a, int b);
K&R C doesn’t have prototypes; it has function declarations. These two are equivalent:
/* K&R C */
long multiply(a, b);
long multiply();
The arguments are not actually important, and so are optional here. What is important is the return type – unless it’s int, in which case it can also be omitted:
/* K&R C */ add();
(Yes, that’s really enough!)
I find §4.4 (External Variables) of [1], p. 72, very interesting (emphasis mine):
A C program consists of a set of external objects, which are either variables or functions. The adjective “external” is used primarily in contrast to “internal,” which describes the arguments and automatic variables defined inside functions. […] Functions themselves are always external, because C does not allow functions to be defined inside other functions.
And in the following section, §4.5 (Scope Rules), pp. 76–77, two relevant quotes (emphasis as in the original):
[…] if an external variable [or, function] is to be referred to before it is defined, or if it is defined in a different source file from the one where it is being used, then an extern declaration is mandatory.
It is important to distinguish between the declaration of an external variable [or, function] and its definition. A declaration announces the properties of a variable [or, function] (its type, size, etc.); a definition also causes storage to be allocated.
Early on in the book (§1.10, p. 30), the authors explain how to avoid having to make function declarations altogether (emphasis as in the original):
In certain circumstances, the extern declaration can be omitted: if the external definition of a variable [or, function] occurs in the source file before its use in a particular function, then there is no need for an extern declaration in the function.
By way of example, here is a typical K&R-style function which precedes main() in the source file:
/*
* ex01.c (K&R C)
* a good example of defining a function before using it
*/
long
multiply(a, b)
int a;
int b;
{
return a * b;
}
main()
{
printf("6 * 7 = %ld\n", multiply(6, 7));
}
E:\NOTES>pcc ex01
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0029 code 000D data 1% utilization
E:\NOTES>pccl ex01 -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex01
6 * 7 = 42
E:\NOTES>
That’s good, and the output is as expected. But watch what happens if you put multiply() after main():
/*
* ex01a.c (K&R C)
* a bad example of not defining or declaring a function before using it
*/
main()
{
printf("6 * 7 = %ld\n", multiply(6, 7));
}
long
multiply(a, b)
int a;
int b;
{
return a * b;
}
E:\NOTES>pcc ex01a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
11 multiply(a, b $$ )
warning:conflicting types
Number of Warnings = 1
end of PCC 0028 code 000D data 1% utilization
E:\NOTES>pccl ex01a -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex01a
6 * 7 = -1310678
E:\NOTES>
Yikes! (And did you notice, this only elicited a warning, despite it having caused the wrong answer?) The problem is that main() had never seen multiply() before, and assumed that its return type was int (under the implied int rules).
Fortunately the fix is quite easy: just add a declaration of multiply()’s return type (long) before it is called:
/*
* ex01b.c (K&R C)
* a good example of declaring a function before using it
*/
long multiply();
main()
{
printf("6 * 7 = %ld\n", multiply(6, 7));
}
long
multiply(a, b)
int a;
int b;
{
return a * b;
}
E:\NOTES>pcc ex01b
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0029 code 000D data 1% utilization
E:\NOTES>pccl ex01b -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex01b
6 * 7 = 42
E:\NOTES>
Aside from not having to declare the function arguments, this is a very similar pattern to what we do today with ANSI C function prototypes.
However, in [1], the external function declaration was conventionally done inside the calling function itself. There really is no practical difference here, but it does look strange if you’re not used to seeing it:
/*
* ex01c.c (K&R C)
* an example of declaring a function before using it,
* using the conventions of The C Programming Language (1978)
*/
main()
{
long multiply();
printf("6 * 7 = %ld\n", multiply(6, 7));
}
long
multiply(a, b)
int a;
int b;
{
return a * b;
}
E:\NOTES>pcc ex01c
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0029 code 000D data 1% utilization
E:\NOTES>pccl ex01c -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex01c
6 * 7 = 42
E:\NOTES>
It occurs to me that in all of the preceding examples, I have been implicitly declaring printf() to have a return type of int, as it was the first time the function was called in this source file. (Since there are no function prototypes in K&R C, there’s little point in explicitly declaring functions that do return an int, such as printf().)
There are many cases in K&R C where an int was the implied data type, unless a different one was explicitly specified. Here are just a few:
f() { }                          /* f()'s return type is int */
f(a, b) { }                      /* a and b are ints */
f() { auto x; }                  /* x is an int */

As a type-modifier:
f() { unsigned u; }              /* u is an unsigned int */

An implied int also follows a storage-class:
f() { static q; register r; }    /* q and r are ints */

You can even declare an extern without any other decoration:
y;                               /* y is an int */
main() { y = 2; }

Numeric constants are ints by default, too:
f(t) long t; { /* ... */ }
main() { f(1); }                 /* 1 is an int (bad!) */

Thus you have to take care when calling functions that you match the expected argument type. More on this in a later section, but briefly, to quote from [1], pp. 35, 164:
Long constants are written in the style 123L. An ordinary integer constant that is too long to fit in an int is also taken to be a long.
0L […] could also be written as (long) 0.
This is safer:
f(t) long t; { /* ... */ }
main() { f(1L); }                /* 1L is a long */

C has sensible rules for mixing data types in arithmetic expressions, which I won’t repeat here. But the rules for type conversion when calling a function surprised me a bit. On p. 42 of [1], the authors write:
Since a function argument is an expression, type conversions also take place when arguments are passed to functions: in particular, char and short become int, and float becomes double. This is why we have declared function arguments to be int and double even when the function is called with char and float.
This applies also to the return types of functions. From pp. 69–70 (emphasis mine):
But what happens if a function must return some other type? […] To illustrate how to deal with this, let us write and use the function atof(s) […]
First, atof itself must declare the type of value it returns, since it is not int. Because float is converted to double in expressions, there is no point to saying that atof returns float; we might as well make use of the extra precision and thus we declare it to return double. […]
Second, and just as important, the calling routine must state that atof returns a non-int value. […]
Unless atof is explicitly declared in both places, C assumes that it returns an integer, and you’ll get nonsense answers. If atof itself and the call to it in main are typed inconsistently in the same source file, it will be detected by the compiler. But if (as is more likely) atof were compiled separately, the mismatch would not be detected, atof would return a double which main would treat as an int, and meaningless answers would result. […]
Notice the structure of the declarations and the return statement. The value of the expression in
return (expression)
is always converted to the type of the function before the return is taken.
Specifying the return type at the function definition is not likely something you’ll forget. But it’s easy to miss declaring the function return type in the calling function (or, more conventionally nowadays, declaring it some time before the calling function). We will see in the section Incorrect function return type what happens if this declaration is missing (spoiler: it’s not a good thing).
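The argument promotions quoted from p. 42 live on in ANSI C for any argument that is not covered by a prototype, including the variable part of a variadic function. Here is a minimal sketch in ANSI C (it relies on stdarg.h, so it is not meant for PCC, and sum_promoted() is a hypothetical name): the callee has to fetch each argument using the promoted type, not the type the caller wrote.

```c
/* ANSI C: observing the default argument promotions (a sketch) */
#include <stdarg.h>

/*
 * Adds up `pairs` pairs of (small integer, floating-point) arguments.
 * The variable arguments are promoted by the caller, so they must be
 * read back as int and double, whatever types were passed.
 */
double
sum_promoted(int pairs, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, pairs);
    for (i = 0; i < pairs; i++) {
        total += va_arg(ap, int);       /* a char or short arrives as int */
        total += va_arg(ap, double);    /* a float arrives as double */
    }
    va_end(ap);
    return total;
}
```

For instance, sum_promoted(1, (char) 3, 1.5f) passes a char and a float, yet the function reads an int and a double and returns 4.5.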
(Answer: When it’s really an int!)
There are two pitfalls when dealing with longs in K&R C. These are:
Actually this could apply to any data type whose length differs from that of int, though in practice it seems longs are more susceptible because there are fewer implicit type conversions that apply.
As mentioned in a previous section, numeric constants default to int type, unless they are too large to fit into an int (which is machine and compiler-dependent), or unless you specify them explicitly as a long. Take the following example:
/*
* ex02.c (K&R C)
* a bad example showing how not to pass a numeric
* constant into a function that expects a long
*/
f(a)
long a;
{
printf("%ld = 0x%0*lx\n", a, (int)(sizeof(a) * 2), a);
}
main()
{
f(1);
}
What is the expected output? I had hoped for: 1 = 0x00000001. What really happens:
E:\NOTES>pcc ex02
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 002C code 000F data 1% utilization
E:\NOTES>pccl ex02 -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex02
-1310719 = 0xFFEC0001
E:\NOTES>
While those numbers are indeed equal, it’s not what was passed into the function. So this caused a lot of consternation, until I looked at the assembly language output (passing the a option to get assembly output from PCC):
E:\NOTES>pcc ex02 a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
End of PCC 0% Utilization
E:\NOTES>

        CSEG
        PUBLIC f_
        PUBLIC printf_
        DSEG
__3     DB '%ld = 0x%0*lx',10,0
        CSEG
f_:
        PUSH BP
        MOV BP,SP
        PUSH WORD [BP+6]
        PUSH WORD [BP+4]
        MOV AX,8
        PUSH AX
        PUSH WORD [BP+6]
        PUSH WORD [BP+4]
        MOV AX,OFFSET __3
        PUSH AX
        CALL printf_
        MOV SP,BP
        POP BP
        RET
        PUBLIC main_
main_:
        PUSH BP
        MOV BP,SP
        MOV AX,1
        PUSH AX
        CALL f_
        MOV SP,BP
        POP BP
        RET
        END
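To make a long story short, main() is not passing enough data to f(): it pushes a single word, but f() reads two. So f() prints the word that was actually passed (0001), plus a word of whatever junk happens to sit above it on the stack (FFEC in this case). Going by the listing, that is most likely the BP value main_ saved on entry, since that is the last thing it pushed before the argument.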
Making the constant an actual long, either by adding L (the letter ‘ell’) after it, or by casting it as (long) will resolve this:
/*
* ex02a.c (K&R C)
* a good example showing how to pass a long constant into a function
*/
f(a)
long a;
{
printf("%ld = 0x%0*lx\n", a, (int)(sizeof(a) * 2), a);
}
/* updated main() */
main()
{
f(1L);
}
This generates the following assembly output (just showing main_; the two new instructions are marked):
main_:
        PUSH BP
        MOV BP,SP
        MOV DX,0000H    ; new
        MOV AX,1
        PUSH DX         ; new
        PUSH AX
        CALL f_
        MOV SP,BP
        POP BP
        RET
        END
E:\NOTES>pcc ex02a a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
End of PCC 0% Utilization
E:\NOTES>pcca ex02a
PCCA -- PCC Assembler V1.2b Copyright by Mary DeSmet 1988
end of PCCA 0030 code 000F data 1% utilization
E:\NOTES>pccl ex02a -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex02a
1 = 0x00000001
E:\NOTES>
Much better! The upshot: be sure to pass arguments that have the correct data type for the functions you call, whether they be variables or numeric constants.
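To see where ex02's bogus -1310719 came from, it may help to model how a 16-bit callee builds a long out of two stack words. The sketch below is plain modern C, not PCC code: it assumes stdint.h is available, and join_words() is a hypothetical helper. The high word 0xFFEC is simply the stray value that run found on the stack.

```c
/* a sketch: assembling a 32-bit long from two 16-bit stack words */
#include <stdint.h>

/*
 * Combine two 16-bit words the way f() reads its long argument:
 * the low word from [BP+4] and the high word from [BP+6].
 */
int32_t
join_words(uint16_t high, uint16_t low)
{
    uint32_t bits = ((uint32_t) high << 16) | low;

    /* convert to signed without relying on implementation-defined casts */
    if (bits >= 0x80000000UL)
        return (int32_t) (bits - 0x80000000UL) - 2147483647 - 1;
    return (int32_t) bits;
}
```

With the word that was really passed (0x0001) and the stray word above it (0xFFEC), join_words(0xFFEC, 0x0001) gives -1310719, matching the output of ex02. With a proper zero high word, join_words(0x0000, 0x0001) gives 1.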
You can’t always rely on the function definition to promote the types you used into the types the function expects. Take the following example:
/*
* ex02b.c (K&R C)
* another bad example of how not to call a function with a long argument
*/
f(a)
long a;
{
printf("%ld = 0x%0*lx\n", a, (int)(sizeof(a) * 2), a);
}
main()
{
int t = 1;
f(t);
}
E:\NOTES>pcc ex02b a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
End of PCC 0% Utilization
E:\NOTES>pcca ex02b
PCCA -- PCC Assembler V1.2b Copyright by Mary DeSmet 1988
end of PCCA 0036 code 000F data 1% utilization
E:\NOTES>pccl ex02b -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex02b
65537 = 0x00010001
E:\NOTES>
I won’t bore you with the assembly, but suffice it to say that main_ is again not actually sending the full long to f_. A cast to (long) is the fix:
/*
* ex02c.c (K&R C)
* a good example showing how to pass a long to a function
* regardless of its original data type
*/
f(a)
long a;
{
printf("%ld = 0x%0*lx\n", a, (int)(sizeof(a) * 2), a);
}
main()
{
int t = 1;
f((long) t);
}
E:\NOTES>pcc ex02c a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
End of PCC 0% Utilization
E:\NOTES>pcca ex02c
PCCA -- PCC Assembler V1.2b Copyright by Mary DeSmet 1988
end of PCCA 0039 code 000F data 1% utilization
E:\NOTES>pccl ex02c -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex02c
1 = 0x00000001
E:\NOTES>
To reiterate: you must ensure that the data you pass into functions has the correct type. The compiler will not reliably do this for you! (There are indeed cases where it might, but better to be explicit about it than to chance it.)
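For contrast, here is a sketch of how an ANSI C prototype removes this particular trap: with the prototype in scope, the compiler converts an int argument to long on its own. The names scale() and scale_int() are hypothetical, and none of this applies to PCC, which has no prototypes.

```c
/* ANSI C: a prototype makes the compiler convert the argument (a sketch) */

long scale(long a);             /* prototype: the argument is a long */

long
scale_int(int t)
{
    return scale(t);            /* t is converted to long at the call */
}

long
scale(long a)
{
    return a * 100000L;         /* needs more than 16 bits for a >= 1 */
}
```

Here scale_int(3) returns 300000 with no cast at the call site. Under K&R rules the same call would need to be written as scale((long) t).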
Unless otherwise specified, functions are “assumed to return an int.” [1], p. 68. We’ve already seen this in both Function definitions and Function declarations above.
While convenient – in the sense that int is a sane default, likely the overwhelmingly most frequently-used type – this assumption is somewhat hazardous when you do want to use a non-int return type. It’s easy to just forget to specify the return type in either the function definition, or more insidiously, in the calling function. And, unless both the called function and the caller are in the same file, the compiler is not likely to warn you, but will just silently accept that the called function will return an int.
Let’s see what happens if we try this on a modern C compiler:
/* ex03.c (ANSI C) */
#include <stdio.h>
main()
{
printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
}
long
f(int n)
{
return n * n;
}
% cc -std=gnu89 -Wall -Wno-error=implicit-function-declaration ex03.c
ex03.c:4:1: warning: type specifier missing, defaults to
'int' [-Wimplicit-int]
4 | main()
| ^
ex03.c:6:26: warning: implicit declaration of function 'f'
[-Wimplicit-function-declaration]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ^
ex03.c:6:26: warning: format specifies type 'long' but the
argument has type 'int' [-Wformat]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ~~~ ^~~~
| %d
ex03.c:6:57: warning: format specifies type 'unsigned long'
but the argument has type 'int' [-Wformat]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ~~~~~ ^~~~
| %0*x
ex03.c:10:1: error: conflicting types for 'f'
10 | f(int n)
| ^
ex03.c:6:26: note: previous implicit declaration is here
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ^
4 warnings and 1 error generated.
%
There are a few things we can learn from this:
Actually only the last item is fatal, and is only possible to detect because f() is defined in the same file as its calling function, main(). What if f() were defined in a separate file?
/* ex03a_main.c */
#include <stdio.h>
main()
{
    printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
}

/* ex03a_f.c (ANSI C) */
long
f(int n)
{
    return n * n;
}
% cc -std=gnu89 -Wall -Wno-error=implicit-function-declaration ex03a_main.c ex03a_f.c
ex03a_main.c:4:1: warning: type specifier missing, defaults
to 'int' [-Wimplicit-int]
4 | main()
| ^
ex03a_main.c:6:26: warning: implicit declaration of
function 'f' [-Wimplicit-function-declaration]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ^
ex03a_main.c:6:26: warning: format specifies type 'long'
but the argument has type 'int' [-Wformat]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ~~~ ^~~~
| %d
ex03a_main.c:6:57: warning: format specifies type
'unsigned long' but the argument has type 'int' [-Wformat]
6 | printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
| ~~~~~ ^~~~
| %0*x
4 warnings generated.
% ./a.out
36 = 0000000000000024
%
Hey, it worked! So, is this okay to do?
No! It just happened to work in this one very contrived test case, and there’s no guarantee it will work in a real scenario. This is why the compiler is warning you. Weaselling your way around compiler errors with ‘clever’ tricks is not likely to create stable programs.
What is PCC’s behaviour in this scenario?
/* ex03b.c (K&R C) */
main()
{
printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
}
long
f(n)
int n;
{
return n * n;
}
E:\NOTES>pcc ex03b
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
8 f(n $$ )
warning:conflicting types
Number of Warnings = 1
end of PCC 0033 code 000D data 1% utilization
E:\NOTES>pccl ex03b -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex03b
524324 = 0000000000000000000000000000166EFFEC
E:\NOTES>
To its credit, PCC gave us a warning (warning:conflicting types), but because this is C, it let us proceed anyhow. And the program behaved badly, as expected.
Like with a modern compiler, separating the functions into different source files hampers PCC’s ability to detect this mismatch:
/* ex03c_m.c */
main()
{
    printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
}

/* ex03c_f.c (K&R C) */
long
f(n)
int n;
{
    return n * n;
}

E:\NOTES>pcc ex03c_m
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0027 code 000D data 1% utilization
E:\NOTES>pcc ex03c_f
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 000C code 0000 data 1% utilization
E:\NOTES>pccl ex03c_m ex03c_f -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 12% utilization
E:\NOTES>ex03c_m
524324 = 0000000000000000000000000000166EFFEC
E:\NOTES>
No warnings; but the result is still bogus.
Adding a function declaration fixes the glitch:
/* ex03d.c (K&R C) */
long f();
main()
{
    printf("%ld = %0*lx\n", f(6), (int)(sizeof(long) * 2), f(6));
}
long
f(n)
int n;
{
    return n * n;
}

E:\NOTES>pcc ex03d
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0035 code 000D data 1% utilization
E:\NOTES>pccl ex03d -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex03d
36 = 00000024
E:\NOTES>
This library function really sent me down a rabbit hole. Until I did a lot more reading and testing, I seriously thought PCC had a bug in its atol() implementation. Here’s the minimal test case I created:
/* ex04.c */
main()
{
    long a;
    char *s = "0118999";
    a = atol(s);
    printf("%ld = %0*lx\n", a, (int)(sizeof(long) * 2), a);
}

E:\NOTES>pcc ex04
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0039 code 0015 data 1% utilization
E:\NOTES>pccl ex04 -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex04
-12073 = FFFFD0D7
E:\NOTES>
I could not for the life of me understand why I was getting back 0xffffd0d7 rather than 0x0001d0d7.
After much reading, and discussion on IRC (thanks again, phadthai :-), I stumbled across this blurb in the PCC manual [2], p. 35:
7.9.4. atol()
char *cp;
long lval, atol();
lval = atol(cp);
As someone who had never encountered K&R-style function declarations within a function before, I did not understand the purpose of this seemingly errant function call. Now I know that it’s not a function call at all, but a declaration of the function atol(), giving its return type as long.
In PCC, the atol() function declaration exists in math.h, which seems a bit strange to me; usually it can be found in stdlib.h. Nevertheless, if you #include <math.h>, or simply add long atol(); anywhere in your code before you first call atol(), that will be enough to have it return correctly.
/* ex04a.c */
main()
{
    long a, atol();
    char *s = "0118999";
    a = atol(s);
    printf("%ld = %0*lx\n", a, (int)(sizeof(long) * 2), a);
}

E:\NOTES>pcc ex04a
PCC Compiler V1.2d Copyright by Mark DeSmet, 1993
end of PCC 0038 code 0015 data 1% utilization
E:\NOTES>pccl ex04a -ld:
PCCL Linker for PCC V1.2d - Copyright Mark DeSmet, 1993
end of PCCL 10% utilization
E:\NOTES>ex04a
118999 = 0001D0D7
E:\NOTES>
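In hindsight, the bogus value from ex04 is consistent with the caller treating atol() as returning an int: only the low 16 bits of the result are kept, and these are then sign-extended back into the long variable. The sketch below models that arithmetic in portable C; as_16bit_int() is a hypothetical helper for illustration, not a PCC library function.

```c
/* a sketch: what survives when a long is treated as a 16-bit int */

/*
 * Keep only the low 16 bits of value, then sign-extend them to a long,
 * as happens when a long-returning function is assumed to return int
 * on a compiler with 16-bit ints.
 */
long
as_16bit_int(long value)
{
    unsigned long low = (unsigned long) value & 0xFFFFUL;

    if (low >= 0x8000UL)
        return (long) low - 0x10000L;   /* top bit set: negative */
    return (long) low;
}
```

Feeding it the expected result, as_16bit_int(118999L) gives -12073: the low word of 0x0001D0D7 is 0xD0D7, which has its top bit set and so sign-extends to 0xFFFFD0D7. Small values such as 36 pass through unchanged, which is why this kind of mistake can go unnoticed for a long time.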
Books and manuals:
Websites:
Feel free to contact me with any questions, comments, or feedback.