Saturday, September 29, 2007




Structure of a program

Probably the best way to start learning a programming language is by writing a program. Therefore, here is our first program:
// my first program in C++
#include <iostream>
using namespace std;
int main ()
{
cout << "Hello World!";
return 0;
}
Hello World!
The first panel shows the source code for our first program. The second one shows the result of the program once compiled and executed. The way to edit and compile a program depends on the compiler you are using, on its version, and on whether it has a development interface. Consult the compilers section and the manual or help included with your compiler if you have doubts about how to compile a C++ console program.
The previous program is the typical program that programmer apprentices write for the first time, and its result is the printing on screen of the "Hello World!" sentence. It is one of the simplest programs that can be written in C++, but it already contains the fundamental components that every C++ program has. We are going to look line by line at the code we have just written:
// my first program in C++
This is a comment line. All lines beginning with two slash signs (//) are considered comments and do not have any effect on the behavior of the program. The programmer can use them to include short explanations or observations within the source code itself. In this case, the line is a brief description of what our program is.
#include <iostream>
Lines beginning with a hash sign (#) are directives for the preprocessor. They are not regular code lines with expressions but indications for the compiler's preprocessor. In this case the directive #include <iostream> tells the preprocessor to include the iostream standard file. This specific file (iostream) includes the declarations of the basic standard input-output library in C++, and it is included because its functionality is going to be used later in the program.
using namespace std;
All the elements of the standard C++ library are declared within what is called a namespace, the namespace with the name std. So in order to access its functionality we declare with this expression that we will be using these entities. This line is very frequent in C++ programs that use the standard library, and in fact it will be included in most of the source codes included in these tutorials.
int main ()
This line corresponds to the beginning of the definition of the main function. The main function is the point where all C++ programs start their execution, independently of its location within the source code. It does not matter whether there are other functions with other names defined before or after it: the instructions contained within this function's definition will always be the first ones to be executed in any C++ program. For that same reason, it is essential that all C++ programs have a main function.
The word main is followed in the code by a pair of parentheses (()). That is because it is a function declaration: In C++, what differentiates a function declaration from other types of expressions are these parentheses that follow its name. Optionally, these parentheses may enclose a list of parameters within them.
Right after these parentheses we can find the body of the main function enclosed in braces ({}). What is contained within these braces is what the function does when it is executed.
cout << "Hello World!";
This line is a C++ statement. A statement is a simple or compound expression that can actually produce some effect. In fact, this statement performs the only action that generates a visible effect in our first program.
cout represents the standard output stream in C++, and the meaning of the entire statement is to insert a sequence of characters (in this case the "Hello World!" sequence of characters) into the standard output stream (which usually is the screen).
cout is declared in the iostream standard file within the std namespace, so that's why we needed to include that specific file and to declare that we were going to use this specific namespace earlier in our code.
Notice that the statement ends with a semicolon character (;). This character is used to mark the end of the statement and in fact it must be included at the end of all expression statements in all C++ programs (one of the most common syntax errors is indeed to forget to include some semicolon after a statement).
return 0;
The return statement causes the main function to finish. return may be followed by a return code (in our example it is followed by the return code 0). A return code of 0 for the main function is generally interpreted as meaning that the program worked as expected, without any errors during its execution. This is the most usual way to end a C++ console program.
You may have noticed that not all the lines of this program perform actions when the code is executed. There were lines containing only comments (those beginning with //). There were lines with directives for the compiler's preprocessor (those beginning with #). Then there were lines that began the declaration of a function (in this case, the main function) and, finally, lines with statements (like the insertion into cout), which were all included within the block delimited by the braces ({}) of the main function.
The program has been structured in different lines in order to be more readable, but in C++, we do not have strict rules on how to separate instructions in different lines. For example, instead of
int main ()
{
cout << "Hello World!";
return 0;
}
We could have written:
int main () { cout << "Hello World!"; return 0; }
All in just one line and this would have had exactly the same meaning as the previous code.
In C++, the separation between statements is specified with an ending semicolon (;) at the end of each one, so the separation in different code lines does not matter at all for this purpose. We can write many statements per line or write a single statement that takes many code lines. The division of code in different lines serves only to make it more legible and schematic for the humans that may read it.
Let us add an additional instruction to our first program:
// my second program in C++
#include <iostream>
using namespace std;
int main ()
{
cout << "Hello World! ";
cout << "I'm a C++ program";
return 0;
}
Hello World! I'm a C++ program
In this case, we performed two insertions into cout in two different statements. Once again, the separation into different lines of code has been done just to make the program more readable, since main would have been perfectly valid if defined this way:
int main () { cout << "Hello World! "; cout << "I'm a C++ program"; return 0; }
We were also free to divide the code into more lines if we considered it more convenient:
int main ()
{
cout <<
"Hello World! ";
cout
<< "I'm a C++ program";
return 0;
}
And the result would again have been exactly the same as in the previous examples.
Preprocessor directives (those that begin with #) are outside this general rule since they are not statements. They are lines read and processed by the preprocessor and do not produce any code by themselves. Preprocessor directives must be specified on their own line and do not have to end with a semicolon (;).
Comments
Comments are parts of the source code disregarded by the compiler. They simply do nothing. Their purpose is only to allow the programmer to insert notes or descriptions embedded within the source code.
C++ supports two ways to insert comments:
// line comment
/* block comment */
The first of them, known as line comment, discards everything from where the pair of slash signs (//) is found up to the end of that same line. The second one, known as block comment, discards everything between the /* characters and the first appearance of the */ characters, with the possibility of including more than one line.
We are going to add comments to our second program:
/* my second program in C++
with more comments */
#include <iostream>
using namespace std;
int main ()
{
cout << "Hello World! "; // prints Hello World!
cout << "I'm a C++ program"; // prints I'm a C++ program
return 0;
}
Hello World! I'm a C++ program
If you include comments within the source code of your programs without using the comment characters combinations //, /* or */, the compiler will take them as if they were C++ expressions, most likely causing one or several error messages when you compile it.

The usefulness of the "Hello World" programs shown in the previous section is quite questionable. We had to write several lines of code, compile them, and then execute the resulting program just to obtain a simple sentence written on the screen as result. It certainly would have been much faster to type the output sentence ourselves. However, programming is not limited only to printing simple texts on the screen. In order to go a little further and become able to write programs that perform useful tasks that really save us work, we need to introduce the concept of a variable.
Let us think that I ask you to retain the number 5 in your mental memory, and then I ask you to memorize also the number 2 at the same time. You have just stored two different values in your memory. Now, if I ask you to add 1 to the first number I said, you should be retaining the numbers 6 (that is 5+1) and 2 in your memory. Values that we could now for example subtract and obtain 4 as result.
The whole process that you have just done with your mental memory is a simile of what a computer can do with two variables. The same process can be expressed in C++ with the following instruction set:
a = 5;
b = 2;
a = a + 1;
result = a - b;
Obviously, this is a very simple example since we have only used two small integer values, but consider that your computer can store millions of numbers like these at the same time and conduct sophisticated mathematical operations with them.
Therefore, we can define a variable as a portion of memory to store a determined value.
Each variable needs an identifier that distinguishes it from the others, for example, in the previous code the variable identifiers were a, b and result, but we could have called the variables any names we wanted to invent, as long as they were valid identifiers.
Identifiers
A valid identifier is a sequence of one or more letters, digits or underscore characters (_). Neither spaces nor punctuation marks or symbols can be part of an identifier. In addition, variable identifiers must begin with a letter or an underscore character (_), never with a digit. Identifiers that begin with an underscore, as well as identifiers containing two successive underscore characters anywhere, are best avoided, since in some cases they are reserved for compiler-specific keywords or external identifiers.
Another rule that you have to consider when inventing your own identifiers is that they cannot match any keyword of the C++ language nor your compiler's specific ones, which are reserved keywords. The standard reserved keywords are:
asm, auto, bool, break, case, catch, char, class, const, const_cast, continue, default, delete, do, double, dynamic_cast, else, enum, explicit, export, extern, false, float, for, friend, goto, if, inline, int, long, mutable, namespace, new, operator, private, protected, public, register, reinterpret_cast, return, short, signed, sizeof, static, static_cast, struct, switch, template, this, throw, true, try, typedef, typeid, typename, union, unsigned, using, virtual, void, volatile, wchar_t, while
Additionally, alternative representations for some operators cannot be used as identifiers since they are reserved words under some circumstances:
and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, xor_eq
Your compiler may also include some additional specific reserved keywords.
Very important: The C++ language is a "case sensitive" language. That means that an identifier written in capital letters is not equivalent to another one with the same name but written in small letters. Thus, for example, the RESULT variable is not the same as the result variable or the Result variable. These are three different variable identifiers.
Fundamental data types
When programming, we store the variables in our computer's memory, but the computer has to know what kind of data we want to store in them, since a simple number, a single letter and a large number do not occupy the same amount of memory, and they are not going to be interpreted the same way.
The memory in our computers is organized in bytes. A byte is the minimum amount of memory that we can manage in C++. A byte can store a relatively small amount of data: one single character or a small integer (generally an integer between 0 and 255). In addition, the computer can manipulate more complex data types that come from grouping several bytes, such as long numbers or non-integer numbers.
Next you have a summary of the basic fundamental data types in C++, as well as the range of values that can be represented with each one:
Name | Description | Size* | Range*
char | Character or small integer. | 1 byte | signed: -128 to 127; unsigned: 0 to 255
short int (short) | Short integer. | 2 bytes | signed: -32768 to 32767; unsigned: 0 to 65535
int | Integer. | 4 bytes | signed: -2147483648 to 2147483647; unsigned: 0 to 4294967295
long int (long) | Long integer. | 4 bytes | signed: -2147483648 to 2147483647; unsigned: 0 to 4294967295
bool | Boolean value. It can take one of two values: true or false. | 1 byte | true or false
float | Floating point number. | 4 bytes | 3.4e +/- 38 (7 digits)
double | Double precision floating point number. | 8 bytes | 1.7e +/- 308 (15 digits)
long double | Long double precision floating point number. | 8 bytes | 1.7e +/- 308 (15 digits)
wchar_t | Wide character. | 2 bytes | 1 wide character
* The values of the columns Size and Range depend on the system the program is compiled for. The values shown above are those found on most 32-bit systems. But for other systems, the general specification is that int has the natural size suggested by the system architecture (one "word") and the four integer types char, short, int and long must each one be at least as large as the one preceding it, with char being always 1 byte in size. The same applies to the floating point types float, double and long double, where each one must provide at least as much precision as the preceding one.
Declaration of variables
In order to use a variable in C++, we must first declare it specifying which data type we want it to be. The syntax to declare a new variable is to write the specifier of the desired data type (like int, bool, float...) followed by a valid variable identifier. For example:
int a;
float mynumber;
These are two valid declarations of variables. The first one declares a variable of type int with the identifier a. The second one declares a variable of type float with the identifier mynumber. Once declared, the variables a and mynumber can be used within the rest of their scope in the program.
If you are going to declare more than one variable of the same type, you can declare all of them in a single statement by separating their identifiers with commas. For example:
int a, b, c;
This declares three variables (a, b and c), all of them of type int, and has exactly the same meaning as:
int a;
int b;
int c;
The integer data types char, short, long and int can be either signed or unsigned depending on the range of numbers needed to be represented. Signed types can represent both positive and negative values, whereas unsigned types can only represent positive values (and zero). This can be specified by using either the specifier signed or the specifier unsigned before the type name. For example:
unsigned short int NumberOfSisters;
signed int MyAccountBalance;
By default, if we do not specify either signed or unsigned, the integer types short, int and long are assumed to be signed; therefore instead of the second declaration above we could have written:
int MyAccountBalance;
with exactly the same meaning (with or without the keyword signed).
An exception to this general rule is the char type, which exists by itself and is considered a different fundamental data type from signed char and unsigned char, and is intended to store characters. You should use either signed or unsigned if you intend to store numerical values in a char-sized variable.
short and long can be used alone as type specifiers. In this case, they refer to their respective integer fundamental types: short is equivalent to short int and long is equivalent to long int. The following two variable declarations are equivalent:
short Year;
short int Year;
Finally, signed and unsigned may also be used as standalone type specifiers, meaning the same as signed int and unsigned int respectively. The following two declarations are equivalent:
unsigned NextYear;
unsigned int NextYear;
To see what variable declarations look like in action within a program, we are going to see the C++ code of the example about your mental memory proposed at the beginning of this section:
// operating with variables
#include <iostream>
using namespace std;
int main ()
{
// declaring variables:
int a, b;
int result;
// process:
a = 5;
b = 2;
a = a + 1;
result = a - b;
// print out the result:
cout << result;
// terminate the program:
return 0;
}
4
Do not worry if something other than the variable declarations themselves looks a bit strange to you. You will see the rest in detail in coming sections.
Scope of variables
All the variables that we intend to use in a program must have been declared with their type specifier at an earlier point in the code, as we did in the previous code at the beginning of the body of the function main when we declared that a, b, and result were of type int.
A variable can be either of global or local scope. A global variable is a variable declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block.

Global variables can be referred to from anywhere in the code, even inside functions, as long as the reference comes after their declaration.
The scope of local variables is limited to the block enclosed in braces ({}) where they are declared. For example, if they are declared at the beginning of the body of a function (like in function main) their scope is between its declaration point and the end of that function. In the example above, this means that if another function existed in addition to main, the local variables declared in main could not be accessed from the other function and vice versa.
Initialization of variables
When declaring a regular local variable, its value is by default undetermined. But you may want a variable to store a concrete value at the same moment that it is declared. In order to do that, you can initialize the variable. There are two ways to do this in C++:
The first one, known as C-like initialization, is done by appending an equal sign followed by the value to which the variable will be initialized:
type identifier = initial_value ;
For example, if we want to declare an int variable called a initialized with a value of 0 at the moment in which it is declared, we could write:
int a = 0;
The other way to initialize variables, known as constructor initialization, is done by enclosing the initial value between parentheses (()):
type identifier (initial_value) ;
For example:
int a (0);
Both ways of initializing variables are valid and equivalent in C++.
// initialization of variables
#include <iostream>
using namespace std;
int main ()
{
int a=5; // initial value = 5
int b(2); // initial value = 2
int result; // initial value undetermined
a = a + 3;
result = a - b;
cout << result;
return 0;
}
6
Introduction to strings
Variables that can store non-numerical values that are longer than one single character are known as strings.
The C++ language library provides support for strings through the standard string class. This is not a fundamental type, but it behaves in a similar way as fundamental types do in its most basic usage.
A first difference with fundamental data types is that in order to declare and use objects (variables) of this type we need to include an additional header file in our source code, <string>, and have access to the std namespace (which we already had in all our previous programs thanks to the using namespace statement).
// my first string
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string mystring = "This is a string";
cout << mystring;
return 0;
}
This is a string
As you may see in the previous example, strings can be initialized with any valid string literal just like numerical type variables can be initialized to any valid numerical literal. Both initialization formats are valid with strings:
string mystring = "This is a string";
string mystring ("This is a string");
Strings can also perform all the other basic operations that fundamental data types can, like being declared without an initial value and being assigned values during execution:
// my first string
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string mystring;
mystring = "This is the initial string content";
cout << mystring << endl;
mystring = "This is a different string content";
cout << mystring << endl;
return 0;
}
This is the initial string content
This is a different string content
For more details on C++ strings, you can have a look at the string class reference.

VIRTUAL FUNCTIONS

C++ Virtual Functions
Imagine that you are doing some graphics programming, with a variety of shapes to be output to the screen. Initially, you want to support Line, Circle, and Text. Each shape has an X,Y origin and a color.
How might this be done in C++? One way is to use virtual functions. A virtual function is a function member of a class, declared using the "virtual" keyword. A pointer to a derived class object may be assigned to a base class pointer, and a virtual function called through the pointer. If the function is virtual and occurs both in the base class and in derived classes, then the right function will be picked up based on what the base class pointer "really" points at.
For graphics, we can use a base class called Shape, with derived classes named Line, Circle, and Text. Shape and each of the derived classes has a virtual function draw(). We create new objects and point at them using Shape* pointers. But when we call a draw() function, as in:
Shape* p = new Line(0.1, 0.1, Co_blue, 0.4, 0.4);
p->draw();
the draw() function for a Line is called, not the draw() function for Shape. This style of programming is very common and goes by names like "polymorphism" and "object-oriented programming". To illustrate it further, here is an example of this type of programming for a graphics application. Annotations in /* */ explain in some detail what is going on.
#include <iostream>
#include <cstring>
#include <cassert>
using namespace std;

typedef double Coord;
/*
The type of X/Y points on the screen.
*/

enum Color {Co_red, Co_green, Co_blue};
/*
Colors.
*/

// abstract base class for all shape types
class Shape {
protected:
Coord xorig; // X origin
Coord yorig; // Y origin
Color co; // color
/*
These are protected so that they can be accessed
by derived classes. Private wouldn't allow this.

These data members are common to all shape types.
*/
public:
Shape(Coord x, Coord y, Color c) :
xorig(x), yorig(y), co(c) {} // constructor
/*
Constructor to initialize data members common to
all shape types.
*/
virtual ~Shape() {} // virtual destructor
/*
Destructor for Shape. It's a virtual function.
Destructors in derived classes are virtual also
because this one is declared so.
*/
virtual void draw() = 0; // pure virtual draw() function
/*
Similarly for the draw() function. It's a pure virtual and
is not called directly.
*/
};

// line with X,Y destination
class Line : public Shape {
/*
Line is derived from Shape, and picks up its
data members.
*/
Coord xdest; // X destination
Coord ydest; // Y destination
/*
Additional data members needed only for Lines.
*/
public:
Line(Coord x, Coord y, Color c, Coord xd, Coord yd) :
Shape(x, y, c), // the base class is always initialized first
xdest(xd), ydest(yd) {} // constructor with base initialization
/*
Construct a Line, calling the Shape constructor as well
to initialize data members of the base class.
*/
~Line() {cout << "~Line\n";} // virtual destructor
/*
Destructor.
*/
void draw() // virtual draw function
{
cout << "Line" << "(";
cout << xorig << ", " << yorig << ", " << int(co);
cout << ", " << xdest << ", " << ydest;
cout << ")\n";
}
/*
Draw a line.
*/
};

// circle with radius
class Circle : public Shape {
Coord rad; // radius of circle
/*
Radius of circle.
*/
public:
Circle(Coord x, Coord y, Color c, Coord r) :
Shape(x, y, c), rad(r) {} // constructor with base initialization
~Circle() {cout << "~Circle\n";} // virtual destructor
void draw() // virtual draw function
{
cout << "Circle" << "(";
cout << xorig << ", " << yorig << ", " << int(co);
cout << ", " << rad;
cout << ")\n";
}
};

// text with characters given
class Text : public Shape {
char* str; // copy of string
public:
Text(Coord x, Coord y, Color c, const char* s) :
Shape(x, y, c) // constructor with base initialization
{
str = new char[strlen(s) + 1];
assert(str);
strcpy(str, s);
/*
Copy out text string. Note that this would be done differently
if we were taking advantage of some newer C++ features like
exceptions and strings.
*/
}
~Text() {delete [] str; cout << "~Text\n";} // virtual dtor
/*
Destructor; delete text string.
*/
void draw() // virtual draw function
{
cout << "Text" << "(";
cout << xorig << ", " << yorig << ", " << int(co);
cout << ", " << str;
cout << ")\n";
}
};

int main()
{
const int N = 5;
int i;
Shape* sptrs[N];
/*
Array of Shape* pointers. Pointers to classes
derived from Shape can be assigned to Shape* pointers.
*/
// initialize set of Shape object pointers

sptrs[0] = new Line(0.1, 0.1, Co_blue, 0.4, 0.5);
sptrs[1] = new Line(0.3, 0.2, Co_red, 0.9, 0.75);
sptrs[2] = new Circle(0.5, 0.5, Co_green, 0.3);
sptrs[3] = new Text(0.7, 0.4, Co_blue, "Howdy!");
sptrs[4] = new Circle(0.3, 0.3, Co_red, 0.1);
/*
Create some shape objects.
*/
// draw set of shape objects

for (i = 0; i < N; i++)
sptrs[i]->draw();
/*
Draw them using virtual functions to pick up the
right draw() function based on the actual object
type being pointed at.
*/
// cleanup

for (i = 0; i < N; i++)
delete sptrs[i];
/*
Clean up the objects using virtual destructors.
*/
return 0;
}
When we run this program, the output is:
Line(0.1, 0.1, 2, 0.4, 0.5)
Line(0.3, 0.2, 0, 0.9, 0.75)
Circle(0.5, 0.5, 1, 0.3)
Text(0.7, 0.4, 2, Howdy!)
Circle(0.3, 0.3, 0, 0.1)
~Line
~Line
~Circle
~Text
~Circle
with enum color values represented by small integers.
A few additional comments. Virtual functions typically are implemented by placing a pointer to a jump table in each object instance. This table pointer represents the "real" type of the object, even though the object is being manipulated through a base class pointer.
Because virtual functions usually need to have their function address taken, to store in a table, declaring them inline as the above example does is often a waste of time. They will be laid down as static copies per object file. There are some advanced techniques for optimizing virtual functions, but you can't count on these being available.
Note that we declared the Shape destructor virtual (there are no virtual constructors). If we had not done this, then deleting each object in turn through the array of Shape* pointers would formally be undefined behavior; in practice the destructors for the actual object types derived from Shape would typically not have been called, and in the case above this would result in a memory leak in the Text class.
Shape is an example of an abstract class, whose purpose is to serve as a base for derived classes that actually do the work. It is not possible to create an actual object instance of Shape, because it contains at least one pure virtual function.
Pointers to Members and Functions
POINTERS TO MEMBERS
In ANSI C, function pointers are used like this:
#include <stdio.h>
void f(int i)
{
printf("%d\n", i);
}
typedef void (*fp)(int);
int main(void)
{
fp p = &f;
(*p)(37); /* these are equivalent */
p(37);
return 0;
}
and are employed in a variety of ways, for example to specify a comparison function to a library function like qsort().
In C++, pointers can be similarly used, but there are a couple of quirks to consider. We will discuss two of them in this section, and another one in the next section.
The first point to mention is that C++ has C-style functions in it, but also has other types of functions, notably member functions. For example:
class A {
public:
void f(int);
};
In this example, A::f(int) is a member function. That is, it operates on object instances of class A, and the function itself has a "this" pointer that points at the instance in question.
Because C++ is a strongly typed language, it is desirable that a pointer to a member function be treated differently than a pointer to a C-style function, and that a pointer to a function member of class A be distinguished from a pointer to a member of class B. To do this, we can say:
#include <iostream>
using namespace std;
class A {
public:
void f(int i) {cout << "value is: " << i << "\n";}
};
typedef void (A::*pmfA)(int);
pmfA x = &A::f;
int main()
{
A a;
A* p = &a;
(p->*x)(37);
}
Note the notation for actually calling the member function.
It is not possible to intermix such a type with other pointer types, so for example:
void f(int) {}
pmfA x = &f;
is invalid. A static member function, as in:
class A {
public:
static void g(int);
};
typedef void (*fp)(int);
fp p = &A::g;
is treated like a C-style function. A static function has no "this" pointer and does not operate on actual object instances.
Pointers to members are typically implemented just like C function pointers, but there is an issue with their implementation in cases where inheritance is used. In such a case, you have to worry about computing offsets of subobjects, and so on, when calling a member function, and for this purpose a runtime structure similar to a virtual table used for virtual functions is used.
It's also possible to have pointers to data members of a class, with the pointer representing an offset into a class instance. For example:
#include <iostream>
using namespace std;
class A {
public:
int x;
};
typedef int A::*piA;
piA x = &A::x;
int main()
{
A a;
A* p = &a;
a.x = 37;
cout << "value is: " << p->*x << "\n";
}
Note that saying "&A::x" does not take the address of an actual data member in an instance of A, but rather computes a generic offset that can be applied to any instance.
A NEW ANGLE ON FUNCTION POINTERS
The discussion on function pointers in this issue overlooks one key angle that has fairly recently been introduced into the language. This involves distinguishing between C and C++ pointers. A C-style pointer in C++, that is, one that does not point to a member function, is used just like a function pointer in C. But according to the standard (section 7.5), such a pointer in fact has a different type.
For example, consider:
extern "C" typedef void (*fp1)(int);
extern "C++" typedef void (*fp2)(int);
extern "C" void f(int);
fp1 and fp2 are not the same type, and saying:
fp2 p = &f;
to initialize p to the f(int) declared in the 'extern "C"' will not work.
It is possible to overload functions on this basis, so that for example:
extern "C" void f(void (*)(int));
extern "C++" void f(void (*)(int));
is legal, with the appropriate f() called based on the function pointer type passed to it. The function pointer parameter types in this example are not identical; the first is a pointer to a C function, the second a pointer to a C++ one.
This feature is new and may not be implemented in your local C++ compiler.
Exception Handling
INTRODUCTION TO EXCEPTION HANDLING PART 1 - A SIMPLE EXAMPLE
In this and subsequent issues we will be discussing some aspects of C++ exception handling. To start this discussion, let's consider a simple example. Suppose that you are writing a program to manipulate calendar dates, and want to check whether a given year is in the 20th century (ignoring the issue of whether the 21st century starts in 2000 or 2001!).
Using exceptions, one way to do this might be:
#include <iostream>
using namespace std;
class DateException {
const char* err;
public:
DateException(const char* s) {err = s;}
void print() const {cerr << err << endl;}
};
// a function that operates on dates
void g(int date)
{
if (date < 1900)
throw DateException("date < 1900");
if (date > 1999)
throw DateException("date > 1999");
// process date ...
}
// some code that uses dates
void f()
{
g(1879);
}
int main()
{
try {
f();
}
catch (const DateException& de) {
de.print();
return 1;
}
return 0;
}
The basic idea here is that we have a try block:
try {
f();
}
Within this block, we execute some code, in this case a function call f(). Then we have a list of one or more handlers:
catch (const DateException& de) {
de.print();
return 1;
}
If an abnormal condition arises in the code, we can throw an exception:
if (date < 1900)
throw DateException("date < 1900");
and have it caught by one of the handlers at an outer level, that is, execution will continue at the point of the handler, with the execution stack unwound.
An exception may be a class object type such as DateException, or a fundamental C++ type like an integer. Obviously, a class object type can store and convey more information about the nature of the exception, as illustrated in this example. Saying:
throw -37;
will indeed throw an exception, which may be caught somewhere, but this idiom is not particularly useful.
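As a rough illustration of how a class object can convey more than a bare integer, the sketch below throws an object that records both the offending year and a reason. YearError, require_20th_century() and describe() are hypothetical names chosen for this example only.

```cpp
#include <string>

// Hypothetical exception type that records the offending year and the reason.
class YearError {
    int year_;
    std::string reason_;
public:
    YearError(int year, const std::string& reason) : year_(year), reason_(reason) {}
    int year() const { return year_; }
    const std::string& reason() const { return reason_; }
};

void require_20th_century(int year)
{
    if (year < 1900) throw YearError(year, "before 1900");
    if (year > 1999) throw YearError(year, "after 1999");
}

// Returns "ok", or a message built from the data carried by the exception.
std::string describe(int year)
{
    try {
        require_20th_century(year);
        return "ok";
    }
    catch (const YearError& e) {
        return std::to_string(e.year()) + ": " + e.reason();
    }
}
```

The handler can report exactly which value failed and why, something a thrown -37 could not express.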
What if the handler we declare is changed slightly, as in:
catch (DateException* de) {
de->print();
return 1;
}
In this case, because an object of type DateException is thrown, rather than a DateException* (pointer), no corresponding handler will be found in the program. The runtime system that handles exception processing will then call a special library function terminate(), and the program will abort. One way to avoid this problem is to say:
int main()
{
try {
body_of_program();
}
catch (...) {
// all exceptions go through here
return 1;
}
return 0;
}
where "..." will catch any exception type.
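The mismatch between a thrown object and a pointer handler can be observed directly. In this small sketch, DateError is a hypothetical stand-in for DateException, and which_handler() reports which clause actually ran.

```cpp
#include <string>

struct DateError {};   // hypothetical stand-in for DateException

// Throws an object, while the first handler only accepts a pointer.
std::string which_handler()
{
    try {
        throw DateError();
    }
    catch (DateError*) {         // does not match: an object was thrown
        return "pointer handler";
    }
    catch (...) {                // matches any exception type
        return "catch-all handler";
    }
}
```

Without the final catch-all clause, the exception would leave the function unhandled.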
We will explore various details of exception handling in future issues, but one general comment is in order. C++ exceptions are not the same as low-level hardware interrupts, nor are they the same as UNIX signals such as SIGTERM. And there's no linkage between exceptions such as divide by zero (which may be a low-level machine exception) and C++ exceptions.
INTRODUCTION TO EXCEPTION HANDLING PART 2 - THROWING AN EXCEPTION
In the last issue we introduced C++ exception handling. In this issue we'll go more into detail about throwing exceptions.
Throwing an exception transfers control to an exception handler. For example:
void f()
{
throw 37;
}
void g()
{
try { // try block
f();
}
catch (int i) { // handler or catch clause
}
}
In this example the exception with value 37 is thrown, and control passes to the handler. A throw transfers control to the nearest handler with the appropriate type. "Nearest" means in the sense of stack frames and try blocks that have been dynamically entered.
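The meaning of "nearest handler with the appropriate type" may be easier to see with two nested levels. This is only a sketch; thrower(), inner() and outer() are hypothetical names.

```cpp
#include <string>

void thrower() { throw 37; }

std::string inner()
{
    try {
        thrower();
    }
    catch (const char*) {        // wrong type for an int, so it is skipped
        return "inner const char* handler";
    }
    return "inner: no exception";
}

std::string outer()
{
    try {
        return inner();
    }
    catch (int i) {              // nearest handler of a matching type
        return "outer int handler got " + std::to_string(i);
    }
}
```

Although the try block in inner() is closer to the throw point, its handler has the wrong type, so control passes to the handler in outer().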
Typically an exception that is thrown is of class type rather than a simple constant like "37". Throwing a class object instance allows for more sophisticated usage such as conveying additional information about the nature of an exception.
A class object instance that is thrown is treated similarly to a function argument or operand in a return statement. A temporary copy of the instance may be made at the throw point, just as temporaries are sometimes used with function argument passing. A copy constructor if any is used to initialize the temporary, with the class's destructor used to destruct the temporary. The temporary persists as long as there is a handler being executed for the given exception. As in other parts of the C++ language, some compilers may be able in some cases to eliminate the temporary.
An example:
#include <iostream>
using namespace std;

class Exc {
const char* s;
public:
Exc(const char* e) {s = e; cerr << "ctor called\n";}
Exc(const Exc& e) {s = e.s; cerr << "copy ctor called\n";}
~Exc() {cerr << "dtor called\n";}
const char* geterr() const {return s;}
};
};

void check_date(int date)
{
if (date < 1900)
throw Exc("date < 1900");

// other processing
}

int main()
{
try {
check_date(1879);
}
catch (const Exc& e) {
cerr << "exception was: " << e.geterr() << "\n";
}

return 0;
}
If you run this program, you can trace through the various stages of throwing the exception, including the actual throw, making a temporary copy of the class instance, and the invocation of the destructor on the temporary.
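Since the number of temporaries depends on the compiler, one way to check the bookkeeping is to count the calls rather than read a trace. The following sketch assumes nothing about whether the copy is elided; Counts, Traced and throw_and_catch() are hypothetical names.

```cpp
struct Counts {
    int ctor;
    int copy;
    int dtor;
};

static Counts counts = {0, 0, 0};   // hypothetical global tally for this sketch

struct Traced {
    Traced() { ++counts.ctor; }
    Traced(const Traced&) { ++counts.copy; }
    ~Traced() { ++counts.dtor; }
};

// Throws a Traced object, catches it by reference, then reports the tally.
Counts throw_and_catch()
{
    counts = Counts{0, 0, 0};
    try {
        throw Traced();
    }
    catch (const Traced&) {
        // the exception object is still alive inside the handler
    }
    return counts;   // the handler has exited, so the exception object is gone
}
```

Whatever the compiler does about the temporary, every object that was constructed or copied should have been destroyed once the handler has finished.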
It's also possible to have "throw" with no argument, as in:
catch (const Exc& e) {
cerr << "exception was: " << e.geterr() << "\n";
throw;
}
What does this mean? Such usage rethrows the exception, using the already-established temporary. The exception thrown is the most recently caught one not yet finished. A caught exception is one where the parameter of the catch clause has been initialized, and for which the catch clause has not yet been exited.
So in the example above, "throw;" would rethrow the exception represented by "e". Because there is no outer catch clause to catch the rethrown exception, a special library function terminate() is called. If an exception is rethrown, and there is no exception currently being handled, terminate() is called as well.
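A rethrow is more useful when an outer handler exists. The sketch below, which uses the standard std::out_of_range class and hypothetical function names, logs in an intermediate handler and then passes the same exception on.

```cpp
#include <stdexcept>
#include <string>

static std::string trail;   // hypothetical log used only for this sketch

void risky() { throw std::out_of_range("date < 1900"); }

void middle()
{
    try {
        risky();
    }
    catch (const std::exception&) {
        trail += "middle;";
        throw;               // rethrow the same exception object, type intact
    }
}

std::string run_rethrow_demo()
{
    trail.clear();
    try {
        middle();
    }
    catch (const std::out_of_range& e) {   // still an out_of_range
        trail += "outer:";
        trail += e.what();
    }
    return trail;
}
```

Because "throw;" reuses the existing exception object, the outer handler still sees the original derived type even though the intermediate handler caught it through a base class reference.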
In the next issue we'll talk more about how exceptions are handled in a catch clause.
INTRODUCTION TO EXCEPTION HANDLING PART 3 - STACK UNWINDING
In the last issue we talked about throwing exceptions. Before discussing how exceptions are handled, we need to talk about an intermediate step, stack unwinding.
The exception handling mechanism is dynamic in that a record is kept of the flow of program execution, for example via stack frames and program counter mapping tables. When an exception is thrown, control transfers to the nearest suitable handler. "Nearest" in this sense means the nearest dynamically surrounding try block containing a handler that matches the type of the thrown exception. We will talk more about exception handlers in a future issue.
Transfer of control from the point at which an exception is thrown to the exception handler implies jumping out of one program context into another. What about cleanup of the old program context? For example, what about local class objects that have been allocated? Are their destructors called?
The answer is "yes". All stack-allocated ("automatic") objects allocated since the try block was entered will have their destructors invoked. Let's look at an example:
#include <iostream>
using namespace std;
class A {
int x;
public:
A(int i) {x = i; cerr << "ctor " << x << endl;}
~A() {cerr << "dtor " << x << endl;}
};
void f()
{
A a1(1);
throw "this is a test";
A a2(2);
}
int main()
{
try {
A a3(3);
f();
A a4(4);
}
catch (const char* s) {
cerr << "exception: " << s << endl;
}
return 0;
}
Output of this program is:
ctor 3
ctor 1
dtor 1
dtor 3
exception: this is a test
In this example, we enter the try block in main(), allocate a3, then call f(). f() allocates a1, then throws an exception, which will transfer control to the catch clause in main().
In this example, the a1 and a3 objects have their destructors called. a2 and a4 do not, because they were never allocated.
It's possible to have class objects containing other class objects, or arrays of class objects, with partial construction taking place followed by an exception being thrown. In this case, only the constructed subobjects will be destructed.
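Partial construction can be sketched as follows. Part, Whole and build() are hypothetical names; the second member deliberately fails during construction so that the cleanup of the first can be observed.

```cpp
#include <string>

static std::string log_text;   // hypothetical trace buffer for this sketch

struct Part {
    char id;
    Part(char c, bool fail) : id(c)
    {
        if (fail) throw 1;           // construction of this subobject fails
        log_text += "ctor ";
        log_text += id;
        log_text += ' ';
    }
    ~Part() { log_text += "dtor "; log_text += id; log_text += ' '; }
};

struct Whole {
    Part a;
    Part b;
    Part c;
    Whole() : a('a', false), b('b', true), c('c', false) {}
    ~Whole() { log_text += "dtor Whole "; }
};

// Tries to build a Whole and returns the trace of what happened.
std::string build()
{
    log_text.clear();
    try {
        Whole w;
    }
    catch (int) {
        log_text += "caught";
    }
    return log_text;
}
```

Only member a was fully constructed, so only a is destructed; b and c never existed as complete objects, and the destructor of Whole itself is not run.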
INTRODUCTION TO EXCEPTION HANDLING PART 4 - HANDLING AN EXCEPTION
In previous issues we discussed throwing of exceptions and stack unwinding. Let's now look at actual handling of an exception that has been thrown. An exception is handled via an exception handler. For example:
catch (T x) {
// stuff
}
handles exceptions of type T. More precisely, a handler of the form:
catch (T x) {
// stuff
}
or:
catch (const T x) {
// stuff
}
or:
catch (T& x) {
// stuff
}
or:
catch (const T& x) {
// stuff
}
will catch a thrown exception of type E, given that:
- T and E are the same type, or
- T is an unambiguous public base class of E, or
- T is a pointer type and E is a pointer type that can be converted to T by a standard pointer conversion
As an example of these rules, in the following case the thrown exception will be caught:
#include <iostream>
using namespace std;
class A {};
class B : public A {};
void f()
{
throw B();
}
int main()
{
try {
f();
}
catch (const A& x) {
cout << "exception caught" << endl;
}
return 0;
}
because A is a public base class of B. Handlers are tried in order of appearance. If, for example, you place a handler for a derived class after a handler for a corresponding base class, it will never be invoked. If we had a handler for B after A, in the example above, it would not be called. A handler like:
catch (...) {
// stuff
}
appearing as the last handler in a series, will match any exception type.
If no handler is found, the search for a matching handler continues in a dynamically surrounding try block. If no handler is found at all, a special library function terminate() is called, typically ending the program.
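The effect of handler order can be sketched with two small functions. Base, Derived and the function names are hypothetical and chosen only for this illustration.

```cpp
#include <string>

struct Base {};
struct Derived : public Base {};

// Derived handler listed first: it gets the chance to match.
std::string derived_first()
{
    try { throw Derived(); }
    catch (const Derived&) { return "Derived"; }
    catch (const Base&)    { return "Base"; }
    catch (...)            { return "other"; }
}

// Only a base class handler: a Derived exception matches through its base.
std::string base_only()
{
    try { throw Derived(); }
    catch (const Base&)    { return "Base"; }
    catch (...)            { return "other"; }
}
```

Listing the most derived type first, the base class next, and the catch-all last gives each handler a chance to be selected.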
An exception is considered caught by a handler when the parameters to the handler have been initialized, and considered finished when the handler exits.
In the next issue we'll talk a bit about exception specifications, which are used to specify what exception types a function may throw.
INTRODUCTION TO EXCEPTION HANDLING PART 5 - TERMINATE() AND UNEXPECTED()
Suppose that you have a bit of exception handling usage, like this:
void f()
{
throw -37;
}
int main()
{
try {
f();
}
catch (char* s) {
}
return 0;
}
What will happen? An exception of type "int" is thrown, but there is no handler for it. In this case, a special function terminate() is called. terminate() is called whenever the exception handling mechanism cannot find a handler for a thrown exception. terminate() is also called in a couple of odd cases, for example when an exception occurs in the middle of throwing another exception.
terminate() is a library function which by default aborts the program. You can override terminate if you want:
#include <iostream.h>
#include <stdlib.h>
typedef void (*PFV)(void);
PFV set_terminate(PFV);
void t()
{
cerr << "terminate() called" << endl;
exit(1);
}
void f()
{
throw -37;
}
int main()
{
set_terminate(t);
try {
f();
}
catch (char* s) {
}
return 0;
}
Note that this area is in a state of flux as far as compiler adoption of new features. For example, terminate() should really be "std::terminate()", and the declarations may be found in a header file "<exception>". But not all compilers have this yet, and these examples are written using an older no-longer-standard convention.
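For comparison, here is a minimal sketch of the same idea using the standard header, assuming a C++11 or later compiler. on_terminate() and install_handler() are hypothetical names; std::set_terminate and std::terminate_handler come from <exception>.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

// Handler with the signature std::terminate_handler expects: void().
// A terminate handler must not return; here it ends the program.
void on_terminate()
{
    std::cerr << "terminate() called" << std::endl;
    std::exit(1);
}

// Installs the handler and returns the one that was previously in effect.
std::terminate_handler install_handler()
{
    return std::set_terminate(on_terminate);
}
```

With the standard header there is no need to declare set_terminate() by hand, and the previous handler is returned so that it can be restored later.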
In a similar way, a call to the unexpected() function can be triggered by saying:
#include <iostream.h>
#include <stdlib.h>
typedef void (*PFV)(void);
PFV set_unexpected(PFV);
void u()
{
cerr << "unexpected() called" << endl;
exit(1);
}
void f() throw(char*)
{
throw -37;
}
int main()
{
set_unexpected(u);
try {
f();
}
catch (int i) {
}
return 0;
}
unexpected() is called when a function with an exception specification throws an exception of a type not listed in the exception specification for the function. In this example, f()'s exception specification is:
throw(char*)
A function declaration without such a specification may throw any type of exception, and one with:
throw()
is not allowed to throw exceptions at all. By default unexpected() calls terminate(), but in certain cases where the user has defined their own version of unexpected(), execution can continue.
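Readers using a current compiler should be aware that specifications of the form throw(char*) were deprecated in C++11 and removed in C++17, together with unexpected(). The closest modern counterpart to throw() is the noexcept specifier. The sketch below assumes a C++11 or later compiler; may_throw(), never_throws() and call_is_nothrow() are hypothetical names.

```cpp
// Assumes a C++11 or later compiler.
void may_throw();               // no specification: may throw anything
void never_throws() noexcept;   // promises not to throw

// The noexcept operator reports, at compile time, what a call promises.
// Its operand is not evaluated, so the functions need no definition here.
static_assert(!noexcept(may_throw()), "may_throw() is allowed to throw");
static_assert(noexcept(never_throws()), "never_throws() promises not to throw");

bool call_is_nothrow() { return noexcept(never_throws()); }
```

If an exception does escape a noexcept function, std::terminate() is called directly rather than unexpected().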
There is also a brand-new library function:
bool uncaught_exception();
that is true from the time after completion of the evaluation of the object to be thrown until completion of the initialization of the exception declaration in the matching handler. For example, this would be true during stack unwinding (see Part 3, on stack unwinding, above). If this function returns true, then you don't want to throw an exception, because doing so would cause terminate() to be called.
Placement New/Delete
In C++, operators new/delete mostly replace the use of malloc() and free() in C. For example:
class A {
public:
A();
~A();
};
A* p = new A;
...
delete p;
allocates storage for an A object and arranges for its constructor to be called, later followed by invocation of the destructor and freeing of the storage. You can use the standard new/delete functions in the library, or define your own globally and/or on a per-class basis.
There's a variation on new/delete worth mentioning. It's possible to supply additional parameters to a new call, for example:
A* p = new (a, b) A;
where a and b are arbitrary expressions; this is known as "placement new". For example, suppose that you have an object instance of a specialized class named Alloc that you want to pass to the new operator, so that new can control allocation according to the state of this object (that is, a specialized storage allocator):
class Alloc {/* stuff */};
Alloc allocator;
...
class A {/* stuff */};
...
A* p = new (allocator) A;
If you do this, then you need to define your own new function, like this:
void* operator new(size_t s, Alloc& a)
{
// stuff
}
The first parameter is always of type "size_t" (an unsigned integer type whose width varies by platform), and any additional parameters are then listed. In this example, the "a" instance of Alloc might be examined to determine what strategy to use to allocate space. A similar approach can be used for operator new[] used for arrays.
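One possible shape for such an allocator is sketched below, assuming a C++11 or later compiler. The fixed-size buffer, the take() member, Widget and widget_value() are all hypothetical details invented for this illustration; the text does not say what Alloc actually contains.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size "bump" allocator standing in for the text's Alloc.
class Alloc {
    alignas(std::max_align_t) unsigned char buf_[256];
    std::size_t used_;
public:
    Alloc() : used_(0) {}
    void* take(std::size_t n)
    {
        // Round up so that the next block stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;
        if (used_ + n > sizeof buf_) throw std::bad_alloc();
        void* p = buf_ + used_;
        used_ += n;
        return p;
    }
    std::size_t used() const { return used_; }
};

// Placement form: the extra argument selects where the storage comes from.
void* operator new(std::size_t s, Alloc& a) { return a.take(s); }

// Counterpart with the same extra parameter; used only if a constructor throws.
void operator delete(void*, Alloc&) noexcept {}

struct Widget {
    int v;
    explicit Widget(int x) : v(x) {}
};

int widget_value(Alloc& a)
{
    Widget* w = new (a) Widget(42);   // storage comes from the Alloc instance
    int v = w->v;
    w->~Widget();   // destroy explicitly; plain delete would free the wrong way
    return v;
}
```

Because the storage belongs to the Alloc object, the Widget is destroyed with an explicit destructor call rather than with an ordinary delete expression.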
This feature has been around for a while. A relatively new feature that goes along with it is placement delete. If an exception is thrown during object initialization as part of a placement new call, for example during constructor invocation on a class object instance, then a matching placement delete call is made, with the same additional argument values as were passed to placement new. In the example above, a matching function would be:
void operator delete(void* p, Alloc& a)
{
// stuff
}
With new, the first parameter is always "size_t", and with delete, always "void*". So "matching" in this instance means all other parameters match. "a" would have the value as was passed to new earlier.
Here's a simple example:
#include <cstddef>
int flag = 0;
void operator delete(void* p, int i)
{
flag = 1;
delete [] static_cast<char*>(p); // release what the matching new obtained
}
void* operator new(std::size_t s, int i)
{
return new char[s];
}
class A {
public:
A() {throw -37;}
};
int main()
{
try {
A* p = new (1234) A;
}
catch (int i) {
}
if (flag == 0)
return 1;
else
return 0;
}
Placement delete may not be in your local C++ compiler as yet. In compilers without this feature, memory will leak. Note also that you can't call overloaded operator delete directly via the operator syntax; you'd have to code it as a regular function call.
Operators new[] and delete[]
The C++ library has long had operator new() and delete() for dynamic storage allocation. Note that with these there's a distinction made between the operators specified as keywords, as in:
new A[10];
and the functions, for example:
operator new(159);
The former usage not only is responsible for allocating space, via operator new(), but also for arranging for constructors to be called for the individual objects in the array slots. So normally you will not use operator new() directly.
More recently the functions operator new[]() and operator delete[]() have been added to the language. These are like operator new() and operator delete(), but are invoked when arrays are being allocated and deallocated.
To see how this works, consider an example such as:
#include <stdio.h>
#include <stddef.h>
class A {
int x;
public:
A() {printf("A::A %lx\n", (unsigned long)this);}
~A() {printf("A::~A %lx\n", (unsigned long)this);}
};
void* operator new[](size_t sz)
{
printf("allocated size = %lu\n", (unsigned long)sz);
void* vp = operator new(sz);
printf("allocated pointer = %lx\n", (unsigned long)vp);
return vp;
}
void operator delete[](void* ptr)
{
printf("returned pointer = %lx\n", (unsigned long)ptr);
operator delete(ptr);
}
int main()
{
A* ap = new A[10];
delete [] ap;
return 0;
}
This example redefines operator new[]() and operator delete[](), and they are invoked when the program is executed.
When operator new[]() is called, it is passed an argument indicating how many bytes are required for the total array. In this example, approximately 40 bytes are needed for the 10 array slots (this will vary from system to system, with overhead for each chunk of space allocated).
In the example above, the actual bytes are allocated via a call to operator new(), that is, the non-array version is called to allocate the bytes. operator delete[]() works in a similar way. Note that the C++ standard specifies that the size of the array is saved, so that when it is deleted, the system will know how many slots to iterate across to call the destructors for individual objects.
Typical output of the program is:
allocated size = 44
allocated pointer = 7b2514
A::A 7b2518
A::A 7b251c
A::A 7b2520
A::A 7b2524
A::A 7b2528
A::A 7b252c
A::A 7b2530
A::A 7b2534
A::A 7b2538
A::A 7b253c
A::~A 7b253c
A::~A 7b2538
A::~A 7b2534
A::~A 7b2530
A::~A 7b252c
A::~A 7b2528
A::~A 7b2524
A::~A 7b2520
A::~A 7b251c
A::~A 7b2518
returned pointer = 7b2514
Note that objects are constructed and then destructed in LIFO (last-in first-out) order. Also, note that we used C-style I/O instead of stream I/O to print out information. Why is this? If stream I/O is used here, the program will crash with a popular compiler, probably because at the first call to operator new[](), the I/O system is not initialized as yet (the call to new in this case is presumably to obtain a buffer to initialize the system). So you need to be very careful in overloading the global versions of new and delete.
It's also possible to define operator new[]() and operator delete[]() on a per-class basis.
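A per-class version might look like the following sketch. Tracked, its two counters and allocate_and_free() are hypothetical names; the class simply counts calls and defers to the global functions for the actual storage.

```cpp
#include <cstddef>
#include <new>

struct Tracked {
    static int array_news;       // hypothetical counters for this sketch
    static int array_deletes;
    int value;

    static void* operator new[](std::size_t sz)
    {
        ++array_news;
        return ::operator new[](sz);     // defer to the global version
    }
    static void operator delete[](void* p) noexcept
    {
        ++array_deletes;
        ::operator delete[](p);
    }
};

int Tracked::array_news = 0;
int Tracked::array_deletes = 0;

void allocate_and_free()
{
    Tracked* t = new Tracked[4];   // uses Tracked::operator new[]
    delete [] t;                   // uses Tracked::operator delete[]
    int* plain = new int[4];       // other types still use the global versions
    delete [] plain;
}
```

Defining the functions inside the class limits their effect to arrays of that class, which avoids the initialization-order hazards of replacing the global versions.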
Considering this feature and the one described in the next section, there are six varieties each of new and delete:
regular + throws exception
regular + doesn't throw exception
array + throws exception
array + doesn't throw exception
placement + doesn't throw exception
placement + array + doesn't throw exception