Wednesday, July 20, 2011

Static code analysis for verification of 64-bit applications

Abstract

The arrival of 64-bit processors on the PC market poses a problem that developers have to solve: old 32-bit applications must be ported to the new platform. After such code migration an application may behave incorrectly. This article discusses the development and application of a static code analyzer for checking the correctness of such applications. It also considers some problems that emerge in applications after recompilation for 64-bit systems, as well as the rules by which the code check is performed.
This article contains various examples of 64-bit errors. However, since we started writing it we have learned about many more examples and error types that are not included here. Please see the article "A Collection of Examples of 64-bit Errors in Real Programs", which covers the defects we know of in 64-bit programs most thoroughly. We also recommend studying the course "Lessons on development of 64-bit C/C++ applications", where we describe the methodology of creating correct 64-bit code and searching for all types of defects using the Viva64 code analyzer.

1. Introduction

Mass production of 64-bit processors and their wide availability have led developers to the need to create 64-bit versions of their programs. For users to get real advantages of the new processors, applications must be recompiled to support 64-bit architectures. In theory, this process should not cause any problems. In practice, however, an application often does not function as expected after recompilation. This may show up in different ways: from data file corruption to a broken help system. The cause of such behavior is the change of the size of the base data types on 64-bit processors or, more exactly, the change of the ratios between type sizes. That is why the main problems of code migration appear in applications developed in languages like C or C++. In languages with a strictly structured type system (for example, the .NET Framework languages) such problems usually do not occur.
So, what is the problem with exactly these languages? The point is that even all the high-level constructs and C++ libraries are ultimately implemented using low-level data types, such as a pointer, a machine word, etc. When the architecture changes and these data types change too, the behavior of the program may change as well.
To be sure that the program works correctly on the new platform, it would be necessary to check the whole code manually and verify its correctness. However, a complete manual check of a real commercial application is impossible simply because of its size.

2. Examples of problems arising when code is ported to 64-bit platforms

Here are some examples illustrating the appearance of new errors in an application after code migration to a 64-bit platform. Other examples may be found in other articles [1, 2].
When the amount of memory necessary for an array was calculated, a constant type size was used. On a 64-bit system this size changed, but the code remained the same:
size_t ArraySize = N * 4;
intptr_t *Array = (intptr_t *)malloc(ArraySize); 
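A possible correction (a sketch, assuming N is defined as in the fragment above) derives the element size from the type itself instead of the hard-coded constant 4, so the allocation grows automatically on a 64-bit platform:
size_t ArraySize = N * sizeof(intptr_t);          // element size taken from the type, not a magic 4
intptr_t *Array = (intptr_t *)malloc(ArraySize);  // each element now gets 8 bytes on a 64-bit system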
Some function returned the value -1 of type size_t if an error occurred. The check of the result was written in the following way:
size_t result = func();
if (result == 0xffffffffu) {
// error
}
On a 64-bit system the value -1 of this type is different from 0xffffffff and the check does not work.
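A portable variant of the check (a sketch; func() is the hypothetical function from the example) compares the result against -1 converted to size_t, which works on both 32-bit and 64-bit platforms:
size_t result = func();
if (result == (size_t)(-1)) {  // equivalently, SIZE_MAX from <stdint.h>
  // error
}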
Pointer arithmetic is a permanent source of problems, and in the case of 64-bit applications some new problems are added to the already existing ones. Consider the example:
unsigned a16, b16, c16;
char *pointer;
...             
pointer += a16 * b16 * c16;
As we can see, the increment of the pointer can never exceed 4 gigabytes, because the product of the three 32-bit values is computed in a 32-bit type. Modern compilers do not diagnose this even with a warning, and in the future it renders the program unable to work. There are many more examples of potentially dangerous code.
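A minimal sketch of a fix: force the multiplication to be carried out in a memsize type so that the intermediate product is not truncated to 32 bits:
unsigned a16, b16, c16;
char *pointer;
...
pointer += (ptrdiff_t)a16 * b16 * c16;  // the product is now computed in a 64-bit type on a 64-bit platform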
All these and many other errors have been discovered in real applications during migration to the 64-bit platform.

3. A review of existing solutions

There are different approaches to ensuring the correctness of application code. Let's enumerate the most widespread ones: unit testing, dynamic code analysis (performed while an application is running), and static code analysis (analysis of the source code). No one can claim that one of these kinds of testing is better than the others; they all address different aspects of application quality.
Unit tests are meant for quick checking of small sections of code, for instance, single functions and classes [3]. Their peculiarity is that they execute quickly and can be run often. This brings up two nuances of using this technology. The first is that these tests must be written. Second, testing large amounts of memory (for example, more than two gigabytes) takes a long time, which is not expedient because unit tests must work fast.
Dynamic code analyzers (the best-known representative being Compuware BoundsChecker) are meant to find errors in an application while it is running. This principle of work determines the main disadvantage of a dynamic analyzer: to make sure the program is correct, it is necessary to cover all possible code branches, which may be difficult for a real program. But this does not mean that a dynamic code analyzer is useless: such analysis allows discovering errors that depend on the actions of the user and cannot be found through the application code alone.
Static code analyzers (for instance Gimpel Software PC-lint and Parasoft C++test) are meant for comprehensive quality assurance of code and contain several hundred analyzed rules [4]. They also contain some rules which analyze the correctness of 64-bit applications. However, they are general-purpose code analyzers, so using them to ensure 64-bit application quality is not always appropriate: they were simply not designed for this purpose. Another serious disadvantage is their orientation toward the data model used in Unix systems (LP64), while the data model used in Windows systems (LLP64) is quite different. That's why static analyzers can be used for checking 64-bit Windows applications only after unobvious additional configuration.
The presence of a special diagnostic system for potentially incorrect code (for instance, the /Wp64 switch of the Microsoft Visual C++ compiler) can be considered an additional level of code checking. However, this switch makes it possible to track only the most incorrect constructions, while it leaves out many other dangerous operations.
A question arises: "Is it really necessary to check the code while migrating to 64-bit systems if there are only a few such errors in the application?" We believe this checking is necessary, at least because large companies (such as IBM and Hewlett-Packard) have published articles [2] on their sites devoted to the errors that appear when code is ported.

4. The Rules of Code Correctness Analysis

We have formulated 10 rules for searching for C++ language constructions that are dangerous from the point of view of code migration to 64-bit systems.
In the rules we use a specially introduced memsize type. By this we mean any simple integer type capable of storing a pointer and changing its size when the digit capacity of a platform changes from 32 to 64 bits. Examples of memsize types are size_t, ptrdiff_t, all pointers, intptr_t, INT_PTR, and DWORD_PTR.
Now let's list the rules themselves and give some examples of their application.

RULE 1.

Implicit and explicit conversions of 32-bit integer types to memsize types should be considered dangerous:
unsigned a;
size_t b = a;
array[a] = 1;
The exceptions are:
1) The converted 32-bit integer type is the result of an expression in which fewer than 32 bits are required to represent its value:
unsigned short a;
unsigned char b;
size_t c = a * b;
At the same time, the expression must not consist only of numeric literals:
size_t a = 100 * 100 * 100;
2) The converted 32-bit type is represented by a numeric literal:
size_t a = 1;
size_t b = 'G';

RULE 2.

Implicit and explicit conversions of memsize types to 32-bit integer types should be considered dangerous:
size_t a;       
unsigned b = a;
An exception: the converted size_t value is the result of the sizeof() operator:
int a = sizeof(float);

RULE 3.

A virtual function should also be considered dangerous if it meets the following conditions:
a) The function is declared in the base class and in the derived class.
b) The types of the function arguments do not coincide, but they are equivalent to each other on a 32-bit system (for example: unsigned, size_t) and not equivalent on a 64-bit one.
class Base {
  virtual void foo(size_t);
};
class Derive : public Base {
  virtual void foo(unsigned);
};

RULE 4.

Calls of overloaded functions with an argument of memsize type should be considered dangerous when the function is overloaded for both 32-bit and 64-bit data types:
void WriteValue(__int32);
void WriteValue(__int64);
...
ptrdiff_t value;
WriteValue(value);

RULE 5.

Explicit conversion of one pointer type to another should be considered dangerous if one of them points to a 32/64-bit type and the other to a memsize type:
int *array;        
size_t *sizetPtr = (size_t *)(array);

RULE 6.

Explicit and implicit conversions of memsize types to the double type and vice versa should be considered dangerous:
size_t a;
double b = a;

RULE 7.

Passing a memsize-type value to a function with a variable number of arguments should be considered dangerous:
size_t a;
printf("%u", a);

RULE 8.

The use of the series of magic constants (4, 32, 0x7fffffff, 0x80000000, 0xffffffff) should be considered dangerous:
size_t values[ARRAY_SIZE];
memset(values, ARRAY_SIZE * 4, 0);
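A hedged correction of the example: the byte count is derived from sizeof instead of the magic constant 4 (note also that the fill value and the byte count are given in the order memset expects):
size_t values[ARRAY_SIZE];
memset(values, 0, ARRAY_SIZE * sizeof(values[0]));  // or simply sizeof(values)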

RULE 9.

The presence of memsize-type members in unions should be considered dangerous:
union PtrNumUnion {
  char *m_p;
  unsigned m_n;
} u;
...
u.m_p = str;
u.m_n += delta;
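A possible correction (a sketch): store the integer view of the pointer in a memsize type so that both union members have the same size under LLP64 as well:
union PtrNumUnion {
  char *m_p;
  uintptr_t m_n;  // matches the pointer size on both 32-bit and 64-bit platforms
} u;
...
u.m_p = str;
u.m_n += delta;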

RULE 10.

Generation and processing of exceptions using memsize types should be considered dangerous:
char *p1, *p2;
try {
  throw (p1 - p2);
}
catch (int) {
  ...
}
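A hedged correction: the difference of two pointers already has type ptrdiff_t, so it should be caught as ptrdiff_t rather than truncated to int:
char *p1, *p2;
try {
  throw (p1 - p2);    // the difference of two pointers has type ptrdiff_t
}
catch (ptrdiff_t) {   // catch the memsize type instead of int
  ...
}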
It is necessary to note that rule 1 covers not only type conversion during assignment, but also when a function is called, when an array is indexed, and in pointer arithmetic. These rules (the first as well as the others) describe a larger class of errors than the given examples. In other words, the given examples only illustrate some particular situations in which these rules apply.
The described rules are implemented in the static code analyzer Viva64. The principle of its operation is covered in the following part.

5. Analyzer architecture

The work of the analyzer consists of several stages, some of which are typical of common C++ compilers (Picture 1).
Picture 1. Analyzer architecture.

The input of the analyzer is a file with source code, and the result of its work is a report about potential code errors (with line numbers attached). The stages of the analyzer's work are the following: preprocessing, parsing, and the analysis itself.
At the preprocessing stage the files included by means of the #include directive are inserted, and the parameters of conditional compilation (#ifdef/#endif) are processed.
Parsing of a file produces an abstract syntax tree with the information necessary for the further analysis. Let's take up a simple example:
int A, B;
ptrdiff_t C;
C = B * A;
This code has a potential problem related to different data types. The variable C can never receive a value outside the 2-gigabyte range, because the product B * A is computed in the 32-bit int type, and in some situations this is incorrect. The analyzer must report a potentially incorrect construction in the line "C = B * A". There are several ways to correct this code. If the variables B and A cannot hold values outside the 2-gigabyte range, but the variable C can, the expression should be written in the following way:
C =  (ptrdiff_t)(B) * (ptrdiff_t)(A);
But if the variables A and B can hold large values on a 64-bit system, we should declare them with the ptrdiff_t type:
ptrdiff_t A;
ptrdiff_t B;
ptrdiff_t C;
C = B * A;
Let's see how all this can be performed at the parsing stage.
First, an abstract syntax tree is constructed for the code (picture 2).
Picture 2. Abstract syntax tree.
Then it is necessary to determine the types of the variables that participate in the evaluation of the expression. For this purpose some auxiliary information is used; it was gathered while the tree was being constructed (the type storage module). This is shown in Picture 3.
Picture 3. Type Information storage.

After the types of all the variables participating in the expression are determined, it is necessary to calculate the resulting types of the subexpressions. In the given example it is necessary to determine the type of the result of the intermediate expression "B * A". This can be done by means of the type evaluation module, as shown in Picture 4.
Picture 4. Expression type evaluation.
Then the resulting type of the whole expression is checked (the "=" operation in the given example), and in the case of a type conflict the construction is marked as potentially dangerous. There is such a conflict in the given example, because the variable C occupies 64 bits (on the 64-bit system) while the result of the expression "B * A" occupies only 32 bits.
The analysis by the other rules is performed in a similar way, because almost all of them are related to checking the types of one or another parameter.
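As a rough illustration of this stage, here is a minimal sketch of our own (not the actual Viva64 implementation) of how such a check over the syntax tree might look: the resulting type category of a subexpression is computed from its operands, and an assignment of a 32-bit result to a memsize variable is reported as suspicious. A real analyzer would, of course, track many more type categories and the exception cases listed in the rules above.
#include <stdio.h>

// Simplified type categories: a real analyzer distinguishes many more.
enum TypeKind { Int32, MemSize };

struct Node {
  const char *op;       // "*", "=", ... ; 0 for a leaf (a variable)
  TypeKind    type;     // meaningful for leaves only
  Node       *left, *right;
  int         line;     // source line the node comes from
};

// Compute the resulting type category of a subtree: the result is a memsize
// type only if at least one operand is a memsize type.
TypeKind EvalType(const Node *n) {
  if (n->op == 0)
    return n->type;
  TypeKind l = EvalType(n->left);
  TypeKind r = EvalType(n->right);
  return (l == MemSize || r == MemSize) ? MemSize : Int32;
}

// Rule 1 check for an assignment node: a 32-bit value assigned to a memsize lvalue.
void CheckAssignment(const Node *assign) {
  if (EvalType(assign->left) == MemSize && EvalType(assign->right) == Int32)
    printf("line %d: potentially dangerous conversion of a 32-bit value "
           "to a memsize type\n", assign->line);
}

int main() {
  // The tree for "C = B * A" from the example above.
  Node A = { 0, Int32, 0, 0, 3 }, B = { 0, Int32, 0, 0, 3 };
  Node C = { 0, MemSize, 0, 0, 3 };              // C is of type ptrdiff_t
  Node mul    = { "*", Int32, &B, &A, 3 };
  Node assign = { "=", Int32, &C, &mul, 3 };
  CheckAssignment(&assign);                      // reports a warning for line 3
  return 0;
}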

6. Results

Most of the code analysis methods described in this article are implemented in the commercial static code analyzer Viva64. The use of this analyzer on real projects has proved the expediency of code checking while developing 64-bit applications: real code errors can be discovered much more quickly with this analyzer than by ordinary review of the source code.

References

  1. J. P. Mueller. "24 Considerations for Moving Your Application to a 64-bit Platform", DevX.com, June 30, 2006.
  2. Hewlett-Packard, "Transitioning C and C++ programs to the 64-bit data model".
  3. S. Sokolov, "Bulletproofing C++ Code", Dr. Dobb's Journal, January 09, 2007.
  4. S. Meyers, M. Klaus, "A First Look at C++ Program Analyzer", Dr. Dobb's Journal, Feb. Issue, 1997.

Using Static Analysis in Program Development

Abstract

Static analysis allows checking program code before the tested program is executed. The static analysis process consists of three steps. First, the analyzed program code is split into tokens, i.e. constants, identifiers, reserved symbols, etc. This operation is performed by the lexer. Second, the tokens are passed to the parser, which builds an abstract syntax tree (AST) based on them. Finally, static analysis is performed over the AST. This article describes three static analysis techniques: AST walker analysis, data flow analysis, and path-sensitive data flow analysis.

Introduction

Application testing is an important part of the software development process. There are many different types of software testing. Among them, two involve the application's code: static analysis and dynamic analysis.

Dynamic analysis is performed on the executable code of a compiled program. It checks only user-specific behavior, that is, only the code executed during a test. A dynamic analyzer can provide the developer with information on memory leaks, program performance, the call stack, etc.

Static analysis allows checking program code before the tested program is executed. The compiler always performs static analysis during the compilation process. However, in large real-life projects it is often necessary to make the entire source code meet some additional requirements. These additional requirements may vary from variable naming to portability (for example, the code should execute successfully on both x86 and x64 platforms). The most common requirements are:

  • Reliability - fewer bugs in the tested program.
  • Maintainability - better understanding of the source code by others, so that it is easier to upgrade or change it.
  • Testability - shorter testing time due to a more effective testing process.
  • Portability - flexibility when the tested program is launched on different hardware platforms (for example, x86 and x64, as mentioned above).
  • Readability - better understanding of the code by others and therefore shorter review times and easier code reading [1].

All requirements can be divided into two categories: rules and guidelines. Rules describe what is mandatory, while guidelines describe what is recommended (by analogy with errors and warnings produced by built-in code analyzers of standard compilers).

Rules and guidelines form a coding standard. A coding standard defines the way a developer must and should write program code.

A static analyzer finds source code lines that presumably do not fulfill the specified coding standard and displays diagnostic messages so that the developer can understand what is wrong with these lines. The static analysis process is similar to compilation except that no executable or object code is generated. This article describes the static analysis process step by step.

The Analysis Process

The static analysis process consists of two steps: abstract syntax tree creation and abstract syntax tree analysis.

In order to analyze source code, a static analysis tool should "understand" the code, that is, parse it and create a structure describing the code in a convenient form. This form is called an abstract syntax tree (often referred to as AST). To check whether the source code fulfils a coding standard, this tree must be built.

In the general case, an abstract syntax tree is built only for the analyzed fragment of source code (for example, for a specific function). Before the tree can be built, the source code is processed first by a lexer and then by a parser.

The lexer is responsible for dividing the input stream into individual tokens, identifying the type of the tokens, and passing tokens one at a time to the next stage of the analysis. The lexer reads text data line by line and splits a line to reserved words, identifiers and constants, which are called tokens. After a token is retrieved, the lexer identifies the type of the token.

If the first character of the token is a digit, the token is a number; if the first character is a minus sign, it is a negative number. If the token is a number, it might be a real or an integer: if it contains a decimal point or the letter E (which indicates scientific notation) then it is a real, otherwise it is an integer. Note that this could mask a lexical error. If the analyzed source code contains the token "4xyz", the lexer will turn it into the integer 4. It is likely that such an error will cause a syntax error which the parser can catch. However, such errors can also be processed by the lexer.

If the token is not a number it could be a string. String constants can be identified by quote marks, single quote marks or other symbols, depending on the syntax of the analyzed language.

Finally, if the token is not a string, it must be an identifier, a reserved word, or a reserved symbol. If the token is not identified as one of them, it is a lexical error. The lexer does not handle errors itself, so it simply notifies the parser that an unidentified token type has been found. The parser will handle the error[2].

The parser has an understanding of the language's grammar. It is responsible for identifying syntax errors and for translating an error-free program into internal data structures, abstract syntax trees, that can be processed by the static analyzer.

While the lexer knows only the language's syntax, the parser also recognizes context. For example, let's declare a C function:

int Func(){return 0;}

The lexer will process this line in the following way (see Table 1):

int      - reserved word
Func     - identifier
(        - reserved symbol
)        - reserved symbol
{        - reserved symbol
return   - reserved word
0        - integer constant
;        - reserved symbol
}        - reserved symbol

Table 1. Tokens of the "int Func(){return 0;}" string.

The line will be identified as 9 correct tokens and these tokens will be passed to the parser. The parser will check the context and find out that it is a declaration of a function which takes no parameters, returns an integer, and always returns 0.

The parser finds this out by creating an abstract syntax tree from the tokens provided by the lexer and analyzing the tree. If the tokens and the tree built from them are considered correct, the tree will be used for the static analysis. Otherwise, the parser reports an error.

However, building an abstract syntax tree is not just organizing a set of tokens in a tree form.

Abstract Syntax Trees

An abstract syntax tree captures the essential structure of the input in a tree form, while omitting unnecessary syntactic details. ASTs can be distinguished from concrete syntax trees by their omission of tree nodes to represent punctuation marks such as semi-colons to terminate statements or commas to separate function arguments. ASTs also omit tree nodes that represent unary productions in the grammar. Such information is directly represented in ASTs by the structure of the tree.

ASTs can be created with hand-written parsers or by code produced by parser generators. ASTs are generally created bottom-up.

When designing the nodes of the tree, a common design choice is determining the granularity of the representation of the AST, that is, whether all constructs of the source language are represented as different types of AST nodes or whether some constructs are represented with a common type of AST node and differentiated using a value. One example of choosing the granularity of representation is determining how to represent binary arithmetic operations. One choice is to have a single binary operation tree node which has the operation as one of its attributes, e.g. "+". The other choice is to have a tree node for every binary operation. In an object-oriented language this would result in classes like AddBinary, SubtractBinary, MultiplyBinary, etc., with an abstract base class Binary [3].
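A brief C++ sketch of these two design choices (the type names here are ours, for illustration only):
// Choice 1: a single node type; the operation is stored as an attribute.
enum BinaryOp { Add, Subtract, Multiply };
struct Expr;                        // base type of all expression nodes
struct BinaryNode {
  BinaryOp op;
  Expr *left, *right;
};

// Choice 2: a separate node class per operation with a common abstract base.
struct Binary {
  Expr *left, *right;
  virtual ~Binary() {}
};
struct AddBinary      : Binary {};
struct SubtractBinary : Binary {};
struct MultiplyBinary : Binary {};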

For example, let us parse two expressions: 1 + 2 * 3 + 4 * 5 and 1 + 2 * (3 + 4) * 5 (see Figure 1):

Figure 1. Parsed expressions: 1 + 2 * 3 + 4 * 5 (left) and 1 + 2 * (3 + 4) * 5 (right).

As one can see from the figure, the expression can be restored to its original form if you walk the tree from left to right.

After the abstract syntax tree is created and verified, the static analyzer will be able to check, whether the source code fulfils the rules and guidelines specified by the code standard.

Static Analysis Techniques

There are many different analysis techniques, such as AST walker analysis, data flow analysis, path-sensitive data flow analysis, etc. Concrete implementations of these techniques vary from tool to tool. Static analysis tools for different programming languages can be based on various analysis frameworks. These frameworks contain core sets of common techniques which can be used in static analysis tools so that these tools reuse the same infrastructure. The supported analysis techniques and the way they are implemented vary from framework to framework. For example, a framework may provide an easy way to create an AST walker analyzer but have no support for data flow analysis [4].

Although all three of the above-mentioned analysis techniques use the AST created by the parser, they differ in their algorithms and purposes.

AST walker analysis, as one can see from the term, is performed by walking the AST and checking whether it fulfils the coding standard specified as a set of rules and guidelines. This is the kind of analysis performed by compilers.

Data flow analysis can be described as a process of collecting information about the use, definition, and dependencies of data in programs. The data flow analysis algorithm operates on a control flow graph (CFG) generated from the source code AST. The CFG represents all possible execution paths of a given program: the nodes represent pieces of code and the edges represent possible control transfers between these pieces. Since the analysis is performed without executing the tested program, it is impossible to determine the exact output of the program, i.e. to find out which execution path in the control flow graph is actually taken. That is why data flow analysis algorithms approximate this behavior, for example, by considering both branches of an if-then-else statement and by performing a fixed-point computation for the body of a while statement. Such a fixed point always exists: the data flow equations compute sets of variables, and since only programs with a finite number of statements and variables are considered, there is a finite upper limit to the size of the computed sets. In terms of control flow graphs, static analysis means that all possible execution paths are considered to be actual execution paths. The result of this assumption is that one can only obtain approximate solutions for certain data flow analysis problems [5].
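To make the fixed-point idea concrete, here is a small hedged sketch (the data structures are invented for the example and are not taken from any particular tool) of a classic data flow problem, liveness analysis: each node's sets are recomputed from its successors until nothing changes, which terminates because the sets can only grow and are bounded by the finite number of variables:
#include <set>
#include <string>
#include <vector>

struct CfgNode {
  std::set<std::string> use, def;  // variables read / written in this node
  std::vector<int>      succ;      // indices of successor nodes in the CFG
  std::set<std::string> liveIn, liveOut;
};

// Classic liveness analysis: iterate the data flow equations until a fixed
// point is reached (the sets stop changing).
void ComputeLiveness(std::vector<CfgNode> &cfg) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < cfg.size(); ++i) {
      CfgNode &n = cfg[i];
      // liveOut = union of the successors' liveIn sets
      std::set<std::string> out;
      for (size_t s = 0; s < n.succ.size(); ++s) {
        const std::set<std::string> &succIn = cfg[n.succ[s]].liveIn;
        out.insert(succIn.begin(), succIn.end());
      }
      // liveIn = use U (liveOut \ def)
      std::set<std::string> in = n.use;
      for (std::set<std::string>::const_iterator it = out.begin();
           it != out.end(); ++it)
        if (n.def.count(*it) == 0)
          in.insert(*it);
      if (in != n.liveIn || out != n.liveOut) {
        n.liveIn = in;
        n.liveOut = out;
        changed = true;
      }
    }
  }
}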

The data flow analysis algorithm described above is path-insensitive, because it contributes all execution paths - whether feasible or infeasible, heavily or rarely executed - to the solution. However, programs execute only a small fraction of their potential paths, and execution time and cost are usually concentrated in a far smaller subset of hot paths. Therefore, it is natural to reduce the analyzed CFG, and thereby the amount of computation, so that only a subset of the CFG paths is analyzed. Path-sensitive analysis operates on a reduced CFG which does not include infeasible paths and does not contain "dangerous" code. The path selection criteria differ from tool to tool. For example, a tool may analyze only the paths containing dynamic array declarations, which are considered "dangerous" according to the tool's settings.

Conclusion

The number of static analysis tools and techniques grows from year to year, which proves the growing interest in static analyzers. The cause of the interest is that software under development becomes more and more complex, and it therefore becomes impossible for developers to check the source code manually.

This article gave a brief description of the static analysis process and analysis techniques.

References

  1. Dirk Giesen. Philosophy and practical implementation of static analyzer tools [Electronic resource]. 1998. Access mode: http://www.viva64.com/go.php?url=63
  2. James Alan Farrell. Compiler Basics [Electronic resource]. 1995. Access mode: http://www.viva64.com/go.php?url=64
  3. Joel Jones. Abstract syntax tree implementation idioms. Proceedings of the 10th Conference on Pattern Languages of Programs, 2003.
  4. Ciera Nicole Christopher. Evaluating Static Analysis Frameworks [Electronic resource]. 2006. Access mode: http://www.viva64.com/go.php?url=64
  5. Leon Moonen. A Generic Architecture for Data Flow Analysis to Support Reverse Engineering. Proceedings of the 2nd International Workshop on the Theory and Practice of Algebraic Specifications, 1997.


About size_t and ptrdiff_t

Abstract

The article will help readers understand what the size_t and ptrdiff_t types are, what they are used for, and when they must be used. It will be interesting for developers who are beginning to create 64-bit applications, where the use of the size_t and ptrdiff_t types provides high performance, the ability to operate on large amounts of data, and portability between different platforms.

Introduction

Before we begin, I would like to note that the definitions and recommendations given in this article refer to the most popular architectures at the moment (IA-32, Intel 64, IA-64) and may not fully apply to some exotic architectures.
The types size_t and ptrdiff_t were created to perform correct address arithmetic. For a long time it was assumed that the size of int coincides with the size of a machine word (the microprocessor's capacity) and that it can be used as an index and to store the sizes of objects or pointers. Correspondingly, address arithmetic was built with the int and unsigned types as well. The int type is used in most teaching materials on C and C++ programming in loop bodies and as an index. The following example is nearly a canon:
for (int i = 0; i < n; i++)
  a[i] = 0;
As microprocessors developed and their capacity increased, it became irrational to keep increasing the size of the int type. There are many reasons for that: saving the memory used, maximum portability, etc. As a result, several data models appeared, declaring the relations between the sizes of the C/C++ base types. Table N1 shows the main data models and lists the most popular systems using them.
Table N1. Data models

As you can see from the table, it is not so easy to choose a variable's type to store a pointer or an object's size. The types size_t and ptrdiff_t were created as the smartest solution to this problem; they are guaranteed to be usable for address arithmetic. And now the following code must become the canon:
for (ptrdiff_t i = 0; i < n; i++)
  a[i] = 0;
It is this code that can provide safety, portability and good performance. The rest of the article explains why.

size_t type

size_t is a base unsigned integer type of the C and C++ languages. It is the type of the result returned by the sizeof operator. The type's size is chosen so that it can store the maximum size of a theoretically possible array of any type. On a 32-bit system size_t takes 32 bits, on a 64-bit one, 64 bits. In other words, a variable of size_t type can safely store a pointer. An exception is pointers to class member functions, but this is a special case. Although size_t can store a pointer, it is better to use another unsigned integer type, uintptr_t, for that purpose (its name reflects this capability); on the architectures considered here size_t and uintptr_t have the same size. size_t is usually used for loop counters, array indexing, and address arithmetic.
The maximum possible value of size_t is the constant SIZE_MAX.

ptrdiff_t type

ptrdiff_t is a base signed integer type of the C and C++ languages. The type's size is chosen so that it can store the maximum size of a theoretically possible array of any type. On a 32-bit system ptrdiff_t takes 32 bits, on a 64-bit one, 64 bits. As with size_t, a pointer can be safely stored in ptrdiff_t, except for a pointer to a class member function. ptrdiff_t is also the type of the result of an expression where one pointer is subtracted from another (ptr1 - ptr2). ptrdiff_t is usually used for loop counters, array indexing, size storage, and address arithmetic. On these architectures ptrdiff_t has the same size as intptr_t, whose name indicates more clearly that it can store a pointer.
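A minimal illustration of both types (the array and the pointers are ours, for the example only): sizeof yields a size_t value, while subtraction of pointers yields a ptrdiff_t value:
int array[100];
size_t    n    = sizeof(array) / sizeof(array[0]);  // 100; sizeof yields size_t
int      *p1   = array, *p2 = array + 50;
ptrdiff_t diff = p2 - p1;                           // 50; pointer subtraction yields ptrdiff_t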

Portability of size_t and ptrdiff_t

The types size_t and ptrdiff_t enable you to write well-portable code. Code created with size_t and ptrdiff_t is easily portable: the size of size_t and ptrdiff_t always coincides with the pointer's size. Because of this, it is these types that should be used as indexes of large arrays, for storing pointers, and for pointer arithmetic.
Linux application developers often use the long type for these purposes. Within the framework of the 32-bit and 64-bit data models accepted in Linux this really works: long's size coincides with the pointer's size. But this code is incompatible with the Windows data model, and consequently you cannot consider it easily portable. A more correct solution is to use the types size_t and ptrdiff_t.
As an alternative to size_t and ptrdiff_t, Windows developers can use the types DWORD_PTR, SIZE_T, SSIZE_T, etc. But it is still preferable to confine yourself to size_t and ptrdiff_t.

Safety of ptrdiff_t and size_t types in address arithmetic

Address arithmetic issues have come up very frequently since the adoption of 64-bit systems began. Most problems of porting 32-bit applications to 64-bit systems relate to the use of types such as int and long, which are unsuitable for working with pointers and arrays. The problems of porting applications to 64-bit systems are not limited to this, but most errors relate to address arithmetic and operations with indexes.
Here is the simplest example:
size_t n = ...;
for (unsigned i = 0; i < n; i++)
  a[i] = 0;
If we deal with an array consisting of more than UINT_MAX items, this code is incorrect. It is not easy to detect the error and predict the behavior of this code. The debug version will hang, but hardly will anyone process gigabytes of data in the debug version. And the release version, depending on the optimization settings and the peculiarities of the code, can either hang or suddenly fill all the array cells correctly, thus producing the illusion of correct operation. As a result, floating errors appear in the program, occurring and vanishing with the subtlest change of the code. To learn more about such phantom errors and their dangerous consequences, see the article "A 64-bit horse that can count" [1].
Here is another example of a "sleeping" error which occurs at a particular combination of the input data (the values of the variables A and B):
int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Error
printf("%i\n", *ptr);
This code executes correctly in the 32-bit version and prints the number "3". After compilation in 64-bit mode, the code fails when executed. Let's examine the sequence of code execution and the cause of the error:
  • The variable A of int type is cast to unsigned;
  • A and B are summed; as a result, we get the value 0xFFFFFFFF of unsigned type;
  • The expression "ptr + 0xFFFFFFFFu" is calculated. The result depends on the pointer's size on the current platform. In the 32-bit program, the expression is equal to "ptr - 1" and we successfully print the number 3. In the 64-bit program, the value 0xFFFFFFFFu is added to the pointer, and as a result the pointer ends up far beyond the array's limits.
Such errors can be easily avoided by using the size_t or ptrdiff_t types. In the first case, if the type of the variable "i" is size_t, there will be no infinite loop. In the second case, if we use size_t or ptrdiff_t for the variables "A" and "B", the number "3" is printed correctly.
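A hedged corrected version of the second example (the first example is fixed simply by declaring "i" as size_t):
ptrdiff_t A = -2;
ptrdiff_t B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B);   // A + B equals -1 and has type ptrdiff_t, so ptr points to array[2]
printf("%i\n", *ptr);  // prints 3 in both 32-bit and 64-bit modes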
Let's formulate a guideline: wherever you deal with pointers or arrays you should use size_t and ptrdiff_t types.
To learn more about the errors you can avoid by using the size_t and ptrdiff_t types, see the articles [2, 3, 4].

Performance of code using ptrdiff_t and size_t

Besides code safety, using the ptrdiff_t and size_t types in address arithmetic can give you an additional gain in performance. For example, using the int type as an index, when its capacity differs from that of the pointer, leads to additional data conversion instructions in the binary code. We are speaking about 64-bit code, where the pointer size is 64 bits while the int type remains 32 bits.
It is a difficult task to give a brief example of size_t's advantage over unsigned. To be objective we should use the compiler's optimizing abilities, and the two variants of the optimized code frequently become too different to demonstrate exactly this difference. We managed to create something like a simple example only on the sixth try. And still the example is not ideal, because it demonstrates not the unnecessary data type conversions we spoke about above, but rather the fact that the compiler can build more efficient code when using size_t. Let's consider a program code that arranges an array's items in reverse order:
unsigned arraySize;
float *array;
...
for (unsigned i = 0; i < arraySize / 2; i++)
{
  float value = array[i];
  array[i] = array[arraySize - i - 1];
  array[arraySize - i - 1] = value;
}
In the example, "arraySize" and "i" variables have unsigned type. This type can be easily replaced with size_t type, and now compare a small fragment of assembler code shown on Figure 1.
Figure N1. Comparison of 64-bit assembler code when using unsigned and size_t types
The compiler managed to build more laconic code when using 64-bit registers. I am not claiming that code created with the unsigned type will always run slower than code using size_t; it is very difficult to compare the execution speed of code on modern processors. But the example shows that when the compiler operates on arrays using 64-bit types, it can build shorter and faster code.
Proceeding from my own experience, I can say that a reasonable replacement of int and unsigned with ptrdiff_t and size_t can give an additional performance gain of up to 10% on a 64-bit system. You can see an example of the speed increase gained by using ptrdiff_t and size_t in the fourth section of the article "Development of Resource-intensive Applications in Visual C++" [5].

Code refactoring with the purpose of moving to ptrdiff_t and size_t

As the reader can see, using the ptrdiff_t and size_t types gives some advantages for 64-bit programs. However, simply replacing all unsigned types with size_t is not a good way out. First, it does not guarantee correct operation of the program on a 64-bit system. Second, it is most likely that due to this replacement new errors will appear, data format compatibility will be violated, and so on. You should not forget that after this replacement the amount of memory needed by the program will also greatly increase, and the increase in the necessary memory will slow the application down, since the cache will hold fewer of the objects being processed.
Consequently, introducing the ptrdiff_t and size_t types into old code is a task of gradual refactoring demanding a great amount of time. In effect, you need to look through the whole code and make the necessary alterations. Actually, this approach is too expensive and inefficient. There are two practical options:
  1. To use specialized tools like Viva64, included in PVS-Studio. Viva64 is a static code analyzer that detects the sections where it is reasonable to replace data types for the program to become correct and work efficiently on 64-bit systems. To learn more, see the "PVS-Studio Tutorial" [6].
  2. If you do not plan to adapt a 32-bit program for 64-bit systems, there is no sense in refactoring the data types. A 32-bit program will not benefit in any way from using the ptrdiff_t and size_t types.

References

  1. Andrey Karpov. A 64-bit horse that can count. http://www.viva64.com/art-1-2-377673569.html
  2. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform. http://www.viva64.com/art-1-2-599168895.html
  3. Andrey Karpov. Safety of 64-bit code. http://www.viva64.com/art-1-2-416605136.html
  4. Andrey Karpov, Evgeniy Ryzhkov. Traps detection during migration of C and C++ code to 64-bit Windows. http://www.viva64.com/art-1-2-2140958669.html
  5. Andrey Karpov, Evgeniy Ryzhkov. Development of Resource-intensive Applications in Visual C++. http://www.viva64.com/art-1-2-2014169752.html
  6. Evgeniy Ryzhkov. PVS-Studio Tutorial. http://www.viva64.com/art-4-2-747004748.html

Development of resource-intensive applications in Visual C++

Abstract

The article will familiarize application developers with the tasks posed by the mass introduction of 64-bit multi-core processors, which symbolize a revolutionary increase in the computing power available to the average user. It will also touch upon the problems of effective use of hardware resources for solving everyday applied tasks within the limits of the Windows x64 operating system.

Information for readers

By default, in this article we mean Windows as the operating system. By 64-bit systems we mean the x86-64 (AMD64) architecture, and by the development environment, Visual Studio 2005/2008. You may download the demo sample discussed in the article using this link: http://www.viva64.com/articles/testspeedexp.zip.

Introduction

Parallel computing and large amounts of RAM are now available not only for large hardware-software complexes meant for large-scale scientific computing, but are also used for solving everyday tasks related to work, study, entertainment and computer games.
The possibility of parallelization and large RAM sizes, on the one hand, make the development of resource-intensive applications easier, but on the other hand require more qualification and knowledge in the area of parallel programming from the programmer. Unfortunately, a lot of developers are far from possessing such qualification and knowledge, not because they are bad developers but because they simply haven't come across such tasks. This is not surprising, since until recently the creation of parallel information-processing systems had mostly been performed in scientific institutions while solving tasks of modeling and forecasting. Parallel computer complexes with large memory were also used by enterprises, banks, etc. for solving applied tasks, but they were rather expensive, and very few developers were able to get acquainted with the peculiarities of developing software for such systems.
The authors of this article have taken part in the development of resource-intensive software products related to visualization and modeling of physical processes, and have learned first-hand the peculiarities of developing, testing and debugging such systems. By resource-intensive software we mean program code that efficiently uses the capabilities of multiprocessor systems and large amounts of memory (2 GB and more). That's why we'd like to share some knowledge with developers who may find it useful when creating modern parallel 64-bit systems in the near future.
Problems related to parallel programming have long been studied in detail and are described in many books, articles and study courses. That's why this article pays more attention to organizational and practical issues of developing high-performance applications and to the use of 64-bit technologies.
While talking about 64-bit systems, we'll assume that they use the LLP64 data model (see Table 1). This data model is used in the 64-bit versions of the Windows operating system. But the information given here may also be useful for those who work with systems using a data model different from LLP64.
Table 1. Data models and their use in different operating systems.

1. Be brave to use parallelism and 64-bit technology

Despite the conservatism typical of the development of large program systems, we would still like to advise you to use the abilities provided by multi-core 64-bit processors. It may become a large competitive advantage over similar systems and also a good occasion for news in advertising campaigns.
It is senseless to postpone the use of 64-bit technology and parallelism, since it is inevitable. You may ignore an all-round passion for a new programming language or for optimizing a program for MMX technology, but you cannot avoid the growth of the amount of processed data and the slowdown of clock frequency growth. Let's dwell on this statement in more detail.
Parallelism is becoming the basis of productivity growth, and this is related to the slowing tempo of the clock frequency growth of modern microprocessors. While the number of transistors on a die keeps increasing, a sharp fall in the speed of clock frequency growth became apparent after 2005 (see figure 1). There is a rather interesting article on this topic: "The Free Lunch Is Over. A Fundamental Turn Toward Concurrency in Software" [1].
Figure 1. Rise of clock frequency and the number of transistors on a die.
During the last 30 years productivity has been determined by clock frequency, optimization of command execution and cache enlargement. In the coming years it will be determined by the number of cores. The development of parallel programming will become the main direction of the evolution of programming technologies.
Parallel programming will allow not only solving the problem of the slowdown of clock frequency growth but also creating scalable software which fully uses the increase in the number of computational nodes in the processor. That is, software will gain productivity not only through the increase of the microprocessor's clock frequency but also through the growing number of cores. Such systems are the future of software, and those who master the new technologies sooner will be able to shift the software market to their own benefit.
Although the use of 64-bit technologies doesn't look as impressive as parallelism, it provides a lot of new abilities too. Firstly, there is a free 5-15% productivity gain. Secondly, the large address space solves the problem of RAM fragmentation when working with large data sizes; the search for a solution to this problem has caused a lot of trouble for many developers whose programs crash because of memory shortage after several hours of work. Thirdly, it is an opportunity to work easily with data arrays of several gigabytes; sometimes this results in an amazing productivity gain because the hard disk is not accessed.
If all said above doesn't convince you of the advantages of 64-bit systems, look closer at what your colleagues or you yourself are working on. Does somebody optimize code to raise a function's productivity by 10%, although this 10% can be gained by simply recompiling the program for the 64-bit architecture? Does somebody create his own class for working with arrays loaded from files because there is not enough space for these arrays in memory? Do you develop your own memory allocation manager in order not to fragment memory? If you answered "yes" to at least one of these questions, you should stop and think for a while. Perhaps you are fighting in vain, and it would be more profitable to spend your time on porting your application to a 64-bit system, where all these questions disappear at once. Sooner or later you will have to spend your time on this anyway.
Let's sum up all said above. It can be a waste of time to try to squeeze the last bits of productivity out of the 32-bit architecture. Save your time. Think about using parallelism and the 64-bit address space to increase productivity. Give a new informational occasion and take the lead over your rivals in the market of high-performance applications.

2. Provide yourself with good hardware

So, you have decided to use parallelism and 64-bit technologies in your program development. Perfect. Let's touch upon some organizational questions first.
Even though you have to face the development of complex programs processing large amounts of data, the company management often doesn't understand the necessity of providing the developers with the most high-performance computers. Although your applications may be intended for high-performance workstations, you are likely to test and debug them on your own machine. Of course, all programmers need high-performance computers even if their programs don't process many gigabytes of data: everyone has to compile them, launch heavy auxiliary tools and so on. But only a developer of resource-intensive software feels all the troubles caused by a lack of RAM or a slow disk subsystem.
Show this part of the article to your manager. Now we'll try to explain why it is profitable to invest money into your tools - computers.
This may sound trivial, but a fast processor and a fast disk subsystem can greatly speed up the compilation of applications! It may seem that there is no difference between two minutes and one minute of compiling some part of the code. The difference is enormous! Minutes turn into hours, days and months. Ask your programmers to count how much time they spend waiting for the code to compile. Divide this time at least by 1.5, and you can calculate how quickly the investment into new machinery will be repaid. I assure you that you will be surprised.
Keep in mind one more effect which will help you save working time. If some action takes 5 minutes, a person will wait. If it takes 10 minutes, he will go to make coffee, read forums or play ping-pong, which will take more than 10 minutes! And not because he is an idler or craves coffee, he will simply be bored. Don't let the process be interrupted: push the button, get the result. Make those processes take 5 minutes instead of 10.
Once again, note that the aim is not to occupy a programmer with a useful task during the wait but to speed up all the processes in general. Setting up a second computer (or a dual-processor system) so that the programmer switches to other tasks while waiting is wrong altogether. A programmer's labor is not that of a street cleaner, who can clear a bench of snow during a break between breaking ice. A programmer's labor requires concentration on the task and keeping many of its elements in mind. Don't try to make a programmer switch to another task (the attempt will be useless); try to make it possible for him to continue solving the task he is working on as soon as possible. According to the article "Stresses of multitask work: how to fight them" [2], a person needs 25 minutes to go deeply into another task or to return to an interrupted one. If you don't provide continuity of the process, half of the time will be wasted on this very switching, no matter whether it is playing ping-pong or searching for an error in another program.
Don't spare money on a few more gigabytes of memory. This purchase will be repaid after several debugging sessions of a program allocating large amounts of memory. Be aware that a lack of RAM causes swapping and can slow down the debugging process from minutes to hours.
Don't spare money on providing the machine with a RAID subsystem. Not to be theorists, here is an example from our own experience (table 2).
Configuration (pay attention to the RAID) and the time of building an average project that uses a large number of external libraries:
  • AMD Athlon(tm) 64 X2 Dual Core Processor 3800+, 2 GB of RAM, 2 x 250 GB HDD SATA - RAID 0: 95 minutes
  • AMD Athlon(tm) 64 X2 Dual Core Processor 4000+, 4 GB of RAM, 500 GB HDD SATA (no RAID): 140 minutes
Table 2. An example of how RAID affects the speed of an application's building.
Dear managers! Trust me, saving on hardware is outweighed by the delays in programmers' work. Companies such as Microsoft provide their developers with the latest hardware models not out of generosity and wastefulness. They do count their money, and their example shouldn't be ignored.
At this point the part of the article addressed to managers is over, and we would like to address the creators of program solutions again. Do demand the equipment you consider necessary. Don't be shy: your manager most likely doesn't understand that it is profitable for everybody, so you should enlighten him. Moreover, if the plan isn't fulfilled, it is you who will seem guilty. It is easier to get new machinery than to explain what you are wasting your time on. Imagine how your excuse for correcting only one error during a whole day may sound: "It's a very large project. I launched the debugger and had to wait very long. And I have only one gigabyte of memory. I can't work on anything else at the same time. Windows began to swap. I found the error and corrected it, but I have to launch and check it again...". Perhaps your manager won't say anything but will consider you just an idler. Don't allow this.
Your major task while preparing to develop a resource-intensive application is not designing a new system and not even studying theory; it is to demand that all the necessary hardware and software be bought in time. Only after that may you begin to develop resource-intensive program solutions efficiently. It is impossible to write and check parallel programs without multi-core processors, and it is impossible to write a system for processing large data sizes without the necessary amount of RAM.
Before we switch to the next topic, we would like to share some ideas, which will help you to make your work process more comfortable.
You may try to solve the problem of slow project builds by using special tools for parallel building, similar, for example, to IncrediBuild by Xoreax Software (http://www.xoreax.com). Other similar systems may be found on the Web.
The problem of testing applications with large data arrays (launching a test set), for which usual machines are not productive enough, may be solved by using several dedicated high-performance machines via remote access. An example of a remote access tool is Remote Desktop or X-Win. Usually only a few developers launch such tests simultaneously, and for a group of 5-7 developers two dedicated high-performance machines are quite enough. It won't be the most convenient solution, but it will be rather economical in comparison with providing every developer with such a workstation.

3. Use a logging system instead of a debugger

The next obstacle on your way while developing systems for processing large data sizes is that you are likely to reconsider your methodology of work with the debugger or even refuse to use it at all.
Some specialists suggest abandoning the debugging methodology for ideological reasons. Their main argument is that the use of the debugger encourages the 'cut and try' method: a person who notices incorrect behavior of the algorithm at some step of its execution brings in corrections without examining why this error occurred and without thinking the correction through. If he didn't guess the right correction, he will notice it during the next execution of the code and bring in new corrections. The result is lower quality of the code, and the author of this code is not always sure that he understands how it works. The opponents of the debugging method suggest replacing it with a stricter discipline of algorithm development and with the use of functions as small as possible so that their working principles are clear. Besides, they suggest paying more attention to unit testing and using logging systems to analyze the correctness of the program's work.
There are some rational arguments in the described criticism of debugging, but, as in many cases, one should weigh everything and not run to extremes. The use of the debugger is often convenient and may save a lot of effort and time.

3.1. Reasons why the debugger is not so attractive

The poor applicability of the debugger to systems processing large data sizes is, unfortunately, related not to ideological but to practical difficulties. We'd like to familiarize the readers with these difficulties to save their time on fighting with the debugging tool where it is of little use, and to prompt them to search for alternative solutions.
Let's study some reasons why alternative means should be used instead of a traditional debugger (for example, the one integrated into the Visual C++ environment).
1) Slow program execution.
Execution under a debugger of a program processing millions and billions of elements may become practically impossible because of the great time costs. Firstly, it is necessary to use the debug variant of the code with optimization turned off, and that already slows down the algorithm's work. Secondly, in the debug version of the program more memory is allocated in order to check the bounds of memory blocks, to fill memory during allocation/deallocation, etc. This slows down the work even more.
One can rightly note that a program doesn't necessarily have to be debugged on large working data sizes and that test tasks are enough. Unfortunately, this is not so. An unpleasant surprise is that while developing 64-bit systems you cannot be sure that the algorithms work correctly by testing them on small amounts of data instead of working amounts of many gigabytes.
Here is another simple example demonstrating the problem of necessary testing at large amounts of data.
#include <vector>
#include <boost/filesystem/operations.hpp>
#include <fstream>
#include <iostream>
int main(int argc, char* argv[])
{
        std::ifstream file;
        file.open(argv[1], std::ifstream::binary);
        if (!file)
                return 1;
        boost::filesystem::path
                fullPath(argv[1], boost::filesystem::native);
        boost::uintmax_t fileSize =
                boost::filesystem::file_size(fullPath);
        std::vector<unsigned char> buffer;
        for (int i = 0; i != fileSize; ++i)
        {
                unsigned char c;
                file >> c;
                if (c >= 'A' && c <= 'Z')
                        buffer.push_back(c);
        }
        std::cout << "Array size=" << buffer.size()
                << std::endl;
        return 0;
}
This program reads a file and saves all the capital English letters in the array. If all the symbols in the input file are capital English letters, we won't be able to put more than 2*1024*1024*1024 of them into the array on a 32-bit system and, consequently, won't be able to process a file larger than 2 GB. Let's imagine that this program has been used correctly on a 32-bit system with this limitation in mind, and no errors occurred.
On a 64-bit system we'd like to process files of a larger size, as there is no limitation on the array's size. Unfortunately, the program is written incorrectly from the point of view of the LLP64 data model (see table 1) used in the 64-bit version of Windows. The loop counter is a variable of the int type, whose size is still 32 bits. If the file's size is 6 GB, the condition "i != fileSize" will never be fulfilled and an infinite loop will occur.
This code is intended to show how difficult it is to use the debugger when searching for errors which occur only with a large memory size. When you get an infinite loop while processing the file on the 64-bit system, you may take a file containing 50 bytes and watch how the functions work under the debugger. But the error won't occur with data of such size, and it is impossible to watch the processing of 6 billion elements under the debugger.
Of course, you should understand that this is only an example, that it can be debugged easily and the cause of the infinite loop can be found. Unfortunately, in complex systems this often becomes practically impossible because of the slow speed of processing large amounts of data.
To learn more about such unpleasant examples see articles "Forgotten problems of 64-bit program development" [3] and "20 issues of porting C++ code on the 64-bit platform" [4].
2) Multi-threading.
The method of executing several instruction flows simultaneously to speed up the processing of large amounts of data has long been used successfully in cluster systems and high-performance servers. But only with the appearance of multi-core processors on the market did parallel data processing become widely used in applications. And the urgency of parallel system development will only increase in the future.
Unfortunately, it is not easy to explain what is difficult about debugging parallel programs. Only when facing the task of searching for and correcting errors in parallel systems may one feel and understand how useless such a tool as a debugger can be. But in general, all the problems come down to the impossibility of reproducing many errors and to the way the debugging process affects the workflow of parallel algorithms.
To learn more about the problems of debugging parallel systems you may read the following articles: "Program debugging technology for machines with mass parallelism" [5], "Multi-threaded Debugging Techniques" [6], "Detecting Potential Deadlocks" [7].
The described difficulties are solved by using specialized methods and tools. 64-bit code may be handled with static analyzers, which work with the program's source code and do not require launching it. An example is the Viva64 static code analyzer [8].
To debug parallel systems you should pay attention to such tools as the TotalView Debugger (TVD) [9]. TotalView is a debugger for the C, C++ and Fortran languages which works on Unix-compatible operating systems and Mac OS X. It allows controlling execution threads, shows the data of one or all threads, can synchronize the threads through breakpoints, and also supports parallel programs using MPI and OpenMP.
Another interesting application for multithreaded analysis is Intel® Threading Analysis Tools [10].

3.2. Using a logging system

All the tools, both those mentioned and those left undiscussed, are surely useful and may be of great help while developing high-performance applications. But one shouldn't forget about such a time-proved methodology as logging. Debugging via logging hasn't become less relevant for several decades and still remains a good tool which we'll speak about in detail. The only change concerning logging systems is the growing requirements applied to them. Let's list the properties a modern logging system should have for high-performance development:
  • The code providing logging of data in the debug version must be absent in the release version of a software product. Firstly, this is related to the increase of performance and decrease of the software product's size. Secondly, it doesn't allow using debugging information for cracking the application and other illegal actions.
  • The logging system's interfaces should be compact so that they do not overload the main program code.
  • Data writing should be as fast as possible in order to bring minimal changes into time characteristics of parallel algorithms.
  • The output log should be understandable and easy to analyze. There should be a possibility to separate information gained from different threads and also to vary its level of detail.
A logging system which meets these requirements allows solving the task of debugging parallel algorithms and also to debug algorithms processing large data arrays.
We won't give a particular example of a logging system's code in this article. It is hard to make such a system universal as it depends on the development environment very much, as well as the project's specificity, the developer's wishes and so on. Instead, we'll touch upon some technical solutions which will help you to create a convenient and effective logging system if you need it.
The simplest way to perform logging is to use the function similar to printf as shown in the example:
int x = 5, y = 10;
  ...   
  printf("Coordinate = (%d, %d)\n", x, y);
Its natural disadvantage is that the information will be printed both in the debugging mode and in the release product. That's why we have to change the code as follows:
#ifdef DEBUG_MODE
  #define WriteLog printf
#else
  #define WriteLog(a)
#endif
  WriteLog("Coordinate = (%d, %d)\n", x, y);
This is better. Pay attention that we use our own DEBUG_MODE macro instead of the standard _DEBUG macro to choose how the WriteLog function is implemented. This allows including debug information into release builds, which is important when debugging on large amounts of data.
Unfortunately, when compiling the non-debug version in the Visual C++ environment we get the warning "warning C4002: too many actual parameters for macro 'WriteLog'". We could turn this warning off, but that is bad style. We can rewrite the code as follows:
#ifdef DEBUG_MODE
  #define WriteLog(a) printf a
#else
  #define WriteLog(a)
#endif
  WriteLog(("Coordinate = (%d, %d)\n", x, y));
This code is not elegant because we have to use double parentheses, and this is often forgotten. That's why let's bring in some improvements:
#ifdef DEBUG_MODE
  #define WriteLog printf
#else
  inline int StubElepsisFunctionForLog(...) { return 0; }
  static class StubClassForLog {
  public:
    inline void operator =(size_t) {}
  private:
    inline StubClassForLog &operator =(const StubClassForLog &)
      { return *this; }
  } StubForLogObject;
  
  #define WriteLog \
    StubForLogObject = sizeof StubElepsisFunctionForLog
#endif
  WriteLog("Coordinate = (%d, %d)\n", x, y);
This code looks complicated, but it allows writing single parentheses. When the DEBUG_MODE flag is turned off, this code costs nothing at all, so you can safely use it even in performance-critical code fragments.
The next modification is to add parameters to the logging function: the level of detail and the type of information. The level of detail may be defined as a parameter, for example:
enum E_LogVerbose {
  Main,
  Full
};
#ifdef DEBUG_MODE
  void WriteLog(E_LogVerbose,
                const char *strFormat, ...)
  {
    ...
  }
#else
  ...
#endif
WriteLog (Full, "Coordinate = (%d, %d)\n", x, y);
This is convenient because you can decide whether to filter out unimportant messages after the program has finished, using a special utility. The disadvantage of this method is that all the information is written out, both important and unimportant, which may affect productivity badly. That's why you may create several functions like WriteLogMain, WriteLogFull and so on, whose implementation will depend on the program's build mode.
We mentioned that the writing of debugging information must not affect the speed of the algorithm's execution too much. We can achieve this by creating a system which gathers messages and writes them out in a separate, simultaneously executed thread. The outline of this mechanism is shown in figure 2.
Figure 2. Logging system with the lazy write algorithm.
As you can see from the figure, the next data portion is written into an intermediate array of strings of fixed length. The fixed size of the array and of its strings allows excluding expensive memory allocation operations. This doesn't reduce the possibilities of the system at all: you can just select the strings' length and the array's size with a margin. For example, 5000 strings of 4000 symbols will be enough for debugging nearly any system, and the 20 MB of memory necessary for that is not critical for modern systems, I think you'll agree. And if the array overflows anyway, it is easy to provide a mechanism for writing the information into a file ahead of time.
The described mechanism provides practically instant execution of the WriteLog function. If the system has idle processor cores, the writing into the file will be practically transparent for the main program code.
The advantage of the described system is that it can function without changes when debugging a parallel program, when several threads write into the log simultaneously. You need only add a thread identifier so that you can later tell which threads the messages were received from (see figure 3).
Figure 3. Logging system for debugging multithreaded applications.
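To make the described mechanism more concrete, here is a minimal sketch of such a buffered log; the names, the buffer sizes, the use of a critical section and the flush policy are our assumptions, not a fragment of a real system:
#include <cstdio>
#include <cstdarg>
#include <windows.h>

// Fixed-size intermediate storage: 5000 strings of 4000 symbols (about 20 MB).
const size_t LogStringLen = 4000;
const size_t LogStringCount = 5000;
static char g_logBuffer[LogStringCount][LogStringLen];
static size_t g_usedStrings = 0;
static CRITICAL_SECTION g_logLock;

void InitLog() { InitializeCriticalSection(&g_logLock); }

// Fast producer side: format the message into a preallocated slot and
// prefix it with the calling thread's identifier (see figure 3).
void WriteLog(const char *strFormat, ...)
{
  EnterCriticalSection(&g_logLock);
  if (g_usedStrings < LogStringCount)
  {
    char *slot = g_logBuffer[g_usedStrings++];
    int prefix = sprintf(slot, "[%u] ", (unsigned)GetCurrentThreadId());
    va_list args;
    va_start(args, strFormat);
    _vsnprintf(slot + prefix, LogStringLen - 1 - prefix, strFormat, args);
    va_end(args);
    slot[LogStringLen - 1] = '\0';
  }
  LeaveCriticalSection(&g_logLock);
}

// Slow consumer side: may be called from a separate low-priority thread
// or on buffer overflow, so the disk is not touched on every WriteLog call.
void FlushLog(const char *fileName)
{
  EnterCriticalSection(&g_logLock);
  if (FILE *f = fopen(fileName, "a"))
  {
    for (size_t i = 0; i < g_usedStrings; ++i)
      fputs(g_logBuffer[i], f);
    fclose(f);
    g_usedStrings = 0;
  }
  LeaveCriticalSection(&g_logLock);
}
In a real system the flush would run in a background, low-priority thread and the buffer would be swapped rather than held under the lock, but even this simple variant keeps the WriteLog calls almost instant.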
The last improvement we'd like to suggest is showing the nesting level of messages at the moment of a function call or the beginning of a logical block. This can be easily organized by using a special class which writes an identifier of the block's beginning into the log in its constructor and an identifier of the block's end in its destructor. Let's show this on an example.
Program code:
class NewLevel {
public:
  NewLevel() { WriteLog("__BEGIN_LEVEL__\n"); }
  ~NewLevel() { WriteLog("__END_LEVEL__\n"); }
};
#define NEW_LEVEL NewLevel tempLevelObject;
void MyFoo() {
  WriteLog("Begin MyFoo()\n");
  NEW_LEVEL;
  int x = 5, y = 10;
  printf("Coordinate = (%d, %d)\n", x, y);
  WriteLog("Begin Loop:\n");
  for (unsigned i = 0; i != 3; ++i)
  {
    NEW_LEVEL;
    WriteLog("i=%u\n", i);
  }
  WriteLog("Coordinate = (%d, %d)\n", x, y);
}
Log's content:
Begin MyFoo()
__BEGIN_LEVEL__
Coordinate = (5, 10)
Begin Loop:
__BEGIN_LEVEL__
i=0
__END_LEVEL__
__BEGIN_LEVEL__
i=1     
__END_LEVEL__
__BEGIN_LEVEL__
i=2
__END_LEVEL__
Coordinate = (5, 10)
__END_LEVEL__
Log after transformation:
Begin MyFoo()
    Coordinate = (5, 10)
    Begin Loop:
        i=0
        i=1
        i=2
    Coordinate = (5, 10)
I think that's all for this topic. We'd like to mention at last that the article "Logging In C++" [11] may be of use for you too. We wish you successful debugging.

4. Using correct data types from the 64-bit technologies viewpoint

Using base data types that correspond to the hardware platform is an important point of creating quality and high-performance program solutions in the C/C++ languages. With the appearance of 64-bit systems new data models came into use - LLP64, LP64, ILP64 (see table 1) - and this changes the rules and recommendations concerning the use of base data types. Such types are int, unsigned, long, unsigned long, ptrdiff_t, size_t and pointers. Unfortunately, there is practically no popular literature or articles touching upon the problems of choosing types. And those sources which do, for example the "Software Optimization Guide for AMD64 Processors" [12], are seldom read by application developers.
The urgency of the correct choice of base types for processing data is determined by two important factors: the correctness of the code execution and its efficiency.
Historically, the base and most often used integer types in the C and C++ languages are int and unsigned int. The int type is commonly considered the most optimal, as its size coincides with the length of the processor's computer word.
The computer word is a group of RAM bits taken by the processor in one call (or processed by it as a single group) and usually contains 16, 32 or 64 bits.
The tradition of making the int type size equal to the computer word has seldom been broken until recently. On 16-bit processors int consists of 16 bits; on 32-bit processors, of 32 bits. Of course, there existed other ratios between the int size and the computer word size, but they were used seldom and are not of interest to us now.
We are interested in the fact that with the appearance of 64-bit processors the size of the int type remained 32 bits in most systems. The int type is 32-bit in the LLP64 and LP64 data models, which are used in the 64-bit Windows operating system and most Unix systems (Linux, Solaris, SGI Irix, HP-UX 11).
Leaving the int type size equal to 32 bits is a bad decision for many reasons, but it is really a reasonable choice of the lesser of two evils, related first of all to the problems of providing backward compatibility. To learn more about the reasons for this choice you may read the blog post "Why did the Win64 team choose the LLP64 model?" [13] and the article "64-Bit Programming Models: Why LP64?" [14].
For developers of 64-bit applications all said above is the reason to follow two new recommendations in the process of software development.
Recommendation 1. Use the ptrdiff_t and size_t types for loop counters and pointer arithmetic instead of the int and unsigned types.
Recommendation 2. Use the ptrdiff_t and size_t types for indexing in arrays instead of the int and unsigned types.
In other words you should use 64-bit data types in a 64-bit system whenever possible. Therefore you shouldn't use constructions like:
for (int i = 0; i != n; i++)
  array[i] = 0.0;       
Yes, this is a canonical code example. Yes, it is included in many programs. Yes, learning the C and C++ languages begins with it. But it is recommended not to use it anymore. Use either an iterator or the ptrdiff_t and size_t data types, as shown in the improved example:
for (size_t i = 0; i != n; i++)
  array[i] = 0.0;
Developers of Unix applications may note that the practice of using the long type for counters and for indexing arrays appeared long ago. The long type is 64-bit in 64-bit Unix systems, and it looks smarter than ptrdiff_t or size_t. Yes, it's true, but we should keep in mind two important points.
1) The long type's size remains 32 bits in the 64-bit Windows operating system (see table 1). Therefore it cannot be used instead of the ptrdiff_t and size_t types.
2) Using the long and unsigned long types causes a lot of trouble for developers of cross-platform applications for Windows and Linux systems. The long type has different sizes in these systems and only complicates the situation. It's better to stick to types which have the same size in 32-bit and 64-bit Windows and Linux systems; a trivial check program is shown below.
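If you are not sure about the sizes of the basic types on your target platforms, a tiny check like the following (our illustration, not part of the original example set) can be compiled and run once on each of them:
#include <cstdio>
#include <cstddef>

int main()
{
  printf("int=%u long=%u ptrdiff_t=%u size_t=%u void*=%u\n",
         (unsigned)sizeof(int), (unsigned)sizeof(long),
         (unsigned)sizeof(ptrdiff_t), (unsigned)sizeof(size_t),
         (unsigned)sizeof(void *));
  // Expected on 64-bit Windows (LLP64): int=4 long=4 ptrdiff_t=8 size_t=8 void*=8
  // Expected on 64-bit Linux (LP64):    int=4 long=8 ptrdiff_t=8 size_t=8 void*=8
  return 0;
}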
Now, using some examples, let's explain why we insist on using the ptrdiff_t/size_t types instead of the usual int/unsigned types.
We'll begin with an example illustrating a typical error of using the unsigned type for a loop counter in 64-bit code. We have already described a similar example before, but let's see it once again, as this error is widespread:
size_t Count = BigValue;
for (unsigned Index = 0; Index != Count; ++Index)
{ ... }         
This code is typical, and its variants can be met in many programs. It is executed correctly in 32-bit systems, where the value of the Count variable cannot exceed SIZE_MAX (which is equal to UINT_MAX in a 32-bit system). In a 64-bit system the range of possible values of Count is extended, and when Count > UINT_MAX, an infinite loop occurs. The proper correction of this code is to use the size_t type instead of unsigned.
The next example shows the incorrect use of the int type for indexing large arrays:
double *BigArray;
int Index = 0;
while (...)
  BigArray[Index++] = 3.14f;
This code doesn't seem suspicious to an application developer accustomed to using variables of the int or unsigned type as array indexes. Unfortunately, this code won't work in a 64-bit system if the size of the processed BigArray array exceeds four billion items. In this case an overflow of the Index variable will occur and the result of the program's work will be incorrect (not the entire array will be filled). Again, the correction is to use the ptrdiff_t or size_t type for indexes.
As the last example, we'd like to demonstrate the potential danger of the mixed use of 32-bit and 64-bit types, which you should avoid whenever possible. Unfortunately, few developers think about the consequences of careless mixed arithmetic, and the following results are absolutely unexpected for many (they were obtained with Microsoft Visual C++ 2005 in 64-bit compilation mode):
int x = 100000;
int y = 100000;
int z = 100000;
intptr_t size = 1;                  // Result:
intptr_t v1 = x * y * z;            // -1530494976
intptr_t v2 = intptr_t(x) * y * z;  // 1000000000000000
intptr_t v3 = x * y * intptr_t(z);  // 141006540800000
intptr_t v4 = size * x * y * z;     // 1000000000000000
intptr_t v5 = x * y * z * size;     // -1530494976
intptr_t v6 = size * (x * y * z);   // -1530494976
intptr_t v7 = size * (x * y) * z;   // 141006540800000
intptr_t v8 = ((size * x) * y) * z; // 1000000000000000
intptr_t v9 = size * (x * (y * z)); // -1530494976
Note that the expression "intptr_t v2 = intptr_t(x) * y * z;" does not guarantee a correct result at all. It guarantees only that the expression "intptr_t(x) * y * z" has the intptr_t type. The article "20 issues of porting C++ code on the 64-bit platform" [4] will help you to learn more about this problem.
Now let's look at an example demonstrating the advantages of the ptrdiff_t and size_t types from the viewpoint of productivity. For the demonstration we'll take a simple algorithm calculating the minimal path length in a labyrinth. You may see the whole code of the program here: http://www.Viva64.com/articles/testspeedexp.zip.
Here we give only the text of the FindMinPath32 and FindMinPath64 functions. Both functions calculate the length of the minimal path between two points in a labyrinth. The rest of the code is not of interest to us now.
typedef char FieldCell;
#define FREE_CELL 1
#define BARRIER_CELL 2
#define TRAVERSED_PATH_CELL 3
unsigned FindMinPath32(FieldCell (*field)[ArrayHeight_32],
                       unsigned x, unsigned y, unsigned bestPathLen,
                       unsigned currentPathLen) {
  ++currentPathLen;
  if (currentPathLen >= bestPathLen)
    return UINT_MAX;
  if (x == FinishX_32 && y == FinishY_32)
    return currentPathLen;
  FieldCell oldState = field[x][y];
  field[x][y] = TRAVERSED_PATH_CELL;
  unsigned len = UINT_MAX;
  if (x > 0 && field[x - 1][y] == FREE_CELL) {
    unsigned reslen =
      FindMinPath32(field, x - 1, y, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (x < ArrayWidth_32 - 1 && field[x + 1][y] == FREE_CELL) {
    unsigned reslen =
      FindMinPath32(field, x + 1, y, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (y > 0 && field[x][y - 1] == FREE_CELL) {
    unsigned reslen =
      FindMinPath32(field, x, y - 1, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (y < ArrayHeight_32 - 1 && field[x][y + 1] == FREE_CELL) {
    unsigned reslen =
      FindMinPath32(field, x, y + 1, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  field[x][y] = oldState;
  
  if (len >= bestPathLen)
    return UINT_MAX;
  return len;
}
size_t FindMinPath64(FieldCell (*field)[ArrayHeight_64],
                     size_t x, size_t y, size_t bestPathLen,
                     size_t currentPathLen) {
  ++currentPathLen;
  if (currentPathLen >= bestPathLen)
    return SIZE_MAX;
  if (x == FinishX_64 && y == FinishY_64)
    return currentPathLen;
  FieldCell oldState = field[x][y];
  field[x][y] = TRAVERSED_PATH_CELL;
  size_t len = SIZE_MAX;
  if (x > 0 && field[x - 1][y] == FREE_CELL) {
    size_t reslen =
      FindMinPath64(field, x - 1, y, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (x < ArrayWidth_64 - 1 && field[x + 1][y] == FREE_CELL) {
    size_t reslen =
      FindMinPath64(field, x + 1, y, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (y > 0 && field[x][y - 1] == FREE_CELL) {
    size_t reslen =
      FindMinPath64(field, x, y - 1, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  if (y < ArrayHeight_64 - 1 && field[x][y + 1] == FREE_CELL) {
    size_t reslen =
      FindMinPath64(field, x, y + 1, bestPathLen, currentPathLen);
    len = min(reslen, len);
  }
  field[x][y] = oldState;
  
  if (len >= bestPathLen)
    return SIZE_MAX;
  return len;
}
The FindMinPath32 function is written in the classic 32-bit style using unsigned types. The FindMinPath64 function differs from it only in that all the unsigned types in it are replaced with the size_t type. There are no other differences! I think you'll agree that this is an easy modification of the program. And now let's compare the execution speeds of these two functions (see table 3).
Mode and function / Function's execution time (relative):
  1. 32-bit compilation mode, function FindMinPath32: 1
  2. 32-bit compilation mode, function FindMinPath64: 1.002
  3. 64-bit compilation mode, function FindMinPath32: 0.93
  4. 64-bit compilation mode, function FindMinPath64: 0.85
Table 3. Execution time of the FindMinPath32 and FindMinPath64 functions.
Table 3 shows the time relative to the execution speed of the FindMinPath32 function in a 32-bit system; this is done for clarity.
In the first line the work time of the FindMinPath32 function in a 32-bit system is 1, because we take its work time as the unit of measure.
In the second line we see that the work time of the FindMinPath64 function in a 32-bit system is 1 too. This is not surprising, because the unsigned type coincides with the size_t type in a 32-bit system and there is no real difference between the FindMinPath32 and FindMinPath64 functions. The small deviation (1.002) only indicates a slight measurement error.
In the third line we see a 7% productivity increase. This is an expected result of recompiling the code for a 64-bit system.
The fourth line is the most interesting one. The productivity increase here is 15% compared to the 32-bit baseline. This means that the simple use of the size_t type instead of unsigned allows the compiler to construct more efficient code, which works about 8% faster than FindMinPath32 in the same 64-bit mode!
It is a simple and clear example of how the use of data types that do not match the computer word's size decreases the algorithm's productivity. Simple replacement of the int and unsigned types with ptrdiff_t and size_t may give a great productivity increase. First of all this concerns the use of these data types for indexing arrays, pointer arithmetic and organizing loops.
We hope that having read all said above you will think twice before continuing to write:
for (int i = 0; i != n; i++)
  array[i] = 0.0;
Developers of Windows applications may consider the Viva64 static code analyzer [8] to automate the search for errors in 64-bit code. Firstly, its usage will help to find most of the errors. Secondly, while developing programs under its control you will use 32-bit variables less often and avoid mixed arithmetic with 32-bit and 64-bit data types, which will at once increase the productivity of your code. Developers of Unix applications should take a look at the Gimpel Software PC-Lint [15] and Parasoft C++test [16] static analyzers. They can diagnose some 64-bit errors in code built with the LP64 data model, which is used in most Unix systems.
To learn more about the problems of developing quality and efficient 64-bit code you may read the following articles: "Problems of testing 64-bit applications" [17], "24 Considerations for Moving Your Application to a 64-bit Platform" [18], "Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows" [19], "Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture" [20].

5. Additional ways of increasing productivity of program systems

In the last part of this article we'd like to touch upon some more technologies, which may be useful for you while developing resource-intensive program solutions.

5.1. Intrinsic functions

Intrinsic functions are special system-dependent functions which perform actions that are impossible to perform at the level of C/C++ code or which perform these actions much more efficiently. In fact, they allow you to avoid inline assembler, whose use is often impossible or undesirable.
Programs may use intrinsic functions to create faster code thanks to the absence of the overhead of a usual function call. In this case, of course, the code's size will be a bit larger. The list of functions which may be replaced with their intrinsic versions is given in the MSDN Library; for example, memcpy, strcmp, etc.
There is a special option, "/Oi", in the Microsoft Visual C++ compiler. This option allows replacing calls of some functions with intrinsic analogs automatically.
Besides the automatic replacement of functions with their intrinsic analogs, we can use intrinsic functions in the code explicitly. This may be useful for the following reasons (a small example is given after the list):
  • Inline assembler code is not supported by Visual C++ compiler in 64-bit mode, while intrinsic code is.
  • Intrinsic functions are simpler to use as they don't require the knowledge of registers and other similar low-level constructions.
  • Intrinsic functions are updated by compilers, while assembler code has to be updated manually.
  • The built-in optimizer doesn't work with assembler code, so you have to use external assembler modules, while intrinsic code doesn't need this.
  • Intrinsic code is easier to port than assembler code.
Using intrinsic functions in the automatic mode (with the help of the compiler switch) gives some productivity increase, and the "manual" mode gives even more. That is why the use of intrinsic functions is reasonable.
To learn more about intrinsic functions you may see the Visual C++ team's blog [21].
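As a small illustration (our own example, not taken from the sources above), here is an explicit call of an MSVC intrinsic; _BitScanReverse64 is available when compiling 64-bit code:
#include <intrin.h>
#include <cstdio>

int main()
{
  unsigned __int64 value = 0x0000100000000000ui64;
  unsigned long index = 0;
  // The intrinsic compiles to a single BSR instruction instead of a loop:
  // it returns a non-zero value if any bit of 'value' is set and stores
  // the position of the highest set bit in 'index'.
  if (_BitScanReverse64(&index, value))
    printf("The highest set bit is %lu\n", index);
  return 0;
}
In a 32-bit build the 32-bit _BitScanReverse intrinsic would be used instead.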

5.2. Data packing and alignment

Nowadays data alignment doesn't affect code productivity as greatly as it did 10 years ago. But sometimes you can gain a little profit in this sphere too, saving some memory and productivity.
Let's take a look at an example:
struct foo_original {int a; void *b; int c; };
This structure occupies 12 bytes in 32-bit mode, but in 64-bit mode it takes 24 bytes. In order to make this structure take the expected 16 bytes in 64-bit mode you should change the order of its fields:
struct foo_new { void *b; int a;  int c; };
In some cases it is useful to help the compiler explicitly by defining the alignment manually in order to increase productivity. For example, SSE data should be aligned on a 16-byte boundary. You can do this in the following way:
#include <emmintrin.h>
// 16-byte aligned data
__declspec(align(16)) double init_val[2] = { 3.14, 3.14 };
// SSE2 movapd instruction
__m128d vector_var = _mm_load_pd(init_val);
The "Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows" [19] and "Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture" [20] sources contain detailed review of these problems.

5.3. Files mapped into memory

With the appearance of 64-bit systems the technology of mapping files into memory became more attractive because the data access window increased in size. It may be a very good addition for some applications. Don't forget about it.
Memory mapping of files had become less convenient on 32-bit architectures, especially with the introduction of relatively cheap recordable DVD technology. A 4 GB file is no longer uncommon, and such large files cannot be memory mapped easily on 32-bit architectures: only a region of the file can be mapped into memory, and such regions have to be mapped into and out of memory repeatedly in order to access the whole file this way. On 64-bit versions of Windows you have a much larger address space, so you may map the entire file at once.
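Below is a minimal sketch of ours (with simplified error handling and a hypothetical file path) of mapping a whole large file at once with the Win32 API:
#include <windows.h>
#include <cstdio>

int main()
{
  // A hypothetical multi-gigabyte data file.
  HANDLE file = CreateFileW(L"C:\\Data\\huge_dataset.bin", GENERIC_READ,
                            FILE_SHARE_READ, NULL, OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL, NULL);
  if (file == INVALID_HANDLE_VALUE)
    return 1;
  HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
  if (mapping != NULL)
  {
    // A zero length maps the entire file; on Win64 the view may easily
    // exceed 4 GB, which is impossible in a 32-bit process.
    const unsigned char *data =
      (const unsigned char *)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    if (data != NULL)
    {
      LARGE_INTEGER size;
      GetFileSizeEx(file, &size);
      printf("Mapped %I64d bytes, first byte = %d\n", size.QuadPart, (int)data[0]);
      UnmapViewOfFile(data);
    }
    CloseHandle(mapping);
  }
  CloseHandle(file);
  return 0;
}
After MapViewOfFile succeeds, the whole multi-gigabyte file is accessible through an ordinary pointer, which is exactly what the 64-bit address space makes possible.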

5.4. The __restrict keyword

One of the most serious problems for a compiler is aliasing. When program source code contains memory read/write operations, it is often impossible to find out whether more than one identifier refers to a certain memory area, i.e. whether more than one identifier can be a "synonym" for one and the same memory area. That's why, inside a loop in which memory is both read and written, the compiler must be very careful about storing data in registers instead of memory. This insufficient use of registers may affect performance greatly.
The __restrict keyword is used to make it easier for the compiler to make a decision. It "tells" the compiler that it may use registers widely.
The __restrict keyword makes the compiler not consider the marked pointers to be aliases, i.e. references to the same memory area. In this case the compiler can perform more efficient optimization. Let's take a look at the following example:
void Accumulate(int * __restrict a, const int *b, const int *c)
{
  for (int i = 0; i < 100; i++)
  {
    *a += *b++ - *c++;  // no aliases exist
  }
}
In this code the compiler can safely keep the accumulated sum referenced by "a" in a register, avoiding unnecessary writes to memory. The MSDN Library is a good source of information about the __restrict keyword.

5.5. SSE instructions

Applications executed on 64-bit processors (regardless of the mode) work more efficiently if SSE instructions are used in them instead of MMX/3DNow. This is related to the width of the processed data: SSE/SSE2 instructions operate on 128-bit data, while MMX/3DNow instructions work only with 64-bit data. That's why it is better to rewrite code which uses MMX/3DNow so that it uses SSE instructions.
We won't dwell upon SSE-constructions in this article. The readers who may be interested in this topic can read the documentation written by processor architecture developers.
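Still, just to illustrate the difference in data width, here is a tiny sketch of ours: a single SSE2 intrinsic adds two pairs of doubles at once, which would not fit into a 64-bit MMX register:
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdio>

int main()
{
  __declspec(align(16)) double a[2] = { 1.0, 2.0 };
  __declspec(align(16)) double b[2] = { 10.0, 20.0 };
  __declspec(align(16)) double r[2];
  __m128d va = _mm_load_pd(a);             // load 128 bits (two doubles)
  __m128d vb = _mm_load_pd(b);
  _mm_store_pd(r, _mm_add_pd(va, vb));     // one ADDPD computes both sums
  printf("%f %f\n", r[0], r[1]);
  return 0;
}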

5.6. Some specific rules of language constructions usage

The 64-bit architecture gives new opportunities for optimizing a program at the level of separate operators. These are the (already traditional) methods of "rewriting" pieces of a program so that the compiler can optimize them better. Of course, we cannot recommend these methods for mass usage, but it may be useful to learn about them.
First on the list of these optimizations is manual unrolling of loops. The idea of this method is shown in the example:
double a[100], sum, sum1, sum2, sum3, sum4;
sum = sum1 = sum2 = sum3 = sum4 = 0.0;
for (int i = 0; i < 100; i += 4)
{
  sum1 += a[i];
  sum2 += a[i+1];
  sum3 += a[i+2];
  sum4 += a[i+3];
}
sum = sum1 + sum2 + sum3 + sum4;
In many cases the compiler can unroll the loop to this form itself (see the description of the /fp:fast switch for Visual C++), but not always.
Another syntax optimization is the use of array notation instead of pointer notation.
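For example (a sketch of ours; the function names are arbitrary), the same scaling loop may be written in both styles, and the indexed form with a size_t counter is often easier for the compiler to analyze:
#include <cstddef>

// Pointer notation.
void ScalePtr(double *p, size_t n, double k)
{
  for (double *end = p + n; p != end; ++p)
    *p *= k;
}

// Array (index) notation.
void ScaleIdx(double a[], size_t n, double k)
{
  for (size_t i = 0; i != n; i++)
    a[i] *= k;
}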
A lot of such methods are described in the "Software Optimization Guide for AMD64 Processors" article [12].

Conclusion

Though you'll have to face many difficulties while creating program systems that use the hardware abilities of modern computers efficiently, it is worth it. Parallel 64-bit systems provide new possibilities for developing real, scalable solutions. They allow expanding the abilities of modern data processing software, such as games, CAD systems or pattern recognition. We wish you good luck in creating new technologies!

References

  1. Herb Sutter. The Free Lunch Is Over. A Fundamental Turn Toward Concurrency in Software. http://www.viva64.com/go.php?url=53
  2. Stresses of multitask work: how to fight them. http://www.viva64.com/go.php?url=54
  3. Andrey Karpov. Forgotten problems of developing 64-bit programs. http://www.viva64.com/art-1-2-16511733.html
  4. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform. http://www.viva64.com/art-1-2-599168895.html
  5. V.V. Samofalov, A.V. Konovalov. Program debugging technology for machines with mass parallelism // "Issues of atomic science and technology", Series "Mathematical modeling of physical processes". 1996. Issue 4. pp. 52-56.
  6. Shameem Akhter, Jason Roberts. Multi-threaded Debugging Techniques // Dr. Dobb's Journal, April 23, 2007. http://www.viva64.com/go.php?url=55
  7. Tomer Abramson. Detecting Potential Deadlocks // Dr. Dobb's Journal, January 01, 2006. http://www.viva64.com/go.php?url=56
  8. Viva64 Tool Overview. http://www.viva64.com/viva64.php
  9. The TotalView Debugger (TVD). http://www.viva64.com/go.php?url=57
  10. Intel Threading Analysis Tools. http://www.viva64.com/go.php?url=58
  11. Petru Marginean. Logging In C++ // Dr. Dobb's Journal, September 05, 2007. http://www.viva64.com/go.php?url=37
  12. Software Optimization Guide for AMD64 Processors. http://www.viva64.com/go.php?url=59
  13. Blog "The Old New Thing": "Why did the Win64 team choose the LLP64 model?". http://www.viva64.com/go.php?url=25
  14. "64-Bit Programming Models: Why LP64?" http://www.viva64.com/go.php?url=24
  15. Gimpel Software PC-Lint. http://www.gimpel.com
  16. Parasoft C++test. http://www.parasoft.com
  17. Andrey Karpov. Problems of testing 64-bit applications. http://www.viva64.com/art-1-2-1289354852.html
  18. John Paul Mueller. 24 Considerations for Moving Your Application to a 64-bit Platform. http://www.viva64.com/go.php?url=30
  19. Harsha Jaquasia. Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows. http://www.viva64.com/go.php?url=60
  20. Mike Wall. Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture. http://www.viva64.com/go.php?url=61
  21. Dylan Birtolo. Visual C++ Team Blog: New Intrinsic Support in Visual Studio 2008. http://www.viva64.com/go.php?url=62