Tuesday, July 19, 2011

Peculiarities of the Development of 64-bit Applications

Abstract

What did programmers get when 64-bit systems came to power? Besides the numerous advantages described in many promotional articles, programmers got a whole bunch of brainteasers, puzzles and even traps. Everyone who wants to get real benefits from 64-bit systems has to face them.

Introduction

Traditional 32-bit applications are reaching the limit of the main memory they can use. On Windows systems, two gigabytes of memory are available to a user application (three gigabytes in some cases), whereas programs often need to keep more data in main memory in order to run efficiently. Because of memory limitations, computer game fans have to wait while additional parts of the same level are loaded, and this greatly reduces the "presence effect". Users who work with video clips have to edit frames using the hard drive instead of keeping all the data in main memory. And scientists doing visualization and modeling have to restrict themselves to small objects: modeling large-scale objects at a speed adequate to the task is possible only when the data are stored in the main memory of the computer. All this is true even before we mention tasks that require a database.
The crisis that had appeared in the world of programming had to be solved somehow. There are two ways of development in the history of mankind: evolution and revolution. Everyone is surely waiting for a revolution that will let programmers stop caring about the size of main memory, the speed of calculation and other things whose neglect leads to the creation of monster programs. However, the date of the next computer revolution is still unknown (at least to the author of this article), and the problem has to be solved today (not to say "yesterday"). The kings of the computer world, companies such as AMD and Intel, proposed an evolutionary increase in the bit width of the computer. We were offered a 64-bit architecture instead of a 32-bit one. In other words, 64-bit numbers are used instead of 32-bit ones to address a location in main memory. This expands the available main memory to amounts that are hard to imagine. Such a path of development is not entirely new in the computing world. Older programmers witnessed the transition from 16-bit software to 32-bit software, which began with the appearance of the Intel 80386 processor. AMD and Intel engineers are eager to repeat that earlier success by expanding the address space and the number of processor registers. As a consequence, the problems of modern computers were not solved completely, but the need to solve them immediately was postponed.

64 bits for programmers: the taming of programs

What did programmers get when 64-bit systems came to power? Besides the numerous advantages described in many promotional articles, programmers got a whole bunch of brainteasers, puzzles and even traps. Everyone who wants to get real benefits from 64-bit systems has to face them.
When we talk about real advantages, we first of all mean the available main memory. But the fact that a 64-bit system offers a 64-bit address space does not mean that any particular program is able to use it. What does this imply? Only that the program must be written correctly (or ported correctly from the 32-bit platform) with 64-bit systems in mind.
Major manufacturers of development tools try to simplify the programmers' work by having the compiler detect some of the mistakes connected with 64-bit porting. Most of the documentation produced by these manufacturers claims that recompiling and fixing the mistakes found this way is sufficient for an application to work correctly on a 64-bit system. But practice shows that such "automatically captured" mistakes are only the tip of the iceberg, and that real-life porting involves many more problems.
Now let's turn to some examples that are not to be found in the official manuals for development tools. To store the sizes of memory blocks, array indices and similar values, the C++ language uses a special data type named size_t. The size of this type matches the bit width of the platform, i.e. it is 4 bytes on 32-bit systems and 8 bytes on 64-bit ones. Consequently, in theory we can get a memory block of at most 4 billion cells on 32-bit systems and a much larger memory block on 64-bit systems. It might seem that a program automatically gets the advantages of 64-bit applications right after recompiling. The devil is in the details. Do you always use size_t when working with large arrays and memory blocks? Have you ever said, while writing code for a 32-bit system: "This memory block is sure to be no more than one gigabyte!" If you did, you might have used a variable of type int to store the size of the memory block. But on the common 64-bit platforms an int is still 4 bytes. So although the 64-bit system would let you allocate far more memory for this block, in practice you will be limited to 2 GB, the largest value a signed 32-bit int can hold (4 GB if the variable is unsigned). This happens because of the wrongly chosen type of the variable in which the memory block size is stored.
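The limit described above can be checked directly. The following is only a sketch, assuming a platform where int is 32 bits wide; the helper names fitsInInt and toLegacySize are made up for this illustration and do not come from any library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when a byte count can be stored in an int without losing information.
// (Hypothetical helper, named for this example only.)
bool fitsInInt(std::uint64_t byteCount) {
    return byteCount <=
           static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}

// Narrows a size to int the way legacy 32-bit code often does implicitly,
// but refuses (in debug builds) to do so when the value would be truncated.
int toLegacySize(std::uint64_t byteCount) {
    assert(fitsInInt(byteCount) && "size does not fit in int");
    return static_cast<int>(byteCount);
}
```

With a 32-bit int, fitsInInt accepts the "no more than one gigabyte" block from the text and rejects a 5 billion byte block, which is exactly the case where an int size variable silently stops being enough.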
Let us assume that the size of memory blocks in your program is calculated correctly. In this case a really large amount of memory will be allocated, but the application may still not work. Why can this happen if we use a variable of type size_t to store the number of elements in an array? Let's consider a simple loop in which an array of 5 billion elements is filled with consecutive numbers. The code looks like this:
size_t maxSize = 5000000000;
int *buffer = new int[maxSize];
// The counter i is a 32-bit int, while maxSize is 64-bit:
// this mismatch is the problem discussed below.
for (int i = 0; i < maxSize; ++i) {
  buffer[i] = i;
}
// ...
delete[] buffer;
If the array held 5 million elements rather than 5 billion, this code would be correct on both 32-bit and 64-bit systems. But a 32-bit system cannot hold 5 billion elements. We have a 64-bit system, so none of this is a problem for us, is it? Unfortunately, it still is! In this fragment the variable maxSize is 64-bit on a 64-bit system, but the loop counter i (an int) is still 32-bit. As a result, its value runs from 0 up to 2147483647 and then, on the next increment, overflows to ... -2147483648 (minus 2 billion)! Strictly speaking, signed overflow is undefined behavior in C++; this wraparound is what typically happens in practice. Will the array be filled correctly? Instead of theoretical argument, let's run an experiment. We'll change the code as follows:
size_t maxSize = 5000000000;
size_t count = 0;
for (int i = 0; i < maxSize; ++i) {
  count++;
}
After the loop finishes, we look at the value of the count variable. It equals ... 2147483648. Instead of 5 billion iterations, the loop ran only about 2 billion times. If we had been filling the array, more than half of its elements would have remained uninitialized!
What is the problem with such constructs? Compilers often give no diagnostic for code like this, because from the point of view of C++ it is written correctly: in the comparison the variable i is converted to type size_t. Once i has wrapped around to a negative value, the converted result is a huge number greater than maxSize, and the loop ends. That is not the behavior we expected. Static code analyzers can help diagnose such mistakes, provided they are geared toward finding errors related to porting to 64-bit systems.
Other problems are also connected with implicit type conversions. Suppose there is a function that takes an argument of type size_t and computes some value:
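The conversion that ends the loop early, and one possible correction, can be sketched as follows. This is an illustration under the assumption of a mainstream platform where size_t is at least as wide as int; the names intLessThanSize and fillSequential are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Spells out what `i < maxSize` evaluates when i is an int and maxSize is a
// size_t: the int operand is converted to size_t before the comparison.
bool intLessThanSize(int i, std::size_t maxSize) {
    return static_cast<std::size_t>(i) < maxSize;
}

// A corrected loop: the counter has the same type as the bound, so it can
// reach every index that the bound can describe.
std::vector<std::size_t> fillSequential(std::size_t n) {
    std::vector<std::size_t> buffer(n);
    for (std::size_t i = 0; i < n; ++i) {
        buffer[i] = i;
    }
    assert(n == 0 || buffer[n - 1] == n - 1);  // the last element was reached
    return buffer;
}
```

A negative int never compares as "less than" a size_t bound here, because it turns into a very large unsigned value first. Declaring the counter as size_t removes both the overflow and the surprising conversion.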
int Calc(size_t size) {
  // ...
}
If we call this function with an argument of type int, the conversion is performed implicitly and the compiler produces no diagnostic. The domain of the function, however, does not change: the function is defined for all values of type size_t, but in fact it will be called only with values that fit in an int. Here again we face the same unpleasant situation: we have 64-bit code, but in practice only 32-bit numbers are used.
There are some more interesting mistakes that may lead to unexpected behavior of programs ported from a 32-bit platform to a 64-bit one. For instance, the help subsystem of the application may stop working. Is the help subsystem somehow connected with 64-bit code? It is not. The author once had to face the following situation. An ordinary Windows application was written in Visual C++ using the MFC library. Developers hold this library in high regard because it makes it easy to create an application framework and even to add support for the help system. For this purpose one only needs to override the virtual function WinHelp(). In this case the inheritance hierarchy in Visual C++ 6.0 looked like this:
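One common way this happens is that the caller computes the size in int arithmetic before passing it on. The sketch below is an assumption-laden illustration, not the author's code: elementCount is a hypothetical helper, and the width and height parameters are invented to give the size some origin.

```cpp
#include <cassert>
#include <cstddef>

// Computes an element count from two int dimensions. Each operand is widened
// to size_t BEFORE the multiplication, so the product is calculated in size_t
// rather than overflowing in int and only then being converted.
std::size_t elementCount(int width, int height) {
    assert(width >= 0 && height >= 0);  // negative dimensions make no sense
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
}
```

On a 64-bit platform, elementCount(100000, 100000) yields 10000000000, a value that a function taking size_t can accept but that the expression width * height could never produce in int arithmetic.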
class CWinApp {
  virtual void WinHelp(DWORD dwData, UINT nCmd);
};
class CMyApp : public CWinApp {
  virtual void WinHelp(DWORD dwData, UINT nCmd);
};
In later versions of Visual C++, to support 64-bit code, the argument of the WinHelp() function in the MFC library was changed from type DWORD to type DWORD_PTR:
class CWinApp {
  virtual void WinHelp(DWORD_PTR dwData, UINT nCmd);
};
But no changes were made in the user's code. As a result, when the code was compiled for the 64-bit platform, there was no longer one overridden virtual function but two independent virtual functions, and this left the help system unable to work. To fix the situation, the user's code should be corrected as follows:
class CMyApp : public CWinApp {
  virtual void WinHelp(DWORD_PTR dwData, UINT nCmd);
};
After this operation the help system was able to work again.
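The same effect can be reproduced without MFC. The sketch below uses invented stand-ins: AppBase, StaleApp, FixedApp and showHelp are hypothetical names, and the fixed-width types WideArg and NarrowArg only play the roles that DWORD_PTR and DWORD have in a 64-bit build. It also uses the C++11 override specifier, which is one way (not mentioned in the original story) to let the compiler catch the mismatch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins (assumptions for this example only).
using WideArg = std::uint64_t;    // plays the role of DWORD_PTR on 64-bit
using NarrowArg = std::uint32_t;  // plays the role of DWORD

class AppBase {
public:
    virtual ~AppBase() = default;
    virtual int showHelp(WideArg) { return 0; }  // library default
};

class StaleApp : public AppBase {
public:
    // Old signature: this does NOT override AppBase::showHelp; it declares a
    // second, unrelated virtual function.
    virtual int showHelp(NarrowArg) { return 1; }
};

class FixedApp : public AppBase {
public:
    // `override` makes the compiler reject the declaration if the signature
    // stops matching the base class.
    int showHelp(WideArg) override { return 2; }
};

// Calls the help function the way a framework would: through the base class.
int callThroughBase(AppBase& app) {
    return app.showHelp(WideArg{42});
}

// Self-check of the dispatch behavior described above.
void checkDispatch() {
    StaleApp stale;
    FixedApp fixed;
    assert(callThroughBase(stale) == 0);  // user's function is never called
    assert(callThroughBase(fixed) == 2);  // user's function is called
}
```

If override were added to the StaleApp declaration, the code would fail to compile instead of silently losing the help system, which turns this class of porting error into a diagnostic.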

Conclusion

This article does not cover all the problems that programmers may face after recompiling their applications for 64-bit systems. The interaction of 32-bit and 64-bit applications, storing and restoring data in systems of different bit widths, and the compiler's selection of the wrong overloaded functions remain uncovered. All these problems share similar features:
  • they emerge when porting old applications to a 64-bit platform or developing new ones for it;
  • most of them cannot be diagnosed by a compiler because from the point of view of the C++ language they look correct;
  • such problems may seriously spoil the impression made by the 64-bit version of your application.
Despite all the possible difficulties of porting applications to 64-bit platforms, the author still encourages you to do it. The advantages of 64-bit code can raise a software product to a new level. But one should not forget about the possible problems, and should also check the software with a static code analyzer to make sure there are no such defects.
