Old Mar 10th, 2006, 2:17 AM   #9
grumpy
Sigh.

Notions of heap and stack are often used to describe the difference between variables that are dynamically allocated (explicitly created with operator new and destroyed with operator delete) and those that are not (eg declared locally in a function). In fact, the notions of heap and stack are not even mentioned in the C or C++ standards.

Heap and stack are old concepts, dating from a time when a moderately high end computer had a few kilobytes of total RAM that could be addressed directly by the CPU, and a few more kilobytes that could be addressed on another memory card using a dedicated device driver. For example, on IBM PCs in the early 1990s the CPU was able to access about 640K of RAM directly, and a second area of memory [in the sense of being different memory chips or cards] (which brought the total memory on the PC up to a couple of megabytes or so) could be addressed by use of a specially installed device driver. On those PCs, that 640K of RAM was referred to as stack and the second block of memory was referred to by various names, such as high memory, extended memory, or ..... heap. On those machines stack and heap were physically different areas of memory, and were managed directly.

The basic operating systems of the time (eg MS-DOS) were essentially able to execute programs only in the stack, and the heap was explicitly used by programs to store data, because the CPU could not (or would not) treat data in the heap as executable statements. In practice, the amount of "stack" was limited to 640K or so by the capabilities of the CPU, and the heap was up to a couple of megabytes, i.e. stack was at a premium if the user wanted to execute a large program, and heap was used to store data that did not need to be executed.

Some later versions of operating systems (eg MS-DOS 5) allowed the stack to be increased to about 1MB by some clever use of device drivers to put executable programs into the heap (the memory between 640K and 1 MB was then referred to as "upper memory") but, even with that tweaking, stack was still at a premium and programs had to jump through hoops to use other memory in the heap. All of this was driven by early CPU architectures and their ability to address memory.

With more modern CPUs (eg the first 32 bit CPUs) the ground rules changed. The CPUs were able to access larger amounts of memory and, on the computers from that time, there was typically only one area of memory. The only difference was that the term "stack" was used to describe the amount of memory available when a program started up (eg the default allowances for processes under unix and 32 bit windows) and "heap" was used to describe all other memory. The distinction is, however, arbitrary because physically stack and heap are the same memory.

Now, in the same picture are programming languages (eg C and C++) that support three types of variables: local variables (eg an auto variable declared locally in a function), static variables (a variable with a lifetime that persists between function calls, whether the variable is local to a function or accessible from a group of functions), and dynamically allocated variables. On older machines (eg our older IBM PC) local and static variables were typically both placed in the stack, and dynamically allocated variables were placed in the heap. The reason for the difference was performance: local and static variables tended to be small (eg basic types or smallish structs) and could be created and destroyed quickly (static variables at program startup/exit, auto variables within the block they are declared in). Dynamically allocated variables (using malloc()/free() in C, new/delete in C++) were under explicit control of the programmer and could include much larger blocks of memory, which were potentially (and often actually) a little slower to allocate and clean up.

The result, because of that history, is that dynamically allocated memory is often described as being on the heap, and other things are often described as being on the stack. And, because of that description, people still assume they have to be careful to avoid using the stack too much and should put everything they can in the heap. But on modern machines, stack and heap are the same thing.

OK. End history lesson. Now to answer the original question ....

The question was "what is the difference between string str("hello"); and string *strptr = new string("hello"); and when should you use one over the other?" For that question, the arguments about using heap and stack are irrelevant (unless you're working on a VERY old machine with a very old compiler).

The distinction between the two forms, from a pure C++ perspective, is only to do with how the lifetime of objects is managed. The first form "string str("hello")" declares an object whose lifetime is explicitly defined by language rules.

For example:
#include <string>

using std::string;

string global("hello");  // this string is created before main() is called and cleaned up after main() returns

string *demo()
{
    string local("hello");    // this variable is created each time demo() is called

    // other code, which can access local as it exists

    // local is destroyed when the function returns, so the address returned below is left dangling

    return &local;
}

int main()
{
    string *x = demo();
    *x = "boom";    // undefined behaviour as x is pointing at a string that no longer exists
}

The form "string *strptr = new string("hello");" simply means that the lifetime of the object is explicitly controlled by the programmer. For example:

#include <string>

using std::string;

string *demo1()
{
    string *str = new string("hello");

    // the string that str points at continues to exist here

    *str = "hello again";     // we can change the contents safely

    delete str;    // the string no longer exists

    *str = "boom";     // this now invokes undefined behaviour (common symptom is a program crash, but not necessarily)

    return str;    //   even if we remove previous line, this means we return the address of something that no longer exists.  See below for effects this causes to our caller.
}

string *demo2()
{
    string *str = new string("Another string");
    delete str;   // destroy it
    str = new string("Hello");
    return str;   // this string still exists; the caller is now responsible for deleting it
}

int main()
{
    string *s1 = demo1();
    *s1 = "boom";     // undefined behavior as s1 is the address of a string that no longer exists

    string *s2 = demo2();
    *s2 = "OK";         //   OK as s2 is the address of a string that still exists

    delete s2;           //   delete s2 here if we don't need it anymore

    *s2 = "boom";      // we've deleted s2, so undefined behaviour here
}

Note that, in these examples, I didn't use the words "heap" or "stack" anywhere. That's because they have no meaning. The only places where they do are in (outdated) textbooks or in outdated documentation of very old compilers. An old textbook that uses these concepts, like an old compiler, can be forgiven, as the terms were current at the time of those books and compilers. But a modern textbook (after 2000 ish) or documentation of a modern compiler that uses these terms is actually out of date (or has not been updated fully).