Here's a low-level look at the question:
MIXED-LANGUAGE PROGRAMMING
Sometimes one needs to call library procedures that were written and compiled in a different language. Compiler writers, like the rest of us, have their own ideas about the best way to do things. Consequently, the interface the library code actually uses may not match the interface that the calling language expects. It's somewhat like trying to plug a 110V appliance into a 220V outlet. If you force the fit, sparks are likely to fly.
The information presented here can't possibly be exhaustive. I hope it serves its purpose, and that there are no egregious errors.
Consider the following code snippet:
int add (int m, int n)
{
    return m + n;
}

int main (void)
{
    int answer;
    int x = 2;
    int y = 3;

    answer = add (x, y);   /* answer is now 5 */
    return 0;
}
Perhaps the 'add' function has been written in Fortran (or whatever) and exists in an object or library module available to you for use.
When you have code like this, how does the called procedure know what values to add? How does the calling procedure know how (or where) to get the result? The answer is, "The compiler writer makes it happen."
If you consider assembly language, essentially the native language of the machine written as mnemonics, the mechanisms are entirely up to the person writing the code. You may not know assembly language, but bear with me; the thrust is reasonably understandable. The machine has a number of small, very fast, highly localized storage locations called 'registers.' The assembly language coder can simply write something like:
1. Load register A with the value 2.
2. Load register B with the value 3.
3. Add register B to register A. (In this example, register A is the implicit accumulator: it receives the result and, in the process, loses its original content. If you needed that value, you should have saved it somewhere else first.)
4. Save the result (register A) to variable ANSWER (memory location 0x0040b3af).
The high-level-language user doesn't typically have access to the registers. Variables are located in memory and the compiler generates machine code to move them into and out of registers as necessary, performing the required operations in the process.
For the 'add' example, the compiler will generate instructions to move variable m into a register, add the variable n (or perhaps even move variable n into a register before adding), and move the result from the accumulator into the receiving variable.
Since you may call 'add' repeatedly with different variables (not just 'immediate' values stored in the instruction, or 'global' variables always located at the same memory address), where does 'add' get its values? In many implementations, off the stack. (Many modern conventions, such as those used on x86-64, pass the first few arguments in registers instead and use the stack only for the rest; the principle of an agreed-upon place is the same.) Note that the variables in the call need not match the parameters in the function by name. In practice, the caller puts a value from a variable of some name in the agreed place, and the function retrieves it from there. The name is strictly a mnemonic for the memory address of the location to be used.
Different programmers write different code. Different compilers generate different machine code for identical or similar operations. Different microprocessors operate differently. Nevertheless, effective methods generally turn out to be highly similar in how they perform their functions. The same is true of moving information back and forth between the code calling a function and the code of the function itself. Consider the requirements. There must be a common ground for placing the data to be operated on. All participants must know where the common ground is. The data must be placed at positions within the common ground that all participants know precisely. The type and form of the data must be known to all participants.
TYPE AND FORM
Consider the last requirements first. The type of the data must be known. The integer value 1 has exactly the same bit pattern as the ASCII control character SOH, yet it is rarely a goal to add 'characters.' Types, therefore, are generally important. Bear in mind that some languages are stricter than others about how precisely type information must be specified, and the compiler writer may have given the compiler more or less ability to infer the proper operation from syntax or context.
The form of the data must certainly be known. This boils down to those two terms that confuse the bejaysus out of a lot of people: BY VALUE and BY REFERENCE. In this discussion, please do not assume the meaning of REFERENCE to be the specific meaning given to it by C++ (despite its widespread usage in other forms, even within C++).
Let us say that you are my calculator and that we have agreed on the rules under which you will operate (and that you will do so correctly). Our rules are that I will give you two pieces of paper, each of which contains a number, that you will sum them, and that you will return to me a piece of paper with the result. I then hand you one piece of paper with '2' written on it and another with '3' written on it. You perform appropriately and return to me a piece of paper with the number '5' written on it. We have agreed to pass the arguments by VALUE.
Let's change the rules. I will still give you two pieces of paper with numbers on them. You, however, will interpret those as page numbers, refer to your notebook, and add the numbers found on those pages. I pass you '2' and '3' as before. Page two of your notebook contains the number '10', and page three contains the number '7'. You return to me a piece of paper with the number '17'. We have passed the arguments by REFERENCE. It is clear that if we are not operating on the same wavelength, by the same rules, the outcome is unusable nonsense.
In C (and mostly in C++), arguments are passed by value. Passing by reference is simulated by passing the VALUE of a pointer, that is, an address. The parameter declaration (char *p, say) makes this sneaky move clear to the function, so that it knows to DEREFERENCE the value when it uses it. C++ also has an entity called (perhaps unfortunately) a REFERENCE. This thangy is actually an alias for another object, just another mnemonic by which to refer to it. The effect of passing this type of 'reference' is to make the original object known within the scope of the procedure, much as a global object would be known. One operates directly on it, with no need for explicit dereferencing. This is true 'pass by reference.'
THE COMMON GROUND
Consider now the common ground, the area of shared information. It is typically a stack. It is, moreover, typically THE stack: the very same mechanism the machine uses to keep track of where it was when it 'called' a function, or when an interrupt occurred, or whatever. That bookkeeping is an implicit part of the machine's operation and is not (normally, anyway) subject to your control, only your influence. Because it is such a sensitive place to be messing with or corrupting, the stack is arguably a poor place to put buffers that you might subsequently overflow (thus possibly destroying the CPU's place-marker, as well as important values it has saved temporarily to free up some registers), but there you have it. The benefit, memory that is automatically recycled in the short term, seems to outweigh the drawbacks in most minds (including mine, but I try to be careful).
Most language implementations push the function arguments onto the stack, may also reserve a place for the return value (many return values are passed back directly in a machine register instead), and then call the function. The call causes the current location of the instruction pointer (where the code is that the machine is executing), more precisely the address of the instruction following the call, to be stored on the stack. Other important machine state may also be saved; interrupts usually cause more information to be saved than function calls do. The instruction pointer is then set to the location where the function code (or interrupt service routine) lives, so execution proceeds with the function's code. When the function is finished, it 'unwinds' any stack space it used for its own purposes, leaving the stack in the state it was in just after the call. If the function hasn't corrupted the stack, the stack pointer is now pointing at the saved return address; the 'return' instruction pops that address into the instruction pointer, and execution resumes from whence it left off.
A simple diagrammed illustration of a generic but typical stack operation is included at the end of this post.
We're back from seeing the sights at the function's home. Lo and behold, the parameters that were pushed onto the stack BEFORE the call (branch) to the function are (barring specific actions) still there! Useless as mammary glands on a boar hog, but still there. The compiler will have generated instructions, though, to remove them before proceeding. If the return value happened to be passed back on the stack, it too will be disposed of once it has been retrieved. This first method is called CALLER CLEANS THE STACK. The stack is now in the pristine condition it possessed prior to the call, and yet, miraculously, a function has been performed and its work is irretrievably history. Caller cleaning is the simplest method to implement, unless you can get some geek to build your processor with more complicated hardware and logic and expand the instruction set. The method makes particular sense if the caller is allowed to pass a variable number of arguments. Who knows better than the caller how many to pop off?
The other method is CALLEE CLEANS THE STACK. Bear in mind that when the callee (the called function) looks at the stack, the parameters are ON THE OTHER SIDE OF THE RETURN ADDRESS. Actually, it is easy to get around this, but you can bet your bottom dollar that some purist high priest will make you do 137 "Oh hell, Martha!"s if they catch YOU messing with it instead of the compiler writer. You pop the return address, pop the arguments, and sneak the return address back onto the stack. THEN you execute the 'return.' The x86 and others make it relatively simple. They let the compiler emit machine code for a 'return 8' (or whatever) that tells the CPU (that geek was at work) to return and then dispose of 8 bytes (or however many the arguments required).
DATA POSITION IN THE SHARED AREA
Lastly, precisely where are the arguments when the function receives control? If you add 2 and 3, it doesn't matter in what order you fetch them, but a divide is a tad more picky. You have to agree on whether you push the 2 first, then the 3, or the 3 first and then the 2. The most common choices are 'push left to right' and 'push right to left.' I suppose you could devise a method to push the odd ones first, from the middle outwards, then the even ones from the outside in, but I've not encountered that method yet. The usual C/C++ conventions push from right to left. Others, such as the Pascal convention, push from left to right. Even implementations of the same language may differ. If you're a gambler, you can guess; perhaps you're not, and would like to inspect the code emitted by the other language's compiler. Perhaps you'd just like to research. There are some links at the end of the post.
So, then, you want to call a Fortran (or whatever) subroutine from C/C++? Learn in what order Fortran pushes its arguments, whether it passes them BY VALUE or BY REFERENCE (traditionally, Fortran passes by reference), and who cleans the stack. Then write your C function to do it the Fortran way, or write your Fortran subroutine to do it the C way. Don't do both; you'll just be in the same boat, rowing in the other direction.
A STACK ILLUSTRATION
Most microprocessors have an internal pointer (the stack pointer) which references memory so that the micro can keep track of the point of execution as it varies because of interrupts, function calls, and so forth. The stack (the memory to which it points) is also used by many systems (sometimes unfortunately) as a storage place for local values, saved registers, and so forth.
Just prior to a call, the stack pointer, which is much like any pointer one defines, is pointing to some place in memory set aside (by the programmer or the operating system) for its use. When you call a function, things work something like this (the details vary somewhat from language to language and from machine to machine):
stack pointer -->| orig position |   In most systems, the stack pointer moves
                 |               |   toward lower addresses as you use it.
                 ~               ~
At call:         | orig position |
                 |   arguments   |
stack pointer -->| last argument |
                 |               |
                 |               |
                 |               |
                 |               |
                 ~               ~
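There may be zero or more arguments. They are pushed onto the stack in a predetermined order; for the usual C/C++ conventions, it is right to left. The stack pointer moves with each push, so it ends up pointing at the last argument pushed (with right-to-left pushing, that is the leftmost argument in the call).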
After call:      | orig position |
                 | argument here |
                 | (maybe more)  |
stack pointer -->| ret addr here |
                 |               |
                 |               |
                 |               |
                 ~               ~
Into procedure   | orig position |
Arguments avail  | arguments...  |   When you modify the argument(s), you modify
for use          | ret addr here |   the value(s) stored here. If an argument
                 | saved regs,   |   is a reference or pointer you may use it to
                 | locals, etc.  |   modify the value pointed to elsewhere
                 | in this area  |   (in the calling procedure, say). If you write
stack pointer -->|               |   more data to one of the local variables than it
                 ~               ~   can store, guess where the excess winds up.
The function does its work and unwinds the stack (locals, etc.).
Before return:   | orig position |   This is immediately before the return. The
                 | arguments...  |   storage for saved registers, locals, etc. has
stack pointer -->| ret addr here |   already been released (and its contents are
                 |               |   gone). The arguments are still on the stack.
                 |               |
                 |               |
                 ~               ~
After return:    | orig position |   Immediately after the return. The very first
stack pointer -->| arguments...  |   thing the caller does next is discard the
                 |               |   arguments, whether you've modified them or not.
                 |               |   Sometimes the arguments are instead removed by
                 |               |   the called function, with the stack pointer
                 |               |   adjusted past them as part of the return.
                 ~               ~
stack pointer -->| orig position |   And it's done; you are right back where you
                 |               |   started, bookkeeping-wise, when you made the
                 |               |   call.
                 ~               ~
Any changes you made to the arguments are history, for all practical purposes (the old bytes may linger until the next stack operation overwrites them). If you passed an argument as a reference, any modifications you made to the value it referred to are, of course, in force. If you modified the reference itself to point to something else, any changes you made to that something else are in force, too (for example, subsequent bytes pointed to by a char *). The reference itself disappears. If you pass an argument by value, that value is perfectly usable to the called procedure; it could specify a length to use for some operation, for example. If you modify the value, such modifications disappear when the arguments disappear, immediately after the called procedure returns to the caller. If you want lasting changes in the caller, you need to make them by reference or RETURN a value from the called procedure.
LINKS
http://www.digitalmars.com/ctg/ctgMixingLanguages.html
http://weblogs.asp.net/oldnewthing/a.../02/47184.aspx
http://msdn.microsoft.com/library/de...to_fortran.asp