Combining languages
How difficult is it to combine two programming languages to create a single application? Particularly, how would you go about this with C++ and Python? Thanks.
|
Look into something called SWIG if you want to use C++ functions in Python. I'm not sure how you would go about doing it the other way around.
|
Many scripting based languages have interpreters available that you can embed into C or C++ programs if that's what you want (languages such as Ruby, Perl, Lua, etc.)
|
Have you thought about using Jython?
|
Quote:
But I should note that in the final release produced, in the resulting package (apps in OS X are folders containing the executable(s) and their resources such as images and sounds), the 2 programming languages are in separate files from one another and are never mixed up. I am saying all this because I don't think it's difficult to create an application which uses 2 programming languages, but the 2 must not be mixed together for no reason; otherwise you may end up with untraceable bugs or even spaghetti code. |
If you want to combine C++ and Python then you're in luck as there is already a comprehensive library written to achieve this; Boost.Python :) .
|
Uh, depends on what you are trying to do... do you mean having a single source file with source code for two different languages, say like PHP and JavaScript (one embedded in the other)?
Or are we talking about creating libraries in one language and linking them to an application that has been created in another language? Most compilers will allow you to link C libraries dynamically, however support beyond that depends strictly on the language/compiler. |
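For the library route, a rough sketch of what "linking a C library" tends to mean when the library is really C++ underneath: the C++ internals stay hidden, and only a function with C linkage and plain C types is exposed. The names join_words and joined_length are made up for illustration, and how another language would actually load the result depends on that language and its tools.

```cpp
#include <cassert>
#include <string>

// Ordinary C++ code: it uses std::string, which other languages
// generally cannot see or construct directly.
static std::string join_words(const std::string& first, const std::string& second) {
    return first + " " + second;
}

// C-linkage wrapper: the symbol name is not mangled and the signature
// uses only plain C types, so it looks like a C library function.
// Returns the length of the joined text, or -1 if either pointer is null.
extern "C" int joined_length(const char* first, const char* second) {
    if (first == nullptr || second == nullptr) {
        return -1;
    }
    return static_cast<int>(join_words(first, second).size());
}

// Hypothetical self-check showing how a caller would use the wrapper.
void joined_length_demo() {
    assert(joined_length("mixed", "language") == 14);  // 5 + 1 space + 8
    assert(joined_length(nullptr, "language") == -1);
}
```

Keeping the exported signature down to C types is the design choice here; anything richer (classes, exceptions, templates) is where support starts to depend on the particular language and compiler pairing.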
Here's a low-level look at the question:
MIXED-LANGUAGE PROGRAMMING
Sometimes one needs to call library procedures which were written and compiled using a different language. Compiler writers, like the rest of us, have their own ideas about the best way to do things. Consequently, the interface to the library code may not match up with the interface that another language expects to be in force. It's somewhat like trying to plug a 110V appliance into a 220V outlet. If you force the fit, sparks are likely to fly.
The information presented here can't possibly be exhaustive. I hope it serves its purpose, and that there are no egregious errors.
Consider the following code snippet:
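A minimal sketch of the sort of call under discussion, assuming C++ and using the names m, n and add that the discussion relies on. The body of add here is only a stand-in so the example compiles by itself; in the mixed-language case it would live in a separately compiled module. The function call_add is a hypothetical caller.

```cpp
#include <cassert>

// Stand-in for a routine that would normally come from a separately
// compiled object or library module, possibly written in another language.
int add(int a, int b) {
    return a + b;
}

// Hypothetical caller. Its variables are called m and n; inside add the
// same values are known as a and b.
int call_add() {
    int m = 2;
    int n = 3;
    int result = add(m, n);  // how do m and n get to add, and the sum back?
    assert(result == 5);
    return result;
}
```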
Perhaps the 'add' function has been written as a Fortran (or whatever) subroutine and exists in an object or library module available to you for use. When you have code like this, how does the called procedure know what values to add? How does the calling procedure know how (or where) to get the result? The answer is, "The compiler writer makes it happen."
If you consider assembly language, the native language of the machine, the mechanisms are up to the person writing the code. You may not know assembly language, but bear with me, the thrust is reasonably understandable. The machine has a number of highly localized variables called 'registers.' The assembly language coder can merely say:
The high-level-language user doesn't typically have access to the registers. Variables are located in memory and the compiler generates machine code to move them into and out of registers as necessary, performing the required operations in the process. For the 'add' example, the compiler will generate instructions to move variable m into a register, add the variable n (or perhaps even move variable n into a register before adding), and move the result from the accumulator into the receiving variable.
Since you may call 'add' repeatedly with different variables, not just 'immediate' values stored in the instruction, or 'global' variables always located at the same memory address, where does 'add' get its values? In many implementations, off the stack. (Note that the variables in the call need not match the variables in the function by name. In practice the caller puts a value from a variable of some name in the appropriate place and the function retrieves it from there. The name is strictly a mnemonic for the memory address of the location to be used.)
Different programmers write different code. Different languages generate different machine code for identical or similar operations. Different microprocessors operate differently. Nevertheless, effective methods generally turn out to be highly similar in how they perform their functions. The same is true of moving information back and forth between code calling a function and the code representing the function.
Consider the requirements.
- There must be a common ground used for placing data to be operated on.
- All participants must know where the common ground is.
- The data must be placed in positions within the common ground such that all participants are cognizant of the precise location.
- The type and form of the data must be known to all participants.
TYPE AND FORM
Consider the last requirements first. The type of the data must be known. The integral value 1 is identical to the ASCII character value, SOH.
Yet it is rarely a goal to add 'characters.' Types, therefore, are generally important. Bear in mind that some languages are more stringent than others regarding how specific the information regarding type must be. The compiler writer may have imbued the compiler with more or less ability to infer proper operation from syntax or context.
The form of the data must certainly be known. This boils down to those two terms that confuse the bejaysus out of a lot of people: BY VALUE and BY REFERENCE. In this discussion, please do not assume the meaning of REFERENCE to be the specific meaning given to it by C++ (despite its widespread usage in other forms, even within C++).
Let us say that you are my calculator and that we have agreed on the rules under which you will operate (and that you will do so correctly). Our rules are that I will give you two pieces of paper, each of which contains a number, that you will sum them and return to me a piece of paper with the result. I then hand you one piece of paper with '2' written on it and another with '3' written on it. You perform appropriately and return to me a piece of paper with the number '5' written on it. We have agreed to pass the arguments by VALUE.
Let's change the rules. I will still give you two pieces of paper with numbers on them. You, however, will interpret those as page numbers, refer to your notebook, and add the numbers found on those pages. I pass you '2' and '3' as before. Page two of your notebook contains the number '10', and page three contains the number '7.' You return to me a piece of paper with the number '17.' We have passed the arguments by REFERENCE.
It is clear that if we are not operating on the same wavelength, by the same rules, the outcome is nonsense, unusable.
In C (and mostly in C++), arguments are passed by value. Passing by reference is accomplished by passing the VALUE of a reference object.
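The calculator analogy can be sketched in C++ roughly as follows. The function names are made up for the illustration, and the "pages" are modelled as two ordinary variables whose addresses play the part of page numbers; treat it as a sketch of the idea, not of what any particular compiler emits.

```cpp
#include <cassert>

// By value: the callee receives copies of the numbers themselves.
int add_by_value(int a, int b) {
    return a + b;
}

// By reference, C style: the callee receives addresses (the "page numbers")
// and has to dereference them to reach the numbers stored there.
int add_by_address(const int* a, const int* b) {
    return *a + *b;
}

// Hypothetical helper replaying both versions of the analogy.
void calculator_demo() {
    int page_two = 10;   // the number written on page two of the notebook
    int page_three = 7;  // the number written on page three

    assert(add_by_value(2, 3) == 5);                       // papers hold the numbers
    assert(add_by_address(&page_two, &page_three) == 17);  // papers hold locations
}
```

Note that the addresses handed to add_by_address are themselves passed by value, which is the point the last paragraph is making about C.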
The function definition (char *p, say) makes this sneaky move clear to the function so that it knows to DEREFERENCE the VALUE when it uses it.
C++ has an entity called (perhaps unfortunately) a REFERENCE. This thangy is actually an alias for another object. Just another mnemonic by which to refer to the object. The effect of passing this type of 'reference' is to make the original object known within the scope of the procedure, much as a global object would be known. One operates directly with it with no need for dereferencing. This is true 'pass by reference.'
THE COMMON GROUND
Consider now the common ground, the area of shared information. It is typically a stack. It is, moreover, typically THE stack; the very same mechanism that the machine uses to keep track of where it was when it 'called' a function, or an interrupt occurred, or whatever. This last function is an implicit mechanism in the operation of the machine and is not (normally, anyway) subject to your control, only your influence. Because it is a highly sensitive place to be messing with or corrupting, it is probably not a really good choice for a place for you to put buffers that you subsequently overflow (thus possibly destroying the place-marker for the CPU, as well as important values it has saved temporarily to free up some registers), but there you have it. The benefit, automatically recyclable memory in the short-term, seems to outweigh the drawbacks in most minds (including mine, but I try to be careful).
Most languages are arranged so that they push function arguments onto the stack, and may also push a place for the return (some returns will be directly in a machine register), and then call the function. The call causes the current location of the instruction pointer (where the code is that the machine is executing) to be stored on the stack. Other important machine state may also be saved. Interrupts usually cause more information to be saved than function calls.
The instruction pointer is then set to the location where the function code (or interrupt service routine) lives. That makes execution proceed with the function's code. When the function terminates, a 'return' instruction is executed. Any stack usage by the function has been 'unwound' and the stack is back in the state it was when the function was called. If the function hasn't corrupted the stack, the stack pointer will be pointing to the address from which the function was called, that address will be placed into the instruction pointer and execution will resume from whence it left off.
A simple diagrammed illustration of a generic but typical stack operation is included at the end of this post.
We're back from seeing the sights from the home of the function. Lo and behold, the parameters that were pushed onto the stack BEFORE the call (branch) to the function are (barring specific actions) still there! Useless as mammary glands on a boar hog, but still there. The compiler will have generated instructions, though, to remove them before proceeding. If the return value happened to be passed back on the stack, it too will be disposed of. This first method is called CALLER CLEANS THE STACK. The stack is now in the pristine condition it possessed prior to the call, and yet, miraculously, a function has been performed and its work is irretrievably history.
Caller cleaning is the easiest cleaning to perform unless you can get some geek to build your processor with more complicated hardware and logic and expand on the instruction set. The method particularly makes sense if the caller is allowed to pass a variable number of arguments. Who knows better than he how many to pop off?
The other method is CALLEE CLEANS THE STACK. Bear in mind that when the callee (called function) is looking at the stack, the parameters are ON THE OTHER SIDE OF THE RETURN ADDRESS.
Actually, it is easy to get around this, but you can bet your bottom dollar that some purist high-priest will make you do 137 "Oh hell, Martha!"'s if they catch YOU messing with it instead of the compiler writer. You pop the return address, pop the arguments, and sneak the return address back onto the stack. THEN you call 'return.' The 'x86 and others make it relatively simple. They let the compiler emit machine code for a 'return 8' (or whatever) that tells the CPU (that geek was at work) to return and then dispose of 8 (or whatever amount the arguments required) bytes.
DATA POSITION IN THE SHARED AREA
Lastly, precisely where are the arguments when the function receives control? If you add 2 and 3 it doesn't matter what order you fetch them in, but a divide is a tad more picky. You have to have agreement as to whether you push the 2 first, then the 3, or the 3 first and then the 2. The most common choices are 'push left to right' and 'push right to left.' I suppose you could devise a method to push odd ones first, from the middle outwards, then even ones from the outside in, but I've not encountered that method, yet.
C/C++ pushes from right to left. Other languages push from left to right. Some implementations may differ from others. If you're a gambler, you can guess; perhaps you're not, and would like to inspect the code emitted by the other language. Perhaps you'd just like to research. There are some links at the end of the post.
So, then, you want to call a Fortran (or whatever) subroutine from C/C++? Learn how Fortran pushes its arguments, whether it uses BY VALUE or BY REFERENCE, and who cleans the stack. Then write your C function to do it the Fortran way, or write your Fortran subroutine to do it the C way. Don't do both, you'll just be in the same boat but rowing the other direction.
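As a hedged sketch of "writing your C function to do it the Fortran way": many Fortran compilers pass every argument by reference and export the subroutine under a lower-case name with a trailing underscore, but both details are assumptions that have to be checked against the actual compiler. The name add_ and the wrapper below are hypothetical. So that the example compiles without a Fortran toolchain, the subroutine is given a stand-in body in C++; in real use it would only be declared here and the definition would come from the Fortran object file at link time.

```cpp
#include <cassert>

// Stand-in for a compiled Fortran subroutine along the lines of
//   subroutine add(a, b, sum)
// Assumed convention: arguments arrive as addresses, and the exported
// symbol is the lower-case name plus an underscore. Verify both.
extern "C" void add_(const int* a, const int* b, int* sum) {
    *sum = *a + *b;
}

// C++ side adapting to that convention: it takes ordinary values and
// hands their addresses to the routine, including one for the result.
int add_via_fortran_convention(int a, int b) {
    int sum = 0;
    add_(&a, &b, &sum);
    return sum;
}

// Hypothetical self-check.
void fortran_convention_demo() {
    assert(add_via_fortran_convention(2, 3) == 5);
}
```

Only one side adapts here: the C++ wrapper bends to the assumed Fortran rules and the subroutine is left alone, which is the "don't do both" advice in practice.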
A STACK ILLUSTRATION
Most microprocessors have an internal pointer (the stack pointer) which references memory so that the micro can keep track of the point of execution as it varies because of interrupts, function calls, and so forth. The stack (memory to which it points) is also used by many systems (sometimes unfortunately) as a storage place for local values, saved registers, and so forth.
Just prior to a call, the stack pointer, which is much like any pointer one defines, is pointing to some place in memory (designated by the programmer or the operating system) for its use. When you call a function, it works something like this (its usage varies somewhat from language to language):
stack pointer -->| orig position |
In most systems, the stack pointer moves toward [...] referred to are, of course, in force. If you modified the reference, itself, to point to something else, you could modify that something else, also (for example, subsequent bytes pointed to by a char *). The reference itself disappears.
If you pass an argument by value, that value is perfectly usable to the called procedure; it could specify a length to use for some operation, for example. If you modify the value, such modifications disappear when the arguments disappear, immediately after the called procedure returns to the caller. If you want lasting changes in the caller, you need to make them by reference or RETURN a value from the called procedure.
LINKS
http://www.digitalmars.com/ctg/ctgMixingLanguages.html
http://weblogs.asp.net/oldnewthing/a.../02/47184.aspx
http://msdn.microsoft.com/library/de...to_fortran.asp
|
Did you just write that on the spot, DaWei? If you did, um... holy crap.
To the OP: you might be interested to know that it is possible to compile source files using different languages and link them into the same program using .NET. Of course, this does limit you somewhat. |
The runtime environment (== everything Da Wei said), in other words, is what differs.
Build a virtual machine over the hardware, make the necessary assumptions and you will work it out. Anyway, good post Da Wei (had to say). |
Copyright ©2007 DaniWeb® LLC