Programming Forums
Nov 29th, 2005, 6:47 AM   #1
SittingDuck
Structures to Intel instruction

I have some structures like so:

type
  TThingType = (ttNone, ttReg, ttStack, ttImm);
  TThing = record
    ThingType: TThingType;
    Deref: Boolean;         // Will be dereferenced
    Offset: Integer;        // Immediate offset when dereferencing
    OffsetReg: TRegister;   // CPU register offset when dereferencing
    Scale: Integer;         // Scale - x1, x2, x4, x8 of OffsetReg
    case Integer of
      0: (Reg: TRegister);  // Register
      1: (Pos: Integer);    // Offset from esp
      2: (Imm: Integer);    // Immediate value
  end;
  pThing = ^TThing;

  TOperationEnum = (opMove, opNot, opDivide, opAnd, opShiftLeft, opShiftRight,
    opRotateLeft, opRotateRight, opMultiply, opAdd, opSubtract, opOr, opExOr);

  TOperation = record
    OpType: TOperationEnum;
    Src1: TThing;
    Src2: TThing;
    Dest: TThing;
    Output: TThing;
  end;

Basically, TThing holds information about an operand, and TOperation holds information about one or more Intel instructions.

What I've been trying to puzzle out for probably the last month is how to get from the TOperation to some Intel machine code. There are so many combinations it seems almost hopeless. Any ideas?

(From the TRegister the register number can be obtained, and a free register can be obtained with GetFreeRegister.)
Dec 3rd, 2005, 5:51 AM   #2
lectricpharaoh
I've never used Delphi, and it's been years since I've touched Pascal (and that made me want to go wash my hands), but this looks like an interesting post. I assume you're trying to write an assembler?

The main reason assemblers exist is that it's very hard (not to mention tedious) for humans to encode instructions by hand. Most assembly mnemonics have more than one form. The main criteria are the operand size (8-bit, or 16/32-bit; the latter two share the same opcodes and are distinguished by the CPU mode and the presence or absence of an operand-size override prefix, 66h), the address size (again determined by the CPU mode and the presence or absence of an override prefix, 67h), and the types of the operands. For example, a register-to-register MOV is encoded differently from a register-to-memory one. For some instructions, like a MOV of an immediate value into a register, the register operand is encoded directly into the opcode.
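To make the size rule concrete, here is a minimal sketch in Pascal (written to avoid anything newer than Delphi 3, and also meant to build with Free Pascal in Delphi mode). It assumes a 32-bit code segment and register numbers 0..7 (EAX = 0, EBX = 3). TCode, Emit, HexOf and MovRegReg are names made up for this illustration, not part of anyone's existing code.

```pascal
program SizePrefix;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  { Fixed-size byte buffer (hypothetical helper type). }
  TCode = record
    Len: Integer;
    Bytes: array[0..15] of Byte;
  end;

{ Append one byte to the buffer. }
procedure Emit(var C: TCode; B: Byte);
begin
  C.Bytes[C.Len] := B;
  Inc(C.Len);
end;

{ Render the buffer as space-separated hex, e.g. '89 D8'. }
function HexOf(const C: TCode): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to C.Len - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(C.Bytes[I], 2);
  end;
end;

{ MOV Dst, Src between two registers, assuming a 32-bit code segment.
  Bits is 8, 16 or 32; Dst and Src are register numbers 0..7. }
function MovRegReg(Bits, Dst, Src: Integer): TCode;
begin
  Result.Len := 0;
  if Bits = 16 then
    Emit(Result, $66);      // operand-size override: 16-bit in a 32-bit segment
  if Bits = 8 then
    Emit(Result, $88)       // MOV r/m8, r8
  else
    Emit(Result, $89);      // MOV r/m16, r16 and MOV r/m32, r32 share this opcode
  Emit(Result, $C0 or (Src shl 3) or Dst);  // ModR/M: mod=11, reg=Src, r/m=Dst
end;

begin
  WriteLn(HexOf(MovRegReg(32, 0, 3)));  // mov eax, ebx -> 89 D8
  WriteLn(HexOf(MovRegReg(16, 0, 3)));  // mov ax, bx   -> 66 89 D8
  WriteLn(HexOf(MovRegReg(8, 0, 3)));   // mov al, bl   -> 88 D8
end.
```

In a 16-bit segment the logic would flip: the 66h prefix would go on the 32-bit form instead.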

Besides the opcode, there may be several other components. I've already mentioned the override prefix bytes; there are two of these (one for operand size, and another for address size). There are also mnemonics like LOCK and REP, and segment overrides, that generate prefix bytes. After the opcode comes the ModR/M byte (maybe), which can specify a register operand, an addressing mode (to support more complex memory addressing), and/or an extension to the opcode. With certain 32-bit memory addressing modes, you get a SIB (scale/index/base) byte, which adjusts the target address. Then you get an optional displacement (i.e. offset) value for instructions that address memory; this value may be 1, 2, or 4 bytes. Finally, you have an optional immediate value; this too is 1, 2, or 4 bytes.
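The order of those components can be sketched with one instruction that uses most of them. The Pascal below is only an illustration under stated assumptions: a 32-bit code segment, a base register other than ESP (so no SIB byte is needed), and a displacement that fits in one byte. TCode, Emit, EmitLE, HexOf and MovMemImm are hypothetical names.

```pascal
program InstrLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  TCode = record
    Len: Integer;
    Bytes: array[0..15] of Byte;
  end;

{ Append one byte to the buffer. }
procedure Emit(var C: TCode; B: Byte);
begin
  C.Bytes[C.Len] := B;
  Inc(C.Len);
end;

{ Append the low Count bytes of Value, least significant byte first. }
procedure EmitLE(var C: TCode; Value: LongInt; Count: Integer);
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
  begin
    Emit(C, Byte(Value and $FF));
    Value := Value shr 8;
  end;
end;

{ Render the buffer as space-separated hex. }
function HexOf(const C: TCode): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to C.Len - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(C.Bytes[I], 2);
  end;
end;

{ MOV [BaseReg + Disp8], Imm with a 16- or 32-bit operand.
  BaseReg must not be 4 (ESP): that r/m value means "SIB byte follows". }
function MovMemImm(Bits, BaseReg, Disp8: Integer; Imm: LongInt): TCode;
begin
  Result.Len := 0;
  if Bits = 16 then
    Emit(Result, $66);              // 1. prefix (operand-size override)
  Emit(Result, $C7);                // 2. opcode: MOV r/m16/32, imm16/32
  Emit(Result, $40 or BaseReg);     // 3. ModR/M: mod=01, reg=0 (extension), r/m=base
                                    // 4. SIB: not needed for this addressing mode
  EmitLE(Result, Disp8, 1);         // 5. displacement, one byte here
  EmitLE(Result, Imm, Bits div 8);  // 6. immediate, same size as the operand
end;

begin
  WriteLn(HexOf(MovMemImm(32, 0, 4, 1)));  // mov dword ptr [eax+4], 1
  WriteLn(HexOf(MovMemImm(16, 0, 4, 1)));  // mov word ptr [eax+4], 1
end.
```

The first call should print C7 40 04 01 00 00 00 and the second 66 C7 40 04 01 00; the only differences are the prefix and the width of the immediate.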

I suggest you get some good books on instruction encoding for the Intel processors. What I have given you is very sketchy information gleaned from my TASM 3.0 manual, so it's rather dated (it only covers up to 486 opcodes). You could also try downloading NASM; I seem to recall they had opcode tables in the documentation, but I'm not 100% sure.

I figure your best approach after getting the required information would be to build a big table of which forms of which instructions are encoded in which way. For example, if you see inc ecx, you know it's a 32-bit instruction (so you only need an operand size override byte if you're assembling in a 16-bit segment). You also know that it'll encode the register value in the R/M field of the ModR/M byte, and for ECX, that's 1. The mod field has to be 11b to say that R/M names a register rather than a memory address. For this opcode, the reg field of the ModR/M byte contains an extension to the opcode, in this case 0, and the opcode byte itself is FFh. This makes the final instruction FFh, C1h (FFh, 01h would be inc dword ptr [ecx]). In 16- and 32-bit code there's also a shorter form for inc <reg>, and that's to use the opcode 40h + the register's value. This yields the instruction 41h. There are many instructions with alternate forms for the same ends, usually by having a simplified form if you use a specific register.
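A tiny version of such a table might look like the following Pascal sketch. It covers only four one-operand instructions on a 32-bit register in a 32-bit segment; TUnaryForm, Forms and EncodeUnary are invented for the example. Note that the register-direct ModR/M byte has mod = 11b, so inc ecx comes out as FF C1 (FF 01 would address memory at [ecx]).

```pascal
program UnaryTable;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  { One table row per instruction (hypothetical layout). }
  TUnaryForm = record
    Name: string[7];
    Opcode: Byte;        // opcode byte of the ModR/M form
    Ext: Byte;           // opcode extension stored in the reg field
    ShortBase: Integer;  // base of the one-byte "opcode + reg" form, -1 if none
  end;

const
  Forms: array[0..3] of TUnaryForm = (
    (Name: 'inc'; Opcode: $FF; Ext: 0; ShortBase: $40),
    (Name: 'dec'; Opcode: $FF; Ext: 1; ShortBase: $48),
    (Name: 'not'; Opcode: $F7; Ext: 2; ShortBase: -1),
    (Name: 'neg'; Opcode: $F7; Ext: 3; ShortBase: -1));

{ Encode "<mnemonic> <32-bit register>" as a hex string.
  Reg is the register number 0..7; returns '' for an unknown mnemonic. }
function EncodeUnary(const Mnemonic: string; Reg: Integer;
  PreferShort: Boolean): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Forms) to High(Forms) do
    if Forms[I].Name = Mnemonic then
    begin
      if PreferShort and (Forms[I].ShortBase >= 0) then
        // one-byte form: the register number is added to the opcode
        Result := IntToHex(Forms[I].ShortBase + Reg, 2)
      else
        // two-byte form: opcode, then ModR/M with mod=11, reg=Ext, r/m=Reg
        Result := IntToHex(Forms[I].Opcode, 2) + ' ' +
                  IntToHex($C0 or (Forms[I].Ext shl 3) or Reg, 2);
      Exit;
    end;
end;

begin
  WriteLn(EncodeUnary('inc', 1, False));  // inc ecx, long form  -> FF C1
  WriteLn(EncodeUnary('inc', 1, True));   // inc ecx, short form -> 41
  WriteLn(EncodeUnary('not', 1, True));   // no short form       -> F7 D1
end.
```

A real table would need more columns (operand sizes, memory forms, immediates), but the lookup-then-fill-in-fields shape stays the same.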

As you can see, it can get pretty complicated. If you're serious about this, you should probably try building your table with a subset of available instructions. Once you've done that, assemble a trivial piece of code (it doesn't even have to be a working program), and then rip it apart with a disassembler, and see if the disasm output matches your source. Try this with all the variants of the instructions you've got in your table, and if it passes on all of them, it's a good bet that your table and code generator are working.

If Delphi allows you to use bit fields, this would be another handy feature, as it would alleviate the bit manipulation you'd need to do, since both the ModR/M and SIB bytes are composed of three bit fields.
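Without bit fields, the same packing can be done with shifts and masks. A minimal Pascal sketch follows; ModRM, SIB and ScaleBits are hypothetical helper names, and the first parameter is called Mode because mod is a reserved word in Pascal. Both bytes use the same 2-3-3 bit layout.

```pascal
program PackFields;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

{ ModR/M byte: mod in bits 7..6, reg in bits 5..3, r/m in bits 2..0. }
function ModRM(Mode, Reg, RM: Integer): Byte;
begin
  Result := Byte(((Mode and 3) shl 6) or ((Reg and 7) shl 3) or (RM and 7));
end;

{ Convert a scale factor of 1, 2, 4 or 8 to the 2-bit field value. }
function ScaleBits(Scale: Integer): Integer;
begin
  case Scale of
    2: Result := 1;
    4: Result := 2;
    8: Result := 3;
  else
    Result := 0;  // x1 (this sketch also falls back to x1 for bad input)
  end;
end;

{ SIB byte: scale in bits 7..6, index in bits 5..3, base in bits 2..0. }
function SIB(Scale, Index, Base: Integer): Byte;
begin
  Result := Byte((ScaleBits(Scale) shl 6) or ((Index and 7) shl 3) or (Base and 7));
end;

begin
  WriteLn(IntToHex(ModRM(3, 0, 1), 2));  // mod=11, reg=0, r/m=ECX    -> C1
  WriteLn(IntToHex(ModRM(1, 0, 4), 2));  // mod=01, reg=EAX, r/m=SIB  -> 44
  WriteLn(IntToHex(SIB(4, 1, 3), 2));    // ECX*4 + EBX               -> 8B
end.
```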

Hope this helps.

PS- You might want to expand your structures to hold more operands/pseudo-operands. For example, the instruction lea eax, [ebx + ecx * 4 + 100] is perfectly legal.
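As a worked example, that lea instruction can be encoded by hand as opcode 8D, a ModR/M byte announcing a SIB byte, the SIB byte itself, and a displacement. The Pascal sketch below assumes a 32-bit segment, always emits a SIB byte, and does not handle ESP as an index (it can't be one). TCode, Emit, EmitLE, HexOf, ScaleBits and Lea are illustrative names only.

```pascal
program LeaExample;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  TCode = record
    Len: Integer;
    Bytes: array[0..15] of Byte;
  end;

{ Append one byte to the buffer. }
procedure Emit(var C: TCode; B: Byte);
begin
  C.Bytes[C.Len] := B;
  Inc(C.Len);
end;

{ Append the low Count bytes of Value, least significant byte first. }
procedure EmitLE(var C: TCode; Value: LongInt; Count: Integer);
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
  begin
    Emit(C, Byte(Value and $FF));
    Value := Value shr 8;
  end;
end;

{ Render the buffer as space-separated hex. }
function HexOf(const C: TCode): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to C.Len - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(C.Bytes[I], 2);
  end;
end;

{ Convert a scale factor of 1, 2, 4 or 8 to the 2-bit SIB field. }
function ScaleBits(Scale: Integer): Integer;
begin
  case Scale of
    2: Result := 1;
    4: Result := 2;
    8: Result := 3;
  else
    Result := 0;
  end;
end;

{ LEA Dst, [Base + Index*Scale + Disp]; registers are numbers 0..7. }
function Lea(Dst, Base, Index, Scale: Integer; Disp: LongInt): TCode;
var
  Mode: Integer;
begin
  if (Disp = 0) and (Base <> 5) then
    Mode := 0                      // no displacement (an EBP base always needs one)
  else if (Disp >= -128) and (Disp <= 127) then
    Mode := 1                      // 8-bit displacement
  else
    Mode := 2;                     // 32-bit displacement
  Result.Len := 0;
  Emit(Result, $8D);                                 // opcode: LEA r32, m
  Emit(Result, (Mode shl 6) or (Dst shl 3) or 4);    // ModR/M, r/m=100: SIB follows
  Emit(Result, (ScaleBits(Scale) shl 6) or (Index shl 3) or Base);  // SIB
  if Mode = 1 then
    EmitLE(Result, Disp, 1)
  else if Mode = 2 then
    EmitLE(Result, Disp, 4);
end;

begin
  WriteLn(HexOf(Lea(0, 3, 1, 4, 100)));   // lea eax, [ebx + ecx*4 + 100]
  WriteLn(HexOf(Lea(0, 3, 1, 4, 1000)));  // same, but needs a 32-bit displacement
end.
```

The first line should come out as 8D 44 8B 64 and the second as 8D 84 8B E8 03 00 00.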
Dec 7th, 2005, 8:10 AM   #3
SittingDuck
Thanks for the help. (Actually, I am writing a compiler.) No, Delphi 3 doesn't support bit fields. I could fill a record structure and manipulate that before calling a procedure to pack it into the actual bit pattern.
The table is a good idea. Another way of doing it is to have a separate function for each opcode, so that each one is dealt with separately. (I think I'll use a table.) I must make sure here that the code is not too rigid.
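One possible shape for the table approach, as a rough sketch only: index an array by TOperationEnum (copied from the first post) and store the opcode of the "op r/m32, r32" form where one exists. RegRegOpcode, NoSimpleForm and EncodeRegReg are made-up names, registers are plain numbers 0..7 here rather than TRegister, and only the register-to-register case in a 32-bit segment is covered.

```pascal
program OpTable;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses SysUtils;

type
  TOperationEnum = (opMove, opNot, opDivide, opAnd, opShiftLeft, opShiftRight,
    opRotateLeft, opRotateRight, opMultiply, opAdd, opSubtract, opOr, opExOr);

const
  NoSimpleForm = -1;  // operation has no plain "op r/m32, r32" encoding

  { Opcode of the "op r/m32, r32" form, indexed by operation. }
  RegRegOpcode: array[TOperationEnum] of Integer = (
    $89,           // opMove        MOV
    NoSimpleForm,  // opNot         one operand (F7 /2)
    NoSimpleForm,  // opDivide      implicit EDX:EAX operands
    $21,           // opAnd         AND
    NoSimpleForm,  // opShiftLeft   count in CL or an immediate
    NoSimpleForm,  // opShiftRight
    NoSimpleForm,  // opRotateLeft
    NoSimpleForm,  // opRotateRight
    NoSimpleForm,  // opMultiply    different operand layout
    $01,           // opAdd         ADD
    $29,           // opSubtract    SUB
    $09,           // opOr          OR
    $31);          // opExOr        XOR

{ Encode "op Dst, Src" for two 32-bit registers as a hex string.
  Returns '' when the operation needs a handler of its own. }
function EncodeRegReg(Op: TOperationEnum; Dst, Src: Integer): string;
begin
  if RegRegOpcode[Op] = NoSimpleForm then
    Result := ''
  else
    // ModR/M: mod=11 (register direct), reg=Src, r/m=Dst
    Result := IntToHex(RegRegOpcode[Op], 2) + ' ' +
              IntToHex($C0 or (Src shl 3) or Dst, 2);
end;

begin
  WriteLn(EncodeRegReg(opAdd, 0, 3));   // add eax, ebx -> 01 D8
  WriteLn(EncodeRegReg(opExOr, 1, 1));  // xor ecx, ecx -> 31 C9
  WriteLn(EncodeRegReg(opMove, 0, 3));  // mov eax, ebx -> 89 D8
end.
```

These x86 forms overwrite their first operand, so a TOperation whose Dest differs from Src1 would presumably need a MOV into Dest first. The entries marked NoSimpleForm are where per-operation functions would still be needed, which suggests the table and the separate-function idea can be mixed rather than being either/or.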

Last edited by SittingDuck; Dec 7th, 2005 at 8:25 AM.
Dec 8th, 2005, 5:40 AM   #4
Klipt
You could also download the source for a compiler like gcc and look at that. I think the compiler itself outputs assembly, which a separate program (the assembler) then turns into machine code, although parts of the source are probably generated by tools like Lex and Yacc and aren't meant to be read by humans.
DaniWeb IT Discussion Community
All times are GMT -5.
Copyright ©2007 DaniWeb® LLC