
CALL instruction (x86)

blueshogun96

A lowdown dirty shame
Hey. It's been quite a while since the last time I posted here. I'm still around for those who wonder where I went.

Now, for quite some time I've been having trouble writing this binary translator (it's a new x86 -> x86 core for my emulator... the interpreter is too slow and a pain to implement and maintain). The big problem is executing a call instruction. What I'm doing is byte-encoding x86 instructions into an allocated buffer (char*), then jumping to its address to execute the code within the block, which returns with a RET instruction at the end. That works fine, but when emulating a specific opcode that can't be emulated easily (e.g. CPUID, WBINVD, IN/OUT, FENCE, etc.), I need to make a call to a function that handles the instruction in software instead.

Code:
// The real code is actually much more complicated than this.  This is just an example/dramatization.

// void function containing code to emulate the wbinvd instruction
void wbinvd_inst()
{ ... }

// Code block that contains byte encoded x86 code.
unsigned char* code_block = malloc( size_of_code_block_in_bytes );
// Actual address to code block (we're calling this address directly from inline asm)
unsigned int* code_address = (unsigned int*) code_block;
// Actual location of the function we're calling from byte code.
unsigned int* inst_address = (unsigned int*) wbinvd_inst;

code_block[0] = 0xE8; // CALL instruction
code_block[1] = (inst_address << 24 ) & 0xFF;
code_block[2] = (inst_address << 16 ) & 0xFF;
code_block[3] = (inst_address << 8 ) & 0xFF;
code_block[4] = (inst_address << 0 ) & 0xFF;
code_block[5] = 0xC3; // RET function (required to continue from the point where we called this block!)

// Call code address to execute this code.
__asm call code_block;

The problem is that I need to byte encode the CALL instruction myself and I'm not sure which one to use. I'm assuming that the code is being called from the data section (DS). Here's a list of different CALL variations for x86 (32-bit versions only):

1. 0xE8: CALL rel32 - Call near, relative, displacement
2. 0xFF /2: CALL r/m32 - Call near, absolute indirect, address in r/m32
3. 0x9A: CALL ptr16:32 - Call far, absolute, address in operand
4. 0xFF /3: CALL m16:32 - Call far, absolute indirect address in m16:32

I know the CALL instruction I used in the example was encoded wrong, but that's just an illustration. So which one do you think I should use? The last two I don't understand how to encode tbh and quite frankly, the Intel documentation doesn't tell me everything I want to know. Once I work this out, I can have some real progress. Any ideas?? Thanks.
 

Exophase

Emulator Developer
Okay, some things you should know:

- 32-bit OSes for x86 (Windows and Linux) use "flat" addressing, which tries to make the segments irrelevant by making the four main ones all point to the same 32-bit address space, generally with the same permissions. From your post it looks like you aren't generating x86-64 code, so I'll leave it at that.

- A "near" address in x86 is just an offset within the current segment; in protected mode that offset is 32 bits. So a near call works fine. In 32-bit OSes you only use far calls for pretty special purposes, like call gates to get into kernel mode (and there are other ways to do that; I don't think anyone really uses those).

- Using immediates (constants encoded in the instruction) usually works better than using something in memory, although a memory operand could produce smaller code if it's addressed relative to a register with an 8-bit displacement.

- Relative calls and jumps add their operand to the current PC, which at that point already points to the next instruction (the one right after the call or jump).

This is what I use for the x86 recompiler in gpSP:

Code:
#define generate_function_call(function_location)                             \
  x86_emit_call_offset(x86_relative_offset(translation_ptr,                   \
   function_location, 4)); 

#define x86_emit_call_offset(relative_offset)                                 \
  x86_emit_byte(x86_opcode_call_offset);                                      \
  x86_emit_dword(relative_offset)

#define x86_relative_offset(source, offset, next)                             \
  ((u32)offset - ((u32)source + next))   

x86_opcode_call_offset                = 0xE8

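Note: translation_ptr is where code is being generated to, and x86_emit_byte increases it by 1. Because macro arguments are expanded textually, the offset isn't evaluated until after x86_emit_byte has already advanced translation_ptr past the opcode, so adding 4 is enough to reach the next instruction. If you use functions to do the same things, the offset gets computed before anything is emitted, so you'll want to use + 5 instead.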
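
A function-based version of the same idea might look like the sketch below. The names emit_byte, emit_dword and emit_call_rel32 are made up for this illustration (they are not gpSP's), and it assumes the target lies within rel32 range of the buffer. Since the offset is computed before any byte is emitted, it uses + 5 (1 opcode byte + 4 offset bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not gpSP's actual code. */

/* Write one byte and return the advanced output pointer. */
static uint8_t *emit_byte(uint8_t *p, uint8_t value)
{
    *p = value;
    return p + 1;
}

/* Write a 32-bit value in little-endian order (lowest byte first). */
static uint8_t *emit_dword(uint8_t *p, uint32_t value)
{
    p = emit_byte(p, (uint8_t)(value >> 0));
    p = emit_byte(p, (uint8_t)(value >> 8));
    p = emit_byte(p, (uint8_t)(value >> 16));
    p = emit_byte(p, (uint8_t)(value >> 24));
    return p;
}

/* Emit "call rel32" (0xE8) at p so that it lands on target. */
static uint8_t *emit_call_rel32(uint8_t *p, uintptr_t target)
{
    /* rel32 is relative to the instruction AFTER the call: p + 5. */
    uintptr_t next = (uintptr_t)p + 5;
    intptr_t diff = (intptr_t)(target - next);

    /* Always true for 32-bit code; matters if pointers are 64-bit. */
    assert(diff >= INT32_MIN && diff <= INT32_MAX);

    p = emit_byte(p, 0xE8);
    return emit_dword(p, (uint32_t)diff);
}
```

Each helper returns the advanced pointer, so the caller would write something like translation_ptr = emit_call_rel32(translation_ptr, (uintptr_t)some_function); where some_function is whatever handler is being called.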
 

blueshogun96

A lowdown dirty shame
Ok, that makes a bit more sense. I would have replied sooner, but IRL and coding time can't always co-exist.

I've been trying the new approach on a simple test program that's designed to call the function from the emitted code, but the code still crashes on me. If you want, I'll show you the example I'm testing (written in C).

Code:
#include <stdio.h>
#include <assert.h>


#define uptr unsigned int*
#define sptr int*
#define u32 unsigned int

/* call rel32 */
_inline void CALL32( u32 to, uptr x86ptr )
{
	(*x86ptr++) = 0xE8;
	(*x86ptr++) = (to>>0)&0xFF;
	(*x86ptr++) = (to>>8)&0xFF;
	(*x86ptr++) = (to>>16)&0xFF;
	(*x86ptr++) = (to>>24)&0xFF;
}

/* call func */
_inline void CALLFunc( uptr func, uptr x86ptr )
{
	u32 offset = func - ( x86ptr + 5 );
//	assert( (sptr)func <= 0x7fffffff && (sptr)func >= -0x7fffffff );
	CALL32(func, x86ptr);
}

void __stdcall func()
{
	printf( "Function called...\n\n" );
}

int main()
{
	unsigned int func_addr = (unsigned int) &func;

	unsigned char* code = malloc( sizeof( unsigned char ) * 6 );
	unsigned int code_addr = (unsigned int) &code;

	CALLFunc( func, code );

	void (*Execute)() = (void(*)()) &code;

	Execute();

       // This works too
//	__asm call code_addr

	free( code );

	return 0;
}

I've been at this for months. Any ideas as to the problem? Thanks for your help.
 

Exophase

Emulator Developer
x86ptr is the wrong type. It has to be char *, not int *. This will also crash because after the call returns it goes on into nothingness, so you have to emit a ret instruction too.
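
Putting those two fixes together, a corrected emitter could look roughly like this. It's only a sketch: build_call_block is a made-up name, and to actually run the result the buffer would have to live in executable memory (VirtualAlloc or mmap, for example), which isn't shown here. Note that it also writes the relative offset into the instruction rather than the absolute address of the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes "call rel32; ret" at code and returns the
   number of bytes written (always 6). */
static size_t build_call_block(unsigned char *code, size_t capacity,
                               uintptr_t target)
{
    /* 1 opcode byte + 4 offset bytes + 1 byte for ret. */
    assert(capacity >= 6);

    /* The offset is measured from the instruction after the call. */
    uintptr_t next = (uintptr_t)code + 5;
    uint32_t rel = (uint32_t)(target - next);

    size_t n = 0;
    code[n++] = 0xE8;                        /* call rel32 */
    code[n++] = (unsigned char)(rel >> 0);   /* byte pointer: one byte per store */
    code[n++] = (unsigned char)(rel >> 8);
    code[n++] = (unsigned char)(rel >> 16);
    code[n++] = (unsigned char)(rel >> 24);
    code[n++] = 0xC3;                        /* ret, so control comes back */
    return n;
}
```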

Also, for future reference (in case you do change to it), I wouldn't do a raw call instruction in ASM because you'll break the calling convention.
 

blueshogun96

A lowdown dirty shame
Well, I finally got it. I had to use 0xFF /2 instead, but for now, all that matters is that it works. What I didn't realize is that for some reason when getting the function address, the address inside the integer was always different o_O! When I rewrote the code, everything worked fine. Here it is now:

Code:
void func()
{
	printf( "Function called...\n\n" );
}

void main()
{
	void (*f)() = &func;
	unsigned int i = (unsigned int) f;

	unsigned char code[] = 
	{
		0xB8,			// mov eax, i
		((i>>0)&0xFF), 
		((i>>8)&0xFF), 
		((i>>16)&0xFF),
		((i>>24)&0xFF),	
		0xFF, 0xD0,		// call eax
		0xC3			// ret
	};

	unsigned int* c = (unsigned int*) code;

	printf( "f = 0x%8.8X\ni = 0x%8.8X\n\n", f, i );

	__asm call c
}
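
For reference, the 0xFF 0xD0 pair above comes from the 0xFF /2 form: the /2 goes into the reg field of the ModRM byte, mod = 3 (binary 11) selects a register operand, and rm picks the register (EAX is 0). A small sketch of that arithmetic, with made-up helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in the ModRM rm field. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Pack a ModRM byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0. */
static uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm)
{
    return (uint8_t)(((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
}

/* Hypothetical emitter for "call r32" (0xFF /2 with a register operand). */
static uint8_t *emit_call_reg(uint8_t *p, uint8_t reg)
{
    assert(reg < 8);
    *p++ = 0xFF;              /* opcode shared by several /digit forms */
    *p++ = modrm(3, 2, reg);  /* mod=3: register, reg=2: the "/2" for CALL */
    return p;
}
```

With these, emit_call_reg(p, REG_EAX) writes 0xFF 0xD0, matching the bytes in the array above.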

So now that it works, I can get back to work on what really matters. Thanks again for everything Exophase.
 

Exophase

Emulator Developer
I don't think you should use this method. It may work, but it's less efficient. If you're still having trouble, I recommend writing a small test in a simple system that you can easily step through instruction by instruction, either in a debugger or in an emulator (e.g. Bochs).

I should elaborate on why using a raw call in inline assembly is a bad idea. The standard C (and, more or less, C++) calling convention dictates that a function is free to modify a few registers called "caller save" (meaning that the function calling the one being run has to save and restore them if it cares about the values). Unless your compiler examines the assembly code, or you specifically tell it that those registers are going to get smacked, it has no way of knowing that you're about to potentially erase them. In this case you're probably fine since you do the call at the very end, but if you change things around (who knows how you had it before when it wasn't working) it could easily break. You really need to use a function pointer instead, or wrap the call in a dispatch function written in assembly, so that this problem doesn't happen.
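
As a sketch of the function-pointer route (run_block and block_fn are names made up for this example): the block is called like any other C function, so the compiler applies the normal calling convention around it. In the emulator the pointer would come from casting the address of the emitted code, which assumes the block ends in a ret, leaves the registers the convention expects to be preserved intact, and lives in executable memory. Here an ordinary C function stands in for the generated block so the example runs anywhere.

```c
#include <assert.h>
#include <stdio.h>

/* Type of a translated block: no arguments, no return value. */
typedef void (*block_fn)(void);

/* Counts how many blocks have been dispatched (illustration only). */
static int blocks_run = 0;

/* Dispatch through a function pointer. Because this is an ordinary C call,
   the compiler already assumes the caller-saved registers may be clobbered. */
static void run_block(block_fn block)
{
    assert(block != NULL);
    block();
    blocks_run++;
}

/* Stand-in for generated code; a real block would be emitted bytes. */
static void stand_in_block(void)
{
    puts("block executed");
}
```

Usage would be run_block(stand_in_block); and with real generated code something like run_block((block_fn)code), where code is the (assumed executable) buffer.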
 
