August 15th, 2005

pce

Fixmapping VM pages, Vsyscalls

Faster system calls

The classical Linux system call mechanism is to put the call number in the eax register (in the case of i386) and simply invoke:

int $0x80

Here is a code fragment which measures the time taken to invoke `getpid' (system call number 20 on i386) 10000000 times:

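#include <stdio.h>
#include <unistd.h>

#define N 10000000
int pid;

int main(void)
{
	int i;

	for(i = 0; i < N; i++) {
		/* 20 is the getpid call number; the result comes back in eax */
		asm volatile("movl $20, %%eax \n"
		    "int $0x80 \n"
		    "movl %%eax, pid \n"
		    :
		    :
		    :"eax", "memory");	/* "memory": the asm writes pid */
	}
	printf("got pid = %d, actual pid = %d\n", pid, getpid());
	return 0;
}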
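
The loop can also be timed from inside the program instead of from the shell. The following is only a sketch: the helper name time_getpid_calls is made up for illustration, and it goes through the C library's syscall() wrapper rather than a hand-written `int $0x80', so the kernel entry mechanism it exercises is whatever the C library uses and its numbers are not directly comparable with the figures quoted in this article.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue n getpid system calls and return the
 * elapsed wall-clock time in seconds. The last result is stored
 * through last_pid so that the loop has an observable effect.
 */
double time_getpid_calls(long n, long *last_pid)
{
	struct timespec start, end;
	long pid = -1;
	long i;

	assert(n > 0 && last_pid != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < n; i++)
		pid = syscall(SYS_getpid);	/* one kernel entry per iteration */
	clock_gettime(CLOCK_MONOTONIC, &end);

	*last_pid = pid;
	return (double)(end.tv_sec - start.tv_sec)
	     + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_getpid_calls(10000000, &pid) and dividing the result by the iteration count gives an approximate cost per call.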

On my P4 (HT) 2.8GHz system, the program took about 3.9 seconds to execute. Modern Pentium/AMD processors provide dedicated instructions, sysenter and syscall, which get into kernel mode faster than a software interrupt. The problem is that checking which mechanism (i.e., int $0x80, syscall or sysenter) the processor supports as part of every system call invocation would itself incur unnecessary overhead. What is the solution?

Fixmapping

It's possible to assign hard-coded virtual addresses to physical addresses during system bootup - note that only the virtual address is hard coded, the physical address is determined dynamically. The solution which Linus has implemented is: during bootup, get a free page and map it to virtual address 0xffffe000. Determine what kind of syscall mechanism your CPU supports and simply store a few bytes of machine code at that location; machine code which will trap into the kernel using the fastest available mechanism. Now, the user program can execute a system call by simply jumping to this particular virtual address!
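
The idea of "decide once at boot, then always enter through one fixed place" can be sketched in ordinary user-space C. This is only an analogy, not kernel code: every name below is hypothetical, the function pointer stands in for the fixed virtual address, and the two stub functions stand in for the machine code sequences the kernel would store there.

```c
#include <assert.h>
#include <stddef.h>

/* Which entry path was taken; stands in for the real trap instruction. */
enum entry_kind { ENTRY_INT80, ENTRY_SYSENTER };

typedef enum entry_kind (*entry_fn)(void);

static enum entry_kind enter_int80(void)    { return ENTRY_INT80; }
static enum entry_kind enter_sysenter(void) { return ENTRY_SYSENTER; }

/* Plays the role of the fixed address: callers only know this one slot. */
static entry_fn fixed_entry = NULL;

/* Done once, at "boot": probe the CPU and fill in the slot. */
void boot_select_entry(int cpu_has_sysenter)
{
	fixed_entry = cpu_has_sysenter ? enter_sysenter : enter_int80;
}

/* Done on every call: no feature check, just an indirect jump. */
enum entry_kind do_call(void)
{
	assert(fixed_entry != NULL);
	return fixed_entry();
}
```

The cost of choosing the mechanism is paid once in boot_select_entry(); do_call() never repeats the check, which is the point of the fixmapped page.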

Here is a small C program which reads from 0xffffe000 and dumps it to stdout; the output can be redirected to a file and analyzed.


#include <string.h>
#include <unistd.h>

int main(void)
{
	char *s, buf[4096];

	s = (char *)0xffffe000;	/* the fixmapped page */
	memcpy(buf, s, sizeof(buf));
	write(1, buf, sizeof(buf));
	return 0;
}

We run the program:

./a.out > dat

and do a `file dat'. Here is the output:

dat: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped
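
The essentials of what `file' reports here can be checked by hand from the first few bytes of the dump. A minimal sketch, assuming a Linux system that provides <elf.h>; the function name is made up for illustration, and it looks only at the magic number, class, byte order and object type, far less than `file' itself examines.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if buf starts with the header of a
 * 32-bit little-endian ELF shared object, 0 otherwise.
 */
int is_elf32_lsb_shared_object(const unsigned char *buf, size_t len)
{
	unsigned int e_type;

	assert(buf != NULL);
	if (len < sizeof(Elf32_Ehdr))
		return 0;
	if (memcmp(buf, ELFMAG, SELFMAG) != 0)	/* "\177ELF" */
		return 0;
	if (buf[EI_CLASS] != ELFCLASS32 || buf[EI_DATA] != ELFDATA2LSB)
		return 0;
	/* e_type is the 16-bit field right after the 16-byte e_ident array;
	 * read it byte by byte so the host byte order does not matter */
	e_type = buf[EI_NIDENT] | ((unsigned int)buf[EI_NIDENT + 1] << 8);
	return e_type == ET_DYN;
}
```

Applied to the 4096-byte buffer copied from 0xffffe000, a return value of 1 would agree with the `file' output above.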

The fixmapped page contains an ELF shared object. Let's do:

objdump -d dat

Here is part of the output we get:

Disassembly of section .text:

ffffe400 <.text>:
ffffe400:	51                   	push   %ecx
ffffe401:	52                   	push   %edx
ffffe402:	55                   	push   %ebp
ffffe403:	89 e5                	mov    %esp,%ebp
ffffe405:	0f 34                	sysenter 
ffffe407:	90                   	nop    

Note that 0xffffe400 is the start of the code sequence which ultimately traps into the kernel by executing the `sysenter' instruction.
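
The two opcode bytes of sysenter (0f 34, as in the listing) can also be located directly in the dumped page without objdump. The following is a rough sketch with a made-up function name; a plain byte scan does not decode instructions, so it may also hit a 0f 34 pair that merely happens to occur inside some other instruction or in the ELF headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the offset of the first 0x0f 0x34 byte
 * pair (the sysenter opcode) in buf, or -1 if there is none.
 */
long find_sysenter(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i + 1 < len; i++)
		if (buf[i] == 0x0f && buf[i + 1] == 0x34)
			return (long)i;
	return -1;
}
```

Run over just the eight bytes shown in the listing, it returns 5, and 0xffffe400 + 5 is 0xffffe405, the address objdump prints for sysenter.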

Is there a speed up?

Let's find out. Here is a test program:


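#include <stdio.h>
#include <unistd.h>

#define N 10000000
int pid;

int main(void)
{
	int i;

	for(i = 0; i < N; i++) {
		/* same call number as before, but enter through the fixmapped page */
		asm volatile("movl $20, %%eax \n"
		    "call 0xffffe400 \n"
		    "movl %%eax, pid \n"
		    :
		    :
		    :"eax", "memory");	/* "memory": the asm writes pid */
	}
	printf("got pid = %d, actual pid = %d\n", pid, getpid());
	return 0;
}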

I am getting a run time of 1.4 seconds, down from 3.9 for the int $0x80 version! That is roughly 2.8 times faster, or about 140 ns per call instead of 390 ns.

Fixmap your own Hello,World

Problem: Write a Hello,World printing program which doesn't have the sequence "Hello,World" stored in it.

Solution: Let's fixmap a page containing "Hello,World"

  1. Edit include/asm/fixmap.h; just add FIX_HELLO_WORLD below FIX_VSYSCALL and change the macro FIXADDR_USER_END to make it look like (FIXADDR_USER_START + 2*PAGE_SIZE)
  2. Edit arch/i386/kernel/sysenter.c. First, add an initialization:
    
    unsigned long page2 = get_zeroed_page(GFP_ATOMIC);
    
    
    Then add the code:
    
    __set_fixmap(FIX_HELLO_WORLD, __pa(page2), PAGE_READONLY_EXEC);
    memcpy((void*)page2, "Hello,World", 12);
    
    
    That's all. Recompile the kernel (I am using a kernel.org 2.6.12) and write a user program which copies from virtual address 0xffffd000; you should get your Hello,World.
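
The user program mentioned in the last step could be built around a helper like the one below. This is a sketch, not a tested recipe: the function name is made up, and the address is passed in as a parameter so that the helper can be tried on an ordinary buffer first. The string literal never appears in the program itself; it only exists in the page the kernel filled in.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy a NUL-terminated string from page into
 * out, writing at most outsz bytes including the terminator.
 * Returns the number of characters copied (terminator not counted).
 */
size_t copy_fixmapped_string(const char *page, char *out, size_t outsz)
{
	size_t n = 0;

	assert(page != NULL && out != NULL && outsz > 0);
	while (n + 1 < outsz && page[n] != '\0') {
		out[n] = page[n];
		n++;
	}
	out[n] = '\0';
	return n;
}
```

On the patched kernel the call would be copy_fixmapped_string((const char *)0xffffd000, buf, sizeof(buf)) followed by puts(buf). On an unpatched kernel nothing is mapped at that address by this recipe, so the read should be expected to fault.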
