Writing 64-bit assembler functions for Windows is quite different from writing them for Linux, or even for 32-bit Windows.
Even when NASM is used to simplify the job, a few details need care.
First steps
The first step is to add the following lines at the very beginning of the .asm file:
    bits 64
    default rel
We need ‘default rel’ to avoid linker problems when accessing items in the .data and/or .bss sections.
The alternative is to add the rel keyword to every memory access, for example:
mov rax,[rel _my_data]
The second step is to define ‘section .data’ and/or ‘section .bss’ with all their contents (if present).
The third step is to define ‘section .text’, where we write our functions.
At this point, things begin to differ from the 32-bit conventions, and even from the 64-bit Linux counterpart.
Function names
C function names carry no decoration: the assembler label is spelled exactly like the C name.
For example, C declaration:
extern int MySampleProc ();
NASM declaration:
    global MySampleProc
    MySampleProc:
        %push proc_ctx
        %stacksize flat64
        <... my code here ...>
        ret
        %pop
Function parameters
The function parameters are passed in a different way (see this link for further details).
From 1st to 4th…
The first 4 parameters are always passed in the registers below:
- any pointer or 8- to 64-bit integer value is passed in RCX, RDX, R8, R9
  (note that, for any parameter smaller than 64 bits, the unused part of the register shall be treated as uninitialized),
- any floating-point value is passed in XMM0, XMM1, XMM2, XMM3,
- any other value types are passed by reference.
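One detail worth spelling out: in the Windows x64 convention the register is selected by the parameter's position, so a floating-point value passed as 2nd parameter goes in XMM1 (not XMM0) and RDX is simply left unused. Below is a minimal sketch in C that models this rule. The names `arg_kind` and `arg_location` are hypothetical, introduced only for illustration, and by-reference parameters are treated as plain pointers.

```c
#include <stddef.h>

/* Kind of a parameter, as far as register selection is concerned.
   (Hypothetical type, used only by this illustration.) */
typedef enum { ARG_INT_OR_PTR, ARG_FLOAT } arg_kind;

/* Returns where the parameter at zero-based position `index` is passed.
   The register is chosen by position: integer and floating-point
   parameters share the same four slots. */
const char *arg_location(int index, arg_kind kind)
{
    static const char *const int_regs[4]   = { "RCX", "RDX", "R8", "R9" };
    static const char *const float_regs[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };

    if (index < 0)
        return "invalid";
    if (index >= 4)
        return "stack";          /* 5th parameter and beyond */
    return kind == ARG_FLOAT ? float_regs[index] : int_regs[index];
}
```

For a hypothetical prototype such as `void f(int a, double b, void *c, float d, int e)`, this model places a in RCX, b in XMM1, c in R8, d in XMM3 and e on the stack.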
The caller shall always reserve a 32-byte ‘shadow space’ just above the return address, so at function entry the stack looks like this:
- [RSP + 0x08] shadow space (32 bytes)
- [RSP + 0x00] return address (8 bytes)
The callee may use this area as scratch storage, typically to save the four register parameters.
For example, C declaration:
extern int MySampleProc (unsigned p1, void *p2, int p3);
NASM declaration:
    global MySampleProc
    MySampleProc:
        %push proc_ctx
        %stacksize flat64
        ; rcx = p1
        ; rdx = p2
        ; r8 = p3
        %arg _w64_shadow_space:YWORD
        ; set the stack frame
        push rbp
        mov rbp,rsp
        <... my code here ...>
        ; destroy the stack frame
        leave
        ret
        %pop
Note that simple functions that:
- have 4 or fewer params,
- don’t have local variables,
- don’t call any function that follows the C calling convention
may skip setting up the stack frame.
From 5th onwards…
The 5th and following parameters are passed on the stack, right after the shadow area, in much the same way as in the classic C convention.
As before, the caller always reserves the 32-byte shadow area just above the return address:
- [RSP + …] the 6th param…
- [RSP + 0x28] the 5th param
- [RSP + 0x08] shadow space (32 bytes)
- [RSP + 0x00] return address (8 bytes)
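The offsets in the layout above follow a simple rule: every parameter owns one 8-byte slot, whatever its size, and the slots start right after the return address. The sketch below does that arithmetic in C. The function names are hypothetical, and the RBP variant assumes the ‘push rbp / mov rbp,rsp’ prologue used in the examples of this article.

```c
#include <stddef.h>

/* Byte offset from RSP, at function entry, of the slot that belongs to
   parameter number `n` (1-based). Slots 1..4 form the shadow space,
   slot 5 and up hold the stack parameters. n == 0 yields the return
   address itself. */
size_t param_offset_from_rsp(unsigned n)
{
    return 8u * (size_t)n;
}

/* Same slot seen from RBP once "push rbp / mov rbp,rsp" has run:
   the saved RBP adds 8 more bytes below the return address. */
size_t param_offset_from_rbp(unsigned n)
{
    return param_offset_from_rsp(n) + 8u;
}
```

With these helpers the 5th parameter lands at [RSP + 0x28] on entry, as listed above, and at [RBP + 0x30] once the stack frame is set.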
For example, C declaration:
extern int MySampleProc (unsigned p1, void *p2, int p3, const void *p4, unsigned p5);
NASM declaration:
    global MySampleProc
    MySampleProc:
        %push proc_ctx
        %stacksize flat64
        ; rcx = p1
        ; rdx = p2
        ; r8 = p3
        ; r9 = p4
        %arg _w64_shadow_space:YWORD, p5:QWORD
        ; set the stack frame
        push rbp
        mov rbp,rsp
        <... my code here ...>
        ; destroy the stack frame
        leave
        ret
        %pop
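Varargs function parameters
They follow the same rules as above (i.e. the shadow area, the first 4 params in the proper registers, and the others on the stack).
In addition, a floating-point value passed among the first 4 variadic arguments shall be loaded into both the XMM register and the matching integer register, because the callee cannot know which one to read.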
Local variables
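If we need local variables, they are defined as usual, taking care of the alignment of the RSP register: the convention requires RSP to be 16-byte aligned at every call instruction. With the ‘push rbp’ prologue used above, the bytes reserved for locals shall therefore add up to a multiple of 16 whenever the function calls another one; a function that makes no calls can get away with a multiple of 8.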
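The Windows x64 convention requires RSP to be 16-byte aligned whenever a call is made, and a function that calls others also has to provide their 32-byte shadow space. A rough way to size the ‘sub rsp, N’ that follows ‘push rbp / mov rbp,rsp’ is sketched below in C. It assumes the caller respected the convention (so RSP is 16-byte aligned right after ‘push rbp’) and that outgoing parameters are stored in the same reserved area; `align16` and `frame_bytes` are hypothetical helper names.

```c
#include <stddef.h>

/* Rounds `bytes` up to the next multiple of 16. */
size_t align16(size_t bytes)
{
    return (bytes + 15u) & ~(size_t)15u;
}

/* Value of N for "sub rsp, N" after "push rbp / mov rbp,rsp".
   locals            : bytes needed by local variables
   calls_others      : non-zero if the function calls other functions
   max_outgoing_args : largest parameter count among those calls */
size_t frame_bytes(size_t locals, int calls_others, unsigned max_outgoing_args)
{
    size_t total = locals;

    if (calls_others) {
        /* At least 4 slots: the shadow space is always reserved. */
        unsigned slots = max_outgoing_args < 4u ? 4u : max_outgoing_args;
        total += 8u * (size_t)slots;
    }
    return align16(total);
}
```

For instance, 8 bytes of locals plus a call taking 2 parameters gives 8 + 32 = 40, rounded up to 48 bytes.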
Return values
For integer and pointer types (pointers, integers from 8 to 64 bits), the value is returned in RAX; any unused upper portion of the register need not be initialized.
For floating-point types (float, double) and vector types, the value is returned in XMM0.
Volatile registers
All registers shall be considered volatile with the exceptions listed below:
- RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15,
- and XMM6–XMM15
These registers shall always be saved before being used and restored before returning.
Volatile registers need to be saved only around calls to other procedures that follow the same convention, and only if their values are still needed afterwards.