Intro

I know you were expecting another post about deobfuscation/devirtualization. I didn’t expect to write about this either, but I think it will be an equally interesting mini-post on finding errors without the source code. Before starting, though, let’s look at some popular ways to find errors in a program.

Overview of Error Detection Techniques

Clang static analyzer (Requires source code)

This tool helps you catch errors at the source-code level. However, it’s not magic, and it can miss potential bugs.

Here is an example where Clang Static Analyzer helps us find a bug.

https://godbolt.org/z/1MYW3x4EE #0

int foo(int* num) {
    return *num; //  warning: 
    // Undefined or garbage value returned to caller [core.uninitialized.UndefReturn]
}

int foo2(int offset) {
    int* a = new int(5);
    auto result = foo( (a+offset) );
    delete a;
    return result;
}

int main(int argc) {
    return foo2(1);
}

And here is an example where Clang Static Analyzer fails to find the same bug. The only change is that the offset now comes from argc instead of a constant.

https://godbolt.org/z/bzdh53qhE #1

int foo(int* num) {
    return *num; // nothing happens?
}

int foo2(int offset) {
    int* a = new int(5);
    auto result = foo( (a+offset) );
    delete a;
    return result;
}

int main(int argc) {
    return foo2(argc);
}

Runtime Sanitizers (Requires source code)

LLVM also provides us with powerful tools such as:

AddressSanitizer : Memory error detector.
MemorySanitizer : Detector of uninitialized memory use
ThreadSanitizer : A tool that detects data races.
LeakSanitizer : A run-time memory leak detector.
UndefinedBehaviorSanitizer : Undefined behavior detector.

The disadvantage of these sanitizers is that they detect issues during program execution (runtime), not at compile time, so a bug on a path that never runs goes unreported.

Unlike the other sanitizers, UndefinedBehaviorSanitizer places its checks while the source code is being transformed into LLVM IR.

Let’s try AddressSanitizer with the previous examples.

https://godbolt.org/z/rv9ocYqdh #0

https://godbolt.org/z/6j9exGve9 #1

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x502000000014 at pc 0x62c43ca90a69 bp 0x7ffd84999250 sp 0x7ffd84999248
READ of size 4 at 0x502000000014 thread T0
    #0 0x62c43ca90a68 in foo(int*) /app/example.cpp:2:12
    #1 0x62c43ca90aef in foo2(int) /app/example.cpp:7:19
    #2 0x62c43ca90b3b in main /app/example.cpp:13:12
    #3 0x79620c429d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: cd410b710f0f094c6832edd95931006d883af48e)
    #4 0x79620c429e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: cd410b710f0f094c6832edd95931006d883af48e)
    #5 0x62c43c9b8344 in _start (/app/output.s+0x2c344)

0x502000000014 is located 0 bytes after 4-byte region [0x502000000010,0x502000000014)
allocated by thread T0 here:
    #0 0x62c43ca8e72d in operator new(unsigned long) /root/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:86:3
    #1 0x62c43ca90a94 in foo2(int) /app/example.cpp:6:14
    #2 0x62c43ca90b3b in main /app/example.cpp:13:12
    #3 0x79620c429d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: cd410b710f0f094c6832edd95931006d883af48e)

SUMMARY: AddressSanitizer: heap-buffer-overflow /app/example.cpp:2:12 in foo(int*)
Shadow bytes around the buggy address:
  0x501ffffffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501ffffffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501ffffffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501fffffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x501fffffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x502000000000: fa fa[04]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x502000000280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING

It tells us about the error:

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x502000000014 at pc 0x62c43ca90a69 bp 0x7ffd84999250 sp 0x7ffd84999248
READ of size 4 at 0x502000000014 thread T0

It gives us the call stack with the exact lines of the error (assuming the binary was built with debug symbols):

    #0 0x62c43ca90a68 in foo(int*) /app/example.cpp:2:12
    #1 0x62c43ca90aef in foo2(int) /app/example.cpp:7:19
    #2 0x62c43ca90b3b in main /app/example.cpp:13:12
    #3 0x79620c429d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: cd410b710f0f094c6832edd95931006d883af48e)
    #4 0x79620c429e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: cd410b710f0f094c6832edd95931006d883af48e)
    #5 0x62c43c9b8344 in _start (/app/output.s+0x2c344)

And if we want to see how our target function looks with ASan:

__int64 __fastcall foo(unsigned __int64 a1)
{
  char v3; // [rsp+2Fh] [rbp-9h]

  v3 = *((_BYTE *)_asan_shadow_memory_dynamic_address + (a1 >> 3));
  if ( v3 && (char)((a1 & 7) + 3) >= v3 )
    _asan_report_load4_0(a1);
  return *(unsigned int *)a1;
}

KLEE

KLEE is a very powerful tool for symbolic execution. Its advantage over instrumentation is that it tries to explore all paths in the program, rather than only the ones a particular run happens to take.

When we compile the snippet below and run it with KLEE, it tries paths for different values of x. Since the program has an off-by-one error (x <= 3), KLEE catches that case and tells us something is wrong. This is especially useful when the program’s behaviour varies a lot with its input.

#include <klee/klee.h>

int foo(int x) {
    int* a = new int[3]{5, 10, 15};
    int result = 0;
    if (x >= 0 && x <= 3) {
        result = a[x];
    }
    delete[] a;
    return result;

}

int main() {
    int x;
    klee_make_symbolic(&x, sizeof(x), "x");
    return foo(x);
}

KLEE will tell us there was an out-of-bounds pointer access:

KLEE: ERROR: test1.cpp:9: memory error: out of bound pointer

When we list the output directory, we can see which test case ran into this error; KLEE generates something like “test000001.ptr.err”. And when we check the KLEE test file, we can see which input triggers it:

ktest file : 'klee-out-1/test000001.ktest'
args       : ['test1.bc']
num objects: 1
object 0: name: 'x'
object 0: size: 4
object 0: data: b'\x03\x00\x00\x00'
object 0: hex : 0x03000000
object 0: int : 3
object 0: uint: 3

Debuggers

Manual breakpoints, watchpoints, memory inspection. Time Travel Debugging features (where available) can step backwards. Very versatile but manual and time consuming.

Dynamic Instrumentation

Tools like Valgrind or Dr. Memory insert checks into a binary at runtime. This means:

Slower than sanitizers
Coverage issues (like sanitizers)
No source code required

Target:

#include <iostream>
int foo(){
	int* x = new int[1024];
	for (int i = 0; i <= 1024; i++) {
		x[i] = 1;
	}
	return 1;
}


int main(int argc) {
    return foo();
}

Error #1: UNADDRESSABLE ACCESS beyond heap bounds: writing 0x0000000000c3a290-0x0000000000c3a294 4 byte(s)
# 0 foo                [C:\Users\User\testing.cpp:5]
# 1 main               [C:\Users\User\\testing.cpp:12]

(Disclaimer: This is a very simple and unreliable benchmark.)

Rewriting (Lifting & Recompiling) with Sanitizers

Lifting is not a new technique; most solutions already utilize it.

IDA -> Hex-Rays Microcode
Binary Ninja -> BNIL
Valgrind -> VEX IR
angr -> VEX IR
etc.

Today we will explore lifting into LLVM IR, because we can recompile LLVM IR, there are already existing tools for LLVM, and it is easy to create new passes.

int foo(int* num) {
    return *num;
}

int foo2(int offset) {
    int* a = new int(5);
    auto result = foo( (a+offset) );
    delete a;
    return result;
}

int main(int argc) {
    return foo2(1);
}

Let’s say we want to sanitize foo. We can use a lifter to lift that function to LLVM IR, which would give us this if we use Mergen:

define i64 @foo(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory) local_unnamed_addr #0 {
real_return-5368738349-:
  %0 = inttoptr i64 %RCX to ptr
  %1 = load i32, ptr %0, align 4
  %2 = zext i32 %1 to i64
  ret i64 %2
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) }

Then we can just append sanitize_address to the function attributes and run the ASan pass. Doing that gives us:

$asan.module_ctor = comdat any

@llvm.used = appending global [1 x ptr] [ptr @asan.module_ctor], section "llvm.metadata"
@___asan_globals_registered = common hidden global i64 0
@__start_asan_globals = extern_weak hidden global i64
@__stop_asan_globals = extern_weak hidden global i64
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @asan.module_ctor, ptr @asan.module_ctor }]

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)
define i64 @foo(i64 %RAX, i64 %RCX, i64 %RDX, i64 %RBX, i64 %RSP, i64 %RBP, i64 %RSI, i64 %RDI, i64 %R8, i64 %R9, i64 %R10, i64 %R11, i64 %R12, i64 %R13, i64 %R14, i64 %R15, ptr nocapture readnone %EIP, ptr nocapture readnone %memory) local_unnamed_addr #0 {
real_return-5368738349-:
  %0 = inttoptr i64 %RCX to ptr
  %1 = load i32, ptr %0, align 4
  %2 = zext i32 %1 to i64
  ret i64 %2
}

declare void @__asan_before_dynamic_init(i64)

declare void @__asan_after_dynamic_init()

declare void @__asan_register_globals(i64, i64)

declare void @__asan_unregister_globals(i64, i64)

declare void @__asan_register_image_globals(i64)

declare void @__asan_unregister_image_globals(i64)

declare void @__asan_register_elf_globals(i64, i64, i64)

declare void @__asan_unregister_elf_globals(i64, i64, i64)

declare void @__asan_init()

; Function Attrs: nounwind
define internal void @asan.module_ctor() #1 comdat {
  call void @__asan_init()
  call void @__asan_version_mismatch_check_v8()
  call void @__asan_register_elf_globals(i64 ptrtoint (ptr @___asan_globals_registered to i64), i64 ptrtoint (ptr @__start_asan_globals to i64), i64 ptrtoint (ptr @__stop_asan_globals to i64))
  ret void
}

declare void @__asan_version_mismatch_check_v8()

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) }
attributes #1 = { nounwind }

Obviously we can’t use this directly. Not only does it lack an entry point, the registers on foo are also not set up correctly. We could fix the registers ourselves, or write a wrapper to load them.

This way we will have a binary that we can run and fuzz faster than dynamic instrumentation.

(Disclaimer: This is a very simple and unreliable benchmark. In theory, you should be able to achieve the same performance as with regular ASan, or possibly even better, because the rewritten binary might benefit from better optimization passes.)

Lifting to benefit from existing tools

Let’s try using lifted LLVM IR + symbolic execution (KLEE).

#include <iostream>

int foo(int* a, int x) {
    int result = 0;
    if (x >= 0 && x <= 3) {
        result = a[x];
    }
    return result;
}

int main() {
    int* a = new int[3]{5, 10, 15};
    int x;
    std::cin >> x;
    return foo(a, x);
}

After lifting foo, we get this.

%CTX = type { i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1 }

define i64 @foo(%CTX %0, ptr noalias nocapture readnone %1) local_unnamed_addr #0 {
previousjmp_block-0-:
  %.fca.1.extract = extractvalue %CTX %0, 1
  %.fca.2.extract = extractvalue %CTX %0, 2
  %.fca.4.extract = extractvalue %CTX %0, 4
  %2 = trunc i64 %.fca.2.extract to i32
  %3 = inttoptr i64 %.fca.4.extract to ptr
  %4 = getelementptr i8, ptr %3, i64 -4
  store i32 %2, ptr %4, align 4
  %5 = getelementptr i8, ptr %3, i64 -16
  store i64 %.fca.1.extract, ptr %5, align 4
  %6 = getelementptr i8, ptr %3, i64 -20
  store i32 0, ptr %6, align 4
  %or.cond = icmp ult i32 %2, 4
  br i1 %or.cond, label %real_return-5368757645-164, label %common.ret

common.ret:
  %common.ret.op.in = phi i32 [ %10, %real_return-5368757645-164 ], [ 0, %previousjmp_block-0- ]
  %common.ret.op = zext i32 %common.ret.op.in to i64
  ret i64 %common.ret.op

real_return-5368757645-164:
  %7 = shl i64 %.fca.2.extract, 2
  %mul_ea = and i64 %7, 17179869180
  %8 = inttoptr i64 %.fca.1.extract to ptr
  %9 = getelementptr i8, ptr %8, i64 %mul_ea
  %10 = load i32, ptr %9, align 4
  store i32 %10, ptr %6, align 4
  br label %common.ret
}

attributes #0 = { nofree }

And when we check the KLEE test file, we see the same failing input:

ktest file : 'klee-out-1/test000001.ktest'
args       : ['test2.bc']
num objects: 1
object 0: name: 'x'
object 0: size: 4
object 0: data: b'\x03\x00\x00\x00'
object 0: hex : 0x03000000
object 0: int : 3
object 0: uint: 3

Lifting to LLVM IR gives you a version that can be symbolically executed. But there’s a big downside. It requires:

1- Manually preparing the IR

2- Writing wrappers

3- Symbolizing memory/registers/functions/syscalls.

So while it works, it’s not plug-and-play.

TL;DR

There are many established ways to find errors in a program. If you have the source code for the program, you have mature and easy-to-set-up tools.

Without source, lifting into LLVM IR and running sanitizers or symbolic execution can work, and with enough effort it might be very fast and powerful. My tool isn’t mature enough (yet) to automate all this. If you want a more usable experience, check out the tools in the resources.