Powerful Disassembler Library For AMD64

diStorm [pronounced dee·storm]

Definition: A lightweight, easy-to-use and fast disassembler/decomposer library for x86/AMD64. "Decomposer" means that you get a binary structure describing each instruction rather than a textual representation.

What's New?

diStorm version 3.3 -- Now Available For Commercial Use
diStorm3 includes the following new features:
diStorm3 also supports:
diStorm3 is dual-licensed under the GPL and a commercial license.

Showcase

This is a simple example of how to use diStorm and what the results look like. We take a compiled C function and show it first disassembled and then decomposed, including all the fields of the binary structures that diStorm returns. Users are strongly advised to read the documentation to get the most out of diStorm. For more information about the decode API, refer to the C Sample page; for the decompose API, refer to the Decompose Interface.

If you wish to see a complete code sample of how to decode a whole binary file, take a look at this sample.

When you want to display a disassembly listing of binary code, use the decode API.
When you want to analyze code and extract more information about the instruction itself and its operands, use the decompose API.
For instance, you get "EAX" as the register index R_EAX, and the "ADD" instruction as the I_ADD enum value.


Original function in C:

int f(int a, int b)
{
        return a + b;
}

Compiled and assembled to:
bin = 0x55 0x8b 0xec 0x8b 0x45 0x08 0x03 0x45 0x0c 0xc9 0xc3

Basic API distorm_decode usage example:

   _DecodeResult res;
   _DecodedInst disassembled[MAX_INSTRUCTIONS];
   unsigned int decodedInstructionsCount = 0;
   _OffsetType offset = 0;

   res = distorm_decode(offset,
                  (const unsigned char*)bin,
                  sizeof(bin),
                  Decode32Bits,
                  disassembled,
                  MAX_INSTRUCTIONS,
                  &decodedInstructionsCount);
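   /* Print offset, size, raw bytes in hex, then the mnemonic and operands. */
   for (unsigned int i = 0; i < decodedInstructionsCount; i++) {
      /* %I64x is the MSVC size prefix for a 64-bit hex value. */
      printf("%08I64x (%02d) %-24s %s%s%s\r\n",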
                disassembled[i].offset,
                disassembled[i].size,
                (char*)disassembled[i].instructionHex.p,
                (char*)disassembled[i].mnemonic.p,
                disassembled[i].operands.length != 0 ? " " : "",
                (char*)disassembled[i].operands.p);
   }

Output:
00000000 (01) 55                       PUSH EBP
00000001 (02) 8bec                     MOV EBP, ESP
00000003 (03) 8b4508                   MOV EAX, [EBP+0x8]
00000006 (03) 03450c                   ADD EAX, [EBP+0xc]
00000009 (01) c9                       LEAVE
0000000a (01) c3                       RET

Since this is the basic API, you only get the address of the instruction, its size in bytes, the textual mnemonic, textual operands and the bytes of the instruction in hex.
The following is a dump of the returned array of _DecodedInst structures.

Dump of the disassembled array:
-       disassembled[0]
+               mnemonic        {length=4 p="PUSH" }
+               operands        {length=3 p="EBP" }
+               instructionHex  {length=2 p="55" }
                size    1
                offset  0

-       disassembled[1]
+               mnemonic        {length=3 p="MOV" }
+               operands        {length=8 p="EBP, ESP" }
+               instructionHex  {length=4 p="8bec" }
                size    2
                offset  1

-       disassembled[2]
+               mnemonic        {length=3 p="MOV" }
+               operands        {length=14 p="EAX, [EBP+0x8]" }
+               instructionHex  {length=6 p="8b4508" }
                size    3
                offset  3

-       disassembled[3]
+               mnemonic        {length=3 p="ADD" }
+               operands        {length=14 p="EAX, [EBP+0xc]" }
+               instructionHex  {length=6 p="03450c" }
                size    3
                offset  6

-       disassembled[4]
+               mnemonic        {length=5 p="LEAVE" }
+               operands        {length=0 p="" }
+               instructionHex  {length=2 p="c9" }
                size    1
                offset  9

-       disassembled[5]
+               mnemonic        {length=3 p="RET" }
+               operands        {length=0 p="" }
+               instructionHex  {length=2 p="c3" }
                size    1
                offset  10



New API distorm_decompose example:

The distorm_decompose function requires you to set up a small structure that describes the binary stream to decompose.

   _DInst decomposed[MAX_INSTRUCTIONS];
   _CodeInfo ci = {0};
   ci.code = bin;
   ci.codeLen = sizeof(bin);
   ci.dt = Decode32Bits;
   res = distorm_decompose(&ci, decomposed, MAX_INSTRUCTIONS, &decodedInstructionsCount);
   for (int i = 0; i < decodedInstructionsCount; i++) { printf(

Dump:
To see what the binary structure _DInst holds when distorm_decompose returns, refer to Structure Layout, which explains how to interpret the fields. In short, it gives you all the information about the instruction itself and its operands. To extract some of that information you will need a few macros and enums that ease the process. Note that no text is returned at this point. However, if you are interested in the textual representation of an instruction, you can use distorm_format to convert this structure into the basic API's structure.

- decomposed[0]
                addr    0
                size    1
                flags   1280 – FLAG_GET_OPSIZE(1280): Decode32Bits, FLAG_GET_ADDRSIZE(1280): Decode32Bits
                segment R_NONE
                base    R_NONE
                scale   0
                dispSize        0
                opcode  I_PUSH
-               ops[0]
                        type    O_REG
                        index   R_EBP
                        size    32
                disp    0
                imm     0
                unusedPrefixesMask      0
                meta    8 – META_GET_ISC(8): ISC_INTEGER
                usedRegistersMask       32

- decomposed[1]
                addr    1
                size    2
                flags   1344 – FLAG_DST_WR, FLAG_GET_OPSIZE(1280): Decode32Bits, FLAG_GET_ADDRSIZE(1280): Decode32Bits
                segment R_NONE
                base    R_NONE
                scale   0
                dispSize        0
                opcode  I_MOV
-               ops[0]
                        type    O_REG
                        index   R_EBP
                        size    32
-               ops[1]
                        type    O_REG
                        index   R_ESP
                        size    32
                disp    0
                imm     0
                unusedPrefixesMask      0
                meta    8 – META_GET_ISC(8): ISC_INTEGER
                usedRegistersMask       48

- decomposed[2]
                addr    3
                size    3
                flags   1344 – FLAG_DST_WR, FLAG_GET_OPSIZE(1280): Decode32Bits, FLAG_GET_ADDRSIZE(1280): Decode32Bits
                segment 198 – SEGMENT_IS_DEFAULT(198): TRUE, SEGMENT_GET(198): R_SS
                base    R_NONE
                scale   0
                dispSize        8
                opcode  I_MOV
-               ops[0]
                        type    O_REG
                        index   R_EAX
                        size    32
-               ops[1]
                        type    O_SMEM
                        index   R_EBP
                        size    32
                disp    8
                imm     0
                unusedPrefixesMask      0
                meta    8 – META_GET_ISC(8): ISC_INTEGER
                usedRegistersMask       33

- decomposed[3]
                addr    6
                size    3
                flags   1344
                segment 198 – SEGMENT_IS_DEFAULT(198): TRUE, SEGMENT_GET(198): R_SS
                base    R_NONE
                scale   0
                dispSize        8
                opcode  I_ADD
-               ops[0]
                        type    O_REG
                        index   R_EAX
                        size    32
-               ops[1]
                        type    O_SMEM
                        index   R_EBP
                        size    32
                disp    12
                imm     0
                unusedPrefixesMask      0
                meta    8 – META_GET_ISC(8): ISC_INTEGER
                usedRegistersMask       33

- decomposed[4]
                addr    9
                size    1
                flags   1280 – FLAG_GET_OPSIZE(1280): Decode32Bits, FLAG_GET_ADDRSIZE(1280): Decode32Bits
                segment R_NONE
                base    R_NONE
                scale   0
                dispSize        0
                opcode  I_LEAVE
                ops     0
                disp    0
                imm     0
                unusedPrefixesMask      0
                meta    8 – META_GET_ISC(8): ISC_INTEGER
                usedRegistersMask       0

- decomposed[5]
                addr    10
                size    1
                flags   1280 – FLAG_GET_OPSIZE(1280): Decode32Bits, FLAG_GET_ADDRSIZE(1280): Decode32Bits
                segment R_NONE
                base    R_NONE
                scale   0
                dispSize        0
                opcode  I_RET
                ops     0
                disp    0
                imm     0
                unusedPrefixesMask      0
                meta    10 – META_GET_ISC(10): ISC_INTEGER, META_GET_FC(10): FC_RET
                usedRegistersMask       0

Overview

Today, with quickly evolving malware and viruses, you have to analyze more code, more accurately and faster. diStorm is a great solution to integrate into your binary code analysis algorithms. It is already used in many open source projects, API hooking libraries, shellcode searching, binary code analysis and other fields. diStorm has been an open source project since its beginning in 2005; it is robust, mature and widely used all over the world. diStorm is the fastest disassembler in the world and is still actively maintained and updated by its creator.

Publications

diStorm is being used in many applications already, to name a few in no specific order:
  • Apple Shark Profiler
  • SolidShield Server Side Protector
  • DFSee Low Level disk tools
  • RotateRight Zoom Profiler
  • IdaStealth Plugin
  • BinNavi Input Generator
  • WinAppDbg

Appearances in Books