Wednesday, October 26, 2011

Disassemble

A neat trick Factor provides is the ability to disassemble words and quotations into the machine code generated by the compiler. In 2008, Slava Pestov created a disassembler, and he has improved it since then, switching to udis86 for its implementation.

Constant Folding

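The compiler performs constant folding (the optimization itself lives in the compiler.tree.optimizer vocabulary). Using the optimized. word from the compiler.tree.debugger vocabulary, which only prints the optimizer's output, you can see the optimized form of a quotation: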

( scratchpad ) [ 2 2 + ] optimized.
[ 4 ]
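To make the idea concrete, here is a toy model of constant folding in Python. It is not Factor's optimizer, which works on its own intermediate representation in compiler.tree.optimizer. The model treats a quotation as a flat list of integer literals and word names. Whenever a pure binary word follows two literals, it replaces all three with the computed result. The FOLDABLE table and the fold function are hypothetical names used only for illustration.

```python
import operator

# Hypothetical table of words that are safe to fold: pure binary operations.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(quotation):
    """Constant-fold a quotation given as a list of ints and word names.

    Walk left to right, building an output list. When a foldable binary word
    arrives and the two previous outputs are both literals, pop them and push
    the computed literal; otherwise emit the item unchanged.
    """
    out = []
    for item in quotation:
        if (
            item in FOLDABLE
            and len(out) >= 2
            and isinstance(out[-1], int)
            and isinstance(out[-2], int)
        ):
            b = out.pop()
            a = out.pop()
            out.append(FOLDABLE[item](a, b))
        else:
            out.append(item)
    return out


print(fold([2, 2, "+"]))          # [4]
print(fold([2, 3, "*", 1, "+"]))  # [7]
print(fold(["x", 2, "+"]))        # ['x', 2, '+'] (x is not a literal)
```

The folder is deliberately conservative. It only rewrites a word when both of its inputs are known at compile time, so anything depending on a runtime value is left alone.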

Using the disassembler, you can see the machine code this generates:

( scratchpad ) [ 2 2 + ] disassemble
011c1a5530: 4983c608        add r14, 0x8
011c1a5534: 49c70640000000  mov qword [r14], 0x40
011c1a553b: c3              ret 
011c1a553c: 0000            add [rax], al
011c1a553e: 0000            add [rax], al
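Only the first two instructions do real work. They make room for one value on the data stack and store 0x40 there. The sketch below mimics them in Python. It rests on two assumptions that the listing suggests but the post does not state. First, r14 holds the data stack pointer. Second, 64-bit Factor stores a fixnum shifted left by 4 tag bits, so the folded constant 4 is written as 0x40. TAG_BITS, tag_fixnum, untag_fixnum and push_literal are illustrative names, not Factor's.

```python
TAG_BITS = 4  # assumption: number of low tag bits on 64-bit Factor
CELL = 8      # bytes per data stack cell on x86-64


def tag_fixnum(n):
    """Encode a small integer as the listing suggests: shifted left, tag 0."""
    return n << TAG_BITS


def untag_fixnum(raw):
    """Recover the integer from its tagged representation."""
    return raw >> TAG_BITS


def push_literal(memory, r14, raw):
    """Mimic 'add r14, 0x8' followed by 'mov qword [r14], raw'."""
    r14 += CELL
    memory[r14] = raw
    return r14


memory = {}
r14 = 0x1000  # hypothetical starting value for the stack pointer
r14 = push_literal(memory, r14, tag_fixnum(2 + 2))
print(hex(memory[r14]), untag_fixnum(memory[r14]))  # 0x40 4
```

The trailing "0000  add [rax], al" lines, here and in the later listings, appear to be zero padding after the final instruction. The bytes 00 00 happen to decode as add [rax], al, but execution never reaches them.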

Local Variables

One of the questions that comes up sometimes is whether local variables affect performance. We can examine two words that add numbers together, one using locals and one just using the stack:

( scratchpad ) : foo ( x y -- z ) + ;

( scratchpad ) :: bar ( x y -- z ) x y + ;

The optimized output looks a little different:

( scratchpad ) \ foo optimized.
[ + ]

( scratchpad ) \ bar optimized.
[ "COMPLEX SHUFFLE" "COMPLEX SHUFFLE" R> + ]

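But the generated machine code is the same: both words compile to a tail call (jmp) to +. Only the jmp displacement bytes differ, because the two words live at different addresses: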

( scratchpad ) \ foo disassemble
01115de7b0: 488d1d05000000  lea rbx, [rip+0x5]
01115de7b7: e9e49439ff      jmp 0x110977ca0 (+)
01115de7bc: 0000            add [rax], al
01115de7be: 0000            add [rax], al

( scratchpad ) \ bar disassemble
01115ef620: 488d1d05000000  lea rbx, [rip+0x5]
01115ef627: e9748638ff      jmp 0x110977ca0 (+)
01115ef62c: 0000            add [rax], al
01115ef62e: 0000            add [rax], al
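The jmp instructions (opcode E9) encode their target as a signed 32-bit little-endian displacement, measured from the address of the next instruction. That explains why the bytes differ even though the targets match. The short Python check below uses the addresses and bytes from the two listings above. rel32_target is an illustrative helper, not part of Factor.

```python
import struct


def rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Resolve the target of a 5-byte near jmp (E9) or call (E8).

    The displacement is a signed little-endian int32 relative to the
    address of the following instruction, i.e. insn_addr + 5.
    """
    if len(insn_bytes) != 5 or insn_bytes[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte E8/E9 instruction")
    (disp,) = struct.unpack("<i", insn_bytes[1:5])
    return insn_addr + len(insn_bytes) + disp


foo_target = rel32_target(0x01115DE7B7, bytes.fromhex("e9e49439ff"))
bar_target = rel32_target(0x01115EF627, bytes.fromhex("e9748638ff"))
print(hex(foo_target), hex(bar_target))  # 0x110977ca0 0x110977ca0
```

Both displacements resolve to 0x110977ca0, the address the disassembler labels (+).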

Dynamic Variables

Another frequently used feature is dynamic variables, implemented in the namespaces vocabulary. For example, the print word looks up the current value of the output-stream variable and then calls stream-print on it:

( scratchpad ) \ print see
USING: namespaces ;
IN: io
: print ( str -- ) output-stream get stream-print ; inline

In the optimized output, print has been inlined (it is declared inline), and so has the implementation of get:

( scratchpad ) [ "Hello, world" print ] optimized.
[
    "Hello, world" \ output-stream 0 context-object assoc-stack
    stream-print
]
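In the optimized output, get has become a search with assoc-stack through the sequence that 0 context-object returns. Below is a rough Python model of that lookup. It assumes that sequence is the namestack: a stack of assocs searched from the most recently pushed one downward, with global values modelled as the bottom scope. The Namestack class and its get and with_variable methods are hypothetical. with_variable only loosely mirrors Factor's with-variable word.

```python
class Namestack:
    """Toy model of dynamic variables as a stack of dictionaries."""

    def __init__(self, global_scope):
        # Bottom of the stack: the global values.
        self.scopes = [dict(global_scope)]

    def get(self, key):
        # Search from the innermost (last pushed) scope outward,
        # like assoc-stack searching a sequence of assocs from its end.
        for scope in reversed(self.scopes):
            if key in scope:
                return scope[key]
        return None  # assumption: an unbound variable reads as f, modelled as None

    def with_variable(self, key, value, thunk):
        # Bind key in a fresh scope only while thunk runs, then pop the scope.
        self.scopes.append({key: value})
        try:
            return thunk()
        finally:
            self.scopes.pop()


ns = Namestack({"output-stream": "stdout"})
print(ns.get("output-stream"))  # stdout
print(ns.with_variable("output-stream", "buffer",
                       lambda: ns.get("output-stream")))  # buffer
print(ns.get("output-stream"))  # stdout
```

Because the lookup walks the stack at run time, print picks up whatever output-stream is bound in the innermost scope when it runs. That is why the compiled code has to call assoc-stack and cannot fold the value in as a constant.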

Inspecting the generated machine code, you can see references to the Factor words being called (assoc-stack and stream-print):

( scratchpad ) [ "Hello, world" print ] disassemble
011c0c6c40: 4c8d1df9ffffff        lea r11, [rip-0x7]
011c0c6c47: 6820000000            push dword 0x20
011c0c6c4c: 4153                  push r11
011c0c6c4e: 4883ec08              sub rsp, 0x8
011c0c6c52: 4983c618              add r14, 0x18
011c0c6c56: 48b8dbc5a31a01000000  mov rax, 0x11aa3c5db
011c0c6c60: 498946f0              mov [r14-0x10], rax
011c0c6c64: 498b4500              mov rax, [r13+0x0]
011c0c6c68: 488b4040              mov rax, [rax+0x40]
011c0c6c6c: 498906                mov [r14], rax
011c0c6c6f: 48b86c91810e01000000  mov rax, 0x10e81916c
011c0c6c79: 498946f8              mov [r14-0x8], rax
011c0c6c7d: e8de4e36ff            call 0x11b42bb60 (assoc-stack)
011c0c6c82: 4883c418              add rsp, 0x18
011c0c6c86: 488d1d05000000        lea rbx, [rip+0x5]
011c0c6c8d: e94e5264ff            jmp 0x11b70bee0 (stream-print)
011c0c6c92: 0000                  add [rax], al
011c0c6c94: 0000                  add [rax], al
011c0c6c96: 0000                  add [rax], al
011c0c6c98: 0000                  add [rax], al
011c0c6c9a: 0000                  add [rax], al
011c0c6c9c: 0000                  add [rax], al
011c0c6c9e: 0000                  add [rax], al
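The annotations make the two kinds of calls easy to tell apart. assoc-stack is reached with call, which returns to this code, while stream-print is reached with jmp, a tail call. The small Python helper below pulls those references out of a pasted listing. called_words and the regular expression are illustrative, written against the listing format shown above.

```python
import re

# Matches lines such as:
#   011c0c6c7d: e8de4e36ff  call 0x11b42bb60 (assoc-stack)
ANNOTATED = re.compile(r"\b(call|jmp)\s+0x[0-9a-f]+\s+\(([^)]+)\)")


def called_words(listing: str) -> list:
    """Return (kind, word) pairs; 'jmp' marks a tail call, 'call' a regular call."""
    return [(m.group(1), m.group(2)) for m in ANNOTATED.finditer(listing)]


listing = """\
011c0c6c7d: e8de4e36ff            call 0x11b42bb60 (assoc-stack)
011c0c6c82: 4883c418              add rsp, 0x18
011c0c6c86: 488d1d05000000        lea rbx, [rip+0x5]
011c0c6c8d: e94e5264ff            jmp 0x11b70bee0 (stream-print)
"""
print(called_words(listing))
# [('call', 'assoc-stack'), ('jmp', 'stream-print')]
```

In the full listing, the frame built by the two pushes and sub rsp, 0x8 (three 8-byte slots, 0x18 bytes) is released by add rsp, 0x18 before the final jmp. That appears to be what allows stream-print to be entered as a tail call. Similarly, add r14, 0x18 reserves three data stack cells, matching the three values pushed in the optimized output: the string, output-stream, and the result of 0 context-object.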

1 comment:

Slava Pestov said...

"The compiler performs constant folding, using the compiler.tree.debugger vocabulary, you can output the optimized form of a quotation:"

Actually it's compiler.tree.optimizer. The debugger just prints optimizer output, it is not used as part of normal execution.