x86 Difficulties

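- On x86, not all sensitive instructions are privileged, i.e., x86 is not a virtualizable processor in the Popek and Goldberg sense
- These instructions do not trap in user mode; they silently behave differently in kernel and user mode (e.g., popf in user mode ignores changes to the interrupt-enable flag)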
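A toy Python model of why this defeats trap-and-emulate, assuming a VMM that runs the guest kernel in ring 3 with IOPL 0. The `CPU` class, its fields, and `GeneralProtectionFault` are hypothetical, and only the IF bit of EFLAGS is modeled:

```python
from dataclasses import dataclass

IF_BIT = 1 << 9  # interrupt-enable flag (IF) position in EFLAGS


class GeneralProtectionFault(Exception):
    """Stands in for a #GP trap that a trap-and-emulate VMM could intercept."""


@dataclass
class CPU:
    cpl: int           # current privilege level: 0 = kernel, 3 = user
    iopl: int = 0      # I/O privilege level (kept separate from eflags here)
    eflags: int = IF_BIT

    def popf(self, value: int) -> None:
        # Simplified protected-mode POPF: IF changes only if CPL <= IOPL.
        # Otherwise the old IF is silently kept and no fault is raised.
        if self.cpl <= self.iopl:
            self.eflags = value
        else:
            self.eflags = (value & ~IF_BIT) | (self.eflags & IF_BIT)

    def cli(self) -> None:
        # CLI, in contrast, raises #GP when CPL > IOPL, so a VMM sees it.
        if self.cpl > self.iopl:
            raise GeneralProtectionFault("cli at CPL > IOPL")
        self.eflags &= ~IF_BIT


kernel = CPU(cpl=0)       # native kernel in ring 0
kernel.popf(0)            # interrupts are now disabled
guest = CPU(cpl=3)        # guest kernel deprivileged to ring 3 by a VMM
guest.popf(0)             # no trap, yet interrupts silently stay enabled
print(bool(kernel.eflags & IF_BIT), bool(guest.eflags & IF_BIT))  # False True
```

The guest kernel believes it disabled interrupts, but the VMM never got a chance to emulate that, which is exactly the gap the solutions below address.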
Possible Solutions
- Emulate: interpret each instruction, super slow (e.g., Virtual PC on Mac)
- Binary translation: rewrite non-virtualizable instructions (e.g., VMware)
- Para-virtualization: modify guest OS to avoid non-virtualizable instructions (e.g., Xen)
- Change hardware: add a new CPU mode, extended page tables, and other hardware assistance (e.g., Intel VT-x, EPT, VT-d, AMD-V)
Full Emulation / Hosted Interpretation
- VMM implements the complete hardware architecture in software
- VMM steps through the VM’s instructions and updates the emulated hardware as needed
- Can handle all types of instructions, but super slow
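A minimal sketch of hosted interpretation over a hypothetical toy instruction set (the opcodes and `EmulatedMachine` are illustrative, not any real emulator's design). Every guest instruction, sensitive or not, goes through a software fetch-decode-execute step, which is why this handles everything but is slow:

```python
from dataclasses import dataclass, field


@dataclass
class EmulatedMachine:
    regs: dict = field(default_factory=lambda: {"a": 0, "b": 0})
    pc: int = 0
    interrupts_enabled: bool = True


def step(m: EmulatedMachine, program: list) -> bool:
    """Fetch, decode, and execute one guest instruction in software.

    Returns False once the guest halts.
    """
    op, *args = program[m.pc]          # fetch + decode
    m.pc += 1
    if op == "mov":                    # mov reg, imm
        m.regs[args[0]] = args[1]
    elif op == "addi":                 # addi reg, imm
        m.regs[args[0]] += args[1]
    elif op == "add":                  # add dst, src
        m.regs[args[0]] += m.regs[args[1]]
    elif op == "jnz":                  # jnz reg, target
        if m.regs[args[0]] != 0:
            m.pc = args[1]
    elif op == "cli":                  # sensitive: just update emulated state
        m.interrupts_enabled = False
    elif op == "hlt":
        return False
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return True


def run(program: list):
    m = EmulatedMachine()
    executed = 0
    while step(m, program):
        executed += 1
    return m, executed


# Sum 3 + 2 + 1 in a loop, with interrupts disabled first.
prog = [
    ("mov", "a", 0),
    ("mov", "b", 3),
    ("cli",),
    ("add", "a", "b"),   # address 3: loop body
    ("addi", "b", -1),
    ("jnz", "b", 3),
    ("hlt",),
]
m, executed = run(prog)
print(m.regs["a"], m.interrupts_enabled, executed)  # 6 False 12
```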
Trap-and-Emulate

Basic Idea of Binary Translation
- Translate the input guest binary’s instructions into a code cache and run the translated code directly on the CPU
- Challenges:
– Protection of the cache
– Correctness of direct memory addressing
– Handling relative memory addressing (e.g., jumps)
– Handling sensitive instructions
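To make the relative-addressing challenge concrete, here is a toy sketch assuming every instruction is one address unit long and that `jmp_rel` offsets are relative to the next instruction, as with x86 relative jumps. The opcode names and `exit_to_translator` are hypothetical. Copying the jump verbatim into the cache would make it land relative to its new location:

```python
def naive_copy_target(cache_pc: int, offset: int) -> int:
    """Where a jmp_rel copied verbatim into the cache would land."""
    return cache_pc + 1 + offset


def translate_rel_jump(guest_pc: int, offset: int) -> tuple:
    """Rewrite a guest PC-relative jump so it still reaches the guest target."""
    guest_target = guest_pc + 1 + offset
    # The target may not be translated yet, so leave through the translator
    # with the *guest* address instead of a raw cache address.
    return ("exit_to_translator", guest_target)


def copy_block(guest_code: list, guest_base: int) -> list:
    out = []
    for i, (op, *args) in enumerate(guest_code):
        if op == "jmp_rel":
            out.append(translate_rel_jump(guest_base + i, args[0]))
        else:
            out.append((op, *args))    # position-independent: copy as is
    return out


block = [("nop",), ("jmp_rel", 10)]        # guest block at 0x1000
print(hex(naive_copy_target(0x8001, 10)))  # 0x800c: points into the cache
print(copy_block(block, 0x1000))           # [('nop',), ('exit_to_translator', 4108)]
```

The rewritten jump targets guest address 0x100c (4108), which is what the guest code intended.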
VMware’s Dynamic Binary Translation
- Binary: input is binary x86 code
- Dynamic: translation happens at runtime
- On demand: code is translated only when it is about to execute
- System level: follows the rules set by the x86 ISA, not higher-level ABIs
- Subsetting: input is the full x86 instruction set; output is a safe subset of it
- Adaptive: translated code is adjusted according to guest behavior changes
Translation Unit
- TU: up to 12 instructions, or fewer if a “terminating” instruction ends it first (roughly a basic block)
- Why a TU rather than individual instructions? To amortize the translator’s per-invocation overhead
- TU -> Compiled Code Fragment (CCF)
- CCF stored in translation cache (TC)
- At the end of each CCF, call into the translator (part of the VMM) to find or translate the next TU (more optimization soon)
– If the destination code is already in TC, jump directly to it
– Otherwise, compile the next TU into a CCF in TC
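A sketch of TU formation and the translation-cache lookup described above, over a hypothetical toy ISA (`djnz` decrements a counter and jumps if it is nonzero). "Compilation" here just stores the TU; the point is the 12-instruction limit, the terminating instructions, and the hit/miss logic. A real translator would chain CCFs with direct jumps instead of returning to this lookup loop every time:

```python
MAX_TU_LEN = 12                          # TU size limit from the slide
TERMINATING = {"jmp", "djnz", "hlt"}     # toy control-flow ops that end a TU


def next_tu(guest_code: list, start: int) -> tuple:
    """Return the guest address range [start, end) of one translation unit."""
    end = start
    while end < len(guest_code) and end - start < MAX_TU_LEN:
        op = guest_code[end][0]
        end += 1
        if op in TERMINATING:
            break
    return start, end


class Translator:
    def __init__(self, guest_code: list):
        self.guest_code = guest_code
        self.tc = {}              # translation cache: guest address -> CCF
        self.translations = 0

    def compile_tu(self, start: int) -> dict:
        s, e = next_tu(self.guest_code, start)
        if s == e:
            raise ValueError(f"no guest code at address {start}")
        self.translations += 1
        # Toy "compilation": keep the TU and remember its fall-through address.
        return {"insns": self.guest_code[s:e], "fallthrough": e}

    def run(self, pc: int, counter: int, max_ccfs: int = 100) -> list:
        executed = []
        while pc is not None and len(executed) < max_ccfs:
            ccf = self.tc.get(pc)
            if ccf is None:                  # miss: compile next TU into TC
                ccf = self.tc[pc] = self.compile_tu(pc)
            executed.append(pc)              # "jump" to the CCF
            op, *args = ccf["insns"][-1]     # only the last insn can branch
            if op == "hlt":
                pc = None
            elif op == "jmp":
                pc = args[0]
            elif op == "djnz":               # decrement counter, jump if nonzero
                counter -= 1
                pc = args[0] if counter != 0 else ccf["fallthrough"]
            else:                            # TU ended at the 12-insn limit
                pc = ccf["fallthrough"]
        return executed


loop = [("nop",)] * 3 + [("djnz", 0), ("hlt",)]
t = Translator(loop)
print(t.run(0, counter=3), t.translations)        # [0, 0, 0, 4] 2
print(next_tu([("nop",)] * 15 + [("hlt",)], 0))   # (0, 12)
```

The loop body runs three times but is translated once; later iterations hit in the TC.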
Architecture of VMware’s Binary Translation

IDENT/Non-IDENT Translation
- Most instructions can be translated IDENT (copied unchanged), except for:
– PC-relative addressing
– Direct control flow
– Indirect control flow
– Sensitive instructions
. If the instruction already traps, it can be handled when it traps (optimized further, see Adaptive Binary Translation)
. Otherwise, replace it with a call to an emulation function in the VMM
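The IDENT/non-IDENT decision can be sketched as a classifier. The opcode groups below are illustrative mnemonics only (`lea_rip` and the indirect-branch names are hypothetical labels), and the "trapping" group assumes the guest kernel is deprivileged with IOPL 0, so that `cli`/`sti` fault:

```python
from enum import Enum


class Action(Enum):
    IDENT = "copy unchanged"
    FIX_PC_RELATIVE = "recompute the address for the new code location"
    DIRECT_CF = "rewrite the target to a translated or translator address"
    INDIRECT_CF = "look the target up in the TC at run time"
    SENSITIVE_TRAPS = "copy; emulate when it traps"
    SENSITIVE_CALLOUT = "replace with a call to an emulation function"


# Illustrative opcode groups (mnemonics only; operands ignored).
PC_RELATIVE = {"lea_rip"}                             # e.g., RIP-relative data access
DIRECT_CF = {"jmp", "jcc", "call"}
INDIRECT_CF = {"jmp_indirect", "call_indirect", "ret"}
SENSITIVE_TRAPPING = {"cli", "sti", "hlt", "lgdt"}    # fault when deprivileged
SENSITIVE_SILENT = {"popf", "pushf", "sgdt", "smsw"}  # no fault: must rewrite


def classify(op: str) -> Action:
    if op in PC_RELATIVE:
        return Action.FIX_PC_RELATIVE
    if op in DIRECT_CF:
        return Action.DIRECT_CF
    if op in INDIRECT_CF:
        return Action.INDIRECT_CF
    if op in SENSITIVE_TRAPPING:
        return Action.SENSITIVE_TRAPS
    if op in SENSITIVE_SILENT:
        return Action.SENSITIVE_CALLOUT
    return Action.IDENT


for op in ["add", "mov", "jmp", "ret", "cli", "popf"]:
    print(f"{op:6} -> {classify(op).name}")
```

Anything not in one of the special groups falls through to IDENT, which reflects the slide's point that most instructions need no rewriting.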
Adaptive Binary Translation
- Binary translation can outperform classical trap-and-emulate virtualization by avoiding traps
– rdtsc on Pentium 4: trap-and-emulate 2030 cycles, callout-and-emulate 1254 cycles, in-TC emulation 216 cycles
- What about sensitive instructions that are not privileged?
– “Innocent until proven guilty”
– Start in the innocent state and detect instructions that trap frequently
. Retranslate non-IDENT to avoid the trap
. Patch the original IDENT translation with a forwarding jump to the new translation
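A minimal sketch of "innocent until proven guilty", assuming a hypothetical threshold of three traps and assuming the IDENT copy of the instruction traps on every execution. The cycle costs reuse the slide's rdtsc numbers as stand-ins for trap versus in-TC cost; `AdaptiveTranslator` and the forwarding-target naming are illustrative only:

```python
TRAP_CYCLES = 2030    # rdtsc trap-and-emulate on Pentium 4 (from the slide)
IN_TC_CYCLES = 216    # rdtsc in-TC emulation on Pentium 4 (from the slide)
GUILTY_AFTER = 3      # hypothetical threshold of traps before retranslating


class AdaptiveTranslator:
    def __init__(self):
        self.kind = {}          # guest pc -> "IDENT" or "NON_IDENT"
        self.trap_counts = {}   # guest pc -> traps observed so far
        self.forwarding = {}    # patched IDENT site -> new translation

    def execute(self, pc: int) -> int:
        """Run the translation for pc once and return its cycle cost."""
        # Innocent until proven guilty: the first translation is IDENT.
        kind = self.kind.setdefault(pc, "IDENT")
        if kind == "IDENT":
            self.on_trap(pc)    # assumed: the IDENT copy traps every time
            return TRAP_CYCLES
        return IN_TC_CYCLES

    def on_trap(self, pc: int) -> None:
        self.trap_counts[pc] = self.trap_counts.get(pc, 0) + 1
        if self.trap_counts[pc] >= GUILTY_AFTER:
            self.kind[pc] = "NON_IDENT"                     # retranslate
            self.forwarding[pc] = f"non_ident_ccf_{pc:#x}"  # patch in a jump


t = AdaptiveTranslator()
total = sum(t.execute(0x40) for _ in range(10))
print(total, 10 * TRAP_CYCLES)   # 7602 20300
```

Ten executions cost 3 × 2030 + 7 × 216 = 7602 cycles instead of 20300 if every execution trapped.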
Hardware-Assisted CPU Virtualization (Intel VT-x)
- Two new modes of execution (orthogonal to protection rings)
– VMX root mode: same as x86 without VT-x
– VMX non-root mode: runs the VM; sensitive instructions cause a transition (VM exit) to root mode, even when executed in Ring 0
- New hardware structure: VMCS (virtual machine control structure)
– One VMCS per virtual processor
– Configured by the VMM to determine which sensitive instructions cause a VM exit
– Stores guest state, loaded on VM entry and saved on VM exit
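A toy model of the VMCS's role, not Intel's real layout: the `VMCS` class and `execute_non_root` are hypothetical. It does reflect two real properties of VT-x: `cpuid` and `vmcall` exit unconditionally in non-root mode, and other exits (here `hlt`, `rdtsc`) are opt-in per VMCS. Note that `popf` simply runs natively, because the guest kernel really is in ring 0:

```python
from dataclasses import dataclass, field

ALWAYS_EXIT = {"cpuid", "vmcall"}   # exit unconditionally in VMX non-root mode


@dataclass
class VMCS:
    """Toy stand-in for one virtual CPU's VMCS (not Intel's real layout)."""
    exit_on: set                    # instructions the VMM chose to intercept
    guest_state: dict = field(default_factory=lambda: {"rip": 0, "cpl": 0})


def execute_non_root(vmcs: VMCS, insn: str) -> str:
    """Return "vm_exit" if insn hands control to the VMM, else "guest"."""
    # The guest's ring does not matter: even CPL 0 guest code exits here.
    if insn in ALWAYS_EXIT or insn in vmcs.exit_on:
        vmcs.guest_state["exit_reason"] = insn   # saved for the VMM
        return "vm_exit"
    vmcs.guest_state["rip"] += 1   # runs natively at full speed
    return "guest"


# One VMCS per virtual processor; each may intercept different instructions.
vcpus = [VMCS(exit_on={"hlt"}), VMCS(exit_on={"hlt", "rdtsc"})]
results = [execute_non_root(vcpus[0], i) for i in ["add", "popf", "hlt", "cpuid"]]
print(results)   # ['guest', 'guest', 'vm_exit', 'vm_exit']
```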
Example: Guest syscall with Hardware Virtualization
- VMM fills in the guest OS’s exception table and configures the VMCS so that the syscall exception does not cause a VM exit; the VMM then executes VM entry
- Guest application invokes a syscall: it does not trap to the VMM but goes directly to the guest OS’s handler in the exception table
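The syscall flow above can be sketched as exception routing. This toy model follows the slide's simplified description: `VCPU`, `deliver`, and the single `exit_vectors` set are hypothetical. On real Intel hardware the decision comes from several VMCS controls (for example, a 32-bit exception bitmap covering vectors 0 to 31), which this sketch collapses into one set:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

SYSCALL_VECTOR = 0x80   # e.g., Linux's legacy "int 0x80" system call vector
PAGE_FAULT = 14         # #PF


@dataclass
class VCPU:
    guest_table: Dict[int, Callable[[], str]]   # guest OS's exception table
    exit_vectors: Set[int]                      # toy VMCS exit configuration
    vm_exits: int = 0


def deliver(vcpu: VCPU, vector: int) -> str:
    """Route an exception or interrupt raised by guest code."""
    if vector in vcpu.exit_vectors:
        vcpu.vm_exits += 1
        return "VMM"                         # VM exit to root mode
    return vcpu.guest_table[vector]()        # straight to the guest OS, no exit


table = {
    SYSCALL_VECTOR: lambda: "guest syscall handler",
    PAGE_FAULT: lambda: "guest page-fault handler",
}
# VMM setup: syscalls do not exit; page faults do (e.g., for shadow paging).
vcpu = VCPU(guest_table=table, exit_vectors={PAGE_FAULT})
print(deliver(vcpu, SYSCALL_VECTOR), vcpu.vm_exits)   # guest syscall handler 0
```

The syscall reaches the guest kernel's handler with zero VM exits, which is the main performance win over trapping every syscall into the VMM.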
Conclusion
- Virtualizing the CPU is a non-trivial task, especially for non-virtualizable architectures like x86
- Software binary translation is a neat (but very tricky) way to virtualize x86 and still meet Popek and Goldberg’s virtualization principles
- Hardware vendors keep adding more virtualization support, which makes life a lot easier
- Software and hardware techniques both have pros and cons