Why study virtualization?
- Almost all cloud applications run in virtualized environments
- Most IT infrastructures run in the cloud or in on-prem virtualized environments
- Understanding virtualization is key to building cloud infrastructures
- Understanding virtualization will help application design
Operating Systems
- A piece of software that manages and virtualizes hardware for applications
– An indirection layer between applications and hardware
– Provides a high-level interface to applications
– While interacting with hardware devices through low-level interfaces
– Runs privileged instructions to interact with hardware devices
- Applications
– Can only execute unprivileged instructions
– Perform system calls or faults to “trap” into OS
– The OS protects applications from each other to some extent (e.g., via separate address spaces)
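A minimal sketch of the high-level vs. low-level split above, in Python: the application only sees the OS's high-level interface (open/write/read), and each of these wrappers traps into the kernel via a system call. The file path is an arbitrary example.

```python
import os

# Applications cannot touch hardware directly; they request OS services
# through system calls. Python's os module exposes thin wrappers:
# os.open/os.write/os.read each trap into the kernel.
fd = os.open("/tmp/demo.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.write(fd, b"written via the write() system call\n")
os.close(fd)

# Reading the data back also goes through syscalls (open/read/close);
# the OS, not the application, drives the disk with privileged instructions.
fd = os.open("/tmp/demo.txt", os.O_RDONLY)
data = os.read(fd, 100)
os.close(fd)
print(data.decode(), end="")
```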
Virtualization
- Adding another level of indirection to run OSes on an abstraction of hardware
- Virtual Machine (Guest OS)
– OS that runs on virtualized hardware resources
– Managed by another piece of software (the VMM/hypervisor)
- Virtual Machine Monitor (Hypervisor)
– The software that creates and manages the execution of virtual machines
– Runs on bare-metal hardware
History
Mainframes and IBM
- Before we had datacenters or PCs, there were giant metal frames
- Supported computational and I/O-intensive commercial/scientific workloads
- Expensive (the IBM 704 (1954) cost $250K to millions)
- “IBM and the seven dwarfs” – their heyday was the late ’50s through ’70s
Issues with Early Mainframes
- Different generations were not architecturally compatible
- Batch-oriented (as opposed to interactive)
- Meanwhile, ideas started to appear towards a time-sharing OS
- The computer was becoming a multiplexed tool for a community of users, instead of being a batch tool for wizard programmers
IBM’s Response
- IBM bet the company on the System/360 hardware family [1964]
– S/360 was the first to clearly distinguish architecture and implementation
– Its architecture was virtualizable
- The CP/CMS system software [1968]
– CP: a “control program” that created and managed virtual S/360 machines
– CMS: the “Cambridge monitor system” — a lightweight, single-user OS
– With CP/CMS, several different OSes could run concurrently on the same HW
- IBM CP/CMS was the first virtualization system. Main purpose: letting multiple users share a mainframe
IBM’s Mainframe Product Line
- System/360 (1964-1970)
– Supported virtualization via CP/CMS, channel I/O, virtual memory, …
- System/370 (1970-1988)
– Reimplementation of CP/CMS as VM/370
- System/390 (1990-2000)
- zSeries (2000-present)
- Huge moneymaker for IBM, and many businesses still depend on these!
PCs and Multi-User OSes
- 1976: Steve Jobs and Steve Wozniak start Apple Computer and roll out the Apple I, the first computer with a single circuit board
- 1981: The first IBM personal computer, code-named “Acorn,” is introduced. It uses Microsoft’s MS-DOS
- 1983: Apple’s Lisa is the first personal computer with a GUI
- 1985: Microsoft announces Windows
- The PC market (1980-90s): ship hundreds of millions of units, not hundreds of units
- Cluster computing (1990s): build a cheap mainframe out of a cluster of PCs
Multiprocessor and Stanford FLASH
- Development of multiprocessor hardware boomed (1990s)
- Stanford FLASH Multiprocessor
– A multiprocessor that integrates global cache coherence and message passing
- But system software lagged behind
- Commodity OSes do not scale and cannot isolate/contain faults
Stanford Disco and VMware
- Stanford Disco project (SOSP’97 Mendel Rosenblum et al.)
– Extended commodity OSes to run efficiently on shared-memory multiprocessors
– A VMM built to run multiple copies of the Silicon Graphics IRIX OS on FLASH
- Mendel Rosenblum, Diane Greene, and others co-founded VMware in 1998
– Brought virtualization to PCs. Main purpose: run different OSes on the same hardware
– The initial market was software developers testing software on multiple OSes
– Acquired by EMC (2003), which later merged with Dell (2016)
Server Consolidation
- Datacenters often run many services (e.g., search, mail server, database)
– Easier to manage by running one service per machine
– Leads to low resource utilization
- Virtualization can “consolidate” servers by hosting many VMs per machine, each running one service
– Higher resource utilization while still delivering manageability
The Cloud Era
- The cloud revolution is what really made virtualization take off
- Instead of renting physical machines, rent VMs
– Better consolidation and resource utilization
– Better portability and manageability
– Easy to deploy and maintain software
– However, raises certain security and QoS concerns
- Many instance types, some with specialized hardware; all well maintained and patched
– AWS: 241 instance types in 30 families (as of Dec 2019)
The Virtuous Cycle for Cloud Providers
- More customers utilize more resources
- Greater utilization of resources requires more infrastructure
- Buying more infrastructure in volume leads to lower unit costs
- Lower unit costs allow for lower customer prices
- Lower prices attract more customers
Container
- VMs run a complete OS on emulated hardware
– Too heavyweight and unnecessary for many cloud use cases
– Need to maintain OS versions and libraries, and make sure applications are compatible
- Containers (e.g., Docker, LXC)
– Run multiple isolated user-space applications on the host OS
– Much more lightweight: better runtime performance, less memory, faster startup
– Easier to deploy and maintain applications
– But don’t provide security boundaries as strong as VMs’
Managing Containers
- Need a way to manage a cluster of containers
– Handle failure, scheduling, monitoring, authentication, etc.
- Kubernetes: the most popular container orchestration system today
- Cloud providers also offer various container orchestration services
– e.g., AWS ECS, EKS
Serverless Computing
- VMs and containers in cloud still need to be “managed”
- Is there a way to just write software and let the cloud do all the rest?
- Serverless computing (mainly in the form of Function as a Service)
– Autoscaled and billed by request load
– No need to manage “server cluster” or handle failure
– A lot less control and customization (e.g., fixed CPU/memory ratio, no direct communication across functions, no easy way to maintain state)
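To make the Function-as-a-Service model above concrete, here is a minimal sketch in the style of an AWS Lambda Python handler. The provider invokes the handler once per request and handles scaling and billing; the `name` field in the event is a hypothetical example input, not a platform-defined one.

```python
# Sketch of a FaaS entry point: the cloud platform calls
# handler(event, context) for each incoming request, autoscaling the
# number of function instances with request load.
def handler(event, context=None):
    # "name" is a hypothetical field of the request payload.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Locally we can only simulate a single invocation; there is no server
# cluster for the developer to manage.
print(handler({"name": "cloud"})["body"])
```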
Summary of Virtualization History
- Invented by IBM in the 1960s for sharing expensive mainframes
- A popular research topic in the 1960s and 1970s
- Interest died as the adoption of cheap PCs and multi-user OSes surged in the 1980s
- A (somewhat accidental) research idea got transferred to VMware
- Real adoption happened with the growth of cloud computing
- New forms of virtualization: container and serverless, in the modern cloud era