[Virtualization] Kubernetes and gVisor

Kubernetes

  • We need more than just packing and isolation: Scheduling, Lifecycle and health, Discovery, Monitoring, Auth{n,z}, Aggregates, Scaling, …
  • Everything at Google runs in containers: Gmail, Web Search, Maps, …
  • Open Source Containers: Kubernetes
    • Container orchestration
    • Builds on Docker containers
    • Multiple cloud and bare-metal environments
    • Supports existing OSS apps
    • Inspired and informed by Google’s experiences and internal systems
    • 100% open source, written in Go
    • Lets users manage applications, not machines

Primary concepts

  • Container: A sealed application package (Docker)
  • Pod: A small group of tightly coupled Containers
  • Labels: Identifying metadata attached to objects
  • Selector: A query against labels, producing a set result
  • Controller: A reconciliation loop that drives current state towards desired state
  • Service: A set of pods that work together

Pod

  • a Kubernetes abstraction that represents a group of one or more application containers, and some shared resources for those containers
    • Shared storage, as Volumes
    • Networking, as a unique cluster IP address
    • Information about how to run each container, such as the container image version or specific ports to use
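
As a rough illustration of what a pod bundles together, the sketch below models it as plain Python data classes. The class names, field names, image names, and IP address are all hypothetical and are not the real Kubernetes API types; they only mirror the three bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    image: str                                   # image name plus version tag
    ports: list = field(default_factory=list)    # ports this container uses


@dataclass
class Pod:
    name: str
    containers: list                             # one or more Containers
    volumes: list = field(default_factory=list)  # storage shared by all containers
    ip: str = ""                                 # one cluster IP for the whole pod


# Hypothetical pod: an app container plus a helper that ships its logs.
web = Pod(
    name="web",
    containers=[
        Container("app", "example/app:1.0", [8080]),
        Container("log-shipper", "example/shipper:2.3"),
    ],
    volumes=["logs"],
    ip="10.0.0.12",
)
```

Both containers see the same `logs` volume and the same IP, which is what makes them "tightly coupled".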

Node

  • A node is a worker machine (either VM or physical machine)
  • Each pod runs on exactly one node; a node can run multiple pods
  • Nodes are managed by the control plane

Persistent Volumes

  • A higher-level abstraction – insulation from any one cloud environment
  • Admins provision them, users claim them
  • Independent lifetime and fate
  • Can be handed off between pods and live until the user is done with them
  • Dynamically “scheduled” and managed, like nodes and pods
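
The provision-then-claim split can be sketched as a small matching function. This is only an illustration: the `Volume` class, the `bind` helper, and the "smallest volume that fits" policy are assumptions made for the example, not the actual Kubernetes binding algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    name: str
    capacity_gib: int
    claimed_by: Optional[str] = None   # None means the volume is still free


def bind(claim_name, request_gib, volumes):
    """Bind a claim to the smallest free volume that is large enough."""
    candidates = [
        v for v in volumes
        if v.claimed_by is None and v.capacity_gib >= request_gib
    ]
    if not candidates:
        return None                     # the claim stays unbound
    best = min(candidates, key=lambda v: v.capacity_gib)
    best.claimed_by = claim_name
    return best


# The admin provisions a pool; a user claims 30 GiB from it.
pool = [Volume("pv-a", 100), Volume("pv-b", 20), Volume("pv-c", 50)]
bound = bind("db-data", 30, pool)
```

The volume stays bound to the claim rather than to any pod, which is why it can outlive the pods that mount it.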

Labels

  • Arbitrary metadata
  • Attached to any API object
  • Generally represent identity
  • Queryable by selectors
  • The only grouping mechanism
  • Used to determine which objects an operation applies to
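
A selector is essentially a query over these key/value pairs. The sketch below shows the equality-based case only; the function names and the sample objects are made up for the example.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector, objects):
    """Return the names of all objects whose labels satisfy the selector."""
    return [name for name, labels in objects.items() if matches(selector, labels)]


# Hypothetical objects and their labels.
objects = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "db-1": {"app": "db", "tier": "backend"},
}

frontend = select({"tier": "frontend"}, objects)
```

Because grouping is just a query, the same object can belong to several overlapping groups without any of them being stored explicitly.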

Pod lifecycle

  • Once scheduled to a node, pods do not move
  • Pods can be observed pending, running, succeeded, or failed
  • Pods are not rescheduled by the scheduler or apiserver
  • Apps should be designed with these rules in mind
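
The observable phases can be read as a small state machine. The transition table below is a simplified assumption for illustration (for instance, it omits any "unknown" state); only the four phase names come from the list above.

```python
# Which phases may follow each phase; terminal phases have no successors.
ALLOWED = {
    "Pending": {"Running", "Failed"},
    "Running": {"Succeeded", "Failed"},
    "Succeeded": set(),
    "Failed": set(),
}


def advance(current, new):
    """Move a pod to a new phase, rejecting transitions the table forbids."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


phase = "Pending"
phase = advance(phase, "Running")
phase = advance(phase, "Succeeded")
```

A finished pod never goes back to running; if the work must continue, something else has to create a replacement pod.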

Internals

kube-apiserver

  • Provides a forward-facing REST interface into the Kubernetes control plane and datastore
  • All clients and other applications interact with Kubernetes strictly through the API server
  • Acts as the gatekeeper to the cluster by handling authentication and authorization, request validation, mutation, and admission control in addition to being the front-end to the backing datastore
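
The gatekeeping stages can be pictured as a pipeline that a request must pass before anything is written to the datastore. Everything in this sketch (the `handle` function, the token and permission tables, the defaulting rule) is hypothetical and much simpler than the real API server; it is only meant to show the ordering of the checks.

```python
class Rejected(Exception):
    """Raised when a request fails one of the gatekeeping stages."""


def handle(request, tokens, permissions, store):
    # 1. Authentication: who is calling?
    user = tokens.get(request["token"])
    if user is None:
        raise Rejected("unauthenticated")
    # 2. Authorization: may this user perform this verb?
    if request["verb"] not in permissions.get(user, set()):
        raise Rejected("forbidden")
    # 3. Mutation: fill in defaults before the object is checked.
    obj = dict(request["object"])
    obj.setdefault("namespace", "default")
    # 4. Validation: reject malformed objects.
    if not obj.get("name"):
        raise Rejected("invalid: name is required")
    # 5. Only now does the object reach the backing datastore.
    store[(obj["namespace"], obj["name"])] = obj
    return obj


tokens = {"secret-token": "alice"}
permissions = {"alice": {"create"}}
store = {}
created = handle(
    {"token": "secret-token", "verb": "create", "object": {"name": "web"}},
    tokens, permissions, store,
)
```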

kube-controller-manager

  • Monitors the cluster state via the apiserver and steers the cluster towards the desired state
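
"Steering towards the desired state" is a reconciliation loop: compare what should exist with what does exist and emit the actions that close the gap. The sketch below shows one pass of such a loop for a replica count; the function name and the action tuples are assumptions made for the example.

```python
def reconcile(desired_replicas, running):
    """Return the actions needed to move from the running set to the desired count."""
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few pods: ask for the missing ones to be created.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: delete the surplus (picked in name order here).
        return [("delete", name) for name in sorted(running)[:-diff]]
    return []  # current state already matches desired state


scale_up = reconcile(3, {"web-a"})
scale_down = reconcile(1, {"web-a", "web-b", "web-c"})
```

A real controller would run this repeatedly, re-reading the cluster state from the apiserver each time, so a failed action is simply retried on the next pass.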

kube-scheduler

  • Component on the master that watches for newly created pods that have no node assigned, and selects a node for them to run on
  • Factors taken into account for scheduling decisions include individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference and deadlines
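
A scheduling decision is commonly described as two steps: filter out nodes that cannot run the pod, then score the remaining ones. The sketch below considers only CPU and memory and scores by remaining free CPU; both simplifications, and all names in it, are assumptions for illustration rather than the actual kube-scheduler logic.

```python
def schedule(pod, nodes):
    """Pick a node for the pod, or None if no node has enough resources."""
    # Filter: keep only nodes that satisfy the pod's resource requirements.
    feasible = [
        n for n in nodes
        if n["free_cpu"] >= pod["cpu"] and n["free_mem"] >= pod["mem"]
    ]
    if not feasible:
        return None  # the pod stays unscheduled
    # Score: prefer the node with the most CPU left after placement.
    best = max(feasible, key=lambda n: n["free_cpu"] - pod["cpu"])
    return best["name"]


# Hypothetical nodes (CPU in cores, memory in GiB).
nodes = [
    {"name": "n1", "free_cpu": 2, "free_mem": 4},
    {"name": "n2", "free_cpu": 8, "free_mem": 16},
    {"name": "n3", "free_cpu": 16, "free_mem": 1},
    {"name": "n4", "free_cpu": 6, "free_mem": 8},
]
chosen = schedule({"cpu": 4, "mem": 2}, nodes)
```

Constraints such as affinity or data locality would appear as extra filter predicates or extra terms in the score.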

cloud-controller-manager

  • Node Controller: For checking the cloud provider to determine whether a node has been deleted in the cloud after it stops responding
  • Route Controller: For setting up routes in the underlying cloud infrastructure
  • Service Controller: For creating, updating, and deleting cloud provider load balancers
  • Volume Controller: For creating, attaching, and mounting volumes, and interacting with the cloud provider to orchestrate volumes

etcd

  • etcd: an atomic key-value store that uses Raft consensus
  • Backing store for all control plane metadata
  • Provides a strongly consistent and highly available key-value store for persisting cluster state
  • Stores objects and config information
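
One way to picture why an atomic store matters for cluster state is a conditional write: an update succeeds only if the key has not changed since the writer last read it. The class below is a single-process sketch of that idea under assumed names; it is not etcd's API and does not model Raft or replication.

```python
class VersionedStore:
    """In-memory key-value store where every key carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        # A key that was never written reports version 0.
        return self._data.get(key, (None, 0))

    def put_if_version(self, key, value, expected_version):
        """Write only if the key is still at expected_version."""
        _, current = self.get(key)
        if current != expected_version:
            return False  # someone else updated the key first
        self._data[key] = (value, current + 1)
        return True


kv = VersionedStore()
first = kv.put_if_version("/pods/web", "spec-v1", 0)
stale = kv.put_if_version("/pods/web", "spec-stale", 0)
```

The second write is refused because it was based on an outdated view, so two control plane components cannot silently overwrite each other.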

Node Components

kubelet

  • An agent that runs on each node in the cluster. It makes sure that containers are running in a pod
  • The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy
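
The kubelet's job of keeping PodSpecs "running and healthy" can be sketched as a sync pass over the containers it is supposed to run. The data layout and the action names here are assumptions made for the example, not the kubelet's real interfaces.

```python
def sync(pod_specs, container_states):
    """Compare assigned PodSpecs with observed container states.

    pod_specs:        pod name -> list of container names that should run
    container_states: (pod, container) -> "healthy" or "unhealthy"
    """
    actions = []
    for pod, containers in pod_specs.items():
        for container in containers:
            state = container_states.get((pod, container))
            if state is None:
                actions.append(("start", pod, container))    # not running yet
            elif state != "healthy":
                actions.append(("restart", pod, container))  # running but unhealthy
    return actions


specs = {"web": ["app", "log-shipper"], "db": ["postgres"]}
observed = {("web", "app"): "healthy", ("db", "postgres"): "unhealthy"}
todo = sync(specs, observed)
```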

kube-proxy

  • Manages the network rules on each node
  • Performs connection forwarding or load balancing for Kubernetes cluster services
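
Load balancing for a service amounts to spreading incoming connections over the pods behind it. The sketch below uses simple round-robin selection, which is an assumption chosen for clarity; the class name and addresses are hypothetical, and the real kube-proxy programs kernel network rules rather than picking backends in Python.

```python
import itertools


class ServiceProxy:
    """Hand out backend endpoints for a service in round-robin order."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        # Each new connection goes to the next endpoint in the rotation.
        return next(self._cycle)


proxy = ServiceProxy(["10.0.0.1:80", "10.0.0.2:80"])
picks = [proxy.pick() for _ in range(3)]
```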

gVisor

“Containers do not contain” – Dan Walsh

  • Containers still share the same kernel
  • They share the same device drivers
  • Linux kernel represents a large attack surface
  • cgroup accounting may not be accurate

Are System Calls Secure?

  • The interface between containers and the OS is the system call interface
  • Linux x86_64 has 319 64-bit syscalls
  • 2046 CVEs since 1999

Why can VMs be More Secure?

  • Virtual machines
    • Independent guest kernels
    • Virtual hardware interface: clear privilege separation and state encapsulation
    • But the virtualized hardware interface is inflexible, and VMs are heavyweight with a large memory footprint

Sandboxing

  • Rule-based sandboxing: reduce the attack surface by restricting what applications can access
    • e.g., AppArmor, SELinux, seccomp-bpf
    • Rules can be fragile (they may not properly capture threats) and can’t prevent side channel attacks
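
The idea behind rule-based sandboxing can be shown with a toy syscall allowlist. This is a conceptual sketch only: the rule set and the `filter_syscall` function are made up, and real filters such as seccomp-bpf run inside the kernel, not in Python.

```python
# Hypothetical policy: the only system calls this application may make.
ALLOWED_SYSCALLS = {"read", "write", "exit", "futex"}


def filter_syscall(name):
    """Decide what happens to a system call under the allowlist."""
    return "ALLOW" if name in ALLOWED_SYSCALLS else "KILL"


# A trace of calls made by the sandboxed application.
trace = ["read", "write", "ptrace", "exit"]
decisions = [(name, filter_syscall(name)) for name in trace]
```

The fragility is visible even here: an allowed call such as `read` still reaches the shared host kernel, so a kernel bug behind it stays exposed, while forgetting a call the application legitimately needs breaks the application.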

gVisor

  • Sandboxes untrusted applications
  • Implements the Linux system call API in user space
  • Secure by default
  • Written in Go, a memory/type-safe language

gVisor Architecture

  • Two separate processes (communicating through IPC)
    • Sentry: emulates Linux system calls in user space
    • Gofer: file access
  • Most exploited syscalls: socket and open
    • Even if the Sentry is compromised, it still can’t access files or open ports directly
  • Network is handled by user-mode network stack in Sentry
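
The split between the two processes can be sketched with two small classes. This is a conceptual model under loud assumptions: in gVisor the Sentry and Gofer are separate processes talking over IPC, whereas here they are ordinary Python objects, and the class methods, syscall names, and file contents are invented for the example.

```python
class Gofer:
    """Holds the only access to files; serves just what was exported."""

    def __init__(self, exported_files):
        self._files = exported_files  # path -> bytes visible to the sandbox

    def read_file(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]


class Sentry:
    """Emulates system calls itself and forwards file access to the Gofer."""

    def __init__(self, gofer):
        self._gofer = gofer

    def syscall(self, name, *args):
        if name == "getpid":
            return 1                              # answered entirely in user space
        if name == "read_file":
            return self._gofer.read_file(*args)   # no direct host file access
        raise NotImplementedError(name)           # unsupported calls never reach the host


gofer = Gofer({"/etc/hostname": b"sandbox\n"})
sentry = Sentry(gofer)
hostname = sentry.syscall("read_file", "/etc/hostname")
```

Even a fully compromised `Sentry` in this model can only ask the `Gofer` for files, and the `Gofer` decides what exists.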

Trapping System Calls

  • Two modes supported
    • ptrace
    • KVM

Reference

Kubernetes documentation (Korean): https://kubernetes.io/ko/docs/concepts/overview/what-is-kubernetes/
