\chapter{Introduction} \input{abstract.tex} \begin{description} \item[Straightforward:] Virtio devices use normal bus mechanisms of interrupts and DMA which should be familiar to any device driver author. There is no exotic page-flipping or COW mechanism: it's just a normal device.\footnote{This lack of page-sharing implies that the implementation of the device (e.g. the hypervisor or host) needs full access to the guest memory. Communication with untrusted parties (i.e. inter-guest communication) requires copying. } \item[Efficient:] Virtio devices consist of rings of descriptors for both input and output, which are neatly laid out to avoid cache effects from both driver and device writing to the same cache lines. \item[Standard:] Virtio makes no assumptions about the environment in which it operates, beyond supporting the bus to which device is attached. In this specification, virtio devices are implemented over MMIO, Channel I/O and PCI bus transports \footnote{The Linux implementation further separates the virtio transport code from the specific virtio drivers: these drivers are shared between different transports. }, earlier drafts have been implemented on other buses not included here. \item[Extensible:] Virtio devices contain feature bits which are acknowledged by the guest operating system during device setup. This allows forwards and backwards compatibility: the device offers all the features it knows about, and the driver acknowledges those it understands and wishes to use. \end{description} \section{Normative References} \begin{longtable}{l p{5in}} \phantomsection\label{intro:rfc2119}\textbf{[RFC2119]} & Bradner S., ``Key words for use in RFCs to Indicate Requirement Levels'', BCP 14, RFC 2119, March 1997. \newline\url{http://www.ietf.org/rfc/rfc2119.txt}\\ \phantomsection\label{intro:S390 PoP}\textbf{[S390 PoP]} & z/Architecture Principles of Operation, IBM Publication SA22-7832, \newline\url{http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr009.pdf}, and any future revisions\\ \phantomsection\label{intro:S390 Common I/O}\textbf{[S390 Common I/O]} & ESA/390 Common I/O-Device and Self-Description, IBM Publication SA22-7204, \newline\url{http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar501/CCONTENTS}, and any future revisions\\ \phantomsection\label{intro:PCI}\textbf{[PCI]} & Conventional PCI Specifications, \newline\url{http://www.pcisig.com/specifications/conventional/}, PCI-SIG\\ \phantomsection\label{intro:PCIe}\textbf{[PCIe]} & PCI Express Specifications \newline\url{http://www.pcisig.com/specifications/pciexpress/}, PCI-SIG\\ \phantomsection\label{intro:IEEE 802}\textbf{[IEEE 802]} & IEEE Standard for Local and Metropolitan Area Networks: Overview and Architecture, \newline\url{http://standards.ieee.org/about/get/802/802.html}, IEEE\\ \phantomsection\label{intro:SAM}\textbf{[SAM]} & SCSI Architectural Model, \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=sam4r05.pdf}\\ \phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} & SCSI Multimedia Commands, \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\ \end{longtable} \section{Non-Normative References} \begin{longtable}{l p{5in}} \phantomsection\label{intro:Virtio PCI Draft}\textbf{[Virtio PCI Draft]} & Virtio PCI Draft Specification \newline\url{http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf}\\ \end{longtable} \section{Terminology}\label{Terminology} The key words ``MUST'', ``MUST NOT'', ``REQUIRED'', ``SHALL'', ``SHALL NOT'', ``SHOULD'', ``SHOULD NOT'', ``RECOMMENDED'', ``MAY'', and ``OPTIONAL'' in this document are to be interpreted as described in \hyperref[intro:rfc2119]{[RFC2119]}. \subsection{Legacy Interface: Terminology}\label{intro:Legacy Interface: Terminology} Earlier drafts of this specification (i.e. revisions before 1.0, see e.g. \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}) defined a similar, but different interface between the driver and the device. Since these are widely deployed, this specification accommodates OPTIONAL features to simplify transition from these earlier draft interfaces. Specifically devices and drivers MAY support: \begin{description} \item[Legacy Interface] is an interface specified by an earlier draft of this specification (before 1.0) \item[Legacy Device] is a device implemented before this specification was released, and implementing a legacy interface on the host side \item[Legacy Driver] is a driver implemented before this specification was released, and implementing a legacy interface on the guest side \end{description} Legacy devices and legacy drivers are not compliant with this specification. To simplify transition from these earlier draft interfaces, a device MAY implement: \begin{description} \item[Transitional Device] a device supporting both drivers conforming to this specification, and allowing legacy drivers. \end{description} Similarly, a driver MAY implement: \begin{description} \item[Transitional Driver] a driver supporting both devices conforming to this specification, and legacy devices. \end{description} \begin{note} Legacy interfaces are not required; ie. don't implement them unless you have a need for backwards compatibility! \end{note} Devices or drivers with no legacy compatibility are referred to as non-transitional devices and drivers, respectively. \subsection{Transition from earlier specification drafts}\label{sec:Transition from earlier specification drafts} For devices and drivers already implementing the legacy interface, some changes will have to be made to support this specification. In this case, it might be beneficial for the reader to focus on sections tagged "Legacy Interface" in the section title. These highlight the changes made since the earlier drafts. \section{Structure Specifications} Many device and driver in-memory structure layouts are documented using the C struct syntax. All structures are assumed to be without additional padding. To stress this, cases where common C compilers are known to insert extra padding within structures are tagged using the GNU C __attribute__((packed)) syntax. For the integer data types used in the structure definitions, the following conventions are used: \begin{description} \item[u8, u16, u32, u64] An unsigned integer of the specified length in bits. \item[le16, le32, le64] An unsigned integer of the specified length in bits, in little-endian byte order. \item[be16, be32, be64] An unsigned integer of the specified length in bits, in big-endian byte order. \end{description} Some of the fields to be defined in this specification don't start or don't end on a byte boundary. Such fields are called bit-fields. A set of bit-fields is always a sub-division of an integer typed field. Bit-fields within integer fields are always listed in order, from the least significant to the most significant bit. The bit-fields are considered unsigned integers of the specified width with the next in significance relationship of the bits preserved. For example: \begin{lstlisting} struct S { be16 { A : 15; B : 1; } x; be16 y; }; \end{lstlisting} documents the value A stored in the low 15 bit of \field{x} and the value B stored in the high bit of \field{x}, the 16-bit integer \field{x} in turn stored using the big-endian byte order at the beginning of the structure S, and being followed immediately by an unsigned integer \field{y} stored in big-endian byte order at an offset of 2 bytes (16 bits) from the beginning of the structure. Note that this notation somewhat resembles the C bitfield syntax but should not be naively converted to a bitfield notation for portable code: it matches the way bitfields are packed by C compilers on little-endian architectures but not the way bitfields are packed by C compilers on big-endian architectures. Assuming that CPU_TO_BE16 converts a 16-bit integer from a native CPU to the big-endian byte order, the following is the equivalent portable C code to generate a value to be stored into \field{x}: \begin{lstlisting} CPU_TO_BE16(B << 15 | A) \end{lstlisting} \newpage