Diffstat (limited to 'content.tex')
-rw-r--r-- content.tex 5887
1 files changed, 5887 insertions, 0 deletions
diff --git a/content.tex b/content.tex
new file mode 100644
index 0000000..e57ebc5
--- /dev/null
+++ b/content.tex
@@ -0,0 +1,5887 @@
+\chapter{Basic Facilities of a Virtio Device}\label{sec:Basic Facilities of a Virtio Device}
+
+A virtio device is discovered and identified by a bus-specific method
+(see the bus specific sections: \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}~\nameref{sec:Virtio Transport Options / Virtio Over PCI Bus},
+\ref{sec:Virtio Transport Options / Virtio Over MMIO}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO} and \ref{sec:Virtio Transport Options / Virtio Over Channel I/O}~\nameref{sec:Virtio Transport Options / Virtio Over Channel I/O}). Each
+device consists of the following parts:
+
+\begin{itemize}
+\item Device status field
+\item Feature bits
+\item Device Configuration space
+\item One or more virtqueues
+\end{itemize}
+
+\section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Device / Device Status Field}
+During device initialization by a driver,
+the driver follows the sequence of steps specified in
+\ref{sec:General Initialization And Device Operation / Device
+Initialization}.
+
+The \field{device status} field provides a simple low-level
+indication of the completed steps of this sequence.
+It's most useful to imagine it hooked up to traffic
+lights on the console indicating the status of each device. The
+following bits are defined (listed below in the order in which
+they would be typically set):
+\begin{description}
+\item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the
+ device and recognized it as a valid virtio device.
+
+\item[DRIVER (2)] Indicates that the guest OS knows how to drive the
+ device.
+ \begin{note}
+ There could be a significant (or infinite) delay before setting
+ this bit. For example, under Linux, drivers can be loadable modules.
+ \end{note}
+
+\item[FAILED (128)] Indicates that something went wrong in the guest,
+ and it has given up on the device. This could be an internal
+ error, or the driver didn't like the device for some reason, or
+ even a fatal error during device operation.
+
+\item[FEATURES_OK (8)] Indicates that the driver has acknowledged all the
+ features it understands, and feature negotiation is complete.
+
+\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
+ drive the device.
+
+\item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
+ an error from which it can't recover.
+\end{description}
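+
+As an illustration only, the bit values above can be written as C
+constants together with a small check. The VIRTIO_STATUS_* names, the
+8-bit field width and the status_is_live helper are assumptions made for
+this sketch, not definitions from this specification:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* Bit values as listed above; the macro names are hypothetical. */
+#define VIRTIO_STATUS_ACKNOWLEDGE        1
+#define VIRTIO_STATUS_DRIVER             2
+#define VIRTIO_STATUS_DRIVER_OK          4
+#define VIRTIO_STATUS_FEATURES_OK        8
+#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64
+#define VIRTIO_STATUS_FAILED             128
+
+/* Hypothetical helper: nonzero if DRIVER_OK is set and neither
+ * FAILED nor DEVICE_NEEDS_RESET is set. */
+static inline int status_is_live(uint8_t status)
+{
+    if (status & (VIRTIO_STATUS_FAILED | VIRTIO_STATUS_DEVICE_NEEDS_RESET))
+        return 0;
+    return (status & VIRTIO_STATUS_DRIVER_OK) != 0;
+}
+\end{lstlisting}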
+
+\drivernormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
+The driver MUST update \field{device status},
+setting bits to indicate the completed steps of the driver
+initialization sequence specified in
+\ref{sec:General Initialization And Device Operation / Device
+Initialization}.
+The driver MUST NOT clear a
+\field{device status} bit. If the driver sets the FAILED bit,
+the driver MUST later reset the device before attempting to re-initialize.
+
+The driver SHOULD NOT rely on completion of operations of a
+device if DEVICE_NEEDS_RESET is set.
+\begin{note}
+For example, the driver can't assume requests in flight will be
+completed if DEVICE_NEEDS_RESET is set, nor can it assume that
+they have not been completed. A good implementation will try to
+recover by issuing a reset.
+\end{note}
+
+\devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
+The device MUST initialize \field{device status} to 0 upon reset.
+
+The device MUST NOT consume buffers or notify the driver before DRIVER_OK.
+
+\label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
+that requires a reset. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
+MUST send a device configuration change notification to the driver.
+
+\section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
+
+Each virtio device offers all the features it understands. During
+device initialization, the driver reads the offered features and tells the device the
+subset that it accepts. The only way to renegotiate is to reset
+the device.
+
+This allows for forwards and backwards compatibility: if the device is
+enhanced with a new feature bit, older drivers will not write that
+feature bit back to the device. Similarly, if a driver is enhanced with a feature
+that the device doesn't support, it sees that the new feature is not offered.
+
+Feature bits are allocated as follows:
+
+\begin{description}
+\item[0 to 23] Feature bits for the specific device type
+
+\item[24 to 33] Feature bits reserved for extensions to the queue and
+ feature negotiation mechanisms
+
+\item[34 and above] Feature bits reserved for future extensions.
+\end{description}
+
+\begin{note}
+For example, feature bit 0 for a network device (i.e.
+Device ID 1) indicates that the device supports checksumming of
+packets.
+\end{note}
+
+In particular, new fields in the device configuration space are
+indicated by offering a new feature bit.
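+
+The negotiation described above amounts to a bitwise intersection of
+what the device offers and what the driver understands. A minimal
+sketch, assuming a 64-bit mask (which covers feature bits 0 to 63 only)
+and hypothetical helper names:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* The subset the driver writes back: offered by the device AND
+ * understood by the driver. Never contains a bit that was not offered. */
+static inline uint64_t negotiate_features(uint64_t offered, uint64_t understood)
+{
+    return offered & understood;
+}
+
+/* Nonzero if feature bit number 'bit' is set in the accepted mask. */
+static inline int feature_accepted(uint64_t accepted, unsigned int bit)
+{
+    return bit < 64 && ((accepted >> bit) & 1);
+}
+\end{lstlisting}
+
+An older driver that does not know a newly added bit leaves it out of
+\texttt{understood}, so the bit is never written back to the device.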
+
+\drivernormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio Device / Feature Bits}
+The driver MUST NOT accept a feature which the device did not offer,
+and MUST NOT accept a feature which requires another feature which was
+not accepted.
+
+The driver SHOULD go into backwards compatibility mode
+if the device does not offer a feature it understands, otherwise MUST
+set the FAILED \field{device status} bit and cease initialization.
+
+\devicenormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio Device / Feature Bits}
+The device MUST NOT offer a feature which requires another feature
+which was not offered. The device SHOULD accept any valid subset
+of features the driver accepts, otherwise it MUST fail to set the
+FEATURES_OK \field{device status} bit when the driver writes it.
+
+\subsection{Legacy Interface: A Note on Feature
+Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
+Bits / Legacy Interface: A Note on Feature Bits}
+
+Transitional Drivers MUST detect Legacy Devices by detecting that
+the feature bit VIRTIO_F_VERSION_1 is not offered.
+Transitional devices MUST detect Legacy drivers by detecting that
+VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
+
+In this case the device is used through the legacy interface.
+
+Legacy interface support is OPTIONAL.
+Thus, both transitional and non-transitional devices and
+drivers are compliant with this specification.
+
+Requirements pertaining to transitional devices and drivers
+are contained in sections named 'Legacy Interface' like this one.
+
+When the device is used through the legacy interface, transitional
+devices and transitional drivers MUST operate according to the
+requirements documented within these legacy interface sections.
+Specification text within these sections generally does not apply
+to non-transitional devices.
+
+\section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
+
+Device configuration space is generally used for rarely-changing or
+initialization-time parameters. Where configuration fields are
+optional, their existence is indicated by feature bits. Future
+versions of this specification will likely extend the device
+configuration space by adding extra fields at the tail.
+
+\begin{note}
+The device configuration space uses the little-endian format
+for multi-byte fields.
+\end{note}
+
+Each transport also provides a generation count for the device configuration
+space, which will change whenever there is a possibility that two
+accesses to the device configuration space can see different versions of that
+space.
+
+\drivernormative{\subsection}{Device Configuration Space}{Basic Facilities of a Virtio Device / Device Configuration Space}
+Drivers MUST NOT assume that reads from
+fields greater than 32 bits wide are atomic, nor that reads from
+multiple fields are atomic. Drivers SHOULD read device configuration space fields like so:
+
+\begin{lstlisting}
+u32 before, after;
+do {
+ before = get_config_generation(device);
+ // read config entry/entries.
+ after = get_config_generation(device);
+} while (after != before);
+\end{lstlisting}
+
+For optional configuration space fields, the driver MUST check that the
+corresponding feature is offered before accessing that part of the configuration
+space.
+\begin{note}
+See section \ref{sec:General Initialization And Device Operation / Device Initialization} for details on feature negotiation.
+\end{note}
+
+Drivers MUST
+NOT limit structure size and device configuration space size. Instead,
+drivers SHOULD only check that device configuration space is {\em large enough} to
+contain the fields necessary for device operation.
+
+\begin{note}
+For example, if the specification states that device configuration
+space 'includes a single 8-bit field' drivers should understand this to mean that
+the device configuration space might also include an arbitrary amount of
+tail padding, and accept any device configuration space size equal to or
+greater than the specified 8-bit size.
+\end{note}
+
+\devicenormative{\subsection}{Device Configuration Space}{Basic Facilities of a Virtio Device / Device Configuration Space}
+The device MUST allow reading of any device-specific configuration
+field before FEATURES_OK is set by the driver. This includes fields which are
+conditional on feature bits, as long as those feature bits are offered
+by the device.
+
+\subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness}
+
+Note that for legacy interfaces, device configuration space is generally the
+guest's native endian, rather than PCI's little-endian.
+The correct endian-ness is documented for each device.
+
+\subsection{Legacy Interface: Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space}
+
+Legacy devices did not have a configuration generation field, thus are
+susceptible to race conditions if configuration is updated. This
+affects the block \field{capacity} (see \ref{sec:Device Types /
+Block Device / Device configuration layout}) and
+network \field{mac} (see \ref{sec:Device Types / Network Device /
+Device configuration layout}) fields;
+when using the legacy interface, drivers SHOULD
+read these fields multiple times until two reads generate a consistent
+result.
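+
+One way to read a field until two reads agree is sketched below. The
+callback type, the retry limit and the replay_source stand-in (which
+replays canned values in place of real configuration space accesses)
+are all assumptions made for illustration:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* Hypothetical accessor: reads one field from configuration space. */
+typedef uint64_t (*read_field_fn)(void *ctx);
+
+/* Re-read until two consecutive reads agree, using at most max_tries
+ * reads. Returns 0 and stores the value on success, -1 otherwise. */
+static int read_stable(read_field_fn read_field, void *ctx,
+                       unsigned int max_tries, uint64_t *out)
+{
+    uint64_t prev, cur;
+    unsigned int i;
+
+    if (max_tries < 2)
+        return -1;
+    prev = read_field(ctx);
+    for (i = 1; i < max_tries; i++) {
+        cur = read_field(ctx);
+        if (cur == prev) {
+            *out = cur;
+            return 0;
+        }
+        prev = cur;
+    }
+    return -1;
+}
+
+/* Stand-in data source for experimenting: replays an array of values,
+ * repeating the last one once the array is exhausted. */
+struct replay_source {
+    const uint64_t *values;
+    unsigned int count;
+    unsigned int pos;
+};
+
+static uint64_t replay_read(void *ctx)
+{
+    struct replay_source *s = ctx;
+    uint64_t v = s->values[s->pos];
+
+    if (s->pos + 1 < s->count)
+        s->pos++;
+    return v;
+}
+\end{lstlisting}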
+
+\section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
+
+The mechanism for bulk data transport on virtio devices is
+pretentiously called a virtqueue. Each device can have zero or more
+virtqueues\footnote{For example, the simplest network device has one virtqueue for
+transmit and one for receive.}. Each queue has a 16-bit queue size
+parameter, which sets the number of entries and implies the total size
+of the queue.
+
+Each virtqueue consists of three parts:
+
+\begin{itemize}
+\item Descriptor Table
+\item Available Ring
+\item Used Ring
+\end{itemize}
+
+where each part is physically-contiguous in guest memory,
+and has different alignment requirements.
+
+The memory alignment and size requirements, in bytes, of each part of the
+virtqueue are summarized in the following table:
+
+\begin{tabular}{|l|l|l|}
+\hline
+Virtqueue Part & Alignment & Size \\
+\hline \hline
+Descriptor Table & 16 & $16 * $(Queue Size) \\
+\hline
+Available Ring & 2 & $6 + 2 * $(Queue Size) \\
+ \hline
+Used Ring & 4 & $6 + 8 * $(Queue Size) \\
+ \hline
+\end{tabular}
+
+The Alignment column gives the minimum alignment for each part
+of the virtqueue.
+
+The Size column gives the total number of bytes for each
+part of the virtqueue.
+
+Queue Size corresponds to the maximum number of buffers in the
+virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers
+can be queued at any given time.}. Queue Size value is always a
+power of 2. The maximum Queue Size value is 32768. This value
+is specified in a bus-specific way.
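+
+The table and the Queue Size constraints above can be expressed
+directly in C. This is a sketch with hypothetical function names; it
+treats a Queue Size of 0 as invalid, which is an assumption of the
+example:
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Queue Size is a power of 2 and at most 32768. */
+static inline int queue_size_valid(uint32_t qsz)
+{
+    return qsz != 0 && qsz <= 32768 && (qsz & (qsz - 1)) == 0;
+}
+
+/* Sizes in bytes, taken from the Size column of the table. */
+static inline size_t desc_table_bytes(uint32_t qsz) { return 16 * (size_t)qsz; }
+static inline size_t avail_ring_bytes(uint32_t qsz) { return 6 + 2 * (size_t)qsz; }
+static inline size_t used_ring_bytes(uint32_t qsz)  { return 6 + 8 * (size_t)qsz; }
+\end{lstlisting}
+
+For a Queue Size of 256 this gives 4096 bytes for the descriptor table,
+518 bytes for the available ring and 2054 bytes for the used ring.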
+
+When the driver wants to send a buffer to the device, it fills in
+a slot in the descriptor table (or chains several together), and
+writes the descriptor index into the available ring. It then
+notifies the device. When the device has finished a buffer, it
+writes the descriptor index into the used ring, and sends an interrupt.
+
+\drivernormative{\subsection}{Virtqueues}{Basic Facilities of a Virtio Device / Virtqueues}
+The driver MUST ensure that the physical address of the first byte
+of each virtqueue part is a multiple of the specified alignment value
+in the above table.
+
+\subsection{Legacy Interfaces: A Note on Virtqueue Layout}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}
+
+For Legacy Interfaces, several additional
+restrictions are placed on the virtqueue layout:
+
+Each virtqueue occupies two or more physically-contiguous pages
+(usually defined as 4096 bytes, but depending on the transport;
+henceforth referred to as Queue Align)
+and consists of three parts:
+
+\begin{tabular}{|l|l|l|}
+\hline
+Descriptor Table & Available Ring (\ldots padding\ldots) & Used Ring \\
+\hline
+\end{tabular}
+
+The bus-specific Queue Size field controls the total number of bytes
+for the virtqueue.
+When using the legacy interface, the transitional
+driver MUST retrieve the Queue Size field from the device
+and MUST allocate the total number of bytes for the virtqueue
+according to the following formula (Queue Align given in qalign and
+Queue Size given in qsz):
+
+\begin{lstlisting}
+#define ALIGN(x) (((x) + qalign - 1) & ~(qalign - 1))
+static inline unsigned virtq_size(unsigned int qsz)
+{
+ return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
+ + ALIGN(sizeof(u16)*3 + sizeof(struct virtq_used_elem)*qsz);
+}
+\end{lstlisting}
+
+This wastes some space with padding.
+When using the legacy interface, both transitional
+devices and drivers MUST use the following virtqueue layout
+structure to locate elements of the virtqueue:
+
+\begin{lstlisting}
+struct virtq {
+ // The actual descriptors (16 bytes each)
+ struct virtq_desc desc[ Queue Size ];
+
+ // A ring of available descriptor heads with free-running index.
+ struct virtq_avail avail;
+
+ // Padding to the next Queue Align boundary.
+ u8 pad[ Padding ];
+
+ // A ring of used descriptor heads with free-running index.
+ struct virtq_used used;
+};
+\end{lstlisting}
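+
+As a worked example of the size formula and layout above, the sketch
+below writes the structure sizes out numerically (16-byte descriptors,
+2-byte ring entries, 8-byte used elements). It assumes that alignment
+means rounding up to the next multiple of Queue Align and that Queue
+Align is a power of 2; the function names are hypothetical:
+
+\begin{lstlisting}
+#include <stddef.h>
+
+/* Round x up to a multiple of align; align is assumed to be a power of 2. */
+static inline size_t align_up(size_t x, size_t align)
+{
+    return (x + align - 1) & ~(align - 1);
+}
+
+/* Byte offset of the used ring from the start of the virtqueue:
+ * descriptor table plus available ring, padded to Queue Align. */
+static inline size_t legacy_used_offset(size_t qsz, size_t qalign)
+{
+    return align_up(16 * qsz + 2 * (3 + qsz), qalign);
+}
+
+/* Total number of bytes to allocate for the virtqueue. */
+static inline size_t legacy_virtq_bytes(size_t qsz, size_t qalign)
+{
+    return legacy_used_offset(qsz, qalign) + align_up(2 * 3 + 8 * qsz, qalign);
+}
+\end{lstlisting}
+
+With a Queue Size of 256 and a Queue Align of 4096, the descriptor table
+and available ring take 4096 + 518 = 4614 bytes, padded to 8192; the
+used ring takes 2054 bytes, padded to 4096; the total is 12288 bytes.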
+
+\subsection{Legacy Interfaces: A Note on Virtqueue Endianness}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Endianness}
+
+Note that when using the legacy interface, transitional
+devices and drivers MUST use the native
+endian of the guest as the endian of fields in the virtqueue.
+This is opposed to little-endian for non-legacy interface as
+specified by this standard.
+It is assumed that the host is already aware of the guest endian.
+
+\subsection{Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}
+The framing of messages with descriptors is
+independent of the contents of the buffers. For example, a network
+transmit buffer consists of a 12 byte header followed by the network
+packet. This could be most simply placed in the descriptor table as a
+12 byte output descriptor followed by a 1514 byte output descriptor,
+but it could also consist of a single 1526 byte output descriptor in
+the case where the header and packet are adjacent, or even three or
+more descriptors (possibly with loss of efficiency in that case).
+
+Note that some device implementations have large-but-reasonable
+restrictions on total descriptor size (such as based on IOV_MAX in the
+host OS). This has not been a problem in practice: little sympathy
+will be given to drivers which create unreasonably-sized descriptors
+such as by dividing a network packet into 1500 single-byte
+descriptors!
+
+\devicenormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
+The device MUST NOT make assumptions about the particular arrangement
+of descriptors. The device MAY have a reasonable limit of descriptors
+it will allow in a chain.
+
+\drivernormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
+The driver MUST place any device-writable descriptor elements after
+any device-readable descriptor elements.
+
+The driver SHOULD NOT use an excessive number of descriptors to
+describe a buffer.
+
+\subsubsection{Legacy Interface: Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}
+
+Regrettably, initial driver implementations used simple layouts, and
+devices came to rely on it, despite this specification wording. In
+addition, the specification for virtio_blk SCSI commands required
+intuiting field lengths from frame boundaries (see
+ \ref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}~\nameref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}).
+
+Thus when using the legacy interface, the VIRTIO_F_ANY_LAYOUT
+feature indicates to both the device and the driver that no
+assumptions were made about framing. Requirements for
+transitional drivers when this is not negotiated are included in
+each device section.
+
+\subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
+
+The descriptor table refers to the buffers the driver is using for
+the device. \field{addr} is a physical address, and the buffers
+can be chained via \field{next}. Each descriptor describes a
+buffer which is read-only for the device (``device-readable'') or write-only for the device (``device-writable''), but a chain of
+descriptors can contain both device-readable and device-writable buffers.
+
+The actual contents of the memory offered to the device depends on the
+device type. Most common is to begin the data with a header
+(containing little-endian fields) for the device to read, and postfix
+it with a status tailer for the device to write.
+
+\begin{lstlisting}
+struct virtq_desc {
+ /* Address (guest-physical). */
+ le64 addr;
+ /* Length. */
+ le32 len;
+
+/* This marks a buffer as continuing via the next field. */
+#define VIRTQ_DESC_F_NEXT 1
+/* This marks a buffer as device write-only (otherwise device read-only). */
+#define VIRTQ_DESC_F_WRITE 2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VIRTQ_DESC_F_INDIRECT 4
+ /* The flags as indicated above. */
+ le16 flags;
+ /* Next field if flags & NEXT */
+ le16 next;
+};
+\end{lstlisting}
+
+The number of descriptors in the table is defined by the queue size
+for this virtqueue: this is the maximum possible descriptor chain length.
+
+\begin{note}
+The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
+referred to this structure as vring_desc, and the constants as
+VRING_DESC_F_NEXT, etc, but the layout and values were identical.
+\end{note}
+
+\devicenormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
+A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
+read a device-writable buffer (it MAY do so for debugging or diagnostic
+purposes).
+
+\drivernormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
+Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
+this implies that loops in the descriptor chain are forbidden!
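+
+A device-side check of these limits could look like the following
+sketch. It assumes a little-endian host (so the le64, le32 and le16
+fields can be read as plain integers), restates the descriptor layout
+for self-containment, and uses a hypothetical function name. A chain
+that visits more descriptors than the queue holds is treated as a loop:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTQ_DESC_F_NEXT 1
+
+/* Same layout as the descriptor above, with host-endian types. */
+struct virtq_desc {
+    uint64_t addr;
+    uint32_t len;
+    uint16_t flags;
+    uint16_t next;
+};
+
+/* Returns the total chain length in bytes, or -1 if an index is out of
+ * range, the chain has more than qsz descriptors (which catches loops),
+ * or the total exceeds 2^32 bytes. */
+static int64_t chain_total_len(const struct virtq_desc *table,
+                               uint16_t qsz, uint16_t head)
+{
+    uint64_t total = 0;
+    uint32_t seen = 0;
+    uint16_t i = head;
+
+    for (;;) {
+        if (i >= qsz || ++seen > qsz)
+            return -1;
+        total += table[i].len;
+        if (total > (UINT64_C(1) << 32))
+            return -1;
+        if (!(table[i].flags & VIRTQ_DESC_F_NEXT))
+            return (int64_t)total;
+        i = table[i].next;
+    }
+}
+\end{lstlisting}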
+
+\subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
+
+Some devices benefit by concurrently dispatching a large number
+of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
+ring capacity the driver can store a table of indirect
+descriptors anywhere in memory, and insert a descriptor in the main
+virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to a memory buffer
+containing this indirect descriptor table; \field{addr} and \field{len}
+refer to the indirect table address and length in bytes,
+respectively.
+
+The indirect table layout structure looks like this
+(\field{len} is the length of the descriptor that refers to this table,
+which is a variable, so this code won't compile):
+
+\begin{lstlisting}
+struct indirect_descriptor_table {
+ /* The actual descriptors (16 bytes each) */
+ struct virtq_desc desc[len / 16];
+};
+\end{lstlisting}
+
+The first indirect descriptor is located at the start of the indirect
+descriptor table (index 0); additional indirect descriptors are
+chained by \field{next}. An indirect descriptor without a valid \field{next}
+(with \field{flags}\&VIRTQ_DESC_F_NEXT off) signals the end of the descriptor.
+A single indirect descriptor
+table can include both device-readable and device-writable descriptors.
+
+\drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
+The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
+VIRTIO_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT
+set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (i.e. only
+one table per descriptor).
+
+A driver MUST NOT create a descriptor chain longer than the Queue Size of
+the device.
+
+A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
+in \field{flags}.
+
+\devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
+The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
+
+The device MUST handle the case of zero or more normal chained
+descriptors followed by a single descriptor with \field{flags}\&VIRTQ_DESC_F_INDIRECT.
+
+\begin{note}
+While unusual (most implementations either create a chain solely using
+non-indirect descriptors, or use a single indirect element), such a
+layout is valid.
+\end{note}
+
+\subsection{The Virtqueue Available Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}
+
+\begin{lstlisting}
+struct virtq_avail {
+#define VIRTQ_AVAIL_F_NO_INTERRUPT 1
+ le16 flags;
+ le16 idx;
+ le16 ring[ /* Queue Size */ ];
+ le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
+};
+\end{lstlisting}
+
+The driver uses the available ring to offer buffers to the
+device: each ring entry refers to the head of a descriptor chain. It is only
+written by the driver and read by the device.
+
+The \field{idx} field indicates where the driver would put the next descriptor
+entry in the ring (modulo the queue size). This starts at 0, and increases.
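+
+The free-running index can be illustrated with a simplified ring. The
+fixed QSZ of 8, the host-endian field types and the avail_publish name
+are assumptions of this sketch; a real driver also needs a memory
+barrier where the comment indicates:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define QSZ 8 /* hypothetical Queue Size, a power of 2 */
+
+/* Simplified stand-in for struct virtq_avail, host-endian. */
+struct avail_ring {
+    uint16_t flags;
+    uint16_t idx;
+    uint16_t ring[QSZ];
+};
+
+/* Store the head of a descriptor chain in the next slot, then advance
+ * idx. idx is never reduced modulo QSZ; it wraps naturally at 65536. */
+static void avail_publish(struct avail_ring *a, uint16_t head)
+{
+    a->ring[a->idx % QSZ] = head;
+    /* A memory barrier belongs here so that the device sees the ring
+     * entry before it sees the new idx. */
+    a->idx++;
+}
+\end{lstlisting}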
+
+\begin{note}
+The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
+referred to this structure as vring_avail, and the constant as
+VRING_AVAIL_F_NO_INTERRUPT, but the layout and value were identical.
+\end{note}
+
+\subsection{Virtqueue Interrupt Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
+
+If the VIRTIO_F_EVENT_IDX feature bit is not negotiated,
+the \field{flags} field in the available ring offers a crude mechanism for the driver to inform
+the device that it doesn't want interrupts when buffers are used. Otherwise
+\field{used_event} is a more performant alternative where the driver
+specifies how far the device can progress before interrupting.
+
+Neither of these interrupt suppression methods is reliable, as they
+are not synchronized with the device, but they serve as
+useful optimizations.
+
+\drivernormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
+If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
+\begin{itemize}
+\item The driver MUST set \field{flags} to 0 or 1.
+\item The driver MAY set \field{flags} to 1 to advise
+the device that interrupts are not needed.
+\end{itemize}
+
+Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
+\begin{itemize}
+\item The driver MUST set \field{flags} to 0.
+\item The driver MAY use \field{used_event} to advise the device that interrupts are unnecessary until the device writes an entry with an index specified by \field{used_event} into the used ring (equivalently, until \field{idx} in the
+used ring reaches the value \field{used_event} + 1).
+\end{itemize}
+
+The driver MUST handle spurious interrupts from the device.
+
+\devicenormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
+
+If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
+\begin{itemize}
+\item The device MUST ignore the \field{used_event} value.
+\item After the device writes a descriptor index into the used ring:
+ \begin{itemize}
+ \item If \field{flags} is 1, the device SHOULD NOT send an interrupt.
+ \item If \field{flags} is 0, the device MUST send an interrupt.
+ \end{itemize}
+\end{itemize}
+
+Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
+\begin{itemize}
+\item The device MUST ignore the lower bit of \field{flags}.
+\item After the device writes a descriptor index into the used ring:
+ \begin{itemize}
+ \item If the \field{idx} field in the used ring (which determined
+ where that descriptor index was placed) was equal to
+ \field{used_event}, the device MUST send an interrupt.
+ \item Otherwise the device SHOULD NOT send an interrupt.
+ \end{itemize}
+\end{itemize}
+
+\begin{note}
+For example, if \field{used_event} is 0, then a device using
+ VIRTIO_F_EVENT_IDX would interrupt after the first buffer is
+ used (and again after the 65536th buffer, etc).
+\end{note}
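+
+When the device adds several entries before checking, the comparison
+has to account for wrap-around of the 16-bit index. The sketch below is
+modeled on the vring_need_event helper in the Linux header
+include/uapi/linux/virtio_ring.h; the name need_event and the parameter
+names are chosen for this example:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* old_idx: value of idx before the new entries were written.
+ * new_idx: value of idx after they were written.
+ * Returns nonzero if the entry at index event_idx is among the entries
+ * just written, that is, if event_idx lies in [old_idx, new_idx) in
+ * 16-bit modular arithmetic. */
+static inline int need_event(uint16_t event_idx, uint16_t new_idx,
+                             uint16_t old_idx)
+{
+    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
+}
+\end{lstlisting}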
+
+\subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
+
+\begin{lstlisting}
+struct virtq_used {
+#define VIRTQ_USED_F_NO_NOTIFY 1
+ le16 flags;
+ le16 idx;
+ struct virtq_used_elem ring[ /* Queue Size */];
+ le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
+};
+
+/* le32 is used here for ids for padding reasons. */
+struct virtq_used_elem {
+ /* Index of start of used descriptor chain. */
+ le32 id;
+ /* Total length of the descriptor chain which was used (written to) */
+ le32 len;
+};
+\end{lstlisting}
+
+The used ring is where the device returns buffers once it is done with
+them: it is only written to by the device, and read by the driver.
+
+Each entry in the ring is a pair: \field{id} indicates the head entry of the
+descriptor chain describing the buffer (this matches an entry
+placed in the available ring by the guest earlier), and \field{len} the total
+number of bytes written into the buffer.
+
+\begin{note}
+\field{len} is particularly useful
+for drivers using untrusted buffers: if a driver does not know exactly
+how much has been written by the device, the driver would have to zero
+the buffer in advance to ensure no data leakage occurs.
+
+For example, a network driver may hand a received buffer directly to
+an unprivileged userspace application. If the network device has not
+overwritten the bytes which were in that buffer, this could leak the
+contents of freed memory from other processes to the application.
+\end{note}
+
+The \field{idx} field indicates where the device would put the next descriptor
+entry in the ring (modulo the queue size). This starts at 0, and increases.
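+
+A driver typically keeps its own copy of the last index it has seen and
+consumes entries until that copy catches up with \field{idx}. The
+sketch below assumes a fixed QSZ of 8, host-endian field types and
+hypothetical names; it only sums \field{len} in place of real
+processing, and omits the memory barriers a real driver needs:
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define QSZ 8 /* hypothetical Queue Size, a power of 2 */
+
+struct used_elem {
+    uint32_t id;
+    uint32_t len;
+};
+
+/* Simplified stand-in for struct virtq_used, host-endian. */
+struct used_ring {
+    uint16_t flags;
+    uint16_t idx;
+    struct used_elem ring[QSZ];
+};
+
+/* Consume every entry added since *last_used. Returns the number of
+ * entries consumed and adds their len values to *bytes. */
+static unsigned int used_drain(const struct used_ring *u,
+                               uint16_t *last_used, uint64_t *bytes)
+{
+    unsigned int n = 0;
+
+    while (*last_used != u->idx) {
+        const struct used_elem *e = &u->ring[*last_used % QSZ];
+
+        *bytes += e->len;
+        (*last_used)++; /* wraps naturally at 65536 */
+        n++;
+    }
+    return n;
+}
+\end{lstlisting}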
+
+\begin{note}
+The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
+referred to these structures as vring_used and vring_used_elem, and
+the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were
+identical.
+\end{note}
+
+\subsubsection{Legacy Interface: The Virtqueue Used
+Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues
+/ The Virtqueue Used Ring/ Legacy Interface: The Virtqueue Used
+Ring}
+
+Historically, many drivers ignored the \field{len} value; as a
+result, many devices set \field{len} incorrectly. Thus, when
+using the legacy interface, it is generally a good idea to ignore
+the \field{len} value in used ring entries if possible. Specific
+known issues are listed per device type.
+
+\devicenormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
+
+The device MUST set \field{len} prior to updating the used \field{idx}.
+
+The device MUST write at least \field{len} bytes to the descriptor,
+beginning at the first device-writable buffer,
+prior to updating the used \field{idx}.
+
+The device MAY write more than \field{len} bytes to the descriptor.
+
+\begin{note}
+There are potential error cases where a device might not know what
+parts of the buffers have been written. This is why \field{len} is
+permitted to be an underestimate: that's preferable to the driver believing
+that uninitialized memory has been overwritten when it has not.
+\end{note}
+
+\drivernormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
+
+The driver MUST NOT make assumptions about data in device-writable buffers
+beyond the first \field{len} bytes, and SHOULD ignore this data.
+
+\subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
+
+The device can suppress notifications in a manner analogous to the way
+drivers can suppress interrupts as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.
+The device manipulates \field{flags} or \field{avail_event} in the used ring the
+same way the driver manipulates \field{flags} or \field{used_event} in the available ring.
+
+\drivernormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
+
+The driver MUST initialize \field{flags} in the used ring to 0 when
+allocating the used ring.
+
+If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
+\begin{itemize}
+\item The driver MUST ignore the \field{avail_event} value.
+\item After the driver writes a descriptor index into the available ring:
+ \begin{itemize}
+ \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
+ \item If \field{flags} is 0, the driver MUST send a notification.
+ \end{itemize}
+\end{itemize}
+
+Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
+\begin{itemize}
+\item The driver MUST ignore the lower bit of \field{flags}.
+\item After the driver writes a descriptor index into the available ring:
+ \begin{itemize}
+ \item If the \field{idx} field in the available ring (which determined
+ where that descriptor index was placed) was equal to
+ \field{avail_event}, the driver MUST send a notification.
+ \item Otherwise the driver SHOULD NOT send a notification.
+ \end{itemize}
+\end{itemize}
+
+\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
+If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
+\begin{itemize}
+\item The device MUST set \field{flags} to 0 or 1.
+\item The device MAY set \field{flags} to 1 to advise
+the driver that notifications are not needed.
+\end{itemize}
+
+Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
+\begin{itemize}
+\item The device MUST set \field{flags} to 0.
+\item The device MAY use \field{avail_event} to advise the driver that notifications are unnecessary until the driver writes an entry with the index specified by \field{avail_event} into the available ring (equivalently, until \field{idx} in the
+available ring reaches the value \field{avail_event} + 1).
+\end{itemize}
+
+The device MUST handle spurious notifications from the driver.
+
+\subsection{Helpers for Operating Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Helpers for Operating Virtqueues}
+
+The Linux Kernel Source code contains the definitions above and
+helper routines in a more usable form, in
+include/uapi/linux/virtio_ring.h. This was explicitly licensed by IBM
+and Red Hat under the (3-clause) BSD license so that it can be
+freely used by all other projects, and is reproduced (with slight
+variation) in \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}.
+
+\chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
+
+We start with an overview of device initialization, then expand on the
+details of the device and how each step is performed. This section
+is best read along with the bus-specific section which describes
+how to communicate with the specific device.
+
+\section{Device Initialization}\label{sec:General Initialization And Device Operation / Device Initialization}
+
+\drivernormative{\subsection}{Device Initialization}{General Initialization And Device Operation / Device Initialization}
+The driver MUST follow this sequence to initialize a device:
+
+\begin{enumerate}
+\item Reset the device.
+
+\item Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
+
+\item Set the DRIVER status bit: the guest OS knows how to drive the device.
+
+\item\label{itm:General Initialization And Device Operation /
+Device Initialization / Read feature bits} Read device feature bits, and write the subset of feature bits
+ understood by the OS and driver to the device. During this step the
+ driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it.
+
+\item\label{itm:General Initialization And Device Operation / Device Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit. The driver MUST NOT accept
+ new feature bits after this step.
+
+\item\label{itm:General Initialization And Device Operation / Device Initialization / Re-read FEATURES-OK} Re-read \field{device status} to ensure the FEATURES_OK bit is still
+ set: otherwise, the device does not support our subset of features
+ and the device is unusable.
+
+\item\label{itm:General Initialization And Device Operation / Device Initialization / Device-specific Setup} Perform device-specific setup, including discovery of virtqueues for the
+ device, optional per-bus setup, reading and possibly writing the
+ device's virtio configuration space, and population of virtqueues.
+
+\item\label{itm:General Initialization And Device Operation / Device Initialization / Set DRIVER-OK} Set the DRIVER_OK status bit. At this point the device is
+ ``live''.
+\end{enumerate}
+
+If any of these steps go irrecoverably wrong, the driver SHOULD
+set the FAILED status bit to indicate that it has given up on the
+device (it can reset the device later to restart if desired). The
+driver MUST NOT continue initialization in that case.
+
+The driver MUST NOT notify the device before setting DRIVER_OK.
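+
+The sequence above can be sketched against an in-memory stand-in for a
+device. This is an illustration only: \texttt{struct fake\_dev},
+\texttt{dev\_set\_status} and \texttt{virtio\_init} are hypothetical names,
+and a real driver would use the bus-specific accessors to reach the status
+and feature fields. The status bit values are those defined for the
+\field{device status} field.
+
```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits, with the values defined by the specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE 1
#define VIRTIO_STATUS_DRIVER      2
#define VIRTIO_STATUS_DRIVER_OK   4
#define VIRTIO_STATUS_FEATURES_OK 8
#define VIRTIO_STATUS_FAILED      128

/* Hypothetical stand-in for a device. */
struct fake_dev {
    uint8_t  status;
    uint64_t device_features;  /* offered by the device */
    uint64_t driver_features;  /* accepted by the driver */
    uint64_t required;         /* bits this device insists on */
};

/* Models a status write: 0 resets, and FEATURES_OK is refused if the
 * accepted feature subset is not one the device can work with. */
static void dev_set_status(struct fake_dev *d, uint8_t s)
{
    if (s == 0) {
        d->status = 0;
        d->driver_features = 0;
        return;
    }
    if ((s & VIRTIO_STATUS_FEATURES_OK) &&
        (d->driver_features & d->required) != d->required)
        s &= (uint8_t)~VIRTIO_STATUS_FEATURES_OK;
    d->status = s;
}

/* Follows the eight steps; returns false and sets FAILED on error. */
static bool virtio_init(struct fake_dev *d, uint64_t supported)
{
    dev_set_status(d, 0);                                       /* 1 */
    dev_set_status(d, d->status | VIRTIO_STATUS_ACKNOWLEDGE);   /* 2 */
    dev_set_status(d, d->status | VIRTIO_STATUS_DRIVER);        /* 3 */
    d->driver_features = d->device_features & supported;        /* 4 */
    dev_set_status(d, d->status | VIRTIO_STATUS_FEATURES_OK);   /* 5 */
    if (!(d->status & VIRTIO_STATUS_FEATURES_OK)) {             /* 6 */
        dev_set_status(d, d->status | VIRTIO_STATUS_FAILED);
        return false;
    }
    /* 7: device-specific setup (virtqueues, config space) goes here. */
    dev_set_status(d, d->status | VIRTIO_STATUS_DRIVER_OK);     /* 8 */
    return true;
}
```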
+
+\subsection{Legacy Interface: Device Initialization}\label{sec:General Initialization And Device Operation / Device Initialization / Legacy Interface: Device Initialization}
+Legacy devices did not support the FEATURES_OK status bit, and thus did
+not have a graceful way for the device to indicate unsupported feature
+combinations. They also did not provide a clear mechanism to end
+feature negotiation, which meant that devices finalized features on
+first-use, and no features could be introduced which radically changed
+the initial operation of the device.
+
+Legacy driver implementations often used the device before setting the
+DRIVER_OK bit, and sometimes even before writing the feature bits
+to the device.
+
+The result was that steps \ref{itm:General Initialization And
+Device Operation / Device Initialization / Set FEATURES-OK} and
+\ref{itm:General Initialization And Device Operation / Device
+Initialization / Re-read FEATURES-OK} were omitted, and steps
+\ref{itm:General Initialization And Device Operation /
+Device Initialization / Read feature bits},
+\ref{itm:General Initialization And Device Operation / Device Initialization / Device-specific Setup} and \ref{itm:General Initialization And Device Operation / Device Initialization / Set DRIVER-OK}
+were conflated.
+
+Therefore, when using the legacy interface:
+\begin{itemize}
+\item
+The transitional driver MUST execute the initialization
+sequence as described in \ref{sec:General Initialization And Device
+Operation / Device Initialization}
+but omitting the steps \ref{itm:General Initialization And Device
+Operation / Device Initialization / Set FEATURES-OK} and
+\ref{itm:General Initialization And Device Operation / Device
+Initialization / Re-read FEATURES-OK}.
+
+\item
+The transitional device MUST support the driver
+writing device configuration fields
+before the step \ref{itm:General Initialization And Device Operation /
+Device Initialization / Read feature bits}.
+\item
+The transitional device MUST support the driver
+using the device before the step \ref{itm:General Initialization
+And Device Operation / Device Initialization / Set DRIVER-OK}.
+\end{itemize}
+
+\section{Device Operation}\label{sec:General Initialization And Device Operation / Device Operation}
+
+There are two parts to device operation: supplying new buffers to
+the device, and processing used buffers from the device.
+
+\begin{note} As an
+example, the simplest virtio network device has two virtqueues: the
+transmit virtqueue and the receive virtqueue. The driver adds
+outgoing (device-readable) packets to the transmit virtqueue, and then
+frees them after they are used. Similarly, incoming (device-writable)
+buffers are added to the receive virtqueue, and processed after
+they are used.
+\end{note}
+
+\subsection{Supplying Buffers to The Device}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device}
+
+The driver offers buffers to one of the device's virtqueues as follows:
+
+\begin{enumerate}
+\item\label{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Buffers} The driver places the buffer into free descriptor(s) in the
+ descriptor table, chaining as necessary (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}).
+
+\item\label{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Index} The driver places the index of the head of the descriptor chain
+ into the next ring entry of the available ring.
+
+\item Steps \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Buffers} and \ref{itm:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Place Index} MAY be performed repeatedly if batching
+ is possible.
+
+\item The driver performs a suitable memory barrier to ensure the device sees
+ the updated descriptor table and available ring before the next
+ step.
+
+\item The available \field{idx} is increased by the number of
+ descriptor chain heads added to the available ring.
+
+\item The driver performs a suitable memory barrier to ensure that it updates
+ the \field{idx} field before checking for notification suppression.
+
+\item If notifications are not suppressed, the driver notifies the device
+ of the new available buffers.
+\end{enumerate}
+
+Note that the above procedure does not take precautions against the
+available ring buffer wrapping around: this is not possible since
+the ring buffer is the same size as the descriptor table, so step
+(1) will prevent such a condition.
+
+In addition, the maximum queue size is 32768 (the highest power
+of 2 which fits in 16 bits), so the 16-bit \field{idx} value can always
+distinguish between a full and empty buffer.
+
+What follows are the requirements of each stage in more detail.
+
+\subsubsection{Placing Buffers Into The Descriptor Table}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Placing Buffers Into The Descriptor Table}
+
+A buffer consists of zero or more device-readable physically-contiguous
+elements followed by zero or more physically-contiguous
+device-writable elements (each buffer has at least one element). This
+algorithm maps it into the descriptor table to form a descriptor
+chain:
+
+for each buffer element, b:
+
+\begin{enumerate}
+\item Get the next free descriptor table entry, d
+\item Set \field{d.addr} to the physical address of the start of b
+\item Set \field{d.len} to the length of b.
+\item If b is device-writable, set \field{d.flags} to VIRTQ_DESC_F_WRITE,
+ otherwise 0.
+\item If there is a buffer element after this:
+ \begin{enumerate}
+ \item Set \field{d.next} to the index of the next free descriptor
+ element.
+ \item Set the VIRTQ_DESC_F_NEXT bit in \field{d.flags}.
+ \end{enumerate}
+\end{enumerate}
+
+In practice, \field{d.next} is usually used to chain free
+descriptors, and a separate count kept to check there are enough
+free descriptors before beginning the mappings.
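+
+A minimal sketch of this mapping follows, assuming (as described above) that
+free descriptors are already chained through \field{next}. The names
+\texttt{struct buf\_elem} and \texttt{add\_chain} are hypothetical, endianness
+conversion is omitted, and the caller is assumed to have checked that enough
+free descriptors exist.
+
```c
#include <stddef.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT  1
#define VIRTQ_DESC_F_WRITE 2

struct virtq_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Hypothetical description of one physically-contiguous buffer element. */
struct buf_elem {
    uint64_t addr;
    uint32_t len;
    int device_writable;
};

/*
 * Map n >= 1 elements into the descriptor table, taking descriptors from a
 * free list linked through desc[].next and starting at *free_head.
 * Returns the index of the head of the new descriptor chain.
 */
static uint16_t add_chain(struct virtq_desc *desc, uint16_t *free_head,
                          const struct buf_elem *elems, size_t n)
{
    uint16_t head = *free_head;
    uint16_t d = head;

    for (size_t i = 0; i < n; i++) {
        uint16_t next_free = desc[d].next;  /* free-list link */

        desc[d].addr = elems[i].addr;
        desc[d].len = elems[i].len;
        desc[d].flags = elems[i].device_writable ? VIRTQ_DESC_F_WRITE : 0;
        /* desc[d].next already names the next free descriptor, so
         * setting the flag is enough to link the chain. */
        if (i + 1 < n)
            desc[d].flags |= VIRTQ_DESC_F_NEXT;
        d = next_free;
    }
    *free_head = d;
    return head;
}
```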
+
+\subsubsection{Updating The Available Ring}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating The Available Ring}
+
+The descriptor chain head is the first d in the algorithm
+above, ie. the index of the descriptor table entry referring to the first
+part of the buffer. A naive driver implementation MAY do the following (with the
+appropriate conversion to-and-from little-endian assumed):
+
+\begin{lstlisting}
+avail->ring[avail->idx % qsz] = head;
+\end{lstlisting}
+
+However, in general the driver MAY add many descriptor chains before it updates
+\field{idx} (at which point they become visible to the
+device), so it is common to keep a counter of how many the driver has added:
+
+\begin{lstlisting}
+avail->ring[(avail->idx + added++) % qsz] = head;
+\end{lstlisting}
+
+\subsubsection{Updating \field{idx}}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating idx}
+
+\field{idx} always increments, and wraps naturally at
+65536:
+
+\begin{lstlisting}
+avail->idx += added;
+\end{lstlisting}
+
+Once available \field{idx} is updated by the driver, this exposes the
+descriptor and its contents. The device MAY
+access the descriptor chains the driver created and the
+memory they refer to immediately.
+
+\drivernormative{\paragraph}{Updating idx}{General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Updating idx}
+The driver MUST perform a suitable memory barrier before the \field{idx} update, to ensure the
+device sees the most up-to-date copy.
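+
+The batching counter and the barrier placement can be sketched together as
+follows. This is illustrative only: C11 fences stand in for whatever barrier
+is suitable on the platform, \texttt{QSZ}, \texttt{struct drv\_state},
+\texttt{avail\_add} and \texttt{avail\_publish} are hypothetical names, and
+little-endian conversion is omitted.
+
```c
#include <stdatomic.h>
#include <stdint.h>

#define QSZ 8  /* hypothetical queue size, a power of 2 */

struct virtq_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QSZ];
};

/* Driver-private state: chains added but not yet exposed to the device. */
struct drv_state {
    uint16_t added;
};

/* Place a descriptor chain head in the next ring entry without exposing it. */
static void avail_add(struct virtq_avail *avail, struct drv_state *s,
                      uint16_t head)
{
    avail->ring[(uint16_t)(avail->idx + s->added++) % QSZ] = head;
}

/* Expose everything added so far by advancing idx. */
static void avail_publish(struct virtq_avail *avail, struct drv_state *s)
{
    /* Descriptors and ring entries must be visible before idx moves. */
    atomic_thread_fence(memory_order_release);
    avail->idx += s->added;  /* wraps naturally at 65536 */
    s->added = 0;
    /* Order the idx update before the later read of flags/avail_event. */
    atomic_thread_fence(memory_order_seq_cst);
}
```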
+
+\subsubsection{Notifying The Device}\label{sec:General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Notifying The Device}
+
+The actual method of device notification is bus-specific, but generally
+it can be expensive. So the device MAY suppress such notifications if it
+doesn't need them, as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}.
+
+The driver has to be careful to expose the new \field{idx}
+value before checking if notifications are suppressed.
+
+\drivernormative{\paragraph}{Notifying The Device}{General Initialization And Device Operation / Device Operation / Supplying Buffers to The Device / Notifying The Device}
+The driver MUST perform a suitable memory barrier before reading \field{flags} or
+\field{avail_event}, to avoid missing a notification.
+
+\subsection{Receiving Used Buffers From The Device}\label{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device}
+
+Once the device has used buffers referred to by a descriptor (read from or written to them, or
+parts of both, depending on the nature of the virtqueue and the
+device), it interrupts the driver as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.
+
+\begin{note}
+For optimal performance, a driver MAY disable interrupts while processing
+the used ring, but beware the problem of missing interrupts between
+emptying the ring and reenabling interrupts. This is usually handled by
+re-checking for more used buffers after interrupts are re-enabled:
+
+\begin{lstlisting}
+virtq_disable_interrupts(vq);
+
+for (;;) {
+        /* No more used buffers seen: re-enable interrupts and re-check. */
+        if (vq->last_seen_used == le16_to_cpu(virtq->used->idx)) {
+                virtq_enable_interrupts(vq);
+                mb();
+
+                /* Still nothing to process: done. */
+                if (vq->last_seen_used == le16_to_cpu(virtq->used->idx))
+                        break;
+
+                /* A buffer arrived in the window: keep processing. */
+                virtq_disable_interrupts(vq);
+        }
+
+        struct virtq_used_elem *e = &virtq->used->ring[vq->last_seen_used % qsz];
+        process_buffer(e);
+        vq->last_seen_used++;
+}
+\end{lstlisting}
+\end{note}
+
+\subsection{Notification of Device Configuration Changes}\label{sec:General Initialization And Device Operation / Device Operation / Notification of Device Configuration Changes}
+
+For devices where the device-specific configuration information can be changed, an
+interrupt is delivered when a device-specific configuration change occurs.
+
+In addition, this interrupt is triggered by the device setting
+DEVICE_NEEDS_RESET (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}).
+
+\section{Device Cleanup}\label{sec:General Initialization And Device Operation / Device Cleanup}
+
+Once the driver has set the DRIVER_OK status bit, all the configured
+virtqueues of the device are considered live. None of the virtqueues
+of a device are live once the device has been reset.
+
+\drivernormative{\subsection}{Device Cleanup}{General Initialization And Device Operation / Device Cleanup}
+
+A driver MUST NOT alter descriptor table entries which have been
+exposed in the available ring (and not marked consumed by the device
+in the used ring) of a live virtqueue.
+
+A driver MUST NOT decrement the available \field{idx} on a live virtqueue (ie.
+there is no way to ``unexpose'' buffers).
+
+Thus a driver MUST ensure a virtqueue isn't live (by device reset) before removing exposed buffers.
+
+\chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
+
+Virtio can use various different buses, thus the standard is split
+into virtio general and bus-specific sections.
+
+\section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
+
+Virtio devices are commonly implemented as PCI devices.
+
+A Virtio device can be implemented as any kind of PCI device:
+a Conventional PCI device or a PCI Express
+device. To ensure that designs meet the latest
+requirements, see
+the PCI-SIG home page at \url{http://www.pcisig.com} for any
+approved changes.
+
+\devicenormative{\subsection}{Virtio Over PCI Bus}{Virtio Transport Options / Virtio Over PCI Bus}
+A Virtio device using Virtio Over PCI Bus MUST expose to the
+guest an interface that meets the specification requirements of
+the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
+and \hyperref[intro:PCIe]{[PCIe]}
+respectively.
+
+\subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+
+Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through
+0x107F inclusive is a virtio device. The actual value within this range
+indicates which virtio device is supported by the device.
+The PCI Device ID is calculated by adding 0x1040 to the Virtio Device ID,
+as indicated in section \ref{sec:Device Types}.
+Additionally, devices MAY utilize a Transitional PCI Device ID range,
+0x1000 to 0x103F depending on the device type.
+
+\devicenormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+
+Devices MUST have the PCI Vendor ID 0x1AF4.
+Devices MUST either have the PCI Device ID calculated by adding 0x1040
+to the Virtio Device ID, as indicated in section \ref{sec:Device
+Types} or have the Transitional PCI Device ID depending on the device type,
+as follows:
+
+\begin{tabular}{|l|c|}
+\hline
+Transitional PCI Device ID & Virtio Device \\
+\hline \hline
+0x1000 & network card \\
+\hline
+0x1001 & block device \\
+\hline
+0x1002 & memory ballooning (traditional) \\
+\hline
+0x1003 & console \\
+\hline
+0x1004 & SCSI host \\
+\hline
+0x1005 & entropy source \\
+\hline
+0x1009 & 9P transport \\
+\hline
+\end{tabular}
+
+For example, the network card device with the Virtio Device ID 1
+has the PCI Device ID 0x1041 or the Transitional PCI Device ID 0x1000.
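+
+The ID arithmetic and the two ranges can be expressed as a short sketch. The
+function and enumerator names are hypothetical; only the constants come from
+the text above.
+
```c
#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID 0x1AF4

enum virtio_pci_kind {
    VIRTIO_PCI_NONE,          /* not a virtio device */
    VIRTIO_PCI_TRANSITIONAL,  /* Device ID 0x1000 to 0x103F */
    VIRTIO_PCI_MODERN         /* Device ID 0x1040 to 0x107F */
};

/* PCI Device ID for a given Virtio Device ID (non-transitional range). */
static uint16_t virtio_pci_device_id(uint16_t virtio_device_id)
{
    return (uint16_t)(0x1040 + virtio_device_id);
}

/* Classify a PCI function by its Vendor ID and Device ID. */
static enum virtio_pci_kind virtio_pci_classify(uint16_t vendor,
                                                uint16_t device)
{
    if (vendor != VIRTIO_PCI_VENDOR_ID)
        return VIRTIO_PCI_NONE;
    if (device >= 0x1040 && device <= 0x107F)
        return VIRTIO_PCI_MODERN;
    if (device >= 0x1000 && device <= 0x103F)
        return VIRTIO_PCI_TRANSITIONAL;
    return VIRTIO_PCI_NONE;
}
```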
+
+The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY reflect
+the PCI Vendor and Device ID of the environment (for informational purposes by the driver).
+
+Non-transitional devices SHOULD have a PCI Device ID in the range
+0x1040 to 0x107f.
+Non-transitional devices SHOULD have a PCI Revision ID of 1 or higher.
+Non-transitional devices SHOULD have a PCI Subsystem Device ID of 0x40 or higher.
+
+This is to reduce the chance of a legacy driver attempting
+to drive the device.
+
+\drivernormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+Drivers MUST match devices with the PCI Vendor ID 0x1AF4 and
+the PCI Device ID in the range 0x1040 to 0x107f,
+calculated by adding 0x1040 to the Virtio Device ID,
+as indicated in section \ref{sec:Device Types}.
+Drivers for device types listed in section \ref{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+MUST match devices with the PCI Vendor ID 0x1AF4 and
+the Transitional PCI Device ID indicated in section
+ \ref{sec:Virtio
+Transport Options / Virtio Over PCI Bus / PCI Device Discovery}.
+
+Drivers MUST match any PCI Revision ID value.
+Drivers MAY match any PCI Subsystem Vendor ID and any
+PCI Subsystem Device ID value.
+
+\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
+Transitional devices MUST have a PCI Revision ID of 0.
+Transitional devices MUST have the PCI Subsystem Device ID
+matching the Virtio Device ID, as indicated in section \ref{sec:Device Types}.
+Transitional devices MUST have the Transitional PCI Device ID in
+the range 0x1000 to 0x103f.
+
+This is to match legacy drivers.
+
+\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+The device is configured via I/O and/or memory regions (though see
+\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+for access via the PCI configuration space), as specified by Virtio
+Structure PCI Capabilities.
+
+Fields of different sizes are present in the device
+configuration regions.
+All 64-bit, 32-bit and 16-bit fields are little-endian.
+64-bit fields are to be treated as two 32-bit fields,
+with the low 32-bit part followed by the high 32-bit part.
+
+\drivernormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+For device configuration access, the driver MUST use 8-bit wide
+accesses for 8-bit wide fields, 16-bit wide and aligned accesses
+for 16-bit wide fields and 32-bit wide and aligned accesses for
+32-bit and 64-bit wide fields. For 64-bit fields, the driver MAY
+access each of the high and low 32-bit parts of the field
+independently.
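+
+As an illustration of the 64-bit rule, the sketch below reads and writes a
+64-bit little-endian field as two 32-bit accesses. The accessors operate on a
+plain byte array and are hypothetical; a real driver would use the
+platform's I/O or MMIO accessors on the mapped region instead.
+
```c
#include <stdint.h>

/* Hypothetical 32-bit little-endian accessors over a config region. */
static uint32_t cfg_read32(const uint8_t *base, uint32_t off)
{
    return (uint32_t)base[off] |
           (uint32_t)base[off + 1] << 8 |
           (uint32_t)base[off + 2] << 16 |
           (uint32_t)base[off + 3] << 24;
}

static void cfg_write32(uint8_t *base, uint32_t off, uint32_t v)
{
    base[off] = (uint8_t)v;
    base[off + 1] = (uint8_t)(v >> 8);
    base[off + 2] = (uint8_t)(v >> 16);
    base[off + 3] = (uint8_t)(v >> 24);
}

/* A 64-bit field is the low 32-bit part followed by the high part. */
static uint64_t cfg_read64(const uint8_t *base, uint32_t off)
{
    uint64_t lo = cfg_read32(base, off);
    uint64_t hi = cfg_read32(base, off + 4);
    return lo | hi << 32;
}

static void cfg_write64(uint8_t *base, uint32_t off, uint64_t v)
{
    cfg_write32(base, off, (uint32_t)v);
    cfg_write32(base, off + 4, (uint32_t)(v >> 32));
}
```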
+
+\devicenormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+For 64-bit device configuration fields, the device MUST allow the driver
+independent access to the high and low 32-bit parts of the field.
+
+\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The virtio device configuration layout includes several structures:
+\begin{itemize}
+\item Common configuration
+\item Notifications
+\item ISR Status
+\item Device-specific configuration (optional)
+\item PCI configuration access
+\end{itemize}
+
+Each structure can be mapped by a Base Address register (BAR) belonging to
+the function, or accessed via the special VIRTIO_PCI_CAP_PCI_CFG field in the PCI configuration space.
+
+The location of each structure is specified using a vendor-specific PCI capability located
+on the capability list in PCI configuration space of the device.
+This virtio structure capability uses little-endian format; all fields are
+read-only for the driver unless stated otherwise:
+
+\begin{lstlisting}
+struct virtio_pci_cap {
+ u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
+ u8 cap_next; /* Generic PCI field: next ptr. */
+ u8 cap_len; /* Generic PCI field: capability length */
+ u8 cfg_type; /* Identifies the structure. */
+ u8 bar; /* Where to find it. */
+ u8 padding[3]; /* Pad to full dword. */
+ le32 offset; /* Offset within bar. */
+ le32 length; /* Length of the structure, in bytes. */
+};
+\end{lstlisting}
+
+This structure can be followed by extra data, depending on
+\field{cfg_type}, as documented below.
+
+The fields are interpreted as follows:
+
+\begin{description}
+\item[\field{cap_vndr}]
+ 0x09; Identifies a vendor-specific capability.
+
+\item[\field{cap_next}]
+ Link to next capability in the capability list in the PCI configuration space.
+
+\item[\field{cap_len}]
+ Length of this capability structure, including the whole of
+ struct virtio_pci_cap, and extra data if any.
+ This length MAY include padding, or fields unused by the driver.
+
+\item[\field{cfg_type}]
+ identifies the structure, according to the following table:
+
+\begin{lstlisting}
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG 1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG 3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG 4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG 5
+\end{lstlisting}
+
+ Any other value is reserved for future use.
+
+ Each structure is detailed individually below.
+
+ The device MAY offer more than one structure of any type - this makes it
+ possible for the device to expose multiple interfaces to drivers. The order of
+ the capabilities in the capability list specifies the order of preference
+ suggested by the device.
+ \begin{note}
+ For example, on some hypervisors, notifications using IO accesses are
+ faster than memory accesses. In this case, the device would expose two
+ capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
+ the first one addressing an I/O BAR, the second one addressing a memory BAR.
+ In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
+ memory BAR when I/O resources are unavailable.
+ \end{note}
+
+\item[\field{bar}]
+ values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
+ the function located beginning at 10h in PCI Configuration Space
+ and used to map the structure into Memory or I/O Space.
+ The BAR is permitted to be either 32-bit or 64-bit; it can map Memory Space
+ or I/O Space.
+
+ Any other value is reserved for future use.
+
+\item[\field{offset}]
+ indicates where the structure begins relative to the base address associated
+ with the BAR. The alignment requirements of \field{offset} are indicated
+ in each structure-specific section below.
+
+\item[\field{length}]
+ indicates the length of the structure.
+
+ \field{length} MAY include padding, or fields unused by the driver, or
+ future extensions.
+
+ \begin{note}
+ For example, a future device might present a large structure size of several
+ MBytes.
+ As current devices never utilize structures larger than 4KBytes in size,
+ a driver MAY limit the mapped structure size to e.g.
+ 4KBytes (thus ignoring parts of structure after the first
+ 4KBytes) to allow forward compatibility with such devices without loss of
+ functionality and without wasting resources.
+ \end{note}
+\end{description}
+
+\drivernormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{cfg_type} value.
+
+The driver SHOULD use the first instance of each virtio structure type it can
+support.
+
+The driver MUST accept a \field{cap_len} value which is larger than specified here.
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{bar} value.
+
+The driver SHOULD map only the part of the configuration structure
+that is large enough for device operation. The driver MUST handle
+an unexpectedly large \field{length}, but MAY check that \field{length}
+is large enough for device operation.
+
+The driver MUST NOT write into any field of the capability structure,
+with the exception of those with \field{cfg_type} VIRTIO_PCI_CAP_PCI_CFG as
+detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
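+
+A sketch of how a driver might locate a virtio structure capability while
+honoring the rules above is given below. It walks a copy of the first 256
+bytes of PCI configuration space held in a byte array; the function name
+\texttt{virtio\_find\_cap}, the offset macros and the loop guard are
+hypothetical, while the Capabilities Pointer location (34h) and the
+vendor-specific capability ID (09h) are taken from the PCI specification.
+
```c
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34  /* Capabilities Pointer */
#define PCI_CAP_ID_VNDR     0x09  /* vendor-specific capability */

/* Byte offsets of fields within struct virtio_pci_cap. */
#define VIRTIO_PCI_CAP_NEXT     1
#define VIRTIO_PCI_CAP_LEN      2
#define VIRTIO_PCI_CAP_CFG_TYPE 3
#define VIRTIO_PCI_CAP_BAR      4

/*
 * Return the configuration-space offset of the first capability with the
 * requested cfg_type, or 0 if none is found. Capabilities with a reserved
 * bar value are ignored; a cap_len larger than the 16-byte base structure
 * is accepted.
 */
static uint8_t virtio_find_cap(const uint8_t cfg[256], uint8_t cfg_type)
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    int guard = 48;  /* bound the walk in case the list is malformed */

    while (pos != 0 && guard-- > 0) {
        if (pos > 251)  /* fields read below must stay inside cfg[] */
            break;
        if (cfg[pos] == PCI_CAP_ID_VNDR &&
            cfg[pos + VIRTIO_PCI_CAP_LEN] >= 16 &&
            cfg[pos + VIRTIO_PCI_CAP_CFG_TYPE] == cfg_type &&
            cfg[pos + VIRTIO_PCI_CAP_BAR] <= 5)
            return pos;
        pos = cfg[pos + VIRTIO_PCI_CAP_NEXT] & 0xFC;
    }
    return 0;
}
```
+
+Because the walk returns the first match, it also follows the device's
+suggested order of preference when several structures of one type exist.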
+
+\devicenormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
+through end of the extra data fields if any) in \field{cap_len}.
+The device MAY append extra data
+or padding to any structure beyond that.
+
+If the device presents multiple structures of the same type, it SHOULD order
+them from optimal (first) to least-optimal (last).
+
+\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+
+The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
+
+\begin{lstlisting}
+struct virtio_pci_common_cfg {
+ /* About the whole device. */
+ le32 device_feature_select; /* read-write */
+ le32 device_feature; /* read-only for driver */
+ le32 driver_feature_select; /* read-write */
+ le32 driver_feature; /* read-write */
+ le16 msix_config; /* read-write */
+ le16 num_queues; /* read-only for driver */
+ u8 device_status; /* read-write */
+ u8 config_generation; /* read-only for driver */
+
+ /* About a specific virtqueue. */
+ le16 queue_select; /* read-write */
+ le16 queue_size; /* read-write, power of 2, or 0. */
+ le16 queue_msix_vector; /* read-write */
+ le16 queue_enable; /* read-write */
+ le16 queue_notify_off; /* read-only for driver */
+ le64 queue_desc; /* read-write */
+ le64 queue_avail; /* read-write */
+ le64 queue_used; /* read-write */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{device_feature_select}]
+ The driver uses this to select which feature bits \field{device_feature} shows.
+ Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
+
+\item[\field{device_feature}]
+ The device uses this to report which feature bits it is
+ offering to the driver: the driver writes to
+ \field{device_feature_select} to select which feature bits are presented.
+
+\item[\field{driver_feature_select}]
+ The driver uses this to select which feature bits \field{driver_feature} shows.
+ Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
+
+\item[\field{driver_feature}]
+ The driver writes this to accept feature bits offered by the device.
+ The feature bits written are those selected by \field{driver_feature_select}.
+
+\item[\field{msix_config}]
+ The driver sets the Configuration Vector for MSI-X.
+
+\item[\field{num_queues}]
+ The device specifies the maximum number of virtqueues supported here.
+
+\item[\field{device_status}]
+ The driver writes the device status here (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}). Writing 0 into this
+ field resets the device.
+
+\item[\field{config_generation}]
+ Configuration atomicity value. The device changes this every time the
+ configuration noticeably changes.
+
+\item[\field{queue_select}]
+ Queue Select. The driver selects which virtqueue the following
+ fields refer to.
+
+\item[\field{queue_size}]
+ Queue Size. On reset, specifies the maximum queue size supported by
+ the hypervisor. This can be modified by the driver to reduce memory requirements.
+ A 0 means the queue is unavailable.
+
+\item[\field{queue_msix_vector}]
+ The driver uses this to specify the queue vector for MSI-X.
+
+\item[\field{queue_enable}]
+ The driver uses this to selectively prevent the device from executing requests from this virtqueue.
+ 1 - enabled; 0 - disabled.
+
+\item[\field{queue_notify_off}]
+ The driver reads this to calculate the offset from the start of the Notification structure at
+ which this virtqueue is located.
+ \begin{note} This is \emph{not} an offset in bytes.
+ See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
+ \end{note}
+
+\item[\field{queue_desc}]
+ The driver writes the physical address of Descriptor Table here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{queue_avail}]
+ The driver writes the physical address of Available Ring here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{queue_used}]
+ The driver writes the physical address of Used Ring here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+\end{description}
+
+\devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+\field{offset} MUST be 4-byte aligned.
+
+The device MUST present at least one common configuration capability.
+
+The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
+\begin{note}
+ This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
+\end{note}
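+
+The windowed feature access can be sketched with a small model of the two
+fields involved. \texttt{struct fake\_common\_cfg} and both functions are
+hypothetical: the first models what the device presents in
+\field{device_feature}, the second shows how a driver could assemble feature
+bits 0 to 63 from two 32-bit reads.
+
```c
#include <stdint.h>

/* Hypothetical model of the device side of two common-config fields. */
struct fake_common_cfg {
    uint32_t device_feature_select;
    uint64_t offered;  /* feature bits 0 to 63 offered by the device */
};

/* What the device presents in device_feature for the current selector. */
static uint32_t read_device_feature(const struct fake_common_cfg *c)
{
    if (c->device_feature_select > 1)
        return 0;  /* no feature bit above 63 is offered */
    return (uint32_t)(c->offered >> (32 * c->device_feature_select));
}

/* Driver side: read both 32-bit windows and combine them. */
static uint64_t driver_read_features(struct fake_common_cfg *c)
{
    uint64_t features;

    c->device_feature_select = 0;
    features = read_device_feature(c);
    c->device_feature_select = 1;
    features |= (uint64_t)read_device_feature(c) << 32;
    return features;
}
```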
+
+The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver. Valid feature bits are those which are a subset of the corresponding \field{device_feature} bits. The device MAY present invalid bits written by the driver.
+
+\begin{note}
+ This means that a device can ignore writes for feature bits it never
+ offers, and simply present 0 on reads. Or it can just mirror what the driver wrote
+ (but it will still have to check them when the driver sets FEATURES_OK).
+\end{note}
+
+\begin{note}
+ A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
+\end{note}
+
+The device MUST present a changed \field{config_generation} after the
+driver has read a device-specific configuration value which has
+changed since any part of the device-specific configuration was last
+read.
+\begin{note}
+As \field{config_generation} is an 8-bit value, simply incrementing it
+on every configuration change could violate this requirement due to wrap.
+Better would be to set an internal flag when it has changed,
+and if that flag is set when the driver reads from the device-specific
+configuration, increment \field{config_generation} and clear the flag.
+\end{note}
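+
+The flag-based approach suggested in the note above could be sketched in C as follows. The structure cfg_state and both function names are hypothetical, and a single 32-bit value stands in for the whole device-specific configuration.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device-side bookkeeping for config_generation. */
struct cfg_state {
    uint8_t  config_generation; /* value presented to the driver */
    bool     changed;           /* set on change, cleared on driver read */
    uint32_t value;             /* stand-in for the device-specific config */
};

/* Device changes its configuration: only remember that it happened. */
static void cfg_device_update(struct cfg_state *s, uint32_t v)
{
    s->value = v;
    s->changed = true;
}

/* Driver reads the configuration: bump the generation at most once per
 * batch of changes, so 256 changes between reads cannot wrap it back. */
static uint32_t cfg_driver_read(struct cfg_state *s)
{
    if (s->changed) {
        s->config_generation++;
        s->changed = false;
    }
    return s->value;
}
```
+
+With this scheme any number of changes between two driver reads advances \field{config_generation} by exactly one, so the 8-bit counter cannot wrap back to a value the driver has just observed.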
+
+The device MUST reset when 0 is written to \field{device_status}, and
+present a 0 in \field{device_status} once that is done.
+
+The device MUST present a 0 in \field{queue_enable} on reset.
+
+The device MUST present a 0 in \field{queue_size} if the virtqueue
+corresponding to the current \field{queue_select} is unavailable.
+
+\drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+
+The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
+
+The driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
+
+The driver MUST configure the other virtqueue fields before enabling the virtqueue
+with \field{queue_enable}.
+
+After writing 0 to \field{device_status}, the driver MUST wait for a read of
+\field{device_status} to return 0 before reinitializing the device.
+
+The driver MUST NOT write a 0 to \field{queue_enable}.
+
+\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+
+The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
+capability. This capability is immediately followed by an additional
+field, like so:
+
+\begin{lstlisting}
+struct virtio_pci_notify_cap {
+ struct virtio_pci_cap cap;
+ le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+\end{lstlisting}
+
+\field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
+derive the Queue Notify address within a BAR for a virtqueue:
+
+\begin{lstlisting}
+ cap.offset + queue_notify_off * notify_off_multiplier
+\end{lstlisting}
+
+The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
+notification capability structure above, and the \field{queue_notify_off} is
+taken from the common configuration structure.
+
+\begin{note}
+For example, if \field{notify_off_multiplier} is 0, the device uses
+the same Queue Notify address for all queues.
+\end{note}
+
+\devicenormative{\paragraph}{Notification capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+The device MUST present at least one notification capability.
+
+The \field{cap.offset} MUST be 2-byte aligned.
+
+The device MUST either present \field{notify_off_multiplier} as an even power of 2,
+or present \field{notify_off_multiplier} as 0.
+
+The value \field{cap.length} presented by the device MUST be at least 2
+and MUST be large enough to support queue notification offsets
+for all supported queues in all possible configurations.
+
+For all queues, the value \field{cap.length} presented by the device MUST satisfy:
+\begin{lstlisting}
+cap.length >= queue_notify_off * notify_off_multiplier + 2
+\end{lstlisting}
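+
+The address formula and the length constraint above can be checked together with a short C sketch. The function names are hypothetical; the arithmetic is widened to 64 bits so that the product cannot overflow.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the Queue Notify address within the BAR:
 * cap.offset + queue_notify_off * notify_off_multiplier */
static uint64_t queue_notify_offset(uint32_t cap_offset,
                                    uint16_t queue_notify_off,
                                    uint32_t notify_off_multiplier)
{
    return (uint64_t)cap_offset +
           (uint64_t)queue_notify_off * notify_off_multiplier;
}

/* Length constraint from the device requirements:
 * cap.length >= queue_notify_off * notify_off_multiplier + 2 */
static bool notify_length_ok(uint32_t cap_length,
                             uint16_t queue_notify_off,
                             uint32_t notify_off_multiplier)
{
    return (uint64_t)cap_length >=
           (uint64_t)queue_notify_off * notify_off_multiplier + 2;
}
```
+
+With a multiplier of 0 every queue resolves to \field{cap.offset}, which matches the shared-address case described in the note above.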
+
+\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The VIRTIO_PCI_CAP_ISR_CFG capability
+refers to at least a single byte, which contains the 8-bit ISR status field
+to be used for INT\#x interrupt handling.
+
+The \field{offset} for the \field{ISR status} has no alignment requirements.
+
+The ISR bits allow the driver to distinguish between device-specific configuration
+change interrupts and normal virtqueue interrupts:
+
+\begin{tabular}{ |l||l|l|l| }
+\hline
+Bits & 0 & 1 & 2 to 31 \\
+\hline
+Purpose & Queue Interrupt & Device Configuration Interrupt & Reserved \\
+\hline
+\end{tabular}
+
+To avoid an extra access, simply reading this register resets it to 0 and
+causes the device to de-assert the interrupt.
+
+See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
+
+\devicenormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.
+
+The device MUST set the Device Configuration Interrupt bit
+in \field{ISR status} before sending a device configuration
+change notification to the driver.
+
+If MSI-X capability is disabled, the device MUST set the Queue
+Interrupt bit in \field{ISR status} before sending a virtqueue
+notification to the driver.
+
+If MSI-X capability is disabled, the device MUST set the Interrupt Status
+bit in the PCI Status register in the PCI Configuration Header of
+the device to the logical OR of all bits in \field{ISR status} of
+the device. The device then asserts/deasserts INT\#x interrupts unless masked
+according to standard PCI rules \hyperref[intro:PCI]{[PCI]}.
+
+The device MUST reset \field{ISR status} to 0 on driver read.
+
+\drivernormative{\paragraph}{ISR status capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+If MSI-X capability is enabled, the driver SHOULD NOT access
+\field{ISR status} upon detecting a Queue Interrupt.
+
+\subsubsection{Device-specific configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
+
+The device MUST present at least one VIRTIO_PCI_CAP_DEVICE_CFG capability for
+any device type which has a device-specific configuration.
+
+\devicenormative{\paragraph}{Device-specific configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device-specific configuration}
+
+The \field{offset} for the device-specific configuration MUST be 4-byte aligned.
+
+\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The VIRTIO_PCI_CAP_PCI_CFG capability
+creates an alternative (and likely suboptimal) access method to the
+common configuration, notification, ISR and device-specific configuration regions.
+
+The capability is immediately followed by an additional field like so:
+
+\begin{lstlisting}
+struct virtio_pci_cfg_cap {
+ struct virtio_pci_cap cap;
+ u8 pci_cfg_data[4]; /* Data for BAR access. */
+};
+\end{lstlisting}
+
+The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} and
+\field{pci_cfg_data} are read-write (RW) for the driver.
+
+To access a device region, the driver writes into the capability
+structure (ie. within the PCI configuration space) as follows:
+
+\begin{itemize}
+\item The driver sets the BAR to access by writing to \field{cap.bar}.
+
+\item The driver sets the size of the access by writing 1, 2 or 4 to
+ \field{cap.length}.
+
+\item The driver sets the offset within the BAR by writing to
+ \field{cap.offset}.
+\end{itemize}
+
+At that point, \field{pci_cfg_data} will provide a window of size
+\field{cap.length} into the given \field{cap.bar} at offset \field{cap.offset}.
+
+\devicenormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.
+
+Upon detecting driver write access
+to \field{pci_cfg_data}, the device MUST execute a write access
+at offset \field{cap.offset} at BAR selected by \field{cap.bar} using the first \field{cap.length}
+bytes from \field{pci_cfg_data}.
+
+Upon detecting driver read access
+to \field{pci_cfg_data}, the device MUST
+execute a read access of length \field{cap.length} at offset \field{cap.offset}
+at BAR selected by \field{cap.bar} and store the first \field{cap.length} bytes in
+\field{pci_cfg_data}.
+
+\drivernormative{\paragraph}{PCI configuration access capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The driver MUST NOT write a \field{cap.offset} which is not
+a multiple of \field{cap.length} (ie. all accesses MUST be aligned).
+
+The driver MUST NOT read or write \field{pci_cfg_data}
+unless \field{cap.bar}, \field{cap.length} and \field{cap.offset}
+address \field{cap.length} bytes within a BAR range
+specified by some other Virtio Structure PCI Capability
+of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
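+
+The two driver requirements above amount to a check the driver can make before touching \field{pci_cfg_data}. The following C sketch is illustrative only: the function name is hypothetical, and region_offset and region_length are assumed to come from some other Virtio Structure PCI Capability that refers to the same BAR.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-flight check for an access through pci_cfg_data.
 * offset/length are the values the driver intends to write to
 * cap.offset/cap.length; region_* describe a region in the same BAR
 * advertised by another virtio capability. */
static bool pci_cfg_access_valid(uint32_t offset, uint32_t length,
                                 uint32_t region_offset,
                                 uint32_t region_length)
{
    if (length != 1 && length != 2 && length != 4)
        return false;               /* only 1, 2 or 4 byte accesses */
    if (offset % length != 0)
        return false;               /* accesses must be aligned */
    if (offset < region_offset)
        return false;               /* starts before the region */
    return (uint64_t)offset + length <=
           (uint64_t)region_offset + region_length;
}
```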
+
+\subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
+
+Transitional devices MUST present part of configuration
+registers in a legacy configuration structure in BAR0 in the first I/O
+region of the PCI device, as documented below.
+When using the legacy interface, transitional drivers
+MUST use the legacy configuration structure in BAR0 in the first
+I/O region of the PCI device, as documented below.
+
+When using the legacy interface the driver MAY access
+the device-specific configuration region using any width accesses, and
+a transitional device MUST present the driver with the same results as
+when accessed using the ``natural'' access method (i.e.
+32-bit accesses for 32-bit fields, etc).
+
+Note that this is possible because while the virtio common configuration structure is PCI
+(i.e. little) endian, when using the legacy interface the device-specific
+configuration region is encoded in the native endian of the guest (where such distinction is
+applicable).
+
+When used through the legacy interface, the virtio common configuration structure looks as follows:
+
+\begin{tabularx}{\textwidth}{ |X||X|X|X|X|X|X|X|X| }
+\hline
+ Bits & 32 & 32 & 32 & 16 & 16 & 16 & 8 & 8 \\
+\hline
+ Read / Write & R & R+W & R+W & R & R+W & R+W & R+W & R \\
+\hline
+ Purpose & Device Features bits 0:31 & Driver Features bits 0:31 &
+ Queue Address & \field{queue_size} & \field{queue_select} & Queue Notify &
+ Device Status & ISR \newline Status \\
+\hline
+\end{tabularx}
+
+If MSI-X is enabled for the device, two additional fields
+immediately follow this header:
+
+\begin{tabular}{ |l||l|l| }
+\hline
+Bits & 16 & 16 \\
+\hline
+Read/Write & R+W & R+W \\
+\hline
+Purpose (MSI-X) & \field{config_msix_vector} & \field{queue_msix_vector} \\
+\hline
+\end{tabular}
+
+Note: When MSI-X capability is enabled, device-specific configuration starts at
+byte offset 24 in the virtio common configuration structure. When MSI-X capability is not
+enabled, device-specific configuration starts at byte offset 20 in the virtio
+common configuration structure. That is, once you enable MSI-X on the device, the other fields move.
+If you turn it off again, they move back!
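+
+The offsets 20 and 24 follow directly from the field widths in the two tables above, as this small C sketch shows. The function name is hypothetical.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of the device-specific configuration
 * in the legacy layout, derived from the field widths in the tables. */
static unsigned legacy_device_cfg_offset(bool msix_enabled)
{
    /* Widths in bits, in table order: device features, driver features,
     * queue address, queue_size, queue_select, queue notify,
     * device status, ISR status. */
    static const unsigned header_bits[] = { 32, 32, 32, 16, 16, 16, 8, 8 };
    /* config_msix_vector and queue_msix_vector. */
    static const unsigned msix_bits[] = { 16, 16 };
    unsigned bits = 0;
    size_t i;

    for (i = 0; i < sizeof(header_bits) / sizeof(header_bits[0]); i++)
        bits += header_bits[i];
    if (msix_enabled)
        for (i = 0; i < sizeof(msix_bits) / sizeof(msix_bits[0]); i++)
            bits += msix_bits[i];
    return bits / 8;
}
```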
+
+Any device-specific configuration space immediately follows
+these general headers:
+
+\begin{tabular}{|l||l|l|}
+\hline
+Bits & Device Specific & \multirow{3}{*}{\ldots} \\
+\cline{1-2}
+Read / Write & Device Specific & \\
+\cline{1-2}
+Purpose & Device Specific & \\
+\hline
+\end{tabular}
+
+When accessing the device-specific configuration space
+using the legacy interface, transitional
+drivers MUST access the device-specific configuration space
+at an offset immediately following the general headers.
+
+When using the legacy interface, transitional
+devices MUST present the device-specific configuration space
+if any at an offset immediately following the general headers.
+
+Note that only Feature Bits 0 to 31 are accessible through the
+Legacy Interface. When used through the Legacy Interface,
+Transitional Devices MUST assume that Feature Bits 32 to 63
+are not acknowledged by Driver.
+
+As legacy devices had no \field{config_generation} field,
+see \ref{sec:Basic Facilities of a Virtio Device / Device
+Configuration Space / Legacy Interface: Device Configuration
+Space}~\nameref{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space} for workarounds.
+
+\subsubsection{Non-transitional Device With Legacy Driver: A Note
+on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio
+Over PCI Bus / PCI Device Layout / Non-transitional Device With
+Legacy Driver: A Note on PCI Device Layout}
+
+All known legacy drivers check either the PCI Revision or the
+Device and Vendor IDs, and thus won't attempt to drive a
+non-transitional device.
+
+A buggy legacy driver might mistakenly attempt to drive a
+non-transitional device. If support for such drivers is required
+(as opposed to fixing the bug), the following would be the
+recommended way to detect and handle them.
+\begin{note}
+Such buggy drivers are not currently known to be used in
+production.
+\end{note}
+
+\subparagraph{
+\DIFdeltextcstwo{Driver Requirements: Non-transitional Device With Legacy Driver}
+\DIFaddtextcstwo{Device Requirements: Non-transitional Device With Legacy Driver}
+}
+\label{drivernormative:Virtio Transport Options / Virtio Over PCI
+Bus / PCI-specific Initialization And Device Operation /
+Device Initialization / Non-transitional Device With Legacy
+Driver}
+\label{devicenormative:Virtio Transport Options / Virtio Over PCI
+Bus / PCI-specific Initialization And Device Operation /
+Device Initialization / Non-transitional Device With Legacy
+Driver}
+
+Non-transitional devices, on a platform where a legacy driver for
+a legacy device with the same ID (including PCI Revision, Device
+and Vendor IDs) is known to have previously existed,
+SHOULD take the following steps to cause the legacy driver to
+fail gracefully when it attempts to drive them:
+
+\begin{enumerate}
+\item Present an I/O BAR in BAR0, and
+\item Respond to a single-byte zero write to offset 18
+ (corresponding to Device Status register in the legacy layout)
+ of BAR0 by presenting zeroes on every BAR and ignoring writes.
+\end{enumerate}
+
+\subsection{PCI-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation}
+
+\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization}
+
+This documents PCI-specific steps executed during Device Initialization.
+
+\paragraph{Virtio Device Configuration Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection}
+
+As a prerequisite to device initialization, the driver scans the
+PCI capability list, detecting virtio configuration layout using Virtio
+Structure PCI capabilities as detailed in \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}.
+
+\subparagraph{Legacy Interface: A Note on Device Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection / Legacy Interface: A Note on Device Layout Detection}
+
+Legacy drivers skipped the Device Layout Detection step, assuming legacy
+device configuration space in BAR0 in I/O space unconditionally.
+
+Legacy devices did not have the Virtio PCI Capability in their
+capability list.
+
+Therefore:
+
+Transitional devices MUST expose the Legacy Interface in I/O
+space in BAR0.
+
+Transitional drivers MUST look for the Virtio PCI
+Capabilities on the capability list.
+If these are not present, the driver MUST assume a legacy device,
+and use it through the legacy interface.
+
+Non-transitional drivers MUST look for the Virtio PCI
+Capabilities on the capability list.
+If these are not present, the driver MUST assume a legacy device,
+and fail gracefully.
+
+\paragraph{MSI-X Vector Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+When MSI-X capability is present and enabled in the device
+(through standard PCI configuration space) \field{config_msix_vector} and \field{queue_msix_vector} are used to map configuration change and queue
+interrupts to MSI-X vectors. In this case, the ISR Status is unused.
+
+Writing a valid MSI-X Table entry number, 0 to 0x7FF, to
+\field{config_msix_vector}/\field{queue_msix_vector} maps interrupts triggered
+by the configuration change/selected queue events respectively to
+the corresponding MSI-X vector. To disable interrupts for an
+event type, the driver unmaps this event by writing a special NO_VECTOR
+value:
+
+\begin{lstlisting}
+/* Vector value used to disable MSI for queue */
+#define VIRTIO_MSI_NO_VECTOR 0xffff
+\end{lstlisting}
+
+Note that mapping an event to a vector might require the device to
+allocate internal device resources, and thus could fail.
+
+\devicenormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+A device that has an MSI-X capability SHOULD support at least 2
+and at most 0x800 MSI-X vectors.
+Device MUST report the number of vectors supported in
+\field{Table Size} in the MSI-X Capability as specified in
+\hyperref[intro:PCI]{[PCI]}.
+The device SHOULD restrict the reported MSI-X Table Size field
+to a value that might benefit system performance.
+\begin{note}
+For example, a device which does not expect to send
+interrupts at a high rate might only specify 2 MSI-X vectors.
+\end{note}
+Device MUST support mapping any event type to any valid
+vector 0 to MSI-X \field{Table Size}.
+Device MUST support unmapping any event type.
+
+The device MUST return the vector mapped to a given event
+(NO_VECTOR if unmapped) on read of \field{config_msix_vector}/\field{queue_msix_vector}.
+The device MUST have all queue and configuration change
+events unmapped upon reset.
+
+Devices SHOULD NOT cause mapping an event to vector to fail
+unless it is impossible for the device to satisfy the mapping
+request. Devices MUST report mapping
+failures by returning the NO_VECTOR value when the relevant
+\field{config_msix_vector}/\field{queue_msix_vector} field is read.
+
+\drivernormative{\subparagraph}{MSI-X Vector Configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+Driver MUST support device with any MSI-X Table Size 0 to 0x7FF.
+Driver MAY fall back on using INT\#x interrupts for a device
+which only supports one MSI-X vector (MSI-X Table Size = 0).
+
+Driver MAY interpret the Table Size as a hint from the device
+for the suggested number of MSI-X vectors to use.
+
+Driver MUST NOT attempt to map an event to a vector
+outside the MSI-X Table supported by the device,
+as reported by \field{Table Size} in the MSI-X Capability.
+
+After mapping an event to vector, the
+driver MUST verify success by reading the Vector field value: on
+success, the previously written value is returned, and on
+failure, NO_VECTOR is returned. If a mapping failure is detected,
+the driver MAY retry mapping with fewer vectors, disable MSI-X
+or report device failure.
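+
+The write-then-read-back verification described above might look like the following C sketch. It runs against a simulated register rather than real hardware: the structure fake_dev, its can_allocate flag and both function names are assumptions made for illustration, not part of any real driver API.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vector value used to disable MSI for queue (from the text above). */
#define VIRTIO_MSI_NO_VECTOR 0xffff

/* Hypothetical stand-in for a device's queue_msix_vector register. */
struct fake_dev {
    bool     can_allocate;      /* assumption: models resource exhaustion */
    uint16_t queue_msix_vector; /* value returned on read */
};

/* Simulated device side: a failed mapping reads back as NO_VECTOR. */
static void fake_dev_write_vector(struct fake_dev *d, uint16_t vector)
{
    if (vector == VIRTIO_MSI_NO_VECTOR || !d->can_allocate)
        d->queue_msix_vector = VIRTIO_MSI_NO_VECTOR;
    else
        d->queue_msix_vector = vector;
}

/* Driver side: write the vector, then verify by reading it back. */
static bool map_queue_vector(struct fake_dev *d, uint16_t vector)
{
    fake_dev_write_vector(d, vector);
    return vector != VIRTIO_MSI_NO_VECTOR &&
           d->queue_msix_vector == vector;
}
```
+
+On a false return a driver could retry with fewer vectors, disable MSI-X or report device failure, as permitted above.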
+
+\paragraph{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration}
+
+As a device can have zero or more virtqueues for bulk data
+transport\footnote{For example, the simplest network device has two virtqueues.}, the driver
+needs to configure them as part of the device-specific
+configuration.
+
+The driver typically does this as follows, for each virtqueue a device has:
+
+\begin{enumerate}
+\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
+
+\item Read the virtqueue size from \field{queue_size}. This controls how big the virtqueue is
+ (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist.
+
+\item Optionally, select a smaller virtqueue size and write it to \field{queue_size}.
+
+\item Allocate and zero Descriptor Table, Available and Used rings for the
+ virtqueue in contiguous physical memory.
+
+\item Optionally, if MSI-X capability is present and enabled on the
+ device, select a vector to use to request interrupts triggered
+ by virtqueue events. Write the MSI-X Table entry number
+ corresponding to this vector into \field{queue_msix_vector}. Read
+ \field{queue_msix_vector}: on success, previously written value is
+ returned; on failure, NO_VECTOR value is returned.
+\end{enumerate}
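+
+For the optional step of selecting a smaller virtqueue size, recall that the driver must not write a value which is not a power of 2 to \field{queue_size}. One possible selection policy, sketched in C under the assumption that the driver has some preferred upper bound (the name pick_queue_size and the parameter wanted are hypothetical), is:
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy: largest power of 2 that does not exceed either
 * the size read from queue_size or the driver's own preferred bound.
 * Returns 0 if the queue does not exist (device_max == 0) or if the
 * driver asked for nothing (wanted == 0). */
static uint16_t pick_queue_size(uint16_t device_max, uint16_t wanted)
{
    uint16_t limit = device_max < wanted ? device_max : wanted;
    uint16_t size = 1;

    if (limit == 0)
        return 0;
    /* Widen before doubling so the comparison cannot overflow. */
    while ((uint32_t)size * 2 <= limit)
        size = (uint16_t)(size * 2);
    return size;
}
```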
+
+\subparagraph{Legacy Interface: A Note on Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration / Legacy Interface: A Note on Virtqueue Configuration}
+When using the legacy interface, the queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout} with an alignment of 4096.
+The driver writes the physical address, divided
+by 4096, to the Queue Address field\footnote{The 4096 is based on the x86 page size, but it's also large
+enough to ensure that the separate parts of the virtqueue are on
+separate cache lines.
+}. There was no mechanism to negotiate the queue size.
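+
+As a minimal sketch of the legacy Queue Address computation, the following C helper converts a physical address into the value written to the 32-bit Queue Address field. The helper name is hypothetical, and rejecting unaligned or out-of-range addresses is a choice made for this sketch, not a rule stated above.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: physical address divided by 4096, as written to
 * the legacy Queue Address field. Fails if the address is not aligned
 * to 4096 or if the result does not fit in the 32-bit field. */
static bool legacy_queue_address(uint64_t phys, uint32_t *out)
{
    if (phys % 4096 != 0)
        return false;
    if (phys / 4096 > UINT32_MAX)
        return false;
    *out = (uint32_t)(phys / 4096);
    return true;
}
```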
+
+\subsubsection{Notifying The Device}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notifying The Device}
+
+The driver notifies the device by writing the 16-bit virtqueue index
+of this virtqueue to the Queue Notify address. See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} for how to calculate this address.
+
+\subsubsection{Virtqueue Interrupts From The Device}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device}
+
+If an interrupt is necessary for a virtqueue, the device would typically act as follows:
+
+\begin{itemize}
+ \item If MSI-X capability is disabled:
+ \begin{enumerate}
+ \item Set the lower bit of the ISR Status field for the device.
+
+ \item Send the appropriate PCI interrupt for the device.
+ \end{enumerate}
+
+ \item If MSI-X capability is enabled:
+ \begin{enumerate}
+ \item If \field{queue_msix_vector} is not NO_VECTOR,
+ request the appropriate MSI-X interrupt message for the
+ device; \field{queue_msix_vector} sets the MSI-X Table entry
+ number.
+ \end{enumerate}
+\end{itemize}
+
+\devicenormative{\paragraph}{Virtqueue Interrupts From The Device}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device}
+
+If MSI-X capability is enabled and \field{queue_msix_vector} is
+NO_VECTOR for a virtqueue, the device MUST NOT deliver an interrupt
+for that virtqueue.
+
+\subsubsection{Notification of Device Configuration Changes}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+Some virtio PCI devices can change the device configuration
+state, as reflected in the device-specific configuration region of the device. In this case:
+
+\begin{itemize}
+ \item If MSI-X capability is disabled:
+ \begin{enumerate}
+ \item Set the second lower bit of the ISR Status field for the device.
+
+ \item Send the appropriate PCI interrupt for the device.
+ \end{enumerate}
+
+ \item If MSI-X capability is enabled:
+ \begin{enumerate}
+ \item If \field{config_msix_vector} is not NO_VECTOR,
+ request the appropriate MSI-X interrupt message for the
+ device; \field{config_msix_vector} sets the MSI-X Table entry
+ number.
+ \end{enumerate}
+\end{itemize}
+
+A single interrupt MAY indicate both that one or more virtqueues have
+been used and that the configuration space has changed.
+
+\devicenormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+If MSI-X capability is enabled and \field{config_msix_vector} is
+NO_VECTOR, the device MUST NOT deliver an interrupt
+for device configuration space changes.
+
+\drivernormative{\paragraph}{Notification of Device Configuration Changes}{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+A driver MUST handle the case where the same interrupt is used to indicate
+both device configuration space change and one or more virtqueues being used.
+
+\subsubsection{Driver Handling Interrupts}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Driver Handling Interrupts}
+The driver interrupt handler would typically:
+
+\begin{itemize}
+ \item If MSI-X capability is disabled:
+ \begin{itemize}
+ \item Read the ISR Status field, which will reset it to zero.
+ \item If the lower bit is set:
+ look through the used rings of all virtqueues for the
+ device, to see if any progress has been made by the device
+ which requires servicing.
+ \item If the second lower bit is set:
+ re-examine the configuration space to see what changed.
+ \end{itemize}
+ \item If MSI-X capability is enabled:
+ \begin{itemize}
+ \item
+ Look through the used rings of
+ all virtqueues mapped to that MSI-X vector for the
+ device, to see if any progress has been made by the device
+ which requires servicing.
+ \item
+ If the MSI-X vector is equal to \field{config_msix_vector},
+ re-examine the configuration space to see what changed.
+ \end{itemize}
+\end{itemize}
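+
+For the case where MSI-X capability is disabled, the handler steps above can be sketched in C as follows. The ISR Status register is simulated by a plain byte, and the macro, structure and function names are hypothetical; the bit positions are those given for \field{ISR status} earlier.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISR_QUEUE_INTERRUPT  (1u << 0) /* bit 0: Queue Interrupt */
#define ISR_CONFIG_INTERRUPT (1u << 1) /* bit 1: Device Configuration Interrupt */

/* What the handler has to do after decoding the ISR Status field. */
struct isr_work {
    bool scan_used_rings;
    bool reread_config;
};

/* Simulated register read: reading returns the value and resets it to 0. */
static uint8_t isr_read_and_clear(uint8_t *isr_status)
{
    uint8_t v = *isr_status;
    *isr_status = 0;
    return v;
}

/* Handler for the INT#x case: read once, then act on both bits. */
static struct isr_work handle_intx(uint8_t *isr_status)
{
    uint8_t isr = isr_read_and_clear(isr_status);
    struct isr_work w;

    w.scan_used_rings = (isr & ISR_QUEUE_INTERRUPT) != 0;
    w.reread_config   = (isr & ISR_CONFIG_INTERRUPT) != 0;
    return w;
}
```
+
+Because the read clears the register, the value has to be captured once and both bits tested on the captured copy; a second read would return 0.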
+
+\section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}
+
+Virtual environments without PCI support (a common situation in
+embedded device models) might use a simple memory mapped device
+(``virtio-mmio'') instead of the PCI device.
+
+The memory mapped virtio device behaviour is based on the PCI
+device specification. Therefore most operations including device
+initialization, queues configuration and buffer transfers are
+nearly identical. Existing differences are described in the
+following sections.
+
+\subsection{MMIO Device Discovery}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Discovery}
+
+Unlike PCI, MMIO provides no generic device discovery mechanism. For each
+device, the guest OS will need to know the location of the registers
+and interrupt(s) used. The suggested binding for systems using
+flattened device trees is shown in this example:
+
+\begin{lstlisting}
+// EXAMPLE: virtio_block device taking 512 bytes at 0x1e000, interrupt 42.
+virtio_block@1e000 {
+ compatible = "virtio,mmio";
+ reg = <0x1e000 0x200>;
+ interrupts = <42>;
+};
+\end{lstlisting}
+
+\subsection{MMIO Device Register Layout}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+
+MMIO virtio devices provide a set of memory mapped control
+registers followed by a device-specific configuration space,
+described in the table~\ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+
+All register values are organized as Little Endian.
+
+\newcommand{\mmioreg}[5]{% Name Function Offset Direction Description
+ {\field{#1}} \newline #3 \newline #4 & {\bf#2} \newline #5 \\
+}
+
+\newcommand{\mmiodreg}[7]{% NameHigh NameLow Function OffsetHigh OffsetLow Direction Description
+ {\field{#1}} \newline #4 \newline {\field{#2}} \newline #5 \newline #6 & {\bf#3} \newline #7 \\
+}
+
+\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
+ \caption {MMIO Device Register Layout}
+ \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout} \\
+ \hline
+ \mmioreg{Name}{Function}{Offset from base}{Direction}{Description}
+ \hline
+ \hline
+ \endfirsthead
+ \hline
+ \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description}
+ \hline
+ \hline
+ \endhead
+ \endfoot
+ \endlastfoot
+ \mmioreg{MagicValue}{Magic value}{0x000}{R}{%
+ 0x74726976
+ (a Little Endian equivalent of the ``virt'' string).
+ }
+ \hline
+ \mmioreg{Version}{Device version number}{0x004}{R}{%
+ 0x2.
+ \begin{note}
+ Legacy devices (see \ref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}) used 0x1.
+ \end{note}
+ }
+ \hline
+ \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{%
+ See \ref{sec:Device Types}~\nameref{sec:Device Types} for possible values.
+ Value zero (0x0) is used to
+ define a system memory map with placeholder devices at static,
+ well known addresses, assigning functions to them depending
+ on user's needs.
+ }
+ \hline
+ \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
+ \hline
+ \mmioreg{DeviceFeatures}{Flags representing features the device supports}{0x010}{R}{%
+ Reading from this register returns 32 consecutive flag bits,
+ the least significant bit depending on the last value written to
+ \field{DeviceFeaturesSel}. Access to this register returns
+ bits $\field{DeviceFeaturesSel}*32$ to $(\field{DeviceFeaturesSel}*32)+31$, eg.
+ feature bits 0 to 31 if \field{DeviceFeaturesSel} is set to 0 and
+ features bits 32 to 63 if \field{DeviceFeaturesSel} is set to 1.
+ Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
+ }
+ \hline
+ \mmioreg{DeviceFeaturesSel}{Device (host) features word selection.}{0x014}{W}{%
+ Writing to this register selects a set of 32 device feature bits
+ accessible by reading from \field{DeviceFeatures}.
+ }
+ \hline
+ \mmioreg{DriverFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{%
+ Writing to this register sets 32 consecutive flag bits, the least significant
+ bit depending on the last value written to \field{DriverFeaturesSel}.
+ Access to this register sets bits $\field{DriverFeaturesSel}*32$
+ to $(\field{DriverFeaturesSel}*32)+31$, eg. feature bits 0 to 31 if
+ \field{DriverFeaturesSel} is set to 0 and features bits 32 to 63 if
+ \field{DriverFeaturesSel} is set to 1. Also see \ref{sec:Basic Facilities of a Virtio Device / Feature Bits}~\nameref{sec:Basic Facilities of a Virtio Device / Feature Bits}.
+ }
+ \hline
+ \mmioreg{DriverFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{%
+ Writing to this register selects a set of 32 activated feature
+ bits accessible by writing to \field{DriverFeatures}.
+ }
+ \hline
+ \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+ Writing to this register selects the virtual queue that the
+ following operations on \field{QueueNumMax}, \field{QueueNum}, \field{QueueReady},
+ \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueAvailLow}, \field{QueueAvailHigh},
+ \field{QueueUsedLow} and \field{QueueUsedHigh} apply to. The index
+ number of the first queue is zero (0x0).
+ }
+ \hline
+ \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+ Reading from the register returns the maximum size (number of
+ elements) of the queue the device is ready to process or
+ zero (0x0) if the queue is not available. This applies to the
+ queue selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+ Queue size is the number of elements in the queue, therefore in each
+ of the Descriptor Table, the Available Ring and the Used Ring.
+ Writing to this register notifies the device what size of the
+ queue the driver will use. This applies to the queue selected by
+ writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
+ Writing one (0x1) to this register notifies the device that it can
+ execute requests from this virtual queue. Reading from this register
+ returns the last value written to it. Both read and write
+ accesses apply to the queue selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{%
+ Writing a queue index to this register notifies the device that
+ there are new buffers to process in the queue.
+ }
+ \hline
+ \mmioreg{InterruptStatus}{Interrupt status}{0x060}{R}{%
+ Reading from this register returns a bit mask of events that
+ caused the device interrupt to be asserted.
+ The following events are possible:
+ \begin{description}
+ \item[Used Ring Update] - bit 0 - the interrupt was asserted
+ because the device has updated the Used
+ Ring in at least one of the active virtual queues.
+ \item [Configuration Change] - bit 1 - the interrupt was
+ asserted because the configuration of the device has changed.
+ \end{description}
+ }
+ \hline
+ \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{%
+ Writing a value with bits set as defined in \field{InterruptStatus}
+ to this register notifies the device that events causing
+ the interrupt have been handled.
+ }
+ \hline
+ \mmioreg{Status}{Device status}{0x070}{RW}{%
+ Reading from this register returns the current device status
+ flags.
+ Writing non-zero values to this register sets the status flags,
+ indicating the driver progress. Writing zero (0x0) to this
+ register triggers a device reset.
+ See also \ref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}.
+ }
+ \hline
+ \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Table 64 bit long physical address}{0x080}{0x084}{W}{%
+ Writing to these two registers (lower 32 bits of the address
+ to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
+ the device about the location of the Descriptor Table of the queue
+ selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmiodreg{QueueAvailLow}{QueueAvailHigh}{Virtual queue's Available Ring 64 bit long physical address}{0x090}{0x094}{W}{%
+ Writing to these two registers (lower 32 bits of the address
+ to \field{QueueAvailLow}, higher 32 bits to \field{QueueAvailHigh}) notifies
+ the device about the location of the Available Ring of the queue
+ selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmiodreg{QueueUsedLow}{QueueUsedHigh}{Virtual queue's Used Ring 64 bit long physical address}{0x0a0}{0x0a4}{W}{%
+ Writing to these two registers (lower 32 bits of the address
+ to \field{QueueUsedLow}, higher 32 bits to \field{QueueUsedHigh}) notifies
+ the device about the location of the Used Ring of the queue
+ selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
+ Reading from this register returns a value describing a version of the device-specific configuration space (see \field{Config}).
+ The driver can then access the configuration space and, when finished, read \field{ConfigGeneration} again.
+ If no part of the configuration space has changed between these two \field{ConfigGeneration} reads, the returned values are identical.
+ If the values are different, the configuration space accesses were not atomic and the driver has to perform the operations again.
+ See also \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
+ }
+ \hline
+ \mmioreg{Config}{Configuration space}{0x100+}{RW}{
+ Device-specific configuration space starts at the offset 0x100
+ and is accessed with byte alignment. Its meaning and size
+ depend on the device and the driver.
+ }
+ \hline
+\end{longtable}
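+
+The \field{ConfigGeneration} protocol described in the table above can be
+sketched in C as shown below. This is only an illustration under stated
+assumptions: the register window is taken to be already mapped and reachable
+through a volatile pointer on a little-endian host, only 8 bit wide
+configuration fields are copied (wider fields need accesses of matching
+width), and the function name virtio_mmio_read_config and its retry limit
+are hypothetical rather than part of this specification.
+
```c
#include <stddef.h>
#include <stdint.h>

#define MMIO_CONFIG_GENERATION 0x0fc /* ConfigGeneration, read-only */
#define MMIO_CONFIG            0x100 /* start of device-specific config */

/*
 * Copy len bytes of configuration space, starting at offset off, into out.
 * ConfigGeneration is read before and after the copy; if the two values
 * differ, the copy was not atomic and is repeated.
 * Returns 0 on success, -1 if no stable snapshot was seen in max_tries.
 */
int virtio_mmio_read_config(volatile uint32_t *regs, size_t off,
                            uint8_t *out, size_t len, int max_tries)
{
    volatile uint8_t *cfg = (volatile uint8_t *)regs + MMIO_CONFIG + off;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        uint32_t before = regs[MMIO_CONFIG_GENERATION / 4];

        for (size_t i = 0; i < len; i++)
            out[i] = cfg[i];          /* 8 bit wide accesses */

        uint32_t after = regs[MMIO_CONFIG_GENERATION / 4];
        if (before == after)
            return 0;                 /* snapshot is consistent */
    }
    return -1;
}
```
+
+A bounded retry count is a design choice of this sketch; a driver could
+equally loop until the two reads agree.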
+
+\devicenormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+
+The device MUST return 0x74726976 in \field{MagicValue}.
+
+The device MUST return value 0x2 in \field{Version}.
+
+The device MUST present each event by setting the corresponding bit in \field{InterruptStatus} from the
+moment it takes place, until the driver acknowledges the interrupt
+by writing a corresponding bit mask to the \field{InterruptACK} register. Bits which
+do not represent events which took place MUST be zero.
+
+Upon reset, the device MUST clear all bits in \field{InterruptStatus} and ready bits in the
+\field{QueueReady} register for all queues in the device.
+
+The device MUST change value returned in \field{ConfigGeneration} if there is any risk of a
+driver seeing an inconsistent configuration state.
+
+The device MUST NOT access virtual queue contents when \field{QueueReady} is zero (0x0).
+
+\drivernormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
+The driver MUST NOT access memory locations not described in the
+table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}
+(or, in case of the configuration space, described in the device specification),
+MUST NOT write to the read-only registers (direction R) and
+MUST NOT read from the write-only registers (direction W).
+
+The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers
+described in table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}.
+For the device-specific configuration space, the driver MUST use 8 bit wide accesses for
+8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and
+aligned accesses for 32 and 64 bit wide fields.
+
+The driver MUST ignore a device with \field{MagicValue} which is not 0x74726976,
+although it MAY report an error.
+
+The driver MUST ignore a device with \field{Version} which is not 0x2,
+although it MAY report an error.
+
+The driver MUST ignore a device with \field{DeviceID} 0x0,
+but MUST NOT report any error.
+
+Before reading from \field{DeviceFeatures}, the driver MUST write a value to \field{DeviceFeaturesSel}.
+
+Before writing to the \field{DriverFeatures} register, the driver MUST write a value to the \field{DriverFeaturesSel} register.
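+
+As an illustration of the select-then-access rule, the following C sketch
+reads feature bits 0 to 63 by writing each word index to
+\field{DeviceFeaturesSel} before reading \field{DeviceFeatures}. The
+accessor structure mmio_ops and the toy device model fake_dev are
+hypothetical and exist only to make the example self-contained; a real
+driver would use the MMIO helpers of its platform.
+
```c
#include <stdint.h>

#define MMIO_DEVICE_FEATURES     0x010
#define MMIO_DEVICE_FEATURES_SEL 0x014

/* Hypothetical 32 bit register accessors. */
struct mmio_ops {
    uint32_t (*read32)(void *ctx, uint32_t off);
    void (*write32)(void *ctx, uint32_t off, uint32_t val);
    void *ctx;
};

/* Read feature bits 0..63: select each 32 bit word, then read it. */
uint64_t virtio_mmio_device_features(const struct mmio_ops *ops)
{
    uint64_t features = 0;

    for (uint32_t word = 0; word < 2; word++) {
        ops->write32(ops->ctx, MMIO_DEVICE_FEATURES_SEL, word);
        features |= (uint64_t)ops->read32(ops->ctx, MMIO_DEVICE_FEATURES)
                    << (32 * word);
    }
    return features;
}

/* Toy device model, used only to exercise the function above. */
struct fake_dev {
    uint64_t features;
    uint32_t sel;
};

uint32_t fake_read32(void *ctx, uint32_t off)
{
    struct fake_dev *d = ctx;

    if (off == MMIO_DEVICE_FEATURES && d->sel < 2)
        return (uint32_t)(d->features >> (32 * d->sel));
    return 0;
}

void fake_write32(void *ctx, uint32_t off, uint32_t val)
{
    struct fake_dev *d = ctx;

    if (off == MMIO_DEVICE_FEATURES_SEL)
        d->sel = val;
}
```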
+
+The driver MUST write a value to \field{QueueNum} which is less than
+or equal to the value presented by the device in \field{QueueNumMax}.
+
+When \field{QueueReady} is not zero, the driver MUST NOT access
+\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
+\field{QueueAvailLow}, \field{QueueAvailHigh}, \field{QueueUsedLow}, \field{QueueUsedHigh}.
+
+To stop using the queue the driver MUST write zero (0x0) to
+\field{QueueReady} and MUST read the value back to ensure
+synchronization.
+
+The driver MUST ignore undefined bits in \field{InterruptStatus}.
+
+The driver MUST write a value with a bit mask describing events it handled into \field{InterruptACK} when
+it finishes handling an interrupt and MUST NOT set any of the undefined bits in the value.
+
+\subsection{MMIO-specific Initialization And Device Operation}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation}
+
+\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
+
+\drivernormative{\paragraph}{Device Initialization}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Device Initialization}
+
+The driver MUST start the device initialization by reading and
+checking values from \field{MagicValue} and \field{Version}.
+If both values are valid, it MUST read \field{DeviceID}
+and if its value is zero (0x0) MUST abort initialization and
+MUST NOT access any other register.
+
+Further initialization MUST follow the procedure described in
+\ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
+
+\subsubsection{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Virtqueue Configuration}
+
+The driver will typically initialize the virtual queue in the following way:
+
+\begin{enumerate}
+\item Select the queue writing its index (first queue is 0) to
+ \field{QueueSel}.
+
+\item Check if the queue is not already in use: read \field{QueueReady},
+ and expect a returned value of zero (0x0).
+
+\item Read maximum queue size (number of elements) from
+ \field{QueueNumMax}. If the returned value is zero (0x0) the
+ queue is not available.
+
+\item Allocate and zero the queue pages, making sure the memory
+ is physically contiguous. It is recommended to align the
+ Used Ring to an optimal boundary (usually the page size).
+
+\item Notify the device about the queue size by writing the size to
+ \field{QueueNum}.
+
+\item Write physical addresses of the queue's Descriptor Table,
+ Available Ring and Used Ring to (respectively) the
+ \field{QueueDescLow}/\field{QueueDescHigh},
+ \field{QueueAvailLow}/\field{QueueAvailHigh} and
+ \field{QueueUsedLow}/\field{QueueUsedHigh} register pairs.
+
+\item Write 0x1 to \field{QueueReady}.
+\end{enumerate}
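+
+The steps above can be sketched in C as follows. The sketch assumes that
+the register window is already mapped, that the caller has allocated and
+zeroed the queue memory (step 4) and passes its physical addresses, and
+that the requested size is acceptable for the queue format in use. The
+function name virtio_mmio_setup_queue is hypothetical.
+
```c
#include <stdint.h>

enum {
    MMIO_QUEUE_SEL       = 0x030,
    MMIO_QUEUE_NUM_MAX   = 0x034,
    MMIO_QUEUE_NUM       = 0x038,
    MMIO_QUEUE_READY     = 0x044,
    MMIO_QUEUE_DESC_LOW  = 0x080, /* QueueDescHigh follows at 0x084 */
    MMIO_QUEUE_AVAIL_LOW = 0x090, /* QueueAvailHigh follows at 0x094 */
    MMIO_QUEUE_USED_LOW  = 0x0a0  /* QueueUsedHigh follows at 0x0a4 */
};

/* Write a 64 bit address to a Low/High register pair. */
static void write_pair(volatile uint32_t *regs, uint32_t low_off, uint64_t addr)
{
    regs[low_off / 4]     = (uint32_t)addr;
    regs[low_off / 4 + 1] = (uint32_t)(addr >> 32);
}

/*
 * Returns the queue size that was programmed, or 0 if the queue is
 * already in use, not available, or wanted is zero.
 */
uint32_t virtio_mmio_setup_queue(volatile uint32_t *regs, uint32_t index,
                                 uint32_t wanted, uint64_t desc,
                                 uint64_t avail, uint64_t used)
{
    regs[MMIO_QUEUE_SEL / 4] = index;               /* step 1 */

    if (regs[MMIO_QUEUE_READY / 4] != 0)            /* step 2 */
        return 0;

    uint32_t max = regs[MMIO_QUEUE_NUM_MAX / 4];    /* step 3 */
    if (max == 0 || wanted == 0)
        return 0;

    uint32_t num = wanted < max ? wanted : max;     /* never exceed the maximum */
    regs[MMIO_QUEUE_NUM / 4] = num;                 /* step 5 */

    write_pair(regs, MMIO_QUEUE_DESC_LOW, desc);    /* step 6 */
    write_pair(regs, MMIO_QUEUE_AVAIL_LOW, avail);
    write_pair(regs, MMIO_QUEUE_USED_LOW, used);

    regs[MMIO_QUEUE_READY / 4] = 1;                 /* step 7 */
    return num;
}
```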
+
+\subsubsection{Notifying The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifying The Device}
+
+The driver notifies the device about new buffers being available in
+a queue by writing the index of the updated queue to \field{QueueNotify}.
+
+\subsubsection{Notifications From The Device}\label{sec:Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
+
+The memory mapped virtio device is using a single, dedicated
+interrupt signal, which is asserted when at least one of the
+bits described in the description of \field{InterruptStatus}
+is set. This is how the device notifies the
+driver about a new used buffer being available in the queue
+or about a change in the device configuration.
+
+\drivernormative{\paragraph}{Notifications From The Device}{Virtio Transport Options / Virtio Over MMIO / MMIO-specific Initialization And Device Operation / Notifications From The Device}
+After receiving an interrupt, the driver MUST read
+\field{InterruptStatus} to check what caused the interrupt
+(see the register description). After the interrupt is handled,
+the driver MUST acknowledge it by writing a bit mask
+corresponding to the handled events to \field{InterruptACK}.
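+
+A minimal C sketch of this sequence is given below. It assumes an already
+mapped register window; the callback structure irq_handlers and the
+function name virtio_mmio_handle_irq are hypothetical. Undefined bits in
+\field{InterruptStatus} are masked off, so they are neither handled nor
+written back to \field{InterruptACK}.
+
```c
#include <stdint.h>

#define MMIO_INTERRUPT_STATUS 0x060
#define MMIO_INTERRUPT_ACK    0x064

#define INT_USED_RING     (1u << 0) /* Used Ring Update */
#define INT_CONFIG_CHANGE (1u << 1) /* Configuration Change */

/* Hypothetical per-event callbacks; either may be NULL. */
struct irq_handlers {
    void (*used_ring)(void *ctx);
    void (*config_change)(void *ctx);
    void *ctx;
};

/* Returns the bit mask of events that were handled and acknowledged. */
uint32_t virtio_mmio_handle_irq(volatile uint32_t *regs,
                                const struct irq_handlers *h)
{
    /* Read the cause and ignore bits this driver does not know about. */
    uint32_t status = regs[MMIO_INTERRUPT_STATUS / 4] &
                      (INT_USED_RING | INT_CONFIG_CHANGE);

    if ((status & INT_USED_RING) && h->used_ring)
        h->used_ring(h->ctx);
    if ((status & INT_CONFIG_CHANGE) && h->config_change)
        h->config_change(h->ctx);

    /* Acknowledge exactly the events handled above. */
    regs[MMIO_INTERRUPT_ACK / 4] = status;
    return status;
}
```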
+
+\subsection{Legacy interface}\label{sec:Virtio Transport Options / Virtio Over MMIO / Legacy interface}
+
+The legacy MMIO transport used page-based addressing, resulting
+in a slightly different control register layout, device
+initialization and virtual queue configuration procedure.
+
+Table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout}
+presents the control register layout, omitting
+descriptions of registers which did not change their function
+or behaviour:
+
+\begin{longtable}{p{0.2\textwidth}p{0.7\textwidth}}
+ \caption {MMIO Device Legacy Register Layout}
+ \label{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Legacy Register Layout} \\
+ \hline
+ \mmioreg{Name}{Function}{Offset from base}{Direction}{Description}
+ \hline
+ \hline
+ \endfirsthead
+ \hline
+ \mmioreg{Name}{Function}{Offset from the base}{Direction}{Description}
+ \hline
+ \hline
+ \endhead
+ \endfoot
+ \endlastfoot
+ \mmioreg{MagicValue}{Magic value}{0x000}{R}{}
+ \hline
+ \mmioreg{Version}{Device version number}{0x004}{R}{Legacy device returns value 0x1.}
+ \hline
+ \mmioreg{DeviceID}{Virtio Subsystem Device ID}{0x008}{R}{}
+ \hline
+ \mmioreg{VendorID}{Virtio Subsystem Vendor ID}{0x00c}{R}{}
+ \hline
+ \mmioreg{HostFeatures}{Flags representing features the device supports}{0x010}{R}{}
+ \hline
+ \mmioreg{HostFeaturesSel}{Device (host) features word selection.}{0x014}{W}{}
+ \hline
+ \mmioreg{GuestFeatures}{Flags representing device features understood and activated by the driver}{0x020}{W}{}
+ \hline
+ \mmioreg{GuestFeaturesSel}{Activated (guest) features word selection}{0x024}{W}{}
+ \hline
+ \mmioreg{GuestPageSize}{Guest page size}{0x028}{W}{%
+ The driver writes the guest page size in bytes to the
+ register during initialization, before any queues are used.
+ This value should be a power of 2 and is used by the device to
+ calculate the Guest address of the first queue page
+ (see \field{QueuePFN}).
+ }
+ \hline
+ \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+ Writing to this register selects the virtual queue that the
+ following operations on the \field{QueueNumMax}, \field{QueueNum}, \field{QueueAlign}
+ and \field{QueuePFN} registers apply to. The index
+ number of the first queue is zero (0x0).
+ }
+ \hline
+ \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+ Reading from the register returns the maximum size of the queue
+ the device is ready to process or zero (0x0) if the queue is not
+ available. This applies to the queue selected by writing to
+ \field{QueueSel} and is allowed only when \field{QueuePFN} is set to zero
+ (0x0), so when the queue is not actively used.
+ }
+ \hline
+ \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+ Queue size is the number of elements in the queue, therefore size
+ of the descriptor table and both available and used rings.
+ Writing to this register notifies the device what size of the
+ queue the driver will use. This applies to the queue selected by
+ writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueueAlign}{Used Ring alignment in the virtual queue}{0x03c}{W}{%
+ Writing to this register notifies the device about alignment
+ boundary of the Used Ring in bytes. This value should be a power
+ of 2 and applies to the queue selected by writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueuePFN}{Guest physical page number of the virtual queue}{0x040}{RW}{%
+ Writing to this register notifies the device about location of the
+ virtual queue in the Guest's physical address space. This value
+ is the index number of a page starting with the queue
+ Descriptor Table. Value zero (0x0) means physical address zero
+ (0x00000000) and is illegal. When the driver stops using the
+ queue it writes zero (0x0) to this register.
+ Reading from this register returns the currently used page
+ number of the queue, therefore a value other than zero (0x0)
+ means that the queue is in use.
+ Both read and write accesses apply to the queue selected by
+ writing to \field{QueueSel}.
+ }
+ \hline
+ \mmioreg{QueueNotify}{Queue notifier}{0x050}{W}{}
+ \hline
+ \mmioreg{InterruptStatus}{Interrupt status}{0x060}{R}{}
+ \hline
+ \mmioreg{InterruptACK}{Interrupt acknowledge}{0x064}{W}{}
+ \hline
+ \mmioreg{Status}{Device status}{0x070}{RW}{%
+ Reading from this register returns the current device status
+ flags.
+ Writing non-zero values to this register sets the status flags,
+ indicating the OS/driver progress. Writing zero (0x0) to this
+ register triggers a device reset. The device
+ sets \field{QueuePFN} to zero (0x0) for all queues in the device.
+ Also see \ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
+ }
+ \hline
+ \mmioreg{Config}{Configuration space}{0x100+}{RW}{}
+ \hline
+\end{longtable}
+
+The virtual queue page size is defined by the value the driver writes
+to \field{GuestPageSize}. The driver does this before the
+virtual queues are configured.
+
+The virtual queue layout follows
+\ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout},
+with the alignment defined in \field{QueueAlign}.
+
+The virtual queue is configured as follows:
+\begin{enumerate}
+\item Select the queue writing its index (first queue is 0) to
+ \field{QueueSel}.
+
+\item Check if the queue is not already in use: read \field{QueuePFN},
+ expecting a returned value of zero (0x0).
+
+\item Read maximum queue size (number of elements) from
+ \field{QueueNumMax}. If the returned value is zero (0x0) the
+ queue is not available.
+
+\item Allocate and zero the queue pages in contiguous virtual
+ memory, aligning the Used Ring to an optimal boundary (usually
+ page size). The driver should choose a queue size smaller than or
+ equal to \field{QueueNumMax}.
+
+\item Notify the device about the queue size by writing the size to
+ \field{QueueNum}.
+
+\item Notify the device about the used alignment by writing its value
+ in bytes to \field{QueueAlign}.
+
+\item Write the physical number of the first page of the queue to
+ the \field{QueuePFN} register.
+\end{enumerate}
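+
+The value written to \field{QueuePFN} in the last step can be derived as
+in the following C sketch. It assumes that the page number is the queue's
+guest physical address divided by the value previously written to
+\field{GuestPageSize}, and that the queue starts on a page boundary. The
+function name virtio_mmio_legacy_pfn is hypothetical.
+
```c
#include <stdint.h>

/*
 * Compute the value for QueuePFN from the guest physical address of the
 * queue and the page size written to GuestPageSize.
 * Returns 0 and stores the page number in *pfn on success, -1 otherwise.
 */
int virtio_mmio_legacy_pfn(uint64_t queue_addr, uint32_t page_size,
                           uint32_t *pfn)
{
    /* The page size should be a power of 2. */
    if (page_size == 0 || (page_size & (page_size - 1)) != 0)
        return -1;

    /* The queue has to start at the beginning of a page. */
    if (queue_addr % page_size != 0)
        return -1;

    uint64_t n = queue_addr / page_size;

    /* Zero is illegal, and the register is only 32 bits wide. */
    if (n == 0 || n > UINT32_MAX)
        return -1;

    *pfn = (uint32_t)n;
    return 0;
}
```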
+
+Notification mechanisms did not change.
+
+\section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio Over Channel I/O}
+
+S/390 based virtual machines support neither PCI nor MMIO, so a
+different transport is needed there.
+
+virtio-ccw uses the standard channel I/O based mechanism used for
+the majority of devices on S/390. A virtual channel device with a
+special control unit type acts as proxy to the virtio device
+(similar to the way virtio-pci uses a PCI device) and
+configuration and operation of the virtio device is accomplished
+(mostly) via channel commands. This means virtio devices are
+discoverable via standard operating system algorithms, and adding
+virtio support is mainly a question of supporting a new control
+unit type.
+
+As the S/390 is a big-endian machine, the data structures transmitted
+via channel commands are big-endian: this is made clear by use of
+the types be16, be32 and be64.
+
+\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+As a proxy device, virtio-ccw uses a channel-attached I/O control
+unit with a special control unit type (0x3832) and a control unit
+model corresponding to the attached virtio device's subsystem
+device ID, accessed via a virtual I/O subchannel and a virtual
+channel path of type 0x32. This proxy device is discoverable via
+normal channel subsystem device discovery (usually a STORE
+SUBCHANNEL loop) and answers to the basic channel commands:
+
+\begin{itemize}
+\item NO-OPERATION (0x03)
+\item BASIC SENSE (0x04)
+\item TRANSFER IN CHANNEL (0x08)
+\item SENSE ID (0xe4)
+\end{itemize}
+
+For a virtio-ccw proxy device, SENSE ID will return the following
+information:
+
+\begin{tabular}{ |l|l|l| }
+\hline
+Bytes & Description & Contents \\
+\hline \hline
+0 & reserved & 0xff \\
+\hline
+1-2 & control unit type & 0x3832 \\
+\hline
+3 & control unit model & <virtio device id> \\
+\hline
+4-5 & device type & zeroes (unset) \\
+\hline
+6 & device model & zeroes (unset) \\
+\hline
+7-255 & extended SenseId data & zeroes (unset) \\
+\hline
+\end{tabular}
+
+In addition to the basic channel commands, virtio-ccw defines a
+set of channel commands related to configuration and operation of
+virtio:
+
+\begin{lstlisting}
+#define CCW_CMD_SET_VQ 0x13
+#define CCW_CMD_VDEV_RESET 0x33
+#define CCW_CMD_SET_IND 0x43
+#define CCW_CMD_SET_CONF_IND 0x53
+#define CCW_CMD_SET_IND_ADAPTER 0x73
+#define CCW_CMD_READ_FEAT 0x12
+#define CCW_CMD_WRITE_FEAT 0x11
+#define CCW_CMD_READ_CONF 0x22
+#define CCW_CMD_WRITE_CONF 0x21
+#define CCW_CMD_WRITE_STATUS 0x31
+#define CCW_CMD_READ_VQ_CONF 0x32
+#define CCW_CMD_SET_VIRTIO_REV 0x83
+#define CCW_CMD_READ_STATUS 0x72
+\end{lstlisting}
+
+\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+The virtio-ccw device acts like a normal channel device, as specified
+in \hyperref[intro:S390 PoP]{[S390 PoP]} and \hyperref[intro:S390 Common I/O]{[S390 Common I/O]}. In particular:
+
+\begin{itemize}
+\item A device MUST post a unit check with command reject for any command
+ it does not support.
+
+\item If a driver did not suppress length checks for a channel command,
+ the device MUST present a subchannel status as detailed in the
+ architecture when the actual length did not match the expected length.
+
+\item If a driver did suppress length checks for a channel command, the
+ device MUST present a check condition if the transmitted data does
+ not contain enough data to process the command. If the driver submitted
+ a buffer that was too long, the device SHOULD accept the command.
+\end{itemize}
+
+\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio over channel I/O / Basic Concepts}
+
+A driver for virtio-ccw devices MUST check for a control unit
+type of 0x3832 and MUST ignore the device type and model.
+
+A driver SHOULD attempt to provide the correct length in a channel
+command even if it suppresses length checks for that command.
+
+\subsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization}
+
+virtio-ccw uses several channel commands to set up a device.
+
+\subsubsection{Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+CCW_CMD_SET_VIRTIO_REV is issued by the driver to set the revision of
+the virtio-ccw transport it intends to drive the device with. It uses the
+following communication structure:
+
+\begin{lstlisting}
+struct virtio_rev_info {
+ be16 revision;
+ be16 length;
+ u8 data[];
+};
+\end{lstlisting}
+
+\field{revision} contains the desired revision id, \field{length} the length of the
+data portion and \field{data} revision-dependent additional desired options.
+
+The following values are supported:
+
+\begin{tabular}{ |l|l|l|l| }
+\hline
+\field{revision} & \field{length} & \field{data} & remarks \\
+\hline \hline
+0 & 0 & <empty> & legacy interface; transitional devices only \\
+\hline
+1 & 0 & <empty> & Virtio 1.0 \\
+\hline
+2 & 0 & <empty> & CCW_CMD_READ_STATUS support \\
+\hline
+3-n & & & reserved for later revisions \\
+\hline
+\end{tabular}
+
+Note that a change in the virtio standard does not necessarily
+correspond to a change in the virtio-ccw revision.
+
+\devicenormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+A device MUST post a unit check with command reject for any \field{revision}
+it does not support. For any invalid combination of \field{revision}, \field{length}
+and \field{data}, it MUST post a unit check with command reject as well. A
+non-transitional device MUST reject revision id 0.
+
+A device MUST answer with command reject to any virtio-ccw specific
+channel command that is not contained in the revision selected by the
+driver.
+
+A device MUST answer with command reject to any attempt to select a different revision
+after a revision has been successfully selected by the driver.
+
+A device MUST treat the revision as unset from the time the associated
+subchannel has been enabled until a revision has been successfully set
+by the driver. This implies that revisions are not persistent across
+disabling and enabling of the associated subchannel.
+
+\drivernormative{\paragraph}{Setting the Virtio Revision}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision}
+
+A driver SHOULD start with trying to set the highest revision it
+supports and continue with lower revisions if it gets a command reject.
+
+A driver MUST NOT issue any other virtio-ccw specific channel commands
+prior to setting the revision.
+
+After a revision has been successfully selected by the driver, it
+MUST NOT attempt to select a different revision.
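+
+The recommended negotiation order can be sketched in C as follows. The
+callback type set_rev_fn stands in for whatever routine actually issues
+CCW_CMD_SET_VIRTIO_REV; it, the toy device fake_set_rev and the function
+name virtio_ccw_negotiate_revision are hypothetical. The sketch only uses
+revisions with an empty data portion and does not cover a legacy device
+that rejects the command itself.
+
```c
#include <stdint.h>

/*
 * Hypothetical transport hook: issues CCW_CMD_SET_VIRTIO_REV with the
 * given 4 byte struct virtio_rev_info (revision and length, both be16,
 * no data). Returns 0 on success, non-zero on command reject.
 */
typedef int (*set_rev_fn)(void *ctx, const uint8_t payload[4]);

/*
 * Try revisions from highest down to lowest and stop at the first one
 * the device accepts. Returns the selected revision, or -1 if none of
 * them was accepted.
 */
int virtio_ccw_negotiate_revision(set_rev_fn set_rev, void *ctx,
                                  uint16_t highest, uint16_t lowest)
{
    for (int rev = highest; rev >= (int)lowest; rev--) {
        uint8_t payload[4] = {
            (uint8_t)(rev >> 8), (uint8_t)rev, /* revision, big-endian */
            0, 0                               /* length = 0 */
        };

        if (set_rev(ctx, payload) == 0)
            return rev; /* selected; no further attempts are made */
    }
    return -1;
}

/* Toy device that accepts every revision up to *(uint16_t *)ctx. */
int fake_set_rev(void *ctx, const uint8_t payload[4])
{
    uint16_t max_rev = *(const uint16_t *)ctx;
    uint16_t rev = (uint16_t)((payload[0] << 8) | payload[1]);
    uint16_t len = (uint16_t)((payload[2] << 8) | payload[3]);

    return (rev <= max_rev && len == 0) ? 0 : -1;
}
```
+
+A non-transitional driver would pass 1 as the lowest revision, since
+revision 0 denotes the legacy interface.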
+
+\paragraph{Legacy Interfaces: A Note on Setting the Virtio Revision}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting the Virtio Revision / Legacy Interfaces: A Note on Setting the Virtio Revision}
+
+A legacy device will not support the CCW_CMD_SET_VIRTIO_REV and answer
+with a command reject. A non-transitional driver MUST stop trying to
+operate this device in that case. A transitional driver MUST operate
+the device as if it had been able to set revision 0.
+
+A legacy driver will not issue the CCW_CMD_SET_VIRTIO_REV prior to
+issuing other virtio-ccw specific channel commands. A non-transitional
+device therefore MUST answer any such attempts with a command reject.
+A transitional device MUST assume in this case that the driver is a
+legacy driver and continue as if the driver selected revision 0. This
+implies that the device MUST reject any command not valid for revision
+0, including a subsequent CCW_CMD_SET_VIRTIO_REV.
+
+\subsubsection{Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
+
+CCW_CMD_READ_VQ_CONF is issued by the driver to obtain information
+about a queue. It uses the following structure for communicating:
+
+\begin{lstlisting}
+struct vq_config_block {
+ be16 index;
+ be16 max_num;
+};
+\end{lstlisting}
+
+The driver sets \field{index} to the queue it is asking about; the device
+returns the maximum number of buffers for that queue in \field{max_num}.
+
+Afterwards, CCW_CMD_SET_VQ is issued by the driver to inform the
+device about the location used for its queue. The transmitted
+structure is
+
+\begin{lstlisting}
+struct vq_info_block {
+ be64 desc;
+ be32 res0;
+ be16 index;
+ be16 num;
+ be64 avail;
+ be64 used;
+};
+\end{lstlisting}
+
+\field{desc}, \field{avail} and \field{used} contain the guest addresses for the descriptor table,
+available ring and used ring for queue \field{index}, respectively. The actual
+virtqueue size (number of allocated buffers) is transmitted in \field{num}.
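+
+Because all fields are big-endian, a driver running on a host of unknown
+byte order may prefer to build the block byte by byte. The following C
+sketch does so for struct vq_info_block; the helper names put_be and
+vq_info_block_encode are hypothetical, and the 32 byte length follows
+from the field sizes listed above.
+
```c
#include <stdint.h>

#define VQ_INFO_BLOCK_LEN 32 /* 8 + 4 + 2 + 2 + 8 + 8 bytes */

/* Store the low nbytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Serialize the CCW_CMD_SET_VQ payload in field order. */
void vq_info_block_encode(uint8_t out[VQ_INFO_BLOCK_LEN], uint64_t desc,
                          uint16_t index, uint16_t num,
                          uint64_t avail, uint64_t used)
{
    put_be(out + 0,  desc,  8); /* desc  */
    put_be(out + 8,  0,     4); /* res0, reserved */
    put_be(out + 12, index, 2); /* index */
    put_be(out + 14, num,   2); /* num   */
    put_be(out + 16, avail, 8); /* avail */
    put_be(out + 24, used,  8); /* used  */
}
```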
+
+\devicenormative{\paragraph}{Configuring a Virtqueue}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
+
+\field{res0} is reserved and MUST be ignored by the device.
+
+\paragraph{Legacy Interface: A Note on Configuring a Virtqueue}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Configuring a Virtqueue / Legacy Interface: A Note on Configuring a Virtqueue}
+
+For a legacy driver or for a driver that selected revision 0,
+CCW_CMD_SET_VQ uses the following communication block:
+
+\begin{lstlisting}
+struct vq_info_block_legacy {
+ be64 queue;
+ be32 align;
+ be16 index;
+ be16 num;
+};
+\end{lstlisting}
+
+\field{queue} contains the guest address for queue \field{index}, \field{num} the number of buffers
+and \field{align} the alignment. The queue layout follows \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}.
+
+\subsubsection{Communicating Status Information}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+The driver changes the status of a device via the
+CCW_CMD_WRITE_STATUS command, which transmits an 8 bit status
+value.
+
+As described in
+\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits},
+a device sometimes fails to set the \field{status} field: For example, it
+might fail to accept the FEATURES_OK status bit during device initialization.
+
+With revision 2, CCW_CMD_READ_STATUS is defined: It reads an 8 bit status
+value from the device and acts as a reverse operation to CCW_CMD_WRITE_STATUS.
+
+\drivernormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+If the device posts a unit check with command reject in response to the
+CCW_CMD_WRITE_STATUS command, the driver MUST assume that the device failed
+to set the status and the \field{status} field retained its previous value.
+
+If at least revision 2 has been negotiated, the driver SHOULD use the
+CCW_CMD_READ_STATUS command to retrieve the \field{status} field after
+a configuration change has been detected.
+
+If revision 2 or higher has not been negotiated, the driver MUST NOT attempt
+to issue the CCW_CMD_READ_STATUS command.
+
+\devicenormative{\paragraph}{Communicating Status Information}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Communicating Status Information}
+
+If the device fails to set the \field{status} field to the value written by
+the driver, the device MUST ensure that the \field{status} field is left
+unchanged and MUST post a unit check with command reject.
+
+If at least revision 2 has been negotiated, the device MUST return the
+current \field{status} field if the CCW_CMD_READ_STATUS command is issued.
+
+\subsubsection{Handling Device Features}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Device Features}
+
+Feature bits are arranged in an array of 32 bit values, indexed by an
+8 bit value, making for a total of $256 * 32 = 8192$ feature bits.
+Feature bits are in little-endian byte order.
+
+The CCW commands dealing with features use the following
+communication block:
+
+\begin{lstlisting}
+struct virtio_feature_desc {
+ le32 features;
+ u8 index;
+};
+\end{lstlisting}
+
+\field{features} are the 32 bits of features currently accessed, while
+\field{index} describes which of the feature bit values is to be
+accessed. No padding is added at the end of the structure; it is
+exactly 5 bytes in length.
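+
+The 5 byte layout and the mapping from a feature bit number to an
+\field{index}/\field{features} pair can be sketched in C as follows. The
+helper names feature_desc_encode and feature_bit_locate are hypothetical.
+Building the block byte by byte avoids relying on compiler-specific
+packing attributes.
+
```c
#include <stdint.h>

#define FEATURE_DESC_LEN 5 /* le32 features + u8 index, no padding */

/* Serialize struct virtio_feature_desc: features little-endian, then index. */
void feature_desc_encode(uint8_t out[FEATURE_DESC_LEN], uint32_t features,
                         uint8_t index)
{
    out[0] = (uint8_t)features;
    out[1] = (uint8_t)(features >> 8);
    out[2] = (uint8_t)(features >> 16);
    out[3] = (uint8_t)(features >> 24);
    out[4] = index;
}

/*
 * Find the 32 bit word (index) and the mask within that word for a given
 * feature bit number, which has to be below 8192.
 */
void feature_bit_locate(unsigned bit, uint8_t *index, uint32_t *mask)
{
    *index = (uint8_t)(bit / 32);
    *mask = 1u << (bit % 32);
}
```
+
+For example, feature bit 32 is found at \field{index} 1 with mask 0x1.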
+
+The guest obtains the device's device feature set via the
+CCW_CMD_READ_FEAT command. The device stores the features at \field{index}
+to \field{features}.
+
+For communicating its supported features to the device, the driver
+uses the CCW_CMD_WRITE_FEAT command, denoting a \field{features}/\field{index}
+combination.
+
+\subsubsection{Device Configuration}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Device Configuration}
+
+The device's configuration space is located in host memory.
+
+To obtain information from the configuration space, the driver
+uses CCW_CMD_READ_CONF, specifying the guest memory for the device
+to write to.
+
+For changing configuration information, the driver uses
+CCW_CMD_WRITE_CONF, specifying the guest memory for the device to
+read from.
+
+In both cases, the complete configuration space is transmitted. This
+allows the driver to compare the new configuration space with the old
+version, and keep a generation count internally whenever it changes.
+
+\subsubsection{Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators}
+
+In order to set up the indicator bits for host->guest notification,
+the driver uses different channel commands depending on whether it
+wishes to use traditional I/O interrupts tied to a subchannel or
+adapter I/O interrupts for virtqueue notifications. For any given
+device, the two mechanisms are mutually exclusive.
+
+For the configuration change indicators, only a mechanism using
+traditional I/O interrupts is provided, regardless of whether
+traditional or adapter I/O interrupts are used for virtqueue
+notifications.
+
+\paragraph{Setting Up Classic Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Classic Queue Indicators}
+
+Indicators for notification via classic I/O interrupts are contained
+in a 64 bit value per virtio-ccw proxy device.
+
+To communicate the location of the indicator bits for host->guest
+notification, the driver uses the CCW_CMD_SET_IND command,
+pointing to a location containing the guest address of the
+indicators in a 64 bit value.
+
+If the driver has already set up two-staged queue indicators via the
+CCW_CMD_SET_IND_ADAPTER command, the device MUST post a unit check
+with command reject to any subsequent CCW_CMD_SET_IND command.
+
+\paragraph{Setting Up Configuration Change Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Configuration Change Indicators}
+
+Indicators for configuration change host->guest notification are
+contained in a 64 bit value per virtio-ccw proxy device.
+
+To communicate the location of the indicator bits used in the
+configuration change host->guest notification, the driver issues the
+CCW_CMD_SET_CONF_IND command, pointing to a location containing the
+guest address of the indicators in a 64 bit value.
+
+\paragraph{Setting Up Two-Stage Queue Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
+
+Indicators for notification via adapter I/O interrupts consist of
+two stages:
+\begin{itemize}
+\item a summary indicator byte covering the virtqueues for one or more
+ virtio-ccw proxy devices
+\item a set of contiguous indicator bits for the virtqueues for a
+ virtio-ccw proxy device
+\end{itemize}
+
+To communicate the location of the summary and queue indicator bits,
+the driver uses the CCW_CMD_SET_IND_ADAPTER command with the following
+payload:
+
+\begin{lstlisting}
+struct virtio_thinint_area {
+ be64 summary_indicator;
+ be64 indicator;
+ be64 bit_nr;
+ u8 isc;
+} __attribute__ ((packed));
+\end{lstlisting}
+
+\field{summary_indicator} contains the guest address of the 8 bit summary
+indicator.
+\field{indicator} contains the guest address of an area wherein the indicators
+for the devices are contained, starting at \field{bit_nr}, one bit per
+virtqueue of the device. Bit numbers start at the left, i.e. the most
+significant bit in the first byte is assigned the bit number 0.
+\field{isc} contains the I/O interruption subclass to be used for the adapter
+I/O interrupt. It MAY be different from the isc used by the proxy
+virtio-ccw device's subchannel.
+No padding is added at the end of the structure; it is exactly 25 bytes
+in length.
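+
+The layout and the bit numbering can be sketched in C as follows. This
+is a minimal illustration, not a reference implementation: the helper
+names are hypothetical, and the indicator area is modelled as a plain
+byte array rather than guest memory.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 3 * be64 + u8, no trailing padding. */
#define THININT_AREA_LEN 25

/* Store a 64 bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Hypothetical helper: build the CCW_CMD_SET_IND_ADAPTER payload. */
static void thinint_area_pack(uint8_t out[THININT_AREA_LEN],
                              uint64_t summary_indicator,
                              uint64_t indicator,
                              uint64_t bit_nr, uint8_t isc)
{
    put_be64(out, summary_indicator);
    put_be64(out + 8, indicator);
    put_be64(out + 16, bit_nr);
    out[24] = isc;
}

/* Set the indicator bit of virtqueue `vq`.
 * Bit 0 is the most significant bit of the first byte. */
static void indicator_set(uint8_t *area, uint64_t bit_nr, unsigned vq)
{
    uint64_t bit = bit_nr + vq;

    assert(area != NULL);
    area[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
}

/* Return 1 if the indicator bit of virtqueue `vq` is set, else 0. */
static int indicator_test(const uint8_t *area, uint64_t bit_nr, unsigned vq)
{
    uint64_t bit = bit_nr + vq;

    assert(area != NULL);
    return (area[bit / 8] >> (7 - bit % 8)) & 1;
}
```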
+
+
+\devicenormative{\subparagraph}{Setting Up Two-Stage Queue Indicators}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Setting Up Two-Stage Queue Indicators}
+If the driver has already set up classic queue indicators via the
+CCW_CMD_SET_IND command, the device MUST post a unit check with
+command reject to any subsequent CCW_CMD_SET_IND_ADAPTER command.
+
+\paragraph{Legacy Interfaces: A Note on Setting Up Indicators}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Setting Up Indicators / Legacy Interfaces: A Note on Setting Up Indicators}
+
+In some cases, legacy devices will only support classic queue indicators;
+in that case, they will reject CCW_CMD_SET_IND_ADAPTER as they don't know that
+command. Some legacy devices will support two-stage queue indicators, though,
+and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
+them up.
+
+\subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
+
+\subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}
+
+There are two modes of operation regarding host->guest notification,
+classic I/O interrupts and adapter I/O interrupts. The driver
+determines the mode by setting up queue indicators with either
+CCW_CMD_SET_IND (classic) or CCW_CMD_SET_IND_ADAPTER (adapter).
+
+For configuration changes, the driver always uses classic I/O
+interrupts.
+
+\paragraph{Notification via Classic I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Classic I/O Interrupts}
+
+If the driver used the CCW_CMD_SET_IND command to set up queue
+indicators, the device will use classic I/O interrupts for
+host->guest notification about virtqueue activity.
+
+For notifying the driver of virtqueue buffers, the device sets the
+corresponding bit in the guest-provided indicators. If an
+interrupt is not already pending for the subchannel, the device
+generates an unsolicited I/O interrupt.
+
+If the device wants to notify the driver about configuration
+changes, it sets bit 0 in the configuration indicators and
+generates an unsolicited I/O interrupt, if needed. This also
+applies if adapter I/O interrupts are used for queue notifications.
+
+\paragraph{Notification via Adapter I/O Interrupts}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+
+If the driver used the CCW_CMD_SET_IND_ADAPTER command to set up
+queue indicators, the device will use adapter I/O interrupts for
+host->guest notification about virtqueue activity.
+
+For notifying the driver of virtqueue buffers, the device sets the
+bit in the guest-provided indicator area at the corresponding offset.
+The guest-provided summary indicator is set to 0x01. An adapter I/O
+interrupt for the corresponding interruption subclass is generated.
+
+The recommended way to process an adapter I/O interrupt by the driver
+is as follows:
+
+\begin{itemize}
+\item Process all queue indicator bits associated with the summary indicator.
+\item Clear the summary indicator, performing a synchronization (memory
+barrier) afterwards.
+\item Process all queue indicator bits associated with the summary indicator
+again.
+\end{itemize}
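+
+The recommended sequence above might look roughly like the following C
+sketch. It is a simplified model under several assumptions: the
+structure and function names are hypothetical, a single indicator byte
+covers eight virtqueues, and a C11 fence stands in for whatever
+synchronization primitive the platform provides.
+
```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_QUEUES 8 /* illustrative: one indicator byte */

/* Hypothetical driver-side view of the indicators. */
struct thinint_state {
    _Atomic uint8_t summary;     /* summary indicator byte */
    _Atomic uint8_t indicators;  /* queue indicator bits, bit 0 = MSB */
    unsigned handled[NR_QUEUES]; /* per-queue work counter for the sketch */
};

/* Fetch and clear the queue indicator bits, then handle each set bit. */
static void process_indicators(struct thinint_state *s)
{
    uint8_t pending = atomic_exchange(&s->indicators, 0);

    for (unsigned vq = 0; vq < NR_QUEUES; vq++)
        if (pending & (0x80u >> vq))
            s->handled[vq]++; /* a real driver would service the queue */
}

/* Process, clear the summary indicator with a barrier, process again. */
static void handle_adapter_interrupt(struct thinint_state *s)
{
    assert(s != NULL);
    process_indicators(s);
    atomic_store(&s->summary, 0);
    atomic_thread_fence(memory_order_seq_cst);
    /* Catch bits the device set before it saw the cleared summary. */
    process_indicators(s);
}
```
+
+The second pass exists because the device may set a queue indicator
+bit while the summary indicator is still set, in which case no new
+adapter I/O interrupt is necessarily generated for that bit.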
+
+\devicenormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+
+The device SHOULD only generate an adapter I/O interrupt if the
+summary indicator had not been set prior to notification.
+
+\drivernormative{\subparagraph}{Notification via Adapter I/O Interrupts}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Notification via Adapter I/O Interrupts}
+The driver
+MUST clear the summary indicator after receiving an adapter I/O
+interrupt before it processes the queue indicators.
+
+\paragraph{Legacy Interfaces: A Note on Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification / Legacy Interfaces: A Note on Host->Guest Notification}
+
+As legacy devices and drivers support only classic queue indicators,
+host->guest notification will always be done via classic I/O interrupts.
+
+\subsubsection{Guest->Host Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+
+For notifying the device of virtqueue buffers, the driver
+unfortunately can't use a channel command (the asynchronous
+characteristics of channel I/O interact badly with the host block
+I/O backend). Instead, it uses a diagnose 0x500 call with subcode
+3 specifying the queue, as follows:
+
+\begin{tabular}{ |l|l|l| }
+\hline
+GPR & Input Value & Output Value \\
+\hline \hline
+ 1 & 0x3 & \\
+\hline
+ 2 & Subchannel ID & Host Cookie \\
+\hline
+ 3 & Virtqueue number & \\
+\hline
+ 4 & Host Cookie & \\
+\hline
+\end{tabular}
+
+\devicenormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+The device MUST ignore bits 0-31 (counting from the left) of GPR2.
+This aligns passing the subchannel ID with the way it is passed
+for the existing I/O instructions.
+
+The device MAY return a 64-bit host cookie in GPR2 to speed up the
+notification execution.
+
+\drivernormative{\paragraph}{Guest->Host Notification}{Virtio Transport Options / Virtio over channel I/O / Device Operation / Guest->Host Notification}
+
+For each notification, the driver SHOULD use GPR4 to pass the host cookie received in GPR2 from the previous notification.
+
+\begin{note}
+For example:
+\begin{lstlisting}
+info->cookie = do_notify(schid,
+ virtqueue_get_queue_index(vq),
+ info->cookie);
+\end{lstlisting}
+\end{note}
+
+\subsubsection{Resetting Devices}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Resetting Devices}
+
+In order to reset a device, a driver sends the
+CCW_CMD_VDEV_RESET command.
+
+
+\chapter{Device Types}\label{sec:Device Types}
+
+On top of the queues, config space and feature negotiation facilities
+built into virtio, several devices are defined.
+
+The following device IDs are used to identify different types of virtio
+devices. Some device IDs are reserved for devices which are not currently
+defined in this standard.
+
+Discovering what devices are available and their type is bus-dependent.
+
+\begin{tabular} { |l|c| }
+\hline
+Device ID & Virtio Device \\
+\hline \hline
+0 & reserved (invalid) \\
+\hline
+1 & network card \\
+\hline
+2 & block device \\
+\hline
+3 & console \\
+\hline
+4 & entropy source \\
+\hline
+5 & memory ballooning (traditional) \\
+\hline
+6 & ioMemory \\
+\hline
+7 & rpmsg \\
+\hline
+8 & SCSI host \\
+\hline
+9 & 9P transport \\
+\hline
+10 & mac80211 wlan \\
+\hline
+11 & rproc serial \\
+\hline
+12 & virtio CAIF \\
+\hline
+13 & memory balloon \\
+\hline
+16 & GPU device \\
+\hline
+17 & Timer/Clock device \\
+\hline
+18 & Input device \\
+\hline
+19 & Socket device \\
+\hline
+20 & Crypto device \\
+\hline
+21 & Signal Distribution Module \\
+\hline
+22 & pstore device \\
+\hline
+\end{tabular}
+
+Some of the devices above are unspecified by this document,
+because they are seen as immature or especially niche. Be warned
+that some are only specified by the sole existing implementation;
+they could become part of a future specification, be abandoned
+entirely, or live on outside this standard. We shall speak of
+them no further.
+
+\section{Network Device}\label{sec:Device Types / Network Device}
+
+The virtio network device is a virtual ethernet card, and is the
+most complex of the devices supported so far by virtio. It has been
+enhanced rapidly and demonstrates clearly how support for new
+features is added to an existing device. Empty buffers are
+placed in one virtqueue for receiving packets, and outgoing
+packets are enqueued into another for transmission in that order.
+A third command queue is used to control advanced filtering
+features.
+
+\subsection{Device ID}\label{sec:Device Types / Network Device / Device ID}
+
+ 1
+
+\subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
+
+\begin{description}
+\item[0] receiveq1
+\item[1] transmitq1
+\item[\ldots]
+\item[2(N-1)] receiveqN
+\item[2(N-1)+1] transmitqN
+\item[2N] controlq
+\end{description}
+
+ N=1 if VIRTIO_NET_F_MQ is not negotiated, otherwise N is set by
+ \field{max_virtqueue_pairs}.
+
+ controlq only exists if VIRTIO_NET_F_CTRL_VQ is set.
+
+\subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_NET_F_CSUM (0)] Device handles packets with partial checksum. This
+ ``checksum offload'' is a common feature on modern network cards.
+
+\item[VIRTIO_NET_F_GUEST_CSUM (1)] Driver handles packets with partial checksum.
+
+\item[VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2)] Control channel offloads
+ reconfiguration support.
+
+\item[VIRTIO_NET_F_MTU (3)] Device maximum MTU reporting is supported. If
+ offered by the device, device advises driver about the value of
+ its maximum MTU. If negotiated, the driver uses \field{mtu} as
+ the maximum MTU value.
+
+\item[VIRTIO_NET_F_MAC (5)] Device has given MAC address.
+
+\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Driver can receive TSOv4.
+
+\item[VIRTIO_NET_F_GUEST_TSO6 (8)] Driver can receive TSOv6.
+
+\item[VIRTIO_NET_F_GUEST_ECN (9)] Driver can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_GUEST_UFO (10)] Driver can receive UFO.
+
+\item[VIRTIO_NET_F_HOST_TSO4 (11)] Device can receive TSOv4.
+
+\item[VIRTIO_NET_F_HOST_TSO6 (12)] Device can receive TSOv6.
+
+\item[VIRTIO_NET_F_HOST_ECN (13)] Device can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_HOST_UFO (14)] Device can receive UFO.
+
+\item[VIRTIO_NET_F_MRG_RXBUF (15)] Driver can merge receive buffers.
+
+\item[VIRTIO_NET_F_STATUS (16)] Configuration status field is
+ available.
+
+\item[VIRTIO_NET_F_CTRL_VQ (17)] Control channel is available.
+
+\item[VIRTIO_NET_F_CTRL_RX (18)] Control channel RX mode support.
+
+\item[VIRTIO_NET_F_CTRL_VLAN (19)] Control channel VLAN filtering.
+
+\item[VIRTIO_NET_F_GUEST_ANNOUNCE (21)] Driver can send gratuitous
+ packets.
+
+\item[VIRTIO_NET_F_MQ (22)] Device supports multiqueue with automatic
+ receive steering.
+
+\item[VIRTIO_NET_F_CTRL_MAC_ADDR (23)] Set MAC address through control
+ channel.
+\end{description}
+
+\subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
+
+Some networking feature bits require other networking feature bits
+(see \ref{drivernormative:Basic Facilities of a Virtio Device / Feature Bits}):
+
+\begin{description}
+\item[VIRTIO_NET_F_GUEST_TSO4] Requires VIRTIO_NET_F_GUEST_CSUM.
+\item[VIRTIO_NET_F_GUEST_TSO6] Requires VIRTIO_NET_F_GUEST_CSUM.
+\item[VIRTIO_NET_F_GUEST_ECN] Requires VIRTIO_NET_F_GUEST_TSO4 or VIRTIO_NET_F_GUEST_TSO6.
+\item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM.
+
+\item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM.
+\item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM.
+\item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
+\item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM.
+
+\item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_GUEST_ANNOUNCE] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_MQ] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
+\end{description}
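+
+The requirements above lend themselves to a table-driven check. The
+following C sketch is one possible way to validate a feature set
+against them. The structure, table and function names are
+hypothetical; only the bit numbers and the dependencies are taken from
+the lists above.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers as listed in the feature bits section. */
#define VIRTIO_NET_F_CSUM            0
#define VIRTIO_NET_F_GUEST_CSUM      1
#define VIRTIO_NET_F_GUEST_TSO4      7
#define VIRTIO_NET_F_GUEST_TSO6      8
#define VIRTIO_NET_F_GUEST_ECN       9
#define VIRTIO_NET_F_GUEST_UFO      10
#define VIRTIO_NET_F_HOST_TSO4      11
#define VIRTIO_NET_F_HOST_TSO6      12
#define VIRTIO_NET_F_HOST_ECN       13
#define VIRTIO_NET_F_HOST_UFO       14
#define VIRTIO_NET_F_CTRL_VQ        17
#define VIRTIO_NET_F_CTRL_RX        18
#define VIRTIO_NET_F_CTRL_VLAN      19
#define VIRTIO_NET_F_GUEST_ANNOUNCE 21
#define VIRTIO_NET_F_MQ             22
#define VIRTIO_NET_F_CTRL_MAC_ADDR  23

#define NET_BIT(n) (UINT64_C(1) << (n))

/* Hypothetical rule: `feature` needs at least one bit of `any_of`. */
struct net_feature_dep {
    uint64_t feature;
    uint64_t any_of;
};

static const struct net_feature_dep net_feature_deps[] = {
    { NET_BIT(VIRTIO_NET_F_GUEST_TSO4), NET_BIT(VIRTIO_NET_F_GUEST_CSUM) },
    { NET_BIT(VIRTIO_NET_F_GUEST_TSO6), NET_BIT(VIRTIO_NET_F_GUEST_CSUM) },
    { NET_BIT(VIRTIO_NET_F_GUEST_ECN),
      NET_BIT(VIRTIO_NET_F_GUEST_TSO4) | NET_BIT(VIRTIO_NET_F_GUEST_TSO6) },
    { NET_BIT(VIRTIO_NET_F_GUEST_UFO), NET_BIT(VIRTIO_NET_F_GUEST_CSUM) },
    { NET_BIT(VIRTIO_NET_F_HOST_TSO4), NET_BIT(VIRTIO_NET_F_CSUM) },
    { NET_BIT(VIRTIO_NET_F_HOST_TSO6), NET_BIT(VIRTIO_NET_F_CSUM) },
    { NET_BIT(VIRTIO_NET_F_HOST_ECN),
      NET_BIT(VIRTIO_NET_F_HOST_TSO4) | NET_BIT(VIRTIO_NET_F_HOST_TSO6) },
    { NET_BIT(VIRTIO_NET_F_HOST_UFO), NET_BIT(VIRTIO_NET_F_CSUM) },
    { NET_BIT(VIRTIO_NET_F_CTRL_RX), NET_BIT(VIRTIO_NET_F_CTRL_VQ) },
    { NET_BIT(VIRTIO_NET_F_CTRL_VLAN), NET_BIT(VIRTIO_NET_F_CTRL_VQ) },
    { NET_BIT(VIRTIO_NET_F_GUEST_ANNOUNCE), NET_BIT(VIRTIO_NET_F_CTRL_VQ) },
    { NET_BIT(VIRTIO_NET_F_MQ), NET_BIT(VIRTIO_NET_F_CTRL_VQ) },
    { NET_BIT(VIRTIO_NET_F_CTRL_MAC_ADDR), NET_BIT(VIRTIO_NET_F_CTRL_VQ) },
};

/* Return true if every set feature has one of its prerequisites set. */
static bool net_features_consistent(uint64_t features)
{
    size_t n = sizeof(net_feature_deps) / sizeof(net_feature_deps[0]);

    for (size_t i = 0; i < n; i++) {
        const struct net_feature_dep *d = &net_feature_deps[i];

        assert(d->any_of != 0); /* every rule names a prerequisite */
        if ((features & d->feature) && !(features & d->any_of))
            return false;
    }
    return true;
}
```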
+
+\subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
+\begin{description}
+\item[VIRTIO_NET_F_GSO (6)] Device handles packets with any GSO type.
+\end{description}
+
+This was supposed to indicate segmentation offload support, but
+upon further investigation it became clear that multiple bits
+were needed.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
+\label{sec:Device Types / Block Device / Feature bits / Device configuration layout}
+
+Four driver-read-only configuration fields are currently defined. The \field{mac} address field
+always exists (though it is only valid if VIRTIO_NET_F_MAC is set), and
+\field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
+read-only bits (for the driver) are currently defined for the status field:
+VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\begin{lstlisting}
+#define VIRTIO_NET_S_LINK_UP 1
+#define VIRTIO_NET_S_ANNOUNCE 2
+\end{lstlisting}
+
+The following driver-read-only field, \field{max_virtqueue_pairs}, only exists if
+VIRTIO_NET_F_MQ is set. This field specifies the maximum number
+of each of transmit and receive virtqueues (receiveq1\ldots receiveqN
+and transmitq1\ldots transmitqN respectively) that can be configured once VIRTIO_NET_F_MQ
+is negotiated.
+
+The following driver-read-only field, \field{mtu}, only exists if
+VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
+use.
+
+\begin{lstlisting}
+struct virtio_net_config {
+ u8 mac[6];
+ le16 status;
+ le16 max_virtqueue_pairs;
+ le16 mtu;
+};
+\end{lstlisting}
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
+
+The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
+if it offers VIRTIO_NET_F_MQ.
+
+The device MUST set \field{mtu} to between 68 and 65535 inclusive,
+if it offers VIRTIO_NET_F_MTU.
+
+The device SHOULD set \field{mtu} to at least 1280, if it offers
+VIRTIO_NET_F_MTU.
+
+The device MUST NOT modify \field{mtu} once it has been set.
+
+The device MUST NOT pass received packets that exceed \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN
+after VIRTIO_NET_F_MTU has been successfully negotiated.
+
+The device MUST forward transmitted packets of up to \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN, and do
+so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
+negotiated.
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
+
+A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
+If the driver negotiates the VIRTIO_NET_F_MAC feature, the driver MUST set
+the physical address of the NIC to \field{mac}. Otherwise, it SHOULD
+use a locally-administered MAC address (see \hyperref[intro:IEEE 802]{IEEE 802},
+``9.2 48-bit universal LAN MAC addresses'').
+
+If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
+assume the link is active, otherwise it SHOULD read the link status from
+the bottom bit of \field{status}.
+
+A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
+buffers to receive at least one receive packet of size \field{mtu} (plus low
+level ethernet header length) with \field{gso_type} NONE or ECN.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
+size exceeding the value of \field{mtu} (plus low level ethernet header length)
+with \field{gso_type} NONE or ECN.
+
+\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
+\label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format \field{status} and
+\field{max_virtqueue_pairs} in struct virtio_net_config
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+When using the legacy interface, \field{mac} is driver-writable
+which provided a way for drivers to update the MAC without
+negotiating VIRTIO_NET_F_CTRL_MAC_ADDR.
+
+\subsection{Device Initialization}\label{sec:Device Types / Network Device / Device Initialization}
+
+A driver would perform a typical initialization routine like so:
+
+\begin{enumerate}
+\item Identify and initialize the receive and
+ transmission virtqueues, up to N of each kind. If
+ VIRTIO_NET_F_MQ feature bit is negotiated,
+ N=\field{max_virtqueue_pairs}, otherwise identify N=1.
+
+\item If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,
+ identify the control virtqueue.
+
+\item Fill the receive queues with buffers: see \ref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}.
+
+\item Even with VIRTIO_NET_F_MQ, only receiveq1, transmitq1 and
+ controlq are used by default. The driver would send the
+ VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying the
+ number of the transmit and receive queues to use.
+
+\item If the VIRTIO_NET_F_MAC feature bit is set, the configuration
+ space \field{mac} entry indicates the ``physical'' address of the
+ network card, otherwise the driver would typically generate a random
+ local MAC address.
+
+\item If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link
+ status comes from the bottom bit of \field{status}.
+ Otherwise, the driver assumes it's active.
+
+\item A performant driver would indicate that it will generate checksumless
+ packets by negotiating the VIRTIO_NET_F_CSUM feature.
+
+\item If that feature is negotiated, a driver can use TCP or UDP
+ segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
+ TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
+ (UDP fragmentation) features.
+
+\item The converse features are also available: a driver can save
+ the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on
+the same system might not need checksumming at all, nor segmentation,
+if both guests are amenable.}
+ The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
+ checksummed packets can be received, and if it can do that then
+ the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+ VIRTIO_NET_F_GUEST_UFO and VIRTIO_NET_F_GUEST_ECN are the input
+ equivalents of the features described above.
+ See \ref{sec:Device Types / Network Device / Device Operation /
+Setting Up Receive Buffers}~\nameref{sec:Device Types / Network
+Device / Device Operation / Setting Up Receive Buffers} and
+\ref{sec:Device Types / Network Device / Device Operation /
+Processing of Incoming Packets}~\nameref{sec:Device Types /
+Network Device / Device Operation / Processing of Incoming Packets} below.
+\end{enumerate}
+
+A truly minimal driver would only accept VIRTIO_NET_F_MAC and ignore
+everything else.
+
+\subsection{Device Operation}\label{sec:Device Types / Network Device / Device Operation}
+
+Packets are transmitted by placing them in the
+transmitq1\ldots transmitqN, and buffers for incoming packets are
+placed in the receiveq1\ldots receiveqN. In each case, the packet
+itself is preceded by a header:
+
+\begin{lstlisting}
+struct virtio_net_hdr {
+#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
+ u8 flags;
+#define VIRTIO_NET_HDR_GSO_NONE 0
+#define VIRTIO_NET_HDR_GSO_TCPV4 1
+#define VIRTIO_NET_HDR_GSO_UDP 3
+#define VIRTIO_NET_HDR_GSO_TCPV6 4
+#define VIRTIO_NET_HDR_GSO_ECN 0x80
+ u8 gso_type;
+ le16 hdr_len;
+ le16 gso_size;
+ le16 csum_start;
+ le16 csum_offset;
+ le16 num_buffers;
+};
+\end{lstlisting}
+
+The controlq is used to control device features such as
+filtering.
+
+\subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Network Device / Device Operation / Legacy Interface: Device Operation}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_net_hdr
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+The legacy driver only presented \field{num_buffers} in the struct virtio_net_hdr
+when VIRTIO_NET_F_MRG_RXBUF was negotiated; without that feature the
+structure was 2 bytes shorter.
+
+When using the legacy interface, the driver SHOULD ignore the
+\field{len} value in used ring entries for the transmit queues
+and the controlq queue.
+\begin{note}
+Historically, some devices put
+the total descriptor length there, even though no data was
+actually written.
+\end{note}
+
+\subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission}
+
+Transmitting a single packet is simple, but varies depending on
+the different features the driver negotiated.
+
+\begin{enumerate}
+\item The driver can send a completely checksummed packet. In this case,
+ \field{flags} will be zero, and \field{gso_type} will be VIRTIO_NET_HDR_GSO_NONE.
+
+\item If the driver negotiated VIRTIO_NET_F_CSUM, it can skip
+ checksumming the packet:
+ \begin{itemize}
+ \item \field{flags} has the VIRTIO_NET_HDR_F_NEEDS_CSUM bit set,
+
+ \item \field{csum_start} is set to the offset within the packet to begin checksumming,
+ and
+
+ \item \field{csum_offset} indicates how many bytes after the csum_start the
+ new (16 bit ones' complement) checksum is placed by the device.
+
+ \item The TCP checksum field in the packet is set to the sum
+ of the TCP pseudo header, so that replacing it by the ones'
+ complement checksum of the TCP header and body will give the
+ correct result.
+ \end{itemize}
+
+\begin{note}
+For example, consider a partially checksummed TCP (IPv4) packet.
+It will have a 14 byte ethernet header and 20 byte IP header
+followed by the TCP header (with the TCP checksum field 16 bytes
+into that header). \field{csum_start} will be 14+20 = 34 (the TCP
+checksum includes the header), and \field{csum_offset} will be 16.
+\end{note}
+
+\item If the driver negotiated
+ VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
+ TCP segmentation or UDP fragmentation, then \field{gso_type}
+ is set to VIRTIO_NET_HDR_GSO_TCPV4, TCPV6 or UDP.
+ (Otherwise, it is set to VIRTIO_NET_HDR_GSO_NONE). In this
+ case, packets larger than 1514 bytes can be transmitted: the
+ metadata indicates how to replicate the packet header to cut it
+ into smaller packets. The other gso fields are set:
+
+ \begin{itemize}
+ \item \field{hdr_len} is a hint to the device as to how much of the header
+ needs to be kept to copy into each packet, usually set to the
+ length of the headers, including the transport header\footnote{Due to various bugs in implementations, this field is not useful
+as a guarantee of the transport header size.
+}.
+
+ \item \field{gso_size} is the maximum size of each packet beyond that
+ header (i.e. MSS).
+
+ \item If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
+ the VIRTIO_NET_HDR_GSO_ECN bit in \field{gso_type}
+ indicates that the TCP packet has the ECN bit set\footnote{This case is not handled by some older hardware, so is called out
+specifically in the protocol.}.
+ \end{itemize}
+
+\item \field{num_buffers} is set to zero. This field is unused on transmitted packets.
+
+\item The header and packet are added as one output descriptor to the
+ transmitq, and the device is notified of the new entry
+ (see \ref{sec:Device Types / Network Device / Device Initialization}~\nameref{sec:Device Types / Network Device / Device Initialization}).
+\end{enumerate}
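+
+To make the checksum offload case concrete, the following C sketch
+fills in the header for the partially checksummed TCP (IPv4) packet
+used in the example above. It is only an illustration: the function
+and the EXAMPLE\_ constants are hypothetical, the header is written as
+raw little-endian bytes rather than through the struct, and an IPv4
+header without options is assumed.
+
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_GSO_NONE     0

/* Illustrative constants, not defined by the specification. */
#define EXAMPLE_NET_HDR_LEN     12 /* size of struct virtio_net_hdr */
#define EXAMPLE_ETH_HDR_LEN     14
#define EXAMPLE_IPV4_HDR_LEN    20 /* no IP options */
#define EXAMPLE_TCP_CSUM_OFFSET 16 /* checksum field within TCP header */

/* Store a 16 bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical helper: header for a non-GSO TCP/IPv4 packet whose
 * checksum is left for the device to complete. */
static void net_hdr_partial_csum_tcp4(uint8_t hdr[EXAMPLE_NET_HDR_LEN])
{
    assert(hdr != NULL);
    /* hdr_len, gso_size and num_buffers all stay zero. */
    memset(hdr, 0, EXAMPLE_NET_HDR_LEN);
    hdr[0] = VIRTIO_NET_HDR_F_NEEDS_CSUM; /* flags */
    hdr[1] = VIRTIO_NET_HDR_GSO_NONE;     /* gso_type */
    /* csum_start at byte 6: checksumming begins at the TCP header. */
    put_le16(hdr + 6, EXAMPLE_ETH_HDR_LEN + EXAMPLE_IPV4_HDR_LEN);
    /* csum_offset at byte 8: where the device stores the result. */
    put_le16(hdr + 8, EXAMPLE_TCP_CSUM_OFFSET);
}
```
+
+The driver would additionally have to store the pseudo header sum in
+the TCP checksum field of the packet itself, which this sketch leaves
+out.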
+
+\drivernormative{\paragraph}{Packet Transmission}{Device Types / Network Device / Device Operation / Packet Transmission}
+
+The driver MUST set \field{num_buffers} to zero.
+
+If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set
+\field{flags} to zero and SHOULD supply a fully checksummed
+packet to the device.
+
+If VIRTIO_NET_F_HOST_TSO4 is negotiated, the driver MAY set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV4 to request TCPv4
+segmentation, otherwise the driver MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV4.
+
+If VIRTIO_NET_F_HOST_TSO6 is negotiated, the driver MAY set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV6 to request TCPv6
+segmentation, otherwise the driver MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV6.
+
+If VIRTIO_NET_F_HOST_UFO is negotiated, the driver MAY set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_UDP to request UDP
+segmentation, otherwise the driver MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_UDP.
+
+The driver SHOULD NOT send to the device TCP packets requiring segmentation offload
+which have the Explicit Congestion Notification bit set, unless the
+VIRTIO_NET_F_HOST_ECN feature is negotiated, in which case the
+driver MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
+\field{gso_type}.
+
+If the VIRTIO_NET_F_CSUM feature has been negotiated, the
+driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
+\field{flags}, if so:
+\begin{enumerate}
+\item the driver MUST validate the packet checksum at
+ offset \field{csum_offset} from \field{csum_start} as well as all
+ preceding offsets;
+\item the driver MUST set the packet checksum stored in the
+ buffer to the TCP/UDP pseudo header;
+\item the driver MUST set \field{csum_start} and
+ \field{csum_offset} such that calculating a ones'
+ complement checksum from \field{csum_start} up until the end of
+ the packet and storing the result at offset \field{csum_offset}
+ from \field{csum_start} will result in a fully checksummed
+ packet;
+\end{enumerate}
+
+If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have
+been negotiated, the driver MUST set \field{gso_type} to
+VIRTIO_NET_HDR_GSO_NONE.
+
+If \field{gso_type} differs from VIRTIO_NET_HDR_GSO_NONE, then
+the driver MUST also set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
+\field{flags} and MUST set \field{gso_size} to indicate the
+desired MSS.
+
+If one of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options has
+been negotiated, the driver SHOULD set \field{hdr_len} to a value
+not less than the length of the headers, including the transport
+header.
+
+The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID bit in
+\field{flags}.
+
+\devicenormative{\paragraph}{Packet Transmission}{Device Types / Network Device / Device Operation / Packet Transmission}
+The device MUST ignore \field{flags} bits that it does not recognize.
+
+If VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} is not set, the
+device MUST NOT use the \field{csum_start} and \field{csum_offset}.
+
+If one of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options has
+been negotiated, the device MAY use \field{hdr_len} only as a hint about the
+transport header size.
+The device MUST NOT rely on \field{hdr_len} to be correct.
+\begin{note}
+This is due to various bugs in implementations.
+\end{note}
+
+If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT
+rely on the packet checksum being correct.
+
+\paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt}
+
+Often a driver will suppress transmission interrupts using the
+VIRTQ_AVAIL_F_NO_INTERRUPT flag
+ (see \ref{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device}~\nameref{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device})
+and check for used packets in the transmit path of following
+packets.
+
+The normal behavior in this interrupt handler is to retrieve
+new descriptors from the used ring and free the corresponding
+headers and packets.
+
+\subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}
+
+It is generally a good idea to keep the receive virtqueue as
+fully populated as possible: if it runs out, network performance
+will suffer.
+
+If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
+VIRTIO_NET_F_GUEST_UFO features are used, the maximum incoming packet
+will be 65550 bytes long (the maximum size of a
+TCP or UDP packet, plus the 14 byte ethernet header), otherwise
+1514 bytes. The 12-byte struct virtio_net_hdr is prepended to this,
+making for 65562 or 1526 bytes.
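+
+The arithmetic can be spelled out in a small C sketch. The function
+and constant names are hypothetical, and the value 65536 for the
+largest TCP or UDP packet is inferred from the 65550 byte figure above
+minus the 14 byte ethernet header.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants derived from the figures in the text. */
#define EXAMPLE_NET_HDR_LEN     12    /* struct virtio_net_hdr */
#define EXAMPLE_ETH_HDR_LEN     14
#define EXAMPLE_MAX_GSO_PACKET  65536 /* inferred: 65550 - 14 */
#define EXAMPLE_MAX_PLAIN_FRAME 1514

/* Largest incoming frame plus header; guest_gso is true if any of
 * GUEST_TSO4, GUEST_TSO6 or GUEST_UFO is in use. */
static size_t rx_buffer_len(bool guest_gso)
{
    size_t frame = guest_gso
        ? (size_t)EXAMPLE_MAX_GSO_PACKET + EXAMPLE_ETH_HDR_LEN
        : (size_t)EXAMPLE_MAX_PLAIN_FRAME;

    assert(frame >= EXAMPLE_ETH_HDR_LEN);
    return frame + EXAMPLE_NET_HDR_LEN;
}
```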
+
+\drivernormative{\paragraph}{Setting Up Receive Buffers}{Device Types / Network Device / Device Operation / Setting Up Receive Buffers}
+
+\begin{itemize}
+\item If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
+ \begin{itemize}
+ \item If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
+ VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate
+ the receive queue(s) with buffers of at least 65562 bytes.
+ \item Otherwise, the driver SHOULD populate the receive queue(s)
+ with buffers of at least 1526 bytes.
+ \end{itemize}
+\item If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at
+ least the size of the struct virtio_net_hdr.
+\end{itemize}
+
+\begin{note}
+Obviously each buffer can be split across multiple descriptor elements.
+\end{note}
+
+If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots receiveqN
+that will be used SHOULD be populated with receive buffers.
+
+\devicenormative{\paragraph}{Setting Up Receive Buffers}{Device Types / Network Device / Device Operation / Setting Up Receive Buffers}
+
+The device MUST set \field{num_buffers} to the number of descriptors used to
+hold the incoming packet.
+
+The device MUST use only a single descriptor if VIRTIO_NET_F_MRG_RXBUF
+was not negotiated.
+\begin{note}
+{This means that \field{num_buffers} will always be 1
+if VIRTIO_NET_F_MRG_RXBUF is not negotiated.}
+\end{note}
+
+\subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets}
+\label{sec:Device Types / Network Device / Device Operation / Processing of Packets}%old label for latexdiff
+
+When a packet is copied into a buffer in the receiveq, the
+optimal path is to disable further interrupts for the receiveq
+(see \ref{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device}~\nameref{sec:General Initialization And Device Operation / Device Operation / Receiving Used Buffers From The Device}) and process
+packets until no more are found, then re-enable them.
+
+Processing incoming packets involves:
+
+\begin{enumerate}
+\item \field{num_buffers} indicates how many descriptors
+ this packet is spread over (including this one): this will
+ always be 1 if VIRTIO_NET_F_MRG_RXBUF was not negotiated.
+ This allows receipt of large packets without having to allocate large
+ buffers. In this case, there will be at least \field{num_buffers} in
+ the used ring, and the device chains them together to form a
+ single packet. The other buffers will not begin with a struct
+ virtio_net_hdr.
+
+\item If
+ \field{num_buffers} is one, then the entire packet will be
+ contained within this buffer, immediately following the struct
+ virtio_net_hdr.
+\item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
+ VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be
+ set: if so, the device has validated the packet checksum.
+ In case of multiple encapsulated protocols, one level of checksums
+ has been validated.
+\end{enumerate}
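+
+A driver's handling of \field{num_buffers} might look roughly like the
+following non-normative sketch. It assumes the 12-byte struct
+virtio_net_hdr described earlier, with the little-endian
+\field{num_buffers} in its last two bytes; struct rx_buf and rx_gather()
+are hypothetical names, and a real driver would usually avoid the copy.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+#define NET_HDR_LEN     12u /* struct virtio_net_hdr */
+#define NUM_BUFFERS_OFF 10u /* assumed offset of le16 num_buffers */
+
+struct rx_buf {             /* hypothetical view of one used buffer */
+    const uint8_t *data;
+    size_t len;             /* bytes written by the device */
+};
+
+/*
+ * Copy the packet spread over the first num_buffers entries of bufs
+ * into out. Returns the packet length, or -1 on malformed input.
+ */
+static long rx_gather(const struct rx_buf *bufs, size_t avail,
+                      uint8_t *out, size_t out_len)
+{
+    size_t n, i, off = 0;
+
+    if (avail == 0 || bufs[0].len < NET_HDR_LEN)
+        return -1;
+    n = bufs[0].data[NUM_BUFFERS_OFF] |
+        ((size_t)bufs[0].data[NUM_BUFFERS_OFF + 1] << 8);
+    if (n == 0 || n > avail)
+        return -1;
+
+    for (i = 0; i < n; i++) {
+        /* Only the first buffer begins with the header. */
+        size_t skip = (i == 0) ? NET_HDR_LEN : 0;
+        size_t chunk = bufs[i].len - skip;
+
+        if (chunk > out_len - off)
+            return -1;
+        memcpy(out + off, bufs[i].data + skip, chunk);
+        off += chunk;
+    }
+    return (long)off;
+}
+\end{lstlisting}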
+
+Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UFO and ECN
+features enable receive checksum, large receive offload and ECN
+support which are the input equivalents of the transmit checksum,
+transmit segmentation offloading and ECN features, as described
+in \ref{sec:Device Types / Network Device / Device Operation /
+Packet Transmission}:
+\begin{enumerate}
+\item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
+ VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} can be
+ set: if so, the packet checksum at offset \field{csum_offset}
+ from \field{csum_start} and any preceding checksums
+ have been validated. The checksum on the packet is incomplete and
+ \field{csum_start} and \field{csum_offset} indicate how to calculate
+ it (see Packet Transmission point 1).
+
+\item If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
+ negotiated, then \field{gso_type} MAY be something other than
+ VIRTIO_NET_HDR_GSO_NONE, and \field{gso_size} field indicates the
+ desired MSS (see Packet Transmission point 2).
+\end{enumerate}
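+
+When VIRTIO_NET_HDR_F_NEEDS_CSUM is set, a driver that needs a complete
+checksum can finish it as described above. The following is a
+non-normative sketch: the function names are illustrative, the packet is
+assumed to be contiguous in memory, and protocol-specific details (such
+as the special treatment of a zero UDP checksum) are left out.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Ones' complement sum of len bytes taken as big-endian 16-bit words. */
+static uint16_t ones_sum(const uint8_t *p, size_t len)
+{
+    uint64_t sum = 0;
+    size_t i;
+
+    for (i = 0; i + 1 < len; i += 2)
+        sum += ((uint32_t)p[i] << 8) | p[i + 1];
+    if (i < len)
+        sum += (uint32_t)p[i] << 8;     /* odd byte, zero padded */
+    while (sum >> 16)
+        sum = (sum & 0xffff) + (sum >> 16);
+    return (uint16_t)sum;
+}
+
+/*
+ * Sum from csum_start to the end of the packet (the checksum field
+ * already holds the pseudo header sum) and store the complemented
+ * result at csum_start + csum_offset. Returns -1 on bad offsets.
+ */
+static int complete_csum(uint8_t *pkt, size_t len,
+                         uint16_t csum_start, uint16_t csum_offset)
+{
+    size_t field = (size_t)csum_start + csum_offset;
+    uint16_t csum;
+
+    if (csum_start > len || field + 2 > len)
+        return -1;
+    csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start);
+    pkt[field] = (uint8_t)(csum >> 8);
+    pkt[field + 1] = (uint8_t)(csum & 0xff);
+    return 0;
+}
+\end{lstlisting}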
+
+\devicenormative{\paragraph}{Processing of Incoming Packets}{Device Types / Network Device / Device Operation / Processing of Incoming Packets}
+\label{devicenormative:Device Types / Network Device / Device Operation / Processing of Packets}%old label for latexdiff
+
+If VIRTIO_NET_F_MRG_RXBUF has not been negotiated, the device MUST set
+\field{num_buffers} to 1.
+
+If VIRTIO_NET_F_MRG_RXBUF has been negotiated, the device MUST set
+\field{num_buffers} to indicate the number of descriptors
+the packet (including the header) is spread over.
+
+The device MUST use all descriptors used by a single receive
+packet together, by atomically incrementing \field{idx} in the
+used ring by the \field{num_buffers} value.
+
+If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device MUST set
+\field{flags} to zero and SHOULD supply a fully checksummed
+packet to the driver.
+
+If VIRTIO_NET_F_GUEST_TSO4 is not negotiated, the device MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV4.
+
+If VIRTIO_NET_F_GUEST_UFO is not negotiated, the device MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_UDP.
+
+If VIRTIO_NET_F_GUEST_TSO6 is not negotiated, the device MUST NOT set
+\field{gso_type} to VIRTIO_NET_HDR_GSO_TCPV6.
+
+The device SHOULD NOT send to the driver TCP packets requiring segmentation offload
+which have the Explicit Congestion Notification bit set, unless the
+VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the
+device MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
+\field{gso_type}.
+
+If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
+device MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
+\field{flags}, if so:
+\begin{enumerate}
+\item the device MUST validate the packet checksum at
+ offset \field{csum_offset} from \field{csum_start} as well as all
+ preceding offsets;
+\item the device MUST set the packet checksum stored in the
+ receive buffer to the TCP/UDP pseudo header;
+\item the device MUST set \field{csum_start} and
+ \field{csum_offset} such that calculating a ones'
+ complement checksum from \field{csum_start} up until the
+ end of the packet and storing the result at offset
+ \field{csum_offset} from \field{csum_start} will result in a
+ fully checksummed packet;
+\end{enumerate}
+
+If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options have
+been negotiated, the device MUST set \field{gso_type} to
+VIRTIO_NET_HDR_GSO_NONE.
+
+If \field{gso_type} differs from VIRTIO_NET_HDR_GSO_NONE, then
+the device MUST also set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
+\field{flags} and MUST set \field{gso_size} to indicate the desired MSS.
+
+If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options have
+been negotiated, the device SHOULD set \field{hdr_len} to a value
+not less than the length of the headers, including the transport
+header.
+
+If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
+device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
+\field{flags}, if so, the device MUST validate the packet
+checksum (in case of multiple encapsulated protocols, one level
+of checksums is validated).
+
+\drivernormative{\paragraph}{Processing of Incoming
+Packets}{Device Types / Network Device / Device Operation /
+Processing of Incoming Packets}
+
+The driver MUST ignore \field{flags} bits that it does not recognize.
+
+If VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} is not set, the
+driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
+
+If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options have
+been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
+transport header size.
+The driver MUST NOT rely on \field{hdr_len} to be correct.
+\begin{note}
+This is due to various bugs in implementations.
+\end{note}
+
+If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor
+VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT
+rely on the packet checksum being correct.
+\subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue}
+
+The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
+negotiated) to send commands to manipulate various features of
+the device which would not easily map into the configuration
+space.
+
+All commands are of the following form:
+
+\begin{lstlisting}
+struct virtio_net_ctrl {
+ u8 class;
+ u8 command;
+ u8 command-specific-data[];
+ u8 ack;
+};
+
+/* ack values */
+#define VIRTIO_NET_OK 0
+#define VIRTIO_NET_ERR 1
+\end{lstlisting}
+
+The \field{class}, \field{command} and command-specific-data are set by the
+driver, and the device sets the \field{ack} byte. There is little the driver can
+do except issue a diagnostic if \field{ack} is not
+VIRTIO_NET_OK.
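+
+As a non-normative sketch, a driver could assemble the driver-written
+part of a command as follows. The helper names ctrl_build() and
+ctrl_succeeded() are hypothetical; the \field{ack} byte is not part of
+the buffer built here because it is written by the device.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+#define VIRTIO_NET_OK  0
+#define VIRTIO_NET_ERR 1
+
+/*
+ * Write class, command and command-specific-data into buf.
+ * Returns the number of bytes written, or 0 if buf is too small.
+ */
+static size_t ctrl_build(uint8_t *buf, size_t buf_len, uint8_t class_,
+                         uint8_t command, const void *data,
+                         size_t data_len)
+{
+    if (buf_len < 2 || data_len > buf_len - 2)
+        return 0;
+    buf[0] = class_;
+    buf[1] = command;
+    if (data_len)
+        memcpy(buf + 2, data, data_len);
+    return 2 + data_len;
+}
+
+/* On failure the driver can do little more than report it. */
+static int ctrl_succeeded(uint8_t ack)
+{
+    return ack == VIRTIO_NET_OK;
+}
+\end{lstlisting}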
+
+\paragraph{Packet Receive Filtering}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Packet Receive Filtering}
+\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Setting Promiscuous Mode}%old label for latexdiff
+
+If the VIRTIO_NET_F_CTRL_RX and VIRTIO_NET_F_CTRL_RX_EXTRA
+features are negotiated, the driver can send control commands for
+promiscuous mode, multicast, unicast and broadcast receiving.
+
+\begin{note}
+In general, these commands are best-effort: unwanted
+packets could still arrive.
+\end{note}
+
+\begin{lstlisting}
+#define VIRTIO_NET_CTRL_RX 0
+ #define VIRTIO_NET_CTRL_RX_PROMISC 0
+ #define VIRTIO_NET_CTRL_RX_ALLMULTI 1
+ #define VIRTIO_NET_CTRL_RX_ALLUNI 2
+ #define VIRTIO_NET_CTRL_RX_NOMULTI 3
+ #define VIRTIO_NET_CTRL_RX_NOUNI 4
+ #define VIRTIO_NET_CTRL_RX_NOBCAST 5
+\end{lstlisting}
+
+
+\devicenormative{\subparagraph}{Packet Receive Filtering}{Device Types / Network Device / Device Operation / Control Virtqueue / Packet Receive Filtering}
+
+If the VIRTIO_NET_F_CTRL_RX feature has been negotiated,
+the device MUST support the following VIRTIO_NET_CTRL_RX class
+commands:
+\begin{itemize}
+\item VIRTIO_NET_CTRL_RX_PROMISC turns promiscuous mode on and
+off. The command-specific-data is one byte containing 0 (off) or
+1 (on). If promiscuous mode is on, the device SHOULD receive all
+incoming packets.
+This SHOULD take effect even if one of the other modes set by
+a VIRTIO_NET_CTRL_RX class command is on.
+\item VIRTIO_NET_CTRL_RX_ALLMULTI turns all-multicast receive on and
+off. The command-specific-data is one byte containing 0 (off) or
+1 (on). When all-multicast receive is on the device SHOULD allow
+all incoming multicast packets.
+\end{itemize}
+
+If the VIRTIO_NET_F_CTRL_RX_EXTRA feature has been negotiated,
+the device MUST support the following VIRTIO_NET_CTRL_RX class
+commands:
+\begin{itemize}
+\item VIRTIO_NET_CTRL_RX_ALLUNI turns all-unicast receive on and
+off. The command-specific-data is one byte containing 0 (off) or
+1 (on). When all-unicast receive is on the device SHOULD allow
+all incoming unicast packets.
+\item VIRTIO_NET_CTRL_RX_NOMULTI suppresses multicast receive.
+The command-specific-data is one byte containing 0 (multicast
+receive allowed) or 1 (multicast receive suppressed).
+When multicast receive is suppressed, the device SHOULD NOT
+send multicast packets to the driver.
+This SHOULD take effect even if VIRTIO_NET_CTRL_RX_ALLMULTI is on.
+This filter SHOULD NOT apply to broadcast packets.
+\item VIRTIO_NET_CTRL_RX_NOUNI suppresses unicast receive.
+The command-specific-data is one byte containing 0 (unicast
+receive allowed) or 1 (unicast receive suppressed).
+When unicast receive is suppressed, the device SHOULD NOT
+send unicast packets to the driver.
+This SHOULD take effect even if VIRTIO_NET_CTRL_RX_ALLUNI is on.
+\item VIRTIO_NET_CTRL_RX_NOBCAST suppresses broadcast receive.
+The command-specific-data is one byte containing 0 (broadcast
+receive allowed) or 1 (broadcast receive suppressed).
+When broadcast receive is suppressed, the device SHOULD NOT
+send broadcast packets to the driver.
+This SHOULD take effect even if VIRTIO_NET_CTRL_RX_ALLMULTI is on.
+\end{itemize}
+
+\drivernormative{\subparagraph}{Packet Receive Filtering}{Device Types / Network Device / Device Operation / Control Virtqueue / Packet Receive Filtering}
+
+If the VIRTIO_NET_F_CTRL_RX feature has not been negotiated,
+the driver MUST NOT issue commands VIRTIO_NET_CTRL_RX_PROMISC or
+VIRTIO_NET_CTRL_RX_ALLMULTI.
+
+If the VIRTIO_NET_F_CTRL_RX_EXTRA feature has not been negotiated,
+the driver MUST NOT issue commands
+ VIRTIO_NET_CTRL_RX_ALLUNI,
+ VIRTIO_NET_CTRL_RX_NOMULTI,
+ VIRTIO_NET_CTRL_RX_NOUNI or
+ VIRTIO_NET_CTRL_RX_NOBCAST.
+
+\paragraph{Setting MAC Address Filtering}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Setting MAC Address Filtering}
+
+If the VIRTIO_NET_F_CTRL_RX feature is negotiated, the driver can
+send control commands for MAC address filtering.
+
+\begin{lstlisting}
+struct virtio_net_ctrl_mac {
+ le32 entries;
+ u8 macs[entries][6];
+};
+
+#define VIRTIO_NET_CTRL_MAC 1
+ #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET 1
+\end{lstlisting}
+
+The device can filter incoming packets by any number of destination
+MAC addresses\footnote{Since there are no guarantees, it can use a hash filter or
+silently switch to allmulti or promiscuous mode if it is given too
+many addresses.
+}. This table is set using the class
+VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
+command-specific-data is two variable length tables of 6-byte MAC
+addresses (as described in struct virtio_net_ctrl_mac). The first table contains unicast addresses, and the second
+contains multicast addresses.
+
+The VIRTIO_NET_CTRL_MAC_ADDR_SET command is used to set the
+default MAC address which rx filtering
+accepts (and if VIRTIO_NET_F_MAC_ADDR has been negotiated,
+this will be reflected in \field{mac} in config space).
+
+The command-specific-data for VIRTIO_NET_CTRL_MAC_ADDR_SET is
+the 6-byte MAC address.
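+
+The command-specific-data for VIRTIO_NET_CTRL_MAC_TABLE_SET could be
+serialized as in the following non-normative sketch for the non-legacy
+(little-endian) interface. The function names are illustrative, and the
+caller is assumed to supply an output buffer that is large enough.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+#define MAC_LEN 6u
+
+/* Emit one struct virtio_net_ctrl_mac: le32 entries, then the MACs. */
+static size_t put_mac_table(uint8_t *out,
+                            const uint8_t (*macs)[MAC_LEN],
+                            uint32_t entries)
+{
+    out[0] = (uint8_t)(entries & 0xff);
+    out[1] = (uint8_t)((entries >> 8) & 0xff);
+    out[2] = (uint8_t)((entries >> 16) & 0xff);
+    out[3] = (uint8_t)((entries >> 24) & 0xff);
+    if (entries)
+        memcpy(out + 4, macs, (size_t)entries * MAC_LEN);
+    return 4 + (size_t)entries * MAC_LEN;
+}
+
+/* Unicast table first, multicast table second; either may be empty. */
+static size_t put_mac_table_set(uint8_t *out,
+                                const uint8_t (*uni)[MAC_LEN],
+                                uint32_t n_uni,
+                                const uint8_t (*multi)[MAC_LEN],
+                                uint32_t n_multi)
+{
+    size_t len = put_mac_table(out, uni, n_uni);
+
+    return len + put_mac_table(out + len, multi, n_multi);
+}
+\end{lstlisting}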
+
+\devicenormative{\subparagraph}{Setting MAC Address Filtering}{Device Types / Network Device / Device Operation / Control Virtqueue / Setting MAC Address Filtering}
+
+The device MUST have an empty MAC filtering table on reset.
+
+The device MUST update the MAC filtering table before it consumes
+the VIRTIO_NET_CTRL_MAC_TABLE_SET command.
+
+The device MUST update \field{mac} in config space before it consumes
+the VIRTIO_NET_CTRL_MAC_ADDR_SET command, if VIRTIO_NET_F_MAC_ADDR has
+been negotiated.
+
+The device SHOULD drop incoming packets which have a destination MAC which
+matches neither the \field{mac} (or that set with VIRTIO_NET_CTRL_MAC_ADDR_SET)
+nor the MAC filtering table.
+
+\drivernormative{\subparagraph}{Setting MAC Address Filtering}{Device Types / Network Device / Device Operation / Control Virtqueue / Setting MAC Address Filtering}
+
+If VIRTIO_NET_F_CTRL_RX has not been negotiated,
+the driver MUST NOT issue VIRTIO_NET_CTRL_MAC class commands.
+
+If VIRTIO_NET_F_CTRL_RX has been negotiated,
+the driver SHOULD issue VIRTIO_NET_CTRL_MAC_ADDR_SET
+to set the default mac if it is different from \field{mac}.
+
+The driver MUST follow the VIRTIO_NET_CTRL_MAC_TABLE_SET command
+by a le32 number, followed by that number of non-multicast
+MAC addresses, followed by another le32 number, followed by
+that number of multicast addresses. Either number MAY be 0.
+
+\subparagraph{Legacy Interface: Setting MAC Address Filtering}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Setting MAC Address Filtering / Legacy Interface: Setting MAC Address Filtering}
+When using the legacy interface, transitional devices and drivers
+MUST format \field{entries} in struct virtio_net_ctrl_mac
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+Legacy drivers that didn't negotiate VIRTIO_NET_F_CTRL_MAC_ADDR
+changed \field{mac} in config space when NIC is accepting
+incoming packets. These drivers always wrote the mac value from
+first to last byte, therefore after detecting such drivers,
+a transitional device MAY defer MAC update, or MAY defer
+processing incoming packets until driver writes the last byte
+of \field{mac} in the config space.
+
+\paragraph{VLAN Filtering}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / VLAN Filtering}
+
+If the driver negotiates the VIRTIO_NET_F_CTRL_VLAN feature, it
+can control a VLAN filter table in the device.
+
+\begin{lstlisting}
+#define VIRTIO_NET_CTRL_VLAN 2
+ #define VIRTIO_NET_CTRL_VLAN_ADD 0
+ #define VIRTIO_NET_CTRL_VLAN_DEL 1
+\end{lstlisting}
+
+Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
+commands take a little-endian 16-bit VLAN id as the command-specific-data.
+
+\subparagraph{Legacy Interface: VLAN Filtering}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / VLAN Filtering / Legacy Interface: VLAN Filtering}
+When using the legacy interface, transitional devices and drivers
+MUST format the VLAN id
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\paragraph{Gratuitous Packet Sending}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
+
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
+on VIRTIO_NET_F_CTRL_VQ), the device can ask the driver to send gratuitous
+packets; this is usually done after the guest has been physically
+migrated, and needs to announce its presence on the new network
+links. (As the hypervisor does not know the guest's
+network configuration (e.g. tagged VLANs), it is simplest to prod
+the guest in this way.)
+
+\begin{lstlisting}
+#define VIRTIO_NET_CTRL_ANNOUNCE 3
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+\end{lstlisting}
+
+The driver checks the VIRTIO_NET_S_ANNOUNCE bit in the device configuration \field{status} field
+when it is notified of a device configuration change. The driver
+sends the VIRTIO_NET_CTRL_ANNOUNCE_ACK command to indicate that
+it has received the notification, and the device then clears the
+VIRTIO_NET_S_ANNOUNCE bit in \field{status}.
+
+Processing this notification involves:
+
+\begin{enumerate}
+\item Sending the gratuitous packets (e.g. ARP), or marking that
+ gratuitous packets are pending and letting a deferred routine
+ send them.
+
+\item Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control
+ vq.
+\end{enumerate}
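+
+The two steps above can be sketched as follows (non-normative). The
+value of VIRTIO_NET_S_ANNOUNCE is taken from the device configuration
+layout section; the function names are hypothetical, and the actual
+transmission of the gratuitous packets and of the command is left to the
+surrounding driver.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_NET_S_ANNOUNCE        2 /* bit in config status */
+#define VIRTIO_NET_CTRL_ANNOUNCE     3
+#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+
+/* Step 0: after a configuration change, is an announce requested? */
+static int announce_pending(uint16_t status)
+{
+    return (status & VIRTIO_NET_S_ANNOUNCE) != 0;
+}
+
+/*
+ * Step 2: fill in class and command for the acknowledgement. It is
+ * queued on the control virtqueue after step 1 (sending, or
+ * scheduling, the gratuitous packets).
+ */
+static void announce_ack_cmd(uint8_t hdr[2])
+{
+    hdr[0] = VIRTIO_NET_CTRL_ANNOUNCE;
+    hdr[1] = VIRTIO_NET_CTRL_ANNOUNCE_ACK;
+}
+\end{lstlisting}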
+
+\drivernormative{\subparagraph}{Gratuitous Packet Sending}{Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
+
+If the driver negotiates VIRTIO_NET_F_GUEST_ANNOUNCE, it SHOULD notify
+network peers of its new location after it sees the VIRTIO_NET_S_ANNOUNCE bit
+in \field{status}. The driver MUST send a command on the command queue
+with class VIRTIO_NET_CTRL_ANNOUNCE and command VIRTIO_NET_CTRL_ANNOUNCE_ACK.
+
+\devicenormative{\subparagraph}{Gratuitous Packet Sending}{Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
+
+If VIRTIO_NET_F_GUEST_ANNOUNCE is negotiated, the device MUST clear the
+VIRTIO_NET_S_ANNOUNCE bit in \field{status} upon receipt of a command buffer
+with class VIRTIO_NET_CTRL_ANNOUNCE and command VIRTIO_NET_CTRL_ANNOUNCE_ACK
+before marking the buffer as used.
+
+\paragraph{Automatic receive steering in multiqueue mode}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
+
+If the driver negotiates the VIRTIO_NET_F_MQ feature bit (depends
+on VIRTIO_NET_F_CTRL_VQ), it MAY transmit outgoing packets on one
+of the multiple transmitq1\ldots transmitqN and ask the device to
+queue incoming packets into one of the multiple receiveq1\ldots receiveqN
+depending on the packet flow.
+
+\begin{lstlisting}
+struct virtio_net_ctrl_mq {
+ le16 virtqueue_pairs;
+};
+
+#define VIRTIO_NET_CTRL_MQ 4
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET 0
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN 1
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX 0x8000
+\end{lstlisting}
+
+Multiqueue is disabled by default. The driver enables multiqueue by
+executing the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command, specifying
+the number of the transmit and receive queues to be used up to
+\field{max_virtqueue_pairs}; subsequently,
+transmitq1\ldots transmitqn and receiveq1\ldots receiveqn where
+n=\field{virtqueue_pairs} MAY be used.
+
+When multiqueue is enabled, the device MUST use automatic receive steering
+based on packet flow. Programming of the receive steering
+classifier is implicit. After the driver has transmitted a packet of a
+flow on transmitqX, the device SHOULD cause incoming packets for that flow to
+be steered to receiveqX. For uni-directional protocols, or where
+no packets have been transmitted yet, the device MAY steer a packet
+to a random queue out of the specified receiveq1\ldots receiveqn.
+
+Multiqueue is disabled by setting \field{virtqueue_pairs} to 1 (this is
+the default) and waiting for the device to use the command buffer.
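+
+One possible device-side realization of this steering is sketched below
+(non-normative). The flow hash, the table size and all names are
+assumptions made for illustration; this specification does not define
+how a flow is identified. The structure is expected to start with
+\field{virtqueue_pairs} set to 1, matching the default.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define FLOW_SLOTS 256u /* illustrative table size */
+
+struct steering {
+    uint16_t virtqueue_pairs;         /* n, initially 1 */
+    uint16_t rxq_of_flow[FLOW_SLOTS]; /* 0: no transmit seen yet */
+};
+
+/* Accept n only if it is in the range 1..max_virtqueue_pairs. */
+static int steering_set_pairs(struct steering *s, uint16_t n,
+                              uint16_t max_virtqueue_pairs)
+{
+    if (n == 0 || n > max_virtqueue_pairs)
+        return -1;
+    s->virtqueue_pairs = n;
+    return 0;
+}
+
+/* Record that a packet of this flow was sent on transmitqX. */
+static void steering_note_tx(struct steering *s, uint32_t flow_hash,
+                             uint16_t txq)
+{
+    s->rxq_of_flow[flow_hash % FLOW_SLOTS] = txq;
+}
+
+/* Choose receiveqX (numbered from 1) for an incoming packet. */
+static uint16_t steering_pick_rx(const struct steering *s,
+                                 uint32_t flow_hash)
+{
+    uint16_t q = s->rxq_of_flow[flow_hash % FLOW_SLOTS];
+
+    if (q >= 1 && q <= s->virtqueue_pairs)
+        return q;
+    /* No usable record: any of receiveq1..receiveqn is allowed. */
+    return (uint16_t)(flow_hash % s->virtqueue_pairs + 1);
+}
+\end{lstlisting}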
+
+\drivernormative{\subparagraph}{Automatic receive steering in multiqueue mode}{Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
+
+The driver MUST configure the virtqueues before enabling them with the
+VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command.
+
+The driver MUST NOT request a \field{virtqueue_pairs} of 0 or
+greater than \field{max_virtqueue_pairs} in the device configuration space.
+
+The driver MUST queue packets only on transmitq1 before the
+VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command.
+
+The driver MUST NOT queue packets on transmit queues greater than
+\field{virtqueue_pairs} once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command in the available ring.
+
+\devicenormative{\subparagraph}{Automatic receive steering in multiqueue mode}{Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
+
+The device MUST queue packets only on receiveq1 before the
+VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command.
+
+The device MUST NOT queue packets on receive queues greater than
+\field{virtqueue_pairs} once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command in the used ring.
+
+\subparagraph{Legacy Interface: Automatic receive steering in multiqueue mode}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Legacy Interface: Automatic receive steering in multiqueue mode}
+When using the legacy interface, transitional devices and drivers
+MUST format \field{virtqueue_pairs}
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\paragraph{Offloads State Configuration}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration}
+
+If the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature is negotiated, the driver can
+send control commands for dynamic offloads state configuration.
+
+\subparagraph{Setting Offloads State}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
+
+\begin{lstlisting}
+le64 offloads;
+
+#define VIRTIO_NET_F_GUEST_CSUM 1
+#define VIRTIO_NET_F_GUEST_TSO4 7
+#define VIRTIO_NET_F_GUEST_TSO6 8
+#define VIRTIO_NET_F_GUEST_ECN 9
+#define VIRTIO_NET_F_GUEST_UFO 10
+
+#define VIRTIO_NET_CTRL_GUEST_OFFLOADS 5
+ #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET 0
+\end{lstlisting}
+
+The class VIRTIO_NET_CTRL_GUEST_OFFLOADS has one command:
+VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET applies the new offloads configuration.
+
+The le64 value passed as command data is a bitmask: set bits select
+offloads to be enabled, cleared bits select offloads to be disabled.
+
+There is a corresponding device feature for each offload. Upon feature
+negotiation, the corresponding offload is enabled to preserve backward
+compatibility.
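+
+A driver-side check of a requested mask could look like the following
+non-normative sketch. It assumes, as the listing above suggests, that
+each offload occupies the bit position given by its feature bit number;
+the helper names are hypothetical.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_NET_F_GUEST_CSUM 1
+#define VIRTIO_NET_F_GUEST_TSO4 7
+#define VIRTIO_NET_F_GUEST_TSO6 8
+#define VIRTIO_NET_F_GUEST_ECN  9
+#define VIRTIO_NET_F_GUEST_UFO  10
+
+#define OFFLOAD_BIT(f) (1ULL << (f))
+#define GUEST_OFFLOADS_ALL (OFFLOAD_BIT(VIRTIO_NET_F_GUEST_CSUM) | \
+                            OFFLOAD_BIT(VIRTIO_NET_F_GUEST_TSO4) | \
+                            OFFLOAD_BIT(VIRTIO_NET_F_GUEST_TSO6) | \
+                            OFFLOAD_BIT(VIRTIO_NET_F_GUEST_ECN)  | \
+                            OFFLOAD_BIT(VIRTIO_NET_F_GUEST_UFO))
+
+/* Every requested bit must be a known, negotiated offload. */
+static int offloads_valid(uint64_t offloads, uint64_t negotiated)
+{
+    return (offloads & ~(GUEST_OFFLOADS_ALL & negotiated)) == 0;
+}
+
+/* Encode the mask as le64 for the non-legacy interface. */
+static void put_le64(uint8_t out[8], uint64_t v)
+{
+    int i;
+
+    for (i = 0; i < 8; i++)
+        out[i] = (uint8_t)(v >> (8 * i));
+}
+\end{lstlisting}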
+
+\drivernormative{\subparagraph}{Setting Offloads State}{Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
+
+A driver MUST NOT enable an offload for which the appropriate feature
+has not been negotiated.
+
+\subparagraph{Legacy Interface: Setting Offloads State}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State / Legacy Interface: Setting Offloads State}
+When using the legacy interface, transitional devices and drivers
+MUST format \field{offloads}
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+
+\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
+Types / Network Device / Legacy Interface: Framing Requirements}
+
+When using legacy interfaces, transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT MUST use a single descriptor for the
+struct virtio_net_hdr on both transmit and receive, with the
+network data in the following descriptors.
+
+Additionally, when using the control virtqueue (see \ref{sec:Device
+Types / Network Device / Device Operation / Control Virtqueue}),
+transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT MUST:
+\begin{itemize}
+\item for all commands, use a single 2-byte descriptor including the first two
+fields: \field{class} and \field{command}
+\item for all commands except VIRTIO_NET_CTRL_MAC_TABLE_SET
+use a single descriptor including command-specific-data
+with no padding.
+\item for the VIRTIO_NET_CTRL_MAC_TABLE_SET command use exactly
+two descriptors including command-specific-data with no padding:
+the first of these descriptors MUST include the
+virtio_net_ctrl_mac table structure for the unicast addresses with no padding,
+the second of these descriptors MUST include the
+virtio_net_ctrl_mac table structure for the multicast addresses
+with no padding.
+\item for all commands, use a single 1-byte descriptor for the
+\field{ack} field
+\end{itemize}
+
+See \ref{sec:Basic
+Facilities of a Virtio Device / Virtqueues / Message Framing}.
+
+\section{Block Device}\label{sec:Device Types / Block Device}
+
+The virtio block device is a simple virtual block device (ie.
+disk). Read and write requests (and other exotic requests) are
+placed in the queue, and serviced (probably out of order) by the
+device except where noted.
+
+\subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
+ 2
+
+\subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
+\begin{description}
+\item[0] requestq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Block Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_BLK_F_SIZE_MAX (1)] Maximum size of any single segment is
+ in \field{size_max}.
+
+\item[VIRTIO_BLK_F_SEG_MAX (2)] Maximum number of segments in a
+ request is in \field{seg_max}.
+
+\item[VIRTIO_BLK_F_GEOMETRY (4)] Disk-style geometry specified in
+ \field{geometry}.
+
+\item[VIRTIO_BLK_F_RO (5)] Device is read-only.
+
+\item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
+
+\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
+
+\item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
+ alignment.
+
+\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
+ and writethrough modes.
+\end{description}
+
+\subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
+
+\begin{description}
+\item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
+
+\item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
+\end{description}
+
+\begin{note}
+ In the legacy interface, VIRTIO_BLK_F_FLUSH was also
+ called VIRTIO_BLK_F_WCE.
+\end{note}
+
+\subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
+
+The \field{capacity} of the device (expressed in 512-byte sectors) is always
+present. The availability of the others depends on various feature
+bits as indicated above.
+
+\begin{lstlisting}
+struct virtio_blk_config {
+ le64 capacity;
+ le32 size_max;
+ le32 seg_max;
+ struct virtio_blk_geometry {
+ le16 cylinders;
+ u8 heads;
+ u8 sectors;
+ } geometry;
+ le32 blk_size;
+ struct virtio_blk_topology {
+ // # of logical blocks per physical block (log2)
+ u8 physical_block_exp;
+ // offset of first aligned logical block
+ u8 alignment_offset;
+ // suggested minimum I/O size in blocks
+ le16 min_io_size;
+ // optimal (suggested maximum) I/O size in blocks
+ le32 opt_io_size;
+ } topology;
+ u8 writeback;
+};
+\end{lstlisting}
+
+
+\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_blk_config
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+
+\subsection{Device Initialization}\label{sec:Device Types / Block Device / Device Initialization}
+
+\begin{enumerate}
+\item The device size can be read from \field{capacity}.
+
+\item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated,
+ \field{blk_size} can be read to determine the optimal sector size
+ for the driver to use. This does not affect the units used in
+ the protocol (always 512 bytes), but awareness of the correct
+ value can affect performance.
+
+\item If the VIRTIO_BLK_F_RO feature is set by the device, any write
+ requests will fail.
+
+\item If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the
+ \field{topology} struct can be read to determine the physical block size and optimal
+ I/O lengths for the driver to use. This also does not affect the units
+ in the protocol, only performance.
+
+\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
+ mode can be read or set through the \field{writeback} field. 0 corresponds
+ to a writethrough cache, 1 to a writeback cache\footnote{Consistent with
+ \ref{devicenormative:Device Types / Block Device / Device Operation},
+ a writethrough cache can be defined broadly as a cache that commits
+ writes to persistent device backend storage before reporting their
+ completion. For example, a battery-backed writeback cache actually
+ counts as writethrough according to this definition.}. The cache mode
+ after reset can be either writeback or writethrough. The actual
+ mode can be determined by reading \field{writeback} after feature
+ negotiation.
+\end{enumerate}
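+
+As a non-normative sketch, a driver might convert the \field{topology}
+fields into byte counts as shown below. This assumes that the
+``blocks'' in the field comments are logical blocks of \field{blk_size}
+bytes; the struct and function names are illustrative only.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+struct blk_io_hints {        /* all values in bytes */
+    uint64_t physical_block;
+    uint64_t min_io;
+    uint64_t opt_io;
+};
+
+/* Returns 0, or -1 if physical_block_exp is implausibly large. */
+static int blk_get_io_hints(uint32_t blk_size,
+                            uint8_t physical_block_exp,
+                            uint16_t min_io_size,
+                            uint32_t opt_io_size,
+                            struct blk_io_hints *out)
+{
+    if (physical_block_exp > 31)
+        return -1;
+    /* physical_block_exp is log2(logical blocks per physical block) */
+    out->physical_block = (uint64_t)blk_size << physical_block_exp;
+    out->min_io = (uint64_t)min_io_size * blk_size;
+    out->opt_io = (uint64_t)opt_io_size * blk_size;
+    return 0;
+}
+\end{lstlisting}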
+
+\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
+sending VIRTIO_BLK_T_FLUSH commands.
+
+If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH is
+negotiated, the driver MAY deduce the presence of a writethrough cache.
+If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was,
+the driver SHOULD assume presence of a writeback cache.
+
+The driver MUST NOT read \field{writeback} before setting
+the FEATURES_OK \field{status} bit.
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
+if they offer VIRTIO_BLK_F_CONFIG_WCE.
+
+If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH
+is not, the device MUST initialize \field{writeback} to 0.
+
+\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
+
+Because legacy devices do not have FEATURES_OK, transitional devices
+MUST implement slightly different behavior around feature negotiation
+when used through the legacy interface. In particular, when using the
+legacy interface:
+
+\begin{itemize}
+\item the driver MAY read or write \field{writeback} before setting
+ the DRIVER or DRIVER_OK \field{status} bit
+
+\item the device MUST NOT modify the cache mode (and \field{writeback})
+ as a result of a driver setting a status bit, unless
+ the DRIVER_OK bit is being set and the driver has not set the
+ VIRTIO_BLK_F_CONFIG_WCE driver feature bit.
+
+\item the device MUST NOT modify the cache mode (and \field{writeback})
+ as a result of a driver modifying the driver feature bits, for example
+ if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but
+ does not set the VIRTIO_BLK_F_FLUSH bit.
+\end{itemize}
+
+
+\subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
+
+The driver queues requests to the virtqueue, and they are used by
+the device (not necessarily in order). Each request is of form:
+
+\begin{lstlisting}
+struct virtio_blk_req {
+ le32 type;
+ le32 reserved;
+ le64 sector;
+ u8 data[][512];
+ u8 status;
+};
+\end{lstlisting}
+
+The type of the request is either a read (VIRTIO_BLK_T_IN), a write
+(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH).
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_IN 0
+#define VIRTIO_BLK_T_OUT 1
+#define VIRTIO_BLK_T_FLUSH 4
+\end{lstlisting}
+
+The \field{sector} number indicates the offset (multiplied by 512) where
+the read or write is to occur. This field is unused and set to 0
+for scsi packet commands and for flush commands.
+
+The final \field{status} byte is written by the device: either
+VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
+error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_S_OK 0
+#define VIRTIO_BLK_S_IOERR 1
+#define VIRTIO_BLK_S_UNSUPP 2
+\end{lstlisting}
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
+
+A driver MUST NOT submit a request which would cause a read or write
+beyond \field{capacity}.
+
+A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
+
+A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
+A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
+
+If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
+switch to writethrough or writeback mode by writing respectively 0 and
+1 to the \field{writeback} field. After writing a 0 to \field{writeback},
+the driver MUST NOT assume that any volatile writes have been committed
+to persistent device backend storage.
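+
+As an illustration of the driver requirements above, the following C
+fragment is a minimal sketch of a check a driver might run before
+queuing a request. The helper name blk_req_is_valid and its parameters
+are hypothetical and not part of this specification. The sketch assumes
+that \field{capacity} is expressed in 512-byte sectors, and it is
+stricter than required: it rejects flush requests that carry data,
+which is only a SHOULD NOT.
+
+\begin{lstlisting}
+#include <stdbool.h>
+#include <stdint.h>
+
+#define VIRTIO_BLK_T_IN    0
+#define VIRTIO_BLK_T_OUT   1
+#define VIRTIO_BLK_T_FLUSH 4
+
+/* Hypothetical helper: returns true if the request may be queued.
+ * sector and capacity are counted in 512-byte sectors (assumption);
+ * nsectors is the number of 512-byte data blocks in the request. */
+static bool blk_req_is_valid(uint32_t type, uint64_t sector,
+                             uint64_t nsectors, uint64_t capacity)
+{
+    switch (type) {
+    case VIRTIO_BLK_T_FLUSH:
+        /* sector MUST be 0; data SHOULD NOT be included. */
+        return sector == 0 && nsectors == 0;
+    case VIRTIO_BLK_T_IN:
+    case VIRTIO_BLK_T_OUT:
+        /* Overflow-safe form of: sector + nsectors <= capacity. */
+        return sector <= capacity && nsectors <= capacity - sector;
+    default:
+        return false;
+    }
+}
+\end{lstlisting}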
+
+\devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
+
+A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
+for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT
+write any data.
+
+A write is considered volatile when it is submitted; the contents of
+sectors covered by a volatile write are undefined in persistent device
+backend storage until the write becomes stable. A write becomes stable
+once it is completed and one or more of the following conditions is true:
+
+\begin{enumerate}
+\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor
+ VIRTIO_BLK_F_FLUSH features were negotiated, but VIRTIO_BLK_F_FLUSH was
+ offered by the device;
+
+\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
+ \field{writeback} field in configuration space was 0 \textbf{all the time between
+ the submission of the write and its completion};
+
+\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
+ completed} and is completed itself.
+\end{enumerate}
+
+If the device is backed by persistent storage, the device MUST ensure that
+stable writes are committed to it, before reporting completion of the write
+(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
+(case~\ref{item:flush3}). Failure to do so can cause data loss
+in case of a crash.
+
+If the driver changes \field{writeback} between the submission of the write
+and its completion, the write could be either volatile or stable when
+its completion is reported; in other words, the exact behavior is undefined.
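+
+The three stability conditions can be read as a single predicate. The C
+fragment below is a sketch only: struct write_state and write_is_stable
+are hypothetical names, and each flag is assumed to be tracked by the
+implementation as described in its comment.
+
+\begin{lstlisting}
+#include <stdbool.h>
+
+/* Hypothetical per-write bookkeeping. */
+struct write_state {
+    bool completed;             /* the write request has completed */
+    bool flush_offered;         /* device offered VIRTIO_BLK_F_FLUSH */
+    bool flush_negotiated;      /* VIRTIO_BLK_F_FLUSH was negotiated */
+    bool config_wce_negotiated; /* VIRTIO_BLK_F_CONFIG_WCE was negotiated */
+    bool writeback_was_zero;    /* writeback was 0 from submission
+                                   until completion of the write */
+    bool flushed_after;         /* a VIRTIO_BLK_T_FLUSH sent after the
+                                   write completed has itself completed */
+};
+
+static bool write_is_stable(const struct write_state *w)
+{
+    if (!w->completed)
+        return false;
+
+    bool case1 = !w->config_wce_negotiated && !w->flush_negotiated &&
+                 w->flush_offered;
+    bool case2 = w->config_wce_negotiated && w->writeback_was_zero;
+    bool case3 = w->flushed_after;
+
+    return case1 || case2 || case3;
+}
+\end{lstlisting}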
+
+% According to the device requirements for device initialization:
+% Offer(CONFIG_WCE) => Offer(FLUSH).
+%
+% After reversing the implication:
+% not Offer(FLUSH) => not Offer(CONFIG_WCE).
+
+If VIRTIO_BLK_F_FLUSH was not offered by the
+ device\footnote{Note that in this case, according to
+ \ref{devicenormative:Device Types / Block Device / Device Initialization},
+ the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the
+device MAY also commit writes to persistent device backend storage before
+reporting their completion. Unlike case~\ref{item:flush1}, however, this
+is not an absolute requirement of the specification.
+
+\begin{note}
+ An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit
+ completed writes will not be resilient to data loss in case of crashes.
+ Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement
+ for implementations that do not wish to be safe against such data losses.
+\end{note}
+
+\subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_blk_req
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+When using the legacy interface, transitional drivers
+SHOULD ignore the \field{len} value in used ring entries.
+\begin{note}
+Historically, some devices put the total descriptor length,
+or the total length of device-writable buffers there,
+even when only the status byte was actually written.
+\end{note}
+
+The \field{reserved} field was previously called \field{ioprio}. \field{ioprio}
+is a hint about the relative priorities of requests to the device:
+higher numbers indicate more important requests.
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_FLUSH_OUT 5
+\end{lstlisting}
+
+The command VIRTIO_BLK_T_FLUSH_OUT was a synonym for VIRTIO_BLK_T_FLUSH;
+a device MUST treat it as a VIRTIO_BLK_T_FLUSH command.
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_BARRIER 0x80000000
+\end{lstlisting}
+
+If the device has the VIRTIO_BLK_F_BARRIER
+feature, the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
+request acts as a barrier: all preceding requests SHOULD be
+complete before this one, and all following requests SHOULD NOT be
+started until this one is complete.
+
+\begin{note} A barrier does not flush
+caches in the underlying backend device on the host, and thus does not
+serve as a data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request
+does that.
+\end{note}
+
+Some older legacy devices did not commit completed writes to persistent
+device backend storage when VIRTIO_BLK_F_FLUSH was offered but not
+negotiated. In order to work around this, the driver MAY set the
+\field{writeback} to 0 (if available) or it MAY send an explicit flush
+request after every completed write.
+
+If the device has the VIRTIO_BLK_F_SCSI feature, it can also support
+scsi packet command requests. Each of these requests is of the form:
+
+\begin{lstlisting}
+/* All fields are in guest's native endian. */
+struct virtio_scsi_pc_req {
+ u32 type;
+ u32 ioprio;
+ u64 sector;
+ u8 cmd[];
+ u8 data[][512];
+#define SCSI_SENSE_BUFFERSIZE 96
+ u8 sense[SCSI_SENSE_BUFFERSIZE];
+ u32 errors;
+ u32 data_len;
+ u32 sense_len;
+ u32 residual;
+ u8 status;
+};
+\end{lstlisting}
+
+A request type can also be a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or
+VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent; the device
+does not distinguish between them:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_SCSI_CMD 2
+#define VIRTIO_BLK_T_SCSI_CMD_OUT 3
+\end{lstlisting}
+
+The \field{cmd} field is only present for scsi packet command requests,
+and indicates the command to perform. This field MUST reside in a
+single, separate device-readable buffer; command length can be derived
+from the length of this buffer.
+
+Note that these first three (four for scsi packet commands)
+fields are always device-readable: \field{data} is either device-readable
+or device-writable, depending on the request. The size of the read or
+write can be derived from the total size of the request buffers.
+
+\field{sense} is only present for scsi packet command requests,
+and indicates the buffer for scsi sense data.
+
+\field{data_len} is only present for scsi packet command
+requests. This field is deprecated, and SHOULD be ignored by the
+driver. Historically, devices copied the data length there.
+
+\field{sense_len} is only present for scsi packet command
+requests and indicates the number of bytes actually written to
+the \field{sense} buffer.
+
+\field{residual} is only present for scsi packet command
+requests and indicates the residual size, calculated as data
+length - number of bytes actually transferred.
+
+\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
+Types / Block Device / Legacy Interface: Framing Requirements}
+
+When using legacy interfaces, transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT:
+
+\begin{itemize}
+\item MUST use a single 16-byte descriptor containing \field{type},
+ \field{reserved} and \field{sector}, followed by descriptors
+ for \field{data}, then finally a separate 1-byte descriptor
+ for \field{status}.
+
+\item For SCSI commands there are additional constraints.
+ \field{sense} MUST reside in a
+ single separate device-writable descriptor of size 96 bytes,
+ and \field{errors}, \field{data_len}, \field{sense_len} and
+ \field{residual} MUST reside in a single separate
+ device-writable descriptor.
+\end{itemize}
+
+See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}.
+
+\section{Console Device}\label{sec:Device Types / Console Device}
+
+The virtio console device is a simple device for data input and
+output. A device MAY have one or more ports. Each port has a pair
+of input and output virtqueues. Moreover, a device has a pair of
+control IO virtqueues. The control virtqueues are used to
+communicate information between the device and the driver about
+ports being opened and closed on either side of the connection,
+indication from the device about whether a particular port is a
+console port, adding new ports, port hot-plug/unplug, etc., and
+indication from the driver about whether a port or a device was
+successfully added, port open/close, etc. For data IO, one or
+more empty buffers are placed in the receive queue for incoming
+data and outgoing characters are placed in the transmit queue.
+
+\subsection{Device ID}\label{sec:Device Types / Console Device / Device ID}
+
+ 3
+
+\subsection{Virtqueues}\label{sec:Device Types / Console Device / Virtqueues}
+
+\begin{description}
+\item[0] receiveq(port0)
+\item[1] transmitq(port0)
+\item[2] control receiveq
+\item[3] control transmitq
+\item[4] receiveq(port1)
+\item[5] transmitq(port1)
+\item[\ldots]
+\end{description}
+
+The port 0 receive and transmit queues always exist: other queues
+only exist if VIRTIO_CONSOLE_F_MULTIPORT is set.
+
+\subsection{Feature bits}\label{sec:Device Types / Console Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_CONSOLE_F_SIZE (0)] Configuration \field{cols} and \field{rows}
+ are valid.
+
+\item[VIRTIO_CONSOLE_F_MULTIPORT (1)] Device has support for multiple
+ ports; \field{max_nr_ports} is valid and control virtqueues will be used.
+
+\item[VIRTIO_CONSOLE_F_EMERG_WRITE (2)] Device has support for emergency write.
+ Configuration field \field{emerg_wr} is valid.
+\end{description}
+
+\subsection{Device configuration layout}\label{sec:Device Types / Console Device / Device configuration layout}
+
+ The size of the console is supplied
+ in the configuration space if the VIRTIO_CONSOLE_F_SIZE feature
+ is set. Furthermore, if the VIRTIO_CONSOLE_F_MULTIPORT feature
+ is set, the maximum number of ports supported by the device can
+ be fetched.
+
+ If VIRTIO_CONSOLE_F_EMERG_WRITE is set then the driver can use emergency write
+ to output a single character without initializing virtio queues, or even
+ acknowledging the feature.
+
+\begin{lstlisting}
+struct virtio_console_config {
+ le16 cols;
+ le16 rows;
+ le32 max_nr_ports;
+ le32 emerg_wr;
+};
+\end{lstlisting}
+
+\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Console Device / Device configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_console_config
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\subsection{Device Initialization}\label{sec:Device Types / Console Device / Device Initialization}
+
+\begin{enumerate}
+\item If the VIRTIO_CONSOLE_F_EMERG_WRITE feature is offered, the
+ \field{emerg_wr} field of the configuration can be written at any time.
+ Thus it works for very early boot debugging output as well as
+ catastrophic OS failures (e.g. virtio ring corruption).
+
+\item If the VIRTIO_CONSOLE_F_SIZE feature is negotiated, the driver
+ can read the console dimensions from \field{cols} and \field{rows}.
+
+\item If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the
+ driver can spawn multiple ports, not all of which are necessarily
+ attached to a console. Some could be generic ports. In this
+ case, the control virtqueues are enabled and according to
+ \field{max_nr_ports}, the appropriate number
+ of virtqueues are created. A control message indicating the
+ driver is ready is sent to the device. The device can then send
+ control messages for adding new ports to the device. After
+ creating and initializing each port, a
+ VIRTIO_CONSOLE_PORT_READY control message is sent to the device
+ for that port so the device can let the driver know of any additional
+ configuration options set for that port.
+
+\item The receiveq for each port is populated with one or more
+ receive buffers.
+\end{enumerate}
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / Console Device / Device Initialization}
+
+The device MUST allow a write to \field{emerg_wr}, even on an
+unconfigured device.
+
+The device SHOULD transmit the lower byte written to \field{emerg_wr} to
+an appropriate log or output method.
+
+\subsection{Device Operation}\label{sec:Device Types / Console Device / Device Operation}
+
+\begin{enumerate}
+\item For output, a buffer containing the characters is placed in
+ the port's transmitq\footnote{Because this is high importance and low bandwidth, the current
+Linux implementation polls for the buffer to be used, rather than
+waiting for an interrupt, simplifying the implementation
+significantly. However, for generic serial ports with the
+O_NONBLOCK flag set, the polling limitation is relaxed and the
+consumed buffers are freed upon the next write or poll call or
+when a port is closed or hot-unplugged.
+}.
+
+\item When a buffer is used in the receiveq (signalled by an
+ interrupt), its contents are the input to the port associated
+ with the virtqueue for which the notification was received.
+
+\item If the driver negotiated the VIRTIO_CONSOLE_F_SIZE feature, a
+ configuration change interrupt indicates that the updated size can
+ be read from the configuration fields. This size applies to port 0 only.
+
+\item If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT
+ feature, active ports are announced by the device using the
+ VIRTIO_CONSOLE_PORT_ADD control message. The same message is
+ used for port hot-plug as well.
+\end{enumerate}
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Console Device / Device Operation}
+
+The driver MUST NOT put a device-readable buffer in a receiveq. The driver
+MUST NOT put a device-writable buffer in a transmitq.
+
+\subsubsection{Multiport Device Operation}\label{sec:Device Types / Console Device / Device Operation / Multiport Device Operation}
+
+If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT, the two
+control queues are used to manipulate the different console ports: the
+control receiveq for messages from the device to the driver, and the
+control transmitq for driver-to-device messages. The layout of the
+control messages is:
+
+\begin{lstlisting}
+struct virtio_console_control {
+ le32 id; /* Port number */
+ le16 event; /* The kind of control event */
+ le16 value; /* Extra information for the event */
+};
+\end{lstlisting}
+
+The values for \field{event} are:
+\begin{description}
+\item [VIRTIO_CONSOLE_DEVICE_READY (0)] Sent by the driver at initialization
+ to indicate that it is ready to receive control messages. A value of
+ 1 indicates success, and 0 indicates failure. The port number \field{id} is unused.
+\item [VIRTIO_CONSOLE_DEVICE_ADD (1)] Sent by the device, to create a new
+ port. \field{value} is unused.
+\item [VIRTIO_CONSOLE_DEVICE_REMOVE (2)] Sent by the device, to remove an
+ existing port. \field{value} is unused.
+\item [VIRTIO_CONSOLE_PORT_READY (3)] Sent by the driver in response
+ to the device's VIRTIO_CONSOLE_PORT_ADD message, to indicate that
+ the port is ready to be used. A \field{value} of 1 indicates success, and 0
+ indicates failure.
+\item [VIRTIO_CONSOLE_CONSOLE_PORT (4)] Sent by the device to nominate
+ a port as a console port. There MAY be more than one console port.
+\item [VIRTIO_CONSOLE_RESIZE (5)] Sent by the device to indicate
+ a console size change. \field{value} is unused. The buffer is followed by the number of columns and rows:
+\begin{lstlisting}
+struct virtio_console_resize {
+ le16 cols;
+ le16 rows;
+};
+\end{lstlisting}
+\item [VIRTIO_CONSOLE_PORT_OPEN (6)] This message is sent by both the
+ device and the driver. \field{value} indicates the state: 0 (port
+ closed) or 1 (port open). This allows for ports to be used directly
+ by guest and host processes to communicate in an application-defined
+ manner.
+\item [VIRTIO_CONSOLE_PORT_NAME (7)] Sent by the device to give a tag
+ to the port. This control command is immediately
+ followed by the UTF-8 name of the port for identification
+ within the guest (without a NUL terminator).
+\end{description}
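+
+The fixed part of a control message is 8 bytes of little-endian fields,
+optionally followed by a payload (the resize dimensions or the port
+name). As a sketch of how a receiver might decode it, the C fragment
+below uses hypothetical names (struct console_control, get_le16,
+get_le32, console_control_decode) that are not defined by this
+specification.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Host-endian copy of struct virtio_console_control. */
+struct console_control {
+    uint32_t id;    /* Port number */
+    uint16_t event; /* The kind of control event */
+    uint16_t value; /* Extra information for the event */
+};
+
+static uint16_t get_le16(const uint8_t *p)
+{
+    return (uint16_t)(p[0] | (p[1] << 8));
+}
+
+static uint32_t get_le32(const uint8_t *p)
+{
+    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
+           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
+}
+
+/* Decodes the 8-byte header. Returns 0 on success, or -1 if the
+ * buffer is too short. Any payload starts at buf + 8. */
+static int console_control_decode(const uint8_t *buf, size_t len,
+                                  struct console_control *out)
+{
+    if (len < 8)
+        return -1;
+    out->id = get_le32(buf);
+    out->event = get_le16(buf + 4);
+    out->value = get_le16(buf + 6);
+    return 0;
+}
+\end{lstlisting}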
+
+\devicenormative{\paragraph}{Multiport Device Operation}{Device Types / Console Device / Device Operation / Multiport Device Operation}
+
+The device MUST NOT specify, in a VIRTIO_CONSOLE_DEVICE_ADD message,
+a port which already exists, nor a port which is equal to or
+greater than \field{max_nr_ports}.
+
+The device MUST NOT specify a port in VIRTIO_CONSOLE_DEVICE_REMOVE
+which has not been created with a previous VIRTIO_CONSOLE_DEVICE_ADD.
+
+\drivernormative{\paragraph}{Multiport Device Operation}{Device Types / Console Device / Device Operation / Multiport Device Operation}
+
+The driver MUST send a VIRTIO_CONSOLE_DEVICE_READY message if
+VIRTIO_CONSOLE_F_MULTIPORT is negotiated.
+
+Upon receipt of a VIRTIO_CONSOLE_CONSOLE_PORT message, the driver
+SHOULD treat the port in a manner suitable for text console access
+and MUST respond with a VIRTIO_CONSOLE_PORT_OPEN message, which MUST
+have \field{value} set to 1.
+
+\subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Console Device / Device Operation / Legacy Interface: Device Operation}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_console_control
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+When using the legacy interface, the driver SHOULD ignore the
+\field{len} value in used ring entries for the transmit queues
+and the control transmitq.
+\begin{note}
+Historically, some devices put the total descriptor length there,
+even though no data was actually written.
+\end{note}
+
+\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
+Types / Console Device / Legacy Interface: Framing Requirements}
+
+When using legacy interfaces, transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT MUST use only a single
+descriptor for all buffers in the control receiveq and control transmitq.
+
+\section{Entropy Device}\label{sec:Device Types / Entropy Device}
+
+The virtio entropy device supplies high-quality randomness for
+guest use.
+
+\subsection{Device ID}\label{sec:Device Types / Entropy Device / Device ID}
+ 4
+
+\subsection{Virtqueues}\label{sec:Device Types / Entropy Device / Virtqueues}
+\begin{description}
+\item[0] requestq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Entropy Device / Feature bits}
+ None currently defined.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Entropy Device / Device configuration layout}
+ None currently defined.
+
+\subsection{Device Initialization}\label{sec:Device Types / Entropy Device / Device Initialization}
+
+\begin{enumerate}
+\item The virtqueue is initialized
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Entropy Device / Device Operation}
+
+When the driver requires random bytes, it places the descriptor
+of one or more buffers in the queue. The device fills these buffers,
+completely or partially, with random data.
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Entropy Device / Device Operation}
+
+The driver MUST NOT place device-readable buffers into the queue.
+
+The driver MUST examine the length written by the device to determine
+how many random bytes were received.
+
+\devicenormative{\subsubsection}{Device Operation}{Device Types / Entropy Device / Device Operation}
+
+The device MUST place one or more random bytes into the buffer, but it
+MAY use less than the entire buffer length.
+
+\section{Traditional Memory Balloon Device}\label{sec:Device Types / Memory Balloon Device}
+
+This is the traditional balloon device. The device number 13 is
+reserved for a new memory balloon interface, with different
+semantics, which is expected in a future version of the standard.
+
+The traditional virtio memory balloon device is a primitive device for
+managing guest memory: the device asks for a certain amount of
+memory, and the driver supplies it (or withdraws it, if the device
+has more than it asks for). This allows the guest to adapt to
+changes in allowance of underlying physical memory. If the
+feature is negotiated, the device can also be used to communicate
+guest memory statistics to the host.
+
+\subsection{Device ID}\label{sec:Device Types / Memory Balloon Device / Device ID}
+ 5
+
+\subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtqueues}
+\begin{description}
+\item[0] inflateq
+\item[1] deflateq
+\item[2] statsq
+\end{description}
+
+ Virtqueue 2 only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
+
+\subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
+\begin{description}
+\item[VIRTIO_BALLOON_F_MUST_TELL_HOST (0)] Host has to be told before
+ pages from the balloon are used.
+
+\item[VIRTIO_BALLOON_F_STATS_VQ (1)] A virtqueue for reporting guest
+ memory statistics is present.
+\item[VIRTIO_BALLOON_F_DEFLATE_ON_OOM (2)] Deflate balloon on
+ guest out of memory condition.
+
+\end{description}
+
+\drivernormative{\subsubsection}{Feature bits}{Device Types / Memory Balloon Device / Feature bits}
+The driver SHOULD accept the VIRTIO_BALLOON_F_MUST_TELL_HOST
+feature if offered by the device.
+
+\devicenormative{\subsubsection}{Feature bits}{Device Types / Memory Balloon Device / Feature bits}
+If the device offers the VIRTIO_BALLOON_F_MUST_TELL_HOST feature
+bit, and if the driver did not accept this feature bit, the
+device MAY signal failure by failing to set the FEATURES_OK
+\field{device status} bit when the driver writes it.
+\subparagraph{Legacy Interface: Feature bits}\label{sec:Device
+Types / Memory Balloon Device / Feature bits / Legacy Interface:
+Feature bits}
+As the legacy interface does not have a way to gracefully report feature
+negotiation failure, when using the legacy interface,
+transitional devices MUST support guests which do not negotiate the
+VIRTIO_BALLOON_F_MUST_TELL_HOST feature, and SHOULD
+allow the guest to use memory before notifying the host if
+VIRTIO_BALLOON_F_MUST_TELL_HOST is not negotiated.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Memory Balloon Device / Device configuration layout}
+ Both fields of this configuration
+ are always available.
+
+\begin{lstlisting}
+struct virtio_balloon_config {
+ le32 num_pages;
+ le32 actual;
+};
+\end{lstlisting}
+
+\subparagraph{Legacy Interface: Device configuration layout}\label{sec:Device Types / Memory Balloon Device / Device
+configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_balloon_config
+according to the little-endian format.
+\begin{note}
+This is unlike the usual convention that legacy device fields are guest endian.
+\end{note}
+
+\subsection{Device Initialization}\label{sec:Device Types / Memory Balloon Device / Device Initialization}
+
+The device initialization process is outlined below:
+
+\begin{enumerate}
+\item The inflate and deflate virtqueues are identified.
+
+\item If the VIRTIO_BALLOON_F_STATS_VQ feature bit is negotiated:
+ \begin{enumerate}
+ \item Identify the stats virtqueue.
+ \item Add one empty buffer to the stats virtqueue.
+ \item DRIVER_OK is set: device operation begins.
+ \item Notify the device about the stats virtqueue buffer.
+ \end{enumerate}
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Memory Balloon Device / Device Operation}
+
+The device is driven either by the receipt of a configuration
+change interrupt, or by changing guest memory needs, such as
+performing memory compaction or responding to out of memory
+conditions.
+
+\begin{enumerate}
+\item The \field{num_pages} configuration field is examined. If this is
+ greater than the \field{actual} number of pages, the balloon wants
+ more memory from the guest. If it is less than \field{actual},
+ the balloon doesn't need it all.
+
+\item To supply memory to the balloon (aka. inflate):
+ \begin{enumerate}
+ \item The driver constructs an array of addresses of unused memory
+ pages. These addresses are divided by 4096\footnote{This is historical, and independent of the guest page size.
+} and the descriptor
+ describing the resulting 32-bit array is added to the inflateq.
+ \end{enumerate}
+
+\item To remove memory from the balloon (aka. deflate):
+ \begin{enumerate}
+ \item The driver constructs an array of addresses of memory pages
+ it has previously given to the balloon, as described above.
+ This descriptor is added to the deflateq.
+
+ \item If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
+ guest informs the device of pages before it uses them.
+
+ \item Otherwise, the guest is allowed to re-use pages previously
+ given to the balloon before the device has acknowledged their
+ withdrawal\footnote{In this case, deflation advice is merely a courtesy.
+}.
+ \end{enumerate}
+
+\item In either case, the device acknowledges inflate and deflate
+requests by using the descriptor.
+\item Once the device has acknowledged the inflation or
+ deflation, the driver updates \field{actual} to reflect the new number of pages in the balloon.
+\end{enumerate}
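+
+The address conversion and the inflate/deflate decision described above
+can be sketched in C as follows. The names BALLOON_PAGE_SHIFT,
+balloon_fill_pfns and balloon_delta are hypothetical; the sketch
+assumes the caller converts each 32-bit entry to little-endian before
+placing the array in the virtqueue.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Addresses are divided by 4096, independent of the guest page size. */
+#define BALLOON_PAGE_SHIFT 12
+
+/* Converts guest-physical page addresses into the 32-bit entries
+ * used for the inflateq and deflateq. Stops at the first address
+ * whose page number does not fit in 32 bits and returns the number
+ * of entries written. */
+static size_t balloon_fill_pfns(const uint64_t *addrs, size_t n,
+                                uint32_t *pfns)
+{
+    size_t i;
+
+    for (i = 0; i < n; i++) {
+        uint64_t pfn = addrs[i] >> BALLOON_PAGE_SHIFT;
+
+        if (pfn > UINT32_MAX)
+            break;
+        pfns[i] = (uint32_t)pfn;
+    }
+    return i;
+}
+
+/* Positive: the balloon wants that many more pages (inflate).
+ * Negative: the balloon holds more pages than requested (deflate). */
+static int64_t balloon_delta(uint32_t num_pages, uint32_t actual)
+{
+    return (int64_t)num_pages - (int64_t)actual;
+}
+\end{lstlisting}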
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Memory Balloon Device / Device Operation}
+The driver SHOULD supply pages to the balloon when \field{num_pages} is
+greater than the actual number of pages in the balloon.
+
+The driver MAY use pages from the balloon when \field{num_pages} is
+less than the actual number of pages in the balloon.
+
+The driver MAY supply pages to the balloon when \field{num_pages} is
+greater than or equal to the actual number of pages in the balloon.
+
+If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has not been negotiated, the
+driver MUST NOT use pages from the balloon when \field{num_pages}
+is less than or equal to the actual number of pages in the
+balloon.
+
+If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the
+driver MAY use pages from the balloon when \field{num_pages}
+is less than or equal to the actual number of pages in the
+balloon if this is required for system stability
+(e.g. if memory is required by applications running within
+ the guest).
+
+The driver MUST use the deflateq to inform the device of pages that it
+wants to use from the balloon.
+
+If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
+driver MUST NOT use pages from the balloon until
+the device has acknowledged the deflate request.
+
+Otherwise, if the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is not
+negotiated, the driver MAY begin to re-use pages previously
+given to the balloon before the device has acknowledged the
+deflate request.
+
+In any case, the driver MUST NOT use pages from the balloon
+after adding the pages to the balloon, but before the device has
+acknowledged the inflate request.
+
+The driver MUST NOT request deflation of pages in
+the balloon before the device has acknowledged the inflate
+request.
+
+The driver MUST update \field{actual} after changing the number
+of pages in the balloon.
+
+The driver MAY update \field{actual} once after multiple
+inflate and deflate operations.
+
+\devicenormative{\subsubsection}{Device Operation}{Device Types / Memory Balloon Device / Device Operation}
+
+The device MAY modify the contents of a page in the balloon
+after detecting its physical number in an inflate request
+and before acknowledging the inflate request by using the inflateq
+descriptor.
+
+If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
+device MAY modify the contents of a page in the balloon
+after detecting its physical number in an inflate request
+and before detecting its physical number in a deflate request
+and acknowledging the deflate request.
+
+\paragraph{Legacy Interface: Device Operation}\label{sec:Device
+Types / Memory Balloon Device / Device Operation / Legacy
+Interface: Device Operation}
+When using the legacy interface, the driver SHOULD ignore the \field{len} value in used ring entries.
+\begin{note}
+Historically, some devices put the total descriptor length there,
+even though no data was actually written.
+\end{note}
+When using the legacy interface, the driver MUST write out all
+4 bytes each time it updates the \field{actual} value in the
+configuration space, using a single atomic operation.
+
+When using the legacy interface, the device SHOULD NOT use the
+\field{actual} value written by the driver in the configuration
+space, until the last, most-significant byte of the value has been
+written.
+\begin{note}
+Historically, devices used the \field{actual} value, even though
+when using Virtio Over PCI Bus the device-specific configuration
+space was not guaranteed to be atomic. Using intermediate
+values during update by driver is best avoided, except for
+debugging.
+
+Historically, drivers using Virtio Over PCI Bus wrote the
+\field{actual} value by using multiple single-byte writes in
+order, from the least-significant to the most-significant value.
+\end{note}
+\subsubsection{Memory Statistics}\label{sec:Device Types / Memory Balloon Device / Device Operation / Memory Statistics}
+
+The stats virtqueue is atypical because communication is driven
+by the device (not the driver). The channel becomes active at
+driver initialization time when the driver adds an empty buffer
+and notifies the device. A request for memory statistics proceeds
+as follows:
+
+\begin{enumerate}
+\item The device pushes the buffer onto the used ring and sends an
+ interrupt.
+
+\item The driver pops the used buffer and discards it.
+
+\item The driver collects memory statistics and writes them into a
+ new buffer.
+
+\item The driver adds the buffer to the virtqueue and notifies the
+ device.
+
+\item The device pops the buffer (retaining it to initiate a
+ subsequent request) and consumes the statistics.
+\end{enumerate}
+
+ Within the buffer, statistics are an array of 10-byte entries.
+ Each statistic consists of a 16 bit
+ tag and a 64 bit value. All statistics are optional and the
+ driver chooses which ones to supply. To guarantee backwards
+ compatibility, devices omit unsupported statistics.
+
+\begin{lstlisting}
+struct virtio_balloon_stat {
+#define VIRTIO_BALLOON_S_SWAP_IN 0
+#define VIRTIO_BALLOON_S_SWAP_OUT 1
+#define VIRTIO_BALLOON_S_MAJFLT 2
+#define VIRTIO_BALLOON_S_MINFLT 3
+#define VIRTIO_BALLOON_S_MEMFREE 4
+#define VIRTIO_BALLOON_S_MEMTOT 5
+ le16 tag;
+ le64 val;
+} __attribute__((packed));
+\end{lstlisting}
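+
+As a minimal sketch of how a driver might fill the statistics buffer,
+the helper below appends one entry at a time. It assumes the packed
+layout shown above (a 2-byte \field{tag} followed by an 8-byte
+\field{val}, 10 bytes in total) and the non-legacy little-endian
+format. The names balloon_stat_put and BALLOON_STAT_SIZE are
+hypothetical and do not appear in this specification.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BALLOON_S_MEMFREE 4
#define VIRTIO_BALLOON_S_MEMTOT  5

/* Hypothetical constant: le16 tag + le64 val, packed. */
#define BALLOON_STAT_SIZE 10

/*
 * Hypothetical helper: write one statistic at byte offset `off`,
 * storing both fields little-endian regardless of host byte order.
 * Returns the offset just past the new entry.
 */
static size_t balloon_stat_put(uint8_t *buf, size_t cap, size_t off,
                               uint16_t tag, uint64_t val)
{
    assert(off + BALLOON_STAT_SIZE <= cap);

    buf[off + 0] = (uint8_t)(tag & 0xff);
    buf[off + 1] = (uint8_t)(tag >> 8);
    for (int i = 0; i < 8; i++)
        buf[off + 2 + i] = (uint8_t)(val >> (8 * i));

    return off + BALLOON_STAT_SIZE;
}
```
+
+A driver supplying only the free and total memory statistics would call
+the helper twice and submit a 20-byte buffer to the statsq.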
+
+\drivernormative{\paragraph}{Memory Statistics}{Device Types / Memory Balloon Device / Device Operation / Memory Statistics}
+Normative statements in this section apply if and only if the
+VIRTIO_BALLOON_F_STATS_VQ feature has been negotiated.
+
+The driver MUST make at most one buffer available to the device
+in the statsq, at all times.
+
+After initializing the device, the driver MUST make an output
+buffer available in the statsq.
+
+Upon detecting that the device has used a buffer in the statsq, the
+driver MUST make an output buffer available in the statsq.
+
+Before making an output buffer available in the statsq, the
+driver MUST initialize it, including one struct
+virtio_balloon_stat entry for each statistic that it supports.
+
+The driver MUST use an output buffer size which is a multiple of 10
+bytes for all buffers submitted to the statsq.
+
+The driver MAY supply struct virtio_balloon_stat entries in the
+output buffer submitted to the statsq in any order, without
+regard to \field{tag} values.
+
+The driver MAY supply a subset of all statistics in the output buffer
+submitted to the statsq.
+
+The driver MUST supply the same subset of statistics in all buffers
+submitted to the statsq.
+
+\devicenormative{\paragraph}{Memory Statistics}{Device Types / Memory Balloon Device / Device Operation / Memory Statistics}
+Normative statements in this section apply if and only if the
+VIRTIO_BALLOON_F_STATS_VQ feature has been negotiated.
+
+Within an output buffer submitted to the statsq,
+the device MUST ignore entries with \field{tag} values that it does not recognize.
+
+Within an output buffer submitted to the statsq,
+the device MUST accept struct virtio_balloon_stat entries in any
+order without regard to \field{tag} values.
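+
+The two requirements above could be met by a device-side parser along
+the following lines. This is only a sketch: it assumes packed 10-byte
+little-endian entries, treats any tag outside the six values defined in
+this section as unrecognized, and uses the hypothetical names
+balloon_stats_parse and VIRTIO_BALLOON_S_NR.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of tags defined in this section (0..5). */
#define VIRTIO_BALLOON_S_NR 6

/*
 * Hypothetical helper: walk a statsq buffer entry by entry.
 * Entries may arrive in any order; unknown tags are skipped.
 * Returns the number of recognized entries stored in `vals`.
 */
static size_t balloon_stats_parse(const uint8_t *buf, size_t len,
                                  uint64_t vals[VIRTIO_BALLOON_S_NR],
                                  uint8_t seen[VIRTIO_BALLOON_S_NR])
{
    size_t n = 0;

    for (size_t off = 0; off + 10 <= len; off += 10) {
        uint16_t tag = (uint16_t)(buf[off] | (buf[off + 1] << 8));
        uint64_t val = 0;

        /* Assemble the little-endian 64-bit value. */
        for (int i = 7; i >= 0; i--)
            val = (val << 8) | buf[off + 2 + i];

        if (tag >= VIRTIO_BALLOON_S_NR)
            continue; /* unrecognized tag: ignore the entry */

        vals[tag] = val;
        seen[tag] = 1;
        n++;
    }
    return n;
}
```
+
+Indexing the result by tag, rather than by position in the buffer, is
+what makes the parser independent of entry order.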
+
+\paragraph{Legacy Interface: Memory Statistics}\label{sec:Device Types / Memory Balloon Device / Device Operation / Memory Statistics / Legacy Interface: Memory Statistics}
+
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_balloon_stat
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+When using the legacy interface,
+the device SHOULD ignore all values in the first buffer in the
+statsq supplied by the driver after device initialization.
+\begin{note}
+Historically, drivers supplied an uninitialized buffer in the
+first buffer.
+\end{note}
+
+\subsubsection{Memory Statistics Tags}\label{sec:Device Types / Memory Balloon Device / Device Operation / Memory Statistics Tags}
+
+\begin{description}
+\item[VIRTIO_BALLOON_S_SWAP_IN (0)] The amount of memory that has been
+ swapped in (in bytes).
+
+\item[VIRTIO_BALLOON_S_SWAP_OUT (1)] The amount of memory that has been
+ swapped out to disk (in bytes).
+
+\item[VIRTIO_BALLOON_S_MAJFLT (2)] The number of major page faults that
+ have occurred.
+
+\item[VIRTIO_BALLOON_S_MINFLT (3)] The number of minor page faults that
+ have occurred.
+
+\item[VIRTIO_BALLOON_S_MEMFREE (4)] The amount of memory not being used
+ for any purpose (in bytes).
+
+\item[VIRTIO_BALLOON_S_MEMTOT (5)] The total amount of memory available
+ (in bytes).
+\end{description}
+
+\section{SCSI Host Device}\label{sec:Device Types / SCSI Host Device}
+
+The virtio SCSI host device groups together one or more virtual
+logical units (such as disks), and allows communicating to them
+using the SCSI protocol. An instance of the device represents a
+SCSI host to which many targets and LUNs are attached.
+
+The virtio SCSI device services two kinds of requests:
+\begin{itemize}
+\item command requests for a logical unit;
+
+\item task management functions related to a logical unit, target or
+ command.
+\end{itemize}
+
+The device is also able to send out notifications about added and
+removed logical units. Together, these capabilities provide a
+SCSI transport protocol that uses virtqueues as the transfer
+medium. In the transport protocol, the virtio driver acts as the
+initiator, while the virtio SCSI host provides one or more
+targets that receive and process the requests.
+
+This section relies on definitions from \hyperref[intro:SAM]{SAM}.
+
+\subsection{Device ID}\label{sec:Device Types / SCSI Host Device / Device ID}
+ 8
+
+\subsection{Virtqueues}\label{sec:Device Types / SCSI Host Device / Virtqueues}
+
+\begin{description}
+\item[0] controlq
+\item[1] eventq
+\item[2\ldots n] request queues
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / SCSI Host Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_SCSI_F_INOUT (0)] A single request can include both
+ device-readable and device-writable data buffers.
+
+\item[VIRTIO_SCSI_F_HOTPLUG (1)] The host SHOULD enable reporting of
+ hot-plug and hot-unplug events for LUNs and targets on the SCSI bus.
+ The guest SHOULD handle hot-plug and hot-unplug events.
+
+\item[VIRTIO_SCSI_F_CHANGE (2)] The host will report changes to LUN
+ parameters via a VIRTIO_SCSI_T_PARAM_CHANGE event; the guest
+ SHOULD handle them.
+
+\item[VIRTIO_SCSI_F_T10_PI (3)] The extended fields for T10 protection
+ information (DIF/DIX) are included in the SCSI request header.
+\end{description}
+
+\subsection{Device configuration layout}\label{sec:Device Types / SCSI Host Device / Device configuration layout}
+
+ All fields of this configuration are always available.
+
+\begin{lstlisting}
+struct virtio_scsi_config {
+ le32 num_queues;
+ le32 seg_max;
+ le32 max_sectors;
+ le32 cmd_per_lun;
+ le32 event_info_size;
+ le32 sense_size;
+ le32 cdb_size;
+ le16 max_channel;
+ le16 max_target;
+ le32 max_lun;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{num_queues}] is the total number of request virtqueues exposed by
+ the device. The driver MAY use only one request queue,
+ or it can use more to achieve better performance.
+
+\item[\field{seg_max}] is the maximum number of segments that can be in a
+ command. A bidirectional command can include \field{seg_max} input
+ segments and \field{seg_max} output segments.
+
+\item[\field{max_sectors}] is a hint to the driver about the maximum transfer
+ size to use.
+
+\item[\field{cmd_per_lun}] tells the driver the maximum number of
+ linked commands it can send to one LUN.
+
+\item[\field{event_info_size}] is the maximum size that the device will fill
+ for buffers that the driver places in the eventq. It is
+ written by the device depending on the set of negotiated
+ features.
+
+\item[\field{sense_size}] is the maximum size of the sense data that the
+ device will write. The default value is written by the device
+ and MUST be 96, but the driver can modify it. It is
+ restored to the default when the device is reset.
+
+\item[\field{cdb_size}] is the maximum size of the CDB that the driver will
+ write. The default value is written by the device and MUST
+ be 32, but the driver can likewise modify it. It is
+ restored to the default when the device is reset.
+
+\item[\field{max_channel}, \field{max_target} and \field{max_lun}] can be
+ used by the driver as hints to constrain scanning the logical units
+ on the host to channel/target/logical unit numbers that are less than
+ or equal to the value of the fields. \field{max_channel} SHOULD
+ be zero. \field{max_target} SHOULD be less than or equal to 255.
+ \field{max_lun} SHOULD be less than or equal to 16383.
+\end{description}
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / SCSI Host Device / Device configuration layout}
+
+The driver MUST NOT write to device configuration fields other than
+\field{sense_size} and \field{cdb_size}.
+
+The driver MUST NOT send more than \field{cmd_per_lun} linked commands
+to one LUN, and MUST NOT send more than the virtqueue size number of
+linked commands to one LUN.
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / SCSI Host Device / Device configuration layout}
+
+On reset, the device MUST set \field{sense_size} to 96 and
+\field{cdb_size} to 32.
+
+\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / SCSI Host Device / Device configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_scsi_config
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\devicenormative{\subsection}{Device Initialization}{Device Types / SCSI Host Device / Device Initialization}
+
+On initialization the driver SHOULD first discover the
+device's virtqueues.
+
+If the driver uses the eventq, the driver SHOULD place at least one
+buffer in the eventq.
+
+The driver MAY immediately issue requests\footnote{For example, INQUIRY
+or REPORT LUNS.} or task management functions\footnote{For example, I_T
+RESET.}.
+
+\subsection{Device Operation}\label{sec:Device Types / SCSI Host Device / Device Operation}
+
+Device operation consists of operating request queues, the control
+queue and the event queue.
+
+\paragraph{Legacy Interface: Device Operation}\label{sec:Device
+Types / SCSI Host Device / Device Operation / Legacy
+Interface: Device Operation}
+When using the legacy interface, the driver SHOULD ignore the \field{len} value in used ring entries.
+\begin{note}
+Historically, devices put the total descriptor length,
+or the total length of device-writable buffers there,
+even when only part of the buffers were actually written.
+\end{note}
+
+\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: Request Queues}
+
+The driver queues requests to an arbitrary request queue, and
+they are used by the device on that same queue. It is the
+responsibility of the driver to ensure strict request ordering
+for commands placed on different queues, because they will be
+consumed with no order constraints.
+
+Requests have the following format:
+
+\begin{lstlisting}
+struct virtio_scsi_req_cmd {
+ // Device-readable part
+ u8 lun[8];
+ le64 id;
+ u8 task_attr;
+ u8 prio;
+ u8 crn;
+ u8 cdb[cdb_size];
+ // The next three fields are only present if VIRTIO_SCSI_F_T10_PI
+ // is negotiated.
+ le32 pi_bytesout;
+ le32 pi_bytesin;
+ u8 pi_out[pi_bytesout];
+ u8 dataout[];
+
+ // Device-writable part
+ le32 sense_len;
+ le32 residual;
+ le16 status_qualifier;
+ u8 status;
+ u8 response;
+ u8 sense[sense_size];
+ // The next field is only present if VIRTIO_SCSI_F_T10_PI
+ // is negotiated
+ u8 pi_in[pi_bytesin];
+ u8 datain[];
+};
+
+
+/* command-specific response values */
+#define VIRTIO_SCSI_S_OK 0
+#define VIRTIO_SCSI_S_OVERRUN 1
+#define VIRTIO_SCSI_S_ABORTED 2
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+#define VIRTIO_SCSI_S_RESET 4
+#define VIRTIO_SCSI_S_BUSY 5
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+#define VIRTIO_SCSI_S_TARGET_FAILURE 7
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
+#define VIRTIO_SCSI_S_FAILURE 9
+
+/* task_attr */
+#define VIRTIO_SCSI_S_SIMPLE 0
+#define VIRTIO_SCSI_S_ORDERED 1
+#define VIRTIO_SCSI_S_HEAD 2
+#define VIRTIO_SCSI_S_ACA 3
+\end{lstlisting}
+
+\field{lun} addresses the REPORT LUNS well-known logical unit, or
+a target and logical unit in the virtio-scsi device's SCSI domain.
+When used to address the REPORT LUNS logical unit, \field{lun} is 0xC1,
+0x01 and six zero bytes. The virtio-scsi device SHOULD implement the
+REPORT LUNS well-known logical unit.
+
+When used to address a target and logical unit, the only supported format
+for \field{lun} is: first byte set to 1, second byte set to target,
+third and fourth byte representing a single level LUN structure, followed
+by four zero bytes. With this representation, a virtio-scsi device can
+serve up to 256 targets and 16384 LUNs per target. The device MAY also
+support having well-known logical units in the third and fourth byte.
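+
+The two \field{lun} formats can be sketched as follows. The function
+names are hypothetical. The encoding of the third byte is an assumption
+not stated in this section: it ORs in 0x40, the flat space addressing
+method of \hyperref[intro:SAM]{SAM}, which is consistent with the limit
+of 16384 LUNs per target.
+
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: address a target and logical unit.
 * Byte 0 is 1, byte 1 is the target, bytes 2-3 hold a single level
 * LUN, and the last four bytes are zero.
 */
static void virtio_scsi_encode_lun(uint8_t out[8], uint8_t target,
                                   uint16_t lun)
{
    assert(lun < 16384);

    memset(out, 0, 8);
    out[0] = 1;
    out[1] = target;
    /* Assumption: 0x40 selects SAM flat space addressing. */
    out[2] = (uint8_t)((lun >> 8) | 0x40);
    out[3] = (uint8_t)(lun & 0xff);
}

/* Hypothetical helper: address the REPORT LUNS well-known LUN. */
static void virtio_scsi_report_luns_wlun(uint8_t out[8])
{
    memset(out, 0, 8);
    out[0] = 0xC1;
    out[1] = 0x01;
}
```
+
+For example, target 3 and LUN 0x123 would be encoded as the bytes
+01 03 41 23 00 00 00 00 under these assumptions.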
+
+\field{id} is the command identifier (``tag'').
+
+\field{task_attr} defines the task attribute as in the listing above, but
+all task attributes MAY be mapped to SIMPLE by the device. Some commands
+are defined by SCSI standards as ``implicit head of queue''; for such
+commands, all task attributes MAY also be mapped to HEAD OF QUEUE.
+Drivers and applications SHOULD NOT send a command with the ORDERED
+task attribute if the command has an implicit HEAD OF QUEUE attribute,
+because whether the ORDERED task attribute is honored is vendor-specific.
+
+\field{crn} may also be provided by clients, but is generally expected
+to be 0. The maximum CRN value defined by the protocol is 255, since
+CRN is stored in an 8-bit integer.
+
+The CDB is included in \field{cdb} and its size, \field{cdb_size},
+is taken from the configuration space.
+
+All of these fields are defined in \hyperref[intro:SAM]{SAM} and are
+always device-readable.
+
+\field{pi_bytesout} determines the size of the \field{pi_out} field
+in bytes. If it is nonzero, the \field{pi_out} field contains outgoing
+protection information for write operations. \field{pi_bytesin} determines
+the size of the \field{pi_in} field in the device-writable section, in bytes.
+All three fields are only present if VIRTIO_SCSI_F_T10_PI has been negotiated.
+
+The remainder of the device-readable part is the data output buffer,
+\field{dataout}.
+
+\field{sense_len} and subsequent fields are always device-writable. \field{sense_len}
+indicates the number of bytes actually written to the sense
+buffer.
+
+\field{residual} indicates the residual size,
+calculated as ``data_length - number_of_transferred_bytes'', for
+read or write operations. For bidirectional commands, the
+number_of_transferred_bytes includes both read and written bytes.
+A \field{residual} that is less than the size of \field{datain} means that
+\field{dataout} was processed entirely. A \field{residual} that
+exceeds the size of \field{datain} means that \field{dataout} was
+processed partially and \field{datain} was not processed at
+all.
+
+If the \field{pi_bytesin} is nonzero, the \field{pi_in} field contains
+incoming protection information for read operations. \field{pi_in} is
+only present if VIRTIO_SCSI_F_T10_PI has been negotiated\footnote{There
+ is no separate residual size for \field{pi_bytesout} and
+ \field{pi_bytesin}. It can be computed from the \field{residual} field,
+ the size of the data integrity information per sector, and the sizes
+ of \field{pi_out}, \field{pi_in}, \field{dataout} and \field{datain}.}.
+
+The remainder of the device-writable part is the data input buffer,
+\field{datain}.
+
+
+\devicenormative{\paragraph}{Device Operation: Request Queues}{Device Types / SCSI Host Device / Device Operation / Device Operation: Request Queues}
+
+The device MUST write the \field{status} byte as the status code as
+defined in \hyperref[intro:SAM]{SAM}.
+
+The device MUST write the \field{response} byte as one of the following:
+
+\begin{description}
+
+\item[VIRTIO_SCSI_S_OK] when the request was completed and the \field{status}
+ byte is filled with a SCSI status code (not necessarily
+ ``GOOD'').
+
+\item[VIRTIO_SCSI_S_OVERRUN] if the content of the CDB (such as the
+ allocation length, parameter length or transfer size) requires
+ more data than is available in the datain and dataout buffers.
+
+\item[VIRTIO_SCSI_S_ABORTED] if the request was cancelled due to an
+ ABORT TASK or ABORT TASK SET task management function.
+
+\item[VIRTIO_SCSI_S_BAD_TARGET] if the request was never processed
+ because the target indicated by \field{lun} does not exist.
+
+\item[VIRTIO_SCSI_S_RESET] if the request was cancelled due to a bus
+ or device reset (including a task management function).
+
+\item[VIRTIO_SCSI_S_TRANSPORT_FAILURE] if the request failed due to a
+ problem in the connection between the host and the target
+ (severed link).
+
+\item[VIRTIO_SCSI_S_TARGET_FAILURE] if the target is suffering a
+ failure; this tells the driver not to retry on other paths.
+
+\item[VIRTIO_SCSI_S_NEXUS_FAILURE] if the nexus is suffering a failure
+ but retrying on other paths might yield a different result.
+
+\item[VIRTIO_SCSI_S_BUSY] if the request failed but retrying on the
+ same path is likely to work.
+
+\item[VIRTIO_SCSI_S_FAILURE] for other host or driver error. In
+ particular, if neither \field{dataout} nor \field{datain} is empty, and the
+ VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
+ request will be immediately returned with a response equal to
+ VIRTIO_SCSI_S_FAILURE.
+\end{description}
+
+All commands must be completed before the virtio-scsi device is
+reset or unplugged. The device MAY choose to abort them, or if
+it does not do so MUST pick the VIRTIO_SCSI_S_FAILURE response.
+
+\drivernormative{\paragraph}{Device Operation: Request Queues}{Device Types / SCSI Host Device / Device Operation / Device Operation: Request Queues}
+
+\field{task_attr}, \field{prio} and \field{crn} SHOULD be zero.
+
+Upon receiving a VIRTIO_SCSI_S_TARGET_FAILURE response, the driver
+SHOULD NOT retry the request on other paths.
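+
+One way a driver might turn the \field{response} byte into a retry
+decision is sketched below. The enum and function names are
+hypothetical. Retrying on another path after
+VIRTIO_SCSI_S_TRANSPORT_FAILURE, and giving up on every response not
+listed, are policy assumptions rather than requirements of this
+specification.
+
```c
#include <stdint.h>

/* Response values taken from the listing in this section. */
#define VIRTIO_SCSI_S_OK                0
#define VIRTIO_SCSI_S_BUSY              5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
#define VIRTIO_SCSI_S_TARGET_FAILURE    7
#define VIRTIO_SCSI_S_NEXUS_FAILURE     8

/* Hypothetical driver-side actions. */
enum req_action {
    REQ_CHECK_STATUS,     /* completed: inspect the SCSI status byte */
    REQ_RETRY_SAME_PATH,
    REQ_RETRY_OTHER_PATH,
    REQ_GIVE_UP
};

static enum req_action classify_response(uint8_t response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:
        return REQ_CHECK_STATUS;
    case VIRTIO_SCSI_S_BUSY:
        return REQ_RETRY_SAME_PATH;
    case VIRTIO_SCSI_S_NEXUS_FAILURE:
    case VIRTIO_SCSI_S_TRANSPORT_FAILURE: /* assumption, see above */
        return REQ_RETRY_OTHER_PATH;
    case VIRTIO_SCSI_S_TARGET_FAILURE:    /* SHOULD NOT retry elsewhere */
    default:
        return REQ_GIVE_UP;
    }
}
```
+
+Note that VIRTIO_SCSI_S_OK only means the request completed; the
+\field{status} byte still has to be checked for a SCSI-level error.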
+
+\paragraph{Legacy Interface: Device Operation: Request Queues}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: Request Queues / Legacy Interface: Device Operation: Request Queues}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_scsi_req_cmd
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\subsubsection{Device Operation: controlq}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: controlq}
+
+The controlq is used for other SCSI transport operations.
+Requests have the following format:
+
+{
+\lstset{escapechar=\$}
+\begin{lstlisting}
+struct virtio_scsi_ctrl {
+ le32 type;
+$\ldots$
+ u8 response;
+};
+
+/* response values valid for all commands */
+#define VIRTIO_SCSI_S_OK 0
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+#define VIRTIO_SCSI_S_BUSY 5
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+#define VIRTIO_SCSI_S_TARGET_FAILURE 7
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
+#define VIRTIO_SCSI_S_FAILURE 9
+#define VIRTIO_SCSI_S_INCORRECT_LUN 12
+\end{lstlisting}
+}
+
+The \field{type} identifies the remaining fields.
+
+The following commands are defined:
+
+\begin{itemize}
+\item Task management function.
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_TMF 0
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
+#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
+#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
+#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
+#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
+
+struct virtio_scsi_ctrl_tmf
+{
+ // Device-readable part
+ le32 type;
+ le32 subtype;
+ u8 lun[8];
+ le64 id;
+ // Device-writable part
+ u8 response;
+};
+
+/* command-specific response values */
+#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
+#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
+#define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
+\end{lstlisting}
+
+ The \field{type} is VIRTIO_SCSI_T_TMF; \field{subtype} defines which
+ task management function. All
+ fields except \field{response} are filled by the driver.
+
+ Other fields which are irrelevant for the requested TMF
+ are ignored but they are still present. \field{lun}
+ is in the same format specified for request queues; the
+ single level LUN is ignored when the task management function
+ addresses a whole I_T nexus. When relevant, the value of \field{id}
+ is matched against the id values passed on the requestq.
+
+ The outcome of the task management function is written by the
+ device in \field{response}. The command-specific response
+ values map 1-to-1 with those defined in \hyperref[intro:SAM]{SAM}.
+
+ Task management functions can affect the response value for commands that
+ are in the request queue and have not been completed yet. For example,
+ the device MUST complete all active commands on a logical unit
+ or target (possibly with a VIRTIO_SCSI_S_RESET response code)
+ upon receiving a ``logical unit reset'' or ``I_T nexus reset'' TMF.
+ Similarly, the device MUST complete the selected commands (possibly
+ with a VIRTIO_SCSI_S_ABORTED response code) upon receiving an ``abort
+ task'' or ``abort task set'' TMF. Such effects MUST take place before
+ the TMF itself is successfully completed, and the device MUST use
+ memory barriers appropriately in order to ensure that the driver sees
+ these writes in the correct order.
+
+\item Asynchronous notification query.
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_AN_QUERY 1
+
+struct virtio_scsi_ctrl_an {
+ // Device-readable part
+ le32 type;
+ u8 lun[8];
+ le32 event_requested;
+ // Device-writable part
+ le32 event_actual;
+ u8 response;
+};
+
+#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
+#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
+#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
+#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
+#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
+#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
+\end{lstlisting}
+
+ By sending this command, the driver asks the device which
+ events the given LUN can report, as described in paragraphs 6.6
+ and A.6 of \hyperref[intro:SCSI MMC]{SCSI MMC}. The driver writes the
+ events it is interested in into \field{event_requested}; the device
+ responds by writing the events that it supports into
+ \field{event_actual}.
+
+ The \field{type} is VIRTIO_SCSI_T_AN_QUERY. \field{lun} and \field{event_requested}
+ are written by the driver. \field{event_actual} and \field{response}
+ fields are written by the device.
+
+ No command-specific values are defined for the \field{response} byte.
+
+\item Asynchronous notification subscription.
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
+
+struct virtio_scsi_ctrl_an {
+ // Device-readable part
+ le32 type;
+ u8 lun[8];
+ le32 event_requested;
+ // Device-writable part
+ le32 event_actual;
+ u8 response;
+};
+\end{lstlisting}
+
+ By sending this command, the driver asks the specified LUN to
+ report events for its physical interface, again as described in
+ \hyperref[intro:SCSI MMC]{SCSI MMC}. The driver writes the events it is
+ interested in into \field{event_requested}; the device responds by
+ writing the events that it supports into \field{event_actual}.
+
+ Event types are the same as for the asynchronous notification
+ query message.
+
+ The \field{type} is VIRTIO_SCSI_T_AN_SUBSCRIBE. \field{lun} and
+ \field{event_requested} are written by the driver.
+ \field{event_actual} and \field{response} are written by the device.
+
+ No command-specific values are defined for the response byte.
+\end{itemize}
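+
+As an illustration of the task management request layout, the sketch
+below serializes the device-readable part of struct
+virtio_scsi_ctrl_tmf into a byte buffer in the non-legacy
+little-endian format. The names tmf_build, put_le32, put_le64 and
+TMF_REQ_LEN are hypothetical. The sketch assumes the fields are laid
+out back to back without padding; the one-byte \field{response} is
+expected in a separate device-writable buffer.
+
```c
#include <stdint.h>
#include <string.h>

#define VIRTIO_SCSI_T_TMF                    0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK         0
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5

/* Hypothetical: type (4) + subtype (4) + lun (8) + id (8) bytes. */
#define TMF_REQ_LEN 24

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Hypothetical helper: fill the device-readable part of a TMF request.
 * `id` is only meaningful for functions that address a single command,
 * such as ABORT TASK; it is still present otherwise.
 */
static void tmf_build(uint8_t req[TMF_REQ_LEN], uint32_t subtype,
                      const uint8_t lun[8], uint64_t id)
{
    put_le32(req + 0, VIRTIO_SCSI_T_TMF);
    put_le32(req + 4, subtype);
    memcpy(req + 8, lun, 8);
    put_le64(req + 16, id);
}
```
+
+Writing the fields byte by byte keeps the sketch independent of the
+host byte order and of compiler-specific structure packing.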
+
+\paragraph{Legacy Interface: Device Operation: controlq}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: controlq / Legacy Interface: Device Operation: controlq}
+
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_scsi_ctrl, struct
+virtio_scsi_ctrl_tmf and struct virtio_scsi_ctrl_an
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+
+\subsubsection{Device Operation: eventq}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
+
+The eventq is populated by the driver for the device to report information on logical
+units that are attached to it. In general, the device will not
+queue events to cope with an empty eventq, and will end up
+dropping events if it finds no buffer ready. However, when
+reporting events for many LUNs (e.g. when a whole target
+disappears), the device can throttle events to avoid dropping
+them. For this reason, placing 10-15 buffers on the event queue
+is sufficient.
+
+Buffers returned by the device on the eventq will be referred to
+as ``events'' in the rest of this section. Events have the
+following format:
+
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
+
+struct virtio_scsi_event {
+ // Device-writable part
+ le32 event;
+ u8 lun[8];
+ le32 reason;
+};
+\end{lstlisting}
+
+The device sets bit 31 in \field{event} to report lost events
+due to missing buffers.
+
+The meaning of \field{reason} depends on the
+contents of \field{event}. The following events are defined:
+
+\begin{itemize}
+\item No event.
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_NO_EVENT 0
+\end{lstlisting}
+
+ This event is fired in the following cases:
+
+\begin{itemize}
+\item When the device detects in the eventq a buffer that is
+ shorter than what is indicated in the configuration field, it
+ MAY use it immediately and put this dummy value in \field{event}.
+ A well-written driver will never observe this
+ situation.
+
+\item When events are dropped, the device MAY signal this event as
+ soon as the driver makes a buffer available, in order to
+ request action from the driver. In this case, of course, this
+ event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
+ flag.
+\end{itemize}
+
+\item Transport reset
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
+
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+\end{lstlisting}
+
+ By sending this event, the device signals that a logical unit
+ on a target has been reset, including the case of a new device
+ appearing or disappearing on the bus. The device fills in all
+ fields. \field{event} is set to
+ VIRTIO_SCSI_T_TRANSPORT_RESET. \field{lun} addresses a
+ logical unit in the SCSI host.
+
+ The \field{reason} value is one of the three \#define values appearing
+ above:
+
+ \begin{description}
+ \item[VIRTIO_SCSI_EVT_RESET_REMOVED] (``LUN/target removed'') is used
+ if the target or logical unit is no longer able to receive
+ commands.
+
+ \item[VIRTIO_SCSI_EVT_RESET_HARD] (``LUN hard reset'') is used if the
+ logical unit has been reset, but is still present.
+
+ \item[VIRTIO_SCSI_EVT_RESET_RESCAN] (``rescan LUN/target'') is used if
+ a target or logical unit has just appeared on the device.
+ \end{description}
+
+ The ``removed'' and ``rescan'' events can happen when the
+ VIRTIO_SCSI_F_HOTPLUG feature was negotiated; when sent for LUN 0,
+ they MAY apply to the entire target so the driver can ask the
+ initiator to rescan the target to detect this.
+
+ Events will also be reported via sense codes (this obviously
+ does not apply to newly appeared buses or targets, since the
+ application has never discovered them):
+
+ \begin{itemize}
+ \item ``LUN/target removed'' maps to sense key ILLEGAL REQUEST, asc
+ 0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
+
+ \item ``LUN hard reset'' maps to sense key UNIT ATTENTION, asc 0x29
+ (POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
+
+ \item ``rescan LUN/target'' maps to sense key UNIT ATTENTION, asc
+ 0x3f, ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
+ \end{itemize}
+
+ The preferred way to detect transport reset is always to use
+ events, because sense codes are only seen by the driver when it
+ sends a SCSI command to the logical unit or target. However, in
+ case events are dropped, the initiator will still be able to
+ synchronize with the actual state of the controller if the
+ driver asks the initiator to rescan the SCSI bus. During the
+ rescan, the initiator will be able to observe the above sense
+ codes, and it will process them as if the driver had
+ received the equivalent event.
+
+ \item Asynchronous notification
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+\end{lstlisting}
+
+ By sending this event, the device signals that an asynchronous
+ event was fired from a physical interface.
+
+ All fields are written by the device. \field{event} is set to
+ VIRTIO_SCSI_T_ASYNC_NOTIFY. \field{lun} addresses a logical
+ unit in the SCSI host. \field{reason} is a subset of the
+ events that the driver has subscribed to via the ``Asynchronous
+ notification subscription'' command.
+
+ \item LUN parameter change
+\begin{lstlisting}
+#define VIRTIO_SCSI_T_PARAM_CHANGE 3
+\end{lstlisting}
+
+ By sending this event, the device signals a change in the configuration parameters
+ of a logical unit, for example the capacity or cache mode.
+ \field{event} is set to VIRTIO_SCSI_T_PARAM_CHANGE.
+ \field{lun} addresses a logical unit in the SCSI host.
+
+ The same event SHOULD also be reported as a unit attention condition.
+ \field{reason} contains the additional sense code and additional sense code qualifier,
+ respectively in bits 0\ldots 7 and 8\ldots 15.
+ \begin{note}
+ For example, a change in capacity will be reported as asc 0x2a, ascq 0x09
+ (CAPACITY DATA HAS CHANGED).
+ \end{note}
+
+ For MMC devices (inquiry type 5) there would be some overlap between this
+ event and the asynchronous notification event, so for simplicity the host never
+ reports this event for MMC devices.
+\end{itemize}
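+
+A driver might decode a received event roughly as sketched below. The
+struct and function names are hypothetical, and the inputs are assumed
+to have already been converted from little-endian to host byte order.
+The sketch separates the VIRTIO_SCSI_T_EVENTS_MISSED flag from the
+event type and, for VIRTIO_SCSI_T_PARAM_CHANGE, splits \field{reason}
+into the additional sense code (bits 0 to 7) and its qualifier (bits 8
+to 15).
+
```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000u
#define VIRTIO_SCSI_T_PARAM_CHANGE  3

/* Hypothetical decoded view of struct virtio_scsi_event. */
struct decoded_event {
    bool     missed; /* bit 31 of event was set */
    uint32_t type;   /* event with bit 31 cleared */
    uint8_t  asc;    /* only set for PARAM_CHANGE */
    uint8_t  ascq;   /* only set for PARAM_CHANGE */
};

static struct decoded_event decode_event(uint32_t event, uint32_t reason)
{
    struct decoded_event d;

    d.missed = (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
    d.type = event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
    d.asc = 0;
    d.ascq = 0;

    if (d.type == VIRTIO_SCSI_T_PARAM_CHANGE) {
        d.asc = (uint8_t)(reason & 0xff);
        d.ascq = (uint8_t)((reason >> 8) & 0xff);
    }
    return d;
}
```
+
+With this decoding, the capacity change from the note above (asc 0x2a,
+ascq 0x09) corresponds to a \field{reason} value of 0x092a.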
+
+\drivernormative{\paragraph}{Device Operation: eventq}{Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
+
+The driver SHOULD keep the eventq populated with buffers. These
+buffers MUST be device-writable, and SHOULD be at least
+\field{event_info_size} bytes long, and MUST be at least the size of
+struct virtio_scsi_event.
+
+If \field{event} has bit 31 set, the driver SHOULD
+poll the logical units for unit attention conditions, and/or do
+whatever form of bus scan is appropriate for the guest operating
+system, and SHOULD poll for asynchronous events manually using SCSI commands.
+
+When receiving a VIRTIO_SCSI_T_TRANSPORT_RESET message with
+\field{reason} set to VIRTIO_SCSI_EVT_RESET_REMOVED or
+VIRTIO_SCSI_EVT_RESET_RESCAN for LUN 0, the driver SHOULD ask the
+initiator to rescan the target, in order to detect the case when an
+entire target has appeared or disappeared.
+
+\devicenormative{\paragraph}{Device Operation: eventq}{Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
+
+The device MUST set bit 31 in \field{event} if events were lost due to
+missing buffers, and it MAY use a VIRTIO_SCSI_T_NO_EVENT event to report
+this.
+
+The device MUST NOT send VIRTIO_SCSI_T_TRANSPORT_RESET messages
+with \field{reason} set to VIRTIO_SCSI_EVT_RESET_REMOVED or
+VIRTIO_SCSI_EVT_RESET_RESCAN unless VIRTIO_SCSI_F_HOTPLUG was negotiated.
+
+The device MUST NOT report VIRTIO_SCSI_T_PARAM_CHANGE for MMC devices.
+
+\paragraph{Legacy Interface: Device Operation: eventq}\label{sec:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq / Legacy Interface: Device Operation: eventq}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_scsi_event
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
+Types / SCSI Host Device / Legacy Interface: Framing Requirements}
+
+When using legacy interfaces, transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT MUST use a single descriptor for the
+\field{lun}, \field{id}, \field{task_attr}, \field{prio},
+\field{crn} and \field{cdb} fields, and MUST only use a single
+descriptor for the \field{sense_len}, \field{residual},
+\field{status_qualifier}, \field{status}, \field{response} and
+\field{sense} fields.
+
+\chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
+
+Currently these device-independent feature bits are defined:
+
+\begin{description}
+ \item[VIRTIO_F_RING_INDIRECT_DESC (28)] Negotiating this feature indicates
+ that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
+ flag set, as described in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}.
+
+ \item[VIRTIO_F_RING_EVENT_IDX (29)] This feature enables the \field{used_event}
+ and the \field{avail_event} fields as described in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression} and \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}.
+
+ \item[VIRTIO_F_VERSION_1 (32)] This indicates compliance with this
+ specification, giving a simple way to detect legacy devices or drivers.
+
+ \item[VIRTIO_F_IOMMU_PLATFORM (33)] This feature indicates that the device is
+ behind an IOMMU that translates bus addresses from the device into physical
+ addresses in memory. If this feature bit is set to 0, then the device emits
+ physical addresses which are not translated further, even though an IOMMU
+ may be present.
+\end{description}
+
+\drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
+
+A driver MUST accept VIRTIO_F_VERSION_1 if it is offered. A driver
+MAY fail to operate further if VIRTIO_F_VERSION_1 is not offered.
+
+A driver SHOULD accept VIRTIO_F_IOMMU_PLATFORM if it is offered, and it MUST
+then either disable the IOMMU or configure the IOMMU to translate bus addresses
+passed to the device into physical addresses in memory. If
+VIRTIO_F_IOMMU_PLATFORM is not offered, then a driver MUST pass only physical
+addresses to the device.
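+
+As a non-normative illustration, a driver's choice of reserved feature
+bits can be sketched in C as follows. The bit numbers are those listed
+above; feature_mask(), feature_has(), driver_select_features() and the
+driver_supported parameter are hypothetical names introduced only for
+this sketch, which treats the feature set as a single 64-bit word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device-independent feature bit numbers from this chapter. */
#define VIRTIO_F_RING_INDIRECT_DESC 28
#define VIRTIO_F_RING_EVENT_IDX     29
#define VIRTIO_F_VERSION_1          32
#define VIRTIO_F_IOMMU_PLATFORM     33

static inline uint64_t feature_mask(unsigned bit)
{
    return UINT64_C(1) << bit;
}

static inline bool feature_has(uint64_t features, unsigned bit)
{
    return (features & feature_mask(bit)) != 0;
}

/*
 * offered:          feature bits read from the device.
 * driver_supported: hypothetical mask of optional features the driver
 *                   implements.
 */
static uint64_t driver_select_features(uint64_t offered,
                                       uint64_t driver_supported)
{
    uint64_t accepted = offered & driver_supported;

    /* VIRTIO_F_VERSION_1 MUST be accepted whenever it is offered. */
    accepted |= offered & feature_mask(VIRTIO_F_VERSION_1);
    /* VIRTIO_F_IOMMU_PLATFORM SHOULD be accepted whenever it is offered;
     * the driver then has to disable or program the IOMMU accordingly. */
    accepted |= offered & feature_mask(VIRTIO_F_IOMMU_PLATFORM);
    return accepted;
}
```

+A driver that requires VIRTIO_F_VERSION_1 could call feature_has() on
+the result and stop initialization if the bit is absent.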
+
+\devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
+
+A device MUST offer VIRTIO_F_VERSION_1. A device MAY fail to operate further
+if VIRTIO_F_VERSION_1 is not accepted.
+
+A device SHOULD offer VIRTIO_F_IOMMU_PLATFORM if it is behind an IOMMU that
+translates bus addresses from the device into physical addresses in memory.
+A device MAY fail to operate further if VIRTIO_F_IOMMU_PLATFORM is not
+accepted.
+
+\section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
+
+Transitional devices MAY offer the following:
+\begin{description}
+\item[VIRTIO_F_NOTIFY_ON_EMPTY (24)] If this feature
+ has been negotiated by the driver, the device MUST issue
+ an interrupt if the device runs
+ out of available descriptors on a virtqueue, even though
+ interrupts are suppressed using the VIRTQ_AVAIL_F_NO_INTERRUPT
+ flag or the \field{used_event} field.
+\begin{note}
+ An example of a driver using this feature is the legacy
+ networking driver: it doesn't need to know every time a packet
+ is transmitted, but it does need to free the transmitted
+ packets a finite time after they are transmitted. It can avoid
+ using a timer if the device interrupts it when all the packets
+ are transmitted.
+\end{note}
+\end{description}
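+
+As a non-normative illustration, the device-side interrupt decision
+implied by VIRTIO_F_NOTIFY_ON_EMPTY can be sketched in C as follows.
+legacy_should_interrupt() and its parameters are hypothetical names
+introduced only for this sketch. It assumes VIRTQ_AVAIL_F_NO_INTERRUPT
+has the value 1, and for brevity it ignores suppression through the
+\field{used_event} field.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_NOTIFY_ON_EMPTY   24
#define VIRTQ_AVAIL_F_NO_INTERRUPT 1

/*
 * negotiated:       feature bits negotiated with the driver.
 * avail_flags:      flags field read from the available ring.
 * out_of_available: true if the device has just consumed the last
 *                   available descriptor on this virtqueue.
 */
static bool legacy_should_interrupt(uint64_t negotiated, uint16_t avail_flags,
                                    bool out_of_available)
{
    /* With NOTIFY_ON_EMPTY, running out of descriptors overrides suppression. */
    if ((negotiated & (UINT64_C(1) << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
        out_of_available)
        return true;

    /* Otherwise honour the driver's suppression flag. */
    return (avail_flags & VIRTQ_AVAIL_F_NO_INTERRUPT) == 0;
}
```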
+
+Transitional devices MUST offer, and, if offered by the device,
+transitional drivers MUST accept the following:
+\begin{description}
+\item[VIRTIO_F_ANY_LAYOUT (27)] This feature indicates that the device
+ accepts arbitrary descriptor layouts, as described in Section
+ \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}.
+
+\item[UNUSED (30)] Bit 30 is used by QEMU's implementation to check
+ for experimental early versions of virtio which did not perform
+ correct feature negotiation, and SHOULD NOT be negotiated.
+\end{description}