Diffstat (limited to 'content.tex')
-rw-r--r-- content.tex 499
1 files changed, 1 insertions, 498 deletions
diff --git a/content.tex b/content.tex
index 36d54a1..8c7f532 100644
--- a/content.tex
+++ b/content.tex
@@ -244,504 +244,7 @@ a device event - i.e. send an interrupt to the driver.
For queue operation detail, see \ref{sec:Basic Facilities of a Virtio Device / Split Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Split Virtqueues}.
-\section{Split Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Split Virtqueues}
-The split virtqueue format is the original format used by legacy
-virtio devices. The split virtqueue format separates the
-virtqueue into several parts, where each part is write-able by
-either the driver or the device, but not both. Multiple
-locations need to be updated when making a buffer available
-and when marking it as used.
-
-
-Each queue has a 16-bit queue size
-parameter, which sets the number of entries and implies the total size
-of the queue.
-
-Each virtqueue consists of three parts:
-
-\begin{itemize}
-\item Descriptor Table
-\item Available Ring
-\item Used Ring
-\end{itemize}
-
-where each part is physically-contiguous in guest memory,
-and has different alignment requirements.
-
-The memory alignment and size requirements, in bytes, of each part of the
-virtqueue are summarized in the following table:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Virtqueue Part & Alignment & Size \\
-\hline \hline
-Descriptor Table & 16 & $16 * $(Queue Size) \\
-\hline
-Available Ring & 2 & $6 + 2 * $(Queue Size) \\
- \hline
-Used Ring & 4 & $6 + 8 * $(Queue Size) \\
- \hline
-\end{tabular}
-
-The Alignment column gives the minimum alignment for each part
-of the virtqueue.
-
-The Size column gives the total number of bytes for each
-part of the virtqueue.
-
-Queue Size corresponds to the maximum number of buffers in the
-virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers
-can be queued at any given time.}. The Queue Size value is always a
-power of 2. The maximum Queue Size value is 32768. This value
-is specified in a bus-specific way.
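The table and the Queue Size constraints above can be sketched in C as follows. This is a minimal illustration, not part of the specification: the helper names are hypothetical, and treating a Queue Size of 0 as invalid is an assumption made here for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; the byte counts follow the size table above. */
static inline size_t virtq_desc_table_size(uint16_t qsz)
{
    return 16 * (size_t)qsz;            /* 16 bytes per descriptor */
}

static inline size_t virtq_avail_size(uint16_t qsz)
{
    return 6 + 2 * (size_t)qsz;         /* flags, idx, used_event + ring */
}

static inline size_t virtq_used_size(uint16_t qsz)
{
    return 6 + 8 * (size_t)qsz;         /* flags, idx, avail_event + ring */
}

/* Nonzero if addr is a multiple of align (align must be a power of 2). */
static inline int virtq_is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Queue Size is a power of 2 no larger than 32768.
 * Assumption for this sketch: 0 is rejected. */
static inline int virtq_size_is_valid(uint32_t qsz)
{
    return qsz != 0 && qsz <= 32768 && (qsz & (qsz - 1)) == 0;
}
```

For a Queue Size of 256, these helpers give 4096, 518 and 2054 bytes for the descriptor table, available ring and used ring respectively.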
-
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring. It then
-notifies the device. When the device has finished a buffer, it
-writes the descriptor index into the used ring, and sends an interrupt.
-
-\drivernormative{\subsection}{Virtqueues}{Basic Facilities of a Virtio Device / Virtqueues}
-The driver MUST ensure that the physical address of the first byte
-of each virtqueue part is a multiple of the specified alignment value
-in the above table.
-
-\subsection{Legacy Interfaces: A Note on Virtqueue Layout}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}
-
-For Legacy Interfaces, several additional
-restrictions are placed on the virtqueue layout:
-
-Each virtqueue occupies two or more physically-contiguous pages
-(usually defined as 4096 bytes, but depending on the transport;
-henceforth referred to as Queue Align)
-and consists of three parts:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Descriptor Table & Available Ring (\ldots padding\ldots) & Used Ring \\
-\hline
-\end{tabular}
-
-The bus-specific Queue Size field controls the total number of bytes
-for the virtqueue.
-When using the legacy interface, the transitional
-driver MUST retrieve the Queue Size field from the device
-and MUST allocate the total number of bytes for the virtqueue
-according to the following formula (Queue Align given in qalign and
-Queue Size given in qsz):
-
-\begin{lstlisting}
-#define ALIGN(x) (((x) + qalign - 1) & ~(qalign - 1))
-static inline unsigned virtq_size(unsigned int qsz)
-{
- return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
- + ALIGN(sizeof(u16)*3 + sizeof(struct virtq_used_elem)*qsz);
-}
-\end{lstlisting}
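As a worked instance of the legacy size formula, the sketch below assumes that ALIGN rounds its argument up to the next multiple of Queue Align and that Queue Align is a power of 2. The names align_up and legacy_virtq_size are hypothetical, and the constants 16 and 8 stand in for sizeof(struct virtq_desc) and sizeof(struct virtq_used_elem).

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of 2). */
static inline size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Total bytes for a legacy virtqueue with qsz entries. */
static inline size_t legacy_virtq_size(size_t qsz, size_t qalign)
{
    /* Descriptor table plus available ring, padded to Queue Align. */
    size_t first = 16 * qsz + sizeof(uint16_t) * (3 + qsz);
    /* Used ring, padded to Queue Align. */
    size_t second = sizeof(uint16_t) * 3 + 8 * qsz;

    return align_up(first, qalign) + align_up(second, qalign);
}
```

With qsz = 256 and qalign = 4096, the first part is 4096 + 518 = 4614 bytes and rounds up to 8192; the second part is 2054 bytes and rounds up to 4096, for a total of 12288 bytes (three pages).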
-
-This wastes some space with padding.
-When using the legacy interface, both transitional
-devices and drivers MUST use the following virtqueue layout
-structure to locate elements of the virtqueue:
-
-\begin{lstlisting}
-struct virtq {
- // The actual descriptors (16 bytes each)
- struct virtq_desc desc[ Queue Size ];
-
- // A ring of available descriptor heads with free-running index.
- struct virtq_avail avail;
-
- // Padding to the next Queue Align boundary.
- u8 pad[ Padding ];
-
- // A ring of used descriptor heads with free-running index.
- struct virtq_used used;
-};
-\end{lstlisting}
-
-\subsection{Legacy Interfaces: A Note on Virtqueue Endianness}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Endianness}
-
-Note that when using the legacy interface, transitional
-devices and drivers MUST use the native
-endianness of the guest for the fields in the virtqueue.
-This is in contrast to the little-endian format that this standard
-specifies for the non-legacy interface.
-It is assumed that the host is already aware of the guest endianness.
-
-\subsection{Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}
-The framing of messages with descriptors is
-independent of the contents of the buffers. For example, a network
-transmit buffer consists of a 12 byte header followed by the network
-packet. This could be most simply placed in the descriptor table as a
-12 byte output descriptor followed by a 1514 byte output descriptor,
-but it could also consist of a single 1526 byte output descriptor in
-the case where the header and packet are adjacent, or even three or
-more descriptors (possibly with loss of efficiency in that case).
-
-Note that some device implementations have large-but-reasonable
-restrictions on total descriptor size (such as based on IOV_MAX in the
-host OS). This has not been a problem in practice: little sympathy
-will be given to drivers which create unreasonably-sized descriptors
-such as by dividing a network packet into 1500 single-byte
-descriptors!
-
-\devicenormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
-The device MUST NOT make assumptions about the particular arrangement
-of descriptors. The device MAY have a reasonable limit of descriptors
-it will allow in a chain.
-
-\drivernormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
-The driver MUST place any device-writable descriptor elements after
-any device-readable descriptor elements.
-
-The driver SHOULD NOT use an excessive number of descriptors to
-describe a buffer.
-
-\subsubsection{Legacy Interface: Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}
-
-Regrettably, initial driver implementations used simple layouts, and
-devices came to rely on them, despite this specification wording. In
-addition, the specification for virtio_blk SCSI commands required
-intuiting field lengths from frame boundaries (see
- \ref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}~\nameref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}).
-
-Thus when using the legacy interface, the VIRTIO_F_ANY_LAYOUT
-feature indicates to both the device and the driver that no
-assumptions were made about framing. Requirements for
-transitional drivers when this is not negotiated are included in
-each device section.
-
-\subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-
-The descriptor table refers to the buffers the driver is using for
-the device. \field{addr} is a physical address, and the buffers
-can be chained via \field{next}. Each descriptor describes a
-buffer which is read-only for the device (``device-readable'') or write-only for the device (``device-writable''), but a chain of
-descriptors can contain both device-readable and device-writable buffers.
-
-The actual contents of the memory offered to the device depend on the
-device type. Most common is to begin the data with a header
-(containing little-endian fields) for the device to read, and postfix
-it with a status trailer for the device to write.
-
-\begin{lstlisting}
-struct virtq_desc {
- /* Address (guest-physical). */
- le64 addr;
- /* Length. */
- le32 len;
-
-/* This marks a buffer as continuing via the next field. */
-#define VIRTQ_DESC_F_NEXT 1
-/* This marks a buffer as device write-only (otherwise device read-only). */
-#define VIRTQ_DESC_F_WRITE 2
-/* This means the buffer contains a list of buffer descriptors. */
-#define VIRTQ_DESC_F_INDIRECT 4
- /* The flags as indicated above. */
- le16 flags;
- /* Next field if flags & NEXT */
- le16 next;
-};
-\end{lstlisting}
-
-The number of descriptors in the table is defined by the queue size
-for this virtqueue: this is the maximum possible descriptor chain length.
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to this structure as vring_desc, and the constants as
-VRING_DESC_F_NEXT, etc, but the layout and values were identical.
-\end{note}
-
-\devicenormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
-read a device-writable buffer (it MAY do so for debugging or diagnostic
-purposes).
-
-\drivernormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
-this implies that loops in the descriptor chain are forbidden!
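A device-side walk of a descriptor chain, with checks for loops and for the $2^{32}$-byte limit, might look like the sketch below. It is an illustration under stated assumptions: the fields are taken to be already converted to host byte order (plain uintN_t rather than le types), the function name virtq_chain_length is hypothetical, and indirect descriptors are not handled.

```c
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT 1

/* Same layout as the descriptor above; host byte order is assumed. */
struct virtq_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Returns the number of descriptors in the chain starting at head,
 * or -1 if the chain is malformed (index out of range, loop, or more
 * than 2^32 bytes in total). */
static int virtq_chain_length(const struct virtq_desc *desc,
                              uint16_t qsz, uint16_t head)
{
    uint32_t count = 0;
    uint64_t total = 0;
    uint16_t i = head;

    for (;;) {
        if (i >= qsz)
            return -1;              /* index outside the table */
        if (++count > qsz)
            return -1;              /* more links than entries: a loop */
        total += desc[i].len;
        if (total > (1ULL << 32))
            return -1;              /* chain too long in bytes */
        if (!(desc[i].flags & VIRTQ_DESC_F_NEXT))
            return (int)count;      /* last descriptor of the chain */
        i = desc[i].next;
    }
}
```

Bounding the walk by the queue size is sufficient to detect a loop, because a valid chain can never visit more descriptors than the table contains.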
-
-\subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-
-Some devices benefit from concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
-ring capacity the driver can store a table of indirect
-descriptors anywhere in memory, and insert a descriptor in the main
-virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to a memory buffer
-containing this indirect descriptor table; \field{addr} and \field{len}
-refer to the indirect table address and length in bytes,
-respectively.
-
-The indirect table layout structure looks like this
-(\field{len} is the length of the descriptor that refers to this table,
-which is a variable, so this code won't compile):
-
-\begin{lstlisting}
-struct indirect_descriptor_table {
- /* The actual descriptors (16 bytes each) */
- struct virtq_desc desc[len / 16];
-};
-\end{lstlisting}
-
-The first indirect descriptor is located at the start of the indirect
-descriptor table (index 0); additional indirect descriptors are
-chained by \field{next}. An indirect descriptor without a valid \field{next}
-(with \field{flags}\&VIRTQ_DESC_F_NEXT off) signals the end of the chain.
-A single indirect descriptor
-table can include both device-readable and device-writable descriptors.
-
-\drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT
-set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (i.e. only
-one table per descriptor).
-
-A driver MUST NOT create a descriptor chain longer than the Queue Size of
-the device.
-
-A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
-in \field{flags}.
-
-\devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
-
-The device MUST handle the case of zero or more normal chained
-descriptors followed by a single descriptor with \field{flags}\&VIRTQ_DESC_F_INDIRECT.
-
-\begin{note}
-While unusual (most implementations either create a chain solely using
-non-indirect descriptors, or use a single indirect element), such a
-layout is valid.
-\end{note}
-
-\subsection{The Virtqueue Available Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}
-
-\begin{lstlisting}
-struct virtq_avail {
-#define VIRTQ_AVAIL_F_NO_INTERRUPT 1
- le16 flags;
- le16 idx;
- le16 ring[ /* Queue Size */ ];
- le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
-};
-\end{lstlisting}
-
-The driver uses the available ring to offer buffers to the
-device: each ring entry refers to the head of a descriptor chain. It is only
-written by the driver and read by the device.
-
-The \field{idx} field indicates where the driver would put the next descriptor
-entry in the ring (modulo the queue size). This starts at 0, and increases.
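The free-running \field{idx} and the modulo indexing can be sketched as below. The function name avail_publish is hypothetical, the ring is modelled as a plain host-endian array, and the memory barrier a real driver needs between the two stores is only indicated by a comment.

```c
#include <stdint.h>

/* Place a descriptor chain head into the available ring and advance idx.
 * ring has qsz entries; *idx is the free-running 16-bit index. */
static void avail_publish(uint16_t *ring, uint16_t qsz,
                          uint16_t *idx, uint16_t head)
{
    ring[*idx % qsz] = head;    /* slot is idx modulo the queue size */
    /* A real driver would issue a write barrier here so the device
     * sees the ring entry before it sees the new idx. */
    (*idx)++;                   /* wraps naturally at 65536 */
}
```

Because the queue size is a power of 2, the 16-bit wrap of \field{idx} from 65535 to 0 keeps the slot sequence continuous.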
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to this structure as vring_avail, and the constant as
-VRING_AVAIL_F_NO_INTERRUPT, but the layout and value were identical.
-\end{note}
-
-\subsection{Virtqueue Interrupt Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated,
-the \field{flags} field in the available ring offers a crude mechanism for the driver to inform
-the device that it doesn't want interrupts when buffers are used. Otherwise
-\field{used_event} is a more performant alternative where the driver
-specifies how far the device can progress before interrupting.
-
-Neither of these interrupt suppression methods is reliable, as they
-are not synchronized with the device, but they serve as
-useful optimizations.
-
-\drivernormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The driver MUST set \field{flags} to 0 or 1.
-\item The driver MAY set \field{flags} to 1 to advise
-the device that interrupts are not needed.
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The driver MUST set \field{flags} to 0.
-\item The driver MAY use \field{used_event} to advise the device that interrupts are unnecessary until the device writes an entry with an index specified by \field{used_event} into the used ring (equivalently, until \field{idx} in the
-used ring reaches the value \field{used_event} + 1).
-\end{itemize}
-
-The driver MUST handle spurious interrupts from the device.
-
-\devicenormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The device MUST ignore the \field{used_event} value.
-\item After the device writes a descriptor index into the used ring:
- \begin{itemize}
- \item If \field{flags} is 1, the device SHOULD NOT send an interrupt.
- \item If \field{flags} is 0, the device MUST send an interrupt.
- \end{itemize}
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The device MUST ignore the lower bit of \field{flags}.
-\item After the device writes a descriptor index into the used ring:
- \begin{itemize}
- \item If the \field{idx} field in the used ring (which determined
- where that descriptor index was placed) was equal to
- \field{used_event}, the device MUST send an interrupt.
- \item Otherwise the device SHOULD NOT send an interrupt.
- \end{itemize}
-\end{itemize}
-
-\begin{note}
-For example, if \field{used_event} is 0, then a device using
- VIRTIO_F_EVENT_IDX would interrupt after the first buffer is
- used (and again after the 65536th buffer, etc).
-\end{note}
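The VIRTIO_F_EVENT_IDX rule above can be expressed as a wrap-safe 16-bit comparison. The sketch below uses a hypothetical name, need_event, and generalizes the rule to a batch of entries: old_idx is the used \field{idx} before the device added entries and new_idx is the value afterwards. The Linux virtio_ring.h header contains a helper written along the same lines.

```c
#include <stdint.h>

/* Returns nonzero if the entry at index event_idx was written while
 * idx advanced from old_idx to new_idx, i.e. event_idx lies in the
 * half-open range [old_idx, new_idx) modulo 65536. */
static inline int need_event(uint16_t event_idx,
                             uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```

With \field{used_event} equal to 0, need_event(0, 1, 0) is nonzero, matching the note above: the device interrupts after the first buffer is used. The casts to uint16_t make the comparison behave correctly when the indices wrap around.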
-
-\subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-\begin{lstlisting}
-struct virtq_used {
-#define VIRTQ_USED_F_NO_NOTIFY 1
- le16 flags;
- le16 idx;
- struct virtq_used_elem ring[ /* Queue Size */];
- le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
-};
-
-/* le32 is used here for ids for padding reasons. */
-struct virtq_used_elem {
- /* Index of start of used descriptor chain. */
- le32 id;
- /* Total length of the descriptor chain which was used (written to) */
- le32 len;
-};
-\end{lstlisting}
-
-The used ring is where the device returns buffers once it is done with
-them: it is only written to by the device, and read by the driver.
-
-Each entry in the ring is a pair: \field{id} indicates the head entry of the
-descriptor chain describing the buffer (this matches an entry
-placed in the available ring by the guest earlier), and \field{len} the total
-number of bytes written into the buffer.
-
-\begin{note}
-\field{len} is particularly useful
-for drivers using untrusted buffers: if a driver does not know exactly
-how much has been written by the device, the driver would have to zero
-the buffer in advance to ensure no data leakage occurs.
-
-For example, a network driver may hand a received buffer directly to
-an unprivileged userspace application. If the network device has not
-overwritten the bytes which were in that buffer, this could leak the
-contents of freed memory from other processes to the application.
-\end{note}
-
-The \field{idx} field indicates where the device would put the next descriptor
-entry in the ring (modulo the queue size). This starts at 0, and increases.
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to these structures as vring_used and vring_used_elem, and
-the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were
-identical.
-\end{note}
-
-\subsubsection{Legacy Interface: The Virtqueue Used
-Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues
-/ The Virtqueue Used Ring/ Legacy Interface: The Virtqueue Used
-Ring}
-
-Historically, many drivers ignored the \field{len} value; as a
-result, many devices set \field{len} incorrectly. Thus, when
-using the legacy interface, it is generally a good idea to ignore
-the \field{len} value in used ring entries if possible. Specific
-known issues are listed per device type.
-
-\devicenormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-The device MUST set \field{len} prior to updating the used \field{idx}.
-
-The device MUST write at least \field{len} bytes to the descriptor,
-beginning at the first device-writable buffer,
-prior to updating the used \field{idx}.
-
-The device MAY write more than \field{len} bytes to the descriptor.
-
-\begin{note}
-There are potential error cases where a device might not know what
-parts of the buffers have been written. This is why \field{len} is
-permitted to be an underestimate: that's preferable to the driver believing
-that uninitialized memory has been overwritten when it has not.
-\end{note}
-
-\drivernormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-The driver MUST NOT make assumptions about data in device-writable buffers
-beyond the first \field{len} bytes, and SHOULD ignore this data.
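On the driver side, collecting completed buffers from the used ring might be sketched as follows. The function name drain_used and the last_seen bookkeeping variable are hypothetical, fields are assumed to be in host byte order, and the read barrier a real driver needs after loading the used \field{idx} is omitted.

```c
#include <stdint.h>

/* Same layout as the used ring element above; host byte order assumed. */
struct virtq_used_elem {
    uint32_t id;    /* head of the used descriptor chain */
    uint32_t len;   /* bytes the device reports having written */
};

/* Copy entries the driver has not yet seen into out (at most max_out),
 * advancing *last_seen towards used_idx. Returns the number copied. */
static unsigned drain_used(const struct virtq_used_elem *ring,
                           uint16_t qsz, uint16_t used_idx,
                           uint16_t *last_seen,
                           struct virtq_used_elem *out, unsigned max_out)
{
    unsigned n = 0;

    while (*last_seen != used_idx && n < max_out) {
        out[n++] = ring[*last_seen % qsz];
        (*last_seen)++;     /* free-running, wraps at 65536 */
    }
    return n;
}
```

For each returned element, only the first len bytes of the device-writable part of the buffer are meaningful to the driver.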
-
-\subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-
-The device can suppress notifications in a manner analogous to the way
-drivers can suppress interrupts as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.
-The device manipulates \field{flags} or \field{avail_event} in the used ring the
-same way the driver manipulates \field{flags} or \field{used_event} in the available ring.
-
-\drivernormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-
-The driver MUST initialize \field{flags} in the used ring to 0 when
-allocating the used ring.
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The driver MUST ignore the \field{avail_event} value.
-\item After the driver writes a descriptor index into the available ring:
- \begin{itemize}
- \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
- \item If \field{flags} is 0, the driver MUST send a notification.
- \end{itemize}
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The driver MUST ignore the lower bit of \field{flags}.
-\item After the driver writes a descriptor index into the available ring:
- \begin{itemize}
- \item If the \field{idx} field in the available ring (which determined
- where that descriptor index was placed) was equal to
- \field{avail_event}, the driver MUST send a notification.
- \item Otherwise the driver SHOULD NOT send a notification.
- \end{itemize}
-\end{itemize}
-
-\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The device MUST set \field{flags} to 0 or 1.
-\item The device MAY set \field{flags} to 1 to advise
-the driver that notifications are not needed.
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The device MUST set \field{flags} to 0.
-\item The device MAY use \field{avail_event} to advise the driver that notifications are unnecessary until the driver writes an entry with an index specified by \field{avail_event} into the available ring (equivalently, until \field{idx} in the
-available ring reaches the value \field{avail_event} + 1).
-\end{itemize}
-
-The device MUST handle spurious notifications from the driver.
-
-\subsection{Helpers for Operating Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Helpers for Operating Virtqueues}
-
-The Linux kernel source code contains the definitions above and
-helper routines in a more usable form, in
-include/uapi/linux/virtio_ring.h. This was explicitly licensed by IBM
-and Red Hat under the (3-clause) BSD license so that it can be
-freely used by all other projects, and is reproduced (with slight
-variation) in \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}.
+\input{split-ring.tex}
\chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}