Diffstat (limited to 'content.tex')
-rw-r--r-- | content.tex | 499 |
1 files changed, 1 insertions, 498 deletions
diff --git a/content.tex b/content.tex
index 36d54a1..8c7f532 100644
--- a/content.tex
+++ b/content.tex
@@ -244,504 +244,7 @@ a device event - i.e. send an interrupt to the driver.
 For queue operation detail, see \ref{sec:Basic Facilities of a Virtio Device / Split Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Split Virtqueues}.
-\section{Split Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Split Virtqueues}
-The split virtqueue format is the original format used by legacy
-virtio devices. The split virtqueue format separates the
-virtqueue into several parts, where each part is writable by
-either the driver or the device, but not both. Multiple
-locations need to be updated when making a buffer available
-and when marking it as used.
-
-
-Each queue has a 16-bit queue size
-parameter, which sets the number of entries and implies the total size
-of the queue.
-
-Each virtqueue consists of three parts:
-
-\begin{itemize}
-\item Descriptor Table
-\item Available Ring
-\item Used Ring
-\end{itemize}
-
-where each part is physically contiguous in guest memory,
-and has different alignment requirements.
-
-The memory alignment and size requirements, in bytes, of each part of the
-virtqueue are summarized in the following table:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Virtqueue Part & Alignment & Size \\
-\hline \hline
-Descriptor Table & 16 & $16 * $(Queue Size) \\
-\hline
-Available Ring & 2 & $6 + 2 * $(Queue Size) \\
- \hline
-Used Ring & 4 & $6 + 8 * $(Queue Size) \\
- \hline
-\end{tabular}
-
-The Alignment column gives the minimum alignment for each part
-of the virtqueue.
-
-The Size column gives the total number of bytes for each
-part of the virtqueue.
-
-Queue Size corresponds to the maximum number of buffers in the
-virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers
-can be queued at any given time.}. The Queue Size value is always a
-power of 2. The maximum Queue Size value is 32768. This value
-is specified in a bus-specific way.
-
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring. It then
-notifies the device. When the device has finished a buffer, it
-writes the descriptor index into the used ring, and sends an interrupt.
-
-\drivernormative{\subsection}{Virtqueues}{Basic Facilities of a Virtio Device / Virtqueues}
-The driver MUST ensure that the physical address of the first byte
-of each virtqueue part is a multiple of the specified alignment value
-in the above table.
-
-\subsection{Legacy Interfaces: A Note on Virtqueue Layout}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}
-
-For Legacy Interfaces, several additional
-restrictions are placed on the virtqueue layout:
-
-Each virtqueue occupies two or more physically contiguous pages
-(usually defined as 4096 bytes, but depending on the transport;
-henceforth referred to as Queue Align)
-and consists of three parts:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Descriptor Table & Available Ring (\ldots padding\ldots) & Used Ring \\
-\hline
-\end{tabular}
-
-The bus-specific Queue Size field controls the total number of bytes
-for the virtqueue.
-When using the legacy interface, the transitional
-driver MUST retrieve the Queue Size field from the device
-and MUST allocate the total number of bytes for the virtqueue
-according to the following formula (Queue Align given in qalign and
-Queue Size given in qsz):
-
-\begin{lstlisting}
-#define ALIGN(x) (((x) + qalign - 1) & ~(qalign - 1))
-static inline unsigned virtq_size(unsigned int qsz)
-{
-        return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
-             + ALIGN(sizeof(u16)*3 + sizeof(struct virtq_used_elem)*qsz);
-}
-\end{lstlisting}
-
-This wastes some space with padding.
-When using the legacy interface, both transitional
-devices and drivers MUST use the following virtqueue layout
-structure to locate elements of the virtqueue:
-
-\begin{lstlisting}
-struct virtq {
-        // The actual descriptors (16 bytes each)
-        struct virtq_desc desc[ Queue Size ];
-
-        // A ring of available descriptor heads with free-running index.
-        struct virtq_avail avail;
-
-        // Padding to the next Queue Align boundary.
-        u8 pad[ Padding ];
-
-        // A ring of used descriptor heads with free-running index.
-        struct virtq_used used;
-};
-\end{lstlisting}
-
-\subsection{Legacy Interfaces: A Note on Virtqueue Endianness}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Endianness}
-
-Note that when using the legacy interface, transitional
-devices and drivers MUST use the native
-endianness of the guest as the endianness of fields and in the virtqueue.
-This is in contrast to the little-endian format used by the non-legacy interface as
-specified by this standard.
-It is assumed that the host is already aware of the guest endianness.
-
-\subsection{Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}
-The framing of messages with descriptors is
-independent of the contents of the buffers. For example, a network
-transmit buffer consists of a 12 byte header followed by the network
-packet. This could be most simply placed in the descriptor table as a
-12 byte output descriptor followed by a 1514 byte output descriptor,
-but it could also consist of a single 1526 byte output descriptor in
-the case where the header and packet are adjacent, or even three or
-more descriptors (possibly with loss of efficiency in that case).
-
-Note that some device implementations have large-but-reasonable
-restrictions on total descriptor size (such as based on IOV_MAX in the
-host OS). This has not been a problem in practice: little sympathy
-will be given to drivers which create unreasonably-sized descriptors
-such as by dividing a network packet into 1500 single-byte
-descriptors!
-
-\devicenormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
-The device MUST NOT make assumptions about the particular arrangement
-of descriptors. The device MAY have a reasonable limit on the number of descriptors
-it will allow in a chain.
-
-\drivernormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
-The driver MUST place any device-writable descriptor elements after
-any device-readable descriptor elements.
-
-The driver SHOULD NOT use an excessive number of descriptors to
-describe a buffer.
-
-\subsubsection{Legacy Interface: Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}
-
-Regrettably, initial driver implementations used simple layouts, and
-devices came to rely on them, despite this specification wording. In
-addition, the specification for virtio_blk SCSI commands required
-intuiting field lengths from frame boundaries (see
- \ref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}~\nameref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}).
-
-Thus when using the legacy interface, the VIRTIO_F_ANY_LAYOUT
-feature indicates to both the device and the driver that no
-assumptions were made about framing. Requirements for
-transitional drivers when this is not negotiated are included in
-each device section.
-
-\subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-
-The descriptor table refers to the buffers the driver is using for
-the device. \field{addr} is a physical address, and the buffers
-can be chained via \field{next}. Each descriptor describes a
-buffer which is read-only for the device (``device-readable'') or write-only for the device (``device-writable''), but a chain of
-descriptors can contain both device-readable and device-writable buffers.
-
-The actual contents of the memory offered to the device depend on the
-device type. It is most common to begin the data with a header
-(containing little-endian fields) for the device to read, and postfix
-it with a status trailer for the device to write.
-
-\begin{lstlisting}
-struct virtq_desc {
-        /* Address (guest-physical). */
-        le64 addr;
-        /* Length. */
-        le32 len;
-
-/* This marks a buffer as continuing via the next field. */
-#define VIRTQ_DESC_F_NEXT   1
-/* This marks a buffer as device write-only (otherwise device read-only). */
-#define VIRTQ_DESC_F_WRITE     2
-/* This means the buffer contains a list of buffer descriptors. */
-#define VIRTQ_DESC_F_INDIRECT   4
-        /* The flags as indicated above. */
-        le16 flags;
-        /* Next field if flags & NEXT */
-        le16 next;
-};
-\end{lstlisting}
-
-The number of descriptors in the table is defined by the queue size
-for this virtqueue: this is the maximum possible descriptor chain length.
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to this structure as vring_desc, and the constants as
-VRING_DESC_F_NEXT, etc., but the layout and values were identical.
-\end{note}
-
-\devicenormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
-read a device-writable buffer (it MAY do so for debugging or diagnostic
-purposes).
-
-\drivernormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
-Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
-this implies that loops in the descriptor chain are forbidden!
-
-\subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-
-Some devices benefit from concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
-ring capacity the driver can store a table of indirect
-descriptors anywhere in memory, and insert a descriptor in the main
-virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to a memory buffer
-containing this indirect descriptor table; \field{addr} and \field{len}
-refer to the indirect table address and length in bytes,
-respectively.
-
-The indirect table layout structure looks like this
-(\field{len} is the length of the descriptor that refers to this table,
-which is a variable, so this code won't compile):
-
-\begin{lstlisting}
-struct indirect_descriptor_table {
-        /* The actual descriptors (16 bytes each) */
-        struct virtq_desc desc[len / 16];
-};
-\end{lstlisting}
-
-The first indirect descriptor is located at the start of the indirect
-descriptor table (index 0); additional indirect descriptors are
-chained by \field{next}. An indirect descriptor without a valid \field{next}
-(with \field{flags}\&VIRTQ_DESC_F_NEXT off) signals the end of the chain.
-A single indirect descriptor
-table can include both device-readable and device-writable descriptors.
-
-\drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT
-set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (i.e. only
-one table per descriptor).
-
-A driver MUST NOT create a descriptor chain longer than the Queue Size of
-the device.
-
-A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
-in \field{flags}.
-
-\devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
-The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
-
-The device MUST handle the case of zero or more normal chained
-descriptors followed by a single descriptor with \field{flags}\&VIRTQ_DESC_F_INDIRECT.
-
-\begin{note}
-While unusual (most implementations either create a chain solely using
-non-indirect descriptors, or use a single indirect element), such a
-layout is valid.
-\end{note}
-
-\subsection{The Virtqueue Available Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}
-
-\begin{lstlisting}
-struct virtq_avail {
-#define VIRTQ_AVAIL_F_NO_INTERRUPT      1
-        le16 flags;
-        le16 idx;
-        le16 ring[ /* Queue Size */ ];
-        le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
-};
-\end{lstlisting}
-
-The driver uses the available ring to offer buffers to the
-device: each ring entry refers to the head of a descriptor chain. It is only
-written by the driver and read by the device.
-
-The \field{idx} field indicates where the driver would put the next descriptor
-entry in the ring (modulo the queue size). This starts at 0 and increases.
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to this structure as vring_avail, and the constant as
-VRING_AVAIL_F_NO_INTERRUPT, but the layout and value were identical.
-\end{note}
-
-\subsection{Virtqueue Interrupt Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated,
-the \field{flags} field in the available ring offers a crude mechanism for the driver to inform
-the device that it doesn't want interrupts when buffers are used. Otherwise
-\field{used_event} is a more performant alternative where the driver
-specifies how far the device can progress before interrupting.
-
-Neither of these interrupt suppression methods is reliable, as they
-are not synchronized with the device, but they serve as
-useful optimizations.
-
-\drivernormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The driver MUST set \field{flags} to 0 or 1.
-\item The driver MAY set \field{flags} to 1 to advise
-the device that interrupts are not needed.
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The driver MUST set \field{flags} to 0.
-\item The driver MAY use \field{used_event} to advise the device that interrupts are unnecessary until the device writes an entry with an index specified by \field{used_event} into the used ring (equivalently, until \field{idx} in the
-used ring reaches the value \field{used_event} + 1).
-\end{itemize}
-
-The driver MUST handle spurious interrupts from the device.
-
-\devicenormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The device MUST ignore the \field{used_event} value.
-\item After the device writes a descriptor index into the used ring:
-  \begin{itemize}
-  \item If \field{flags} is 1, the device SHOULD NOT send an interrupt.
-  \item If \field{flags} is 0, the device MUST send an interrupt.
-  \end{itemize}
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The device MUST ignore the lower bit of \field{flags}.
-\item After the device writes a descriptor index into the used ring:
-  \begin{itemize}
-  \item If the \field{idx} field in the used ring (which determined
-    where that descriptor index was placed) was equal to
-    \field{used_event}, the device MUST send an interrupt.
-  \item Otherwise the device SHOULD NOT send an interrupt.
-  \end{itemize}
-\end{itemize}
-
-\begin{note}
-For example, if \field{used_event} is 0, then a device using
-  VIRTIO_F_EVENT_IDX would interrupt after the first buffer is
-  used (and again after the 65537th buffer, etc.).
-\end{note}
-
-\subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-\begin{lstlisting}
-struct virtq_used {
-#define VIRTQ_USED_F_NO_NOTIFY  1
-        le16 flags;
-        le16 idx;
-        struct virtq_used_elem ring[ /* Queue Size */];
-        le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
-};
-
-/* le32 is used here for ids for padding reasons. */
-struct virtq_used_elem {
-        /* Index of start of used descriptor chain. */
-        le32 id;
-        /* Total length of the descriptor chain which was used (written to) */
-        le32 len;
-};
-\end{lstlisting}
-
-The used ring is where the device returns buffers once it is done with
-them: it is only written to by the device, and read by the driver.
-
-Each entry in the ring is a pair: \field{id} indicates the head entry of the
-descriptor chain describing the buffer (this matches an entry
-placed in the available ring by the guest earlier), and \field{len} the total
-number of bytes written into the buffer.
-
-\begin{note}
-\field{len} is particularly useful
-for drivers using untrusted buffers: if a driver does not know exactly
-how much has been written by the device, the driver would have to zero
-the buffer in advance to ensure no data leakage occurs.
-
-For example, a network driver may hand a received buffer directly to
-an unprivileged userspace application. If the network device has not
-overwritten the bytes which were in that buffer, this could leak the
-contents of freed memory from other processes to the application.
-\end{note}
-
-The \field{idx} field indicates where the device would put the next descriptor
-entry in the ring (modulo the queue size). This starts at 0 and increases.
-
-\begin{note}
-The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
-referred to these structures as vring_used and vring_used_elem, and
-the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were
-identical.
-\end{note}
-
-\subsubsection{Legacy Interface: The Virtqueue Used
-Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues
-/ The Virtqueue Used Ring/ Legacy Interface: The Virtqueue Used
-Ring}
-
-Historically, many drivers ignored the \field{len} value; as a
-result, many devices set \field{len} incorrectly. Thus, when
-using the legacy interface, it is generally a good idea to ignore
-the \field{len} value in used ring entries if possible. Specific
-known issues are listed per device type.
-
-\devicenormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-The device MUST set \field{len} prior to updating the used \field{idx}.
-
-The device MUST write at least \field{len} bytes to the descriptor chain,
-beginning at the first device-writable buffer,
-prior to updating the used \field{idx}.
-
-The device MAY write more than \field{len} bytes to the descriptor chain.
-
-\begin{note}
-There are potential error cases where a device might not know what
-parts of the buffers have been written. This is why \field{len} is
-permitted to be an underestimate: that's preferable to the driver believing
-that uninitialized memory has been overwritten when it has not.
-\end{note}
-
-\drivernormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}
-
-The driver MUST NOT make assumptions about data in device-writable buffers
-beyond the first \field{len} bytes, and SHOULD ignore this data.
-
-\subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-
-The device can suppress notifications in a manner analogous to the way
-drivers can suppress interrupts as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.
-The device manipulates \field{flags} or \field{avail_event} in the used ring the
-same way the driver manipulates \field{flags} or \field{used_event} in the available ring.
-
-\drivernormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-
-The driver MUST initialize \field{flags} in the used ring to 0 when
-allocating the used ring.
-
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The driver MUST ignore the \field{avail_event} value.
-\item After the driver writes a descriptor index into the available ring:
-  \begin{itemize}
-  \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
-  \item If \field{flags} is 0, the driver MUST send a notification.
-  \end{itemize}
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The driver MUST ignore the lower bit of \field{flags}.
-\item After the driver writes a descriptor index into the available ring:
-  \begin{itemize}
-  \item If the \field{idx} field in the available ring (which determined
-    where that descriptor index was placed) was equal to
-    \field{avail_event}, the driver MUST send a notification.
-  \item Otherwise the driver SHOULD NOT send a notification.
-  \end{itemize}
-\end{itemize}
-
-\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
-If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
-\begin{itemize}
-\item The device MUST set \field{flags} to 0 or 1.
-\item The device MAY set \field{flags} to 1 to advise
-the driver that notifications are not needed.
-\end{itemize}
-
-Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
-\begin{itemize}
-\item The device MUST set \field{flags} to 0.
-\item The device MAY use \field{avail_event} to advise the driver that notifications are unnecessary until the driver writes an entry with an index specified by \field{avail_event} into the available ring (equivalently, until \field{idx} in the
-available ring reaches the value \field{avail_event} + 1).
-\end{itemize}
-
-The device MUST handle spurious notifications from the driver.
-
-\subsection{Helpers for Operating Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Helpers for Operating Virtqueues}
-
-The Linux kernel source code contains the definitions above and
-helper routines in a more usable form, in
-include/uapi/linux/virtio_ring.h. This was explicitly licensed by IBM
-and Red Hat under the (3-clause) BSD license so that it can be
-freely used by all other projects, and is reproduced (with slight
-variation) in \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}.
+\input{split-ring.tex}
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
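The section removed here, now pulled in from split-ring.tex, depends on a few small computations. These are the per-part sizes from the Alignment/Size table, the padded legacy layout size, the event-index test behind interrupt and notification suppression, and the rule that a descriptor chain cannot be longer than Queue Size. The C sketch below illustrates these computations under stated assumptions. It is not code from the specification or from Linux.

The following assumptions apply:

* All names (desc_table_size, avail_ring_size, used_ring_size, align_up, legacy_virtq_size, need_event, struct sketch_desc, chain_length) are hypothetical.
* Fields are read in native byte order rather than converted from le16/le32/le64.
* qalign is a power of two.
* need_event uses the same wrap-around comparison as the vring_need_event() helper in include/uapi/linux/virtio_ring.h, applied to a batch of entries that moves idx from old_idx to new_idx.

```c
#include <stddef.h>
#include <stdint.h>

/* Part sizes (bytes) from the split virtqueue Alignment/Size table. */
static inline size_t desc_table_size(unsigned int qsz)
{
    return 16u * (size_t)qsz;          /* 16-byte descriptors */
}

static inline size_t avail_ring_size(unsigned int qsz)
{
    return 6u + 2u * (size_t)qsz;      /* flags, idx, used_event + le16 ring */
}

static inline size_t used_ring_size(unsigned int qsz)
{
    return 6u + 8u * (size_t)qsz;      /* flags, idx, avail_event + 8-byte elems */
}

/* Round x up to the next multiple of align (align must be a power of two). */
static inline size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Legacy layout: descriptor table plus available ring padded to Queue
 * Align, then the used ring, also padded to Queue Align. */
static inline size_t legacy_virtq_size(unsigned int qsz, size_t qalign)
{
    return align_up(desc_table_size(qsz) + avail_ring_size(qsz), qalign)
         + align_up(used_ring_size(qsz), qalign);
}

/* Nonzero if advancing a ring's idx from old_idx to new_idx wrote the
 * entry at position event_idx; all arithmetic wraps modulo 2^16. */
static inline int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical native-endian descriptor (the spec uses le64/le32/le16). */
struct sketch_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

#define SKETCH_DESC_F_NEXT 1   /* same value as VIRTQ_DESC_F_NEXT */

/* Count the descriptors in the chain starting at head. Returns -1 if an
 * index is out of range or the chain is longer than qsz, which is also
 * how a loop in the chain is detected. */
static inline int chain_length(const struct sketch_desc *table,
                               unsigned int qsz, uint16_t head)
{
    unsigned int count = 0;
    uint16_t i = head;

    for (;;) {
        if (i >= qsz || ++count > qsz)
            return -1;
        if (!(table[i].flags & SKETCH_DESC_F_NEXT))
            return (int)count;
        i = table[i].next;
    }
}
```

Consider a 256-entry queue with a 4096-byte Queue Align. The descriptor table and available ring take 4096 + 518 = 4614 bytes, which rounds up to 8192. The 2054-byte used ring rounds up to 4096. legacy_virtq_size() therefore gives 12288 bytes. With used_event set to 0, need_event(0, 1, 0) is true for the first used buffer, which matches the note on interrupt suppression.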