\section{Split Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Split Virtqueues}
The split virtqueue format was the only format supported by
version 1.0 (and earlier) of this standard.

The split virtqueue format separates the virtqueue into several
parts, where each part is writable by either the driver or the
device, but not both. Multiple parts and/or locations within
a part need to be updated when making a buffer
available and when marking it as used.

Each queue has a 16-bit queue size
parameter, which sets the number of entries and implies the total size
of the queue.

Each virtqueue consists of three parts:

\begin{itemize}
\item Descriptor Table - occupies the Descriptor Area
\item Available Ring - occupies the Driver Area
\item Used Ring - occupies the Device Area
\end{itemize}

where each part is physically-contiguous in guest memory,
and has different alignment requirements.

The memory alignment and size requirements, in bytes, of each part of the
virtqueue are summarized in the following table:

\begin{tabular}{|l|l|l|}
\hline
Virtqueue Part & Alignment & Size \\
\hline \hline
Descriptor Table & 16 & $16 * $(Queue Size) \\
\hline
Available Ring & 2 & $6 + 2 * $(Queue Size) \\
\hline
Used Ring & 4 & $6 + 8 * $(Queue Size) \\
\hline
\end{tabular}

The Alignment column gives the minimum alignment for each part
of the virtqueue.

The Size column gives the total number of bytes for each
part of the virtqueue.

Queue Size corresponds to the maximum number of buffers in the
virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers
can be queued at any given time.}. The Queue Size value is always a
power of 2. The maximum Queue Size value is 32768. The Queue Size value
is specified in a bus-specific way.

When the driver wants to send a buffer to the device, it fills in
a slot in the descriptor table (or chains several together), and
writes the descriptor index into the available ring. It then
notifies the device.
When the device has finished a buffer, it
writes the descriptor index into the used ring, and sends an interrupt.

\drivernormative{\subsection}{Virtqueues}{Basic Facilities of a Virtio Device / Virtqueues}
The driver MUST ensure that the physical address of the first byte
of each virtqueue part is a multiple of the specified alignment value
in the above table.

\subsection{Legacy Interfaces: A Note on Virtqueue Layout}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Layout}

For Legacy Interfaces, several additional
restrictions are placed on the virtqueue layout:

Each virtqueue occupies two or more physically-contiguous pages
(usually defined as 4096 bytes, but depending on the transport;
henceforth referred to as Queue Align)
and consists of three parts:

\begin{tabular}{|l|l|l|}
\hline
Descriptor Table & Available Ring (\ldots padding\ldots) & Used Ring \\
\hline
\end{tabular}

The bus-specific Queue Size field controls the total number of bytes
for the virtqueue.
When using the legacy interface, the transitional
driver MUST retrieve the Queue Size field from the device
and MUST allocate the total number of bytes for the virtqueue
according to the following formula (Queue Align given in qalign and
Queue Size given in qsz):

\begin{lstlisting}
/* Round x up to the next multiple of qalign (a power of 2). */
#define ALIGN(x) (((x) + qalign - 1) & ~(qalign - 1))
static inline unsigned virtq_size(unsigned int qsz)
{
     return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
          + ALIGN(sizeof(u16)*3 + sizeof(struct virtq_used_elem)*qsz);
}
\end{lstlisting}

This wastes some space with padding.
When using the legacy interface, both transitional
devices and drivers MUST use the following virtqueue layout
structure to locate elements of the virtqueue:

\begin{lstlisting}
struct virtq {
        // The actual descriptors (16 bytes each)
        struct virtq_desc desc[ Queue Size ];

        // A ring of available descriptor heads with free-running index.
        struct virtq_avail avail;

        // Padding to the next Queue Align boundary.
        u8 pad[ Padding ];

        // A ring of used descriptor heads with free-running index.
        struct virtq_used used;
};
\end{lstlisting}

\subsection{Legacy Interfaces: A Note on Virtqueue Endianness}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on Virtqueue Endianness}

Note that when using the legacy interface, transitional
devices and drivers MUST use the native endianness of the guest
for the fields in the virtqueue.
This contrasts with the little-endian format that this standard
specifies for the non-legacy interface.
It is assumed that the host is already aware of the guest endianness.

\subsection{Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}
The framing of messages with descriptors is
independent of the contents of the buffers. For example, a network
transmit buffer consists of a 12 byte header followed by the network
packet. This could be most simply placed in the descriptor table as a
12 byte output descriptor followed by a 1514 byte output descriptor,
but it could also consist of a single 1526 byte output descriptor in
the case where the header and packet are adjacent, or even three or
more descriptors (possibly with loss of efficiency in that case).

Note that some device implementations have large-but-reasonable
restrictions on total descriptor size (for example, based on IOV_MAX in the host
OS). This has not been a problem in practice: little sympathy will
be given to drivers which create unreasonably-sized descriptors
such as by dividing a network packet into 1500 single-byte
descriptors!

\devicenormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
The device MUST NOT make assumptions about the particular arrangement of
descriptors. The device MAY have a reasonable limit of descriptors it
will allow in a chain.
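
As a non-normative illustration of these two device requirements, the
following sketch shows one way a device could walk a descriptor chain
without assuming how the message is split across descriptors, while
enforcing a limit on chain length. The names chain_summary,
virtq_walk_chain and max_descs are hypothetical and not defined by this
standard. The sketch assumes that descriptor fields have already been
converted to host byte order, and its struct virtq_desc mirrors the
layout given in the descriptor table section of this chapter.

\begin{lstlisting}
#include <stddef.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT  1
#define VIRTQ_DESC_F_WRITE 2

/* Host-endian mirror of struct virtq_desc (assumption: already converted). */
struct virtq_desc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

/* Hypothetical result type: only totals, no per-descriptor layout. */
struct chain_summary {
        uint64_t readable_bytes;
        uint64_t writable_bytes;
        unsigned int ndescs;
};

/*
 * Walk the chain starting at head. Returns 0 on success, or -1 if the
 * chain exceeds max_descs (this also catches loops), refers to an index
 * outside the table, or places a device-readable descriptor after a
 * device-writable one.
 */
static int virtq_walk_chain(const struct virtq_desc *table, uint16_t qsz,
                            uint16_t head, unsigned int max_descs,
                            struct chain_summary *out)
{
        uint16_t i = head;
        int seen_writable = 0;

        out->readable_bytes = 0;
        out->writable_bytes = 0;
        out->ndescs = 0;

        for (;;) {
                if (i >= qsz || out->ndescs >= max_descs)
                        return -1;

                const struct virtq_desc *d = &table[i];
                out->ndescs++;

                if (d->flags & VIRTQ_DESC_F_WRITE) {
                        seen_writable = 1;
                        out->writable_bytes += d->len;
                } else {
                        if (seen_writable)
                                return -1;
                        out->readable_bytes += d->len;
                }

                if (!(d->flags & VIRTQ_DESC_F_NEXT))
                        return 0;
                i = d->next;
        }
}
\end{lstlisting}

With this approach, a 12 byte descriptor followed by a 1514 byte
descriptor and a single 1526 byte descriptor both yield the same
readable total, so later parsing does not depend on the framing.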
\drivernormative{\subsubsection}{Message Framing}{Basic Facilities of a Virtio Device / Message Framing}
The driver MUST place any device-writable descriptor elements after
any device-readable descriptor elements.

The driver SHOULD NOT use an excessive number of descriptors to
describe a buffer.

\subsubsection{Legacy Interface: Message Framing}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing / Legacy Interface: Message Framing}

Regrettably, initial driver implementations used simple layouts, and
devices came to rely on them, despite the wording of this specification. In
addition, the specification for virtio_blk SCSI commands required
intuiting field lengths from frame boundaries (see
\ref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}~\nameref{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}).

Thus when using the legacy interface, the VIRTIO_F_ANY_LAYOUT
feature indicates to both the device and the driver that no
assumptions were made about framing. Requirements for
transitional drivers when this is not negotiated are included in
each device section.

\subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}

The descriptor table refers to the buffers the driver is using for
the device. \field{addr} is a physical address, and the buffers
can be chained via \field{next}. Each descriptor describes a
buffer which is read-only for the device (``device-readable'') or write-only for the device (``device-writable''), but a chain of
descriptors can contain both device-readable and device-writable buffers.

The actual contents of the memory offered to the device depends on the
device type. The most common arrangement is to begin the data with a header
(containing little-endian fields) for the device to read, and to end it
with a status trailer for the device to write.
\begin{lstlisting}
struct virtq_desc {
        /* Address (guest-physical). */
        le64 addr;
        /* Length. */
        le32 len;

/* This marks a buffer as continuing via the next field. */
#define VIRTQ_DESC_F_NEXT   1
/* This marks a buffer as device write-only (otherwise device read-only). */
#define VIRTQ_DESC_F_WRITE     2
/* This means the buffer contains a list of buffer descriptors. */
#define VIRTQ_DESC_F_INDIRECT   4
        /* The flags as indicated above. */
        le16 flags;
        /* Next field if flags & NEXT */
        le16 next;
};
\end{lstlisting}

The number of descriptors in the table is defined by the queue size
for this virtqueue: this is the maximum possible descriptor chain length.

If VIRTIO_F_IN_ORDER has been negotiated, the driver uses
descriptors in ring order: starting from offset 0 in the table,
and wrapping around at the end of the table.

\begin{note}
The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
referred to this structure as vring_desc, and the constants as
VRING_DESC_F_NEXT, etc, but the layout and values were identical.
\end{note}

\devicenormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
read a device-writable buffer (it MAY do so for debugging or diagnostic purposes).

\drivernormative{\subsubsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}
Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
this implies that loops in the descriptor chain are forbidden!

If VIRTIO_F_IN_ORDER has been negotiated, and when making a
descriptor with VIRTQ_DESC_F_NEXT set in \field{flags} at offset
$x$ in the table available to the device, the driver MUST set
\field{next} to $0$ for the last descriptor in the table
(where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
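
The two driver requirements above reduce to simple arithmetic. The
following is a minimal sketch, not part of this standard; the helper
names virtq_in_order_next and virtq_chain_len_ok are hypothetical.

\begin{lstlisting}
#include <stdint.h>

/*
 * Value the driver writes to next for the descriptor at offset x when
 * VIRTIO_F_IN_ORDER is negotiated and VIRTQ_DESC_F_NEXT is set:
 * 0 for the last table entry, x + 1 otherwise.
 */
static uint16_t virtq_in_order_next(uint16_t x, uint16_t queue_size)
{
        return (x == queue_size - 1) ? 0 : (uint16_t)(x + 1);
}

/*
 * Returns 1 if the element lengths of a chain sum to at most 2^32
 * bytes, 0 otherwise. A 64-bit accumulator avoids overflow.
 */
static int virtq_chain_len_ok(const uint32_t *lens, unsigned int n)
{
        uint64_t total = 0;

        for (unsigned int i = 0; i < n; i++) {
                total += lens[i];
                if (total > (UINT64_C(1) << 32))
                        return 0;
        }
        return 1;
}
\end{lstlisting}

Because Queue Size is a power of 2, the first helper could equally be
written as a mask of $x + 1$ with $queue\_size - 1$.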
\subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}

Some devices benefit by concurrently dispatching a large number
of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
ring capacity the driver can store a table of indirect
descriptors anywhere in memory, and insert a descriptor in the main
virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to
a memory buffer
containing this indirect descriptor table; \field{addr} and \field{len}
refer to the indirect table address and length in bytes,
respectively.

The indirect table layout structure looks like this
(\field{len} is the length of the descriptor that refers to this table,
which is a variable, so this code won't compile):

\begin{lstlisting}
struct indirect_descriptor_table {
        /* The actual descriptors (16 bytes each) */
        struct virtq_desc desc[len / 16];
};
\end{lstlisting}

The first indirect descriptor is located at the start of the indirect
descriptor table (index 0); additional indirect descriptors are
chained by \field{next}. An indirect descriptor without a valid \field{next}
(with \field{flags}\&VIRTQ_DESC_F_NEXT off) signals the end of the chain.
A single indirect descriptor
table can include both device-readable and device-writable descriptors.

If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
use sequential indices, in-order: index 0 followed by index 1
followed by index 2, etc.

\drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
VIRTIO_F_INDIRECT_DESC feature was negotiated. The driver MUST NOT
set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (i.e. only
one table per descriptor).
A driver MUST NOT create a descriptor chain longer than the Queue Size of
the device.

A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
in \field{flags}.

If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
MUST appear sequentially, with \field{next} taking the value
of 1 for the 1st descriptor, 2 for the 2nd one, etc.

\devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.

The device MUST handle the case of zero or more normal chained
descriptors followed by a single descriptor with \field{flags}\&VIRTQ_DESC_F_INDIRECT.

\begin{note}
While unusual (most implementations either create a chain solely using
non-indirect descriptors, or use a single indirect element), such a
layout is valid.
\end{note}

\subsection{The Virtqueue Available Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}

\begin{lstlisting}
struct virtq_avail {
#define VIRTQ_AVAIL_F_NO_INTERRUPT      1
        le16 flags;
        le16 idx;
        le16 ring[ /* Queue Size */ ];
        le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */
};
\end{lstlisting}

The driver uses the available ring to offer buffers to the
device: each ring entry refers to the head of a descriptor chain. It is only
written by the driver and read by the device.

The \field{idx} field indicates where the driver would put the next descriptor
entry in the ring (modulo the queue size). This starts at 0, and increases.

\begin{note}
The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
referred to this structure as vring_avail, and the constant as
VRING_AVAIL_F_NO_INTERRUPT, but the layout and value were identical.
\end{note}

\drivernormative{\subsubsection}{The Virtqueue Available Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}
A driver MUST NOT decrement the available \field{idx} on a virtqueue (i.e.
there is no way to ``unexpose'' buffers).

\subsection{Virtqueue Interrupt Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}

If the VIRTIO_F_EVENT_IDX feature bit is not negotiated,
the \field{flags} field in the available ring offers a crude mechanism for the driver to inform
the device that it doesn't want interrupts when buffers are used. Otherwise
\field{used_event} is a more performant alternative where the driver
specifies how far the device can progress before interrupting.

Neither of these interrupt suppression methods is reliable, as they
are not synchronized with the device, but they serve as
useful optimizations.

\drivernormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
\begin{itemize}
\item The driver MUST set \field{flags} to 0 or 1.
\item The driver MAY set \field{flags} to 1 to advise
the device that interrupts are not needed.
\end{itemize}

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
\begin{itemize}
\item The driver MUST set \field{flags} to 0.
\item The driver MAY use \field{used_event} to advise the device that interrupts are unnecessary until the device writes an entry with an index specified by \field{used_event} into the used ring (equivalently, until \field{idx} in the
used ring reaches the value \field{used_event} + 1).
\end{itemize}

The driver MUST handle spurious interrupts from the device.
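
As a non-normative sketch of the VIRTIO_F_EVENT_IDX case, the code below
shows how a driver might locate \field{used_event}, which sits directly
after the last entry of \field{ring}, and set it so that the device is
asked to interrupt only after a given number of further buffers have been
used. The helper names and the last_seen_used and pending parameters are
hypothetical, and a little-endian host is assumed so that no byte
swapping is shown.

\begin{lstlisting}
#include <stdint.h>

/*
 * Fixed part of struct virtq_avail. ring[] holds Queue Size entries and
 * is followed by used_event when VIRTIO_F_EVENT_IDX is negotiated.
 */
struct virtq_avail {
        uint16_t flags;
        uint16_t idx;
        uint16_t ring[];
};

/* used_event occupies the 16-bit slot right after the last ring entry. */
static uint16_t *virtq_used_event(struct virtq_avail *avail, uint16_t qsz)
{
        return &avail->ring[qsz];
}

/*
 * Advise the device that no interrupt is needed until it has used
 * `pending` more buffers (pending >= 1) beyond the last_seen_used
 * buffers the driver has already processed. Wraps modulo 65536.
 */
static void virtq_request_interrupt_after(struct virtq_avail *avail,
                                          uint16_t qsz,
                                          uint16_t last_seen_used,
                                          uint16_t pending)
{
        avail->flags = 0; /* MUST be 0 when VIRTIO_F_EVENT_IDX is negotiated */
        *virtq_used_event(avail, qsz) =
                (uint16_t)(last_seen_used + pending - 1);
}
\end{lstlisting}

Passing a pending value of 1 requests an interrupt for the very next used
buffer. Since the mechanism is only advisory, the driver still has to
tolerate interrupts that arrive earlier.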
\devicenormative{\subsubsection}{Virtqueue Interrupt Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
\begin{itemize}
\item The device MUST ignore the \field{used_event} value.
\item After the device writes a descriptor index into the used ring:
  \begin{itemize}
  \item If \field{flags} is 1, the device SHOULD NOT send an interrupt.
  \item If \field{flags} is 0, the device MUST send an interrupt.
  \end{itemize}
\end{itemize}

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
\begin{itemize}
\item The device MUST ignore the lower bit of \field{flags}.
\item After the device writes a descriptor index into the used ring:
  \begin{itemize}
  \item If the \field{idx} field in the used ring (which determined
    where that descriptor index was placed) was equal to
    \field{used_event}, the device MUST send an interrupt.
  \item Otherwise the device SHOULD NOT send an interrupt.
  \end{itemize}
\end{itemize}

\begin{note}
For example, if \field{used_event} is 0, then a device using
  VIRTIO_F_EVENT_IDX would interrupt after the first buffer is
  used (and again after the 65537th buffer, etc).
\end{note}

\subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}

\begin{lstlisting}
struct virtq_used {
#define VIRTQ_USED_F_NO_NOTIFY  1
        le16 flags;
        le16 idx;
        struct virtq_used_elem ring[ /* Queue Size */];
        le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
};

/* le32 is used here for ids for padding reasons. */
struct virtq_used_elem {
        /* Index of start of used descriptor chain. */
        le32 id;
        /* Total length of the descriptor chain which was used (written to) */
        le32 len;
};
\end{lstlisting}

The used ring is where the device returns buffers once it is done with
them: it is only written to by the device, and read by the driver.

Each entry in the ring is a pair: \field{id} indicates the head entry of the
descriptor chain describing the buffer (this matches an entry
placed in the available ring by the guest earlier), and \field{len} is the total
number of bytes written into the buffer.

\begin{note}
\field{len} is particularly useful
for drivers using untrusted buffers: if a driver does not know exactly
how much has been written by the device, the driver would have to zero
the buffer in advance to ensure no data leakage occurs.

For example, a network driver may hand a received buffer directly to
an unprivileged userspace application. If the network device has not
overwritten the bytes which were in that buffer, this could leak the
contents of freed memory from other processes to the application.
\end{note}

The \field{idx} field indicates where the device would put the next descriptor
entry in the ring (modulo the queue size). This starts at 0, and increases.

\begin{note}
The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
referred to these structures as vring_used and vring_used_elem, and
the constant as VRING_USED_F_NO_NOTIFY, but the layout and value were identical.
\end{note}

\subsubsection{Legacy Interface: The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring/ Legacy Interface: The Virtqueue Used Ring}

Historically, many drivers ignored the \field{len} value; as a
result, many devices set \field{len} incorrectly. Thus, when
using the legacy interface, it is generally a good idea to ignore
the \field{len} value in used ring entries if possible. Specific
known issues are listed per device type.

\devicenormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}

The device MUST set \field{len} prior to updating the used \field{idx}.

The device MUST write at least \field{len} bytes to the descriptor,
beginning at the first device-writable buffer,
prior to updating the used \field{idx}.

The device MAY write more than \field{len} bytes to the descriptor.

\begin{note}
There are potential error cases where a device might not know what
parts of the buffers have been written. This is why \field{len} is
permitted to be an underestimate: that's preferable to the driver believing
that uninitialized memory has been overwritten when it has not.
\end{note}

\drivernormative{\subsubsection}{The Virtqueue Used Ring}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring}

The driver MUST NOT make assumptions about data in device-writable buffers
beyond the first \field{len} bytes, and SHOULD ignore this data.

\subsection{Virtqueue Notification Suppression}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}

The device can suppress notifications in a manner analogous to the way
drivers can suppress interrupts, as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.
The device manipulates \field{flags} or \field{avail_event} in the used ring the
same way the driver manipulates \field{flags} or \field{used_event} in the available ring.

\drivernormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}

The driver MUST initialize \field{flags} in the used ring to 0 when
allocating the used ring.

If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
\begin{itemize}
\item The driver MUST ignore the \field{avail_event} value.
\item After the driver writes a descriptor index into the available ring:
  \begin{itemize}
  \item If \field{flags} is 1, the driver SHOULD NOT send a notification.
  \item If \field{flags} is 0, the driver MUST send a notification.
  \end{itemize}
\end{itemize}

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
\begin{itemize}
\item The driver MUST ignore the lower bit of \field{flags}.
\item After the driver writes a descriptor index into the available ring:
  \begin{itemize}
  \item If the \field{idx} field in the available ring (which determined
    where that descriptor index was placed) was equal to
    \field{avail_event}, the driver MUST send a notification.
  \item Otherwise the driver SHOULD NOT send a notification.
  \end{itemize}
\end{itemize}

\devicenormative{\subsubsection}{Virtqueue Notification Suppression}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}
If the VIRTIO_F_EVENT_IDX feature bit is not negotiated:
\begin{itemize}
\item The device MUST set \field{flags} to 0 or 1.
\item The device MAY set \field{flags} to 1 to advise
the driver that notifications are not needed.
\end{itemize}

Otherwise, if the VIRTIO_F_EVENT_IDX feature bit is negotiated:
\begin{itemize}
\item The device MUST set \field{flags} to 0.
\item The device MAY use \field{avail_event} to advise the driver that notifications are unnecessary until the driver writes an entry with an index specified by \field{avail_event} into the available ring (equivalently, until \field{idx} in the
available ring reaches the value \field{avail_event} + 1).
\end{itemize}

The device MUST handle spurious notifications from the driver.

\subsection{Helpers for Operating Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Helpers for Operating Virtqueues}

The Linux kernel source code contains the definitions above and
helper routines in a more usable form, in
include/uapi/linux/virtio_ring.h. This was explicitly licensed
by IBM and Red Hat under the (3-clause) BSD license so that it can be
freely used by all other projects, and is reproduced (with slight
variation) in \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}.
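
One such helper decides whether an event (an interrupt or a
notification) is due when VIRTIO_F_EVENT_IDX is negotiated. The sketch
below is modeled on the need-event helper found in that header. It is
shown here for illustration only and uses plain C integer types, with
all values assumed to be in host byte order.

\begin{lstlisting}
#include <stdint.h>

/*
 * Returns nonzero if advancing the ring index from old_idx to new_idx
 * wrote the entry whose index is event_idx, that is, if the other side
 * asked to be told about one of the entries just added. All arithmetic
 * is modulo 65536, so the test stays correct across index wrap-around.
 */
static int virtq_need_event(uint16_t event_idx, uint16_t new_idx,
                            uint16_t old_idx)
{
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
}
\end{lstlisting}

For a single added entry (new_idx equal to old_idx + 1) this reduces to
the rule stated in the normative text: the event is sent exactly when
the index at which the entry was placed equals the event value. For a
batch, it also covers the case where the requested entry lies anywhere
inside the batch.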
\subsection{Virtqueue Operation}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Operation}

There are two parts to virtqueue operation: supplying new
available buffers to the device, and processing used buffers from
the device.

\begin{note}
As an
example, the simplest virtio network device has two virtqueues: the
transmit virtqueue and the receive virtqueue. The driver adds
outgoing (device-readable) packets to the transmit virtqueue, and then
frees them after they are used. Similarly, incoming (device-writable)
buffers are added to the receive virtqueue, and processed after
they are used.
\end{note}

What follows are the requirements of each of these two parts
when using the split virtqueue format, in more detail.

\subsection{Supplying Buffers to The Device}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device}

The driver offers buffers to one of the device's virtqueues as follows:

\begin{enumerate}
\item\label{itm:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Place Buffers} The driver places the buffer into free descriptor(s) in the
   descriptor table, chaining as necessary (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}).

\item\label{itm:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Place Index} The driver places the index of the head of the descriptor chain
   into the next ring entry of the available ring.

\item Steps \ref{itm:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Place Buffers} and \ref{itm:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Place Index} MAY be performed repeatedly if batching
  is possible.

\item The driver performs a suitable memory barrier to ensure the device sees
  the updated descriptor table and available ring before the next
  step.

\item The available \field{idx} is increased by the number of
  descriptor chain heads added to the available ring.

\item The driver performs a suitable memory barrier to ensure that it updates
  the \field{idx} field before checking for notification suppression.

\item If notifications are not suppressed, the driver notifies the device
    of the new available buffers.
\end{enumerate}

Note that the above steps do not take precautions against the
available ring buffer wrapping around: this is not possible since
the ring buffer is the same size as the descriptor table, so step
(1) will prevent such a condition.

In addition, the maximum queue size is 32768 (the highest power
of 2 which fits in 16 bits), so the 16-bit \field{idx} value can always
distinguish between a full and empty buffer.

What follows are the requirements of each stage in more detail.

\subsubsection{Placing Buffers Into The Descriptor Table}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Placing Buffers Into The Descriptor Table}

A buffer consists of zero or more device-readable physically-contiguous
elements followed by zero or more physically-contiguous
device-writable elements (each buffer has at least one element). This
algorithm maps it into the descriptor table to form a descriptor
chain:

for each buffer element, b:

\begin{enumerate}
\item Get the next free descriptor table entry, d
\item Set \field{d.addr} to the physical address of the start of b
\item Set \field{d.len} to the length of b.
\item If b is device-writable, set \field{d.flags} to VIRTQ_DESC_F_WRITE,
    otherwise 0.
\item If there is a buffer element after this:
    \begin{enumerate}
    \item Set \field{d.next} to the index of the next free descriptor
      element.
    \item Set the VIRTQ_DESC_F_NEXT bit in \field{d.flags}.
    \end{enumerate}
\end{enumerate}

In practice, \field{d.next} is usually used to chain free
descriptors, and a separate count kept to check there are enough
free descriptors before beginning the mappings.

\subsubsection{Updating The Available Ring}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Updating The Available Ring}

The descriptor chain head is the first d in the algorithm
above, i.e. the index of the descriptor table entry referring to the first
part of the buffer. A naive driver implementation MAY do the following (with the
appropriate conversion to-and-from little-endian assumed):

\begin{lstlisting}
avail->ring[avail->idx % qsz] = head;
\end{lstlisting}

However, in general the driver MAY add many descriptor chains before it updates
\field{idx} (at which point they become visible to the
device), so it is common to keep a counter of how many the driver has added:

\begin{lstlisting}
avail->ring[(avail->idx + added++) % qsz] = head;
\end{lstlisting}

\subsubsection{Updating \field{idx}}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Updating idx}

\field{idx} always increments, and wraps naturally at
65536:

\begin{lstlisting}
avail->idx += added;
\end{lstlisting}

Once available \field{idx} is updated by the driver, this exposes the
descriptor and its contents. The device MAY
access the descriptor chains the driver created and the
memory they refer to immediately.

\drivernormative{\paragraph}{Updating idx}{Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Updating idx}
The driver MUST perform a suitable memory barrier before the \field{idx} update, to ensure the
device sees the most up-to-date copy.
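
The steps described so far (placing a buffer into the descriptor table,
updating the available ring, and updating \field{idx}) can be sketched
together as follows. This is a simplified, non-normative illustration:
drv_vq, buf_elem, QSZ and the function names are hypothetical, the ring
has a fixed example size, a little-endian host is assumed, and C11
fences stand in for whatever barriers the platform actually requires
for memory shared with a device. The caller is assumed to list
device-readable elements before device-writable ones.

\begin{lstlisting}
#include <stdatomic.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT  1
#define VIRTQ_DESC_F_WRITE 2
#define QSZ 8 /* example Queue Size; must be a power of 2 */

struct virtq_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct virtq_avail { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };

/* One buffer element: guest-physical address, length, direction. */
struct buf_elem { uint64_t addr; uint32_t len; int device_writable; };

/* Hypothetical driver-side bookkeeping. */
struct drv_vq {
        struct virtq_desc desc[QSZ];
        struct virtq_avail avail;
        uint16_t free_head; /* first free descriptor, chained via next */
        uint16_t num_free;
        uint16_t added;     /* heads in the ring but not yet published */
};

static void drv_vq_init(struct drv_vq *vq)
{
        for (uint16_t i = 0; i < QSZ; i++)
                vq->desc[i].next = (uint16_t)(i + 1);
        vq->free_head = 0;
        vq->num_free = QSZ;
        vq->added = 0;
        vq->avail.flags = 0;
        vq->avail.idx = 0;
}

/* Build the chain and place its head in the available ring.
 * Returns the head index, or -1 if there are too few free descriptors. */
static int drv_vq_add(struct drv_vq *vq, const struct buf_elem *e, uint16_t n)
{
        if (n == 0 || n > vq->num_free)
                return -1;

        uint16_t head = vq->free_head, i = head;
        for (uint16_t k = 0; k < n; k++) {
                struct virtq_desc *d = &vq->desc[i];
                uint16_t next_free = d->next;

                d->addr = e[k].addr;
                d->len = e[k].len;
                d->flags = e[k].device_writable ? VIRTQ_DESC_F_WRITE : 0;
                if (k + 1 < n)
                        d->flags |= VIRTQ_DESC_F_NEXT; /* next already links on */
                i = next_free;
        }
        vq->free_head = i;
        vq->num_free -= n;
        vq->avail.ring[(uint16_t)(vq->avail.idx + vq->added++) % QSZ] = head;
        return head;
}

/* Barrier, expose the new heads by advancing idx, then barrier again
 * before the caller checks for notification suppression. */
static void drv_vq_publish(struct drv_vq *vq)
{
        atomic_thread_fence(memory_order_release);
        vq->avail.idx = (uint16_t)(vq->avail.idx + vq->added);
        vq->added = 0;
        atomic_thread_fence(memory_order_seq_cst);
}
\end{lstlisting}

Calling drv_vq_add several times before a single drv_vq_publish
corresponds to the batching permitted by the procedure above: the device
cannot see any of the new chains until \field{idx} changes.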
\subsubsection{Notifying The Device}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Notifying The Device}

The actual method of device notification is bus-specific, but generally
it can be expensive. So the device MAY suppress such notifications if it
doesn't need them, as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Notification Suppression}.

The driver has to be careful to expose the new \field{idx}
value before checking if notifications are suppressed.

\drivernormative{\paragraph}{Notifying The Device}{Basic Facilities of a Virtio Device / Virtqueues / Supplying Buffers to The Device / Notifying The Device}
The driver MUST perform a suitable memory barrier before reading \field{flags} or
\field{avail_event}, to avoid missing a notification.

\subsection{Receiving Used Buffers From The Device}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / Receiving Used Buffers From The Device}

Once the device has used buffers referred to by a descriptor (read from or written to them, or
parts of both, depending on the nature of the virtqueue and the
device), it interrupts the driver as detailed in section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}.

\begin{note}
For optimal performance, a driver MAY disable interrupts while processing
the used ring, but beware the problem of missing interrupts between
emptying the ring and reenabling interrupts. This is usually handled by
re-checking for more used buffers after interrupts are re-enabled:

\begin{lstlisting}
virtq_disable_interrupts(vq);

for (;;) {
        if (vq->last_seen_used == le16_to_cpu(virtq->used.idx)) {
                /* Ring looks empty: re-enable interrupts, then re-check. */
                virtq_enable_interrupts(vq);
                mb();

                if (vq->last_seen_used == le16_to_cpu(virtq->used.idx))
                        break;

                /* More buffers arrived in the meantime: keep polling. */
                virtq_disable_interrupts(vq);
        }

        struct virtq_used_elem *e = &virtq->used.ring[vq->last_seen_used % qsz];
        process_buffer(e);
        vq->last_seen_used++;
}
\end{lstlisting}
\end{note}
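
The loop in the note above leaves the per-buffer work unspecified. As a
non-normative sketch of what that work often includes, the code below
takes one element from the used ring, reports its \field{id} and
\field{len}, and returns the descriptors of the chain to a free list
kept by the driver. The names drv_used_state, drv_pop_used and QSZ are
hypothetical, a little-endian host is assumed, and the C11 acquire fence
stands in for whatever read barrier the platform requires.

\begin{lstlisting}
#include <stdatomic.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT 1
#define QSZ 8 /* example Queue Size; must be a power of 2 */

struct virtq_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct virtq_used_elem { uint32_t id; uint32_t len; };
struct virtq_used { uint16_t flags; uint16_t idx; struct virtq_used_elem ring[QSZ]; };

/* Hypothetical driver-side bookkeeping for the used ring. */
struct drv_used_state {
        struct virtq_desc *desc;
        struct virtq_used *used;
        uint16_t last_seen_used;
        uint16_t free_head; /* first free descriptor, chained via next */
        uint16_t num_free;
};

/*
 * Returns 1 and fills *id and *len if a used element was consumed,
 * 0 if the used ring is empty, and -1 if the device reported an id
 * outside the descriptor table.
 */
static int drv_pop_used(struct drv_used_state *s, uint32_t *id, uint32_t *len)
{
        if (s->last_seen_used == s->used->idx)
                return 0;

        /* Read the element only after observing the idx that covers it. */
        atomic_thread_fence(memory_order_acquire);

        const struct virtq_used_elem *e =
                &s->used->ring[s->last_seen_used % QSZ];
        if (e->id >= QSZ)
                return -1;
        *id = e->id;
        *len = e->len;
        s->last_seen_used++;

        /* Find the tail of the chain, then splice it onto the free list. */
        uint16_t i = (uint16_t)e->id;
        uint16_t n = 1;
        while (s->desc[i].flags & VIRTQ_DESC_F_NEXT) {
                i = s->desc[i].next;
                n++;
        }
        s->desc[i].next = s->free_head;
        s->free_head = (uint16_t)e->id;
        s->num_free += n;
        return 1;
}
\end{lstlisting}

Comparing last_seen_used with \field{idx} for equality, rather than
with a less-than test, keeps the check valid when the 16-bit index
wraps at 65536.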