Diffstat (limited to 'content.tex')
-rw-r--r-- content.tex 434
1 files changed, 221 insertions, 213 deletions
diff --git a/content.tex b/content.tex
index 37850c8..9713817 100644
--- a/content.tex
+++ b/content.tex
@@ -809,35 +809,138 @@ All drivers MUST match devices with any Revision ID, this
is to allow devices to be versioned without breaking drivers.
\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
-Transitional devices must have a Revision ID of 0 to match
+Transitional devices MUST have a Revision ID of 0 to match
legacy drivers.
-Non-transitional devices must have a Revision ID of 1 or higher.
+Non-transitional devices MUST have a Revision ID of 1 or higher.
-Both transitional and non-transitional drivers must match
+Both transitional and non-transitional drivers MUST match
any Revision ID value.
\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
The device is configured via I/O and/or memory regions (though see
-VIRTIO_PCI_CAP_PCI_CFG for access via the PCI configuration space).
+\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability} for access via the PCI configuration space).
-These regions contain the virtio header registers, the notification register, the
-ISR status register and device specific registers, as specified by Virtio
-Structure PCI Capabilities.
+There may be different widths of accesses to the I/O region; the driver
+MUST access each field using the “natural” access method (i.e. 32-bit accesses for 32-bit fields, etc). All multi-byte fields are little-endian.
-There may be different widths of accesses to the I/O region; the
-“natural” access method for each field must be
-used (i.e. 32-bit accesses for 32-bit fields, etc).
+\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The virtio device configuration layout includes a common configuration header, notification area, ISR status area and a device-specific configuration area.
+
+Each structure can be mapped by a Base Address register (BAR) belonging to
+the function, or accessed via the special VIRTIO_PCI_CAP_PCI_CFG field in the PCI configuration space.
+
+The location of each structure is specified using a vendor-specific PCI capability located
+on the capability list in PCI configuration space of the device.
+This virtio structure capability uses little-endian format; all fields are
+read-only unless stated otherwise:
+
+\begin{lstlisting}
+struct virtio_pci_cap {
+ u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
+ u8 cap_next; /* Generic PCI field: next ptr. */
+ u8 cap_len; /* Generic PCI field: capability length */
+ u8 cfg_type; /* Identifies the structure. */
+ u8 bar; /* Where to find it. */
+ u8 padding[3]; /* Pad to full dword. */
+ le32 offset; /* Offset within bar. */
+ le32 length; /* Length of the structure, in bytes. */
+};
+\end{lstlisting}
+
+This structure can be followed by extra data, depending on
+cfg_type, as documented below. The device MAY append extra data
+or padding to any structure beyond that; the driver MUST accept a cap_len field
+which is larger than specified here.
+
+The fields are interpreted as follows:
+
+\begin{description}
+\item[cap_vndr]
+ 0x09; Identifies a vendor-specific capability.
+
+\item[cap_next]
+ Link to next capability in the capability list in the configuration space.
+
+\item[cap_len]
+ Length of this capability structure, including the whole of
+ struct virtio_pci_cap, and extra data if any.
+ This length MAY include padding, or fields unused by the driver.
+
+\item[cfg_type]
+ identifies the structure, according to the following table:
+
+\begin{lstlisting}
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG 1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG 3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG 4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG 5
+\end{lstlisting}
-PCI Device Configuration Layout includes the common configuration,
-ISR, notification and device specific configuration
-structures.
+ Any other value is reserved for future use. Drivers MUST
+ ignore any vendor-specific capability structure which has
+ a reserved cfg_type value.
-All multi-byte fields are little-endian.
+ The device MAY offer more than one structure of any type - this makes it
+ possible for the device to expose multiple interfaces to drivers. The order of
+ the capabilities in the capability list specifies the order of preference
+ suggested by the device; drivers SHOULD use the first interface that they can
+ support. For example, on some hypervisors, notifications using I/O accesses are
+ faster than memory accesses. In this case, the device would expose two
+ capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
+ the first one addressing an I/O BAR, the second one addressing a memory BAR.
+ In this example, the driver SHOULD use the I/O BAR if I/O resources are available, and fall back on
+ the memory BAR when I/O resources are unavailable.
+
+ Each structure is detailed individually below.
+
+\item[bar]
+ values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
+ the function located beginning at 10h in Configuration Space
+ and used to map the structure into Memory or I/O Space.
+ The BAR is permitted to be either 32-bit or 64-bit; it can map Memory Space
+ or I/O Space.
+
+ Any other value is reserved for future use. Drivers MUST
+ ignore any vendor-specific capability structure which has
+ a reserved bar value.
+
+\item[offset]
+ indicates where the structure begins relative to the base address associated
+ with the BAR.
+
+\item[length]
+ indicates the length of the structure.
+
+ length MAY include padding, or fields unused by the driver, or
+ future extensions.
+
+ Drivers SHOULD map only the part of the configuration structure
+ that is large enough for device operation. Drivers MUST handle
+ unexpectedly large length fields, but MAY check that length
+ is large enough for device operation.
+
+ For example, a future device might present a large structure size of several
+ MBytes.
+ As current devices never utilize structures larger than 4KBytes in size,
+ a driver can limit the mapped structure size to e.g.
+ 4KBytes to allow forward compatibility with such devices without loss of
+ functionality and without wasting resources.
+\end{description}
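As an illustration of the capability layout and the "ignore reserved values" rules above, here is a minimal sketch of a driver-side scan in C. It assumes the 256-byte PCI configuration space has already been copied into a byte array; the names virtio_find_cap and struct virtio_cap_info are hypothetical and not part of the specification. The capabilities pointer at offset 0x34 and the 0x40 lower bound are standard PCI conventions rather than anything stated in this text.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34 /* standard PCI capabilities pointer */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_SIZE 16   /* sizeof(struct virtio_pci_cap) */

/* Hypothetical decoded form of struct virtio_pci_cap. */
struct virtio_cap_info {
    uint8_t  cfg_type;
    uint8_t  bar;
    uint32_t offset;
    uint32_t length;
};

/* Read a little-endian 32-bit field regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Find the first capability of type `wanted` (1..5) in list order,
 * which is the device's order of preference.
 * Returns 1 and fills *out on success, 0 if none is found.
 */
int virtio_find_cap(const uint8_t cfg[256], uint8_t wanted,
                    struct virtio_cap_info *out)
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    int guard = 48; /* bound the walk in case the list loops */

    if (wanted < 1 || wanted > 5)
        return 0; /* reserved cfg_type values are never matched */

    while (pos >= 0x40 && guard-- > 0) {
        const uint8_t *cap = cfg + pos;

        if (cap[0] == PCI_CAP_ID_VNDR &&
            pos <= 256 - VIRTIO_PCI_CAP_SIZE &&
            cap[2] >= VIRTIO_PCI_CAP_SIZE && /* larger cap_len is accepted */
            cap[3] == wanted &&
            cap[4] <= 5) {                   /* skip reserved bar values */
            out->cfg_type = cap[3];
            out->bar = cap[4];
            out->offset = read_le32(cap + 8);
            out->length = read_le32(cap + 12);
            return 1;
        }
        pos = cap[1] & 0xFC; /* follow cap_next */
    }
    return 0;
}
```

A driver following this sketch would call virtio_find_cap once per structure type and then map at most the part of the BAR it needs, as discussed under the length field.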
\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
-Common configuration structure layout is documented below:
+
+The common configuration structure is found at the bar and offset within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
+
+The device MUST present at least one common configuration capability.
\begin{lstlisting}
struct virtio_pci_common_cfg {
@@ -865,19 +968,20 @@ struct virtio_pci_common_cfg {
\begin{description}
\item[device_feature_select]
- The driver uses this to select which Feature Bits the device_feature field shows.
+ The driver uses this to select which feature bits the device_feature field shows.
Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
- The device MUST present 0 on device_feature for any other value.
+ The device MUST present 0 on device_feature for any other value, but the driver MUST NOT rely on this.
\item[device_feature]
- The device uses this to report Feature Bits to the driver.
- Device Feature Bits selected by device_feature_select.
+ The device uses this to report which feature bits it is
+ offering to the driver: the driver writes to
+ device_feature_select to select which are presented.
\item[driver_feature_select]
- The driver uses this to select which Feature Bits the driver_feature field shows.
+ The driver uses this to select which feature bits the driver_feature field shows.
Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
- When set to any other value, reads from driver_feature
- return 0, writing 0 into driver_feature has no effect. The driver
+ When set to any other value, the device MUST return 0 on reads from driver_feature,
+ and ignore writing of 0 into driver_feature. The driver
MUST NOT write any other value into driver_feature (a corollary of
the rule that the driver can only write a subset of device features).
@@ -924,7 +1028,7 @@ struct virtio_pci_common_cfg {
\item[queue_notify_off]
The driver reads this to calculate the offset from start of Notification structure at
which this virtqueue is located.
- Note: this is *not* an offset in bytes. See notify_off_multiplier below.
+ Note: this is *not* an offset in bytes. See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
\item[queue_desc]
The driver writes the physical address of Descriptor Table here.
@@ -936,18 +1040,87 @@ struct virtio_pci_common_cfg {
The driver writes the physical address of Used Ring here.
\end{description}
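The device_feature_select and device_feature fields described in this list can be illustrated with a small model in C. This is only a sketch: struct fake_device stands in for the device side of the common configuration structure, and all names here are hypothetical. It shows the device presenting 0 for any select value other than 0 or 1, and a driver assembling the 64 feature bits from two 32-bit reads.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device side of the common configuration. */
struct fake_device {
    uint64_t offered;               /* feature bits the device offers */
    uint32_t device_feature_select; /* written by the driver */
};

/* Device behaviour: present the selected 32-bit window, or 0 otherwise. */
uint32_t fake_read_device_feature(const struct fake_device *d)
{
    switch (d->device_feature_select) {
    case 0:
        return (uint32_t)(d->offered & 0xFFFFFFFFu); /* bits 0 to 31 */
    case 1:
        return (uint32_t)(d->offered >> 32);         /* bits 32 to 63 */
    default:
        return 0;
    }
}

/* Driver behaviour: select each window in turn and combine the results. */
uint64_t driver_read_features(struct fake_device *d)
{
    uint64_t lo, hi;

    d->device_feature_select = 0;
    lo = fake_read_device_feature(d);
    d->device_feature_select = 1;
    hi = fake_read_device_feature(d);
    return lo | (hi << 32);
}
```

In a real driver the two assignments would be 32-bit writes to the mapped structure; the driver side of the sketch only ever selects 0 or 1, so it does not depend on what the device presents for other values.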
-\subsubsection{ISR status structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status structure layout}
-ISR status structure includes a single 8-bit ISR status field.
+\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+
+The device MUST present at least one notification capability.
-\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification structure layout}
-Notification structure is always a multiple of 2 bytes in size.
-It includes 2-byte Queue Notify fields for each virtqueue of
-the device. Note that multiple virtqueues can use the same
-Queue Notify field, if necessary: see notify_off_multiplier below.
+The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
+capability. This capability is immediately followed by an additional
+field, like so:
+
+\begin{lstlisting}
+struct virtio_pci_notify_cap {
+ struct virtio_pci_cap cap;
+ le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
+};
+\end{lstlisting}
+
+The device MUST present an even cap.length of at least 2.
+
+The device MUST present notify_off_multiplier as either 0 or an even
+power of 2 (that is, a power of 2 other than 1). The driver MUST ignore a
+capability with notify_off_multiplier of 1.
+
+The notify_off_multiplier field is combined with queue_notify_off to
+derive the Queue Notify address within a BAR for a specific queue:
+
+\begin{lstlisting}
+ cap.offset + queue_notify_off * notify_off_multiplier
+\end{lstlisting}
+
+The BAR, offset and notify_off_multiplier are taken from the
+notification capability structure above, and the queue_notify_off is
+taken from the common configuration structure.
+
+For example, if notify_off_multiplier is 0, all queues will use the same
+Queue Notify address.
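The address calculation above can be sketched in C as follows. The function names are hypothetical, and the 16-bit width of queue_notify_off is an assumption, since the field's declaration is not shown in this part of the text. The result is an offset within the BAR named by cap.bar, not an absolute address.

```c
#include <stdint.h>

/*
 * Offset of a queue's notify location within the BAR:
 *   cap.offset + queue_notify_off * notify_off_multiplier
 * Computed in 64 bits so the multiplication cannot wrap.
 */
uint64_t virtio_notify_offset(uint32_t cap_offset,
                              uint16_t queue_notify_off,
                              uint32_t notify_off_multiplier)
{
    return (uint64_t)cap_offset +
           (uint64_t)queue_notify_off * notify_off_multiplier;
}

/* A multiplier is acceptable if it is 0 or a power of 2 other than 1. */
int virtio_notify_multiplier_ok(uint32_t m)
{
    if (m == 0)
        return 1;
    return m != 1 && (m & (m - 1)) == 0;
}
```

With a multiplier of 0, every queue yields the same offset, which matches the shared Queue Notify address case described above.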
+
+\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability. This
+refers to at least a single byte, which contains the 8-bit ISR status field.
\subsubsection{Device specific structure}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device specific structure}
-Device specific structure is optional.
+The device MAY present one or more VIRTIO_PCI_CAP_DEVICE_CFG capabilities (some
+devices do not have any device-specific structure).
+
+\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability. This
+creates an alternative (and likely suboptimal) access method to the
+common configuration, notification, ISR and device-specific regions.
+
+The capability is immediately followed by an additional field like so:
+
+\begin{lstlisting}
+struct virtio_pci_cfg_cap {
+ struct virtio_pci_cap cap;
+ u8 pci_cfg_data[4]; /* Data for BAR access. */
+};
+\end{lstlisting}
+
+To access a device region, the driver writes into the capability
+structure (i.e. within the PCI configuration space) as follows:
+
+\begin{itemize}
+\item The driver sets the BAR to access by writing to the cap.bar field.
+
+\item The driver sets the size of the access by writing 1, 2 or 4 to
+ the cap.length field.
+
+\item The driver sets the offset within the BAR by writing to the
+ cap.offset field. The driver MUST NOT write an offset which is not
+ a multiple of cap.length (i.e. all accesses must be aligned).
+\end{itemize}
+
+At that point, the pci_cfg_data field will provide a window of size
+cap.length into the given cap.bar at offset cap.offset: writes will
+have the same effect as writes into the BAR, and reads will have the
+same effect and return the same value as reads from the BAR.
+
+The driver MUST perform reads/writes from/to pci_cfg_data of the same
+width as given by cap.length.
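The constraints on a pci_cfg_data access can be checked before the driver writes anything into the capability. The sketch below is a hypothetical helper, not part of the specification; the field offsets in the macros follow from the struct virtio_pci_cap and struct virtio_pci_cfg_cap layouts shown earlier and are relative to the start of the capability in configuration space.

```c
#include <stdint.h>

/* Byte offsets within struct virtio_pci_cfg_cap, from the capability start. */
#define VIRTIO_PCI_CFG_BAR_OFF    4  /* cap.bar */
#define VIRTIO_PCI_CFG_OFFSET_OFF 8  /* cap.offset */
#define VIRTIO_PCI_CFG_LENGTH_OFF 12 /* cap.length */
#define VIRTIO_PCI_CFG_DATA_OFF   16 /* pci_cfg_data */

/*
 * Returns 1 if (bar, offset, length) describes a permitted access:
 * a BAR index of 0 to 5, a width of 1, 2 or 4 bytes, and an offset
 * that is a multiple of that width. Returns 0 otherwise.
 */
int virtio_pci_cfg_access_ok(uint8_t bar, uint32_t offset, uint32_t length)
{
    if (bar > 5)
        return 0;
    if (length != 1 && length != 2 && length != 4)
        return 0;
    return offset % length == 0;
}
```

After a successful check, a driver would write bar, length and offset at the offsets above, then read or write pci_cfg_data using an access of exactly length bytes.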
\subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
@@ -1032,179 +1205,13 @@ device.
\paragraph{Virtio Device Configuration Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection}
-As a prerequisite to device initialization, driver executes a
-PCI capability list scan, detecting virtio configuration layout using Virtio
+As a prerequisite to device initialization, the driver scans the
+PCI capability list, detecting virtio configuration layout using the Virtio
Structure PCI capabilities.
-Virtio Device Configuration Layout includes virtio configuration header, Notification
-and ISR Status and device configuration structures.
-Each structure can be mapped by a Base Address register (BAR) belonging to
-the function, located beginning at 10h in Configuration Space,
-or accessed though PCI configuration space.
-
-Actual location of each structure is specified using vendor-specific PCI capability located
-on capability list in PCI configuration space of the device.
-This virtio structure capability uses little-endian format; all bits are
-read-only:
-
-\begin{lstlisting}
-struct virtio_pci_cap {
- u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
- u8 cap_next; /* Generic PCI field: next ptr. */
- u8 cap_len; /* Generic PCI field: capability length */
- u8 cfg_type; /* Identifies the structure. */
- u8 bar; /* Where to find it. */
- u8 padding[3]; /* Pad to full dword. */
- le32 offset; /* Offset within bar. */
- le32 length; /* Length of the structure, in bytes. */
-};
-\end{lstlisting}
-
-This structure can optionally be followed by extra data, depending on
-other fields, as documented below.
-
-Note that future versions of this specification will likely
-extend devices by adding extra fields at the tail end of some structures.
-
-To allow forward compatibility with such extensions, drivers must
-not limit structure size. Instead, drivers should only
-check that structures are *large enough* to contain the fields
-required for device operation.
-
-For example, if the specification states 'structure includes a
-single 8-bit field' drivers should understand this to mean that
-the structure can also include an arbitrary amount of tail padding,
-and accept any structure size equal to or greater than the
-specified 8-bit size.
-
-The fields are interpreted as follows:
-
-\begin{description}
-\item[cap_vndr]
- 0x09; Identifies a vendor-specific capability.
-
-\item[cap_next]
- Link to next capability in the capability list in the configuration space.
-
-\item[cap_len]
- Length of the capability structure, including the whole of
- struct virtio_pci_cap, and extra data if any.
- This length might include padding, or fields unused by the driver.
-
-\item[cfg_type]
- identifies the structure, according to the following table.
-
-\begin{lstlisting}
-/* Common configuration */
-#define VIRTIO_PCI_CAP_COMMON_CFG 1
-/* Notifications */
-#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
-/* ISR Status */
-#define VIRTIO_PCI_CAP_ISR_CFG 3
-/* Device specific configuration */
-#define VIRTIO_PCI_CAP_DEVICE_CFG 4
-/* PCI configuration access */
-#define VIRTIO_PCI_CAP_PCI_CFG 5
-\end{lstlisting}
-
- Any other value - reserved for future use. Drivers MUST
- ignore any vendor-specific capability structure which has
- a reserved cfg_type value.
-
- More than one capability can identify the same structure - this makes it
- possible for the device to expose multiple interfaces to drivers. The order of
- the capabilities in the capability list specifies the order of preference
- suggested by the device; drivers SHOULD use the first interface that they can
- support. For example, on some hypervisors, notifications using IO accesses are
- faster than memory accesses. In this case, hypervisor can expose two
- capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
- the first one addressing an I/O BAR, the second one addressing a memory BAR.
- Driver will use the I/O BAR if I/O resources are available, and fall back on
- memory BAR when I/O resources are unavailable.
-
-\item[bar]
- values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
- the function located beginning at 10h in Configuration Space
- and used to map the structure into Memory or I/O Space.
- The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
- or I/O Space.
-
- Any other value is reserved for future use. Drivers MUST
- ignore any vendor-specific capability structure which has
- a reserved bar value.
-
-\item[offset]
- indicates where the structure begins relative to the base address associated
- with the BAR.
-
-\item[length]
- indicates the length of the structure.
- This size might include padding, or fields unused by the driver.
- Drivers SHOULD only map part of configuration structure
- large enough for device operation.
- For example, a future device might present a large structure size of several
- MBytes.
- As current devices never utilize structures larger than 4KBytes in size,
- driver can limit the mapped structure size to e.g.
- 4KBytes to allow forward compatibility with such devices without loss of
- functionality and without wasting resources.
-\end{description}
-
-If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed
-by additional fields:
-
-\begin{lstlisting}
-struct virtio_pci_notify_cap {
- struct virtio_pci_cap cap;
- le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
-};
-\end{lstlisting}
-
-\begin{description}
-\item[notify_off_multiplier]
-
- Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0.
- Value 0x1 is reserved.
- For a given virtqueue, the address to use for notifications is calculated as follows:
-
- queue_notify_off * notify_off_multiplier + offset
-
- If notify_off_multiplier is 0, all virtqueues use the same address in
- the Notifications structure!
-\end{description}
-
-If cfg_type is VIRTIO_PCI_CAP_PCI_CFG the fields bar, offset and length are RW
-and this structure is immediately followed by an additional field:
-
-\begin{lstlisting}
-struct virtio_pci_cfg_cap {
- __u8 pci_cfg_data[4]; /* Data for BAR access. */
-};
-\end{lstlisting}
-
-\begin{description}
-\item[pci_cfg_data]
-
- This RW field allows an indirect access to any BAR on the
- device using PCI configuration accesses.
-
- The BAR to access is selected using the bar field.
- The length of the access is specified by the length
- field, which can be set to 1, 2 and 4.
- The offset within the BAR is specified by the offset
- field, which must be aligned to length bytes.
-
- After this field is written by driver, the first length
- bytes in pci_cfg_data are written at the selected
- offset in the selected BAR.
-
- When this field is read by driver, length bytes at the
- selected offset in the selected BAR are read into pci_cfg_data.
-\end{description}
-
\subparagraph{Legacy Interface: A Note on Device Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection / Legacy Interface: A Note on Device Layout Detection}
-Legacy drivers skipped Device Layout Detection step, assuming legacy
+Legacy drivers skipped the Device Layout Detection step, assuming legacy
configuration space in BAR0 in I/O space unconditionally.
Legacy devices did not have the Virtio PCI Capability in their
@@ -1226,7 +1233,7 @@ and fail gracefully.
Non-transitional devices, on a platform where a legacy driver for
a legacy device with the same ID might have previously existed,
-must take the following steps to fail gracefully when a legacy
+MUST take the following steps to fail gracefully when a legacy
driver attempts to drive them:
\begin{enumerate}
@@ -1305,16 +1312,16 @@ device is defined as 4096 bytes. Driver writes the physical address, divided
by 4096 to the Queue Address field\footnote{The 4096 is based on the x86 page size, but it's also large
enough to ensure that the separate parts of the virtqueue are on
separate cache lines.
-}.
+}. There was no mechanism to negotiate the queue size.
\subsubsection{Notifying The Device}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notifying The Device}
-Device notification occurs by writing the 16-bit virtqueue index
-of this virtqueue to the Queue Notify field.
+The driver notifies the device by writing the 16-bit virtqueue index
+of this virtqueue to the Queue Notify address. See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} for how to calculate this address.
\subsubsection{Virtqueue Interrupts From The Device}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device}
-If an interrupt is necessary:
+If an interrupt is necessary, the device SHOULD:
\begin{itemize}
\item If MSI-X capability is disabled:
@@ -1331,22 +1338,23 @@ If an interrupt is necessary:
number.
\item If Queue Vector field value is NO_VECTOR, no interrupt
- message is requested for this event.
+ message is requested for this event, so the device MUST NOT
+ deliver an interrupt.
\end{enumerate}
\end{itemize}
-The driver interrupt handler should:
+The driver interrupt handler SHOULD:
\begin{itemize}
\item If MSI-X capability is disabled: read the ISR Status field,
which will reset it to zero. If the lower bit is zero, the
interrupt was not for this device. Otherwise, the driver
- should look through the used rings of each virtqueue for the
+ SHOULD look through the used rings of all virtqueues for the
device, to see if any progress has been made by the device
which requires servicing.
\item If MSI-X capability is enabled: look through the used rings of
- each virtqueue mapped to the specific MSI-X vector for the
+ all virtqueues mapped to the specific MSI-X vector for the
device, to see if any progress has been made by the device
which requires servicing.
\end{itemize}
@@ -1354,8 +1362,7 @@ The driver interrupt handler should:
\subsubsection{Notification of Device Configuration Changes}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
Some virtio PCI devices can change the device configuration
-state, as reflected in the virtio header in the PCI configuration
-space. In this case:
+state, as reflected in the device-specific region of the device. In this case:
\begin{itemize}
\item If MSI-X capability is disabled: an interrupt is delivered and
@@ -1364,12 +1371,13 @@ space. In this case:
space. Note that a single interrupt can indicate both that one
or more virtqueue has been used and that the configuration
space has changed: even if the config bit is set, virtqueues
- must be scanned.
+ MUST be scanned.
\item If MSI-X capability is enabled: an interrupt message is
requested. The Configuration Vector field sets the MSI-X Table
entry number to use. If Configuration Vector field value is
- NO_VECTOR, no interrupt message is requested for this event.
+ NO_VECTOR, no interrupt message is requested for this event and
+ the device MUST NOT deliver an interrupt.
\end{itemize}
\section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}