-rw-r--r-- content.tex 430
1 files changed, 278 insertions, 152 deletions
diff --git a/content.tex b/content.tex
index acfcee9..4578fc4 100644
--- a/content.tex
+++ b/content.tex
@@ -844,15 +844,18 @@ Virtio devices are commonly implemented as PCI devices.
A Virtio device can be implemented as any kind of PCI device:
a Conventional PCI device or a PCI Express
-device. A Virtio device using Virtio Over PCI Bus MUST expose to
-guest an interface that meets the specification requirements of
-the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
-and \hyperref[intro:PCIe]{[PCIe]}
-respectively. To assure designs meet the latest level
+device. To assure designs meet the latest level
requirements, designers of Virtio Over PCI devices must refer to
the PCI-SIG home page at \url{http://www.pcisig.com} for any
approved changes.
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus}
+A Virtio device using Virtio Over PCI Bus MUST expose to
+guest an interface that meets the specification requirements of
+the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
+and \hyperref[intro:PCIe]{[PCIe]}
+respectively.
+
\subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
@@ -860,22 +863,25 @@ Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
}.
The Subsystem Device ID indicates which virtio device is
-supported by the device. The Subsystem Vendor ID SHOULD reflect
+supported by the device, as indicated in section \ref{sec:Device Types}.
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
+The Subsystem Vendor ID SHOULD reflect
the PCI Vendor ID of the environment (it's currently only used
for informational purposes by the driver).
+Non-transitional devices MUST have a Revision ID of 1 or higher.
+
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
All drivers MUST match devices with any Revision ID, this
is to allow devices to be versioned without breaking drivers.
+Drivers MUST match any Revision ID value.
+
\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
Transitional devices MUST have a Revision ID of 0 to match
legacy drivers.
-Non-transitional devices MUST have a Revision ID of 1 or higher.
-
-Both transitional and non-transitional drivers MUST match
-any Revision ID value.
-
\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
The device is configured via I/O and/or memory regions (though see
@@ -884,11 +890,15 @@ for access via the PCI configuration space), as specified by Virtio
Structure PCI Capabilities.
Fields of different sizes are present in the device
-configuration regions; the driver
+configuration regions.
+All 32-bit and 16-bit fields are little-endian.
+
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
+
+The driver
MUST access each field using the “natural” access method, i.e.
32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
fields and 8-bit accesses for 8-bit fields.
-All 32-bit and 16-bit fields are little-endian.
\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
@@ -922,13 +932,7 @@ struct virtio_pci_cap {
\end{lstlisting}
This structure can be followed by extra data, depending on
-\field{cfg_type}, as documented below. In this case device MUST include
-this extra data (from the beginning of the \field{cap_vndr} field
-through end of the extra data fields if any)
-in the capability length as specified by \field{cap_len}.
-The device MAY append extra data
-or padding to any structure beyond that; the driver MUST accept a \field{cap_len} value
-which is larger than specified here.
+\field{cfg_type}, as documented below.
The fields are interpreted as follows:
@@ -960,22 +964,22 @@ The fields are interpreted as follows:
#define VIRTIO_PCI_CAP_PCI_CFG 5
\end{lstlisting}
- Any other value - reserved for future use. Drivers MUST
- ignore any vendor-specific capability structure which has
- a reserved \field{cfg_type} value.
+ Any other value is reserved for future use.
+
+ Each structure is detailed individually below.
The device MAY offer more than one structure of any type - this makes it
possible for the device to expose multiple interfaces to drivers. The order of
the capabilities in the capability list specifies the order of preference
- suggested by the device; drivers SHOULD use the first interface that they can
- support. For example, on some hypervisors, notifications using IO accesses are
+ suggested by the device.
+ \begin{note}
+ For example, on some hypervisors, notifications using IO accesses are
faster than memory accesses. In this case, the device would expose two
capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
the first one addressing an I/O BAR, the second one addressing a memory BAR.
- In this example, the driver SHOULD use the I/O BAR if I/O resources are available, and fall back on
+ In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
memory BAR when I/O resources are unavailable.
-
- Each structure is detailed individually below.
+ \end{note}
\item[\field{bar}]
values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
@@ -984,9 +988,7 @@ The fields are interpreted as follows:
The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
or I/O Space.
- Any other value is reserved for future use. Drivers MUST
- ignore any vendor-specific capability structure which has
- a reserved \field{bar} value.
+ Any other value is reserved for future use.
\item[\field{offset}]
indicates where the structure begins relative to the base address associated
@@ -999,11 +1001,7 @@ The fields are interpreted as follows:
\field{length} MAY include padding, or fields unused by the driver, or
future extensions.
- Drivers SHOULD only map part of configuration structure
- large enough for device operation. Drivers MUST handle
- an unexpectedly large \field{length}, but MAY check that \field{length}
- is large enough for device operation.
-
+ \begin{note}
For example, a future device might present a large structure size of several
MBytes.
As current devices never utilize structures larger than 4KBytes in size,
@@ -1011,14 +1009,44 @@ The fields are interpreted as follows:
4KBytes (thus ignoring parts of structure after the first
4KBytes) to allow forward compatibility with such devices without loss of
functionality and without wasting resources.
+ \end{note}
\end{description}
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{cfg_type} value.
+
+The driver SHOULD use the first instance of each virtio structure type it can
+support.
+
+The driver MUST accept a \field{cap_len} value which is larger than specified here.
+
+The driver MUST ignore any vendor-specific capability structure which has
+a reserved \field{bar} value.
+
+ The driver SHOULD map only the part of the configuration structure
+ large enough for device operation. The driver MUST handle
+ an unexpectedly large \field{length}, but MAY check that \field{length}
+ is large enough for device operation.
+
+The driver MUST NOT write into any field of the capability structure,
+with the exception of those with \field{cfg_type} VIRTIO_PCI_CAP_PCI_CFG as
+detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
+
+The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
+through end of the extra data fields if any) in \field{cap_len}.
+The device MAY append extra data
+or padding to any structure beyond that.
+
+If the device presents multiple structures of the same type, it SHOULD order
+them from optimal (first) to least-optimal (last).
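
The driver-side rules above can be sketched as a small filter applied while walking the capability list. This is only an illustration under stated assumptions: the field layout mirrors the \field{struct virtio_pci_cap} listing (reproduced from the specification, since this excerpt truncates it), the defined \field{cfg_type} values are assumed to be the contiguous range 1 to 5, and the names virtio_cap_is_usable and virtio_map_len are hypothetical helpers. Rejecting a \field{cap_len} shorter than the generic header is an extra sanity check, not something the text requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic capability header; layout assumed from the specification listing. */
struct virtio_pci_cap {
    uint8_t  cap_vndr;
    uint8_t  cap_next;
    uint8_t  cap_len;
    uint8_t  cfg_type;
    uint8_t  bar;
    uint8_t  padding[3];
    uint32_t offset;    /* little-endian in configuration space */
    uint32_t length;    /* little-endian in configuration space */
};

#define VIRTIO_PCI_CAP_PCI_CFG 5  /* highest cfg_type shown in the text */
#define CFG_TYPE_FIRST         1  /* assumption: types 1..5 are defined */

/* Returns true if the driver may use this capability, false if it
 * has to be ignored because of a reserved cfg_type or bar value. */
static bool virtio_cap_is_usable(const struct virtio_pci_cap *cap)
{
    assert(cap);
    if (cap->cfg_type < CFG_TYPE_FIRST ||
        cap->cfg_type > VIRTIO_PCI_CAP_PCI_CFG)
        return false;               /* reserved cfg_type: ignore */
    if (cap->bar > 0x5)
        return false;               /* reserved bar: ignore */
    /* A larger cap_len than the header is accepted; a shorter one
     * cannot be parsed (sanity check added for this sketch). */
    return cap->cap_len >= sizeof(struct virtio_pci_cap);
}

/* Number of bytes to map: only what the driver needs, or 0 if the
 * structure is too small for device operation. */
static uint32_t virtio_map_len(uint32_t cap_length, uint32_t needed)
{
    return cap_length >= needed ? needed : 0;
}
```

With this shape, a structure advertised as several MBytes is still mapped at the driver's own required size, which is the forward-compatibility behaviour the note describes.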
+
\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
-\field{offset} must be 4-byte aligned.
-
-The device MUST present at least one common configuration capability.
\begin{lstlisting}
struct virtio_pci_common_cfg {
@@ -1047,8 +1075,7 @@ struct virtio_pci_common_cfg {
\begin{description}
\item[\field{device_feature_select}]
The driver uses this to select which feature bits \field{device_feature} shows.
- Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
- The device MUST present 0 on \field{device_feature} for any other value.
+ Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
\item[\field{device_feature}]
The device uses this to report which feature bits it is
@@ -1057,14 +1084,7 @@ struct virtio_pci_common_cfg {
\item[\field{driver_feature_select}]
The driver uses this to select which feature bits \field{driver_feature} shows.
- Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63.
- When set to any other value:
- \begin{itemize}
- \item the device MUST return 0 on reads from \field{driver_feature}
- \item the device MUST ignore writing of 0 into \field{driver_feature}
- \item the driver MUST NOT write any non 0 value into \field{driver_feature} (a corollary of
- the rule that the driver can only write a subset of device features).
- \end{itemize}
+ Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
\item[\field{driver_feature}]
The driver writes this to accept feature bits offered by the device.
@@ -1082,10 +1102,7 @@ struct virtio_pci_common_cfg {
\item[\field{config_generation}]
Configuration atomicity value. The device changes this every time the
- configuration noticeably changes. This means the device may
- only change the value after a configuration read operation,
- but MUST change it if there is any risk of a driver seeing an
- inconsistent configuration state.
+ configuration noticeably changes.
\item[\field{queue_select}]
Queue Select. The driver selects which virtqueue the following
@@ -1094,7 +1111,7 @@ struct virtio_pci_common_cfg {
\item[\field{queue_size}]
Queue Size. On reset, specifies the maximum queue size supported by
the hypervisor. This can be modified by driver to reduce memory requirements.
- The device MUST set this to 0 if this virtqueue is unavailable.
+ A 0 means the queue is unavailable.
\item[\field{queue_msix_vector}]
The driver uses this to specify the queue vector for MSI-X.
@@ -1103,14 +1120,12 @@ struct virtio_pci_common_cfg {
The driver uses this to selectively prevent the device from executing requests from this virtqueue.
1 - enabled; 0 - disabled.
- The driver MUST configure the other virtqueue fields before enabling
- the virtqueue.
-
\item[\field{queue_notify_off}]
The driver reads this to calculate the offset from start of Notification structure at
which this virtqueue is located.
- Note: this is \em{not} an offset in bytes.
+ \begin{note} This is \emph{not} an offset in bytes.
See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
+ \end{note}
\item[\field{queue_desc}]
The driver writes the physical address of Descriptor Table here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
@@ -1122,12 +1137,58 @@ struct virtio_pci_common_cfg {
The driver writes the physical address of Used Ring here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
\end{description}
-\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+\field{offset} MUST be 4-byte aligned.
-The device MUST present at least one notification capability.
+The device MUST present at least one common configuration capability.
+
+The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
+\begin{note}
+ This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
+\end{note}
+
+The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver. Valid feature bits are those which are a subset of the corresponding \field{device_feature} bits. The device MAY present invalid bits written by the driver.
+
+\begin{note}
+ This means that a device can ignore writes for feature bits it never
+ offers, and simply present 0 on reads. Or it can just mirror what the driver wrote
+ (but it will still have to check them when the driver sets FEATURES_OK).
+\end{note}
+
+\begin{note}
+ A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
+\end{note}
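
The windowing of feature bits described above can be modelled in a few lines. The sketch below assumes a device whose offered features fit in 64 bits, as the note states; the function names are hypothetical and the device's feature set is passed in as a plain integer rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device side: value presented in device_feature for a given
 * device_feature_select, assuming no feature bit above 63 exists. */
static uint32_t device_feature_window(uint64_t offered, uint32_t select)
{
    if (select >= 2)
        return 0;   /* bits 64 and up: nothing is offered */
    return (uint32_t)(offered >> (select * 32));
}

/* Driver side: assemble the 64-bit feature set from two selects. */
static uint64_t read_device_features(uint64_t offered)
{
    uint64_t lo = device_feature_window(offered, 0);
    uint64_t hi = device_feature_window(offered, 1);
    return lo | (hi << 32);
}

/* A driver_feature write is valid when it sets only bits that the
 * matching device_feature window offers. */
static bool driver_feature_valid(uint32_t written, uint32_t offered_window)
{
    return (written & ~offered_window) == 0;
}
```

A device that mirrors driver writes can skip driver_feature_valid at write time, but still has to apply the same subset check when the driver sets FEATURES_OK.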
+
+The device MUST present a changed \field{config_generation} after the
+driver has read a device-specific configuration value which has
+changed since any part of the device-specific configuration was last
+read.
+\begin{note}
+As \field{config_generation} is an 8-bit value, simply incrementing it
+on every configuration change may violate this requirement due to wrap.
+Better would be to set an internal flag when it has changed,
+and if that flag is set when the driver reads from the device-specific
+configuration, increment \field{config_generation} and clear the flag.
+\end{note}
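
The flag-based approach suggested in the note can be sketched as follows. This is a toy device model, not an implementation from the specification: struct cfg_model and both functions are hypothetical, and a single 32-bit value stands in for the whole device-specific configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cfg_model {
    uint8_t  config_generation; /* 8-bit value seen by the driver */
    bool     dirty;             /* config changed since the last bump */
    uint32_t value;             /* stand-in device-specific field */
};

/* Device changes its configuration: only mark it dirty. */
static void cfg_device_change(struct cfg_model *m, uint32_t v)
{
    assert(m);
    m->value = v;
    m->dirty = true;
}

/* Driver reads device-specific configuration: bump the generation
 * once if anything changed since the previous read, then clear the flag. */
static uint32_t cfg_driver_read(struct cfg_model *m)
{
    assert(m);
    if (m->dirty) {
        m->config_generation++;
        m->dirty = false;
    }
    return m->value;
}
```

Because the counter advances at most once per driver read, 256 back-to-back configuration changes between two reads move it by one step instead of wrapping to its old value.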
+
+The device MUST reset when 0 is written to \field{device_status}.
+
+The device MUST present a 0 in \field{queue_size} if the virtqueue
+corresponding to the current \field{queue_select} is unavailable.
+
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
+
+The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
+
+The driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
+
+The driver MUST configure the other virtqueue fields before enabling the virtqueue
+with \field{queue_enable}.
+
+\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
-capability. The \field{offset} must be 2-byte aligned. This capability is immediately followed by an additional
+capability. This capability is immediately followed by an additional
field, like so:
\begin{lstlisting}
@@ -1137,9 +1198,6 @@ struct virtio_pci_notify_cap {
};
\end{lstlisting}
-The device MUST either present \field{notify_off_multiplier} as an even power of 2,
-or present \field{notify_off_multiplier} as 0.
-
\field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
derive the Queue Notify address within a BAR for a specific queue:
@@ -1151,12 +1209,23 @@ The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
notification capability structure above, and the \field{queue_notify_off} is
taken from the common configuration structure.
+\begin{note}
For example, if \field{notify_off_multiplier} is 0, the device uses
the same Queue Notify address for all queues.
+\end{note}
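
The address derivation described above can be written out as a short helper. The sketch assumes the capability fields have already been read and converted to host byte order; struct notify_cap_view and both function names are hypothetical. The second helper checks that a 2-byte notification write for a queue fits inside \field{cap.length}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order view of the notification capability. */
struct notify_cap_view {
    uint32_t offset;                /* cap.offset */
    uint32_t length;                /* cap.length */
    uint32_t notify_off_multiplier; /* field following the capability */
};

/* Byte offset of a queue's notify location within the BAR. */
static uint64_t queue_notify_bar_offset(const struct notify_cap_view *cap,
                                        uint16_t queue_notify_off)
{
    assert(cap);
    return (uint64_t)cap->offset +
           (uint64_t)queue_notify_off * cap->notify_off_multiplier;
}

/* True if a 2-byte notification for this queue lies within cap.length. */
static bool queue_notify_fits(const struct notify_cap_view *cap,
                              uint16_t queue_notify_off)
{
    assert(cap);
    uint64_t end =
        (uint64_t)queue_notify_off * cap->notify_off_multiplier + 2;
    return cap->length >= end;
}
```

The arithmetic is done in 64 bits so that a large multiplier cannot overflow before the comparison.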
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
+The device MUST present at least one notification capability.
+
+The \field{cap.offset} MUST be 2-byte aligned.
+
+The device MUST either present \field{notify_off_multiplier} as an even power of 2,
+or present \field{notify_off_multiplier} as 0.
The value \field{cap.length} presented by the device MUST be at least 2
and MUST be large enough to support queue notification offsets
for all supported queues in all possible configurations.
+
For all queues, the value \field{cap.length} presented by the device MUST satisfy:
\begin{lstlisting}
cap.length >= queue_notify_off * notify_off_multiplier + 2
@@ -1164,16 +1233,14 @@ cap.length >= queue_notify_off * notify_off_multiplier + 2
\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
-The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability. This
-refers to at least a single byte, which contains the 8-bit ISR status field.
+The VIRTIO_PCI_CAP_ISR_CFG capability
+refers to at least a single byte, which contains the 8-bit ISR status field
+to be used for INT\#x interrupt handling.
The \field{offset} for the \field{ISR status} has no specific alignment requirements.
-\subsection{ISR status field}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status field}
-
-\field{ISR status} is used for INT\#x interrupt handling.
-Driver MUST NOT access \field{ISR status} when MSI-X capability
-is enabled.
+The ISR bits allow the driver to distinguish between device-specific configuration
+change interrupts and normal virtqueue interrupts:
\begin{tabular}{ |l||l|l|l| }
\hline
@@ -1183,29 +1250,43 @@ Purpose & Device Configuration Interrupt & Queue Interrupt & Reserved \\
\hline
\end{tabular}
-If MSI-X capability is disabled, device MUST set Interrupt Status
+To avoid an extra access, simply reading this register resets it to 0 and
+causes the device to de-assert the interrupt.
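
The read-to-clear behaviour can be illustrated with a toy model of the ISR status byte. The macro and function names are hypothetical; the bit assignment (lowest bit for a queue interrupt, second lowest bit for a configuration change) follows the surrounding text, and the model covers only the case where MSI-X is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISR_QUEUE_INTERRUPT  0x1 /* lowest bit (name is hypothetical) */
#define ISR_CONFIG_INTERRUPT 0x2 /* second lowest bit (hypothetical) */

struct isr_model {
    uint8_t isr_status;
};

/* Device behaviour on a driver read: return the value, reset to 0. */
static uint8_t isr_read(struct isr_model *d)
{
    assert(d);
    uint8_t v = d->isr_status;
    d->isr_status = 0;
    return v;
}

/* Interrupt Status bit in the PCI Status register: the logical OR
 * of all ISR status bits. */
static bool pci_interrupt_status(const struct isr_model *d)
{
    assert(d);
    return d->isr_status != 0;
}
```

A single read can report both causes at once, so a handler built on this model would test each bit of the returned value rather than stop at the first one set.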
+
+See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_ISR_CFG capability.
+
+If MSI-X capability is disabled, the device MUST set the Interrupt Status
bit in the PCI Status register in the PCI Configuration Header of
the device to the logical OR of all bits in \field{ISR status} of
-the device. Device then asserts/deasserts INT\#x interrupts unless masked
+the device. The device then asserts/deasserts INT\#x interrupts unless masked
according to standard PCI rules \hyperref[intro:PCI]{[PCI]}.
-Device MUST reset \field{ISR status} to 0 on read.
+The device MUST reset \field{ISR status} to 0 on driver read.
-In this way, driver read of \field{ISR status} causes the device to de-assert
-an interrupt.
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
-See sections \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device} and \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes} for how this is used.
+The driver MUST NOT access the ISR field when MSI-X capability
+is enabled.
\subsubsection{Device specific structure}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device specific structure}
The device MAY present at least one VIRTIO_PCI_CAP_DEVICE_CFG capability (some
devices may not have any device specific structure).
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device specific structure}
+
The \field{offset} for the device specific structure MUST be 4-byte aligned.
\subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
-The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG. This
+The VIRTIO_PCI_CAP_PCI_CFG capability
creates an alternative (and likely suboptimal) access method to the
common configuration, notification, ISR and device-specific regions.
@@ -1218,8 +1299,8 @@ struct virtio_pci_cfg_cap {
};
\end{lstlisting}
-The fields \field{cap.bar}, \field{cap.legth}, \field{cap.offset} and
-\field{pci_cfg_data} are read-write (RW).
+The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset} and
+\field{pci_cfg_data} are read-write (RW) for the driver.
To access a device region, the driver writes into the capability
structure (ie. within the PCI configuration space) as follows:
@@ -1231,13 +1312,16 @@ structure (ie. within the PCI configuration space) as follows:
\field{cap.length}.
\item The driver sets the offset within the BAR by writing to
- \field{cap.offset}. The driver MUST NOT write an offset which is not
- a multiple of \field{cap.length} (ie. all accesses must be aligned).
+ \field{cap.offset}.
\end{itemize}
At that point, \field{pci_cfg_data} will provide a window of size
\field{cap.length} into the given \field{cap.bar} at offset \field{cap.offset}.
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.
+
Upon detecting driver write access
to \field{pci_cfg_data}, the device MUST execute a write access
at offset \field{cap.offset} at BAR selected by \field{cap.bar} using the first \field{cap.length}
@@ -1249,6 +1333,11 @@ execute a read access of length cap.length at offset \field{cap.offset}
at BAR selected by \field{cap.bar} and store the first \field{cap.length} bytes in
\field{pci_cfg_data}.
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
+
+The driver MUST NOT write a \field{cap.offset} which is not
+a multiple of \field{cap.length} (ie. all accesses must be aligned).
+
\subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
Transitional devices should present part of configuration
@@ -1327,18 +1416,17 @@ Space}~\nameref{sec:Basic Facilities of a Virtio Device / Device Configuration S
\subsubsection{Device Initialization}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization}
This documents PCI-specific steps executed during Device Initialization.
-As the first step, driver must detect device configuration layout
-to locate configuration fields in memory, I/O or PCI configuration space of the
-device.
\paragraph{Virtio Device Configuration Layout Detection}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtio Device Configuration Layout Detection}
As a prerequisite to device initialization, the driver scans the
PCI capability list, detecting virtio configuration layout using Virtio
-Structure PCI capabilities.
+Structure PCI capabilities as detailed in \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}.
\paragraph{Non-transitional Device With Legacy Driver}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Non-transitional Device With Legacy Driver}
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Non-transitional Device With Legacy Driver}
+
Non-transitional devices, on a platform where a legacy driver for
a legacy device with the same ID might have previously existed,
MUST take the following steps to fail gracefully when a legacy
@@ -1379,27 +1467,11 @@ When MSI-X capability is present and enabled in the device
(through standard PCI configuration space) \field{config_msix_vector} and \field{queue_msix_vector} are used to map configuration change and queue
interrupts to MSI-X vectors. In this case, the ISR Status is unused.
-A device that has an MSI-X capability SHOULD support at least 2
-and at most 0x800 MSI-X vectors.
-Device MUST report the number of vectors supported in
-\field{Table Size} in the MSI-X Capability as specified in
-\hyperref[intro:PCI]{[PCI]}.
-Driver MUST support device with any MSI-X Table Size 0 to 0x7FF.
-Driver MAY fall back on using INT\#x interrupts for a device
-which only supports one MSI-X vector (MSI-X Table Size = 0).
-
-Driver MAY intepret the Table Size as a hint from the device
-for the suggested number of MSI-X vectors to use.
-Therefore, devices SHOULD restrict the reported MSI-X Table Size field
-to a value that might benefit system performance.
-For example, a device which does not expect to send
-interrupts at a high rate might only specify 2 MSI-X vectors.
-
Writing a valid MSI-X Table entry number, 0 to 0x7FF, to
\field{config_msix_vector}/\field{queue_msix_vector} maps interrupts triggered
by the configuration change/selected queue events respectively to
the corresponding MSI-X vector. To disable interrupts for a
-specific event type, unmap this event by writing a special NO_VECTOR
+specific event type, the driver unmaps this event by writing a special NO_VECTOR
value:
\begin{lstlisting}
@@ -1407,31 +1479,57 @@ value:
#define VIRTIO_MSI_NO_VECTOR 0xffff
\end{lstlisting}
-Driver MUST NOT attempt to map an event to a vector
-outside the MSI-X Table supported by the device,
-as reported by \field{Table Size} in the MSI-X Capability.
+Note that mapping an event to a vector might require the device to
+allocate internal device resources, and thus could fail.
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+A device that has an MSI-X capability SHOULD support at least 2
+and at most 0x800 MSI-X vectors.
+The device MUST report the number of vectors supported in
+\field{Table Size} in the MSI-X Capability as specified in
+\hyperref[intro:PCI]{[PCI]}.
+The device SHOULD restrict the reported MSI-X Table Size field
+to a value that might benefit system performance.
+\begin{note}
+For example, a device which does not expect to send
+interrupts at a high rate might only specify 2 MSI-X vectors.
+\end{note}
Device MUST support mapping any event type to any valid
vector 0 to MSI-X \field{Table Size}.
Device MUST support unmapping any event type.
-Reading these registers returns vector mapped to a given event,
-or NO_VECTOR if unmapped. All queue and configuration change
-events are unmapped by default.
+The device MUST return the vector mapped to a given event
+(NO_VECTOR if unmapped) on read of \field{config_msix_vector}/\field{queue_msix_vector}.
+The device MUST have all queue and configuration change
+events unmapped upon reset.
-Note that mapping an event to vector might require device to
-allocate internal device resources, and MAY fail. Devices MUST report such
+Devices SHOULD NOT cause mapping an event to vector to fail
+unless it is impossible for the device to satisfy the mapping
+request. Devices MUST report mapping
failures by returning the NO_VECTOR value when the relevant
-Vector field is read. After mapping an event to vector, the
+\field{config_msix_vector}/\field{queue_msix_vector} field is read.
+
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / MSI-X Vector Configuration}
+
+The driver MUST support devices with any MSI-X Table Size from 0 to 0x7FF.
+The driver MAY fall back on using INT\#x interrupts for a device
+which only supports one MSI-X vector (MSI-X Table Size = 0).
+
+The driver MAY interpret the Table Size as a hint from the device
+for the suggested number of MSI-X vectors to use.
+
+Driver MUST NOT attempt to map an event to a vector
+outside the MSI-X Table supported by the device,
+as reported by \field{Table Size} in the MSI-X Capability.
+
+After mapping an event to vector, the
driver MUST verify success by reading the Vector field value: on
success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver MAY retry mapping with fewer vectors, disable MSI-X
or report device failure.
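
The write-then-read-back check described above can be sketched against a stand-in register. Everything except VIRTIO_MSI_NO_VECTOR is hypothetical: struct fake_vector_reg models a \field{queue_msix_vector} or \field{config_msix_vector} field, and the out_of_resources flag simulates a device that cannot satisfy a mapping request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR 0xffff

/* Hypothetical stand-in for one MSI-X vector field of a device. */
struct fake_vector_reg {
    uint16_t value;            /* what a read returns */
    bool     out_of_resources; /* simulate a mapping failure */
};

/* Device side: accept the mapping, or report failure as NO_VECTOR.
 * Unmapping (writing NO_VECTOR) always succeeds. */
static void vector_reg_write(struct fake_vector_reg *r, uint16_t v)
{
    assert(r);
    if (v != VIRTIO_MSI_NO_VECTOR && r->out_of_resources)
        r->value = VIRTIO_MSI_NO_VECTOR;
    else
        r->value = v;
}

static uint16_t vector_reg_read(const struct fake_vector_reg *r)
{
    assert(r);
    return r->value;
}

/* Driver side: write the vector, then verify by reading it back. */
static bool map_vector(struct fake_vector_reg *r, uint16_t vector)
{
    vector_reg_write(r, vector);
    return vector_reg_read(r) == vector;
}
```

When map_vector returns false, the driver is at the decision point named in the text: retry with fewer vectors, disable MSI-X, or report device failure.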
-Devices SHOULD NOT cause mapping an event to vector to fail
-unless it is impossible for the device to satisfy the mapping
-request.
-
\paragraph{Virtqueue Configuration}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Device Initialization / Virtqueue Configuration}
As a device can have zero or more virtqueues for bulk data
@@ -1439,13 +1537,12 @@ transport (for example, the simplest network device has two), the driver
needs to configure them as part of the device-specific
configuration.
-The driver does this as follows, for each virtqueue a device has:
+The driver typically does this as follows, for each virtqueue a device has:
\begin{enumerate}
\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
-\item Read the virtqueue size from \field{queue_size}, which MUST
- be a power of 2. This controls how big the virtqueue is
+\item Read the virtqueue size from \field{queue_size}. This controls how big the virtqueue is
(see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If this field is 0, the virtqueue does not exist.
\item Optionally, select a smaller virtqueue size and write it to \field{queue_size}.
@@ -1476,7 +1573,7 @@ of this virtqueue to the Queue Notify address. See \ref{sec:Virtio Transport Op
\subsubsection{Virtqueue Interrupts From The Device}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device}
-If an interrupt is necessary for a virtqueue, the device SHOULD:
+If an interrupt is necessary for a virtqueue, the device would typically act as follows:
\begin{itemize}
\item If MSI-X capability is disabled:
@@ -1488,31 +1585,18 @@ If an interrupt is necessary for a virtqueue, the device SHOULD:
\item If MSI-X capability is enabled:
\begin{enumerate}
- \item Request the appropriate MSI-X interrupt message for the
+ \item If \field{queue_msix_vector} is not NO_VECTOR,
+ request the appropriate MSI-X interrupt message for the
device, \field{queue_msix_vector} sets the MSI-X Table entry
number.
-
- \item If the vector value is NO_VECTOR, no interrupt
- message is requested for this event, so the device MUST NOT
- deliver an interrupt.
\end{enumerate}
\end{itemize}
-The driver interrupt handler SHOULD:
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Virtqueue Interrupts From The Device}
-\begin{itemize}
- \item If MSI-X capability is disabled: read the ISR Status field,
- which will reset it to zero. If the lower bit is zero, the
- interrupt was not for this device. Otherwise, the driver
- SHOULD look through the used rings of all virtqueues for the
- device, to see if any progress has been made by the device
- which requires servicing.
-
- \item If MSI-X capability is enabled: look through the used rings of
- all virtqueues mapped to the specific MSI-X vector for the
- device, to see if any progress has been made by the device
- which requires servicing.
-\end{itemize}
+If MSI-X capability is enabled and \field{queue_msix_vector} is
+NO_VECTOR for a virtqueue, the device MUST NOT deliver an interrupt
+for that virtqueue.
\subsubsection{Notification of Device Configuration Changes}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
@@ -1520,19 +1604,61 @@ Some virtio PCI devices can change the device configuration
state, as reflected in the device-specific region of the device. In this case:
\begin{itemize}
- \item If MSI-X capability is disabled: an interrupt is delivered and
- the second lowest bit is set in the ISR Status field to
- indicate that the driver should re-examine the configuration
- space. Note that a single interrupt can indicate both that one
- or more virtqueue has been used and that the configuration
- space has changed: even if the config bit is set, virtqueues
- MUST be scanned.
-
- \item If MSI-X capability is enabled: an interrupt message is
- requested. \field{config_msix_vector} sets the MSI-X Table
- entry number to use. If \field{config_msix_vector} is
- NO_VECTOR, no interrupt message is requested for this event and
- the device MUST NOT deliver an interrupt.
+ \item If MSI-X capability is disabled:
+ \begin{enumerate}
+ \item Set the second lowest bit of the ISR Status field for the device.
+
+ \item Send the appropriate PCI interrupt for the device.
+ \end{enumerate}
+
+ \item If MSI-X capability is enabled:
+ \begin{enumerate}
+ \item If \field{config_msix_vector} is not NO_VECTOR,
+ request the appropriate MSI-X interrupt message for the
+ device; \field{config_msix_vector} sets the MSI-X Table entry
+ number.
+ \end{enumerate}
+\end{itemize}
+
+A single interrupt MAY indicate both that one or more virtqueues have
+been used and that the configuration space has changed.
+
+\devicenormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+If MSI-X capability is enabled and \field{config_msix_vector} is
+NO_VECTOR, the device MUST NOT deliver an interrupt
+for device configuration space changes.
+
+\drivernormative{Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Notification of Device Configuration Changes}
+
+A driver MUST handle the case where the same interrupt is used to indicate
+both a device configuration space change and one or more virtqueues being used.
+
+\subsubsection{Driver Handling Interrupts}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Initialization And Device Operation / Driver Handling Interrupts}
+The driver interrupt handler would typically:
+
+\begin{itemize}
+ \item If MSI-X capability is disabled:
+ \begin{itemize}
+ \item Read the ISR Status field, which will reset it to zero.
+ \item If the lowest bit is set:
+ look through the used rings of all virtqueues for the
+ device, to see if any progress has been made by the device
+ which requires servicing.
+ \item If the second lowest bit is set:
+ re-examine the configuration space to see what changed.
+ \end{itemize}
+ \item If MSI-X capability is enabled:
+ \begin{itemize}
+ \item
+ Look through the used rings of
+ all virtqueues mapped to the specific MSI-X vector for the
+ device, to see if any progress has been made by the device
+ which requires servicing.
+ \item
+ If the MSI-X vector is equal to \field{config_msix_vector},
+ re-examine the configuration space to see what changed.
+ \end{itemize}
\end{itemize}
\section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}