Diffstat (limited to 'virtio-v1.0-wd01-part1-specification.txt')
-rw-r--r-- virtio-v1.0-wd01-part1-specification.txt 1766
1 files changed, 883 insertions, 883 deletions
diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
index 5c887f2..a887618 100644
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ b/virtio-v1.0-wd01-part1-specification.txt
@@ -7,9 +7,9 @@ design they are not all that different from physical devices, and this
document treats them as such. This allows the guest to use standard
drivers and discovery mechanisms.
-The purpose of virtio and this specification is that virtual
-environments and guests should have a straightforward, efficient,
-standard and extensible mechanism for virtual devices, rather
+The purpose of virtio and this specification is that virtual
+environments and guests should have a straightforward, efficient,
+standard and extensible mechanism for virtual devices, rather
than boutique per-environment or per-OS mechanisms.
Straightforward: Virtio devices use normal bus mechanisms of
@@ -17,9 +17,9 @@ than boutique per-environment or per-OS mechanisms.
author. There is no exotic page-flipping or COW mechanism: it's just
a normal device.[1]
- Efficient: Virtio devices consist of rings of descriptors
- for input and output, which are neatly separated to avoid cache
- effects from both guest and device writing to the same cache
+ Efficient: Virtio devices consist of rings of descriptors
+ for input and output, which are neatly separated to avoid cache
+ effects from both guest and device writing to the same cache
lines.
Standard: Virtio makes no assumptions about the environment in which
@@ -27,10 +27,10 @@ than boutique per-environment or per-OS mechanisms.
devices are implemented over PCI and other buses, and earlier drafts
have been implemented on other buses not included in this spec.[2]
- Extensible: Virtio PCI devices contain feature bits which are
- acknowledged by the guest operating system during device setup.
- This allows forwards and backwards compatibility: the device
- offers all the features it knows about, and the driver
+ Extensible: Virtio PCI devices contain feature bits which are
+ acknowledged by the guest operating system during device setup.
+ This allows forwards and backwards compatibility: the device
+ offers all the features it knows about, and the driver
acknowledges those it understands and wishes to use.
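The offer/acknowledge exchange described above amounts to a bitwise intersection of two feature words. A minimal sketch, assuming a 32-bit feature word; the function name negotiate_features is hypothetical and not part of this specification:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the driver acknowledges only those offered
 * feature bits that it also understands and wishes to use.
 */
static uint32_t negotiate_features(uint32_t device_offered,
                                   uint32_t driver_supported)
{
    uint32_t acked = device_offered & driver_supported;

    /* The driver never acknowledges a bit the device did not offer. */
    assert((acked & ~device_offered) == 0);
    return acked;
}
```

An older driver simply leaves unknown bits unacknowledged, and an older device never offers bits it does not know, which is what gives compatibility in both directions.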
1.1.1. Key words
@@ -90,28 +90,28 @@ o One or more virtqueues
2.1.1. Device Status Field
-------------------------
-The Device Status field is updated by the guest to indicate its
-progress. This provides a simple low-level diagnostic: it's most
-useful to imagine them hooked up to traffic lights on the console
+The Device Status field is updated by the guest to indicate its
+progress. This provides a simple low-level diagnostic: it's most
+useful to imagine them hooked up to traffic lights on the console
indicating the status of each device.
This field is 0 upon reset, otherwise at least one bit should be set:
- ACKNOWLEDGE (1) Indicates that the guest OS has found the
+ ACKNOWLEDGE (1) Indicates that the guest OS has found the
device and recognized it as a valid virtio device.
- DRIVER (2) Indicates that the guest OS knows how to drive the
- device. Under Linux, drivers can be loadable modules so there
- may be a significant (or infinite) delay before setting this
+ DRIVER (2) Indicates that the guest OS knows how to drive the
+ device. Under Linux, drivers can be loadable modules so there
+ may be a significant (or infinite) delay before setting this
bit.
- DRIVER_OK (4) Indicates that the driver is set up and ready to
+ DRIVER_OK (4) Indicates that the driver is set up and ready to
drive the device.
- FAILED (128) Indicates that something went wrong in the guest,
- and it has given up on the device. This could be an internal
- error, or the driver didn't like the device for some reason, or
- even a fatal error during device operation. The device must be
+ FAILED (128) Indicates that something went wrong in the guest,
+ and it has given up on the device. This could be an internal
+ error, or the driver didn't like the device for some reason, or
+ even a fatal error during device operation. The device must be
reset before attempting to re-initialize.
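As an illustration of the "traffic light" reading of this field, the sketch below decodes a status byte into a coarse stage. The bit values are those listed above; the VIRTIO_STATUS_* names, the enum and the function are hypothetical and chosen only for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values from the list above; the macro names are illustrative. */
#define VIRTIO_STATUS_ACKNOWLEDGE 1u
#define VIRTIO_STATUS_DRIVER      2u
#define VIRTIO_STATUS_DRIVER_OK   4u
#define VIRTIO_STATUS_FAILED      128u

enum device_stage {
    STAGE_RESET,        /* field is 0 */
    STAGE_FOUND,        /* ACKNOWLEDGE only */
    STAGE_DRIVER_BOUND, /* DRIVER set, not yet DRIVER_OK */
    STAGE_READY,        /* DRIVER_OK set */
    STAGE_FAILED,       /* FAILED set: reset needed before re-init */
    STAGE_UNKNOWN       /* only bits not described in this section */
};

/* Decode the most advanced stage that the status byte indicates. */
static enum device_stage status_stage(uint8_t status)
{
    if (status == 0)
        return STAGE_RESET;
    if (status & VIRTIO_STATUS_FAILED)
        return STAGE_FAILED;
    if (status & VIRTIO_STATUS_DRIVER_OK)
        return STAGE_READY;
    if (status & VIRTIO_STATUS_DRIVER)
        return STAGE_DRIVER_BOUND;
    if (status & VIRTIO_STATUS_ACKNOWLEDGE)
        return STAGE_FOUND;
    return STAGE_UNKNOWN;
}
```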
2.1.2. Feature Bits
@@ -134,15 +134,15 @@ Feature bits are allocated as follows:
0 to 23: Feature bits for the specific device type
- 24 to 31: Feature bits reserved for extensions to the queue and
+ 24 to 31: Feature bits reserved for extensions to the queue and
feature negotiation mechanisms
-For example, feature bit 0 for a network device (i.e. Subsystem
-Device ID 1) indicates that the device supports checksumming of
+For example, feature bit 0 for a network device (i.e. Subsystem
+Device ID 1) indicates that the device supports checksumming of
packets.
-In particular, new fields in the device configuration space are
-indicated by offering a feature bit, so the guest can check
+In particular, new fields in the device configuration space are
+indicated by offering a feature bit, so the guest can check
before accessing that part of the configuration space.
2.1.3. Configuration Space
@@ -151,7 +151,7 @@ before accessing that part of the configuration space.
Configuration space is generally used for rarely-changing or
initialization-time parameters.
-Note that this space is generally the guest's native endian,
+Note that this space is generally the guest's native endian,
rather than PCI's little-endian.
2.1.4. Virtqueues
@@ -164,7 +164,7 @@ transmit and one for receive. Each queue has a 16-bit queue size
parameter, which sets the number of entries and implies the total size
of the queue.
-Each virtqueue occupies two or more physically-contiguous pages
+Each virtqueue occupies two or more physically-contiguous pages
(usually defined as 4096 bytes, but depending on the transport)
and consists of three parts:
@@ -189,10 +189,10 @@ virtqueue layout structure looks like this:
struct vring {
// The actual descriptors (16 bytes each)
struct vring_desc desc[ Queue Size ];
-
+
// A ring of available descriptor heads with free-running index.
struct vring_avail avail;
-
+
// Padding to the next PAGE_SIZE boundary.
char pad[ Padding ];
@@ -200,10 +200,10 @@ virtqueue layout structure looks like this:
struct vring_used used;
};
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring. It then
-notifies the device. When the device has finished a buffer, it
+When the driver wants to send a buffer to the device, it fills in
+a slot in the descriptor table (or chains several together), and
+writes the descriptor index into the available ring. It then
+notifies the device. When the device has finished a buffer, it
writes the descriptor into the used ring, and sends an interrupt.
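The layout above implies a total size for the virtqueue. The following is a sketch of that arithmetic, not a normative formula: it assumes 16-byte descriptors, 16-bit flags, idx and event fields, 8-byte used ring elements, and that the event fields are always accounted for. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of align (a power of 2). */
static size_t align_up(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Bytes occupied by a virtqueue with qsz entries. */
static size_t vring_total_size(unsigned int qsz, size_t page_size)
{
    size_t desc  = 16 * (size_t)qsz;        /* descriptor table */
    size_t avail = 2 * (3 + (size_t)qsz);   /* flags, idx, ring[], used_event */
    size_t used  = 2 * 3 + 8 * (size_t)qsz; /* flags, idx, ring[], avail_event */

    /* The used ring starts on the next page boundary after avail. */
    return align_up(desc + avail, page_size) + align_up(used, page_size);
}
```

Under these assumptions even a queue of size 1 occupies two 4096-byte pages, and a queue of size 256 occupies three.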
2.1.4.1. A Note on Virtqueue Endianness
@@ -240,10 +240,10 @@ dividing a network packet into 1500 single-byte descriptors!
2.1.4.3. The Virtqueue Descriptor Table
--------------------------------------
-The descriptor table refers to the buffers the guest is using for
-the device. The addresses are physical addresses, and the buffers
-can be chained via the next field. Each descriptor describes a
-buffer which is read-only or write-only, but a chain of
+The descriptor table refers to the buffers the guest is using for
+the device. The addresses are physical addresses, and the buffers
+can be chained via the next field. Each descriptor describes a
+buffer which is read-only or write-only, but a chain of
descriptors can contain both read-only and write-only buffers.
No descriptor chain may be more than 2^32 bytes long in total.
@@ -253,13 +253,13 @@ No descriptor chain may be more than 2^32 bytes long in total.
u64 addr;
/* Length. */
u32 len;
-
+
/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT 1
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE 2
/* This means the buffer contains a list of buffer descriptors. */
- #define VRING_DESC_F_INDIRECT 4
+ #define VRING_DESC_F_INDIRECT 4
/* The flags as indicated above. */
u16 flags;
/* Next field if flags & NEXT */
@@ -272,16 +272,16 @@ for this virtqueue.
2.1.4.3.1. Indirect Descriptors
------------------------------
-Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_RING_F_INDIRECT_DESC feature can be
-used to allow this (see "2.6. Reserved Feature Bits"). To increase
-ring capacity it is possible to store a table of indirect
-descriptors anywhere in memory, and insert a descriptor in main
-virtqueue (with flags&VRING_DESC_F_INDIRECT on) that refers to memory buffer
-containing this indirect descriptor table; fields addr and len
-refer to the indirect table address and length in bytes,
-respectively. The indirect table layout structure looks like this
-(len is the length of the descriptor that refers to this table,
+Some devices benefit by concurrently dispatching a large number
+of large requests. The VIRTIO_RING_F_INDIRECT_DESC feature can be
+used to allow this (see "2.6. Reserved Feature Bits"). To increase
+ring capacity it is possible to store a table of indirect
+descriptors anywhere in memory, and insert a descriptor in main
+virtqueue (with flags&VRING_DESC_F_INDIRECT on) that refers to memory buffer
+containing this indirect descriptor table; fields addr and len
+refer to the indirect table address and length in bytes,
+respectively. The indirect table layout structure looks like this
+(len is the length of the descriptor that refers to this table,
which is a variable, so this code won't compile):
struct indirect_descriptor_table {
@@ -289,15 +289,15 @@ which is a variable, so this code won't compile):
struct vring_desc desc[len / 16];
};
-The first indirect descriptor is located at start of the indirect
-descriptor table (index 0), additional indirect descriptors are
-chained by next field. An indirect descriptor without next field
-(with flags&VRING_DESC_F_NEXT off) signals the end of the indirect descriptor
-table, and transfers control back to the main virtqueue. An
-indirect descriptor can not refer to another indirect descriptor
-table (flags&VRING_DESC_F_INDIRECT must be off). A single indirect descriptor
-table can include both read-only and write-only descriptors;
-write-only flag (flags&VRING_DESC_F_WRITE) in the descriptor that refers to it
+The first indirect descriptor is located at start of the indirect
+descriptor table (index 0), additional indirect descriptors are
+chained by next field. An indirect descriptor without next field
+(with flags&VRING_DESC_F_NEXT off) signals the end of the indirect descriptor
+table, and transfers control back to the main virtqueue. An
+indirect descriptor can not refer to another indirect descriptor
+table (flags&VRING_DESC_F_INDIRECT must be off). A single indirect descriptor
+table can include both read-only and write-only descriptors;
+write-only flag (flags&VRING_DESC_F_WRITE) in the descriptor that refers to it
is ignored.
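The rules above can be sketched as a walk over an indirect table. This is an illustration only: the function name is hypothetical, and rejecting a length that is not a multiple of 16, an out-of-range next index, or a looping chain are defensive assumptions of this sketch rather than requirements stated here:

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_WRITE    2
#define VRING_DESC_F_INDIRECT 4

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Walk an indirect table of len bytes starting at index 0.
 * Returns the number of descriptors in the chain, or -1 if the
 * table is malformed under the assumptions stated above.
 */
static int indirect_chain_length(const struct vring_desc *table, uint32_t len)
{
    uint32_t n = len / 16;   /* number of entries in the table */
    uint32_t i = 0, count = 0;

    if (len == 0 || len % 16 != 0)
        return -1;
    for (;;) {
        if (i >= n || count >= n)
            return -1;       /* index outside the table, or a loop */
        if (table[i].flags & VRING_DESC_F_INDIRECT)
            return -1;       /* nested indirect tables are not allowed */
        count++;
        if (!(table[i].flags & VRING_DESC_F_NEXT))
            return (int)count; /* end of table: back to the main virtqueue */
        i = table[i].next;
    }
}
```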
2.1.4.4. The Virtqueue Available Ring
@@ -315,7 +315,7 @@ the device is controlled by the VIRTIO_RING_F_EVENT_IDX feature bit
(see "2.6. Reserved Feature Bits"). This interrupt suppression is
merely an optimization; it may not suppress interrupts entirely.
-The “idx” field indicates where we would put the next descriptor
+The “idx” field indicates where we would put the next descriptor
entry (modulo the queue size). This starts at 0, and increases.
struct vring_avail {
@@ -324,29 +324,29 @@ entry (modulo the queue size). This starts at 0, and increases.
u16 idx;
u16 ring[ /* Queue Size */ ];
u16 used_event; /* Only if VIRTIO_RING_F_EVENT_IDX */
- };
+ };
2.1.4.5. The Virtqueue Used Ring
-------------------------------
-The used ring is where the device returns buffers once it is done
-with them. The flags field can be used by the device to hint that
-no notification is necessary when the guest adds to the available
-ring. Alternatively, the “avail_event” field can be used by the
-device to hint that no notification is necessary until an entry
-with an index specified by the “avail_event” is written in the
-available ring (equivalently, until the idx field in the
-available ring will reach the value avail_event + 1). The method
-employed by the device is controlled by the guest through the
+The used ring is where the device returns buffers once it is done
+with them. The flags field can be used by the device to hint that
+no notification is necessary when the guest adds to the available
+ring. Alternatively, the “avail_event” field can be used by the
+device to hint that no notification is necessary until an entry
+with an index specified by the “avail_event” is written in the
+available ring (equivalently, until the idx field in the
+available ring will reach the value avail_event + 1). The method
+employed by the device is controlled by the guest through the
VIRTIO_RING_F_EVENT_IDX feature bit (see "2.6. Reserved
Feature Bits").[7]
-Each entry in the ring is a pair: the head entry of the
-descriptor chain describing the buffer (this matches an entry
-placed in the available ring by the guest earlier), and the total
-of bytes written into the buffer. The latter is extremely useful
-for guests using untrusted buffers: if you do not know exactly
-how much has been written by the device, you usually have to zero
+Each entry in the ring is a pair: the head entry of the
+descriptor chain describing the buffer (this matches an entry
+placed in the available ring by the guest earlier), and the total
+of bytes written into the buffer. The latter is extremely useful
+for guests using untrusted buffers: if you do not know exactly
+how much has been written by the device, you usually have to zero
the buffer to ensure no data leakage occurs.
/* u32 is used here for ids for padding reasons. */
@@ -358,7 +358,7 @@ the buffer to ensure no data leakage occurs.
};
struct vring_used {
- #define VRING_USED_F_NO_NOTIFY 1
+ #define VRING_USED_F_NO_NOTIFY 1
u16 flags;
u16 idx;
struct vring_used_elem ring[ /* Queue Size */];
@@ -368,11 +368,11 @@ the buffer to ensure no data leakage occurs.
2.1.4.6. Helpers for Operating Virtqueues
----------------------------------------
-The Linux Kernel Source code contains the definitions above and
-helper routines in a more usable form, in
-include/linux/virtio_ring.h. This was explicitly licensed by IBM
-and Red Hat under the (3-clause) BSD license so that it can be
-freely used by all other projects, and is reproduced (with slight
+The Linux Kernel Source code contains the definitions above and
+helper routines in a more usable form, in
+include/linux/virtio_ring.h. This was explicitly licensed by IBM
+and Red Hat under the (3-clause) BSD license so that it can be
+freely used by all other projects, and is reproduced (with slight
variation to remove Linux assumptions) in *XREF*.
2.2. General Initialization And Device Operation
@@ -392,73 +392,73 @@ how to communicate with the specific device.
3. The DRIVER status bit is set: we know how to drive the device.
-4. Device-specific setup, including reading the device feature
+4. Device-specific setup, including reading the device feature
bits, discovery of virtqueues for the device, optional per-bus
- setup, and reading and possibly writing the device's virtio
+ setup, and reading and possibly writing the device's virtio
configuration space.
-5. The subset of device feature bits understood by the driver is
+5. The subset of device feature bits understood by the driver is
written to the device.
6. The DRIVER_OK status bit is set.
-7. The device can now be used (i.e. buffers added to the
+7. The device can now be used (i.e. buffers added to the
virtqueues)[4]
-If any of these steps go irrecoverably wrong, the guest should
-set the FAILED status bit to indicate that it has given up on the
+If any of these steps go irrecoverably wrong, the guest should
+set the FAILED status bit to indicate that it has given up on the
device (it can reset the device later to restart if desired).
2.2.2. Device Operation
----------------------
-There are two parts to device operation: supplying new buffers to
-the device, and processing used buffers from the device. As an
-example, the simplest virtio network device has two virtqueues: the
-transmit virtqueue and the receive virtqueue. The driver adds
-outgoing (read-only) packets to the transmit virtqueue, and then
-frees them after they are used. Similarly, incoming (write-only)
-buffers are added to the receive virtqueue, and processed after
+There are two parts to device operation: supplying new buffers to
+the device, and processing used buffers from the device. As an
+example, the simplest virtio network device has two virtqueues: the
+transmit virtqueue and the receive virtqueue. The driver adds
+outgoing (read-only) packets to the transmit virtqueue, and then
+frees them after they are used. Similarly, incoming (write-only)
+buffers are added to the receive virtqueue, and processed after
they are used.
2.2.2.1. Supplying Buffers to The Device
---------------------------------------
-Actual transfer of buffers from the guest OS to the device
+Actual transfer of buffers from the guest OS to the device
operates as follows:
1. Place the buffer(s) into free descriptor(s).
- (a) If there are no free descriptors, the guest may choose to
- notify the device even if notifications are suppressed (to
+ (a) If there are no free descriptors, the guest may choose to
+ notify the device even if notifications are suppressed (to
reduce latency).[8]
-2. Place the id of the buffer in the next ring entry of the
+2. Place the id of the buffer in the next ring entry of the
available ring.
-3. The steps (1) and (2) may be performed repeatedly if batching
+3. The steps (1) and (2) may be performed repeatedly if batching
is possible.
-4. A memory barrier should be executed to ensure the device sees
- the updated descriptor table and available ring before the next
+4. A memory barrier should be executed to ensure the device sees
+ the updated descriptor table and available ring before the next
step.
-5. The available “idx” field should be increased by the number of
+5. The available “idx” field should be increased by the number of
entries added to the available ring.
-6. A memory barrier should be executed to ensure that we update
+6. A memory barrier should be executed to ensure that we update
the idx field before checking for notification suppression.
-7. If notifications are not suppressed, the device should be
+7. If notifications are not suppressed, the device should be
notified of the new buffers.
-Note that the above code does not take precautions against the
-available ring buffer wrapping around: this is not possible since
-the ring buffer is the same size as the descriptor table, so step
+Note that the above code does not take precautions against the
+available ring buffer wrapping around: this is not possible since
+the ring buffer is the same size as the descriptor table, so step
(1) will prevent such a condition.
-In addition, the maximum queue size is 32768 (it must be a power
-of 2 which fits in 16 bits), so the 16-bit “idx” value can always
+In addition, the maximum queue size is 32768 (it must be a power
+of 2 which fits in 16 bits), so the 16-bit “idx” value can always
distinguish between a full and empty buffer.
Here is a description of each stage in more detail.
@@ -466,9 +466,9 @@ Here is a description of each stage in more detail.
2.2.2.1.1. Placing Buffers Into The Descriptor Table
---------------------------------------------------
-A buffer consists of zero or more read-only physically-contiguous
-elements followed by zero or more physically-contiguous
-write-only elements (it must have at least one element). This
+A buffer consists of zero or more read-only physically-contiguous
+elements followed by zero or more physically-contiguous
+write-only elements (it must have at least one element). This
algorithm maps it into the descriptor table:
for each buffer element, b:
@@ -479,30 +479,30 @@ for each buffer element, b:
(c) Set d.len to the length of b.
- (d) If b is write-only, set d.flags to VRING_DESC_F_WRITE,
+ (d) If b is write-only, set d.flags to VRING_DESC_F_WRITE,
otherwise 0.
(e) If there is a buffer element after this:
- i. Set d.next to the index of the next free descriptor
+ i. Set d.next to the index of the next free descriptor
element.
ii. Set the VRING_DESC_F_NEXT bit in d.flags.
-In practice, the d.next fields are usually used to chain free
-descriptors, and a separate count kept to check there are enough
+In practice, the d.next fields are usually used to chain free
+descriptors, and a separate count kept to check there are enough
free descriptors before beginning the mappings.
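The mapping algorithm, combined with the free list kept in the next fields, might look as follows in C. This is a sketch under stated assumptions: struct buf_elem and map_buffer are hypothetical names, and the caller is assumed to have already checked that enough free descriptors are available:

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Hypothetical description of one physically-contiguous element. */
struct buf_elem {
    uint64_t phys_addr;
    uint32_t len;
    int      write_only;
};

/*
 * Map n elements into the descriptor table, taking descriptors from
 * a free list threaded through the next fields. *free_head is the
 * first free descriptor and is advanced past the ones consumed.
 * Returns the head of the new chain.
 */
static uint16_t map_buffer(struct vring_desc *table, uint16_t *free_head,
                           const struct buf_elem *elems, unsigned int n)
{
    uint16_t head = *free_head;
    uint16_t i = head;

    assert(n > 0); /* a buffer has at least one element */
    for (unsigned int k = 0; k < n; k++) {
        struct vring_desc *d = &table[i];
        uint16_t next_free = d->next; /* free-list link */

        d->addr  = elems[k].phys_addr;
        d->len   = elems[k].len;
        d->flags = elems[k].write_only ? VRING_DESC_F_WRITE : 0;
        if (k + 1 < n) {
            d->next   = next_free;
            d->flags |= VRING_DESC_F_NEXT;
        }
        i = next_free;
    }
    *free_head = i;
    return head;
}
```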
2.2.2.1.2. Updating The Available Ring
-------------------------------------
-The head of the buffer we mapped is the first d in the algorithm
+The head of the buffer we mapped is the first d in the algorithm
above. A naive implementation would do the following:
avail->ring[avail->idx % qsz] = head;
-However, in general we can add many descriptor chains before we update
-the “idx” field (at which point they become visible to the
+However, in general we can add many descriptor chains before we update
+the “idx” field (at which point they become visible to the
device), so we keep a counter of how many we've added:
avail->ring[(avail->idx + added++) % qsz] = head;
@@ -510,13 +510,13 @@ device), so we keep a counter of how many we've added:
2.2.2.1.3. Updating The Index Field
----------------------------------
-Once the index field of the virtqueue is updated, the device will
-be able to access the descriptor chains we've created and the
-memory they refer to. This is why a memory barrier is generally
-used before the index update, to ensure it sees the most up-to-date
+Once the index field of the virtqueue is updated, the device will
+be able to access the descriptor chains we've created and the
+memory they refer to. This is why a memory barrier is generally
+used before the index update, to ensure it sees the most up-to-date
copy.
-The index field always increments, and we let it wrap naturally at
+The index field always increments, and we let it wrap naturally at
65536:
avail->idx += added;
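Putting the ring update, the barrier and the index update together gives roughly the following. This is a sketch: QSZ, struct avail_state and the function names are hypothetical, and the C11 atomic_thread_fence call is only a stand-in for whatever barrier the platform requires for memory shared with a device:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QSZ 8 /* hypothetical queue size; a power of 2 */

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QSZ];
};

struct avail_state {
    struct vring_avail *avail;
    uint16_t added; /* chains queued but not yet visible to the device */
};

/* Queue one descriptor chain head without exposing it yet. */
static void avail_queue(struct avail_state *s, uint16_t head)
{
    assert(s->added < QSZ);
    s->avail->ring[(uint16_t)(s->avail->idx + s->added++) % QSZ] = head;
}

/* Make all queued chains visible by advancing idx. */
static void avail_publish(struct avail_state *s)
{
    /* Descriptor and ring writes must be visible before the new idx. */
    atomic_thread_fence(memory_order_release);
    s->avail->idx = (uint16_t)(s->avail->idx + s->added); /* wraps at 65536 */
    s->added = 0;
}
```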
@@ -525,21 +525,21 @@ The index field always increments, and we let it wrap naturally at
------------------------------
The actual method of device notification is bus-specific, but generally
-it can be expensive. So the device can suppress such notifications if it
+it can be expensive. So the device can suppress such notifications if it
doesn't need them. We have to be careful to expose the new index
-value before checking if notifications are suppressed: it's OK to notify
-gratuitously, but not to omit a required notification. So again,
-we use a memory barrier here before reading the flags or the
+value before checking if notifications are suppressed: it's OK to notify
+gratuitously, but not to omit a required notification. So again,
+we use a memory barrier here before reading the flags or the
avail_event field.
If the VIRTIO_RING_F_EVENT_IDX feature is not negotiated, and if the
VRING_USED_F_NO_NOTIFY flag is not set, we go ahead and notify the
device.
-If the VIRTIO_RING_F_EVENT_IDX feature is negotiated, we read the
-avail_event field in the used ring structure. If the
-available index crossed the avail_event field value since the
-last notification, we go ahead and write to the PCI configuration
+If the VIRTIO_RING_F_EVENT_IDX feature is negotiated, we read the
+avail_event field in the used ring structure. If the
+available index crossed the avail_event field value since the
+last notification, we go ahead and write to the PCI configuration
space. The avail_event field wraps naturally at 65536 as well,
giving the following algorithm for calculating whether a device needs
notification:
@@ -549,46 +549,46 @@ notification:
2.2.2.2. Receiving Used Buffers From The Device
----------------------------------------------
-Once the device has used a buffer (read from or written to it, or
-parts of both, depending on the nature of the virtqueue and the
-device), it sends an interrupt, following an algorithm very
-similar to the algorithm used for the driver to send the device a
+Once the device has used a buffer (read from or written to it, or
+parts of both, depending on the nature of the virtqueue and the
+device), it sends an interrupt, following an algorithm very
+similar to the algorithm used for the driver to send the device a
buffer:
-1. Write the head descriptor number to the next field in the used
+1. Write the head descriptor number to the next field in the used
ring.
2. Update the used ring index.
3. Deliver an interrupt if necessary:
- (a) If the VIRTIO_RING_F_EVENT_IDX feature is not negotiated:
- check if the VRING_AVAIL_F_NO_INTERRUPT flag is not set in
+ (a) If the VIRTIO_RING_F_EVENT_IDX feature is not negotiated:
+ check if the VRING_AVAIL_F_NO_INTERRUPT flag is not set in
avail->flags.
- (b) If the VIRTIO_RING_F_EVENT_IDX feature is negotiated: check
- whether the used index crossed the used_event field value
- since the last update. The used_event field wraps naturally
+ (b) If the VIRTIO_RING_F_EVENT_IDX feature is negotiated: check
+ whether the used index crossed the used_event field value
+ since the last update. The used_event field wraps naturally
at 65536 as well:
(u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx)
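The comparison above relies on unsigned 16-bit wraparound. Wrapped in a function it can be exercised directly; the function name below is an assumption of this sketch, the expression itself is the one given above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns nonzero if moving the index from old_idx to new_idx
 * stepped past event_idx, with all values wrapping at 65536.
 */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx,
                            uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

For example, with old_idx = 5 and new_idx = 8, an event index of 6 triggers an interrupt while an event index of 8 does not, and the same holds when the indices wrap from 65535 to 1.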
-For each ring, the guest should then disable interrupts by setting
-the VRING_AVAIL_F_NO_INTERRUPT flag in the avail structure, if required.
-It can then process used ring entries, finally enabling interrupts
-by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the
-used_event field in the available structure. The guest should then
-execute a memory barrier, and then recheck the ring empty
-condition. This is necessary to handle the case where, after the
-last check and before enabling interrupts, an interrupt has been
+For each ring, the guest should then disable interrupts by setting
+the VRING_AVAIL_F_NO_INTERRUPT flag in the avail structure, if required.
+It can then process used ring entries, finally enabling interrupts
+by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the
+used_event field in the available structure. The guest should then
+execute a memory barrier, and then recheck the ring empty
+condition. This is necessary to handle the case where, after the
+last check and before enabling interrupts, an interrupt has been
suppressed by the device:
vring_disable_interrupts(vq);
-
+
for (;;) {
if (vq->last_seen_used == vring->used.idx) {
vring_enable_interrupts(vq);
mb();
-
+
if (vq->last_seen_used == vring->used.idx)
break;
}
@@ -624,16 +624,16 @@ Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
0x103F inclusive is a virtio device[3]. The device must also have a
Revision ID of 0 to match this specification.
-The Subsystem Device ID indicates which virtio device is
-supported by the device. The Subsystem Vendor ID should reflect
-the PCI Vendor ID of the environment (it's currently only used
+The Subsystem Device ID indicates which virtio device is
+supported by the device. The Subsystem Vendor ID should reflect
+the PCI Vendor ID of the environment (it's currently only used
for informational purposes by the guest).
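The discovery rule can be written as a simple predicate over values read from PCI configuration space. A minimal sketch; the function name is hypothetical, and how the three IDs are read is left to the platform:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns nonzero if the given PCI IDs identify a virtio device
 * matching this specification (Revision ID 0).
 */
static int is_virtio_pci_device(uint16_t vendor_id, uint16_t device_id,
                                uint8_t revision_id)
{
    return vendor_id == 0x1AF4 &&
           device_id >= 0x1000 && device_id <= 0x103F &&
           revision_id == 0;
}
```

The Subsystem Device ID, which selects the virtio device type, is deliberately not checked here.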
2.4.1.2. PCI Device Layout
-------------------------
-To configure the device, we use the first I/O region of the PCI
-device. This contains a virtio header followed by a
+To configure the device, we use the first I/O region of the PCI
+device. This contains a virtio header followed by a
device-specific region.
There may be different widths of accesses to the I/O region; the
@@ -642,9 +642,9 @@ used (i.e. 32-bit accesses for 32-bit fields, etc), but the
device-specific region can be accessed using any width accesses, and
should obtain the same results.
-Note that this is possible because while the virtio header is PCI
-(i.e. little) endian, the device-specific region is encoded in
-the native endian of the guest (where such distinction is
+Note that this is possible because while the virtio header is PCI
+(i.e. little) endian, the device-specific region is encoded in
+the native endian of the guest (where such distinction is
applicable).
2.4.1.2.1. PCI Device Virtio Header
@@ -662,7 +662,7 @@ The virtio header looks as follows:
+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-If MSI-X is enabled for the device, two additional fields
+If MSI-X is enabled for the device, two additional fields
immediately follow this header:[5]
@@ -676,7 +676,7 @@ immediately follow this header:[5]
| (MSI-X) || Vector | Vector |
+------------++----------------+--------+
-Immediately following these general headers, there may be
+Immediately following these general headers, there may be
device-specific headers:
+------------++--------------------+
@@ -701,70 +701,70 @@ The page size for a virtqueue on a PCI virtio device is defined as
2.4.1.3.1.1. Queue Vector Configuration
--------------------------------------
-When MSI-X capability is present and enabled in the device
-(through standard PCI configuration space) 4 bytes at byte offset
-20 are used to map configuration change and queue interrupts to
-MSI-X vectors. In this case, the ISR Status field is unused, and
-device specific configuration starts at byte offset 24 in virtio
-header structure. When MSI-X capability is not enabled, device
+When MSI-X capability is present and enabled in the device
+(through standard PCI configuration space) 4 bytes at byte offset
+20 are used to map configuration change and queue interrupts to
+MSI-X vectors. In this case, the ISR Status field is unused, and
+device specific configuration starts at byte offset 24 in virtio
+header structure. When MSI-X capability is not enabled, device
specific configuration starts at byte offset 20 in virtio header.
-Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
-Configuration/Queue Vector registers, maps interrupts triggered
-by the configuration change/selected queue events respectively to
-the corresponding MSI-X vector. To disable interrupts for a
-specific event type, unmap it by writing a special NO_VECTOR
+Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
+Configuration/Queue Vector registers, maps interrupts triggered
+by the configuration change/selected queue events respectively to
+the corresponding MSI-X vector. To disable interrupts for a
+specific event type, unmap it by writing a special NO_VECTOR
value:
/* Vector value used to disable MSI for queue */
- #define VIRTIO_MSI_NO_VECTOR 0xffff
+ #define VIRTIO_MSI_NO_VECTOR 0xffff
-Reading these registers returns vector mapped to a given event,
-or NO_VECTOR if unmapped. All queue and configuration change
+Reading these registers returns vector mapped to a given event,
+or NO_VECTOR if unmapped. All queue and configuration change
events are unmapped by default.
-Note that mapping an event to vector might require allocating
-internal device resources, and might fail. Devices report such
-failures by returning the NO_VECTOR value when the relevant
-Vector field is read. After mapping an event to vector, the
-driver must verify success by reading the Vector field value: on
-success, the previously written value is returned, and on
-failure, NO_VECTOR is returned. If a mapping failure is detected,
+Note that mapping an event to vector might require allocating
+internal device resources, and might fail. Devices report such
+failures by returning the NO_VECTOR value when the relevant
+Vector field is read. After mapping an event to vector, the
+driver must verify success by reading the Vector field value: on
+success, the previously written value is returned, and on
+failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver can retry mapping with fewer vectors, or disable MSI-X.
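
The write-then-read-back check described above can be sketched as follows. This is a minimal illustration rather than driver code: struct vector_reg, vector_write(), vector_read() and the device_accepts flag are hypothetical stand-ins that model one Vector field in memory. A real driver would use its platform's I/O accessors instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR 0xffff

/* Hypothetical in-memory model of one Configuration/Queue Vector field. */
struct vector_reg {
    uint16_t value;       /* value returned by the next read */
    bool device_accepts;  /* simulates whether the device can allocate resources */
};

/* Simulated register write: a failed mapping leaves NO_VECTOR behind. */
static void vector_write(struct vector_reg *r, uint16_t v)
{
    if (v == VIRTIO_MSI_NO_VECTOR || r->device_accepts)
        r->value = v;
    else
        r->value = VIRTIO_MSI_NO_VECTOR;
}

/* Simulated register read. */
static uint16_t vector_read(const struct vector_reg *r)
{
    return r->value;
}

/* Map an event to a vector and verify success by reading the field back. */
static bool map_vector(struct vector_reg *r, uint16_t vector)
{
    vector_write(r, vector);
    return vector_read(r) == vector;
}
```

A caller that sees map_vector() return false could retry with fewer vectors or fall back to disabling MSI-X, as the text suggests.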
2.4.1.3.1.2. Virtqueue Configuration
-----------------------------------
-As a device can have zero or more virtqueues for bulk data
-transport (for example, the simplest network device has two), the driver
-needs to configure them as part of the device-specific
+As a device can have zero or more virtqueues for bulk data
+transport (for example, the simplest network device has two), the driver
+needs to configure them as part of the device-specific
configuration.
This is done as follows, for each virtqueue a device has:
-1. Write the virtqueue index (first queue is 0) to the Queue
+1. Write the virtqueue index (first queue is 0) to the Queue
Select field.
-2. Read the virtqueue size from the Queue Size field, which is
- always a power of 2. This controls how big the virtqueue is
- (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.
+2. Read the virtqueue size from the Queue Size field, which is
+ always a power of 2. This controls how big the virtqueue is
+ (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.
-3. Allocate and zero virtqueue in contiguous physical memory, on
- a 4096 byte alignment. Write the physical address, divided by
+3. Allocate and zero virtqueue in contiguous physical memory, on
+ a 4096 byte alignment. Write the physical address, divided by
4096 to the Queue Address field.[6]
-4. Optionally, if MSI-X capability is present and enabled on the
- device, select a vector to use to request interrupts triggered
- by virtqueue events. Write the MSI-X Table entry number
- corresponding to this vector in Queue Vector field. Read the
- Queue Vector field: on success, previously written value is
+4. Optionally, if MSI-X capability is present and enabled on the
+ device, select a vector to use to request interrupts triggered
+ by virtqueue events. Write the MSI-X Table entry number
+ corresponding to this vector in Queue Vector field. Read the
+ Queue Vector field: on success, previously written value is
returned; on failure, NO_VECTOR value is returned.
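
Steps 2 and 3 involve two small calculations that are easy to get wrong, so a sketch may help. The helper names below are hypothetical, and the 32-bit width assumed for the Queue Address field is an assumption of this example, not something stated in the steps above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTQUEUE_ALIGN 4096u

/* A Queue Size of 0 means the virtqueue does not exist;
 * otherwise the size is expected to be a power of 2. */
static bool queue_size_valid(uint16_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/* Value to write to the Queue Address field: the physical address
 * divided by 4096. Returns 0 if the address is not 4096-byte aligned
 * or the result does not fit the (assumed) 32-bit field. */
static uint32_t queue_address_value(uint64_t phys)
{
    uint64_t pfn;

    if (phys % VIRTQUEUE_ALIGN != 0)
        return 0;
    pfn = phys / VIRTQUEUE_ALIGN;
    if (pfn > UINT32_MAX)
        return 0;
    return (uint32_t)pfn;
}
```

For example, a virtqueue allocated at physical address 0x10000 would be announced by writing 0x10 to the Queue Address field.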
2.4.1.3.2. Notifying The Device
------------------------------
-Device notification occurs by writing the 16-bit virtqueue index
-of this virtqueue to the Queue Notify field of the virtio header
+Device notification occurs by writing the 16-bit virtqueue index
+of this virtqueue to the Queue Notify field of the virtio header
in the first I/O region of the PCI device.
2.4.1.3.3. Virtqueue Interrupts From The Device
@@ -780,58 +780,58 @@ If an interrupt is necessary:
(b) If MSI-X capability is enabled:
- i. Request the appropriate MSI-X interrupt message for the
- device, Queue Vector field sets the MSI-X Table entry
+ i. Request the appropriate MSI-X interrupt message for the
+ device, Queue Vector field sets the MSI-X Table entry
number.
- ii. If Queue Vector field value is NO_VECTOR, no interrupt
+ ii. If Queue Vector field value is NO_VECTOR, no interrupt
message is requested for this event.
The guest interrupt handler should:
-1. If MSI-X capability is disabled: read the ISR Status field,
- which will reset it to zero. If the lower bit is zero, the
- interrupt was not for this device. Otherwise, the guest driver
- should look through the used rings of each virtqueue for the
- device, to see if any progress has been made by the device
+1. If MSI-X capability is disabled: read the ISR Status field,
+ which will reset it to zero. If the lower bit is zero, the
+ interrupt was not for this device. Otherwise, the guest driver
+ should look through the used rings of each virtqueue for the
+ device, to see if any progress has been made by the device
which requires servicing.
-2. If MSI-X capability is enabled: look through the used rings of
- each virtqueue mapped to the specific MSI-X vector for the
- device, to see if any progress has been made by the device
+2. If MSI-X capability is enabled: look through the used rings of
+ each virtqueue mapped to the specific MSI-X vector for the
+ device, to see if any progress has been made by the device
which requires servicing.
2.4.1.3.4. Notification of Device Configuration Changes
------------------------------------------------------
-Some virtio PCI devices can change the device configuration
-state, as reflected in the virtio header in the PCI configuration
+Some virtio PCI devices can change the device configuration
+state, as reflected in the virtio header in the PCI configuration
space. In this case:
-1. If MSI-X capability is disabled: an interrupt is delivered and
- the second highest bit is set in the ISR Status field to
- indicate that the driver should re-examine the configuration
- space. Note that a single interrupt can indicate both that one
- or more virtqueue has been used and that the configuration
- space has changed: even if the config bit is set, virtqueues
+1. If MSI-X capability is disabled: an interrupt is delivered and
+ the second lowest bit is set in the ISR Status field to
+ indicate that the driver should re-examine the configuration
+ space. Note that a single interrupt can indicate both that one
+ or more virtqueues have been used and that the configuration
+ space has changed: even if the config bit is set, virtqueues
must be scanned.
-2. If MSI-X capability is enabled: an interrupt message is
- requested. The Configuration Vector field sets the MSI-X Table
- entry number to use. If Configuration Vector field value is
+2. If MSI-X capability is enabled: an interrupt message is
+ requested. The Configuration Vector field sets the MSI-X Table
+ entry number to use. If Configuration Vector field value is
NO_VECTOR, no interrupt message is requested for this event.
2.4.2. Virtio Over MMIO
----------------------
-Virtual environments without PCI support (a common situation in
+Virtual environments without PCI support (a common situation in
embedded device models) might use a simple memory-mapped device
(“virtio-mmio”) instead of the PCI device.
-The memory mapped virtio device behaviour is based on the PCI
-device specification. Therefore most of operations like device
-initialization, queues configuration and buffer transfers are
-nearly identical. Existing differences are described in the
+The memory mapped virtio device behaviour is based on the PCI
+device specification. Therefore most operations, such as device
+initialization, queue configuration and buffer transfers, are
+nearly identical. Existing differences are described in the
following sections.
2.4.2.1. MMIO Device Discovery
@@ -849,154 +849,154 @@ a device-tree such as Linux's dtc or Open Firmware, the suggested format is:
2.4.2.2. MMIO Device Layout
--------------------------
-MMIO virtio devices provides a set of memory mapped control
-registers, all 32 bits wide, followed by device-specific
+MMIO virtio devices provide a set of memory mapped control
+registers, all 32 bits wide, followed by device-specific
configuration space. The following list presents their layout:
-• Offset from the device base address | Direction | Name
- Description
+• Offset from the device base address | Direction | Name
+ Description
-• 0x000 | R | MagicValue
- “virt” string.
+• 0x000 | R | MagicValue
+ “virt” string.
-• 0x004 | R | Version
- Device version number. Currently must be 1.
+• 0x004 | R | Version
+ Device version number. Currently must be 1.
-• 0x008 | R | DeviceID
- Virtio Subsystem Device ID (ie. 1 for network card).
+• 0x008 | R | DeviceID
+ Virtio Subsystem Device ID (e.g. 1 for a network card).
-• 0x00c | R | VendorID
- Virtio Subsystem Vendor ID.
+• 0x00c | R | VendorID
+ Virtio Subsystem Vendor ID.
-• 0x010 | R | HostFeatures
+• 0x010 | R | HostFeatures
Flags representing features the device supports.
- Reading from this register returns 32 consecutive flag bits,
- first bit depending on the last value written to
+ Reading from this register returns 32 consecutive flag bits,
+ first bit depending on the last value written to
HostFeaturesSel register. Access to this register returns bits HostFeaturesSel*32
- to (HostFeaturesSel*32)+31, eg. feature bits 0 to 31 if
- HostFeaturesSel is set to 0 and features bits 32 to 63 if
+ to (HostFeaturesSel*32)+31, eg. feature bits 0 to 31 if
+ HostFeaturesSel is set to 0 and features bits 32 to 63 if
HostFeaturesSel is set to 1. Also see [sub:Feature-Bits]
-• 0x014 | W | HostFeaturesSel
+• 0x014 | W | HostFeaturesSel
Device (Host) features word selection.
- Writing to this register selects a set of 32 device feature bits
- accessible by reading from HostFeatures register. Device driver
- must write a value to the HostFeaturesSel register before
- reading from the HostFeatures register.
+ Writing to this register selects a set of 32 device feature bits
+ accessible by reading from HostFeatures register. Device driver
+ must write a value to the HostFeaturesSel register before
+ reading from the HostFeatures register.
-• 0x020 | W | GuestFeatures
- Flags representing device features understood and activated by
+• 0x020 | W | GuestFeatures
+ Flags representing device features understood and activated by
the driver.
- Writing to this register sets 32 consecutive flag bits, first
- bit depending on the last value written to GuestFeaturesSel
+ Writing to this register sets 32 consecutive flag bits, first
+ bit depending on the last value written to GuestFeaturesSel
register. Access to this register sets bits GuestFeaturesSel*32
- to (GuestFeaturesSel*32)+31, eg. feature bits 0 to 31 if
- GuestFeaturesSel is set to 0 and features bits 32 to 63 if
+ to (GuestFeaturesSel*32)+31, eg. feature bits 0 to 31 if
+ GuestFeaturesSel is set to 0 and features bits 32 to 63 if
GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits]
-• 0x024 | W | GuestFeaturesSel
+• 0x024 | W | GuestFeaturesSel
Activated (Guest) features word selection.
- Writing to this register selects a set of 32 activated feature
- bits accessible by writing to the GuestFeatures register.
- Device driver must write a value to the GuestFeaturesSel
- register before writing to the GuestFeatures register.
+ Writing to this register selects a set of 32 activated feature
+ bits accessible by writing to the GuestFeatures register.
+ Device driver must write a value to the GuestFeaturesSel
+ register before writing to the GuestFeatures register.
-• 0x028 | W | GuestPageSize
+• 0x028 | W | GuestPageSize
Guest page size.
- Device driver must write the guest page size in bytes to the
- register during initialization, before any queues are used.
- This value must be a power of 2 and is used by the Host to
- calculate Guest address of the first queue page (see QueuePFN).
+ Device driver must write the guest page size in bytes to the
+ register during initialization, before any queues are used.
+ This value must be a power of 2 and is used by the Host to
+ calculate Guest address of the first queue page (see QueuePFN).
-• 0x030 | W | QueueSel
+• 0x030 | W | QueueSel
Virtual queue index (first queue is 0).
- Writing to this register selects the virtual queue that the
- following operations on QueueNum, QueueAlign and QueuePFN apply
- to.
-
-• 0x034 | R | QueueNumMax
- Maximum virtual queue size.
- Reading from the register returns the maximum size of the queue
- the Host is ready to process or zero (0x0) if the queue is not
- available. This applies to the queue selected by writing to
- QueueSel and is allowed only when QueuePFN is set to zero
- (0x0), so when the queue is not actively used.
-
-• 0x038 | W | QueueNum
+ Writing to this register selects the virtual queue that the
+ following operations on QueueNum, QueueAlign and QueuePFN apply
+ to.
+
+• 0x034 | R | QueueNumMax
+ Maximum virtual queue size.
+ Reading from the register returns the maximum size of the queue
+ the Host is ready to process or zero (0x0) if the queue is not
+ available. This applies to the queue selected by writing to
+ QueueSel and is allowed only when QueuePFN is set to zero
+ (0x0), so when the queue is not actively used.
+
+• 0x038 | W | QueueNum
Virtual queue size.
- Queue size is the number of elements in the queue, therefore size
+ Queue size is the number of elements in the queue, therefore size
of the descriptor table and both available and used rings.
- Writing to this register notifies the Host what size of the
- queue the Guest will use. This applies to the queue selected by
- writing to QueueSel.
+ Writing to this register notifies the Host what size of the
+ queue the Guest will use. This applies to the queue selected by
+ writing to QueueSel.
-• 0x03c | W | QueueAlign
+• 0x03c | W | QueueAlign
Used Ring alignment in the virtual queue.
- Writing to this register notifies the Host about alignment
- boundary of the Used Ring in bytes. This value must be a power
- of 2 and applies to the queue selected by writing to QueueSel.
+ Writing to this register notifies the Host about alignment
+ boundary of the Used Ring in bytes. This value must be a power
+ of 2 and applies to the queue selected by writing to QueueSel.
-• 0x040 | RW | QueuePFN
+• 0x040 | RW | QueuePFN
Guest physical page number of the virtual queue.
- Writing to this register notifies the host about location of the
- virtual queue in the Guest's physical address space. This value
- is the index number of a page starting with the queue
- Descriptor Table. Value zero (0x0) means physical address zero
- (0x00000000) and is illegal. When the Guest stops using the
+ Writing to this register notifies the host about location of the
+ virtual queue in the Guest's physical address space. This value
+ is the index number of a page starting with the queue
+ Descriptor Table. Value zero (0x0) means physical address zero
+ (0x00000000) and is illegal. When the Guest stops using the
queue it must write zero (0x0) to this register.
- Reading from this register returns the currently used page
- number of the queue, therefore a value other than zero (0x0)
+ Reading from this register returns the currently used page
+ number of the queue, therefore a value other than zero (0x0)
means that the queue is in use.
- Both read and write accesses apply to the queue selected by
- writing to QueueSel.
+ Both read and write accesses apply to the queue selected by
+ writing to QueueSel.
-• 0x050 | W | QueueNotify
+• 0x050 | W | QueueNotify
Queue notifier.
- Writing a queue index to this register notifies the Host that
- there are new buffers to process in the queue.
+ Writing a queue index to this register notifies the Host that
+ there are new buffers to process in the queue.
• 0x060 | R | InterruptStatus
Interrupt status.
-Reading from this register returns a bit mask of interrupts
- asserted by the device. An interrupt is asserted if the
+Reading from this register returns a bit mask of interrupts
+ asserted by the device. An interrupt is asserted if the
corresponding bit is set, ie. equals one (1).
– Bit 0 | Used Ring Update
- This interrupt is asserted when the Host has updated the Used
+ This interrupt is asserted when the Host has updated the Used
Ring in at least one of the active virtual queues.
– Bit 1 | Configuration change
- This interrupt is asserted when configuration of the device has
+ This interrupt is asserted when configuration of the device has
changed.
-• 0x064 | W | InterruptACK
- Interrupt acknowledge.
- Writing to this register notifies the Host that the Guest
- finished handling interrupts. Set bits in the value clear the
- corresponding bits of the InterruptStatus register.
-
-• 0x070 | RW | Status
- Device status.
- Reading from this register returns the current device status
- flags.
- Writing non-zero values to this register sets the status flags,
- indicating the Guest progress. Writing zero (0x0) to this
- register triggers a device reset.
+• 0x064 | W | InterruptACK
+ Interrupt acknowledge.
+ Writing to this register notifies the Host that the Guest
+ finished handling interrupts. Set bits in the value clear the
+ corresponding bits of the InterruptStatus register.
+
+• 0x070 | RW | Status
+ Device status.
+ Reading from this register returns the current device status
+ flags.
+ Writing non-zero values to this register sets the status flags,
+ indicating the Guest progress. Writing zero (0x0) to this
+ register triggers a device reset.
Also see [sub:Device-Initialization-Sequence]
-• 0x100+ | RW | Config
- Device-specific configuration space starts at an offset 0x100
- and is accessed with byte alignment. Its meaning and size
- depends on the device and the driver.
+• 0x100+ | RW | Config
+ Device-specific configuration space starts at an offset 0x100
+ and is accessed with byte alignment. Its meaning and size
+ depend on the device and the driver.
-Virtual queue size is the number of elements in the queue,
-therefore size of the descriptor table and both available and
+Virtual queue size is the number of elements in the queue,
+therefore size of the descriptor table and both available and
used rings.
-The endianness of the registers follows the native endianness of
-the Guest. Writing to registers described as “R” and reading from
-registers described as “W” is not permitted and can cause
+The endianness of the registers follows the native endianness of
+the Guest. Writing to registers described as “R” and reading from
+registers described as “W” is not permitted and can cause
undefined behavior.
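
The register list above maps naturally onto a set of C constants, and the HostFeaturesSel/HostFeatures pair can be read in a short loop. The sketch below is illustrative only: the constant names, struct fake_mmio and the fake_read32()/fake_write32() accessors are assumptions made for this example, and the "device" is a simulated one held in memory.

```c
#include <stdint.h>

/* Register offsets taken from the list above (names are illustrative). */
enum {
    VIRTIO_MMIO_MAGIC_VALUE        = 0x000,
    VIRTIO_MMIO_VERSION            = 0x004,
    VIRTIO_MMIO_DEVICE_ID          = 0x008,
    VIRTIO_MMIO_VENDOR_ID          = 0x00c,
    VIRTIO_MMIO_HOST_FEATURES      = 0x010,
    VIRTIO_MMIO_HOST_FEATURES_SEL  = 0x014,
    VIRTIO_MMIO_GUEST_FEATURES     = 0x020,
    VIRTIO_MMIO_GUEST_FEATURES_SEL = 0x024,
    VIRTIO_MMIO_GUEST_PAGE_SIZE    = 0x028,
    VIRTIO_MMIO_QUEUE_SEL          = 0x030,
    VIRTIO_MMIO_QUEUE_NUM_MAX      = 0x034,
    VIRTIO_MMIO_QUEUE_NUM          = 0x038,
    VIRTIO_MMIO_QUEUE_ALIGN        = 0x03c,
    VIRTIO_MMIO_QUEUE_PFN          = 0x040,
    VIRTIO_MMIO_QUEUE_NOTIFY       = 0x050,
    VIRTIO_MMIO_INTERRUPT_STATUS   = 0x060,
    VIRTIO_MMIO_INTERRUPT_ACK      = 0x064,
    VIRTIO_MMIO_STATUS             = 0x070,
    VIRTIO_MMIO_CONFIG             = 0x100
};

/* Hypothetical simulated device exposing 64 feature bits. */
struct fake_mmio {
    uint32_t host_features[2];
    uint32_t host_features_sel;
};

/* Simulated write: only HostFeaturesSel is modelled here. */
static void fake_write32(struct fake_mmio *d, uint32_t off, uint32_t val)
{
    if (off == VIRTIO_MMIO_HOST_FEATURES_SEL)
        d->host_features_sel = val;
}

/* Simulated read: HostFeatures returns the currently selected word. */
static uint32_t fake_read32(const struct fake_mmio *d, uint32_t off)
{
    if (off == VIRTIO_MMIO_HOST_FEATURES && d->host_features_sel < 2)
        return d->host_features[d->host_features_sel];
    return 0;
}

/* Select each 32-bit word in turn, then read it, as the text requires. */
static uint64_t read_host_features(struct fake_mmio *d)
{
    uint64_t bits = 0;

    for (uint32_t word = 0; word < 2; word++) {
        fake_write32(d, VIRTIO_MMIO_HOST_FEATURES_SEL, word);
        bits |= (uint64_t)fake_read32(d, VIRTIO_MMIO_HOST_FEATURES)
                << (32 * word);
    }
    return bits;
}
```

Writing the selector before every read keeps the sketch in line with the rule that HostFeaturesSel must be written before HostFeatures is read.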
2.4.2.3. MMIO-specific Initialization And Device Operation
@@ -1012,29 +1012,29 @@ done before the virtqueues are configured.
2.4.2.3.1.1. Virtqueue Configuration
-----------------------------------
-1. Select the queue writing its index (first queue is 0) to the
- QueueSel register.
+1. Select the queue by writing its index (first queue is 0) to the
+ QueueSel register.
-2. Check if the queue is not already in use: read QueuePFN
- register, returned value should be zero (0x0).
+2. Check if the queue is not already in use: read QueuePFN
+ register, returned value should be zero (0x0).
-3. Read maximum queue size (number of elements) from the
- QueueNumMax register. If the returned value is zero (0x0) the
- queue is not available.
+3. Read maximum queue size (number of elements) from the
+ QueueNumMax register. If the returned value is zero (0x0) the
+ queue is not available.
-4. Allocate and zero the queue pages in contiguous virtual
- memory, aligning the Used Ring to an optimal boundary (usually
- page size). Size of the allocated queue may be smaller than or
- equal to the maximum size returned by the Host.
+4. Allocate and zero the queue pages in contiguous virtual
+ memory, aligning the Used Ring to an optimal boundary (usually
+ page size). Size of the allocated queue may be smaller than or
+ equal to the maximum size returned by the Host.
-5. Notify the Host about the queue size by writing the size to
- QueueNum register.
+5. Notify the Host about the queue size by writing the size to
+ QueueNum register.
-6. Notify the Host about the used alignment by writing its value
- in bytes to QueueAlign register.
+6. Notify the Host about the used alignment by writing its value
+ in bytes to QueueAlign register.
-7. Write the physical number of the first page of the queue to
- the QueuePFN register.
+7. Write the physical page number of the first page of the queue to
+ the QueuePFN register.
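
The seven steps above could be sketched roughly as follows. This is not a real driver: struct fake_queue_regs is a hypothetical in-memory model of the registers for one queue, plain field accesses stand in for register reads and writes, and allocating and zeroing the queue memory (step 4) is left to the caller, who passes in the resulting physical address.

```c
#include <stdint.h>

/* Hypothetical in-memory model of the per-queue MMIO registers. */
struct fake_queue_regs {
    uint32_t queue_sel;
    uint32_t queue_num_max;
    uint32_t queue_num;
    uint32_t queue_align;
    uint32_t queue_pfn;
};

/* Returns the queue size that was configured, or 0 on failure. */
static uint32_t setup_queue(struct fake_queue_regs *r, uint32_t index,
                            uint32_t wanted, uint64_t queue_phys,
                            uint32_t page_size)
{
    uint32_t max, size;

    if (wanted == 0 || page_size == 0)
        return 0;

    r->queue_sel = index;                 /* step 1: select the queue */
    if (r->queue_pfn != 0)                /* step 2: already in use? */
        return 0;
    max = r->queue_num_max;               /* step 3: read maximum size */
    if (max == 0)                         /* queue not available */
        return 0;
    size = wanted < max ? wanted : max;   /* step 4: may be smaller than max */
    r->queue_num = size;                  /* step 5: announce queue size */
    r->queue_align = page_size;           /* step 6: Used Ring alignment */
    r->queue_pfn = (uint32_t)(queue_phys / page_size); /* step 7 */
    return size;
}
```

Using the page size as the Used Ring alignment follows the "usually page size" hint in step 4. Other alignments are possible as long as they are a power of 2.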
2.4.2.3.2. Notifying The Device
------------------------------
@@ -1045,14 +1045,14 @@ writing the queue index to register QueueNum.
2.4.2.3.3. Receiving Used Buffers From The Device
------------------------------------------------
-The memory mapped virtio device is using single, dedicated
-interrupt signal, which is raised when at least one of the
-interrupts described in the InterruptStatus register
-description is asserted. After receiving an interrupt, the
-driver must read the InterruptStatus register to check what
-caused the interrupt (see the register description). After the
-interrupt is handled, the driver must acknowledge it by writing
-a bit mask corresponding to the serviced interrupt to the
+The memory mapped virtio device uses a single, dedicated
+interrupt signal, which is raised when at least one of the
+interrupts described in the InterruptStatus register
+description is asserted. After receiving an interrupt, the
+driver must read the InterruptStatus register to check what
+caused the interrupt (see the register description). After the
+interrupt is handled, the driver must acknowledge it by writing
+a bit mask corresponding to the serviced interrupt to the
InterruptACK register.
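
A minimal sketch of this read, handle, acknowledge sequence is shown below. The bit definitions follow the InterruptStatus register description. Everything else (struct fake_irq_regs, struct irq_result, the accessor helpers) is hypothetical and simulates the registers in memory. In this sketch the actual servicing is left to the caller, which acts on the returned flags.

```c
#include <stdint.h>

#define VIRTIO_MMIO_INT_USED_RING (1u << 0)  /* Bit 0: Used Ring Update */
#define VIRTIO_MMIO_INT_CONFIG    (1u << 1)  /* Bit 1: Configuration change */

/* Hypothetical in-memory model of InterruptStatus / InterruptACK. */
struct fake_irq_regs {
    uint32_t interrupt_status;
};

/* What the driver needs to do as a result of the interrupt. */
struct irq_result {
    int scan_queues;    /* look through the used rings */
    int reread_config;  /* re-examine the configuration space */
};

static uint32_t read_interrupt_status(const struct fake_irq_regs *r)
{
    return r->interrupt_status;
}

/* Set bits in the written value clear the matching status bits. */
static void write_interrupt_ack(struct fake_irq_regs *r, uint32_t mask)
{
    r->interrupt_status &= ~mask;
}

static struct irq_result handle_irq(struct fake_irq_regs *r)
{
    struct irq_result res = { 0, 0 };
    uint32_t status = read_interrupt_status(r);
    uint32_t serviced = status &
        (VIRTIO_MMIO_INT_USED_RING | VIRTIO_MMIO_INT_CONFIG);

    if (status & VIRTIO_MMIO_INT_USED_RING)
        res.scan_queues = 1;
    if (status & VIRTIO_MMIO_INT_CONFIG)
        res.reread_config = 1;

    /* Acknowledge only the interrupts this handler understood. */
    write_interrupt_ack(r, serviced);
    return res;
}
```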
2.4.2.3.4. Notification of Device Configuration Changes
@@ -1105,13 +1105,13 @@ Discovering what devices are available and their type is bus-dependent.
2.5.1. Network Device
====================
-The virtio network device is a virtual ethernet card, and is the
-most complex of the devices supported so far by virtio. It has
-enhanced rapidly and demonstrates clearly how support for new
-features should be added to an existing device. Empty buffers are
-placed in one virtqueue for receiving packets, and outgoing
-packets are enqueued into another for transmission in that order.
-A third command queue is used to control advanced filtering
+The virtio network device is a virtual ethernet card, and is the
+most complex of the devices supported so far by virtio. It has
+enhanced rapidly and demonstrates clearly how support for new
+features should be added to an existing device. Empty buffers are
+placed in one virtqueue for receiving packets, and outgoing
+packets are enqueued into another for transmission in that order.
+A third command queue is used to control advanced filtering
features.
2.5.1.1. Device ID
@@ -1126,7 +1126,7 @@ features.
Virtqueue 2 only exists if VIRTIO_NET_F_CTRL_VQ is set.
-2.5.1.3. Feature bits
+2.5.1.3. Feature bits
--------------------
VIRTIO_NET_F_CSUM (0) Device handles packets with partial checksum
@@ -1138,8 +1138,8 @@ features.
VIRTIO_NET_F_MAC (5) Device has given MAC address.
- VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
- any GSO type.[13]
+ VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
+ any GSO type.[13]
VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
@@ -1159,7 +1159,7 @@ features.
VIRTIO_NET_F_MRG_RXBUF (15) Guest can merge receive buffers.
- VIRTIO_NET_F_STATUS (16) Configuration status field is
+ VIRTIO_NET_F_STATUS (16) Configuration status field is
available.
VIRTIO_NET_F_CTRL_VQ (17) Control channel is available.
@@ -1168,14 +1168,14 @@ features.
VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
- VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
+ VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
packets.
- Device configuration layout Two configuration fields are
- currently defined. The mac address field always exists (though
- is only valid if VIRTIO_NET_F_MAC is set), and the status field
- only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
- are currently defined for the status field:
+ Device configuration layout Two configuration fields are
+ currently defined. The mac address field always exists (though
+ is only valid if VIRTIO_NET_F_MAC is set), and the status field
+ only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
+ are currently defined for the status field:
VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
#define VIRTIO_NET_S_LINK_UP 1
@@ -1189,27 +1189,27 @@ features.
2.5.1.4. Device Initialization
-----------------------------
-1. The initialization routine should identify the receive and
+1. The initialization routine should identify the receive and
transmission virtqueues.
-2. If the VIRTIO_NET_F_MAC feature bit is set, the configuration
- space “mac” entry indicates the “physical” address of the the
- network card, otherwise a private MAC address should be
- assigned. All guests are expected to negotiate this feature if
+2. If the VIRTIO_NET_F_MAC feature bit is set, the configuration
+ space “mac” entry indicates the “physical” address of the
+ network card, otherwise a private MAC address should be
+ assigned. All guests are expected to negotiate this feature if
it is set.
-3. If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,
+3. If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,
identify the control virtqueue.
-4. If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link
- status can be read from the bottom bit of the “status” config
+4. If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link
+ status can be read from the bottom bit of the “status” config
field. Otherwise, the link should be assumed active.
-5. The receive virtqueue should be filled with receive buffers.
- This is described in detail below in “Setting Up Receive
+5. The receive virtqueue should be filled with receive buffers.
+ This is described in detail below in “Setting Up Receive
Buffers”.
-6. A driver can indicate that it will generate checksumless
+6. A driver can indicate that it will generate checksumless
packets by negotiating the VIRTIO_NET_F_CSUM feature. This
“checksum offload” is a common feature on modern network cards.
@@ -1221,20 +1221,20 @@ features.
Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
negotiated.[15]
-8. The converse features are also available: a driver can save
+8. The converse features are also available: a driver can save
the virtual device some work by negotiating these features.[16]
- The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
- checksummed packets can be received, and if it can do that then
- the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_UFO and VIRTIO_NET_F_GUEST_ECN are the input
- equivalents of the features described above. See “Receiving
+ The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
+ checksummed packets can be received, and if it can do that then
+ the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+ VIRTIO_NET_F_GUEST_UFO and VIRTIO_NET_F_GUEST_ECN are the input
+ equivalents of the features described above. See “Receiving
Packets” below.
2.5.1.5. Device Operation
------------------------
-Packets are transmitted by placing them in the transmitq, and
-buffers for incoming packets are placed in the receiveq. In each
+Packets are transmitted by placing them in the transmitq, and
+buffers for incoming packets are placed in the receiveq. In each
case, the packet itself is preceded by a header:
struct virtio_net_hdr {
@@ -1254,18 +1254,18 @@ case, the packet itself is preceeded by a header:
u16 num_buffers;
};
-The controlq is used to control device features such as
+The controlq is used to control device features such as
filtering.
2.5.1.5.1. Packet Transmission
-----------------------------
-Transmitting a single packet is simple, but varies depending on
+Transmitting a single packet is simple, but varies depending on
the different features the driver negotiated.
-1. If the driver negotiated VIRTIO_NET_F_CSUM, and the packet has
- not been fully checksummed, then the virtio_net_hdr's fields
- are set as follows. Otherwise, the packet must be fully
+1. If the driver negotiated VIRTIO_NET_F_CSUM, and the packet has
+ not been fully checksummed, then the virtio_net_hdr's fields
+ are set as follows. Otherwise, the packet must be fully
checksummed, and flags is zero.
• flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set,
@@ -1273,30 +1273,30 @@ the different features the driver negotiated.
• csum_start is set to the offset within the packet to begin checksumming,
and
- • csum_offset indicates how many bytes after the csum_start the
+ • csum_offset indicates how many bytes after the csum_start the
new (16 bit ones' complement) checksum should be placed.[17]
-2. If the driver negotiated
- VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
- TCP segmentation or UDP fragmentation, then the “gso_type”
- field is set to VIRTIO_NET_HDR_GSO_TCPV4, TCPV6 or UDP.
- (Otherwise, it is set to VIRTIO_NET_HDR_GSO_NONE). In this
- case, packets larger than 1514 bytes can be transmitted: the
- metadata indicates how to replicate the packet header to cut it
+2. If the driver negotiated
+ VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
+ TCP segmentation or UDP fragmentation, then the “gso_type”
+ field is set to VIRTIO_NET_HDR_GSO_TCPV4, TCPV6 or UDP.
+ (Otherwise, it is set to VIRTIO_NET_HDR_GSO_NONE). In this
+ case, packets larger than 1514 bytes can be transmitted: the
+ metadata indicates how to replicate the packet header to cut it
into smaller packets. The other gso fields are set:
- • hdr_len is a hint to the device as to how much of the header
- needs to be kept to copy into each packet, usually set to the
+ • hdr_len is a hint to the device as to how much of the header
+ needs to be kept to copy into each packet, usually set to the
length of the headers, including the transport header.[18]
- • gso_size is the maximum size of each packet beyond that
+ • gso_size is the maximum size of each packet beyond that
header (ie. MSS).
- • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
- the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
+ • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
+ the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
well, indicating that the TCP packet has the ECN bit set.[19]
-3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
+3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
the num_buffers field is set to zero.
4. The header and packet are added as one output buffer to the
@@ -1309,71 +1309,71 @@ the different features the driver negotiated.
Often a driver will suppress transmission interrupts using the
VRING_AVAIL_F_NO_INTERRUPT flag (see "2.4.2. Receiving Used Buffers From
The Device") and check for used packets in the transmit path of following
-packets. However, it will still receive interrupts if the
-VIRTIO_F_NOTIFY_ON_EMPTY feature is negotiated, indicating that
+packets. However, it will still receive interrupts if the
+VIRTIO_F_NOTIFY_ON_EMPTY feature is negotiated, indicating that
the transmission queue is completely emptied.
-The normal behavior in this interrupt handler is to retrieve and
-new descriptors from the used ring and free the corresponding
+The normal behavior in this interrupt handler is to retrieve any
+new descriptors from the used ring and free the corresponding
headers and packets.
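
As an illustration of step 1 of the transmission procedure above, the header for a packet that still needs its checksum could be filled in as sketched below. The structure definition is only partly visible in this excerpt, so the field layout of struct virtio_net_hdr_sketch and the values of the two constants are assumptions made for this example. fill_tx_hdr() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; the definitions are not shown in this excerpt. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_GSO_NONE     0

/* Assumed layout, modelled on the header described in the text. */
struct virtio_net_hdr_sketch {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;  /* set to zero on transmit */
};

/* Fill the header for a non-GSO packet, with or without checksum offload. */
static void fill_tx_hdr(struct virtio_net_hdr_sketch *h, int needs_csum,
                        uint16_t csum_start, uint16_t csum_offset)
{
    memset(h, 0, sizeof(*h));             /* flags = 0, num_buffers = 0 */
    h->gso_type = VIRTIO_NET_HDR_GSO_NONE;
    if (needs_csum) {
        h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
        h->csum_start = csum_start;       /* where checksumming begins */
        h->csum_offset = csum_offset;     /* where to store the result */
    }
}
```

For a TCP segment over IPv4 with no IP options behind a 14 byte ethernet header, csum_start would be 34 (14 + 20) and csum_offset would be 16, the position of the checksum field within the TCP header.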
2.5.1.5.2. Setting Up Receive Buffers
-It is generally a good idea to keep the receive virtqueue as
-fully populated as possible: if it runs out, network performance
+It is generally a good idea to keep the receive virtqueue as
+fully populated as possible: if it runs out, network performance
will suffer.
-If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
-VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
-accept packets of up to 65550 bytes long (the maximum size of a
-TCP or UDP packet, plus the 14 byte ethernet header), otherwise
-1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
+If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
+VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
+accept packets of up to 65550 bytes long (the maximum size of a
+TCP or UDP packet, plus the 14 byte ethernet header), otherwise
+1514 bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
buffer in the receive queue needs to be at least this length.[20a]
-If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
+If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
least the size of the struct virtio_net_hdr.
2.5.1.5.2.1. Packet Receive Interrupt
------------------------------------
-When a packet is copied into a buffer in the receiveq, the
-optimal path is to disable further interrupts for the receiveq
-(see [sub:Receiving-Used-Buffers]) and process packets until no
+When a packet is copied into a buffer in the receiveq, the
+optimal path is to disable further interrupts for the receiveq
+(see [sub:Receiving-Used-Buffers]) and process packets until no
more are found, then re-enable them.
Processing a packet involves:
-1. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
- then the “num_buffers” field indicates how many descriptors
- this packet is spread over (including this one). This allows
- receipt of large packets without having to allocate large
- buffers. In this case, there will be at least “num_buffers” in
- the used ring, and they should be chained together to form a
- single packet. The other buffers will not begin with a struct
+1. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
+ then the “num_buffers” field indicates how many descriptors
+ this packet is spread over (including this one). This allows
+ receipt of large packets without having to allocate large
+ buffers. In this case, there will be at least “num_buffers” in
+ the used ring, and they should be chained together to form a
+ single packet. The other buffers will not begin with a struct
virtio_net_hdr.
-2. If the VIRTIO_NET_F_MRG_RXBUF feature was not negotiated, or
- the “num_buffers” field is one, then the entire packet will be
- contained within this buffer, immediately following the struct
+2. If the VIRTIO_NET_F_MRG_RXBUF feature was not negotiated, or
+ the “num_buffers” field is one, then the entire packet will be
+ contained within this buffer, immediately following the struct
virtio_net_hdr.
-3. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
- VIRTIO_NET_HDR_F_NEEDS_CSUM bit in the “flags” field may be
+3. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
+ VIRTIO_NET_HDR_F_NEEDS_CSUM bit in the “flags” field may be
set: if so, the checksum on the packet is incomplete and the “
- csum_start” and “csum_offset” fields indicate how to calculate
+ csum_start” and “csum_offset” fields indicate how to calculate
it (see Packet Transmission point 1).
-4. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
- negotiated, then the “gso_type” may be something other than
- VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
+4. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
+ negotiated, then the “gso_type” may be something other than
+ VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
desired MSS (see Packet Transmission point 2).
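The four checks above can be sketched as predicates over the
received header. This is a hedged illustration, not driver code: the
struct rx_hdr below is a local stand-in for struct virtio_net_hdr,
the helper names are hypothetical, and the two constant values are
assumed from the header definition given elsewhere in this
specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the definitions are not part of this section. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_GSO_NONE     0

/* Illustrative stand-in for the receive header.  num_buffers is
 * only meaningful when VIRTIO_NET_F_MRG_RXBUF was negotiated. */
struct rx_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

/* Points 1 and 2: number of used-ring buffers forming this packet. */
static unsigned rx_buffer_count(const struct rx_hdr *h, bool mrg_rxbuf)
{
    return mrg_rxbuf ? h->num_buffers : 1u;
}

/* Point 3: the checksum still has to be completed by the guest. */
static bool rx_csum_incomplete(const struct rx_hdr *h, bool guest_csum)
{
    return guest_csum && (h->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) != 0;
}

/* Point 4: the packet is a GSO packet and gso_size carries the MSS. */
static bool rx_is_gso(const struct rx_hdr *h, bool guest_gso)
{
    return guest_gso && h->gso_type != VIRTIO_NET_HDR_GSO_NONE;
}
```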
2.5.1.5.3. Control Virtqueue
---------------------------
-The driver uses the control virtqueue (if VIRTIO_NET_F_VTRL_VQ is
-negotiated) to send commands to manipulate various features of
-the device which would not easily map into the configuration
+The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
+negotiated) to send commands to manipulate various features of
+the device which would not easily map into the configuration
space.
All commands are of the following form:
@@ -1387,33 +1387,33 @@ All commands are of the following form:
/* ack values */
#define VIRTIO_NET_OK 0
- #define VIRTIO_NET_ERR 1
+ #define VIRTIO_NET_ERR 1
-The class, command and command-specific-data are set by the
-driver, and the device sets the ack byte. There is little it can
-do except issue a diagnostic if the ack byte is not
+The class, command and command-specific-data are set by the
+driver, and the device sets the ack byte. There is little the driver
+can do except issue a diagnostic if the ack byte is not
VIRTIO_NET_OK.
2.5.1.5.3.1. Packet Receive Filtering
------------------------------------
-If the VIRTIO_NET_F_CTRL_RX feature is negotiated, the driver can
-send control commands for promiscuous mode, multicast receiving,
+If the VIRTIO_NET_F_CTRL_RX feature is negotiated, the driver can
+send control commands for promiscuous mode, multicast receiving,
and filtering of MAC addresses.
-Note that in general, these commands are best-effort: unwanted
-packets may still arrive.
+Note that in general, these commands are best-effort: unwanted
+packets may still arrive.
Setting Promiscuous Mode
#define VIRTIO_NET_CTRL_RX 0
#define VIRTIO_NET_CTRL_RX_PROMISC 0
- #define VIRTIO_NET_CTRL_RX_ALLMULTI 1
+ #define VIRTIO_NET_CTRL_RX_ALLMULTI 1
-The class VIRTIO_NET_CTRL_RX has two commands:
-VIRTIO_NET_CTRL_RX_PROMISC turns promiscuous mode on and off, and
-VIRTIO_NET_CTRL_RX_ALLMULTI turns all-multicast receive on and
-off. The command-specific-data is one byte containing 0 (off) or
+The class VIRTIO_NET_CTRL_RX has two commands:
+VIRTIO_NET_CTRL_RX_PROMISC turns promiscuous mode on and off, and
+VIRTIO_NET_CTRL_RX_ALLMULTI turns all-multicast receive on and
+off. The command-specific-data is one byte containing 0 (off) or
1 (on).
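As a rough sketch, the driver-written part of a VIRTIO_NET_CTRL_RX
command is three bytes: class, command, and the one-byte on/off
value. The helper names below are hypothetical, and the sketch
assumes the ack byte is supplied in a separate device-writable
buffer, which this section does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_CTRL_RX          0
#define VIRTIO_NET_CTRL_RX_PROMISC  0
#define VIRTIO_NET_CTRL_RX_ALLMULTI 1

#define VIRTIO_NET_OK  0
#define VIRTIO_NET_ERR 1

/* Bytes the driver fills in: class, command, one data byte. */
#define CTRL_RX_OUT_LEN 3

/* Hypothetical helper: write the driver-set bytes into out[],
 * which must hold at least CTRL_RX_OUT_LEN bytes. */
static size_t ctrl_rx_encode(uint8_t *out, uint8_t command, int on)
{
    out[0] = VIRTIO_NET_CTRL_RX;  /* class */
    out[1] = command;             /* PROMISC or ALLMULTI */
    out[2] = on ? 1 : 0;          /* command-specific-data */
    return CTRL_RX_OUT_LEN;
}

/* After the device has used the buffer, only VIRTIO_NET_OK means success. */
static int ctrl_ack_ok(uint8_t ack)
{
    return ack == VIRTIO_NET_OK;
}
```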
2.5.1.5.3.2. Setting MAC Address Filtering
@@ -1425,7 +1425,7 @@ off. The command-specific-data is one byte containing 0 (off) or
};
#define VIRTIO_NET_CTRL_MAC 1
- #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
+ #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
The device can filter incoming packets by any number of destination
MAC addresses.[21] This table is set using the class
@@ -1437,45 +1437,45 @@ contains multicast addresses.
2.5.1.5.3.3. VLAN Filtering
--------------------------
-If the driver negotiates the VIRTION_NET_F_CTRL_VLAN feature, it
+If the driver negotiates the VIRTIO_NET_F_CTRL_VLAN feature, it
can control a VLAN filter table in the device.
#define VIRTIO_NET_CTRL_VLAN 2
#define VIRTIO_NET_CTRL_VLAN_ADD 0
- #define VIRTIO_NET_CTRL_VLAN_DEL 1
+ #define VIRTIO_NET_CTRL_VLAN_DEL 1
-Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
+Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
commands take a 16-bit VLAN id as the command-specific-data.
2.5.1.5.3.4. Gratuitous Packet Sending
-------------------------------------
-If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
-on VIRTIO_NET_F_CTRL_VQ), it can ask the guest to send gratuitous
-packets; this is usually done after the guest has been physically
-migrated, and needs to announce its presence on the new network
-links. (As hypervisor does not have the knowledge of guest
-network configuration (eg. tagged vlan) it is simplest to prod
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE feature (which
+depends on VIRTIO_NET_F_CTRL_VQ), the device can ask the guest to send
+gratuitous packets; this is usually done after the guest has been
+physically migrated and needs to announce its presence on the new
+network links. (As the hypervisor does not know the guest's network
+configuration (e.g. tagged VLANs), it is simplest to prod
the guest in this way.)
#define VIRTIO_NET_CTRL_ANNOUNCE 3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
-The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status
-field when it notices the changes of device configuration. The
-command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
-driver has recevied the notification and device would clear the
-VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
+The guest needs to check the VIRTIO_NET_S_ANNOUNCE bit in the status
+field when it notices a device configuration change. The
+command VIRTIO_NET_CTRL_ANNOUNCE_ACK indicates that the
+driver has received the notification; the device clears the
+VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
this command.
Processing this notification involves:
-1. Sending the gratuitous packets or marking there are pending
- gratuitous packets to be sent and letting deferred routine to
+1. Sending the gratuitous packets, or marking that gratuitous
+ packets are pending and letting a deferred routine
send them.
-2. Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control
- vq.
+2. Sending the VIRTIO_NET_CTRL_ANNOUNCE_ACK command through the
+ control vq.
2.5.1.5.3.5. Offloads State Configuration
-------------------------------------
@@ -1514,9 +1514,9 @@ change of specific offload state.
2.5.2. Block Device
==================
-The virtio block device is a simple virtual block device (ie.
-disk). Read and write requests (and other exotic requests) are
-placed in the queue, and serviced (probably out of order) by the
+The virtio block device is a simple virtual block device (ie.
+disk). Read and write requests (and other exotic requests) are
+placed in the queue, and serviced (probably out of order) by the
device except where noted.
2.5.2.1. Device ID
@@ -1532,10 +1532,10 @@ device except where noted.
VIRTIO_BLK_F_BARRIER (0) Host supports request barriers.
- VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is
+ VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is
in “size_max”.
- VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a
+ VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a
request is in “seg_max”.
VIRTIO_BLK_F_GEOMETRY (4) Disk-style geometry specified in “
@@ -1549,9 +1549,9 @@ device except where noted.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
- Device configuration layout The capacity of the device
- (expressed in 512-byte sectors) is always present. The
- availability of the others all depend on various feature bits
+ Device configuration layout The capacity of the device
+ (expressed in 512-byte sectors) is always present. The
+ availability of the others all depend on various feature bits
as indicated above.
struct virtio_blk_config {
@@ -1569,23 +1569,23 @@ device except where noted.
2.5.2.4. Device Initialization
-----------------------------
-1. The device size should be read from the “capacity”
- configuration field. No requests should be submitted which goes
+1. The device size should be read from the “capacity”
+ configuration field. No request should be submitted that goes
beyond this limit.
-2. If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the
- blk_size field can be read to determine the optimal sector size
- for the driver to use. This does not effect the units used in
- the protocol (always 512 bytes), but awareness of the correct
+2. If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the
+ blk_size field can be read to determine the optimal sector size
+ for the driver to use. This does not affect the units used in
+ the protocol (always 512 bytes), but awareness of the correct
value can affect performance.
-3. If the VIRTIO_BLK_F_RO feature is set by the device, any write
+3. If the VIRTIO_BLK_F_RO feature is set by the device, any write
requests will fail.
2.5.2.5. Device Operation
------------------------
-The driver queues requests to the virtqueue, and they are used by
+The driver queues requests to the virtqueue, and they are used by
the device (not necessarily in order). Each request is of form:
struct virtio_blk_req {
@@ -1596,7 +1596,7 @@ the device (not necessarily in order). Each request is of form:
u8 status;
};
-If the device has VIRTIO_BLK_F_SCSI feature, it can also support
+If the device has the VIRTIO_BLK_F_SCSI feature, it can also support
scsi packet command requests; each of these requests is of the form:
struct virtio_scsi_pc_req {
@@ -1634,71 +1634,71 @@ flush the host cache.
#define VIRTIO_BLK_T_FLUSH_OUT 5
#define VIRTIO_BLK_T_BARRIER 0x80000000
-The ioprio field is a hint about the relative priorities of
-requests to the device: higher numbers indicate more important
+The ioprio field is a hint about the relative priorities of
+requests to the device: higher numbers indicate more important
requests.
-The sector number indicates the offset (multiplied by 512) where
-the read or write is to occur. This field is unused and set to 0
+The sector number indicates the offset (multiplied by 512) where
+the read or write is to occur. This field is unused and set to 0
for scsi packet commands and for flush commands.
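The sector arithmetic can be sketched as follows. The helper names
are hypothetical; the range check reflects the initialization rule
that no request may go beyond the “capacity” field, with the request
length expressed in 512-byte sectors as an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLK_SECTOR_SIZE 512u  /* protocol unit, independent of blk_size */

/* Byte offset at which a read or write starts.  The caller is
 * assumed to have range-checked the sector so this cannot overflow. */
static uint64_t blk_sector_to_offset(uint64_t sector)
{
    return sector * BLK_SECTOR_SIZE;
}

/* True if [sector, sector + nsectors) lies within a device of
 * `capacity` sectors.  Written to avoid overflow in the addition. */
static bool blk_request_in_range(uint64_t capacity, uint64_t sector,
                                 uint64_t nsectors)
{
    return sector <= capacity && nsectors <= capacity - sector;
}
```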
-The cmd field is only present for scsi packet command requests,
-and indicates the command to perform. This field must reside in a
-single, separate read-only buffer; command length can be derived
-from the length of this buffer.
+The cmd field is only present for scsi packet command requests,
+and indicates the command to perform. This field must reside in a
+single, separate read-only buffer; command length can be derived
+from the length of this buffer.
-Note that these first three (four for scsi packet commands)
-fields are always read-only: the data field is either read-only
-or write-only, depending on the request. The size of the read or
+Note that these first three (four for scsi packet commands)
+fields are always read-only: the data field is either read-only
+or write-only, depending on the request. The size of the read or
write can be derived from the total size of the request buffers.
-The sense field is only present for scsi packet command requests,
+The sense field is only present for scsi packet command requests,
and indicates the buffer for scsi sense data.
-The data_len field is only present for scsi packet command
-requests, this field is deprecated, and should be ignored by the
+The data_len field is only present for scsi packet command
+requests; this field is deprecated and should be ignored by the
driver. Historically, devices copied data length there.
-The sense_len field is only present for scsi packet command
-requests and indicates the number of bytes actually written to
+The sense_len field is only present for scsi packet command
+requests and indicates the number of bytes actually written to
the sense buffer.
-The residual field is only present for scsi packet command
-requests and indicates the residual size, calculated as data
+The residual field is only present for scsi packet command
+requests and indicates the residual size, calculated as data
length - number of bytes actually transferred.
-The final status byte is written by the device: either
-VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for host or guest
+The final status byte is written by the device: either
+VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for host or guest
error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
#define VIRTIO_BLK_S_OK 0
#define VIRTIO_BLK_S_IOERR 1
#define VIRTIO_BLK_S_UNSUPP 2
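A driver might turn the status byte into a diagnostic along these
lines. The function name and the message strings are hypothetical;
only the three status values come from the definitions above.

```c
#include <stdint.h>

#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_IOERR  1
#define VIRTIO_BLK_S_UNSUPP 2

/* Hypothetical helper: describe the final status byte of a request. */
static const char *blk_status_str(uint8_t status)
{
    switch (status) {
    case VIRTIO_BLK_S_OK:     return "ok";
    case VIRTIO_BLK_S_IOERR:  return "host or guest error";
    case VIRTIO_BLK_S_UNSUPP: return "request unsupported by host";
    default:                  return "unknown status";
    }
}
```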
-Historically, devices assumed that the fields type, ioprio and
-sector reside in a single, separate read-only buffer; the fields
-errors, data_len, sense_len and residual reside in a single,
-separate write-only buffer; the sense field in a separate
-write-only buffer of size 96 bytes, by itself; the fields errors,
-data_len, sense_len and residual in a single write-only buffer;
-and the status field is a separate read-only buffer of size 1
+Historically, devices assumed that the fields type, ioprio and
+sector reside in a single, separate read-only buffer; the sense
+field in a separate write-only buffer of size 96 bytes, by itself;
+the fields errors, data_len, sense_len and residual in a single,
+separate write-only buffer; and the status field in a separate
+write-only buffer of size 1
byte, by itself.
2.5.3. Console Device
====================
-The virtio console device is a simple device for data input and
-output. A device may have one or more ports. Each port has a pair
-of input and output virtqueues. Moreover, a device has a pair of
-control IO virtqueues. The control virtqueues are used to
-communicate information between the device and the driver about
-ports being opened and closed on either side of the connection,
-indication from the host about whether a particular port is a
-console port, adding new ports, port hot-plug/unplug, etc., and
-indication from the guest about whether a port or a device was
-successfully added, port open/close, etc.. For data IO, one or
-more empty buffers are placed in the receive queue for incoming
+The virtio console device is a simple device for data input and
+output. A device may have one or more ports. Each port has a pair
+of input and output virtqueues. Moreover, a device has a pair of
+control IO virtqueues. The control virtqueues are used to
+communicate information between the device and the driver about
+ports being opened and closed on either side of the connection,
+indication from the host about whether a particular port is a
+console port, adding new ports, port hot-plug/unplug, etc., and
+indication from the guest about whether a port or a device was
+successfully added, port open/close, etc. For data IO, one or
+more empty buffers are placed in the receive queue for incoming
data and outgoing characters are placed in the transmit queue.
2.5.3.1. Device ID
@@ -1709,7 +1709,7 @@ data and outgoing characters are placed in the transmit queue.
2.5.3.2. Virtqueues
------------------
- 0:receiveq(port0). 1:transmitq(port0), 2:control receiveq, 3:control transmitq, 4:receiveq(port1), 5:transmitq(port1),
+ 0:receiveq(port0), 1:transmitq(port0), 2:control receiveq, 3:control transmitq, 4:receiveq(port1), 5:transmitq(port1),
...
Ports 2 onwards only exist if VIRTIO_CONSOLE_F_MULTIPORT is set.
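The queue numbering above can be expressed as a mapping from port
number to virtqueue index. This is a sketch with hypothetical helper
names; it assumes the pattern shown for port0 and port1 simply
continues for higher ports, which the "..." suggests but does not
state.

```c
/* Port 0 uses queues 0 and 1, queues 2 and 3 are the control
 * queues, and each further port takes the next pair. */
static unsigned console_rx_queue(unsigned port)
{
    return port == 0 ? 0u : 2u * (port + 1u);
}

static unsigned console_tx_queue(unsigned port)
{
    return console_rx_queue(port) + 1u;
}
```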
@@ -1717,20 +1717,20 @@ data and outgoing characters are placed in the transmit queue.
2.5.3.3. Feature bits
--------------------
- VIRTIO_CONSOLE_F_SIZE (0) Configuration cols and rows fields
+ VIRTIO_CONSOLE_F_SIZE (0) Configuration cols and rows fields
are valid.
- VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple
- ports; configuration fields nr_ports and max_nr_ports are
+ VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple
+ ports; configuration fields nr_ports and max_nr_ports are
valid and control virtqueues will be used.
2.5.3.4. Device configuration layout
-----------------------------------
- The size of the console is supplied
- in the configuration space if the VIRTIO_CONSOLE_F_SIZE feature
- is set. Furthermore, if the VIRTIO_CONSOLE_F_MULTIPORT feature
- is set, the maximum number of ports supported by the device can
+ The size of the console is supplied
+ in the configuration space if the VIRTIO_CONSOLE_F_SIZE feature
+ is set. Furthermore, if the VIRTIO_CONSOLE_F_MULTIPORT feature
+ is set, the maximum number of ports supported by the device can
be fetched.
struct virtio_console_config {
@@ -1742,52 +1742,52 @@ data and outgoing characters are placed in the transmit queue.
2.5.3.5. Device Initialization
-----------------------------
-1. If the VIRTIO_CONSOLE_F_SIZE feature is negotiated, the driver
+1. If the VIRTIO_CONSOLE_F_SIZE feature is negotiated, the driver
can read the console dimensions from the configuration fields.
-2. If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the
- driver can spawn multiple ports, not all of which may be
- attached to a console. Some could be generic ports. In this
- case, the control virtqueues are enabled and according to the
- max_nr_ports configuration-space value, the appropriate number
- of virtqueues are created. A control message indicating the
- driver is ready is sent to the host. The host can then send
- control messages for adding new ports to the device. After
- creating and initializing each port, a
- VIRTIO_CONSOLE_PORT_READY control message is sent to the host
- for that port so the host can let us know of any additional
+2. If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the
+ driver can spawn multiple ports, not all of which may be
+ attached to a console. Some could be generic ports. In this
+ case, the control virtqueues are enabled and according to the
+ max_nr_ports configuration-space value, the appropriate number
+ of virtqueues are created. A control message indicating the
+ driver is ready is sent to the host. The host can then send
+ control messages for adding new ports to the device. After
+ creating and initializing each port, a
+ VIRTIO_CONSOLE_PORT_READY control message is sent to the host
+ for that port so the host can let us know of any additional
configuration options set for that port.
-3. The receiveq for each port is populated with one or more
+3. The receiveq for each port is populated with one or more
receive buffers.
2.5.3.6. Device Operation
------------------------
-1. For output, a buffer containing the characters is placed in
+1. For output, a buffer containing the characters is placed in
the port's transmitq.[25]
-2. When a buffer is used in the receiveq (signalled by an
- interrupt), the contents is the input to the port associated
+2. When a buffer is used in the receiveq (signalled by an
+ interrupt), the contents are the input to the port associated
with the virtqueue for which the notification was received.
-3. If the driver negotiated the VIRTIO_CONSOLE_F_SIZE feature, a
- configuration change interrupt may occur. The updated size can
+3. If the driver negotiated the VIRTIO_CONSOLE_F_SIZE feature, a
+ configuration change interrupt may occur. The updated size can
be read from the configuration fields.
-4. If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT
- feature, active ports are announced by the host using the
- VIRTIO_CONSOLE_PORT_ADD control message. The same message is
+4. If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT
+ feature, active ports are announced by the host using the
+ VIRTIO_CONSOLE_PORT_ADD control message. The same message is
used for port hot-plug as well.
-5. If the host specified a port `name', a sysfs attribute is
- created with the name filled in, so that udev rules can be
- written that can create a symlink from the port's name to the
+5. If the host specified a port `name', a sysfs attribute is
+ created with the name filled in, so that udev rules can be
+ written that can create a symlink from the port's name to the
char device for port discovery by applications in the guest.
-6. Changes to ports' state are effected by control messages.
- Appropriate action is taken on the port indicated in the
- control message. The layout of the structure of the control
+6. Changes to ports' state are effected by control messages.
+ Appropriate action is taken on the port indicated in the
+ control message. The layout of the structure of the control
buffer and the events associated are:
struct virtio_console_control {
@@ -1809,7 +1809,7 @@ data and outgoing characters are placed in the transmit queue.
2.5.4. Entropy Device
====================
-The virtio entropy device supplies high-quality randomness for
+The virtio entropy device supplies high-quality randomness for
guest use.
2.5.4.1. Device ID
@@ -1836,19 +1836,19 @@ guest use.
2.5.4.6. Device Operation
------------------------
-When the driver requires random bytes, it places the descriptor
-of one or more buffers in the queue. It will be completely filled
+When the driver requires random bytes, it places the descriptor
+of one or more buffers in the queue. The device will completely fill
them with random data.
2.5.5. Memory Balloon Device
===========================
-The virtio memory balloon device is a primitive device for
-managing guest memory: the device asks for a certain amount of
-memory, and the guest supplies it (or withdraws it, if the device
-has more than it asks for). This allows the guest to adapt to
-changes in allowance of underlying physical memory. If the
-feature is negotiated, the device can also be used to communicate
+The virtio memory balloon device is a primitive device for
+managing guest memory: the device asks for a certain amount of
+memory, and the guest supplies it (or withdraws it, if the device
+has more than it asks for). This allows the guest to adapt to
+changes in allowance of underlying physical memory. If the
+feature is negotiated, the device can also be used to communicate
guest memory statistics to the host.
2.5.5.1. Device ID
@@ -1863,16 +1863,16 @@ guest memory statistics to the host.
2.5.5.3. Feature bits
--------------------
- VIRTIO_BALLOON_F_MUST_TELL_HOST (0) Host must be told before
+ VIRTIO_BALLOON_F_MUST_TELL_HOST (0) Host must be told before
pages from the balloon are used.
- VIRTIO_BALLOON_F_STATS_VQ (1) A virtqueue for reporting guest
+ VIRTIO_BALLOON_F_STATS_VQ (1) A virtqueue for reporting guest
memory statistics is present.
2.5.5.4. Device configuration layout
-----------------------------------
- Both fields of this configuration
- are always available. Note that they are little endian, despite
+ Both fields of this configuration
+ are always available. Note that they are little endian, despite
convention that device fields are guest endian:
struct virtio_balloon_config {
@@ -1889,7 +1889,7 @@ guest memory statistics to the host.
(a) Identify the stats virtqueue.
- (b) Add one empty buffer to the stats virtqueue and notify the
+ (b) Add one empty buffer to the stats virtqueue and notify the
host.
Device operation begins immediately.
@@ -1897,13 +1897,13 @@ Device operation begins immediately.
2.5.5.6. Device Operation
------------------------
-Memory Ballooning The device is driven by the receipt of a
+Memory Ballooning The device is driven by the receipt of a
configuration change interrupt.
-1. The “num_pages” configuration field is examined. If this is
- greater than the “actual” number of pages, memory must be given
- to the balloon. If it is less than the “actual” number of
- pages, memory may be taken back from the balloon for general
+1. The “num_pages” configuration field is examined. If this is
+ greater than the “actual” number of pages, memory must be given
+ to the balloon. If it is less than the “actual” number of
+ pages, memory may be taken back from the balloon for general
use.
2. To supply memory to the balloon (aka. inflate):
@@ -1914,49 +1914,49 @@ configuration change interrupt.
3. To remove memory from the balloon (aka. deflate):
- (a) The driver constructs an array of addresses of memory pages
- it has previously given to the balloon, as described above.
+ (a) The driver constructs an array of addresses of memory pages
+ it has previously given to the balloon, as described above.
This descriptor is added to the deflateq.
- (b) If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
- guest may not use these requested pages until that descriptor
+ (b) If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
+ guest may not use these requested pages until that descriptor
in the deflateq has been used by the device.
- (c) Otherwise, the guest may begin to re-use pages previously
- given to the balloon before the device has acknowledged their
- withdrawl. [28]
+ (c) Otherwise, the guest may begin to re-use pages previously
+ given to the balloon before the device has acknowledged their
+ withdrawal. [28]
-4. In either case, once the device has completed the inflation or
- deflation, the “actual” field of the configuration should be
+4. In either case, once the device has completed the inflation or
+ deflation, the “actual” field of the configuration should be
updated to reflect the new number of pages in the balloon.[29]
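The decision in step 1 can be sketched as a signed difference. The
function name is hypothetical, and the sketch assumes both
configuration fields are 32-bit values that have already been
converted from little endian to host byte order.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pages the driver still has to move.
 *   > 0  give this many pages to the balloon (inflate)
 *   < 0  this many pages may be taken back (deflate)
 *   = 0  nothing to do
 */
static int64_t balloon_pages_delta(uint32_t num_pages, uint32_t actual)
{
    return (int64_t)num_pages - (int64_t)actual;
}
```

Widening to 64 bits before subtracting keeps the sign correct even
when the two fields are far apart.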
2.5.5.6.1. Memory Statistics
---------------------------
-The stats virtqueue is atypical because communication is driven
-by the device (not the driver). The channel becomes active at
-driver initialization time when the driver adds an empty buffer
-and notifies the device. A request for memory statistics proceeds
+The stats virtqueue is atypical because communication is driven
+by the device (not the driver). The channel becomes active at
+driver initialization time when the driver adds an empty buffer
+and notifies the device. A request for memory statistics proceeds
as follows:
-1. The device pushes the buffer onto the used ring and sends an
+1. The device pushes the buffer onto the used ring and sends an
interrupt.
2. The driver pops the used buffer and discards it.
-3. The driver collects memory statistics and writes them into a
+3. The driver collects memory statistics and writes them into a
new buffer.
-4. The driver adds the buffer to the virtqueue and notifies the
+4. The driver adds the buffer to the virtqueue and notifies the
device.
-5. The device pops the buffer (retaining it to initiate a
+5. The device pops the buffer (retaining it to initiate a
subsequent request) and consumes the statistics.
- Memory Statistics Format Each statistic consists of a 16 bit
- tag and a 64 bit value. Both quantities are represented in the
- native endian of the guest. All statistics are optional and the
- driver may choose which ones to supply. To guarantee backwards
+ Memory Statistics Format Each statistic consists of a 16 bit
+ tag and a 64 bit value. Both quantities are represented in the
+ native endian of the guest. All statistics are optional and the
+ driver may choose which ones to supply. To guarantee backwards
compatibility, unsupported statistics should be omitted.
struct virtio_balloon_stat {
@@ -1973,46 +1973,46 @@ as follows:
2.5.5.6.2. Memory Statistics Tags
--------------------------------
- VIRTIO_BALLOON_S_SWAP_IN The amount of memory that has been
+ VIRTIO_BALLOON_S_SWAP_IN The amount of memory that has been
swapped in (in bytes).
- VIRTIO_BALLOON_S_SWAP_OUT The amount of memory that has been
+ VIRTIO_BALLOON_S_SWAP_OUT The amount of memory that has been
swapped out to disk (in bytes).
- VIRTIO_BALLOON_S_MAJFLT The number of major page faults that
+ VIRTIO_BALLOON_S_MAJFLT The number of major page faults that
have occurred.
- VIRTIO_BALLOON_S_MINFLT The number of minor page faults that
+ VIRTIO_BALLOON_S_MINFLT The number of minor page faults that
have occurred.
- VIRTIO_BALLOON_S_MEMFREE The amount of memory not being used
+ VIRTIO_BALLOON_S_MEMFREE The amount of memory not being used
for any purpose (in bytes).
- VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
+ VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
(in bytes).
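Filling a statistics buffer might look like the sketch below. The
helper name is hypothetical. The tag values and the unpadded 10-byte
entry (16-bit tag followed by 64-bit value) are assumptions, since
the struct and tag definitions are not shown in this section; the
values are stored in guest-native byte order as the text requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed tag values; the definitions are not part of this section. */
#define VIRTIO_BALLOON_S_SWAP_IN  0
#define VIRTIO_BALLOON_S_SWAP_OUT 1
#define VIRTIO_BALLOON_S_MAJFLT   2
#define VIRTIO_BALLOON_S_MINFLT   3
#define VIRTIO_BALLOON_S_MEMFREE  4
#define VIRTIO_BALLOON_S_MEMTOT   5

#define BALLOON_STAT_LEN 10  /* 16-bit tag + 64-bit value, no padding */

/*
 * Append one statistic at offset `off` in a buffer of `cap` bytes.
 * Returns the new offset, or `off` unchanged if the entry does not fit.
 */
static size_t balloon_stat_put(uint8_t *buf, size_t off, size_t cap,
                               uint16_t tag, uint64_t val)
{
    if (off > cap || cap - off < BALLOON_STAT_LEN)
        return off;
    memcpy(buf + off, &tag, sizeof tag);               /* native endian */
    memcpy(buf + off + sizeof tag, &val, sizeof val);  /* native endian */
    return off + BALLOON_STAT_LEN;
}
```

Statistics the driver cannot supply are simply never appended, which
matches the rule that unsupported statistics are omitted.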
2.5.6. SCSI Host Device
======================
-The virtio SCSI host device groups together one or more virtual
-logical units (such as disks), and allows communicating to them
-using the SCSI protocol. An instance of the device represents a
+The virtio SCSI host device groups together one or more virtual
+logical units (such as disks), and allows communicating to them
+using the SCSI protocol. An instance of the device represents a
SCSI host to which many targets and LUNs are attached.
The virtio SCSI device services two kinds of requests:
• command requests for a logical unit;
-• task management functions related to a logical unit, target or
+• task management functions related to a logical unit, target or
command.
-The device is also able to send out notifications about added and
-removed logical units. Together, these capabilities provide a
-SCSI transport protocol that uses virtqueues as the transfer
-medium. In the transport protocol, the virtio driver acts as the
-initiator, while the virtio SCSI host provides one or more
-targets that receive and process the requests.
+The device is also able to send out notifications about added and
+removed logical units. Together, these capabilities provide a
+SCSI transport protocol that uses virtqueues as the transfer
+medium. In the transport protocol, the virtio driver acts as the
+initiator, while the virtio SCSI host provides one or more
+targets that receive and process the requests.
2.5.6.1. Device ID
-----------------
@@ -2025,10 +2025,10 @@ targets that receive and process the requests.
2.5.6.3. Feature bits
--------------------
- VIRTIO_SCSI_F_INOUT (0) A single request can include both
+ VIRTIO_SCSI_F_INOUT (0) A single request can include both
read-only and write-only data buffers.
- VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
+ VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
2.5.6.4. Device configuration layout
@@ -2050,54 +2050,54 @@ targets that receive and process the requests.
u32 max_lun;
};
- num_queues is the total number of request virtqueues exposed by
- the device. The driver is free to use only one request queue,
+ num_queues is the total number of request virtqueues exposed by
+ the device. The driver is free to use only one request queue,
or it can use more to achieve better performance.
- seg_max is the maximum number of segments that can be in a
- command. A bidirectional command can include seg_max input
+ seg_max is the maximum number of segments that can be in a
+ command. A bidirectional command can include seg_max input
segments and seg_max output segments.
- max_sectors is a hint to the guest about the maximum transfer
+ max_sectors is a hint to the guest about the maximum transfer
size it should use.
- cmd_per_lun is a hint to the guest about the maximum number of
- linked commands it should send to one LUN. The actual value
- to be used is the minimum of cmd_per_lun and the virtqueue
+ cmd_per_lun is a hint to the guest about the maximum number of
+ linked commands it should send to one LUN. The actual value
+ to be used is the minimum of cmd_per_lun and the virtqueue
size.
- event_info_size is the maximum size that the device will fill
- for buffers that the driver places in the eventq. The driver
- should always put buffers at least of this size. It is
- written by the device depending on the set of negotated
+ event_info_size is the maximum size that the device will fill
+ for buffers that the driver places in the eventq. The driver
+ should always put buffers at least of this size. It is
+ written by the device depending on the set of negotiated
features.
- sense_size is the maximum size of the sense data that the
- device will write. The default value is written by the device
- and will always be 96, but the driver can modify it. It is
+ sense_size is the maximum size of the sense data that the
+ device will write. The default value is written by the device
+ and will always be 96, but the driver can modify it. It is
restored to the default when the device is reset.
- cdb_size is the maximum size of the CDB that the driver will
- write. The default value is written by the device and will
- always be 32, but the driver can likewise modify it. It is
+ cdb_size is the maximum size of the CDB that the driver will
+ write. The default value is written by the device and will
+ always be 32, but the driver can likewise modify it. It is
restored to the default when the device is reset.
- max_channel, max_target and max_lun can be used by the driver
- as hints to constrain scanning the logical units on the
+ max_channel, max_target and max_lun can be used by the driver
+ as hints to constrain scanning the logical units on the
host.
2.5.6.5. Device Initialization
-----------------------------
-The initialization routine should first of all discover the
+The initialization routine should first of all discover the
device's virtqueues.
-If the driver uses the eventq, it should then place at least a
+If the driver uses the eventq, it should then place at least one
buffer in the eventq.
-The driver can immediately issue requests (for example, INQUIRY
-or REPORT LUNS) or task management functions (for example, I_T
-RESET).
+The driver can immediately issue requests (for example, INQUIRY
+or REPORT LUNS) or task management functions (for example, I_T
+RESET).
2.5.6.6. Device Operation
------------------------
@@ -2108,13 +2108,13 @@ queue and the event queue.
2.5.6.6.1. Device Operation: Request Queues
------------------------------------------
-The driver queues requests to an arbitrary request queue, and
-they are used by the device on that same queue. It is the
-responsibility of the driver to ensure strict request ordering
-for commands placed on different queues, because they will be
+The driver queues requests to an arbitrary request queue, and
+they are used by the device on that same queue. It is the
+responsibility of the driver to ensure strict request ordering
+for commands placed on different queues, because they will be
consumed with no order constraints.
-Requests have the following format:
+Requests have the following format:
struct virtio_scsi_req_cmd {
// Read-only
@@ -2154,84 +2154,84 @@ Requests have the following format:
#define VIRTIO_SCSI_S_HEAD 2
#define VIRTIO_SCSI_S_ACA 3
-The lun field addresses a target and logical unit in the
-virtio-scsi device's SCSI domain. The only supported format for
-the LUN field is: first byte set to 1, second byte set to target,
-third and fourth byte representing a single level LUN structure,
-followed by four zero bytes. With this representation, a
-virtio-scsi device can serve up to 256 targets and 16384 LUNs per
+The lun field addresses a target and logical unit in the
+virtio-scsi device's SCSI domain. The only supported format for
+the LUN field is: first byte set to 1, second byte set to target,
+third and fourth byte representing a single level LUN structure,
+followed by four zero bytes. With this representation, a
+virtio-scsi device can serve up to 256 targets and 16384 LUNs per
target.
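The LUN layout described above can be sketched in C as follows. This is
only an illustration: the function name is hypothetical, and setting the
top two bits of the third byte to 01 (0x40, the SAM flat-space addressing
method) is an assumption that the paragraph above does not spell out; it
is consistent with the stated limit of 16384 (2^14) LUNs per target.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: encode (target, lun) into the 8-byte lun field.
 * Returns 0 on success, -1 if a value does not fit the format.
 */
static int virtio_scsi_encode_lun(uint8_t out[8], unsigned target, unsigned lun)
{
    if (target > 255 || lun > 16383)
        return -1;

    memset(out, 0, 8);                      /* last four bytes stay zero */
    out[0] = 1;                             /* first byte is always 1 */
    out[1] = (uint8_t)target;               /* second byte: target */
    out[2] = (uint8_t)((lun >> 8) | 0x40);  /* assumption: flat addressing */
    out[3] = (uint8_t)(lun & 0xff);         /* low 8 bits of the LUN */
    return 0;
}
```

With these assumptions, target 3 and LUN 0x123 encode to the bytes
01 03 41 23 00 00 00 00.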
The id field is the command identifier (“tag”).
-task_attr, prio and crn should be left to zero. task_attr defines
-the task attribute as in the table above, but all task attributes
-may be mapped to SIMPLE by the device; crn may also be provided
-by clients, but is generally expected to be 0. The maximum CRN
-value defined by the protocol is 255, since CRN is stored in an
+task_attr, prio and crn should be left to zero. task_attr defines
+the task attribute as in the table above, but all task attributes
+may be mapped to SIMPLE by the device; crn may also be provided
+by clients, but is generally expected to be 0. The maximum CRN
+value defined by the protocol is 255, since CRN is stored in an
8-bit integer.
-All of these fields are defined in SAM. They are always
-read-only, as are the cdb and dataout field. The cdb_size is
+All of these fields are defined in SAM. They are always
+read-only, as are the cdb and dataout field. The cdb_size is
taken from the configuration space.
-sense and subsequent fields are always write-only. The sense_len
-field indicates the number of bytes actually written to the sense
-buffer. The residual field indicates the residual size,
-calculated as “data_length - number_of_transferred_bytes”, for
-read or write operations. For bidirectional commands, the
-number_of_transferred_bytes includes both read and written bytes.
-A residual field that is less than the size of datain means that
-the dataout field was processed entirely. A residual field that
-exceeds the size of datain means that the dataout field was
-processed partially and the datain field was not processed at
+sense and subsequent fields are always write-only. The sense_len
+field indicates the number of bytes actually written to the sense
+buffer. The residual field indicates the residual size,
+calculated as “data_length - number_of_transferred_bytes”, for
+read or write operations. For bidirectional commands, the
+number_of_transferred_bytes includes both read and written bytes.
+A residual field that is less than the size of datain means that
+the dataout field was processed entirely. A residual field that
+exceeds the size of datain means that the dataout field was
+processed partially and the datain field was not processed at
all.
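As a rough illustration of the residual rules above, a driver could
recover how much of each buffer was processed. The struct and function
names are hypothetical, and the sketch assumes (as the two rules imply)
that dataout is consumed before datain.

```c
#include <stdint.h>

/* Hypothetical result type: bytes processed in each direction. */
struct xfer_split {
    uint32_t dataout_done;
    uint32_t datain_done;
};

/*
 * residual = data_length - number_of_transferred_bytes, where
 * data_length covers both dataout and datain for bidirectional commands.
 */
static struct xfer_split split_residual(uint32_t dataout_len,
                                        uint32_t datain_len,
                                        uint32_t residual)
{
    struct xfer_split s = { 0, 0 };
    uint64_t total = (uint64_t)dataout_len + datain_len;
    /* Guard against a residual larger than the whole request. */
    uint64_t transferred = residual > total ? 0 : total - residual;

    /* Assumption: dataout is processed first, then datain. */
    s.dataout_done = transferred < dataout_len ? (uint32_t)transferred
                                               : dataout_len;
    s.datain_done = (uint32_t)(transferred - s.dataout_done);
    return s;
}
```

For example, with 512 bytes of dataout and 1024 bytes of datain, a
residual of 256 (less than datain) means dataout was processed entirely
and 768 bytes of datain were filled; a residual of 1200 (more than
datain) means only 336 bytes of dataout were processed and datain was
not touched.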
-The status byte is written by the device to be the status code as
+The status byte is written by the device to be the status code as
defined in SAM.
-The response byte is written by the device to be one of the
+The response byte is written by the device to be one of the
following:
- VIRTIO_SCSI_S_OK when the request was completed and the status
- byte is filled with a SCSI status code (not necessarily
+ VIRTIO_SCSI_S_OK when the request was completed and the status
+ byte is filled with a SCSI status code (not necessarily
"GOOD").
- VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
+ VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
transferring more data than is available in the data buffers.
- VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
+ VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
ABORT TASK or ABORT TASK SET task management function.
- VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
+ VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
because the target indicated by the lun field does not exist.
- VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
+ VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
or device reset (including a task management function).
- VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
- problem in the connection between the host and the target
+ VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
+ problem in the connection between the host and the target
(severed link).
- VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
+ VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
failure and the guest should not retry on other paths.
- VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
+ VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
but retrying on other paths might yield a different result.
- VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
+ VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
same path should work.
- VIRTIO_SCSI_S_FAILURE for other host or guest error. In
- particular, if neither dataout nor datain is empty, and the
- VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
- request will be immediately returned with a response equal to
- VIRTIO_SCSI_S_FAILURE.
+ VIRTIO_SCSI_S_FAILURE for other host or guest error. In
+ particular, if neither dataout nor datain is empty, and the
+ VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
+ request will be immediately returned with a response equal to
+ VIRTIO_SCSI_S_FAILURE.
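The retry guidance in the response list above could be turned into a
small dispatch function along these lines. The enum and function names
are hypothetical. The numeric response values are not shown in this
excerpt; they are taken here from the published virtio-scsi definitions
and should be checked against the #define block in the full document.
Codes for which the text gives no retry guidance are simply treated as
failures in this sketch.

```c
/* Response values (assumed from the published virtio-scsi definitions). */
#define VIRTIO_SCSI_S_OK                 0
#define VIRTIO_SCSI_S_OVERRUN            1
#define VIRTIO_SCSI_S_ABORTED            2
#define VIRTIO_SCSI_S_BAD_TARGET         3
#define VIRTIO_SCSI_S_RESET              4
#define VIRTIO_SCSI_S_BUSY               5
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE  6
#define VIRTIO_SCSI_S_TARGET_FAILURE     7
#define VIRTIO_SCSI_S_NEXUS_FAILURE      8
#define VIRTIO_SCSI_S_FAILURE            9

/* Hypothetical driver-side actions. */
enum req_action {
    REQ_CHECK_STATUS,     /* completed: look at the SCSI status byte */
    REQ_RETRY_SAME_PATH,
    REQ_RETRY_OTHER_PATH,
    REQ_FAIL
};

static enum req_action action_for_response(unsigned char response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:
        return REQ_CHECK_STATUS;     /* status is not necessarily GOOD */
    case VIRTIO_SCSI_S_BUSY:
        return REQ_RETRY_SAME_PATH;  /* retrying on the same path should work */
    case VIRTIO_SCSI_S_NEXUS_FAILURE:
        return REQ_RETRY_OTHER_PATH; /* other paths might yield a different result */
    case VIRTIO_SCSI_S_TARGET_FAILURE:
        return REQ_FAIL;             /* do not retry on other paths */
    default:
        return REQ_FAIL;             /* no retry guidance given in the text */
    }
}
```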
2.5.6.6.2. Device Operation: controlq
------------------------------------
-The controlq is used for other SCSI transport operations.
+The controlq is used for other SCSI transport operations.
Requests have the following format:
struct virtio_scsi_ctrl {
@@ -2254,7 +2254,7 @@ The type identifies the remaining fields.
The following commands are defined:
- Task management function
+ Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
@@ -2282,23 +2282,23 @@ The following commands are defined:
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
- The type is VIRTIO_SCSI_T_TMF; the subtype field defines. All
- fields except response are filled by the driver. The subtype
- field must always be specified and identifies the requested
+ The type is VIRTIO_SCSI_T_TMF. All fields except response are
+ filled by the driver. The subtype field must always be
+ specified and identifies the requested task management
+ function.
- Other fields may be irrelevant for the requested TMF; if so,
- they are ignored but they should still be present. The lun
- field is in the same format specified for request queues; the
- single level LUN is ignored when the task management function
- addresses a whole I_T nexus. When relevant, the value of the id
+ Other fields may be irrelevant for the requested TMF; if so,
+ they are ignored but they should still be present. The lun
+ field is in the same format specified for request queues; the
+ single level LUN is ignored when the task management function
+ addresses a whole I_T nexus. When relevant, the value of the id
field is matched against the id values passed on the requestq.
- The outcome of the task management function is written by the
- device in the response field. The command-specific response
+ The outcome of the task management function is written by the
+ device in the response field. The command-specific response
values map 1-to-1 with those defined in SAM.
- Asynchronous notification query
+ Asynchronous notification query
#define VIRTIO_SCSI_T_AN_QUERY 1
@@ -2319,20 +2319,20 @@ The following commands are defined:
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
- By sending this command, the driver asks the device which
- events the given LUN can report, as described in paragraphs 6.6
- and A.6 of the SCSI MMC specification. The driver writes the
- events it is interested in into the event_requested; the device
- responds by writing the events that it supports into
+ By sending this command, the driver asks the device which
+ events the given LUN can report, as described in paragraphs 6.6
+ and A.6 of the SCSI MMC specification. The driver writes the
+ events it is interested in into the event_requested; the device
+ responds by writing the events that it supports into
event_actual.
- The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
- fields are written by the driver. The event_actual and response
+ The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
+ fields are written by the driver. The event_actual and response
fields are written by the device.
No command-specific values are defined for the response byte.
- Asynchronous notification subscription
+ Asynchronous notification subscription
#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
struct virtio_scsi_ctrl_an {
@@ -2345,17 +2345,17 @@ The following commands are defined:
u8 response;
}
- By sending this command, the driver asks the specified LUN to
- report events for its physical interface, again as described in
- the SCSI MMC specification. The driver writes the events it is
- interested in into the event_requested; the device responds by
+ By sending this command, the driver asks the specified LUN to
+ report events for its physical interface, again as described in
+ the SCSI MMC specification. The driver writes the events it is
+ interested in into the event_requested; the device responds by
writing the events that it supports into event_actual.
- Event types are the same as for the asynchronous notification
+ Event types are the same as for the asynchronous notification
query message.
- The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
- event_requested fields are written by the driver. The
+ The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
+ event_requested fields are written by the driver. The
event_actual and response fields are written by the device.
No command-specific values are defined for the response byte.
@@ -2363,25 +2363,25 @@ The following commands are defined:
2.5.6.6.3. Device Operation: eventq
----------------------------------
-The eventq is used by the device to report information on logical
-units that are attached to it. The driver should always leave a
-few buffers ready in the eventq. In general, the device will not
-queue events to cope with an empty eventq, and will end up
-dropping events if it finds no buffer ready. However, when
-reporting events for many LUNs (e.g. when a whole target
-disappears), the device can throttle events to avoid dropping
-them. For this reason, placing 10-15 buffers on the event queue
+The eventq is used by the device to report information on logical
+units that are attached to it. The driver should always leave a
+few buffers ready in the eventq. In general, the device will not
+queue events to cope with an empty eventq, and will end up
+dropping events if it finds no buffer ready. However, when
+reporting events for many LUNs (e.g. when a whole target
+disappears), the device can throttle events to avoid dropping
+them. For this reason, placing 10-15 buffers on the event queue
should be enough.
-Buffers are placed in the eventq and filled by the device when
-interesting events occur. The buffers should be strictly
-write-only (device-filled) and the size of the buffers should be
-at least the value given in the device's configuration
+Buffers are placed in the eventq and filled by the device when
+interesting events occur. The buffers should be strictly
+write-only (device-filled) and the size of the buffers should be
+at least the value given in the device's configuration
information.
-Buffers returned by the device on the eventq will be referred to
-as "events" in the rest of this section. Events have the
-following format:
+Buffers returned by the device on the eventq will be referred to
+as "events" in the rest of this section. Events have the
+following format:
#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
@@ -2391,33 +2391,33 @@ following format:
...
}
-If bit 31 is set in the event field, the device failed to report
-an event due to missing buffers. In this case, the driver should
-poll the logical units for unit attention conditions, and/or do
-whatever form of bus scan is appropriate for the guest operating
+If bit 31 is set in the event field, the device failed to report
+an event due to missing buffers. In this case, the driver should
+poll the logical units for unit attention conditions, and/or do
+whatever form of bus scan is appropriate for the guest operating
system.
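A driver might split the event field into the "events missed" flag and
the event type roughly as follows. The struct and function names are
hypothetical, and the sketch assumes the field has already been
converted to the CPU's byte order; masking off bit 31 to obtain the
event type is an assumption based on the flag being reported together
with an event type.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000u

/* Hypothetical decoded form of the event field. */
struct decoded_event {
    bool     missed;  /* bit 31: the device dropped at least one event */
    uint32_t type;    /* remaining bits: the event type */
};

static struct decoded_event decode_event(uint32_t event)
{
    struct decoded_event d;

    d.missed = (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
    d.type   = event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
    return d;
}
```

When missed is true, the driver would poll the logical units or rescan
the bus as described above, in addition to handling the event type.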
-Other data that the device writes to the buffer depends on the
+Other data that the device writes to the buffer depends on the
contents of the event field. The following events are defined:
- No event
+ No event
#define VIRTIO_SCSI_T_NO_EVENT 0
- This event is fired in the following cases:
+ This event is fired in the following cases:
- • When the device detects in the eventq a buffer that is
- shorter than what is indicated in the configuration field, it
- might use it immediately and put this dummy value in the
- event field. A well-written driver will never observe this
+ • When the device detects in the eventq a buffer that is
+ shorter than what is indicated in the configuration field, it
+ might use it immediately and put this dummy value in the
+ event field. A well-written driver will never observe this
situation.
- • When events are dropped, the device may signal this event as
- soon as the drivers makes a buffer available, in order to
- request action from the driver. In this case, of course, this
- event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
- flag.
+ • When events are dropped, the device may signal this event as
+ soon as the driver makes a buffer available, in order to
+ request action from the driver. In this case, of course, this
+ event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
+ flag.
- Transport reset
+ Transport reset
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
struct virtio_scsi_event_reset {
@@ -2431,58 +2431,58 @@ contents of the event field. The following events are defined:
#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
- By sending this event, the device signals that a logical unit
- on a target has been reset, including the case of a new device
- appearing or disappearing on the bus.The device fills in all
- fields. The event field is set to
- VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
+ By sending this event, the device signals that a logical unit
+ on a target has been reset, including the case of a new device
+ appearing or disappearing on the bus. The device fills in all
+ fields. The event field is set to
+ VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
logical unit in the SCSI host.
- The reason value is one of the three #define values appearing
+ The reason value is one of the three #define values appearing
above:
- • VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used
- if the target or logical unit is no longer able to receive
+ • VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used
+ if the target or logical unit is no longer able to receive
commands.
- • VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
+ • VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
logical unit has been reset, but is still present.
- • VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if
+ • VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if
a target or logical unit has just appeared on the device.
- The “removed” and “rescan” events, when sent for LUN 0, may
- apply to the entire target. After receiving them the driver
- should ask the initiator to rescan the target, in order to
- detect the case when an entire target has appeared or
- disappeared. These two events will never be reported unless the
- VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
+ The “removed” and “rescan” events, when sent for LUN 0, may
+ apply to the entire target. After receiving them the driver
+ should ask the initiator to rescan the target, in order to
+ detect the case when an entire target has appeared or
+ disappeared. These two events will never be reported unless the
+ VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
and the guest.
- Events will also be reported via sense codes (this obviously
- does not apply to newly appeared buses or targets, since the
+ Events will also be reported via sense codes (this obviously
+ does not apply to newly appeared buses or targets, since the
application has never discovered them):
- • “LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
+ • “LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
- • “LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
+ • “LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
(POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
- • “rescan LUN/target” maps to sense key UNIT ATTENTION, asc
+ • “rescan LUN/target” maps to sense key UNIT ATTENTION, asc
0x3f, ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
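The mapping in the list above can be sketched as a lookup from sense
data back to the equivalent reset reason. The function name is
hypothetical. The sense key values 0x05 (ILLEGAL REQUEST) and 0x06
(UNIT ATTENTION) come from the SCSI standards rather than from this
text, and VIRTIO_SCSI_EVT_RESET_HARD is assumed to be 0 because its
#define falls outside this excerpt. Since no ascq is given for the hard
reset case, the sketch accepts any ascq there.

```c
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD    0  /* assumed; not shown in this excerpt */
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2

#define SENSE_KEY_ILLEGAL_REQUEST 0x05
#define SENSE_KEY_UNIT_ATTENTION  0x06

/* Returns the equivalent reset reason, or -1 if the sense data is unrelated. */
static int sense_to_reset_reason(uint8_t key, uint8_t asc, uint8_t ascq)
{
    if (key == SENSE_KEY_ILLEGAL_REQUEST && asc == 0x25 && ascq == 0x00)
        return VIRTIO_SCSI_EVT_RESET_REMOVED;  /* LOGICAL UNIT NOT SUPPORTED */
    if (key == SENSE_KEY_UNIT_ATTENTION && asc == 0x29)
        return VIRTIO_SCSI_EVT_RESET_HARD;     /* POWER ON, RESET OR BUS DEVICE RESET */
    if (key == SENSE_KEY_UNIT_ATTENTION && asc == 0x3f && ascq == 0x0e)
        return VIRTIO_SCSI_EVT_RESET_RESCAN;   /* REPORTED LUNS DATA HAS CHANGED */
    return -1;
}
```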
- The preferred way to detect transport reset is always to use
- events, because sense codes are only seen by the driver when it
- sends a SCSI command to the logical unit or target. However, in
- case events are dropped, the initiator will still be able to
- synchronize with the actual state of the controller if the
- driver asks the initiator to rescan of the SCSI bus. During the
- rescan, the initiator will be able to observe the above sense
- codes, and it will process them as if it the driver had
- received the equivalent event.
-
- Asynchronous notification
+ The preferred way to detect transport reset is always to use
+ events, because sense codes are only seen by the driver when it
+ sends a SCSI command to the logical unit or target. However, in
+ case events are dropped, the initiator will still be able to
+ synchronize with the actual state of the controller if the
+ driver asks the initiator to rescan the SCSI bus. During the
+ rescan, the initiator will be able to observe the above sense
+ codes, and it will process them as if the driver had
+ received the equivalent event.
+
+ Asynchronous notification
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
struct virtio_scsi_event_an {
@@ -2492,16 +2492,16 @@ contents of the event field. The following events are defined:
u32 reason;
}
- By sending this event, the device signals that an asynchronous
+ By sending this event, the device signals that an asynchronous
event was fired from a physical interface.
- All fields are written by the device. The event field is set to
- VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
- unit in the SCSI host. The reason field is a subset of the
- events that the driver has subscribed to via the "Asynchronous
+ All fields are written by the device. The event field is set to
+ VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
+ unit in the SCSI host. The reason field is a subset of the
+ events that the driver has subscribed to via the "Asynchronous
notification subscription" command.
- When dropped events are reported, the driver should poll for
+ When dropped events are reported, the driver should poll for
asynchronous events manually using SCSI commands.
@@ -2510,15 +2510,15 @@ contents of the event field. The following events are defined:
Currently there are four device-independent feature bits defined:
- VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
- indicates that the driver wants an interrupt if the device runs
- out of available descriptors on a virtqueue, even though
- interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT
- flag or the used_event field. An example of this is the
- networking driver: it doesn't need to know every time a packet
- is transmitted, but it does need to free the transmitted
- packets a finite time after they are transmitted. It can avoid
- using a timer if the device interrupts it when all the packets
+ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
+ indicates that the driver wants an interrupt if the device runs
+ out of available descriptors on a virtqueue, even though
+ interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT
+ flag or the used_event field. An example of this is the
+ networking driver: it doesn't need to know every time a packet
+ is transmitted, but it does need to free the transmitted
+ packets a finite time after they are transmitted. It can avoid
+ using a timer if the device interrupts it when all the packets
are transmitted.
VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
@@ -2528,15 +2528,15 @@ Currently there are four device-independent feature bits defined:
that the driver can use descriptors with the VRING_DESC_F_INDIRECT
flag set, as described in "2.3.3. Indirect Descriptors".
- VIRTIO_F_RING_EVENT_IDX(29) This feature enables the used_event
- and the avail_event fields. If set, it indicates that the
- device should ignore the flags field in the available ring
- structure. Instead, the used_event field in this structure is
- used by guest to suppress device interrupts. Further, the
- driver should ignore the flags field in the used ring
- structure. Instead, the avail_event field in this structure is
- used by the device to suppress notifications. If unset, the
- driver should ignore the used_event field; the device should
+ VIRTIO_F_RING_EVENT_IDX(29) This feature enables the used_event
+ and the avail_event fields. If set, it indicates that the
+ device should ignore the flags field in the available ring
+ structure. Instead, the used_event field in this structure is
+ used by the guest to suppress device interrupts. Further, the
+ driver should ignore the flags field in the used ring
+ structure. Instead, the avail_event field in this structure is
+ used by the device to suppress notifications. If unset, the
+ driver should ignore the used_event field; the device should
ignore the avail_event field; the flags field is used
@@ -2695,7 +2695,7 @@ static inline unsigned vring_size(unsigned int num, unsigned long align)
static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
- return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
+ return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
/* Get location of event indices (only with VIRTIO_RING_F_EVENT_IDX) */
@@ -2717,18 +2717,18 @@ static inline uint16_t *vring_avail_event(struct vring *vr)
2.10. Creating New Device Types
==============================
-Various considerations are necessary when creating a new device
+Various considerations are necessary when creating a new device
type.
-
+
2.10.1. How Many Virtqueues?
---------------------------
-It is possible that a very simple device will operate entirely
-through its configuration space, but most will need at least one
-virtqueue in which it will place requests. A device with both
-input and output (eg. console and network devices described here)
-need two queues: one which the driver fills with buffers to
-receive input, and one which the driver places buffers to
+It is possible that a very simple device will operate entirely
+through its configuration space, but most will need at least one
+virtqueue in which it will place requests. A device with both
+input and output (e.g. console and network devices described here)
+needs two queues: one which the driver fills with buffers to
+receive input, and one in which the driver places buffers to
+transmit output.
2.10.2. What Configuration Space Layout?
@@ -2736,16 +2736,16 @@ transmit output.
Configuration space should only be used for initialization-time
parameters. It is a limited resource with no synchronization, so for
-most uses it is better to use a virtqueue to update configuration
-information (the network device does this for filtering,
-otherwise the table in the config space could potentially be very
+most uses it is better to use a virtqueue to update configuration
+information (the network device does this for filtering,
+otherwise the table in the config space could potentially be very
large).
2.10.3. What Device Number?
--------------------------
-Currently device numbers are assigned quite freely: a simple
-request mail to the author of this document or the Linux
+Currently device numbers are assigned quite freely: a simple
+request mail to the author of this document or the Linux
virtualization mailing list[9] will be sufficient to secure a unique one.
Meanwhile for experimental drivers, use 65535 and work backwards.
@@ -2753,67 +2753,67 @@ Meanwhile for experimental drivers, use 65535 and work backwards.
2.10.4. How many MSI-X vectors? (for PCI)
-----------------------------------------
-Using the optional MSI-X capability devices can speed up
-interrupt processing by removing the need to read ISR Status
-register by guest driver (which might be an expensive operation),
-reducing interrupt sharing between devices and queues within the
-device, and handling interrupts from multiple CPUs. However, some
-systems impose a limit (which might be as low as 256) on the
-total number of MSI-X vectors that can be allocated to all
-devices. Devices and/or device drivers should take this into
-account, limiting the number of vectors used unless the device is
-expected to cause a high volume of interrupts. Devices can
-control the number of vectors used by limiting the MSI-X Table
-Size or not presenting MSI-X capability in PCI configuration
-space. Drivers can control this by mapping events to as small
-number of vectors as possible, or disabling MSI-X capability
+Using the optional MSI-X capability devices can speed up
+interrupt processing by removing the need to read ISR Status
+register by guest driver (which might be an expensive operation),
+reducing interrupt sharing between devices and queues within the
+device, and handling interrupts from multiple CPUs. However, some
+systems impose a limit (which might be as low as 256) on the
+total number of MSI-X vectors that can be allocated to all
+devices. Devices and/or device drivers should take this into
+account, limiting the number of vectors used unless the device is
+expected to cause a high volume of interrupts. Devices can
+control the number of vectors used by limiting the MSI-X Table
+Size or not presenting MSI-X capability in PCI configuration
+space. Drivers can control this by mapping events to as small a
+number of vectors as possible, or disabling MSI-X capability
altogether.
2.10.5. Device Improvements
--------------------------
-Any change to configuration space, or new virtqueues, or
-behavioural changes, should be indicated by negotiation of a new
+Any change to configuration space, or new virtqueues, or
+behavioural changes, should be indicated by negotiation of a new
feature bit. This establishes clarity[11] and avoids future expansion problems.
-Clusters of functionality which are always implemented together
-can use a single bit, but if one feature makes sense without the
-others they should not be gratuitously grouped together to
-conserve feature bits. We can always extend the spec when the
+Clusters of functionality which are always implemented together
+can use a single bit, but if one feature makes sense without the
+others they should not be gratuitously grouped together to
+conserve feature bits. We can always extend the spec when the
first person needs more than 24 feature bits for their device.
FOOTNOTES:
==========
-[1] This lack of page-sharing implies that the implementation of the
-device (e.g. the hypervisor or host) needs full access to the
-guest memory. Communication with untrusted parties (i.e.
+[1] This lack of page-sharing implies that the implementation of the
+device (e.g. the hypervisor or host) needs full access to the
+guest memory. Communication with untrusted parties (i.e.
inter-guest communication) requires copying.
-[2] The Linux implementation further separates the PCI virtio code
-from the specific virtio drivers: these drivers are shared with
+[2] The Linux implementation further separates the PCI virtio code
+from the specific virtio drivers: these drivers are shared with
the non-PCI implementations (currently lguest and S/390).
[3] The actual value within this range is ignored.
-[4] Historically, drivers have used the device before steps 5 and 6.
-This is only allowed if the driver does not use any features
+[4] Historically, drivers have used the device before steps 5 and 6.
+This is only allowed if the driver does not use any features
which would alter this early use of the device.
-[5] i.e. once you enable MSI-X on the device, the other fields move.
+[5] i.e. once you enable MSI-X on the device, the other fields move.
If you turn it off again, they move back!
-[6] The 4096 is based on the x86 page size, but it's also large
-enough to ensure that the separate parts of the virtqueue are on
+[6] The 4096 is based on the x86 page size, but it's also large
+enough to ensure that the separate parts of the virtqueue are on
separate cache lines.
-[7] These fields are kept here because this is the only part of the
+[7] These fields are kept here because this is the only part of the
virtqueue written by the device.
-[8] The Linux drivers do this only for read-only buffers: for
-write-only buffers, it is assumed that the driver is merely
-trying to keep the receive buffer ring full, and no notification
+[8] The Linux drivers do this only for read-only buffers: for
+write-only buffers, it is assumed that the driver is merely
+trying to keep the receive buffer ring full, and no notification
of this expected condition is necessary.
[9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
@@ -2824,19 +2824,19 @@ devices assumed it.
In addition, the specifications for virtio_blk and virtio_scsi require
intuiting field lengths from frame boundaries.
-[11] Even if it does mean documenting design or implementation
+[11] Even if it does mean documenting design or implementation
mistakes!
-[13] It was supposed to indicate segmentation offload support, but
-upon further investigation it became clear that multiple bits
+[13] It was supposed to indicate segmentation offload support, but
+upon further investigation it became clear that multiple bits
were required.
-[14] i.e. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
-dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
-features must offer the checksum feature, and a driver which
-accepts the offload features must accept the checksum feature.
-Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
+[14] i.e. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
+dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
+features must offer the checksum feature, and a driver which
+accepts the offload features must accept the checksum feature.
+Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
depending on VIRTIO_NET_F_GUEST_CSUM.
[15] This is a common restriction in real, older network cards.
@@ -2845,44 +2845,44 @@ depending on VIRTIO_NET_F_GUEST_CSUM.
the same system may not require checksumming at all, nor segmentation,
if both guests are amenable.
-[17] For example, consider a partially checksummed TCP (IPv4) packet.
-It will have a 14 byte ethernet header and 20 byte IP header
-followed by the TCP header (with the TCP checksum field 16 bytes
-into that header). csum_start will be 14+20 = 34 (the TCP
-checksum includes the header), and csum_offset will be 16. The
-value in the TCP checksum field should be initialized to the sum
-of the TCP pseudo header, so that replacing it by the ones'
-complement checksum of the TCP header and body will give the
+[17] For example, consider a partially checksummed TCP (IPv4) packet.
+It will have a 14 byte ethernet header and 20 byte IP header
+followed by the TCP header (with the TCP checksum field 16 bytes
+into that header). csum_start will be 14+20 = 34 (the TCP
+checksum includes the header), and csum_offset will be 16. The
+value in the TCP checksum field should be initialized to the sum
+of the TCP pseudo header, so that replacing it by the ones'
+complement checksum of the TCP header and body will give the
correct result.
-[18] Due to various bugs in implementations, this field is not useful
+[18] Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size.
-[19] This case is not handled by some older hardware, so is called out
+[19] This case is not handled by some older hardware, so is called out
specifically in the protocol.
-[20] Note that the header will be two bytes longer for the
+[20] Note that the header will be two bytes longer for the
VIRTIO_NET_F_MRG_RXBUF case.
-[20a] Obviously each one can be split across multiple descriptor
+[20a] Obviously each one can be split across multiple descriptor
elements.
[21] Since there are no guarantees, it can use a hash filter or
silently switch to allmulti or promiscuous mode if it is given too
many addresses.
-[22] The SCSI_CMD and SCSI_CMD_OUT types are equivalent; the device
+[22] The SCSI_CMD and SCSI_CMD_OUT types are equivalent; the device
does not distinguish between them.
[23] The FLUSH and FLUSH_OUT types are equivalent; the device does not
distinguish between them.
-[25] Because this is high importance and low bandwidth, the current
-Linux implementation polls for the buffer to be used, rather than
-waiting for an interrupt, simplifying the implementation
-significantly. However, for generic serial ports with the
-O_NONBLOCK flag set, the polling limitation is relaxed and the
-consumed buffers are freed upon the next write or poll call or
+[25] Because this is high importance and low bandwidth, the current
+Linux implementation polls for the buffer to be used, rather than
+waiting for an interrupt, simplifying the implementation
+significantly. However, for generic serial ports with the
+O_NONBLOCK flag set, the polling limitation is relaxed and the
+consumed buffers are freed upon the next write or poll call or
when a port is closed or hot-unplugged.
[27] This is historical, and independent of the guest page size.