summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--virtio-v1.0-wd01-part1-specification.txt4219
1 files changed, 0 insertions, 4219 deletions
diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
deleted file mode 100644
index ba3ecae..0000000
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ /dev/null
@@ -1,4219 +0,0 @@
-1. INTRODUCTION
-===============
-
-This document describes the specifications of the “virtio” family of
-devices. These are devices are found in virtual environments, yet by
-design they are not all that different from physical devices, and this
-document treats them as such. This allows the guest to use standard
-drivers and discovery mechanisms.
-
-The purpose of virtio and this specification is that virtual
-environments and guests should have a straightforward, efficient,
-standard and extensible mechanism for virtual devices, rather
-than boutique per-environment or per-OS mechanisms.
-
- Straightforward: Virtio devices use normal bus mechanisms of
- interrupts and DMA which should be familiar to any device driver
- author. There is no exotic page-flipping or COW mechanism: it's just
- a normal device.[1]
-
- Efficient: Virtio devices consist of rings of descriptors
- for input and output, which are neatly separated to avoid cache
- effects from both guest and device writing to the same cache
- lines.
-
- Standard: Virtio makes no assumptions about the environment in which
- it operates, beyond supporting the bus attaching the device. Virtio
- devices are implemented over PCI and other buses, and earlier drafts
- been implemented on other buses not included in this spec.[2]
-
- Extensible: Virtio PCI devices contain feature bits which are
- acknowledged by the guest operating system during device setup.
- This allows forwards and backwards compatibility: the device
- offers all the features it knows about, and the driver
- acknowledges those it understands and wishes to use.
-
-1.1.1. Key words
------------------
-
-The key words must, must not, required, shall, shall not, should,
-should not, recommended, may, and optional are to be interpreted as
-described in [RFC 2119]. Note that for reasons of style, these words
-are not capitalized in this document.
-
-1.1.2. Definitions
--------------------
-
-term
- Definition
-
-1.1.3. Key concepts
---------------------
-
-Guest
- Definition...
-
-Host
- Definition
-
-Device
- Definition
-
-Driver
- Definition
-
-1.2. Normative References
-=========================
-
-[RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, http://www.ietf.org/rfc/rfc2119.txt IETF (Internet Engineering Task Force) RFC 2119, March 1997.
-
-[S390 PoP] z/Architecture Principles of Operation, IBM Publication SA22-7832
-
-[S390 Common I/O] ESA/390 Common I/O-Device and Self-Description, IBM Publication SA22-7204
-
-1.3. Non-Normative References
-=========================
-
-
-
-2. The Virtio Standard
-=========================
-
-2.1. Basic Facilities of a Virtio Device
-=======================================
-
-A virtio device is discovered and identified by a bus-specific method
-(see the bus specific sections: "2.3.1. Virtio Over PCI Bus",
-"2.3.2. Virtio Over MMIO" and "2.3.3. Virtio over channel I/O"). Each
-device consists of the following parts:
-
-o Device Status field
-o Feature bits
-o Configuration space
-o One or more virtqueues
-
-Unless explicitly specified otherwise, all multi-byte fields are little-endian.
-To reinforce this the examples use typenames like "le16" instead of "uint16_t".
-
-2.1.1. Device Status Field
--------------------------
-
-The Device Status field is updated by the guest to indicate its
-progress. This provides a simple low-level diagnostic: it's most
-useful to imagine them hooked up to traffic lights on the console
-indicating the status of each device.
-
-This field is 0 upon reset, otherwise at least one bit should be set:
-
- ACKNOWLEDGE (1) Indicates that the guest OS has found the
- device and recognized it as a valid virtio device.
-
- DRIVER (2) Indicates that the guest OS knows how to drive the
- device. Under Linux, drivers can be loadable modules so there
- may be a significant (or infinite) delay before setting this
- bit.
-
- FEATURES_OK (8) Indicates that the driver has acknowledged all the
- features it understands, and feature negotiation is complete.
-
- DRIVER_OK (4) Indicates that the driver is set up and ready to
- drive the device.
-
- FAILED (128) Indicates that something went wrong in the guest,
- and it has given up on the device. This could be an internal
- error, or the driver didn't like the device for some reason, or
- even a fatal error during device operation. The device must be
- reset before attempting to re-initialize.
-
-2.1.2. Feature Bits
-------------------
-
-Each virtio device lists all the features it understands. During
-device initialization, the guest reads this and tells the device the
-subset that it understands. The only way to renegotiate is to reset
-the device.
-
-This allows for forwards and backwards compatibility: if the device is
-enhanced with a new feature bit, older guests will not write that
-feature bit back to the device and it can go into backwards
-compatibility mode. Similarly, if a guest is enhanced with a feature
-that the device doesn't support, it see the new feature is not offered
-and can go into backwards compatibility mode (or, for poor
-implementations, set the FAILED Device Status bit).
-
-Feature bits are allocated as follows:
-
- 0 to 23: Feature bits for the specific device type
-
- 24 to 32: Feature bits reserved for extensions to the queue and
- feature negotiation mechanisms
-
- 33 and above: Feature bits reserved for future extensions.
-
-For example, feature bit 0 for a network device (i.e. Subsystem
-Device ID 1) indicates that the device supports checksumming of
-packets.
-
-In particular, new fields in the device configuration space are
-indicated by offering a feature bit, so the guest can check
-before accessing that part of the configuration space.
-
-2.1.2.1. Legacy Interface: A Note on transitions from earlier drafts
---------------------------------------
-
-Earlier drafts of this specification (up to 0.9.X) defined a similar, but
-different interface between the hypervisor and the guest.
-Since these are widely deployed, this specification
-accommodates optional features to simplify transition
-from these earlier draft interfaces. Specifically:
-
-Legacy Interface
- is an interface specified by an earlier draft of this specification
- (up to 0.9.X)
-Legacy Device
- is a device implemented before this specification was released,
- and implementing a legacy interface on the host side
-Legacy Driver
- is a driver implemented before this specification was released,
- and implementing a legacy interface on the guest side
-
-Legacy devices and legacy drivers are not compliant with this
-specification.
-
-To simplify transition from these earlier draft interfaces,
-it is possible to implement:
-
-Transitional Device
- a device supporting both drivers conforming to this
- specification, and allowing legacy drivers.
-
-Transitional Driver
- a driver supporting both devices conforming to this
- specification, and legacy devices.
-
-Transitional devices and transitional drivers can be compliant with
-this specification (ie. when not operating in legacy mode).
-
-Devices or drivers with no legacy compatibility are referred to as
-non-transitional devices and drivers, respectively.
-
-Transitional Drivers can detect Legacy Devices by detecting that
-the feature bit VIRTIO_F_VERSION_1 is not offered.
-Transitional devices can detect Legacy drivers by detecting that
-VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
-In this case device is used through the legacy interface.
-
-To make them easier to locate, specification sections documenting
-these transitional features are explicitly marked with 'Legacy
-Interface' in the section title.
-
-2.1.3. Configuration Space
--------------------------
-
-Configuration space is generally used for rarely-changing or
-initialization-time parameters. Drivers must not assume reads from
-fields greater than 32 bits wide are atomic, nor or reads from
-multiple fields.
-
-Each transport provides a generation count for the configuration
-space, which must change whenever there is a possibility that two
-accesses to the configuration space can see different versions of that
-space.
-
-Thus drivers should read configuration space fields like so:
-
- u32 before, after;
- do {
- before = get_config_generation(device);
- // read config entry/entries.
- after = get_config_generation(device);
- } while (after != before);
-
-Note that configuration space generally uses the little-endian format
-for multi-byte fields.
-
-Note that future versions of this specification will likely
-extend the configuration space for devices by adding extra fields
-at the tail end of some structures in configuration space.
-
-To allow forward compatibility with such extensions, drivers must
-not limit structure size and configuration space size. Instead,
-drivers should only check that configuration space is *large enough* to
-contain the fields required for device operation.
-
-For example, if the specification states that configuration
-space 'includes a single 8-bit field' drivers should understand this to mean that
-the configuration space can also include an arbitrary amount of
-tail padding, and accept any configuration space size equal to or
-greater than the specified 8-bit size.
-
-100.100.4.1. Legacy Interface: A Note on Configuration Space endian-ness
---------------------------------------
-
-Note that for legacy interfaces, configuration space is generally the
-guest's native endian, rather than PCI's little-endian.
-
-2.1.3.1. Legacy Interface: Configuration Space
--------------------------
-
-Legacy devices did not have a configuration generation field, thus are
-susceptible to race conditions if configuration is updated. This
-effects the block capacity and network mac fields; best practice is to
-read these fields multiple times until two reads generate a consistent
-result.
-
-2.1.4. Virtqueues
-----------------
-
-The mechanism for bulk data transport on virtio devices is
-pretentiously called a virtqueue. Each device can have zero or more
-virtqueues: for example, the simplest network device has one for
-transmit and one for receive. Each queue has a 16-bit queue size
-parameter, which sets the number of entries and implies the total size
-of the queue.
-
-Each virtqueue consists of three parts:
-
- Descriptor Table
-
- Available Ring
-
- Used Ring
-
-where each part is physically-contiguous in guest memory,
-and has different alignment requirements.
-
-The memory aligment and size requirements, in bytes, of each part of the
-virtqueue are summarized in the following table:
-
-+------------+-----------------------------------------+
-| Virtqueue Part | Alignment | Size |
-+------------+-----------------------------------------+
-+------------+-----------------------------------------+
-| Descriptor Table | 16 | 16 * (Queue Size) |
-+------------+-----------------------------------------+
-| Available Ring | 2 | 6 + 2 * (Queue Size) |
-+------------+-----------------------------------------+
-| Used Ring | 4 | 6 + 4 * (Queue Size) |
-+------------+-----------------------------------------+
-
-The Alignment column gives the miminum alignment: for each part
-of the virtqueue, the physical address of the first byte of it
-must be a multiple of the specified alignment value.
-
-The Size column gives the total number of bytes required for each
-part of the virtqueue.
-
-Queue Size corresponds to the maximum number of buffers in the
-virtqueue. For example, if Queue Size is 4 then at most 4 buffers
-can be queued at any given time. Queue Size value is always a
-power of 2. The maximum Queue Size value is 32768. This value
-is specified in a bus-specific way.
-
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring. It then
-notifies the device. When the device has finished a buffer, it
-writes the descriptor into the used ring, and sends an interrupt.
-
-
-2.1.4.1. Legacy Interfaces: A Note on Virtqueue Layout
---------------------------------------
-
-For Legacy Interfaces, several additional
-restrictions are placed on the virtqueue layout:
-
-Each virtqueue occupies two or more physically-contiguous pages
-(usually defined as 4096 bytes, but depending on the transport)
-and consists of three parts:
-
-+-------------------+-----------------------------------+-----------+
-| Descriptor Table | Available Ring (padding) | Used Ring |
-+-------------------+-----------------------------------+-----------+
-
-The bus-specific Queue Size field controls the total number of bytes
-required for the virtqueue according to the following formula:
-
- #define ALIGN(x) (((x) + PAGE_SIZE) & ~PAGE_SIZE)
- static inline unsigned vring_size(unsigned int qsz)
- {
- return ALIGN(sizeof(struct vring_desc)*qsz + sizeof(u16)*(3 + qsz))
- + ALIGN(sizeof(u16)*3 + sizeof(struct vring_used_elem)*qsz);
- }
-
-This wastes some space with padding.
-The legacy virtqueue layout structure therefore looks like this:
-
- struct vring {
- // The actual descriptors (16 bytes each)
- struct vring_desc desc[ Queue Size ];
-
- // A ring of available descriptor heads with free-running index.
- struct vring_avail avail;
-
- // Padding to the next PAGE_SIZE boundary.
- char pad[ Padding ];
-
- // A ring of used descriptor heads with free-running index.
- struct vring_used used;
- };
-
-2.1.4.100. Legacy Interfaces: A Note on Virtqueue Endianness
---------------------------------------
-
-Note that the endian of fields and in the virtqueue is the native
-endian of the guest, not little-endian as specified by this standard.
-It is assumed that the host is already aware of the guest endian.
-
-2.1.4.2. Message Framing
------------------------
-The message framing (the particular layout of descriptors) is
-independent of the contents of the buffers. For example, a network
-transmit buffer consists of a 12 byte header followed by the network
-packet. This could be most simply placed in the descriptor table as a
-12 byte output descriptor followed by a 1514 byte output descriptor,
-but it could also consist of a single 1526 byte output descriptor in
-the case where the header and packet are adjacent, or even three or
-more descriptors (possibly with loss of efficiency in that case).
-
-Note that, some implementations may have large-but-reasonable
-restrictions on total descriptor size (such as based on IOV_MAX in the
-host OS). This has not been a problem in practice: little sympathy
-will be given to drivers which create unreasonably-sized descriptors
-such as by dividing a network packet into 1500 single-byte
-descriptors!
-
-2.1.4.2.1. Legacy Interface: Message Framing
------------------------
-
-Regrettably, initial driver implementations used simple layouts, and
-devices came to rely on it, despite this specification wording. In
-addition, the specification for virtio_blk SCSI commands required
-intuiting field lengths from frame boundaries (see
- "2.4.2.5.1. Legacy Interface: Device Operation")
-
-It is thus recommended that when using legacy interfaces, transitional
-drivers be conservative in their assumptions, unless the
-VIRTIO_F_ANY_LAYOUT feature is accepted.
-
-2.1.4.3. The Virtqueue Descriptor Table
---------------------------------------
-
-The descriptor table refers to the buffers the guest is using for
-the device. The addresses are physical addresses, and the buffers
-can be chained via the next field. Each descriptor describes a
-buffer which is read-only or write-only, but a chain of
-descriptors can contain both read-only and write-only buffers.
-
-The actual contents of the memory offered to the device depends on the
-device type. Most common is to begin the data with a header
-(containing little-endian fields) for the device to read, and postfix
-it with a status tailer for the device to write.
-
-No descriptor chain may be more than 2^32 bytes long in total.
-
- struct vring_desc {
- /* Address (guest-physical). */
- le64 addr;
- /* Length. */
- le32 len;
-
- /* This marks a buffer as continuing via the next field. */
- #define VRING_DESC_F_NEXT 1
- /* This marks a buffer as write-only (otherwise read-only). */
- #define VRING_DESC_F_WRITE 2
- /* This means the buffer contains a list of buffer descriptors. */
- #define VRING_DESC_F_INDIRECT 4
- /* The flags as indicated above. */
- le16 flags;
- /* Next field if flags & NEXT */
- le16 next;
- };
-
-The number of descriptors in the table is defined by the queue size
-for this virtqueue.
-
-2.1.4.3.1. Indirect Descriptors
-------------------------------
-
-Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_RING_F_INDIRECT_DESC feature can be
-used to allow this (see "2.6. Reserved Feature Bits"). To increase
-ring capacity it is possible to store a table of indirect
-descriptors anywhere in memory, and insert a descriptor in main
-virtqueue (with flags&VRING_DESC_F_INDIRECT on) that refers to memory buffer
-containing this indirect descriptor table; fields addr and len
-refer to the indirect table address and length in bytes,
-respectively. The indirect table layout structure looks like this
-(len is the length of the descriptor that refers to this table,
-which is a variable, so this code won't compile):
-
- struct indirect_descriptor_table {
- /* The actual descriptors (16 bytes each) */
- struct vring_desc desc[len / 16];
- };
-
-The first indirect descriptor is located at start of the indirect
-descriptor table (index 0), additional indirect descriptors are
-chained by next field. An indirect descriptor without next field
-(with flags&VRING_DESC_F_NEXT off) signals the end of the descriptor.
-An
-indirect descriptor can not refer to another indirect descriptor
-table (flags&VRING_DESC_F_INDIRECT must be off). A single indirect descriptor
-table can include both read-only and write-only descriptors;
-write-only flag (flags&VRING_DESC_F_WRITE) in the descriptor that refers to it
-is ignored.
-
-2.1.4.4. The Virtqueue Available Ring
-------------------------------------
-
-The available ring refers to what descriptor chains we are offering the
-device: each entry refers to the head of a descriptor chain. The “flags” field
-is currently 0 or 1: 1 indicating that we do not need an interrupt
-when the device consumes a descriptor chain from the available
-ring. Alternatively, the guest can ask the device to delay interrupts
-until an entry with an index specified by the “used_event” field is
-written in the used ring (equivalently, until the idx field in the
-used ring will reach the value used_event + 1). The method employed by
-the device is controlled by the VIRTIO_RING_F_EVENT_IDX feature bit
-(see "2.6. Reserved Feature Bits"). This interrupt suppression is
-merely an optimization; it may not suppress interrupts entirely.
-
-The “idx” field indicates where we would put the next descriptor
-entry (modulo the queue size). This starts at 0, and increases.
-
- struct vring_avail {
- #define VRING_AVAIL_F_NO_INTERRUPT 1
- le16 flags;
- le16 idx;
- le16 ring[ /* Queue Size */ ];
- le16 used_event; /* Only if VIRTIO_RING_F_EVENT_IDX */
- };
-
-2.1.4.5. The Virtqueue Used Ring
--------------------------------
-
-The used ring is where the device returns buffers once it is done
-with them. The flags field can be used by the device to hint that
-no notification is necessary when the guest adds to the available
-ring. Alternatively, the “avail_event” field can be used by the
-device to hint that no notification is necessary until an entry
-with an index specified by the “avail_event” is written in the
-available ring (equivalently, until the idx field in the
-available ring will reach the value avail_event + 1). The method
-employed by the device is controlled by the guest through the
-VIRTIO_RING_F_EVENT_IDX feature bit (
-see "2.6. Reserved Feature Bits").[7]
-
-Each entry in the ring is a pair: the head entry of the
-descriptor chain describing the buffer (this matches an entry
-placed in the available ring by the guest earlier), and the total
-of bytes written into the buffer. The latter is extremely useful
-for guests using untrusted buffers: if you do not know exactly
-how much has been written by the device, you usually have to zero
-the buffer to ensure no data leakage occurs.
-
- /* le32 is used here for ids for padding reasons. */
- struct vring_used_elem {
- /* Index of start of used descriptor chain. */
- le32 id;
- /* Total length of the descriptor chain which was used (written to) */
- le32 len;
- };
-
- struct vring_used {
- #define VRING_USED_F_NO_NOTIFY 1
- le16 flags;
- le16 idx;
- struct vring_used_elem ring[ /* Queue Size */];
- le16 avail_event; /* Only if VIRTIO_RING_F_EVENT_IDX */
- };
-
-2.1.4.6. Helpers for Operating Virtqueues
-----------------------------------------
-
-The Linux Kernel Source code contains the definitions above and
-helper routines in a more usable form, in
-include/linux/virtio_ring.h. This was explicitly licensed by IBM
-and Red Hat under the (3-clause) BSD license so that it can be
-freely used by all other projects, and is reproduced (with slight
-variation to remove Linux assumptions) in "2.6. virtio_ring.h".
-
-2.2. General Initialization And Device Operation
-===============================================
-
-We start with an overview of device initialization, then expand on the
-details of the device and how each step is preformed. This section
-should be read along with the bus-specific section which describes
-how to communicate with the specific device.
-
-2.2.1. Device Initialization
----------------------------
-
-1. Reset the device.
-
-2. The ACKNOWLEDGE status bit is set: we have noticed the device.
-
-3. The DRIVER status bit is set: we know how to drive the device.
-
-4. Device feature bits are read, and the the subset of feature bits
- understood by the OS and driver is written to the device.
-
-5. The FEATURES_OK status bit is set.
-
-6. The status byte is re-read to ensure the FEATURES_OK bit is still
- set: otherwise, the device does not support our subset of features
- and the device is unusable.
-
-7. Device-specific setup, including discovery of virtqueues for the
- device, optional per-bus setup, reading and possibly writing the
- device's virtio configuration space, and population of virtqueues.
-
-8. The DRIVER_OK status bit is set. At this point the device is
- "live".
-
-If any of these steps go irrecoverably wrong, the guest should
-set the FAILED status bit to indicate that it has given up on the
-device (it can reset the device later to restart if desired).
-
-The device must not consume buffers before DRIVER_OK, and the driver
-must not notify the device before it sets DRIVER_OK.
-
-Devices should support all valid combinations of features, but we know
-that implementations may well make assuptions that they will only be
-used by fully-optimized drivers. The resetting of the FEATURES_OK flag
-provides a semi-graceful failure mode for this case.
-
-2.2.1.1. Legacy Interface: Device Initialization
----------------------------
-Legacy devices do not support the FEATURES_OK status bit, and thus did
-not have a graceful way for the device to indicate unsupported feature
-combinations. It also did not provide a clear mechanism to end
-feature negotiation, which meant that devices finalized features on
-first-use, and no features could be introduced which radically changed
-the initial operation of the device.
-
-Legacy device implementations often used the device before setting the
-DRIVER_OK bit.
-
-The result was the steps 5 and 6 were omitted, and steps 7 and 8
-were conflated.
-
-2.2.2. Device Operation
-----------------------
-
-There are two parts to device operation: supplying new buffers to
-the device, and processing used buffers from the device. As an
-example, the simplest virtio network device has two virtqueues: the
-transmit virtqueue and the receive virtqueue. The driver adds
-outgoing (read-only) packets to the transmit virtqueue, and then
-frees them after they are used. Similarly, incoming (write-only)
-buffers are added to the receive virtqueue, and processed after
-they are used.
-
-2.2.2.1. Supplying Buffers to The Device
----------------------------------------
-
-Actual transfer of buffers from the guest OS to the device
-operates as follows:
-
-1. Place the buffer(s) into free descriptor(s).
-
-2. Place the id of the buffer in the next ring entry of the
- available ring.
-
-3. The steps (1) and (2) may be performed repeatedly if batching
- is possible.
-
-4. A memory barrier should be executed to ensure the device sees
- the updated descriptor table and available ring before the next
- step.
-
-5. The available “idx” field should be increased by the number of
- entries added to the available ring.
-
-6. A memory barrier should be executed to ensure that we update
- the idx field before checking for notification suppression.
-
-7. If notifications are not suppressed, the device should be
- notified of the new buffers.
-
-Note that the above code does not take precautions against the
-available ring buffer wrapping around: this is not possible since
-the ring buffer is the same size as the descriptor table, so step
-(1) will prevent such a condition.
-
-In addition, the maximum queue size is 32768 (it must be a power
-of 2 which fits in 16 bits), so the 16-bit “idx” value can always
-distinguish between a full and empty buffer.
-
-Here is a description of each stage in more detail.
-
-2.2.2.1.1. Placing Buffers Into The Descriptor Table
----------------------------------------------------
-
-A buffer consists of zero or more read-only physically-contiguous
-elements followed by zero or more physically-contiguous
-write-only elements (it must have at least one element). This
-algorithm maps it into the descriptor table:
-
-for each buffer element, b:
-
- (a) Get the next free descriptor table entry, d
-
- (b) Set d.addr to the physical address of the start of b
-
- (c) Set d.len to the length of b.
-
- (d) If b is write-only, set d.flags to VRING_DESC_F_WRITE,
- otherwise 0.
-
- (e) If there is a buffer element after this:
-
- i. Set d.next to the index of the next free descriptor
- element.
-
- ii. Set the VRING_DESC_F_NEXT bit in d.flags.
-
-In practice, the d.next fields are usually used to chain free
-descriptors, and a separate count kept to check there are enough
-free descriptors before beginning the mappings.
-
-2.2.2.1.2. Updating The Available Ring
--------------------------------------
-
-The head of the buffer we mapped is the first d in the algorithm
-above. A naive implementation would do the following (with the
-appropriate conversion to-and-from little-endian assumed):
-
- avail->ring[avail->idx % qsz] = head;
-
-However, in general we can add many descriptor chains before we update
-the “idx” field (at which point they become visible to the
-device), so we keep a counter of how many we've added:
-
- avail->ring[(avail->idx + added++) % qsz] = head;
-
-2.2.2.1.3. Updating The Index Field
-----------------------------------
-
-Once the index field of the virtqueue is updated, the device will
-be able to access the descriptor chains we've created and the
-memory they refer to. This is why a memory barrier is generally
-used before the index update, to ensure it sees the most up-to-date
-copy.
-
-The index field always increments, and we let it wrap naturally at
-65536:
-
- avail->idx += added;
-
-2.2.2.1.4. Notifying The Device
-------------------------------
-
-The actual method of device notification is bus-specific, but generally
-it can be expensive. So the device can suppress such notifications if it
-doesn't need them. We have to be careful to expose the new index
-value before checking if notifications are suppressed: it's OK to notify
-gratuitously, but not to omit a required notification. So again,
-we use a memory barrier here before reading the flags or the
-avail_event field.
-
-If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated, and if the
-VRING_USED_F_NOTIFY flag is not set, we go ahead and notify the
-device.
-
-If the VIRTIO_F_RING_EVENT_IDX feature is negotiated, we read the
-avail_event field in the available ring structure. If the
-available index crossed_the avail_event field value since the
-last notification, we go ahead and write to the PCI configuration
-space. The avail_event field wraps naturally at 65536 as well,
-iving the following algorithm for calculating whether a device needs
-notification:
-
- (u16)(new_idx - avail_event - 1) < (u16)(new_idx - old_idx)
-
-2.2.2.2. Receiving Used Buffers From The Device
-----------------------------------------------
-
-Once the device has used a buffer (read from or written to it, or
-parts of both, depending on the nature of the virtqueue and the
-device), it sends an interrupt, following an algorithm very
-similar to the algorithm used for the driver to send the device a
-buffer:
-
-1. Write the head descriptor number to the next field in the used
- ring.
-
-2. Update the used ring index.
-
-3. Deliver an interrupt if necessary:
-
- (a) If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated:
- check if the VRING_AVAIL_F_NO_INTERRUPT flag is not set in
- avail->flags.
-
- (b) If the VIRTIO_F_RING_EVENT_IDX feature is negotiated: check
- whether the used index crossed the used_event field value
- since the last update. The used_event field wraps naturally
- at 65536 as well:
- (u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx)
-
-For each ring, guest should then disable interrupts by writing
-VRING_AVAIL_F_NO_INTERRUPT flag in avail structure, if required.
-It can then process used ring entries finally enabling interrupts
-by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the
-EVENT_IDX field in the available structure. The guest should then
-execute a memory barrier, and then recheck the ring empty
-condition. This is necessary to handle the case where after the
-last check and before enabling interrupts, an interrupt has been
-suppressed by the device:
-
- vring_disable_interrupts(vq);
-
- for (;;) {
- if (vq->last_seen_used != le16_to_cpu(vring->used.idx)) {
- vring_enable_interrupts(vq);
- mb();
-
- if (vq->last_seen_used != le16_to_cpu(vring->used.idx))
- break;
- }
-
- struct vring_used_elem *e = vring.used->ring[vq->last_seen_used%vsz];
- process_buffer(e);
- vq->last_seen_used++;
- }
-
-2.2.2.3. Notification of Device Configuration Changes
-----------------------------------------------------
-
-For devices where the configuration information can be changed, an
-interrupt is delivered when a configuration change occurs.
-
-
-
-2.3. Virtio Transport Options
-============================
-
-Virtio can use various different busses, thus the standard is split
-into virtio general and bus-specific sections.
-
-2.3.1. Virtio Over PCI Bus
--------------------------
-
-Virtio devices are commonly implemented as PCI devices.
-
-2.3.1.1. PCI Device Discovery
-----------------------------
-
-Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
-0x103F inclusive is a virtio device[3].
-
-The Subsystem Device ID indicates which virtio device is
-supported by the device. The Subsystem Vendor ID should reflect
-the PCI Vendor ID of the environment (it's currently only used
-for informational purposes by the guest).
-
-All Drivers must match devices with any Revision ID, this
-is to allow devices to be versioned without breaking drivers.
-
-2.3.1.1.1. Legacy Interfaces: A Note on PCI Device Discovery
--------------------------
-Transitional devices must have a Revision ID of 0 to match
-legacy drivers.
-
-Non-transitional devices must have a Revision ID of 1 or higher.
-
-Both transitional and non-transitional drivers must match
-any Revision ID value.
-
-2.3.1.2. PCI Device Layout
--------------------------
-
-To configure the device,
-use I/O and/or memory regions and/or PCI configuration space of the PCI device.
-These contain the virtio header registers, the notification register, the
-ISR status register and device specific registers, as specified by Virtio
-+ Structure PCI Capabilities
-
-There may be different widths of accesses to the I/O region; the
-“natural” access method for each field must be
-used (i.e. 32-bit accesses for 32-bit fields, etc).
-
-PCI Device Configuration Layout includes the common configuration,
-ISR, notification and device specific configuration
-structures.
-
-Unless explicitly specified otherwise, all multi-byte fields are little-endian.
-
-2.3.1.2.1. Common configuration structure layout
--------------------------
-Common configuration structure layout is documented below:
-
- struct virtio_pci_common_cfg {
- /* About the whole device. */
- le32 device_feature_select; /* read-write */
- le32 device_feature; /* read-only */
- le32 guest_feature_select; /* read-write */
- le32 guest_feature; /* read-write */
- le16 msix_config; /* read-write */
- le16 num_queues; /* read-only */
- u8 device_status; /* read-write */
- u8 config_generation; /* read-only */
-
- /* About a specific virtqueue. */
- le16 queue_select; /* read-write */
- le16 queue_size; /* read-write, power of 2, or 0. */
- le16 queue_msix_vector; /* read-write */
- le16 queue_enable; /* read-write */
- le16 queue_notify_off; /* read-only */
- le64 queue_desc; /* read-write */
- le64 queue_avail; /* read-write */
- le64 queue_used; /* read-write */
- };
-
-device_feature_select
-
- Selects which Feature Bits does device_feature field refer to.
- Value 0x0 selects Feature Bits 0 to 31
- Value 0x1 selects Feature Bits 32 to 63
- All other values cause reads from device_feature to return 0.
-
-device_feature
-
- Used by Device to report Feature Bits to Driver.
- Device Feature Bits selected by device_feature_select.
-
-guest_feature_select
-
- Selects which Feature Bits does guest_feature field refer to.
- Value 0x0 selects Feature Bits 0 to 31
- Value 0x1 selects Feature Bits 32 to 63
- When set to any other value, reads from guest_feature
- return 0, writing 0 into guest_feature has no effect, and
- writing any other value into guest_feature is an error.
-
-guest_feature
-
- Used by Driver to acknowledge Feature Bits to Device.
- Guest Feature Bits selected by guest_feature_select.
-
-msix_config
-
- Configuration Vector for MSI-X.
-
-num_queues
-
- Specifies the maximum number of virtqueues supported by device.
-
-device_status
-
- Device Status field. Writing 0 into this field resets the
- device.
-
-config_generation
-
- Configuration atomicity value. Changes every time the
- configuration noticeably changes. This means the device may
- only change the value after a configuration read operation,
- but it must change if there is any risk of a device seeing an
- inconsistent configuration state.
-
-queue_select
-
- Queue Select. Selects which virtqueue do other fields refer to.
-
-queue_size
-
- Queue Size. On reset, specifies the maximum queue size supported by
- the hypervisor. This can be modified by driver to reduce memory requirements.
- Set to 0 if this virtqueue is unused.
-
-queue_msix_vector
-
- Queue Vector for MSI-X.
-
-queue_enable
-
- Used to selectively prevent host from executing requests from this virtqueue.
- 1 - enabled; 0 - disabled
-
-queue_notify_off
-
- Used to calculate the offset from start of Notification structure at
- which this virtqueue is located.
- Note: this is *not* an offset in bytes. See notify_off_multiplier below.
-
-queue_desc
-
- Physical address of Descriptor Table.
-
-queue_avail
-
- Physical address of Available Ring.
-
-queue_used
-
- Physical address of Used Ring.
-
-2.3.1.2.2. ISR status structure layout
--------------------------
-ISR status structure includes a single 8-bit ISR status field.
-
-2.3.1.2.3. Notification structure layout
--------------------------
-Notification structure is always a multiple of 2 bytes in size.
-It includes 2-byte Queue Notify fields for each virtqueue of
-the device. Note that multiple virtqueues can use the same
-Queue Notify field, if necessary.
-
-2.3.1.2.4. Device specific structure
--------------------------
-
-Device specific structure is optional.
-
-2.3.1.2.5. Legacy Interfaces: A Note on PCI Device Layout
--------------------------
-
-Transitional devices should present part of configuration
-registers in a legacy configuration structure in BAR0 in the first I/O
-region of the PCI device, as documented below.
-
-There may be different widths of accesses to the I/O region; the
-“natural” access method for each field in the virtio header must be
-used (i.e. 32-bit accesses for 32-bit fields, etc), but
-when accessed through the legacy interface the
-device-specific region can be accessed using any width accesses, and
-should obtain the same results.
-
-Note that this is possible because while the virtio header is PCI
-(i.e. little) endian, when using the legacy interface the device-specific
-region is encoded in the native endian of the guest (where such distinction is
-applicable).
-
-When used through the legacy interface, the virtio header looks as follows:
-
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Bits || 32 | 32 | 32 | 16 | 16 | 16 | 8 | 8 |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Read/Write || R | R+W | R+W | R | R+W | R+W | R+W | R |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-| Purpose || Device | Guest | Queue | Queue | Queue | Queue | Device | ISR |
-| || Features bits 0:31 | Features bits 0:31 | Address | Size | Select | Notify | Status | Status |
-+------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
-
-
-If MSI-X is enabled for the device, two additional fields
-immediately follow this header:
-
-
-+------------++----------------+--------+
-| Bits || 16 | 16 |
-+------------++----------------+--------+
-| Read/Write || R+W | R+W |
-+------------++----------------+--------+
-| Purpose || Configuration | Queue |
-| (MSI-X) || Vector | Vector |
-+------------++----------------+--------+
-
-Note: When MSI-X capability is enabled, device specific configuration starts at
-byte offset 24 in virtio header structure. When MSI-X capability is not
-enabled, device specific configuration starts at byte offset 20 in virtio
-header. ie. once you enable MSI-X on the device, the other fields move.
-If you turn it off again, they move back!
-
-Immediately following these general headers, there may be
-device-specific headers:
-
-+------------++--------------------+
-| Bits || Device Specific |
-+------------++--------------------+
-| Read/Write || Device Specific |
-+------------++--------------------+
-| Purpose || Device Specific... |
-| || |
-+------------++--------------------+
-
-Note that only Feature Bits 0 to 31 are accessible through the
-Legacy Interface. When used through the Legacy Interface,
-Transitional Devices must assume that Feature Bits 32 to 63
-are not acknowledged by Driver.
-
-As legacy devices had no configuration generation field,
-see "2.1.3.1. Legacy Interface: Configuration Space" for workarounds.
-
-2.3.1.3. PCI-specific Initialization And Device Operation
---------------------------------------------------------
-
-2.3.1.3.1. Device Initialization
--------------------------------
-
-This documents PCI-specific steps executed during Device Initialization.
-As the first step, driver must detect device configuration layout
-to locate configuration fields in memory,I/O or configuration space of the
-device.
-
-100.100.1.3.1.1. Virtio Device Configuration Layout Detection
--------------------------------
-
-As a prerequisite to device initialization, driver executes a
-PCI capability list scan, detecting virtio configuration layout using Virtio
-Structure PCI capabilities.
-
-Virtio Device Configuration Layout includes virtio configuration header, Notification
-and ISR Status and device configuration structures.
-Each structure can be mapped by a Base Address register (BAR) belonging to
-the function, located beginning at 10h in Configuration Space,
-or accessed though PCI configuration space.
-
-Actual location of each structure is specified using vendor-specific PCI capability located
-on capability list in PCI configuration space of the device.
-This virtio structure capability uses little-endian format; all bits are
-read-only:
-
- struct virtio_pci_cap {
- u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
- u8 cap_next; /* Generic PCI field: next ptr. */
- u8 cap_len; /* Generic PCI field: capability length */
- u8 cfg_type; /* Identifies the structure. */
- u8 bar; /* Where to find it. */
- u8 padding[3]; /* Pad to full dword. */
- le32 offset; /* Offset within bar. */
- le32 length; /* Length of the structure, in bytes. */
- };
-
-This structure can optionally followed by extra data, depending on
-other fields, as documented below.
-
-Note that future versions of this specification will likely
-extend devices by adding extra fields at the tail end of some structures.
-
-To allow forward compatibility with such extensions, drivers must
-not limit structure size. Instead, drivers should only
-check that structures are *large enough* to contain the fields
-required for device operation.
-
-For example, if the specification states 'structure includes a
-single 8-bit field' drivers should understand this to mean that
-the structure can also include an arbitrary amount of tail padding,
-and accept any structure size equal to or greater than the
-specified 8-bit size.
-
-The fields are interpreted as follows:
-
-cap_vndr
- 0x09; Identifies a vendor-specific capability.
-
-cap_next
- Link to next capability in the capability list in the configuration space.
-
-cap_len
- Length of the capability structure, including the whole of
- struct virtio_pci_cap, and extra data if any.
- This length might include padding, or fields unused by the driver.
-
-cfg_type
- identifies the structure, according to the following table.
-
- /* Common configuration */
- #define VIRTIO_PCI_CAP_COMMON_CFG 1
- /* Notifications */
- #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
- /* ISR Status */
- #define VIRTIO_PCI_CAP_ISR_CFG 3
- /* Device specific configuration */
- #define VIRTIO_PCI_CAP_DEVICE_CFG 4
- /* PCI configuration access */
- #define VIRTIO_PCI_CAP_PCI_CFG 5
-
- Any other value - reserved for future use. Drivers must
- ignore any vendor-specific capability structure which has
- a reserved cfg_type value.
-
- More than one capability can identify the same structure - this makes it
- possible for the device to expose multiple interfaces to drivers. The order of
- the capabilities in the capability list specifies the order of preference
- suggested by the device; drivers should use the first interface that they can
- support. For example, on some hypervisors, notifications using IO accesses are
- faster than memory accesses. In this case, hypervisor can expose two
- capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
- the first one addressing an I/O BAR, the second one addressing a memory BAR.
- Driver will use the I/O BAR if I/O resources are available, and fall back on
- memory BAR when I/O resources are unavailable.
-
-bar
- values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
- the function located beginning at 10h in Configuration Space
- and used to map the structure into Memory or I/O Space.
- The BAR is permitted to be either 32-bit or 64-bit, it can map Memory Space
- or I/O Space.
-
- Any other value - reserved for future use. Drivers must
- ignore any vendor-specific capability structure which has
- a reserved bar value.
-
-offset
- indicates where the structure begins relative to the base address associated
- with the BAR.
-
-length
- indicates the length of the structure.
- This size might include padding, or fields unused by the driver.
- Drivers are also recommended to only map part of configuration structure
- large enough for device operation.
- For example, a future device might present a large structure size of several
- MBytes.
- As current devices never utilize structures larger than 4KBytes in size,
- driver can limit the mapped structure size to e.g.
- 4KBytes to allow forward compatibility with such devices without loss of
- functionality and without wasting resources.
-
-
-If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG this structure is immediately followed
-by additional fields:
-
- struct virtio_pci_notify_cap {
- struct virtio_pci_cap cap;
- le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
- };
-
-notify_off_multiplier
-
- Virtqueue offset multiplier, in bytes. Must be even and either a power of two, or 0.
- Value 0x1 is reserved.
- For a given virtqueue, the address to use for notifications is calculated as follows:
-
- queue_notify_off * notify_off_multiplier + offset
-
- If notify_off_multiplier is 0, all virtqueues use the same address in
- the Notifications structure!
-
-If cfg_type is VIRTIO_PCI_CAP_PCI_CFG the fields bar, offset and length are RW
-and this structure is immediately followed by an additional field:
-
- struct virtio_pci_cfg_cap {
- __u8 pci_cfg_data[4]; /* Data for BAR access. */
- };
-
-pci_cfg_data
-
- This RW field allows an indirect access to any BAR on the
- device using PCI configuration accesses.
-
- The BAR to access is selected using the bar field.
- The length of the access is specified by the length
- field, which can be set to 1, 2 and 4.
- The offset within the BAR is specified by the offset
- field, which must be aligned to length bytes.
-
- After this field is written by driver, the first length
- bytes in pci_cfg_data are written at the selected
- offset in the selected BAR.
-
- When this field is read by driver, length bytes at the
- selected offset in the selected BAR are read into pci_cfg_data.
-
-100.100.1.3.1.1.1. Legacy Interface: A Note on Device Layout Detection
--------------------------------
-
-Legacy drivers skipped Device Layout Detection step, assuming legacy
-configuration space in BAR0 in I/O space unconditionally.
-
-Legacy devices did not have the Virtio PCI Capability in their
-capability list.
-
-Therefore:
-
-Transitional devices should expose the Legacy Interface in I/O
-space in BAR0.
-
-Transitional drivers should look for the Virtio PCI
-Capabilities on the capability list.
-If these are not present, driver should assume a legacy device.
-
-Non-transitional drivers should look for the Virtio PCI
-Capabilities on the capability list.
-If these are not present, driver should assume a legacy device,
-and fail gracefully.
-
-Non-transitional devices, on a platform where a legacy driver for
-a legacy device with the same ID might have previously existed,
-must take the following steps to fail gracefully when a legacy
-driver attempts to drive them:
-
-1) Present an I/O BAR in BAR0, and
-2) Respond to a single-byte zero write to offset 18
- (corresponding to Device Status register in the legacy layout)
- of BAR0 by presenting zeroes on every BAR and ignoring writes.
-
-2.3.1.3.1.1. Queue Vector Configuration
---------------------------------------
-
-When MSI-X capability is present and enabled in the device
-(through standard PCI configuration space) Configuration/Queue
-MSI-X Vector registers are used to map configuration change and queue
-interrupts to MSI-X vectors. In this case, the ISR Status is unused.
-
-Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of
-Configuration/Queue Vector registers, maps interrupts triggered
-by the configuration change/selected queue events respectively to
-the corresponding MSI-X vector. To disable interrupts for a
-specific event type, unmap it by writing a special NO_VECTOR
-value:
-
- /* Vector value used to disable MSI for queue */
- #define VIRTIO_MSI_NO_VECTOR 0xffff
-
-Reading these registers returns vector mapped to a given event,
-or NO_VECTOR if unmapped. All queue and configuration change
-events are unmapped by default.
-
-Note that mapping an event to vector might require allocating
-internal device resources, and might fail. Devices report such
-failures by returning the NO_VECTOR value when the relevant
-Vector field is read. After mapping an event to vector, the
-driver must verify success by reading the Vector field value: on
-success, the previously written value is returned, and on
-failure, NO_VECTOR is returned. If a mapping failure is detected,
-the driver can retry mapping with fewervectors, or disable MSI-X.
-
-2.3.1.3.1.2. Virtqueue Configuration
------------------------------------
-
-As a device can have zero or more virtqueues for bulk data
-transport (for example, the simplest network device has two), the driver
-needs to configure them as part of the device-specific
-configuration.
-
-This is done as follows, for each virtqueue a device has:
-
-1. Write the virtqueue index (first queue is 0) to the Queue
- Select field.
-
-2. Read the virtqueue size from the Queue Size field, which is
- always a power of 2. This controls how big the virtqueue is
- (see "2.1.4. Virtqueues"). If this field is 0, the virtqueue does not exist.
-
-3. Optionally, select a smaller virtqueue size and write it in the Queue Size
- field.
-
-4. Allocate and zero Descriptor Table, Available and Used rings for the
- virtqueue in contiguous physical memory.
-
-5. Optionally, if MSI-X capability is present and enabled on the
- device, select a vector to use to request interrupts triggered
- by virtqueue events. Write the MSI-X Table entry number
- corresponding to this vector in Queue Vector field. Read the
- Queue Vector field: on success, previously written value is
- returned; on failure, NO_VECTOR value is returned.
-
-100.100.1.3.1.4.1. Legacy Interface: A Note on Virtqueue Configuration
------------------------------------
-When using the legacy interface, the page size for a virtqueue on a PCI virtio
-device is defined as 4096 bytes. Driver writes the physical address, divided
-by 4096 to the Queue Address field [6].
-
-2.3.1.3.2. Notifying The Device
-------------------------------
-
-Device notification occurs by writing the 16-bit virtqueue index
-of this virtqueue to the Queue Notify field.
-
-2.3.1.3.3. Virtqueue Interrupts From The Device
-----------------------------------------------
-
-If an interrupt is necessary:
-
- (a) If MSI-X capability is disabled:
-
- i. Set the lower bit of the ISR Status field for the device.
-
- ii. Send the appropriate PCI interrupt for the device.
-
- (b) If MSI-X capability is enabled:
-
- i. Request the appropriate MSI-X interrupt message for the
- device, Queue Vector field sets the MSI-X Table entry
- number.
-
- ii. If Queue Vector field value is NO_VECTOR, no interrupt
- message is requested for this event.
-
-The guest interrupt handler should:
-
-1. If MSI-X capability is disabled: read the ISR Status field,
- which will reset it to zero. If the lower bit is zero, the
- interrupt was not for this device. Otherwise, the guest driver
- should look through the used rings of each virtqueue for the
- device, to see if any progress has been made by the device
- which requires servicing.
-
-2. If MSI-X capability is enabled: look through the used rings of
- each virtqueue mapped to the specific MSI-X vector for the
- device, to see if any progress has been made by the device
- which requires servicing.
-
-2.3.1.3.4. Notification of Device Configuration Changes
-------------------------------------------------------
-
-Some virtio PCI devices can change the device configuration
-state, as reflected in the virtio header in the PCI configuration
-space. In this case:
-
-1. If MSI-X capability is disabled: an interrupt is delivered and
- the second highest bit is set in the ISR Status field to
- indicate that the driver should re-examine the configuration
- space. Note that a single interrupt can indicate both that one
- or more virtqueue has been used and that the configuration
- space has changed: even if the config bit is set, virtqueues
- must be scanned.
-
-2. If MSI-X capability is enabled: an interrupt message is
- requested. The Configuration Vector field sets the MSI-X Table
- entry number to use. If Configuration Vector field value is
- NO_VECTOR, no interrupt message is requested for this event.
-
-2.3.2. Virtio Over MMIO
------------------------
-
-Virtual environments without PCI support (a common situation in
-embedded devices models) might use simple memory mapped device
-("virtio-mmio") instead of the PCI device.
-
-The memory mapped virtio device behaviour is based on the PCI
-device specification. Therefore most of operations like device
-initialization, queues configuration and buffer transfers are
-nearly identical. Existing differences are described in the
-following sections.
-
-2.3.2.1. MMIO Device Discovery
-------------------------------
-
-Unlike PCI, MMIO provides no generic device discovery. For
-systems using Flattened Device Trees the suggested format is:
-
- virtio_block@1e000 {
- compatible = "virtio,mmio";
- reg = <0x1e000 0x100>;
- interrupts = <42>;
- }
-
-2.3.2.2. MMIO Device Layout
----------------------------
-
-MMIO virtio devices provides a set of memory mapped control
-registers, all 32 bits wide, followed by device-specific
-configuration space. The following list presents their layout:
-
-* Offset from the device base address | Direction | Name
- Description
-
-* 0x000 | R | MagicValue
- Magic value. Must be 0x74726976 (a Little Endian equivalent
- of a "virt" string).
-
-* 0x004 | R | Version
- Device version number. Devices compliant with this specification
- must return value 0x2.
-
-* 0x008 | R | DeviceID
- Virtio Subsystem Device ID.
- See "2.4. Device Types" for possible values. Value zero (0x0)
- is invalid and devices returning this ID must be ignored
- by the guest.
-
-* 0x00c | R | VendorID
- Virtio Subsystem Vendor ID.
-
-* 0x010 | R | HostFeatures
- Flags representing features the device supports.
- Reading from this register returns 32 consecutive flag bits,
- first bit depending on the last value written to the
- HostFeaturesSel register. Access to this register returns
- bits HostFeaturesSel*32 to (HostFeaturesSel*32)+31, eg.
- feature bits 0 to 31 if HostFeaturesSel is set to 0 and
- features bits 32 to 63 if HostFeaturesSel is set to 1.
- Also see "2.1.2. Feature Bits".
-
-* 0x014 | W | HostFeaturesSel
- Device (Host) features word selection.
- Writing to this register selects a set of 32 device feature bits
- accessible by reading from the HostFeatures register. Device driver
- must write a value to the HostFeaturesSel register before
- reading from the HostFeatures register.
-
-* 0x020 | W | GuestFeatures
- Flags representing device features understood and activated by
- the driver.
- Writing to this register sets 32 consecutive flag bits, first
- bit depending on the last value written to the GuestFeaturesSel
- register. Access to this register sets bits GuestFeaturesSel*32
- to (GuestFeaturesSel*32)+31, eg. feature bits 0 to 31 if
- GuestFeaturesSel is set to 0 and features bits 32 to 63 if
- GuestFeaturesSel is set to 1. Also see "2.1.2. Feature Bits".
-
-* 0x024 | W | GuestFeaturesSel
- Activated (Guest) features word selection.
- Writing to this register selects a set of 32 activated feature
- bits accessible by writing to the GuestFeatures register.
- Device driver must write a value to the GuestFeaturesSel
- register before writing to the GuestFeatures register.
-
-* 0x030 | W | QueueSel
- Virtual queue index (first queue is 0).
- Writing to this register selects the virtual queue that the
- following operations on the QueueNumMax, QueueNum, QueueReady,
- QueueDescLow, QueueDescHigh, QueueAvailLow, QueueAvailHigh,
- QueueUsedLow and QueueUsedHigh registers apply to.
-
-* 0x034 | R | QueueNumMax
- Maximum virtual queue size.
- Reading from the register returns the maximum size of the queue
- the Host is ready to process or zero (0x0) if the queue is not
- available. This applies to the queue selected by writing to
- QueueSel and is allowed only when QueueReady is set to zero
- (0x0), so when the queue is not in use.
-
-* 0x038 | W | QueueNum
- Virtual queue size.
- Queue size is the number of elements in the queue, therefore size
- of the Descriptor Table and both Available and Used rings.
- Writing to this register notifies the Host what size of the
- queue the Guest will use. This applies to the queue selected by
- writing to QueueSel and is allowed only when QueueReady is set
- to zero (0x0), so when the queue is not in use.
-
-* 0x03c | RW | QueueReady
- Virtual queue ready bit.
- Writing one (0x1) to this register notifies the Host that the
- virtual queue is ready to be used. Reading from this register
- returns the last value written to it. Both read and write
- accesses apply to the queue selected by writing to QueueSel.
- When the Guest wants to stop using the queue it must write
- zero (0x0) to this register and read the value back to
- ensure synchronisation.
-
-* 0x050 | W | QueueNotify
- Queue notifier.
- Writing a queue index to this register notifies the Host that
- there are new buffers to process in the queue.
-
-* 0x60 | R | InterruptStatus
- Interrupt status.
- Reading from this register returns a bit mask of interrupts
- asserted by the device. An interrupt is asserted if the
- corresponding bit is set, ie. equals one (1).
-
- – Bit 0 | Used Ring Update
- This interrupt is asserted when the Host has updated the Used
- Ring in at least one of the active virtual queues.
-
- – Bit 1 | Configuration change
- This interrupt is asserted when configuration of the device has
- changed.
-
-* 0x064 | W | InterruptACK
- Interrupt acknowledge.
- Writing to this register notifies the Host that the Guest
- finished handling interrupts. Set bits in the value clear
- the corresponding bits of the InterruptStatus register.
-
-* 0x070 | RW | Status
- Device status.
- Reading from this register returns the current device status
- flags.
- Writing non-zero values to this register sets the status flags,
- indicating the Guest progress. Writing zero (0x0) to this
- register triggers a device reset, including clearing all
- bits in the InterruptStatus register and ready bits in the
- QueueReady register for all queues in the device.
- See also p. "2.3.2.3.1. Device Initialization".
-
-* 0x080 | W | QueueDescLow
- 0x084 | W | QueueDescHigh
- Virtual queue's Descriptor Table 64 bit long physical address.
- Writing to these two registers (lower 32 bits of the address
- to QueueDescLow, higher 32 bits to QueueDescHigh) notifies
- the host about location of the Descriptor Table of the queue
- selected by writing to the QueueSel register. It is allowed
- only when QueueReady is set to zero (0x0), so when the queue
- is not in use.
-
-* 0x090 | W | QueueAvailLow
- 0x094 | W | QueueAvailHigh
- Virtual queue's Available Ring 64 bit long physical address.
- Writing to these two registers (lower 32 bits of the address
- to QueueAvailLow, higher 32 bits to QueueAvailHigh) notifies
- the host about location of the Available Ring of the queue
- selected by writing to the QueueSel register. It is allowed
- only when QueueReady is set to zero (0x0), so when the queue
- is not in use.
-
-* 0x0a0 | W | QueueUsedLow
- 0x0a4 | W | QueueUsedHigh
- Virtual queue's Used Ring 64 bit long physical address.
- Writing to these two registers (lower 32 bits of the address
- to QueueUsedLow, higher 32 bits to QueueUsedHigh) notifies
- the host about location of the Used Ring of the queue
- selected by writing to the QueueSel register. It is allowed
- only when QueueReady is set to zero (0x0), so when the queue
- is not in use.
-
-* 0x0fc | R | ConfigGeneration
- Configuration atomicity value.
- Changes every time the configuration noticeably changes. This
- means the device may only change the value after a configuration
- read operation, but it must change if there is any risk of a
- device seeing an inconsistent configuration state.
-
-* 0x100+ | RW | Config
- Device-specific configuration space starts at an offset 0x100
- and is accessed with byte alignment. Its meaning and size
- depends on the device and the driver.
-
-All register values are organized as Little Endian.
-
-Accessing memory locations not explicitly described above (or
-- in case of the configuration space - described in the device
-specification), writing to the registers described as "R" and
-reading from registers described as "W" is not permitted and
-can cause undefined behavior.
-
-2.3.2.3. MMIO-specific Initialization And Device Operation
-----------------------------------------------------------
-
-2.3.2.3.1. Device Initialization
---------------------------------
-
-The guest must start the device initialization by reading and
-checking values from the MagicValue and the Version registers.
-If both values are valid, it must read the DeviceID register
-and if its value is zero (0x0) must abort initialization and
-must not access any other register.
-
-Further initialization must follow the procedure described in
-p. "2.2.1. Device Initialization".
-
-2.3.2.3.2. Virtqueue Configuration
-----------------------------------
-
-1. Select the queue writing its index (first queue is 0) to the
- QueueSel register.
-
-2. Check if the queue is not already in use: read the QueueReady
- register, returned value should be zero (0x0).
-
-3. Read maximum queue size (number of elements) from the
- QueueNumMax register. If the returned value is zero (0x0) the
- queue is not available.
-
-4. Allocate and zero the queue pages, making sure the memory
- is physically contiguous. It is recommended to align the
- Used Ring to an optimal boundary (usually page size).
- Size of the allocated queue may be smaller than or equal to
- the maximum size returned by the Host.
-
-5. Notify the Host about the queue size by writing the size to
- the QueueNum register.
-
-6. Write physical addresses of the queue's Descriptor Table,
- Available Ring and Used Ring to (respectively) the QueueDescLow/
- QueueDescHigh, QueueAvailLow/QueueAvailHigh and QueueUsedLow/
- QueueUsedHigh register pairs.
-
-7. Write 0x1 to the QueueReady register.
-
-2.3.2.3.3. Notifying The Device
--------------------------------
-
-The device is notified about new buffers available in a queue by
-writing the queue index to the QueueNum register.
-
-2.3.2.3.4. Notifications From The Device
-----------------------------------------
-
-The memory mapped virtio device is using single, dedicated
-interrupt signal, which is raised when at least one of the
-interrupts described in the InterruptStatus register
-description is asserted. After receiving an interrupt, the
-driver must read the InterruptStatus register to check what
-caused the interrupt (see the register description). After the
-interrupt is handled, the driver must acknowledge it by writing
-a bit mask corresponding to the serviced interrupt to the
-InterruptACK register.
-
-As documented in the InterruptStatus register description,
-the device may notify the driver about a new used buffer being
-available in the queue or about a change in the device
-configuration.
-
-2.3.2.4. Legacy interface
--------------------------
-
-The legacy MMIO transport used page-based addressing, resulting
-in a slightly different control register layout, the device
-initialization and the virtual queue configuration procedure.
-
-The following list presents control registers layout, omitting
-descriptions of registers which did not change their function
-nor behaviour:
-
-* Offset from the device base address | Direction | Name
- Description
-
-* 0x000 | R | MagicValue
-
-* 0x004 | R | Version
- Device version number. Legacy devices must return value 0x1.
-
-* 0x008 | R | DeviceID
-
-* 0x00c | R | VendorID
-
-* 0x010 | R | HostFeatures
-
-* 0x014 | W | HostFeaturesSel
-
-* 0x020 | W | GuestFeatures
-
-* 0x024 | W | GuestFeaturesSel
-
-* 0x028 | W | GuestPageSize
- Guest page size.
- Device driver must write the guest page size in bytes to the
- register during initialization, before any queues are used.
- This value must be a power of 2 and is used by the Host to
- calculate the Guest address of the first queue page
- (see QueuePFN).
-
-* 0x030 | W | QueueSel
- Virtual queue index (first queue is 0).
- Writing to this register selects the virtual queue that the
- following operations on the QueueNumMAx, QueueNum, QueueAlign
- and QueuePFN registers apply to.
-
-* 0x034 | R | QueueNumMax
- Maximum virtual queue size.
- Reading from the register returns the maximum size of the queue
- the Host is ready to process or zero (0x0) if the queue is not
- available. This applies to the queue selected by writing to the
- QueueSel and is allowed only when the QueuePFN is set to zero
- (0x0), so when the queue is not actively used.
-
-* 0x038 | W | QueueNum
- Virtual queue size.
- Queue size is the number of elements in the queue, therefore size
- of the descriptor table and both available and used rings.
- Writing to this register notifies the Host what size of the
- queue the Guest will use. This applies to the queue selected by
- writing to the QueueSel register.
-
-* 0x03c | W | QueueAlign
- Used Ring alignment in the virtual queue.
- Writing to this register notifies the Host about alignment
- boundary of the Used Ring in bytes. This value must be a power
- of 2 and applies to the queue selected by writing to the QueueSel
- register.
-
-* 0x040 | RW | QueuePFN
- Guest physical page number of the virtual queue.
- Writing to this register notifies the host about location of the
- virtual queue in the Guest's physical address space. This value
- is the index number of a page starting with the queue
- Descriptor Table. Value zero (0x0) means physical address zero
- (0x00000000) and is illegal. When the Guest stops using the
- queue it must write zero (0x0) to this register.
- Reading from this register returns the currently used page
- number of the queue, therefore a value other than zero (0x0)
- means that the queue is in use.
- Both read and write accesses apply to the queue selected by
- writing to the QueueSel register.
-
-* 0x050 | W | QueueNotify
-
-* 0x060 | R | InterruptStatus
-
-* 0x064 | W | InterruptACK
-
-* 0x070 | RW | Status
- Device status.
- Reading from this register returns the current device status
- flags.
- Writing non-zero values to this register sets the status flags,
- indicating the Guest progress. Writing zero (0x0) to this
- register triggers a device reset. This should include
- setting QueuePFN to zero (0x0) for all queues in the device.
- Also see "2.2.1. Device Initialization".
-
-* 0x100+ | RW | Config
-
-The virtual queue page size is defined by writing to the GuestPageSize
-register, as written by the guest. This must be done before the
-virtual queues are configured.
-
-The virtual queue layout follows
-p. "2.1.4.1. Legacy Interfaces: A Note on Virtqueue Layout",
-with the alignment defined in the QueueAlign register.
-
-The virtual queue is configured as follows:
-
-1. Select the queue writing its index (first queue is 0) to the
- QueueSel register.
-
-2. Check if the queue is not already in use: read the QueuePFN
- register, returned value should be zero (0x0).
-
-3. Read maximum queue size (number of elements) from the
- QueueNumMax register. If the returned value is zero (0x0) the
- queue is not available.
-
-4. Allocate and zero the queue pages in contiguous virtual
- memory, aligning the Used Ring to an optimal boundary (usually
- page size). Size of the allocated queue may be smaller than or
- equal to the maximum size returned by the Host.
-
-5. Notify the Host about the queue size by writing the size to
- the QueueNum register.
-
-6. Notify the Host about the used alignment by writing its value
- in bytes to the QueueAlign register.
-
-7. Write the physical number of the first page of the queue to
- the QueuePFN register.
-
-Notification mechanisms did not change.
-
-2.3.3. Virtio over channel I/O
-------------------------------
-
-S/390 based virtual machines support neither PCI nor MMIO, so a
-different transport is needed there.
-
-virtio-ccw uses the standard channel I/O based mechanism used for
-the majority of devices on S/390. A virtual channel device with a
-special control unit type acts as proxy to the virtio device
-(similar to the way virtio-pci uses a PCI device) and
-configuration and operation of the virtio device is accomplished
-(mostly) via channel commands. This means virtio devices are
-discoverable via standard operating system algorithms, and adding
-virtio support is mainly a question of supporting a new control
-unit type.
-
-As the S/390 is a big endian machine, the data structures transmitted
-via channel commands are big-endian: this is made clear by use of
-the types be16, be32 and be64.
-
-2.3.3.1. Basic Concepts
------------------------
-
-As a proxy device, virtio-ccw uses a channel-attached I/O control
-unit with a special control unit type (0x3832) and a control unit
-model corresponding to the attached virtio device's subsystem
-device ID, accessed via a virtual I/O subchannel and a virtual
-channel path of type 0x32. This proxy device is discoverable via
-normal channel subsystem device discovery (usually a STORE
-SUBCHANNEL loop) and answers to the basic channel commands, most
-importantly SENSE ID.
-
-For a virtio-ccw proxy device, SENSE ID will return the following
-information:
-
-+-------+--------------------------------------------+
-| Bytes | Contents |
-|-------|--------------------------------------------|
-| 0 | reserved | 0xff |
-|-------|-----------------------|--------------------|
-| 1-2 | control unit type | 0x3832 |
-|-------|-----------------------|--------------------|
-| 3 | control unit model | <virtio device id> |
-|-------|-----------------------|--------------------|
-| 4-5 | device type | zeroes (unset) |
-|-------|-----------------------|--------------------|
-| 6 | device model | zeroes (unset) |
-|-------|-----------------------|--------------------|
-| 7-255 | extended SenseId data | zeroes (unset) |
-+-------+--------------------------------------------+
-
-A driver for virtio-ccw devices MUST check for a control unit
-type of 0x3832 and MUST ignore the device type and model.
-
-In addition to the basic channel commands, virtio-ccw defines a
-set of channel commands related to configuration and operation of
-virtio:
-
- #define CCW_CMD_SET_VQ 0x13
- #define CCW_CMD_VDEV_RESET 0x33
- #define CCW_CMD_SET_IND 0x43
- #define CCW_CMD_SET_CONF_IND 0x53
- #define CCW_CMD_SET_IND_ADAPTER 0x73
- #define CCW_CMD_READ_FEAT 0x12
- #define CCW_CMD_WRITE_FEAT 0x11
- #define CCW_CMD_READ_CONF 0x22
- #define CCW_CMD_WRITE_CONF 0x21
- #define CCW_CMD_WRITE_STATUS 0x31
- #define CCW_CMD_READ_VQ_CONF 0x32
- #define CCW_CMD_SET_VIRTIO_REV 0x83
-
-The virtio-ccw device acts like a normal channel device, as specified
-in [S390 PoP] and [S390 Common I/O]. In particular:
-
-- A device must post a unit check with command reject for any command
- it does not support.
-
-- If a driver did not suppress length checks for a channel command,
- the device must present a subchannel status as detailed in the
- architecture when the actual length did not match the expected length.
-
-- If a driver did suppress length checks for a channel command, the
- device must present a check condition if the transmitted data does
- not contain enough data to process the command. If the driver submitted
- a buffer that was too long, the device should accept the command.
- The driver should attempt to provide the correct length even if it
- suppresses length checks.
-
-2.3.3.2. Device Initialization
-------------------------------
-
-virtio-ccw uses several channel commands to set up a device.
-
-2.3.3.2.1. Setting the Virtio Revision
---------------------------------------
-
-CCW_CMD_SET_VIRTIO_REV is issued by the driver to set the revision of
-the virtio-ccw transport it intends to drive the device with. It uses the
-following communication structure:
-
- struct virtio_rev_info {
- __u16 revision;
- __u16 length;
- __u8 data[];
- };
-
-revision contains the desired revision id, length the length of the
-data portion and data revision-dependent additional desired options.
-
-The following values are supported:
-
-+----------+--------+-----------+--------------------------------+
-| revision | length | data | remarks |
-|----------|--------|-----------|--------------------------------|
-| 0 | 0 | <empty> | legacy interface; transitional |
-| | | | devices only |
-|----------|--------|-----------|--------------------------------|
-| 1 | 0 | <empty> | Virtio 1.0 |
-|----------|--------|-----------|--------------------------------|
-| 2-n | | | reserved for later revisions |
-+----------+--------+-----------+--------------------------------+
-
-Note that a change in the virtio standard does not neccessarily
-correspond to a change in the virtio-ccw revision.
-
-A device must post a unit check with command reject for any revision
-it does not support. For any invalid combination of revision, length
-and data, it must post a unit check with command reject as well. A
-non-transitional device must reject revision id 0.
-
-A driver should start with trying to set the highest revision it
-supports and continue with lower revisions if it gets a command reject.
-
-A driver must not issue any other virtio-ccw specific channel commands
-prior to setting the revision.
-
-A device must answer with command reject to any virtio-ccw specific
-channel command that is not contained in the revision selected by the
-driver.
-
-After a revision has been successfully selected by the driver, it
-must not attempt to select a different revision. A device must answer
-to any such attempt with a command reject.
-
-A device must treat the revision as unset from the time the associated
-subchannel has been enabled until a revision has been successfully set
-by the driver. This implies that revisions are not persistent across
-disabling and enabling of the associated subchannel.
-
-2.3.3.2.1.1. Legacy Interfaces: A Note on Setting the Virtio Revision
----------------------------------------------------------------------
-
-A legacy device will not support the CCW_CMD_SET_VIRTIO_REV and answer
-with a command reject. A non-transitional driver must stop trying to
-operate this device in that case. A transitional driver must operate
-the device as if it had been able to set revision 0.
-
-A legacy driver will not issue the CCW_CMD_SET_VIRTIO_REV prior to
-issueing other virtio-ccw specific channel commands. A non-transitional
-device therefore must answer any such attempts with a command reject.
-A transitional device must assume in this case that the driver is a
-legacy driver and continue as if the driver selected revision 0. This
-implies that the device must reject any command not valid for revision
-0, including a subsequent CCW_CMD_SET_VIRTIO_REV.
-
-2.3.3.2.2. Configuring a Virtqueue
-----------------------------------
-
-CCW_CMD_READ_VQ_CONF is issued by the guest to obtain information
-about a queue. It uses the following structure for communicating:
-
- struct vq_config_block {
- be16 index;
- be16 max_num;
- } __attribute__ ((packed));
-
-The requested number of buffers for queue index is returned in
-max_num.
-
-Afterwards, CCW_CMD_SET_VQ is issued by the guest to inform the
-host about the location used for its queue. The transmitted
-structure is
-
- struct vq_info_block {
- be64 desc;
- be32 res0;
- be16 index;
- be16 num;
- be64 avail;
- be64 used;
- } __attribute__ ((packed));
-
-desc, avail and used contain the guest addresses for the descriptor table,
-available ring and used ring for queue index, respectively. The actual
-virtqueue size (number of allocated buffers) is transmitted in num.
-res0 is reserved and must be ignored by the device.
-
-2.3.3.2.2.1. Legacy Interface: A Note on Configuring a Virtqueue
-----------------------------------------------------------------
-
-For a legacy driver or for a driver that selected revision 0,
-CCW_CMD_SET_VQ uses the following communication block:
-
- struct vq_info_block_legacy {
- be64 queue;
- be32 align;
- be16 index;
- be16 num;
- } __attribute__ ((packed));
-
-queue contains the guest address for queue index, num the number of buffers
-and align the alignment.
-
-100.3.3.2.2. Virtqueue Layout
-------------------------------
-
-The virtqueue is physically contiguous, with padded added to make the
-used ring meet the align value:
-
-+-------------------+-----------------------------------+-----------+
-| Descriptor Table | Available Ring (padding) | Used Ring |
-+-------------------+-----------------------------------+-----------+
-
-The calculation for total size is as follows:
-
- #define ALIGN(x) (((x) + align) & ~align)
- static inline unsigned vring_size(unsigned int num)
- {
- return ALIGN(sizeof(struct vring_desc)*num
- + sizeof(u16)*(3 + num))
- + ALIGN(sizeof(u16)*3 + sizeof(struct vring_used_elem)*num);
- }
-
-2.3.3.2.3. Communicating Status Information
--------------------------------------------
-
-The guest can change the status of a device via the
-CCW_CMD_WRITE_STATUS command, which transmits an 8 bit status
-value.
-
-2.3.3.2.4. Handling Device Features
------------------------------------
-
-Feature bits are arranged in an array of 32 bit values, making
-for a total of 8192 feature bits. Feature bits are in
-little-endian byte order.
-
-The CCW commands dealing with features use the following
-communication block:
-
- struct virtio_feature_desc {
- be32 features;
- u8 index;
- } __attribute__ ((packed));
-
-features are the 32 bits of features currently accessed, while
-index describes which of the feature bit values is to be
-accessed.
-
-The guest may obtain the host's device feature set via the
-CCW_CMD_READ_FEAT command. The host stores the features at index
-to features.
-
-For communicating its device features to the host, the guest may
-use the CCW_CMD_WRITE_FEAT command, denoting a features/index
-combination.
-
-2.3.3.2.5. Device Configuration
--------------------------------
-
-The device's configuration space is located in host memory. It is
-the same size as the standard PCI configuration space.
-
-To obtain information from the configuration space, the guest may
-use CCW_CMD_READ_CONF, specifying the guest memory for the host
-to write to.
-
-For changing configuration information, the guest may use
-CCW_CMD_WRITE_CONF, specifying the guest memory for the host to
-read from.
-
-In both cases, the complete configuration space is transmitted. This
-allows the guest to compare the new configuration space with the old
-version, and keep a generation count internally whenever it changes.
-
-2.3.3.2.6. Setting Up Indicators
---------------------------------
-
-In order to set up the indicator bits for host->guest notification,
-the driver uses different channel commands depending on whether it
-wishes to use traditional I/O interrupts tied to a subchannel or
-adapter I/O interrupts for virtqueue notifications. For any given
-device, the two mechanisms are mutually exclusive.
-
-For the configuration change indicators, only a mechanism using
-traditional I/O interrupts is provided, regardless of whether
-traditional or adapter I/O interrupts are used for virtqueue
-notifications.
-
-2.3.3.2.6.1. Setting Up Classic Queue Indicators
-------------------------------------------------
-
-Indicators for notification via classic I/O interrupts are contained
-in a 64 bit value per virtio-ccw proxy device.
-
-To communicate the location of the indicator bits for host->guest
-notification, the guest uses the CCW_CMD_SET_IND command,
-pointing to a location containing the guest address of the
-indicators in a 64 bit value.
-
-If the driver has already set up two-staged queue indicators via the
-CCW_CMD_SET_IND_ADAPTER command, the device MUST post a unit check
-with command reject to any subsequent CCW_CMD_SET_IND command.
-
-2.3.3.2.6.2. Setting Up Configuration Change Indicators
--------------------------------------------------------
-
-Indicators for configuration change host->guest notification are
-contained in a 64 bit value per virtio-ccw proxy device.
-
-To communicate the location of the indicator bits used in the
-configuration change host->guest notification, the driver issues the
-CCW_CMD_SET_CONF_IND command, pointing to a location containing the
-guest address of the indicators in a 64 bit value.
-
-2.3.3.2.6.3. Setting Up Two-Stage Queue Indicators
---------------------------------------------------
-
-Indicators for notification via adapter I/O interrupts consist of
-two stages:
-- a summary indicator byte covering the virtqueues for one or more
- virtio-ccw proxy devices
-- a set of contigous indicator bits for the virtqueues for a
- virtio-ccw proxy device
-
-To communicate the location of the summary and queue indicator bits,
-the driver uses the CCW_CMD_SET_IND_ADAPTER command with the following
-payload:
-
- struct virtio_thinint_area {
- be64 summary_indicator;
- be64 indicator;
- be64 bit_nr;
- u8 isc;
- } __attribute__ ((packed));
-
-summary_indicator contains the guest address of the 8 bit summary
-indicator.
-indicator contains the guest address of an area wherin the indicators
-for the devices are contained, starting at bit_nr, one bit per
-virtqueue of the device. Bit numbers start at the left.
-isc contains the I/O interruption subclass to be used for the adapter
-I/O interrupt. It may be different from the isc used by the proxy
-virtio-ccw device's subchannel.
-
-If the driver has already set up classic queue indicators via the
-CCW_CMD_SET_IND command, the device MUST post a unit check with
-command reject to any subsequent CCW_CMD_SET_IND_ADAPTER command.
-
-2.3.3.2.6.4. Legacy Interfaces: A Note on Setting Up Indicators
----------------------------------------------------------------
-
-Legacy devices will only support classic queue indicators; they will
-reject CCW_CMD_SET_IND_ADAPTER as they don't know that command.
-
-2.3.3.3. Device Operation
--------------------------
-
-2.3.3.3.1. Host->Guest Notification
------------------------------------
-
-There are two modes of operation regarding host->guest notifcation,
-classic I/O interrupts and adapter I/O interrupts. The mode to be
-used is determined by the driver by using CCW_CMD_SET_IND respectively
-CCW_CMD_SET_IND_ADAPTER to set up queue indicators.
-
-For configuration changes, the driver will always use classic I/O
-interrupts.
-
-2.3.3.3.1.1. Notification via Classic I/O Interrupts
-----------------------------------------------------
-
-If the driver used the CCW_CMD_SET_IND command to set up queue
-indicators, the device will use classic I/O interrupts for
-host->guest notification about virtqueue activity.
-
-For notifying the guest of virtqueue buffers, the host sets the
-corresponding bit in the guest-provided indicators. If an
-interrupt is not already pending for the subchannel, the host
-generates an unsolicited I/O interrupt.
-
-If the host wants to notify the guest about configuration
-changes, it sets bit 0 in the configuration indicators and
-generates an unsolicited I/O interrupt, if needed. This also
-applies if adapter I/O interrupts are used for queue notifications.
-
-2.3.3.3.1.2. Notification via Adapter I/O Interrupts
-----------------------------------------------------
-
-If the driver used the CCW_CMD_SET_IND_ADAPTER command to set up
-queue indicators, the device will use adapter I/O interrupts for
-host->guest notification about virtqueue activity.
-
-For notifying the guest of virtqueue buffers, the host sets the
-bit in the guest-provided indicator area at the corresponding offset.
-The guest-provided summary indicator is also set. An adapter I/O
-interrupt for the corresponding interruption subclass is generated.
-The device SHOULD only generate an adapter I/O interrupt if the
-summary indicator had not been set prior to notification. The driver
-MUST clear the summary indicator after receiving an adapter I/O
-interrupt before it processes the queue indicators.
-
-2.3.3.3.1.3. Legacy Interfaces: A Note on Host->Guest Notification
-------------------------------------------------------------------
-
-As legacy devices and drivers support only classic queue indicators,
-host->guest notification will always be done via classic I/O interrupts.
-
-2.3.3.3.2. Guest->Host Notification
------------------------------------
-
-For notifying the host of virtqueue buffers, the guest
-unfortunately can't use a channel command (the asynchronous
-characteristics of channel I/O interact badly with the host block
-I/O backend). Instead, it uses a diagnose 0x500 call with subcode
-3 specifying the queue, as follows:
-
-+------+-------------------+--------------+
-| GPR | Input Value | Output Value |
-+------+-------------------+--------------+
-+------+-------------------+--------------+
-| 1 | 0x3 | |
-+------+-------------------+--------------+
-| 2 | Subchannel ID | Host Cookie |
-+------+-------------------+--------------+
-| 3 | Virtqueue number | |
-+------+-------------------+--------------+
-| 4 | Host Cookie | |
-+------+-------------------+--------------+
-
-Host cookie is an optional per-virtqueue 64 bit value that can be
-used by the hypervisor to speed up the notification execution.
-For each notification, the output value is returned in GPR2 and
-should be passed in GPR4 for the next notification:
-
- info->cookie = do_notify(schid,
- virtqueue_get_queue_index(vq),
- info->cookie);
-
-2.3.3.3.3. Early printk for Virtio Consoles
--------------------------------------------
-
-For the early printk mechanism, diagnose 0x500 with subcode 0 is
-used.
-
-2.3.3.3.4. Resetting Devices
-----------------------------
-
-In order to reset a device, a guest may send the
-CCW_CMD_VDEV_RESET command.
-
-
-2.4. Device Types
-================
-
-On top of the queues, config space and feature negotiation facilities
-built into virtio, several specific devices are defined.
-
-The following device IDs are used to identify different types of virtio
-devices. Some device IDs are reserved for devices which are not currently
-defined in this standard.
-
-Discovering what devices are available and their type is bus-dependent.
-
-+------------+--------------------+
-| Device ID | Virtio Device |
-+------------+--------------------+
-+------------+--------------------+
-| 0 | reserved (invalid) |
-+------------+--------------------+
-| 1 | network card |
-+------------+--------------------+
-| 2 | block device |
-+------------+--------------------+
-| 3 | console |
-+------------+--------------------+
-| 4 | entropy source |
-+------------+--------------------+
-| 5 | memory ballooning |
-+------------+--------------------+
-| 6 | ioMemory |
-+------------+--------------------+
-| 7 | rpmsg |
-+------------+--------------------+
-| 8 | SCSI host |
-+------------+--------------------+
-| 9 | 9P transport |
-+------------+--------------------+
-| 10 | mac80211 wlan |
-+------------+--------------------+
-| 11 | rproc serial |
-+------------+--------------------+
-| 12 | virtio CAIF |
-+------------+--------------------+
-
-2.4.1. Network Device
-====================
-
-The virtio network device is a virtual ethernet card, and is the
-most complex of the devices supported so far by virtio. It has
-enhanced rapidly and demonstrates clearly how support for new
-features should be added to an existing device. Empty buffers are
-placed in one virtqueue for receiving packets, and outgoing
-packets are enqueued into another for transmission in that order.
-A third command queue is used to control advanced filtering
-features.
-
-2.4.1.1. Device ID
------------------
-
- 1
-
-2.4.1.2. Virtqueues
-------------------
-
- 0:receiveq. 1:transmitq. 2:controlq
-
- Virtqueue 2 only exists if VIRTIO_NET_F_CTRL_VQ set.
-
-2.4.1.3. Feature bits
---------------------
-
- VIRTIO_NET_F_CSUM (0) Device handles packets with partial checksum
-
- VIRTIO_NET_F_GUEST_CSUM (1) Guest handles packets with partial checksum
-
- VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2) Control channel offloads
- reconfiguration support.
-
- VIRTIO_NET_F_MAC (5) Device has given MAC address.
-
- VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
-
- VIRTIO_NET_F_GUEST_TSO6 (8) Guest can receive TSOv6.
-
- VIRTIO_NET_F_GUEST_ECN (9) Guest can receive TSO with ECN.
-
- VIRTIO_NET_F_GUEST_UFO (10) Guest can receive UFO.
-
- VIRTIO_NET_F_HOST_TSO4 (11) Device can receive TSOv4.
-
- VIRTIO_NET_F_HOST_TSO6 (12) Device can receive TSOv6.
-
- VIRTIO_NET_F_HOST_ECN (13) Device can receive TSO with ECN.
-
- VIRTIO_NET_F_HOST_UFO (14) Device can receive UFO.
-
- VIRTIO_NET_F_MRG_RXBUF (15) Guest can merge receive buffers.
-
- VIRTIO_NET_F_STATUS (16) Configuration status field is
- available.
-
- VIRTIO_NET_F_CTRL_VQ (17) Control channel is available.
-
- VIRTIO_NET_F_CTRL_RX (18) Control channel RX mode support.
-
- VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
-
- VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
- packets.
-
-2.4.1.3.1. Legacy Interface: Feature bits
---------------------
-VIRTIO_NET_F_GSO (6) Device handles packets with any GSO type.
-
-This was supposed to indicate segmentation offload support, but
-upon further investigation it became clear that multiple bits
-were required.
-
-100.4.1.4. Device configuration layout
----------------------
-
-Two configuration fields are currently defined. The mac address field
-always exists (though is only valid if VIRTIO_NET_F_MAC is set), and
-the status field only exists if VIRTIO_NET_F_STATUS is set. Two
-read-only bits are currently defined for the status field:
-VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
-
- #define VIRTIO_NET_S_LINK_UP 1
- #define VIRTIO_NET_S_ANNOUNCE 2
-
- struct virtio_net_config {
- u8 mac[6];
- le16 status;
- };
-
-100.4.1.4.1. Legacy Interface: Device configuration layout
---------------------
-For legacy devices, the status field in struct virtio_net_config is the
-native endian of the guest rather than (necessarily) little-endian.
-
-
-2.4.1.4. Device Initialization
------------------------------
-
-1. The initialization routine should identify the receive and
- transmission virtqueues.
-
-2. If the VIRTIO_NET_F_MAC feature bit is set, the configuration
- space “mac” entry indicates the “physical” address of the the
- network card, otherwise a private MAC address should be
- assigned. All guests are expected to negotiate this feature if
- it is set.
-
-3. If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,
- identify the control virtqueue.
-
-4. If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link
- status can be read from the bottom bit of the “status” config
- field. Otherwise, the link should be assumed active.
-
-5. The receive virtqueue should be filled with receive buffers.
- This is described in detail below in “Setting Up Receive
- Buffers”.
-
-6. A driver can indicate that it will generate checksumless
- packets by negotating the VIRTIO_NET_F_CSUM feature. This “
- checksum offload” is a common feature on modern network cards.
-
-7. If that feature is negotiated[13], a driver can use TCP or UDP
- segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
- TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
- (UDP fragmentation) features. It should not send TCP packets
- requiring segmentation offload which have the Explicit Congestion
- Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
- negotiated.[14]
-
-8. The converse features are also available: a driver can save
- the virtual device some work by negotiating these features.[15]
- The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
- checksummed packets can be received, and if it can do that then
- the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_UFO and VIRTIO_NET_F_GUEST_ECN are the input
- equivalents of the features described above.
- See "2.4.1.5.2. Setting Up Receive Buffers" and "2.4.1.5.2.1. Packet Receive Interrupt" below.
-
-2.4.1.5. Device Operation
-------------------------
-
-Packets are transmitted by placing them in the transmitq, and
-buffers for incoming packets are placed in the receiveq. In each
-case, the packet itself is preceeded by a header:
-
- struct virtio_net_hdr {
- #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
- u8 flags;
- #define VIRTIO_NET_HDR_GSO_NONE 0
- #define VIRTIO_NET_HDR_GSO_TCPV4 1
- #define VIRTIO_NET_HDR_GSO_UDP 3
- #define VIRTIO_NET_HDR_GSO_TCPV6 4
- #define VIRTIO_NET_HDR_GSO_ECN 0x80
- u8 gso_type;
- le16 hdr_len;
- le16 gso_size;
- le16 csum_start;
- le16 csum_offset;
- /* Only if VIRTIO_NET_F_MRG_RXBUF: */
- le16 num_buffers;
- };
-
-The controlq is used to control device features such as
-filtering.
-
-100.4.1.5.1. Legacy Interface: Device Operation
-------------------------
-For legacy devices, the fields in struct virtio_net_hdr are the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.1.5.1. Packet Transmission
------------------------------
-
-Transmitting a single packet is simple, but varies depending on
-the different features the driver negotiated.
-
-1. If the driver negotiated VIRTIO_NET_F_CSUM, and the packet has
- not been fully checksummed, then the virtio_net_hdr's fields
- are set as follows. Otherwise, the packet must be fully
- checksummed, and flags is zero.
-
- • flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set,
-
- • csum_start is set to the offset within the packet to begin checksumming,
- and
-
- • csum_offset indicates how many bytes after the csum_start the
- new (16 bit ones' complement) checksum should be placed.[16]
-
-2. If the driver negotiated
- VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
- TCP segmentation or UDP fragmentation, then the “gso_type”
- field is set to VIRTIO_NET_HDR_GSO_TCPV4, TCPV6 or UDP.
- (Otherwise, it is set to VIRTIO_NET_HDR_GSO_NONE). In this
- case, packets larger than 1514 bytes can be transmitted: the
- metadata indicates how to replicate the packet header to cut it
- into smaller packets. The other gso fields are set:
-
- • hdr_len is a hint to the device as to how much of the header
- needs to be kept to copy into each packet, usually set to the
- length of the headers, including the transport header.[17]
-
- • gso_size is the maximum size of each packet beyond that
- header (ie. MSS).
-
- • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
- the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
- well, indicating that the TCP packet has the ECN bit set.[18]
-
-3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
- the num_buffers field is set to zero.
-
-4. The header and packet are added as one output buffer to the
- transmitq, and the device is notified of the new entry
- (see "2.4.1.4. Notifying The Device").[19]
-
-2.4.1.5.1.1. Packet Transmission Interrupt
------------------------------------------
-
-Often a driver will suppress transmission interrupts using the
-VRING_AVAIL_F_NO_INTERRUPT flag
- (see "2.4.2. Receiving Used Buffers From The Device")
-and check for used packets in the transmit path of following
-packets.
-
-The normal behavior in this interrupt handler is to retrieve and
-new descriptors from the used ring and free the corresponding
-headers and packets.
-
-2.4.1.5.2. Setting Up Receive Buffers
------------------------------------------
-
-It is generally a good idea to keep the receive virtqueue as
-fully populated as possible: if it runs out, network performance
-will suffer.
-
-If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
-VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
-accept packets of up to 65550 bytes long (the maximum size of a
-TCP or UDP packet, plus the 14 byte ethernet header), otherwise
-1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
-buffer in the receive queue needs to be at least this length [20]
-
-If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
-least the size of the struct virtio_net_hdr.
-
-2.4.1.5.2.1. Packet Receive Interrupt
-------------------------------------
-
-When a packet is copied into a buffer in the receiveq, the
-optimal path is to disable further interrupts for the receiveq
-(see 2.2.2.2. Receiving Used Buffers From The Device) and process
-packets until no more are found, then re-enable them.
-
-Processing packet involves:
-
-1. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
- then the “num_buffers” field indicates how many descriptors
- this packet is spread over (including this one). This allows
- receipt of large packets without having to allocate large
- buffers. In this case, there will be at least “num_buffers” in
- the used ring, and they should be chained together to form a
- single packet. The other buffers will not begin with a struct
- virtio_net_hdr.
-
-2. If the VIRTIO_NET_F_MRG_RXBUF feature was not negotiated, or
- the “num_buffers” field is one, then the entire packet will be
- contained within this buffer, immediately following the struct
- virtio_net_hdr.
-
-3. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
- VIRTIO_NET_HDR_F_NEEDS_CSUM bit in the “flags” field may be
- set: if so, the checksum on the packet is incomplete and the “
- csum_start” and “csum_offset” fields indicate how to calculate
- it (see Packet Transmission point 1).
-
-4. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
- negotiated, then the “gso_type” may be something other than
- VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
- desired MSS (see Packet Transmission point 2).
-
-2.4.1.5.3. Control Virtqueue
----------------------------
-
-The driver uses the control virtqueue (if VIRTIO_NET_F_VTRL_VQ is
-negotiated) to send commands to manipulate various features of
-the device which would not easily map into the configuration
-space.
-
-All commands are of the following form:
-
- struct virtio_net_ctrl {
- u8 class;
- u8 command;
- u8 command-specific-data[];
- u8 ack;
- };
-
- /* ack values */
- #define VIRTIO_NET_OK 0
- #define VIRTIO_NET_ERR 1
-
-The class, command and command-specific-data are set by the
-driver, and the device sets the ack byte. There is little it can
-do except issue a diagnostic if the ack byte is not
-VIRTIO_NET_OK.
-
-2.4.1.5.3.1. Packet Receive Filtering
-------------------------------------
-
-If the VIRTIO_NET_F_CTRL_RX feature is negotiated, the driver can
-send control commands for promiscuous mode, multicast receiving,
-and filtering of MAC addresses.
-
-Note that in general, these commands are best-effort: unwanted
-packets may still arrive.
-
-Setting Promiscuous Mode
-
- #define VIRTIO_NET_CTRL_RX 0
- #define VIRTIO_NET_CTRL_RX_PROMISC 0
- #define VIRTIO_NET_CTRL_RX_ALLMULTI 1
-
-The class VIRTIO_NET_CTRL_RX has two commands:
-VIRTIO_NET_CTRL_RX_PROMISC turns promiscuous mode on and off, and
-VIRTIO_NET_CTRL_RX_ALLMULTI turns all-multicast receive on and
-off. The command-specific-data is one byte containing 0 (off) or
-1 (on).
-
-2.4.1.5.3.2. Setting MAC Address Filtering
------------------------------------------
-
- struct virtio_net_ctrl_mac {
- le32 entries;
- u8 macs[entries][ETH_ALEN];
- };
-
- #define VIRTIO_NET_CTRL_MAC 1
- #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0
-
-The device can filter incoming packets by any number of destination
-MAC addresses.[21] This table is set using the class
-VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
-command-specific-data is two variable length tables of 6-byte MAC
-addresses. The first table contains unicast addresses, and the second
-contains multicast addresses.
-
-2.4.1.5.3.2.1. Legacy Interface: Setting MAC Address Filtering
------------------------------------------
-For legacy devices, the entries field in struct virtio_net_ctrl_mac is the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.1.5.3.3. VLAN Filtering
---------------------------
-
-If the driver negotiates the VIRTION_NET_F_CTRL_VLAN feature, it
-can control a VLAN filter table in the device.
-
- #define VIRTIO_NET_CTRL_VLAN 2
- #define VIRTIO_NET_CTRL_VLAN_ADD 0
- #define VIRTIO_NET_CTRL_VLAN_DEL 1
-
-Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
-command take a little-endian 16-bit VLAN id as the command-specific-data.
-
-2.4.1.5.3.3.1. Legacy Interface: VLAN Filtering
------------------------------------------
-For legacy devices, the VLAN id is in the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.1.5.3.4. Gratuitous Packet Sending
--------------------------------------
-
-If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
-on VIRTIO_NET_F_CTRL_VQ), it can ask the guest to send gratuitous
-packets; this is usually done after the guest has been physically
-migrated, and needs to announce its presence on the new network
-links. (As hypervisor does not have the knowledge of guest
-network configuration (eg. tagged vlan) it is simplest to prod
-the guest in this way).
-
- #define VIRTIO_NET_CTRL_ANNOUNCE 3
- #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
-
-The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status
-field when it notices the changes of device configuration. The
-command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
-driver has recevied the notification and device would clear the
-VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
-this command.
-
-Processing this notification involves:
-
-1. Sending the gratuitous packets or marking there are pending
- gratuitous packets to be sent and letting deferred routine to
- send them.
-
-2. Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control
- vq.
-
-2.4.1.5.3.5. Offloads State Configuration
--------------------------------------
-
-If the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature is negotiated, the driver can
-send control commands for dynamic offloads state configuration.
-
-2.4.1.5.4.3.1. Setting Offloads State
--------------------------------------
-
- le64 offloads;
-
- #define VIRTIO_NET_F_GUEST_CSUM 1
- #define VIRTIO_NET_F_GUEST_TSO4 7
- #define VIRTIO_NET_F_GUEST_TSO6 8
- #define VIRTIO_NET_F_GUEST_ECN 9
- #define VIRTIO_NET_F_GUEST_UFO 10
-
- #define VIRTIO_NET_CTRL_GUEST_OFFLOADS 5
- #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET 0
-
-The class VIRTIO_NET_CTRL_GUEST_OFFLOADS has one command:
-VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET applies the new offloads configuration.
-
-le64 value passed as command data is a bitmask, bits set define
-offloads to be enabled, bits cleared - offloads to be disabled.
-
-There is a corresponding device feature for each offload. Upon feature
-negotiation corresponding offload gets enabled to preserve backward
-compartibility.
-
-Corresponding feature must be negotiated at startup in order to allow dynamic
-change of specific offload state.
-
-
-2.4.1.5.4.3.1.1. Legacy Interface: Setting Offloads State
--------------------------------------
-For legacy devices, the offloads field is the
-native endian of the guest rather than (necessarily) little-endian.
-
-
-2.4.2. Block Device
-==================
-
-The virtio block device is a simple virtual block device (ie.
-disk). Read and write requests (and other exotic requests) are
-placed in the queue, and serviced (probably out of order) by the
-device except where noted.
-
-2.4.2.1. Device ID
------------------
- 2
-
-2.4.2.2. Virtqueues
-------------------
- 0:requestq
-
-2.4.2.3. Feature bits
---------------------
-
- VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is
- in “size_max”.
-
- VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a
- request is in “seg_max”.
-
- VIRTIO_BLK_F_GEOMETRY (4) Disk-style geometry specified in “
- geometry”.
-
- VIRTIO_BLK_F_RO (5) Device is read-only.
-
- VIRTIO_BLK_F_BLK_SIZE (6) Block size of disk is in “blk_size”.
-
- VIRTIO_BLK_F_TOPOLOGY (10) Device exports information on optimal I/O
- alignment.
-
-2.4.2.3.1. Legacy Interface: Feature bits
---------------------
- VIRTIO_BLK_F_BARRIER (0) Host supports request barriers.
-
- VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
-
- VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
-
- VIRTIO_BLK_F_CONFIG_WCE (11) Device can toggle its cache between writeback
- and writethrough modes.
-
-VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers
-should only negotiate this feature if they are capable of sending
-VIRTIO_BLK_T_FLUSH commands.
-
-100.2.4.2.5. Device configuration layout
---------------------
-
-The capacity of the device (expressed in 512-byte sectors) is always
-present. The availability of the others all depend on various feature
-bits as indicated above.
-
- struct virtio_blk_config {
- le64 capacity;
- le32 size_max;
- le32 seg_max;
- struct virtio_blk_geometry {
- le16 cylinders;
- u8 heads;
- u8 sectors;
- } geometry;
- le32 blk_size;
- struct virtio_blk_topology {
- // # of logical blocks per physical block (log2)
- u8 physical_block_exp;
- // offset of first aligned logical block
- u8 alignment_offset;
- // suggested minimum I/O size in blocks
- le16 min_io_size;
- // optimal (suggested maximum) I/O size in blocks
- le32 opt_io_size;
- } topology;
- u8 reserved;
- };
-
-
-100.2.4.2.5.1. Legacy Interface: Device configuration layout
---------------------
-For legacy devices, the fields in struct virtio_blk_config are the
-native endian of the guest rather than (necessarily) little-endian.
-
-
-2.4.2.4. Device Initialization
------------------------------
-
-1. The device size should be read from the “capacity”
- configuration field. No requests should be submitted which goes
- beyond this limit.
-
-2. If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the
- blk_size field can be read to determine the optimal sector size
- for the driver to use. This does not affect the units used in
- the protocol (always 512 bytes), but awareness of the correct
- value can affect performance.
-
-3. If the VIRTIO_BLK_F_RO feature is set by the device, any write
- requests will fail.
-
-4. If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the
- topology struct can be read to determine the physical block size and optimal
- I/O lengths for the driver to use. This also does not affect the units
- in the protocol, only performance.
-
-2.4.2.4.1. Legacy Interface: Device Initialization
------------------------------
-
-The reserved field used to be called writeback. If the
-VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode should be
-read from the writeback field of the configuration if available; the
-driver can also write to the field in order to toggle the cache
-between writethrough (0) and writeback (1) mode. If the feature is
-not available, the driver can instead look at the result of
-negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode
-after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated.
-
-Some older legacy devices did not operate in writethrough mode even
-after a guest announced lack of support for VIRTIO_BLK_F_FLUSH.
-
-2.4.2.5. Device Operation
-------------------------
-
-The driver queues requests to the virtqueue, and they are used by
-the device (not necessarily in order). Each request is of form:
-
- struct virtio_blk_req {
- le32 type;
- le32 reserved;
- le64 sector;
- char data[][512];
- u8 status;
- };
-
-The type of the request is either a read (VIRTIO_BLK_T_IN), a write
-(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
-VIRTIO_BLK_T_FLUSH_OUT[23]).
-
- #define VIRTIO_BLK_T_IN 0
- #define VIRTIO_BLK_T_OUT 1
- #define VIRTIO_BLK_T_FLUSH 4
- #define VIRTIO_BLK_T_FLUSH_OUT 5
-
-The sector number indicates the offset (multiplied by 512) where
-the read or write is to occur. This field is unused and set to 0
-for scsi packet commands and for flush commands.
-
-The final status byte is written by the device: either
-VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for host or guest
-error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
-
- #define VIRTIO_BLK_S_OK 0
- #define VIRTIO_BLK_S_IOERR 1
- #define VIRTIO_BLK_S_UNSUPP 2
-
-Any writes completed before the submission of the flush command should
-be committed to non-volatile storage by the device.
-
-2.4.2.5.1. Legacy Interface: Device Operation
-------------------------
-For legacy devices, the fields in struct virtio_blk_req are the
-native endian of the guest rather than (necessarily) little-endian.
-
-The 'reserved' field was previously called ioprio. The ioprio field
-is a hint about the relative priorities of requests to the device:
-higher numbers indicate more important requests.
-
- #define VIRTIO_BLK_T_BARRIER 0x80000000
-
-If the device has VIRTIO_BLK_F_BARRIER
-feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
-request acts as a barrier and that all preceeding requests must be
-complete before this one, and all following requests must not be
-started until this is complete. Note that a barrier does not flush
-caches in the underlying backend device in host, and thus does not
-serve as data consistency guarantee. Driver must use FLUSH request to
-flush the host cache.
-
-If the device has VIRTIO_BLK_F_SCSI feature, it can also support
-scsi packet command requests, each of these requests is of form:
-
- /* All fields are in guest's native endian. */
- struct virtio_scsi_pc_req {
- u32 type;
- u32 ioprio;
- u64 sector;
- char cmd[];
- char data[][512];
- #define SCSI_SENSE_BUFFERSIZE 96
- u8 sense[SCSI_SENSE_BUFFERSIZE];
- u32 errors;
- u32 data_len;
- u32 sense_len;
- u32 residual;
- u8 status;
- };
-
-A request type can also be a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or
-VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent, the device
-does not distinguish between them:
-
- #define VIRTIO_BLK_T_SCSI_CMD 2
- #define VIRTIO_BLK_T_SCSI_CMD_OUT 3
-
-The cmd field is only present for scsi packet command requests,
-and indicates the command to perform. This field must reside in a
-single, separate read-only buffer; command length can be derived
-from the length of this buffer.
-
-Note that these first three (four for scsi packet commands)
-fields are always read-only: the data field is either read-only
-or write-only, depending on the request. The size of the read or
-write can be derived from the total size of the request buffers.
-
-The sense field is only present for scsi packet command requests,
-and indicates the buffer for scsi sense data.
-
-The data_len field is only present for scsi packet command
-requests, this field is deprecated, and should be ignored by the
-driver. Historically, devices copied data length there.
-
-The sense_len field is only present for scsi packet command
-requests and indicates the number of bytes actually written to
-the sense buffer.
-
-The residual field is only present for scsi packet command
-requests and indicates the residual size, calculated as data
-length - number of bytes actually transferred.
-
-Historically, devices assumed that the fields type, ioprio and
-sector reside in a single, separate read-only buffer; the fields
-errors, data_len, sense_len and residual reside in a single,
-separate write-only buffer; the sense field in a separate
-write-only buffer of size 96 bytes, by itself; the fields errors,
-data_len, sense_len and residual in a single write-only buffer;
-and the status field is a separate read-only buffer of size 1
-byte, by itself.
-
-
-2.4.3. Console Device
-====================
-
-The virtio console device is a simple device for data input and
-output. A device may have one or more ports. Each port has a pair
-of input and output virtqueues. Moreover, a device has a pair of
-control IO virtqueues. The control virtqueues are used to
-communicate information between the device and the driver about
-ports being opened and closed on either side of the connection,
-indication from the host about whether a particular port is a
-console port, adding new ports, port hot-plug/unplug, etc., and
-indication from the guest about whether a port or a device was
-successfully added, port open/close, etc.. For data IO, one or
-more empty buffers are placed in the receive queue for incoming
-data and outgoing characters are placed in the transmit queue.
-
-2.4.3.1. Device ID
------------------
-
- 3
-
-2.4.3.2. Virtqueues
-------------------
-
- 0:receiveq(port0). 1:transmitq(port0), 2:control receiveq, 3:control transmitq, 4:receiveq(port1), 5:transmitq(port1),
- ...
-
- Ports 2 onwards only exist if VIRTIO_CONSOLE_F_MULTIPORT is set.
-
-2.4.3.3. Feature bits
---------------------
-
- VIRTIO_CONSOLE_F_SIZE (0) Configuration cols and rows fields
- are valid.
-
- VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple
- ports; configuration fields nr_ports and max_nr_ports are
- valid and control virtqueues will be used.
-
-2.4.3.4. Device configuration layout
------------------------------------
-
- The size of the console is supplied
- in the configuration space if the VIRTIO_CONSOLE_F_SIZE feature
- is set. Furthermore, if the VIRTIO_CONSOLE_F_MULTIPORT feature
- is set, the maximum number of ports supported by the device can
- be fetched.
-
- struct virtio_console_config {
- le16 cols;
- le16 rows;
- le32 max_nr_ports;
- };
-
-2.4.3.4.1. Legacy Interface: Device configuration layout
------------------------------------
-For legacy devices, the fields in struct virtio_console_config are the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.3.5. Device Initialization
------------------------------
-
-1. If the VIRTIO_CONSOLE_F_SIZE feature is negotiated, the driver
- can read the console dimensions from the configuration fields.
-
-2. If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the
- driver can spawn multiple ports, not all of which may be
- attached to a console. Some could be generic ports. In this
- case, the control virtqueues are enabled and according to the
- max_nr_ports configuration-space value, the appropriate number
- of virtqueues are created. A control message indicating the
- driver is ready is sent to the host. The host can then send
- control messages for adding new ports to the device. After
- creating and initializing each port, a
- VIRTIO_CONSOLE_PORT_READY control message is sent to the host
- for that port so the host can let us know of any additional
- configuration options set for that port.
-
-3. The receiveq for each port is populated with one or more
- receive buffers.
-
-2.4.3.6. Device Operation
-------------------------
-
-1. For output, a buffer containing the characters is placed in
- the port's transmitq.[24]
-
-2. When a buffer is used in the receiveq (signalled by an
- interrupt), the contents is the input to the port associated
- with the virtqueue for which the notification was received.
-
-3. If the driver negotiated the VIRTIO_CONSOLE_F_SIZE feature, a
- configuration change interrupt may occur. The updated size can
- be read from the configuration fields.
-
-4. If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT
- feature, active ports are announced by the host using the
- VIRTIO_CONSOLE_PORT_ADD control message. The same message is
- used for port hot-plug as well.
-
-5. If the host specified a port `name', a sysfs attribute is
- created with the name filled in, so that udev rules can be
- written that can create a symlink from the port's name to the
- char device for port discovery by applications in the guest.
-
-6. Changes to ports' state are effected by control messages.
- Appropriate action is taken on the port indicated in the
- control message. The layout of the structure of the control
- buffer and the events associated are:
-
- struct virtio_console_control {
- le32 id; /* Port number */
- le16 event; /* The kind of control event */
- le16 value; /* Extra information for the event */
- };
-
- /* Some events for the internal messages (control packets) */
- #define VIRTIO_CONSOLE_DEVICE_READY 0
- #define VIRTIO_CONSOLE_PORT_ADD 1
- #define VIRTIO_CONSOLE_PORT_REMOVE 2
- #define VIRTIO_CONSOLE_PORT_READY 3
- #define VIRTIO_CONSOLE_CONSOLE_PORT 4
- #define VIRTIO_CONSOLE_RESIZE 5
- #define VIRTIO_CONSOLE_PORT_OPEN 6
- #define VIRTIO_CONSOLE_PORT_NAME 7
-
-2.4.3.6.1. Legacy Interface: Device Operation
-------------------------
-For legacy devices, the fields in struct virtio_console_control are the
-native endian of the guest rather than (necessarily) little-endian.
-
-
-2.4.4. Entropy Device
-====================
-
-The virtio entropy device supplies high-quality randomness for
-guest use.
-
-2.4.4.1. Device ID
------------------
- 4
-
-2.4.4.2. Virtqueues
-------------------
- 0:requestq.
-
-2.4.4.3. Feature bits
---------------------
- None currently defined
-
-2.4.4.4. Device configuration layout
------------------------------------
- None currently defined.
-
-2.4.4.5. Device Initialization
------------------------------
-
-1. The virtqueue is initialized
-
-2.4.4.6. Device Operation
-------------------------
-
-When the driver requires random bytes, it places the descriptor
-of one or more buffers in the queue. It will be completely filled
-by random data by the device.
-
-2.4.5. Memory Balloon Device
-===========================
-
-The virtio memory balloon device is a primitive device for
-managing guest memory: the device asks for a certain amount of
-memory, and the guest supplies it (or withdraws it, if the device
-has more than it asks for). This allows the guest to adapt to
-changes in allowance of underlying physical memory. If the
-feature is negotiated, the device can also be used to communicate
-guest memory statistics to the host.
-
-2.4.5.1. Device ID
------------------
- 5
-
-2.4.5.2. Virtqueues
-------------------
- 0:inflateq. 1:deflateq. 2:statsq.
-
- Virtqueue 2 only exists if VIRTIO_BALLON_F_STATS_VQ set.
-
-2.4.5.3. Feature bits
---------------------
- VIRTIO_BALLOON_F_MUST_TELL_HOST (0) Host must be told before
- pages from the balloon are used.
-
- VIRTIO_BALLOON_F_STATS_VQ (1) A virtqueue for reporting guest
- memory statistics is present.
-
-2.4.5.4. Device configuration layout
------------------------------------
- Both fields of this configuration
- are always available.
-
- struct virtio_balloon_config {
- le32 num_pages;
- le32 actual;
- };
-
-2.4.5.4.1. Legacy Interface: Device configuration layout
------------------------------------
-Note that these fields are always little endian, despite convention
-that legacy device fields are guest endian.
-
-2.4.5.5. Device Initialization
------------------------------
-
-1. The inflate and deflate virtqueues are identified.
-
-2. If the VIRTIO_BALLOON_F_STATS_VQ feature bit is negotiated:
-
- (a) Identify the stats virtqueue.
-
- (b) Add one empty buffer to the stats virtqueue and notify the
- host.
-
-Device operation begins immediately.
-
-2.4.5.6. Device Operation
-------------------------
-
-Memory Ballooning The device is driven by the receipt of a
-configuration change interrupt.
-
-1. The “num_pages” configuration field is examined. If this is
- greater than the “actual” number of pages, memory must be given
- to the balloon. If it is less than the “actual” number of
- pages, memory may be taken back from the balloon for general
- use.
-
-2. To supply memory to the balloon (aka. inflate):
-
- (a) The driver constructs an array of addresses of unused memory
- pages. These addresses are divided by 4096[25] and the descriptor
- describing the resulting 32-bit array is added to the inflateq.
-
-3. To remove memory from the balloon (aka. deflate):
-
- (a) The driver constructs an array of addresses of memory pages
- it has previously given to the balloon, as described above.
- This descriptor is added to the deflateq.
-
- (b) If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
- guest may not use these requested pages until that descriptor
- in the deflateq has been used by the device.
-
- (c) Otherwise, the guest may begin to re-use pages previously
- given to the balloon before the device has acknowledged their
- withdrawl. [26]
-
-4. In either case, once the device has completed the inflation or
- deflation, the “actual” field of the configuration should be
- updated to reflect the new number of pages in the balloon.[27]
-
-2.4.5.6.1. Memory Statistics
----------------------------
-
-The stats virtqueue is atypical because communication is driven
-by the device (not the driver). The channel becomes active at
-driver initialization time when the driver adds an empty buffer
-and notifies the device. A request for memory statistics proceeds
-as follows:
-
-1. The device pushes the buffer onto the used ring and sends an
- interrupt.
-
-2. The driver pops the used buffer and discards it.
-
-3. The driver collects memory statistics and writes them into a
- new buffer.
-
-4. The driver adds the buffer to the virtqueue and notifies the
- device.
-
-5. The device pops the buffer (retaining it to initiate a
- subsequent request) and consumes the statistics.
-
- Memory Statistics Format Each statistic consists of a 16 bit
- tag and a 64 bit value. All statistics are optional and the
- driver may choose which ones to supply. To guarantee backwards
- compatibility, unsupported statistics should be omitted.
-
- struct virtio_balloon_stat {
- #define VIRTIO_BALLOON_S_SWAP_IN 0
- #define VIRTIO_BALLOON_S_SWAP_OUT 1
- #define VIRTIO_BALLOON_S_MAJFLT 2
- #define VIRTIO_BALLOON_S_MINFLT 3
- #define VIRTIO_BALLOON_S_MEMFREE 4
- #define VIRTIO_BALLOON_S_MEMTOT 5
- le16 tag;
- le64 val;
- } __attribute__((packed));
-
-2.4.5.6.1.1. Legacy Interface: Memory Statistics
----------------------------
-For legacy devices, the fields in struct virtio_balloon_stat are the
-native endian of the guest rather than (necessarily) little-endian.
-
-
-2.4.5.6.2. Memory Statistics Tags
---------------------------------
-
- VIRTIO_BALLOON_S_SWAP_IN The amount of memory that has been
- swapped in (in bytes).
-
- VIRTIO_BALLOON_S_SWAP_OUT The amount of memory that has been
- swapped out to disk (in bytes).
-
- VIRTIO_BALLOON_S_MAJFLT The number of major page faults that
- have occurred.
-
- VIRTIO_BALLOON_S_MINFLT The number of minor page faults that
- have occurred.
-
- VIRTIO_BALLOON_S_MEMFREE The amount of memory not being used
- for any purpose (in bytes).
-
- VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
- (in bytes).
-
-
-2.4.6. SCSI Host Device
-======================
-
-The virtio SCSI host device groups together one or more virtual
-logical units (such as disks), and allows communicating to them
-using the SCSI protocol. An instance of the device represents a
-SCSI host to which many targets and LUNs are attached.
-
-The virtio SCSI device services two kinds of requests:
-
-• command requests for a logical unit;
-
-• task management functions related to a logical unit, target or
- command.
-
-The device is also able to send out notifications about added and
-removed logical units. Together, these capabilities provide a
-SCSI transport protocol that uses virtqueues as the transfer
-medium. In the transport protocol, the virtio driver acts as the
-initiator, while the virtio SCSI host provides one or more
-targets that receive and process the requests.
-
-2.4.6.1. Device ID
------------------
- 8
-
-2.4.6.2. Virtqueues
-------------------
- 0:controlq; 1:eventq; 2..n:request queues.
-
-2.4.6.3. Feature bits
---------------------
-
- VIRTIO_SCSI_F_INOUT (0) A single request can include both
- read-only and write-only data buffers.
-
- VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
- hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
-
- VIRTIO_SCSI_F_CHANGE (2) The host will report changes to LUN
- parameters via a VIRTIO_SCSI_T_PARAM_CHANGE event.
-
-2.4.6.4. Device configuration layout
------------------------------------
-
- All fields of this configuration are always available. sense_size
- and cdb_size are writable by the guest.
-
- struct virtio_scsi_config {
- le32 num_queues;
- le32 seg_max;
- le32 max_sectors;
- le32 cmd_per_lun;
- le32 event_info_size;
- le32 sense_size;
- le32 cdb_size;
- le16 max_channel;
- le16 max_target;
- le32 max_lun;
- };
-
- num_queues is the total number of request virtqueues exposed by
- the device. The driver is free to use only one request queue,
- or it can use more to achieve better performance.
-
- seg_max is the maximum number of segments that can be in a
- command. A bidirectional command can include seg_max input
- segments and seg_max output segments.
-
- max_sectors is a hint to the guest about the maximum transfer
- size it should use.
-
- cmd_per_lun is a hint to the guest about the maximum number of
- linked commands it should send to one LUN. The actual value
- to be used is the minimum of cmd_per_lun and the virtqueue
- size.
-
- event_info_size is the maximum size that the device will fill
- for buffers that the driver places in the eventq. The driver
- should always put buffers at least of this size. It is
- written by the device depending on the set of negotated
- features.
-
- sense_size is the maximum size of the sense data that the
- device will write. The default value is written by the device
- and will always be 96, but the driver can modify it. It is
- restored to the default when the device is reset.
-
- cdb_size is the maximum size of the CDB that the driver will
- write. The default value is written by the device and will
- always be 32, but the driver can likewise modify it. It is
- restored to the default when the device is reset.
-
- max_channel, max_target and max_lun can be used by the driver
- as hints to constrain scanning the logical units on the
- host.h
-
-2.4.6.4.1. Legacy Interface: Device configuration layout
------------------------------------
-For legacy devices, the fields in struct virtio_scsi_config are the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.6.5. Device Initialization
------------------------------
-
-The initialization routine should first of all discover the
-device's virtqueues.
-
-If the driver uses the eventq, it should then place at least a
-buffer in the eventq.
-
-The driver can immediately issue requests (for example, INQUIRY
-or REPORT LUNS) or task management functions (for example, I_T
-RESET).
-
-2.4.6.6. Device Operation
-------------------------
-
-Device operation consists of operating request queues, the control
-queue and the event queue.
-
-2.4.6.6.1. Device Operation: Request Queues
-------------------------------------------
-
-The driver queues requests to an arbitrary request queue, and
-they are used by the device on that same queue. It is the
-responsibility of the driver to ensure strict request ordering
-for commands placed on different queues, because they will be
-consumed with no order constraints.
-
-Requests have the following format:
-
- struct virtio_scsi_req_cmd {
- // Read-only
- u8 lun[8];
- le64 id;
- u8 task_attr;
- u8 prio;
- u8 crn;
- char cdb[cdb_size];
- char dataout[];
- // Write-only part
- le32 sense_len;
- le32 residual;
- le16 status_qualifier;
- u8 status;
- u8 response;
- u8 sense[sense_size];
- char datain[];
- };
-
-
- /* command-specific response values */
- #define VIRTIO_SCSI_S_OK 0
- #define VIRTIO_SCSI_S_OVERRUN 1
- #define VIRTIO_SCSI_S_ABORTED 2
- #define VIRTIO_SCSI_S_BAD_TARGET 3
- #define VIRTIO_SCSI_S_RESET 4
- #define VIRTIO_SCSI_S_BUSY 5
- #define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
- #define VIRTIO_SCSI_S_TARGET_FAILURE 7
- #define VIRTIO_SCSI_S_NEXUS_FAILURE 8
- #define VIRTIO_SCSI_S_FAILURE 9
-
- /* task_attr */
- #define VIRTIO_SCSI_S_SIMPLE 0
- #define VIRTIO_SCSI_S_ORDERED 1
- #define VIRTIO_SCSI_S_HEAD 2
- #define VIRTIO_SCSI_S_ACA 3
-
-The lun field addresses a target and logical unit in the
-virtio-scsi device's SCSI domain. The only supported format for
-the LUN field is: first byte set to 1, second byte set to target,
-third and fourth byte representing a single level LUN structure,
-followed by four zero bytes. With this representation, a
-virtio-scsi device can serve up to 256 targets and 16384 LUNs per
-target.
-
-The id field is the command identifier (“tag”).
-
-task_attr, prio and crn should be left to zero. task_attr defines
-the task attribute as in the table above, but all task attributes
-may be mapped to SIMPLE by the device; crn may also be provided
-by clients, but is generally expected to be 0. The maximum CRN
-value defined by the protocol is 255, since CRN is stored in an
-8-bit integer.
-
-All of these fields are defined in SAM. They are always
-read-only, as are the cdb and dataout field. The cdb_size is
-taken from the configuration space.
-
-sense and subsequent fields are always write-only. The sense_len
-field indicates the number of bytes actually written to the sense
-buffer. The residual field indicates the residual size,
-calculated as “data_length - number_of_transferred_bytes”, for
-read or write operations. For bidirectional commands, the
-number_of_transferred_bytes includes both read and written bytes.
-A residual field that is less than the size of datain means that
-the dataout field was processed entirely. A residual field that
-exceeds the size of datain means that the dataout field was
-processed partially and the datain field was not processed at
-all.
-
-The status byte is written by the device to be the status code as
-defined in SAM.
-
-The response byte is written by the device to be one of the
-following:
-
- VIRTIO_SCSI_S_OK when the request was completed and the status
- byte is filled with a SCSI status code (not necessarily
- "GOOD").
-
- VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
- transferring more data than is available in the data buffers.
-
- VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
- ABORT TASK or ABORT TASK SET task management function.
-
- VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
- because the target indicated by the lun field does not exist.
-
- VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
- or device reset (including a task management function).
-
- VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
- problem in the connection between the host and the target
- (severed link).
-
- VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
- failure and the guest should not retry on other paths.
-
- VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
- but retrying on other paths might yield a different result.
-
- VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
- same path should work.
-
- VIRTIO_SCSI_S_FAILURE for other host or guest error. In
- particular, if neither dataout nor datain is empty, and the
- VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
- request will be immediately returned with a response equal to
- VIRTIO_SCSI_S_FAILURE.
-
-2.4.6.6.1.1. Legacy Interface: Device Operation: Request Queues
-------------------------------------------
-For legacy devices, the fields in struct virtio_scsi_req_cmd are the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.4.6.6.2. Device Operation: controlq
-------------------------------------
-
-The controlq is used for other SCSI transport operations.
-Requests have the following format:
-
- struct virtio_scsi_ctrl {
- le32 type;
- ...
- u8 response;
- };
-
- /* response values valid for all commands */
- #define VIRTIO_SCSI_S_OK 0
- #define VIRTIO_SCSI_S_BAD_TARGET 3
- #define VIRTIO_SCSI_S_BUSY 5
- #define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
- #define VIRTIO_SCSI_S_TARGET_FAILURE 7
- #define VIRTIO_SCSI_S_NEXUS_FAILURE 8
- #define VIRTIO_SCSI_S_FAILURE 9
- #define VIRTIO_SCSI_S_INCORRECT_LUN 12
-
-The type identifies the remaining fields.
-
-The following commands are defined:
-
- Task management function
- #define VIRTIO_SCSI_T_TMF 0
-
- #define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
- #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
- #define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
- #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
- #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
- #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
- #define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
- #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
-
- struct virtio_scsi_ctrl_tmf
- {
- // Read-only part
- le32 type;
- le32 subtype;
- u8 lun[8];
- le64 id;
- // Write-only part
- u8 response;
- }
-
- /* command-specific response values */
- #define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
- #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
- #define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
-
- The type is VIRTIO_SCSI_T_TMF; the subtype field defines. All
- fields except response are filled by the driver. The subtype
- field must always be specified and identifies the requested
- task management function.
-
- Other fields may be irrelevant for the requested TMF; if so,
- they are ignored but they should still be present. The lun
- field is in the same format specified for request queues; the
- single level LUN is ignored when the task management function
- addresses a whole I_T nexus. When relevant, the value of the id
- field is matched against the id values passed on the requestq.
-
- The outcome of the task management function is written by the
- device in the response field. The command-specific response
- values map 1-to-1 with those defined in SAM.
-
- Asynchronous notification query
-
- #define VIRTIO_SCSI_T_AN_QUERY 1
-
- struct virtio_scsi_ctrl_an {
- // Read-only part
- le32 type;
- u8 lun[8];
- le32 event_requested;
- // Write-only part
- le32 event_actual;
- u8 response;
- }
-
- #define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
- #define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
- #define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
- #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
- #define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
- #define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
-
- By sending this command, the driver asks the device which
- events the given LUN can report, as described in paragraphs 6.6
- and A.6 of the SCSI MMC specification. The driver writes the
- events it is interested in into the event_requested; the device
- responds by writing the events that it supports into
- event_actual.
-
- The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
- fields are written by the driver. The event_actual and response
- fields are written by the device.
-
- No command-specific values are defined for the response byte.
-
- Asynchronous notification subscription
- #define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
-
- struct virtio_scsi_ctrl_an {
- // Read-only part
- le32 type;
- u8 lun[8];
- le32 event_requested;
- // Write-only part
- le32 event_actual;
- u8 response;
- }
-
- By sending this command, the driver asks the specified LUN to
- report events for its physical interface, again as described in
- the SCSI MMC specification. The driver writes the events it is
- interested in into the event_requested; the device responds by
- writing the events that it supports into event_actual.
-
- Event types are the same as for the asynchronous notification
- query message.
-
- The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
- event_requested fields are written by the driver. The
- event_actual and response fields are written by the device.
-
- No command-specific values are defined for the response byte.
-
-2.4.6.6.2.1. Legacy Interface: Device Operation: controlq
-------------------------------------------
-
-For legacy devices, the fields in struct virtio_scsi_ctrl, struct
-virtio_scsi_ctrl_tmf, struct virtio_scsi_ctrl_an and struct
-virtio_scsi_ctrl_an are the native endian of the guest rather than
-(necessarily) little-endian.
-
-
-2.4.6.6.3. Device Operation: eventq
-----------------------------------
-
-The eventq is used by the device to report information on logical
-units that are attached to it. The driver should always leave a
-few buffers ready in the eventq. In general, the device will not
-queue events to cope with an empty eventq, and will end up
-dropping events if it finds no buffer ready. However, when
-reporting events for many LUNs (e.g. when a whole target
-disappears), the device can throttle events to avoid dropping
-them. For this reason, placing 10-15 buffers on the event queue
-should be enough.
-
-Buffers are placed in the eventq and filled by the device when
-interesting events occur. The buffers should be strictly
-write-only (device-filled) and the size of the buffers should be
-at least the value given in the device's configuration
-information.
-
-Buffers returned by the device on the eventq will be referred to
-as "events" in the rest of this section. Events have the
-following format:
-
- #define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
-
- struct virtio_scsi_event {
- // Write-only part
- le32 event;
- u8 lun[8];
- le32 reason;
- }
-
-If bit 31 is set in the event field, the device failed to report
-an event due to missing buffers. In this case, the driver should
-poll the logical units for unit attention conditions, and/or do
-whatever form of bus scan is appropriate for the guest operating
-system.
-
-The meaning of the reason field depends on the
-contents of the event field. The following events are defined:
-
- No event
- #define VIRTIO_SCSI_T_NO_EVENT 0
-
- This event is fired in the following cases:
-
- • When the device detects in the eventq a buffer that is
- shorter than what is indicated in the configuration field, it
- might use it immediately and put this dummy value in the
- event field. A well-written driver will never observe this
- situation.
-
- • When events are dropped, the device may signal this event as
- soon as the drivers makes a buffer available, in order to
- request action from the driver. In this case, of course, this
- event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
- flag.
-
- Transport reset
- #define VIRTIO_SCSI_T_TRANSPORT_RESET 1
-
- #define VIRTIO_SCSI_EVT_RESET_HARD 0
- #define VIRTIO_SCSI_EVT_RESET_RESCAN 1
- #define VIRTIO_SCSI_EVT_RESET_REMOVED 2
-
- By sending this event, the device signals that a logical unit
- on a target has been reset, including the case of a new device
- appearing or disappearing on the bus.The device fills in all
- fields. The event field is set to
- VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
- logical unit in the SCSI host.
-
- The reason value is one of the three #define values appearing
- above:
-
- • VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used
- if the target or logical unit is no longer able to receive
- commands.
-
- • VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
- logical unit has been reset, but is still present.
-
- • VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if
- a target or logical unit has just appeared on the device.
-
- The “removed” and “rescan” events, when sent for LUN 0, may
- apply to the entire target. After receiving them the driver
- should ask the initiator to rescan the target, in order to
- detect the case when an entire target has appeared or
- disappeared. These two events will never be reported unless the
- VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
- and the guest.
-
- Events will also be reported via sense codes (this obviously
- does not apply to newly appeared buses or targets, since the
- application has never discovered them):
-
- • “LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
- 0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
-
- • “LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
- (POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
-
- • “rescan LUN/target” maps to sense key UNIT ATTENTION, asc
- 0x3f, ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
-
- The preferred way to detect transport reset is always to use
- events, because sense codes are only seen by the driver when it
- sends a SCSI command to the logical unit or target. However, in
- case events are dropped, the initiator will still be able to
- synchronize with the actual state of the controller if the
- driver asks the initiator to rescan of the SCSI bus. During the
- rescan, the initiator will be able to observe the above sense
- codes, and it will process them as if it the driver had
- received the equivalent event.
-
- Asynchronous notification
- #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
-
- By sending this event, the device signals that an asynchronous
- event was fired from a physical interface.
-
- All fields are written by the device. The event field is set to
- VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
- unit in the SCSI host. The reason field is a subset of the
- events that the driver has subscribed to via the "Asynchronous
- notification subscription" command.
-
- When dropped events are reported, the driver should poll for
- asynchronous events manually using SCSI commands.
-
- LUN parameter change
- #define VIRTIO_SCSI_T_PARAM_CHANGE 3
-
- By sending this event, the device signals that the configuration parameters
- (for example the capacity) of a logical unit have changed.
- The event field is set to VIRTIO_SCSI_T_PARAM_CHANGE.
- The lun field addresses a logical unit in the SCSI host.
-
- The same event is also reported as a unit attention condition.
- The reason field contains the additional sense code and additional sense code qualifier,
- respectively in bits 0..7 and 8..15.
- For example, a change in capacity will be reported as asc 0x2a, ascq 0x09
- (CAPACITY DATA HAS CHANGED).
-
- For MMC devices (inquiry type 5) there would be some overlap between this
- event and the asynchronous notification event.
- For simplicity, as of this version of the specification the host must
- never report this event for MMC devices.
-
-2.4.6.6.3.1. Legacy Interface: Device Operation: eventq
-----------------------------------
-For legacy devices, the fields in struct virtio_scsi_event are the
-native endian of the guest rather than (necessarily) little-endian.
-
-2.5. Reserved Feature Bits
-=========================
-
-Currently there are four device-independent feature bits defined:
-
- VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
- that the driver can use descriptors with the VRING_DESC_F_INDIRECT
- flag set, as described in "2.1.4.3.1. Indirect Descriptors".
-
- VIRTIO_F_RING_EVENT_IDX(29) This feature enables the used_event
- and the avail_event fields. If set, it indicates that the
- device should ignore the flags field in the available ring
- structure. Instead, the used_event field in this structure is
- used by guest to suppress device interrupts. Further, the
- driver should ignore the flags field in the used ring
- structure. Instead, the avail_event field in this structure is
- used by the device to suppress notifications. If unset, the
- driver should ignore the used_event field; the device should
- ignore the avail_event field; the flags field is used
-
- VIRTIO_F_VERSION_1(32) This feature must be offered by any device
- compliant with this specification, and acknowledged by all device
- drivers.
-
-In addition, bit 30 is used by qemu's implementation to check for experimental
-early versions of virtio which did not perform correct feature negotiation,
-and should not be used.
-
-2.5.1. Legacy Interface: Reserved Feature Bits
---------------------------------------------
-
-Legacy or transitional devices may offer the following:
-
-VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
- indicates that the driver wants an interrupt if the device runs
- out of available descriptors on a virtqueue, even though
- interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT
- flag or the used_event field. An example of this is the
- networking driver: it doesn't need to know every time a packet
- is transmitted, but it does need to free the transmitted
- packets a finite time after they are transmitted. It can avoid
- using a timer if the device interrupts it when all the packets
- are transmitted.
-
-VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
- accepts arbitrary descriptor layouts, as described in Section
- "2.1.4.2.1. Legacy Interface: Message Framing".
-
-2.6. virtio_ring.h
-=================
-
-#ifndef VIRTIO_RING_H
-#define VIRTIO_RING_H
-/* An interface for efficient virtio implementation.
- *
- * This header is BSD licensed so anyone can use the definitions
- * to implement compatible drivers/servers.
- *
- * Copyright 2007, 2009, IBM Corporation
- * Copyright 2011, Red Hat, Inc
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in the
- * documentation and/or other materials provided with the distribution.
- * 3. Neither the name of IBM nor the names of its contributors
- * may be used to endorse or promote products derived from this software
- * without specific prior written permission.
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-#include <stdint.h>
-
-/* This marks a buffer as continuing via the next field. */
-#define VRING_DESC_F_NEXT 1
-/* This marks a buffer as write-only (otherwise read-only). */
-#define VRING_DESC_F_WRITE 2
-/* This means the buffer contains a list of buffer descriptors. */
-#define VRING_DESC_F_INDIRECT 4
-
-/* The Host uses this in used->flags to advise the Guest: don't kick me
- * when you add a buffer. It's unreliable, so it's simply an
- * optimization. Guest will still kick if it's out of buffers. */
-#define VRING_USED_F_NO_NOTIFY 1
-/* The Guest uses this in avail->flags to advise the Host: don't
- * interrupt me when you consume a buffer. It's unreliable, so it's
- * simply an optimization. */
-#define VRING_AVAIL_F_NO_INTERRUPT 1
-
-/* Support for indirect descriptors */
-#define VIRTIO_RING_F_INDIRECT_DESC 28
-
-/* Support for avail_idx and used_idx fields */
-#define VIRTIO_RING_F_EVENT_IDX 29
-
-/* Arbitrary descriptor layouts. */
-#define VIRTIO_F_ANY_LAYOUT 27
-
-/* Virtio ring descriptors: 16 bytes.
- * These can chain together via "next". */
-struct vring_desc {
- /* Address (guest-physical). */
- le64 addr;
- /* Length. */
- le32 len;
- /* The flags as indicated above. */
- le16 flags;
- /* We chain unused descriptors via this, too */
- le16 next;
-};
-
-struct vring_avail {
- le16 flags;
- le16 idx;
- le16 ring[];
- /* Only if VIRTIO_RING_F_EVENT_IDX: le16 used_event; */
-};
-
-/* le32 is used here for ids for padding reasons. */
-struct vring_used_elem {
- /* Index of start of used descriptor chain. */
- le32 id;
- /* Total length of the descriptor chain which was written to. */
- le32 len;
-};
-
-struct vring_used {
- le16 flags;
- le16 idx;
- struct vring_used_elem ring[];
- /* Only if VIRTIO_RING_F_EVENT_IDX: le16 avail_event; */
-};
-
-struct vring {
- unsigned int num;
-
- struct vring_desc *desc;
- struct vring_avail *avail;
- struct vring_used *used;
-};
-
-/* The standard layout for the ring is a continuous chunk of memory which
- * looks like this. We assume num is a power of 2.
- *
- * struct vring {
- * // The actual descriptors (16 bytes each)
- * struct vring_desc desc[num];
- *
- * // A ring of available descriptor heads with free-running index.
- * le16 avail_flags;
- * le16 avail_idx;
- * le16 available[num];
- * le16 used_event_idx; // Only if VIRTIO_RING_F_EVENT_IDX
- *
- * // Padding to the next align boundary.
- * char pad[];
- *
- * // A ring of used descriptor heads with free-running index.
- * le16 used_flags;
- * le16 used_idx;
- * struct vring_used_elem used[num];
- * le16 avail_event_idx; // Only if VIRTIO_RING_F_EVENT_IDX
- * };
- * Note: for virtio PCI, align is 4096.
- */
-static inline void vring_init(struct vring *vr, unsigned int num, void *p,
- unsigned long align)
-{
- vr->num = num;
- vr->desc = p;
- vr->avail = p + num*sizeof(struct vring_desc);
- vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(le16)
- + align-1)
- & ~(align - 1));
-}
-
-static inline unsigned vring_size(unsigned int num, unsigned long align)
-{
- return ((sizeof(struct vring_desc)*num + sizeof(le16)*(3+num)
- + align - 1) & ~(align - 1))
- + sizeof(le16)*3 + sizeof(struct vring_used_elem)*num;
-}
-
-static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
-{
- return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
-}
-
-/* Get location of event indices (only with VIRTIO_RING_F_EVENT_IDX) */
-static inline le16 *vring_used_event(struct vring *vr)
-{
- /* For backwards compat, used event index is at *end* of avail ring. */
- return &vr->avail->ring[vr->num];
-}
-
-static inline le16 *vring_avail_event(struct vring *vr)
-{
- /* For backwards compat, avail event index is at *end* of used ring. */
- return (le16 *)&vr->used->ring[vr->num];
-}
-#endif /* VIRTIO_RING_H */
-
-
-
-2.7. Creating New Device Types
-==============================
-
-Various considerations are necessary when creating a new device
-type.
-
-2.7.1. How Many Virtqueues?
----------------------------
-
-It is possible that a very simple device will operate entirely
-through its configuration space, but most will need at least one
-virtqueue in which it will place requests. A device with both
-input and output (eg. console and network devices described here)
-need two queues: one which the driver fills with buffers to
-receive input, and one which the driver places buffers to
-transmit output.
-
-2.7.2. What Configuration Space Layout?
----------------------------------------
-
-Configuration space should only be used for initialization-time
-parameters. It is a limited resource with no synchronization between
-writable fields, so for most uses it is better to use a virtqueue to update
-configuration information (the network device does this for filtering,
-otherwise the table in the config space could potentially be very
-large).
-
-Devices must not assume that configuration fields over 32 bits wide
-are atomically writable.
-
-2.7.3. What Device Number?
---------------------------
-
-Currently device numbers are assigned quite freely: a simple
-request mail to the author of this document or the Linux
-virtualization mailing list[9] will be sufficient to secure a unique one.
-
-Meanwhile for experimental drivers, use 65535 and work backwards.
-
-2.7.4. How many MSI-X vectors? (for PCI)
------------------------------------------
-
-Using the optional MSI-X capability devices can speed up
-interrupt processing by removing the need to read ISR Status
-register by guest driver (which might be an expensive operation),
-reducing interrupt sharing between devices and queues within the
-device, and handling interrupts from multiple CPUs. However, some
-systems impose a limit (which might be as low as 256) on the
-total number of MSI-X vectors that can be allocated to all
-devices. Devices and/or device drivers should take this into
-account, limiting the number of vectors used unless the device is
-expected to cause a high volume of interrupts. Devices can
-control the number of vectors used by limiting the MSI-X Table
-Size or not presenting MSI-X capability in PCI configuration
-space. Drivers can control this by mapping events to as small
-number of vectors as possible, or disabling MSI-X capability
-altogether.
-
-2.7.5. Device Improvements
---------------------------
-
-Any change to configuration space, or new virtqueues, or
-behavioural changes, should be indicated by negotiation of a new
-feature bit. This establishes clarity[11] and avoids future expansion problems.
-
-Clusters of functionality which are always implemented together
-can use a single bit, but if one feature makes sense without the
-others they should not be gratuitously grouped together to
-conserve feature bits.
-
-
-FOOTNOTES:
-==========
-
-[1] This lack of page-sharing implies that the implementation of the
-device (e.g. the hypervisor or host) needs full access to the
-guest memory. Communication with untrusted parties (i.e.
-inter-guest communication) requires copying.
-
-[2] The Linux implementation further separates the PCI virtio code
-from the specific virtio drivers: these drivers are shared with
-the non-PCI implementations (currently lguest and S/390).
-
-[3] The actual value within this range is ignored
-
-[6] The 4096 is based on the x86 page size, but it's also large
-enough to ensure that the separate parts of the virtqueue are on
-separate cache lines.
-
-[7] These fields are kept here because this is the only part of the
-virtqueue written by the device
-
-[9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
-
-[11] Even if it does mean documenting design or implementation
-mistakes!
-
-[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
-dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
-features must offer the checksum feature, and a driver which
-accepts the offload features must accept the checksum feature.
-Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
-depending on VIRTIO_NET_F_GUEST_CSUM.
-
-[14] This is a common restriction in real, older network cards.
-
-[15] For example, a network packet transported between two guests on
-the same system may not require checksumming at all, nor segmentation,
-if both guests are amenable.
-
-[16] For example, consider a partially checksummed TCP (IPv4) packet.
-It will have a 14 byte ethernet header and 20 byte IP header
-followed by the TCP header (with the TCP checksum field 16 bytes
-into that header). csum_start will be 14+20 = 34 (the TCP
-checksum includes the header), and csum_offset will be 16. The
-value in the TCP checksum field should be initialized to the sum
-of the TCP pseudo header, so that replacing it by the ones'
-complement checksum of the TCP header and body will give the
-correct result.
-
-[17] Due to various bugs in implementations, this field is not useful
-as a guarantee of the transport header size.
-
-[18] This case is not handled by some older hardware, so is called out
-specifically in the protocol.
-
-[19] Note that the header will be two bytes longer for the
-VIRTIO_NET_F_MRG_RXBUF case.
-
-[20] Obviously each one can be split across multiple descriptor
-elements.
-
-[21] Since there are no guarentees, it can use a hash filter or
-silently switch to allmulti or promiscuous mode if it is given too
-many addresses.
-
-[23] The FLUSH and FLUSH_OUT types are equivalent, the device does not
-distinguish between them
-
-[24] Because this is high importance and low bandwidth, the current
-Linux implementation polls for the buffer to be used, rather than
-waiting for an interrupt, simplifying the implementation
-significantly. However, for generic serial ports with the
-O_NONBLOCK flag set, the polling limitation is relaxed and the
-consumed buffers are freed upon the next write or poll call or
-when a port is closed or hot-unplugged.
-
-[25] This is historical, and independent of the guest page size
-
-[26] In this case, deflation advice is merely a courtesy
-
-[27] As updates to configuration space are not atomic, this field
-isn't particularly reliable, but can be used to diagnose buggy guests.