diff options
-rw-r--r-- | conformance.tex | 2 | ||||
-rw-r--r-- | content.tex | 145 |
2 files changed, 125 insertions, 22 deletions
diff --git a/conformance.tex b/conformance.tex index 7b7df32..f59e360 100644 --- a/conformance.tex +++ b/conformance.tex @@ -105,6 +105,7 @@ A network driver MUST conform to the following normative statements: A block driver MUST conform to the following normative statements: \begin{itemize} +\item \ref{drivernormative:Device Types / Block Device / Device Initialization} \item \ref{drivernormative:Device Types / Block Device / Device Operation} \end{itemize} @@ -224,6 +225,7 @@ A network device MUST conform to the following normative statements: A block device MUST conform to the following normative statements: \begin{itemize} +\item \ref{devicenormative:Device Types / Block Device / Device Initialization} \item \ref{devicenormative:Device Types / Block Device / Device Operation} \end{itemize} diff --git a/content.tex b/content.tex index 4be0f7d..e054919 100644 --- a/content.tex +++ b/content.tex @@ -4041,8 +4041,13 @@ device except where noted. \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}. +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. + \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O alignment. + +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback + and writethrough modes. \end{description} \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} @@ -4051,16 +4056,12 @@ device except where noted. \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. - -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. - -\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback - and writethrough modes. \end{description} -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers -MUST only negotiate this feature if they are capable of sending -VIRTIO_BLK_T_FLUSH commands. +\begin{note} + In the legacy interface, VIRTIO_BLK_F_FLUSH was also + called VIRTIO_BLK_F_WCE. +\end{note} \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout} @@ -4089,7 +4090,7 @@ struct virtio_blk_config { // optimal (suggested maximum) I/O size in blocks le32 opt_io_size; } topology; - u8 reserved; + u8 writeback; }; \end{lstlisting} @@ -4119,21 +4120,63 @@ according to the native endian of the guest rather than \field{topology} struct can be read to determine the physical block size and optimal I/O lengths for the driver to use. This also does not affect the units in the protocol, only performance. + +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache + mode can be read or set through the \field{writeback} field. 0 corresponds + to a writethrough cache, 1 to a writeback cache\footnote{Consistent with + \ref{devicenormative:Device Types / Block Device / Device Operation}, + a writethrough cache can be defined broadly as a cache that commits + writes to persistent device backend storage before reporting their + completion. For example, a battery-backed writeback cache actually + counts as writethrough according to this definition.}. The cache mode + after reset can be either writeback or writethrough. The actual + mode can be determined by reading \field{writeback} after feature + negotiation. \end{enumerate} +\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} + +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of +sending VIRTIO_BLK_T_FLUSH commands. + +If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH are +negotiated, the driver MAY deduce the presence of a writethrough cache. +If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was, +the driver SHOULD assume presence of a writeback cache. + +The driver MUST NOT read \field{writeback} before setting +the FEATURES_OK \field{status} bit. + +\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} + +Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it +if they offer VIRTIO_BLK_F_CONFIG_WCE. + +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH +is not, the device MUST initialize \field{writeback} to 0. + \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} -The \field{reserved} field used to be called \field{writeback}. If the -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be -read from \field{writeback}; the -driver can also write to the field in order to toggle the cache -between writethrough (0) and writeback (1) mode. If the feature is -not available, the driver can instead look at the result of -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. +Because legacy devices do not have FEATURES_OK, transitional devices +MUST implement slightly different behavior around feature negotiation +when used through the legacy interface. In particular, when using the +legacy interface: + +\begin{itemize} +\item the driver MAY read or write \field{writeback} before setting + the DRIVER or DRIVER_OK \field{status} bit + +\item the device MUST NOT modify the cache mode (and \field{writeback}) + as a result of a driver setting a status bit, unless + the DRIVER_OK bit is being set and the driver has not set the + VIRTIO_BLK_F_CONFIG_WCE driver feature bit. + +\item the device MUST NOT modify the cache mode (and \field{writeback}) + as a result of a driver modifying the driver feature bits, for example + if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but + does not set the VIRTIO_BLK_F_FLUSH bit. +\end{itemize} -Some older legacy devices did not operate in writethrough mode even -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation} @@ -4183,14 +4226,66 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered. A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request. A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request. +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY +switch to writethrough or writeback mode by writing respectively 0 and +1 to the \field{writeback} field. After writing a 0 to \field{writeback}, +the driver MUST NOT assume that any volatile writes have been committed +to persistent device backend storage. + \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation} A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT write any data. -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure -that any writes which were completed are committed to non-volatile storage. +A write is considered volatile when it is submitted; the contents of +sectors covered by a volatile write are undefined in persistent device +backend storage until the write becomes stable. A write becomes stable +once it is completed and one or more of the following conditions is true: + +\begin{enumerate} +\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor + VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was + offered by the device; + +\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the + \field{writeback} field in configuration space was 0 \textbf{all the time between + the submission of the write and its completion}; + +\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is + completed} and is completed itself. +\end{enumerate} + +If the device is backed by persistent storage, the device MUST ensure that +stable writes are committed to it, before reporting completion of the write +(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush +(case~\ref{item:flush3}). Failure to do so can cause data loss +in case of a crash. + +If the driver changes \field{writeback} between the submission of the write +and its completion, the write could be either volatile or stable when +its completion is reported; in other words, the exact behavior is undefined. + +% According to the device requirements for device initialization: +% Offer(CONFIG_WCE) => Offer(FLUSH). +% +% After reversing the implication: +% not Offer(FLUSH) => not Offer(CONFIG_WCE). + +If VIRTIO_BLK_F_FLUSH was not offered by the + device\footnote{Note that in this case, according to + \ref{devicenormative:Device Types / Block Device / Device Initialization}, + the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the +device MAY also commit writes to persistent device backend storage before +reporting their completion. Unlike case~\ref{item:flush1}, however, this +is not an absolute requirement of the specification. + +\begin{note} + An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit + completed writes will not be resilient to data loss in case of crashes. + Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement + for implementations that do not wish to be safe against such data losses. +\end{note} \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} When using the legacy interface, transitional devices and drivers @@ -4233,6 +4328,12 @@ serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request does that. \end{note} +Some older legacy devices did not commit completed writes to persistent +device backend storage when VIRTIO_BLK_F_FLUSH was offered but not +negotiated. In order to work around this, the driver MAY set the +\field{writeback} to 0 (if available) or it MAY send an explicit flush +request after every completed write. + If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet command requests, each of these requests is of form: @@ -5577,7 +5678,7 @@ contents of \field{event}. The following events are defined: \end{lstlisting} By sending this event, the device signals a change in the configuration parameters - of a logical unit, for example the capacity or caching mode. + of a logical unit, for example the capacity or cache mode. \field{event} is set to VIRTIO_SCSI_T_PARAM_CHANGE. \field{lun} addresses a logical unit in the SCSI host. |