Budget Fair Queueing (BFQ) Storage-I/O Scheduler

On this page we report a selection of our blk-mq benchmarks with NONE, MQ-DEADLINE, KYBER, and BFQ-MQ-V9, the development version of BFQ (which coincides with the mainline BFQ available from Linux 4.20.0). All tests were run with Linux 4.18.0 on the following two devices:

HITACHI HTS72755 HDD
PLEXTOR PX-256M5S SSD

Results with many more devices, but with previous versions of BFQ and Linux (in particular, with legacy blk), can be found here. In short, our results are essentially the same with any kernel version, and with either blk-mq or legacy blk. In addition, the performance of BFQ relative to the other I/O schedulers is the same with any storage medium.

For each device, we report the results of our throughput, application-responsiveness (start-up time) and video-playing (frame-drop-rate) benchmarks. The last two benchmarks also measure total throughput during the test, but we do not report those throughput measurements, as they carry little meaning. In fact:

Starting applications and playing videos entail relatively short I/O, and we benchmark these tasks in hostile conditions, i.e., while a lot of extra I/O is being generated too.
blk-mq I/O schedulers are work-conserving, apart from BFQ, which may occasionally plug I/O dispatching to privilege critical I/O. However, plugging lasts at most a few milliseconds.
With a mostly work-conserving I/O scheduler, short I/O influences total throughput very little or not at all if there is a lot of extra I/O in progress.

More precisely, these benchmarks are part of the S benchmark suite, and can be repeated with the following commands:

  git clone https://github.com/Algodev-github/S.git
  cd S/run_multiple_benchmarks
  sudo ./run_main_benchmarks.sh "throughput replayed-startup video-playing" "none mq-deadline kyber bfq"
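Before running the suite, it can help to confirm which schedulers the kernel offers for the device under test. The following is a minimal sketch, assuming the standard sysfs file /sys/block/<dev>/queue/scheduler, which lists the available schedulers and marks the active one with square brackets. The helper names active_scheduler and has_scheduler, and the device name sda, are hypothetical and not part of the S suite.

```shell
#!/bin/sh
# Hypothetical helper: extract the active scheduler from the contents of
# /sys/block/<dev>/queue/scheduler, where the active one is in brackets,
# e.g. "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: succeed if scheduler $2 appears in the list $1.
has_scheduler() {
    printf '%s\n' "$1" | tr -d '[]' | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example usage (assumes the device under test is named sda):
#   list=$(cat /sys/block/sda/queue/scheduler)
#   active_scheduler "$list"
#   has_scheduler "$list" bfq || echo "bfq is not available on this kernel"
```

Both helpers take the file contents as an argument rather than reading sysfs themselves, so they can be tried on any machine.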

In what follows we call reader/writer a program (fio in the S suite) that just reads/writes a large file. In addition, we say that a reader/writer is sequential or random depending on whether it reads/writes the file sequentially or at random positions. For brevity, we report only our results with synthetic, heavy workloads. The goal is to show application start-up times in rather extreme conditions, i.e., with very heavy background workloads.

HITACHI HDD

The next figure shows the throughput reached by each I/O scheduler while one of the following four heavy workloads is being executed: 10 parallel sequential or random sync readers (10r-seq, 10r-rand), or 5 parallel sequential or random sync readers plus 5 parallel sequential or random writers (5r5w-seq, 5r5w-rand). The symbol X means that, for that workload and with that scheduler, the benchmark script failed to terminate within 10 seconds of its due termination time (which implies that the system, and thus the results, were not reliable).

For all workloads but 5r5w-rand, the benchmark simply fails with all schedulers but BFQ. For 5r5w-rand, BFQ outperforms the other schedulers.
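The workload names used above encode their own composition. As an illustration only, the sketch below decodes a name such as 5r5w-rand into reader and writer counts plus the access modes that fio accepts for its --rw option (read, write, randread, randwrite). The function name describe_workload is hypothetical; the S suite generates its own fio jobs, and this is not its implementation.

```shell
#!/bin/sh
# Hypothetical decoder for workload names such as 10r-seq or 5r5w-rand.
# Prints: "<readers> <writers> <fio read mode> <fio write mode>".
describe_workload() {
    name=$1
    counts=${name%-*}      # part before the dash, e.g. "5r5w"
    pattern=${name##*-}    # part after the dash: "seq" or "rand"

    # Number of readers: digits before the "r", if any.
    case $counts in
        *r*) readers=${counts%%r*} ;;
        *)   readers=0 ;;
    esac

    # Number of writers: digits between the "r" and a trailing "w", if any.
    case $counts in
        *w)  rest=${counts#*r}; writers=${rest%w} ;;
        *)   writers=0 ;;
    esac

    # Map the access pattern to fio --rw values.
    case $pattern in
        seq)  rmode=read;     wmode=write ;;
        rand) rmode=randread; wmode=randwrite ;;
        *)    echo "unknown pattern: $pattern" >&2; return 1 ;;
    esac

    echo "$readers $writers $rmode $wmode"
}
```

The counts could then be passed to fio's --numjobs option and the modes to --rw, one fio job group for the readers and one for the writers.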

The next figure shows the cold-cache start-up time of gnome-terminal, a medium-size application, while one of the two heavy sequential workloads above (10r-seq or 5r5w-seq) is being executed in the background. We consider only sequential workloads, because these are the nastiest background workloads for responsiveness. In fact, sequential I/O is the I/O that both the kernel I/O stack and the storage-device firmware prefer, and thus privilege, because it boosts throughput most; sync reads, in turn, are the most time-critical operations. The symbol X in the figure means that, for that workload and with that scheduler, the application failed to start within 60 seconds.

**Figure 2**. *gnome-terminal* start-up time on the HITACHI HDD (lower is better).

As can be seen, with any workload BFQ guarantees about the same start-up time as if the device were idle. With the other schedulers, the application in practice does not start at all. We ran tests with lighter background workloads too, and in those cases as well the responsiveness guaranteed by these schedulers was noticeably worse than that guaranteed by BFQ (results available on request). Results with both smaller and larger applications can be found in this extra result page.
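For readers who want a rough feel for a cold-cache measurement outside the S suite, a minimal sketch follows. It assumes the standard Linux /proc/sys/vm/drop_caches interface and GNU date (for the %N nanosecond format). The helper names drop_caches and elapsed_ms are hypothetical. Note that the sketch times a command until it exits, which is only a crude proxy for start-up time and is not how the replayed-startup benchmark works.

```shell
#!/bin/sh
# Hypothetical helper: flush dirty data, then drop the page cache, dentries
# and inodes, so that the next start-up has to read from the device.
# Requires root.
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Hypothetical helper: print the wall-clock time, in milliseconds, that a
# command takes to run to completion. Assumes GNU date (%N = nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example usage (as root, with a background workload already running):
#   drop_caches
#   elapsed_ms some_command_that_exits_once_started
```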

Finally, video-playing results are shown in the next figure. In this benchmark, the same background workloads as for the responsiveness tests are generated, and, to make the background workload even more demanding for the time-sensitive application under test, a bash shell is also started and terminated repeatedly. This time the symbol X means that the playback of the video did not terminate within a 60-second timeout after its actual duration, and thus the test was aborted. In most of the failed cases, the playback of the video did not start at all.

**Figure 3**. Video-playing frame-drop rate on the Hitachi HDD (lower is better).

As can be seen, BFQ performs so much better than the other schedulers that the results are not even comparable.

PLEXTOR SSD

For each benchmark, we report our results for the same workloads as with the HDD.

**Figure 4**. Throughput on the Plextor SSD (higher is better).

With sequential workloads, BFQ reaches the same throughput as the other schedulers. BFQ loses about 18% with only random readers, because the number of IOPS becomes so high that the execution time and parallel efficiency of the schedulers become relevant, and BFQ still takes longer to execute and is less parallel than the other schedulers. In contrast, BFQ gets a much higher throughput with 5r5w-rand, because BFQ privileges reads over writes (as system-level latency mostly depends on reads), and random reads reach a higher throughput than random writes.

As for responsiveness, for gnome-terminal BFQ guarantees the lowest-possible start-up time with only reads in the background, and about twice the lowest-possible start-up time with reads and writes. The reason for the increase of the start-up time in the latter case is reported in the comments on lowriter start-up times in the extra result page.

**Figure 5**. *gnome-terminal* start-up time on the Plextor SSD (lower is better).

The other schedulers cause a much higher start-up time, in spite of the high speed of the device. One reason is I/O-request prefetching: the device prefetches I/O requests, and, among internally-queued requests, privileges sequential ones. BFQ prevents the device from prefetching requests when that would hurt latency. Results with both smaller and larger applications can be found in this extra result page.

Finally, the next figure shows our video-playing results.

**Figure 6**. Video-playing frame-drop rate on the Plextor SSD (lower is better).

Results are good with all schedulers. However, the figure does not show that the player takes a long time to start up with all schedulers but BFQ.