Budget Fair Queueing (BFQ) Storage-I/O Scheduler

A short history of BFQ

Well, not so short any more ...

I defined BFQ in 2008 with Fabio Checconi. Fabio wrote the first version of the code. Then we checked, fixed and improved it together, until we thought it was mature enough for submission to lkml:

Despite a general warm acceptance, BFQ did not make it, because of the doubts expressed by Jens Axboe, block-layer maintainer for mainline. One one side he was worried about the conceptual complexity of the engine that allows BFQ to provide its accurate service. In particular, he feared that this complexity would have been a problem for maintaining the code, once pushed to mainline. On the other side, he was convinced that there had to be only one main I/O scheduler in Linux.

Although BFQ was not merged into mainline, some power users started to download it from the BFQ webpage prepared by Fabio. BFQ was also merged into a few kernel variants, such as the Zen Kernel.

Meanwhile, I went on following lkml threads about I/O. Until I was struck by a discussion about responsiveness, where (very) high start-up time for popular applications, such as firefox and konsole, were reported:

In particular, the reported start-up times occurred if these applications were launched while one file was being read or written sequentially.

From that thread I got the inspiration for the simple idea behind my first improvement on the initial version of BFQ: detecting and privileging interactive applications. After adding both this and a few other improvements to boost the throughput, BFQ-v1 was born (July 2010):

From that moment on, BFQ started to gain momentum: in the intervening years, new Linux-kernel variants, Android-system variants, and Linux distributions have adopted BFQ as the default, or just an additional, I/O scheduler (pf-kernel, many kernel variants for Android, Cyanogenmod, Manjaro, OpenMandriva, Sabayon, ROSA, PCLinuxOS, Gentoo Linux, Arch Linux). Besides, I have been recording, during the last years, a few tens of downloads per day from individual users with various distributions.

Meanwhile, I have continued to add new features and improvements, such as the extension to soft real-time applications of the low-latency heuristic for interactive applications. I had also to extend BFQ constantly, to deal with the new characteristics of HDDs, such as NCQ, or with the then-emerging SSDs.

Fortunately, in this effort, I have received not only valuable feedback from the users, but also direct help from Francesco Allertsen, Mauro Andreolini, and especially Arianna Avanzini. She has been contributing to the BFQ project for several years, devising new solutions, implementing ideas, testing code and preparing patches for tons of kernel versions.

In addition to peculiar features of BFQ, over the last years Arianna and I have added to BFQ several extensions to handle new devices, as well as useful feature, taken from CFQ. In this respect, for each new non-trivial CFQ extension or feature, we have not just ported it to BFQ, but we have taken the opportunity to look for further improvements, in terms of throughput boosting, latency guarantees or execution time. This is the genesis, for example, of the unified mechanism, developed by Arianna (and designed with Mauro Andreolini), with which BFQ handles the I/O generated by QEMU threads so as to keep a high throughput.

Keeping up with this rate of changes was not easy at all. Without the help of Arianna, I would not have had the necessary time budget, and the project would have probably died. Instead, at last, with v7r3, and the following bug-fix release v7r4, we may have made it: BFQ seems able to properly handle all the spectrum of current popular storage devices. Besides, all the features I would have liked to see in BFQ are now present and seem to work well. Finally, the code is apparently mature and clean enough.

So I have felt up to submit BFQ-v7r4 to lkml:

As can be seen from the thread, some regressions have been found, which has positively led to v7r5. Finally, this last (re)proposal of BFQ to lkml triggered also a new article on BFQ by Jonathan Corbet (in fact, Corbet already wrote an article on BFQ-v0 in the occasion of the first submission of BFQ to lkml).

In the thread, Jens Axboe pushed for a change of direction: BFQ had to replace CFQ instead of being an additional scheduler. I accepted to follow this new path, as it seemed the only way to have hope to see BFQ accepted. The needed structural change entailed however an enormous amount of work. And I was a little discouraged by all these difficulties, and by changes of plans that had very little to do with the merits or demerits of BFQ. So I slowed down.

Yet, luckily, during this unenthusiastic phase I met Ulf Hansonn, maintainer of the MMC subsystem, from Linaro. We met at the first Workshop on Mobile System Technologies (MST) in May 2015. Ulf introduced me to Linaro, and helped me convince Linaro to support the BFQ project. With Linaro's support, the project gained a new spin, and, finally, on the first of February 2016, I submitted a new version of BFQ, implementing the requested replace-CFQ scheme. Have a look at this article on lwn.net for further details and comments on this submission.

The new patch series received a careful revision from Tejun Heo, and I submitted a newer version on July, containing all the recommended changes. But a new, unpleasant surprise arrived: somewhat confusingly, Jens said that the important changes made by this patch series were not acceptable, because it touched blk, which, in the meantime, had been declared legacy code by the principal block-I/O developers.

Jens did not reply directly, but let Tejun report his negative feedback. So, in the absence of a detailed response and of an actual discussion with Jens, many commented that the reason for the red light was that this patch series replaced CFQ, and as such introduced a too radical change. This was rather odd, because it was Jens himself who had proposed a change to a replace-CFQ scheme. Anyway, many people insisted on reverting to the previous add-BFQ scheme and resubmitting. Linaro guys deemed this the way to go too, and supported me in undertaking this new little endeavour (every such radical change of scheme entailed a lot of non-trivial work). Above all, Linaro went on supporting this non-trivial effort financially.

So I did it, and submitted the new patch series after a few months, in October. This time it was Christoph Hellwig to stop the show with a "big NAK for introducing giant new infrastructure like a new I/O scheduler for the legacy request structure".

Fortunately, Christoph himself had already manifested his willingness to see BFQ in the new blk-mq. Yet blk-mq was not ready to have I/O schedulers. In addition, BFQ was still intrinsically a single-queue scheduler, and the handling of process contexts it needed allowed for basically no parallelism. In the end, only tens of KIOPS could be handled. blk-mq had been devised, instead, to handle from hundreds of KIOPS to millions of IOPS. But Jens himself proposed a new way to go in this respect: develop the needed infrastructure to accomodate BFQ in blk-mq, only for single-queue devices, where scalability is not the main concern.

As a further stroke of luck for the BFQ project, I was invited to the Linux Kernel Summit, which took place a few days after this feedback from Jens and Christoph. There I could talk face-to-face with both. In particular, we decided the plan with Jens: he would have made the infrastructure, and I would have adapted BFQ. Well, Jens did make it, as you can read, with full details, in this other article on lwn.net.

Ok, this history has really become rather long ☺

Let's make the final very short: the months following the above summit have seen BFQ landing into mainline (lwn article), from 4.12.0!

After that, BFQ went on gaining momentum, and being evaluated and chosen as default I/O scheduler by first-class systems: Chromium OS, Android, SUSE, Fedora, ...