
Design Introduction and Rationale

Write ordering constraints, and how these can be used to achieve some of the Durability in ACID without needing fsync()
Background on how filing systems work
Write ordering data and durability: why does it matter?

Boost.AFIO came about out of the need for a scalable, high performance, portable asynchronous file i/o and filesystem library for TripleGit, a forthcoming ACID compliant transactional persistence layer for graph stores which is built on the filing system: call it a SQLite3 but for graphstores[1]. The fact that a portable asynchronous file i/o and filesystem library for C++ was needed at all came as a bit of a surprise: one thinks of these things as done and dusted decades ago, but it turns out that the fully featured libuv, a C library, is good enough for most people needing portable asynchronous file i/o. However, as great as libuv is, it isn't very C++-ish, and hooking it in with Boost.ASIO (parts of which are expected to enter the ISO C++ language standard) isn't particularly clean. I therefore resolved to write a native Boost asynchronous file i/o and filesystem implementation, and keep it as simple as possible.

A quick version history

AFIO started life as a C++ 0x library written for an early Visual Studio 2013 Community Preview back in 2012 as an outside-of-work side project when I was working at BlackBerry. It was ported to Boost during Google Summer of Code 2013 with the help of student Paul Kirth, and VS2012 and VS2010 support was added. For v1.0, AFIO used a simple dispatch engine which kept the extant ops in a hash table, and the entire dispatch engine was protected by a single giant and recursive mutex. Performance never exceeded about 150k ops/sec maximum on a four core Intel Ivy Bridge CPU.

That performance was embarrassing, so for v1.1 the entire engine was rewritten using atomic shared pointers to be completely lock free, and very nearly wait free if it weren't for the thin spin locks around the central ops hash table. Now performance can reach 1.5m ops/sec on a four core Intel Ivy Bridge CPU, or more than half of Boost.ASIO's maximum dispatch rate.

For the v1.2 engine, another large refactor was done, this time to substantially simplify the engine by removing the use of std::packaged_task<> completely, replacing it with a new intrusive-capable enqueued_task<> which permits the engine to early out in many cases, plus allowing the consolidation of all spinlocked points down to just two: one in dispatch, and one other in completion, which is now optimal. Performance of the v1.2 engine rose by about 20% over the v1.1 engine, plus AFIO is now fully clean on all race detecting tools.

For the v1.3 engine, yet another large refactor was done, though not for performance but rather to make it much easier to maintain AFIO in the future, especially after acceptance into Boost whereupon one cannot arbitrarily break API anymore and must maintain backwards compatibility. To this end the dependencies between AFIO and Boost were completely abstracted into a substitutable symbol aliasing layer such that any combination of Boost and C++ 11 STL threading/chrono, filesystem and networking can be selected externally using macros. Indeed, any of the eight build combinations can coexist in the same translation unit too, and I have unit test runs which prove it! With the v1.3 engine AFIO optionally no longer needs Boost at all, not even for its unit testing. However the cost was dropping support for all Visual Studios before 2013 and all GCCs before 4.7, as they don't have the template aliasing support needed to implement the STL abstraction layer. A very large amount of legacy cruft code, e.g. support for non-variadic templates, was cleaned out for the v1.3 release.
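
To illustrate what such a symbol aliasing layer can look like, here is a minimal sketch and not AFIO's actual implementation. The macro SKETCH_USE_STD_THREADING, the namespace sketch and the ticket_counter class are all hypothetical names invented for this example. Only the threading/chrono axis is shown, and the Boost branch assumes the Boost.Thread and Boost.Chrono headers are available.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical selection macro (not AFIO's real one): 1 = C++ 11 STL, 0 = Boost.
#ifndef SKETCH_USE_STD_THREADING
#define SKETCH_USE_STD_THREADING 1
#endif

#if !SKETCH_USE_STD_THREADING
#include <boost/chrono.hpp>
#include <boost/thread/lock_guard.hpp>
#include <boost/thread/mutex.hpp>
#endif

namespace sketch {
#if SKETCH_USE_STD_THREADING
namespace chrono = std::chrono;  // namespace alias
using mutex = std::mutex;        // type alias
template <class Mutex> using lock_guard = std::lock_guard<Mutex>;  // template alias
#else
namespace chrono = boost::chrono;
using mutex = boost::mutex;
template <class Mutex> using lock_guard = boost::lock_guard<Mutex>;
#endif

// Everything below is written once against the aliases only, so it compiles
// unchanged whichever implementation the macro selected.
class ticket_counter {
  mutex lock_;
  int next_ = 0;

public:
  int take() {
    lock_guard<mutex> guard(lock_);
    return next_++;
  }
};

inline long long to_milliseconds(chrono::seconds s) {
  return chrono::duration_cast<chrono::milliseconds>(s).count();
}

// Small usage example exercising the aliased names.
inline void usage_example() {
  ticket_counter counter;
  assert(counter.take() == 0);
  assert(counter.take() == 1);
  assert(to_milliseconds(chrono::seconds(2)) == 2000);
}
}  // namespace sketch
```

The template alias is the piece older compilers lack, which is why a layer of this shape needs C++ 11 template aliasing support.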

This version 1.4 of AFIO

During ACCU and C++ Now 2015 I spoke with a number of ISO WG21 committee members about the structural design problems in iostreams and the Filesystem TS (lack of filesystem race freedom, lack of context dependent filesystem), and what design the committee would prefer to see to fix those problems. As it happens, we were all close to the same page, so from the v1.4 engine onwards I resolved to refactor the AFIO API thusly:

  1. Since v1.0 AFIO has implemented the Concurrency TS continuations atop standard futures, using a central hash table to keep the continuations information per future. This had the big advantage that standard STL and Boost futures could be used, but it came with many other problems, mainly related to performance and code complexity. An additional problem was that the Concurrency TS had moved on considerably since 2012, and AFIO's emulation was now significantly out of date. For v1.4 a new lightweight future-promise factory toolkit library called Boost.Outcome was written between the end of C++ Now (May) and the beginning of the AFIO peer review (July). It makes it easy to write arbitrarily customisable future-promises for C++ which:
    • Implement an almost complete Concurrency TS (N4399) future-promise specification with almost all the Boost.Thread future-promise extensions.
    • Are very considerably more performant (2x-3x) and much more reliably low latency (no memory allocation).
    • Permit arbitrary wait composure of any kind of custom future-promise with one another.
    • Are part of a general purpose lightweight C++ monadic programming implementation, so futures are merely asynchronous monads.
    • Natively support C++ 1z coroutines (N4499), which are currently only supported by Visual Studio 2015.

    The use of this future factory toolkit makes the AFIO continuations infrastructure redundant, and it will therefore be removed shortly. The monadic programming library also considerably simplifies quite a bit of internal AFIO implementation code: thanks to the monads, one can use noexcept design throughout and therefore skip dealing directly with exception safety, as the monads take away the potential for control flow to be reversed by an exception throw.

    Note

    This version of AFIO being presented for Boost review does not yet make use of lightweight future-promises, and instead mocks up the eventual API using the existing highly mature and very well tested engine. The API presented is expected to be final, except for the very few items specified as deprecated (see below for a list). This has been done in order to test that the engine rewrite based on lightweight future-promise exactly matches the behaviour of the current engine using an identical unit test suite.

    I should emphasise that I expect any programs written to match the presented API to continue to work after the engine rewrite — after all the internal unit test suite will do so.
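
To give a flavour of the noexcept, monadic style described in item 1, here is a minimal sketch. It is not Boost.Outcome's API: the result<T> type, its then() member and the two helper functions are hypothetical names invented for this example, and for brevity T is assumed to be default constructible and nothrow movable.

```cpp
#include <cassert>
#include <system_error>
#include <utility>

namespace sketch {
// Hypothetical type: holds either a T or a std::error_code, never throws.
template <class T> class result {
  bool has_value_;
  T value_;
  std::error_code error_;

public:
  result(T v) noexcept : has_value_(true), value_(std::move(v)) {}
  result(std::error_code ec) noexcept : has_value_(false), value_(), error_(ec) {}

  bool has_value() const noexcept { return has_value_; }
  const T& value() const noexcept {
    assert(has_value_);
    return value_;
  }
  std::error_code error() const noexcept { return error_; }

  // Monadic bind: run f on the value, or pass the error straight through.
  // f is itself expected to be noexcept and to return some result<U>.
  template <class F>
  auto then(F&& f) const noexcept -> decltype(f(std::declval<const T&>())) {
    using next = decltype(f(std::declval<const T&>()));
    if (!has_value_) return next(error_);
    return f(value_);
  }
};

// Two fallible steps which report failure by value rather than by throwing.
inline result<int> parse_digit(char c) noexcept {
  if (c < '0' || c > '9') return std::make_error_code(std::errc::invalid_argument);
  return c - '0';
}

inline result<int> reciprocal_percent(int d) noexcept {
  if (d == 0) return std::make_error_code(std::errc::result_out_of_range);
  return 100 / d;
}
}  // namespace sketch
```

A chain such as sketch::parse_digit('4').then(sketch::reciprocal_percent) carries either the final value or the first error to the end of the chain, so no intermediate step needs a try/catch or any other exception safety handling.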

  2. For race free filesystem programming, you really need to base all path related operations on an open file descriptor or handle, so something like:
    afio::handle_ptr h;  // Some open file handle (assumed to have been opened elsewhere)
    
    // Create a sibling file to h->path() race free
    afio::handle_ptr newfileh=afio::file(h, "../newfile", afio::file_flags::create);
    
    // Asynchronously create a sibling file to h->path() race free
    // afio::future has type void because afio::future *always* carries a shared pointer to a handle
    // (it only gets a type when the operation returns more than a file handle)
    afio::future<void> newfilefh2=afio::async_file(h, "../newfile", afio::file_flags::create);
    
    // Wait for the asynchronous file creation to complete, rethrowing any error or exception
    afio::handle_ptr newfileh2=newfilefh2.get_handle();
    

    I'll admit this design isn't quite what the members of WG21 had in mind, especially the notion that afio::future<void> always carries a shared pointer to a handle, and that implicit type slicing from afio::future<T> to afio::future<void> is not just allowed but absolutely essential. However, apart from that, the above API is probably quite close to what members of WG21 were thinking.
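
For readers unfamiliar with why basing path operations on an open handle gives race freedom, the following sketch shows the same idea using the plain POSIX openat() call rather than AFIO. It makes no claim about how AFIO implements the above API. The helper names are hypothetical, the example uses a directory descriptor as the base rather than a file handle, and it assumes a POSIX system with a writable /tmp.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of AFIO): create `name` inside the directory
// that `dirfd` refers to, wherever that directory currently lives.
inline int create_in_directory(int dirfd, const char* name) noexcept {
  assert(name != nullptr);
  return ::openat(dirfd, name, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC, 0644);
}

// Demonstration: the directory is renamed between opening it and creating the
// file, yet the file still lands inside it because no absolute path is
// resolved a second time.
inline bool file_follows_renamed_directory() {
  char tmpl[] = "/tmp/afio_sketch_XXXXXX";
  if (::mkdtemp(tmpl) == nullptr) return false;
  const std::string before(tmpl);
  const std::string after = before + "_moved";

  int dirfd = ::open(before.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (dirfd < 0) return false;

  // Simulate a third party renaming the directory underneath us.
  if (std::rename(before.c_str(), after.c_str()) != 0) {
    ::close(dirfd);
    return false;
  }

  int fd = create_in_directory(dirfd, "newfile");
  struct stat st;
  bool ok = fd >= 0 && ::stat((after + "/newfile").c_str(), &st) == 0;

  // Clean up, again relative to the open descriptor where possible.
  if (fd >= 0) ::close(fd);
  ::unlinkat(dirfd, "newfile", 0);
  ::close(dirfd);
  ::rmdir(after.c_str());
  return ok;
}
```

Had the file been created via the original absolute path instead, the rename would have made the creation fail or, worse, land in whatever a third party had placed at the old path.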

  3. One oft-observed design limitation in the Filesystem TS is that it cannot support filesystems in filesystems, with the most classic example being a ZIP archive on a filesystem where it might be nice to allow generic C++ filesystem code to not need to be aware that the filesystem it sees is inside a ZIP archive. The solution could be one or both of these options:
    1. Make the Filesystem TS operations hang as virtual member functions off some filesystem abstract base class.
    2. Have a thread local variable set the current filesystem instance to be used by the global Filesystem TS functions on that thread.

    AFIO v1.4 and probably v1.5 won't implement this, as it is really a thing for Boost.Filesystem to do. However, AFIO's make_dispatcher() already takes a URI and there is a RAII facility for setting the current thread local dispatcher. Because a dispatcher instance has a suite of virtual member functions which define what some filesystem is or does, AFIO is ready for a Filesystem TS implementation matching the above design to be written on top of it. AFIO v1.4 currently provides only POSIX and NT kernel filesystem backends, however it is expected that v1.5 will add a new temporary filesystem backend which lets programs portably work inside tmpfs in whichever form that takes across Linux, FreeBSD, Apple OS X and Microsoft Windows. Additional backends implementing, say, a ZIP archive filesystem are similarly easy to add.
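
The two options in item 3, combined with a RAII facility for selecting the current instance, can be sketched as follows. This is an illustration of the design only: every name here (filesystem, memory_filesystem, filesystem_scope and the free functions) is hypothetical, and none of it is AFIO's dispatcher API or the Filesystem TS.

```cpp
#include <cassert>
#include <set>
#include <string>

namespace sketch {
// Option 1: operations hang off an abstract base class as virtual members.
class filesystem {
public:
  virtual ~filesystem() = default;
  virtual bool exists(const std::string& path) const = 0;
  virtual void create_file(const std::string& path) = 0;
};

// A toy in-memory backend standing in for, say, a ZIP archive filesystem.
class memory_filesystem final : public filesystem {
  std::set<std::string> files_;

public:
  bool exists(const std::string& path) const override { return files_.count(path) != 0; }
  void create_file(const std::string& path) override { files_.insert(path); }
};

// Option 2: a thread local pointer selects the instance used by free functions.
inline filesystem*& current_filesystem() noexcept {
  thread_local filesystem* current = nullptr;
  return current;
}

// RAII guard: installs a filesystem for this thread and restores the
// previously installed one when the guard goes out of scope.
class filesystem_scope {
  filesystem* previous_;

public:
  explicit filesystem_scope(filesystem& fs) noexcept : previous_(current_filesystem()) {
    current_filesystem() = &fs;
  }
  ~filesystem_scope() { current_filesystem() = previous_; }
  filesystem_scope(const filesystem_scope&) = delete;
  filesystem_scope& operator=(const filesystem_scope&) = delete;
};

// Generic code calls these free functions and never learns which backend is active.
inline bool exists(const std::string& path) {
  assert(current_filesystem() != nullptr);
  return current_filesystem()->exists(path);
}

inline void create_file(const std::string& path) {
  assert(current_filesystem() != nullptr);
  current_filesystem()->create_file(path);
}
}  // namespace sketch
```

Nesting two filesystem_scope guards swaps the backend seen by sketch::exists() for the inner scope only, which is the behaviour generic filesystem code would need in order to work unmodified inside an archive.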

As mentioned above, this refactoring means that some parts of this v1.4 release of AFIO are deprecated and are expected to be removed shortly. You can find a list of these soon to be removed APIs and parts here. The list is not long, and the removals are obvious.



[1] The UnQLite embedded NoSQL database engine is exactly one of those of course. Unfortunately I intend TripleGit for implementing portable Component Objects for C++ extending C++ Modules, which means I need a database engine suitable for incorporation into a dynamic linker, which unfortunately is not quite UnQLite.

