Handling races on the filing system

Filing systems are a resource shared by every process on the system, and sometimes by the network, and as a globally shared resource they are inherently racy. Yet the overwhelming majority of programs, often including those written by world expert programmers, assume the filing system to be a static, unchanging place modifiable only by the current program. Until very recently so did the POSIX API standard, which defines the common API for Linux, FreeBSD, Mac OS X and other Unices. When bug reports come in of data being lost, even very large professional corporations can make a real hash of testing that their fix isn't worse at losing data than the previous, more naive implementation. This is because when you program against a mental model of a static, unchanging filesystem, you will inevitably be surprised when it happens to change at exactly the wrong moment. That moment is of course one you can never replicate on your developer workstation, which makes finding and fixing these sorts of bug highly non-trivial.

In case you don't realise how much user data and productivity is lost each year to filing system races, just search Google for "corrupted Office file" and weep. Even we programmers are affected: if you keep a Git repository on a Samba drive, expect some interesting object database corruption from time to time, corruption which moreover correlates quite strongly with specific combinations of clients accessing the repo concurrently.

Well, there is some good news: AFIO makes maximum use of the host OS's filing system race safeguards. If you write your code against AFIO and take note of the race guarantees section in each API's reference documentation page, you should hopefully avoid filing system races as far as the host platform allows.

What AFIO provides for managing filing system raciness

Firstly, readers will probably be quite surprised to learn that the only operating system capable of providing completely race free filing system behaviour is Microsoft Windows, or rather the very well designed NT kernel API which AFIO uses directly. Linux provides robust file descriptor path discovery and the XXXat() POSIX APIs, and with those AFIO can provide pretty good race condition safety on Linux up to the final directory in the path. Mac OS X provides an unfortunately quite broken file descriptor path discovery and additionally does not provide the XXXat() POSIX APIs, so AFIO cannot provide race protection there. It can sometimes throw an exception if it detects that the filesystem has suddenly changed and you're about to delete the wrong file, but you shouldn't rely on this as the detection is itself racy. FreeBSD provides the XXXat() POSIX APIs, but its file descriptor path discovery only works correctly for directory handles, not file handles, due to a kernel bug (I've opened a feature request ticket for this at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198570), and therefore AFIO can provide race condition safety for directories only on FreeBSD.
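To make "file descriptor path discovery" concrete, here is a minimal Linux-only sketch which asks the kernel where an already open descriptor currently lives. It assumes /proc is mounted, and the names current_path_of and path_after_rename are hypothetical helpers for illustration, not part of AFIO. It is not a statement of how AFIO implements discovery internally.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Linux-only: reads the /proc/self/fd/<n> symlink to learn the current path
// of an open descriptor. Returns nullopt if the link cannot be read (for
// example when /proc is not mounted) or if the path does not fit the buffer.
std::optional<std::string> current_path_of(int fd) {
  assert(fd >= 0);
  const std::string link = "/proc/self/fd/" + std::to_string(fd);
  char buf[PATH_MAX];
  const ssize_t len = ::readlink(link.c_str(), buf, sizeof(buf));
  if (len < 0 || static_cast<size_t>(len) >= sizeof(buf)) return std::nullopt;
  return std::string(buf, static_cast<size_t>(len));  // readlink does not null-terminate
}

// Demonstration helper: opens `dir`, renames it to `renamed` while the
// descriptor stays open, and returns the path reported afterwards.
std::optional<std::string> path_after_rename(const fs::path &dir, const fs::path &renamed) {
  const int fd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (fd < 0) return std::nullopt;
  fs::rename(dir, renamed);  // the path changes, the descriptor does not
  auto result = current_path_of(fd);
  ::close(fd);
  return result;
}
```

On a Linux system with /proc available, path_after_rename should report the new name of the directory, because the descriptor follows the inode rather than the path it was opened with.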

Additionally, working with the filing system in a race safe way on POSIX requires opening a file descriptor to the containing directory for every operation (some proprietary Linux extensions allow this to be avoided for some operations on newer Linux kernels). On request, AFIO will keep a global cache of open file handles for containing directories. You request this with the file_flags::hold_parent_open flag, which can be enabled per dispatcher or per individual file handle open. This very significantly reduces the cost of race condition safety on POSIX, though for file entries only, as directories ignore the file_flags::hold_parent_open flag. The price is increased file descriptor usage, and because file descriptors have low hard limits, especially on OS X, the flag is disabled by default. The alternative, if you don't want AFIO to bother with race safety, is to specify the file_flags::no_race_protection flag per dispatcher or per individual file handle open. This causes AFIO to use the same maximum performance code paths as were used before the v1.3 engine.
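The trade-off behind holding parent directories open can be sketched in plain POSIX C++. The class dir_handle_cache below is a hypothetical illustration, not AFIO's implementation: it opens each containing directory once and hands back the same descriptor afterwards, so every cached entry saves a repeated open at the cost of one descriptor held for the lifetime of the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical cache of open directory descriptors, keyed by the path that
// was used to open them. Illustrative only.
class dir_handle_cache {
  std::unordered_map<std::string, int> fds_;

public:
  dir_handle_cache() = default;
  dir_handle_cache(const dir_handle_cache &) = delete;
  dir_handle_cache &operator=(const dir_handle_cache &) = delete;

  // Every cached descriptor is released when the cache goes away.
  ~dir_handle_cache() {
    for (auto &kv : fds_) ::close(kv.second);
  }

  // Returns a descriptor for `dir`, opening it only on the first request.
  // Returns -1 if the directory cannot be opened; failures are not cached.
  int get(const std::string &dir) {
    assert(!dir.empty());
    auto it = fds_.find(dir);
    if (it != fds_.end()) return it->second;
    const int fd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd >= 0) fds_.emplace(dir, fd);
    return fd;
  }

  // Number of descriptors currently held open by the cache.
  std::size_t size() const { return fds_.size(); }
};
```

The size() accessor makes the cost visible: it is exactly the number of descriptors counted against the process limit for as long as the cache lives.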

How to implement filing system race safety on AFIO

The essential rule for achieving maximum filing system race safety is to avoid using absolute paths where possible. If you want your code to also be safe on POSIX, you must additionally assume race safety only up to the final directory in a path. Beyond that point, design your code to never behave racily within a single directory.

The core container type for specifying a location on the filing system to AFIO is path_req which looks like this:

struct path_req
{
    bool is_relative;              // Whether the precondition is also where this path begins.
    afio::path path;               // The filing system path to be used for this operation.
    file_flags flags;              // The flags to be used for this operation (note they can be overridden by flags passed during dispatcher construction).
    future<> precondition;         // An optional precondition for this operation.
    path_req(T &&);                // Converts T to a filesystem::path, makes it absolute, then converts to an afio::path
    path_req(bool, T &&);          // If the bool is true, converts T to an afio::path fragment. If false, same as above overload (i.e. make absolute).
};

For convenience, type markup is provided for the boolean-taking constructor in the form of path_req::relative and path_req::absolute.

If the path is relative, then the path of the precondition is used as the base from which the relative path fragment operates. On FreeBSD, Linux and Windows this base extension happens inside the kernel, so the current path of the precondition really doesn't matter: it could be changing a thousand times per second without affecting the result. On OS X, due to the lack of the XXXat() POSIX APIs, the path of the precondition is fetched and the extension is done by hand.
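The in-kernel base extension can be observed directly with the POSIX openat() call. The following is a minimal sketch under stated assumptions rather than AFIO code: create_inside and demo_rename_then_create are hypothetical names, and the demonstration renames the directory from the same thread purely to show that the descriptor, not the path, is what the kernel resolves against.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

#include <fcntl.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Creates an empty file named `leaf` inside the directory referred to by
// `dirfd`. The kernel resolves `leaf` against the descriptor, so whatever
// path the directory currently has is irrelevant. Returns true on success.
bool create_inside(int dirfd, const std::string &leaf) {
  assert(dirfd >= 0);
  const int fd = ::openat(dirfd, leaf.c_str(), O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  if (fd < 0) return false;
  ::close(fd);
  return true;
}

// Demonstration: opens a directory, renames it while the descriptor is held,
// then creates a file through the descriptor. Returns true if the file ended
// up inside the renamed directory. `scratch` must be an existing, empty,
// writable directory.
bool demo_rename_then_create(const fs::path &scratch) {
  const fs::path before = scratch / "before", after = scratch / "after";
  fs::create_directory(before);
  const int dirfd = ::open(before.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (dirfd < 0) return false;
  fs::rename(before, after);  // the old path is now gone
  const bool created = create_inside(dirfd, "file.txt");
  ::close(dirfd);
  return created && fs::exists(after / "file.txt") && !fs::exists(before);
}
```

Had the file been created through the string path before/file.txt instead, the create would have failed after the rename, or worse, landed in an unrelated directory that had since taken that name.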

An AFIO extension allows you to specify a file as the precondition. In this situation, an empty path means the precondition itself, which is very useful for deleting or renaming an open file handle. If you want a sibling file, it can be found via a path fragment starting with ../, though note that this is necessarily racy with respect to the containing directory. AFIO opens the containing directory of the file, ensures that the directory contains an inode matching the file, and then uses that directory handle as a base. The race is that the file may relocate after it has been matched to its containing directory.
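The inode matching step described above can be sketched with fstat() and fstatat(). This is an illustration under assumptions, not AFIO's code: open_verified_parent and open_sibling are hypothetical names, and the caller is assumed to already know a candidate directory path and leaf name for the open file.

```cpp
#include <cassert>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Opens `dir_path` and checks that its entry `leaf` refers to the same inode
// as the already open `file_fd`. Returns the directory descriptor on a match,
// or -1 otherwise. The file may still be relocated after this check, which is
// the residual race.
int open_verified_parent(int file_fd, const std::string &dir_path, const std::string &leaf) {
  assert(file_fd >= 0);
  struct stat by_handle, by_name;
  if (::fstat(file_fd, &by_handle) != 0) return -1;
  const int dirfd = ::open(dir_path.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
  if (dirfd < 0) return -1;
  if (::fstatat(dirfd, leaf.c_str(), &by_name, AT_SYMLINK_NOFOLLOW) != 0 ||
      by_name.st_dev != by_handle.st_dev || by_name.st_ino != by_handle.st_ino) {
    ::close(dirfd);  // the directory does not (or no longer) contain the file
    return -1;
  }
  return dirfd;
}

// Opens, creating it if necessary, a sibling of the file. The sibling name is
// resolved against the verified directory descriptor, not against a path.
// Returns the sibling's descriptor, or -1 on failure.
int open_sibling(int file_fd, const std::string &dir_path, const std::string &leaf,
                 const std::string &sibling) {
  const int dirfd = open_verified_parent(file_fd, dir_path, leaf);
  if (dirfd < 0) return -1;
  const int fd = ::openat(dirfd, sibling.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0600);
  ::close(dirfd);
  return fd;
}
```

Comparing both st_dev and st_ino matters because inode numbers are only unique within a single filesystem.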

Gotchas specific to Microsoft Windows

Finally, there are some gotchas specific to Microsoft Windows:

1. You cannot rename a directory if any process holds an open file handle to any item anywhere within that directory or its children.

2. You cannot rename to a destination which has an open file handle with DELETE permissions (file_flags::write) to itself or any of its parent directories in any process. The source of a rename may have such handles open, but the destination may not. Why is this? It is not documented anywhere in Microsoft's documentation, but if I had to guess, I'd suggest that the atomicity of the rename is implemented by taking an op lock on the destination, and that the op lock is not granted if any handles exist which could suddenly change the path. I'm not sure if Microsoft are themselves aware of this limitation.

One might note that much of the utility of race protecting APIs is lost with these restrictions. However, one could emulate POSIX semantics by renaming all the contents of the directory to be renamed out to elsewhere, renaming the directory, and then renaming the contents back in. Given the pathetic slowness of opening handles on Windows, this might seem impossibly inefficient. However, NT provides a little known FILE_DELETE_CHILD permission which gives you delete/rename permission on all the children and subchildren of a directory with just one handle open. I learned about this flag the hard way: when it was requested by default it broke AFIO's functioning on Windows in many subtle ways, something which took several days of head scratching to track down. AFIO doesn't currently do this trick of renaming out and back in on Windows, but might in the future after a lot more experimentation as to whether it is viable and reliable without surprises.
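The sequence of renaming out, renaming the directory and renaming back in can be sketched portably with std::filesystem. This only illustrates the order of operations and is not something AFIO does: rename_via_parking is a hypothetical name, the sketch does not use FILE_DELETE_CHILD or any NT API, it has no rollback if a step fails, and other processes can observe the directory while it is empty.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Renames directory `from` to `to` by moving its immediate children into
// `parking`, renaming the now empty directory, and moving the children back.
// `parking` must be an existing empty directory on the same volume, and `to`
// must not already exist. Not atomic, and no rollback on failure.
void rename_via_parking(const fs::path &from, const fs::path &to, const fs::path &parking) {
  assert(fs::is_directory(from) && fs::is_directory(parking));

  // Snapshot the child names first so the directory is not modified while it
  // is being iterated.
  std::vector<fs::path> names;
  for (const auto &entry : fs::directory_iterator(from))
    names.push_back(entry.path().filename());

  for (const auto &name : names)
    fs::rename(from / name, parking / name);  // rename out
  fs::rename(from, to);                       // rename the empty directory
  for (const auto &name : names)
    fs::rename(parking / name, to / name);    // rename back in
}
```

Only the immediate children are moved because each child takes its own subtree with it when renamed within the same volume.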

On Windows opening a directory with write access requests rename/delete privileges, whereas on POSIX the write access request is ignored for directories as POSIX doesn't allow it anyway. This allows you to write identical code which works universally.

As an example of programming AFIO safely on an extremely unstable filing system, below is the functional test which verifies AFIO's filing system race safety. As you will see, a worker thread is solely dedicated to renaming directories to unique names whilst the main thread creates files inside those constantly changing directories and relinks them into another directory, which is itself constantly changing on POSIX but stable on Windows. This is iterated for a substantial period of time to verify that nothing goes wrong.

  try
  {
    // HoldParentOpen is actually ineffectual as renames zap the parent container, but it tests more code.
    auto dispatcher = make_dispatcher("file:///", file_flags::hold_parent_open).get();
    auto testdir = dispatcher->dir(path_req("testdir", file_flags::create));
    future<> dirh;

    try
    {
      // We can only reliably track directory renames on all platforms, so let's create 100 directories
      // which will be constantly renamed to something different by a worker thread
      std::vector<path_req> dirreqs;
      for(size_t n = 0; n < ITEMS; n++)
        dirreqs.push_back(path_req::relative(testdir, to_string(n), file_flags::create | file_flags::read_write));
      // Windows needs write access to the directory to enable relinking, but opening a handle
      // with write access causes any renames into that directory to fail. So mark the first
      // directory which is always the destination for renames as non-writable
      dirreqs.front().flags = file_flags::create;
      std::cout << "Creating " << ITEMS << " directories ..." << std::endl;
      auto dirs = dispatcher->dir(dirreqs);
      when_all_p(dirs).get();
      dirh = dirs.front();
      atomic<bool> done(false);
      std::cout << "Creating worker thread to constantly rename those " << ITEMS << " directories ..." << std::endl;
      thread worker([&done, &testdir, &dirs]
                    {
                      for(size_t number = 0; !done; number++)
                      {
                        try
                        {
#ifdef WIN32
                          for(size_t n = 1; n < ITEMS; n++)
#else  // !defined(WIN32)
                          for(size_t n = 0; n < ITEMS; n++)
#endif  // defined(WIN32)
                          {
                            path_req::relative req(testdir, to_string(number) + "_" + to_string(n));
                            // std::cout << "Renaming " << dirs[n].get()->path(true) << " ..." << std::endl;
                            try
                            {
                              dirs[n]->atomic_relink(req);
                            }
#ifdef WIN32
                            catch(const system_error & /*e*/)
                            {
                              // Windows does not permit renaming a directory containing open file handles
                              // std::cout << "NOTE: Failed to rename directory " << dirs[n]->path() << " due to " << e.what() << ", this is usual on Windows." << std::endl;
                            }
#else  // !defined(WIN32)
                            catch(...)
                            {
                              throw;
                            }
#endif  // defined(WIN32)
                          }
                          std::cout << "Worker relinked all dirs to " << number << std::endl;
                        }
                        catch(const system_error &e)
                        {
                          std::cerr << "ERROR: worker thread exits via system_error code " << e.code().value() << "(" << e.what() << ")" << std::endl;
                          BOOST_CHECK(false);
                        }
                        catch(const std::exception &e)
                        {
                          std::cerr << "ERROR: worker thread exits via exception (" << e.what() << ")" << std::endl;
                          BOOST_CHECK(false);
                        }
                        catch(...)
                        {
                          std::cerr << "ERROR: worker thread exits via unknown exception" << std::endl;
                          BOOST_CHECK(false);
                        }
                      }
                    });
      auto unworker = detail::Undoer([&done, &worker]
                                     {
                                       done = true;
                                       worker.join();
                                     });

      // Create some files inside the changing directories and rename them across changing directories
      std::vector<future<>> newfiles;
      for(size_t n = 0; n < ITEMS; n++)
      {
        dirreqs[n].precondition = dirs[n];
        dirreqs[n].flags = file_flags::create_only_if_not_exist | file_flags::read_write;
      }
      for(size_t i = 0; i < ITERATIONS; i++)
      {
        if(!newfiles.empty())
          std::cout << "Iteration " << i << ": Renaming " << ITEMS << " files and directories inside the " << ITEMS << " constantly changing directories ..." << std::endl;
        for(size_t n = 0; n < ITEMS; n++)
        {
          if(!newfiles.empty())
          {
            // Relink previous new file into first directory
            // std::cout << "Renaming " << newfiles[n].get()->path() << std::endl;
            newfiles[n]->atomic_relink(path_req::relative(dirh, to_string(n) + "_" + to_string(i)));
            // Note that on FreeBSD, if this is a file, its path would now be incorrect and moreover lost due to the lack of
            // path enumeration support for files. As we throw away the handle, this doesn't show up here.

            // Have the file creation depend on the previous file creation
            dirreqs[n].precondition = dispatcher->depends(newfiles[n], dirs[n]);
          }
          dirreqs[n].path = to_string(i);
        }
        // Split into two
        std::vector<path_req> front(dirreqs.begin(), dirreqs.begin() + ITEMS / 2), back(dirreqs.begin() + ITEMS / 2, dirreqs.end());
        std::cout << "Iteration " << i << ": Creating " << ITEMS << " files and directories inside the " << ITEMS << " constantly changing directories ..." << std::endl;
#ifdef __FreeBSD__  // FreeBSD can only track directories not files when their parent directories change
        newfiles = dispatcher->dir(front);
#else  // !defined(__FreeBSD__)
        newfiles = dispatcher->file(front);
#endif  // defined(__FreeBSD__)
        auto newfiles2 = dispatcher->dir(back);
        newfiles.insert(newfiles.end(), std::make_move_iterator(newfiles2.begin()), std::make_move_iterator(newfiles2.end()));

        // Pace the scheduling, else we slow things down a ton. Also retrieve and throw any errors.
        when_all_p(newfiles).get();
      }
      // Wait around for all that to process
      do
      {
        this_thread::sleep_for(chrono::seconds(1));
      } while(dispatcher->wait_queue_depth());
      // Close all handles opened during this context except for dirh
    }
    catch(const system_error &e)
    {
      std::cerr << "ERROR: test exits via system_error code " << e.code().value() << "(" << e.what() << ")" << std::endl;
      BOOST_REQUIRE(false);
    }
    catch(const std::exception &e)
    {
      std::cerr << "ERROR: test exits via exception (" << e.what() << ")" << std::endl;
      BOOST_REQUIRE(false);
    }
    catch(...)
    {
      std::cerr << "ERROR: test exits via unknown exception" << std::endl;
      BOOST_REQUIRE(false);
    }

    // Check that everything is as it ought to be
    auto _contents = dispatcher->enumerate(enumerate_req(dirh, metadata_flags::All, 10 * ITEMS * ITERATIONS)).get().first;
    testdir = future<>();  // Kick out AFIO now so NTFS has itself cleaned up by the end of the checks
    dirh = future<>();
    dispatcher.reset();
    std::cout << "Checking that we successfully renamed " << (ITEMS * (ITERATIONS - 1) + 1) << " items into the same directory ..." << std::endl;
    BOOST_CHECK(_contents.size() == (ITEMS * (ITERATIONS - 1) + 1));
    std::set<BOOST_AFIO_V2_NAMESPACE::filesystem::path> contents;
    for(auto &i : _contents)
      contents.insert(i.name());
    BOOST_CHECK(contents.size() == (ITEMS * (ITERATIONS - 1) + 1));
    for(size_t i = 1; i < ITERATIONS; i++)
    {
      for(size_t n = 0; n < ITEMS; n++)
      {
        if(contents.count(to_string(n) + "_" + to_string(i)) == 0)
          std::cerr << to_string(n) + "_" + to_string(i) << std::endl;
        BOOST_CHECK(contents.count(to_string(n) + "_" + to_string(i)) > 0);
      }
    }
    filesystem::remove_all("testdir");
  }
  catch(...)
  {
    filesystem::remove_all("testdir");
    throw;
  }

