std::filesystem Implementation Corner Cases

2019-12-22 13:24

At this year’s Meeting C++ I had the chance to talk with a lot of people, and it was very inspiring. With some I talked about ghc::filesystem, my implementation of a C++17 std::filesystem compatible library. Part of these discussions was about some corner cases I found during my tests, and I was trying to remember the details when I had the chance to talk to @NicoJosuttis. He, among others, was interested in these issues, as they might be rooted in unclear passages of the standard.

[Update 2020-02-08: A post on stackoverflow.com showed an additional corner case, read more about it in part 2.]

Meeting C++

In summer 2018 I started changing my private C++ filesystem abstraction into a more std::filesystem-conforming structure. The main motivation was the low chance of an official implementation on my primary platform, macOS, anytime soon. The first feature-complete public release supporting C++11, C++14 and C++17 on macOS, Windows and Linux was in September 2018.

Early on, I started writing lots of tests and wrote my test suite in such a way that I could run it against an existing std::filesystem implementation where one is available. The test code is based on Catch2 macros. Currently, using the valuable services of Travis-CI and Appveyor, ghc::filesystem is tested on GCC 5 to GCC 9, Clang 5 to Clang 9, Xcode 9/10/11 and Visual Studio 2015/2017/2019. The tests are also run against std::filesystem in GCC 8.3 and GCC 9.3 with libstdc++, Clang 7.1/8.0/9.0 with libc++, and MSVC 19.16 and MSVC 19.23.

Important note:

With this post I want to share some corner cases I found, where my tests showed different behaviour for the tested std::filesystem implementation in C++17 mode.

All tests pass on ghc::filesystem because they are from its test suite.

I have by no means the authority, nor knowledge of all the mails, forums, chats and other channels outside of the standard that have led the developers to their implementations. I might have missed a relevant LWG issue or simply missed something in the standard. So I’m not saying mine is the way to do things, and I have already changed some behaviour to follow other implementations, as I’m interested in a common understanding of what should happen. I don’t want to blame anyone for anything, but some results still puzzle me, so I keep my implementation the way it is until I have better arguments or a better understanding to do otherwise.

Any opinions on any of the following “issues” are very welcome! New or missing corner cases are welcome too, to get a more complete picture!

You can reach me via twitter by @gulrak or drop me a ✉ mail.


List of Implementation Corner Cases


path native format observers [fs.path.native.obs]

GCC 8.1, 8.2, 8.3:

FAILED:
  CHECK( fs::u8path("\xc3\xa4/\xe2\x82\xac\xf0\x9d\x84\x9e").u16string() == std::u16string(u"\u00E4/\u20AC\U0001D11E") )
due to unexpected exception with messages:
  filesystem error: Cannot convert character sequence: Illegal byte sequence

I observed this with GCC 8.1/8.2 on Ubuntu 18.04 and macOS, and with GCC 8.3.0 on Wandbox. The conversion rejects the valid Unicode code point U+1D11E. This was fixed with GCC issue 90281.
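To make the expected sizes concrete, here is a minimal sketch that counts the code units of the same four code points. It builds the path from a char32_t literal instead of fs::u8path, so it does not depend on the source file encoding; the names CodeUnitCounts, samplePath and countCodeUnits are only illustrative and not part of any test suite.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Illustrative helper type: sizes of the UTF-16 and UTF-32 views of a path.
struct CodeUnitCounts {
    std::size_t utf16;
    std::size_t utf32;
};

// U+00E4, '/', U+20AC, U+1D11E: four code points, the last one lies
// outside the Basic Multilingual Plane.
fs::path samplePath()
{
    return fs::path(std::u32string(U"\u00E4/\u20AC\U0001D11E"));
}

// The conversions may throw on an implementation that has the bug
// described above.
CodeUnitCounts countCodeUnits(const fs::path& p)
{
    return {p.u16string().size(), p.u32string().size()};
}
```

U+1D11E needs a surrogate pair in UTF-16, so a working implementation should report one more UTF-16 code unit than UTF-32 code units for this path.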

GCC 9.2:

FAILED:
  CHECK( fs::u8path("\xc3\xa4/\xe2\x82\xac").wstring() == std::wstring(L"ä/€") )
due to unexpected exception with message:
  filesystem error: Cannot convert character sequence: Invalid or incomplete
  multibyte or wide character  

This is observed on a Homebrew install of GCC 9 on macOS, on Travis-CI Linux GCC 9.2, and with this Godbolt example. The problem here is std::filesystem::path::wstring(); std::filesystem::path::u32string() works as expected, and the returned std::u32string has the expected length of 3 char32_t.


path generic format observers [fs.path.generic.obs]

GCC 9.2:

FAILED:
  CHECK( fs::u8path("\xc3\xa4/\xe2\x82\xac").generic_wstring() == std::wstring(L"ä/€") )
due to unexpected exception with message:
  filesystem error: Cannot convert character sequence: Invalid or incomplete
  multibyte or wide character

This is observed on a Homebrew install of GCC 9 on macOS, on Travis-CI Linux GCC 9.2, and with this Godbolt example. The problem here is std::filesystem::path::generic_wstring(); std::filesystem::path::generic_u32string() works as expected, and the returned std::u32string has the expected length of 3 char32_t.


path iterators [fs.path.itr]

MSVC 19.16, 19.23:

FAILED:
  CHECK( "/,foo,bar" == iterateResult(fs::path("///foo/bar")) )
with expansion:
  "/,foo,bar" == "///,foo,bar"

The helper iterateResult used here is:

static std::string iterateResult(const fs::path& path)
{
    std::ostringstream result;
    for (fs::path::const_iterator i = path.begin(); i != path.end(); ++i) {
        if (i != path.begin()) {
            result << ",";
        }
        result << i->generic_string();
    }
    return result.str();
}

GCC, Clang and my implementation all return / for the /// part; only MSVC decides to return ///. As Windows supports / as a separator too, even if it’s not the preferred separator, I’m not sure about this, and I didn’t mimic this behaviour on Windows.
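For probing this on another toolchain, a variant of the helper that returns the individual elements can be handy, because the element count and the first element can then be inspected separately. This is only a sketch; elementsOf is a made-up name.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Illustrative helper: collect the generic-format text of every element
// that iterating over the path yields.
std::vector<std::string> elementsOf(const fs::path& p)
{
    std::vector<std::string> out;
    for (const fs::path& element : p) {
        out.push_back(element.generic_string());
    }
    return out;
}
```

For "///foo/bar" the implementations discussed here all yield three elements; the difference is only whether the first one is / or ///.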


path inserter and extractor [fs.path.io]

MSVC 19.16, 19.23:

With an std::ostringstream filled by:

std::ostringstream os;
os << fs::path("/root/foo bar");

we get this failure:

FAILED:
  CHECK( os.str() == "\"\\\\root\\\\foo bar\"" )
with expansion:
  ""/root/foo bar"" == ""\\root\\foo bar""

The standard describes the effect of the path inserter as: “Equivalent to: os << quoted(p.string());” and fs::path::string() is one of the native format observers that MSVC returns a backslashed path for. Still the stream inserter outputs the generic format (that Windows API will accept) and this seems inconsistent.
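The wording can be checked directly by comparing what the inserter writes with what the “equivalent to” expression writes. The following is a small sketch with made-up helper names (inserted, quotedNative); it makes no claim about which result a particular compiler produces.

```cpp
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// What operator<< actually writes for a path.
std::string inserted(const fs::path& p)
{
    std::ostringstream os;
    os << p;
    return os.str();
}

// What the standard's "Equivalent to" expression writes.
std::string quotedNative(const fs::path& p)
{
    std::ostringstream os;
    os << std::quoted(p.string());
    return os.str();
}
```

If inserted(p) and quotedNative(p) differ for a path like "/root/foo bar", the inserter is not using the native format observer.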


directory entry [fs.dir.entry]

  fs::directory_entry none;
  none.refresh();

This code currently throws an exception on MSVC 19.16, 19.23 and ghc::filesystem while not throwing on all versions of GCC/Clang.

  fs::directory_entry de;
  de.assign("");

This code currently only throws on MSVC 19.16 and ghc::filesystem.

  fs::directory_entry de{"existing_dir/existing_file"};
  de.replace_filename("bar");

This code currently only throws on MSVC 19.16 and ghc::filesystem.
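To compare implementations without wrapping everything in try/catch, the same calls can be made through their std::error_code overloads. This is a sketch with made-up function names; it only reports what the implementation does and assumes nothing about the outcome.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns "ok" when no error was reported, otherwise the error message.
std::string describe(const std::error_code& ec)
{
    return ec ? "error: " + ec.message() : std::string("ok");
}

// refresh() on a default-constructed directory_entry.
std::string probeRefreshOnEmptyEntry()
{
    fs::directory_entry none;
    std::error_code ec;
    none.refresh(ec);
    return describe(ec);
}

// assign() with an empty path.
std::string probeAssignEmptyPath()
{
    fs::directory_entry de;
    std::error_code ec;
    de.assign("", ec);
    return describe(ec);
}
```

An implementation that throws in the examples above should report an error here, and one that does not throw should report "ok".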

MSVC 19.23:

  auto none = fs::directory_entry();
  CHECK( none.status().type() == fs::file_type::not_found );
FAILED:
  CHECK( none.status().type() == fs::file_type::not_found )
with expansion:
  2 == 1

This is surprising. I have no idea why this happens, as 2 would be fs::file_type::regular for a directory_entry with an empty path.


absolute [fs.op.absolute]

GCC 8.3, 9.2:

FAILED:
  CHECK( fs::absolute("") == fs::current_path() / "" )
due to unexpected exception with message:
  filesystem error: cannot make absolute path: Invalid argument

GCC is the only one that throws in this case, or sets the error_code in the case of fs::absolute(p, error_code&), despite the note in the standard: “Implementations are strongly encouraged to not query secondary storage, and not consider !exists(p) an error.” and the example implementation from the standard: “For POSIX-based operating systems, absolute(p) is simply current_path()/p.”.

MSVC 19.16, 19.23:

FAILED:
  CHECK( fs::absolute("") == fs::current_path() / "" )
with expansion:
  "" == "C:\projects\filesystem\build\"

MSVC is returning an empty result.

Others

Clang and my implementation return the suggested result of current_path()/p.
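The three outcomes (error, empty result, current_path()/p) can be told apart with the error_code overload. A minimal sketch, where PathResult and tryAbsolute are made-up names:

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Illustrative result type: the returned path plus the reported error.
struct PathResult {
    fs::path path;
    std::error_code ec;
};

// Calls the non-throwing overload and keeps both outputs for inspection.
PathResult tryAbsolute(const fs::path& p)
{
    PathResult r;
    r.path = fs::absolute(p, r.ec);
    return r;
}
```

For tryAbsolute(""), a set ec corresponds to the GCC result above, an empty path without error to the MSVC result, and a path equal to fs::current_path() / "" to the suggested one.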


canonical [fs.op.canonical]

Clang 7, 8, 9, MSVC 19.16, 19.23:

FAILED:
  CHECK_THROWS_AS( fs::canonical(""), fs::filesystem_error )
because no exception was thrown where one was expected:

This seems to be a violation of the standard, as 30.10.15.2 (4) specifies “!exists(p) is an error.” and the compilers report false for exists(""). My guess is that Clang first calls absolute(p), and after that p is not empty, and exists(absolute("")) is normally true as it is the current directory path.

The GCC implementations treat this as an error, as well as my own.
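The argument can be phrased as a check: whenever exists(p) is false, canonical(p) should report an error. A small sketch with made-up helper names (missing, canonicalFails):

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True when exists(p) is false (or the status query itself fails).
bool missing(const fs::path& p)
{
    std::error_code ec;
    return !fs::exists(p, ec);
}

// True when canonical(p) reports an error through the error_code overload.
bool canonicalFails(const fs::path& p)
{
    std::error_code ec;
    (void)fs::canonical(p, ec);
    return static_cast<bool>(ec);
}
```

With these, missing("") && !canonicalFails("") is the combination that the failing test above points at.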


create_directory, create_directories
[fs.op.create_directory], [fs.op.create_directories]

Given a p with:

  auto p = fs::path{"some_existing_regular_file"};

We get:

GCC 7, MSVC 19.16:

FAILED:
  CHECK_THROWS_AS( fs::create_directories(p), fs::filesystem_error )
because no exception was thrown where one was expected

Clang 7, 8, 9 and MSVC 19.16:

FAILED:
  CHECK_THROWS_AS( fs::create_directory(p), fs::filesystem_error )
because no exception was thrown where one was expected

This was not well defined in the original standard, and LWG #2935 suggested this behaviour. The more intuitive approach to directory creation, treating an existing file with that name as an error, is also advocated by the newer paper WG21 P1164R0. Its revision P1164R1 was agreed upon at the Kona 2019 meeting (see merge), and MSVC and GCC have by now switched to following its proposal (GCC #86910).
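A self-contained way to see which behaviour a given implementation has is to create a scratch file and call the error_code overload on it. This is only a sketch: the file name and the names CreateResult and createDirectoryOverFile are made up, and it assumes the temporary directory is writable.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Illustrative result type for the probe below.
struct CreateResult {
    bool created;        // return value of create_directory
    std::error_code ec;  // set when the implementation reports an error
};

// Creates a regular scratch file, then asks create_directory to create a
// directory with the same name, and removes the file again.
CreateResult createDirectoryOverFile()
{
    const fs::path p = fs::temp_directory_path() / "corner_case_regular_file.tmp";
    std::ofstream out(p);
    out << "x";
    out.close();

    CreateResult r{};
    r.created = fs::create_directory(p, r.ec);

    std::error_code ignore;
    fs::remove(p, ignore);
    return r;
}
```

No directory is created in either case, so created is false; an implementation following the paper additionally sets ec, one following the older behaviour leaves it clear.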


equivalent [fs.op.equivalent]

GCC 8.x, 9.2:

  CHECK_THROWS_AS( fs::equivalent("foo", "foo3"), fs::filesystem_error )
because no exception was thrown where one was expected:
  This test expects LWG #2937 result conformance.

  CHECK( ec )
with expansion:
  system:0
with message:
  This test expects LWG #2937 result conformance.

  CHECK_THROWS_AS( fs::equivalent("foo3", "foo"), fs::filesystem_error )
because no exception was thrown where one was expected:
  This test expects LWG #2937 result conformance.

  CHECK( ec )
with expansion:
  system:0
with message:
  This test expects LWG #2937 result conformance.

LWG #2937 specified that it is an error if either of the given path objects doesn’t resolve to an existing file, and Clang and MSVC follow #2937. My implementation follows this rule too (but there is a define at the top where this can be disabled).
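The difference shows up only in the error state, not in the returned value, which a probe with the error_code overload makes visible. A sketch with made-up names and scratch file names, assuming a writable temporary directory:

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Illustrative result type for the probe below.
struct EquivalentResult {
    bool same;           // return value of equivalent
    std::error_code ec;  // set when the implementation follows LWG #2937
};

// Compares an existing scratch file with a path that does not exist.
EquivalentResult equivalentWithMissing()
{
    const fs::path existing = fs::temp_directory_path() / "corner_case_equivalent.tmp";
    const fs::path absent = fs::temp_directory_path() / "corner_case_equivalent_missing.tmp";

    std::ofstream out(existing);
    out << "x";
    out.close();

    std::error_code ignore;
    fs::remove(absent, ignore);  // make sure the second path is really missing

    EquivalentResult r{};
    r.same = fs::equivalent(existing, absent, r.ec);

    fs::remove(existing, ignore);
    return r;
}
```

The result same is false either way; only an implementation following LWG #2937 also sets ec.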


weakly_canonical [fs.op.weakly_canonical]

Clang 7, 8, 9 and GCC 8.3, 9.2:

FAILED:
  CHECK( fs::weakly_canonical("") == "." )
with expansion:
  "/home/travis/build/gulrak/filesystem/build"
  ==
  "."

Clang and GCC are returning fs::current_path() for fs::canonical("") so it is to be expected that they return the same for fs::weakly_canonical(""). MSVC and my implementation are returning . as the normalized empty path ([fs.def.normal.form], 8).
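To see which of these results an implementation produces, the outcome can be classified without relying on a failing test. A sketch; classifyWeaklyCanonicalOfEmpty and its labels are made up for illustration.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Classifies what weakly_canonical("") returns on this implementation.
std::string classifyWeaklyCanonicalOfEmpty()
{
    std::error_code ec;
    const fs::path result = fs::weakly_canonical(fs::path{}, ec);
    if (ec) {
        return "error: " + ec.message();
    }
    if (result.empty()) {
        return "empty";
    }
    if (result == fs::path(".")) {
        return "dot";
    }
    const fs::path cwd = fs::current_path(ec);
    if (!ec && result == cwd) {
        return "current_path";
    }
    return "other: " + result.string();
}
```

The failing output above corresponds to "current_path", and the result of MSVC and my implementation corresponds to "dot".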


Raw Test Issue Table

I wrote a script that generates an HTML table with the issues ordered by category and compiler, and I try to upload a new one when something relevant changes. I’ll link to it only for completeness:

Test-Report


Thanks!

I would like to thank Jonathan Wakely, whom I had contact with through my recent GCC issues, for taking me seriously and for pointing me in the right direction a few times. He was always helpful and quick in fixing actual issues I found and reported.

I also want to thank Nico Josuttis for his WG21 P1164R0 to clear up the create_directory/create_directories issue, his efforts to make the standard clearer for normal developers and his C++17 - The Complete Guide.

And thanks to all those I talked to at Meeting C++ but cannot list here; I hope to meet many of you next year.