I tried using this a while back and found it was not widely available. You need coreutils version 9.1 or later for this, many distros do not ship this.
You're misinterpreting the title. The author didn't intend "Unix" to literally mean only the official AT&T/TheOpenGroup UNIX® System to the exclusion of Linux.
The first sentence of "UNIX-like" makes that clear : >This is a catalog of things UNIX-like/POSIX-compliant operating systems can do atomically,
Further down, he then mentions some Linux specifics : >fcntl(fd, F_GETLK, &lock), fcntl(fd, F_SETLK, &lock), and fcntl(fd, F_SETLKW, &lock) . [...] There is a “mandatory locking” mode but Linux’s implementation is unreliable as it’s subject to a race condition.
Which, FWIW, doesn't mean Linux. AFAIK there is no Linux distro that's fully compliant, even before you worry about the specifics of whether it's certified as compliant.
>POSIX-compliant Which, FWIW, doesn't mean Linux. AFAIK there is no Linux distro that's fully compliant
I read author's use of "POSIX-compliant" as a loose and fuzzy family category rather than 100% strict compliance. Therefore, the author mentioning non-100%-compliant Linux is ok.
There seems to be 2 different expectations and interpretations of what the article is about.
- (1) article is attempting to be a strict intersection of all Unix-like systems that conform to official UNIX POSIX API. I didn't think this was a reasonable interpretation since we can't be sure the author actually verified/tested other POSIX-like systems such as FreeBSD, HP-UX, IBM AIX, etc.
- (2) article is a looser union of operating systems and can also include idiosyncracies of certain systems like Linux that the author is familiar with that don't apply to all other UNIX systems. I think some readers don't realize that all the author's citations to man pages point to Linux specific urls at : https://linux.die.net/man/
The ggp's additional comment about renameat2() is useful info and is consistent with interpretation (2).
If the author really didn't want Linux to be lumped in with "POSIX-like", it seems he would avoid linux.die.net and instead point to something more of a UNIX standard like: https://unix.org/apis.html
AFAIK you don't even want to be POSIX-compliant unless having a sticker means more to you than being reasonable. Most projects knowingly steer away from compliance (and certifying compliance is probably also expensive)
Even though it can do some things atomically, it only does with one file at a time, and race conditions are still possible because it only does one operation at a time (even if you are only need one file). Some of these are helpful anyways, such as O_EXCL, but it is still only one thing at a time which can cause problems in some cases.
What else it does not do is a transaction with multiple objects. That is why, I would design a operating system, that you can do a transaction with multiple objects.
In some cases, you can start by using the "at" functions (openat...) to work on a directory tree. If you have your logical "locking" done at the top-level of the tree, it might be a fine option.
In some other cases, I've used a pattern where I used a symlink to folders. The symlink is created, resolved or updated atomically, and all I need is eventual consistency.
That last case was to manage several APT repository indices. The indices were constantly updated to publish new testing or unstable releases of software and machines in the fleet were regularly fetching the repository index. The APT protocol and structure being a bit "dumb" (for better or worse) requires you to fetch files (many of them) in the reverse order they are created, which leads to obvious issues like the signature is updated only after the list of files is updated, or the list of files is created only after the list of packages is created.
Long story short, each update would create a new folder that's consistent, and a symlink points to the last created folder (to atomically replace the folder as it was not possible to swap them), and a small HTTP server would initiate a server side session when the first file is fetched and only return files from the same index list, and everything is eventually consistent, and we never get APT complaining about having signature or hash mismatches. The pivotal component was indeed the atomicity of having a symlink to deal with it, as the Java implementation didn't have access to a more modern "openat" syscall, relative to a specific folder.
Windows had APIs for this sort of thing added in Vista, but they're now deprecating it "due to its complexity and various nuances which developers need to consider":
All you need for this to occur is the window where both renames occurs overlap. A system polling to check if a, b, c, and d exist while the renames are happening might find all four of them.
Assuming that the two `mv` commands are run in sequence, there shouldn't be any possibility for a and d to be observed "at once" (i.e. first d and then afterwards still a, by a single process).
I'm almost certain what the OP meant was if the commands were run synchronously (ie: from 2 different shells or as `mv a b &; mv c d`) yes there is a possibility that a and d exist (eg: On a busy system where neither of the 2 commands can be immediately scheduled and eventually the second one ends up being scheduled before the first)
Or to go a level deeper, if you have 2 occurrences of rename(2) from the stdlibc ...
rename('a', 'b');
rename('c', 'd');
...and the compiler decides on out of order execution or optimizing by scheduling on different cpus, you can get a and d existing at the same time.
The reason it won't happen in the example you posted is the shell ensures the atomicity (by not forking the second mv until the wait() on the first returns)
nitpick, it should be `touch a c & mv a b & mv c d` as `&;` returns `bash: syntax error near unexpected token `;'`. I always find this oddly weird, but that would not be the first pattern in BASH that is.
`inotifywait` actually sees them in order, but nothing ensure that it's that way.
$ inotifywait -m /tmp
/tmp/ MOVED_FROM a
/tmp/ MOVED_TO b
/tmp/ MOVED_FROM c
/tmp/ MOVED_TO d
`stat` tells us that the timestamps are equal as well.
$ stat b d | grep '^Change'
Change: 2026-02-06 12:22:55.394932841 +0100
Change: 2026-02-06 12:22:55.394932841 +0100
However, speeding things up changes it a bit.
Given
$ (
set -eo pipefail
for i in {1..10000}
do
printf '%d ' "$i"
touch a c
mv a b &
mv c d &
wait
rm b d
done
)
1 2 3 4 5 6 .....
And with `inotifywait` I saw this when running it for a while.
$ inotifywait -m -e MOVED_FROM,MOVED_TO /tmp > /tmp/output
cat /tmp/output | xargs -l4 | sort | uniq -c
9104 /tmp/ MOVED_FROM a /tmp/ MOVED_TO b /tmp/ MOVED_FROM c /tmp/ MOVED_TO d
896 /tmp/ MOVED_FROM c /tmp/ MOVED_TO d /tmp/ MOVED_FROM a /tmp/ MOVED_TO b
I don't think that's happening in practice, but 1) it may not be specified and 2) What you say could well be the persisted state after a machine crash or power loss. In particular if those files live in different directories.
You can remedy 2) by doing fsync() on the parent directory in between. I just asked ChatGPT which directory you need to fsync. It says it's both, the source and the target directory. Which "makes sense" and simplifies implementations, but it means the rename operation is atomic only at runtime, not if there's a crash in between. It think you might end up with 0 or 2 entries after a crash if you're unlucky.
If that's true, then for safety maybe one should never rename across directories, but instead do a coordinated link(source, target), fsync(target_dir), unlink(source), fsync(source_dir)
Not technically related to atomicity, but I was looking for a way to do arbitrary filesystem operations based on some condition (like adding a file to a directory, and having some operation be performed on it). The usual recommendation for this is to use inotify/watchman, but something about it seems clunky to me. I want to write a virtual filesystem, where you pass it a trigger condition and a function, and it applies the function to all files based on the trigger condition. Does something like this exist?
Sounds half baked. What context does this function run in? Is it an interpreted language or an executable that you provide?
Inotify is the way to shovel these events out of the kernel, then userspace process rules apply. It's maybe not elegant from your pov, but it's simple.
I know this is trivial to do programmatically, but I was looking for a way this will be handled by the filesystem. For instance, if I have some processes generating log files, and I have a script that converts them to html, I wanted the script to be called every time a log file is updated, without having a daemon running in the background to monitor the directory, just some filesystem mount. This would have made some deployments easier.
Thanks, I didn't find this when I was looking for a solution for my problem. This is pretty much the exact solution for my usecase, though for some reason inotify feels more complicated than some kind of filesystem mount solution for me.
This document being from 2010 is, of course, missing the C11/C++11 atomics that replaced the need for compiler intrinsics or non portable inline asm when "operating on virtual memory".
With that said, at least for C and C++, the behavior of (std::)atomic when dealing with interprocess interactions is slightly outside the scope of the standard, but in practice (and at least recommended by the C++ standard) (atomic_)is_lock_free() atomics are generally usable between processes.
Worked pretty well in production systems, serving huge amount of RPS (like ~5-10k/s) running on a LAMP stack monolith in five different geographical regions.
Just git branch (one branch per region because of compliance requirements) -> branch creates "tar.gz" with predefined name -> automated system downloads the new "tar.gz", checks release date, revision, etc. -> new symlink & php (serverles!!!) graceful restart and ka-b00m.
Rollbacks worked by pointing back to the old dir & restart.
not surprised about the chrome part, but pretty shocked at the phone OS part. I know APFS migration was done in this way, but wouldn't storage considerations for this be massive?
Not really, because only the OS core is swapped in this way. Apps and data live in their own partitions/subvolumes, which are mutable and shared between OS versions.
The OS core is deployed as a single unit and is a few GB in size, pretty small when internal storage is into the hundreds of GB.
Unless they can be guaranteed by the POSIX specification, they are implementation specific and should not be relied upon for portable code.
Missing (probably because of the date of the article): `mv --exchange` aka renameat2+RENAME_EXCHANGE. It atomically swaps 2 file paths.
I tried using this a while back and found it was not widely available. You need coreutils version 9.1 or later for this, many distros do not ship this.
I made https://github.com/rubenvannieuwpoort/atomic-exchange for my usecase.
Title says Unix, renameat2 is Linux-only.
>Title says Unix,
You're misinterpreting the title. The author didn't intend "Unix" to literally mean only the official AT&T/TheOpenGroup UNIX® System to the exclusion of Linux.
The first sentence of "UNIX-like" makes that clear : >This is a catalog of things UNIX-like/POSIX-compliant operating systems can do atomically,
Further down, he then mentions some Linux specifics : >fcntl(fd, F_GETLK, &lock), fcntl(fd, F_SETLK, &lock), and fcntl(fd, F_SETLKW, &lock) . [...] There is a “mandatory locking” mode but Linux’s implementation is unreliable as it’s subject to a race condition.
They aren’t misinterpreting the title, the title is incorrect.
Bit rot alert: Linux doesn't even have mandatory file locks these days.
Linux-specific open file description locks could be brought up in a modern version of TFA though.
Except POSIX doesn't specify some of them as happening atomically.
Many people write UNIX/POSIX without ever reading what it says.
But I also don't think the auther meant Things you can do in Linux but not Unix
>But I also don't think the auther meant Things you can do in Linux but not Unix
I wasn't claiming that. I just thought the ggp had a useful comment about renameat2() which led to gp's "correction" which wasn't 100% accurate.
IBM z/OS UNIX also has renameat2(). It doesn't have the Linux specific flag RENAME_EXCHANGE.
https://www.ibm.com/docs/en/zos/3.1.0?topic=functions-rename...
Sounds like the key term then is probably this:
> POSIX-compliant
Which, FWIW, doesn't mean Linux. AFAIK there is no Linux distro that's fully compliant, even before you worry about the specifics of whether it's certified as compliant.
>POSIX-compliant Which, FWIW, doesn't mean Linux. AFAIK there is no Linux distro that's fully compliant
I read author's use of "POSIX-compliant" as a loose and fuzzy family category rather than 100% strict compliance. Therefore, the author mentioning non-100%-compliant Linux is ok.
There seems to be 2 different expectations and interpretations of what the article is about.
- (1) article is attempting to be a strict intersection of all Unix-like systems that conform to official UNIX POSIX API. I didn't think this was a reasonable interpretation since we can't be sure the author actually verified/tested other POSIX-like systems such as FreeBSD, HP-UX, IBM AIX, etc.
- (2) article is a looser union of operating systems and can also include idiosyncracies of certain systems like Linux that the author is familiar with that don't apply to all other UNIX systems. I think some readers don't realize that all the author's citations to man pages point to Linux specific urls at : https://linux.die.net/man/
The ggp's additional comment about renameat2() is useful info and is consistent with interpretation (2).
If the author really didn't want Linux to be lumped in with "POSIX-like", it seems he would avoid linux.die.net and instead point to something more of a UNIX standard like: https://unix.org/apis.html
[0] Intersection vs Union: https://en.wikipedia.org/wiki/Set_(mathematics)#Intersection
AFAIK you don't even want to be POSIX-compliant unless having a sticker means more to you than being reasonable. Most projects knowingly steer away from compliance (and certifying compliance is probably also expensive)
EulerOS was certified UNIX some years ago.
Huh, TIL. Thanks.
You can use `ln` atomicity for a simple, portable(ish) locking system: https://gist.github.com/pwillis-els/b01b22f1b967a228c31db3cf...
Really nice explanation of a useful pattern. I was surprised to discover that even the famously broken NFS honours atomicity of hardlink creation.
>fcntl(fd, F_GETLK, &lock), fcntl(fd, F_SETLK, &lock), and fcntl(fd, F_SETLKW, &lock)
There's also `flock`, the CLI utility in util-linux, that allows using flocks in shell scripts.
In UNIX/POSIX file locks are advisory, not enforced, it only works if all processes play ball.
What are flocks in this context? Surely not a number of sheep...
File locks.
Even though it can do some things atomically, it only does with one file at a time, and race conditions are still possible because it only does one operation at a time (even if you are only need one file). Some of these are helpful anyways, such as O_EXCL, but it is still only one thing at a time which can cause problems in some cases.
What else it does not do is a transaction with multiple objects. That is why, I would design a operating system, that you can do a transaction with multiple objects.
In some cases, you can start by using the "at" functions (openat...) to work on a directory tree. If you have your logical "locking" done at the top-level of the tree, it might be a fine option.
In some other cases, I've used a pattern where I used a symlink to folders. The symlink is created, resolved or updated atomically, and all I need is eventual consistency.
That last case was to manage several APT repository indices. The indices were constantly updated to publish new testing or unstable releases of software and machines in the fleet were regularly fetching the repository index. The APT protocol and structure being a bit "dumb" (for better or worse) requires you to fetch files (many of them) in the reverse order they are created, which leads to obvious issues like the signature is updated only after the list of files is updated, or the list of files is created only after the list of packages is created.
Long story short, each update would create a new folder that's consistent, and a symlink points to the last created folder (to atomically replace the folder as it was not possible to swap them), and a small HTTP server would initiate a server side session when the first file is fetched and only return files from the same index list, and everything is eventually consistent, and we never get APT complaining about having signature or hash mismatches. The pivotal component was indeed the atomicity of having a symlink to deal with it, as the Java implementation didn't have access to a more modern "openat" syscall, relative to a specific folder.
Windows had APIs for this sort of thing added in Vista, but they're now deprecating it "due to its complexity and various nuances which developers need to consider":
https://learn.microsoft.com/en-us/windows/win32/fileio/about...
I don't follow, sorry. Are you saying that if we run:
We could observe a state where a and d exist? I would find such "out of order execution" shocking.If that's not what you're saying, could you give an example of something you want to be able to do but can't?
All you need for this to occur is the window where both renames occurs overlap. A system polling to check if a, b, c, and d exist while the renames are happening might find all four of them.
Assuming that the two `mv` commands are run in sequence, there shouldn't be any possibility for a and d to be observed "at once" (i.e. first d and then afterwards still a, by a single process).
I'm almost certain what the OP meant was if the commands were run synchronously (ie: from 2 different shells or as `mv a b &; mv c d`) yes there is a possibility that a and d exist (eg: On a busy system where neither of the 2 commands can be immediately scheduled and eventually the second one ends up being scheduled before the first)
Or to go a level deeper, if you have 2 occurrences of rename(2) from the stdlibc ...
rename('a', 'b'); rename('c', 'd');
...and the compiler decides on out of order execution or optimizing by scheduling on different cpus, you can get a and d existing at the same time.
The reason it won't happen in the example you posted is the shell ensures the atomicity (by not forking the second mv until the wait() on the first returns)
nitpick, it should be `touch a c & mv a b & mv c d` as `&;` returns `bash: syntax error near unexpected token `;'`. I always find this oddly weird, but that would not be the first pattern in BASH that is.
`inotifywait` actually sees them in order, but nothing ensure that it's that way.
`stat` tells us that the timestamps are equal as well. However, speeding things up changes it a bit.Given
And with `inotifywait` I saw this when running it for a while.I don't think that's happening in practice, but 1) it may not be specified and 2) What you say could well be the persisted state after a machine crash or power loss. In particular if those files live in different directories.
You can remedy 2) by doing fsync() on the parent directory in between. I just asked ChatGPT which directory you need to fsync. It says it's both, the source and the target directory. Which "makes sense" and simplifies implementations, but it means the rename operation is atomic only at runtime, not if there's a crash in between. It think you might end up with 0 or 2 entries after a crash if you're unlucky.
If that's true, then for safety maybe one should never rename across directories, but instead do a coordinated link(source, target), fsync(target_dir), unlink(source), fsync(source_dir)
why is this being downvoted? If there's something wrong, explain?
I use several of these to implement alternative SQLite locking protocols.
POSIX file locking semantics really are broken beyond repair: https://news.ycombinator.com/item?id=46542247
rename() is certainly the easiest to use for any sort of file-system based synchronization.
Not much apparently, although I didn't know about changing symlinks, that could be very useful.
Anywhere there is atomic capability you can build a queuing application.
Not technically related to atomicity, but I was looking for a way to do arbitrary filesystem operations based on some condition (like adding a file to a directory, and having some operation be performed on it). The usual recommendation for this is to use inotify/watchman, but something about it seems clunky to me. I want to write a virtual filesystem, where you pass it a trigger condition and a function, and it applies the function to all files based on the trigger condition. Does something like this exist?
I've used FUSE for something similar.
There are sample "drivers" in easily-modified python that are fast enough for casual use.
Sounds half baked. What context does this function run in? Is it an interpreted language or an executable that you provide?
Inotify is the way to shovel these events out of the kernel, then userspace process rules apply. It's maybe not elegant from your pov, but it's simple.
are you asking for if statements?
if(condition) {do the thing;}
I know this is trivial to do programmatically, but I was looking for a way this will be handled by the filesystem. For instance, if I have some processes generating log files, and I have a script that converts them to html, I wanted the script to be called every time a log file is updated, without having a daemon running in the background to monitor the directory, just some filesystem mount. This would have made some deployments easier.
incron
Thanks, I didn't find this when I was looking for a solution for my problem. This is pretty much the exact solution for my usecase, though for some reason inotify feels more complicated than some kind of filesystem mount solution for me.
This document being from 2010 is, of course, missing the C11/C++11 atomics that replaced the need for compiler intrinsics or non portable inline asm when "operating on virtual memory".
With that said, at least for C and C++, the behavior of (std::)atomic when dealing with interprocess interactions is slightly outside the scope of the standard, but in practice (and at least recommended by the C++ standard) (atomic_)is_lock_free() atomics are generally usable between processes.
Sorry, there is zero chance I will ever deploy new code by changing a symlink to point to the new directory.
Isn't that the standard way to do that? Why wouldn't you?
Why? What do you prefer to do instead?
Anything less than an entire new k8s cluster and switching over is just amateur hour obviously
why? it works and its super clever. Simple command instead some shit written in JS with docker trash
Ah, the memories of capistrano, complete with zero-downtime unicorn handover
https://github.com/capistrano/capistrano/
Still use php deployer each day and works with symlinks as well. https://deployer.org/
Then you are locking yourself out of a pretty much ironclad (and extremely cost-effective) way of managing such things.
Works pretty well for Nix
Worked pretty well in production systems, serving huge amount of RPS (like ~5-10k/s) running on a LAMP stack monolith in five different geographical regions.
Just git branch (one branch per region because of compliance requirements) -> branch creates "tar.gz" with predefined name -> automated system downloads the new "tar.gz", checks release date, revision, etc. -> new symlink & php (serverles!!!) graceful restart and ka-b00m.
Rollbacks worked by pointing back to the old dir & restart.
Worked like a charm :-)
And for Stow[1] before it, and for its inspiration Depot[2] before even that. It’s an old idea.
[1] https://www.gnu.org/software/stow/
[2] http://ftp.gregor.com/download/dgregor/depot.pdf
I really liked stow. My toy distro back in the day was based on it.
that's how some phone OSes update the system (by having 2 read only fs)
that's how Chrome updates itself, but without the symlink part
No snapshotting at all? Thinking about it.. The filesystem does not support it I suppose.
Android does use snapshots: https://source.android.com/docs/core/ota/virtual_ab
Oh cool. I was a bit confused about not using snapshots and relying on symlinks but it couldn't be so simple. I guess it's just a simple userspace cow mount. https://source.android.com/docs/core/ota/virtual_ab#compress...
not surprised about the chrome part, but pretty shocked at the phone OS part. I know APFS migration was done in this way, but wouldn't storage considerations for this be massive?
what would be more massive would be phones not booting up because of a botched update. this way you can just switch back to the old partition
Not really, because only the OS core is swapped in this way. Apps and data live in their own partitions/subvolumes, which are mutable and shared between OS versions.
The OS core is deployed as a single unit and is a few GB in size, pretty small when internal storage is into the hundreds of GB.
Nobody's saying you should deploy code with this, but symlinks are a very common filesystem locking method.