NOCOW and Swap Files?

Discussion:

Robert White

2014-10-22 20:08:48 UTC

So the documentation is clear that you can't mount a swap file through
BTRFS (unless you use a loop device).

Why isn't a NOCOW file that has been fully pre-allocated -- as with
fallocate(1) -- suitable for swapping?

I found one reference to an unimplemented feature necessary for swap,
but wouldn't it be reasonable for that feature to exist for NOCOW files?
(or does this relate to my previous questions about the COW operation
that happens after a snapshot?)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hugo Mills

2014-10-22 20:25:19 UTC

Post by Robert White
So the documentation is clear that you can't mount a swap file
through BTRFS (unless you use a loop device).
Why isn't a NOCOW file that has been fully pre-allocated -- as with
fallocate(1) -- suitable for swapping?
I found one reference to an unimplemented feature necessary for
swap, but wouldn't it be reasonable for that feature to exist for
NOCOW files? (or does this relate to my previous questions about the
COW operation that happens after a snapshot?)

The original swap implementation worked by determining a list of
blocks (well, I guess extents) using fiemap, and passing that to the
swap code for it to use. This is fine, as long as (a) nobody else
writes to the file, and (b) the blocks comprising the file don't move
elsewhere.
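
To make the fiemap-based approach concrete, here is a minimal Python sketch. It assumes a 4 KiB page size and extents already expressed in bytes; the `Extent` type and `build_swap_map` helper are hypothetical, not kernel or btrfs code. It turns the file's extent list into a per-page table of device offsets once, which is what lets swap I/O later bypass the filesystem entirely. It also rejects gaps between extents, since every page of the file has to be backed by allocated space.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this illustration


@dataclass(frozen=True)
class Extent:
    """One file extent, roughly as a FIEMAP query reports it (all in bytes)."""
    logical: int   # offset within the file
    physical: int  # offset on the underlying block device
    length: int


def build_swap_map(extents: list[Extent]) -> list[int]:
    """Return the device offset of every page of the file, in file order.

    Computed once, at swapon time; afterwards swap I/O goes straight to
    these offsets without asking the filesystem again.
    """
    slots: list[int] = []
    expected = 0  # next file offset we need to see
    for ext in sorted(extents, key=lambda e: e.logical):
        if ext.logical != expected:
            raise ValueError("gap between extents: every page must be allocated")
        if ext.length % PAGE_SIZE or ext.physical % PAGE_SIZE:
            raise ValueError("extent is not page aligned")
        for off in range(0, ext.length, PAGE_SIZE):
            slots.append(ext.physical + off)
        expected = ext.logical + ext.length
    return slots


print(build_swap_map([Extent(0, 1_048_576, 8192), Extent(8192, 4_194_304, 4096)]))
# [1048576, 1052672, 4194304]
```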

Part (a) can be done with normal permissions, so that's not a
problem.

Part (b) is trickier -- not because of CoW (because the writes
from the swap code go directly to the device, ignoring the FS), but
because the FS's idea of where the file lives on the device can move
-- balance will do this, for example. So you can't balance a
filesystem with any active swapfiles on it. This is the main reason
that swapfiles aren't allowed on btrfs, as far as I know.
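
A toy Python model of why part (b) bites: the block list is captured once at swapon, then a balance-like `relocate` moves a block and updates only the filesystem's own map. The `ToyFilesystem` class and all offsets are made up for illustration; real btrfs relocation is far more involved.

```python
class ToyFilesystem:
    """Toy model (not btrfs code): file page offset -> device byte offset."""

    def __init__(self, mapping: dict[int, int]) -> None:
        self.mapping = dict(mapping)
        self.device: dict[int, bytes] = {}

    def relocate(self, file_off: int, new_dev_off: int) -> None:
        """Mimic what balance does: copy the data, then update the FS's own map."""
        old_dev_off = self.mapping[file_off]
        self.device[new_dev_off] = self.device.pop(old_dev_off, b"")
        self.mapping[file_off] = new_dev_off

    def read(self, file_off: int) -> bytes:
        """Read a block the way the filesystem sees it."""
        return self.device.get(self.mapping[file_off], b"")


def swap_write_with_stale_map() -> tuple[bytes, bool]:
    fs = ToyFilesystem({0: 100, 4096: 200})
    swap_map = dict(fs.mapping)       # block list captured once, at swapon
    fs.relocate(0, 900)               # a balance moves the first block
    fs.device[swap_map[0]] = b"page"  # swap I/O bypasses the FS, uses the old offset
    # The FS no longer points at offset 100, so the page is not where the
    # file says it is; in a real FS that freed space could be reused.
    return fs.read(0), 100 in fs.mapping.values()


print(swap_write_with_stale_map())  # (b'', False)
```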

The new code is the swap-on-NFS infrastructure, which indirects
swapfile accesses through the filesystem code. The reason you have to
do that with NFS is that NFS doesn't expose a block device at all,
so you can't get a list of blocks on an underlying device because
there isn't one. Indirecting the accesses through the filesystem,
however, allows us to side-step btrfs's problems with part (b) above,
and in theory gives us swapfile capability.
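
By contrast, if every swap access asks the filesystem where the block lives at that moment, relocation stops being a problem. This is another toy sketch with a hypothetical `ToyFilesystem`; it illustrates only the indirection idea and does not model the actual kernel interfaces used by swap-over-NFS.

```python
class ToyFilesystem:
    """Toy model (not kernel code): file offset -> device offset, movable."""

    def __init__(self, mapping: dict[int, int]) -> None:
        self.mapping = dict(mapping)
        self.device: dict[int, bytes] = {}

    def relocate(self, file_off: int, new_dev_off: int) -> None:
        """Like a balance: move the data and update the FS's own map."""
        old = self.mapping[file_off]
        self.device[new_dev_off] = self.device.pop(old, b"")
        self.mapping[file_off] = new_dev_off

    # Indirected swap I/O: each access asks the FS where the block lives *now*.
    def swap_out(self, file_off: int, data: bytes) -> None:
        self.device[self.mapping[file_off]] = data

    def swap_in(self, file_off: int) -> bytes:
        return self.device[self.mapping[file_off]]


fs = ToyFilesystem({0: 100, 4096: 200})
fs.swap_out(0, b"page")
fs.relocate(0, 900)    # balance runs while the swap file is active
print(fs.swap_in(0))   # b'page' -- the lookup happens per access, so it follows the move
```

The price is an extra trip through the filesystem on every swap I/O, which is the trade-off the NFS case already has to accept.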

Hugo.

--
=== Hugo Mills: ***@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Great oxymorons of the world, no. 7: The Simple Truth ---

Robert White

2014-10-22 20:39:58 UTC

Post by Hugo Mills
The new code is the swap-on-NFS infrastructure, which indirects
swapfile accesses through the filesystem code. The reason you have to
do that with NFS is that NFS doesn't expose a block device at all,
so you can't get a list of blocks on an underlying device because
there isn't one. Indirecting the accesses through the filesystem,
however, allows us to side-step btrfs's problems with part (b) above,
and in theory gives us swapfile capability.

I was not even aware there was "new code" on the matter.

Is there a guide or whatever to doing this? I didn't see any mention of
it in the places Google led me.

--Rob

Hugo Mills

2014-10-22 20:42:31 UTC

Post by Robert White

I was not even aware there was "new code" on the matter.
Is there a guide or whatever to doing this? I didn't see any mention
of it in the places Google led me.

swap-on-NFS is still, I think, in a set of out-of-tree patches, and
it's not gone anywhere near btrfs yet. It's just that once it does
land in mainline, it would form the appropriate infrastructure to
develop swapfile capability for btrfs.

Hugo.

Robert White

2014-10-22 20:48:44 UTC

Post by Hugo Mills
swap-on-NFS is still, I think, in a set of out-of-tree patches, and
it's not gone anywhere near btrfs yet. It's just that once it does
land in mainline, it would form the appropriate infrastructure to
develop swapfile capability for btrfs.

I just looked at my 3.16.6 kernel tree and there is a check-box for swap
over NFS in the network file systems menu. For whatever that's worth.

But as of now only the loopdev method is workable if I understand you
correctly.

Thanks.

--Rob.

Russell Coker

2014-10-23 07:34:32 UTC

Also it would be nice to have checksums on the swap data. It's a bit of a waste to pay for ECC RAM and then lose the ECC benefits as soon as data is paged out.
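
One way to picture the checksum idea is a hypothetical sketch that records a checksum per swap slot at page-out and verifies it at page-in. It uses `zlib.crc32` only because Python's standard library has no CRC32C (the default btrfs checksum); the `ChecksummedSwap` class is an illustration, not an existing kernel feature.

```python
import zlib


class ChecksummedSwap:
    """Hypothetical sketch: checksum each slot at page-out, verify at page-in."""

    def __init__(self) -> None:
        self.slots: dict[int, bytes] = {}    # stand-in for the swap device
        self.checksums: dict[int, int] = {}  # kept in RAM, not on the device

    def page_out(self, slot: int, page: bytes) -> None:
        self.checksums[slot] = zlib.crc32(page)
        self.slots[slot] = page

    def page_in(self, slot: int) -> bytes:
        page = self.slots[slot]
        if zlib.crc32(page) != self.checksums[slot]:
            raise OSError(f"checksum mismatch in swap slot {slot}")
        return page


swap = ChecksummedSwap()
swap.page_out(7, b"\x00" * 4096)
swap.slots[7] = b"\x01" + b"\x00" * 4095  # simulate one flipped bit on the device
try:
    swap.page_in(7)
except OSError as err:
    print(err)  # checksum mismatch in swap slot 7
```

Keeping the checksums in RAM means they stay under ECC protection, at a cost of one 32-bit value per swapped-out page.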

Austin S Hemmelgarn

2014-10-23 11:22:01 UTC

Post by Robert White
So the documentation is clear that you can't mount a swap file through
BTRFS (unless you use a loop device).
Why isn't a NOCOW file that has been fully pre-allocated -- as with
fallocate(1) -- suitable for swapping?
I found one reference to an unimplemented feature necessary for swap,
but wouldn't it be reasonable for that feature to exist for NOCOW files?
(or does this relate to my previous questions about the COW operation
that happens after a snapshot?)

I actually use a swapfile on BTRFS on a regular basis on my laptop
(trying to keep the number of partitions to a minimum, because I dual boot
Windows), and here's what the init script I use for it does:
1. Remove any old swap file (the fs is on an SSD, so I do this mostly to
get the discard operation).
2. Use touch to create a new file.
3. Use chattr to mark the file NOCOW.
4. Use fallocate to pre-allocate the space for the file.
5. Bind the file to a loop device.
6. Format as swap and add as swapspace.
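
The six steps above could be scripted roughly as follows. This is a sketch, not Austin's actual init script: it assumes root privileges and the standard command-line tools (`chattr`, `fallocate`, `losetup`, `mkswap`, `swapon`), and the path and size in the dry run are hypothetical. The order matters: `chattr +C` is applied while the file is still empty, because btrfs only reliably applies NOCOW to files that do not yet contain data.

```python
import subprocess
from typing import Callable

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[list[str]], str]


def run(argv: list[str]) -> str:
    """Run a command for real (needs root), raising on a non-zero exit."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def setup_loop_swap(path: str, size: str, runner: Runner = run) -> str:
    """Recreate a NOCOW swap file on btrfs and enable it through a loop device.

    Returns the loop device that was used.
    """
    runner(["rm", "-f", path])               # 1. drop the old file (freed space can be discarded)
    runner(["touch", path])                  # 2. create an empty file
    runner(["chattr", "+C", path])           # 3. mark NOCOW while the file is still empty
    runner(["fallocate", "-l", size, path])  # 4. preallocate the full size
    loopdev = runner(["losetup", "--find", "--show", path]).strip()  # 5. bind to a free loop device
    runner(["mkswap", loopdev])              # 6a. format as swap
    runner(["swapon", loopdev])              # 6b. add as swap space
    return loopdev


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(argv: list[str]) -> str:
        print(" ".join(argv))
        return "/dev/loopN\n" if argv[0] == "losetup" else ""

    setup_loop_swap("/var/swapfile", "4G", show)
```

Injecting the runner keeps the sequence testable without root; swap in `run` only on a machine where you actually want the swap file created.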

This works very reliably for me. For my use case the overhead of the
loop device is relatively insignificant (because my disk is actually
faster than my RAM), and I can safely balance/defrag/fstrim the
filesystem without causing issues with the swap file.

If you can avoid using a swapfile though, I would suggest doing so,
regardless of which FS you are using. I actually use a 4-disk RAID-0
LVM volume for swap on my desktop, and it gets noticeably better
performance than a swap file.