EBS volumes with identical UUIDs + btrfs

Discussion:

Brandon Philips

2014-04-29 21:44:08 UTC

Hello All-

I attached an AWS EBS volume to `xvdh` that was from a terminated EC2
machine to another machine. The filesystem shared a btrfs UUID since
they came from an identical install. When I mounted the new EBS volume
to /mnt something very odd happened:

Before:

$ mount
/dev/xvda9 on / type btrfs (rw,relatime,ssd,space_cache)
/dev/xvda3 on /usr type ext4 (ro,relatime)

After:

# mount /dev/xvdh9 /mnt

# mount
/dev/xvdh9 on / type btrfs (rw,relatime,ssd,space_cache)
/dev/xvdh9 on /mnt type btrfs (rw,relatime,ssd,space_cache)

It seems that btrfs gets very confused when there are matching UUIDs
and /mnt didn't contain the contents that I expected. To work around
the issue I booted a non-identical machine image that had a different
btrfs UUID and attached the backup EBS volume again and everything
worked as expected.

What is the right way of handling this? Attaching EBS volumes from
snapshots or old identical machines is a common use case.

Thanks!

Brandon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hugo Mills

2014-04-29 22:21:39 UTC

Post by Brandon Philips
Hello All-
I attached an AWS EBS volume to `xvdh` that was from a terminated EC2
machine to another machine. The filesystem shared a btrfs UUID since
they came from an identical install. When I mounted the new EBS volume
to /mnt something very odd happened:
Before:
$ mount
/dev/xvda9 on / type btrfs (rw,relatime,ssd,space_cache)
/dev/xvda3 on /usr type ext4 (ro,relatime)
After:
# mount /dev/xvdh9 /mnt
# mount
/dev/xvdh9 on / type btrfs (rw,relatime,ssd,space_cache)
/dev/xvdh9 on /mnt type btrfs (rw,relatime,ssd,space_cache)
It seems that btrfs gets very confused when there are matching UUIDs
and /mnt didn't contain the contents that I expected. To work around
the issue I booted a non-identical machine image that had a different
btrfs UUID and attached the backup EBS volume again and everything
worked as expected.
What is the right way of handling this?

The only solution that there is right now is, "don't do that".
btrfs basically assumes that if several block devices have the same
UUID in their btrfs superblocks, they're different parts of the same
filesystem. If they're actually clones of the same filesystem, then it
has problems, and can _really_ screw things up, as you've discovered.
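
As a minimal sketch of checking for this before mounting (a check, not a
fix): the fsid sits in the primary superblock, 64 KiB into the device,
right after the 32-byte checksum field, so it can be read and compared
from userspace. The function names below are made up for illustration
and are not part of any btrfs tool.

```python
import uuid
from collections import defaultdict

SUPERBLOCK_OFFSET = 64 * 1024   # primary btrfs superblock starts at 64 KiB
FSID_OFFSET = 32                # fsid follows the 32-byte checksum field
MAGIC_OFFSET = 64               # magic follows fsid, bytenr and flags
BTRFS_MAGIC = b"_BHRfS_M"


def read_btrfs_fsid(path):
    """Return the filesystem UUID from the primary superblock, or None
    if the device or image does not carry a btrfs superblock."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        sb = dev.read(MAGIC_OFFSET + len(BTRFS_MAGIC))
    if len(sb) < MAGIC_OFFSET + len(BTRFS_MAGIC):
        return None
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + len(BTRFS_MAGIC)] != BTRFS_MAGIC:
        return None
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])


def find_shared_fsids(paths):
    """Group devices by fsid and return only the fsids seen more than once."""
    by_fsid = defaultdict(list)
    for path in paths:
        fsid = read_btrfs_fsid(path)
        if fsid is not None:
            by_fsid[fsid].append(path)
    return {fsid: devs for fsid, devs in by_fsid.items() if len(devs) > 1}
```

Run as root against the two partitions from the report, e.g.
find_shared_fsids(["/dev/xvda9", "/dev/xvdh9"]), a non-empty result would
be the cue not to mount. A shared fsid is also what a genuine
multi-device filesystem looks like, so a hit means "look closer", not
proof of a clone.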

The closest thing to a "good" solution that's been proposed so far
is to have a tool that will scan the metadata on a block device (or a
set of block devices making up a filesystem) and rewrite the FS UUID
embedded in every metadata block. This is likely to be expensive.

To do the conversion, you'll have to either (a) load the chunk tree
and only scan the metadata chunks, or (b) scan the whole FS for things
that look like metadata blocks and convert every block you find. In
either case, you'll have to supply exact names for the block device(s)
to convert -- preferably as a whole (particularly in case (a), where
you need all that info to find the current chunk tree).

Option (a) is useful if you already have the clones -- but given
the behaviour of most udev installations these days, that's already
got you in a dangerous position, because udev has probably already
detected the new devices and run btrfs dev scan on them. Option (b) is
handy if you want to treat the image as a stream (e.g. dd if=/dev/sda
| btrfs fi set-uuid --stream | dd of=/dev/sdb)
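
The scanning half of option (b) can be sketched as a pass over the
image stream. This is detection only, and it rests on assumptions:
that metadata blocks start on 4096-byte boundaries, and that each
block header carries the fsid right after a 32-byte checksum field.
candidate_metadata_offsets and BLOCK_ALIGN are illustrative names, not
part of any existing tool.

```python
BLOCK_ALIGN = 4096   # assumed alignment of metadata blocks in the image
FSID_OFFSET = 32     # assumed: header stores the fsid after a 32-byte checksum
FSID_LEN = 16


def candidate_metadata_offsets(stream, fsid, align=BLOCK_ALIGN):
    """Yield byte offsets of aligned blocks whose header field matches fsid.

    `stream` is expected to be a buffered binary stream, so that read()
    returns a full block except at end of input.
    """
    if len(fsid) != FSID_LEN:
        raise ValueError("fsid must be 16 bytes")
    offset = 0
    while True:
        block = stream.read(align)
        if len(block) < FSID_OFFSET + FSID_LEN:
            break  # end of stream, or a tail too short to hold a header
        if block[FSID_OFFSET:FSID_OFFSET + FSID_LEN] == fsid:
            yield offset
        offset += len(block)
```

A real converter would also have to write the new fsid into each block
it finds and recompute that block's checksum, neither of which is shown
here. Data blocks that happen to hold the same 16 bytes at that offset
would show up as false positives, which is one reason option (a) is
the more careful approach.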

Needless to say, neither of these has actually been implemented
yet.

Hugo.

--
=== Hugo Mills: ***@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If you're not part of the solution, you're part ---
of the precipitate.

Brandon Philips

2014-04-29 23:51:29 UTC

Hello Hugo-

Thanks for the helpful explanation. This is what I assumed was
happening and it is great to have a clarification. More comments
inline.

Post by Hugo Mills
The only solution that there is right now is, "don't do that".

This use case, attaching an old block device for recovery or
debugging, is common on a majority of the "cloud" platforms. And
starting from an identical btrfs UUID is common with the prevalence of
the "install, snapshot and replicate the VM" model. So, this leaves us
with two options, as far as I can see:

1. mkfs on first boot of an instance and copy any files over
2. rewrite the UUID on first boot of an instance

Post by Hugo Mills
btrfs basically assumes that if several block devices have the same
UUID in their btrfs superblocks, they're different parts of the same
filesystem. If they're actually clones of the same filesystem, then it
has problems, and can _really_ screw things up, as you've discovered.

Best guess: which would be more expensive, a mkfs and copy, or the
tree walk and rewrite? Say I have 40 megabytes of initial data on the
btrfs filesystem that would need to be copied in case 1.

Thanks!

Brandon