Discussion:
btrfs data dup on single device?
Daniel Landstedt
2014-06-25 07:25:57 UTC
Will it be possible to use DUP for data as well as for metadata on a
single device?
And if so, am I going to be able to specify more than 1 copy of the data?

Storage is pretty cheap now, and to have multiple copies in btrfs is
something that I think could be used a lot. I know I will use multiple
copies of my data if made possible.

Is it something that might be available when RAID1 gets N mirrors
instead of just 1 mirror?



Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hugo Mills
2014-06-25 07:47:13 UTC
Post by Daniel Landstedt
Will it be possible to use DUP for data as well as for metadata on a
single device?
This has variously been possible and not over the last few years. I
think it's finally come down on the side of "not", but by all means try
it (mkfs.btrfs -d dup).
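One low-risk way to "try it" is to point mkfs.btrfs at a scratch image file rather than a real disk. The sketch below assumes a Linux system with btrfs-progs installed; the 1 GiB size and the IMG and RESULT variable names are illustrative choices, not anything from this thread. Whether the option is accepted depends on the installed btrfs-progs version.

```shell
#!/bin/sh
# Sketch: check whether the installed mkfs.btrfs accepts "-d dup" on a single
# device, using a scratch image file so that no real disk is touched.
IMG=$(mktemp)    # hypothetical scratch file standing in for a device

# Extend the empty file to 1 GiB without writing any data (sparse file).
dd if=/dev/zero of="$IMG" bs=1 count=0 seek=1073741824 2>/dev/null

if command -v mkfs.btrfs >/dev/null 2>&1; then
    if mkfs.btrfs -d dup "$IMG" >/dev/null 2>&1; then
        RESULT="accepted"
    else
        RESULT="rejected"
    fi
else
    RESULT="mkfs.btrfs not installed"
fi

echo "-d dup on a single device: $RESULT"
# Remove the scratch file afterwards with: rm -f "$IMG"
```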
Post by Daniel Landstedt
And if so, am I going to be able to specify more than 1 copy of the data?
It'll be exactly 2 copies at the moment. Note that performance on
an SSD will at least halve, and performance on a rotational device
will probably suck quite badly. Neither will help you in the case of a
full-device failure. You still need backups, kept on a separate machine.
Post by Daniel Landstedt
Storage is pretty cheap now, and to have multiple copies in btrfs is
something that I think could be used a lot. I know I will use multiple
copies of my data if made possible.
The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.
Post by Daniel Landstedt
Is it something that might be available when RAID1 gets N mirrors
instead of just 1 mirror?
The n-copies code will probably support n-copies DUP as well.
There's no reason particularly to restrict it that way.

Hugo.
--
=== Hugo Mills: ***@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Do not meddle in the affairs of wizards, for they are subtle, ---
and quick to anger.
Daniel Landstedt
2014-06-25 09:00:58 UTC
Post by Hugo Mills
It'll be exactly 2 copies at the moment. Note that performance on
an SSD will at least halve, and performance on a rotational device
will probably suck quite badly. Neither will help you in the case of a
full-device failure. You still need backups, kept on a separate machine.
Write performance, sure, but reads shouldn't be that much slower?
For DUP on same device I was thinking about family photos, source code
and such, not for compiles or databases with a lot of queries.
Of course you need backups, offsite backups.. I had a fire a couple of
years ago, and, well.. If the second machine also is in the vicinity..
We were lucky this time, but a couple of more minutes and all would
have been lost. Got me thinking a bit more.
Post by Hugo Mills
The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.
I was thinking that DUP on same device was mostly for protection
against bit rot and smaller errors, not device failure.
If the device starts to misbehave, it might be enough to rescue the
data to another device if you have DUPes. Ok, a backup will probably
help there too.

I'm putting together a new server at home, and want the checksums in
btrfs, and multiple copies of the important data. As I understand it,
that's better than the RAID6 I used earlier, which has its own set of
problems.
And multiple offsite backups.

I'll try and see if it's possible to use DUP for data on the same device;
when I looked around, it seemed as if it wasn't possible.
Post by Hugo Mills
Hugo.
Daniel
Imran Geriskovan
2014-06-25 13:33:54 UTC
Post by Hugo Mills
Post by Daniel Landstedt
Storage is pretty cheap now, and to have multiple copies in btrfs is
something that I think could be used a lot. I know I will use multiple
copies of my data if made possible.
The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.
Because btrfs single data profile can detect bitrot but can not recover
from it. Hardware Raid may be the solution. But you can not use it on
a laptop, or backup usb drive. However, you can still have 2 partitions
and mount them as Raid1.

Of course we all have backups. But the loss of certain files in a big
file set may go unnoticed unless you scan through the whole
backup log each time.

You will definitely lose some files unless you keep 5-10 years of
incremental backups. Even if you keep them, they are susceptible
to bit rot too.

Thus, there is definitely a need for ensured/enhanced data integrity.

Note that the deduplication features of modern drives make duplication
useless unless you use an encrypted disk.

Regards, Imran
Christoph Anton Mitterer
2014-06-25 14:41:46 UTC
Post by Hugo Mills
This has variously been possible and not over the last few years. I
think it's finally come down on the side of "not",
I think that would really be a loss... :(
Post by Hugo Mills
The question is, why?
Well imagine you have some computer which can only have one disk drive
(laptop, etc.) and you still want at least some kind of redundancy
against bit rot errors.


IMO, btrfs should support most flavours out there...
- n-way duplicates on the same device (and not just DUP with n=2)
- n-way mirrors on multiple devices (i.e. what we have right now with
RAID1, plus up to classic RAID1 with copies on each device)
- RAID5/6
- n-way striped+parity with n>2
- "stacked" layouts (RAID 10 as e.g. MD has it,... RAID50, 60)


And terminology should really be re-worked... IMHO it's very bad to use
the term RAID1, if it's not what classic RAID1 does.


Cheers,
Chris.
Konstantinos Skarlatos
2014-06-25 22:26:17 UTC
Post by Christoph Anton Mitterer
This has variously been possible and not over the last few years. I
think it's finally come down on the side of "not",
I think that would really be a loss... :(
The question is, why?
Well imagine you have some computer which can only have one disk drive
(laptop, etc.) and you still want at least some kind of redundancy
against bit rot errors.
IMO, btrfs should support most flavours out there...
- n-way duplicates on the same device (and not just DUP with n=2)
For the same device there is also erasure coding, where you lose, let's
say, 10% of capacity and gain the ability to recover from the most
probable disk errors that don't take the whole disk with them: bad sectors.
Post by Christoph Anton Mitterer
- n-way mirrors on multiple devices (i.e. what we have right now with
RAID1, plus up to classic RAID1 with copies on each device)
- RAID5/6
- n-way striped+parity with n>2
- "stacked" layouts (RAID 10 as e.g. MD has it,... RAID50, 60)
And terminology should really be re-worked... IMHO it's very bad to use
the term RAID1, if it's not what classic RAID1 does.
Cheers,
Chris.
--
Konstantinos Skarlatos

Chris Murphy
2014-06-25 18:55:36 UTC
Post by Hugo Mills
The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.
Yeah basically -d dup tells me the user believes "I do not trust the media that much". Specifically they believe the media surface variability is what they are suspicious of, not the read/write head, actuator, or spindle, or motor. And a.) I don't know how they can possibly have reliable information to arrive at this kind of suspicion; b.) why bother with such crap hardware?

Chris Murphy

Imran Geriskovan
2014-06-26 21:13:52 UTC
Post by Chris Murphy
Post by Hugo Mills
The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.
Yeah basically -d dup tells me the user believes "I do not trust the media
that much". Specifically they believe the media surface variability is what
they are suspicious of, not the read/write head, actuator, or spindle, or
motor. And a.) I don't know how they can possibly have reliable information
to arrive at this kind of suspicion; b.) why bother with such crap
hardware?
Chris Murphy
Does "Dup Metadata" need to tell anything to anyone?

Regards, Imran
Duncan
2014-06-25 10:48:39 UTC
Post by Daniel Landstedt
Will it be possible to use DUP for data as well as for metadata on a
single device?
See Hugo's answer for the general case. I've learned a lot from him. =:^)

While I believe, as he says, that the general answer is no, there are a
couple of ways around that which he doesn't mention. Tho as he warns,
you'll see a performance drop as a result.

1) Btrfs has what's called mixed-bg (block group) mode, which combines
data and metadata in the same chunks instead of creating separate chunks
for data and metadata. Mixed-mode was designed for the small btrfs use-
case and is the default on btrfs of 1 GiB or under, but can be specified
on larger btrfs as well. The fact that mixed mode allows dup data is a
side effect of the fact that mixed-mode chunks are shared data/metadata,
but in this case it's a useful side effect. =:^)

Tho mixed-mode does come with a performance cost, some people recommend
using it on btrfs up to 32-64 GiB (and perhaps up to 128 GiB) anyway,
because it /does/ eliminate data-vs-metadata allocation issues that tend
to be worse on such small filesystems. Of course you can specify it on
larger btrfs as well, but my understanding is that the performance gap
between mixed and normal mode is barely noticeable on small filesystems,
while on filesystems above 100 GiB or so the loss becomes more
noticeable.

Mixed-mode chunk size is (like metadata chunks in normal mode) 256 MiB.
(FWIW data chunks are 1 GiB in normal mode.)

Mixed-mode must be specified at mkfs.btrfs time, using the -M/--mixed
option, and if you specify replication mode at the same time instead of
simply taking the default, you'll need to specify both -m/--metadata and
-d/--data replication mode as the same thing. Mixed-vs-separate data/
metadata is configured separately from replication mode, so it's possible
to configure mixed single mode or mixed dup mode on a single device, and
of course mixed mode with the various raid modes on multiple device
filesystems.

For your case, the mkfs.btrfs would be invoked with:

--mixed --data dup --metadata dup

2) The other way to do it would be to create two separate partitions,
presumably the same size, on the same physical device, and treat them as
if they were two separate devices. This would allow you to configure
btrfs for multi-device raid1 mode. Of course you could do the same with
raid0 mode or, with more partitions, raid5, raid6, or raid10 modes, but
that would needlessly complicate things to no purpose.

But there /is/ a narrow purpose to dual-device raid1 mode, where both
"devices" are partitions on the same physical device -- precisely the one
under discussion here, data replication on a single (physical) device.

Unlike the mixed-mode above, that would give you split data/metadata mode
on a single (physical) device, with full 1 GiB size data chunks.

On spinning rust media this would arguably be incrementally more
reliable, since it would force the two copies to separate parts of the
physical media, a good thing if one portion of the media happens to be
weaker than the rest. However, seek costs would be measurably higher, so
performance would likely be measurably lower.

On SSD, the FTL layer relocates blocks at will anyway, so there's less
benefit to single-physical-device raid1 mode there. But at the same time
there's zero seek cost, so writes should take exactly 2X penalty
(compared to single device single mode) since you're doing 2X the
writing, while reads should be essentially 0 penalty, since (as long as
the checksums verify) btrfs will read only one copy effectively at random.

Of course the 2X data costs will halve effective filesystem capacity in
either case, same as with mixed-mode.

** That *DOES* assume that your SSD doesn't do internal compression/
deduplication, of course. Some SSD firmware (sandforce based firmware is
the commonly known case) does do compression/deduplication, in which case
neither dup mode nor raid1 mode will get you the desired redundancy since
the firmware will likely be deduping down to a single copy anyway. But
not all SSD firmware operates this way. Point of fact, my SSDs have as a
bullet-point feature that they do NOT do deduplication, etc, selling this
as more reliable performance, the same performance all the time, no
matter the data written. So on SSDs do your research. =:^)

The mkfs.btrfs would be invoked with:

--data raid1 --metadata raid1
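As a full command line with the two partitions appended, that might look like the following sketch. /dev/sdX1 and /dev/sdX2 are hypothetical placeholders for two partitions on the same physical disk, and raid1_cmd is an illustrative helper name; the script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: raid1 across two partitions of the SAME physical disk.
# PART_A and PART_B are hypothetical placeholders, not real devices.
PART_A="${PART_A:-/dev/sdX1}"
PART_B="${PART_B:-/dev/sdX2}"

raid1_cmd() {
    # raid1 needs at least two "devices"; with exactly two, each chunk
    # ends up with one copy on each partition.
    printf 'mkfs.btrfs --data raid1 --metadata raid1 %s %s\n' "$1" "$2"
}

# Dry run: print the command for review instead of formatting anything.
raid1_cmd "$PART_A" "$PART_B"
```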

Unfortunately, at present raid1 mode still only creates two copies of
each chunk even if there are more than two devices, so partitioning up the
physical device into additional partitions simply to feed more "devices"
to mkfs.btrfs won't get you additional copies, only more complexity and
less control over where those copies go.
Post by Daniel Landstedt
And if so, am I going to be able to specify more than 1 copy of the data?
I assume by "1 copy" you meant two copies, working copy plus single
backup copy.

As Hugo says, you get precisely two copies. However, they can't really
be considered working and backup; it's simply two equal copies, with both
chunks written and whichever one is handy read and verified, with the
other one being a fallback if the checksum doesn't validate on the first
one read.
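The read-and-fall-back behaviour can be illustrated with a toy model. This is not btrfs code: it checksums whole files with POSIX cksum, whereas btrfs checksums individual blocks, and it tries the copies in a fixed order rather than picking whichever is handy. The read_verified function and the file names are made up for the example.

```shell
#!/bin/sh
# Toy model of a two-copy read: return the first copy whose checksum matches
# the recorded one, falling back to the next copy if it does not.
read_verified() {
    expected=$1
    shift
    for copy in "$@"; do
        # cksum on stdin prints "<crc> <byte count>" with no file name.
        actual=$(cksum < "$copy")
        if [ "$actual" = "$expected" ]; then
            cat "$copy"
            return 0
        fi
    done
    echo "all copies failed verification" >&2
    return 1
}

# Demo: both copies start out identical, then copy A is corrupted after
# the checksum was recorded.
dir=$(mktemp -d)
printf 'family photo\n' > "$dir/copy_a"
printf 'family photo\n' > "$dir/copy_b"
expected=$(cksum < "$dir/copy_a")
printf 'family phot0\n' > "$dir/copy_a"   # simulate bit rot in one copy

read_verified "$expected" "$dir/copy_a" "$dir/copy_b"   # falls back to copy B
```

With a single copy, the same mismatch could only be reported, not repaired, which is the detect-but-not-recover limitation of the single data profile.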
Post by Daniel Landstedt
Storage is pretty cheap now, and to have multiple copies in btrfs is
something that I think could be used a lot. I know I will use multiple
copies of my data if made possible.
Like Hugo, I feel compelled to ask what your use-case is.

I'm a strong booster of the N-way-mirroring feature not yet available,
because I find the 3-device/3-way-mirroring case compelling, particularly
given btrfs data integrity features.

And there's certainly a case to be made for two-way-redundancy on a
single device, for the same reasons.

But there's little practical use-case for 3-plus-copies on the same
physical device, because the performance costs are simply too high to
justify on a single physical device with its corresponding single-device
risk of failure.

IMO, if the use-case calls for three or more copies (working plus two) of
the data, it equally well justifies placing it on separate physical
devices, thereby protecting against all but one of the physical devices
failing as well.

OTOH, perhaps there's a use-case I'm simply not seeing...
Post by Daniel Landstedt
Is it something that might be available when RAID1 gets N mirrors
instead of just 1 mirror?
In theory, and at least with N-way partitioning, yes. However, I'm not
sure that they'll enable N-way-data-dup in the single-device-btrfs case,
for the same reasons they don't enable simple two-way data-dup mode now.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
