Discussion:
unexplainable corruptions 3.17.0
Tomasz Torcz
2014-10-16 09:17:26 UTC
Hi,

Recently I've observed some corruption of systemd's journal
files which is somewhat puzzling. This is especially worrying
as this is a btrfs raid1 setup and I expected auto-healing.

System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache

The broken files are in the /var/log/journal directory. This directory
is set NOCOW with chattr, as are all the files within it.
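
For reference, the attribute was set and checked with something like the
following (on btrfs, +C only takes effect for files created after the flag
is set on the directory):
$ chattr +C /var/log/journal
$ lsattr -d /var/log/journal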

Example of broken file:
***@0005057fe87730cf-6d3d85ed59bd70ae.journal~

When the file is read with dd_rescue, there are many I/O errors:
-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
Reads with cat and hexdump fail with:
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
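That trace was captured by running the reader under strace, along the lines of:
$ strace -e trace=read cat <broken-file> > /dev/null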

But btrfs dev stat reports no errors!
$ btrfs dev stat .
[/dev/dm-0].write_io_errs 0
[/dev/dm-0].read_io_errs 0
[/dev/dm-0].flush_io_errs 0
[/dev/dm-0].corruption_errs 0
[/dev/dm-0].generation_errs 0
[/dev/dm-1].write_io_errs 0
[/dev/dm-1].read_io_errs 0
[/dev/dm-1].flush_io_errs 0
[/dev/dm-1].corruption_errs 0
[/dev/dm-1].generation_errs 0

There are no hardware errors in dmesg.

This is perplexing. How can I find out what is causing the
breakage, and how can I avoid it in the future?
--
Tomasz .. oo o. oo o. .o .o o. o. oo o. ..
Torcz .. .o .o .o .o oo oo .o .. .. oo oo
o.o.o. .o .. o. o. o. o. o. o. oo .. .. o.

Liu Bo
2014-10-17 08:02:03 UTC
Post by Tomasz Torcz
Hi,
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.
When read with dd_rescue, there are many I/O errors
-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
But btrfs dev stat reports no errors!
$ btrfs dev stat .
[/dev/dm-0].write_io_errs 0
[/dev/dm-0].read_io_errs 0
[/dev/dm-0].flush_io_errs 0
[/dev/dm-0].corruption_errs 0
[/dev/dm-0].generation_errs 0
[/dev/dm-1].write_io_errs 0
[/dev/dm-1].read_io_errs 0
[/dev/dm-1].flush_io_errs 0
[/dev/dm-1].corruption_errs 0
[/dev/dm-1].generation_errs 0
There are no hardware errors in dmesg.
This is perplexing. How to find out what is causing the
breakage and how to avoid it in the future?
Does scrub work for you?

thanks,
-liubo
Tomasz Torcz
2014-10-17 08:10:09 UTC
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Does scrub work for you?
As there seems to be no way to scrub individual files, I've started
a scrub of the full volume. It will take some hours to finish.
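Kicked it off with something like the following (the mountpoint is just a placeholder):
$ btrfs scrub start /mnt/volume
$ btrfs scrub status /mnt/volume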

Meanwhile, could you satisfy my curiosity: what would a scrub do that
wouldn't be done by just reading the whole file?
--
Tomasz Torcz "Never underestimate the bandwidth of a station
xmpp: ***@chrome.pl wagon filled with backup tapes." -- Jim Gray

Hugo Mills
2014-10-17 08:17:37 UTC
Post by Tomasz Torcz
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Does scrub work for you?
As there seem to be no way to scrub individual files, I've started
scrub of full volume. It will take some hours to finish.
Meanwhile, could you satisfy my curiosity what would scrub do that
wouldn't be done by just reading the whole file?
It checks both copies. Reading the file will only read one of the
copies of any given block (so if that's good and the other copy is
bad, it won't fix anything).

Hugo.
--
=== Hugo Mills: ***@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- The future isn't what it used to be. ---
Zygo Blaxell
2014-10-20 14:04:37 UTC
Post by Hugo Mills
Post by Tomasz Torcz
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Does scrub work for you?
As there seem to be no way to scrub individual files, I've started
scrub of full volume. It will take some hours to finish.
Meanwhile, could you satisfy my curiosity what would scrub do that
wouldn't be done by just reading the whole file?
It checks both copies. Reading the file will only read one of the
copies of any given block (so if that's good and the other copy is
bad, it won't fix anything).
Really? One of my earliest btrfs tests was to run a loop of 'sha1sum
-c' on a gigabyte or two of files in one window while I used dd to
write random data in random locations directly to one of the filesystem
mirror partitions in the other. I did this test *specifically* to
watch the automatic checksumming and self-healing features of btrfs
in action. A complete 'sha1sum' verification of the filesystem contents
passed even though the kernel log was showing checksum errors scrolling
by faster than I could read, which strongly implies that read() normally
does check both mirrors before returning EIO. This was on kernel version
3.12.21 or so, so it should be working on 3.17 too.
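
For the curious, the test was roughly along these lines (the device name and
paths here are only illustrative):
$ find /mnt/test -type f -print0 | xargs -0 sha1sum > /tmp/sums
$ while sha1sum -c --quiet /tmp/sums; do :; done
and in the other window, as root, scribble on ONE raid1 member directly (destructive!):
# dd if=/dev/urandom of=/dev/sdb2 bs=4k seek=$RANDOM count=1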

Tomasz reports using 'nocow', which breaks the data integrity checks.
I'd expect the read() to return success and provide garbage data, but the
observed behavior is EIO instead. The underlying device doesn't seem
to be generating the I/O errors, so it's probably metadata corruption
of some kind. Are there btrfs kernel messages in dmesg?
Rich Freeman
2014-10-20 14:52:38 UTC
Post by Zygo Blaxell
Post by Hugo Mills
Post by Tomasz Torcz
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Does scrub work for you?
As there seem to be no way to scrub individual files, I've started
scrub of full volume. It will take some hours to finish.
Meanwhile, could you satisfy my curiosity what would scrub do that
wouldn't be done by just reading the whole file?
It checks both copies. Reading the file will only read one of the
copies of any given block (so if that's good and the other copy is
bad, it won't fix anything).
Really? One of my earliest btrfs tests was to run a loop of 'sha1sum
-c' on a gigabyte or two of files in one window while I used dd to
write random data in random locations directly to one of the filesystem
mirror partitions in the other. I did this test *specifically* to
watch the automatic checksumming and self-healing features of btrfs
in action. A complete 'sha1sum' verification of the filesystem contents
passed even though the kernel log was showing checksum errors scrolling
by faster than I could read, which strongly implies that read() normally
does check both mirrors before returning EIO.
I think you misread the earlier post. It sounds like the algorithm is:
1. Receive request to read block from file.
2. Determine which mirrored block to read it from (it sounds like
this is sub-optimal today, presumably you'd want to use the least busy
disk or disk with the head closest to the right cylinder to do it).
3. Read the block. Verify the checksum. If it matches return the data.
4. If not, find another mirrored block to read it from, if one exists.
Verify the checksum. If it matches return the data and update all
other mirrored copies with it.
5. Repeat step 4 until you run out of mirrored copies; if you do, return an error.

So, doing random reads will NOT be equivalent to scrubbing the disks,
because with a scrub you want to check that ALL copies are good, while
the algorithm above only confirms that at least one copy is good.

When you used dd to overwrite blocks, you didn't get errors because
when the first copy failed, the filesystem just read the second copy as
intended. That isn't a scrub - it is a recovery.

An actual scrub isn't file-focused but device-focused. It starts
reading at the start of the device, and verifies each logical unit of
data sequentially. This can be done asynchronously since btrfs stores
checksums, as opposed to a traditional RAID where the reads need to be
synchronous since the validity of a mirror/stripe can only be
ascertained by comparing it to all the other devices in that
mirror/stripe (and then unless you're using something like RAID6+ you
couldn't determine which copy is bad without a checksum). In theory
I'd expect a scrub with btrfs to be less detrimental to performance as
a result - a read request could halt the scrub on one device without
delaying the scrub on the other devices. Writes in RAID1 mode
necessarily disrupt two devices, but others would not be impacted.

--
Rich
Liu Bo
2014-10-17 08:29:36 UTC
Post by Tomasz Torcz
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Does scrub work for you?
As there seem to be no way to scrub individual files, I've started
scrub of full volume. It will take some hours to finish.
Meanwhile, could you satisfy my curiosity what would scrub do that
wouldn't be done by just reading the whole file?
(Hugo has answered that in this thread.)

Well... I don't know exactly what the cause is, but as the file is NOCOW, it writes
data in place. Have you experienced a hard reboot or something similar recently?

And is there any message in the dmesg log while getting EIO reading the file?

thanks,
-liubo
Tomasz Torcz
2014-10-17 08:54:51 UTC
Post by Liu Bo
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Well..I don't know exactly what's the cause, but as the file is NOCOW, it writes
data in place, have you experienced a hard reboot or something recently?
Nothing like that. The server is on a UPS; there were a couple of normal shutdowns
this year (a few kernel upgrades).
Post by Liu Bo
And any message in dmesg log while getting EIO by reading the file?
Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing. That's
why I find these corruptions mysterious.
Maybe there is some way to inspect internal btrfs state and find out what
is causing the problems? Or maybe this is related to the patch mentioned in this thread?
--
Tomasz Torcz "Never underestimate the bandwidth of a station
xmpp: ***@chrome.pl wagon filled with backup tapes." -- Jim Gray

Chris Mason
2014-10-17 12:53:06 UTC
Post by Tomasz Torcz
Post by Tomasz Torcz
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
read(4, 0x1001000, 65536) = -1 EIO
(Input/output error)
Well..I don't know exactly what's the cause, but as the file is NOCOW, it writes
data in place, have you experienced a hard reboot or something recently?
Nothing like that. Server is on an UPS, there were couple normal shutdowns
this year (few kernel upgrades).
Post by Liu Bo
And any message in dmesg log while getting EIO by reading the file?
Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing. That's
why I find those corruptions mysterious.
Maybe there is some way to inspect internal btrfs state and find out what
causing the problems? Or maybe this is related to patch mentioned in this thread?
This sounds like the problem fixed by some patches to our extent
mapping code that went in during the merge window. I've cherry-picked a
few for stable and I'm running them through tests now. They are in my
stable-3.17 branch, and I'll send them to Greg once Linus grabs the revert
for the last one.

But, if you want to try that branch out, it may fix this EIO.
Otherwise we'll start sending you debugging.
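
For anyone wanting to try it, roughly (tree location assumed here to be the
usual btrfs repo on kernel.org; adjust as needed):
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
$ cd linux-btrfs && git checkout stable-3.17
then build and boot that kernel as usual.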

-chris



Rich Freeman
2014-10-17 18:09:30 UTC
This sounds like the problem fixed with some patches to our extent mapping
code that went in with the merge window. I've cherry picked a few for
stable and I'm running them through tests now. They are in my stable-3.17
branch, and I'll send to Greg once Linus grabs the revert for the last one.
Just for clarity - when can we expect to see these in the kernel? I
wasn't sure which merge window you're referring to. I take it that
3.17.1 is still unpatched (for this and the readonly snapshot issue -
which requires reverting 9c3b306e1c9e6be4be09e99a8fe2227d1005effc).
Chris Samuel
2014-10-18 07:32:49 UTC
Post by Rich Freeman
Just for clarity - when can we expect to see these in the kernel?
The stable kernel rules say:

https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt

# - It or an equivalent fix must already exist in Linus' tree (upstream).

So until Linus merges the revert into the mainline kernel it cannot go into a
stable release, and he's not merged it yet.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Chris Samuel
2014-10-19 03:01:38 UTC
Post by Chris Samuel
So until Linus merges the revert into the mainline kernel it cannot go into
a stable release, and he's not merged it yet.
It was merged last night.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Marc Dietrich
2014-10-20 08:01:56 UTC
Post by Chris Samuel
Post by Rich Freeman
Just for clarity - when can we expect to see these in the kernel?
https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt
# - It or an equivalent fix must already exist in Linus' tree (upstream).
So until Linus merges the revert into the mainline kernel it cannot go into
a stable release, and he's not merged it yet.
It also says, a few lines below:

- To have the patch automatically included in the stable tree, add the tag
Cc: ***@vger.kernel.org
in the sign-off area. Once the patch is merged it will be applied to
the stable tree without anything else needing to be done by the author
or subsystem maintainer.

So fixes tagged this way would be picked up earlier and merged automatically.

Marc
Chris Samuel
2014-10-20 09:14:12 UTC
Post by Marc Dietrich
so fixes would be tagged earlier this way and merged automatically.
I don't think there's a lot that's automatic about stable; Greg K-H merges patches
into a git tree here:

http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git

As you can see, since last night he has pulled a bunch of btrfs fixes into that,
based on what Chris Mason emailed out yesterday.


commit 2792dbfd1e02a70a8eef7e0cc3f44cb77d6c100f
Author: Greg Kroah-Hartman <***@linuxfoundation.org>
Date: Mon Oct 20 07:08:43 2014 +0800

3.17-stable patches

added patches:
btrfs-add-missing-compression-property-remove-in-btrfs_ioctl_setflags.patch
btrfs-cleanup-error-handling-in-build_backref_tree.patch
btrfs-don-t-do-async-reclaim-during-log-replay.patch
btrfs-don-t-go-readonly-on-existing-qgroup-items.patch
btrfs-fix-a-deadlock-in-btrfs_dev_replace_finishing.patch
btrfs-fix-and-enhance-merge_extent_mapping-to-insert-best-fitted-extent-map.patch
btrfs-fix-build_backref_tree-issue-with-multiple-shared-blocks.patch
btrfs-fix-race-in-wait_sync-ioctl.patch
btrfs-fix-the-wrong-condition-judgment-about-subset-extent-map.patch
btrfs-fix-up-bounds-checking-in-lseek.patch
btrfs-try-not-to-enospc-on-log-replay.patch
btrfs-wake-up-transaction-thread-from-sync_fs-ioctl.patch
revert-btrfs-race-free-update-of-commit-root-for-ro-snapshots.patch

(there are also a bunch going in for 3.10, 3.14 and 3.16 too)
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Tomasz Torcz
2014-10-20 19:09:37 UTC
Post by Tomasz Torcz
Post by Tomasz Torcz
Post by Tomasz Torcz
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
read(4, 0x1001000, 65536) = -1 EIO (Input/output
error)
Well..I don't know exactly what's the cause, but as the file is NOCOW, it writes
data in place, have you experienced a hard reboot or something recently?
Nothing like that. Server is on an UPS, there were couple normal shutdowns
this year (few kernel upgrades).
Post by Liu Bo
And any message in dmesg log while getting EIO by reading the file?
Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing. That's
why I find those corruptions mysterious.
Maybe there is some way to inspect internal btrfs state and find out what
causing the problems? Or maybe this is related to patch mentioned in this thread?
This sounds like the problem fixed with some patches to our extent mapping
code that went in with the merge window. I've cherry picked a few for
stable and I'm running them through tests now. They are in my stable-3.17
branch, and I'll send to Greg once Linus grabs the revert for the last one.
But, if you want to try that branch out, it may fix this EIO. Otherwise
we'll start sending you debugging.
Good shot. The Fedora kernel maintainer was kind enough to include those patches
and build a kernel for F21. With this kernel the EIO is gone and the files
are readable. Thanks!
--
Tomasz Torcz ,,If you try to upissue this patchset I shall be seeking
xmpp: ***@chrome.pl an IP-routable hand grenade.'' -- Andrew Morton (LKML)

Duncan
2014-10-17 11:38:22 UTC
Post by Liu Bo
Post by Tomasz Torcz
Hi,
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying as this
is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.
Does scrub work for you?
NOCOW implies no checksum, so scrub shouldn't be able to help.

Some time back people were reporting problems with corrupted journald
journal files, but I've seen no such reports in a long time.

This isn't likely much help for your (OP's) use-case, but FWIW, here's
what I did with journald.

When I switched to systemd here, I set it to volatile storage only, and
kept syslog-ng set up for longer-term storage. I arranged things so
journald's volatile logs had enough room to grow for a normal single
session in the /run/log tmpfs. That gives me the nice journald systemd
integration, systemctl status reporting the last few log entries for a
specific service, etc.
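
The knobs for that live in /etc/systemd/journald.conf, along these lines
(the size here is just an example, not a recommendation):
[Journal]
Storage=volatile
RuntimeMaxUse=64M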

But everything still gets passed to syslog-ng as well (being on Gentoo, I
set the systemd USE flag for it, so it integrates nicely), and that
spits out my normal text logs just as I had it set up to do long before
systemd ever came along. It's those that I keep on non-volatile storage
so they stick around through a reboot, and they play nicely with btrfs, so
I've not had to worry about what journald's binary files might do.

Btw, unless you have a need for relatime, noatime is strongly recommended
for btrfs.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Chris Murphy
2014-10-17 15:07:42 UTC
Post by Duncan
When I switched to systemd here, I set it to volatile storage only, and
kept syslog-ng setup for longer term storage. I arranged things so
journald's volatile logs had enough room to grow for a normal single
session in the /run/log tmpfs. That gives me the nice journald systemd
integration, systemctl status reporting the last few log entries for a
specific service, etc.
But everything still gets passed to syslog-ng
For the uninitiated: to do the above, delete /var/log/journal and install the syslog daemon of your choice (one that is systemd-journald compatible, of course). That's it. Once /var/log/journal is deleted, systemd-journald will write its logs to /run/log/journal.
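
In practice that is roughly (destructive, paths as on Fedora):
# rm -rf /var/log/journal
# systemctl restart systemd-journald
With the default Storage=auto, journald then falls back to /run/log/journal on its own.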


Chris Murphy

Tomasz Torcz
2014-10-17 17:29:32 UTC
Post by Liu Bo
Does scrub work for you?
Scrub ended with no errors:
scrub status for a4f339d4-c129-4485-acc1-1233d29c665d
scrub started at Fri Oct 17 10:04:24 2014 and finished after 31992 seconds
total bytes scrubbed: 6.03TiB with 0 errors

I guess I'll have to check the patch Marc pointed out.
--
Tomasz Torcz "Never underestimate the bandwidth of a station
xmpp: ***@chrome.pl wagon filled with backup tapes." -- Jim Gray

Marc Dietrich
2014-10-17 08:17:09 UTC
Post by Tomasz Torcz
Hi,
Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.
When read with dd_rescue, there are many I/O errors
-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
Sounds like
https://patchwork.kernel.org/patch/4929981/
to me. We urgently need some stable patches or people will quickly corrupt
their filesystems.

Marc
Chris Murphy
2014-10-17 15:01:51 UTC
Post by Tomasz Torcz
Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.
What do you get for 'journalctl --verify'? I'm curious whether any journal files are considered corrupt by journalctl, and whether journalctl and dd_rescue agree on which journals are good or bad.
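
To check a single file, something along these lines should work (the path is just a placeholder):
$ journalctl --verify --file=/var/log/journal/<machine-id>/system.journal
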
Post by Tomasz Torcz
When read with dd_rescue, there are many I/O errors
-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
read(4, 0x1001000, 65536) = -1 EIO (Input/output error)
Yeah, weird. I'd expect in any case that there'd be a kernel message, whether it's a Btrfs or a hardware problem.

Chris Murphy

Tomasz Torcz
2014-10-20 19:10:34 UTC
Post by Chris Murphy
Post by Tomasz Torcz
Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.
What do you get for 'journalctl --verify' ? I'm curious if any journal files are considered corrupt by journalctl, and if there's parity between journalctl and dd_rescue when it comes to good/bad journals.
journalctl "bus errors" on them.
--
Tomasz Torcz ,,If you try to upissue this patchset I shall be seeking
xmpp: ***@chrome.pl an IP-routable hand grenade.'' -- Andrew Morton (LKML)
