Discussion:
strange 3.16.3 problem
Russell Coker
2014-10-18 03:54:19 UTC
I have a system running the Debian 3.16.3-2 AMD64 kernel for the Xen Dom0 and
the DomUs.

The Dom0 has a pair of 500G SATA disks in a BTRFS RAID-1 array. The RAID-1
array has some subvols exported by NFS as well as a subvol for the disk images
for the DomUs - I am not using NoCOW as performance is fine without it and I
like having checksums on everything.

I have started having some problems with a mail server that is running in a
DomU. The mail server has 32bit user-space because it was copied from a 32bit
system and I had no reason to upgrade it to 64bit. It's running a 64bit
kernel, though, so I don't think that 32bit user-space is related to my
problem.

# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory

Above is the problem: find says that the file in question exists, but ls
doesn't think so. The file in question is part of a Maildir spool that's
NFS mounted. This problem persisted across a reboot of the DomU, so it's a
problem with the Dom0 (the NFS server).

The dmesg output on the Dom0 doesn't appear to have anything relevant, and a
find command doesn't find the file. I don't know if this is a NFS problem or
a BTRFS problem. I haven't rebooted the Dom0 yet because a remote reboot of a
server running a kernel from Debian/Unstable is something I try to avoid.

Any suggestions?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Russell Coker
2014-10-18 10:29:16 UTC
The NFS client is part of the kernel iirc, so it should be 64 bit. This
would allow the creation of files larger than 4gb and create possible
issues with a 32 bit user space utility.
A correctly written 32bit application will handle files >4G in size.

While some applications may have problems, I'm fairly sure that ls will be ok.

# dd if=/dev/zero of=/tmp/test bs=1024k count=1 seek=5000
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00383089 s, 274 MB/s
# /bin/ls -lh /tmp/test
-rw-r--r--. 1 root root 4.9G Oct 18 20:47 /tmp/test
# file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.26,
BuildID[sha1]=0xd3280633faaabf56a14a26693d2f810a32222e51, stripped

A quick test shows that a 32bit ls can handle this.
I would mount from a client with 64 bit user space and see if the problem
occurs there. If so, it is probably not a btrfs issue (if I am
understanding your environment correctly).
I'll try that later.
Robert White
2014-10-18 13:33:10 UTC
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Any suggestions?
Does "ls -l *546" show the file to exist? e.g. what happens if you use
the exact same wildcard in the ls command as you used in the find?

It is possible (and back in the day it was quite common) for files to be
created with non-renderable nonsense in the name. For instance, if the
first four characters of the name were "13^H4" (where ^H is the single
backspace character) the file would look like it was named 14* but it
would be listed by ls using "13*". If the file name is "damaged", which
is usually a failing in the program that created the file, then it can
be "hidden in plain sight".

Note that this sort of name is hidden from the copy-paste done in the
terminal window because the binary nonsense is just not in the output
any more by the time you select it with the mouse.

It doesn't have to be a backspace, BTW, it can be any character that the
terminal window will not render.
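One quick way to check for that sort of thing (a sketch, assuming GNU
coreutils; "ls -b" prints C-style escapes for non-printable bytes):

```shell
#!/bin/sh
# Demo in a scratch directory: a name containing a backspace renders
# misleadingly on screen, but "ls -b" shows the \b escape explicitly.
dir=$(mktemp -d)
cd "$dir"
: > "$(printf '13\b4copy546')"   # create a name with an embedded ^H
ls                               # the terminal may render it as "14copy546"
ls -b                            # prints the name with a \b escape
cd / && rm -rf "$dir"
```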

If things get really ugly you may need to remove the file using

find . -name "*546" -exec rm "{}" \;

(This takes the wildcard expansion out of the hands of the shell and
makes it happen in the find command, which may have different
functionality in your build.)

Anyway, this sort of mangled file name can happen in any file system as
the various binary and non-printable name elements are completely legal
in the POSIX standard.
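For completeness, deleting by inode number is another fallback when the
name itself can't be typed at all; a sketch assuming GNU find's -inum
test and -delete action:

```shell
#!/bin/sh
# Demo: remove a file by its inode number rather than by name.
dir=$(mktemp -d); cd "$dir"
: > victim546
ino=$(ls -i victim546 | awk '{print $1}')   # note the inode number
find . -maxdepth 1 -inum "$ino" -delete     # delete by inode, not name
[ ! -e victim546 ] && echo "deleted"
cd / && rm -rf "$dir"
```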

-- Rob.
Russell Coker
2014-10-18 23:41:41 UTC
Post by Robert White
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Any suggestions?
Does "ls -l *546" show the file to exist? e.g. what happens if you use
the exact same wildcard in the ls command as you used in the find?
# ls -l *546
ls: cannot access 1412233213.M638209P10546: No such file or directory

That gives the same result as find: the shell matches the file name, but
then ls can't view it.

From strace, the lstat64 system call fails:

lstat64("1412233213.M638209P10546", 0x9fab0c8) = -1 ENOENT (No such file or
directory)
Post by Robert White
It is possible (and back in the day it was quite common) for files to be
created with non-renderable nonsense in the name. for instance if the
first four characters of the name were "13^H4" (where ^H is the single
backspace character) the file would look like it was named 14* but it
would be listed by ls using "13*". If the file name is "damaged", which
is usually a failing in the program that created the file, then it can
be "hidden in plain sight".
If that's the case then it's still a kernel bug somewhere. Maildrop and
Dovecot don't create files with any unusual characters in the names.
Post by Robert White
Note that this sort of name is hidden from the copy-paste done in the
terminal window because the binary nonsense is just not in the output
any more by the time you select it with the mouse.
It doesn't have to be a backspace, BTW, it can be any character that the
terminal window will not render.
If things get really ugly you may need to remove the file using
find . -name "*546" -exec rm "{}" \;
# find . -name "*546" -exec rm "{}" \;
rm: cannot remove `./1412233213.M638209P10546': No such file or directory
Duncan
2014-10-19 05:37:36 UTC
Post by Russell Coker
# find . -name "*546" -exec rm "{}" \;
rm: cannot remove `./1412233213.M638209P10546': No such file or directory
Going with the non-printable-character theory, what happens if you expand
that *546 find one character at a time? Does *0546 work? *10546? etc.

Additionally, I'd say use the default print instead of the -exec rm.
Because once you find it, you might want to do other tests (doing a file
on it to find type, finding the size, possibly catting it...) to figure
out what it is and possibly how it came to get there, before ultimate
removal.

When you find a boundary where it goes from working to not-working, what
happens if you stick a wildcard in that boundary? Assuming *0546 doesn't
work, for instance, thus creating a boundary between the 0 and the 5,
what about *0*546 or *0?546?
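That one-character-at-a-time probing can be scripted rather than typed by
hand; a rough sketch, using the sample name in a scratch directory:

```shell
#!/bin/sh
# Demo: expand the glob one character at a time from the right, and
# report which suffix patterns still match a directory entry.
dir=$(mktemp -d); cd "$dir"
: > 1412233213.M638209P10546
name='1412233213.M638209P10546'
for n in 3 5 8; do
  suf=$(printf '%s' "$name" | tail -c "$n")   # last n characters
  if ls -d -- *"$suf" >/dev/null 2>&1; then
    echo "*$suf matches"
  else
    echo "*$suf does not match"
  fi
done
cd / && rm -rf "$dir"
```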

... Just things I'd be trying were I to see such a thing here.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Duncan
2014-10-19 10:19:50 UTC
Post by Duncan
Post by Russell Coker
# find . -name "*546" -exec rm "{}" \;
rm: cannot remove `./1412233213.M638209P10546': No such file or directory
Going with the non-printable-character theory, what happens if you
expand that *546 find one character at a time? Does *0546 work? *10546?
etc.
When you find a boundary where it goes from working to not-working, what
happens if you stick a wildcard in that boundary? Assuming *0546
doesn't work, for instance, thus creating a boundary between the 0 and
the 5, what about *0*546 or *0?546?
FWIW, I just had something similar happen here, except ls could see the
files and tell me what happened, tho for a moment I was wondering... In
my case it was a couple symlinks, dead because the partition they pointed
into wasn't mounted. But with this thread fresh in my mind, of course it
was the first thing to come to mind...


Another idea for potentially figuring out what's going on...

If you have tab-completion active, what sort of auto-completes does it
offer with for instance ls 141<tab> ? If necessary, again you can try
expanding one character at a time, except of course from the left here
instead of from the right as above.

For things like colons, I know bash-completion here fills in \: in place
of simply colon. I just tested what it'd do with a backspace char
embedded in a filename, and tab-completion substitutes ^H (while ls
substitutes ? ).
Robert White
2014-10-20 17:37:28 UTC
Post by Russell Coker
Post by Robert White
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Any suggestions?
Does "ls -l *546" show the file to exist? e.g. what happens if you use
the exact same wildcard in the ls command as you used in the find?
# ls -l *546
ls: cannot access 1412233213.M638209P10546: No such file or directory
That gives the same result as find: the shell matches the file name, but
then ls can't view it.
lstat64("1412233213.M638209P10546", 0x9fab0c8) = -1 ENOENT (No such file or
directory)
From strace, the lstat64 system call fails.
Okay, from the strace output the shell _is_ finding the file in the
directory read and expand (readdir) pass. That is, "*546" is being
expanded to the full file name text "1412233213.M638209P10546" but then
the actual operation fails because the name is apparently not associated
with anything.

So what pass of scrub or btrfsck checks directory connectedness? Does
that pass give your file system a clean bill of health?

Also you said that you are using a 32bit user space "copied from another
server" under a 64bit kernel. Is the "ls" command a 32 bit executable then?

What happens if you stop the Xen domain for the mail server, mount the
disks into a native 64bit environment, and then ls the file name?

I ask because the man page for lstat64 says it's a "wrapper" for the
underlying system call (fstatat64). It is not impossible that you might
have a case where the wrapper is failing inside glibc due to some 32/64
bit conversion taking place.

Since you copied the entire 32bit environment from another (older?)
server there may be some nonsense happening where the two interfaces meet.

I'd check the file system against a native 64bit kernel and user-space
next. Possibly from a distro CD if necessary, just to isolate the
potential file system causes from the user-space causes. If the native
64bit environment fails then it's a fs issue; if the native 64bit
operations work, then it's a userspace problem and you win the fun of
remaking the mail server from scratch.


Goffredo Baroncelli
2014-10-20 20:21:04 UTC
[...]
Post by Robert White
Also you said that you are using a 32bit user space "copied from
another server" under a 64bit kernel. Is the "ls" command a 32 bit
executable then?
Could this be related to the inode overflow on 32 bit systems
(see the inode_cache option)? If so, running a 64bit "ls -i" should
work....
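A sketch of how one might check how large the inode numbers actually get
on the server (assuming GNU find's -printf):

```shell
#!/bin/sh
# Sketch: print the largest inode number under a directory; values
# beyond 2^32 can misbehave with non-LFS 32-bit stat interfaces.
dir=${1:-.}
max=$(find "$dir" -xdev -printf '%i\n' | sort -n | tail -n 1)
echo "largest inode under $dir: $max"
```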
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Duncan
2014-10-21 09:50:37 UTC
Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
[...]
Post by Robert White
Also you said that you are using a 32bit user space "copied from
another server" under a 64bit kernel. Is the "ls" command a 32 bit
executable then?
Could this be related to the inode overflow in 32 bit system (see
inode_cache options) ? If so running a 64bit "ls -i" should work....
Good point. Russell might just owe you a beverage of choice. =:^)

The inode_cache mount option isn't recommended for any bitness.

@ Russ, are you mounting with inode_cache? If so, definitely try running
without it and see if it changes the results.

(FWIW I wish that mount option would just go away as it would definitely
remove an invitation to a Russian roulette party with their data for the
unwary, but I suppose there's someone paying some bills somewhere that
wants it kept for some specific use-case where the performance gain must
be worth the calculated risk, thus continuing that invitation to data
Russian roulette for everyone else.)
Roman Mamedov
2014-10-21 10:16:11 UTC
On Tue, 21 Oct 2014 09:50:37 +0000 (UTC)
Post by Duncan
(FWIW I wish that mount option would just go away as it would definitely
remove an invitation to a Russian roulette party with their data for the
unwary, but I suppose there's someone paying some bills somewhere that
wants it kept for some specific use-case where the performance gain must
be worth the calculated risk, thus continuing that invitation to data
Russian roulette for everyone else.)
Why do you think it is so dangerous? Just because of possible bugs? But bugs
can be anywhere in Btrfs, so why single out one specific mount option?

Let's take a look at its description in the wiki:

"inode_cache (since 3.0)
Enable free inode number caching. Not recommended to use unless files on your
filesystem get assigned inode numbers that are approaching 2^64. Normally, new
files in each subvolume get assigned incrementally (plus one from the last
time) and are not reused. The mount option turns on caching of the existing
inode numbers and reuse of inode numbers of deleted files. This option may
slow down your system at first run, or after mounting without the option."
https://btrfs.wiki.kernel.org/index.php/Mount_options

As you can see it's not about performance, but rather more of a recognition
that a filesystem with some pre-determined finite lifetime expectancy is not a
good thing to have; even though 2^64 is a lot, there are various scenarios out
there, including millions of files and constant creation and removal of
snapshots, that may make the FS hit the limit faster than you would expect.
--
With respect,
Roman
Duncan
2014-10-21 12:08:34 UTC
Post by Roman Mamedov
On Tue, 21 Oct 2014 09:50:37 +0000 (UTC)
Post by Duncan
(FWIW I wish that mount option would just go away as it would
definitely remove an invitation to a Russian roulette party with their
data for the unwary, but I suppose there's someone paying some bills
somewhere that wants it kept for some specific use-case where the
performance gain must be worth the calculated risk, thus continuing
that invitation to data Russian roulette for everyone else.)
Why do you think it is so dangerous? Just because of possible bugs? But
bugs can be anywhere in Btrfs, why specifically single out one mount
option.
"inode_cache (since 3.0)
Enable free inode number caching. Not recommended to use unless files on
your filesystem get assigned inode numbers that are approaching 2^64.
Normally, new files in each subvolume get assigned incrementally (plus
one from the last time) and are not reused. The mount option turns on
caching of the existing inode numbers and reuse of inode numbers of
deleted files.
This option may slow down your system at first run, or after mounting
without the option."
https://btrfs.wiki.kernel.org/index.php/Mount_options
As you can see it's not about performance, but rather more of a
recognition that a filesystem with some pre-determined finite lifetime
expectancy is not a good thing to have; even though 2^64 is a lot, there
are various scenarios out there, including millions of files and
constant creation and removal of snapshots, that may make the FS hit the
limit faster than you would expect.
inode_cache is generally not needed on 64-bit, and it is known to cause
problems on 32-bit, where a cache overflow and non-unique cached inode
numbers are possible on large filesystems, as well as boot-time slowdowns
(including timeouts on mounting, for filesystems mounted at boot) on
64-bit.

I guess the real trouble is that the problems with it aren't well
documented and relatively few people know about them, mostly regulars on
this list, so people end up enabling it even on 64-bit where about the
only effect is a boot-time slowdown and the increased chance of crash-
corruption of yet another cache, as well as on 32-bit where it's actually
useful if somewhat risky (especially for large filesystems), thus getting
themselves in needless trouble. If there was a big IF YOU USE THIS AND
IT GOES BAD YOU GET TO KEEP THE PIECES warning on it, I guess fewer
people would use it, but then people would be asking questions about why
it's there in the first place.

And I don't know why, as I've only seen it cause needless problems, never
actually help, and I know that everyone here recommends turning it off
without any exception I've seen. But the conspiracy theory side of me
says if it's causing problems and not helping, and it's still there,
there must be a reason...
Goffredo Baroncelli
2014-10-21 16:40:19 UTC
Post by Duncan
Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
[...]
Post by Duncan
Could this be related to the inode overflow in 32 bit system (see
inode_cache options) ? If so running a 64bit "ls -i" should work....
Good point. Russell might just owe you a beverage of choice. =:^)
The inode_cache mount option isn't recommended for any bitness.
Hi Duncan,
could you elaborate on this sentence? From my understanding,
inode_cache is *needed* on 32bit systems in order to avoid inode number
overflow. Why are you saying that it is not recommended?
Even if there are bugs, these have to be corrected. A bug cannot be
a reason to remove a needed option.

Inode exhaustion is worse than slowness... Otherwise BTRFS would not be
suitable for 32 bit systems... But please tell me your opinion, because
maybe I misunderstood something...

BR
G.Baroncelli
Duncan
2014-10-22 07:12:42 UTC
Goffredo Baroncelli posted on Tue, 21 Oct 2014 18:40:19 +0200 as
Post by Goffredo Baroncelli
Post by Duncan
Goffredo Baroncelli posted on Mon, 20 Oct 2014 22:21:04 +0200 as
[...]
Post by Duncan
Could this be related to the inode overflow in 32 bit systems (see
the inode_cache option)? If so, running a 64bit "ls -i" should work....
Post by Goffredo Baroncelli
Post by Duncan
Good point. Russell might just owe you a beverage of choice. =:^)
Post by Goffredo Baroncelli
Post by Duncan
The inode_cache mount option isn't recommended for any bitness.
Hi Duncan,
could you elaborate on this sentence? From my understanding inode_cache is
Post by Goffredo Baroncelli
*needed* on 32bit systems in order to avoid inode number overflow. Why
are you saying that it is not recommended?
My understanding of this is limited as I'm a sysadmin and list regular,
not a dev let alone a btrfs dev, but see the btrfs (5) manpage (aka
btrfs-mount), under mount options:

"""""

inode_cache: Enable free inode number caching. Defaults to off due to an
overflow problem when the free space crcs don't fit inside a single page.

"""""

As I understand it based on developer comments to this effect, 64-bit
doesn't need it at all, and on 32-bit, in theory there are cases where
it'd be useful, but in practice, this overflow problem, among others (see
the discussion below), limits its usefulness to such an extent that it's
not recommended for use, even on 32-bit where in theory it could be of
use.

That's the extent of the theory I know on the subject.

Then there's the real-world reports and their effect on things:

* In part because inode_cache is pretty well universally negative-
recommended when it is seen, it's a poorly tested feature, and reported
bugs never get traced to it because as soon as people see it, they say
turn it off as its problems are worse than the problem it's trying to
cure, so it's turned off and the bugs disappear and everybody's happy,
without tracing down the problem.

One solid example of that was a report that btrfs was consistently taking
an unreasonably long time (five minutes plus) to mount, making it
unworkable as a filesystem mounted at boot from fstab. (I believe that
user was on systemd and systemd was timing out the localmount service,
but it would have been similar on any other init system, as very few will
by default let anything but fsck go for five minutes without timing it
out.) inode_cache was apparently reinitializing at every mount, instead
of just once. Were that space_cache, the bug would have almost certainly
been traced and ultimately fixed. But being inode_cache, which isn't
recommended anyway, we recommended that he turn inode_cache off, and he
did, and btrfs suddenly behaved itself, effectively confirming opinions
that inode_cache isn't worth the trouble.

I believe I've also seen failure to boot due to inode_cache corruption
issues reported, very similar to the ones that used to plague space_cache
and that hit me at one point so I know how bad they were, as nospace_cache
and/or clear_cache could fix the space_cache problem, but back then it
was a manual fix so you had to know about it. But the space_cache issues
were traced and fixed since it's the default, and detection and recovery
for space_cache corruption is normally automatic these days, while who
knows _what_ happened to the same sorts of issues with inode_cache,
because the recommendation is simply to turn it off and be done with the
problem, instead. However, I'm not as sure on this one as on the long-
mount-time issue, I believe because I was still getting my own btrfs
bearings at the time, and the link wasn't as strong to me as it got lost
in the blur of everything else I was learning about btrfs at the same
time.

I guess I just changed my own mind a bit on it as I wrote that, but the
end-user effect is almost the same, except there's an exception now.
Basically, the situation is still the same for ordinary users: don't
touch it, as it's likely to result in needless problems. But for that
stubborn but tech-inclined user willing to be a guinea pig, particularly
if they're a dev that can actively help trace down bugs in the code as
well as usefully write up in sysadmin's or plainer English exactly where
it makes sense to use this option and what its real problems are, there's
definitely an opening to help make this a (hopefully much, but just about
anything would be an improvement) better documented and less buggy mount
option. =:^)

Chris Samuel
2014-10-19 10:46:59 UTC
Hiya Russell,
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Does:

find . -name "*546" -ls

work at all?
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Duncan
2014-10-20 04:38:28 UTC
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Does your mail server do a lot of renames? Is one perhaps stuck? If so,
that sounds like the same thing "Zygo Blaxell" is reporting in the
"3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
15:25:26 -400, Msg-ID: <***@hungrycats.org>, as linked
here:

<http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>

I pointed him at this thread too. I hadn't seen you mention a hung
rename, but the other symptoms sound similar.
Zygo Blaxell
2014-10-20 13:02:44 UTC
Post by Duncan
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Does your mail server do a lot of renames? Is one perhaps stuck? If so,
that sounds like the same thing "Zygo Blaxell" is reporting in the
"3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
<http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>
I pointed him at this thread too. I hadn't seen you mention a hung
rename, but the other symptoms sound similar.
Not really. It looks like Russell is having an NFS client-side problem,
and I'm having a server-side one (maybe). Also, all Russell's system calls
seem to be returning promptly, while some of mine are not. Even if
there were timeouts, an NFS server timeout gives a different error than
'No such file or directory'. Finally, the one and only thing I _can_
do with my bug is 'ls' on the renamed files (for me, the find would get
stuck before returning any output).

For Russell's issue...most of the stuff I can think of has been
tried already. I didn't see whether there was any attempt to ls the
file from the NFS server as well as the client side. If ls is OK on
the server but not the client, it's an NFS issue (possibly interacting
with some btrfs-specific quirk); otherwise, it's likely a corrupted
filesystem (mail servers seem to be unusually good at making these).

Most of the I/O time on mail servers tends to land in the fsync() system
call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
3.16, and not in the 3.16.x stable update for x <= 5 (the last one
I've checked)). That said, I'm not familiar with how fsync() translates
over NFS, so it might not be relevant after all.

If the NFS server's view of the filesystem is OK, check the NFS protocol
version from /proc/mounts on the client. Sometimes NFS clients will
get some transient network error during connection and fall back to some
earlier (and potentially buggier) NFS version. I've seen very different
behavior in some important corner cases from v4 and v3 clients, for
example, and if the client is falling all the way back to v2 the bugs
and their workarounds start to get just plain _weird_ (e.g. filenames
which produce specific values from some hash function or that contain
specific character sequences are unusable). v2 is so old it may even
have issues with 64-bit inode numbers.
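The negotiated version shows up in the client's mount options; a sketch
of pulling it out of a /proc/mounts-style line (sample line shown; on a
real client, read /proc/mounts itself, or use nfsstat -m):

```shell
#!/bin/sh
# Sketch: extract the NFS version (vers= option) from a mount entry.
line='server:/export /mnt nfs4 rw,vers=4.1,rsize=1048576 0 0'
vers=$(printf '%s\n' "$line" | sed -n 's/.*[ ,]vers=\([0-9.]*\).*/\1/p')
echo "NFS version: $vers"
```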
Austin S Hemmelgarn
2014-10-20 13:19:36 UTC
Post by Zygo Blaxell
[...]
Even if there were timeouts, an NFS server timeout gives a different
error than 'No such file or directory'.
[...]
Just now saw this thread, but IIRC 'No such file or directory' also gets
returned sometimes when trying to automount a share that can't be
enumerated by the client, and also sometimes when there is a stale NFS
file handle.
Russell Coker
2014-10-21 10:13:29 UTC
Permalink
Post by Zygo Blaxell
Post by Duncan
Post by Russell Coker
# find . -name "*546"
./1412233213.M638209P10546
# ls -l ./1412233213.M638209P10546
ls: cannot access ./1412233213.M638209P10546: No such file or directory
Does your mail server do a lot of renames? Is one perhaps stuck? If so,
that sounds like the same thing "Zygo Blaxell" is reporting in the
"3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014
It's a Maildir server so it does a lot of renames, but I don't think anything
is stuck. I've just rebooted the Dom0 and nothing has changed.
Post by Zygo Blaxell
For Russell's issue...most of the stuff I can think of has been
tried already. I didn't see whether there was any attempt to ls the
file from the NFS server as well as the client side. If ls is OK on
the server but not the client, it's an NFS issue (possibly interacting
with some btrfs-specific quirk); otherwise, it's likely a corrupted
filesystem (mail servers seem to be unusually good at making these).
# ls -l *546
ls: cannot access *546: No such file or directory

Above is on the server.

# ls -l *546
ls: cannot access 1412233213.M638209P10546: No such file or directory

Above is on the client. Note that wildcard expansion worked because readdir()
found the file even though stat() can't.
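For the record, this readdir-vs-stat discrepancy can be checked across a whole Maildir, not just one name. A minimal sketch (the directory is a placeholder; `ls -1A` here only does the readdir pass, while stat() does the lookup):

```shell
# List a directory via readdir and flag any entry whose name cannot
# be resolved by stat() -- the "phantom file" symptom from this thread.
# DIR is a placeholder; point it at the suspect Maildir.
DIR=${DIR:-.}
ls -1A "$DIR" | while IFS= read -r name; do
    if ! stat -- "$DIR/$name" >/dev/null 2>&1; then
        echo "phantom: $name"
    fi
done
echo "scan done"
```

On a healthy filesystem this prints nothing but "scan done"; any "phantom:" lines reproduce the symptom above.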
Post by Zygo Blaxell
Most of the I/O time on mail servers tends to land in the fsync() system
call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after
3.16, and not in the 3.16.x stable update for x <= 5 (the last one
I've checked)). That said, I'm not familiar with how fsync() translates
over NFS, so it might not be relevant after all.
That's going to suck for people running mail servers on Debian.
Post by Zygo Blaxell
If the NFS server's view of the filesystem is OK, check the NFS protocol
version from /proc/mounts on the client. Sometimes NFS clients will
get some transient network error during connection and fall back to some
earlier (and potentially buggier) NFS version. I've seen very different
behavior in some important corner cases from v4 and v3 clients, for
example, and if the client is falling all the way back to v2 the bugs
and their workarounds start to get just plain _weird_ (e.g. filenames
which produce specific values from some hash function or that contain
specific character sequences are unusable). v2 is so old it may even
have issues with 64-bit inode numbers.
Rebooting the client multiple times and rebooting the server once doesn't
change it. I don't think it's any transient error.
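Checking the negotiated version is just a matter of pulling "vers=" out of the mount options. A minimal sketch (the /proc/mounts line below is a made-up sample; on a live client, read the real /proc/mounts):

```shell
# Sample /proc/mounts entry for an NFS mount (hypothetical server/path);
# on a real client: grep ' nfs' /proc/mounts
line='server:/mail /mnt/mail nfs rw,vers=3,rsize=1048576,addr=10.0.0.1 0 0'
# Field 4 holds the mount options; pick out the negotiated version,
# which is what the client actually fell back to, not what fstab asked for.
opts=$(echo "$line" | awk '{print $4}')
vers=$(echo "$opts" | tr ',' '\n' | sed -n 's/^vers=//p')
echo "negotiated NFS version: $vers"
```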
Post by Austin S Hemmelgarn
Just now saw this thread, but IIRC 'No such file or directory' also gets
returned sometimes when trying to automount a share that can't be
enumerated by the client, and also sometimes when there is a stale NFS
file handle.
I think that rebooting both client and server precludes the possibility of a
stale file handle. Even rebooting the client (which I have done several
times) should fix it.
Post by Zygo Blaxell
Okay, from the strace output the shell _is_ finding the file in the
directory read and expand (readdir) pass. That is, "*546" is being
expanded to the full file name text "1412233213.M638209P10546" but then
the actual operation fails because the name is apparently not associated
with anything.
So what pass of scrub or btrfsck checks directory connectedness? Does
that pass give your file system a clean bill of health?
That's inconvenient for a remote system with a single BTRFS filesystem.
Post by Zygo Blaxell
Also you said that you are using a 32bit user space "copied from another
server" under a 64bit kernel. Is the "ls" command a 32 bit executable then?
Yes.
Post by Robert White
What happens if you stop the Xen domain for the mail server and then
mount the disks into a native 64bit environment and then ls the file name?
The filesystem in question is NFS mounted from a server with 64bit kernel+user
to a virtual server with 64bit kernel+32bit user. On the file server (the Xen
Dom0) ls doesn't even see that file in readdir.
Post by Robert White
I ask because the man page for lstat64 says it's a "wrapper" for the
underlying system call (fstatat64). It is not impossible that you might
have a case where the wrapper is failing inside glibc due to some 32/64
bit conversion taking place.
If there is a 32/64 conversion then we have another problem. The mail server
is configured to reject messages bigger than about 50M, I don't recall the
exact number but it's a lot smaller than 2G.
Post by Zygo Blaxell
Could this be related to the inode overflow in 32 bit system
(see inode_cache options) ? If so running a 64bit "ls -i" should
work....
I've just installed coreutils:amd64 on the NFS client and I get the same
results.
Post by Zygo Blaxell
The inode_cache mount option isn't recommended for any bitness.
@ Russ, are you mounting with inode_cache? If so, definitely try running
without it and see if it changes the results.
/dev/sda3 / btrfs rw,seclabel,noatime,space_cache,skip_balance 0 0

The above is in /proc/mounts. I have configured my systems to use
skip_balance because in the past I've had a balance cause big problems on
several occasions and I've never had a resumed balance do any good. I think
that noatime is unlikely to cause any problems. I don't know what space_cache
is about, is that something the kernel adds automatically?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Russell Coker
2014-10-21 10:42:22 UTC
Permalink
I've just upgraded the Dom0 (NFS server) from 3.16.3 to 3.16.5 and it all
works.

Prior to upgrading the Dom0 I had the same problem occur with different file
names. All the names in question were truncated names of files that exist.
It seems that 3.16.3 has a bug with NFS serving files with long names.

Thanks for all the suggestions.
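For anyone wanting to check whether a server is affected before upgrading, a hypothetical reproduction sketch: create Maildir-style names of increasing length in a directory (an NFS mount, when testing a server) and verify that every name readdir returns can also be stat()ed. The path is a placeholder:

```shell
# Create Maildir-style file names of increasing length, then check that
# every entry readdir returns can also be stat()ed. DIR is a placeholder;
# point it at an NFS mount to exercise the server side.
DIR=${DIR:-/tmp/longname-test}
mkdir -p "$DIR"
i=1
while [ "$i" -le 40 ]; do
    # Name length grows with i, past typical Maildir name lengths.
    pad=$(printf '%*s' "$i" '' | tr ' ' 'X')
    touch "$DIR/1412233213.M${pad}P$i"
    i=$((i + 1))
done
ls -1A "$DIR" | while IFS= read -r name; do
    stat -- "$DIR/$name" >/dev/null 2>&1 || echo "FAIL: $name"
done
echo "check done"
```

On an affected server the long names should start failing the stat() pass; on 3.16.5 it should print only "check done".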
Robert White
2014-10-21 15:23:31 UTC
Permalink
Post by Russell Coker
I've just upgraded the Dom0 (NFS server) from 3.16.3 to 3.16.5 and it all
works.
Prior to upgrading the Dom0 I had the same problem occur with different file
names. All the names in question were truncated names of files that exist.
It seems that 3.16.3 has a bug with NFS serving files with long names.
Thanks for all the suggestions.
Well never mind my message from a few minutes ago...

But thanks for finding that problem/solution. I've been having an NFS
problem of my own and it is from a server running 3.16.3... so you may
have just made my day. 8-)
Duncan
2014-10-21 12:25:02 UTC
Permalink
Post by Russell Coker
I don't know what
space_cache is about, is that something the kernel adds automatically?
Yes, space_cache is the default.

Apparently early in space_cache history you had to mount with space_cache
once, and the kernel would then detect the existence of the space-cache-
tree and always use the option after that.

But for quite some time now (over a year in my case, and I've never added
it to my mount options since I got the ssds and began using btrfs on
them), the kernel seems to enable it automatically from the first mount,
unless you specifically tell it not to.

Similarly for the ssd option, if the kernel detects that you are running
an ssd (I believe it checks the ata/scsi rotational media property, which
it should detect properly on raw hardware, but which can get lost if
btrfs is layered on top of lvm/mdraid/dmcrypt/etc), it'll automatically
enable the ssd mount option, which is exactly what it does here.
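The rotational flag in question is visible in sysfs. A minimal sketch (SYSFS is parameterized here only so the snippet can be pointed at a test tree; on a live system it is just /sys):

```shell
# Report the kernel's rotational flag for each block device:
# 0 means non-rotational (ssd), 1 means rotating media. This is the
# property btrfs keys its automatic "ssd" mount option off, and the one
# that can get lost under lvm/mdraid/dmcrypt stacks.
SYSFS=${SYSFS:-/sys}
for f in "$SYSFS"/block/*/queue/rotational; do
    [ -e "$f" ] || continue
    dev=${f#"$SYSFS"/block/}
    dev=${dev%%/*}
    case $(cat "$f") in
        0) echo "$dev: ssd (non-rotational)" ;;
        *) echo "$dev: rotational" ;;
    esac
done
```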
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Robert White
2014-10-21 15:10:32 UTC
Permalink
Post by Russell Coker
Post by Robert White
What happens if you stop the Xen domain for the mail server and then
mount the disks into a native 64bit environment and then ls the file name?
The filesystem in question is NFS mounted from a server with 64bit kernel+user
to a virtual server with 64bit kernel+32bit user. On the file server (the Xen
Dom0) ls doesn't even see that file in readdir.
So we need to do some variable isolation as I am now not sure what Xen
would have to do with anything.

If the file doesn't exist under that name on the NFS server, then _that_
is where you need to do the find/ls checks for various name expansions.
That is, all the various wildcard checks need to happen on the real
server that has mounted the BTRFS in order to find the actual file that
is leading to the phantom file. E.g. if the file "isn't there" on the
BTRFS then the problem is really an NFS translation problem of some sort.

This problem involves two physical servers or just one?

The network connection between the two servers is physical (real cables)
or virtual (a Xen bridge etc)?

You are using which NFS version? Over udp or tcp? Using what options?

You are or you are not using any sort of secondary cache on top of your
NFS? e.g. a cachefiles directory on a little local slice somewhere on
either system. If so you have or have not cleared that cache manually?

You have or have not cleared the NFS server state (typically found in
/var/lib/nfs or some such)?

The means you are using to synchronize time between the systems is?

Understand that at this point you've described an NFS problem (possibly
an NFS server problem with BTRFS) but not a BTRFS problem per-se, so we
have to figure out what the server sees on the file system before we can
guess why the client is seeing what it is seeing.
Post by Russell Coker
Post by Robert White
I ask because the man page for lstat64 says its a "wrapper" for the
underlying system call (fstatat64). It is not impossible that you might
have a case where the wrapper is failing inside glibc due to some 32/64
bit conversion taking place.
If there is a 32/64 conversion then we have another problem. The mail server
is configured to reject messages bigger than about 50M, I don't recall the
exact number but it's a lot smaller than 2G.
This potential conversion issue has nothing to do with file size and
everything to do with internal structure alignment and significant bits
in things like file handles. (Though now I'm not sure what matters.)

NFS is sort of old and crufty in some cases, particularly its own
internal file-handle operation, which was originally designed around
absolute inodes-by-number. Technology moved on while NFS was just sort
of cruft-patched to deal with what it could no longer understand. NFSv4
is intended to fix lots of those problems (and if you aren't using it,
it might be worth a stab, but it has its own departures and issues,
particularly with trying to mount a v4 root without an initramfs stage).

(NOTE: I think there _is_ something NFS-server-from-BTRFS related as
when I wireshark a particular problem I've been having with an NFS root
environment, I've been getting some unexpected NOENT responses in the
NFS data stream. If you are comfortable with wireshark/tcpdump etc you
might want to look there as well. Coercing a mount point at the point of
service and using fsid= in /etc/exports seems to have given me some
progress, but it sounds like that might be a bit much for your problem.)