Discussion:
Why does stat() return invalid st_dev field for btrfs ??
Mark Lord
2009-08-17 20:47:22 UTC
Chris / list,

stat(2) seems to return invalid major/minor device info
for btrfs filesystems.

Why? Is this a bug?

Eg.

[~] uname -r
2.6.31-rc6
[~] mkfs.btrfs /dev/sdb

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 30.06GB
Btrfs Btrfs v0.19
[~] mount /dev/sdb /x -t btrfs
[~] stat --format="%04D" /x
0017
[~] touch /x/junk
[~] stat --format="%04D" /x/junk
0017

This gives major=0x00, minor=0x17 for /dev/sdb,
which should have major=8, minor=0x10.

???
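
For readers following along: the value printed by `stat --format="%D"` is the raw st_dev in hex, which splits into a major and a minor number. A minimal Python sketch of that split, assuming a Linux system; the helper names `split_dev` and `reported_dev` are hypothetical and only illustrate the arithmetic:

```python
import os

def split_dev(dev):
    """Split a raw device number (st_dev or st_rdev) into (major, minor)."""
    return os.major(dev), os.minor(dev)

def reported_dev(path):
    """Return the (major, minor) that stat() reports for the filesystem holding path."""
    return split_dev(os.stat(path).st_dev)

if __name__ == "__main__":
    # 0x17 is the value seen on the btrfs mount above: major 0, minor 0x17.
    print(split_dev(0x17))
    # A node like /dev/sdb would instead carry major 8, minor 0x10 in st_rdev.
    print(split_dev(os.makedev(8, 16)))
    print(reported_dev("/"))
```

Major 0 is the range Linux reserves for unnamed (anonymous) devices, which is the same range procfs and sysfs report from.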
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Lord
2009-08-17 21:15:22 UTC
Post by Mark Lord
Chris / list,
stat(2) seems to return invalid major/minor device info
for btrfs filesystems.
Why? Is this a bug?
Eg.
[~] uname -r
2.6.31-rc6
[~] mkfs.btrfs /dev/sdb
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 30.06GB
Btrfs Btrfs v0.19
[~] mount /dev/sdb /x -t btrfs
[~] stat --format="%04D" /x
0017
[~] touch /x/junk
[~] stat --format="%04D" /x/junk
0017
This gives major=0x00, minor=0x17 for /dev/sdb,
which should have major=8, minor=0x10.
..

Mmm.. btrfs appears to configure itself as a "pseudo" filesystem,
which is why it returns fake device numbers via stat(),
similar to procfs or sysfs.

The problem I'm trying to solve is how to determine the underlying
block device node for a mounted btrfs filesystem.

Well, actually, the other way around: given a block device major:minor,
how to determine whether or not this block device is currently mounted.

Simple, eh? Except for the silly "/dev/root" stuff that some distros practice.

????????
Mark Lord
2009-08-17 21:59:08 UTC
Hi,
Post by Mark Lord
Mmm.. btrfs appears to configure itself as a "pseudo" filesystem,
which is why it returns fake device numbers via stat(), similar
to procfs or sysfs.
Probably because a single btrfs filesystem can be composed of multiple
devices; one major/minor would not be sufficient.
..

So I'm seeing in the code.

But for the 99% common case (personal computers, one drive), it would be
rather useful if it would comply with filesystem standards there.

In the unlikely event that a btrfs actually is composed of multiple devices,
then in that case perhaps return something nonsensical.

Mmm.. don't we already *have* an LVM layer in Linux?

Seems like a rather bad idea to have a new Linux-specific
filesystem re-implement its own private LVM, and thus
confuse various disk management tools and the like.

Cheers
Mark Lord
2009-08-17 22:03:33 UTC
Post by Mark Lord
stat(2) seems to return invalid major/minor device info
for btrfs filesystems.
Why? Is this a bug?
Eg.
[~] uname -r
2.6.31-rc6
[~] mkfs.btrfs /dev/sdb
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 30.06GB
Btrfs Btrfs v0.19
[~] mount /dev/sdb /x -t btrfs
[~] stat --format="%04D" /x
0017
[~] touch /x/junk
[~] stat --format="%04D" /x/junk
0017
This gives major=0x00, minor=0x17 for /dev/sdb,
which should have major=8, minor=0x10.
???
Hi,
Post by Mark Lord
Mmm.. btrfs appears to configure itself as a "pseudo" filesystem,
which is why it returns fake device numbers via stat(), similar
to procfs or sysfs.
Probably because a single btrfs filesystem can be composed of multiple
devices; one major/minor would not be sufficient.
..
So I'm seeing in the code.
But for the 99% common case (personal computers, one drive), it would be
rather useful if it would comply with filesystem standards there.
In the unlikely event that a btrfs actually is composed of multiple devices,
then in that case perhaps return something nonsensical.
Mmm.. don't we already *have* an LVM layer in Linux?
Seems like a rather bad idea to have a new Linux-specific
filesystem re-implement its own private LVM, and thus
confuse various disk management tools and the like.
..

[added linux-kernel to CC: list]

Along those lines -- since btrfs reports invalid device information to stat(2),
then I would suggest that it should also return -ENOTSUP for the FIBMAP and FIEMAP
ioctl() calls. Otherwise, somebody's filesystem is going to get corrupted.

Cheers
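
One reading of the concern above is that a tool which maps a file's blocks with FIBMAP and then addresses the device named by st_dev would be acting on a device number that does not correspond to any real disk. A hedged Python sketch of a user-space guard follows; the function names are hypothetical, the only kernel detail assumed is that FIBMAP is `_IO(0x00, 1)` from `<linux/fs.h>`, and calling FIBMAP normally requires CAP_SYS_RAWIO.

```python
import fcntl
import os
import struct

FIBMAP = 1  # _IO(0x00, 1) from <linux/fs.h>

def has_real_block_device(st_dev):
    """True if st_dev names a real block device rather than an anonymous one (major 0)."""
    return os.major(st_dev) != 0

def fibmap(fd, logical_block):
    """Ask the filesystem which block backs the given logical block of an open file."""
    buf = struct.pack("i", logical_block)
    return struct.unpack("i", fcntl.ioctl(fd, FIBMAP, buf))[0]

def physical_block(path, logical_block):
    """Return the FIBMAP result only when st_dev names a real device, else None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if not has_real_block_device(os.fstat(fd).st_dev):
            return None  # the block number could not be tied to a single device
        return fibmap(fd, logical_block)
    finally:
        os.close(fd)
```

With the st_dev value from the example earlier in the thread (0x17), `has_real_block_device` is false, so a tool written this way would decline to touch raw blocks instead of guessing at a device.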
Chris Ball
2009-08-17 21:52:17 UTC
Hi,
Post by Mark Lord
Mmm.. btrfs appears to configure itself as a "pseudo" filesystem,
which is why it returns fake device numbers via stat(), similar
to procfs or sysfs.
Probably because a single btrfs filesystem can be composed of multiple
devices; one major/minor would not be sufficient.

- Chris.
--
Chris Ball <***@laptop.org>
One Laptop Per Child
Kay Sievers
2009-08-18 00:29:30 UTC
Post by Mark Lord
Chris / list,
stat(2) seems to return invalid major/minor device info
for btrfs filesystems.
Why? Is this a bug?
This is not invalid and not a bug. It's a superblock without a device,
and expected behavior.

There is no one-to-one relation from a btrfs mountpoint to a device,
it's a tree, and therefore there cannot be a single
major/minor.

Kay
Mark Lord
2009-08-18 02:01:26 UTC
Post by Kay Sievers
Post by Mark Lord
Chris / list,
stat(2) seems to return invalid major/minor device info
for btrfs filesystems.
Why? Is this a bug?
This is not invalid and not a bug. It's a superblock without a device,
and expected behavior.
There is no one-to-one relation from a btrfs mountpoint to a device,
it's a tree, and therefore there cannot be a single
major/minor.
..

Sure there is for the most common case.
When there is only a single device, stat() should return that device.
When there are several, it should do something different.

But really, it should be using DM/LVM when there are multiple devices.
Chris Samuel
2009-08-18 02:40:54 UTC
Post by Mark Lord
But really, it should be using DM/LVM when there are multiple devices.
Chris Mason addressed a number of these points in this January 2008 comment on
LWN - http://lwn.net/Articles/265533/ - especially:

# This is something LVM cannot provide because it cannot maintain
# consistent checksums for the FS.

and

# When multiple devices are present, Btrfs will want to know it is
# mirroring on different physical spindles. This is a challenge
# with LVM since the locations of physical extents can change without
# the FS knowing about it. Even if there were hooks so the FS could
# know the current extent mappings, it would end up duplicating a copy
# of the mappings internally.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
Jens Axboe
2009-08-18 21:21:49 UTC
Post by Mark Lord
Post by Kay Sievers
Post by Mark Lord
Chris / list,
stat(2) seems to return invalid major/minor device info
for btrfs filesystems.
Why? Is this a bug?
This is not invalid and not a bug. It's a superblock without a device,
and expected behavior.
There is no one-to-one relation from a btrfs mountpoint to a device,
it's a tree, and therefore there cannot be a single
major/minor.
..
Sure there is for the most common case.
When there is only a single device, stat() should return that device.
When there are several, it should do something different.
I actually think it's quite sane, since then you get the same behaviour
on multi vs single disk file systems. The last thing you want is to have
different behaviour when you later add a disk, for instance.
--
Jens Axboe
