Discussion:
btrfs subvolume snapshot hung in btrfs_commit_transaction
Ian! D. Allen
2010-12-08 14:01:03 UTC
I've been exercising btrfs doing a continuous loop of:

- delete an old snapshot to keep disk space about the same
- create snapshot from previous snapshot
- rsync root into new snapshot

I have room for 150 snapshots on disk. I delete the oldest, create
the newest, do the rsync into the newest, repeat. It hung today on
snapshot 564:
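
The loop above could be sketched roughly as follows. This is only an illustration, not the script actually used: the variable names, the `run` and `rotate_once` helpers, the unpadded snapshot names, and the rsync flags are all assumptions, and the sketch only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the delete / snapshot / rsync rotation described above.
# Assumptions (not from the original post): MNT, SRC, KEEP, DRY_RUN and
# the helper names are hypothetical; the rsync flags are a guess.
MNT=${MNT:-/mnt/sde1}    # btrfs mount point
SRC=${SRC:-/}            # tree that gets rsynced into each new snapshot
KEEP=${KEEP:-150}        # number of snapshots to retain
DRY_RUN=${DRY_RUN:-1}    # 1 = only print commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN is not 1.
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

rotate_once() {
    # $1 = number of the newest existing snapshot
    prev=$1
    next=$((prev + 1))
    old=$((next - KEEP))
    # Delete the oldest snapshot first so disk usage stays about the same.
    if [ "$old" -ge 1 ]; then
        run btrfs subvolume delete "$MNT/snap$old"
    fi
    run btrfs subvolume snapshot "$MNT/snap$prev" "$MNT/snap$next"
    run rsync -ax --delete "$SRC" "$MNT/snap$next/"
}
```

Calling `rotate_once 564` in dry-run mode prints the three commands for one iteration, which makes it easy to check the numbering before letting it touch a real file system.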

$ ps uww 24575
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 24575 0.0 0.0 6224 332 pts/10 DN 07:35 0:00 btrfs subvolume snapshot /mnt/sde1/snap564 /mnt/sde1/snap565

$ ps lww 24575
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 24575 27716 35 - 6224 332 btrfs_ DN pts/10 0:00 btrfs subvolume snapshot /mnt/sde1/snap564 /mnt/sde1/snap565

$ ps -o wchan 24575
WCHAN
btrfs_commit_transaction

No messages in "dmesg" or kernel log. Anyone want me to run some other
debug tests to find out what is wrong? Anything that tries to access
anything inside the btrfs file system /dev/sde1 hangs uninterruptibly:

1 0 1863 2 20 0 0 0 wait_f D ? 0:29 [btrfs-transacti]
4 0 4933 4925 20 0 26524 2864 lookup D+ pts/10 0:02 /bin/bash
1 777 27995 7318 20 0 26576 1784 vfs_re D+ pts/52 0:00 bash
0 777 29395 7284 20 0 21856 688 vfs_re D pts/51 0:00 ls -abp --color=auto /mnt/sde1
0 777 29510 7284 20 0 21856 692 vfs_re D pts/51 0:00 /bin/ls /mnt/sde1

$ ps -o wchan 1863
WCHAN
wait_for_commit

$ ps -o wchan 27995
WCHAN
vfs_readdir
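
For anyone trying to collect the same information in one step, here is a small sketch that filters a process listing down to tasks in uninterruptible sleep. The `list_blocked` helper is a hypothetical name, and the ps column selection in the comment is an assumption about a procps-style ps.

```shell
# list_blocked: read "PID STAT WCHAN COMMAND..." lines on stdin and print
# only the tasks whose STAT field starts with "D" (uninterruptible sleep).
# The function name is hypothetical, chosen for this example.
list_blocked() {
    awk '$2 ~ /^D/ { print }'
}

# Possible use on a live system (assumes procps ps; "wchan:32" widens the
# column so names such as btrfs_commit_transaction are not cut short):
#   ps -eo pid,stat,wchan:32,args | list_blocked
```

Asking for a wide WCHAN column up front avoids the truncated "btrfs_" and "vfs_re" values seen in the default "ps l" output.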
--
| Ian! D. Allen - ***@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ian! D. Allen
2010-12-09 19:46:39 UTC
Hello developers - Last chance to get more details on this btrfs hang
(below) before I reboot the machine. Anything I can do to gather
more data?

Linux linux 2.6.35-23-generic #40-Ubuntu SMP Wed Nov 17 22:14:33 UTC 2010 x86_64 GNU/Linux

Description: Ubuntu 10.10

Package: btrfs-tools
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 1408
Maintainer: Ubuntu Developers <ubuntu-devel-***@lists.ubuntu.com>
Architecture: amd64
Version: 0.19+20100601-3
Depends: e2fslibs (>= 1.37), libc6 (>= 2.7), libcomerr2 (>= 1.01), libuuid1 (>= 2.16), zlib1g (>= 1:1.2.0)
Original-Maintainer: Daniel Baumann <***@lists.debian-maintainers.org>
Homepage: http://btrfs.wiki.kernel.org/
Ian! D. Allen
2010-12-12 03:43:39 UTC
This is a second try to post this follow-up, with the dmesg log info
deleted. What is the size/length limit on this mailing list?
Post by Ian! D. Allen
Hello developers - Last chance to get more details on this btrfs hang
(below) before I reboot the machine. Anything I can do to gather
more data?
Too late. The machine eventually hung and had to be rebooted. I've saved
that hung btrfs disk partition (185GB) if anyone wants to contact me to
examine it further.

I put in a larger disk (250GB), set up a partition for btrfs, and ran
the same continuous snapshotting test. It got up to creating snapshot
150 and then btrfs hung again. So the bug is repeatable and makes btrfs
0.19 on Ubuntu 10.10 unusable.

# ps laxgww | awk '$10 ~ /D/ {print }'
0 0 3723 28355 35 - 6224 332 btrfs_ DN+ pts/7 0:00 btrfs subvolume snapshot /mnt/sdb1/snap000150 /mnt/sdb1/snap000151
1 0 27334 2 20 0 0 0 wait_f D ? 0:02 [btrfs-transacti]

I ran an "ls" on the root of the btrfs partition and it hung, too,
as does anything trying to access anything in that partition:

0 777 10005 3732 20 0 21856 692 vfs_re D pts/11 0:00 ls -abp --color=auto /mnt/sdb1

Here are the blocked process WCHANs:

# ps -o wchan 3723 27334 10005
WCHAN
btrfs_commit_transaction
wait_for_commit
vfs_readdir

Here is what appeared in /var/log/kern.log regarding the above processes:

2010-12-11T10:47:32.583745-05:00 linux kernel: [43201.070404] INFO: task btrfs-transacti:27334 blocked for more than 120 seconds.

[... 175 more related dmesg lines deleted ... ]
Ian! D. Allen
2010-12-12 08:14:00 UTC
Post by Ian! D. Allen
I put in a larger disk (250GB), set up a partition for btrfs, and ran
the same continuous snapshotting test. It got up to creating snapshot
150 and then btrfs hung again. So the bug is repeatable and makes
btrfs 0.19 on Ubuntu 10.10 unusable.
I rebooted the above machine (had to use SYSRQ to do it - the regular
shutdown hung) and ran btrfsck on the partition. It produced 50,572
lines of output: 151 unique root numbers between 5 and 409
and 342 unique inode numbers between 32465 and 184349. Sample:

root 5 inode 32465 errors 2000
root 5 inode 32468 errors 2000
root 5 inode 32471 errors 2000
root 5 inode 32477 errors 2000
root 5 inode 32483 errors 2000
root 5 inode 32492 errors 2000
root 5 inode 32495 errors 2000
root 5 inode 32499 errors 2000
root 5 inode 32501 errors 2000
root 5 inode 32504 errors 2000
[... 50,500+ lines deleted ...]
root 409 inode 175629 errors 2000
root 409 inode 175637 errors 2000
root 409 inode 175652 errors 2000
root 409 inode 175655 errors 2000
root 409 inode 175657 errors 2000
root 409 inode 175661 errors 2000
root 409 inode 175667 errors 2000
root 409 inode 175668 errors 2000
root 409 inode 175676 errors 2000
root 409 inode 175677 errors 2000
root 409 inode 175680 errors 2000
root 409 inode 175689 errors 2000
root 409 inode 184349 errors 2000
found 142348439552 bytes used err is 1
total csum bytes: 138091404
total tree bytes: 942841856
total fs tree bytes: 692547584
btree space waste bytes: 158649297
file data blocks allocated: 219977064448
referenced 219976880128
Btrfs Btrfs v0.19
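
Counts like the ones quoted above can be derived from saved btrfsck output with a short awk filter. This is a sketch only: `summarize_fsck` is a hypothetical helper name, and it assumes every error line has the form "root N inode M errors X" as in the sample.

```shell
# summarize_fsck: read btrfsck output on stdin and report how many
# "root N inode M errors X" lines it contains, plus the number of
# distinct root and inode numbers. The function name is hypothetical.
summarize_fsck() {
    awk '$1 == "root" && $3 == "inode" {
             if (!($2 in roots))  { roots[$2] = 1;  nroots++ }
             if (!($4 in inodes)) { inodes[$4] = 1; ninodes++ }
             lines++
         }
         END { printf "%d lines, %d roots, %d inodes\n", lines, nroots, ninodes }'
}

# Possible use, assuming the output was saved to a file first:
#   summarize_fsck < btrfsck.out
```

Inode numbers are counted without regard to which root they belong to, so the same inode number appearing under several roots counts once.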

More info:

# btrfs file show
failed to read /dev/fd0u800
failed to read /dev/sr1
failed to read /dev/sr0
failed to read /dev/fd0
Label: none uuid: b9865e4e-1890-4a1b-8f85-597774822f42
Total devices 1 FS bytes used 132.57GB
devid 1 size 232.88GB used 140.29GB path /dev/sdb1
Btrfs Btrfs v0.19

# mount -r /dev/sdb1 /mnt/sdb1

# tail -1 /var/log/kern.log
2010-12-12T02:05:08.962448-05:00 linux kernel: [ 1731.373691] device fsid 1b4a90184e5e86b9-422f82747759858f devid 1 transid 429 /dev/sdb1

# df /mnt/sdb1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 244197560 139932892 104264668 58% /mnt/sdb1

# btrfs file df /mnt/sdb1
Data: total=137.01GB, used=131.69GB
Metadata: total=1.63GB, used=899.14MB
System: total=12.00MB, used=20.00KB

# btrfs subvolume list /mnt/sdb1
ID 258 top level 5 path snap000001
ID 259 top level 5 path snap000002
ID 260 top level 5 path snap000003
ID 261 top level 5 path snap000004
ID 262 top level 5 path snap000005
[...]
ID 405 top level 5 path snap000146
ID 406 top level 5 path snap000147
ID 407 top level 5 path snap000148
ID 408 top level 5 path snap000149
ID 409 top level 5 path snap000150

There are 140 snapshots listed. There should be 150, since my script
creates each snapshot from the previous one without gaps. Inspection
shows that, starting with snap000024, every 13th snapshot is missing:

snap000024
snap000037
snap000050
snap000063
snap000076
snap000089
snap000102
snap000115
snap000128
snap000141
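
A gap list like the one above can be produced mechanically from the "btrfs subvolume list" output. The sketch below assumes the snapshot names follow the snapNNNNNN pattern used here; `missing_snaps` is a hypothetical helper name.

```shell
# missing_snaps MAX: read "btrfs subvolume list" output on stdin and print
# every snapNNNNNN name from 1 to MAX that is absent from the listing.
# The function name is hypothetical; the name pattern is taken from this post.
missing_snaps() {
    awk -v max="$1" '
        $NF ~ /^snap[0-9]+$/ { n = substr($NF, 5) + 0; seen[n] = 1 }
        END {
            for (i = 1; i <= max; i++)
                if (!(i in seen)) printf "snap%06d\n", i
        }'
}

# Possible use on the file system above:
#   btrfs subvolume list /mnt/sdb1 | missing_snaps 150
```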

I had interrupted my snapshotting script at snap 13, so I deleted the
incomplete snap 13 and restarted the script to recreate it and continue
on. I also interrupted it at snap 24 and did the same thing, deleting
snap 24 and restarting the script to recreate it.

I interrupted again at snap 74, deleted 74, and restarted, and looking
back at the log file from that time I see now that snaps 24, 37, 50,
and 63 were already "missing" from the "btrfs subvolume list" output
issued by the script starting up at snap 74. So the missing snaps
weren't caused by file system damage from the hang and reboot;
they were already missing long before the system hung at snap 151.

On disk, all 150 snapshots have directory entries:

total 608
256 drwxr-xr-x 1 root root 3008 Dec 11 05:08 ./
2621441 drwxr-xr-x 20 root root 4096 Dec 11 03:27 ../
550940 drwxr-xr-x 1 root root 328 Dec 11 03:58 ROOT/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000001/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000002/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000003/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000004/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000005/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000006/
[...]
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000141/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000142/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000143/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000144/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000145/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000146/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000147/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000148/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000149/
256 drwxr-xr-x 1 root root 28 Dec 11 04:33 snap000150/

Running "rsync -avxn snap000023/. snap000024" turns up the expected
number of different files, as does "rsync -avxn snap000024/. snap000025".
So the snapshot 24 "works"; it just isn't listed in the output of btrfs
subvolume list. More testing needed.
Ian! D. Allen
2010-12-12 09:45:17 UTC
Creating new snapshots from previous snapshots eventually causes "btrfs
subvolume list" to omit some of the created snapshots. The set of
omitted snapshots changes as one creates new snapshots.

(I found this separate bug while exploring the previous thread:
Subject: btrfs subvolume snapshot hung in btrfs_commit_transaction)

What I did:

- take one SATA disk of size 200GB
- create one partition (/dev/sdf1) with fdisk

# mkfs.btrfs /dev/sdf1
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdf1
nodesize 4096 leafsize 4096 sectorsize 4096 size 186.31GB
Btrfs Btrfs v0.19

# mount -o noatime /dev/sdf1 /mnt/sdf1

- set <previous> = /mnt/sdf1
- set <current> = /mnt/sdf1/snap000001
loop:
# btrfs subvolume snapshot <previous> <current>
- check to make sure "btrfs subvolume list" shows all snapshots
- set <previous> = <current>
- set <current> = <current>+1
GOTO loop
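
A runnable form of that loop might look like the sketch below. It is an approximation under stated assumptions, not the script actually used: MNT, COUNT, and the helper names `snap_name`, `not_listed`, and `reproduce` are hypothetical, and it should only be pointed at a scratch file system, as root.

```shell
#!/bin/sh
# Sketch of the chained-snapshot loop with a listing check after each step.
# MNT, COUNT and all function names are assumptions for this example.
MNT=${MNT:-/mnt/sdf1}
COUNT=${COUNT:-50}

snap_name() {
    # 7 -> snap000007
    printf 'snap%06d' "$1"
}

not_listed() {
    # $1 = names found on disk, $2 = names reported by the listing,
    # both newline-separated. Prints names from $1 that are not in $2.
    printf '%s\n' "$1" | grep -vxF -e "$2"
}

reproduce() {
    prev=$MNT
    i=1
    while [ "$i" -le "$COUNT" ]; do
        cur=$MNT/$(snap_name "$i")
        btrfs subvolume snapshot "$prev" "$cur" || return 1
        dirs=$(cd "$MNT" && ls -d snap*)
        listed=$(btrfs subvolume list "$MNT" | awk '{ print $NF }')
        gone=$(not_listed "$dirs" "$listed")
        # Report any snapshot that exists on disk but is not listed.
        [ -z "$gone" ] || echo "after $(snap_name "$i"): unlisted:" $gone
        prev=$cur
        i=$((i + 1))
    done
}
```

Nothing runs until `reproduce` is called explicitly. Comparing the directory entries against the listing after every snapshot shows exactly when a name drops out and when it comes back.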

After creating snapshot 10 from snapshot 9, snapshot 6 vanished from
the output of "btrfs subvolume list /mnt/sdf1":

# btrfs subv list /mnt/sdf1
ID 256 top level 5 path snap000001
ID 257 top level 5 path snap000002
ID 258 top level 5 path snap000003
ID 259 top level 5 path snap000004
ID 260 top level 5 path snap000005
ID 262 top level 5 path snap000007
ID 263 top level 5 path snap000008
ID 264 top level 5 path snap000009
ID 265 top level 5 path snap000010

All the snapshots, including the missing snap 6, are still on disk:

# ls -la /mnt/sdf1/
total 48
dr-xr-xr-x 1 root root 208 Dec 12 03:54 ./
drwxr-xr-x 20 root root 4096 Dec 12 03:50 ../
drwxr-xr-x 1 root root 3666 Dec 12 03:55 ROOT/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000001/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000002/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000003/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000004/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000005/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000006/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000007/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000008/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000009/
dr-xr-xr-x 1 root root 28 Dec 12 03:54 snap000010/

# du -sh /mnt/sdf1/snap0000*
1.6G /mnt/sdf1/snap000001
1.6G /mnt/sdf1/snap000002
1.6G /mnt/sdf1/snap000003
1.6G /mnt/sdf1/snap000004
1.6G /mnt/sdf1/snap000005
1.6G /mnt/sdf1/snap000006
1.6G /mnt/sdf1/snap000007
1.6G /mnt/sdf1/snap000008
1.6G /mnt/sdf1/snap000009
1.6G /mnt/sdf1/snap000010

The on-disk snapshot appears to be intact and working; it just doesn't
show up in the output of "btrfs subvolume list".

If I keep creating snapshots in the above manner, the list of missing
snapshots changes:

- after creating snap 11, no snapshots are missing from the output (!)
- after creating snap 18, snap 17 goes missing
- after creating snap 21, snaps 4 and 17 are missing
- after creating snap 22, only snap 16 is missing
- after creating snap 27, snaps 3 and 16 are missing
- after creating snap 28, only snap 15 is missing
- after creating snap 29, snaps 15 and 28 are missing
- after creating snap 32, snaps 2, 15, and 28 are missing
- after creating snap 33, snaps 14 and 27 are missing
- after creating snap 39, snaps 25 and 38 are missing
- after creating snap 49, snaps 24 and 37 are missing
...etc...

Surely something is wrong here?
Li Zefan
2010-12-13 08:47:22 UTC
Post by Ian! D. Allen
After creating snapshot 10 from snapshot 9, snapshot 6 vanished from
# btrfs subv list /mnt/sdf1
ID 256 top level 5 path snap000001
ID 257 top level 5 path snap000002
ID 258 top level 5 path snap000003
ID 259 top level 5 path snap000004
ID 260 top level 5 path snap000005
ID 262 top level 5 path snap000007
ID 263 top level 5 path snap000008
ID 264 top level 5 path snap000009
ID 265 top level 5 path snap000010
...
Post by Ian! D. Allen
Surely something is wrong here?
Right, there's something wrong and I've figured it out.
Will send out a fix soon.

Thanks for the report!
Ian! D. Allen
2011-01-20 23:01:27 UTC
Still getting "btrfs subvolume list" errors with this source:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

I create ten snapshots, and after creating the tenth one, the sixth one
disappears from "btrfs subvolume list":

# btrfs subvolume list /mnt/sdb1
ID 256 top level 5 path snap000001
ID 257 top level 5 path snap000002
ID 258 top level 5 path snap000003
ID 259 top level 5 path snap000004
ID 260 top level 5 path snap000005
ID 262 top level 5 path snap000007
ID 263 top level 5 path snap000008
ID 264 top level 5 path snap000009
ID 265 top level 5 path snap000010

# ls -lt /mnt/sdb1
total 48
drwxr-xr-x 1 root root 42 Jan 20 17:55 ROOT/
dr-xr-xr-x 1 root root 208 Jan 20 17:55 ./
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000001/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000002/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000003/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000004/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000005/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000006/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000007/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000008/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000009/
dr-xr-xr-x 1 root root 28 Jan 20 17:55 snap000010/
drwxr-xr-x 4 root root 4096 Jan 20 15:30 ../
Li Zefan
2011-01-21 02:03:55 UTC
Post by Ian! D. Allen
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
That is just because the fix hasn't been merged into the git tree yet.
Ian! D. Allen
2010-12-14 02:23:24 UTC
I can reliably get btrfs 0.19 to hang in btrfs_commit_transaction.
Below is another case where it hung after creating just 110 snapshots
of /var/log/. Here is an excerpt from the script log file showing the hang:
Dec 12 10:25:23 EST 2010 snapshot /mnt/sdf1/snap000108 /mnt/sdf1/snap000109
Create a snapshot of '/mnt/sdf1/snap000108' in '/mnt/sdf1/snap000109'
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdf1 195359960 133800028 61559932 69% /mnt/sdf1
Data: total=184.28GB, used=127.29GB
Metadata: total=1.01GB, used=161.60MB
System: total=12.00MB, used=28.00KB
Dec 12 10:26:17 EST 2010 snapshot /mnt/sdf1/snap000109 /mnt/sdf1/snap000110
[... HUNG HERE in btrfs subvolume snapshot /mnt/sdf1/snap000109 /mnt/sdf1/snap000110 ...]

Here's a small example of what's in kern.log:

2010-12-12T10:28:18.721597-05:00 linux kernel: [31921.122910] INFO: task btrfs-transacti:26891 blocked for more than 120 seconds.
2010-12-12T10:28:18.721642-05:00 linux kernel: [31921.122918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2010-12-12T10:28:18.721656-05:00 linux kernel: [31921.122925] btrfs-transac D 00000001003010a3 0 26891 2 0x00000000
2010-12-12T10:28:18.721667-05:00 linux kernel: [31921.122936] ffff8801b1649d40 0000000000000046 0000000000000000 0000000000015980
2010-12-12T10:28:18.721678-05:00 linux kernel: [31921.122946] ffff8801b1649fd8 0000000000015980 ffff8801b1649fd8 ffff8800833f16e0
2010-12-12T10:28:18.721688-05:00 linux kernel: [31921.122956] 0000000000015980 0000000000015980 ffff8801b1649fd8 0000000000015980
2010-12-12T10:28:18.721696-05:00 linux kernel: [31921.122964] Call Trace:
2010-12-12T10:28:18.721708-05:00 linux kernel: [31921.123001] [<ffffffffa05403a9>] wait_for_commit+0x89/0xf0 [btrfs]
2010-12-12T10:28:18.721718-05:00 linux kernel: [31921.123014] [<ffffffff8107f620>] ? autoremove_wake_function+0x0/0x40
2010-12-12T10:28:18.721728-05:00 linux kernel: [31921.123047] [<ffffffffa0541c20>] btrfs_commit_transaction+0x5f0/0x6f0 [btrfs]
2010-12-12T10:28:18.721738-05:00 linux kernel: [31921.123077] [<ffffffffa054212b>] ? start_transaction+0x1ab/0x230 [btrfs]
2010-12-12T10:28:18.721751-05:00 linux kernel: [31921.123087] [<ffffffff8107f620>] ? autoremove_wake_function+0x0/0x40
2010-12-12T10:28:18.721763-05:00 linux kernel: [31921.123115] [<ffffffffa053bf33>] transaction_kthread+0x283/0x290 [btrfs]
2010-12-12T10:28:18.721774-05:00 linux kernel: [31921.123143] [<ffffffffa053bcb0>] ? transaction_kthread+0x0/0x290 [btrfs]
2010-12-12T10:28:18.721782-05:00 linux kernel: [31921.123152] [<ffffffff8107f0c6>] kthread+0x96/0xa0
2010-12-12T10:28:18.721792-05:00 linux kernel: [31921.123161] [<ffffffff8100aee4>] kernel_thread_helper+0x4/0x10
2010-12-12T10:28:18.721800-05:00 linux kernel: [31921.123169] [<ffffffff8107f030>] ? kthread+0x0/0xa0
2010-12-12T10:28:18.721812-05:00 linux kernel: [31921.123177] [<ffffffff8100aee0>] ? kernel_thread_helper+0x0/0x10
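
To see at a glance which tasks the kernel reported as hung, the warnings can be pulled out of the log with a small filter. This is a sketch: `hung_tasks` is a hypothetical helper name, and it assumes the message wording shown in the excerpt above.

```shell
# hung_tasks: read kernel log lines on stdin and print the "name:pid" of
# every task named in a hung-task warning, each one once.
# The function name is hypothetical.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds\..*/\1/p' |
        sort -u
}

# Possible use:
#   hung_tasks < /var/log/kern.log
#
# Separately, as root and with SysRq enabled, writing "w" to
# /proc/sysrq-trigger asks the kernel to dump all blocked tasks to the log:
#   echo w > /proc/sysrq-trigger
```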

After a reboot we try the fsck:

# btrfsck /dev/sdc1
fs tree 5151 refs 110

followed by 110 lines of the form:

unresolved ref root NNNN dir 256 index 4898 namelen 10 name snap000001 error 600

where NNNN goes from 5150 to 5259. Then this closing output:

found 136841752576 bytes used err is 1
total csum bytes: 133469012
total tree bytes: 169484288
total fs tree bytes: 4497408
btree space waste bytes: 22338098
file data blocks allocated: 149437534208
referenced 149425737728
Btrfs Btrfs v0.19
Command exited with non-zero status 1

The "ps" listing showed the same as in my previous messages.
Chris Mason
2010-12-17 01:47:05 UTC
Post by Ian! D. Allen
I can reliably get btrfs 0.19 to hang in btrfs_commit_transaction.
Below is another case where it hung after creating just 110 snapshots
of /var/log/. Here is an excerpt from the script log file showing the
I think this hang is something that sage fixed. Which kernel is this
ubuntu including?

-chris
Ian! D. Allen
2010-12-17 04:45:12 UTC
Post by Chris Mason
I think this hang is something that sage fixed. Which kernel is this
ubuntu including?
All that detail is posted in the second message in the thread you quoted:

http://www.mail-archive.com/linux-***@vger.kernel.org/msg07448.html

Is the btrfsck snapshot error (from Tuesday) fixed too?

http://www.mail-archive.com/linux-***@vger.kernel.org/msg07544.html

I'd love to exercise btrfs, but I can't get even a single snapshot to
pass btrfsck.
Chris Mason
2010-12-17 14:52:10 UTC
Post by Ian! D. Allen
Post by Chris Mason
I think this hang is something that sage fixed. Which kernel is this
ubuntu including?
The 2.6.35 kernel didn't have Sage's fix. I'd say the deadlock will be
fixed in the current btrfs-unstable git tree (against 2.6.36).
Post by Ian! D. Allen
Is the btrfsck snapshot error (from Tuesday) fixed too?
I'm looking into that one, I think it is a btrfsck bug.

-chris