packages (LINUX_3_0): kernel/kernel-small_fixes.patch - xfs stable fixes qu...
arekm
arekm at pld-linux.org
Sat Nov 19 20:36:22 CET 2011
Author: arekm Date: Sat Nov 19 19:36:22 2011 GMT
Module: packages Tag: LINUX_3_0
---- Log message:
- xfs stable fixes queued for 3.0.11
---- Files affected:
packages/kernel:
kernel-small_fixes.patch (1.43.2.5 -> 1.43.2.6)
---- Diffs:
================================================================
Index: packages/kernel/kernel-small_fixes.patch
diff -u packages/kernel/kernel-small_fixes.patch:1.43.2.5 packages/kernel/kernel-small_fixes.patch:1.43.2.6
--- packages/kernel/kernel-small_fixes.patch:1.43.2.5 Fri Nov 18 10:24:43 2011
+++ packages/kernel/kernel-small_fixes.patch Sat Nov 19 20:36:16 2011
@@ -348,64 +348,6 @@
exit
fi
done
-commit 37b652ec6445be99d0193047d1eda129a1a315d3
-Author: Dave Chinner <dchinner at redhat.com>
-Date: Thu Aug 25 07:17:01 2011 +0000
-
- xfs: don't serialise direct IO reads on page cache checks
-
- There is no need to grab the i_mutex of the IO lock in exclusive
- mode if we don't need to invalidate the page cache. Taking these
- locks on every direct IO effective serialises them as taking the IO
- lock in exclusive mode has to wait for all shared holders to drop
- the lock. That only happens when IO is complete, so effective it
- prevents dispatch of concurrent direct IO reads to the same inode.
-
- Fix this by taking the IO lock shared to check the page cache state,
- and only then drop it and take the IO lock exclusively if there is
- work to be done. Hence for the normal direct IO case, no exclusive
- locking will occur.
-
- Signed-off-by: Dave Chinner <dchinner at redhat.com>
- Tested-by: Joern Engel <joern at logfs.org>
- Reviewed-by: Christoph Hellwig <hch at lst.de>
- Signed-off-by: Alex Elder <aelder at sgi.com>
-
-diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
-index 7f7b424..8fd4a07 100644
---- a/fs/xfs/linux-2.6/xfs_file.c
-+++ b/fs/xfs/linux-2.6/xfs_file.c
-@@ -317,7 +317,19 @@ xfs_file_aio_read(
- if (XFS_FORCED_SHUTDOWN(mp))
- return -EIO;
-
-- if (unlikely(ioflags & IO_ISDIRECT)) {
-+ /*
-+ * Locking is a bit tricky here. If we take an exclusive lock
-+ * for direct IO, we effectively serialise all new concurrent
-+ * read IO to this file and block it behind IO that is currently in
-+ * progress because IO in progress holds the IO lock shared. We only
-+ * need to hold the lock exclusive to blow away the page cache, so
-+ * only take lock exclusively if the page cache needs invalidation.
-+ * This allows the normal direct IO case of no page cache pages to
-+ * proceeed concurrently without serialisation.
-+ */
-+ xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
-+ if ((ioflags & IO_ISDIRECT) && inode->i_mapping->nrpages) {
-+ xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
- xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
-
- if (inode->i_mapping->nrpages) {
-@@ -330,8 +342,7 @@ xfs_file_aio_read(
- }
- }
- xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
-- } else
-- xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
-+ }
-
- trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
-
An integer overflow will happen on 64bit archs if task's sum of rss, swapents
and nr_ptes exceeds (2^31)/1000 value. This was introduced by commit
@@ -993,3 +935,395 @@
}
/*
+Subject: [PATCH 1/9] "xfs: fix error handling for synchronous writes"
+
+xfs: fix for hang during synchronous buffer write error
+
+If storage is removed while a synchronous buffer write is underway,
+"xfslogd" hangs.
+
+Detailed log http://oss.sgi.com/archives/xfs/2011-07/msg00740.html
+
+Related work bfc60177f8ab509bc225becbb58f7e53a0e33e81
+"xfs: fix error handling for synchronous writes"
+
+Given that xfs_bwrite actually does the shutdown already after
+waiting for the b_iodone completion, and given that we actually
+found that calling xfs_force_shutdown from inside
+xfs_buf_iodone_callbacks was a major contributor to the problem,
+it is better to drop this call.
+
+Signed-off-by: Ajeet Yadav <ajeet.yadav.77 at gmail.com>
+Reviewed-by: Christoph Hellwig <hch at lst.de>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/xfs_buf_item.c | 1 -
+ 1 files changed, 0 insertions(+), 1 deletions(-)
+
+diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
+index a7342e8..7888a75 100644
+--- a/fs/xfs/xfs_buf_item.c
++++ b/fs/xfs/xfs_buf_item.c
+@@ -1023,7 +1023,6 @@ xfs_buf_iodone_callbacks(
+ XFS_BUF_UNDELAYWRITE(bp);
+
+ trace_xfs_buf_error_relse(bp, _RET_IP_);
+- xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
+
+ do_callbacks:
+ xfs_buf_do_callbacks(bp);
+--
+1.7.7
+
+
+Subject: [PATCH 2/9] xfs: fix xfs_mark_inode_dirty during umount
+
+During umount we do not add a dirty inode to the lru and wait for it to
+become clean first, but force writeback of data and metadata with
+I_WILL_FREE set. Currently there is no way for XFS to detect that the
+inode has been redirtied for metadata operations, as we skip the
+mark_inode_dirty call during teardown. Fix this by setting i_update_core
+manually in that case, so that the inode gets flushed during inode reclaim.
+
+Alternatively we could enable calling mark_inode_dirty for inodes in
+I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided
+against this as we will get better I/O patterns from reclaim compared to
+the synchronous writeout in write_inode_now, and always marking the inode
+dirty in some way from xfs_mark_inode_dirty is a better safety net in
+either case.
+
+Signed-off-by: Christoph Hellwig <hch at lst.de>
+Reviewed-by: Dave Chinner <dchinner at redhat.com>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+(cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856)
+
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/linux-2.6/xfs_iops.c | 14 +++++++++++---
+ 1 files changed, 11 insertions(+), 3 deletions(-)
+
+diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
+index d44d92c..a9b3e1e 100644
+--- a/fs/xfs/linux-2.6/xfs_iops.c
++++ b/fs/xfs/linux-2.6/xfs_iops.c
+@@ -69,9 +69,8 @@ xfs_synchronize_times(
+ }
+
+ /*
+- * If the linux inode is valid, mark it dirty.
+- * Used when committing a dirty inode into a transaction so that
+- * the inode will get written back by the linux code
++ * If the linux inode is valid, mark it dirty, else mark the dirty state
++ * in the XFS inode to make sure we pick it up when reclaiming the inode.
+ */
+ void
+ xfs_mark_inode_dirty_sync(
+@@ -81,6 +80,10 @@ xfs_mark_inode_dirty_sync(
+
+ if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
+ mark_inode_dirty_sync(inode);
++ else {
++ barrier();
++ ip->i_update_core = 1;
++ }
+ }
+
+ void
+@@ -91,6 +94,11 @@ xfs_mark_inode_dirty(
+
+ if (!(inode->i_state & (I_WILL_FREE|I_FREEING)))
+ mark_inode_dirty(inode);
++ else {
++ barrier();
++ ip->i_update_core = 1;
++ }
++
+ }
+
+ /*
+--
+1.7.7
+
+
+Subject: [PATCH 3/9] xfs: fix ->write_inode return values
+
+Currently we always redirty an inode that was attempted to be written out
+synchronously but has been cleaned internally by an AIL push, which is
+rather bogus. Fix that by doing the i_update_core check early on and
+returning 0 for it. Also include async calls for it, as doing any work for
+those is just as pointless. While we're at it, also fix the sign of the
+EIO return in case of a filesystem shutdown, and fix the completely
+nonsensical locking around xfs_log_inode.
+
+Signed-off-by: Christoph Hellwig <hch at lst.de>
+Reviewed-by: Dave Chinner <dchinner at redhat.com>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+(cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97)
+
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/linux-2.6/xfs_super.c | 34 +++++++++-------------------------
+ 1 files changed, 9 insertions(+), 25 deletions(-)
+
+diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
+index 347cae9..28de70b 100644
+--- a/fs/xfs/linux-2.6/xfs_super.c
++++ b/fs/xfs/linux-2.6/xfs_super.c
+@@ -878,33 +878,17 @@ xfs_log_inode(
+ struct xfs_trans *tp;
+ int error;
+
+- xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
+ error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
+-
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+- /* we need to return with the lock hold shared */
+- xfs_ilock(ip, XFS_ILOCK_SHARED);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+-
+- /*
+- * Note - it's possible that we might have pushed ourselves out of the
+- * way during trans_reserve which would flush the inode. But there's
+- * no guarantee that the inode buffer has actually gone out yet (it's
+- * delwri). Plus the buffer could be pinned anyway if it's part of
+- * an inode in another recent transaction. So we play it safe and
+- * fire off the transaction anyway.
+- */
+- xfs_trans_ijoin(tp, ip);
++ xfs_trans_ijoin_ref(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+- error = xfs_trans_commit(tp, 0);
+- xfs_ilock_demote(ip, XFS_ILOCK_EXCL);
+-
+- return error;
++ return xfs_trans_commit(tp, 0);
+ }
+
+ STATIC int
+@@ -919,7 +903,9 @@ xfs_fs_write_inode(
+ trace_xfs_write_inode(ip);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+- return XFS_ERROR(EIO);
++ return -XFS_ERROR(EIO);
++ if (!ip->i_update_core)
++ return 0;
+
+ if (wbc->sync_mode == WB_SYNC_ALL) {
+ /*
+@@ -930,12 +916,10 @@ xfs_fs_write_inode(
+ * of synchronous log foces dramatically.
+ */
+ xfs_ioend_wait(ip);
+- xfs_ilock(ip, XFS_ILOCK_SHARED);
+- if (ip->i_update_core) {
+- error = xfs_log_inode(ip);
+- if (error)
+- goto out_unlock;
+- }
++ error = xfs_log_inode(ip);
++ if (error)
++ goto out;
++ return 0;
+ } else {
+ /*
+ * We make this non-blocking if the inode is contended, return
+--
+1.7.7
+
+
+Subject: [PATCH 4/9] xfs: don't serialise direct IO reads on page cache
+
+There is no need to grab the i_mutex of the IO lock in exclusive
+mode if we don't need to invalidate the page cache. Taking these
+locks on every direct IO effective serialises them as taking the IO
+lock in exclusive mode has to wait for all shared holders to drop
+the lock. That only happens when IO is complete, so effective it
+prevents dispatch of concurrent direct IO reads to the same inode.
+
+Fix this by taking the IO lock shared to check the page cache state,
+and only then drop it and take the IO lock exclusively if there is
+work to be done. Hence for the normal direct IO case, no exclusive
+locking will occur.
+
+Signed-off-by: Dave Chinner <dchinner at redhat.com>
+Tested-by: Joern Engel <joern at logfs.org>
+Reviewed-by: Christoph Hellwig <hch at lst.de>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/linux-2.6/xfs_file.c | 17 ++++++++++++++---
+ 1 files changed, 14 insertions(+), 3 deletions(-)
+
+diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
+index 7f782af2..93cc02d 100644
+--- a/fs/xfs/linux-2.6/xfs_file.c
++++ b/fs/xfs/linux-2.6/xfs_file.c
+@@ -309,7 +309,19 @@ xfs_file_aio_read(
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -EIO;
+
+- if (unlikely(ioflags & IO_ISDIRECT)) {
++ /*
++ * Locking is a bit tricky here. If we take an exclusive lock
++ * for direct IO, we effectively serialise all new concurrent
++ * read IO to this file and block it behind IO that is currently in
++ * progress because IO in progress holds the IO lock shared. We only
++ * need to hold the lock exclusive to blow away the page cache, so
++ * only take lock exclusively if the page cache needs invalidation.
++ * This allows the normal direct IO case of no page cache pages to
++ * proceeed concurrently without serialisation.
++ */
++ xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
++ if ((ioflags & IO_ISDIRECT) && inode->i_mapping->nrpages) {
++ xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
+ xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
+
+ if (inode->i_mapping->nrpages) {
+@@ -322,8 +334,7 @@ xfs_file_aio_read(
+ }
+ }
+ xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
+- } else
+- xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
++ }
+
+ trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
+
+--
+1.7.7
+
+
+Subject: [PATCH 5/9] xfs: avoid direct I/O write vs buffered I/O race
+
+Currently a buffered reader or writer can add pages to the pagecache
+while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent
+this by re-checking mapping->nrpages after we got the iolock, and if
+necessary upgrade the lock to exclusive mode. To simplify this a bit,
+only take the ilock inside of xfs_file_aio_write_checks.
+
+Signed-off-by: Christoph Hellwig <hch at lst.de>
+Reviewed-by: Dave Chinner <dchinner at redhat.com>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/linux-2.6/xfs_file.c | 17 ++++++++++++++---
+ 1 files changed, 14 insertions(+), 3 deletions(-)
+
+diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
+index 93cc02d..b679198 100644
+--- a/fs/xfs/linux-2.6/xfs_file.c
++++ b/fs/xfs/linux-2.6/xfs_file.c
+@@ -669,6 +669,7 @@ xfs_file_aio_write_checks(
+ xfs_fsize_t new_size;
+ int error = 0;
+
++ xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
+ error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
+ if (error) {
+ xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
+@@ -760,14 +761,24 @@ xfs_file_dio_aio_write(
+ *iolock = XFS_IOLOCK_EXCL;
+ else
+ *iolock = XFS_IOLOCK_SHARED;
+- xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
++ xfs_rw_ilock(ip, *iolock);
+
+ ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ if (ret)
+ return ret;
+
++ /*
++ * Recheck if there are cached pages that need invalidate after we got
++ * the iolock to protect against other threads adding new pages while
++ * we were waiting for the iolock.
++ */
++ if (mapping->nrpages && *iolock == XFS_IOLOCK_SHARED) {
++ xfs_rw_iunlock(ip, *iolock);
++ *iolock = XFS_IOLOCK_EXCL;
++ xfs_rw_ilock(ip, *iolock);
++ }
++
+ if (mapping->nrpages) {
+- WARN_ON(*iolock != XFS_IOLOCK_EXCL);
+ ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
+ FI_REMAPF_LOCKED);
+ if (ret)
+@@ -812,7 +823,7 @@ xfs_file_buffered_aio_write(
+ size_t count = ocount;
+
+ *iolock = XFS_IOLOCK_EXCL;
+- xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
++ xfs_rw_ilock(ip, *iolock);
+
+ ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ if (ret)
+--
+1.7.7
+
+
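Editor's note: patches 4/9 and 5/9 apply the same iolock discipline - take the lock shared for the common case, upgrade to exclusive only when the page cache actually needs invalidating, and recheck the condition after the upgrade because it can change while the lock is dropped. Below is a minimal userspace sketch of that pattern, using a pthread rwlock as a stand-in for the XFS iolock; direct_io_read, cached_pages and invalidate_page_cache are illustrative names, not kernel APIs.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t iolock = PTHREAD_RWLOCK_INITIALIZER;
static int cached_pages;                 /* stands in for mapping->nrpages */

static void invalidate_page_cache(void)
{
        cached_pages = 0;
}

static void direct_io_read(void)
{
        pthread_rwlock_rdlock(&iolock);  /* shared lock: the common case */
        if (cached_pages) {
                /*
                 * Invalidation needs exclusive access.  pthreads cannot
                 * upgrade in place, so drop and retake the lock, then
                 * RECHECK the condition: it may have changed while the
                 * lock was not held (the point of the recheck in 5/9).
                 */
                pthread_rwlock_unlock(&iolock);
                pthread_rwlock_wrlock(&iolock);
                if (cached_pages)
                        invalidate_page_cache();
                /*
                 * Go back to shared mode for the IO itself; XFS does this
                 * atomically with xfs_rw_ilock_demote().
                 */
                pthread_rwlock_unlock(&iolock);
                pthread_rwlock_rdlock(&iolock);
        }
        /* ... submit the direct IO read under the shared lock ... */
        pthread_rwlock_unlock(&iolock);
}

int main(void)
{
        cached_pages = 3;
        direct_io_read();
        printf("cached pages after direct IO read: %d\n", cached_pages);
        return 0;
}
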
+Subject: [PATCH 6/9] xfs: Return -EIO when xfs_vn_getattr() failed
+
+An inode's attributes can be fetched via xfs_vn_getattr() in XFS.
+Currently it returns EIO, not a negative value, when it fails. As a
+result, the system call does not return a negative value even though
+an error occurred. The stat(2), ls and mv commands cannot handle this
+error and do not work correctly.
+
+This patch fixes the bug by returning -EIO, not EIO, when an error
+is detected in xfs_vn_getattr().
+
+Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu at hitachi.com>
+Reviewed-by: Christoph Hellwig <hch at lst.de>
+Signed-off-by: Alex Elder <aelder at sgi.com>
+---
+ fs/xfs/linux-2.6/xfs_iops.c | 2 +-
+ 1 files changed, 1 insertions(+), 1 deletions(-)
+
+diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
+index a9b3e1e..f5b697b 100644
+--- a/fs/xfs/linux-2.6/xfs_iops.c
++++ b/fs/xfs/linux-2.6/xfs_iops.c
+@@ -464,7 +464,7 @@ xfs_vn_getattr(
+ trace_xfs_getattr(ip);
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+- return XFS_ERROR(EIO);
++ return -XFS_ERROR(EIO);
+
+ stat->size = XFS_ISIZE(ip);
+ stat->dev = inode->i_sb->s_dev;
+--
+1.7.7
+
+
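Editor's note: patches 3/9 and 6/9 both touch the same convention - XFS-internal helpers historically return positive errnos, while VFS entry points such as ->getattr and ->write_inode must return negative errnos, which is why the shutdown checks gain a leading minus. A minimal userspace sketch of the two conventions; fs_shut_down, xfs_internal_op and vfs_facing_getattr are made-up names, and the one-line XFS_ERROR() is only a simplified stand-in for the kernel macro.

#include <errno.h>
#include <stdio.h>

/*
 * Simplified stand-in: in non-debug XFS builds XFS_ERROR() roughly
 * reduces to the plain errno value; the real macro lives in the kernel.
 */
#define XFS_ERROR(e)    (e)

static int fs_shut_down = 1;

/* XFS-internal convention: positive errno on failure. */
static int xfs_internal_op(void)
{
        if (fs_shut_down)
                return XFS_ERROR(EIO);
        return 0;
}

/*
 * VFS-facing convention: 0 on success, negative errno on failure.
 * This is the sign fix made in xfs_vn_getattr() and xfs_fs_write_inode().
 */
static int vfs_facing_getattr(void)
{
        if (fs_shut_down)
                return -XFS_ERROR(EIO);
        return 0;
}

int main(void)
{
        printf("internal: %d, VFS-facing: %d\n",
               xfs_internal_op(), vfs_facing_getattr());
        return 0;
}
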
================================================================
---- CVS-web:
http://cvs.pld-linux.org/cgi-bin/cvsweb.cgi/packages/kernel/kernel-small_fixes.patch?r1=1.43.2.5&r2=1.43.2.6&f=u