SOURCES (LINUX_2_6_11): inotify-2.6.12-rc3.patch (NEW) - recovered...

pluto pluto at pld-linux.org
Sun Jun 26 12:10:59 CEST 2005


Author: pluto                        Date: Sun Jun 26 10:10:59 2005 GMT
Module: SOURCES                       Tag: LINUX_2_6_11
---- Log message:
- recovered (deleted by accident).

---- Files affected:
SOURCES:
   inotify-2.6.12-rc3.patch (1.1.2.2.2.1 -> 1.1.2.2.2.2)  (NEW)

---- Diffs:

================================================================
Index: SOURCES/inotify-2.6.12-rc3.patch
diff -u /dev/null SOURCES/inotify-2.6.12-rc3.patch:1.1.2.2.2.2
--- /dev/null	Sun Jun 26 12:10:59 2005
+++ SOURCES/inotify-2.6.12-rc3.patch	Sun Jun 26 12:10:54 2005
@@ -0,0 +1,1958 @@
+Subject: [patch] latest inotify.
+From: Robert Love <rml at novell.com>
+
+Below is the latest inotify, against 2.6.12-rc3.
+
+Changes since the last post:
+
+	- Explicitly define IN_ALL_EVENTS
+	- Couple of bug fixes related to the recent changes
+	- Fix for Viro's race
+	- Add some rationale to the documentation
+	- Misc. cleanup and such
+
+Enjoy.
+
+	Robert Love
+
+
+inotify!
+
+inotify is intended to correct the deficiencies of dnotify, particularly
+its inability to scale and its terrible user interface:
+
+        * dnotify requires opening one fd for each directory
+          that you intend to watch. This quickly results in too many
+          open files and pins removable media, preventing unmount.
+        * dnotify is directory-based. You only learn about changes to
+          directories. Sure, a change to a file in a directory affects
+          the directory, but you are then forced to keep a cache of
+          stat structures.
+        * dnotify's interface to user-space is awful.  Signals?
+
+inotify provides a more usable, simple, powerful solution to file change
+notification:
+
+        * inotify's interface is a device node, not SIGIO.  You open a 
+          single fd to the device node, which is select()-able.
+        * inotify has an event that says "the filesystem that the item
+          you were watching is on was unmounted."
+        * inotify can watch directories or files.
+
+Inotify is currently used by Beagle (a desktop search infrastructure),
+Gamin (a FAM replacement), and other projects.
+
+Signed-off-by: Robert Love <rml at novell.com>
+
+ Documentation/filesystems/inotify.txt |  123 ++++
+ fs/Kconfig                            |   13 
+ fs/Makefile                           |    1 
+ fs/attr.c                             |   33 -
+ fs/compat.c                           |   12 
+ fs/file_table.c                       |    3 
+ fs/inode.c                            |    6 
+ fs/inotify.c                          |  971 ++++++++++++++++++++++++++++++++++
+ fs/namei.c                            |   30 -
+ fs/open.c                             |    4 
+ fs/read_write.c                       |   15 
+ fs/xattr.c                            |    5 
+ include/linux/fs.h                    |    6 
+ include/linux/fsnotify.h              |  230 ++++++++
+ include/linux/inotify.h               |  126 ++++
+ include/linux/sched.h                 |    4 
+ kernel/user.c                         |    4 
+ 17 files changed, 1530 insertions(+), 56 deletions(-)
+
+diff -urN linux-2.6.12-rc3/Documentation/filesystems/inotify.txt linux/Documentation/filesystems/inotify.txt
+--- linux-2.6.12-rc3/Documentation/filesystems/inotify.txt	1969-12-31 19:00:00.000000000 -0500
++++ linux/Documentation/filesystems/inotify.txt	2005-04-28 16:36:13.000000000 -0400
+@@ -0,0 +1,123 @@
++				    inotify
++	     a powerful yet simple file change notification system
++
++
++
++Document started 15 Mar 2005 by Robert Love <rml at novell.com>
++
++(i) User Interface
++
++Inotify is controlled by a device node, /dev/inotify.  If you do not use udev,
++this device may need to be created manually.  First, open it:
++
++	int dev_fd = open ("/dev/inotify", O_RDONLY);
++
++Change events are managed by "watches".  A watch is an (object,mask) pair where
++the object is a file or directory and the mask is a bitmask of one or more
++inotify events that the application wishes to receive.  See <linux/inotify.h>
++for valid events.  A watch is referenced by a watch descriptor, or wd.
++
++Watches are added by passing an open file descriptor for the object.
++
++Watches on a directory will return events on any files inside of the directory.
++
++Adding a watch is simple:
++
++	/* 'wd' represents the watch on fd with mask */
++	struct inotify_request req = { fd, mask };
++	int wd = ioctl (dev_fd, INOTIFY_WATCH, &req);
++
++You can add a large number of files via something like:
++
++	for each file to watch {
++		struct inotify_request req;
++		int file_fd;
++
++		file_fd = open (file, O_RDONLY);
++		if (file_fd < 0) {
++			perror ("open");
++			break;
++		}
++
++		req.fd = file_fd;
++		req.mask = mask;
++
++		wd = ioctl (dev_fd, INOTIFY_WATCH, &req);
++
++		close (file_fd);
++	}
++
++You can update an existing watch in the same manner, by passing in a new mask.
++
++An existing watch is removed via the INOTIFY_IGNORE ioctl, for example:
++
++	ioctl (dev_fd, INOTIFY_IGNORE, wd);
++
++Events are provided in the form of an inotify_event structure that is read(2)
++from /dev/inotify.  The filename is of dynamic length and follows the struct;
++its size in bytes is given by the len field.  The filename is padded with null
++bytes to ensure proper alignment.  This padding is reflected in len.
++
++You can slurp multiple events by passing a large buffer, for example:
++
++	ssize_t len = read (dev_fd, buf, BUF_LEN);
++
++The read returns as many events as are available and fit in BUF_LEN.
++
++/dev/inotify also supports select() and poll().
++
++You can find the size of the current event queue via the FIONREAD ioctl.
++
++All watches are destroyed and cleaned up on close.
++
++
++(ii) Internal Kernel Implementation
++
++Each open inotify device is associated with an inotify_device structure.
++
++Each watch is associated with an inotify_watch structure.  Watches are chained
++off of each associated device and each associated inode.
++
++See fs/inotify.c for the locking and lifetime rules.
++
++
++(iii) Rationale
++
++Q: What is the design decision behind not tying the watch to the
++open fd of the watched object?
++
++A: Watches are associated with an open inotify device, not an
++open file.  This solves the primary problem with dnotify:
++keeping the file open pins the file and thus, worse, pins the
++mount.  Dnotify is therefore infeasible for use on a desktop
++system with removable media as the media cannot be unmounted.
++
++Q: What is the design decision behind using an fd-per-device as
++opposed to an fd-per-watch?
++
++A: An fd-per-watch quickly consumes more file descriptors than
++are allowed, more fd's than are feasible to manage, and more
++fd's than are ideally select()-able.  Yes, root can bump the
++per-process fd limit and yes, users can use epoll, but requiring
++both is silly and an unnecessary burden.  A watch consumes
++less memory than an open file; separating the number spaces is
++thus sensible.  The current design is what user-space developers
++want: users open the device once and add n watches, requiring
++but one fd and no twiddling with fd limits.
++Opening /dev/inotify two thousand times is silly.  If we can
++implement user-space's preferences cleanly--and we can, the idr
++layer makes stuff like this trivial--then we should.
++
++Q: Why a device node?
++
++A: The second biggest problem with dnotify is that the user
++interface sucks ass.  Signals are a terrible, terrible interface
++for file notification.  Or for anything, for that matter.  The
++ideal solution, from all perspectives, is a file descriptor based
++one that allows basic file I/O and poll/select.  Obtaining the
++fd and managing the watches could have been done either via a
++device file or a family of new system calls.  We decided to
++implement a device file because adding three or four new system
++calls that mirrored open, close, and ioctl seemed silly.  A
++character device makes sense from user-space and was easy to
++implement inside of the kernel.
+diff -urN linux-2.6.12-rc3/fs/attr.c linux/fs/attr.c
+--- linux-2.6.12-rc3/fs/attr.c	2005-04-27 11:49:45.000000000 -0400
++++ linux/fs/attr.c	2005-04-27 11:50:41.000000000 -0400
+@@ -10,7 +10,7 @@
+ #include <linux/mm.h>
+ #include <linux/string.h>
+ #include <linux/smp_lock.h>
+-#include <linux/dnotify.h>
++#include <linux/fsnotify.h>
+ #include <linux/fcntl.h>
+ #include <linux/quotaops.h>
+ #include <linux/security.h>
+@@ -107,31 +107,8 @@
+ out:
+ 	return error;
+ }
+-
+ EXPORT_SYMBOL(inode_setattr);
+ 
+-int setattr_mask(unsigned int ia_valid)
+-{
+-	unsigned long dn_mask = 0;
+-
+-	if (ia_valid & ATTR_UID)
+-		dn_mask |= DN_ATTRIB;
+-	if (ia_valid & ATTR_GID)
+-		dn_mask |= DN_ATTRIB;
+-	if (ia_valid & ATTR_SIZE)
+-		dn_mask |= DN_MODIFY;
+-	/* both times implies a utime(s) call */
+-	if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
+-		dn_mask |= DN_ATTRIB;
+-	else if (ia_valid & ATTR_ATIME)
+-		dn_mask |= DN_ACCESS;
+-	else if (ia_valid & ATTR_MTIME)
+-		dn_mask |= DN_MODIFY;
+-	if (ia_valid & ATTR_MODE)
+-		dn_mask |= DN_ATTRIB;
+-	return dn_mask;
+-}
+-
+ int notify_change(struct dentry * dentry, struct iattr * attr)
+ {
+ 	struct inode *inode = dentry->d_inode;
+@@ -197,11 +174,9 @@
+ 	if (ia_valid & ATTR_SIZE)
+ 		up_write(&dentry->d_inode->i_alloc_sem);
+ 
+-	if (!error) {
+-		unsigned long dn_mask = setattr_mask(ia_valid);
+-		if (dn_mask)
+-			dnotify_parent(dentry, dn_mask);
+-	}
++	if (!error)
++		fsnotify_change(dentry, ia_valid);
++
+ 	return error;
+ }
+ 
+diff -urN linux-2.6.12-rc3/fs/compat.c linux/fs/compat.c
+--- linux-2.6.12-rc3/fs/compat.c	2005-04-27 11:49:46.000000000 -0400
++++ linux/fs/compat.c	2005-04-27 11:50:41.000000000 -0400
+@@ -37,7 +37,7 @@
+ #include <linux/ctype.h>
+ #include <linux/module.h>
+ #include <linux/dirent.h>
+-#include <linux/dnotify.h>
++#include <linux/fsnotify.h>
+ #include <linux/highuid.h>
+ #include <linux/sunrpc/svc.h>
+ #include <linux/nfsd/nfsd.h>
+@@ -1307,9 +1307,13 @@
+ out:
+ 	if (iov != iovstack)
+ 		kfree(iov);
+-	if ((ret + (type == READ)) > 0)
+-		dnotify_parent(file->f_dentry,
+-				(type == READ) ? DN_ACCESS : DN_MODIFY);
++	if ((ret + (type == READ)) > 0) {
++		struct dentry *dentry = file->f_dentry;
++		if (type == READ)
++			fsnotify_access(dentry);
++		else
++			fsnotify_modify(dentry);
++	}
+ 	return ret;
+ }
+ 
+diff -urN linux-2.6.12-rc3/fs/file_table.c linux/fs/file_table.c
+--- linux-2.6.12-rc3/fs/file_table.c	2005-03-02 02:37:47.000000000 -0500
++++ linux/fs/file_table.c	2005-04-27 11:50:41.000000000 -0400
+@@ -16,6 +16,7 @@
+ #include <linux/eventpoll.h>
+ #include <linux/mount.h>
+ #include <linux/cdev.h>
++#include <linux/fsnotify.h>
+ 
+ /* sysctl tunables... */
+ struct files_stat_struct files_stat = {
+@@ -123,6 +124,8 @@
+ 	struct inode *inode = dentry->d_inode;
+ 
+ 	might_sleep();
++
++	fsnotify_close(file);
+ 	/*
+ 	 * The function eventpoll_release() should be the first called
+ 	 * in the file cleanup chain.
+diff -urN linux-2.6.12-rc3/fs/inode.c linux/fs/inode.c
+--- linux-2.6.12-rc3/fs/inode.c	2005-04-27 11:49:46.000000000 -0400
++++ linux/fs/inode.c	2005-04-27 11:50:41.000000000 -0400
+@@ -21,6 +21,7 @@
+ #include <linux/pagemap.h>
+ #include <linux/cdev.h>
+ #include <linux/bootmem.h>
++#include <linux/inotify.h>
+ 
+ /*
+  * This is needed for the following functions:
+@@ -129,6 +130,10 @@
+ #ifdef CONFIG_QUOTA
+ 		memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
+ #endif
++#ifdef CONFIG_INOTIFY
++		INIT_LIST_HEAD(&inode->inotify_watches);
++		sema_init(&inode->inotify_sem, 1);
++#endif
+ 		inode->i_pipe = NULL;
+ 		inode->i_bdev = NULL;
+ 		inode->i_cdev = NULL;
+@@ -355,6 +360,7 @@
+ 
+ 	down(&iprune_sem);
+ 	spin_lock(&inode_lock);
++	inotify_unmount_inodes(&sb->s_inodes);
+ 	busy = invalidate_list(&sb->s_inodes, &throw_away);
+ 	spin_unlock(&inode_lock);
+ 
+diff -urN linux-2.6.12-rc3/fs/inotify.c linux/fs/inotify.c
+--- linux-2.6.12-rc3/fs/inotify.c	1969-12-31 19:00:00.000000000 -0500
++++ linux/fs/inotify.c	2005-04-28 16:31:13.000000000 -0400
+@@ -0,0 +1,971 @@
++/*
++ * fs/inotify.c - inode-based file event notifications
++ *
++ * Authors:
++ *	John McCutchan	<ttb at tentacle.dhs.org>
++ *	Robert Love	<rml at novell.com>
++ *
++ * Copyright (C) 2005 John McCutchan
++ *
++ * This program is free software; you can redistribute it and/or modify it
++ * under the terms of the GNU General Public License as published by the
++ * Free Software Foundation; either version 2, or (at your option) any
++ * later version.
++ *
++ * This program is distributed in the hope that it will be useful, but
++ * WITHOUT ANY WARRANTY; without even the implied warranty of
++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++ * General Public License for more details.
++ */
++
++#include <linux/module.h>
++#include <linux/kernel.h>
++#include <linux/sched.h>
++#include <linux/spinlock.h>
++#include <linux/idr.h>
++#include <linux/slab.h>
++#include <linux/fs.h>
++#include <linux/file.h>
++#include <linux/namei.h>
++#include <linux/poll.h>
++#include <linux/device.h>
++#include <linux/miscdevice.h>
++#include <linux/init.h>
++#include <linux/list.h>
++#include <linux/writeback.h>
++#include <linux/inotify.h>
++
++#include <asm/ioctls.h>
++
++static atomic_t inotify_cookie;
++
++static kmem_cache_t *watch_cachep;
++static kmem_cache_t *event_cachep;
++
++static int max_user_devices;
++static int max_user_watches;
++static unsigned int max_queued_events;
++
++/*
++ * Lock ordering:
++ *
++ * dentry->d_lock (used to keep d_move() away from dentry->d_parent)
++ * iprune_sem (synchronize versus shrink_icache_memory())
++ * 	inode_lock (protects the super_block->s_inodes list)
++ * 	inode->inotify_sem (protects inode->inotify_watches and watches->i_list)
++ * 		inotify_dev->sem (protects inotify_device and watches->d_list)
++ */
++
++/*
++ * Lifetimes of the three main data structures--inotify_device, inode, and
++ * inotify_watch--are managed by reference count.
++ *
++ * inotify_device: Lifetime is from open until release.  Additional references
++ * can bump the count via get_inotify_dev() and drop the count via
++ * put_inotify_dev().
++ *
++ * inotify_watch: Lifetime is from create_watch() to destroy_watch().
++ * Additional references can bump the count via get_inotify_watch() and drop
++ * the count via put_inotify_watch().
++ *
++ * inode: Pinned so long as the inode is associated with a watch, from
++ * create_watch() to put_inotify_watch().
++ */
++
++/*
++ * struct inotify_device - represents an open instance of an inotify device
++ *
++ * This structure is protected by the semaphore 'sem'.
++ */
++struct inotify_device {
++	wait_queue_head_t 	wq;		/* wait queue for i/o */
++	struct idr		idr;		/* idr mapping wd -> watch */
++	struct semaphore	sem;		/* protects this bad boy */	
++	struct list_head 	events;		/* list of queued events */
++	struct list_head	watches;	/* list of watches */
++	atomic_t		count;		/* reference count */
++	struct user_struct	*user;		/* user who opened this dev */
++	unsigned int		queue_size;	/* size of the queue (bytes) */
++	unsigned int		event_count;	/* number of pending events */
++	unsigned int		max_events;	/* maximum number of events */
++};
++
++/*
++ * struct inotify_kernel_event - An inotify event, originating from a watch and
++ * queued for user-space.  A list of these is attached to each instance of the
++ * device.  In read(), this list is walked and all events that can fit in the
++ * buffer are returned.
++ *
++ * Protected by dev->sem of the device in which we are queued.
++ */
++struct inotify_kernel_event {
++	struct inotify_event	event;	/* the user-space event */
++	struct list_head        list;	/* entry in inotify_device's list */
++	char			*name;	/* filename, if any */
++};
++
++/*
++ * struct inotify_watch - represents a watch request on a specific inode
++ *
++ * d_list is protected by dev->sem of the associated watch->dev.
++ * i_list and mask are protected by inode->inotify_sem of the associated inode.
++ * dev, inode, and wd are never written to once the watch is created.
++ */
++struct inotify_watch {
++	struct list_head	d_list;	/* entry in inotify_device's list */
++	struct list_head	i_list;	/* entry in inode's list */
++	atomic_t		count;	/* reference count */
++	struct inotify_device	*dev;	/* associated device */
++	struct inode		*inode;	/* associated inode */
++	s32 			wd;	/* watch descriptor */
++	u32			mask;	/* event mask for this watch */
++};
++
++static ssize_t show_max_queued_events(struct class_device *class, char *buf)
++{
++	return sprintf(buf, "%u\n", max_queued_events);
++}
++
++static ssize_t store_max_queued_events(struct class_device *class,
++				       const char *buf, size_t count)
++{
++	unsigned int max;
++
++	if (sscanf(buf, "%u", &max) > 0 && max > 0) {
++		max_queued_events = max;
++		return strlen(buf);
++	}
++	return -EINVAL;
++}
++
++static ssize_t show_max_user_devices(struct class_device *class, char *buf)
++{
++	return sprintf(buf, "%d\n", max_user_devices);
++}
++
++static ssize_t store_max_user_devices(struct class_device *class,
++				      const char *buf, size_t count)
++{
++	int max;
++
++	if (sscanf(buf, "%d", &max) > 0 && max > 0) {
++		max_user_devices = max;
++		return strlen(buf);
++	}
++	return -EINVAL;
++}
++
++static ssize_t show_max_user_watches(struct class_device *class, char *buf)
++{
++	return sprintf(buf, "%d\n", max_user_watches);
++}
++
++static ssize_t store_max_user_watches(struct class_device *class,
++				      const char *buf, size_t count)
++{
++	int max;
++
++	if (sscanf(buf, "%d", &max) > 0 && max > 0) {
++		max_user_watches = max;
++		return strlen(buf);
++	}
++	return -EINVAL;
++}
++
++static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
++			 show_max_queued_events, store_max_queued_events);
++static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
++			 show_max_user_devices, store_max_user_devices);
++static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
++			 show_max_user_watches, store_max_user_watches);
++
++static inline void get_inotify_dev(struct inotify_device *dev)
++{
++	atomic_inc(&dev->count);
++}
++
++static inline void put_inotify_dev(struct inotify_device *dev)
++{
++	if (atomic_dec_and_test(&dev->count)) {
++		atomic_dec(&dev->user->inotify_devs);
++		free_uid(dev->user);
++		kfree(dev);
++	}
++}
++
++static inline void get_inotify_watch(struct inotify_watch *watch)
++{
++	atomic_inc(&watch->count);
++}
++
++/*
++ * put_inotify_watch - decrements the ref count on a given watch.  cleans up
++ * the watch and its references if the count reaches zero.
++ */
++static inline void put_inotify_watch(struct inotify_watch *watch)
++{
++	if (atomic_dec_and_test(&watch->count)) {
++		put_inotify_dev(watch->dev);
++		iput(watch->inode);
++		kmem_cache_free(watch_cachep, watch);
++	}
++}
++
++/*
++ * kernel_event - create a new kernel event with the given parameters
++ *
++ * This function can sleep.
++ */
++static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
++						  const char *name)
++{
++	struct inotify_kernel_event *kevent;
++
++	kevent = kmem_cache_alloc(event_cachep, GFP_KERNEL);
++	if (unlikely(!kevent))
++		return NULL;
++
++	/* we hand this out to user-space, so zero it just in case */
++	memset(&kevent->event, 0, sizeof(struct inotify_event));
++
++	kevent->event.wd = wd;
++	kevent->event.mask = mask;
++	kevent->event.cookie = cookie;
++
++	INIT_LIST_HEAD(&kevent->list);
++
++	if (name) {
++		size_t len, rem, event_size = sizeof(struct inotify_event);
++
++		/*
++		 * We need to pad the filename so as to properly align an
++		 * array of inotify_event structures.  Because the structure is
++		 * small and the common case is a small filename, we just round
++		 * up to the next multiple of the structure's sizeof.  This is
++		 * simple and safe for all architectures.
++		 */
++		len = strlen(name) + 1;
++		rem = event_size - len;
++		if (len > event_size) {
++			rem = event_size - (len % event_size);
++			if (len % event_size == 0)
++				rem = 0;
++		}
++		len += rem;
++
++		kevent->name = kmalloc(len, GFP_KERNEL);
++		if (unlikely(!kevent->name)) {
<<Diff was trimmed, longer than 597 lines>>

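The filename padding rule in kernel_event() is easier to follow with concrete numbers. The sketch below copies that arithmetic into user space and checks it against a plain round-up to a multiple of the header size. It assumes struct inotify_event is four 32-bit fields (wd, mask, cookie, len) followed by the name, which gives a 16-byte header on common ABIs. That layout is an assumption because include/linux/inotify.h falls in the trimmed part of the diff. struct inotify_event_sketch, padded_name_len() and round_up_len() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the assumed event layout (not shown in the
 * trimmed diff): four 32-bit fields, then the variable-length name. */
struct inotify_event_sketch {
	int32_t  wd;
	uint32_t mask;
	uint32_t cookie;
	uint32_t len;
	char     name[];
};

/* Same steps as kernel_event(): name plus NUL, padded with 'rem' bytes. */
static size_t padded_name_len(const char *name)
{
	size_t event_size = sizeof(struct inotify_event_sketch);
	size_t len, rem;

	assert(name != NULL);
	len = strlen(name) + 1;
	rem = event_size - len;	/* wraps if len > event_size; reset below */
	if (len > event_size) {
		rem = event_size - (len % event_size);
		if (len % event_size == 0)
			rem = 0;
	}
	return len + rem;
}

/* Closed form of the same rule: round len up to a multiple of the header. */
static size_t round_up_len(const char *name)
{
	size_t event_size = sizeof(struct inotify_event_sketch);
	size_t len = strlen(name) + 1;

	return (len + event_size - 1) / event_size * event_size;
}
```

With a 16-byte header, a name of up to 15 characters (16 bytes with its NUL) fills one 16-byte slot, and a 16-character name takes 32 bytes. As inotify.txt notes, len reports this padded size.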

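inotify.txt explains that a single read(2) on /dev/inotify can return several events, each a header followed by len bytes of padded name. A reader therefore advances through the buffer by sizeof(header) + len. Below is a minimal parsing sketch. It assumes the header is four 32-bit fields (wd, mask, cookie, len), because the real definition is in the trimmed part of the diff. struct inotify_event_sketch and next_event() are hypothetical, and the buffer would be whatever read() returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the assumed event header. */
struct inotify_event_sketch {
	int32_t  wd;		/* watch descriptor */
	uint32_t mask;		/* event bits */
	uint32_t cookie;	/* event cookie */
	uint32_t len;		/* bytes of padded name that follow */
};

/*
 * Decode the record at *off in a buffer filled by one read().
 * Returns 1 and advances *off past the record, 0 at the end of the
 * buffer, or -1 if the buffer ends partway through a record.
 * *name is NULL when the event carries no filename (len == 0).
 */
static int next_event(const unsigned char *buf, size_t nread, size_t *off,
		      struct inotify_event_sketch *ev, const char **name)
{
	assert(buf != NULL && off != NULL && ev != NULL && name != NULL);

	if (*off >= nread)
		return 0;
	if (nread - *off < sizeof(*ev))
		return -1;			/* truncated header */
	/* memcpy() avoids assuming the buffer is suitably aligned */
	memcpy(ev, buf + *off, sizeof(*ev));
	if (nread - *off - sizeof(*ev) < ev->len)
		return -1;			/* truncated name */
	*name = ev->len ? (const char *)(buf + *off + sizeof(*ev)) : NULL;
	*off += sizeof(*ev) + ev->len;
	return 1;
}
```

After each read(), a caller would start with off = 0 and loop while next_event() returns 1. Copying the header out with memcpy() means the sketch does not depend on the kernel's padding to keep records aligned.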

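The lifetime comment in fs/inotify.c says that a watch pins its device and its inode. Below is a user-space model of that get/put pattern, with C11 atomics in place of atomic_t. put_inotify_watch() drops a device reference, which implies that create_watch() takes one. create_watch() is in the trimmed part of the diff, so that step is assumed here, and the inode reference (iput()) is omitted. All names are hypothetical. freed_devs exists only so a test can observe when the device is freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical counter so the order of frees can be observed. */
static int freed_devs;

struct dev_sketch {
	atomic_int count;
};

struct watch_sketch {
	atomic_int count;
	struct dev_sketch *dev;	/* each watch holds a device reference */
};

static void get_dev(struct dev_sketch *dev)
{
	atomic_fetch_add(&dev->count, 1);
}

static void put_dev(struct dev_sketch *dev)
{
	/* atomic_fetch_sub() returns the old value; 1 means last reference */
	int old = atomic_fetch_sub(&dev->count, 1);

	assert(old > 0);
	if (old == 1) {
		free(dev);
		freed_devs++;
	}
}

static struct dev_sketch *open_dev(void)
{
	struct dev_sketch *dev = malloc(sizeof(*dev));

	if (dev)
		atomic_init(&dev->count, 1);	/* reference held by the open file */
	return dev;
}

static struct watch_sketch *create_watch(struct dev_sketch *dev)
{
	struct watch_sketch *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	atomic_init(&w->count, 1);
	get_dev(dev);		/* assumed: the watch pins its device */
	w->dev = dev;
	return w;
}

static void put_watch(struct watch_sketch *w)
{
	int old = atomic_fetch_sub(&w->count, 1);

	assert(old > 0);
	if (old == 1) {
		put_dev(w->dev);	/* as in put_inotify_watch() */
		free(w);
	}
}
```

Releasing the device drops only the open-file reference. Any watch reference still outstanding keeps the device allocated until the final put_watch(), which mirrors put_inotify_watch() calling put_inotify_dev().
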
More information about the pld-cvs-commit mailing list