SOURCES: kernel-desktop-tuxonice.patch (NEW) - raw uncompressed from http://www.tuxonice.net/downloads/all/tuxonice-3.0-rc5-for-2.6.24.patch.bz2
glen
glen at pld-linux.org
Tue Apr 8 23:13:47 CEST 2008
Author: glen Date: Tue Apr 8 21:13:46 2008 GMT
Module: SOURCES Tag: HEAD
---- Log message:
- raw uncompressed from http://www.tuxonice.net/downloads/all/tuxonice-3.0-rc5-for-2.6.24.patch.bz2
---- Files affected:
SOURCES:
kernel-desktop-tuxonice.patch (NONE -> 1.1) (NEW)
---- Diffs:
================================================================
Index: SOURCES/kernel-desktop-tuxonice.patch
diff -u /dev/null SOURCES/kernel-desktop-tuxonice.patch:1.1
--- /dev/null Tue Apr 8 23:13:46 2008
+++ SOURCES/kernel-desktop-tuxonice.patch Tue Apr 8 23:13:41 2008
@@ -0,0 +1,19998 @@
+diff --git a/Documentation/power/tuxonice-internals.txt b/Documentation/power/tuxonice-internals.txt
+new file mode 100644
+index 0000000..2247939
+--- /dev/null
++++ b/Documentation/power/tuxonice-internals.txt
+@@ -0,0 +1,469 @@
++ TuxOnIce 2.2 Internal Documentation.
++ Updated to 18 September 2007
++
++1. Introduction.
++
++ TuxOnIce 2.2 is an addition to the Linux Kernel, designed to
++ allow the user to quickly shut down and quickly boot a computer, without
++ needing to close documents or programs. It is equivalent to the
++ hibernate facility in some laptops. This implementation, however,
++ requires no special BIOS or hardware support.
++
++ The code in these files is based upon the original implementation
++ prepared by Gabor Kuti and additional work by Pavel Machek and a
++ host of others. This code has been substantially reworked by Nigel
++ Cunningham, again with the help and testing of many others, not the
++ least of whom is Michael Frank. At its heart, however, the operation is
++ essentially the same as Gabor's version.
++
++2. Overview of operation.
++
++ The basic sequence of operations is as follows:
++
++ a. Quiesce all other activity.
++ b. Ensure enough memory and storage space are available, and attempt
++ to free memory/storage if necessary.
++ c. Allocate the required memory and storage space.
++ d. Write the image.
++ e. Power down.
++
++ There are a number of complicating factors which mean that things are
++ not as simple as the above would imply, however...
++
++ o The activity of each process must be stopped at a point where it will
++ not be holding locks necessary for saving the image, or unexpectedly
++ restart operations due to something like a timeout and thereby make
++ our image inconsistent.
++
++ o It is desirable that we sync outstanding I/O to disk before calculating
++ image statistics. This reduces corruption if one should suspend but
++ then not resume, and also makes later parts of the operation safer (see
++ below).
++
++ o We need to get as close as we can to an atomic copy of the data.
++ Inconsistencies in the image will result in inconsistent memory contents at
++ resume time, and thus in instability of the system and/or file system
++ corruption. This would appear to imply a maximum image size of one half of
++ the amount of RAM, but we have a solution... (again, below).
++
++ o In 2.6, we choose to play nicely with the other suspend-to-disk
++ implementations.
++
++3. Detailed description of internals.
++
++ a. Quiescing activity.
++
++ Safely quiescing the system is achieved using three separate but related
++ aspects.
++
++ First, we note that the vast majority of processes don't need to run during
++ suspend. They can be 'frozen'. We therefore implement a refrigerator
++ routine, which processes enter and in which they remain until the cycle is
++ complete. Processes enter the refrigerator via try_to_freeze() invocations
++ at appropriate places. A process cannot be frozen in any old place. It
++ must not be holding locks that will be needed for writing the image or
++ freezing other processes. For this reason, userspace processes generally
++ enter the refrigerator via the signal handling code, and kernel threads at
++ the place in their event loops where they drop locks and yield to other
++ processes or sleep.
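++
++ For example, the event loop of a kernel thread typically includes a call
++ like the following (an illustrative sketch of the common pattern; do_work
++ and the timeout are hypothetical):
++
++	while (!kthread_should_stop()) {
++		do_work();
++		try_to_freeze();	/* enter the refrigerator if a freeze is underway */
++		schedule_timeout_interruptible(HZ);
++	}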
++
++ The task of freezing processes is complicated by the fact that there can be
++ interdependencies between processes. Freezing process A before process B may
++ mean that process B cannot be frozen, because it blocks waiting for
++ process A instead of entering the refrigerator. This issue is seen where
++ userspace waits on freezeable kernel threads or fuse filesystem threads. To
++ address this issue, we implement the following algorithm for quiescing
++ activity:
++
++ - Freeze filesystems (including fuse - userspace programs starting
++ new requests are immediately frozen; programs already running
++ requests complete their work before being frozen in the next
++ step)
++ - Freeze userspace
++ - Thaw filesystems (this is safe now that userspace is frozen and no
++ fuse requests are outstanding).
++ - Invoke sys_sync (noop on fuse).
++ - Freeze filesystems
++ - Freeze kernel threads
++
++ If we need to free memory, we thaw kernel threads and filesystems, but not
++ userspace. We can then free caches without worrying about deadlocks due to
++ swap files being on frozen filesystems or such like.
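++
++ In pseudocode, the whole sequence looks like this (the helper names are
++ illustrative, not the actual kernel symbols):
++
++	freeze_filesystems();		/* new fuse requests block immediately */
++	freeze_userspace();
++	thaw_filesystems();		/* safe: no fuse requests outstanding */
++	sys_sync();			/* noop on fuse */
++	freeze_filesystems();
++	freeze_kernel_threads();
++
++	if (need_to_free_memory()) {
++		thaw_kernel_threads();
++		thaw_filesystems();	/* userspace stays frozen */
++		free_caches();
++	}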
++
++ b. Ensure enough memory & storage are available.
++
++ We have a number of constraints to meet in order to be able to successfully
++ suspend and resume.
++
++ First, the image will be written in two parts, described below. One of these
++ parts needs to have an atomic copy made, which of course implies a maximum
++ size of one half of the amount of system memory. The other part ('pageset')
++ is not atomically copied, and can therefore be as large or small as desired.
++
++ Second, we have constraints on the amount of storage available. In these
++ calculations, we may also consider any compression that will be done. The
++ cryptoapi module allows the user to configure an expected compression ratio.
++
++ Third, the user can specify an arbitrary limit on the image size, in
++ megabytes. This limit is treated as a soft limit, so that we don't fail the
++ attempt to suspend if we cannot meet this constraint.
++
++ c. Allocate the required memory and storage space.
++
++ Having done the initial freeze, we determine whether the above constraints
++ are met, and seek to allocate the metadata for the image. If the constraints
++ are not met, or we fail to allocate the required space for the metadata, we
++ seek to free the amount of memory that we calculate is needed and try again.
++ We allow up to four iterations of this loop before aborting the cycle. If we
++ do fail, it should only be because of a bug in TuxOnIce's calculations.
++
++ These steps are merged together in the prepare_image function, found in
++ prepare_image.c. The functions are merged because of the cyclical nature
++ of the problem of calculating how much memory and storage is needed. Since
++ the data structures containing the information about the image must
++ themselves take memory and use storage, the amount of memory and storage
++ required changes as we prepare the image. Since the changes are not large,
++ only one or two iterations will be required to achieve a solution.
++
++ The recursive nature of the algorithm is minimised by keeping user space
++ frozen while preparing the image, and by the fact that our records of which
++ pages are to be saved, and of which pageset they are saved in, use bitmaps (so
++ that changes in the number or fragmentation of the pages to be saved don't
++ feed back via changes in the amount of memory needed for metadata). The
++ recursiveness is thus limited to any extra slab pages allocated to store the
++ extents that record storage used, and the effects of seeking to free memory.
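++
++ The convergence loop described above can be sketched as follows (purely
++ illustrative; the helpers named here are not the real prepare_image
++ functions):
++
++	int tries;
++
++	for (tries = 0; tries < 4; tries++) {
++		if (constraints_met() && allocate_metadata() == 0)
++			break;			/* ready to write the image */
++		free_more_memory();		/* thaw kthreads/fs, shrink caches */
++	}
++
++	if (tries == 4)
++		abort_cycle();	/* should only happen given a calculation bug */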
++
++ d. Write the image.
++
++ We previously mentioned the need to create an atomic copy of the data, and
++ the half-of-memory limitation that is implied in this. This limitation is
++ circumvented by dividing the memory to be saved into two parts, called
++ pagesets.
++
++ Pageset2 contains the page cache - the pages on the active and inactive
++ lists. These pages aren't needed or modified while TuxOnIce is running, so
++ they can be safely written without an atomic copy. They are therefore
++ saved first and reloaded last. While saving these pages, TuxOnIce carefully
++ ensures that the work of writing the pages doesn't make the image
++ inconsistent.
++
++ Once pageset2 has been saved, we prepare to do the atomic copy of remaining
++ memory. As part of the preparation, we power down drivers, thereby providing
++ them with the opportunity to have their state recorded in the image. The
++ amount of memory allocated by drivers for this is usually negligible, but if
++ DRI is in use, video drivers may require significant amounts. Ideally we
++ would be able to query drivers while preparing the image as to the amount of
++ memory they will need. Unfortunately no such mechanism exists at the time of
++ writing. For this reason, TuxOnIce allows the user to set an
++ 'extra_pages_allowance', which is used to seek to ensure sufficient memory
++ is available for drivers at this point. TuxOnIce also lets the user set this
++ value to 0. In this case, a test driver suspend is done while preparing the
++ image, and the difference (plus a margin) is used instead.
++
++ Having suspended the drivers, we save the CPU context before making an
++ atomic copy of pageset1, resuming the drivers and saving the atomic copy.
++ After saving the two pagesets, we just need to save our metadata before
++ powering down.
++
++ As we mentioned earlier, the contents of pageset2 pages aren't needed once
++ they've been saved. We therefore use them as the destination of our atomic
++ copy. In the unlikely event that pageset1 is larger, extra pages are
++ allocated while the image is being prepared. This is normally only a real
++ possibility when the system has just been booted and the page cache is
++ small.
++
++ This is where we need to be careful about syncing, however. Pageset2 will
++ probably contain filesystem meta data. If this is overwritten with pageset1
++ and then a sync occurs, the filesystem will be corrupted - at least until
++ resume time and another sync of the restored data. Since there is a
++ possibility that the user might not resume or (may it never be!) that
++ suspend might oops, we do our utmost to avoid syncing filesystems after
++ copying pageset1.
++
++ e. Power down.
++
++ Powering down uses standard kernel routines. TuxOnIce supports powering down
++ using the ACPI S3, S4 and S5 methods or the kernel's non-ACPI power-off.
++ Supporting suspend to RAM (S3) as a power-off option might sound strange,
++ but it allows the user to quickly get their system up and running again if
++ the battery doesn't run out (we just need to re-read the overwritten pages);
++ if the battery does run out (or the user removes power), they can still
++ resume.
++
++4. Data Structures.
++
++ TuxOnIce uses three main structures to store its metadata and configuration
++ information:
++
++ a) Pageflags bitmaps.
++
++ TuxOnIce records which pages will be in pageset1, pageset2, the destination
++ of the atomic copy and the source of the atomically restored image using
++ bitmaps. These bitmaps are created from order zero allocations to maximise
++ reliability. The individual pages are combined together with pointers to
++ form per-zone bitmaps, which are in turn combined with another layer of
++ pointers to construct the overall bitmap.
++
++ The pageset1 bitmap is thus easily stored in the image header for use at
++ resume time.
++
++ As mentioned above, using bitmaps also means that the amount of memory and
++ storage required for recording the above information is constant. This
++ greatly simplifies the work of preparing the image. In earlier versions of
++ TuxOnIce, extents were used to record which pages would be stored. In that
++ case, however, freeing memory could result in greater fragmentation of the
++ lists of pages, which in turn required more memory to store the extents and
++ more storage in the image header. These could in turn require further
++ freeing of memory, and another iteration. All of this complexity is removed
++ by having bitmaps.
++
++ Bitmaps also make a lot of sense because TuxOnIce only ever iterates
++ through the lists. There is therefore no cost to not being able to find the
++ nth page in O(1) time. We only need to worry about the cost of finding
++ the n+1th page, given the location of the nth page. Bitwise optimisations
++ help here.
++
++ The data structure is: unsigned long ***.
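++
++ For illustration, testing whether page n of a zone is marked might look
++ like this (a sketch; the real accessors differ):
++
++	#define BITS_PER_PAGE	(PAGE_SIZE * 8)
++
++	int page_is_marked(unsigned long ***bitmap, int zone, unsigned long n)
++	{
++		/* bitmap[zone] is an array of pointers to order-zero
++		 * pages, each holding BITS_PER_PAGE bits. */
++		return test_bit(n % BITS_PER_PAGE,
++				bitmap[zone][n / BITS_PER_PAGE]);
++	}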
++
++ b) Extents for block data.
++
++ TuxOnIce supports writing the image to multiple block devices. In the case
++ of swap, multiple partitions and/or files may be in use, and we happily use
++ them all. This is accomplished as follows:
++
++ Whatever the actual source of the allocated storage, the destination of the
++ image can be viewed in terms of one or more block devices, and on each
++ device, a list of sectors. To simplify matters, we only use contiguous,
++ PAGE_SIZE aligned sectors, like the swap code does.
++
++ Since sector numbers on each bdev may well not start at 0, it makes much
++ more sense to use extents here. Contiguous ranges of pages can thus be
++ represented in the extents by contiguous values.
++
++ Variations in block size are taken account of in transforming this data
++ into the parameters for bio submission.
++
++ We can thus implement a layer of abstraction wherein the core of TuxOnIce
++ doesn't have to worry about which device we're currently writing to or
++ where in the device we are. It simply requests that the next page in the
++ pageset or header be written, leaving the details to this lower layer.
++ The lower layer remembers where in the sequence of devices and blocks each
++ pageset starts. The header always starts at the beginning of the allocated
++ storage.
++
++ So extents are:
++
++ struct extent {
++ unsigned long minimum, maximum;
++ struct extent *next;
++ }
++
++ These are combined into chains of extents for a device:
++
++ struct extent_chain {
++ int size; /* size of the chain, i.e. sum of (max - min + 1) */
++ int allocs, frees;
++ char *name;
++ struct extent *first, *last_touched;
++ };
++
++ For each bdev, we need to store a little more info:
++
++ struct suspend_bdev_info {
++ struct block_device *bdev;
++ dev_t dev_t;
++ int bmap_shift;
++ int blocks_per_page;
++ };
++
++ The dev_t is used to identify the device in the stored image. As a result,
++ we expect devices at resume time to have the same major and minor numbers
++ as they had while suspending. This is primarily a concern where the user
++ utilises LVM for storage, as they will need to dmsetup their partitions in
++ such a way as to maintain this consistency at resume time.
++
++ bmap_shift and blocks_per_page record the effects of variations in
++ blocks-per-page settings for the filesystem and underlying bdev. For most
++ filesystems, these are the same, but for xfs, they can have independent
++ values.
++
++ Combining these two structures together, we have everything we need to
++ record what devices and what blocks on each device are being used to
++ store the image, and to submit I/O using submit_bio.
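++
++ As an illustration, the conversion from a filesystem block number to the
++ sector number handed to submit_bio might look like this (one plausible
++ reading of the fields; the real helpers differ):
++
++	sector_t block_to_sector(struct suspend_bdev_info *info,
++				 unsigned long block)
++	{
++		/* bmap_shift converts filesystem blocks to 512-byte
++		 * sectors; blocks_per_page says how many such blocks
++		 * make up one image page. */
++		return (sector_t)block << info->bmap_shift;
++	}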
++
++ The last elements in the picture are a means of recording how the storage
++ is being used.
++
++ We do this first and foremost by implementing a layer of abstraction on
++ top of the devices and extent chains which allows us to view however many
++ devices there might be as one long storage tape, with a single 'head' that
++ tracks a 'current position' on the tape:
++
++ struct extent_iterate_state {
++ struct extent_chain *chains;
++ int num_chains;
++ int current_chain;
++ struct extent *current_extent;
++ unsigned long current_offset;
++ };
++
++ That is, *chains points to an array of size num_chains of extent chains.
++ For the filewriter, this is always a single chain. For the swapwriter, the
++ array is of size MAX_SWAPFILES.
++
++ current_chain, current_extent and current_offset thus point to the current
++ index in the chains array (and into a matching array of struct
++ suspend_bdev_info), the current extent in that chain (to optimise access),
++ and the current value in the offset.
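++
++ Advancing the 'tape head' to the next allocated page can then be sketched
++ like this (illustrative only; not the actual TuxOnIce routine):
++
++	/* Returns the next offset, or 0 when storage is exhausted. */
++	unsigned long next_offset(struct extent_iterate_state *s)
++	{
++		if (s->current_offset < s->current_extent->maximum)
++			return ++s->current_offset;
++
++		s->current_extent = s->current_extent->next;
++		while (!s->current_extent &&
++		       ++s->current_chain < s->num_chains)
++			s->current_extent = s->chains[s->current_chain].first;
++
++		if (!s->current_extent)
++			return 0;
++
++		s->current_offset = s->current_extent->minimum;
++		return s->current_offset;
++	}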
++
++ The image is divided into three parts:
++ - The header
++ - Pageset 1
++ - Pageset 2
++
++ The header always starts at the first device and first block. We know its
++ size before we begin to save the image because we carefully account for
++ everything that will be stored in it.
++
++ The second pageset (LRU) is stored first. It begins on the next page after
++ the end of the header.
++
++ The first pageset is stored second. Its start location is only known once
++ pageset2 has been saved, since pageset2 may be compressed as it is written.
++ This location is thus recorded at the end of saving pageset2. It, too, is
++ page aligned.
++
++ Since this information is needed at resume time, and the location of extents
++ in memory will differ at resume time, this needs to be stored in a portable
++ way:
++
++ struct extent_iterate_saved_state {
++ int chain_num;
++ int extent_num;
++ unsigned long offset;
++ };
++
++ We can thus implement a layer of abstraction wherein the core of TuxOnIce
++ doesn't have to worry about which device we're currently writing to or
++ where in the device we are. It simply requests that the next page in the
++ pageset or header be written, leaving the details to this layer, and
++ invokes the routines to remember and restore the position, without having
++ to worry about the details of how the data is arranged on disk or such like.
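++
++ Saving the position is then a matter of converting the chain and extent
++ pointers into indices (a sketch under the same caveats as above):
++
++	void save_position(struct extent_iterate_state *s,
++			   struct extent_iterate_saved_state *saved)
++	{
++		struct extent *e = s->chains[s->current_chain].first;
++		int n = 0;
++
++		while (e != s->current_extent) {
++			e = e->next;
++			n++;
++		}
++
++		saved->chain_num = s->current_chain;
++		saved->extent_num = n;
++		saved->offset = s->current_offset;
++	}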
++
++ c) Modules
++
++ One aim in designing TuxOnIce was to make it flexible. We wanted to allow
++ for the implementation of different methods of transforming a page to be
++ written to disk and different methods of getting the pages stored.
++
++ In early versions (the betas and perhaps Suspend1), compression support was
++ inlined in the image writing code, and the data structures and code for
++ managing swap were intertwined with the rest of the code. A number of people
++ had expressed interest in implementing image encryption, and alternative
++ methods of storing the image.
++
++ In order to achieve this, TuxOnIce was given a modular design.
++
++ A module is a single file which encapsulates the functionality needed
++ to transform a pageset of data (encryption or compression, for example),
++ or to write the pageset to a device. The former type of module is called
++ a 'page-transformer', the latter a 'writer'.
++
++ Modules are linked together in pipeline fashion. There may be zero or more
++ page transformers in a pipeline, and there is always exactly one writer.
++ The pipeline follows this pattern:
++
++ ---------------------------------
++ | TuxOnIce Core |
++ ---------------------------------
++ |
++ |
++ ---------------------------------
++ | Page transformer 1 |
++ ---------------------------------
++ |
++ |
++ ---------------------------------
++ | Page transformer 2 |
++ ---------------------------------
++ |
++ |
++ ---------------------------------
++ | Writer |
++ ---------------------------------
++
++ During the writing of an image, the core code feeds pages one at a time
++ to the first module. This module performs whatever transformations it
++ implements on the incoming data, completely consuming the incoming data and
++ feeding output in a similar manner to the next module. A module may buffer
++ its output.
++
++ During reading, the pipeline works in the reverse direction. The core code
++ calls the first module with the address of a buffer which should be filled.
++ (Note that the buffer size is always PAGE_SIZE at this time). This module
++ will in turn request data from the next module and so on down until the
++ writer is made to read from the stored image.
++
++ Part of the definition of the structure of a module thus looks like this:
++
++ int (*rw_init) (int rw, int stream_number);
++ int (*rw_cleanup) (int rw);
++ int (*write_chunk) (struct page *buffer_page);
++ int (*read_chunk) (struct page *buffer_page, int sync);
++
++ It should be noted that the _cleanup routine may be called before the
++ full stream of data has been read or written. While writing the image,
++ the user may (depending upon settings) choose to abort suspending, and
++ if we are in the midst of writing the last portion of the image, a portion
++ of the second pageset may be reread. This may also happen if an error
++ occurs and we seek to abort the process of writing the image.
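++
++ For illustration, a pass-through page transformer might implement
++ write_chunk like this (a sketch; next_module and its wiring are
++ hypothetical, not the real pipeline mechanism):
++
++	static int null_write_chunk(struct page *buffer_page)
++	{
++		/* A real transformer would compress or encrypt the
++		 * page contents here before passing them on. */
++		return next_module->write_chunk(buffer_page);
++	}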
++
++ The modular design is also useful in a number of other ways. It provides
++ a means whereby we can add support for:
++
++ - providing overall initialisation and cleanup routines;
++ - serialising configuration information in the image header;
++ - providing debugging information to the user;
++ - determining memory and image storage requirements;
++ - dis/enabling components at run-time;
++ - configuring the module (see below);
++
++ ...and routines for writers specific to their work:
++ - Parsing a resume= location;
++ - Determining whether an image exists;
++ - Marking a resume as having been attempted;
++ - Invalidating an image;
++
++ Since some parts of the core - the user interface and storage manager
++ support - have use for some of these functions, they are registered as
++ 'miscellaneous' modules as well.
++
++ d) Sysfs data structures.
++
++ This brings us naturally to support for configuring TuxOnIce. We desired to
++ provide a way to make TuxOnIce as flexible and configurable as possible.
++ The user shouldn't have to reboot just because they now want to suspend to
++ a file instead of a partition, for example.
++
++ To accomplish this, TuxOnIce implements a very generic means whereby the
++ core and modules can register new sysfs entries. All TuxOnIce entries use
++ a single _store and _show routine, both of which are found in sysfs.c in
++ the kernel/power directory. These routines handle the most common operations
++ - getting and setting the values of bits, integers, longs, unsigned longs
++ and strings in one place, and allow overrides for customised get and set
++ options as well as side-effect routines for all reads and writes.
++
++ When combined with some simple macros, a new sysfs entry can then be defined
++ in just a couple of lines:
++
++ { TOI_ATTR("progress_granularity", SYSFS_RW),
++ SYSFS_INT(&progress_granularity, 1, 2048)
++ },
++
++ This defines a sysfs entry named "progress_granularity" which is rw and
++ allows the user to access an integer stored at &progress_granularity, giving
++ it a value between 1 and 2048 inclusive.
++
++ Sysfs entries are registered under /sys/power/tuxonice, and entries for
++ modules are located in a subdirectory named after the module.
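++
++ With the entry above, for example, the current value can be read with
++ 'cat progress_granularity' in the appropriate directory, and changed by
++ echoing a number between 1 and 2048 into the same file.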
++
+diff --git a/Documentation/power/tuxonice.txt b/Documentation/power/tuxonice.txt
+new file mode 100644
+index 0000000..aa2a486
+--- /dev/null
++++ b/Documentation/power/tuxonice.txt
+@@ -0,0 +1,709 @@
++ --- TuxOnIce, version 2.2 ---
++
++1. What is it?
++2. Why would you want it?
++3. What do you need to use it?
++4. Why not just use the version already in the kernel?
++5. How do you use it?
++6. What do all those entries in /sys/power/tuxonice do?
++7. How do you get support?
++8. I think I've found a bug. What should I do?
++9. When will XXX be supported?
++10. How does it work?
++11. Who wrote TuxOnIce?
++
++1. What is it?
++
++ Imagine you're sitting at your computer, working away. For some reason, you
++ need to turn off your computer for a while - perhaps it's time to go home
++ for the day. When you come back to your computer next, you're going to want
++ to carry on where you left off. Now imagine that you could push a button and
++ have your computer store the contents of its memory to disk and power down.
++ Then, when you next start up your computer, it loads that image back into
++ memory and you can carry on from where you were, just as if you'd never
++ turned the computer off. Far less time to start up, no reopening
++ applications and finding what directory you put that file in yesterday.
++ That's what TuxOnIce does.
++
++ TuxOnIce has a long heritage. It began life as work by Gabor Kuti, who,
++ with some help from Pavel Machek, got an early version going in 1999. The
++ project was then taken over by Florent Chabaud while still in alpha version
++ numbers. Nigel Cunningham came on the scene when Florent was unable to
++ continue, moving the project into betas, then 1.0, 2.0 and so on up to
++ the present series. During the 2.0 series, the name was contracted to
++ Suspend2 and the website suspend2.net created. Beginning around July 2007,
++ a transition to the name TuxOnIce began, to help make it clear that the
++ software is more concerned with hibernation than suspend to RAM.
++
++ Pavel Machek's swsusp code, which was merged around 2.5.17, retains the
++ original name, and was essentially a fork of the beta code until Rafael
++ Wysocki came on the scene in 2005 and began to improve it further.
++
++2. Why would you want it?
++
++ Why wouldn't you want it?
++
++ Being able to save the state of your system and quickly restore it improves
++ your productivity - you get a useful system in far less time than through
++ the normal boot process.
++
++3. What do you need to use it?
++
++ a. Kernel Support.
++
++ i) The TuxOnIce patch.
++
++ TuxOnIce is an extension to the Linux kernel. It is not part of Linus's
++ 2.6 tree at the moment, so you will need to download the kernel source and
++ apply the latest patch. Having done that, enable the appropriate options in
++ make [menu|x]config (under Power Management Options), compile and install your
++ kernel. TuxOnIce works with SMP, Highmem, preemption, fuse filesystems,
++ x86-32, PPC and x86_64.
++
++ TuxOnIce patches are available from http://tuxonice.net.
++
++ ii) Compression support.
++
++ Compression support is implemented via the cryptoapi. You will therefore want
++ to select any Cryptoapi transforms that you want to use on your image from
++ the Cryptoapi menu while configuring your kernel.
++
++ You can also tell TuxOnIce to write its image to an encrypted and/or
++ compressed filesystem/swap partition. In that case, you don't need to do
++ anything special for TuxOnIce when it comes to kernel configuration.
++
++ iii) Configuring other options.
++
++ While you're configuring your kernel, try to configure as much as possible
++ to build as modules. We recommend this because there are a number of drivers
++ that are still in the process of implementing proper power management
++ support. In those cases, the best way to work around this is to build them
++ as modules and remove the modules while suspending. You might
++ also bug the driver authors to get their support up to speed, or even help!
++
++ b. Storage.
++
++ i) Swap.
++
++ TuxOnIce can store the suspend image in your swap partition, a swap file or
++ a combination thereof. Whichever combination you choose, you will probably
++ want to create enough swap space to store the largest image you could have,
++ plus the space you'd normally use for swap. A good rule of thumb would be
++ to calculate the amount of swap you'd want without using TuxOnIce, and then
++ add the amount of memory you have. This swapspace can be arranged in any way
++ you'd like. It can be in one partition or file, or spread over a number. The
++ only requirement is that they be active when you start a suspend cycle.
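++
++ For example, a machine with 2GB of RAM that would normally use 1GB of
++ swap should be given around 3GB of swap space, so that both the image
++ and normal swap usage fit comfortably.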
++
++ There is one exception to this requirement. TuxOnIce has the ability to turn
++ on one swap file or partition at the start of suspending and turn it back off
++ at the end. If you want to ensure you have enough memory to store an image
++ when your memory is fully used, you might want to make one swap partition or
++ file for 'normal' use, and another for TuxOnIce to activate & deactivate
++ automatically. (Further details below).
++
++ ii) Normal files.
++
++ TuxOnIce includes a 'file allocator'. The file allocator can store your
++ image in a simple file. Since Linux has the concept of everything being a
++ file, this is more powerful than it initially sounds. If, for example, you
++ were to set up a network block device file, you could suspend to a network
<<Diff was trimmed, longer than 597 lines>>