SOURCES: kernel-desktop-tuxonice.patch (NEW) - raw uncompressed fr...

glen glen at pld-linux.org
Tue Apr 8 23:13:47 CEST 2008


Author: glen                         Date: Tue Apr  8 21:13:46 2008 GMT
Module: SOURCES                       Tag: HEAD
---- Log message:
- raw uncompressed from http://www.tuxonice.net/downloads/all/tuxonice-3.0-rc5-for-2.6.24.patch.bz2

---- Files affected:
SOURCES:
   kernel-desktop-tuxonice.patch (NONE -> 1.1)  (NEW)

---- Diffs:

================================================================
Index: SOURCES/kernel-desktop-tuxonice.patch
diff -u /dev/null SOURCES/kernel-desktop-tuxonice.patch:1.1
--- /dev/null	Tue Apr  8 23:13:46 2008
+++ SOURCES/kernel-desktop-tuxonice.patch	Tue Apr  8 23:13:41 2008
@@ -0,0 +1,19998 @@
+diff --git a/Documentation/power/tuxonice-internals.txt b/Documentation/power/tuxonice-internals.txt
+new file mode 100644
+index 0000000..2247939
+--- /dev/null
++++ b/Documentation/power/tuxonice-internals.txt
+@@ -0,0 +1,469 @@
++		   TuxOnIce 2.2 Internal Documentation.
++			Updated 18 September 2007
++
++1.  Introduction.
++
++    TuxOnIce 2.2 is an addition to the Linux Kernel, designed to
++    allow the user to quickly shutdown and quickly boot a computer, without
++    needing to close documents or programs. It is equivalent to the
++    hibernate facility in some laptops. This implementation, however,
++    requires no special BIOS or hardware support.
++
++    The code in these files is based upon the original implementation
++    prepared by Gabor Kuti and additional work by Pavel Machek and a
++    host of others. This code has been substantially reworked by Nigel
++    Cunningham, again with the help and testing of many others, not the
++    least of whom is Michael Frank. At its heart, however, the operation is
++    essentially the same as Gabor's version.
++
++2.  Overview of operation.
++
++    The basic sequence of operations is as follows:
++
++	a. Quiesce all other activity.
++	b. Ensure enough memory and storage space are available, and attempt
++	   to free memory/storage if necessary.
++	c. Allocate the required memory and storage space.
++	d. Write the image.
++	e. Power down.
++
++    There are a number of complicating factors which mean that things are
++    not as simple as the above would imply, however...
++
++    o The activity of each process must be stopped at a point where it will
++    not be holding locks necessary for saving the image, or unexpectedly
++    restart operations due to something like a timeout and thereby make
++    our image inconsistent.
++
++    o It is desirable that we sync outstanding I/O to disk before calculating
++    image statistics. This reduces corruption if one should suspend but
++    then not resume, and also makes later parts of the operation safer (see
++    below).
++
++    o We need to get as close as we can to an atomic copy of the data.
++    Inconsistencies in the image will result in inconsistent memory contents at
++    resume time, and thus in instability of the system and/or file system
++    corruption. This would appear to imply a maximum image size of one half of
++    the amount of RAM, but we have a solution... (again, below).
++
++    o In 2.6, we choose to play nicely with the other suspend-to-disk
++    implementations.
++
++3.  Detailed description of internals.
++
++    a. Quiescing activity.
++
++    Safely quiescing the system is achieved using three separate but related
++    aspects.
++
++    First, we note that the vast majority of processes don't need to run during
++    suspend. They can be 'frozen'. We therefore implement a refrigerator
++    routine, which processes enter and in which they remain until the cycle is
++    complete. Processes enter the refrigerator via try_to_freeze() invocations
++    at appropriate places.  A process cannot be frozen in any old place. It
++    must not be holding locks that will be needed for writing the image or
++    freezing other processes. For this reason, userspace processes generally
++    enter the refrigerator via the signal handling code, and kernel threads at
++    the place in their event loops where they drop locks and yield to other
++    processes or sleep.
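++
++    As a rough illustration (a sketch, not code taken from this patch; the
++    work_to_do() and do_work() helpers are hypothetical), a freezer-aware
++    kernel thread's event loop looks something like this, calling
++    try_to_freeze() at a point where it holds no locks:
++
++    static int example_kthread(void *unused)
++    {
++        set_freezable();                /* opt in to being frozen */
++
++        while (!kthread_should_stop()) {
++            try_to_freeze();            /* may park here until the cycle ends */
++            if (work_to_do())
++                do_work();
++            else
++                schedule_timeout_interruptible(HZ);
++        }
++        return 0;
++    }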
++
++    The task of freezing processes is complicated by the fact that there can be
++    interdependencies between processes. Freezing process A before process B may
++    mean that process B cannot be frozen, because it blocks waiting for
++    process A rather than stopping in the refrigerator. This issue is seen where
++    userspace waits on freezeable kernel threads or fuse filesystem threads. To
++    address this issue, we implement the following algorithm for quiescing
++    activity:
++
++	- Freeze filesystems (including fuse - userspace programs starting
++		new requests are immediately frozen; programs already running
++		requests complete their work before being frozen in the next
++		step)
++	- Freeze userspace
++	- Thaw filesystems (this is safe now that userspace is frozen and no
++		fuse requests are outstanding).
++	- Invoke sys_sync (noop on fuse).
++	- Freeze filesystems
++	- Freeze kernel threads
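++
++    Expressed as a sketch (the helper names below are purely illustrative and
++    not necessarily the functions this patch uses), the sequence is:
++
++    freeze_filesystems();       /* new fuse requests block immediately   */
++    freeze_userspace();
++    thaw_filesystems();         /* safe: userspace frozen, no fuse I/O   */
++    sys_sync();                 /* no-op for fuse                        */
++    freeze_filesystems();
++    freeze_kernel_threads();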
++
++    If we need to free memory, we thaw kernel threads and filesystems, but not
++    userspace. We can then free caches without worrying about deadlocks due to
++    swap files being on frozen filesystems or such like.
++
++    b. Ensure enough memory & storage are available.
++
++    We have a number of constraints to meet in order to be able to successfully
++    suspend and resume.
++
++    First, the image will be written in two parts, described below. One of these
++    parts needs to have an atomic copy made, which of course implies a maximum
++    size of one half of the amount of system memory. The other part ('pageset')
++    is not atomically copied, and can therefore be as large or small as desired.
++
++    Second, we have constraints on the amount of storage available. In these
++    calculations, we may also consider any compression that will be done. The
++    cryptoapi module allows the user to configure an expected compression ratio.
++   
++    Third, the user can specify an arbitrary limit on the image size, in
++    megabytes. This limit is treated as a soft limit, so that we don't fail the
++    attempt to suspend if we cannot meet this constraint.
++
++    c. Allocate the required memory and storage space.
++
++    Having done the initial freeze, we determine whether the above constraints
++    are met, and seek to allocate the metadata for the image. If the constraints
++    are not met, or we fail to allocate the required space for the metadata, we
++    seek to free the amount of memory that we calculate is needed and try again.
++    We allow up to four iterations of this loop before aborting the cycle. If we
++    do fail, it should only be because of a bug in TuxOnIce's calculations.
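++
++    Schematically, the retry loop looks roughly like this (an illustrative
++    sketch only; the helper names are hypothetical and the real code in
++    prepare_image.c is considerably more involved):
++
++    for (attempt = 0; attempt < 4; attempt++) {
++        if (constraints_met() && allocate_image_metadata() == 0)
++            break;                              /* ready to write the image */
++        thaw_kernel_threads_and_filesystems();  /* userspace stays frozen */
++        free_calculated_shortfall();
++        refreeze_kernel_threads_and_filesystems();
++    }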
++    
++    These steps are merged together in the prepare_image function, found in
++    prepare_image.c. The functions are merged because of the cyclical nature
++    of the problem of calculating how much memory and storage is needed. Since
++    the data structures containing the information about the image must
++    themselves take memory and use storage, the amount of memory and storage
++    required changes as we prepare the image. Since the changes are not large,
++    only one or two iterations will be required to achieve a solution.
++
++    The recursive nature of the algorithm is minimised by keeping user space
++    frozen while preparing the image, and by the fact that our records of which
++    pages are to be saved and which pageset they are saved in use bitmaps (so
++    that changes in number or fragmentation of the pages to be saved don't
++    feedback via changes in the amount of memory needed for metadata). The
++    recursiveness is thus limited to any extra slab pages allocated to store the
++    extents that record storage used, and the effects of seeking to free memory.
++
++    d. Write the image.
++
++    We previously mentioned the need to create an atomic copy of the data, and
++    the half-of-memory limitation that is implied in this. This limitation is
++    circumvented by dividing the memory to be saved into two parts, called
++    pagesets.
++
++    Pageset2 contains the page cache - the pages on the active and inactive
++    lists. These pages aren't needed or modified while TuxOnIce is running, so
++    they can be safely written without an atomic copy. They are therefore
++    saved first and reloaded last. While saving these pages, TuxOnIce carefully
++    ensures that the work of writing the pages doesn't make the image
++    inconsistent.
++
++    Once pageset2 has been saved, we prepare to do the atomic copy of remaining
++    memory. As part of the preparation, we power down drivers, thereby providing
++    them with the opportunity to have their state recorded in the image. The
++    amount of memory allocated by drivers for this is usually negligible, but if
++    DRI is in use, video drivers may require significant amounts. Ideally we
++    would be able to query drivers while preparing the image as to the amount of
++    memory they will need. Unfortunately no such mechanism exists at the time of
++    writing. For this reason, TuxOnIce allows the user to set an
++    'extra_pages_allowance', which is used to try to ensure sufficient memory
++    is available for drivers at this point. TuxOnIce also lets the user set this
++    value to 0. In this case, a test driver suspend is done while preparing the
++    image, and the difference (plus a margin) used instead.
++
++    Having suspended the drivers, we save the CPU context before making an
++    atomic copy of pageset1, resuming the drivers and saving the atomic copy.
++    After saving the two pagesets, we just need to save our metadata before
++    powering down.
++
++    As we mentioned earlier, the contents of pageset2 pages aren't needed once
++    they've been saved. We therefore use them as the destination of our atomic
++    copy. In the unlikely event that pageset1 is larger, extra pages are
++    allocated while the image is being prepared. This is normally only a real
++    possibility when the system has just been booted and the page cache is
++    small.
++
++    This is where we need to be careful about syncing, however. Pageset2 will
++    probably contain filesystem metadata. If this is overwritten with pageset1
++    and then a sync occurs, the filesystem will be corrupted - at least until
++    resume time and another sync of the restored data. Since there is a
++    possibility that the user might not resume or (may it never be!) that
++    suspend might oops, we do our utmost to avoid syncing filesystems after
++    copying pageset1.
++
++    e. Power down.
++
++    Powering down uses standard kernel routines. TuxOnIce supports powering down
++    using the ACPI S3, S4 and S5 methods or the kernel's non-ACPI power-off.
++    Supporting suspend to RAM (S3) as a power-off option might sound strange,
++    but it is useful: if the battery doesn't run out, the user can get their
++    system up and running again quickly (we just need to re-read the
++    overwritten pages), and if the battery does run out (or the user removes
++    power), they can still resume from the on-disk image.
++
++4.  Data Structures.
++
++    TuxOnIce uses three main structures to store its metadata and configuration
++    information:
++
++    a) Pageflags bitmaps.
++
++    TuxOnIce records which pages will be in pageset1, pageset2, the destination
++    of the atomic copy and the source of the atomically restored image using
++    bitmaps. These bitmaps are created from order zero allocations to maximise
++    reliability. The individual pages are combined together with pointers to
++    form per-zone bitmaps, which are in turn combined with another layer of
++    pointers to construct the overall bitmap.
++
++    The pageset1 bitmap is thus easily stored in the image header for use at
++    resume time.
++
++    As mentioned above, using bitmaps also means that the amount of memory and
++    storage required for recording the above information is constant. This
++    greatly simplifies the work of preparing the image. In earlier versions of
++    TuxOnIce, extents were used to record which pages would be stored. In that
++    case, however, eating memory could result in greater fragmentation of the
++    lists of pages, which in turn required more memory to store the extents and
++    more storage in the image header. These could in turn require further
++    freeing of memory, and another iteration. All of this complexity is removed
++    by having bitmaps.
++
++    Bitmaps also make a lot of sense because TuxOnIce only ever iterates
++    through the lists. There is therefore no cost to not being able to find the
++    nth page in O(1) time. We only need to worry about the cost of finding
++    the (n+1)th page, given the location of the nth page. Bitwise optimisations
++    help here.
++
++    The data structure is: unsigned long ***.
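++
++    For illustration only (the accessor below is hypothetical; the patch's
++    real helpers differ), testing whether a particular page within a zone is
++    marked in such a bitmap amounts to:
++
++    static int example_test_page(unsigned long ***bitmap, int zone,
++                                 unsigned long page_in_zone)
++    {
++        unsigned long bits_per_page = PAGE_SIZE * 8;
++        /* level 1: per-zone array of pointers to order-zero bitmap pages */
++        unsigned long **zone_pages = bitmap[zone];
++        /* level 2: the order-zero page holding this page's bit */
++        unsigned long *bitmap_page = zone_pages[page_in_zone / bits_per_page];
++        /* level 3: the bit within that page */
++        return test_bit(page_in_zone % bits_per_page, bitmap_page);
++    }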
++
++    b) Extents for block data.
++
++    TuxOnIce supports writing the image to multiple block devices. In the case
++    of swap, multiple partitions and/or files may be in use, and we happily use
++    them all. This is accomplished as follows:
++
++    Whatever the actual source of the allocated storage, the destination of the
++    image can be viewed in terms of one or more block devices, and on each
++    device, a list of sectors. To simplify matters, we only use contiguous,
++    PAGE_SIZE aligned sectors, like the swap code does.
++
++    Since sector numbers on each bdev may well not start at 0, it makes much
++    more sense to use extents here. Contiguous ranges of pages can thus be
++    represented in the extents by contiguous values.
++
++    Variations in block size are taken account of in transforming this data
++    into the parameters for bio submission.
++
++    We can thus implement a layer of abstraction wherein the core of TuxOnIce
++    doesn't have to worry about which device we're currently writing to or
++    where in the device we are. It simply requests that the next page in the
++    pageset or header be written, leaving the details to this lower layer.
++    The lower layer remembers where in the sequence of devices and blocks each
++    pageset starts. The header always starts at the beginning of the allocated
++    storage.
++
++    So extents are:
++
++    struct extent {
++      unsigned long minimum, maximum;
++      struct extent *next;
++    }
++
++    These are combined into chains of extents for a device:
++
++    struct extent_chain {
++      int size; /* total size of the chain, i.e. sum of (max - min + 1) */
++      int allocs, frees;
++      char *name;
++      struct extent *first, *last_touched;
++    };
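++
++    As a rough sketch (not the patch's actual helper; error handling is
++    omitted), appending a block number to a chain either extends the most
++    recently touched extent or starts a new one:
++
++    static void example_append(struct extent_chain *chain, unsigned long val)
++    {
++        struct extent *last = chain->last_touched;
++
++        if (last && val == last->maximum + 1) {
++            last->maximum = val;            /* contiguous: grow the extent */
++        } else {
++            struct extent *ext = kmalloc(sizeof(*ext), GFP_KERNEL);
++
++            ext->minimum = ext->maximum = val;
++            ext->next = NULL;
++            if (last)
++                last->next = ext;
++            else
++                chain->first = ext;
++            chain->last_touched = ext;
++            chain->allocs++;
++        }
++        chain->size++;
++    }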
++
++    For each bdev, we need to store a little more info:
++
++    struct suspend_bdev_info {
++       struct block_device *bdev;
++       dev_t dev_t;
++       int bmap_shift;
++       int blocks_per_page;
++    };
++
++    The dev_t is used to identify the device in the stored image. As a result,
++    we expect devices at resume time to have the same major and minor numbers
++    as they had while suspending.  This is primarily a concern where the user
++    utilises LVM for storage, as they will need to dmsetup their partitions in
++    such a way as to maintain this consistency at resume time.
++
++    bmap_shift and blocks_per_page record and apply the effects of variations
++    in blocks-per-page settings for the filesystem and underlying bdev. For
++    most filesystems, these are the same, but for xfs, they can have
++    independent values.
++
++    Combining these two structures together, we have everything we need to
++    record what devices and what blocks on each device are being used to
++    store the image, and to submit I/O using submit_bio.
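++
++    To illustrate (a hypothetical helper; it assumes, as an example, that the
++    extent values are block numbers and that bmap_shift converts those blocks
++    into 512-byte sectors for bio submission):
++
++    static sector_t example_first_sector(struct suspend_bdev_info *info,
++                                         unsigned long block)
++    {
++        /* blocks_per_page tells us how many such blocks make up one
++           PAGE_SIZE chunk of the image on this device */
++        return (sector_t)block << info->bmap_shift;
++    }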
++
++    The last elements in the picture are a means of recording how the storage
++    is being used.
++
++    We do this first and foremost by implementing a layer of abstraction on
++    top of the devices and extent chains which allows us to view however many
++    devices there might be as one long storage tape, with a single 'head' that
++    tracks a 'current position' on the tape:
++
++    struct extent_iterate_state {
++      struct extent_chain *chains;
++      int num_chains;
++      int current_chain;
++      struct extent *current_extent;
++      unsigned long current_offset;
++    };
++
++    That is, *chains points to an array of size num_chains of extent chains.
++    For the filewriter, this is always a single chain. For the swapwriter, the
++    array is of size MAX_SWAPFILES.
++
++    current_chain, current_extent and current_offset thus record the current
++    index in the chains array (and into a matching array of struct
++    suspend_bdev_info), the current extent in that chain (to optimise access),
++    and the current offset within that extent.
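++
++    Advancing the 'head' to the next page of storage then amounts to
++    something like the following (an illustrative sketch; it omits the bounds
++    checking the real iteration code needs when the last chain is exhausted):
++
++    static void example_advance(struct extent_iterate_state *state)
++    {
++        if (state->current_offset < state->current_extent->maximum) {
++            state->current_offset++;        /* still within this extent */
++        } else if (state->current_extent->next) {
++            state->current_extent = state->current_extent->next;
++            state->current_offset = state->current_extent->minimum;
++        } else {
++            /* this chain (device) is exhausted: move to the next one */
++            state->current_chain++;
++            state->current_extent = state->chains[state->current_chain].first;
++            state->current_offset = state->current_extent->minimum;
++        }
++    }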
++
++    The image is divided into three parts:
++    - The header
++    - Pageset 1
++    - Pageset 2
++
++    The header always starts at the first device and first block. We know its
++    size before we begin to save the image because we carefully account for
++    everything that will be stored in it.
++
++    The second pageset (LRU) is stored first. It begins on the next page after
++    the end of the header.
++
++    The first pageset is stored second. Its start location is only known once
++    pageset2 has been saved, since pageset2 may be compressed as it is written.
++    This location is thus recorded at the end of saving pageset2. It is also
++    page aligned.
++
++    Since this information is needed at resume time, and the location of extents
++    in memory will differ at resume time, this needs to be stored in a portable
++    way:
++
++    struct extent_iterate_saved_state {
++        int chain_num;
++        int extent_num;
++        unsigned long offset;
++    };
++
++    We can thus implement a layer of abstraction wherein the core of TuxOnIce
++    doesn't have to worry about which device we're currently writing to or
++    where in the device we are. It simply requests that the next page in the
++    pageset or header be written, leaving the details to this layer, and
++    invokes the routines to remember and restore the position, without having
++    to worry about the details of how the data is arranged on disk or such like.
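++
++    Saving the current position then just means recording indices rather than
++    pointers (again a sketch; extent_index_in_chain() is a hypothetical helper
++    that counts how far current_extent is along its chain):
++
++    static void example_save_position(struct extent_iterate_state *state,
++                                      struct extent_iterate_saved_state *saved)
++    {
++        saved->chain_num = state->current_chain;
++        saved->extent_num = extent_index_in_chain(state);
++        saved->offset = state->current_offset;
++    }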
++
++    c) Modules
++
++    One aim in designing TuxOnIce was to make it flexible. We wanted to allow
++    for the implementation of different methods of transforming a page to be
++    written to disk and different methods of getting the pages stored.
++
++    In early versions (the betas and perhaps Suspend1), compression support was
++    inlined in the image writing code, and the data structures and code for
++    managing swap were intertwined with the rest of the code. A number of people
++    had expressed interest in implementing image encryption, and alternative
++    methods of storing the image.
++
++    In order to achieve this, TuxOnIce was given a modular design.
++
++    A module is a single file which encapsulates the functionality needed
++    to transform a pageset of data (encryption or compression, for example),
++    or to write the pageset to a device. The former type of module is called
++    a 'page-transformer', the latter a 'writer'.
++
++    Modules are linked together in pipeline fashion. There may be zero or more
++    page transformers in a pipeline, and there is always exactly one writer.
++    The pipeline follows this pattern:
++
++		---------------------------------
++		|          TuxOnIce Core        |
++		---------------------------------
++				|
++				|
++		---------------------------------
++		|	Page transformer 1	|
++		---------------------------------
++				|
++				|
++		---------------------------------
++		|	Page transformer 2	|
++		---------------------------------
++				|
++				|
++		---------------------------------
++		|            Writer		|
++		---------------------------------
++
++    During the writing of an image, the core code feeds pages one at a time
++    to the first module. This module performs whatever transformations it
++    implements on the incoming data, completely consuming the incoming data and
++    feeding output in a similar manner to the next module. A module may buffer
++    its output.
++
++    During reading, the pipeline works in the reverse direction. The core code
++    calls the first module with the address of a buffer which should be filled.
++    (Note that the buffer size is always PAGE_SIZE at this time). This module
++    will in turn request data from the next module and so on down until the
++    writer is made to read from the stored image.
++
++    Part of the definition of the structure of a module thus looks like this:
++
++        int (*rw_init) (int rw, int stream_number);
++        int (*rw_cleanup) (int rw);
++        int (*write_chunk) (struct page *buffer_page);
++        int (*read_chunk) (struct page *buffer_page, int sync);
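++
++    For example, a pass-through page transformer might implement write_chunk
++    like this (an illustrative sketch; toi_next_module_write() stands in for
++    however the patch actually hands pages to the next module in the
++    pipeline):
++
++    static int example_rw_init(int rw, int stream_number)
++    {
++        return 0;           /* nothing to set up for a pass-through module */
++    }
++
++    static int example_write_chunk(struct page *buffer_page)
++    {
++        /* a real transformer would compress or encrypt the contents of
++           buffer_page here, buffering any partial output, before passing
++           complete pages on downstream */
++        return toi_next_module_write(buffer_page);
++    }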
++
++    It should be noted that the _cleanup routine may be called before the
++    full stream of data has been read or written. While writing the image,
++    the user may (depending upon settings) choose to abort suspending, and
++    if we are in the midst of writing the last portion of the image, a portion
++    of the second pageset may be reread. This may also happen if an error
++    occurs and we seek to abort the process of writing the image.
++
++    The modular design is also useful in a number of other ways. It provides
++    a means whereby we can add support for:
++
++    - providing overall initialisation and cleanup routines;
++    - serialising configuration information in the image header;
++    - providing debugging information to the user;
++    - determining memory and image storage requirements;
++    - dis/enabling components at run-time;
++    - configuring the module (see below);
++
++    ...and routines for writers specific to their work:
++    - Parsing a resume= location;
++    - Determining whether an image exists;
++    - Marking a resume as having been attempted;
++    - Invalidating an image;
++
++    Since some parts of the core - the user interface and storage manager
++    support - have use for some of these functions, they are registered as
++    'miscellaneous' modules as well.
++
++    d) Sysfs data structures.
++
++    This brings us naturally to support for configuring TuxOnIce. We desired to
++    provide a way to make TuxOnIce as flexible and configurable as possible.
++    The user shouldn't have to reboot just because they now want to suspend to
++    a file instead of a partition, for example.
++
++    To accomplish this, TuxOnIce implements a very generic means whereby the
++    core and modules can register new sysfs entries. All TuxOnIce entries use
++    a single _store and _show routine, both of which are found in sysfs.c in
++    the kernel/power directory. These routines handle the most common operations
++    - getting and setting the values of bits, integers, longs, unsigned longs
++    and strings in one place, and allow overrides for customised get and set
++    options as well as side-effect routines for all reads and writes.
++
++    When combined with some simple macros, a new sysfs entry can then be defined
++    in just a couple of lines:
++
++    { TOI_ATTR("progress_granularity", SYSFS_RW),
++      SYSFS_INT(&progress_granularity, 1, 2048)
++    },
++
++    This defines a sysfs entry named "progress_granularity" which is rw and
++    allows the user to access an integer stored at &progress_granularity, giving
++    it a value between 1 and 2048 inclusive.
++
++    Sysfs entries are registered under /sys/power/tuxonice, and entries for
++    modules are located in a subdirectory named after the module.
++
+diff --git a/Documentation/power/tuxonice.txt b/Documentation/power/tuxonice.txt
+new file mode 100644
+index 0000000..aa2a486
+--- /dev/null
++++ b/Documentation/power/tuxonice.txt
+@@ -0,0 +1,709 @@
++	--- TuxOnIce, version 2.2 ---
++
++1.  What is it?
++2.  Why would you want it?
++3.  What do you need to use it?
++4.  Why not just use the version already in the kernel?
++5.  How do you use it?
++6.  What do all those entries in /sys/power/tuxonice do?
++7.  How do you get support?
++8.  I think I've found a bug. What should I do?
++9.  When will XXX be supported?
++10. How does it work?
++11. Who wrote TuxOnIce?
++
++1. What is it?
++
++   Imagine you're sitting at your computer, working away. For some reason, you
++   need to turn off your computer for a while - perhaps it's time to go home
++   for the day. When you come back to your computer next, you're going to want
++   to carry on where you left off. Now imagine that you could push a button and
++   have your computer store the contents of its memory to disk and power down.
++   Then, when you next start up your computer, it loads that image back into
++   memory and you can carry on from where you were, just as if you'd never
++   turned the computer off. Far less time to start up, no reopening
++   applications and finding what directory you put that file in yesterday.
++   That's what TuxOnIce does.
++
++   TuxOnIce has a long heritage. It began life as work by Gabor Kuti, who,
++   with some help from Pavel Machek, got an early version going in 1999. The
++   project was then taken over by Florent Chabaud while still in alpha version
++   numbers. Nigel Cunningham came on the scene when Florent was unable to
++   continue, moving the project into betas, then 1.0, 2.0 and so on up to
++   the present series. During the 2.0 series, the name was contracted to
++   Suspend2 and the website suspend2.net created. Beginning around July 2007,
++   a transition to calling the software TuxOnIce was made, to help make it
++   clear that TuxOnIce is more concerned with hibernation than suspend
++   to ram.
++
++   Pavel Machek's swsusp code, which was merged around 2.5.17, retains the
++   original name, and was essentially a fork of the beta code until Rafael
++   Wysocki came on the scene in 2005 and began to improve it further.
++
++2. Why would you want it?
++
++   Why wouldn't you want it?
++   
++   Being able to save the state of your system and quickly restore it improves
++   your productivity - you get a useful system in far less time than through
++   the normal boot process.
++   
++3. What do you need to use it?
++
++   a. Kernel Support.
++
++   i) The TuxOnIce patch.
++   
++   TuxOnIce is part of the Linux Kernel. This version is not part of Linus's
++   2.6 tree at the moment, so you will need to download the kernel source and
++   apply the latest patch. Having done that, enable the appropriate options in
++   make [menu|x]config (under Power Management Options), compile and install your
++   kernel. TuxOnIce works with SMP, Highmem, preemption, fuse filesystems,
++   x86-32, PPC and x86_64.
++
++   TuxOnIce patches are available from http://tuxonice.net.
++
++   ii) Compression support.
++
++   Compression support is implemented via the cryptoapi. You will therefore want
++   to select any Cryptoapi transforms that you want to use on your image from
++   the Cryptoapi menu while configuring your kernel.
++
++   You can also tell TuxOnIce to write its image to an encrypted and/or
++   compressed filesystem/swap partition. In that case, you don't need to do
++   anything special for TuxOnIce when it comes to kernel configuration.
++
++   iii) Configuring other options.
++
++   While you're configuring your kernel, try to configure as much as possible
++   to build as modules. We recommend this because there are a number of drivers
++   that are still in the process of implementing proper power management
++   support. In those cases, the best way to work around their current
++   shortcomings is to build them as modules and remove the modules while
++   suspending. You might
++   also bug the driver authors to get their support up to speed, or even help!
++
++   b. Storage.
++
++   i) Swap.
++
++   TuxOnIce can store the suspend image in your swap partition, a swap file or
++   a combination thereof. Whichever combination you choose, you will probably
++   want to create enough swap space to store the largest image you could have,
++   plus the space you'd normally use for swap. A good rule of thumb would be
++   to calculate the amount of swap you'd want without using TuxOnIce, and then
++   add the amount of memory you have. This swapspace can be arranged in any way
++   you'd like. It can be in one partition or file, or spread over a number. The
++   only requirement is that they be active when you start a suspend cycle.
++   
++   There is one exception to this requirement. TuxOnIce has the ability to turn
++   on one swap file or partition at the start of suspending and turn it back off
++   at the end. If you want to ensure you have enough storage for an image even
++   when your memory is fully used, you might want to make one swap partition or
++   file for 'normal' use, and another for TuxOnIce to activate & deactivate
++   automatically. (Further details below).
++
++   ii) Normal files.
++
++   TuxOnIce includes a 'file allocator'. The file allocator can store your
++   image in a simple file. Since Linux has the concept of everything being a
++   file, this is more powerful than it initially sounds. If, for example, you
++   were to set up a network block device file, you could suspend to a network
<<Diff was trimmed, longer than 597 lines>>

