Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
This commit is contained in:
Linus Torvalds
2005-04-16 15:20:36 -07:00
commit 1da177e4c3
17291 changed files with 6718755 additions and 0 deletions

View File

@@ -0,0 +1,319 @@
Device Power Management
Device power management encompasses two areas - the ability to save
state and transition a device to a low-power state when the system is
entering a low-power state; and the ability to transition a device to
a low-power state while the system is running (and independently of
any other power management activity).
Methods
The methods to suspend and resume devices reside in struct bus_type:
struct bus_type {
...
int (*suspend)(struct device * dev, pm_message_t state);
int (*resume)(struct device * dev);
};
Each bus driver is responsible implementing these methods, translating
the call into a bus-specific request and forwarding the call to the
bus-specific drivers. For example, PCI drivers implement suspend() and
resume() methods in struct pci_driver. The PCI core is simply
responsible for translating the pointers to PCI-specific ones and
calling the low-level driver.
This is done to a) ease transition to the new power management methods
and leverage the existing PM code in various bus drivers; b) allow
buses to implement generic and default PM routines for devices, and c)
make the flow of execution obvious to the reader.
System Power Management
When the system enters a low-power state, the device tree is walked in
a depth-first fashion to transition each device into a low-power
state. The ordering of the device tree is guaranteed by the order in
which devices get registered - children are never registered before
their ancestors, and devices are placed at the back of the list when
registered. By walking the list in reverse order, we are guaranteed to
suspend devices in the proper order.
Devices are suspended once with interrupts enabled. Drivers are
expected to stop I/O transactions, save device state, and place the
device into a low-power state. Drivers may sleep, allocate memory,
etc. at will.
Some devices are broken and will inevitably have problems powering
down or disabling themselves with interrupts enabled. For these
special cases, they may return -EAGAIN. This will put the device on a
list to be taken care of later. When interrupts are disabled, before
we enter the low-power state, their drivers are called again to put
their device to sleep.
On resume, the devices that returned -EAGAIN will be called to power
themselves back on with interrupts disabled. Once interrupts have been
re-enabled, the rest of the drivers will be called to resume their
devices. On resume, a driver is responsible for powering back on each
device, restoring state, and re-enabling I/O transactions for that
device.
System devices follow a slightly different API, which can be found in
include/linux/sysdev.h
drivers/base/sys.c
System devices will only be suspended with interrupts disabled, and
after all other devices have been suspended. On resume, they will be
resumed before any other devices, and also with interrupts disabled.
Runtime Power Management
Many devices are able to dynamically power down while the system is
still running. This feature is useful for devices that are not being
used, and can offer significant power savings on a running system.
In each device's directory, there is a 'power' directory, which
contains at least a 'state' file. Reading from this file displays what
power state the device is currently in. Writing to this file initiates
a transition to the specified power state, which must be a decimal in
the range 1-3, inclusive; or 0 for 'On'.
The PM core will call the ->suspend() method in the bus_type object
that the device belongs to if the specified state is not 0, or
->resume() if it is.
Nothing will happen if the specified state is the same state the
device is currently in.
If the device is already in a low-power state, and the specified state
is another, but different, low-power state, the ->resume() method will
first be called to power the device back on, then ->suspend() will be
called again with the new state.
The driver is responsible for saving the working state of the device
and putting it into the low-power state specified. If this was
successful, it returns 0, and the device's power_state field is
updated.
The driver must take care to know whether or not it is able to
properly resume the device, including all step of reinitialization
necessary. (This is the hardest part, and the one most protected by
NDA'd documents).
The driver must also take care not to suspend a device that is
currently in use. It is their responsibility to provide their own
exclusion mechanisms.
The runtime power transition happens with interrupts enabled. If a
device cannot support being powered down with interrupts, it may
return -EAGAIN (as it would during a system power management
transition), but it will _not_ be called again, and the transaction
will fail.
There is currently no way to know what states a device or driver
supports a priori. This will change in the future.
pm_message_t meaning
pm_message_t has two fields. event ("major"), and flags. If driver
does not know event code, it aborts the request, returning error. Some
drivers may need to deal with special cases based on the actual type
of suspend operation being done at the system level. This is why
there are flags.
Event codes are:
ON -- no need to do anything except special cases like broken
HW.
# NOTIFICATION -- pretty much same as ON?
FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them beeing specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example on how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.
SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there's need to distinguish several levels of sleep, additional flag
is probably best way to do that.
Transitions are only from a resumed state to a suspended state, never
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).
All events are:
[NOTE NOTE NOTE: If you are driver author, you should not care; you
should only look at event, and ignore flags.]
#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers chance to load firmware from
#disk and store it in memory, or do other activities taht require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbiden once the suspend dance is started.. event = ON, flags =
#PREPARE_TO_SUSPEND
Apm standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY
Apm suspend -- same as APM_STANDBY, but it we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND
System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT
System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN
Kexec -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC
Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes, system can reboot, enter S4 and enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4
Suspend to ram -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM
Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors, its done
once on suspending kernel, once on resuming kernel. event = FREEZE,
flags = DURING_SUSPEND or DURING_RESUME
Device detach requested from /sys -- deinitialize device; proably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
= FREEZE, flags = DEV_DETACH.
#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#
Driver Detach Power Management
The kernel now supports the ability to place a device in a low-power
state when it is detached from its driver, which happens when its
module is removed.
Each device contains a 'detach_state' file in its sysfs directory
which can be used to control this state. Reading from this file
displays what the current detach state is set to. This is 0 (On) by
default. A user may write a positive integer value to this file in the
range of 1-4 inclusive.
A value of 1-3 will indicate the device should be placed in that
low-power state, which will cause ->suspend() to be called for that
device. A value of 4 indicates that the device should be shutdown, so
->shutdown() will be called for that device.
The driver is responsible for reinitializing the device when the
module is re-inserted during it's ->probe() (or equivalent) method.
The driver core will not call any extra functions when binding the
device to the driver.
pm_message_t meaning
pm_message_t has two fields. event ("major"), and flags. If driver
does not know event code, it aborts the request, returning error. Some
drivers may need to deal with special cases based on the actual type
of suspend operation being done at the system level. This is why
there are flags.
Event codes are:
ON -- no need to do anything except special cases like broken
HW.
# NOTIFICATION -- pretty much same as ON?
FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them being specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example on how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.
SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there's need to distinguish several levels of sleep, additional flag
is probably best way to do that.
Transitions are only from a resumed state to a suspended state, never
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).
All events are:
[NOTE NOTE NOTE: If you are driver author, you should not care; you
should only look at event, and ignore flags.]
#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers chance to load firmware from
#disk and store it in memory, or do other activities taht require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbiden once the suspend dance is started.. event = ON, flags =
#PREPARE_TO_SUSPEND
Apm standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY
Apm suspend -- same as APM_STANDBY, but it we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND
System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT
System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN
Kexec -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC
Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes, system can reboot, enter S4 and enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4
Suspend to ram -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM
Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors, its done
once on suspending kernel, once on resuming kernel. event = FREEZE,
flags = DURING_SUSPEND or DURING_RESUME
Device detach requested from /sys -- deinitialize device; proably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
= FREEZE, flags = DEV_DETACH.
#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#

View File

@@ -0,0 +1,43 @@
Power Management Interface
The power management subsystem provides a unified sysfs interface to
userspace, regardless of what architecture or platform one is
running. The interface exists in /sys/power/ directory (assuming sysfs
is mounted at /sys).
/sys/power/state controls system power state. Reading from this file
returns what states are supported, which is hard-coded to 'standby'
(Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk'
(Suspend-to-Disk).
Writing to this file one of those strings causes the system to
transition into that state. Please see the file
Documentation/power/states.txt for a description of each of those
states.
/sys/power/disk controls the operating mode of the suspend-to-disk
mechanism. Suspend-to-disk can be handled in several ways. The
greatest distinction is who writes memory to disk - the firmware or
the kernel. If the firmware does it, we assume that it also handles
suspending the system.
If the kernel does it, then we have three options for putting the system
to sleep - using the platform driver (e.g. ACPI or other PM
registers), powering off the system or rebooting the system (for
testing). The system will support either 'firmware' or 'platform', and
that is known a priori. But, the user may choose 'shutdown' or
'reboot' as alternatives.
Reading from this file will display what the mode is currently set
to. Writing to this file will accept one of
'firmware'
'platform'
'shutdown'
'reboot'
It will only change to 'firmware' or 'platform' if the system supports
it.

View File

@@ -0,0 +1,41 @@
KERNEL THREADS
Freezer
Upon entering a suspended state the system will freeze all
tasks. This is done by delivering pseudosignals. This affects
kernel threads, too. To successfully freeze a kernel thread
the thread has to check for the pseudosignal and enter the
refrigerator. Code to do this looks like this:
do {
hub_events();
wait_event_interruptible(khubd_wait, !list_empty(&hub_event_list));
if (current->flags & PF_FREEZE)
refrigerator(PF_FREEZE);
} while (!signal_pending(current));
from drivers/usb/core/hub.c::hub_thread()
The Unfreezable
Some kernel threads however, must not be frozen. The kernel must
be able to finish pending IO operations and later on be able to
write the memory image to disk. Kernel threads needed to do IO
must stay awake. Such threads must mark themselves unfreezable
like this:
/*
* This thread doesn't need any user-level access,
* so get rid of all our resources.
*/
daemonize("usb-storage");
current->flags |= PF_NOFREEZE;
from drivers/usb/storage/usb.c::usb_stor_control_thread()
Such drivers are themselves responsible for staying quiet during
the actual snapshotting.

332
Documentation/power/pci.txt Normal file
View File

@@ -0,0 +1,332 @@
PCI Power Management
~~~~~~~~~~~~~~~~~~~~
An overview of the concepts and the related functions in the Linux kernel
Patrick Mochel <mochel@transmeta.com>
(and others)
---------------------------------------------------------------------------
1. Overview
2. How the PCI Subsystem Does Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources
1. Overview
~~~~~~~~~~~
The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It a standard interface for controlling various
power management operations.
Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.
The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).
There are actually two D3 states. When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context). But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state was discarded); or both.
Bus power management is not covered in this version of this document.
Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM spec.
The possible state transitions that a device can undergo are:
+---------------------------+
| Current State | New State |
+---------------------------+
| D0 | D1, D2, D3|
+---------------------------+
| D1 | D2, D3 |
+---------------------------+
| D2 | D3 |
+---------------------------+
| D1, D2, D3 | D0 |
+---------------------------+
Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.
2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.
Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth first walk of the device tree. The first walk saves each of the
device's state and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.
The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.
In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.
Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~
These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.
Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.
pci_save_state
--------------
Usage:
pci_save_state(dev, buffer);
Description:
Save first 64 bytes of PCI config space. Buffer must be allocated by
caller.
pci_restore_state
-----------------
Usage:
pci_restore_state(dev, buffer);
Description:
Restore previously saved config space. (First 64 bytes only);
If buffer is NULL, then restore what information we know about the
device from bootup: BARs and interrupt line.
pci_set_power_state
-------------------
Usage:
pci_set_power_state(dev, state);
Description:
Transition device to low power state using PCI PM Capabilities
registers.
Will fail under one of the following conditions:
- If state is less than current state, but not D0 (illegal transition)
- Device doesn't support PM Capabilities
- Device does not support requested state
pci_enable_wake
---------------
Usage:
pci_enable_wake(dev, state, enable);
Description:
Enable device to generate PME# during low power state using PCI PM
Capabilities.
Checks whether if device supports generating PME# from requested state
and fail if it does not, unless enable == 0 (request is to disable wake
events, which is implicit if it doesn't even support it in the first
place).
Note that the PMC Register in the device's PM Capabilties has a bitmask
of the states it supports generating PME# from. D3hot is bit 3 and
D3cold is bit 4. So, while a value of 4 as the state may not seem
semantically correct, it is.
4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~
These functions are intended for use by individual drivers, and are defined in
struct pci_driver:
int (*save_state) (struct pci_dev *dev, u32 state);
int (*suspend) (struct pci_dev *dev, u32 state);
int (*resume) (struct pci_dev *dev);
int (*enable_wake) (struct pci_dev *dev, u32 state, int enable);
save_state
----------
Usage:
if (dev->driver && dev->driver->save_state)
dev->driver->save_state(dev,state);
The driver should use this callback to save device state. It should take into
account the current state of the device and the requested state in order to
avoid any unnecessary operations.
For example, a video card that supports all 4 states (D0-D3), all controller
context is preserved when entering D1, but the screen is placed into a low power
state (blanked).
The driver can also interpret this function as a notification that it may be
entering a sleep state in the near future. If it knows that the device cannot
enter the requested state, either because of lack of support for it, or because
the device is middle of some critical operation, then it should fail.
This function should not be used to set any state in the device or the driver
because the device may not actually enter the sleep state (e.g. another driver
later causes causes a global state transition to fail).
Note that in intermediate low power states, a device's I/O and memory spaces may
be disabled and may not be available in subsequent transitions to lower power
states.
suspend
-------
Usage:
if (dev->driver && dev->driver->suspend)
dev->driver->suspend(dev,state);
A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus-mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().
Bus mastering may be disabled by doing:
pci_disable_device(dev);
For devices that support the PCI PM Spec, this may be used to set the device's
power state to match the suspend() parameter:
pci_set_power_state(dev,state);
The driver is also responsible for disabling any other device-specific features
(e.g blanking screen, turning off on-card memory, etc).
The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.
The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.
resume
------
Usage:
if (dev->driver && dev->driver->suspend)
dev->driver->resume(dev)
The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.
The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().
If the device is currently in D3, it may need to be reinitialized in resume().
* Some types of devices, like bus controllers, will preserve context in D3hot
(using Vcc power). Their drivers will often want to avoid re-initializing
them after re-entering D0 (perhaps to avoid resetting downstream devices).
* Other kinds of devices in D3hot will discard device context as part of a
soft reset when re-entering the D0 state.
* Devices resuming from D3cold always go through a power-on reset. Some
device context can also be preserved using Vaux power.
* Some systems hide D3cold resume paths from drivers. For example, on PCs
the resume path for suspend-to-disk often runs BIOS powerup code, which
will sometimes re-initialize the device.
To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume(). Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.
If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:
pci_set_power_state(dev,0);
Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. It is impossible to determine which of the two events is
taking place in the driver, so it is always a good idea to make that call.
The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.
The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.
enable_wake
-----------
Usage:
if (dev->driver && dev->driver->enable_wake)
dev->driver->enable_wake(dev,state,enable);
This callback is generally only relevant for devices that support the PCI PM
spec and have the ability to generate a PME# (Power Management Event Signal)
to wake the system up. (However, it is possible that a device may support
some non-standard way of generating a wake event on sleep.)
Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
PM Capabilties describe what power states the device supports generating a
wake event from:
+------------------+
| Bit | State |
+------------------+
| 11 | D0 |
| 12 | D1 |
| 13 | D2 |
| 14 | D3hot |
| 15 | D3cold |
+------------------+
A device can use this to enable wake events:
pci_enable_wake(dev,state,enable);
Note that to enable PME# from D3cold, a value of 4 should be passed to
pci_enable_wake (since it uses an index into a bitmask). If a driver gets
a request to enable wake events from D3, two calls should be made to
pci_enable_wake (one for both D3hot and D3cold).
5. Resources
~~~~~~~~~~~~
PCI Local Bus Specification
PCI Bus Power Management Interface Specification
http://pcisig.org

View File

@@ -0,0 +1,79 @@
System Power Management States
The kernel supports three power management states generically, though
each is dependent on platform support code to implement the low-level
details for each state. This file describes each state, what they are
commonly called, what ACPI state they map to, and what string to write
to /sys/power/state to enter that state
State: Standby / Power-On Suspend
ACPI State: S1
String: "standby"
This state offers minimal, though real, power savings, while providing
a very low-latency transition back to a working system. No operating
state is lost (the CPU retains power), so the system easily starts up
again where it left off.
We try to put devices in a low-power state equivalent to D1, which
also offers low power savings, but low resume latency. Not all devices
support D1, and those that don't are left on.
A transition from Standby to the On state should take about 1-2
seconds.
State: Suspend-to-RAM
ACPI State: S3
String: "mem"
This state offers significant power savings as everything in the
system is put into a low-power state, except for memory, which is
placed in self-refresh mode to retain its contents.
System and device state is saved and kept in memory. All devices are
suspended and put into D3. In many cases, all peripheral buses lose
power when entering STR, so devices must be able to handle the
transition back to the On state.
For at least ACPI, STR requires some minimal boot-strapping code to
resume the system from STR. This may be true on other platforms.
A transition from Suspend-to-RAM to the On state should take about
3-5 seconds.
State: Suspend-to-disk
ACPI State: S4
String: "disk"
This state offers the greatest power savings, and can be used even in
the absence of low-level platform support for power management. This
state operates similarly to Suspend-to-RAM, but includes a final step
of writing memory contents to disk. On resume, this is read and memory
is restored to its pre-suspend state.
STD can be handled by the firmware or the kernel. If it is handled by
the firmware, it usually requires a dedicated partition that must be
setup via another operating system for it to use. Despite the
inconvenience, this method requires minimal work by the kernel, since
the firmware will also handle restoring memory contents on resume.
If the kernel is responsible for persistantly saving state, a mechanism
called 'swsusp' (Swap Suspend) is used to write memory contents to
free swap space. swsusp has some restrictive requirements, but should
work in most cases. Some, albeit outdated, documentation can be found
in Documentation/power/swsusp.txt.
Once memory state is written to disk, the system may either enter a
low-power state (like ACPI S4), or it may simply power down. Powering
down offers greater savings, and allows this mechanism to work on any
system. However, entering a real low-power state allows the user to
trigger wake up events (e.g. pressing a key or opening a laptop lid).
A transition from Suspend-to-Disk to the On state should take about 30
seconds, though it's typically a bit more with the current
implementation.

View File

@@ -0,0 +1,235 @@
From kernel/suspend.c:
* BIG FAT WARNING *********************************************************
*
* If you have unsupported (*) devices using DMA...
* ...say goodbye to your data.
*
* If you touch anything on disk between suspend and resume...
* ...kiss your data goodbye.
*
* If your disk driver does not support suspend... (IDE does)
* ...you'd better find out how to get along
* without your data.
*
* If you change kernel command line between suspend and resume...
* ...prepare for nasty fsck or worse.
*
* If you change your hardware while system is suspended...
* ...well, it was not good idea.
*
* (*) suspend/resume support is needed to make it safe.
You need to append resume=/dev/your_swap_partition to kernel command
line. Then you suspend by
echo shutdown > /sys/power/disk; echo disk > /sys/power/state
. If you feel ACPI works pretty well on your system, you might try
echo platform > /sys/power/disk; echo disk > /sys/power/state
Article about goals and implementation of Software Suspend for Linux
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Gábor Kuti
Last revised: 2003-10-20 by Pavel Machek
Idea and goals to achieve
Nowadays it is common in several laptops that they have a suspend button. It
saves the state of the machine to a filesystem or to a partition and switches
to standby mode. Later resuming the machine the saved state is loaded back to
ram and the machine can continue its work. It has two real benefits. First we
save ourselves the time machine goes down and later boots up, energy costs
are real high when running from batteries. The other gain is that we don't have to
interrupt our programs so processes that are calculating something for a long
time shouldn't need to be written interruptible.
swsusp saves the state of the machine into active swaps and then reboots or
powerdowns. You must explicitly specify the swap partition to resume from with
``resume='' kernel option. If signature is found it loads and restores saved
state. If the option ``noresume'' is specified as a boot parameter, it skips
the resuming.
In the meantime while the system is suspended you should not add/remove any
of the hardware, write to the filesystems, etc.
Sleep states summary
====================
There are three different interfaces you can use, /proc/acpi should
work like this:
In a really perfect world:
echo 1 > /proc/acpi/sleep # for standby
echo 2 > /proc/acpi/sleep # for suspend to ram
echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
echo 4 > /proc/acpi/sleep # for suspend to disk
echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system
and perhaps
echo 4b > /proc/acpi/sleep # for suspend to disk via s4bios
Frequently Asked Questions
==========================
Q: well, suspending a server is IMHO a really stupid thing,
but... (Diego Zuccato):
A: You bought new UPS for your server. How do you install it without
bringing machine down? Suspend to disk, rearrange power cables,
resume.
You have your server on UPS. Power died, and UPS is indicating 30
seconds to failure. What do you do? Suspend to disk.
Ethernet card in your server died. You want to replace it. Your
server is not hotplug capable. What do you do? Suspend to disk,
replace ethernet card, resume. If you are fast your users will not
even see broken connections.
Q: Maybe I'm missing something, but why don't the regular I/O paths work?
A: We do use the regular I/O paths. However we cannot restore the data
to its original location as we load it. That would create an
inconsistent kernel state which would certainly result in an oops.
Instead, we load the image into unused memory and then atomically copy
it back to it original location. This implies, of course, a maximum
image size of half the amount of memory.
There are two solutions to this:
* require half of memory to be free during suspend. That way you can
read "new" data onto free spots, then cli and copy
* assume we had special "polling" ide driver that only uses memory
between 0-640KB. That way, I'd have to make sure that 0-640KB is free
during suspending, but otherwise it would work...
suspend2 shares this fundamental limitation, but does not include user
data and disk caches into "used memory" by saving them in
advance. That means that the limitation goes away in practice.
Q: Does linux support ACPI S4?
A: Yes. That's what echo platform > /sys/power/disk does.
Q: My machine doesn't work with ACPI. How can I use swsusp than ?
A: Do a reboot() syscall with right parameters. Warning: glibc gets in
its way, so check with strace:
reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, 0xd000fce2)
(Thanks to Peter Osterlund:)
#include <unistd.h>
#include <syscall.h>
#define LINUX_REBOOT_MAGIC1 0xfee1dead
#define LINUX_REBOOT_MAGIC2 672274793
#define LINUX_REBOOT_CMD_SW_SUSPEND 0xD000FCE2
int main()
{
syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
LINUX_REBOOT_CMD_SW_SUSPEND, 0);
return 0;
}
Also /sys/ interface should be still present.
Q: What is 'suspend2'?
A: suspend2 is 'Software Suspend 2', a forked implementation of
suspend-to-disk which is available as separate patches for 2.4 and 2.6
kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
highmem and preemption. It also has a extensible architecture that
allows for arbitrary transformations on the image (compression,
encryption) and arbitrary backends for writing the image (eg to swap
or an NFS share[Work In Progress]). Questions regarding suspend2
should be sent to the mailing list available through the suspend2
website, and not to the Linux Kernel Mailing List. We are working
toward merging suspend2 into the mainline kernel.
Q: A kernel thread must voluntarily freeze itself (call 'refrigerator').
I found some kernel threads that don't do it, and they don't freeze
so the system can't sleep. Is this a known behavior?
A: All such kernel threads need to be fixed, one by one. Select the
place where the thread is safe to be frozen (no kernel semaphores
should be held at that point and it must be safe to sleep there), and
add:
if (current->flags & PF_FREEZE)
refrigerator(PF_FREEZE);
If the thread is needed for writing the image to storage, you should
instead set the PF_NOFREEZE process flag when creating the thread.
Q: What is the difference between between "platform", "shutdown" and
"firmware" in /sys/power/disk?
A:
shutdown: save state in linux, then tell bios to powerdown
platform: save state in linux, then tell bios to powerdown and blink
"suspended led"
firmware: tell bios to save state itself [needs BIOS-specific suspend
partition, and has very little to do with swsusp]
"platform" is actually right thing to do, but "shutdown" is most
reliable.
Q: I do not understand why you have such strong objections to idea of
selective suspend.
A: Do selective suspend during runtime power managment, that's okay. But
its useless for suspend-to-disk. (And I do not see how you could use
it for suspend-to-ram, I hope you do not want that).
Lets see, so you suggest to
* SUSPEND all but swap device and parents
* Snapshot
* Write image to disk
* SUSPEND swap device and parents
* Powerdown
Oh no, that does not work, if swap device or its parents uses DMA,
you've corrupted data. You'd have to do
* SUSPEND all but swap device and parents
* FREEZE swap device and parents
* Snapshot
* UNFREEZE swap device and parents
* Write
* SUSPEND swap device and parents
Which means that you still need that FREEZE state, and you get more
complicated code. (And I have not yet introduce details like system
devices).
Q: There don't seem to be any generally useful behavioral
distinctions between SUSPEND and FREEZE.
A: Doing SUSPEND when you are asked to do FREEZE is always correct,
but it may be unneccessarily slow. If you want USB to stay simple,
slowness may not matter to you. It can always be fixed later.
For devices like disk it does matter, you do not want to spindown for
FREEZE.
Q: After resuming, system is paging heavilly, leading to very bad interactivity.
A: Try running
cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
after resume. swapoff -a; swapon -a may also be usefull.

View File

@@ -0,0 +1,27 @@
swsusp/S3 tricks
~~~~~~~~~~~~~~~~
Pavel Machek <pavel@suse.cz>
If you want to trick swsusp/S3 into working, you might want to try:
* go with minimal config, turn off drivers like USB, AGP you don't
really need
* turn off APIC and preempt
* use ext2. At least it has working fsck. [If something seemes to go
wrong, force fsck when you have a chance]
* turn off modules
* use vga text console, shut down X. [If you really want X, you might
want to try vesafb later]
* try running as few processes as possible, preferably go to single
user mode.
* due to video issues, swsusp should be easier to get working than
S3. Try that first.
When you make it work, try to find out what exactly was it that broke
suspend, and preferably fix that.

View File

@@ -0,0 +1,169 @@
Video issues with S3 resume
~~~~~~~~~~~~~~~~~~~~~~~~~~~
2003-2005, Pavel Machek
During S3 resume, hardware needs to be reinitialized. For most
devices, this is easy, and kernel driver knows how to do
it. Unfortunately there's one exception: video card. Those are usually
initialized by BIOS, and kernel does not have enough information to
boot video card. (Kernel usually does not even contain video card
driver -- vesafb and vgacon are widely used).
This is not problem for swsusp, because during swsusp resume, BIOS is
run normally so video card is normally initialized. S3 has absolutely
no chance of working with SMP/HT. Be sure it to turn it off before
testing (swsusp should work ok, OTOH).
There are a few types of systems where video works after S3 resume:
(1) systems where video state is preserved over S3.
(2) systems where it is possible to call the video BIOS during S3
resume. Unfortunately, it is not correct to call the video BIOS at
that point, but it happens to work on some machines. Use
acpi_sleep=s3_bios.
(3) systems that initialize video card into vga text mode and where
the BIOS works well enough to be able to set video mode. Use
acpi_sleep=s3_mode on these.
(4) on some systems s3_bios kicks video into text mode, and
acpi_sleep=s3_bios,s3_mode is needed.
(5) radeon systems, where X can soft-boot your video card. You'll need
new enough X, and plain text console (no vesafb or radeonfb), see
http://www.doesi.gmxhome.de/linux/tm800s3/s3.html. Actually you
should probably use vbetool (6) instead.
(6) other radeon systems, where vbetool is enough to bring system back
to life. It needs text console to be working. Do vbetool vbestate
save > /tmp/delme; echo 3 > /proc/acpi/sleep; vbetool post; vbetool
vbestate restore < /tmp/delme; setfont <whatever>, and your video
should work.
(7) on some systems, it is possible to boot most of kernel, and then
POSTing bios works. Ole Rohne has patch to do just that at
http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
Now, if you pass acpi_sleep=something, and it does not work with your
bios, you'll get a hard crash during resume. Be careful. Also it is
safest to do your experiments with plain old VGA console. The vesafb
and radeonfb (etc) drivers have a tendency to crash the machine during
resume.
You may have a system where none of above works. At that point you
either invent another ugly hack that works, or write proper driver for
your video card (good luck getting docs :-(). Maybe suspending from X
(proper X, knowing your hardware, not XF68_FBcon) might have better
chance of working.
Table of known working systems:
Model hack (or "how to do it")
------------------------------------------------------------------------------
Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
Acer TM 242FX vbetool (6)
Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6)
Acer TM 4052LCi s3_bios (2)
Acer TM 636Lci s3_bios vga=normal (2)
Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
Acer TM 660 ??? (*)
Acer TM 800 vga=normal, X patches, see webpage (5) or vbetool (6)
Acer TM 803 vga=normal, X patches, see webpage (5) or vbetool (6)
Acer TM 803LCi vga=normal, vbetool (6)
Arima W730a vbetool needed (6)
Asus L2400D s3_mode (3)(***) (S1 also works OK)
Asus L3800C (Radeon M7) s3_bios (2) (S1 also works OK)
Asus M6NE ??? (*)
Athlon64 desktop prototype s3_bios (2)
Compal CL-50 ??? (*)
Compaq Armada E500 - P3-700 none (1) (S1 also works OK)
Compaq Evo N620c vga=normal, s3_bios (2)
Dell 600m, ATI R250 Lf none (1), but needs xorg-x11-6.8.1.902-1
Dell D600, ATI RV250 vga=normal and X, or try vbestate (6)
Dell Inspiron 4000 ??? (*)
Dell Inspiron 500m ??? (*)
Dell Inspiron 600m ??? (*)
Dell Inspiron 8200 ??? (*)
Dell Inspiron 8500 ??? (*)
Dell Inspiron 8600 ??? (*)
eMachines athlon64 machines vbetool needed (6) (someone please get me model #s)
HP NC6000 s3_bios, may not use radeonfb (2); or vbetool (6)
HP NX7000 ??? (*)
HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
HP Omnibook XE3 athlon version none (1)
HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
IBM TP R32 / Type 2658-MMG none (1)
IBM TP R40 2722B3G ??? (*)
IBM TP R50p / Type 1832-22U s3_bios (2)
IBM TP R51 ??? (*)
IBM TP T30 236681A ??? (*)
IBM TP T40 / Type 2373-MU4 none (1)
IBM TP T40p none (1)
IBM TP R40p s3_bios (2)
IBM TP T41p s3_bios (2), switch to X after resume
IBM TP T42 ??? (*)
IBM ThinkPad T42p (2373-GTG) s3_bios (2)
IBM TP X20 ??? (*)
IBM TP X30 ??? (*)
IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
Medion MD4220 ??? (*)
Samsung P35 vbetool needed (6)
Sharp PC-AR10 (ATI rage) none (1)
Sony Vaio PCG-F403 ??? (*)
Sony Vaio PCG-N505SN ??? (*)
Sony Vaio vgn-s260 X or boot-radeon can init it (5)
Toshiba Libretto L5 none (1)
Toshiba Satellite 4030CDT s3_mode (3)
Toshiba Satellite 4080XCDT s3_mode (3)
Toshiba Satellite 4090XCDT ??? (*)
Toshiba Satellite P10-554 s3_bios,s3_mode (4)(****)
Uniwill 244IIO ??? (*)
(*) from http://www.ubuntulinux.org/wiki/HoaryPMResults, not sure
which options to use. If you know, please tell me.
(***) To be tested with a newer kernel.
(****) Not with SMP kernel, UP only.
VBEtool details
~~~~~~~~~~~~~~~
(with thanks to Carl-Daniel Hailfinger)
First, boot into X and run the following script ONCE:
#!/bin/bash
statedir=/root/s3/state
mkdir -p $statedir
chvt 2
sleep 1
vbetool vbestate save >$statedir/vbe
To suspend and resume properly, call the following script as root:
#!/bin/bash
statedir=/root/s3/state
curcons=`fgconsole`
fuser /dev/tty$curcons 2>/dev/null|xargs ps -o comm= -p|grep -q X && chvt 2
cat /dev/vcsa >$statedir/vcsa
sync
echo 3 >/proc/acpi/sleep
sync
vbetool post
vbetool vbestate restore <$statedir/vbe
cat $statedir/vcsa >/dev/vcsa
rckbd restart
chvt $[curcons%6+1]
chvt $curcons
Unless you change your graphics card or other hardware configuration,
the state once saved will be OK for every resume afterwards.
NOTE: The "rckbd restart" command may be different for your
distribution. Simply replace it with the command you would use to
set the fonts on screen.

View File

@@ -0,0 +1,34 @@
This driver implement the ACPI Extensions For Display Adapters
for integrated graphics devices on motherboard, as specified in
ACPI 2.0 Specification, Appendix B, allowing to perform some basic
control like defining the video POST device, retrieving EDID information
or to setup a video output, etc. Note that this is an ref. implementation only.
It may or may not work for your integrated video device.
Interfaces exposed to userland through /proc/acpi/video:
VGA/info : display the supported video bus device capability like ,Video ROM, CRT/LCD/TV.
VGA/ROM : Used to get a copy of the display devices' ROM data (up to 4k).
VGA/POST_info : Used to determine what options are implemented.
VGA/POST : Used to get/set POST device.
VGA/DOS : Used to get/set ownership of output switching:
Please refer ACPI spec B.4.1 _DOS
VGA/CRT : CRT output
VGA/LCD : LCD output
VGA/TV : TV output
VGA/*/brightness : Used to get/set brightness of output device
Notify event through /proc/acpi/event:
#define ACPI_VIDEO_NOTIFY_SWITCH 0x80
#define ACPI_VIDEO_NOTIFY_PROBE 0x81
#define ACPI_VIDEO_NOTIFY_CYCLE 0x82
#define ACPI_VIDEO_NOTIFY_NEXT_OUTPUT 0x83
#define ACPI_VIDEO_NOTIFY_PREV_OUTPUT 0x84
#define ACPI_VIDEO_NOTIFY_CYCLE_BRIGHTNESS 0x82
#define ACPI_VIDEO_NOTIFY_INC_BRIGHTNESS 0x83
#define ACPI_VIDEO_NOTIFY_DEC_BRIGHTNESS 0x84
#define ACPI_VIDEO_NOTIFY_ZERO_BRIGHTNESS 0x85
#define ACPI_VIDEO_NOTIFY_DISPLAY_OFF 0x86