Runtime PM support for Intel Linux Graphics – Part 2: the internal parts

Welcome to the second part of the Runtime PM (RPM) documentation. In the first part I gave an overview of the Intel Linux Kernel Graphics Runtime PM infrastructure. In this part I will explain a little about the feature design, debugging and our test suite. If you maintain i915.ko for your users, you should definitely read this.

Disclaimer

I work for Intel and I am a member of its Linux Kernel Graphics team. In this text I will talk about things that are related to my work at Intel, but it reflects only my own opinion, not necessarily Intel’s. Everything I talk about here is already public information.

Feature design

The design of the i915 RPM is based on reference counting: whenever some code needs to use the graphics device, it needs to call intel_runtime_pm_get(), and whenever the code is done using the hardware, it should call intel_runtime_pm_put(). Whenever the refcount drops to zero, the device can be runtime suspended. This is the most basic concept, but we have a few things built on top of it.
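To make the get/put contract concrete, here is a minimal user-space model of reference-counted runtime PM. It is only a sketch: struct rpm_device, rpm_get() and rpm_put() are hypothetical names that mirror the idea behind intel_runtime_pm_get() and intel_runtime_pm_put(), not their real signatures or implementation, and it ignores locking.

```c
/*
 * Hypothetical model of reference-counted runtime PM. Names are
 * illustrative only; they are not the i915 or kernel PM API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct rpm_device {
	unsigned int refcount; /* number of users currently needing the hardware */
	bool suspended;        /* true while the device is runtime suspended */
};

/* Take a reference, waking the device up first if it was suspended. */
static void rpm_get(struct rpm_device *dev)
{
	if (dev->suspended) {
		printf("runtime resume\n");
		dev->suspended = false;
	}
	dev->refcount++;
}

/* Drop a reference; the last put makes the device eligible for suspend. */
static void rpm_put(struct rpm_device *dev)
{
	assert(dev->refcount > 0); /* catch unbalanced put() calls */
	dev->refcount--;
	if (dev->refcount == 0) {
		/* A real driver would usually defer this, e.g. with an
		 * autosuspend delay, instead of suspending immediately. */
		printf("runtime suspend\n");
		dev->suspended = true;
	}
}
```

Nested users simply stack references: two rpm_get() calls followed by one rpm_put() leave the device awake, and only the final rpm_put() lets it suspend.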

An interesting detail of some of our hardware generations is the concept of power wells: specific pieces of the hardware that can be powered off independently of the others, saving some power. For example, on Haswell, if you’re only using the eDP panel on pipe A, you can turn off the power well responsible for pipes B and C, saving power in a use case that is very common among laptop owners. Check the diagram on page 139 of the public Haswell documentation to see which pieces of the hardware are affected by the power well.

As you can see in drivers/gpu/drm/i915/intel_runtime_pm.c, different platforms have different sets of power wells, so we created the power domain concept in order to map our code abstractions onto the different power wells of each platform. For example, the display code has to grab POWER_DOMAIN_PIPE_B when it uses pipe B, but which power wells are actually grabbed along with this power domain depends on the platform. Just like the runtime PM subsystem, the power domains framework is based on reference counting. It is also worth mentioning that whenever any power domain is grabbed, we also call intel_runtime_pm_get().
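The mapping from domains to wells can be sketched as a per-platform table in which each well lists the domains it serves and keeps its own refcount. This is an illustrative model, not the code in intel_runtime_pm.c: the enum values, struct platform, power_domain_get() and power_domain_put() are hypothetical, locking is omitted, and the single rpm_refcount field stands in for intel_runtime_pm_get()/put().

```c
/*
 * Hypothetical sketch: power domains mapped onto platform-specific
 * power wells, each well reference counted. Not the real i915 tables.
 */
#include <assert.h>
#include <stdio.h>

enum power_domain { DOMAIN_PIPE_A, DOMAIN_PIPE_B, DOMAIN_PIPE_C, NUM_DOMAINS };

#define MAX_WELLS 4
#define BIT(n) (1u << (n))

struct power_well {
	const char *name;
	unsigned int domains; /* bitmask of the domains this well powers */
	unsigned int count;   /* grabbed domains currently needing this well */
};

struct platform {
	struct power_well wells[MAX_WELLS];
	int num_wells;
	unsigned int rpm_refcount; /* stand-in for the runtime PM refcount */
	unsigned int domain_count[NUM_DOMAINS];
};

static void power_domain_get(struct platform *p, enum power_domain d)
{
	p->rpm_refcount++; /* grabbing any domain also takes an RPM reference */
	p->domain_count[d]++;
	for (int i = 0; i < p->num_wells; i++) {
		struct power_well *w = &p->wells[i];
		if ((w->domains & BIT(d)) && w->count++ == 0)
			printf("enable well %s\n", w->name);
	}
}

static void power_domain_put(struct platform *p, enum power_domain d)
{
	assert(p->domain_count[d] > 0); /* catch unbalanced puts */
	p->domain_count[d]--;
	for (int i = 0; i < p->num_wells; i++) {
		struct power_well *w = &p->wells[i];
		if ((w->domains & BIT(d)) && --w->count == 0)
			printf("disable well %s\n", w->name);
	}
	assert(p->rpm_refcount > 0);
	p->rpm_refcount--;
}
```

The display code only ever names DOMAIN_PIPE_B; whether that powers one well or several is decided by the platform's table, which is the point of the abstraction.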

Besides the power domains, other pieces of our code also grab RPM references, such as the forcewake code (which prevents RC6 from being enabled) and a few others. To find them, just grep our code for intel_runtime_pm_get() calls.

The difficulties

Developers: a new paradigm

Before all this was implemented, power management in our driver was as simple as it could be: the hardware was always there for the driver, so any time the driver needed to interact with it, such as when reading or writing registers, it would succeed. After RPM and the power domains were implemented, the situation changed: the hardware might decide to go away for a nap, so if the driver does not explicitly prevent it from sleeping or explicitly wake it up, all communication with the hardware will be ignored.

It is also important to remember that when the hardware gets runtime suspended (or when a specific power well is powered down), it may lose some of its state. So if you write a register, runtime suspend, resume, and then read the register, the value you read may not be the value you wrote. So now we need to make sure all the relevant hardware state is reprogrammed whenever we resume, or simply not runtime suspend while there is state we need to keep.
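The "reprogram on resume" option can be sketched as keeping a software copy of the registers that must survive and writing them back after resume. Everything here is simulated and hypothetical: struct fake_hw, the register indices, and the choice of a zero reset value are assumptions for illustration, not i915's actual save/restore code.

```c
/*
 * Hypothetical sketch: a software shadow of the registers that must
 * survive runtime suspend, reprogrammed on resume. Simulated hardware.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 8

struct fake_hw {
	uint32_t regs[NUM_REGS]; /* contents are lost on runtime suspend */
};

struct driver_state {
	uint32_t saved[NUM_REGS]; /* last value written to each kept register */
	bool keep[NUM_REGS];      /* true if the register must survive suspend */
};

static void reg_write(struct fake_hw *hw, struct driver_state *st,
		      unsigned int reg, uint32_t val)
{
	assert(reg < NUM_REGS);
	hw->regs[reg] = val;
	if (st->keep[reg])
		st->saved[reg] = val; /* remember it for the next resume */
}

/* Simulate power loss: registers fall back to a reset value (0 here). */
static void runtime_suspend(struct fake_hw *hw)
{
	memset(hw->regs, 0, sizeof(hw->regs));
}

/* Reprogram every register whose value we promised to keep. */
static void runtime_resume(struct fake_hw *hw, const struct driver_state *st)
{
	for (unsigned int reg = 0; reg < NUM_REGS; reg++)
		if (st->keep[reg])
			hw->regs[reg] = st->saved[reg];
}
```

Registers not marked in keep[] come back with their reset value, which is exactly the kind of silent state loss that turns into a bug when nobody thinks about it.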

And since the developers were all used to the previous model, where they never needed to think about grabbing reference counts before doing things, we went through a long period in which regression after regression was introduced. This was a big pain: developers and reviewers kept forgetting to grab the refcounts and forgetting that the hardware might just go away and lose some of its state. Today the situation is much better, since the power management concepts are now usually remembered while writing or reviewing code, but we’re still not regression-free, and we’ll probably never be, unless the developers get replaced by robots or something else.

Driver entry points

Another major problem contributing to the difficulty of RPM is that the graphics driver has far too many entry points. We interact massively with drm.ko: it calls many of our functions and we call many of its functions. We have a huge number of IOCTLs, some defined by our own driver and some inherited from drm.ko. We also have files in different places, such as sysfs and debugfs. We have the command submission infrastructure, whose entry points are IOCTLs, but which requires the hardware to stay awake even after the IOCTL returns. We allow user space to map graphics memory. We have many workqueues and delayed work functions. All this and more. Most of these interfaces require the hardware to be awake at some point, or to stay awake even after the interface is used, so it is really hard to guarantee that all the possible holes are covered by our reference count get() and put() calls.
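The command submission case is the tricky one, because the reference must outlive the IOCTL. A minimal sketch of that pattern, assuming hypothetical names (struct gpu_device, struct request, submit_ioctl(), request_retire()) and omitting locking and error handling, is to take the reference at submission time and drop it only when the work is retired:

```c
/*
 * Hypothetical sketch: an IOCTL that queues GPU work keeps the device
 * awake until the work completes, not just until the IOCTL returns.
 */
#include <assert.h>
#include <stdbool.h>

struct gpu_device {
	unsigned int rpm_refcount; /* stand-in for the runtime PM refcount */
};

struct request {
	struct gpu_device *dev;
	bool completed;
};

static void rpm_get(struct gpu_device *dev) { dev->rpm_refcount++; }

static void rpm_put(struct gpu_device *dev)
{
	assert(dev->rpm_refcount > 0); /* catch unbalanced puts */
	dev->rpm_refcount--;
}

/* IOCTL entry point: the reference taken here outlives the IOCTL. */
static void submit_ioctl(struct gpu_device *dev, struct request *rq)
{
	rpm_get(dev); /* dropped in request_retire(), not before returning */
	rq->dev = dev;
	rq->completed = false;
	/* ... hand the work to the hardware and return to user space ... */
}

/* Runs later (e.g. from a worker) once the GPU has finished the request. */
static void request_retire(struct request *rq)
{
	rq->completed = true;
	rpm_put(rq->dev);
}
```

If the put were placed at the end of submit_ioctl() instead, the device could runtime suspend while the GPU is still executing the work, which is exactly the kind of hole described above.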

Debugging

Given all the difficulties listed above, it is easy to see that we can’t really be sure that we covered all the holes and that they will stay covered forever, which leads to high anxiety levels for the developers and a lot of work for the bug triagers. So, in order to reduce our anxiety levels, we decided to add code that helps us catch these possible problems.

While the upper layers and the driver entry points are numerous and difficult to check, we have a few specific functions at the bottom of our call stack that are responsible for actually touching the hardware, so we added some assertions to these functions. One of these assertions is a function called assert_device_not_suspended(), which is called whenever we read or write a register, and also whenever we release forcewake. Another assertion is the unclaimed register checking code: on certain pieces of our hardware, if we try to read from or write to a register that does not exist, a specific bit in a specific register changes, so we know we did an operation that was ignored by the hardware. We also have a few other assertions that I can’t remember right now, but the ones explained here are the most important. We are probably also missing assertions like these at other points of our code, so patches are welcome. Bug reports too.
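The idea of checking at the bottom of the call stack can be sketched with a register read accessor that runs both kinds of checks. This is a simulated model, not i915 code: struct gpu, warn(), assert_not_suspended(), check_unclaimed() and the unclaimed_flag field are hypothetical stand-ins for assert_device_not_suspended() and the hardware's unclaimed-register bit. A write accessor would follow the same shape.

```c
/*
 * Hypothetical sketch of assertions inside the lowest-level MMIO read
 * accessor. The "hardware" is simulated; names are illustrative only.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS 16

struct gpu {
	bool suspended;
	bool unclaimed_flag;   /* set by the "hardware" on an access to a hole */
	uint32_t regs[NUM_REGS];
	bool exists[NUM_REGS]; /* false for offsets with no register behind them */
	unsigned int warnings; /* how many warnings have fired */
};

static void warn(struct gpu *gpu, const char *msg, unsigned int reg)
{
	gpu->warnings++;
	fprintf(stderr, "WARN: %s (reg %u)\n", msg, reg);
}

static void assert_not_suspended(struct gpu *gpu, unsigned int reg)
{
	if (gpu->suspended)
		warn(gpu, "register access while runtime suspended", reg);
}

/* Check, and clear, the unclaimed-access flag after each access. */
static void check_unclaimed(struct gpu *gpu, unsigned int reg)
{
	if (gpu->unclaimed_flag) {
		gpu->unclaimed_flag = false;
		warn(gpu, "unclaimed register access", reg);
	}
}

static uint32_t mmio_read(struct gpu *gpu, unsigned int reg)
{
	uint32_t val = 0;

	assert(reg < NUM_REGS);
	assert_not_suspended(gpu, reg);
	if (gpu->suspended)
		return 0; /* the hardware ignores us; the value is meaningless */

	/* Simulated hardware: touching a nonexistent register sets the flag. */
	if (gpu->exists[reg])
		val = gpu->regs[reg];
	else
		gpu->unclaimed_flag = true;

	check_unclaimed(gpu, reg);
	return val;
}
```

Because every register access funnels through accessors like this one, a single check catches a missing get() from any of the many entry points above it.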

In the Kernel itself, we also have some debugfs files that print information about the current state of some features, such as the reference counts of the power wells, among other things.

In addition to the Kernel code, we also use intel-gpu-tools (IGT) to try to catch RPM bugs. First of all, all the existing IGT tests can potentially trigger the assertions above, so every IGT test helps test RPM to some extent. Second, we added RPM subtests to some of the tests that existed before RPM was implemented; these subtests usually have the word suspend in their names. Third, we added a test called pm_rpm whose goal is to explicitly test the areas not covered by the other tests.

Most of the subtests of pm_rpm follow the same script: (i) put the device in a state where it can runtime suspend; (ii) assert it is runtime suspended; (iii) use one of the driver interfaces; (iv) make sure everything is fine and the operation just done had the desired effect; and finally (v) make sure the device runtime suspends after everything is finished. We do have variations and other special cases, but this is the main idea. If you’re interested in finding out more about the many interfaces between the Kernel and i915.ko, this test is a very good place to look.
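Steps (i), (ii) and (v) can be sketched on top of the generic runtime PM sysfs files that every device exposes under its power/ directory: writing "auto" to power/control allows runtime suspend, and power/runtime_status reports states such as "active" or "suspended". The helper names below are hypothetical and are not IGT's API, and the PCI address 0000:00:02.0 (the usual location of Intel integrated graphics) is an assumption.

```c
/*
 * Hypothetical helpers for a pm_rpm-style subtest, using the generic
 * runtime PM sysfs attributes. Not IGT's actual implementation.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Assumed location of the GPU; adjust for your machine. */
#define POWER_DIR "/sys/bus/pci/devices/0000:00:02.0/power"
#define CONTROL_PATH POWER_DIR "/control"
#define STATUS_PATH POWER_DIR "/runtime_status"

/* Step (i): allow runtime suspend by writing "auto" to power/control. */
static bool rpm_enable_autosuspend(const char *control_path)
{
	FILE *f = fopen(control_path, "w");
	if (!f)
		return false;
	bool ok = fputs("auto\n", f) >= 0;
	return fclose(f) == 0 && ok;
}

/* Read power/runtime_status ("active", "suspended", ...) into buf. */
static bool rpm_read_status(const char *status_path, char *buf, size_t len)
{
	assert(len > 1);
	FILE *f = fopen(status_path, "r");
	if (!f)
		return false;
	bool ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	if (ok)
		buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
	return ok;
}

/* Steps (ii) and (v): poll every 100 ms until the wanted status shows up. */
static bool rpm_wait_for_status(const char *status_path, const char *want,
				unsigned int timeout_ms)
{
	char buf[32];
	struct timespec step = { .tv_sec = 0, .tv_nsec = 100L * 1000 * 1000 };

	for (unsigned int waited = 0; ; waited += 100) {
		if (rpm_read_status(status_path, buf, sizeof(buf)) &&
		    strcmp(buf, want) == 0)
			return true;
		if (waited >= timeout_ms)
			return false;
		nanosleep(&step, NULL);
	}
}
```

A subtest would call rpm_enable_autosuspend(CONTROL_PATH) (as root), check rpm_wait_for_status(STATUS_PATH, "suspended", ...), exercise one driver interface, verify its effect, and finally wait for "suspended" again.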

So if you’re a distribution maintainer or someone else interested in making use of i915.ko runtime suspend, please run intel-gpu-tools in order to check the driver’s sanity. If you can’t run it all, please make sure you at least pass all the pm_rpm tests, and also make sure that none of these tests produce Kernel WARNs, BUGs or any other messages at the error level (hint: dmesg | grep -E "(WARN|ERR|BUG)").

I found a problem, your driver sucks!

Well, congratulations! You’re one step away from becoming a Linux Kernel Contributor! That is going to look good on your resumé, isn’t it? You’re welcome.

If you read everything up to this point, you can already imagine how complex everything is, so I hope you’re not mad at us. The very first thing you need to do is report the problem. While we do look at bugs reported through email, the best way to guarantee that your problem won’t be forgotten is to file a bug on freedesktop.org.

Now if you’re feeling adventurous, you can try to discover how to reproduce the bug. Once you do that, you can try to write a test case for intel-gpu-tools – possibly a subtest for pm_rpm.c. And if you really care, you can try to implement new assertions on our Kernel driver to prevent similar bugs. And now your resumé is going to look really good!

Planet Paulo Zanoni
