How To Write A Device Driver That Doesn’t Break On Partitioned Systems

Mike Tricker
Program Manager
Windows Kernel
Microsoft Corporation
Bruce Sherwin
Software Design Engineer
Windows Kernel
Microsoft Corporation
Dynamic Partitioning (DP) on Windows
Server codename “Longhorn”
Kernel driver reminders
Driver considerations on
partitionable systems
Call to action
Dynamic Partitioning
Windows Server Longhorn dynamic hardware partitioning features are focused on improving server RAS
Example Hardware Partitionable Server
[Diagram: Partition Manager connected over PCI Express to four IO Bridges]
1. Partition Manager provides the UI for partition creation and management
2. Service Processor controls the inter-processor and IO connections
3. Platforms partitionable to the socket level; virtualization used for sub-socket partitioning
4. Support for dynamic partitioning and socket replacement
Motivation For DP
This is being driven in part by the availability of multi-core processors
In a year or two a 4-processor system will have the performance of a 32-processor system today
And will require Enterprise-class RAS features
In Windows Server Longhorn, DP is focused primarily on Reliability, Availability and Serviceability (RAS)
Minimizing unplanned downtime due to failing hardware
Replacing hardware that is showing signs of impending failure
Capacity on demand
Adding system resources as needed
What We Are Doing
Windows Server Longhorn 64-bit will add support for Hot Add of processors
Together with PCI Express native hot plug of devices
And continues to support Hot Add memory
It will also support Hot Replace of memory and processors
Transparent to applications
Drivers receive power IRPs
Windows Server Longhorn is planned to support Hot Add of I/O host bridges
Supporting devices using line-based interrupts
DP will not be available on all Server SKUs
Kernel Driver Reminders
All 64-bit x64 kernel drivers must be signed or they will not load on Windows Server Longhorn or Windows Vista
Drivers that manipulate critical kernel data structures will cause the system to bugcheck, for example:
System service tables
Kernel stacks not allocated by the kernel
Patching any part of the kernel
Taking Advantage Of DP In Your Device Driver
DP-Aware Drivers
Receiving Notifications
Use IoRegisterPlugPlayNotification()
EventCategory is EventCategoryDeviceInterfaceChange
The GUID used in the notification structure will be:
For Hot Add memory: GUID_DEVICE_MEMORY
For Hot Add processor: GUID_DEVICE_PROCESSOR
Include WDM.H and POCLASS.H
DP-Aware Drivers
Memory Changes
Note: Hot Add memory does not affect either paged or non-paged pool sizes in Windows Server Longhorn
Drivers may attempt to allocate more physical memory after receiving the Hot Add memory notification
Since memory usage is fundamentally competitive they should handle the case of an allocation failure as they (should) do already
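The allocation-failure handling described above can be sketched in user-mode C. This is an illustrative pattern, not a WDK API: the structure and function names are hypothetical, and a real driver would use nonpaged-pool allocations with appropriate locking. The key point is that a failed grow attempt leaves the existing buffer intact.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache that tries to grow after a Hot Add memory
 * notification, but tolerates allocation failure by keeping the
 * old buffer. All names here are illustrative, not WDK APIs. */
typedef struct {
    unsigned char *buffer;
    size_t size;
} CACHE;

/* Returns 1 if the cache grew, 0 if nothing changed (either there was
 * nothing to do, or the allocation failed and the original buffer was
 * kept). Never loses existing data. */
int GrowCacheOnHotAdd(CACHE *cache, size_t newSize)
{
    unsigned char *bigger;

    if (newSize <= cache->size) {
        return 0;                /* nothing to do */
    }
    bigger = realloc(cache->buffer, newSize);
    if (bigger == NULL) {
        return 0;                /* allocation failed: keep old buffer */
    }
    memset(bigger + cache->size, 0, newSize - cache->size);
    cache->buffer = bigger;
    cache->size = newSize;
    return 1;
}
```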
DP-Aware Drivers
Processor Changes
Drivers that care about the number of active processors should call KeQueryActiveProcessorCount() to update their count
Do not use KeNumberProcessors – it’s undocumented and is not static
If they receive a Hot Add processor notification they must call this DDI again
Which gives them both the updated count and the affinity mask
So they can update their internal count
Affinity Mask Manipulation
Avoid affinity manipulation “by hand”; use
appropriate RTL APIs for affinity manipulation
when absolutely required
As mentioned on the previous slide
should do exactly what you need
As well as returning the count of active processors it
also returns the affinity mask:
Count = KeQueryActiveProcessorCount(
This will provide you with a current mask that
you can use as you do today
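The count-plus-mask contract above can be modeled in plain C. This is a user-mode analog of KeQueryActiveProcessorCount(), not the kernel DDI itself: the global mask is a stand-in for what the kernel tracks, and KAFFINITY is modeled as a 64-bit integer.

```c
#include <stddef.h>

/* User-mode analog of KeQueryActiveProcessorCount(): returns the
 * active-processor count and, through the optional pointer, the
 * affinity mask, so callers update both with one call. The mask
 * source here is a stand-in global; a driver gets it from the kernel. */
typedef unsigned long long AFFINITY;

static AFFINITY g_activeMask = 0xF;   /* pretend 4 active processors */

unsigned int QueryActiveProcessorCount(AFFINITY *maskOut)
{
    AFFINITY bits = g_activeMask;
    unsigned int count = 0;

    while (bits != 0) {
        bits &= bits - 1;      /* clear lowest set bit */
        count++;
    }
    if (maskOut != NULL) {
        *maskOut = g_activeMask;   /* caller can use the mask as before */
    }
    return count;
}
```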
Processor Scaling
If a driver uses per-processor worker threads
Create new ones per newly added processor
Load balancing algorithms should be Hot Add processor-aware to ensure scalability
As processors are added workloads may need to rebalance between threads
Miniport developers should ask the class driver owners about possible behavior changes on Hot Add
How the miniport could benefit from the additional resources
Per-Processor Data
Using arrays (or “slotted” data structures) with one entry per processor works well
Either allocate enough memory to handle the maximum number of processors possible for that architecture when creating the data structure
And accept the overhead when the additional processors are not present
Or use a data structure that can be grown dynamically as processors are added
So long as your driver gets notified when processors are Hot Added
Provide wrapper functions to hide the complexity of accessing the entries using the processor number as the index
E.g., GetData(Processor) and SetData(Processor)
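The dynamically growable variant with GetData/SetData wrappers might look like this user-mode sketch. All names are illustrative; a real driver would use nonpaged allocations and appropriate locking around the grow operation.

```c
#include <stdlib.h>
#include <string.h>

/* "Slotted" per-processor data: a growable array indexed by processor
 * number, with wrappers hiding the indexing and growth. Illustrative
 * only; not a WDK structure. */
typedef struct {
    long  *slots;
    size_t count;
} PER_PROC_DATA;

/* Grow to hold at least 'processors' entries; new slots start at 0.
 * Returns 0 on success, -1 if the allocation failed (old data kept). */
int EnsureSlots(PER_PROC_DATA *d, size_t processors)
{
    long *grown;

    if (processors <= d->count) {
        return 0;
    }
    grown = realloc(d->slots, processors * sizeof(long));
    if (grown == NULL) {
        return -1;
    }
    memset(grown + d->count, 0, (processors - d->count) * sizeof(long));
    d->slots = grown;
    d->count = processors;
    return 0;
}

long GetData(PER_PROC_DATA *d, size_t processor)
{
    return (processor < d->count) ? d->slots[processor] : 0;
}

void SetData(PER_PROC_DATA *d, size_t processor, long value)
{
    if (processor < d->count) {
        d->slots[processor] = value;
    }
}
```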
Using Per-Processor Data
// Register for PnP notification of processor arrival
status = IoRegisterPlugPlayNotification(
             EventCategoryDeviceInterfaceChange,
             PNPNOTIFY_DEVICE_INTERFACE_INCLUDE_EXISTING_INTERFACES,
             (PVOID) &GUID_DEVICE_PROCESSOR,
             DriverObject, MyNotificationCallback,
             (PVOID) PerProcessorData, &NotificationEntry);

NTSTATUS MyNotificationCallback(
    IN PVOID NotificationStructure,
    IN PVOID Context)
{
    // Get current processor count and affinity
    newNumberOfProcessors =
        KeQueryActiveProcessorCount(&newActiveProcessorMask);
    if (newNumberOfProcessors > oldNumberOfProcessors) {
        // Expand existing per-processor data structures to be aware of new
        // processor(s); create a thread that runs with the new processor affinity
        PsCreateSystemThread(..., MyWorkerThread,
            (PVOID) &newActiveProcessorMask);
    }
    ...
}

VOID MyWorkerThread(IN OUT PVOID Context)
{
    PKAFFINITY affinityMask = (PKAFFINITY) Context;
    // Set thread affinity
    KeSetSystemAffinityThread(*affinityMask);
    ...
}
Interrupt Targeting To a New Processor
System triggers a tree-wide rebalance
Allows interrupt-consuming device drivers to connect interrupts to newly added processors
DPCs queued from an ISR will run on the same processor by default
Please don’t fail QUERY_STOP or you will prevent the system from distributing interrupts to new processors
DP-Aware Drivers
Resource Rebalance
[Diagram: PnP rebalance state transitions — Started → Stop-Pending → Stopped, with a failed-restart path]
Connecting Interrupts
Windows will include the new processor in the affinity for the interrupt resource assigned during rebalance
New resources are passed in the subsequent start IRP
Driver should call IoConnectInterruptEx() using this new affinity
This will connect the device interrupt to the new processor
Application Implications
Running processes will not change affinity by default, for compatibility
They can receive notifications and change their affinity if desired
New processes will take advantage of the new processor(s)
System process affinity will be changed to include the newly added processors
Worker threads run in the system process
User Mode IOCTLs
Applications may issue custom IOCTLs to a partner device driver
~40% of “tier 1” applications we test today install drivers to handle specific tasks
In this case both need to register for Hot Add processor notifications, otherwise they may get out of sync
For example the application may issue IOCTLs on a thread running on a processor the driver doesn’t know about
If the driver uses per-processor data, then it needs to register for Hot Add processor notification and update its data structures
Hot Replace Flow
System state is migrated in stages
1. Paged memory is copied
2. System is quiesced
Stop DMA
Stop interrupts
Devices moved to D3
Quiesce duration ~1 second
3. Non-paged memory is copied
Processor state is migrated
4. System is resumed
DMA and interrupts resumed
Drivers return to D0
Hot Replace Implications
For Drivers
Hot Replace uses a pseudo-S4 state where we temporarily “hibernate” the system to quiesce devices (stop DMA operations and stop interrupts)
But without actually using a hiberfile
Your driver therefore needs to handle power S-IRP and D-IRP requests
Note: This includes IA-64
System Implications
For Hot Replace
Applications do not see a system change when a Hot Replace operation occurs
Amount of physical memory is unchanged
Number of logical processors is unchanged
This is deliberate, to avoid application compatibility issues
Hot Replace is treated as an atomic operation
Hot Add Processor
Bruce Sherwin
Software Design Engineer
Windows Kernel
Related Issues
NUMA Behavior
NUMA is becoming widespread and should be transparent in most cases
If the platform supports the _PXM ACPI method you will automatically use NUMA-local memory at AddDevice and StartDevice time
This ensures your device extensions and common buffers are allocated out of NUMA-local memory
Today's DP-capable systems are typically large, restricting access for driver developers
We expect that to change in the next few years
In the meantime, if your device needs to work with DP we expect you to work with the OEM on whose system you’ll ship to ensure a great customer experience
Also work with Microsoft, since we’re also testing these systems
If you’re already in-box and can run on these systems you should be covered
Future Technologies
Indicating DP Support
Detecting If DP Is Available
We’ve been asked how to determine if a system actually supports DP
Since OEMs may ship very similar systems with DP as a high-profile RAS feature
And thus not enabled on lower cost systems
Firmware mechanism to report platform capabilities
Kernel and user mode API to query this capability
Exact mechanism TBD
Supporting Hot Remove
Windows Server Longhorn will not support Hot Remove of memory or processors
When we do, we’re concerned about drivers and applications
Sparse affinity masks, and drivers parsing the mask only until finding a missing entry
Memory that’s been pinned for DMA
Drivers using KeNumberProcessors
We’ll have to be very careful about which drivers we load on such a platform
And consider what may need to be blocked
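The sparse-mask concern above is easy to illustrate in plain C. After a hypothetical Hot Remove, the affinity mask can have holes; a driver that stops at the first clear bit miscounts, while scanning the full mask does not. KAFFINITY is modeled here as a 64-bit integer; both function names are illustrative.

```c
/* Sparse-mask hazard: processor 2 removed gives mask 0b1011. */
typedef unsigned long long AFFINITY;

/* Buggy pattern: assumes active processors are contiguous from bit 0,
 * so it stops at the first gap and undercounts a sparse mask. */
unsigned int CountUntilFirstGap(AFFINITY mask)
{
    unsigned int count = 0;
    while (mask & 1) {
        count++;
        mask >>= 1;
    }
    return count;
}

/* Correct pattern: every set bit counts, wherever it is. */
unsigned int CountAllSetBits(AFFINITY mask)
{
    unsigned int count = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear lowest set bit */
        count++;
    }
    return count;
}
```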
Call To Action
If you want your 64-bit x64 drivers to load on Windows Server Longhorn (or Windows Vista) ensure they’re signed
Ensure that your 64-bit x64 drivers avoid manipulating undocumented private kernel data structures as they will bugcheck the system
Please follow the guidelines for good behavior outlined in this presentation
Realize that DP won’t be “just a high-end feature” for much longer
That will change during the life of Windows Server Longhorn
Ensure that any driver targeting future Windows Server releases is DP (and NUMA) aware
Additional Resources
Web resources
Dynamic Partitioning home page
White papers
WinHEC 2005 sessions
Kernel patching FAQ
Related sessions
Kernel Enhancements for Windows Server Longhorn
How to Use the WDK to Develop, Sign and Test Drivers
For feedback on all things pertaining to Dynamic Partitioning please
send mail to: dpfb @
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.