Reliability in Windows CE Device Drivers
By Staff

Foreword: Microsoft Windows CE is a real-time operating system for embedded applications, PDAs, cellular telephones, industrial process controllers, entertainment devices, and machine/appliance controllers. As is typical with modern operating systems, Windows CE interacts with devices and supports protocols through distinct device drivers, which are expressed as separately generated binary modules. Reliability of these device drivers is critical to overall reliability of a Windows CE-based product. This technical article examines the elements of reliability and discusses techniques to produce device drivers with high reliability.



by Ian King
Microsoft Corp.
Originally published January, 2003


Definitions: What Is a Device Driver?

The term device driver is commonly used to broadly describe a module that interfaces the CPU, as "virtualized" by the kernel, to other hardware. This definition will be used for the purposes of this paper. A driver may fit into one or more of the following categories:
  • Physical device driver -- A module that interfaces with physical hardware at the register/port level, for example, a serial device driver that directly accesses registers in a serial interface device.

  • Virtual device driver -- A module that interacts with physical hardware and exposes a virtual device interface that is not accessed directly by applications, but rather by other drivers. Network Driver Interface Specification (NDIS) is an example of this type.

  • Protocol driver -- A module that exposes a defined protocol to applications or other drivers. A protocol driver may interface to another driver, for example TCP/IP interfacing with the NDIS driver or directly to hardware, such as an IEEE 1394 driver.

  • Service driver -- A distinct, separable module that provides a service to applications. Windows CE provides a Power Manager service that interfaces with other drivers and interacts with applications to implement power management policies. Some examples of power management policies are "sleep after five minutes of inactivity in the shell" or "wake the device if a network packet is received". Note that a service does not necessarily imply a service driver. There are other services, such as Simple Object Access Protocol (SOAP) that are not drivers; these additional services are loaded by services.exe.
Note that these definitions are offered to define the broad scope of this discussion, and to include all of these conceptual elements in the discussion.

Implementation

Windows CE implements device drivers as dynamic-link libraries (DLLs). Briefly, device drivers are loaded (brought into the active memory space of the system) as follows:
  1. A special module called the Device Manager exposes interfaces that are invoked to load drivers.
  2. At system initialization, Device Manager is called to load boot time drivers, which are named under a special key in the system registry.
  3. Device Manager enumerates that registry branch, initializing all drivers by bringing the DLL into memory, calling that driver's initialization code (within the DLL), and assigning resources (for example, memory space and I/O ports) as necessary. In its initialization, the driver must call the I/O Resource Manager to discover what resources it may assign. I/O Resource Manager does the bookkeeping necessary to avoid conflicts.
    Note: There is more than one mechanism by which a device driver can make its interfaces available to the operating system and applications.
Device drivers can also be loaded after system initialization, through explicit calls to interfaces exposed by Device Manager. In addition, there are some drivers loaded prior to Device Manager being called, because they are necessary for system initialization. This is handled in the OEM Adaptation Layer (OAL).

Why Are Device Drivers a Critical Point of Failure?

In Windows CE, as in most operating systems, device drivers interact intimately with both the kernel and physical devices. Device drivers commonly run at the most privileged level of a multi-level operating system (for example, kernel mode in UNIX and Ring 0 in Windows), primarily for performance reasons: the driver does not need to repeatedly invoke system calls, with the resulting overhead.
Note: Windows CE provides a model that varies from this generalization, with the goal of minimizing this risk. This does not invalidate the considerations addressed in this paper.

Device drivers often manipulate physical memory directly; they are not subject to the checks and balances that may be imposed by a virtual memory manager. Finally, device drivers are the "public face" of the kernel. If the device driver fails, the kernel may continue its work. But the customers of the kernel, which can include users, controlled devices or applications, see none of that work. This makes the system of little use to the user.

What Is "Reliability" for a Device Driver?

The simple, but incomplete, answer to this question is that it keeps working. Although true at the highest level, a decomposition of this concept is essential to getting a complete explanation.
  • A driver must perform its work -- Device drivers that are untested or that are still in development may not function according to the assumed definition of functionality. Sometimes the problem is in the implementation, and other times the documentation is incorrect, deficient or non-existent. A driver's interfaces should be clearly documented with domain (acceptable inputs) and range (expected outputs and behaviors). Interaction with the operating system or application outside the interfaces, for example, resource requirements, error returns and exceptions thrown, should also be clearly documented. The documentation should shape the testing of the driver -- the testing designers should work from the same information that will be supplied to the consumer of the driver, that is, the developer of an embedded system.

  • A driver must use allocable resources responsibly -- Device drivers are typically active within the memory space of the system for extended periods -- often the entire time the system is operational, which may be for weeks, months or years for an embedded system. Any leakage of resources, in which resources are acquired and then not released, is considered a flaw in software design. In a device driver, this is almost always a fatal flaw, exacerbated by the driver's life span. In Windows CE, resources that must be responsibly conserved are:
    • Memory
    • I/O space for processors that support it
    • Object handles (a memory object, a limited resource by design)
    • Device contexts (a GDI concept, essentially a type of object handle, but subject to different limitations)

  • A driver must tolerate faults in the hardware -- Hardware sometimes fails. The associated device driver(s) must recognize the failure and, at the very least, not cause system software failure as a result. A well-designed driver should offer the opportunity for the system to discover the failure and take appropriate action (alert an operator, use alternate hardware and so on), through appropriate interaction with the kernel and/or application. Removable hardware presents some similar challenges, although this is not considered to be a failure in the hardware.

  • A driver must tolerate faults in the software -- If other software doesn't work properly, a well-designed device driver can either gracefully fail or recover after an error condition is corrected. For instance, another system element (such as an application) may consume large amounts of system memory, leaving the driver with insufficient memory to function. In this event, a device driver should indicate, through some error reporting mechanism, that it cannot complete its operations. When memory is again available, the driver should resume function without intervention.

  • A driver must not compromise system security -- A device driver typically operates in the highest privilege mode of a processor, for processors that support multiple privilege levels. It can, however, also interact with applications, which operate at a lower privilege. A poorly designed driver can allow an application to gain access to privileges at either the kernel level or, in the worst-case scenario, the system level. The former allows denial of service attacks or co-opting of system resources; the latter may allow access to user passwords, certificates or other high-level objects.

  • A driver must use processor time responsibly -- A device driver typically operates with high priority in the system, as represented by thread and/or interrupt priority. Abuse of this privilege means excessive use of CPU time to the exclusion of other system elements. End users do not see a device driver consuming resources -- they see a poorly responding user interface or slow interaction between the device and other devices or systems.

  • A driver must be manageable -- Windows CE offers services, such as Power Manager, that can be used by designers to build a system that meets related criteria, for example battery life. A well-designed driver should respond accurately and promptly to such services, if designed to do so; the underlying code must be more than "return TRUE", and must generate an actual change in state. Code that purportedly enables, quiesces or otherwise changes the state of a driver, must work, regardless of other aspects of the driver's state. If a driver cannot make the requested transition because of other state elements, the request should cause an error return or notification. Quiet failures are bad.
A driver designed with these elements in mind will contribute to the reliability of the system as a whole.

Although it is not directly related to the reliability of a particular module, software is often reused or revised. Reliability of future implementations built on a given code base can be enhanced by ensuring that the driver code is maintainable. Programmers interested in performance will often write very tight, terse code. Code written under these circumstances also tends to be sparsely commented. Modern compilers are remarkably efficient, and in nearly every case the compiler will optimize clearly written code into the same sequence of instructions generated for terse, inscrutable code. Indeed, some programming "tricks" can frustrate the efforts of optimizing compilers and produce code that is less efficient.

Another element of maintainability is the quality of documentation. All design documentation should be self-explanatory and inclusive. Failure to preserve the rationale behind a design will hamper future efforts to reuse the code; in many cases, it then becomes better practice to write completely new code, discarding the former work and losing its value.

How Is Reliability Compromised?

There are many ways to improve the reliability of device drivers and minimize system failures related to drivers. Most of these methods are related to design and implementation, but some are external elements that can minimize the impact of driver failures.

The following section might implicitly assume a system in which device drivers are added after initial generation of the system image. Some of the scenarios will be moot for a closed system that cannot be extended. Nonetheless, attention to these issues will make system code easier to maintain by its developer.

Memory Management

A classic failure in device drivers is a memory leak that consumes all system resources. In a Windows CE system, there are numerous types of resources that can be consumed and not recycled. As is true in ecosystems, this is a recipe for disaster. Some resources that can be consumed and not recycled are:
  • Physical memory (the portion not managed by the operating system)
  • Heap memory
  • Object handles, which are used for
    • Files
    • File-mapping objects
    • Communications devices and other stream interfaces
    • Databases and database enumeration contexts
    • Events
    • Mutexes
    • Processes
    • Sockets
    • Threads
  • GDI device contexts (DCs)
At base, nearly all of these resources are memory objects; however, some resources, such as handles and device contexts, are also subject to system design limits.

In Windows CE, all drivers are composed of an interrupt service routine (ISR) that is called by the kernel's exception handler in response to a hardware interrupt, and an interrupt service thread (IST) that responds to a signal by the ISR. The ISR runs in kernel mode, and the ISR typically does little work; actual device servicing is usually performed in the IST. In some real-time systems, the ISR bears the load. ISRs are sometimes written in assembler, for better performance. The IST typically runs in user mode; special API calls exist to allocate memory that can be accessed by both the kernel-mode ISR and user-mode IST. See the Platform Builder CEDDK documentation regarding these APIs.

Because the ISR usually does little processing, the IST is commonly the source of resource leaks. The typical IST, when started, performs some initialization and then goes into an infinite loop in which it blocks on a system event. The kernel exception handler signals the event when the ISR notifies it of an interrupt.

A typical programmatic structure is a while-loop with a Boolean variable as its argument. The variable is set to true until the thread is ordered to terminate, and then the variable is set to false and the thread completes as the while-loop terminates. Note that it is possible to perform work, for example reclamation of resources, after the while-loop terminates.
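The loop structure described above can be sketched in portable C. This is a minimal illustration, not Windows CE code: `IstState`, `wait_for_interrupt` and `ist_run` are hypothetical names, and the blocking wait on a system event is replaced by a simple counter so the control flow can be followed in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-driver state; none of these names are Windows CE APIs. */
typedef struct {
    bool  keep_running;   /* the Boolean that controls the while-loop */
    int   pending_events; /* stand-in for signals on the interrupt event */
    int   serviced;       /* interrupts serviced so far */
    char *scratch;        /* resource acquired during initialization */
} IstState;

/* Stand-in for blocking on the interrupt event. Returns false once no
 * more events will arrive, which here doubles as the order to terminate. */
static bool wait_for_interrupt(IstState *s)
{
    if (s->pending_events == 0)
        return false;
    s->pending_events--;
    return true;
}

/* Returns the number of interrupts serviced, or -1 if initialization fails. */
static int ist_run(IstState *s)
{
    s->scratch = malloc(64);          /* initialization before the loop */
    if (s->scratch == NULL)
        return -1;

    s->keep_running = true;
    while (s->keep_running) {
        if (!wait_for_interrupt(s)) {
            s->keep_running = false;  /* ordered to terminate */
            continue;
        }
        s->serviced++;                /* actual device servicing goes here */
    }

    free(s->scratch);                 /* reclamation after the loop ends */
    s->scratch = NULL;
    return s->serviced;
}
```

Everything acquired before the loop is released after it, so restarting the thread does not accumulate allocations.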

In this loop, leaks are generated by creation of objects that are not freed by either the IST or a thread that consumes the IST's objects. As an example of the latter often-overlooked scenario, Power Manager notification messages are typically queued in a message queue created by an application that has requested those notifications. If the application that receives the messages fails to consume them and reclaim their memory, that memory is lost to the system.

If a driver (for instance, a PCMCIA client driver) supports insertion and removal of the device at system runtime (hot swapping), the IST may be started, stopped and restarted. Prior to Windows CE 4.1, suspend/resume also caused stopping and starting of PCMCIA client drivers, which allows for the same error scenario described here. If allocated resources are not either reused or released, new resources may be allocated for each insertion. Further, named devices are usually identified by a three-letter sequence followed by a sequential numerical index, for example, COM1. Upon removal, this index must either be released and possibly recycled or preserved for the next insertion of the device. The first option is usually preferable. A defect of this kind would likely lie in the IST's initialization code. Because Windows CE devices often remain operational for extended periods, even while the UI is suspended, even a small leak can create a problem. A system could potentially work for weeks or months, and then mysteriously fail. This is different from the desktop computer experience, in which a system will often be shut down when not in use.

In some cases, a driver needs to create and destroy objects in memory in the course of its operation. If the driver is manipulating a high-speed data stream, for example network data, this cycle can be repeated a very large number of times. This can lead to fragmentation of heap memory, which negatively impacts performance and, in a system in which it is likely to occur, may lead to lost data. Designers should consider custom memory management -- for instance, creating a pool of objects that can be reused rather than created and destroyed. Algorithms to manage such pools are well understood and often simple. Windows CE supports the allocation of private heaps. If a defect or unexpected situation exhausts the private heap, it is less of a negative impact on the system than exhaustion of the global heap. The private heap is used through standard Microsoft Win32 heap functions, and allocations will return an error condition when the private heap is exhausted.
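As one possible shape for such a pool, the sketch below keeps a fixed array of packet buffers on a free list. The names (`Packet`, `PacketPool`, `pool_acquire`, `pool_release`) and the capacity are assumptions made for illustration; a real driver would also need to guard the pool against concurrent access.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4            /* assumed capacity, chosen for illustration */

typedef struct Packet {
    struct Packet *next_free;  /* links free packets; unused while in use */
    unsigned char  data[256];
} Packet;

typedef struct {
    Packet  slots[POOL_SIZE];
    Packet *free_list;
} PacketPool;

static void pool_init(PacketPool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Returns NULL when the pool is exhausted; the caller must check. */
static Packet *pool_acquire(PacketPool *p)
{
    Packet *pkt = p->free_list;
    if (pkt != NULL)
        p->free_list = pkt->next_free;
    return pkt;
}

static void pool_release(PacketPool *p, Packet *pkt)
{
    pkt->next_free = p->free_list;
    p->free_list = pkt;
}
```

Because the buffers are never returned to the heap, repeated acquire/release cycles cannot fragment it, and exhaustion shows up as a NULL return rather than as a system-wide failure.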

Windows CE offers sophisticated memory management mechanisms with features that specifically address the embedded environment.

Memory Initialization

There are APIs that explicitly set memory to a predictable value. ZeroMemory is a particularly useful wrapper function. Most programmers are astute enough to avoid the typical pitfalls in this regard, but embedded systems can offer additional challenges.

One defect seen and fixed in an API call was reliance on the value in a memory location to indicate whether the device had been restarted. Intermittent failures were seen in tests that relied on this functionality. After extensive investigation, it was discovered that the tests were turning off the device for a short period of time -- too short for the memory cells to discharge; this time period was in the tens of seconds. The valid value was still present in that location, although not all memory cells were still in a valid state. The defect was fixed by explicitly zeroing the location as part of device shutdown.

More simply, Windows CE-based systems often remain operational for extended periods of time, although apparently turned off. It is common to suspend the system rather than actually stop it. Assumptions regarding on and off may result in code that is not reinitialized when and as expected, especially if desktop applications are ported to the embedded device.

Buffer Allocation

Related to the issue of memory management are the issues of buffer allocation and use. The most common problem is incorrect sizing; although a buffer that is too large wastes precious memory, a buffer that is too small can lead to system failures and security issues. If the size of expected input is not fixed, an upper bound should be determined and enforced through an appropriate mechanism. There are classic data structures, for example, ring buffers, which are not susceptible to overrun failure but can still lose data.
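To make the ring buffer trade-off concrete, here is a small sketch in which a full buffer overwrites its oldest byte and counts the loss. The capacity and all names (`Ring`, `ring_put`, `ring_get`) are illustrative assumptions; other designs reject new data instead of discarding old data.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 4   /* assumed upper bound on buffered bytes */

typedef struct {
    unsigned char data[RING_CAPACITY];
    size_t head;     /* index of the oldest byte */
    size_t count;    /* bytes currently stored */
    size_t dropped;  /* bytes overwritten before they were read */
} Ring;

static void ring_init(Ring *r)
{
    r->head = 0;
    r->count = 0;
    r->dropped = 0;
}

/* Stores one byte. When the ring is full, the oldest byte is overwritten. */
static void ring_put(Ring *r, unsigned char b)
{
    size_t tail = (r->head + r->count) % RING_CAPACITY;
    r->data[tail] = b;
    if (r->count == RING_CAPACITY) {
        r->head = (r->head + 1) % RING_CAPACITY;  /* oldest byte is lost */
        r->dropped++;
    } else {
        r->count++;
    }
}

/* Returns 1 and stores a byte in *out, or 0 if the ring is empty. */
static int ring_get(Ring *r, unsigned char *out)
{
    if (r->count == 0)
        return 0;
    *out = r->data[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    return 1;
}
```

Writes can never run past the end of the array, but the `dropped` counter shows that data loss is still possible and should be reported.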

Memory and I/O Space Allocation Conflicts

Windows CE includes infrastructure to manage memory and I/O space resources for device drivers (the I/O Resource Manager). This mechanism can be easily extended to include other resources. Nonetheless, there are drivers written that hard code values and are at least potentially a source of resource conflicts.

It is obvious that hard-coded values should be avoided. However, it is also conceivable that some hardware may place physical constraints on how a resource must be allocated -- for instance, if an I/O address is set by hardware jumpers or is not configurable at all. In this circumstance, the device driver must be written to identify the resource, if it is not known when the driver is written, request the resource from I/O Resource Manager and, if that resource is not available, provide a clear error return. Alternatively, there is a mechanism in the build process to reserve resources by not making them available to the I/O Resource Manager. If the hardware in question is built in, this is also a viable option to avoid conflicts, but it must be carefully documented to ensure that potential software revisions will not inadvertently neglect this reservation.

A driver must never assume that any particular resource is available. Availability should be determined either explicitly (through run-time query) or implicitly (through build-time reservation).
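The request-then-check pattern can be illustrated with a toy bookkeeping table. This is not the I/O Resource Manager interface; `IoBook`, `io_request` and `io_release` are hypothetical names that only model the idea of asking for a range and receiving a clear failure when it conflicts.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLAIMS 8   /* assumed table size */

typedef struct { unsigned base; unsigned len; } IoRange;

typedef struct {
    IoRange claims[MAX_CLAIMS];
    size_t  count;
} IoBook;

/* Returns 1 if [base, base + len) was granted, 0 if it is empty,
 * conflicts with an existing claim, or the table is full. */
static int io_request(IoBook *book, unsigned base, unsigned len)
{
    if (len == 0 || book->count == MAX_CLAIMS)
        return 0;
    for (size_t i = 0; i < book->count; i++) {
        const IoRange *c = &book->claims[i];
        if (base < c->base + c->len && c->base < base + len)
            return 0;   /* ranges overlap */
    }
    book->claims[book->count].base = base;
    book->claims[book->count].len  = len;
    book->count++;
    return 1;
}

/* Releases the claim that starts at base. Returns 1 if one was found. */
static int io_release(IoBook *book, unsigned base)
{
    for (size_t i = 0; i < book->count; i++) {
        if (book->claims[i].base == base) {
            book->claims[i] = book->claims[book->count - 1];
            book->count--;
            return 1;
        }
    }
    return 0;
}
```

A driver tied to a fixed address would still call the request function with that address and return an error if the call fails, instead of assuming the range is free.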

Hammering with the Wrench -- Misuse of Library/API Call Returns

Library and API calls typically provide some feedback if an error occurs: C-style return values, "out" parameters in the argument list, exceptions and, in some cases, explicit calls to an error-handling module. This feedback does not help you if you do not use it. A classic coding error is the failure to evaluate the return value from a memory allocation call -- the programmer effectively makes the assumption that a malloc or new call will always succeed. Although good testing will always find such an egregious bug, it is far more efficient to never code it in the first place. With most such library/API calls, it is relatively simple to define a C macro that will encapsulate the necessary test. Although the ASSERT macro is very useful during development, it is not very helpful or reassuring to the end user. User-friendly error handling is covered later in this paper.
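One way such a macro might look is sketched here. `CHECK_ALLOC` and `duplicate_buffer` are hypothetical names, and the jump-to-cleanup style is only one of several reasonable conventions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro: records a failure status and jumps to the given
 * cleanup label if ptr is NULL. */
#define CHECK_ALLOC(ptr, status_var, label) \
    do {                                    \
        if ((ptr) == NULL) {                \
            (status_var) = -1;              \
            goto label;                     \
        }                                   \
    } while (0)

/* Copies len bytes (len must be nonzero) from src into a new buffer.
 * Returns 0 on success, -1 if the allocation fails. */
static int duplicate_buffer(const char *src, size_t len, char **out)
{
    int   status = 0;
    char *copy;

    *out = NULL;
    copy = malloc(len);
    CHECK_ALLOC(copy, status, done);
    memcpy(copy, src, len);
    *out = copy;

done:
    return status;
}
```

Unlike ASSERT, the check remains in release builds and turns an allocation failure into an ordinary error return that the caller can handle.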

The following list is not exhaustive, but should serve as a helpful starting point for code review of device drivers. In general, return values and "out" parameters are there because the developer of the library or API thought the user would need to know what was going on "under the hood", so you should check them:
  • Return value from malloc or new
  • Bytes read in a file/device call
  • Bytes written in a file/device call
  • Handle value for any function that returns a handle, including:
    • Files
    • File-mapping objects
    • Communications devices and other stream interfaces
    • Databases and database enumeration contexts
    • Events
    • Mutexes
    • Processes
    • Sockets
    • Threads

    Note: Common practice is to check for NULL as a return value. However, this is not the only return value that demonstrates an invalid handle; for instance, CreateFile returns INVALID_HANDLE_VALUE.

  • CloseHandle -- see discussion below
  • API is given an invalid handle, which may have been valid at one time, but was invalidated by activity, for example on another thread
If there is any question of a handle's validity, it is easy to add the following code:


Listing

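// The failure value depends on the function that created the handle:
// some return NULL, while CreateFile returns INVALID_HANDLE_VALUE.
if (hMyHandle == NULL || hMyHandle == INVALID_HANDLE_VALUE)
{
    RespondToError(); // may or may not terminate program flow
}
else
{
    DoSomethingWith(hMyHandle,…);
}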


For safety, a handle's validity should be questioned when it is created (to do so, test the return value from the creation function call), and upon subsequent use, unless it is created and used completely within a single-threaded process. Even in that circumstance, the overhead is insignificant compared to the advantages of developing this as a programming habit, or employing a macro to do this consistently.

Note that closing a handle does not necessarily delete its associated object; most objects are reference-counted and are deleted only when the count reaches zero. Threads will not be destroyed by CloseHandle in any circumstance. There are other legitimate reasons for a call to CloseHandle to return a BOOL false value, including permission issues. By assuming that CloseHandle always succeeds, that is, not checking its return value, a potential resource leak is created.

In Windows CE, many calls return a generic failure value, and additional information is available through GetLastError. However, as the operating system evolves, the individual error values sometimes change; for instance, a hypothetical FILE_NOT_FOUND error might be replaced with a hypothetical PARTITION_NOT_FOUND error, to reflect that a file system now supports multiple partitions. Unless the value is specifically documented for the API, as distinguished from more generic error messages such as those returned by GetLastError, this value should be considered informational. In the example, the message is "what you're looking for isn't there" and appropriate action should be taken on that more general condition.

Are You Thread-Safe?

Thread conflicts can break drivers and other components. In many cases, a driver needs to return information in response to a read-type request; this request is by necessity on another thread of execution. Careless design can lead to the scenario of data changing while it is being accessed -- picture a buffer containing an Ethernet frame. In addition, if an application attempts to control a device based on data persisted by its driver and the data changes while being read or written, (for instance, if an argument is both an in-parameter and an out-parameter), the resulting state is difficult to predict.

Programmers sometimes avoid synchronization objects because they perceive them as an impediment to performance. However, judicious use of synchronization objects, such as critical sections and mutexes, will avoid many of the described problems. Further, by allowing safe use of shared buffers and parameters, developers can avoid memory copies into multiple buffers or other mechanisms to avoid concurrent access to data by both the driver and a client. This can improve performance.
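The following sketch shows a buffer guarded by a lock so that a reader never sees a half-updated frame. A POSIX mutex is used here purely as a portable stand-in for a Windows CE critical section or mutex; `SharedFrame`, `frame_store`, `frame_read` and the 1514-byte limit are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define FRAME_MAX 1514   /* assumed maximum frame size */

typedef struct {
    pthread_mutex_t lock;   /* stand-in for a critical section */
    unsigned char   frame[FRAME_MAX];
    size_t          len;
} SharedFrame;

static void frame_init(SharedFrame *sf)
{
    pthread_mutex_init(&sf->lock, NULL);
    sf->len = 0;
}

/* Driver side: replaces the stored frame. Returns 0, or -1 if too large. */
static int frame_store(SharedFrame *sf, const unsigned char *src, size_t len)
{
    if (len > FRAME_MAX)
        return -1;
    pthread_mutex_lock(&sf->lock);
    memcpy(sf->frame, src, len);
    sf->len = len;
    pthread_mutex_unlock(&sf->lock);
    return 0;
}

/* Client side: copies out a consistent snapshot of the frame.
 * Returns the number of bytes copied, or -1 if dst is too small. */
static long frame_read(SharedFrame *sf, unsigned char *dst, size_t cap)
{
    long n;
    pthread_mutex_lock(&sf->lock);
    if (sf->len > cap) {
        n = -1;
    } else {
        memcpy(dst, sf->frame, sf->len);
        n = (long)sf->len;
    }
    pthread_mutex_unlock(&sf->lock);
    return n;
}
```

The length and the contents are always updated and read under the same lock, so a client cannot observe a new length paired with old data.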

Thread Priority

Typically, drivers should allow their threads' priority to be set through the registry. This allows the consumer of your driver to tune the performance of a given configuration; your driver may run with higher or lower priority than the default you provide. It is important to determine what deadlocks may result from this situation. Situations can include starvation of critical system threads or other non-system threads, starvation of the driver's own thread(s) or classic deadlocks -- a driver thread waiting to consume input from or provide output to a blocked, lower-priority thread. Windows CE handles priority inversion through priority inheritance, which can break such a deadlock. However, it is best to design around the scenario. It is impossible to test all possible configurations, but it is possible to examine and evaluate the relationship between standard system threads and your driver. Test strategy should include:
  • Through the registry value you have established, configure your driver to run at a very high priority in the real-time space above the OS, and observe behavior in functional test and stress scenarios.
  • Perform the same testing with the driver configured to run at the lowest possible (idle) priority.
  • If you have established any real-time or time-critical threads, perform the same bracketing around these threads.
Any dependencies should be documented. This information will be used by system integrators and support engineers who must troubleshoot field failures.
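A configurable priority is only safe if bad settings are rejected. The sketch below models the registry as a small table and falls back to a default when the value is missing or out of range. It assumes a 0 to 255 priority range with 0 as the highest priority; the value name, the default, and all identifiers are hypothetical, and no real registry API is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PRIORITY_HIGHEST 0     /* assumed top of the priority range */
#define PRIORITY_LOWEST  255   /* assumed bottom (idle) of the range */
#define PRIORITY_DEFAULT 103   /* hypothetical default chosen by the driver author */

/* Stand-in for a registry value read at driver initialization. */
typedef struct { const char *name; long value; } ConfigValue;

static int priority_from_config(const ConfigValue *values, size_t count,
                                const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(values[i].name, name) == 0) {
            long v = values[i].value;
            if (v < PRIORITY_HIGHEST || v > PRIORITY_LOWEST)
                return PRIORITY_DEFAULT;   /* reject out-of-range settings */
            return (int)v;
        }
    }
    return PRIORITY_DEFAULT;               /* value not present */
}
```

The same lookup makes the bracketing tests straightforward: run the functional and stress suites once with the value set near `PRIORITY_HIGHEST` and once at `PRIORITY_LOWEST`.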

Determinacy and Events

Events and messages are not guaranteed to be delivered in any particular order. If the sequence of events or messages, either from or to your driver, is of significance, consider using message queuing. One caveat is that OS message queues are persistent unless they are explicitly flushed or destroyed.
Note: Distinct from Windows message queues, OS message queues are unique to Windows CE. See the Windows CE documentation for an in-depth description of OS message queues.

If messages to a removable device are not read prior to the device's removal, they may persist when the device is reinserted, depending on driver design. This is also noted in the discussion of resource leaks. A message queue should be explicitly flushed and in most cases, destroyed when the associated device is removed.

Power Management

Windows CE provides a service and infrastructure for power management of devices. Nonetheless, many drivers try to be clever within themselves, and thereby create conflicts that are difficult to diagnose. Windows CE documentation describes a set of power states that is analogous to those outlined in the ACPI specification. Absent clearly understood reasons to depart from that design, its adoption is beneficial.

Windows CE Power Manager provides more flexibility than ACPI, to better address the requirements of embedded systems. As a result, there is not a direct parallel between ACPI and Windows CE power management.

The driver should advertise its power capabilities to the Power Manager and register to receive power notifications. It is important that the device is then managed to comply with its stated capabilities. Failure to do so may lead to unnecessarily high power consumption and shortened battery life in portable devices, or to deadlock scenarios such as a device that "sleeps" but cannot be "roused."
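As a rough illustration of complying with stated capabilities, the sketch below accepts a state change only if the state was advertised and the device is able to make the transition, and otherwise returns an explicit error. It assumes device states named D0 (full on) through D4 (off); the types, error codes and `power_set` function are hypothetical and are not the Power Manager interface.

```c
#include <assert.h>

/* Assumed device power states, D0 (full on) through D4 (off). */
typedef enum { PWR_D0 = 0, PWR_D1, PWR_D2, PWR_D3, PWR_D4, PWR_STATE_COUNT } PowerState;

#define PWR_BIT(state) (1u << (state))

enum { PWR_OK = 0, PWR_ERR_UNSUPPORTED = -1, PWR_ERR_BUSY = -2 };

typedef struct {
    unsigned   supported;      /* bitmask of states the hardware really implements */
    PowerState current;
    int        transfer_busy;  /* nonzero while I/O that forbids sleeping is active */
} PowerDevice;

/* Changes state or reports why it could not; it never fails quietly. */
static int power_set(PowerDevice *dev, PowerState target)
{
    unsigned t = (unsigned)target;

    if (t >= PWR_STATE_COUNT || (dev->supported & PWR_BIT(t)) == 0)
        return PWR_ERR_UNSUPPORTED;
    if (target != PWR_D0 && dev->transfer_busy)
        return PWR_ERR_BUSY;

    /* A real driver would program the hardware here before recording the state. */
    dev->current = target;
    return PWR_OK;
}
```

The recorded state changes only after a successful transition, so what the driver reports always matches what the device is doing.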

Best Practices in Driver Development

In addition to consideration of the specific defect types discussed above, many of these pitfalls can be avoided through good development process. The following is a discussion of processes employed by Microsoft in the development of drivers shipped with Windows CE.

Two-Layer Architecture

Windows CE drivers are two-layer architectures that isolate hardware-specific requirements from interface requirements. The functionality of the driver is exposed to the operating system and applications through the Model Device Driver (MDD), which exposes a Device Driver Interface (DDI) suited to the type of device. One obvious advantage is that this exposes a small set of consistent semantics to application developers. Another advantage is that it constrains the underlying implementation to serve the metaphor of the device, for example, a file and a touch screen are conceptually quite different.

The Platform Dependent Driver (PDD) layer is written directly to the hardware and exposes a Device Driver Service Provider Interface (DDSI) that is prescribed by the MDD. In other words, the MDD expects to be able to invoke all the functions defined in the DDSI. Although the MDD and PDD are not necessarily separate code modules (DLLs), it is important to realize that there can be many PDDs written to the same MDD. The MDD code is not shared in the same way a DLL is shared, but rather statically linked at build time.
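The relationship between the two layers can be sketched with a table of function pointers standing in for the DDSI. This is a simplified model under assumed names (`PddOps`, `mdd_write`, `FakeUart`); real MDD and PDD code is linked together at build time as described above, not bound through a runtime table.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a DDSI: the functions the upper layer expects every
 * hardware-specific layer to supply. All names are hypothetical. */
typedef struct {
    int (*write_byte)(void *hw, unsigned char b);  /* returns 0 on success */
} PddOps;

/* Upper (MDD-like) layer: hardware-independent checks and looping.
 * Returns the number of bytes written, or -1 on bad arguments. */
static long mdd_write(const PddOps *pdd, void *hw,
                      const unsigned char *buf, size_t len)
{
    size_t i;

    if (pdd == NULL || pdd->write_byte == NULL || buf == NULL)
        return -1;
    for (i = 0; i < len; i++) {
        if (pdd->write_byte(hw, buf[i]) != 0)
            break;   /* hardware refused; report a short write */
    }
    return (long)i;
}

/* One lower (PDD-like) layer: a fake device that holds at most 4 bytes. */
typedef struct { unsigned char fifo[4]; size_t used; } FakeUart;

static int fake_uart_write_byte(void *hw, unsigned char b)
{
    FakeUart *u = hw;
    if (u->used == sizeof u->fifo)
        return -1;
    u->fifo[u->used++] = b;
    return 0;
}

static const PddOps fake_uart_pdd = { fake_uart_write_byte };
```

A second device would supply its own `PddOps` table while reusing `mdd_write` unchanged, which is the property that lets many PDDs be written to the same MDD.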

For some applications, particularly real-time drivers, a monolithic design may be a better choice because it avoids the function call overhead between the MDD and PDD. In most cases, however, the two-layer design will not impair performance noticeably and will improve implementation and maintainability.

Carefully Match Resource Creation and Destruction

There are many ways to write code that clearly demonstrates how resources are created and destroyed -- it is strongly suggested that you find one that works for you and use it. For instance, one structure is to establish an internal function that creates all required resources, and another that destroys all the same resources. The first is called on initialization or device insertion, and the second is called if the driver is unloaded or a device is removed. This makes it easy to match the effects of the functions, especially if the code is modified later. It also provides a simple mechanism for resource cleanup in exception handling. This sort of simple structural practice helps avoid simple mistakes.
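One such structure is sketched below with a pair of functions that acquire and release the same set of resources. `DriverResources`, `resources_create` and `resources_destroy` are hypothetical names, and plain heap buffers stand in for whatever a real driver allocates.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char *rx_buffer;
    char *tx_buffer;
} DriverResources;

/* Releases everything resources_create may have acquired. Safe to call
 * on a partially created or already destroyed set. */
static void resources_destroy(DriverResources *r)
{
    free(r->rx_buffer);
    free(r->tx_buffer);
    r->rx_buffer = NULL;
    r->tx_buffer = NULL;
}

/* Acquires all resources or none. Returns 0 on success, -1 on failure. */
static int resources_create(DriverResources *r, size_t rx_size, size_t tx_size)
{
    r->rx_buffer = malloc(rx_size);
    r->tx_buffer = malloc(tx_size);
    if (r->rx_buffer == NULL || r->tx_buffer == NULL) {
        resources_destroy(r);   /* the same cleanup path handles errors */
        return -1;
    }
    return 0;
}
```

Adding a resource later means touching exactly two functions that sit side by side, which makes a missing release easy to spot in review.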

Ensure that Hardware Interrupts are Always Handled

The structure of a driver should ensure that hardware interrupts are not ignored or lost, regardless of whether they are appropriately handled. The reason is simple: most architectures mask an interrupt when it is asserted to avoid conflict (reassertion) while interrupt handling is underway. Most architectures also mask either all interrupts of lower priority (priority as established in hardware, not to be confused with thread priority) or simply all interrupts. This means that if hardware interrupts are not addressed, it is possible that not only the associated device but also the majority of the rest of the system's devices may be blocked.

In the ISR-IST structure used by Windows CE, the hardware interrupt is re-enabled in the IST. Note that it is good practice to call InterruptDone early in the IST loop -- even before performing any processing -- because this hardware interrupt (and perhaps others) is masked until then.
Note: Windows CE supports nested interrupts. For more information, see the documentation on this subject.

In any event, exception handling in the IST must ensure that InterruptDone is called, even if the actual handling of the interrupt (that is, the real work of the interrupt service code) is not accomplished.
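The ordering can be modeled without any Windows CE calls. In the sketch below, `irq_done` stands in for InterruptDone and `service_device` for the real work; both are hypothetical, and the point is only that the unmasking step runs on every path, including the failing one.

```c
#include <assert.h>

typedef struct {
    int masked;     /* 1 while the interrupt is masked */
    int handled;    /* interrupts fully processed */
    int failures;   /* interrupts whose processing failed */
} IrqSim;

/* Stand-in for the interrupt being asserted and masked. */
static void irq_raise(IrqSim *irq) { irq->masked = 1; }

/* Stand-in for the call that re-enables the interrupt. */
static void irq_done(IrqSim *irq) { irq->masked = 0; }

/* Stand-in for real device servicing; fails on request. */
static int service_device(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/* One pass of the IST loop body. Returns the servicing result. */
static int ist_handle_one(IrqSim *irq, int simulate_failure)
{
    int rc;

    irq_done(irq);                        /* first, before any processing */
    rc = service_device(simulate_failure);
    if (rc == 0)
        irq->handled++;
    else
        irq->failures++;
    return rc;
}
```

Because the unmasking call precedes the work, a failure in the work cannot leave the interrupt masked.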

Use of Debug-Time Assertions

Properly used, assertions can greatly aid debugging of a driver in development. Their correct use requires a keen eye for potential problems. For instance, there are many circumstances in which code is essentially unreachable because of the logic of a function. An example of this is a switch-case statement that directs program flow based on a finite set of possible states. Good coding practice dictates the use of a default clause even if there is no possibility of an unexpected value. This is also a good place for an assertion. Assertions should not be employed as indicators of expected error conditions. Those conditions should be addressed in code as part of the input domain. The assertion should provide the message, "Something really unexpected happened here." For an excellent discussion of the use of assertions and many other good coding practices, see "Writing Solid Code", written by Steve Maguire and published by Microsoft Press.

Windows CE includes the cross-platform DebugBreak function. For instance, DebugBreak generates an int 3 instruction on x86 platforms. This will force the caller to unconditionally break into the debugger at the point of the statement.
Note: If a debugger cannot be found, the module will terminate with an unhandled breakpoint exception.

DebugBreak is expressed in both debug and retail builds, so it should be removed from code before shipping. Perhaps a better practice is to use DEBUGCHK. The DEBUGCHK macro is similar to the assert macro in standard C: it calls DebugBreak if the condition is false. This macro is not expressed in retail builds.
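
The following is a minimal sketch of an assertion placed in the default clause of a switch. DRIVER_CHECK is a hypothetical, portable stand-in for DEBUGCHK: instead of breaking into a debugger it logs the failure and counts it, so the behavior can be observed outside Windows CE. PowerState and state_allows_io are likewise invented for illustration.

```c
#include <stdio.h>

#define DRIVER_DEBUG 1   /* set to 0 to model a retail build */

static int g_check_failures = 0;   /* failed checks seen so far */

#if DRIVER_DEBUG
/* Debug build: report the failed expression and where it occurred. */
#define DRIVER_CHECK(expr)                                          \
    do {                                                            \
        if (!(expr)) {                                              \
            g_check_failures++;                                     \
            fprintf(stderr, "CHECK FAILED: %s (%s:%d)\n",           \
                    #expr, __FILE__, __LINE__);                     \
        }                                                           \
    } while (0)
#else
/* Retail build: the check is not expressed at all. */
#define DRIVER_CHECK(expr) ((void)0)
#endif

typedef enum { POWER_OFF, POWER_IDLE, POWER_ACTIVE } PowerState;

/* Returns 1 if I/O may be issued in the given state, 0 otherwise. */
int state_allows_io(PowerState state)
{
    switch (state) {
    case POWER_ACTIVE:
        return 1;
    case POWER_OFF:
    case POWER_IDLE:
        return 0;
    default:
        /* Unreachable if every caller passes a valid state. This is not an
           expected error condition, so it is asserted rather than handled. */
        DRIVER_CHECK(0 && "unexpected PowerState");
        return 0;   /* retail builds still fail safe */
    }
}
```

The default clause does two separate jobs here: the check announces the logic error during development, and the return value keeps the retail build in a safe state if the impossible happens anyway.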

Use of Debug Messages and Message Zones

Windows CE includes the DEBUGMSG macro for generating messages that appear in the debugger's message window; this macro is not expressed in retail builds. These debug messages are trivially easy to use, because the entire infrastructure is included in Platform Builder.

Generous use of debug messages can facilitate debugging during development, but it can also generate an overabundance of message text to be read and understood. To help avoid this overwhelming situation, Windows CE debug messages support the definition of debug zones. Messages are associated with zones, which are simple bit flags. Zones can be selectively enabled or disabled at run time, through the Platform Builder IDE.
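
The idea of zones as bit flags can be illustrated with a small portable sketch. The names below (ZONE_INIT, ZONE_IO, ZONE_ERROR, set_zone, zone_msg) are hypothetical and only model the concept; unlike DEBUGMSG, this version is a plain function and is not removed from retail builds.

```c
#include <stdarg.h>
#include <stdio.h>

/* Each zone is one bit in a mask (hypothetical zone assignments). */
#define ZONE_INIT   (1u << 0)
#define ZONE_IO     (1u << 1)
#define ZONE_ERROR  (1u << 2)

static unsigned int g_zone_mask = ZONE_ERROR;   /* default: errors only */

/* Returns 1 if the given zone is currently enabled. */
int zone_enabled(unsigned int zone)
{
    return (g_zone_mask & zone) != 0;
}

/* Enables or disables a zone at run time. */
void set_zone(unsigned int zone, int enabled)
{
    if (enabled) {
        g_zone_mask |= zone;
    } else {
        g_zone_mask &= ~zone;
    }
}

/* Emits the message only when its zone is enabled. Returns 1 if the
   message was written, 0 if it was filtered out. */
int zone_msg(unsigned int zone, const char *format, ...)
{
    va_list args;

    if (!zone_enabled(zone)) {
        return 0;
    }
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    return 1;
}
```

With this structure a chatty zone such as ZONE_IO can stay in the code permanently and be switched on only while a particular problem is being investigated.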

Test Hooks

Another method of generating additional information, or of placing the driver into a distinct state that enhances debugging, is through the coding of test hooks. These are logical inputs to the driver that are not documented for use in the regular operation of the driver. One advantage of this approach is that there is no difference between the tested code and the shipping code -- the hooks are left in, but are not accessed in normal operation. Of course, the disadvantage is increased code size, so this approach must be used in moderation. There are several ways to expose such hooks, including:
  • Export entry points that are not documented for use in the normal operation of the driver. The potential liability of this approach is that those entry points are visible (dumpbin will show them). If they are used for unintended purposes, they can generate instability that will be unfairly attributed to the driver.

  • Establish an IoControl (IOCTL) value to provide access to special testing functionality. This is conceptually compatible with the intent of IOCTL syntax, and is commonly used by Microsoft to access test hooks. If they are used for unintended purposes, these access points can generate instability.

  • Establish an unpopulated registry key (the key does not exist in the image by default) that the driver can access. This can then be set in the test environment. The initialization code then checks for the registry key and takes appropriate action if it is present. Note that placing this check in looping code, for instance, the IST, will almost certainly have a negative impact on performance.

  • Establish an environment value that the driver can access; this can then be set in the test environment. A liability is that this requires the driver to access an object outside its own scope, which may introduce stability and/or security problems. It also places additional constraints on the external environment, which may be difficult to guarantee.

  • Reserve an argument to an exposed entry point. The documentation may state that this value should always be NULL, but internal documentation provides one or more values that establish debugging states. A possible liability is that if an application programmer passes in an undefined value (accidentally or purposefully), the driver may be placed in a state inappropriate for normal functionality. This is not recommended, for security reasons.

  • Establish a magic value for a normal parameter. This should be a value that is unlikely to occur in normal use. There is a possibility that an application programmer will pass in the magic value by accident, but if it is carefully defined, you can reduce that likelihood; this is also not recommended, for security reasons.
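
Of the options above, the IOCTL-based hook is perhaps the easiest to sketch. The fragment below is a simplified, portable model of a control dispatcher; the control code values, the DemoDriver structure, and the demo_io_control signature are all hypothetical and do not match the real Windows CE stream driver entry point.

```c
#include <stddef.h>

/* Hypothetical control codes; the values are arbitrary. */
#define IOCTL_DEMO_GET_STATUS  0x1001u   /* documented, normal operation   */
#define IOCTL_DEMO_TEST_DUMP   0x8001u   /* undocumented test hook         */

/* Hypothetical driver state. */
typedef struct {
    unsigned int status;
    unsigned int irq_count;        /* internal counter, normally hidden */
    unsigned int dropped_frames;   /* internal counter, normally hidden */
} DemoDriver;

/* Dispatches one control request. Returns 1 on success, 0 on failure. */
int demo_io_control(DemoDriver *drv, unsigned int code,
                    unsigned int *out, size_t out_count)
{
    if (drv == NULL || out == NULL) {
        return 0;
    }
    switch (code) {
    case IOCTL_DEMO_GET_STATUS:
        if (out_count < 1) {
            return 0;
        }
        out[0] = drv->status;
        return 1;
    case IOCTL_DEMO_TEST_DUMP:
        /* Test hook: read-only view of internal counters. It validates its
           output buffer exactly as the documented request does. */
        if (out_count < 2) {
            return 0;
        }
        out[0] = drv->irq_count;
        out[1] = drv->dropped_frames;
        return 1;
    default:
        return 0;   /* unknown codes are rejected, never ignored */
    }
}
```

Keeping the hook read-only, and validating its arguments as strictly as any documented request, limits the instability that an unintended caller could cause.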

Meaningful Error Messages

Little is more frustrating to the end user than an error indication that displays a hexadecimal number and a meaningless phrase such as "the memory could not be read." Any error message that is likely to be exposed to end users should provide sufficient information to allow the user to correct the situation. In extreme situations, an error message could say (in effect) "Sorry, things are really broken, buy a new device." But that is still more useful to an end user than "Method CFoo::Bar has generated an unhandled exception." Although it is bad form for a device driver to produce the UI (and of course in a Windows CE embedded device, one cannot assume there is a display to show the UI), the device driver developer can make life much easier for the driver's consumer by providing meaningful, distinct and well-documented error returns. The return value should sufficiently distinguish the error condition and, if possible, its cause. There is little reason to be obscure, unless your goal is to not have happy customers and rising revenues.
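
As one possible illustration of distinct, documented error returns, the sketch below defines a small result enumeration and a lookup that turns each value into a description the driver's consumer can act on. All names (DrvResult, drv_read, drv_result_text) and messages are hypothetical; a real Windows CE driver would more likely report failures through SetLastError with established system error codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes: one per distinguishable condition. */
typedef enum {
    DRV_OK = 0,
    DRV_ERR_INVALID_PARAMETER,
    DRV_ERR_DEVICE_NOT_PRESENT,
    DRV_ERR_DEVICE_TIMEOUT
} DrvResult;

/* Maps a result code to a description that names the likely cause. */
const char *drv_result_text(DrvResult result)
{
    switch (result) {
    case DRV_OK:
        return "The operation completed successfully.";
    case DRV_ERR_INVALID_PARAMETER:
        return "A required buffer was missing or its length was zero.";
    case DRV_ERR_DEVICE_NOT_PRESENT:
        return "The device has been removed or is not responding.";
    case DRV_ERR_DEVICE_TIMEOUT:
        return "The device did not answer within the allowed time.";
    default:
        return "Unrecognized driver result code.";
    }
}

/* Hypothetical read entry point. Each failure has its own return value,
   so the caller can tell a bad argument from a missing device. */
DrvResult drv_read(int device_present, unsigned char *buffer, size_t length)
{
    if (buffer == NULL || length == 0) {
        return DRV_ERR_INVALID_PARAMETER;
    }
    if (!device_present) {
        return DRV_ERR_DEVICE_NOT_PRESENT;
    }
    memset(buffer, 0xA5, length);   /* stand-in for data read from hardware */
    return DRV_OK;
}
```

The application layer, which does own the UI, can then decide how to present each condition: a retry prompt for a timeout, for example, or an instruction to reattach the device when it is not present.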

Effective Testing of Device Drivers

There are some general principles that can make device driver testing more comprehensive and powerful; these are taken directly from strategies employed by Microsoft in testing of Windows CE drivers. These practices are intended as supplemental guidance for a structured quality assurance program. It is important to note that there is no "silver bullet," no one magic process or test -- each of these techniques contributes to better knowledge of the driver's behavior and quality assurance efforts.

Code Inspection

Much can be learned through direct inspection of source code. There are also numerous tools that can supplement the process. Diagnostic compilers, descendants of the legendary lint, generate reports of issues such as loss of precision due to type conversion across assignments or casts, unmatched memory allocation and reclamation calls, and security issues such as potential buffer overruns.

Structured Testing

Microsoft employs structured testing methods for all software, including drivers. It is assumed that the reader is familiar with general principles of testing; the following points are offered as specific matters to consider when testing a device driver:
  • Initialization and shutdown, and their relationship with system state, that is, dependencies in either direction.
  • Meaningful domain of inputs and range of outputs and the relationship between them.
  • Interaction with other drivers. For instance, there are several layers of drivers between an Ethernet port and a streaming TCP socket.
  • Error handling for erroneous inputs.
  • Error handling for resource starvation.
  • Error handling for hardware malfunction.
  • Stress behavior for relevant operational parameters.

Test specifications should identify these issues for all features of a driver, and the subsequent test plan should clearly identify how they will be evaluated. Through clear documentation of the test strategy, even small engineering groups, perhaps lacking dedicated QA personnel, can execute comprehensive and objective testing. Even a cursory reading of standard QA texts will provide benefit in this regard.

Fault Injection

Sometimes it is very difficult to generate a particular error condition. If the exception handling mechanism for that condition is not trivial, it is important to exercise it. One method is to inject a simulated fault, which can be accomplished in the Platform Builder debugger; a well-placed test hook may also be helpful. Although the intrusion may invalidate timing considerations, there is still value in running through the actual logic of the error handler.
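
One way to build such a hook is to route allocations through a wrapper that can be told to fail on a chosen call. The sketch below is an assumption-laden illustration in portable C: drv_alloc, inject_alloc_failure_on_call, and open_channel are hypothetical names, and malloc stands in for whatever allocator the driver really uses.

```c
#include <stdlib.h>

/* Remaining calls before the injected failure; 0 means injection is off. */
static int g_fail_countdown = 0;

/* Arms the hook so that the nth subsequent call to drv_alloc fails. */
void inject_alloc_failure_on_call(int n)
{
    g_fail_countdown = n;
}

/* Allocation wrapper used by the driver instead of calling malloc directly. */
void *drv_alloc(size_t size)
{
    if (g_fail_countdown > 0 && --g_fail_countdown == 0) {
        return NULL;   /* simulated resource starvation */
    }
    return malloc(size);
}

/* Code under test: needs two buffers and must not leak the first one if
   the second allocation fails. Returns 1 on success, 0 on failure. */
int open_channel(void **first, void **second)
{
    *first = drv_alloc(32);
    if (*first == NULL) {
        return 0;
    }
    *second = drv_alloc(32);
    if (*second == NULL) {
        free(*first);      /* the error path that is otherwise hard to reach */
        *first = NULL;
        return 0;
    }
    return 1;
}
```

Arming the hook with a value of 2 before calling open_channel forces the second allocation to fail, which runs the cleanup branch without having to exhaust memory on the device.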

Uses of Automation

Test automation is nearly always worth the investment of time and effort. Regression testing becomes simple and painless, and stress and uptime testing is often a simple extension of functionality tests.

Windows CE includes an automation tool and suite called the Windows CE Test Kit (CETK). The tests provided are the same test modules used in Microsoft's own labs -- and they are shipped with full source code. Many can be used as-is for development and test of drivers for new devices of existing classes. By comparing the behavior of a Microsoft driver under the same testing regime, a developer can gain valuable insight into challenges with new code. The CETK harness supports extensive logging of test data, and tests can be run with debugging enabled.

Code Coverage Analysis

The use of code coverage through diagnostic compilers can be very helpful in determining the sufficiency of testing. It is particularly effective in conjunction with automated testing. It is important to use code coverage metrics properly. The percentage of coverage is indicative of the quality of the testing, not of the quality of the product. Code coverage can identify which conditional branches are not taken, where program control flows for particular operations (which may be unexpected and generate apparently non-deterministic results) and how exceptional conditions are handled.

Integration Testing

A device driver should always be tested in the shipping configuration. A full test pass (execution of all defined tests) should be done on a retail build of the binary to be shipped, using the installation mechanism to be used by customers. Microsoft generates a golden master of the installation media and then tests images made from that master; those tests are run on retail-build configurations. If the product passes the tests, the original golden master is provided to the production facility for creation of shipping product; there are analogous processes for Web-based releases. If defects are seen at this point, it is necessary to reset the process, which may include returning to debug builds to find the cause of the defect. After finding and fixing the defect, full integration testing as described should be rerun. Although this is a time-consuming process, it is less work than issuing patches or recalls for defective product in the field and explaining to your customers why they are needed.

Conclusion

The reliability of device drivers is critical to overall system reliability. Efforts to improve driver reliability will prove to be an effective expenditure of development resources. The reliability issues presented by drivers are mostly well understood, and they can be successfully mitigated through rigorous and thoughtful design and testing processes.



Copyright © 2003-2004 Microsoft Corp. All rights reserved. This article was initially published on Microsoft's MSDN website. Reproduced by WindowsForDevices.com with permission.



