Using Parallel Programming in Windows 7 : Writing an application that relies on parallel processing

9/22/2011 5:21:32 PM

1. CONSIDERING THE ADVANTAGES OF PARALLEL PROCESSING

The one undeniable advantage of parallel processing is speed. If you use multiple processors to solve a problem that can be divided into independent pieces, the problem will be solved more quickly. However, using two processors won't solve a problem twice as fast. Parallel processing incurs overhead: the resources used to divide the work, manage the processors, and combine the results. Depending on the efficiency of the management software, processing can approach twice the speed but never quite attain it. Another misconception is that the application as a whole will approach being twice as fast. Only the compute-intensive portion of the application will. The application still runs at the same speed for network and disk operations, and it still waits on the user to provide required input. So you'll see a speed benefit overall, but nothing approaching a doubling when you work with two processors.
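
The limit described here is commonly expressed as Amdahl's law (the name doesn't appear in the original text; it's mentioned only for reference). The following is a minimal sketch, assuming a hypothetical helper named EstimateSpeedup and ignoring management overhead entirely, so real results will be lower than what it returns.

```csharp
using System;

public static class SpeedupEstimate
{
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
    // fraction of run time that can be parallelized and n is the
    // number of processors. Overhead is not modeled here.
    public static double EstimateSpeedup(double parallelFraction, int processors)
    {
        if (parallelFraction < 0.0 || parallelFraction > 1.0)
            throw new ArgumentOutOfRangeException("parallelFraction");
        if (processors < 1)
            throw new ArgumentOutOfRangeException("processors");

        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
    }
}
```

For example, if half of the run time is compute-intensive work that parallelizes perfectly, EstimateSpeedup(0.5, 2) gives 1 / (0.5 + 0.25), or about 1.33, rather than 2.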

Some applications today lack needed functionality because adding the functionality would make the application run too slowly. For example, you might want to include specialized graphics-smoothing routines in your application. However, after adding these routines, you can visually see the application add the smoothing to the output. In fact, you can go to lunch and come back in the time it takes for the application to finish rendering the display. Obviously, users will never tolerate anything that slow. At one time, developers would solve issues of this sort by using a processor with a higher clock speed, but heat and other issues have made this solution obsolete. Using parallel processing techniques can solve these sorts of issues at a reasonable price and by using technology available today.

A less-understood, and less certain, advantage of parallel processing is that it can make your application somewhat harder to snoop on. Most malicious applications designed to peek at your application assume it runs on a single processor, because that's how applications have generally worked. When an application uses multiple processors, each working in different areas of memory, it's harder to peek at. A Trojan writer has to work harder to gain any valuable information about your application, and the variable scheduling across multiple processors means that a sneak-peek trick that works today may not work tomorrow.

2. UNDERSTANDING THE REQUIREMENTS FOR PARALLEL PROCESSING

As with any advanced programming technique, parallel processing has special requirements. You can't simply write some code and expect it to work. Adding multiple processors necessarily complicates the development scenario, which means that you must understand how to accommodate multiple processors as part of the application development plan. The following sections describe the requirements you should consider before you begin using multi-processing techniques in your application.

2.1. Evaluating the Task Length

The time it takes to perform a task is important when evaluating the suitability of an application for parallel processing. Short tasks don't typically prove worthwhile because the overhead of managing the parallelism can outweigh the benefits of using multiple processors. In some cases, the overhead makes the resulting application run slower than the sequential version.

Of course, there's a difference between long tasks that can be done efficiently and tasks that are so long they become unwieldy. The common wisdom is to break long tasks into smaller pieces when possible in order to make the tasks more granular and produce a better result with multi-threading. This principle still applies when creating an application that relies on parallel processing. In fact, you want the tasks evenly sized if possible, so that each task completes at the same time and you can maximize processor throughput, but the reality is that achieving a strict balance is nearly impossible. Some threads will undoubtedly end up waiting for other threads to complete.
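
One way to get coarser, roughly even tasks is to hand each task a range of items instead of a single item. The sketch below is an illustration rather than part of the original example: it assumes a hypothetical SumOfSquares helper and uses Partitioner.Create() from System.Collections.Concurrent to split the index range into chunks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedWork
{
    public static long SumOfSquares(int[] values)
    {
        // Partitioner.Create requires a non-empty range.
        if (values.Length == 0)
            return 0;

        long total = 0;

        // Each task receives a block of indexes (Item1 inclusive,
        // Item2 exclusive) rather than a single element.
        Parallel.ForEach(Partitioner.Create(0, values.Length), range =>
        {
            long subtotal = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                subtotal += (long)values[i] * values[i];

            // Combine once per chunk to keep synchronization cost low.
            Interlocked.Add(ref total, subtotal);
        });

        return total;
    }
}
```

The per-item work here is tiny, so processing one element per task would spend most of its time on management overhead; a block per task keeps that cost proportionally small.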

2.2. Evaluating the Task Type

Don't get the idea that parallel processing will magically fix your disk-bound database application. Parallel processing typically works best on compute-intensive applications. Of course, database applications do have compute-intensive sections where parallel processing will work fine, but the overall application may not be that much faster if the problem is actually the need to update the drive system on the host machine. When you target an application to use parallel processing, make sure you understand the types of tasks that the application performs and target those parts of the application that can benefit most.

It's important to consider the individual tasks carefully. For example, by using queries in parallel you can obtain the data needed for the application faster. However, you can also improve application execution speed by accessing only the data you need. Wasted resources are a major problem in most applications today. Combining parallel processing with reduced queries can garner the truly impressive results that most developers want, but you must think the process through carefully.

NOTE

Some developers think that parallel processing will perform miracles with poorly written applications. The reality is that well-written, tightly implemented code will always work better than sloppy code that wastes resources. Nothing can replace well-written code. Before you convert an existing application to realize the benefits of parallel processing, make sure you've squeezed all the wasted processing cycles out of it and that the application uses resources wisely. Otherwise, the parallel processing will simply add another potential source of frustration when you finally do work through the original performance problems and correct them.

2.3. Considering Debugging

Parallel applications can be difficult to troubleshoot. After all, the code is executing on multiple processors and your debugger doesn't really track that sort of execution well. What you really get is a type of thread-based debugging as described in an article at http://msdn.microsoft.com/magazine/ee410778.aspx . The theory of such debugging sounds great, but the reality is quite different. A parallel application can introduce errors that are non-repetitive. The environment is no longer a constant because you now have multiple processors in play. Consider the issues you encounter when debugging a multi-threaded application and square them because you now have multi-threading and multi-processing at the same time. Even so, Visual Studio 2010 does provide some tools in the form of thread-based debugging to help you with your parallel-processing needs.

2.4. Obtaining Required Resources

Some parallel-processing applications fail despite careful implementation and thorough analysis of the problem domain. Even if the developer squeezes out every last bit of resource-wasting processing, the application can still fail to perform as expected when it becomes starved for resources. If your system is already working hard to obtain memory for the single-processor version of your application, it's going to struggle when you turn to parallel processing. For example, if the application currently requires 1 GB of RAM to run effectively and each processor works on its own copy of the data, it will require 2+ GB of RAM when you use two processors: 1 GB for each processor, plus RAM for the overhead generated by the parallel-processing requirements. In short, it's essential to profile your application in advance and determine the resources it requires before you move to parallel processing.

The problem is that the application won't necessarily show that it's resource-starved. The operating system will rely on virtual memory when it runs out of the physical equivalent. In some cases, the only clue you'll have is that the hard drive starts staying on all the time as the system thrashes. The system will constantly transfer data between RAM and the hard drive as it tries to comply with the requirements of parallel processing. In the end, your application will actually run slower if you don't have the resources required to implement parallel processing effectively.
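
When memory is the limiting resource, one option is to cap how many iterations run at once. The following sketch assumes a hypothetical PeakConcurrency helper; it uses the MaxDegreeOfParallelism property of ParallelOptions and reports the highest number of iterations that were observed running together. The spin wait is only a stand-in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CappedParallelism
{
    // Runs itemCount simulated work items with at most maxWorkers
    // running at once, and returns the highest concurrency observed.
    public static int PeakConcurrency(int itemCount, int maxWorkers)
    {
        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = maxWorkers;

        object gate = new object();
        int running = 0;
        int peak = 0;

        Parallel.For(0, itemCount, options, i =>
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }

            // Stand-in for real, memory-hungry work.
            Thread.SpinWait(50000);

            lock (gate)
            {
                running--;
            }
        });

        return peak;
    }
}
```

Choosing the cap is a judgment call: a lower value reduces the peak memory demand at the cost of leaving some processors idle.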

2.5. Team Skills

Parallel processing is significantly harder to understand and implement than most other new technologies. Other transitions aren't nearly as difficult. For example, moving to the 64-bit environment can be difficult, but only because it requires some code changes due to the change in handle sizes and so forth. The transition is manageable, though, if you rely on checklists to ensure that all the required changes take place. When you work in a parallel-processing environment, it's important to consider the change in viewpoint that the environment requires. The application is no longer working on a single processor; multiple processors are now truly doing things simultaneously. The timing issues that you experience when working with threads are now multiplied by the number of processors that you use because things truly do happen at the same time.

Most developers today are trained in procedural coding techniques. A few developers have used declarative languages, and an even smaller percentage understand how these languages work, but for the most part, most developers see applications as a procedural process. In order to work with parallel processing effectively, the development team as a whole must move beyond relying on procedures to a perspective where nothing is assumed about when or where the code will execute. You literally don't know — you know only that it will execute at some point, assuming the application doesn't crash. Such a viewpoint requires a team with special skills.
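
A small sketch may help show what it means to assume nothing about when or where code executes. In this illustration (the SquareAll helper is hypothetical), the iterations can run in any order and on any thread, yet the result is always the same because each iteration reads only its own input and writes only its own output slot.

```csharp
using System;
using System.Threading.Tasks;

public static class OrderIndependent
{
    public static int[] SquareAll(int[] input)
    {
        int[] output = new int[input.Length];

        // No iteration depends on another, so the order in which
        // they run doesn't affect the contents of output.
        Parallel.For(0, input.Length, i =>
        {
            output[i] = input[i] * input[i];
        });

        return output;
    }
}
```

Code written this way doesn't need to know which iteration ran first; code that keeps a running total or relies on the previous iteration's result does, and needs additional care.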

3. WRITING AN APPLICATION THAT RELIES ON PARALLEL PROCESSING

It would be difficult to include examples in a single article of every sort of parallelism that Windows 7 and the .NET Framework 4 support. In fact, it could be difficult to cover the topic extensively in a single book because the topic is relatively complex. The example in this section demonstrates just one technology, the Parallel class, which provides support for multiple processors. This particular example appears in the article because the Parallel class is straightforward, it's relatively easy to implement, and it provides a good starting point for anyone who wants to begin working with multiple processors. In addition, the example works just fine on systems that have only one processor. The following sections describe the Parallel Process example in more detail.

3.1. Understanding the Parallel Class

Microsoft recognizes the need to provide simple methods of adding parallelism to applications. Of course, parallelism is a type of multi-threading in that you create multiple threads that execute on separate processors. However, parallelism is more than simply creating a multi-threaded application. The threads must be able to execute in an independent manner. The Parallel class is part of the effort to create an environment in which applications can execute using more than one processor without adding undue complexity to the application itself. The concept is simple, but the implementation can be difficult. In this case, the application executes tasks within a special for loop. Each task can execute using a different processor.

The Parallel class is part of a much bigger experiment in parallelism, the Task Parallel Library (TPL) that's part of the .NET Framework 4. The components of the TPL appear as part of the System.Threading (http://msdn.microsoft.com/library/system.threading.aspx) and System.Threading.Tasks (http://msdn.microsoft.com/library/system.threading.tasks.aspx) namespaces. The Parallel class is just one technology in these namespaces, which also support the following concepts.

Data parallelism: When an application must work on multiple bits of independent data, as in database records, it's usually faster to work on each bit in parallel. Instead of updating each record individually, the database application can update multiple records simultaneously. Of course, the key word is "independent." You can't update dependent data in parallel without terrible consequences. Read more about data parallelism at http://msdn.microsoft.com/library/dd537608.aspx.
Task parallelism: Applications must often perform multiple independent tasks. In some cases, the tasks are similar, but different in a small way. For example, a scientific application can perform the same check using multiple instruments, or a security application can check the status of multiple intrusion sensors. As with data parallelism, the key word is "independent." The tasks must be independent of each other to succeed in a parallel-processing environment. You can read more about task parallelism at http://msdn.microsoft.com/library/dd537609.aspx.
Parallelism using asynchronous patterns: The common element of both data and task parallelism is the concept of asynchronous processing. It's possible to create a pattern that describes multiple independent elements of some sort. The TPL supports asynchronous patterns in various ways. You can read about these types of processing at http://msdn.microsoft.com/library/dd997405.aspx.
PLINQ: Most types of parallelism rely on the concept of doing something. An application processes multiple bits of independent data or checks multiple independent sensors. It's also possible to use parallelism when asking something. The sidebar, "Using the PLINQ Alternative," describes how to use PLINQ to perform multiple query tasks at once.
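
As a brief illustration of the last item, the following sketch shows what a PLINQ query can look like. It isn't taken from the sidebar; the CountLongNames helper and its parameters are assumptions made for the example. The AsParallel() extension method from System.Linq marks the query as eligible for parallel execution.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Counts the names that have at least minLength characters,
    // letting PLINQ decide whether to evaluate the test in parallel.
    public static int CountLongNames(string[] names, int minLength)
    {
        return names.AsParallel().Count(name => name.Length >= minLength);
    }
}
```

For an input this small, PLINQ may well run the query sequentially; the benefit appears with larger data sets and more expensive tests.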

When working with the Parallel class, you have access to a number of For() and ForEach() loop structures that are implemented as methods (note the difference in capitalization from the standard C# for and foreach loops). In addition, the Parallel class supports an Invoke() method that accepts an array of actions to perform. All these methods can be executed in parallel if the Parallel class detects an opportunity to do so, and hardware resources are available to complete the action.
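
The following minimal sketch shows the general shape of Parallel.For() and Parallel.Invoke(); Parallel.ForEach() appears in the main example. The class and method names here are hypothetical and exist only for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMethodsSketch
{
    // Parallel.For: the body receives the loop index, much as a
    // standard for loop would provide it.
    public static int[] DoubleIndexes(int count)
    {
        int[] results = new int[count];
        Parallel.For(0, count, i =>
        {
            results[i] = i * 2;
        });
        return results;
    }

    // Parallel.Invoke: runs independent actions, possibly at the
    // same time, and returns only when all of them have finished.
    public static int SumOfTwoIndependentResults()
    {
        int first = 0;
        int second = 0;
        Parallel.Invoke(
            () => { first = 6 * 7; },
            () => { second = 10 + 5; });
        return first + second;
    }
}
```

Both methods block until all the work is complete, so the results are safe to read once the call returns.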

3.2. Configuring the Parallel Process Example

The example begins with a Windows Forms application. You need to add a Test (btnTest) button and a list box (lstColors). The list box will contain a list of items to process. The example uses colors, but you can use any set of strings desired. Add as many strings as you want, but you'll want to keep the number of unique items low to ensure you can see them in the dialog box that appears after the data is processed. Figure 1 shows a typical setup for this example.

Figure 1. The example requires a list box that contains items to process.

You don't need to add any special references for this example. However, you do need to add two using directives as shown here:

using System.Threading.Tasks;
using System.Text;

3.3. Writing the Parallel-Process Example Code

The example code focuses on performing a task on multiple processors, rather than doing something elegant that you'd normally perform in a production application. In this case, the example processes a list of colors. It counts each color string and adds a new entry for each unique string. When the code completes, it outputs a dialog box with the results. Example 1 contains the code needed for this example.

Example 1. Processing data items using multiple processors

private void btnTest_Click(object sender, EventArgs e)
{
    // Initialize the Colors dictionary that is used to
    // hold the number of times each color appears.
    Dictionary<String, Int32> Colors = new Dictionary<String, Int32>();

    // Dictionary isn't safe for concurrent writes, so the parallel
    // loop serializes access to Colors through this lock object.
    Object ColorsLock = new Object();

    // Copy the list box object collection to an array for
    // processing.
    String[] ColorList = new String[lstColors.Items.Count];
    lstColors.Items.CopyTo(ColorList, 0);

    // Process each of the entries in the color list.
    Parallel.ForEach(ColorList, ThisItem =>

        // Create the lambda expression.
        {
            // Only one thread at a time may read or update Colors.
            lock (ColorsLock)
            {
                // Check the current color against those already
                // in the list.
                if (Colors.ContainsKey(ThisItem))

                    // Update the color count if the color is
                    // in the list.
                    Colors[ThisItem]++;
                else

                    // Otherwise, add the color.
                    Colors.Add(ThisItem, 1);
            }
        }
    );

    // Create an output variable.
    StringBuilder Result = new StringBuilder();

    // Process the result.
    foreach (KeyValuePair<String, Int32> Item in Colors)
        Result.Append("Color: " + Item.Key + " appears "
            + Item.Value + " times.\n");

    // Display the result on-screen.
    MessageBox.Show(Result.ToString());
}

The code begins by creating a Dictionary object, Colors, that has a key of type String and a value of type Int32. Note that Colors will hold the summary of unique string names in lstColors and the number of times that the strings appear. For example, if red appears six times, the key will be red and the value will be 6.

Processing a ListBox.ObjectCollection can prove tricky, so the example creates a String array, ColorList. It uses the CopyTo() method to copy the list of colors found in lstColors.Items to ColorList for processing.

The next step is the actual parallel code for the example. The code calls Parallel.ForEach(), which is a parallel form of the foreach statement. The first argument is the list of items to process, which is contained within ColorList. The code then uses a lambda expression to process each element within ColorList. Lambda expressions are a C# language feature introduced alongside LINQ. Each ColorList element appears within ThisItem.

The action for the lambda expression appears within the curly braces. When the color already appears in Colors, the code simply updates the count value. Otherwise, the code uses the Add() method to add a new entry to Colors for the color in question. When the ForEach() method loop is complete, Colors will contain an entry for each unique color value and a count of the number of times this color appears in lstColors.
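
Because the lambda can run on several threads at once, the shared collection has to tolerate concurrent updates, and a plain Dictionary makes no such guarantee for concurrent writers. One option, sketched here as an assumption rather than as the approach the example takes, is the ConcurrentDictionary class from System.Collections.Concurrent. The CountColors helper is hypothetical and takes the color list as a parameter so that it doesn't depend on the form.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ColorCounter
{
    public static IDictionary<string, int> CountColors(IEnumerable<string> colorList)
    {
        ConcurrentDictionary<string, int> colors =
            new ConcurrentDictionary<string, int>();

        Parallel.ForEach(colorList, thisItem =>
        {
            // Add the color with a count of 1, or add 1 to the
            // existing count, as a single thread-safe operation.
            colors.AddOrUpdate(thisItem, 1, (key, count) => count + 1);
        });

        return colors;
    }
}
```

AddOrUpdate() combines the check and the update, so there is no window between them in which another thread can change the entry.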

Figure 2. The example outputs a list of colors and the number of times each color appears.

The final steps of this example are output-related. The code begins by creating a StringBuilder object, Result. It then uses a standard foreach processing loop to add each of the entries in Colors to Result as a String. When Result is complete, the code uses Result.ToString() to display the message box shown in Figure 2.

3.4. Debugging the Parallel-Process Example Code

There are some problems debugging the example as it exists right now, and they have nothing to do with the code itself. Try setting a breakpoint on the

if (Colors.ContainsKey(ThisItem))

line of the example code. Choose Debug => Start Debugging or press F5. You'll find that the example does stop at the right line, but in most cases not the first time through the loop. In some cases, the debugger will stop when Colors has nine items in it; at other times it will stop when Colors has only two items in it. If you try single-stepping through the code, you'll find that it lurches between steps. The odd behavior is most likely a result of several threads running the lambda at the same time: any of them can reach the breakpoint, and the debugger can switch threads between steps.

It's possible to obtain more consistent behavior from the debugger, but the logic of selecting a breakpoint isn't always clear. Remove the previous breakpoint and add a new one at the

Parallel.ForEach(ColorList, ThisItem =>

line of the example code. Choose Debug => Start Debugging or press F5 again. This time, you'll be able to single-step through each of the items as it's added to Colors, and the debugger behaves more consistently. The lesson here is that placing a breakpoint inside the lambda expression may not work as expected. No documented explanation for this behavior appears to be available. If one breakpoint doesn't appear to work for you, try setting one a little earlier in the code to see whether it works better. You should be able to find a breakpoint that lets you see your code in action.

You'll also want to know how you can tell that the example is actually using threads to process the information. Choose Debug => Windows => Parallel Tasks or press Ctrl+Shift+D,K to display the Parallel Tasks window shown in Figure 3. In this case, the example is running four parallel tasks and has another one scheduled to run. Your window will very likely look different from the one shown, and it will also vary each time you run the application.

Figure 3. Visual Studio makes it possible to see which tasks are running within your application.

It's important to note that the window in Figure 3 shows the parallel tasks, not all the threads running on the system. If you want to see all the threads, then choose Debug => Windows => Threads or press Ctrl+Alt+H instead. Figure 4 shows how the Threads window appears in comparison. Notice that the application uses a number of threads, but not all of them are running in parallel.
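
If you'd like to confirm from code which threads did the work, you can record the managed thread ID inside the loop. This is only a sketch; the RecordThreadIds helper is hypothetical, and the IDs you see will differ from run to run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadTrace
{
    // Returns, for each item index, the managed ID of the thread
    // that processed that item.
    public static int[] RecordThreadIds(string[] items)
    {
        int[] threadIds = new int[items.Length];

        Parallel.For(0, items.Length, i =>
        {
            // Each iteration writes only to its own slot.
            threadIds[i] = Thread.CurrentThread.ManagedThreadId;
        });

        return threadIds;
    }
}
```

Counting the distinct values in the returned array shows how many threads took part, which you can compare against what the Parallel Tasks and Threads windows display.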

Figure 4. Parallel tasks differ from the threads your application uses.

It's also possible to get a pictorial view of the parallel processing using the Parallel Stacks window. Choose Debug => Windows => Parallel Stacks or press Ctrl+Shift+D,S to display the Parallel Stacks window shown in Figure 5. This pictorial view gives you a better understanding of precisely how your application is working. Hover your mouse over the entries to see the thread numbers and to obtain more information about them.

Figure 5. The pictorial view presented by the Parallel Stacks window tells you a lot about your application.

4. MOVING YOUR APPLICATIONS TO WINDOWS 7

This article has provided a quick overview of parallel-programming techniques you can use in Windows 7 to make your applications run faster. The applications don't simply appear to run faster through threading techniques; they actually are faster because you use multiple processors to perform the work. Each processor works independently and simultaneously. Even though the definitions for multi-threading and multi-processing are well understood, some developers still get confused about the benefits and problems of each technology. If you take one thing away from this article, it should be that multi-processing produces a true increase in application speed by using multiple processors, but that such processing is problematic due to the nature of imperative languages (which require worrying about state).

Before you begin making plans to use parallel-programming techniques for your next application upgrade or new application, you need to have a plan. Parallel programming can be time-consuming to implement, hard to debug, and not very noticeable when implemented poorly. You need to consider what you'll get out of the parallel programming first. Think about how your application works and whether it even lends itself to parallel-programming techniques. Once you have goals in place, define how to achieve those goals. It may be that using the Parallel class won't achieve your goals, and you'll actually need to use a language better suited to a functional style, such as F# or IronPython.
