1. CONSIDERING THE ADVANTAGES OF PARALLEL PROCESSING
The one undeniable advantage of parallel processing is speed. If you
use multiple processors to solve a compute-intensive problem, the
problem can usually be solved more quickly. However, it's important to
understand that using two processors won't solve a problem twice as
fast. Parallel processing incurs some level of overhead: the resources
used to manage the multiple processors. Depending on the efficiency of
the management software, processing can approach being twice as fast,
but never quite attain it. Another misconception is that the
application as a whole will approach being twice as fast. Only the
compute-intensive portion of the application will approach that speed.
The application will still perform network and disk operations at the
same speed; it will also continue to wait on the user to provide
required input. So you'll see an overall speed benefit from parallel
processing, but the application as a whole won't come close to running
twice as fast when working with two processors.
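The limit described here is commonly formalized as Amdahl's law, a standard result that this article doesn't derive. As a rough sketch, assume that a fraction p of the run time is compute-intensive work that divides perfectly across n processors, and ignore management overhead entirely. The value p = 0.6 below is purely illustrative:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\qquad
\text{for } p = 0.6,\ n = 2:\quad
S(2) = \frac{1}{0.4 + 0.3} = \frac{1}{0.7} \approx 1.43
```

Under these assumptions, two processors make the application about 1.43 times as fast, not twice as fast, and real overhead pushes the figure lower still.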
Some applications today
lack needed functionality because adding the functionality would make
the application run too slowly. For example, you might want to include
specialized graphics-smoothing routines in your application. However,
after adding these routines, you can visually see the application add
the smoothing to the output. In fact, you can go to lunch and come back
in the time it takes for the application to finish rendering the
display. Obviously, users will never tolerate anything that slow. At one
time, developers would solve issues of this sort by using a processor
with a higher clock speed, but heat and other issues have made this
solution obsolete. Using parallel processing techniques can solve these
sorts of issues at a reasonable price and by using technology available
today.
A less-understood advantage of parallel processing is that it can lend
a certain amount of protection to your application. Many malicious
applications designed to peek at your application assume that it runs
on a single processor, because that's how applications have generally
worked. When an application uses multiple processors, it's somewhat
harder to peek at: the application is using multiple processors, each
of which may be working in a different area of memory. A Trojan writer
has to work harder to gain any valuable information about your
application, and the unpredictable scheduling across multiple
processors means that a sneak-peek trick that works today may not work
tomorrow.
2. UNDERSTANDING THE REQUIREMENTS FOR PARALLEL PROCESSING
As with any advanced
programming technique, parallel processing has special requirements. You
can't simply write some code and expect it to work. Adding multiple
processors necessarily complicates the development scenario, which means
that you must understand how to accommodate multiple processors as part
of the application development plan. The following sections describe
the requirements you should consider before you begin using
multi-processing techniques in your application.
2.1. Evaluating the Task Length
The time it takes to perform a task is important when evaluating the
suitability of an application for parallel processing. Short tasks
don't typically prove worthwhile because the overhead of managing the
parallelism eats up most of the benefit of using multiple processors.
In some cases, the overhead actually exceeds the benefit and makes the
resulting application run slower.
Of course, there's a
difference between long tasks that can be done efficiently and tasks
that are so long they become unwieldy. The common wisdom is to break
long tasks into smaller pieces when possible in order to make the tasks
more granular and produce a better result with multi-threading. This
principle still applies when creating an application that relies on
parallel processing. In fact, you want the tasks evenly sized if
possible, so that each task completes at the same time and you can
maximize processor throughput, but the reality is that achieving a
strict balance is nearly impossible. Some threads will undoubtedly end
up waiting for other threads to complete.
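One way to picture the granularity trade-off is to hand each worker a contiguous range of items rather than a single short item. The following is a minimal sketch, assuming the .NET Framework 4 Partitioner class from System.Collections.Concurrent; the ChunkingSketch class and the SquareAll() method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ChunkingSketch
{
    // Squares every element, giving each worker a contiguous range
    // of indexes instead of one very short task per element.
    public static double[] SquareAll(double[] input)
    {
        double[] output = new double[input.Length];

        // Partitioner.Create() rejects an empty range, so return early.
        if (input.Length == 0)
            return output;

        // Each range is a Tuple<int, int>: Item1 is the inclusive
        // start index and Item2 is the exclusive end index.
        Parallel.ForEach(Partitioner.Create(0, input.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                output[i] = input[i] * input[i];
        });

        return output;
    }
}
```

Calling ChunkingSketch.SquareAll(new double[] { 1, 2, 3, 4 }) from a button handler or a Main() method returns the squared values in their original positions. Because every index is written by exactly one worker, the loop needs no locking.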
2.2. Evaluating the Task Type
Don't get the idea
that parallel processing will magically fix your disk-bound database
application. Parallel processing typically works best on
compute-intensive applications. Of course, database applications do have
compute-intensive sections where parallel processing will work fine,
but the overall application may not be that much faster if the real
problem is that the drive system on the host machine needs an upgrade.
When you target an application to use parallel processing, make sure you
understand the types of tasks that the application performs and target
those parts of the application that can benefit most.
It's important to
consider the individual tasks carefully. For example, by using queries
in parallel you can obtain the data needed for the application faster.
However, you can also improve application execution speed by accessing
only the data you need. Wasted resources are a major problem in most
applications today. Combining parallel processing with reduced queries
can garner the truly impressive results that most developers want, but
you must think the process through carefully.
NOTE
Some developers think
that parallel processing will perform miracles with poorly written
applications. The reality is that well-written, tightly implemented code
will always work better than sloppy code that wastes resources. Nothing
can replace well-written code. Before you convert an existing
application to realize the benefits of parallel processing, make sure
you've squeezed all the wasted processing cycles out of it and that the
application uses resources wisely. Otherwise, the parallel processing
will simply add another potential source of frustration when you finally
do work through the original performance problems and correct them.
2.3. Considering Debugging
Parallel applications can be
difficult to troubleshoot. After all, the code is executing on multiple
processors and your debugger doesn't really track that sort of execution
well. What you really get is a type of thread-based debugging as
described in an article at http://msdn.microsoft.com/magazine/ee410778.aspx.
The theory of such debugging sounds great, but the reality is quite
different. A parallel application can introduce errors that don't
repeat from one run to the next, because the environment is no longer
constant when multiple processors are in play. Consider the issues you
encounter when debugging a multi-threaded application and square them,
because you now have multi-threading and multi-processing at the same
time. Even so,
Visual Studio 2010 does provide some tools in the form of thread-based
debugging to help you with your parallel-processing needs.
2.4. Obtaining Required Resources
Some
parallel-processing applications fail despite careful implementation and
thorough analysis of the problem domain. Even if the developer squeezes
out every last bit of resource-wasting processing, the application can
still fail to perform as expected when the application becomes starved
for resources. If your system is already working hard to obtain enough
memory for the single-processor version of your application, it's
going to struggle when you turn to parallel processing. For example, if
the application currently requires 1 GB of RAM to run effectively and
each processor works on its own copy of the data, it can require 2 GB
or more of RAM when you use two processors: 1 GB for each processor,
plus the RAM for the overhead generated by the parallel-processing
requirements. In
short, it's absolutely essential to profile your application in advance
and determine the resources it requires before you move to parallel
processing.
The problem is that the
application won't necessarily show that it's resource-starved. The
operating system will rely on virtual memory when it runs out of the
physical equivalent. In some cases, the only clue you'll have is that
the hard drive runs constantly as the system thrashes.
The system will constantly transfer data between RAM and the hard drive
as it tries to comply with the requirements of parallel processing. In
the end, your application will actually run slower if you don't have the
resources required to implement parallel processing effectively.
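If profiling shows that the system can't feed every processor at once, one option is to cap the number of concurrent workers. The following is a minimal sketch, assuming the ParallelOptions.MaxDegreeOfParallelism property of the .NET Framework 4; the ThrottleSketch class and the RunThrottled() method are hypothetical names, and whether a cap actually reduces memory pressure depends on how much working data each worker holds.

```csharp
using System;
using System.Threading.Tasks;

static class ThrottleSketch
{
    // Runs 'work' once for each index in [0, count), using no more
    // than 'maxWorkers' concurrent workers.
    public static void RunThrottled(int count, int maxWorkers, Action<int> work)
    {
        ParallelOptions options = new ParallelOptions();

        // Stay between 1 and the number of logical processors.
        options.MaxDegreeOfParallelism =
            Math.Max(1, Math.Min(maxWorkers, Environment.ProcessorCount));

        Parallel.For(0, count, options, work);
    }
}
```

For example, ThrottleSketch.RunThrottled(100, 2, i => ProcessItem(i)) would process 100 items with at most two workers, where ProcessItem() stands in for whatever per-item work your application performs.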
2.5. Team Skills
Parallel processing is significantly harder to understand and implement
than most other new technologies. Other transitions aren't nearly as
difficult. For example,
moving to the 64-bit environment can be difficult, but only because the
64-bit environment requires some interesting code changes due to the
change in handle sizes and so forth. The transition is manageable,
though, if you rely on checklists to ensure that all the required
changes take place. When you work in a parallel-processing environment,
it's important to consider the change in viewpoint that the environment
requires. The application is no longer working on a single processor —
multiple processors are now truly doing things simultaneously. The
timing issues that you experience when working with threads are now
multiplied by the number of processors that you use because things truly
do happen at the same time.
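A shared counter shows the kind of timing issue in question. In the following minimal sketch, a plain counter++ inside the loop would be a read-modify-write race because several processors can update the value at the same instant; the Interlocked class from System.Threading makes each update a single indivisible step. The CounterSketch class and the CountSafely() method are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

static class CounterSketch
{
    // Increments a shared counter once per iteration of a parallel loop.
    public static int CountSafely(int iterations)
    {
        int counter = 0;

        Parallel.For(0, iterations, i =>
        {
            // Writing counter++ here could lose updates when two
            // threads read the same old value. Interlocked.Increment()
            // performs the read, add, and write as one atomic step.
            Interlocked.Increment(ref counter);
        });

        return counter;
    }
}
```

With the atomic increment, CounterSketch.CountSafely(100000) returns 100000 on every run. Replacing the call with counter++ typically produces a smaller number that changes from run to run.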
Most developers today are trained in procedural coding techniques. A
few developers have used declarative languages, and an even smaller
percentage understand how these languages work, but the majority see
applications as a procedural process. In order to work with parallel
processing effectively, the development team as a whole must move beyond
relying on procedures to a perspective where nothing is assumed about
when or where the code will execute. You literally don't know — you know
only that it will execute at some point, assuming the application
doesn't crash. Such a viewpoint requires a team with special skills.
3. WRITING AN APPLICATION THAT RELIES ON PARALLEL PROCESSING
It would be difficult for a single article to include examples of every
sort of parallelism that Windows 7 and the .NET Framework 4 support. In
fact, it could be difficult to cover the topic extensively in a single
book because the topic is relatively complex. The example in this
section demonstrates just one technology, the Parallel class, which
provides support for multiple processors. This particular example
appears in the article because the Parallel
class is straightforward, it's relatively easy to implement, and it
provides a good starting point for anyone who wants to begin working
with multiple processors. In addition, the example works just fine on
systems that have only one processor. The following sections describe
the Parallel Process example in more detail.
3.1. Understanding the Parallel Class
Microsoft recognizes the
need to provide simple methods of adding parallelism to applications. Of
course, parallelism is a type of multi-threading in that you create
multiple threads that execute on separate processors. However,
parallelism is more than simply creating a multi-threaded application.
The threads must be able to execute in an independent manner. The Parallel
class is part of the effort to create an environment in which
applications can execute using more than one processor without adding
undue complexity to the application itself. The concept is simple, but
the implementation can be difficult. In this case, the application
executes tasks within a special for loop. Each task can execute using a different processor.
The Parallel class is part of a much bigger experiment in parallelism,
the Task Parallel Library (TPL) that's part of the .NET Framework 4.
The components of the TPL appear as part of the System.Threading (http://msdn.microsoft.com/library/system.threading.aspx) and System.Threading.Tasks (http://msdn.microsoft.com/library/system.threading.tasks.aspx) namespaces. The Parallel class is just one technology in these namespaces, which also support the following concepts.
Data parallelism:
When an application must work on multiple bits of independent data, as
in database records, it's usually faster to work on each bit in
parallel. Instead of updating each record individually, the database
application can update multiple records simultaneously. Of course, the
key word is "independent." You can't update dependent data in parallel
without terrible consequences. Read more about data parallelism at http://msdn.microsoft.com/library/dd537608.aspx.
Task parallelism:
Applications must often perform multiple independent tasks. In some
cases, the tasks are similar, but different in a small way. For example,
a scientific application can perform the same check using multiple
instruments, or a security application can check the status of multiple
intrusion sensors. As with data parallelism, the key word is
"independent." The tasks must be independent of each other to succeed in
a parallel-processing environment. You can read more about task
parallelism at http://msdn.microsoft.com/library/dd537609.aspx.
Parallelism using asynchronous patterns:
The common element of both data and task parallelism is the concept of
asynchronous processing. It's possible to create a pattern that
describes multiple independent elements of some sort. The TPL supports
asynchronous patterns in various ways. You can read about these types of
processing at http://msdn.microsoft.com/library/dd997405.aspx.
PLINQ:
Most types of parallelism rely on the concept of doing something. An
application processes multiple bits of independent data or checks
multiple independent sensors. It's also possible to use parallelism when
asking something. The sidebar, "Using the PLINQ Alternative," describes
how to use PLINQ to perform multiple query tasks at once.
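As a rough illustration of the PLINQ idea (a minimal sketch, not the sidebar's example), the following query adds AsParallel() to an ordinary LINQ query so that the filtering and projection can run on multiple processors. The PlinqSketch class and the EvenSquares() method are hypothetical names.

```csharp
using System;
using System.Linq;

static class PlinqSketch
{
    // Returns the squares of the even numbers in 'values',
    // in the same order in which they appear in the source.
    public static int[] EvenSquares(int[] values)
    {
        return values
            .AsParallel()              // Opt in to parallel execution.
            .AsOrdered()               // Preserve the source order.
            .Where(v => v % 2 == 0)    // Keep only the even values.
            .Select(v => v * v)        // Square each remaining value.
            .ToArray();
    }
}
```

Leaving out AsOrdered() lets PLINQ return results in whatever order the workers finish, which can be faster when the order doesn't matter.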
When working with the Parallel class, you have access to a number of For() and ForEach() loop structures that are implemented as methods (note the difference in capitalization from the standard C# for and foreach loops). In addition, the Parallel class supports an Invoke() method that accepts an array of actions to perform. All these methods can be executed in parallel if the Parallel class detects an opportunity to do so, and hardware resources are available to complete the action.
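The following minimal sketch shows Parallel.For() and Parallel.Invoke() side by side. The ParallelMethodsSketch class and its Cubes() and MinAndMax() methods are hypothetical names, and MinAndMax() assumes that the array it receives isn't empty.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class ParallelMethodsSketch
{
    // Data parallelism: every index writes only to its own slot,
    // so the iterations are independent of each other.
    public static long[] Cubes(int count)
    {
        long[] result = new long[count];

        Parallel.For(0, count, i =>
        {
            result[i] = (long)i * i * i;
        });

        return result;
    }

    // Task parallelism: two independent actions that the Parallel
    // class may run at the same time. Assumes 'values' is not empty.
    public static int[] MinAndMax(int[] values)
    {
        int min = 0;
        int max = 0;

        Parallel.Invoke(
            () => { min = values.Min(); },
            () => { max = values.Max(); });

        return new int[] { min, max };
    }
}
```

Both methods return only after all of the parallel work completes, so the caller can use the results immediately.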
3.2. Configuring the Parallel Process Example
The example begins with a Windows Forms application. You need to add a Test (btnTest) button and a list box (lstColors).
The list box will contain a list of items to process. The example uses
colors, but you can use any set of strings desired. Add as many strings
as you want, but you'll want to keep the number of unique items low to
ensure you can see them in the dialog box that appears after the data is
processed. Figure 1 shows a typical setup for this example.
You don't need to add any special references for this example. However, you do need to add two special using statements as shown here:
using System.Threading.Tasks;
using System.Text;
3.3. Writing the Parallel-Process Example Code
The example code focuses
on performing a task on multiple processors, rather than doing something
elegant that you'd normally perform in a production application. In
this case, the example processes a list of colors. It counts each color
string and adds a new entry for each unique string. When the code
completes, it outputs a dialog box with the results. Example 1 contains the code needed for this example.
Example 1. Processing data items using multiple processors
private void btnTest_Click(object sender, EventArgs e)
{
    // Initialize the Colors dictionary that is used to
    // hold the number of times each color appears.
    Dictionary<String, Int32> Colors = new Dictionary<String, Int32>();

    // Copy the list box object collection to an array for
    // processing.
    String[] ColorList = new String[lstColors.Items.Count];
    lstColors.Items.CopyTo(ColorList, 0);

    // Process each of the entries in the color list.
    Parallel.ForEach(ColorList, ThisItem =>

        // Create the lambda expression.
        {
            // Dictionary isn't safe for concurrent writes, so
            // only one thread at a time may update Colors.
            lock (Colors)
            {
                // Check the current color against those already
                // in the list.
                if (Colors.ContainsKey(ThisItem))

                    // Update the color count if the color is
                    // in the list.
                    Colors[ThisItem]++;
                else

                    // Otherwise, add the color.
                    Colors.Add(ThisItem, 1);
            }
        }
    );

    // Create an output variable.
    StringBuilder Result = new StringBuilder();

    // Process the result.
    foreach (KeyValuePair<String, Int32> Item in Colors)
        Result.Append("Color: " + Item.Key + " appears " +
            Item.Value + " times.\n");

    // Display the result on-screen.
    MessageBox.Show(Result.ToString());
}
The code begins by creating a Dictionary object, Colors, that has a key of type String and a value of type Int32. Note that Colors will hold the summary of unique string names in lstColors
and the number of times that the strings appear. For example, if red
appears six times, the key will be red and the value will be 6.
Processing a ListBox.ObjectCollection can prove tricky, so the example creates a String array, ColorList. It uses the CopyTo() method to copy the list of colors found in lstColors.Items to ColorList for processing.
The next step is the actual parallel code for the example. The code calls Parallel.ForEach(), which is a parallel form of the foreach statement. The first argument is the list of items to process, which is contained within ColorList. The code then uses a lambda expression to process each element within ColorList. Lambda expressions are a C# language feature that was introduced alongside LINQ. Each ColorList element appears within ThisItem.
The action for the lambda expression appears within the curly braces. When the color already appears in Colors, the code simply updates the count value. Otherwise, the code uses the Add() method to add a new entry to Colors for the color in question. When the ForEach() method loop is complete, Colors will contain an entry for each unique color value and a count of the number of times this color appears in lstColors.
The final steps of this example are output-related. The code begins by creating a StringBuilder object, Result. It then uses a standard foreach processing loop to add each of the entries in Colors to Result as a String. When Result is complete, the code uses Result.ToString() to display the message box shown in Figure 2.
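Because Dictionary<TKey, TValue> isn't safe for concurrent writes, a counting loop like this one needs some form of synchronization when several threads update the same collection. As an alternative sketch (not the article's listing), the ConcurrentDictionary class in System.Collections.Concurrent handles that synchronization internally. The ColorCountSketch class and the CountColors() method are hypothetical names, and the sketch assumes that the array contains no null entries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ColorCountSketch
{
    // Counts how many times each string appears in 'colorList'.
    public static IDictionary<string, int> CountColors(string[] colorList)
    {
        ConcurrentDictionary<string, int> colors =
            new ConcurrentDictionary<string, int>();

        Parallel.ForEach(colorList, thisItem =>
        {
            // Add the color with a count of 1, or increment the
            // existing count. AddOrUpdate() is safe to call from
            // multiple threads at the same time.
            colors.AddOrUpdate(thisItem, 1, (key, count) => count + 1);
        });

        return colors;
    }
}
```

In the Windows Forms example, you could pass ColorList to ColorCountSketch.CountColors() and build the output string from the dictionary it returns.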
3.4. Debugging the Parallel-Process Example Code
There are some problems
debugging the example as it exists right now — problems that have
nothing to do with the code. Try setting a break point on the
if (Colors.ContainsKey(ThisItem))
line of the example code. Choose Debug => Start Debugging or press F5.
You'll find that the example does stop at the right line, but not the
first time through the loop in most cases. In some cases, the debugger
will stop when Colors has nine items in it; at other times it will stop
when Colors has only two items in it. If you try single-stepping
through the code, you'll find that it lurches between steps. The odd
behavior appears to be a side effect of several threads executing the
lambda expression at the same time on multiple processors.
It's possible to obtain
more consistent behavior from the debugger, but the logic of selecting a
break point isn't always clear. Remove the previous break point and add
a new one at the
Parallel.ForEach(ColorList, ThisItem =>
line of the example code. Choose Debug => Start Debugging or press F5 again. This time, you'll be able to single-step through each of the items as it's added to Colors.
At least the debugger seems to work more consistently. The lesson here
is that placing a break point inside the lambda expression may not work
as expected. Microsoft doesn't appear to have documented the reason for
this behavior. The point is that if one break point doesn't appear to
work for you, try setting one a little earlier in the code to see if it
will work better. You should be able to find a break point that will
let you see your code in action.
You'll also want to know how you can tell that the example is actually using threads to process the information. Choose Debug => Windows => Parallel Tasks or press Ctrl+Shift+D,K to display the Parallel Tasks window shown in Figure 3.
In this case, the example is running four parallel tasks and has
another one scheduled to run. Your window will very likely look
different from the one shown, and it will also vary each time you run
the application.
It's important to note that the window in Figure 3 shows the parallel tasks, not all the threads running on the system. If you want to see all the threads, then choose Debug => Windows => Threads or press Ctrl+Alt+H instead. Figure 4
shows how the Threads window appears in comparison. Notice that the
application uses a number of threads, but not all of them are running in
parallel.
It's also possible to get a pictorial view of the parallel processing using the Parallel Stacks window. Choose Debug => Windows => Parallel Stacks or press Ctrl+Shift+D,S to display the Parallel Stacks window shown in Figure 5.
This pictorial view gives you a better understanding of precisely how
your application is working. Hover your mouse over the entries to see
the thread numbers and to obtain more information about them.
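Outside the debugger, one rough way to confirm that a parallel loop really runs on more than one thread is to record the managed thread ID inside the loop body. The following is a minimal sketch; the ThreadTraceSketch class and the WorkerThreadIds() method are hypothetical names, and the number of IDs returned depends on the machine and on how the scheduler behaves during a particular run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThreadTraceSketch
{
    // Returns the distinct managed thread IDs that executed the
    // body of a parallel loop with 'itemCount' iterations.
    public static int[] WorkerThreadIds(int itemCount)
    {
        // ConcurrentBag allows additions from many threads at once.
        ConcurrentBag<int> ids = new ConcurrentBag<int>();

        Parallel.For(0, itemCount, i =>
        {
            ids.Add(Thread.CurrentThread.ManagedThreadId);

            // Simulate a little work so that the scheduler has a
            // reason to bring in additional threads.
            Thread.SpinWait(100000);
        });

        return ids.Distinct().OrderBy(id => id).ToArray();
    }
}
```

On a system with one processor, the result may contain a single ID. On a multiprocessor system, it usually contains several, and the list can change from run to run, just as the debugger windows do.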
4. MOVING YOUR APPLICATIONS TO WINDOWS 7
This article has provided a
quick overview of parallel-programming techniques you can use in
Windows 7 to make your applications run faster. The applications don't
simply appear to run faster through threading techniques; they actually
are faster because you use multiple processors to perform the work. Each
processor works independently and simultaneously. Even though the
definitions for multi-threading and multi-processing are well
understood, some developers still get confused about the benefits and
problems of each technology. If you take one thing away from this article, it should be that multi-processing produces a true increase in
application speed by using multiple processors, but that such processing
is problematic due to the nature of imperative languages (which require
worrying about state).
Before you commit to parallel-programming techniques for your next
application upgrade or new application, you need to have a plan.
Parallel
programming can be time-consuming to implement, hard to debug, and not
very noticeable when implemented poorly. You need to consider what
you'll get out of the parallel programming first. Think about how your
application works and whether it even lends itself to parallel
programming techniques. Once you have goals in place, define how to
achieve those goals. It may be that using the Parallel class won't achieve your goals, and you'll actually need to use a language that supports a functional style, like IronPython or F#.