Multi-core processors: Software key to success

Multithread parallel computing software and appropriate user tools advance the promise of high performance and power efficiency from multi-core processors. Links to the three other Control Engineering articles in this four-part series appear below.

January 17, 2011

Software developers need to concentrate more on parallelism when programming code for multi-core processors (MCPs).

According to Casey Weltzin, product manager for software at National Instruments Inc., developers need to think more about programming if they want to tweak an MCP application for highest performance using different cores. This is an example of what Weltzin referred to as “artificial complexity” confronting software designers today.

One major issue facing software designers is how to ease the linking of data flow among different cores of the processor. Eventually, the separate operations ongoing in the cores must be connected and managed through a mechanism called memory architecture. “Some tools are already here to help. They lessen the need for programmers to think about underlying hardware and allow them to concentrate on programming,” said Weltzin.

Two basic methods exist to make a software application suitably parallel to take advantage of MCPs—multitasking and threading.

Multitasking

In multitasking, the easier of the two methods, different processes can be mapped to different cores because many operating systems (OSs) allow users to assign process “core affinity,” which tells the OS to run specific processes on specific cores, explained Ian Gilvarry, strategic marketing manager for industrial automation at Intel Corp. The OS will also have SMP (symmetric multiprocessing), that is, the ability to divide its OS processes between two cores. “After this multitasking approach is taken, compute load on the system will be divided and both cores will be working,” Gilvarry said. “However, multitasking will not scale easily to more cores (say, four or eight cores), but it is a short-term approach and its simplicity will encourage most customers to try this first.”
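As a rough illustration of the "core affinity" idea described above, the sketch below asks the OS to keep the current process on a single core. It assumes Python and the standard-library calls os.sched_getaffinity and os.sched_setaffinity, which are available on Linux only; the helper name pin_to_core is hypothetical and is not taken from Intel or from any tool named in this article.

```python
import os


def pin_to_core(core: int) -> set:
    """Ask the OS to run this process on one core; return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        # Assumption: platforms without this call (e.g. Windows, macOS) are skipped.
        return set()
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if core not in allowed:
        # Requested core is not available to this process; leave affinity unchanged.
        return set(allowed)
    os.sched_setaffinity(0, {core})
    return set(os.sched_getaffinity(0))


if __name__ == "__main__":
    print("process may now run on cores:", pin_to_core(0))
```

Pinning one process per core in this way is simple to set up, which matches the "try this first" character of multitasking, but each additional core needs another manual assignment.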

As the number of cores increases, it becomes problematic to manage the scheduling of multiple processes to different cores with multitasking. Gilvarry suggested three broad strategies for taking advantage of MCP technology:

  • Multitasking, which permits an SMP-enabled operating system to schedule other tasks on other cores—for example, freeing core 1 for application A
  • Distributed processing that provides coarse-grained distribution of “heavyweight processes” onto all cores, providing better load balancing, and
  • Application threading, which provides fine-grained distribution of “lightweight processes” onto all cores, providing best load balancing and scaling. 

Gilvarry considers the three strategies as a “good, better, and best” approach to obtain the desired goals of MCPs.

Threading

As for implementing the “best” approach, he outlined four steps to convert a serial, single-threaded application into a multithreaded concurrent application.

1. Analyze parallelism: This typically involves using a profiling tool to find hotspots in the program and to generate a call graph* of the application. How to parallelize is decided once the hotspots are known. “Threading assistant” tools are available for integrating hotspot information into a feasibility analysis of the code targeted for parallelization. For example, such a tool could answer questions like: What’s the expected speed-up from multithreading? Which variables are of most concern for synchronization?

[ * A “call graph” provides basic profiling analysis of a software program, for example tracking the flow of values between procedures. It represents calling relationships between subroutines in a software program, according to Wikipedia. Call graph profiling allows users to analyze critical functions and call sequences in their programs, indicating threads created, functions executed in memory, etc. Intel provides more information on graph profiling and call graphs.]
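The analysis step and the call-graph note can be sketched with Python's standard cProfile and pstats modules. This is only an illustration under stated assumptions: the functions inner, outer, and call_graph_edges are hypothetical, and the speed-up estimate uses Amdahl's law, a standard formula that the article itself does not name. Commercial profilers report far more detail.

```python
import cProfile
import pstats


def inner(n):
    """Hypothetical hotspot: a CPU-bound loop."""
    return sum(i * i for i in range(n))


def outer():
    """Hypothetical caller that invokes the hotspot several times."""
    total = 0
    for _ in range(5):
        total += inner(20_000)
    return total


def call_graph_edges(func):
    """Profile func and return (caller, callee) name pairs seen during the run."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    edges = set()
    # Each key is (file, line, name); each value ends with a dict of callers.
    for callee, (_cc, _nc, _tt, _ct, callers) in stats.stats.items():
        for caller in callers:
            edges.add((caller[2], callee[2]))
    return edges


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: upper bound on speed-up when only part of the run parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    print(sorted(call_graph_edges(outer)))
    print("bound with 90% parallel code on 4 cores:", amdahl_speedup(0.9, 4))
```

The edge list shows which routine calls which, and the Amdahl estimate gives a first answer to the expected speed-up question before any code is rewritten.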

2. Express parallelism: What programming abstractions will be used? Should domain-specific multithreaded libraries be used, or will the end-user find parallelism and write the code? What language or application programming interface (API) will be used?

3. Exploit parallelism: Here, the actual software tools that implement and execute multithreaded code come into play. These include compilers, run-time engines, and APIs that make partitioning efficient and easy. Hardware support overlaps into this area. To validate parallelism, debuggers need extensions to handle the added difficulty of stepping through the actions of several simultaneously executing processors. Simplifying this for developers is another area with room for improvement.

4. Optimize parallelism: This step pertains to overhead and balance issues of threading. Some questions to be answered: Are locks efficient? Is granularity of the parallelism just the right size? How can we help development engineers know when they’ve gotten it right and know what they could do better on their particular application?
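The conversion described in steps 2 through 4 can be sketched on a toy workload. The example below assumes Python's standard concurrent.futures and threading modules; all function names are hypothetical. It contrasts a coarse-grained version (one chunk per worker, no shared state) with a fine-grained version that takes a lock for every item, which is the kind of lock and granularity trade-off step 4 asks about. Note that CPython's global interpreter lock limits the speed-up of CPU-bound threads, so this shows structure rather than performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_squares_serial(values):
    """Original single-threaded computation."""
    return sum(v * v for v in values)


def sum_squares_chunked(values, workers=4):
    """Coarse-grained: each worker sums one chunk; partial sums are merged once."""
    values = list(values)
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares_serial, chunks))


def sum_squares_locked(values, workers=4):
    """Fine-grained: every item updates a shared total under a lock (high overhead)."""
    total = 0
    lock = threading.Lock()

    def add(v):
        nonlocal total
        with lock:  # protects the shared accumulator from concurrent updates
            total += v * v

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(add, values))  # drain the iterator so all tasks finish
    return total


if __name__ == "__main__":
    data = range(1000)
    print(sum_squares_serial(data), sum_squares_chunked(data), sum_squares_locked(data))
```

All three versions return the same result; the chunked one synchronizes only once per worker, while the locked one synchronizes once per item, so comparing their run times on a real workload is one way to judge whether granularity is "the right size."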

User tools

MCPs rely on appropriate multithread programming and debugging tools for efficient execution of multiple programs. Processor vendors and software developers are responding to this user need.

Dataflow programming is a generic method used in software tools to identify dependencies among computations running in the different cores and to manage parallel sections of code. For example, National Instruments’ LabVIEW graphical software package is based on dataflow. As users create a program in LabVIEW, dataflow helps to link the different, simultaneous operations, while an intelligent compiler automatically analyzes the code for threading and parallelism. Then, LabVIEW’s run-time engine can run dataflow applications across multiple CPUs automatically, Weltzin explained.
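The dataflow idea can be approximated in a few lines of text-based code. The sketch below is not how LabVIEW works internally; it is a minimal illustration, assuming Python's standard concurrent.futures module and hypothetical node functions (scale, offset, combine). Two nodes with no dependency on each other are submitted together, and a third node runs only when both of its inputs are available.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(samples, gain):
    """Node A: multiply every sample by a gain."""
    return [gain * s for s in samples]


def offset(samples, bias):
    """Node B: add a bias to every sample; independent of node A."""
    return [s + bias for s in samples]


def combine(a, b):
    """Node C: needs the outputs of both A and B."""
    return [x + y for x, y in zip(a, b)]


def run_dataflow(samples):
    """Run A and B concurrently, then C once both results have arrived."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        scaled = pool.submit(scale, samples, 2.0)
        shifted = pool.submit(offset, samples, 1.0)
        # result() blocks until the corresponding node has produced its output.
        return combine(scaled.result(), shifted.result())


if __name__ == "__main__":
    print(run_dataflow([1.0, 2.0]))
```

The programmer states only which data each node needs; the order of execution and the assignment to threads follow from those dependencies, which is the property that lets a dataflow run-time spread work across cores.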

Another notable development tool for parallel computing is Parallel Studio 2011 from Intel Corp. Parallel Studio works together with the Integrated Development Environment (IDE) of Microsoft Visual Studio and is compatible with C/C++ applications. Its tools include a threading assistant, an optimizing compiler with libraries, a memory and threading error checker, and a threading performance profiler. The software suite consists of the separate elements summarized below:

Intel Parallel Advisor 2011 is a tool to simplify and assist code threading. It is said to identify areas in parallel (and serial) applications where parallelism has the greatest potential impact.

Intel Parallel Composer 2011 is intended to streamline parallel application development with a combination of tools, including an optimizing C/C++ compiler, performance libraries, and support for Intel Parallel Building Blocks. The latter is a set of development models for implementing a wide range of parallelism requirements.

Intel Parallel Inspector 2011 is a memory and threading error checker. Reportedly, Parallel Inspector allows C/C++ developers to quickly analyze code and find threading and memory errors before they can cause a problem.

Intel Parallel Amplifier 2011 is a performance and scalability tool that helps ensure multiple cores and processor capabilities are optimally used. Parallel Amplifier is said to quickly find multi-core performance bottlenecks without requiring expertise in a specific processor architecture or in assembly code.

In short, Parallel Studio lets software developers design, build/debug, verify, and tune their parallel applications for multi-core processors.

The foregoing challenges are not meant to discourage industrial automation designers from applying multi-core CPUs in their systems. “Instead, they need to focus on using the right tools to ease design challenges and mitigate risks to achieve the benefits that parallel processors can provide,” National Instruments’ Weltzin concluded. “The future of multi-core processors is software programming that allows less concentration on hardware. Engineers already face too much artificial complexity.”

For more on this topic, watch for the February 2011 feature article on multi-core processors.

www.intel.com

www.ni.com

Frank J. Bartos, P.E., is a Control Engineering contributing content specialist. Reach him at braunbart@sbcglobal.net.

Other articles in this series

Computing Power: Multi-Core Processors Help Industrial Automation – Two or more independent execution cores on one microprocessor chip can match—or exceed—single-core chip performance by running at lower frequencies and using less power. Different software programming is required to obtain full benefits.

Insights on multi-core processors – Intel says multi-core processor technology addresses numerous industrial control challenges by delivering greater ‘raw’ and real-time performance. This was driven by a need for critical applications to respond quickly and predictably to real-time events.

Growing applications for multi-core processors – Multi-core processors have wide industrial application potential—from vision inspection systems to motion control—as developers increasingly implement the technology, initially in high-end systems.