Technologies for the smartest machines in the IIoT era
Inside Machines: Essential technology developments within distributed, connected machines with real-time capabilities include heterogeneous processing architectures; hardware and software design and development; and readiness for the Industrial Internet of Things (IIoT). See five architecture examples and diagrams.
What does it mean to make something smart? Consider the scope of this industry catchphrase, what it means for a machine to be smart, and how a smart machine provides advantages for increased use of Industrial Internet of Things (IIoT) design strategies.
Perhaps it means a machine is smart enough to sense everything a developer can dream up. Maybe this machine has the most precision or multiple sensor types to feed some new control and/or predictive maintenance algorithm. How about vision-guided motion? Or multiple protocol communications and translation? Are the smarts local, or are they distributed across the control system? Where does machine learning fit? Does smart mean the machine works in the domains of Industrie 4.0, the IIoT, Made in China 2025, Made in India, and more? Must a machine make new business models possible to be labeled as smart?
Heterogeneous processing architecture
In an advanced, smart machine, a fast, modern CPU is needed to process multi-axis motion and vision algorithms. This is a challenge today because the fastest processors are developed for server-type workloads; their complex pipelines, caches, and other features are optimized for throughput rather than deterministic, real-time response. Multicore (up to four cores) and many-core (more than four cores) designs are now the primary means of increasing processor performance.
To take advantage of these cores, tasks must be partitioned into parallel control operations. Real-time operating systems and software libraries should provide thread safety to lessen the programming complexity of the multi-threaded applications that multicore and many-core development entail. At the lower to mid-range of the CPU performance scale, there is still room to increase clock frequency, and hence single-thread performance, at the expense of die area and power consumption.
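The partitioning idea above can be sketched with POSIX threads. This is a minimal, hedged illustration, not a vendor API: the "motion" and "vision" task names, iteration counts, and shared-state layout are assumptions made for the example, and the mutex stands in for the thread-safe library facilities the article describes.

```c
/* Sketch: partitioning control work into parallel tasks with POSIX
 * threads. Task names, iteration counts, and state layout are
 * illustrative assumptions. */
#include <pthread.h>

#define LOOP_ITERATIONS 1000

/* Shared state guarded by a mutex so each task stays thread-safe. */
typedef struct {
    pthread_mutex_t lock;
    long motion_cycles;   /* completed motion-control loop passes */
    long vision_frames;   /* completed vision-processing passes   */
} shared_state_t;

static void *motion_task(void *arg)
{
    shared_state_t *s = arg;
    for (int i = 0; i < LOOP_ITERATIONS; i++) {
        /* ... multi-axis motion computation would run here ... */
        pthread_mutex_lock(&s->lock);
        s->motion_cycles++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

static void *vision_task(void *arg)
{
    shared_state_t *s = arg;
    for (int i = 0; i < LOOP_ITERATIONS; i++) {
        /* ... image-processing algorithm would run here ... */
        pthread_mutex_lock(&s->lock);
        s->vision_frames++;
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Run both tasks in parallel and return the total completed passes. */
long run_partitioned_tasks(void)
{
    shared_state_t s = { .motion_cycles = 0, .vision_frames = 0 };
    pthread_mutex_init(&s.lock, NULL);

    pthread_t motion, vision;
    pthread_create(&motion, NULL, motion_task, &s);
    pthread_create(&vision, NULL, vision_task, &s);
    pthread_join(motion, NULL);
    pthread_join(vision, NULL);

    pthread_mutex_destroy(&s.lock);
    return s.motion_cycles + s.vision_frames;
}
```

On a multicore device, the scheduler can place the two workers on separate cores; in a real controller each task would also be pinned and given a real-time priority.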
For the most advanced, smartest machines with fast input/output (I/O) and the need for hard, deterministic real-time response in the sub-microsecond range, even the fastest processors cannot handle the entire range of performance requirements. The solution is to use a heterogeneous processing architecture.
Five architecture examples
A heterogeneous architecture provides different processing engines for optimizing several aspects of smart machine control as well as bringing additional benefits to the machine builder. Five examples of heterogeneous architectures combine:
- CPUs with digital signal processors (DSPs)
- CPUs with a general-purpose graphics processing unit (GPGPU); a GPU can be used for more than rendering because an end user can program it for algorithmic processing
- CPUs, DSPs, and GPGPUs
- CPUs with field-programmable gate arrays (FPGAs)
- Application-specific IP blocks implemented on/in any of the above.
These basic architectures can exist as discrete components or be integrated into a system-on-chip (SoC). Some devices implement additional IP blocks specific to an application and relevant to machine builders, such as a DSP device with a pulse-width modulation (PWM) module. CPUs with FPGAs, in particular, have become a popular heterogeneous architecture in recent years. This combination gives the user three processing elements, because FPGA vendors have created powerful DSP building blocks in their devices. The FPGA provides sub-microsecond to nanosecond-level hardware determinism and reliability, as well as full customization, flexibility, field upgradeability, and bug fixes without hardware spins (much like upgrading software).
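In a CPU-plus-FPGA design, the CPU typically talks to custom IP in the fabric through memory-mapped registers. The sketch below is a hypothetical register map for a custom PWM block; the field layout and bit meanings are assumptions for illustration, and on real hardware the base pointer would come from mapping the FPGA bridge exposed by the SoC vendor's driver rather than an ordinary struct.

```c
/* Sketch: CPU access to a hypothetical PWM IP block in the FPGA via
 * memory-mapped registers. Register layout and bit meanings are
 * illustrative assumptions, not a vendor register map. */
#include <stdint.h>

typedef struct {
    volatile uint32_t control;  /* bit 0: enable                 */
    volatile uint32_t period;   /* PWM period in fabric clocks   */
    volatile uint32_t duty;     /* high time in fabric clocks    */
    volatile uint32_t status;   /* bit 0: running (read-only)    */
} pwm_regs_t;

/* volatile ensures every access really reaches the hardware rather
 * than being cached or reordered away by the compiler. */
void pwm_configure(pwm_regs_t *regs, uint32_t period, uint32_t duty)
{
    regs->control = 0;       /* disable while reprogramming */
    regs->period  = period;
    regs->duty    = duty;
    regs->control = 1;       /* re-enable the IP block      */
}

uint32_t pwm_read_duty(const pwm_regs_t *regs)
{
    return regs->duty;
}
```

For desktop testing the register block can be simulated with a struct in RAM; on the target, the pointer would address the physical window the SoC maps onto the FPGA fabric.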
Cost, performance, power consumption
Users can select a variety of performance, cost, size, power consumption, and I/O counts for the FPGA and CPUs to tailor the implementation to the needs of the machine. It is also possible to design a scalable hardware platform that uses common software on the CPUs and IP blocks in the FPGA. A few years ago, SoCs integrating CPUs with FPGAs (and DSP building blocks) were developed. Figure 1 illustrates a simplified version of the basic heterogeneous architectures.
A common question is, what is the best architecture? The answer is the one that best helps address technology, customer, and business requirements. And that depends on the situation and the application. One guideline is to focus on the architecture that provides long-term benefits and helps address multiple generations of control needs. Investments into architectures can have substantial payoffs, but changing them often can result in wasted efforts.
Very few performance benchmarks compare these various architectures because of the complexities involved. Most often, CPU-specific benchmarks (such as CoreMark and SPECint) or FPGA feature-specific numbers are provided. What is needed is a workload-centric framework with a consistent methodology for comparing the relevant metrics in an end-application-centric manner. One of the few groups that has attempted this for heterogeneous architectures is the NSF Center for High-Performance Reconfigurable Computing (chrec.org). Researchers created a framework to analyze various heterogeneous processor architectures to try to create an "apples-to-apples" equivalent analysis of these very different implementations.
For the control application, researchers used several relevant workloads such as remote sensing, image processing, motion control, trajectory generation, and communications.
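A workload-centric comparison ultimately reduces each candidate architecture to figures of merit measured on the same workload. The sketch below shows one such metric, sustained throughput per watt; the metric choice, device names, and numbers in the test are illustrative assumptions, not results from the CHREC framework.

```c
/* Sketch: a workload-centric figure of merit (sustained GOPS per
 * watt) for comparing heterogeneous devices on one workload. The
 * metric and sample data are illustrative assumptions. */

typedef struct {
    const char *name;
    double sustained_gops;  /* measured on the target workload */
    double power_watts;     /* measured at the board level     */
} device_result_t;

/* Efficiency on this workload; returns 0 for invalid power input. */
double ops_per_watt(double sustained_gops, double power_watts)
{
    return power_watts > 0.0 ? sustained_gops / power_watts : 0.0;
}

/* Return the name of the device with the higher efficiency. */
const char *more_efficient(const device_result_t *a,
                           const device_result_t *b)
{
    double ea = ops_per_watt(a->sustained_gops, a->power_watts);
    double eb = ops_per_watt(b->sustained_gops, b->power_watts);
    return ea >= eb ? a->name : b->name;
}
```

In practice such a framework would track several metrics per workload (raw throughput, latency, memory bandwidth, cost) rather than a single ratio, but the principle is the same: measure every architecture on the same workload before comparing.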
Hardware, software design, development
A challenge to effectively using any heterogeneous architecture is the complexity of the hardware and software design and development. A collection of disparate, nonintegrated tools may complicate the workflow and design data management and create risk. Much attention is often given to the development side, which, in this context, means creating applications on top of working hardware and a run-time software stack; the design side has received less attention.
Designing a deployable, custom hardware and software system for advanced industrial control requires many deep and broad capabilities, tools, processes, and methodologies. As technology has progressed and taken advantage of Moore's law, the complexities, challenges, and risks of custom embedded design for high-performance systems have increased, as has the expense of the tools and the required designer expertise and specialized knowledge.
Semiconductor device speeds have increased in the core and at the I/O to the point where signal integrity is a challenge at the board level, often requiring dedicated tools and understanding. Advanced packages with more than 1,000 pins, small pin pitches (0.5 to 1.0 mm), close-proximity decoupling capacitors, and other challenging design attributes require advanced 10- to 16-layer boards, creating design, manufacturing, and certification challenges. The number of power rails and the power management scheme also add to the complexity of power supply selection and distribution.
Real-time Linux, a large-footprint operating system (OS), is becoming more popular with increased processing capability, the availability of megabytes to gigabytes of memory at low cost, the need for networking, and the desire to do more in software. Rolling your own version of the OS, with the associated driver development, and performing robust hardware/software validation on the heterogeneous architecture require an additional set of tools and expertise. For real-time Linux, organizations and projects such as the Linux Foundation and the Yocto Project are working to standardize and ease its adoption.
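On real-time Linux, a control task typically makes two standard POSIX calls before entering its loop: lock its memory so it never stalls on a page fault, and request a real-time scheduling policy. The sketch below shows those calls; the priority value 80 is an arbitrary example, and because SCHED_FIFO normally requires root or the CAP_SYS_NICE capability, the helper reports failure rather than aborting.

```c
/* Sketch: standard POSIX setup for a real-time Linux control task.
 * The priority value is an arbitrary example; elevated privileges
 * are needed for sched_setscheduler() to succeed. */
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Clamp a requested priority into SCHED_FIFO's 1..99 range (the
 * range Linux uses for its real-time policies). */
int clamp_rt_priority(int prio)
{
    if (prio < 1)  return 1;
    if (prio > 99) return 99;
    return prio;
}

/* Returns 0 on success, -1 if real-time setup was refused (for
 * example, when run without the needed privileges). */
int enter_realtime(int requested_prio)
{
    /* Lock current and future pages so the control loop cannot be
     * delayed by demand paging. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;

    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = clamp_rt_priority(requested_prio);

    /* 0 = calling thread; SCHED_FIFO runs it ahead of all normal
     * (SCHED_OTHER) tasks until it blocks or yields. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;

    return 0;
}
```

With the PREEMPT_RT patches applied, this same setup yields bounded worst-case latencies; on a stock kernel it improves, but does not guarantee, responsiveness.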