By paul ~ December 16th, 2008. Filed under: Best Practices, Systems Engr.
Over the last year or so, I’ve been following the discussion around “How do we make use of multi-core processors in embedded systems?” with a little amusement. I have to believe there are a bunch of embedded systems developers out there who have ceased being amused and are already off designing some cool multi-core devices. A lot of the articles I’ve read focus on the “parallel programming” problem: “How can we write sophisticated multi-threaded applications that optimally use multi-core processors?” The “Systems-Thinkers” out there, however, have already blown right past the theory and are delightedly employing all of that new-found power. Why? Because systems designers have long faced the problem of implementing the concurrent processes in their systems given limited board space and a single processor or DSP. For years, they’ve been “shoe-horning” their systems into progressively faster processors, either implementing their own concurrency strategies or employing an embedded OS. Multi-core processors provide additional computing engines, enabling true concurrency in those same applications, if you can get it to work!
Grant Martin and Steve Liebson have authored an excellent article on SCDsource, “‘Convenient concurrency’ eases multicore programming,” that discusses this natural approach to multi-core processor utilization. They point out that many systems have a natural concurrency that maps nicely onto multi-core hardware, and that there are real benefits to taking a multi-processing approach in the design of such systems: lower clock rates (and thus lower power requirements), relaxed timing constraints, easier programming, and so on. In essence, partitioning the system in a natural manner across multiple slower, less sophisticated processors can result in a more efficient design than the same application implemented on one very fast, sophisticated processor.
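The power argument is worth a quick back-of-the-envelope check. A minimal sketch, using the classic dynamic-power model (P ~ C·V²·f) and the simplifying assumption that supply voltage scales roughly linearly with clock frequency; the numbers and the `dynamic_power` helper are mine for illustration, not from the article:

```python
# Illustrative only: why several slower cores can draw less dynamic power
# than one fast core delivering the same aggregate throughput.
# Assumes P ~ C * V^2 * f, with V scaling ~linearly with f (a simplification).

def dynamic_power(freq_ghz, c=1.0, v_per_ghz=1.0):
    """Relative dynamic power of one core running at freq_ghz."""
    v = v_per_ghz * freq_ghz          # simplified voltage/frequency scaling
    return c * v * v * freq_ghz       # P ~ C * V^2 * f

single = dynamic_power(2.0)           # one 2 GHz core
quad = 4 * dynamic_power(0.5)         # four 0.5 GHz cores, same total cycles/s
print(single, quad)                   # the four slow cores win by 16x here
```

Real silicon won’t scale voltage that cleanly over a 4x frequency range, but the cubic trend is why “more, slower cores” can be the lower-power design.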
Note that there’s nothing new here. As the authors point out, folks have been doing this for a long time with application-specific processing elements (ASIPs or dedicated processors). What is new is that multi-core processor ICs are showing up with MANY homogeneous general-purpose cores. Grant Martin lists a few of these in his post “42” is not the answer over on Taken for Granted.
One of the obvious attractions of a general-purpose architecture, as opposed to application-specific processors, is the ability to map different functionality onto the cores as necessary depending on the system’s state. To use the cell phone architecture (from the SCDsource article) as an example, it might be possible for both radio functionality and image processing functionality to be mapped to the same core, provided they are not used simultaneously. Or, for a pipeline, two different functions in the same pipeline may share a core if their combined requirements don’t exceed the core’s available bandwidth. This can result in a simpler, more area-efficient hardware design.
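That sharing rule can be stated compactly. A minimal sketch, assuming we know each function’s fraction of a core’s bandwidth and the system states in which it runs; the `Function` class, state names, and utilization numbers are hypothetical, chosen to echo the cell-phone example:

```python
# Hypothetical model: two functions may share one core either because they
# are never active in the same system state, or because their combined
# demand fits within the core's bandwidth when they do overlap.
from dataclasses import dataclass

@dataclass
class Function:
    name: str
    utilization: float      # fraction of one core's bandwidth required
    active_states: set      # system states in which this function runs

def can_share(a: Function, b: Function) -> bool:
    """True if functions a and b can be mapped to the same core."""
    overlap = a.active_states & b.active_states
    if not overlap:
        return True         # never simultaneous (e.g., radio vs. camera mode)
    return a.utilization + b.utilization <= 1.0

radio = Function("radio", 0.7, {"call"})
imaging = Function("imaging", 0.6, {"camera"})
print(can_share(radio, imaging))   # True: active in disjoint states
```

In a real design the “bandwidth” check would come from simulation rather than a single static number, but the state-overlap test is the essence of the argument.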
The challenge, of course, is getting the design right. The RAMS approach is one answer to that problem. (I’ve previously discussed the application of RAMS to Software Defined Radios, which often utilize multiple GPPs.) The same techniques can be applied to the design of any system that has “convenient concurrency.” Simply create a functional model of the system’s behavior and work out the concurrency issues and interfaces there via simulation. The Data Flow Diagram is a natural way to express the data and control flow in a system with concurrency. (In fact, in a DFD, the assumption is that all of the processes can be concurrent.) Next, create a model of the platform, including the multiple cores. You can treat the cores as discrete processors or as a pool of processors, depending on your approach. Then map the functionality onto the platform and analyze the system’s performance with simulation. You can easily change the mapping to evaluate different configurations and determine which one optimally implements the system. For instance, for the same application you could answer “What clock frequency does an Intel dual-core processor need?” versus “What clock frequency does an Intel quad-core processor need?” You’ll be able to get the mapping right, and work out concurrency and interface issues up front, in a white-box environment, while directly monitoring the utilization of each core as a function of time.
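The mapping-and-analysis step above can be sketched in miniature. This is a toy static version, not a RAMS model: each process gets a made-up steady-state demand in Mcycles/s, a candidate process-to-core mapping is summed per core, and the busiest core sets the minimum chip clock. A real study would drive these numbers from a simulated functional model over time:

```python
# Toy mapping analysis: compare the clock a dual-core and a quad-core
# mapping would need for the same (invented) workload.
demand_mcps = {"radio": 300, "codec": 250, "ui": 120, "imaging": 400}

def required_clock(mapping):
    """Minimum admissible clock (MHz): the busiest core sets the pace.
    Assumes one unit of work per cycle, i.e., demand maps 1:1 to MHz."""
    per_core = {}
    for proc, core in mapping.items():
        per_core[core] = per_core.get(core, 0) + demand_mcps[proc]
    return max(per_core.values())

dual = {"radio": 0, "codec": 0, "ui": 1, "imaging": 1}
quad = {"radio": 0, "codec": 1, "ui": 2, "imaging": 3}
print(required_clock(dual))   # 550 MHz for this dual-core mapping
print(required_clock(quad))   # 400 MHz for this quad-core mapping
```

Even this crude model shows the shape of the trade: swapping mappings or core counts is a one-line change, which is exactly the kind of what-if exploration the simulation-based approach makes cheap.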
This sort of system optimization is the RAMS “sweet spot”. RAMS takes the guesswork out of the resource sizing.
True, easy, symmetric multi-processing on multi-core processors may be a long way off. However, you can leverage multi-core processors in your complex embedded systems designs immediately, and a RAMS approach can really help get you there.