How to Select a DSP Processor
This article will be useful if you have already decided that you need a DSP processor to handle your computationally intensive signal processing application in real time, but need some help selecting one. It will guide you through some of the characteristics of DSP processors that you can choose from, to help you narrow down the choice.
For this discussion it is assumed that you are interested in stand-alone DSP processor chips, and not in DSP cores that you will embed in your own ASIC.
Floating-Point vs. Fixed-Point
You will need to decide if you need floating-point support from the DSP processor or not. If your application does not make use of floating-point variables, then you don’t need a floating-point unit. If your application does use floats, you can still convert the algorithms to fixed-point arithmetic. This is a time-consuming job, but allows you to save the cost of a floating-point unit on your DSP processor. If you leave floating-point operations in your program and compile it for a processor with no floating-point unit, it will use emulation libraries that are extremely slow.
In general, any algorithm can be implemented in fixed-point arithmetic, and fixed-point implementations tend to run a bit faster than floating-point implementations of the same algorithm. The advantage of floating-point is that it saves development time. But having a floating-point unit will make your DSP processor bigger and more expensive.
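As a rough illustration of what converting to fixed-point involves, the sketch below multiplies two fractional numbers stored in the common Q15 format (a 16-bit integer x stands for the real value x / 32768). The names q15_t, q15_from_double and q15_mul are made up for this example and do not come from any particular vendor library. The code also assumes that right-shifting a negative integer is an arithmetic shift, which is what common compilers do but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Q15 format: a 16-bit signed integer x represents the real value x / 32768.
   The type and function names here are hypothetical, for illustration only. */
typedef int16_t q15_t;

/* Convert a real value to Q15, rounding to nearest and saturating at the
   ends of the representable range [-1.0, 1.0). */
static q15_t q15_from_double(double v)
{
    double scaled = v * 32768.0;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;      /* round half away from zero */
    if (scaled > 32767.0)  return INT16_MAX;
    if (scaled < -32768.0) return INT16_MIN;
    return (q15_t)scaled;                        /* the cast truncates toward zero */
}

/* Multiply two Q15 values. The 32-bit product is in Q30 format, so it is
   rounded and shifted right by 15 bits to get back to Q15. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q30 intermediate */
    product += 1 << 14;                          /* add half an LSB to round */
    product >>= 15;                              /* assumes arithmetic shift */
    if (product > INT16_MAX)                     /* only -1.0 * -1.0 gets here */
        product = INT16_MAX;
    return (q15_t)product;
}
```

Every multiplication in the original floating-point algorithm has to be rewritten along these lines, with the scaling chosen so that no intermediate value overflows. That is where the extra development time goes.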
Note that the floating-point units in many DSP processors support single-precision floating-point operations only. On those processors, double-precision operations will still use emulation libraries and will run very slowly.
32-Bit vs. 16-Bit
The vast majority of DSP processors today are 32-bit. There are still some 16-bit DSP processors offered, which tend to be smaller, cheaper and consume less power than 32-bit processors.
To know if you need a 32-bit processor, you must first determine the dynamic range of your data. Dynamic range is the span between the lowest and the highest values that you need to represent in a variable, including fractional bits. Voice applications usually work with 12-bit data, while CD-quality audio uses 16 bits. In addition to the basic data size, you will need some extra bits to hold sums (the sum of two 16-bit numbers requires 17 bits). So practically speaking, 16-bit DSP processors are useful mostly for voice signal processing applications.
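The extra bits needed for sums grow with the number of values being added: adding up N values needs about log2(N) extra bits, rounded up. The sketch below works this out; the function names guard_bits and accumulator_bits are made up for this example.

```c
#include <stdint.h>

/* Extra ("guard") bits needed to add up `count` values without overflow:
   the smallest number of bits such that 2^bits >= count. */
static unsigned guard_bits(uint32_t count)
{
    unsigned bits = 0;
    while (bits < 32 && ((uint32_t)1 << bits) < count)
        bits++;
    return bits;
}

/* Total accumulator width needed to sum `count` signed values that are each
   `sample_bits` wide. */
static unsigned accumulator_bits(unsigned sample_bits, uint32_t count)
{
    return sample_bits + guard_bits(count);
}
```

For example, accumulator_bits(16, 2) gives the 17 bits mentioned above, and summing 256 samples of 16-bit audio needs 24 bits, which no longer fits in a 16-bit register.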
Even if your data fits comfortably inside of 16 bits, you might still prefer a 32-bit processor because of the optimization opportunities it provides. You can store two 16-bit data elements in a single 32-bit variable, and perform operations on them in parallel. If your application has 8-bit data (like video and image processing), you may be able to operate on 4 elements at a time on a 32-bit DSP. When you multiply two 16-bit values, the intermediate result is a 32-bit value. Manipulating intermediate multiplication results is done more efficiently on a 32-bit architecture than on a 16-bit one.
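To show the idea of operating on two 16-bit elements packed in one 32-bit variable, here is a minimal sketch in portable C. A DSP with packed-arithmetic support would typically do the addition in a single instruction; the bit manipulation below only imitates that behavior. The function names are made up for this example, and the conversions back to int16_t assume the usual two's-complement representation.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in bits 31..16). */
static uint32_t pack16x2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Extract the lower and upper 16-bit lanes as signed values. */
static int16_t lane_lo(uint32_t v) { return (int16_t)(uint16_t)(v & 0xFFFFu); }
static int16_t lane_hi(uint32_t v) { return (int16_t)(uint16_t)(v >> 16); }

/* Add both 16-bit lanes at once, wrapping around within each lane.
   The low 15 bits of each lane are added first, so no carry can cross into
   the neighboring lane; the top bit of each lane is then fixed up with XOR. */
static uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    return low ^ ((a ^ b) & 0x80008000u);
}
```

A call such as add16x2(pack16x2(100, -1), pack16x2(-300, 1)) adds both pairs with one 32-bit operation sequence: the upper lane holds 100 + (-300) and the lower lane holds -1 + 1.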
Multi-Core
A popular choice for embedded systems is to use a dual-core chip, where one of the cores is a DSP processor, and the other is a general-purpose processor. This arrangement suits many applications that require a DSP processor for number-crunching, but also need to run a lot of general code, like a full operating system, or a web browser. If your application will involve only a small amount of general-purpose code, then a single-core DSP might be good enough (you can run a simple UI and network interface inside the DSP). But if you need to run a complex operating system like Linux, you should consider a dual-core system on chip.
Another reason for multiple cores on a chip is to increase the total performance. Some applications are so computationally complex that a single DSP processor core is not enough. There are DSP chips with 2 to 4 DSP cores in them. Note that partitioning your application across multiple cores may require a substantial architectural effort.
VLIW
VLIW stands for Very Long Instruction Word. This is a type of computer architecture that can execute multiple instructions every cycle, and therefore can reduce the execution time of complex programs. The number of parallel instructions issued on VLIW DSP processors varies from 4 to 8. Theoretically, an 8-way VLIW processor can execute 8 times as many operations per second as a regular (scalar) processor, but in practice the amount of speedup is limited by the available parallelism in the program. On average, VLIW processors can execute 2 to 4 times faster than scalar DSPs, but manual optimization is required to fully exploit this feature.
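One common form of manual optimization is to rewrite a loop so that it contains several operations that do not depend on each other, which a VLIW processor can then issue in the same cycle. The sketch below does this for a dot product by using four separate accumulators. The function name is made up for this example, the caller is assumed to make sure the sums fit in 32 bits, and whether a given compiler and processor actually run the four multiply-accumulates in parallel has to be checked on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors with the loop unrolled by four.
   Each accumulator forms its own dependency chain, so the four
   multiply-accumulate operations in the loop body are independent. */
static int32_t dot_unrolled4(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)x[i]     * y[i];
        acc1 += (int32_t)x[i + 1] * y[i + 1];
        acc2 += (int32_t)x[i + 2] * y[i + 2];
        acc3 += (int32_t)x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)                /* leftover elements when n % 4 != 0 */
        acc0 += (int32_t)x[i] * y[i];

    return acc0 + acc1 + acc2 + acc3;
}
```

With a single accumulator, each addition has to wait for the previous one, and the extra issue slots of the VLIW processor stay empty no matter how wide it is.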
On-Chip Memory
System memory (like DRAM) is much slower than the processor. When a program needs to read data from memory, the whole CPU can stall until the memory read completes. To alleviate this, processors use fast on-chip memory that can be read or written in a single cycle. You may still need to move data from main memory, but once it is on-chip, the program will run very efficiently.
On-chip memories can be arranged either as caches or as user-programmable memories. Cache memory will hold a copy of anything that is read from main memory, so subsequent accesses are done from the cache. Stores to main memory get trapped by the cache, and only get copied back to main memory when the cache needs to make room for other data. With user-programmable on-chip memories, data needs to be explicitly moved in and out of them. Once on-chip, data accesses can be done without any stalls.
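The explicit data movement needed with user-programmable on-chip memory can be sketched as follows. Here an ordinary static array stands in for a buffer placed in on-chip memory, and memcpy stands in for the transfer. On a real DSP the buffer would be placed with a linker section or a vendor-specific directive, and the copies would often be done by a DMA controller. The names onchip_block, BLOCK_SAMPLES and halve_in_blocks, and the block size of 256, are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 256   /* hypothetical block size, chosen to fit on-chip */

/* Stand-in for a buffer located in on-chip memory. */
static int16_t onchip_block[BLOCK_SAMPLES];

/* Halve every sample of a large array that lives in main memory, working on
   one on-chip block at a time: move in, process, move out. */
static void halve_in_blocks(int16_t *data, size_t n)
{
    for (size_t start = 0; start < n; start += BLOCK_SAMPLES) {
        size_t len = n - start;
        if (len > BLOCK_SAMPLES)
            len = BLOCK_SAMPLES;

        memcpy(onchip_block, data + start, len * sizeof data[0]);  /* move in  */
        for (size_t i = 0; i < len; i++)                           /* process  */
            onchip_block[i] = (int16_t)(onchip_block[i] / 2);
        memcpy(data + start, onchip_block, len * sizeof data[0]);  /* move out */
    }
}
```

With a cache, the same program would just loop over the array in main memory and let the hardware decide what to keep on-chip. The explicit version takes more work, but its timing is easier to predict.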
Most applications will require some amount of on-chip memory to keep the DSP processor running at full speed. Figuring out how much on-chip memory you need will require some careful analysis of your application and some benchmarking (simulation).
Speed
There is a wide range of CPU speeds for DSP processors, from about 100 MHz to about 1 GHz. Faster processors will tend to consume more power and dissipate more heat, so you will want to select a DSP processor that is just powerful enough for your application.
The best way to determine if your application will fit on a given DSP processor is to benchmark it. This is the process of executing a set of programs that is representative of your application on a cycle-accurate simulator of your target platform.
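Once a benchmark has given you a cycle count, checking it against a processor is simple arithmetic: the clock frequency divided by the sampling rate gives the number of cycles available to process each sample. A minimal sketch, with function names made up for this example:

```c
#include <stdint.h>

/* Number of whole CPU cycles available for each input sample.
   sample_rate_hz must be nonzero. */
static uint64_t cycle_budget_per_sample(uint64_t clock_hz, uint64_t sample_rate_hz)
{
    return clock_hz / sample_rate_hz;
}

/* Returns 1 if the measured cost per sample fits in the budget, 0 otherwise. */
static int fits_cycle_budget(uint64_t measured_cycles_per_sample,
                             uint64_t clock_hz, uint64_t sample_rate_hz)
{
    return measured_cycles_per_sample
           <= cycle_budget_per_sample(clock_hz, sample_rate_hz);
}
```

For example, a 100 MHz processor handling voice sampled at 8 kHz has a budget of 12,500 cycles per sample. In practice you would leave some margin below the budget for interrupts, memory stalls and other overhead that the benchmark may not capture.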
Cost
DSP processor prices range from less than $10 to about $100. A 32-bit processor will be more expensive than a 16-bit one. Having a floating-point unit will cost more. Larger caches or on-chip memories will also be more expensive. Adding VLIW or increasing the CPU speed will also add to the cost.
If your volumes are going to be high, you will want to carefully select the processor that just meets your needs. This is not an easy task, as you won’t know how fast your application will execute until it’s fully ported and optimized. However, it is still possible to estimate by using benchmarks.
Inband Software can help you select the right DSP processor for your application. We can help you interpret third-party benchmark data, run detailed benchmark simulations specific to your application, and then recommend a DSP processor that is just right for your needs.