





| High-level languages for embedded S/W?                                                                                                                                                                                                                                                                                                                                                                    |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>No serious embedded programmer uses high-level languages.</li> <li>Why? <ul> <li>The generated machine code performs terribly.</li> </ul> </li> <li>Who is responsible? <ul> <li>Embedded processors are not compiler-friendly.</li> <li>Compilers are not smart enough to generate optimal machine code that is comparable to hand-written code in terms of performance.</li> </ul> </li> </ul> |
| Embedded Processor Architecture and Programming |





















# **FIR filters**

• For FIR (finite impulse response) filters, each output sample y[n] is the sum of a finite number of weighted samples of the input sequence x[n].
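The weighted sum can be written as $y[n] = \sum_{k=0}^{M-1} h[k]\,x[n-k]$, where the coefficients $h[k]$ and the tap count $M$ are notation assumed here (the slide text does not name them). A minimal C sketch of this direct form, with the hypothetical names `fir_filter`, `h`, and `taps`, treating samples before x[0] as zero:

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: y[n] = sum_{k=0}^{taps-1} h[k] * x[n-k].
   Input samples before x[0] are treated as zero. */
void fir_filter(const float *x, float *y, size_t len,
                const float *h, size_t taps)
{
    assert(x != NULL && y != NULL && h != NULL);
    for (size_t n = 0; n < len; n++) {
        float acc = 0.0f;                  /* plays the role of a DSP accumulator */
        for (size_t k = 0; k < taps && k <= n; k++)
            acc += h[k] * x[n - k];        /* one multiply-accumulate per tap */
        y[n] = acc;
    }
}
```

The inner loop is exactly one multiply-accumulate per tap, which is why DSP processors provide single-cycle MAC units.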









# Fourier transform

- The frequency domain is useful for analyzing sound.
- The Discrete Fourier Transform (DFT) of a signal x[n]:

$$X(\omega_0) = \sum_{n=0}^{N-1} x[n] e^{-j\omega_0 n} \quad \text{or} \quad X(k) = \sum_{n=0}^{N-1} x[n] e^{-j2\pi kn/N} \qquad \qquad O(N^2)$$

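FFT (Fast Fourier Transform): split x[n] into its even-indexed samples $x_{ev}[n] = x[2n]$ and odd-indexed samples $x_{od}[n] = x[2n+1]$:

$$X(k) = \sum_{n=0}^{N/2-1} x_{ev}[n] W_{N/2}^{nk} + W_{N}^{k} \sum_{n=0}^{N/2-1} x_{od}[n] W_{N/2}^{nk} \qquad \qquad O(N \log_2 N)$$

where
$$W_{N}^{nk} = e^{-j 2\pi nk/N}$$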

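The step from the even/odd split to the $O(N \log_2 N)$ cost can be spelled out as follows. Write $X_{ev}(k)$ and $X_{od}(k)$ (notation introduced here) for the $N/2$-point DFTs of $x_{ev}[n] = x[2n]$ and $x_{od}[n] = x[2n+1]$, with the twiddle factor $W_N^{k} = e^{-j2\pi k/N}$:

$$X_{ev}(k) = \sum_{n=0}^{N/2-1} x_{ev}[n]\, W_{N/2}^{nk}, \qquad X_{od}(k) = \sum_{n=0}^{N/2-1} x_{od}[n]\, W_{N/2}^{nk}$$

Because $W_{N/2}^{n(k+N/2)} = W_{N/2}^{nk}$ and $W_N^{k+N/2} = -W_N^{k}$, both halves of the spectrum come from the same two half-size transforms (the "butterfly"):

$$X(k) = X_{ev}(k) + W_N^{k} X_{od}(k), \qquad X(k+N/2) = X_{ev}(k) - W_N^{k} X_{od}(k), \qquad 0 \le k < N/2$$

Each recursion level costs $O(N)$ butterfly operations, and there are $\log_2 N$ levels:

$$T(N) = 2\,T(N/2) + O(N) \;\Rightarrow\; T(N) = O(N \log_2 N)$$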

# FFT


• Takes N signal samples (x[0], ..., x[N-1]) in the time domain and produces N samples (X[0], ..., X[N-1]) in the frequency domain


• A divide-and-conquer method













# Code for the bandpass filter

<pre>
/* Second-order IIR (biquad) bandpass filter, Direct Form II.
   c[0]: input gain; c[1], c[2]: feedback coefficients;
   c[3], c[4]: feedforward coefficients applied to w[n-1], w[n-2].
   yold[0], yold[1]: delay-line state w[n-2], w[n-1]. */
float iir_bpf(float x, float* c, float* yold) {
    float z;
    float y2 = x * c[0];     /* w[n] = c0*x[n] ...          */
    float y1 = yold[1];      /* w[n-1]                      */
    float y0 = yold[0];      /* w[n-2]                      */
    y2 = y2 - y1 * c[1];     /*        - c1*w[n-1]          */
    y2 = y2 - y0 * c[2];     /*        - c2*w[n-2]          */
    yold[0] = y1;            /* shift the delay line        */
    yold[1] = y2;
    y2 = y2 + y1 * c[3];     /* y[n] = w[n] + c3*w[n-1] ... */
    z = y2 + y0 * c[4];      /*        + c4*w[n-2]          */
    return(z);
}
</pre>
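One way to check how `iir_bpf` maps coefficients to behavior is to feed it a unit impulse. The sketch below repeats the listing's logic so it compiles on its own; `iir_impulse_response` is a hypothetical helper, and the state array must start at zero.

```c
#include <assert.h>
#include <stddef.h>

/* Same computation as the iir_bpf listing: Direct Form II biquad. */
float iir_bpf(float x, float *c, float *yold)
{
    float y2 = x * c[0];
    float y1 = yold[1];          /* w[n-1] */
    float y0 = yold[0];          /* w[n-2] */
    y2 = y2 - y1 * c[1];
    y2 = y2 - y0 * c[2];
    yold[0] = y1;
    yold[1] = y2;
    y2 = y2 + y1 * c[3];
    return y2 + y0 * c[4];
}

/* Hypothetical helper: run a unit impulse through the filter and store
   the first n output samples in h. */
void iir_impulse_response(float *c, float *h, size_t n)
{
    assert(c != NULL && h != NULL);
    float state[2] = {0.0f, 0.0f};   /* w[n-2], w[n-1] start at zero */
    for (size_t i = 0; i < n; i++)
        h[i] = iir_bpf(i == 0 ? 1.0f : 0.0f, c, state);
}
```

With c = {1, -0.5, 0, 0, 0} only the feedback path is active, so the response decays as 1, 0.5, 0.25, 0.125; with c = {1, 0, 0, 2, 3} only the feedforward path is active and the response is 1, 2, 3, 0.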











| Implementing an embedded DSP system |
|-------------------------------------|
| <ul> <li>Fixed-function solutions <ul> <li>custom integrated circuits or FPGAs</li> <li>smallest size and fastest execution</li> <li>prohibitively high initial development costs for each system</li> <li>cost-effective for high-volume products</li> </ul> </li> </ul> |
| <ul> <li>Programmable microprocessors <ul> <li>off-the-shelf DSP processors or general-purpose processors (GPPs)</li> <li>easy changes, upgrades, or fixes of product functionality</li> <li>cost-effective and less risky for low-volume products</li> <li>GPPs are usually costlier and less energy-efficient (and often slower) than DSP processors, which are optimized specifically for DSP algorithms.</li> </ul> </li> </ul> |































| Implied and bit-reversed addressing |
|-------------------------------------|
| <ul> <li>Implied addressing <ul> <li>Special registers dedicated to functional units (multipliers, accumulators) are implicitly addressed by the instruction.</li> <li>e.g., <code>add P</code> (A ← P + A), where A is the accumulator; <code>mpy</code> (P ← X * Y), where P is the product register and X/Y are the multiplier registers</li> </ul> </li> <li>Bit-reversed addressing <ul> <li>Supported by some DSP processors that are designed to run the FFT algorithm efficiently.</li> <li>The output of the address (or index) register is bit-reversed and applied to the memory address bus.</li> <li>e.g., BITREV(I,n): I ← I + n, where I is an index register and n is a 32-bit number (an ADSP210XX instruction)</li> </ul> </li> </ul> |
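On a processor without bit-reversed addressing, the reordering an iterative radix-2 FFT needs has to be done in software, which is the overhead this addressing mode removes. A minimal sketch; `bit_reverse` and `bit_reverse_permute` are hypothetical names, not an ADSP210XX API.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low `bits` bits of i, e.g. with bits = 3: 1 (001) -> 4 (100). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder x[0..N-1] (N = 2^bits) into bit-reversed index order. */
void bit_reverse_permute(float *x, unsigned bits)
{
    assert(x != NULL && bits < 8 * sizeof(unsigned));
    size_t N = (size_t)1 << bits;
    for (size_t i = 0; i < N; i++) {
        size_t j = bit_reverse((unsigned)i, bits);
        if (i < j) {               /* swap each pair only once */
            float t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

For N = 8, the order 0..7 becomes 0, 4, 2, 6, 1, 5, 3, 7. With hardware bit-reversed addressing, the index register simply steps linearly and the address bus sees this permuted sequence at no extra cost.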













# **On-chip memory units**



#### RAM

Two RAMs are common so that multiple data accesses can be made per cycle: one is used exclusively for data, while the other, to save cost, is often shared by program and data.

#### ROM

Mainly for bootstrapping code (loading operational code and communicating with the host) or kernel code for DSP.

#### Cache

To exploit the locality of programs in external memory, small (16 to 32 words) instruction caches are commonly provided; the typical size for GPPs is 256 or larger. Virtually no fixed-point DSPs have data caches, because of the complex coherence hardware they would require.

*Figure: Memory map of ZR38601 (code = 32 bits, data = 20 bits). Regions shown: reset & interrupt locations, 2k x 32 internal program/data RAM, 20k x 32 internal ROM, 10k x 20 internal data RAM, and external data; addresses 02800 and 40000 are marked.*

#### Non-orthogonal instruction encoding typically found in DSP processors

The SGS-Thomson D950 DSP processor imposes restrictions on source or destination operands, as is the case with the multiply instructions shown here.

- Multiply: `P = Left_src * Right_src (Rnd)`; encoding fields: opcode, Rnd, right source, left source
  - Right source: R0, R1, A0, A1
  - Left source: L0, L1, R0, R1, A0, A1, P
- Multiply-accumulate with 2 indirect register loads: `A +-= P, P = Left * Right (Rnd), Lx = *AX + IX, Ry = *AY + IY`; encoding fields: opcode, left (Lx), right (Ry), ALU destination, address and index registers
  - Lx: L0, L1; Ry: R0, R1; ALU destination: A0, A1
  - AX: AX0, AX1; AY: AY0, AY1
  - IX: +IX0, +IX1, +IX2, +IX3; IY: +IY0, +IY1, +IY2, +IY3
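From a compiler's point of view, these restrictions mean each operand slot has its own legal register set, which the instruction selector and register allocator must check. A minimal sketch using the D950 multiply operand sets listed above; the enum, masks, and `mpy_operands_legal` are hypothetical illustrations, not part of any D950 toolchain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register IDs for a D950-style multiply instruction. */
typedef enum { L0, L1, R0, R1, A0, A1, P, NUM_REGS } reg_t;

#define BIT(r) (1u << (r))

/* Per-slot operand sets for P = Left_src * Right_src. */
static const unsigned RIGHT_SRC_OK = BIT(R0) | BIT(R1) | BIT(A0) | BIT(A1);
static const unsigned LEFT_SRC_OK  = BIT(L0) | BIT(L1) | BIT(R0) | BIT(R1) |
                                     BIT(A0) | BIT(A1) | BIT(P);

/* Returns true if "P = left * right" is encodable. On an orthogonal ISA
   both masks would contain every register and this check would vanish. */
bool mpy_operands_legal(reg_t left, reg_t right)
{
    assert(left < NUM_REGS && right < NUM_REGS);
    return (LEFT_SRC_OK & BIT(left)) != 0 && (RIGHT_SRC_OK & BIT(right)) != 0;
}
```

For instance, `L0 * R1` is legal, but `L0 * L1` is not, so a value that lives in L1 must first be moved to R0, R1, A0, or A1 before it can be the right operand.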





# ISA features that compilers prefer

- Low-level expressiveness of instructions ...
  - gives more room for code optimizations. Recall the examples of code optimizations for the load-store ISA and the register+memory ISA.
- Easy predictability of performance ...
  - is made possible by simple, straightforward hardware, and
  - allows the compiler to make simple trade-offs among alternative instructions, which helps improve code quality and meet real-time constraints.
- Orthogonality, regularity
  - In an orthogonal ISA, all addressing modes apply to all instructions that transfer data.
  - Simpler optimization techniques suffice for orthogonal ISAs.















































#### Conclusion

- Embedded systems are subject to strict performance and cost constraints.
- Embedded processors are often designed very irregularly to meet these constraints.
- Various programming language paradigms have been applied to programming embedded systems.
- Imperative programming languages dominate embedded system programming.
  - Strong C affinity/loyalty of EE engineers
  - Poor optimization
  - Limited versatility of non-imperative programming languages
- So, major efforts go into:
  - Leveling up assembly languages
  - Further extending C to other applications (e.g., SystemC)
  - Improving C compiler optimizations