# **Programmable Commercial Band Stereo FM Transmitter**

Michael McCorquodale, Hector Torres, Chung Leong Ma, Youngjoon Kim

Center for Integrated Microsystems
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, Michigan 48109-2122

Abstract: A practical and multifunctional use of Direct Digital Synthesis (DDS) in wireless applications is presented by demonstrating a system that realizes commercial band FM stereo transmission and meets specifications set by the FCC. The system synthesizes passband signals that are modulated completely in the discrete time domain by performing baseband modulation in a custom CMOS VLSI core and interfacing to an off-chip Direct Digital Synthesizer that generates the high frequency carrier. Using off-chip analog-to-digital converters (ADCs), baseband modulation is achieved by band limiting two audio frequency inputs to 15kHz, sampling each at 106kHz, and then taking their sum and difference. The sum, or monaural signal, is added to the difference signal, which has been amplitude modulated on a 38kHz subcarrier. Lastly, a 19kHz pilot tone is added in phase with the subcarrier so that the subcarrier can be regenerated coherently at the receiver. The pilot tone and the subcarrier are discrete time local oscillators that are synthesized on-chip. Once the baseband sample has been processed, it is converted to a phase deviation, added to the carrier phase, and written to the DDS so that the passband frequency is updated. The core architecture is based upon a 16-bit RISC microprocessor with application specific modules to achieve the design objectives. These modules include a multiply-accumulate (MAC) unit, on-chip memory, and an on-chip lookup table. The passband, or carrier, frequencies are fully programmable, as is the core, so that many different modulation schemes can be achieved.

### I. SYSTEM OVERVIEW

#### A. Motivation

With the advent of wireless communications has come an increased demand for higher levels of integration and functionality in the core circuits that are used to realize digital and analog communication functions. For example, wireless cellular communications has recently seen a transition from analog to digital modulation schemes. However, competing companies are pushing different technologies and these technologies are gaining varying degrees of success in different parts of the world. It would be advantageous to cell phone manufacturers if a single core could be designed that would realize these many different communication schemes. Direct Digital Synthesis (DDS) can be employed to do just this. The Direct Digital Synthesizer, combined with a programmable baseband modulation core, forms a complete communication system. The focus of this work is the design of the baseband processor, which meets FCC specifications for Commercial Band Stereo FM. This modulation scheme was selected as a model for demonstration since the baseband modulation required is more interesting and complicated than many others.

#### B. Justification for CMOS VLSI Design

CMOS VLSI is well justified in terms of cost, required bandwidth, and level of integration. It is well known that industry has invested a significant amount of resources into CMOS fabrication and therefore the cost per die has dropped dramatically while the yield has increased substantially. Additionally, since the baseband modulation core does not require a very high bandwidth, exotic high frequency and power hungry technologies such as HBT processes are not required. Lastly, the DDS to which the core will interface is designed in a standard CMOS process. Indeed, higher levels of integration can be achieved by combining the baseband modulation core with the DDS. In fact, emerging technologies such as CMOS with MEMS can be used to truly realize a multifunctional and software programmable communication system on a chip.

#### C. Design Methodology

Taking into consideration the challenges and difficulties that can be encountered in VLSI design, a simple, strict, and standard design methodology was established. Specifically, a one-to-one correspondence between the hierarchy of the physical design and functional design was maintained. This facilitated the verification process by allowing LVS and parasitic extraction to be executed at any level. Additionally, symbols were created at each level, but no symbols were ever combined with transistor primitives at any level, which maintained a clean hierarchy. Every cell was fully tested for functionality before instantiating it into a higher level. Therefore, debugging at the top level was minimized. Verilog code was written using either a purely functional or a purely structural style for every cell.

The layout was designed with the objectives of low power and adequate bandwidth in mind. Proper pitch matching was maintained for every cell in the datapath. In order to reduce cross-coupled parasitic capacitance, polysilicon and metal-2 paths were run vertically while metal-1 paths were run horizontally. Metal-2 was used for interconnecting cells in the datapath while polysilicon was used only for local interconnect since its associated sheet resistance is high. These efforts resulted in a fully functional and fast datapath that occupies a small area.

#### D. System Requirements

As explained in the abstract, the input signal sample rate is 106kHz and the core must be able to perform all the necessary instructions between sample times. The target clock frequency was chosen to be 10MHz, which provides 94 instructions per sample and is adequate for many modulation schemes. The baseband modulation process requires the synthesis of local oscillators (LOs). For this purpose a MAC unit was integrated into the core. In general, the design must meet the FCC specifications for baseband modulation but must also be flexible enough to implement many other modulation schemes. Lastly, low power, versatility, and a high level of integration were desired for use in different communication applications.
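The instruction budget quoted above follows directly from the ratio of the clock rate to the sample rate. The short sketch below reproduces that arithmetic; the function names are illustrative only and are not taken from the design.

```python
def instructions_per_sample(clock_hz: float, sample_rate_hz: float) -> int:
    """Whole single-cycle instructions that fit in one sample period."""
    return int(clock_hz // sample_rate_hz)


def minimum_clock_hz(instruction_count: int, sample_rate_hz: float) -> float:
    """Lowest clock rate that still fits the given number of instructions."""
    return instruction_count * sample_rate_hz


CLOCK_HZ = 10e6          # target clock frequency from the text
SAMPLE_RATE_HZ = 106e3   # input sample rate from the text

budget = instructions_per_sample(CLOCK_HZ, SAMPLE_RATE_HZ)
print(budget)
print(minimum_clock_hz(budget, SAMPLE_RATE_HZ))
```

The second function illustrates the inverse question: once the firmware length is known, the clock can in principle be lowered to the product of the instruction count and the sample rate.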

#### E. Architectural Requirements

The design requirements dictated that certain architectural elements be included. For example, a MAC unit was designed for high precision and single cycle operation. The MAC unit, shown in Fig. 1, uses a modified Booth's algorithm, resulting in a minimum number of instructions needed for LO synthesis. Instructions were used to read and write into the MAC unit's accumulator as well as perform multiply and multiply-accumulate operations.

Applications in wireless communications demand low power. The core was designed based upon a RISC architecture, which allows the control to be very simple. Consequently, the control logic is minimal and therefore power consumption is reduced. Furthermore, all instructions were designed to execute in a single clock cycle, which also minimizes the control logic and may be useful in applications that require intense processing between samples.



Fig. 1: Multiply-accumulate architecture.

An on-chip mask ROM contains the main program for the baseband modulation algorithm. By designing a mask ROM rather than a custom ROM, the user is able to change the main program before fabrication by changing only the metal-1 layer. Even though fewer than 100 instructions are enough for the target application, up to 256 instructions can be stored in this ROM. Moreover, the user may bypass the internal ROM and access instructions from an external ROM. The chip includes an output bus for the program counter (PC) and an input bus for the instruction words. Testing the core becomes simpler by using this external interface.

Data memory is also integrated on-chip. A 128-word RAM and a 128-word ROM were synthesized in Epoch and incorporated into the core. The RAM is used to store intermediate states during the modulation process. The ROM is used to store the constants needed for the carrier frequencies that are synthesized by the DDS. It consists of data for 40 different transmission frequencies allowed by FCC specifications. Each frequency uses two words (32-bits) for high numeric phase precision. The selection of the carrier frequency is pin-programmable, and thus the core contains seven input pins that the user can tie to VDD or ground at the board level in order to select the desired carrier.
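As an illustration of how a 32-bit carrier phase can be held in two 16-bit ROM words and selected by the pin-programmed value, consider the sketch below. The word order (high word first), the zero base address, and all function names are assumptions made for the example; the paper does not specify the ROM layout.

```python
TABLE_ENTRIES = 40   # number of carrier entries stated in the text
WORD_MASK = 0xFFFF   # 16-bit data words


def split_phase(phase32: int) -> tuple[int, int]:
    """Split a 32-bit phase into (high, low) 16-bit words."""
    return (phase32 >> 16) & WORD_MASK, phase32 & WORD_MASK


def join_phase(high: int, low: int) -> int:
    """Rebuild the 32-bit phase from its two 16-bit words."""
    return ((high & WORD_MASK) << 16) | (low & WORD_MASK)


def rom_addresses(select: int, base: int = 0) -> tuple[int, int]:
    """Addresses of the two words for a pin-selected carrier (assumed layout)."""
    if not 0 <= select < TABLE_ENTRIES:
        raise ValueError("carrier select out of range")
    return base + 2 * select, base + 2 * select + 1


high, low = split_phase(0x12345678)
print(hex(high), hex(low), hex(join_phase(high, low)))
print(rom_addresses(39))
```

Under this assumed layout, 40 entries occupy 80 of the 128 ROM words, and a seven-bit select value is wide enough to index them.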

The input and output interfaces consist of 16-bit memory mapped I/O busses. All of the unused on-chip address lines are brought off chip. The user can use any combination of these lines in order to access the external ADCs and DDS.

**Table 1: FCC Specifications for Stereo FM Transmission** 

| Item                        | FCC Standard                                      |
|-----------------------------|---------------------------------------------------|
| Assigned Frequency          | 200kHz channel spacing from 88.1-107.9MHz         |
| Channel BW                  | 200kHz                                            |
| Noncommercial Stations      | 88.1-91.9MHz                                      |
| Commercial Stations         | 92.1-107.9MHz                                     |
| Carrier Frequency Stability | +/-2kHz of assigned frequency                     |
| 100% Modulation             | ΔF=75kHz                                          |
| Audio Frequency Response    | 50Hz to 15kHz following 75µs preemphasis curve    |
| Modulation Index            | 5                                                 |
| FM noise                    | >60dB below 100% modulation at 400Hz              |
| AM noise                    | >50dB below level @ 100% AM in audio band         |
| Maximum power               | 100kW in horizontal and vertical polarized planes |

### II. IMPLEMENTATION & ENGINEERING CONSIDERATIONS

#### A. Specifications

The core must be able to achieve the baseband modulation specifications set by the FCC, as shown in Table 1, and then compute a new phase that updates the output of the DDS so a new frequency deviation is generated for each sample. The required baseband spectrum is shown in Fig. 2 and it is clear that the baseband bandwidth is 53kHz and therefore the minimum input sample rate is 106kHz. Once the baseband sample has been computed it is critical that the phase in the DDS is updated at equal intervals in time so the carrier frequency deviation is synthesized accurately. The designed architecture allows the DDS phase to be updated with 32 bits, the maximum resolution. Time must be allotted so an adequate and exact number of instructions can be executed between samples. Additionally, discrete time local oscillators must be synthesized on-chip in order for the pilot tone and subcarrier to be generated by the relationships shown in equations 1-9. In order to generate these oscillators a multiply-accumulate (MAC) unit is required. Synthesis is achieved by the following:

$$\sin(\alpha) + \sin(\beta) = 2\sin\frac{\alpha + \beta}{2}\cos\frac{\alpha - \beta}{2} \tag{1}$$

Let:

$$\alpha = (n+1)\omega_o \tag{2}$$

and:

$$\beta = (n-1)\omega_o \tag{3}$$

$$\sin(n+1)\omega_{o} + \sin(n-1)\omega_{o} = 2\sin\frac{(n+1)\omega_{o} + (n-1)\omega_{o}}{2}\cos\frac{(n+1)\omega_{o} - (n-1)\omega_{o}}{2} \tag{4}$$

$$\sin(n+1)\omega_o = 2\sin n\omega_o \cos \omega_o - \sin(n-1)\omega_o \tag{5}$$

$$\sin n\omega_o = \kappa \sin(n-1)\omega_o - \sin(n-2)\omega_o \tag{6}$$

$$y[n] = \kappa y[n-1] - y[n-2] \tag{7}$$

where:

$$y[n] = \sin n\omega_o \tag{8}$$

$$\kappa = 2\cos\omega_o \tag{9}$$
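The recurrence in equations 7-9 can be checked numerically with a few lines of floating-point code. This is only a sketch: the actual core uses fixed-point arithmetic in the MAC unit, and the function name and the seed values y[0] = 0 and y[1] = sin(ω_o) are choices made for the example.

```python
import math


def recursive_sine(omega_o: float, count: int) -> list[float]:
    """Generate y[n] = sin(n * omega_o) using y[n] = kappa*y[n-1] - y[n-2]."""
    kappa = 2.0 * math.cos(omega_o)            # equation 9
    samples = [0.0, math.sin(omega_o)][:count]  # seeds y[0] and y[1]
    while len(samples) < count:
        # One multiply and one subtract per sample (equation 7).
        samples.append(kappa * samples[-1] - samples[-2])
    return samples


SAMPLE_RATE_HZ = 106e3
PILOT_HZ = 19e3
omega_pilot = 2.0 * math.pi * PILOT_HZ / SAMPLE_RATE_HZ

pilot = recursive_sine(omega_pilot, 1000)
worst = max(abs(y - math.sin(n * omega_pilot)) for n, y in enumerate(pilot))
print(worst)  # deviation from the directly computed sine
```

Each new sample needs only one multiplication by the constant κ and one subtraction, which is why a multiply-accumulate unit suits this kind of oscillator.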



Fig. 2: Stereo FM baseband spectrum.
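The composite signal whose spectrum is described above can be sketched sample by sample as follows. The pilot and audio scaling factors are assumptions chosen so that the result stays within unit magnitude; they are not values given in the paper, and the sine functions stand in for the on-chip recursive oscillators.

```python
import math

SAMPLE_RATE_HZ = 106_000.0
PILOT_HZ = 19_000.0
SUBCARRIER_HZ = 38_000.0


def composite_sample(left: float, right: float, n: int,
                     audio_level: float = 0.45,
                     pilot_level: float = 0.1) -> float:
    """Stereo baseband sample n for left/right inputs in the range [-1, 1].

    audio_level and pilot_level are assumed scalings, not design values.
    """
    t = n / SAMPLE_RATE_HZ
    pilot = math.sin(2.0 * math.pi * PILOT_HZ * t)
    subcarrier = math.sin(2.0 * math.pi * SUBCARRIER_HZ * t)
    mono = left + right           # sum (monaural) signal
    difference = left - right     # modulated onto the 38kHz subcarrier
    return audio_level * (mono + difference * subcarrier) + pilot_level * pilot


peak = max(abs(composite_sample(1.0, -1.0, n)) for n in range(1060))
print(peak)
```

Writing both oscillators as sines that start at zero phase makes the pilot zero crossings line up with those of the subcarrier, which is one way to read the in-phase requirement.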

Functionality was the first and foremost design objective of this work, but other practical specifications were still considered. The target application does not dictate any environmental specifications outside of those that can be met by the design technology. However, power is a concern, as it is with all wireless applications. Since power is a function of frequency, minimizing the frequency is critical to achieving low power consumption. Once the firmware is in place and the number of instructions has been determined, the clock frequency can be lowered so that only enough time is allotted to execute the baseband modulation instructions. Lastly, the end user was also considered in the design. Specifically, a lookup table was designed so the user can pin-program the carrier frequency externally by tying the carrier-select input pins to VDD or ground. Additionally, the user can bypass the on-chip ROM in order to execute other modulation schemes as well as access external RAM.

#### B. Tradeoffs

The most significant tradeoffs made during the design procedure dealt with the functionality of the core, the specifications, and the time to design. CAD tools were used to their fullest extent when applicable, but detailed transistor level design was still required in many instances. The full custom datapath was designed with low power, adequate bandwidth, and reasonable size in mind. Certain simple and elegant optimizations were made in this custom datapath. Since throughput was a design constraint, the register file was designed to be 16 words for fast data memory access. A sufficiently small physical size was nevertheless achieved by designing a positive edge triggered master-slave flip-flop file realized by one row of master latches and sixteen rows of slaves. Similar techniques were used in other modules such as the shifter, where only one half of a full logarithmic shifter was designed, with input and output muxes that flip the data so that both left and right shifts can be executed with considerably fewer devices.
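The shifter optimization can be modeled in software: a shifter that moves data in only one direction performs the opposite shift if the bit order is reversed before and after it. In the sketch below, the choice of a right-only shifter, the logical (zero-fill) behavior, and the function names are assumptions made for illustration.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def reverse_bits(value: int) -> int:
    """Mirror a 16-bit word, modeling the input and output flip muxes."""
    result = 0
    for _ in range(WIDTH):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def right_shift_only(value: int, amount: int) -> int:
    """Logarithmic right shifter: stages of 1, 2, 4 and 8 bit positions."""
    for stage in range(4):
        if amount & (1 << stage):
            value >>= 1 << stage
    return value & MASK


def shift(value: int, amount: int, left: bool) -> int:
    """Logical shift in either direction using only the right shifter."""
    if not 0 <= amount < WIDTH:
        raise ValueError("shift amount out of range")
    value &= MASK
    if left:
        value = reverse_bits(value)
    value = right_shift_only(value, amount)
    if left:
        value = reverse_bits(value)
    return value


print(hex(shift(0x00F1, 4, left=True)))
print(hex(shift(0xF100, 4, left=False)))
```

The two bit reversals cost only wiring and multiplexers in hardware, whereas a second set of shift stages would roughly double the shifter itself.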

The MAC unit was required to provide full functionality of the core. With design time constraints in mind, the MAC unit was synthesized using Epoch. Similarly, it was trivial to synthesize the on-chip memory units with Epoch at the same time. This of course increased the level of integration significantly.

Lastly, the auto-place and route tool was used in Epoch to route the entire core rather than performing manual routing. These decisions discussed above demonstrate efficient and intelligent use of the tools available in order to minimize design time while maximizing useful design effort.

#### C. Critical Paths and Timing

The MAC and the ALU exhibit the largest delays through the core. The ALU delay time is 58.1ns (Fig. 3), which is measured from the rising clock edge to the point when data is valid on the write data bus to the register file. This critical path permits a maximum clock frequency of approximately 17MHz, which is well above the target frequency of 10MHz. For the MAC, the static timing analyzer TACTIC reported a worst case delay of approximately 54ns. However, this delay could not be reproduced in the functional model for a variety of inputs and therefore it was assumed to be a false path. Such false paths are common in static timing analyzers, and as a result of the tests performed it was decided that the MAC was actually faster than the ALU. Hence, an argument can be made against the use of the ripple-carry adder that was designed, since the critical path is through the ALU. However, recall that the application target frequency dictates the maximum critical delay. The ripple-carry adder showed adequate performance in this regard and therefore a faster ALU was not needed. This is a justified engineering decision which saved time and design effort, since the implementation of the ripple-carry adder is straightforward compared to faster adders such as the carry-look-ahead adder.
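The timing margin discussed above can be reproduced with simple arithmetic. The sketch ignores setup time, clock skew, and other margins, which is a simplification for illustration rather than a statement about how the design was signed off.

```python
ALU_DELAY_S = 58.1e-9     # measured critical path through the ALU
TARGET_CLOCK_HZ = 10e6    # target clock frequency

# Highest clock rate the critical path allows (setup and skew ignored).
max_clock_hz = 1.0 / ALU_DELAY_S

# Time left over in each cycle at the target clock.
slack_s = 1.0 / TARGET_CLOCK_HZ - ALU_DELAY_S

print(round(max_clock_hz / 1e6, 1))  # in MHz
print(round(slack_s * 1e9, 1))       # in ns
```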



Fig. 4: Achieving passband via aliasing.

#### D. Architecture and Algorithms

Using the fact that phase changes linearly with time over a constant frequency, one can use a high-speed lookup table and a DAC to synthesize high frequency waveforms based on phase. The DDS does just this and it is capable of synthesizing frequencies in the range of DC to 25MHz, for a DDS clock frequency of 50MHz. The aliased components of the synthesized signal (Fig. 4) around the second harmonic of the DDS clock will be inband and the desired transmission frequency can be selected while all others are rejected. The core completes the baseband modulation as described previously and a system level implementation is shown in Fig. 5.
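To make the aliasing argument concrete, the sketch below computes which DDS fundamental places an image on a given FM carrier, assuming the image of interest lies around twice the 50MHz DDS clock. The function names are illustrative, and the choice between the upper and lower image is left to the filter that follows the DDS.

```python
def fundamental_for_carrier(f_carrier_hz: float,
                            f_clock_hz: float = 50e6,
                            harmonic: int = 2) -> float:
    """DDS fundamental whose image around harmonic*f_clock hits the carrier."""
    offset = abs(f_carrier_hz - harmonic * f_clock_hz)
    if offset > f_clock_hz / 2.0:
        raise ValueError("carrier is outside the images of this harmonic")
    return offset


def images(f_out_hz: float, f_clock_hz: float = 50e6,
           harmonic: int = 2) -> tuple[float, float]:
    """Lower and upper image frequencies around harmonic*f_clock."""
    center = harmonic * f_clock_hz
    return center - f_out_hz, center + f_out_hz


for carrier in (88.1e6, 99.9e6, 107.9e6):
    f0 = fundamental_for_carrier(carrier)
    print(carrier / 1e6, f0 / 1e6, [f / 1e6 for f in images(f0)])
```

With a 0 to 25MHz fundamental, these images span 75MHz to 125MHz, which covers the 88.1MHz to 107.9MHz band. Note that for a carrier on the lower image, an increase in the fundamental lowers the image frequency, so the sense of the frequency deviation is inverted there.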

Fig. 3: Timing diagram of critical path through ALU.

Fig. 5: System implementation of baseband processor for realization of stereo FM transmission.

The core must convert a baseband-modulated sample to a new phase deviation. This can be easily accomplished using the simple relationship between phase and frequency, as shown by equations 10-19, and the fact that  $2^{32}=2\pi$  in the DDS. Immediately after the baseband modulation is complete the 32-bit phase corresponding to the carrier is loaded to the accumulator based upon the user pin-program. It is accumulated with the product of the baseband sample and the maximum phase deviation, as shown below, and then written to the DDS. Here it is clear that the user can use the core to implement other modulation schemes such as binary phase shift keying (BPSK). Here two discrete phases (0° and 180°) represent logic states "1" and "0." Data can be sampled and the phase of the carrier can be flipped for each bit. Other schemes can be demonstrated with equivalent ease and the applications to wireless communications are clearly many.
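The BPSK case mentioned above reduces to choosing between two 32-bit phase values that are half a turn apart. The sketch below assumes the DDS accepts a 32-bit phase-offset word and that logic "1" maps to 0° and logic "0" to 180°; both the interface and the function name are hypothetical.

```python
PHASE_MODULUS = 1 << 32   # 2**32 corresponds to a full turn (2*pi)
HALF_TURN = 1 << 31       # 180 degrees


def bpsk_phase_words(bits, reference_phase: int = 0) -> list[int]:
    """Map each data bit to a 32-bit phase: '1' -> 0 deg, '0' -> 180 deg."""
    return [(reference_phase + (0 if bit else HALF_TURN)) % PHASE_MODULUS
            for bit in bits]


words = bpsk_phase_words([1, 0, 0, 1])
print([hex(w) for w in words])
```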

The relationships for FM synthesis follow:

$$\phi = \omega dt \tag{10}$$

$$dt = \frac{1}{f_{clock}} \tag{11}$$

$$\omega = 2\pi f = \frac{\phi}{dt} \tag{12}$$

$$f = \frac{\phi f_{clock}}{2\pi} \tag{13}$$

$$f = \frac{\Phi f_{clock}}{2^{32}} \tag{14}$$

$$f = \frac{(\Delta \phi + \phi_{carrier}) f_{clock}}{2^{32}} \tag{15}$$

$$f = \frac{(x_{baseband} \Delta \phi_{maxdeviation} + \phi_{carrier}) f_{clock}}{2^{32}} \tag{16}$$

where:

$$\phi_{carrier} = \frac{2^{32} f_{carrier}}{f_{clock}} \tag{17}$$

$$\Delta \phi_{maxdeviation} = \frac{2^{32} \Delta f_{maxdeviation}}{f_{clock}} \tag{18}$$

$$x_{baseband} \le 1 \tag{19}$$
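Equations 16-19 can be exercised numerically as follows. The 50MHz clock and the 75kHz maximum deviation come from the text, while the 11.9MHz DDS output frequency is only an example value and the function names are illustrative. The core itself performs this step in integer arithmetic with the MAC unit; floating point is used here for brevity.

```python
F_CLOCK_HZ = 50e6
PHASE_MODULUS = 2 ** 32


def phase_word(freq_hz: float, f_clock_hz: float = F_CLOCK_HZ) -> int:
    """32-bit phase per DDS clock for a frequency (equations 17 and 18)."""
    return round(PHASE_MODULUS * freq_hz / f_clock_hz)


def fm_phase_word(x_baseband: float, f_carrier_hz: float,
                  max_deviation_hz: float = 75e3) -> int:
    """Phase written to the DDS for one baseband sample (equation 16)."""
    if abs(x_baseband) > 1.0:
        raise ValueError("baseband sample must satisfy |x| <= 1")
    phi_carrier = phase_word(f_carrier_hz)
    delta_phi_max = phase_word(max_deviation_hz)
    return (round(x_baseband * delta_phi_max) + phi_carrier) % PHASE_MODULUS


def output_frequency(word: int, f_clock_hz: float = F_CLOCK_HZ) -> float:
    """Frequency synthesized for a given phase word (equation 14)."""
    return word * f_clock_hz / PHASE_MODULUS


for x in (-1.0, 0.0, 1.0):
    print(x, output_frequency(fm_phase_word(x, 11.9e6)))
```

With a 32-bit phase and a 50MHz clock, the frequency resolution is about 0.012Hz, so the rounding in `phase_word` is far smaller than the +/-2kHz stability requirement in Table 1.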

As stated previously, the core architecture, shown in Fig. 6, is based on a 16-bit RISC processor and the complete layout is shown in Fig. 7. The layout is I/O pad limited and this is largely due to the multifunctional design objectives.

#### E. Verification and Simulation

Verification and simulation were performed using the Mentor Graphics Tools. Mentor Quicksim and Signalscan were used for functional verification of each initial transistor level module and Verilog module respectively. Once parasitic extraction was executed using the layout tool, IC Station, the Mentor Accusim environment provided detailed timing analysis and was used to backannotate each custom cell. Timing information from parts generated in Epoch was extracted automatically during the import process to Design Architect. Once the functional models were properly backannotated, further timing verification was performed in Quicksim and errors due to "glitches" were commonly encountered, but easily solved. In order to verify the physical design, full mask LVS was performed by importing the custom datapath into Epoch and generating a netlist for the entire core that included the custom and synthesized parts.

#### F. Testing

The program counter and the instruction register are the most critical sections of the core since they determine the instructions that are executed internally. Therefore, a scan mode can be enabled for each of these sections. The scan mode is pin-programmable and scan data is loaded serially for each module. The register file has no built-in scan chain since the read busses of the core are externally accessible and data integrity can be determined by probing these pins. All on-chip memory can be bypassed in order for external RAM and ROM to be accessed. External RAM is accessible by using the extra data memory address pins, while external ROM can be accessed by overriding the internal ROM with pin-programming. At the system level, an external ROM can be used to program the core to execute simple functions such as writing a single phase to the DDS. The DDS output frequency can be monitored with a spectrum analyzer and correct synthesis can be determined. Complete system verification can be performed using a Vector or Network Analyzer and the demodulated FM can be displayed as well as heard. Indeed, many more tests can be devised as part of the complete functional verification. This advantage is due to the accessibility of the core to the user.

**Table 2: Chip Statistics** 

| Item             | Datapath                         | Chip                             |
|------------------|----------------------------------|----------------------------------|
| Size             | Length: 950.4μm                  | Length: 4069.8μm                 |
|                  | Width: 2340.6μm                  | Width: 4583.4μm                  |
|                  | Area: 2.22mm <sup>2</sup>        | Area: 18.65mm <sup>2</sup>       |
| Transistor Count | NMOS: 3988                       | NMOS: 27322                      |
|                  | PMOS: 3204                       | PMOS: 17113                      |
|                  | Total: 7192                      | Total: 44435                     |
| Density          | 3239 transistors/mm <sup>2</sup> | 2383 transistors/mm <sup>2</sup> |
| Max Clock        | N/A                              | 17MHz                            |

#### G. Design Statistics

The core contains two distinct parts. These are the custom datapath and the remaining parts generated in Epoch as described previously. Chip statistics, shown in Table 2, indicate that transistor density is not very high at the chip level. This is due to the fact that the chip is pad limited, which in turn is due to the design objective of maximum versatility. Nevertheless, the custom datapath taken alone shows respectable density as would be expected.

Fig. 6: Complete baseband processor architecture.



Fig. 7: Integrated circuit mask set.

### SUMMARY

This work has presented a practical use of DDS and its application to wireless communications. Specifications were met for Commercial Band Stereo FM Transmission by using novel techniques to perform baseband modulation in a CMOS VLSI core and then interface to an external DDS to reach passband. The design requirements were examined and application specific modules were added to a core based on a 16-bit RISC architecture. Proper physical and functional verification methods were demonstrated, as were appropriate engineering design tradeoffs. It was shown that this work can be extended to many modulation schemes and could be used as a multifunctional core in a variety of wireless applications. Further levels of integration can be achieved with emerging technologies.

### REFERENCES

- [1] Blahut, R.E., *Digital Transmission of Information*, Addison-Wesley Publishing Company, Reading, Massachusetts, 1990.
- [2] Couch II, L. W., *Digital and Analog Communication Systems*, 5<sup>th</sup> ed., Prentice Hall, Upper Saddle River, NJ, 1997.
- [3] Garg, V.K. and J.E. Wilkes, *Wireless and Personal Communications Systems*, Prentice Hall, Upper Saddle River, NJ, 1996.
- [4] Epoch, *Epoch: Online Manuals*, Cascade Design Automation Corporation, Online, 1994.
- [5] Motorola, *DSP56000 Digital Signal Processor Family Manual*, Motorola Literature Distribution, Phoenix, Arizona, 1995.
- [6] Proakis, J.G. and D.G. Manolakis, *Digital Signal Processing*, 3<sup>rd</sup> ed., Prentice Hall, Upper Saddle River, NJ, 1996.
- [7] Weste, N.H.E. and K. Eshraghian, *Principles of CMOS VLSI Design: A System Perspective*, 2<sup>nd</sup> ed., Addison-Wesley Publishing Company, Reading, Massachusetts, 1993.