A history of microprocessor debug, 1980–2016

July 25, 2017

Tony Armitstead

Since the dawn of electronics design, where there have been designs, there have been bugs. And where there have been bugs, there has inevitably been debug, engaged in an epic wrestling match with faults, bugs, and errors to determine which would prevail, and how thoroughly.

In many ways, the evolution of debug technology is as fascinating as any aspect of design; but it rarely receives the spotlight. Debug has evolved from simple stimulus-response-observe approaches to sophisticated tools, equipment, and methodologies conceived to address increasingly complex designs. Now, in 2017, we sit at the dawn of a new and exciting era with the introduction of debug over functional I/O.

This is the culmination of decades of hard work and invention from around the globe. I've been involved in debug since 1984, so to truly appreciate the paradigm shift we're now experiencing in debug, it's useful to take a look back at the innovation that has taken place over the years.

In the early 1980s, system design was very different from the way things are done today. A typical system would consist of a CPU, (EP)ROM, RAM, and some peripherals (PIC, UART, DMA, timers, I/O...), each implemented in its own IC.

1980s single-board computer (SBC)
(Source: http://oldcomputers.net/ampro-little-board.html)

The typical development flow was to write your code in ASM or C and have it compiled, linked, and located so that you ended up with a HEX file for the ROM image. You would then take the old EPROM(s) out of their sockets on the target board, place them in a UV EPROM eraser, and blast them with UV light for 20 minutes.

EPROM Eraser
(Source: https://lightweightmiata.com/arcade/area51/area5114.jpg)

You then placed the EPROM(s) into an EPROM programmer and downloaded the HEX file from your computer (typically via a serial or parallel interface) to program them.

EPROM Programmer
(Source: http://www.dataman.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/m/e/mempro.jpg)

Finally, you plugged the EPROM(s) back into the target board and powered it up to see if your program worked. If it didn't function as expected, you had several options for debugging your code:

Code Inspection: In this case, you would walk through your code, staring long and hard at it, looking for errors. This technique is still used today by those who view the use of any debugging tool as a failure of programming skill! The other reason to rely on it was that the techniques below were unavailable to you, either because of hardware restrictions or because of their cost.

LEDs: This technique is also still in use today. If you happen to have LEDs, or any other indicator on the target system, you can determine the path through your code by modifying the code to signal a state at significant places in the code. You can then just look at the LEDs to see the progress (or often lack of progress) through your code, thus helping you to determine where to focus your attention. (See also When life fails to provide a debugging interface, blink a RGB LED.) If you had several spare digital IOs and were lucky enough to have access to a logic analyser, you could effectively trace your path through the code in real time by tracing the states (locations) output by your program.
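To make the LED technique concrete, here is a minimal simulation in Python. It assumes a hypothetical target with four LEDs and replaces the real memory-mapped output register with a list, so the "captured" states stand in for what you would see on the LEDs or on a logic analyser watching the pins. The function names (`checkpoint`, `read_sensor`, `main_loop`) are illustrative, not taken from any real codebase.

```python
# Simulated LED port: on real hardware, checkpoint() would write to a
# memory-mapped output register instead of appending to a list.
LED_BITS = 4       # assumption: four LEDs are available on the target
port_history = []  # what a logic analyser watching the pins would capture


def checkpoint(state):
    """Show a checkpoint number on the LEDs (only the low LED_BITS bits fit)."""
    port_history.append(state & ((1 << LED_BITS) - 1))


def read_sensor():
    """Hypothetical driver call; simulates a fault by never returning data."""
    return None


def main_loop():
    checkpoint(1)        # initialisation finished
    reading = read_sensor()
    checkpoint(2)        # sensor read returned
    if reading is None:
        return           # the bug: bail out before processing
    checkpoint(3)        # processing reached (never happens in this run)
```

After calling `main_loop()`, the last state shown is 2, so the developer knows the code got past the sensor read but never reached processing, which narrows the search to the code between checkpoints 2 and 3.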

On-target monitor: For target boards that had a serial port (RS232) and enough free EPROM/RAM to include a monitor program, you could step through your code at the assembly level and display the contents of registers and memory locations. The monitor program was effectively a low-level debugger that you included in your own code. At some point in your program, you would jump into the monitor program and start debugging. The serial port was used to interact with the monitor, and the user would issue commands such as "s" to step an instruction or "m 83C4,16" to display the contents of 16 memory locations starting at address 0x83C4. Once the code was working as expected, the final program would usually be built without the monitor in place.
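The command handling in such a monitor can be sketched as below. This is a hypothetical command interpreter modeled only on the two example commands above; memory is simulated with a `bytearray`, the count in "m 83C4,16" is assumed to be decimal (matching "16 locations"), and the "s" command is a placeholder that merely advances a simulated program counter rather than executing an instruction.

```python
# Hypothetical monitor command handler modeled on the "s" and "m ADDR,COUNT"
# commands described above. Memory and the CPU state are simulated.

def parse_command(line):
    """Split a command line such as 'm 83C4,16' into ('m', ['83C4', '16'])."""
    line = line.strip()
    if not line:
        return None, []
    cmd, _, rest = line.partition(" ")
    args = [a.strip() for a in rest.split(",") if a.strip()]
    return cmd.lower(), args


def dump_memory(memory, addr, count):
    """Format `count` bytes starting at `addr` as one hex dump line."""
    data = " ".join(f"{memory[addr + i]:02X}" for i in range(count))
    return f"{addr:04X}: {data}"


def run_command(line, memory, cpu):
    """Execute one monitor command and return the text sent back over serial."""
    cmd, args = parse_command(line)
    if cmd == "m" and args:
        addr = int(args[0], 16)                         # address is hex
        count = int(args[1]) if len(args) > 1 else 16   # assumption: decimal
        return dump_memory(memory, addr, count)
    if cmd == "s":
        # Placeholder: a real monitor would execute one instruction and
        # report the registers; here we only advance a simulated PC.
        cpu["pc"] += 1
        return f"PC={cpu['pc']:04X}"
    return "?"  # unknown command
```

For example, with `memory = bytearray(0x10000)` and the bytes DE AD BE EF stored at 0x83C4, `run_command("m 83C4,16", memory, cpu)` returns a line beginning `83C4: DE AD BE EF` followed by twelve more bytes.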

In-Circuit Emulator: For those who could afford it, the In-Circuit Emulator (ICE) was the ultimate debug tool. In some ways, this tool provided more functionality than the state-of-the-art debug tools provide developers today! The ICE would replace the CPU in the target system with electronics that emulated the CPU. These ICE tools were large (far larger than a desktop PC) and very expensive -- we are talking many thousands of dollars. In this era, the ICE was typically designed by the CPU manufacturer or one of the major tool companies of the time (Tektronix, HP/Agilent, Microtek, etc.) and would contain a 'bond-out' version of the CPU under emulation. The bond-out CPU literally had extra internal signals brought out to pins on the device so that the emulator could both control the CPU and gain extra visibility into its internal operation. The emulator could watch the operations performed by the CPU and would provide for complex breakpoints and tracing functionality that would be the envy of many a developer today. It was also possible to replace an area of on-target memory (typically the EPROM) with emulation RAM contained in the ICE. This let you download your code into the emulation RAM -- no more erasing and blowing of EPROMs during development -- bliss!

Motorola Exorciser ICE
(Source: http://www.exorciser.net/personal/exorciser/Original%20Files/exorciser.jpg)

(Source: http://www.computinghistory.org.uk/userdata/images/large/PRODPIC-731.jpg)

During the 1980s, three main changes took place for the embedded developer. The first was the appearance of more highly integrated ICs that combined the CPU, PIC, UART, and DMA in a single device. Examples include the Intel 80186/80188, an evolution of the 8086/8088 CPUs (original IBM PC); the Zilog Z180, an evolution of the Z80 (Sinclair Spectrum); and Motorola's integrated parts such as the 68302 and the CPU32-based 68332, which evolved from the 68000 (Apple Lisa).

The second was that the ICE became much more accessible to developers. Several companies had started manufacturing ICE tools at much lower cost than the CPU manufacturers' systems. Many of these companies did not use bond-out chips. Whilst this led to a small decrease in available functionality, it significantly contributed to the increased availability of lower-cost ICE products. An ICE for an 80186 could now be picked up for less than $10,000.

The third was that ever-increasing CPU clock speeds started to cause problems for ICE technology. They posed significant challenges for the cabling systems that ICEs used and for the emulation control technology, which simply could not operate at these speeds without becoming seriously expensive (again). CPU manufacturers were also becoming more reluctant to create bond-out versions of their CPUs, since the extra on-chip connections interfered with chip operation. The solution to these problems was to build the CPU debug control circuitry on-chip. This allowed single-stepping, memory and register access, and breakpoints to operate at full CPU speed, but did not at this time provide trace, which still needed access to the device's external bus interface pins.

This trace was also less complete, because many accesses to internal peripherals never used the external bus: only external accesses were fully visible, and internal peripheral accesses were dark. Access to the on-chip debug (OCD) technology was either through a proprietary interface, typically referred to as BDM (Background Debug Mode), or through the standard JTAG interface, which had traditionally been used for production test rather than debug. These interfaces allowed companies to create low-cost debug tools that controlled CPU execution with no clock speed limitations. Features varied slightly between implementations; for example, some allowed the debug tool to access memory while the CPU was executing, whilst others did not.

External trace eventually died out. Rising CPU clock speeds, coupled with the introduction of internal CPU caches, made it largely useless, since more and more of the execution never appeared on the external bus. However, to diagnose more complex program defects, there was still a need to record the execution path of the CPU. The challenge was how to do this using on-chip logic (so that it could operate at full CPU speed) while transporting the trace data off chip at a feasible clock rate using as few pins as possible. The solution was to transform the execution path of the CPU into a compressed data set, which could be transported off-chip and captured by a debug tool; the tool could then use the data set to reconstruct the execution path. It was realized that if the debug tool had access to the executed program, the compression could be lossy. For example, if only the non-sequential program counter changes were output, the debug tool could "fill in the gaps" using knowledge of the program being executed. IBM's PowerPC, Motorola's ColdFire CPUs, ARM7TDMI-based cores, and others all implemented trace systems based on this concept.
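The "fill in the gaps" idea can be illustrated with a toy model. This sketch is not the packet format of ETM, Nexus, or any other real trace architecture; it assumes fixed 4-byte instructions and only direct conditional branches, whose targets the tool can read from the program image. Under those assumptions the on-chip side only needs to emit one taken/not-taken bit per executed branch, and the tool replays the program to recover every program counter value.

```python
INSTR_SIZE = 4  # assumption: every instruction is 4 bytes long

# Program image known to the debug tool: address -> instruction.
# "branch" is a conditional branch whose target is fixed in the code.
PROGRAM = {
    0x00: ("op",),
    0x04: ("op",),
    0x08: ("branch", 0x00),   # loop back to 0x00 while the condition holds
    0x0C: ("halt",),
}


def compress(program, executed_pcs):
    """On-chip side: emit one taken/not-taken bit per executed branch."""
    bits = []
    for pc, next_pc in zip(executed_pcs, executed_pcs[1:]):
        if program[pc][0] == "branch":
            bits.append(next_pc != pc + INSTR_SIZE)
    return bits


def reconstruct(program, start_pc, bits):
    """Tool side: replay the program image, consuming one bit per branch."""
    pcs, pc, remaining = [], start_pc, iter(bits)
    while True:
        pcs.append(pc)
        kind = program[pc][0]
        if kind == "halt":
            return pcs
        if kind == "branch":
            taken = next(remaining, None)
            if taken is None:
                return pcs  # trace data exhausted: stop here
            pc = program[pc][1] if taken else pc + INSTR_SIZE
        else:
            pc += INSTR_SIZE  # sequential step: the tool "fills in the gap"
```

Running the loop three times produces ten program counter values, yet `compress` emits only three bits (`[True, True, False]`), and `reconstruct(PROGRAM, 0x00, bits)` recovers the full sequence. Real architectures must also handle indirect branches, exceptions, and synchronization points, which is where explicit addresses reappear in the trace stream.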

With the introduction of compressed core trace data sets, it became feasible to choose between transporting the data off chip and/or holding it in a relatively small on-chip trace buffer. In the early 2000s, various vendors strove to improve trace performance; ARM, for example, architected the Embedded Trace Buffer (ETB), which was accessible via JTAG and configurable in size to hold the trace data. This removed the need for a relatively high-speed off-chip trace port (though one still nowhere near core clock speed), at the expense of silicon area in the SoC.

In the mid-2000s, embedded CPU designers started to implement multi-core systems. The designs using ARM IP made use of JTAG technology, with each core appearing in the serial JTAG scan chain. This was not a problem until core power management was implemented, which resulted in cores losing their presence on the JTAG serial scan chain when powered down. JTAG does not support devices appearing and disappearing from the serial scan chain, so this caused complications for both debug tooling and SoC designers. To overcome this, ARM created a new debug architecture called CoreSight. This allowed a single JTAG-based debug access port (one device on the JTAG scan chain) to provide access to many memory-mapped CoreSight components, including all of the ARM cores in the system. Now, CoreSight-compliant devices were free to power down without affecting the JTAG scan chain (you can read more about CoreSight technology in this new whitepaper). This technology is still in use in more modern -- and much more complicated -- ARM IP-based systems that are designed today.
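Why a core vanishing from the chain is so disruptive can be shown with a simplified model. During an instruction-register scan, the IRs of all devices on the chain are concatenated, so a tool must know where each target's IR bits sit in the shifted stream. The sketch below uses made-up device names and illustrative 4-bit IR lengths (real devices vary) and ignores everything else about JTAG state handling.

```python
def ir_offset(chain, target):
    """Bit position of `target`'s instruction register (IR) in an IR scan.

    `chain` lists (name, ir_length) pairs starting with the device nearest
    TDO, whose IR receives the first bits shifted in.
    """
    offset = 0
    for name, ir_length in chain:
        if name == target:
            return offset
        offset += ir_length
    raise KeyError(f"{target} is not on the scan chain")


# Illustrative chains: all cores powered, then core1 powered down.
all_powered = [("core0", 4), ("core1", 4), ("core2", 4)]
core1_off = [("core0", 4), ("core2", 4)]
```

With all cores powered, `ir_offset(all_powered, "core2")` is 8; once core1 drops out it becomes 4, so a tool still shifting at offset 8 would deliver core2's instruction to the wrong place. With a single debug access port as the only device on the chain, its position never changes, and the cores are instead reached through memory-mapped addresses behind it.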

As embedded processors increased in capability, especially with the advent of 64-bit cores, it became more feasible to support on-device debug. Previously, the typical debug system ran debug tooling on a high-powered workstation, using a JTAG/BDM connection to the target system to control execution and trace. As Linux/Android gained widespread use, the kernel was augmented with device drivers to access the on-chip CoreSight components. Using the perf subsystem, on-target trace capture and analysis is now possible.

With the introduction of the ARM Embedded Logic Analyser (ELA), it is now possible to return to the days of the ICE and have access to complex on-chip breakpoints, triggers, and trace with access to internal SoC signals -- just like the old bond-out chips used to provide in the early 1980s.

Today, after nearly 40 years of innovation, we're on the cusp of a new era in debug, one in which engineers can perform debug and trace over functional I/O, thereby saving both time and money. Performing debug over existing device interfaces will not only provide a leaner solution but will also take debug and trace capability to the next level. Thus begins a new chapter in our long and fascinating war against bugs.
