Porting Embedded Windows CE 6.0 R2 to the OMAP-L138: Part 2

The authors discuss the pros and cons of the OMAP’s Programmable Real-Time Unit in the second of a three-part series on porting the Windows CE 6.0 R2 embedded operating system to the Texas Instruments ARM-based family of OMAP-L138 processors.

Part 2: The Pros and Cons of OMAP’s Programmable Real-Time Unit

In addition to the capabilities discussed in Part 1 of this series, an important feature of the OMAP-L138 SoC family that is of enormous benefit to a developer is a separate subsystem, the Programmable Real-Time Unit (PRU). The PRU is based on two 32-bit cores, each with its own memory for instructions and data. Applications of this subsystem (PRUSS) are diverse: it can implement additional interfaces, or service existing interfaces with specific protocols, acting as an auxiliary to the DSP or ARM core.

Figure 6 shows the general structure of the PRU subsystem (PRUSS). This subsystem contains two independent 32-bit cores with their own instruction set, independent of both the DSP and ARM cores.

Figure 6: PRU subsystem structure

These cores have a simplified RISC architecture that supports 40 instructions with a deterministic execution time (one cycle each) and offers extended capabilities for manipulating bits in registers. The cores have no instruction pipeline, no interrupt vectors, and no hardware multiply or divide instructions; all interrupts are handled by polling a flag in one of the registers.

The PRUSS also has its own interrupt controller, which aggregates events from the peripherals and from the ARM, DSP, and PRU cores. This controller can handle 32 events in two directions: from the PRUSS to the ARM and DSP cores, and from the ARM and DSP cores to the PRUSS. The interrupt controller can thus forward any event to the corresponding interrupt controller of the ARM or DSP core, which in turn invokes the interrupt handlers of that core if they are enabled in the respective registers. Using the interrupt controller in the PRU module, it is possible to implement simple interaction not only between the PRU cores, but also between the ARM and DSP cores.

Figure 7 shows the structure of a PRU core. Each PRU core contains 32 registers, an execution unit, a table of 29 constants, and 4 KB of command (instruction) RAM. Independent fast input/output ports (GPIOs) associated with each core are connected directly to two registers, allowing the developer to use either the core’s own communications interfaces or the GPIOs to implement standard interfaces such as UART, CAN, or ProfiBus.

Figure 7: PRU core structure

The command RAM is located in the core itself, which allows every instruction to execute in a single cycle. All four SoC cores (ARM, DSP, PRU0, and PRU1) have access to the data RAM, but each PRU core can execute code only from its own command RAM, even though both PRU cores have access to all peripherals via the central bus.

The availability of such embedded command and data RAM allows the developer to offload the SoC central bus and to interact with the peripherals, mDDR/DDR2 memory, and the ARM/DSP cores with minimal load.

Optionally, the system can manage the power and clocking of the PRU subsystem. The subsystem is clocked at half the ARM core frequency, so when the ARM core runs at 450 MHz the PRU cores run at 225 MHz (about 4.4 ns per instruction). The power manager allows the PRU subsystem to be stopped or disabled when it is not needed, thus reducing the SoC’s overall power consumption.
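The clock relationship above is easy to check with a few lines of C. The following is only a sketch of the arithmetic; the function names are ours and are not part of any TI or Windows CE API.

```c
#include <assert.h>
#include <stdint.h>

/* PRU clock: half of the ARM core clock, as stated in the text. */
uint32_t pru_clock_hz(uint32_t arm_clock_hz)
{
    return arm_clock_hz / 2u;
}

/* Duration of one PRU cycle in picoseconds, rounded to the nearest value.
 * Intended for clocks in the MHz range. */
uint32_t pru_cycle_ps(uint32_t arm_clock_hz)
{
    uint64_t hz = pru_clock_hz(arm_clock_hz);
    assert(hz > 0);  /* guard against division by zero */
    return (uint32_t)((1000000000000ULL + hz / 2u) / hz);
}

/* Number of whole PRU cycles that fit into an interval given in nanoseconds. */
uint64_t pru_cycles_in_ns(uint32_t arm_clock_hz, uint64_t ns)
{
    return (uint64_t)pru_clock_hz(arm_clock_hz) * ns / 1000000000ULL;
}
```

For a 450 MHz ARM clock, pru_clock_hz returns 225000000 and pru_cycle_ps returns 4444, which is the roughly 4.4 ns per instruction quoted above. Because every instruction takes one cycle, pru_cycles_in_ns also gives the instruction budget for a time slot: 225 instructions per microsecond.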

There is no official C compiler for the PRUSS, nor any official support in TI’s Code Composer Studio that we were able to determine. Nevertheless, it is possible to configure the Code Composer Studio environment to compile the PRU module code automatically, which is convenient during development and keeps everything in one project.

To build the executable code, a specialized version of the open source PASM assembler is used, with PRU assembly as the source language. An example of the code for the PRU0 core is shown below:

.setcallreg r28.w2
.origin 0
#include "PRU0.hp"
MOV r0, 0x00000000
ST32 r0, r1
MOV r0, 0x00000000
ST32 r0, r1
MOV32 regEDMA_2_ICR, 0x01C02470
MOV32 regEDMA_3_ICR, 0x01C02670
// Initialize pointer to INTC registers
MOV32 regOffset, 0x00000000
// Clear SYS_EVT
MOV32 r31, 0x00000000
// Global enable of all host interrupts
LDI regVal.w0, 0x0001

The PASM assembler supports several types of output files: binary, C array, HEX file, and others (including an annotated listing). An example of an output file in the form of a C array is shown below:

const unsigned int PRU0_Code[] =


The assembler places the code starting at address zero of the command RAM. This allows the generated C file to be included in the main program, for example the one running on the ARM core, which can then copy the array directly into the command RAM of the appropriate PRU core.
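A minimal sketch of such a copy routine is given below. It assumes that the caller already holds a mapped pointer to the command RAM of the target core; the function name and the return convention are hypothetical, and only the 4 KB size and the unsigned int array type come from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define PRU_IRAM_BYTES 4096u  /* 4 KB of command RAM per core (from the text) */
#define PRU_IRAM_WORDS (PRU_IRAM_BYTES / sizeof(unsigned int))

/* Copies a PASM C-array image to address zero of a PRU command RAM.
 * `iram` is assumed to be an already mapped pointer to that RAM (assumption).
 * Returns 0 on success, -1 if the image is empty or does not fit. */
int pru_load_image(volatile unsigned int *iram,
                   const unsigned int *code, size_t words)
{
    size_t i;

    assert(iram != NULL && code != NULL);
    if (words == 0 || words > PRU_IRAM_WORDS)
        return -1;

    /* Plain word-by-word stores keep every access 32 bits wide. */
    for (i = 0; i < words; ++i)
        iram[i] = code[i];
    return 0;
}
```

With an array such as PRU0_Code, the call would look like pru_load_image(iram, PRU0_Code, sizeof PRU0_Code / sizeof PRU0_Code[0]). The core should be halted before the copy and started only afterwards.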

For environments other than Code Composer Studio, TI supports the use of Notepad++ or TextPad for convenient code development with syntax highlighting. Ready-made setup files that define the syntax of PRU code are provided.

The BSP for Windows CE 6.0 on the OMAP-L138 has no support for the PRU subsystem. Officially, a code loader driver exists only for Linux, and only with a specialized patch. For this reason, in the course of our projects we developed a monolithic driver for the PRU module, with added support for hardware interrupts from the PRU subsystem. This driver is configured to deliver a continuous stream of data during a specific interval of time between interrupts.

Figure 8 shows the driver routines needed for interaction with the OS and user applications. The PRU_Init routine performs the primary initialization of the driver and translates the physical addresses of the memory allocated to the PRU subsystem into virtual ones for further use.

Figure 8: PRU Cores.dll driver functions

The PRU_Deinit routine releases resources when the driver is unloaded. The PRU_PreDeinit and PRU_PreClose routines are implemented as stubs. The remaining routines serve the software/hardware interface. The PRU_Open routine returns the device handle that is used in subsequent DeviceIoControl calls. PRU_Close, in turn, cleans up the context and is executed when CloseHandle is called on the device handle.

The PRU_PowerUp and PRU_PowerDown routines notify the PRU subsystem when the system leaves and enters the Suspend state, respectively. The PRU_IOControl routine contains the entire functional implementation of the driver. It handles the following control codes:

IOCTL_PRU_REQ_INT returns the system interrupt number that belongs to a specific event number (3…10) of the ARM-core interrupt controller;

IOCTL_PRU_RELESE_INT releases the system interrupts allocated using IOCTL_PRU_REQ_INT;

IOCTL_PRU_INT_INIT links a system interrupt to an event handle obtained from the CreateEvent API function, so that the user application can subsequently wait for it with WaitForSingleObject, with the help of the driver’s API routines in the user application;

IOCTL_PRU_INT_DONE signals the core that the user application has processed the interrupt from PRU-core (InterruptDone analogue);

IOCTL_PRU_LOAD_CODE loads code into the command RAM of the PRU core (with a mandatory halt of the core). This operation also takes care of tasks such as powering up the PRU subsystem through the Power and Sleep Controller (PSC);

IOCTL_PRU_MAKE_SINGLESTEP starts program stepping (for debugging);

IOCTL_PRU_RUN starts the PRU core for free-running execution of the program in the command RAM;

IOCTL_PRU_STOP stops the PRU core;

IOCTL_PRU_WAIT_FOR_HALT waits for the PRU core to execute a HALT instruction;

IOCTL_PRU_SET_PC_STARTUP_POINT sets the program startup point;

IOCTL_PRU_SLEEP puts the PRU core into sleep mode, with the option to return to normal mode on various events;

IOCTL_PRU_ENABLE_COUNTER enables the PRU core cycle counter;

IOCTL_PRU_GET_PC_COUNTER returns the address of the instruction currently being executed;

IOCTL_PRU_GET_CYCLE_COUNT returns the cycle counter value;

IOCTL_PRU_SET_CYCLE_COUNT writes a new value to the cycle counter;

IOCTL_PRU_GET_STALL_COUNT returns the number of cycles lost because no instruction was available for execution;

IOCTL_PRU_WRITE_GP writes to general-purpose registers (for debugging);

IOCTL_PRU_READ_GP reads from general-purpose registers (for debugging);

IOCTL_PRU_GET_DRAM_PTR returns a pointer to the data RAM area of the PRU core, mapped into the address space of the user application.
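To make the typical control sequence (load, set the start point, run, query) more concrete, the following portable C model dispatches a handful of these codes against a simulated core. It is only an illustration and not the driver’s code: the numeric code values, the pru_core_model structure, and the pru_model_ioctl function are all hypothetical, and the real driver operates on the PRU control registers rather than on a structure in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical code values; the real ones are defined by the driver. */
enum pru_ioctl {
    IOCTL_PRU_LOAD_CODE = 1,
    IOCTL_PRU_SET_PC_STARTUP_POINT,
    IOCTL_PRU_RUN,
    IOCTL_PRU_STOP,
    IOCTL_PRU_GET_PC_COUNTER
};

#define PRU_IRAM_WORDS 1024u  /* 4 KB of command RAM, 32-bit instructions */

/* Simulated core state standing in for the real control registers. */
struct pru_core_model {
    unsigned int iram[PRU_IRAM_WORDS];
    unsigned int pc;       /* word address of the next instruction */
    int running;
};

/* Returns 1 on success and 0 on failure, like a BOOL-returning handler. */
int pru_model_ioctl(struct pru_core_model *core, enum pru_ioctl code,
                    const void *in, size_t in_len,
                    void *out, size_t out_len)
{
    unsigned int pc;

    assert(core != NULL);
    switch (code) {
    case IOCTL_PRU_LOAD_CODE:
        if (in == NULL || in_len == 0 || in_len > sizeof core->iram ||
            in_len % sizeof(unsigned int) != 0)
            return 0;
        core->running = 0;            /* mandatory halt before loading */
        memcpy(core->iram, in, in_len);
        return 1;
    case IOCTL_PRU_SET_PC_STARTUP_POINT:
        if (in == NULL || in_len != sizeof pc)
            return 0;
        memcpy(&pc, in, sizeof pc);
        if (pc >= PRU_IRAM_WORDS)
            return 0;
        core->pc = pc;
        return 1;
    case IOCTL_PRU_RUN:
        core->running = 1;
        return 1;
    case IOCTL_PRU_STOP:
        core->running = 0;
        return 1;
    case IOCTL_PRU_GET_PC_COUNTER:
        if (out == NULL || out_len < sizeof core->pc)
            return 0;
        memcpy(out, &core->pc, sizeof core->pc);
        return 1;
    }
    return 0;                         /* unknown code */
}
```

In a real user application the same sequence would go through DeviceIoControl on the handle returned by CreateFile, with the input and output buffers playing the role of in and out here.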

The driver API described above is the link between the device and the user application; it simplifies development and shortens the time to a finished product. Several methods are used for debugging applications developed for the PRU subsystem. The one we prefer is exposing checkpoints via data RAM and general-purpose registers with the help of the following hardware operations:

  • interrupts to the ARM/DSP cores;
  • use of the fast input/output port (R30 register); and
  • use of infinite loops, with the address of the instruction being executed stored in a register.

However, in general the choice of the debugging method depends upon the application type and convenience of the method (several methods may be combined).
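As an illustration of the checkpoint approach, the host-side sketch below polls a word of PRU data RAM until the PRU program reports a given checkpoint. The layout is an assumption made for this example only: the PRU code is taken to store its latest checkpoint ID in word 0 of its data RAM, and the function name is ours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: word 0 of data RAM holds the latest checkpoint ID. */
#define DBG_CHECKPOINT_WORD 0u

/* Polls the checkpoint word up to max_polls times.
 * Returns 1 if `expected` was observed, 0 otherwise. */
int pru_wait_checkpoint(const volatile unsigned int *dram,
                        unsigned int expected, unsigned long max_polls)
{
    unsigned long i;

    assert(dram != NULL);
    for (i = 0; i < max_polls; ++i) {
        if (dram[DBG_CHECKPOINT_WORD] == expected)
            return 1;
    }
    return 0;
}
```

In the driver described here, the dram pointer would be the one returned by IOCTL_PRU_GET_DRAM_PTR. The bounded loop keeps the application from hanging if the PRU program never reaches the checkpoint.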