Here are techniques for exploiting Android’s strengths and managing its limitations, especially in hard real-time, mission-critical systems.
The details surrounding the meteoric rise of Android in the smartphone market are well documented. However, another revolution is taking place in other applications where Android provides distinct advantages over a “standard” Linux distribution. Android provides a tightly coupled environment for application development in which Google selects the frameworks and middleware components. Traditional Linux distributions are typically “mix and match” (for example, some people prefer X11/KDE rather than Qt/Embedded for graphics development), which forces the software designer to invest time in understanding complex options and to make difficult choices that typically affect the product throughout its lifecycle.
For these and other reasons, many refer to Android as “Linux made easy.” Today, even Windows Embedded Compact (WinCE) developers who once shied away from Linux due to its complexity are taking a second look at well-integrated Android solutions. Add into the mix a platform licensing approach free from “copyleft” burdens (copyleft being a licensing method that makes a program free and requires modified versions to remain free as well), along with a cost you cannot beat (free), and you have Android’s recipe for success.
Google’s vision of connecting Android devices to a cloud and sharing movies, music, books, and more via a single user account is sure to fuel adoption even further. Android implementations today can be found in a large number of applications, including tablets, e-readers, Internet TVs, portable media players, netbooks, GPS devices, digital cameras, personal accessories, exercise equipment, and more. Android is free, and anybody can download the sources and use them for whatever purpose they wish. For example, a digital still camera could be shipped with a GPS receiver, WiFi, and Android, along with the apps for Flickr, Picasa, Shutterfly, and other services that allow photos to be uploaded directly to the cloud from the camera. In the past, this would have taken months of software development and testing for each model of camera and each photo-sharing website. With Android, the camera maker can rely on the cloud vendor, such as Flickr, to maintain and develop the app for Android. All the camera vendor has to do is port Android to the device.
Android is seeing adoption in areas where it’s not inherently strong because it adds so much value in the areas previously discussed. With clever system-on-chip (SoC) silicon and software architectures, these Android limitations can be mitigated. Here are some tools, tips, and architectures that help do that.
OpenMAX Integration Layer
Silicon vendors can use several optimization techniques with the OpenMAX Integration Layer (IL) to add value to their silicon hardware offerings. (OpenMAX is a royalty-free application programming interface from the Khronos Group, a nonprofit consortium.) Android’s multimedia frameworks, including PacketVideo OpenCORE and Google Stagefright, are built on OpenMAX IL-based codec components. OpenMAX IL defines the integration layer: it gives application developers a consistent, abstract interface to codecs, whether they are implemented in hardware or software. It also goes one step further with the ability to “tunnel” the communication between two components so the application using the components does not get involved in every data transfer.
Heterogeneous systems, or systems with more than one kind of processing core (such as digital signal processors [DSPs], general-purpose processors, hardware accelerators, and field-programmable gate arrays), can be further refined by distributing the OpenMAX IL components across the processing cores, tunneling data transfers between them, and eliminating the costly involvement of the host processor in moving buffers between components. Of course, this implies that the heterogeneous system needs to support shared memory across the processing elements. The goal in this approach is twofold:
• Minimize or eliminate memory copies of large video buffers.
• Offload CPU-intensive work to dedicated hardware, relieving the host CPU of this burden.
Tunneling has the potential to reduce not only host CPU utilization but also latency, which is of paramount importance in applications such as enterprise video conferencing. Figure 1 shows OpenMAX used in tunnel mode.
The Android frameworks today work with the “non-tunneled” method of communicating between components, but this does not preclude apps written using the Android native development kit from taking advantage of the OpenMAX IL tunneled approach for use cases that can really benefit from it. Silicon vendors can take advantage of their silicon features by implementing their own tunneling approach as a way to deliver differentiating performance to their end customers. Ideally, this would be hidden from the end Android Java programmer, though developers also have free access to the necessary source code and can make these enhancements themselves in an effort to squeeze out performance.
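The difference between the two modes can be illustrated with a toy model. The sketch below is plain Python, not the OpenMAX IL API; the Component and Client classes and the run_pipeline function are hypothetical names invented for illustration. It only counts how many buffer hand-offs the host-side application has to service in each mode.

```python
class Component:
    """Hypothetical stand-in for a codec or display component."""

    def __init__(self, name):
        self.name = name
        self.received = []       # buffers delivered to this component's input
        self.tunnel_peer = None  # downstream component when output is tunneled

    def consume(self, buffer):
        self.received.append(buffer)

    def produce(self, buffer, client):
        if self.tunnel_peer is not None:
            # Tunneled: hand the buffer straight to the next component.
            self.tunnel_peer.consume(buffer)
        else:
            # Non-tunneled: the client on the host CPU must forward it.
            client.forward(buffer)


class Client:
    """Hypothetical host-side application that owns the pipeline."""

    def __init__(self, sink):
        self.sink = sink
        self.transfers_handled = 0

    def forward(self, buffer):
        self.transfers_handled += 1
        self.sink.consume(buffer)


def run_pipeline(frames, tunneled):
    """Push frames from a decoder to a display; return (host transfers, frames shown)."""
    decoder, display = Component("decoder"), Component("display")
    client = Client(display)
    if tunneled:
        decoder.tunnel_peer = display
    for frame in frames:
        decoder.produce(frame, client)
    return client.transfers_handled, display.received
```

In both modes the display receives every frame; what changes is whether the host-side client is woken for each transfer. A real implementation would also have to negotiate buffer ownership and port formats, which this model leaves out.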
Even in the “non-tunneled” OpenMAX IL approach, some techniques can be applied to reduce CPU consumption. For instance, Stagefright’s default display method converts the output of the codec from YCbCr, the more common pixel format used for video-compression algorithms, to RGB color space (shown in Figure 2). A more efficient method takes the output from the codec and displays it directly (with no memory copies or color space conversion) as YCbCr.
Of course, the SoC must have native YCbCr display support in the form of an overlay, and the SurfaceFlinger (graphics composition engine in Android) has to be modified to take advantage of the YCbCr overlay. Once done, the overlay reduces memory bandwidth and CPU utilization.
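To get a feel for the work that a native YCbCr overlay avoids, the sketch below converts a single pixel and counts the conversions per second for a video stream. It is an illustration in Python, not Stagefright's code: the coefficients are the full-range BT.601 (JFIF) values, which is an assumption since the article does not say which matrix is used, and the function names are made up for this example.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range 8-bit YCbCr pixel to RGB (assumed BT.601/JFIF coefficients)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def conversions_per_second(width, height, fps):
    """Number of per-pixel conversions the CPU would perform each second."""
    return width * height * fps
```

Under these assumptions, a 1280x720 stream at 30 frames per second needs 27,648,000 such conversions every second, on top of the memory traffic of reading each YCbCr frame and writing the RGB result. Displaying the codec output directly removes that work from the CPU.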
Digital signal processing
Some SoCs have a powerful embedded DSP, in addition to an ARM core or video accelerator, that can add serious processing power to the host CPU, especially for tasks such as complex math or intensive signal-processing algorithms. The challenge here is how to expose that processing power to applications written in Java without requiring the application developer to know anything about DSP programming.
We classify these SoCs as heterogeneous, as the processing cores have different architectures and instruction sets. The DSP typically runs an RTOS, so the Linux kernel is not controlling or even aware of the DSP. Some form of inter-processor communication (IPC) is used to communicate between the cores, typically providing a master-slave relationship between the general-purpose processor (GPP) and the DSP. The GPP can load code and data for the DSP into memory, pull the DSP out of reset, and put it back in reset. Some form of basic messaging service is also available for low-level communication.
To abstract the locality of a DSP function, a framework for remote procedure calls (RPCs) can be used. The processing cores on the SoC may not have the same C type sizes or endianness, so the RPC has to prepare the arguments of the DSP function for the other processing core using a process called marshalling (see Figure 3). On the remote core, the function parameters need to be unmarshalled to native types before being passed to the actual function on the DSP. The return value of the function is treated in a similar way, but this time coming back from the DSP.
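A minimal sketch of marshalling and unmarshalling follows. It uses Python's struct module only to show the idea; as the article notes, a real RPC layer would normally be written in C or C++. The wire format here (a little-endian 32-bit function ID, a 32-bit argument count, then 32-bit signed arguments) and the function names are assumptions made for this example, not part of any particular RPC framework.

```python
import struct

# Assumed wire format: every field has a fixed width and an explicit byte
# order, so both cores agree on the layout regardless of native C types.
HEADER = struct.Struct("<II")  # uint32 function id, uint32 argument count


def marshal_call(func_id, args):
    """Pack a function id and a list of int32 arguments into bytes."""
    payload = HEADER.pack(func_id, len(args))
    payload += struct.pack("<%di" % len(args), *args)
    return payload


def unmarshal_call(payload):
    """Recover the function id and int32 arguments on the receiving core."""
    func_id, count = HEADER.unpack_from(payload, 0)
    args = struct.unpack_from("<%di" % count, payload, HEADER.size)
    return func_id, list(args)
```

For example, marshal_call(3, [10, -1]) produces a 16-byte message, and unmarshal_call on those bytes returns (3, [10, -1]). The return value would travel back from the DSP using the same kind of fixed-layout message.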
The RPC also needs to manage the cache for any buffers passed between the cores. If the cache is dirty for a buffer about to be sent to the DSP (or back from the DSP), it needs to be written back before the buffer is passed to the other core, or that core will read stale data. Similarly, when a buffer is received from another core, the cache for this buffer needs to be invalidated before the buffer is accessed.
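The ordering of these cache operations can be shown with a toy model. The sketch below simulates two cores with private write-back caches over one shared memory; it is plain Python with hypothetical class names, whereas real code would call the cache-maintenance routines of the kernel or RTOS for the buffer's address range.

```python
class SharedMemory:
    """Physical memory visible to both cores."""

    def __init__(self, size):
        self.data = bytearray(size)


class CoreCache:
    """Toy write-back cache for one core, covering the whole buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = None   # cached copy; None when nothing is cached
        self.dirty = False

    def read(self):
        if self.lines is None:
            self.lines = bytearray(self.memory.data)  # cache fill from memory
        return bytes(self.lines)

    def write(self, payload):
        self.read()  # make sure the buffer is cached before modifying it
        self.lines[:len(payload)] = payload
        self.dirty = True  # the change exists only in this core's cache

    def write_back(self):
        if self.dirty:
            self.memory.data[:] = self.lines
            self.dirty = False

    def invalidate(self):
        self.lines = None
        self.dirty = False


def send_buffer(sender, receiver):
    """Hand a buffer to the other core: write back first, then invalidate."""
    sender.write_back()
    receiver.invalidate()
    return receiver.read()
```

If the GPP writes to the buffer and the DSP reads it without these two steps, the DSP sees either old memory contents or its own stale cached copy. Calling send_buffer(gpp, dsp) performs the write-back and the invalidate in the order the paragraph describes.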
Both the RPC and IPC are typically written in C or C++, since the operations are machine- and architecture-specific and complete control of memory and types is required. The DSP’s RPC functions can be wrapped in the Java Native Interface (JNI), allowing an Android Java application to call DSP functions remotely and transparently.
In addition, some DSPs on SoCs have a flat memory model, meaning the DSP accesses the memory bus directly rather than through a memory management unit. Android is built on the Linux kernel, which divides memory into 4,096-byte “pages” on an ARM processor; because the pages backing a buffer can be scattered throughout the physical memory map, memory obtained through a normal “malloc” call cannot be accessed by the DSP as a single buffer.
In this case, design teams must use a custom memory allocator that obtains physically contiguous buffers from the Linux kernel for the DSP to operate on. Java doesn’t offer this kind of granular memory control, but it does provide “direct byte buffers” as part of java.nio.ByteBuffer. The memory behind these buffers is not managed by the JVM and its garbage collector. It’s possible to wrap buffers allocated with the custom, contiguous Linux memory allocator in such a direct byte buffer using the JNI, after which they can be used like any other java.nio.ByteBuffer in an Android Java application while calling the DSP RPC functions.
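The scale of the paging problem is easy to work out. The sketch below is illustrative Python with hypothetical helper names; the 720p YCbCr 4:2:0 frame size used in the example is an assumption, chosen only to give a realistic buffer size.

```python
PAGE_SIZE = 4096  # page size given in the text for Linux on ARM


def pages_needed(buffer_bytes, page_size=PAGE_SIZE):
    """Number of pages required to hold a buffer (ceiling division)."""
    return (buffer_bytes + page_size - 1) // page_size


def is_physically_contiguous(page_frames):
    """True if the physical page frame numbers are consecutive and ascending."""
    return all(b == a + 1 for a, b in zip(page_frames, page_frames[1:]))
```

A single 1280x720 frame at 12 bits per pixel occupies 1,382,400 bytes, so pages_needed returns 338. With an ordinary allocation, nothing guarantees that those 338 physical pages are adjacent, and a DSP without a memory management unit can only follow a buffer whose pages pass a check like is_physically_contiguous.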
Real time can be broken down into subcategories of hard, firm, and soft, depending on the application’s tolerance to missing a deadline. In hard real time, this tolerance is zero and missing a deadline is considered a system failure. This section will focus on hard real time as this is often encountered in mission-critical embedded processing.
Let’s consider the example of a car’s antilock braking system. Data must be guaranteed to be processed within a specific time period no matter the overall system processing load. Imagine a driver scrolling through MP3s on an Android-based in-car computer system who looks up to see the car in front braking rapidly and slams on the brakes. If the system load is high from all the graphics operations of scrolling through the MP3 list, the antilock brake system may not get serviced in time and may fail to operate. This is an extreme example, but it illustrates how important real-time processing is to embedded systems. As Android finds its way into more and more end equipment, real-time capabilities will further increase in importance.
Android certainly has some challenges meeting these real-time requirements. Android is based on the Linux kernel and, just like Linux, can’t be considered a real-time operating system (RTOS). This is even more true given the extensive use of the Java virtual machine (VM) for middleware and application development. Java’s asynchronous garbage collection makes meeting real-time scheduling deadlines even more difficult.
One way of overcoming these limitations and obtaining true real-time performance is to partition your software in such a way that the user interface (UI) or main app runs on the host CPU and the real-time functions run on a separate processing core that’s running an RTOS. Data could be captured in real-time on the separate processing core, processed, and sent back to the Android application for display on the UI or saved to a file or network resource.
A key area of concern when deploying Android in a safety-critical environment is boot times. Referring to the antilock braking system example, it wouldn’t be much use if antilock brakes only became available two minutes after a driver turned the key of a car, because it might take this long for Android to start up and to finish cataloging new MP3 files. One approach is to run all critical systems on a heterogeneous core that can be booted in seconds, before the main processor even begins to boot Android. Once these system-critical functions are operational, Android can boot up in its own time and begin communicating with the rest of the non–mission-critical systems to provide a UI for interacting with them (for example, engine monitoring system showing gas mileage and engine condition).
Power consumption is another big concern, especially on portable devices, where any technique that helps extend battery life is increasingly valuable. Even devices that are permanently connected to a power source experience phantom load, the electric power consumed by electronic appliances while they’re switched off or in standby mode. This is a hot topic, and ways to minimize power consumption are increasingly important.
In an Android-based system, the main processor, display, and graphics can be put into a very deep sleep, leaving critical systems running on a very low-power CPU, such as an ARM Cortex-M3. This deep sleep allows critical communications with head-end equipment or safety-critical utilities to keep running while the device appears off to the user and phantom load is minimized.
Little green muscle man
Android is a very powerful operating environment in which to build feature-rich applications and to leverage an ever-growing catalogue of applications from a diverse set of authors. However, for applications that need low media latency, mission-critical operation, hard real-time behavior, or heavy signal or algorithmic processing, native Android does not necessarily provide the best fit. By employing some of the techniques mentioned in this article, the full range of Android benefits can be coupled with advanced embedded features to produce optimized end equipment that retains all of Android’s highly desirable characteristics.
Juan Gonzales is a product marketing manager for the DaVinci digital media processors at Texas Instruments (TI). Juan has a master’s degree in computer engineering from the University of Central Florida and is completing his MBA from the University of Texas.
Darren Etheridge has 15 years of experience in embedded systems. He is currently leading the team that provides accelerated multimedia frameworks on a variety of TI’s video-centric devices with a recent emphasis on Android. Darren graduated from the University of Plymouth, UK in 1996 with a B.Sc. in computing informatics.
Niclas Anderberg has 10 years of experience in embedded systems, mainly focusing on Linux and DSP software. Niclas is currently leading the effort to enable TI’s DSPs in Android on TI’s SoC devices. Niclas graduated from the University of Lund, Sweden in 2001 with an M.Sc. in computer science and engineering.
This article provided courtesy of Embedded.com and Embedded Systems Design magazine.
This material was first printed in Embedded Systems Design magazine.
Copyright © 2011
UBM–All rights reserved.