Program Identifiability: How easily can you spot your code?

If someone stole your Android application, decompiled it, modified it, and sold it as their own, would you know? Now that most software is bought, sold, and downloaded online, this is a growing problem. Smartphone apps and Facebook apps in particular can be very profitable, which gives would-be pirates plenty of incentive to steal them and release them under a different title.


Applications written in traditional programming languages like C or C++ are compiled into native machine code, but more modern programming languages like Java are compiled into bytecode, an intermediate form that retains more information about the program itself, such as class and method names. That extra information makes bytecode easier to decompile back to source code or to reverse-engineer to understand how it works.


Also, the open nature of today’s platforms means that reverse engineering of applications is relatively easy, and many developers are concerned as applications similar to their own show up in the Android Market or Apple App Store. These developers want to know if their applications have been pirated. Fortunately, the same characteristics that make a smartphone app easy to reverse engineer and copy also provide opportunities for developers to compare downloaded applications to their own.


At Zeidman Consulting we have developed a process for comparing a developer’s application with a downloaded application, and we have defined an identifiability metric to quantify the degree to which an application can be identified by its bytecode or object code. Note that identifiability can be a positive or negative characteristic. A program that is easily identifiable after compilation may be easier to detect when it’s been pirated, even if it’s later modified. A program that’s difficult to identify after compilation may hide more of its trade secrets from reverse engineering and may be more difficult to duplicate.


Comparing and measuring
We decided to test our process on some popular Android games. Android applications are written in Java and compiled to bytecode, which is delivered in an Android Package file (APK), a compressed archive in ZIP format. The bytecode for the application is found in the classes.dex file of the application APK. We decided to try two approaches for comparing the source code of the original app to the bytecode of the downloaded app: 1) compare the original source code directly to the bytecode of the downloaded app, and 2) decompile the bytecode into source code and compare the original source code to the decompiled source code of the downloaded app (Table 1, below).
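
As a rough illustration of the APK layout just described (and not the comparison process used at Zeidman Consulting), the following Python sketch opens an APK as a ZIP archive, pulls out classes.dex, and reports a hash plus a few counts from the DEX header. The function names read_dex and summarize_dex are hypothetical, and the header offsets are assumptions taken from the published DEX format for a little-endian file.

```python
import hashlib
import struct
import zipfile

DEX_MAGIC_PREFIX = b"dex\n"   # a DEX file starts with "dex\n", a 3-digit version, and a NUL
DEX_HEADER_SIZE = 112         # assumed size of the standard DEX header, in bytes


def read_dex(apk_file, entry="classes.dex"):
    """Return the raw bytes of the DEX entry inside an APK (a ZIP archive).

    apk_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(apk_file) as apk:
        return apk.read(entry)


def summarize_dex(dex_bytes):
    """Collect a hash and a few header counts for a quick side-by-side look."""
    if len(dex_bytes) < DEX_HEADER_SIZE or not dex_bytes.startswith(DEX_MAGIC_PREFIX):
        raise ValueError("not a DEX file")
    # Assumed header offsets (little-endian uint32 fields):
    # 56 = string_ids_size, 88 = method_ids_size, 96 = class_defs_size
    (string_ids,) = struct.unpack_from("<I", dex_bytes, 56)
    (method_ids,) = struct.unpack_from("<I", dex_bytes, 88)
    (class_defs,) = struct.unpack_from("<I", dex_bytes, 96)
    return {
        "version": dex_bytes[4:7].decode("ascii"),
        "sha256": hashlib.sha256(dex_bytes).hexdigest(),
        "string_ids": string_ids,
        "method_ids": method_ids,
        "class_defs": class_defs,
    }
```

Calling summarize_dex(read_dex("original.apk")) and summarize_dex(read_dex("downloaded.apk")), where both file names are placeholders, gives two small dictionaries to compare. Matching hashes would indicate a byte-for-byte copy of the bytecode; anything short of that only suggests where a closer comparison might start.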




Table 1. Software source code elements.