Retargeting Mobile Applications to Java Bytecode
Android applications are developed in Java but compiled to platform-specific Dalvik bytecode. Dalvik bytecode runs in the Dalvik virtual machine, which was designed for resource-constrained platforms such as smartphones and tablets. Since existing analysis frameworks target Java source code and Java bytecode, Android applications must be converted to these well-known Java formats before those frameworks can be applied.
ded is a project which aims at decompiling Android applications. The ded tool retargets Android applications in .dex format to traditional .class files. These .class files can then be processed by existing Java tools, including decompilers. Thus, Android applications can be analyzed using a vast range of techniques developed for traditional Java applications.
ded was the first tool able to reliably convert Android applications to source code, and it was used in a seminal large-scale analysis of Android applications. We decompiled the 1,100 most popular applications using ded, then analyzed the decompiled code with Fortify Source Code Analyzer (SCA), for which we implemented Android-specific detection rules. While this analysis did not reveal any malware, we found that phone identifiers and other personally identifiable information were widely used by Android applications.
The Dare tool, in contrast, adopts a principled approach to Dalvik retargeting. Its typed intermediate representation relies on a strong type inference algorithm and allows translation to Java bytecode using only 9 rules for all 257 Dalvik opcodes. An important feature of Dare is its ability to rewrite unverifiable input bytecode so that the output Java bytecode is verifiable. These stronger methods make it a better retargeting tool than ded, our first (ad hoc) retargeting tool: Dare is more reliable at retargeting Android bytecode and generates verifiable Java bytecode in the vast majority of cases. To enable the analysis of retargeted Android code by other researchers, we have made Dare available for download. Both binaries and source code are available from the Dare webpage.
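To make the idea of retargeting concrete, the sketch below translates a few register-based instructions into stack-based ones. It is only an illustration: the instruction tuples, the `retarget` function and its tiny rule table are hypothetical and are not Dare's rule set. It also assumes that every register holds an int, which is precisely the information a real retargeting tool has to infer.

```python
# Hypothetical, simplified instruction forms; this is not Dare's actual rule set.
# Assumption: every register holds an int, so int-typed stack opcodes are valid.
BINARY_OPS = {"add-int": "iadd", "sub-int": "isub", "mul-int": "imul"}


def retarget(instr, dest, *srcs):
    """Translate one register-based instruction into stack-based instructions."""
    if instr in BINARY_OPS:
        left, right = srcs
        # Push both operands, apply the operator, store the result.
        return [f"iload {left}", f"iload {right}", BINARY_OPS[instr], f"istore {dest}"]
    if instr == "move":
        (src,) = srcs
        # A register copy becomes a load followed by a store.
        return [f"iload {src}", f"istore {dest}"]
    raise ValueError(f"unsupported instruction: {instr}")


# add-int v0, v1, v2  (v0 = v1 + v2)
stack_code = retarget("add-int", 0, 1, 2)
print("\n".join(stack_code))
```

A single table entry covers a whole family of opcodes with the same shape, which gives some intuition for how a small number of rules can cover many opcodes.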
Damien Octeau, Somesh Jha and Patrick McDaniel. Retargeting Android Applications to Java Bytecode. 20th International Symposium on the Foundations of Software Engineering (FSE). Cary, NC. November 2012. Best Artifact Award
William Enck, Damien Octeau, Patrick McDaniel and Swarat Chaudhuri. A Study of Android Application Security. Proceedings of the 20th USENIX Security Symposium. San Francisco, CA, August 2011.
Damien Octeau, William Enck and Patrick McDaniel. The ded Decompiler. Technical Report NAS-TR-0140-2010, Network and Security Research Center, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA.
Composite Constant Propagation and its Application to Program Analysis for Security
Many threats present in smartphones are the result of interactions between application components, not just artifacts of single components. For example, information may flow between components in an unsafe manner. A component in an application may retrieve a user's location data or contacts. It may subsequently send the sensitive private information to a component in another application. The receiving component may then leak the sensitive information to the network, to an untrusted third party.
We reduce the discovery of Inter-Component Communication (ICC) to an instance of the Interprocedural Distributive Environment (IDE) data flow problem. This approach is very accurate, conservatively keeping track of multiple execution branches. It is flow-sensitive, inter-procedural and context-sensitive. Our implementation of this approach is called Epicc (Effective and Precise ICC). It scales well, taking on average less than two minutes per application in a large-scale study of 1,200 applications. Epicc takes Java classes as input, which can be generated from Android bytecode using our Dare retargeting tool.
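As a rough illustration of what conservatively tracking multiple execution branches means, the sketch below propagates string constants through a toy program and joins the results of both sides of a branch. It is not the IDE algorithm and not Epicc's implementation; the statement encoding and the `analyze` function are hypothetical.

```python
def analyze(statements, env=None):
    """Return a map from variable name to the set of constants it may hold."""
    env = dict(env or {})
    for stmt in statements:
        if stmt[0] == "assign":          # ("assign", variable, constant)
            _, var, value = stmt
            env[var] = {value}           # a later assignment overwrites earlier ones
        elif stmt[0] == "branch":        # ("branch", then_block, else_block)
            _, then_block, else_block = stmt
            then_env = analyze(then_block, env)
            else_env = analyze(else_block, env)
            # Join: keep every value that either branch may produce.
            env = {
                var: then_env.get(var, set()) | else_env.get(var, set())
                for var in then_env.keys() | else_env.keys()
            }
    return env


# The action is first set to VIEW, then possibly overwritten with SEND.
program = [
    ("assign", "action", "VIEW"),
    ("branch", [("assign", "action", "SEND")], []),
]
result = analyze(program)
print(sorted(result["action"]))
```

After the branch, the analysis reports both possible actions instead of picking one, which is what keeps the result conservative.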
While Epicc is a significant improvement over state-of-the-art approaches, it is still limited in coverage, due to the difficulty of individually specifying data domains and transfer functions. Thus, we generalize the problem of inferring values of objects with composite types as composite constant propagation problems. We introduce the COAL language to specify composite constant propagation problems and implement a solver that automatically generates data domains and transfer functions. Solutions are then found using existing algorithms, requiring minimal intervention from the analyst.
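The idea of generating transfer functions from a declarative description can be sketched as follows. The `SPEC` table is a hypothetical stand-in for a specification and does not use COAL syntax; `make_transfer` and `propagate` are illustrative names, and the real solver is considerably more general.

```python
# Hypothetical specification: method name -> (field of the composite value, operation).
SPEC = {
    "setAction": ("action", "replace"),
    "addCategory": ("categories", "add"),
}


def make_transfer(method, argument):
    """Build a transfer function for one call from the specification."""
    field, operation = SPEC[method]

    def transfer(value):
        updated = dict(value)
        if operation == "replace":
            updated[field] = frozenset([argument])
        else:  # "add": accumulate into the existing set of values
            updated[field] = value.get(field, frozenset()) | {argument}
        return updated

    return transfer


def propagate(calls, initial=None):
    """Apply the generated transfer functions in program order."""
    value = dict(initial or {})
    for method, argument in calls:
        value = make_transfer(method, argument)(value)
    return value


intent = propagate([
    ("setAction", "SEND"),
    ("addCategory", "DEFAULT"),
    ("addCategory", "BROWSABLE"),
    ("setAction", "VIEW"),
])
print(intent)
```

The analyst only states which method affects which field and how; the functions that update the composite value are derived from that table rather than written by hand.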
Using COAL, we build IC3, a tool for inferring ICC with significantly better precision than Epicc. Unlike Epicc, it models all ICC primitives. IC3 itself is used as the basis of inter-component information flow analysis in the related IccTA tool. COAL was also used with success to resolve reflection in Android applications.
Damien Octeau, Daniel Luchaup, Somesh Jha, and Patrick McDaniel. Composite Constant Propagation and its Application to Android Program Analysis. IEEE Transactions on Software Engineering (TSE), vol. 42, no. 11, pp. 999-1014, November 2016.
Li Li, Tegawende F. Bissyande, Damien Octeau, and Jacques Klein. DroidRA: Taming Reflection to Support Whole-Program Analysis of Android Apps. Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA). Saarbrucken, Germany, July 2016. Acceptance rate: 25.17%.
Damien Octeau, Daniel Luchaup, Matthew Dering, Somesh Jha, and Patrick McDaniel. Composite Constant Propagation: Application to Android Inter-Component Communication Analysis. Proceedings of the 37th International Conference on Software Engineering (ICSE), May 2015. Florence, Italy. Acceptance rate: 18.5%.
Li Li, Alexandre Bartel, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, and Patrick McDaniel. I Know What Leaked in Your Pocket: Uncovering Privacy Leaks on Android Apps with Static Taint Analysis. Proceedings of the 37th International Conference on Software Engineering (ICSE), May 2015. Florence, Italy. Acceptance rate: 18.5%.
Damien Octeau, Patrick McDaniel, Somesh Jha, Alexandre Bartel, Eric Bodden, Jacques Klein, and Yves Le Traon. Effective Inter-Component Communication Mapping in Android with Epicc: An Essential Step Towards Holistic Security Analysis. Proceedings of the 22nd USENIX Security Symposium, August 2013. Washington, DC. Acceptance rate: 16.2%.
Combining Static Analysis Results with Probabilistic Models
Despite the many techniques devised to increase the precision of static analysis, the precision of its results is often not high enough for large-scale analysis. This is because the static inference of many properties is undecidable, and the inference of others is too computationally expensive. The problem is especially acute with the rise of centralized application markets, where market providers may want to verify properties (e.g., security) across their entire corpus. At that scale, imprecise results are not acceptable.
We explore the use of probabilistic models in order to help sift through large numbers of results and prioritize them by decreasing order of likelihood. We apply this to the computation of links between over 10,000 Android applications with our PRIMO tool. We find that probabilistic models are an effective and accurate way to predict which links computed with static analysis are most likely to be false positives.
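Prioritizing links by likelihood can be sketched as below. The scoring is a deliberately naive assumption and is not PRIMO's model: each intent field that static analysis could not resolve (represented here as `None`) is assumed to match with a fixed, hypothetical prior, and fields are treated as independent. The link records and field names are also hypothetical.

```python
FIELDS = ("action", "category", "data")


def match_probability(link, prior_unknown=0.1):
    """Naive estimate of how likely a statically computed link is to be real.

    Assumption: fields are independent, and an unresolved field (None) matches
    with probability prior_unknown. Both choices are purely illustrative.
    """
    probability = 1.0
    for field in FIELDS:
        sent = link["intent"].get(field)
        accepted = link["filter"].get(field)
        if sent is None:                 # value not resolved by static analysis
            probability *= prior_unknown
        elif sent != accepted:           # resolved and incompatible: no link
            return 0.0
    return probability


def rank(links):
    """Order links from most to least likely."""
    return sorted(links, key=match_probability, reverse=True)


target = {"action": "VIEW", "category": "DEFAULT", "data": "http"}
links = [
    {"name": "two-unknown", "intent": {"action": "VIEW"}, "filter": target},
    {"name": "resolved", "intent": dict(target), "filter": target},
    {"name": "one-unknown", "intent": {"action": "VIEW", "category": "DEFAULT"}, "filter": target},
]
ranked_names = [link["name"] for link in rank(links)]
print(ranked_names)
```

Links at the bottom of such a ranking are the ones most likely to be false positives, so they are the natural candidates to deprioritize.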
Damien Octeau, Somesh Jha, Matthew Dering, Patrick McDaniel, Alexandre Bartel, Li Li, Jacques Klein, and Yves Le Traon. Combining Static Analysis with Probabilistic Models to Enable Market-Scale Android Inter-Component Analysis. Proceedings of the 43rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), January 2016. St. Petersburg, Florida, USA. Acceptance rate: 23.3%.