352 research outputs found

    Do Memories Haunt You? An Automated Black Box Testing Approach for Detecting Memory Leaks in Android Apps

    Get PDF
    Memory leaks represent a remarkable problem for mobile app developers since a waste of memory due to bad programming practices may reduce the available memory of the device, slow down the apps, reduce their responsiveness and, in the worst cases, they may cause the crash of the app. A common cause of memory leaks in the specific context of Android apps is the bad handling of the events tied to the Activity Lifecycle. In order to detect and characterize these memory leaks, we present FunesDroid, a tool-supported black box technique for the automatic detection of memory leaks tied to the Activity Lifecycle in Android apps. FunesDroid implements a testing approach that can find memory leaks by analyzing unnecessary heap object replications after the execution of three different sequences of Activity Lifecycle events. In the paper, we present an exploratory study that shows the capability of the proposed technique to detect memory leaks and to characterize them in terms of their size, persistence and growth trend. The study also illustrates how memory leak causes can be detected with the support of the information provided by the FunesDroid tool

    Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development

    Full text link
    Mobile devices and platforms have become an established target for modern software developers due to performant hardware and a large and growing user base numbering in the billions. Despite their popularity, the software development process for mobile apps comes with a set of unique, domain-specific challenges rooted in program comprehension. Many of these challenges stem from developer difficulties in reasoning about different representations of a program, a phenomenon we define as a "language dichotomy". In this paper, we reflect upon the various language dichotomies that contribute to open problems in program comprehension and development for mobile apps. Furthermore, to help guide the research community towards effective solutions for these problems, we provide a roadmap of directions for future work.Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18

    Characterizing the evolution of statically-detectable performance issues of Android apps

    Get PDF
    Mobile apps are playing a major role in our everyday life, and they are tending to become more and more complex and resource demanding. Because of that, performance issues may occur, disrupting the user experience or, even worse, preventing an effective use of the app. Ultimately, such problems can cause bad reviews and influence the app success. Developers deal with performance issues thorough dynamic analysis, i.e., performance testing and profiler tools, albeit static analysis tools can be a valid, relatively inexpensive complement for the early detection of some such issues. This paper empirically investigates how potential performance issues identified by a popular static analysis tool — Android Lint — are actually resolved in 316 open source Android apps among 724 apps we analyzed. More specifically, the study traces the issues detected by Android Lint since their introduction until they resolved, with the aim of studying (i) the overall evolution of performance issues in apps, (ii) the proportion of issues being resolved, as well as (iii) the distribution of their survival time, and (iv) the extent to which issue resolution are documented by developers in commit messages. Results indicate how some issues, especially related to the lack of resource recycle, tend to be more frequent than others. Also, while some issues, primarily of algorithmic nature, tend to be resolved quickly through well-known patterns, others tend to stay in the app longer, or not to be resolved at all. Finally, we found how only 10% of the issue resolution is documented in commit messages

    Prioritized Garbage Collection: Explicit GC Support for Software Caches

    Full text link
    Programmers routinely trade space for time to increase performance, often in the form of caching or memoization. In managed languages like Java or JavaScript, however, this space-time tradeoff is complex. Using more space translates into higher garbage collection costs, especially at the limit of available memory. Existing runtime systems provide limited support for space-sensitive algorithms, forcing programmers into difficult and often brittle choices about provisioning. This paper presents prioritized garbage collection, a cooperative programming language and runtime solution to this problem. Prioritized GC provides an interface similar to soft references, called priority references, which identify objects that the collector can reclaim eagerly if necessary. The key difference is an API for defining the policy that governs when priority references are cleared and in what order. Application code specifies a priority value for each reference and a target memory bound. The collector reclaims references, lowest priority first, until the total memory footprint of the cache fits within the bound. We use this API to implement a space-aware least-recently-used (LRU) cache, called a Sache, that is a drop-in replacement for existing caches, such as Google's Guava library. The garbage collector automatically grows and shrinks the Sache in response to available memory and workload with minimal provisioning information from the programmer. Using a Sache, it is almost impossible for an application to experience a memory leak, memory pressure, or an out-of-memory crash caused by software caching.Comment: to appear in OOPSLA 201

    Man-machine partial program analysis for malware detection

    Get PDF
    With the meteoric rise in popularity of the Android platform, there is an urgent need to combat the accompanying proliferation of malware. Existing work addresses the area of consumer malware detection, but cannot detect novel, sophisticated, domain-specific malware that is targeted specifically at one aspect of an organization (eg. ground operations of the US Military). Adversaries can exploit domain knowledge to camoflauge malice within the legitimate behaviors of an app and behind a domain-specific trigger, rendering traditional approaches such as signature-matching, machine learning, and dynamic monitoring ineffective. Manual code inspections are also inadequate, scaling poorly and introducing human error. Yet, there is a dire need to detect this kind of malware before it causes catastrophic loss of life and property. This dissertation presents the Security Toolbox, our novel solution for this challenging new problem posed by DARPA\u27s Automated Program Analysis for Cybersecurity (APAC) program. We employ a human-in-the-loop approach to amplify the natural intelligence of our analysts. Our automation detects interesting program behaviors and exposes them in an analysis Dashboard, allowing the analyst to brainstorm flaw hypotheses and ask new questions, which in turn can be answered by our automated analysis primitives. The Security Toolbox is built on top of Atlas, a novel program analysis platform made by EnSoft. Atlas uses a graph-based mathematical abstraction of software to produce a unified property multigraph, exposes a powerful API for writing analyzers using graph traversals, and provides both automated and interactive capabilities to facilitate program comprehension. The Security Toolbox is also powered by FlowMiner, a novel solution to mine fine-grained, compact data flow summaries of Java libraries. FlowMiner allows the Security Toolbox to complete a scalable and accurate partial program analysis of an application without including all of the libraries that it uses (eg. Android). This dissertation presents the Security Toolbox, Atlas, and FlowMiner. We provide empirical evidence of the effectiveness of the Security Toolbox for detecting novel, sophisticated, domain-specific Android malware, demonstrating that our approach outperforms other cutting-edge research tools and state-of-the-art commercial programs in both time and accuracy metrics. We also evaluate the effectiveness of Atlas as a program analysis platform and FlowMiner as a library summary tool

    Techniques for advanced android malware triage

    Get PDF
    Mención Internacional en el título de doctorAndroid is the leading operating system in smartphones with a big difference. Statistics show that 88% of all smartphones sold to end users in the second quarter of 2018 were phones with the Android OS. Regardless of the operating systems which are running on smartphones, most of the functionalities of these devices are offered through applications. There are currently over 2 million apps only on the official Google store, known as Google Play. This huge market with billions of users is tempting for attackers to develop and distribute their malicious apps (or malware). Mobile malware has raised explosively since 2009. Symantec reported an increase of 54% in the new mobile malware variants in 2017 as compared to the previous year. Additionally, more incentive has been provided for profit-driven malware by the growth of black markets. This rise has happened for Android malware as well since only 20% of devices are running the newest major version of Android OS based on Symantec report in 2018. Android continued to be the most targeted platform with the biggest number of attacks in 2015. After that year, attacks against the Android platform slowed for the first time as attackers were faced with improved security architectures though Android is still the main appealing target OS for attackers. Moreover, advanced types of Android malware are found which make use of extensive anit-analysis techniques to evade static or dynamic analysis. To address the security and privacy concerns of complex Android malware, this dissertation focuses on three main objectives. First of all, we propose a light-weight yet efficient method to identify risky Android applications. Next, we present a precise approach to characterize Android malware based on their malicious behavior. Finally, we propose an adaptive learning system to address the security concerns of obfuscation in Android malware. Identifying potentially dangerous and risky applications is an important step in Android malware analysis. To this end, we develop a triage system to rank applications based on their potential risk. Our approach, called TriFlow, relies on static features which are quick to obtain. TriFlow combines a probabilistic model to predict the existence of information flows with a metric of how significant a flow is in benign and malicious apps. Based on this, TriFlow provides a score for each application that can be used to prioritize analysis. It also provides the analysts with an explanatory report of the associated risk. Our tool can also be used as a complement with computationally expensive static and dynamic analysis tools. Another important step towards Android malware analysis lies in their accurate characterization. Labeling Android malware is challenging yet crucially important, as it helps to identify upcoming malware samples and threats. A key challenge is that different researchers and anti-virus vendors assign labels using their own criteria, and it is not known to what extent these labels are aligned with the apps’ real behavior. Based on this, we propose a new behavioral characterization method for Android apps based on their extracted information flows. As information flows can be used to track why and how apps use specific pieces of information, a flowbased characterization provides a relatively easy-to-interpret summary of the malware sample’s behavior. Not all Android malware are easy to analyze due to advanced and easyto-apply anti-analysis techniques that are available nowadays. Obfuscation is the most common anti-analysis technique that Android malware use to evade detection. Obfuscation techniques modify an app’s source (or machine) code in order to make it more difficult to analyze. This is typically applied to protect intellectual property in benign apps, or to hinder the process of extracting actionable information in the case of malware. Since malware analysis often requires considerable resource investment, detecting the particular obfuscation technique used may contribute to apply the right analysis tools, thus leading to some savings. Therefore, we propose AndrODet, a mechanism to detect three popular types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. AndrODet leverages online learning techniques, thus being suitable for resource-limited environments that need to operate in a continuous manner. We compare our results with a batch learning algorithm using a dataset of 34,962 apps from both malware and benign apps. Experimental results show that online learning approaches are not only able to compete with batch learning methods in terms of accuracy, but they also save significant amount of time and computational resources. Finally, we present a number of open research directions based on the outcome of this thesis.Android es el sistema operativo líder en teléfonos inteligentes (también denominados con la palabra inglesa smartphones), con una gran diferencia con respecto al resto de competidores. Las estadísticas muestran que el 88% de todos los smartphones vendidos a usuarios finales en el segundo trimestre de 2018 fueron teléfonos con sistema operativo Android. Independientemente de su sistema operativo, la mayoría de las funcionalidades de estos dispositivos se ofrecen a través de aplicaciones. Actualmente hay más de 2 millones de aplicaciones solo en la tienda oficial de Google, conocida como Google Play. Este enorme mercado con miles de millones de usuarios es tentador para los atacantes, que buscan distribuir sus aplicaciones malintencionadas (o malware). El malware para dispositivos móviles ha aumentado de forma exponencial desde 2009. Symantec ha detectado un aumento del 54% en las nuevas variantes de malware para dispositivos móviles en 2017 en comparación con el año anterior. Además, el crecimiento del mercado negro (es decir, plataformas no oficiales de descargas de aplicaciones) supone un incentivo para los programas maliciosos con fines lucrativos. Este aumento también ha ocurrido en el malware de Android, aprovechando la circunstancia de que solo el 20% de los dispositivos ejecutan la versión mas reciente del sistema operativo Android, de acuerdo con el informe de Symantec en 2018. De hecho, Android ha sido la plataforma que ha centrado los esfuerzos de los atacantes desde 2015, aunque los ataques decayeron ligeramente tras ese año debido a las mejoras de seguridad incorporadas en el sistema operativo. En todo caso, existen formas avanzadas de malware para Android que hacen uso de técnicas sofisticadas para evadir el análisis estático o dinámico. Para abordar los problemas de seguridad y privacidad que causa el malware en Android, esta Tesis se centra en tres objetivos principales. En primer lugar, se propone un método ligero y eficiente para identificar aplicaciones de Android que pueden suponer un riesgo. Por otra parte, se presenta un mecanismo para la caracterización del malware atendiendo a su comportamiento. Finalmente, se propone un mecanismo basado en aprendizaje adaptativo para la detección de algunos tipos de ofuscación que son empleados habitualmente en las aplicaciones maliciosas. Identificar aplicaciones potencialmente peligrosas y riesgosas es un paso importante en el análisis de malware de Android. Con este fin, en esta Tesis se desarrolla un mecanismo de clasificación (llamado TriFlow) que ordena las aplicaciones según su riesgo potencial. La aproximación se basa en características estáticas que se obtienen rápidamente, siendo de especial interés los flujos de información. Un flujo de información existe cuando un cierto dato es recibido o producido mediante una cierta función o llamada al sistema, y atraviesa la lógica de la aplicación hasta que llega a otra función. Así, TriFlow combina un modelo probabilístico para predecir la existencia de un flujo con una métrica de lo habitual que es encontrarlo en aplicaciones benignas y maliciosas. Con ello, TriFlow proporciona una puntuación para cada aplicación que puede utilizarse para priorizar su análisis. Al mismo tiempo, proporciona a los analistas un informe explicativo de las causas que motivan dicha valoración. Así, esta herramienta se puede utilizar como complemento a otras técnicas de análisis estático y dinámico que son mucho más costosas desde el punto de vista computacional. Otro paso importante hacia el análisis de malware de Android radica en caracterizar su comportamiento. Etiquetar el malware de Android es un desafío de crucial importancia, ya que ayuda a identificar las próximas muestras y amenazas de malware. Una cuestión relevante es que los diferentes investigadores y proveedores de antivirus asignan etiquetas utilizando sus propios criterios, de modo no se sabe en qué medida estas etiquetas están en línea con el comportamiento real de las aplicaciones. Sobre esta base, en esta Tesis se propone un nuevo método de caracterización de comportamiento para las aplicaciones de Android en función de sus flujos de información. Como dichos flujos se pueden usar para estudiar el uso de cada dato por parte de una aplicación, permiten proporcionar un resumen relativamente sencillo del comportamiento de una determinada muestra de malware. A pesar de la utilidad de las técnicas de análisis descritas, no todos los programas maliciosos de Android son fáciles de analizar debido al uso de técnicas anti-análisis que están disponibles en la actualidad. Entre ellas, la ofuscación es la técnica más común que se utiliza en el malware de Android para evadir la detección. Dicha técnica modifica el código de una aplicación para que sea más difícil de entender y analizar. Esto se suele aplicar para proteger la propiedad intelectual en aplicaciones benignas o para dificultar la obtención de pistas sobre su funcionamiento en el caso del malware. Dado que el análisis de malware a menudo requiere una inversión considerable de recursos, detectar la técnica de ofuscación que se ha utilizado en un caso particular puede contribuir a utilizar herramientas de análisis adecuadas, contribuyendo así a un cierto ahorro de recursos. Así, en esta Tesis se propone AndrODet, un mecanismo para detectar tres tipos populares de ofuscación, a saber, el renombrado de identificadores, cifrado de cadenas de texto y la modificación del flujo de control de la aplicación. AndrODet se basa en técnicas de aprendizaje automático en línea (online machine learning), por lo que es adecuado para entornos con recursos limitados que necesitan operar de forma continua, sin interrupción. Para medir su eficacia respecto de las técnicas de aprendizaje automático tradicionales, se comparan los resultados con un algoritmo de aprendizaje por lotes (batch learning) utilizando un dataset de 34.962 aplicaciones de malware y benignas. Los resultados experimentales muestran que el enfoque de aprendizaje en línea no solo es capaz de competir con el basado en lotes en términos de precisión, sino que también ahorra una gran cantidad de tiempo y recursos computacionales. Tras la exposición de las contribuciones anteriormente mencionadas, esta Tesis concluye con la identificación de una serie de líneas abiertas de investigación con el fin de alentar el desarrollo de trabajos futuros en esta dirección.Omid Mirzaei is a Ph.D. candidate in the Computer Security Lab (COSEC) at the Department of Computer Science and Engineering of Universidad Carlos III de Madrid (UC3M). His Ph.D. is funded by the Community of Madrid and the European Union through the research project CIBERDINE (Ref. S2013/ICE-3095).Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: Gregorio Martínez Pérez.- Secretario: Pedro Peris López.- Vocal: Pablo Picazo Sánche

    Energyware engineering: techniques and tools for green software development

    Get PDF
    Tese de Doutoramento em Informática (MAP-i)Energy consumption is nowadays one of the most important concerns worldwide. While hardware is generally seen as the main culprit for a computer’s energy usage, software too has a tremendous impact on the energy spent, as it can cancel the efficiency introduced by the hardware. Green Computing is not a newfield of study, but the focus has been, until recently, on hardware. While there has been advancements in Green Software techniques, there is still not enough support for software developers so they can make their code more energy-aware, with various studies arguing there is both a lack of knowledge and lack of tools for energy-aware development. This thesis intends to tackle these two problems and aims at further pushing forward research on Green Software. This software energy consumption issue is faced as a software engineering question. By using systematic, disciplined, and quantifiable approaches to the development, operation, and maintenance of software we defined several techniques, methodologies, and tools within this document. These focus on providing software developers more knowledge and tools to help with energy-aware software development, or Energyware Engineering. Insights are provided on the energy influence of several stages performed during a software’s development process. We look at the energy efficiency of various popular programming languages, understanding which are the most appropriate if a developer’s concern is energy consumption. A detailed study on the energy profiles of different Java data structures is also presented, alongwith a technique and tool, further providing more knowledge on what energy efficient alternatives a developer has to choose from. To help developers with the lack of tools, we defined and implemented a technique to detect energy inefficient fragments within the source code of a software system. This technique and tool has been shown to help developers improve the energy efficiency of their programs, and even outperforming a runtime profiler. Finally, answers are provided to common questions and misconceptions within this field of research, such as the relationship between time and energy, and howone can improve their software’s energy consumption. This thesis provides a great effort to help support both research and education on this topic, helps continue to grow green software out of its infancy, and contributes to solving the lack of knowledge and tools which exist for Energyware Engineering.Hoje em dia o consumo energético é uma das maiores preocupações a nível global. Apesar do hardware ser, de umaforma geral, o principal culpado para o consumo de energia num computador, o software tem também um impacto significativo na energia consumida, pois pode anular, em parte, a eficiência introduzida pelo hardware. Embora Green Computing não seja uma área de investigação nova, o foco tem sido, até recentemente, na componente de hardware. Embora as técnicas de Green Software tenham vindo a evoluir, não há ainda suporte suficiente para que os programadores possam produzir código com consciencialização energética. De facto existemvários estudos que defendem que existe tanto uma falta de conhecimento como uma escassez de ferramentas para o desenvolvimento energeticamente consciente. Esta tese pretende abordar estes dois problemas e tem como foco promover avanços em green software. O tópico do consumo de energia é abordado duma perspectiva de engenharia de software. Através do uso de abordagens sistemáticas, disciplinadas e quantificáveis no processo de desenvolvimento, operação e manutencão de software, foi possível a definição de novas metodologias e ferramentas, apresentadas neste documento. Estas ferramentas e metodologias têm como foco dotar de conhecimento e ferramentas os programadores de software, de modo a suportar um desenvolvimento energeticamente consciente, ou Energyware Engineering. Deste trabalho resulta a compreensão sobre a influência energética a ser usada durante as diferentes fases do processo de desenvolvimento de software. Observamos as linguagens de programação mais populares sobre um ponto de vista de eficiência energética, percebendo quais as mais apropriadas caso o programador tenha uma preocupação com o consumo energético. Apresentamos também um estudo detalhado sobre perfis energéticos de diferentes estruturas de dados em Java, acompanhado por técnicas e ferramentas, fornecendo conhecimento relativo a quais as alternativas energeticamente eficientes que os programadores dispõem. Por forma a ajudar os programadores, definimos e implementamos uma técnica para detetar fragmentos energicamente ineficientes dentro do código fonte de um sistema de software. Esta técnica e ferramenta têm demonstrado ajudar programadores a melhorarem a eficiência energética dos seus programas e em algum casos superando um runtime profiler. Por fim, são dadas respostas a questões e conceções erradamente formuladas dentro desta área de investigação, tais como o relacionamento entre tempo e energia e como é possível melhorar o consumo de energia do software. Foi empregue nesta tese um esforço árduo de suporte tanto na investigação como na educação relativo a este tópico, ajudando à maturação e crescimento de green computing, contribuindo para a resolução da lacuna de conhecimento e ferramentas para suporte a Energyware Engineering.This work is partially funded by FCT – Foundation for Science and Technology, the Portuguese Ministry of Science, Technology and Higher Education, through national funds, and co-financed by the European Social Fund (ESF) through the Operacional Programme for Human Capital (POCH), with scholarship reference SFRH/BD/112733/2015. Additionally, funding was also provided the ERDF – European Regional Development Fund – through the Operational Programmes for Competitiveness and Internationalisation COMPETE and COMPETE 2020, and by the Portuguese Government through FCT project Green Software Lab (ref. POCI-01-0145-FEDER-016718), by the project GreenSSCM - Green Software for Space Missions Control, a project financed by the Innovation Agency, SA, Northern Regional Operational Programme, Financial Incentive Grant Agreement under the Incentive Research and Development System, Project No. 38973, and by the Luso-American Foundation in collaboration with the National Science Foundation with grant FLAD/NSF ref. 300/2015 and ref. 275/2016
    corecore