59 research outputs found

    Logram: Efficient Log Parsing Using n-Gram Dictionaries

    Full text link
    Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of automated log analysis is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing is challenging, because logs are produced by static templates in the source code (i.e., logging statements) yet the templates are usually inaccessible when parsing logs. Prior work proposed automated log parsing approaches that have achieved high accuracy. However, as the volume of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log parsing. In this work, we propose an automated log parsing approach, Logram, which leverages n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log datasets and compared Logram with five state-of-the-art log parsing approaches. We found that Logram achieves a similar parsing accuracy to the best existing approaches while outperforms these approaches in efficiency (i.e., 1.8 to 5.1 times faster than the second fastest approaches). Furthermore, we deployed Logram on Spark and we found that Logram scales out efficiently with the number of Spark nodes (e.g., with near-linear scalability) without sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective online parsing of logs, achieving similar parsing results and efficiency with the offline mode.Comment: 13 pages, IEEE journal forma

    Understanding, Analysis, and Handling of Software Architecture Erosion

    Get PDF
    Architecture erosion occurs when a software system's implemented architecture diverges from the intended architecture over time. Studies show erosion impacts development, maintenance, and evolution since it accumulates imperceptibly. Identifying early symptoms like architectural smells enables managing erosion through refactoring. However, research lacks comprehensive understanding of erosion, unclear which symptoms are most common, and lacks detection methods. This thesis establishes an erosion landscape, investigates symptoms, and proposes identification approaches. A mapping study covers erosion definitions, symptoms, causes, and consequences. Key findings: 1) "Architecture erosion" is the most used term, with four perspectives on definitions and respective symptom types. 2) Technical and non-technical reasons contribute to erosion, negatively impacting quality attributes. Practitioners can advocate addressing erosion to prevent failures. 3) Detection and correction approaches are categorized, with consistency and evolution-based approaches commonly mentioned.An empirical study explores practitioner perspectives through communities, surveys, and interviews. Findings reveal associated practices like code review and tools identify symptoms, while collected measures address erosion during implementation. Studying code review comments analyzes erosion in practice. One study reveals architectural violations, duplicate functionality, and cyclic dependencies are most frequent. Symptoms decreased over time, indicating increased stability. Most were addressed after review. A second study explores violation symptoms in four projects, identifying 10 categories. Refactoring and removing code address most violations, while some are disregarded.Machine learning classifiers using pre-trained word embeddings identify violation symptoms from code reviews. Key findings: 1) SVM with word2vec achieved highest performance. 2) fastText embeddings worked well. 3) 200-dimensional embeddings outperformed 100/300-dimensional. 4) Ensemble classifier improved performance. 5) Practitioners found results valuable, confirming potential.An automated recommendation system identifies qualified reviewers for violations using similarity detection on file paths and comments. Experiments show common methods perform well, outperforming a baseline approach. Sampling techniques impact recommendation performance

    Enhancing Automated GUI Exploration Techniques for Android Mobile Applications

    Get PDF
    Mobile software applications ("apps") are used by billions of smartphone owners worldwide. The demand for quality to these apps has grown together with their spread. Therefore, effective techniques and tools are being requested to support developers in mobile app quality engineering activities. Automation tools can facilitate these activities since they can save humans from routine, time consuming and error prone manual tasks. Automated GUI exploration techniques are widely adopted by researchers and practitioners in the context of mobile apps for supporting critical engineering tasks such as reverse engineering, testing, and network traffic signature generation. These techniques iteratively exercise a running app by exploiting the information that the app exposes at runtime through its GUI to derive the set of input events to be fired. Although several automated GUI exploration techniques have been proposed in the literature, they suffer from some limitations that may hinder them from a thorough app exploration. This dissertation proposes two novel solutions that contribute to the literature in Software Engineering towards improving existing automated GUI exploration techniques for mobile software applications. The former is a fully automated GUI exploration technique that aims to detect issues tied to the app instances lifecycle, a mobile-specific feature that allows users to smoothly navigate through an app and switch between apps. In particular, this technique addresses the issues of crashes and GUI failures, that consists in the manifestation of unexpected GUI states. This work includes two exploratory studies that prove that GUI failures are a widespread problem in the context of mobile apps. The latter solution is a hybrid exploration technique that combines automated GUI exploration with capture and replay through machine learning. It exploits app-specific knowledge that only human users can provide in order to explore relevant parts of the application that can be reached only by firing complex sequences of input events on specific GUIs and by choosing specific input values. Both the techniques have been implemented in tools that target the Android Operating System, that is today the world’s most popular mobile operating system. The effectiveness of the proposed techniques is demonstrated through experimental evaluations performed on real mobile apps

    Exploring means to facilitate software debugging

    Get PDF
    In this thesis, several aspects of software debugging from automated crash reproduction to bug report analysis and use of contracts have been studied.Algorithms and the Foundations of Software technolog

    On the Challenges of Implementing Machine Learning Systems in Industry

    Get PDF
    RÉSUMÉ : Dans l’optique de ce mémoire, nous nous concentrons sur les défis de l’implantation de sys-tèmes d’apprentissage automatique dans le contexte de l’industrie. Notre travail est réparti sur deux volets: dans un premier temps, nous explorons des considérations fondamentales sur le processus d’ingénierie de systèmes d’apprentissage automatique et dans un second temps, nous explorons l’aspect pratique de l’ingénierie de tels systèmes dans un cadre industriel. Pour le premier volet, nous explorons un des défis récemment mis en évidence par la com-munauté scientifique: la reproducibilité. Nous expliquons les défis qui s’y rattachent et, à la lueur de cette nouvelle compréhension, nous explorons un des effets rattachés, omniprésent dans l’ingénierie logicielle: la présence de défaut logiciels. À l’aide d’une méthodologie rigoureuse nous cherchons à savoir si la présence de défauts logiciels, parmis un échantillon de taille fixe, dans un cadriciel d’apprentissage automatique impacte le résultat d’un processus d’apprentissage.----------ABSTRACT : Software engineering projects face a number of challenges, ranging from managing their life-cycle to ensuring proper testing methodologies, dealing with defects, building, deploying, among others. As machine learning is becoming more prominent, introducing machine learn-ing in new environments requires skills and considerations from software engineering, machine learning and computer engineering, while also sharing their challenges from these disciplines. As democratization of machine learning has increased by the presence of open-source projects led by both academia and industry, industry practitioners and researchers share one thing in common: the tools they use. In machine learning, tools are represented by libraries and frameworks used as software for the various steps necessary in a machine learning project. In this work, we investigate the challenges in implementing machine learning systems in the industry

    The Essence of Software Engineering

    Get PDF
    Software Engineering; Software Development; Software Processes; Software Architectures; Software Managemen

    Scientific History of Incipit in the period 2010-2016

    Get PDF
    Historial de la actividad científica y técnica del Instituto de Ciencias del Patrimonio (Incipit) del CSIC, basado en Santiago de Compostela, desde su fecha de creación (2010) hasta el año 2016. Se presentan la misión y las líneas de investigación del Incipit, centradas principalmente en el estudio de los procesos de patrimonialización y de valorización social del patrimonio cultural realizadas con una perspectiva transdisciplinar. Se relacionan las publicaciones, proyectos de investigación, actividades de ciencia pública, eventos de comunicación y productos de divulgación que su personal investigador ha producido a lo largo de estos años.General introduction to the Incipit. Presentation of the Research Line: Cultural Heritage Studies: Sub-Theme: Landscape Archaeology and Cultural Landscapes, Sub-theme: Heritagization Processes: Memory, Power and Ethnicity, Sub-theme: Socioeconomics of Cultural Heritage, Sub-theme: Archaeology of the Contemporary Past, Sub-theme: Material culture and formalization processes of cultural heritage. Scientific Contributions. Transfer of Knowledge. International Activities. Other Activities and Results. Scientific DisseminationN

    Plataforma colaborativa, distribuida, escalable y de bajo costo basada en microservicios, contenedores, dispositivos móviles y servicios en la Nube para tareas de cómputo intensivo

    Get PDF
    A la hora de resolver tareas de cómputo intensivo de manera distribuida y paralela, habitualmente se utilizan recursos de hardware x86 (CPU/GPU) e infraestructura especializada (Grid, Cluster, Nube) para lograr un alto rendimiento. En sus inicios los procesadores, coprocesadores y chips x86 fueron desarrollados para resolver problemas complejos sin tener en cuenta su consumo energético. Dado su impacto directo en los costos y el medio ambiente, optimizar el uso, refrigeración y gasto energético, así como analizar arquitecturas alternativas, se convirtió en una preocupación principal de las organizaciones. Como resultado, las empresas e instituciones han propuesto diferentes arquitecturas para implementar las características de escalabilidad, flexibilidad y concurrencia. Con el objetivo de plantear una arquitectura alternativa a los esquemas tradicionales, en esta tesis se propone ejecutar las tareas de procesamiento reutilizando las capacidades ociosas de los dispositivos móviles. Estos equipos integran procesadores ARM los cuales, en contraposición a las arquitecturas tradicionales x86, fueron desarrollados con la eficiencia energética como pilar fundacional, ya que son mayormente alimentados por baterías. Estos dispositivos, en los últimos años, han incrementado su capacidad, eficiencia, estabilidad, potencia, así como también masividad y mercado; mientras conservan un precio, tamaño y consumo energético reducido. A su vez, cuentan con lapsos de ociosidad durante los períodos de carga, lo que representa un gran potencial que puede ser reutilizado. Para gestionar y explotar adecuadamente estos recursos, y convertirlos en un centro de datos de procesamiento intensivo; se diseñó, desarrolló y evaluó una plataforma distribuida, colaborativa, elástica y de bajo costo basada en una arquitectura compuesta por microservicios y contenedores orquestados con Kubernetes en ambientes de Nube y local, integrada con herramientas, metodologías y prácticas DevOps. El paradigma de microservicios permitió que las funciones desarrolladas sean fragmentadas en pequeños servicios, con responsabilidades acotadas. Las prácticas DevOps permitieron construir procesos automatizados para la ejecución de pruebas, trazabilidad, monitoreo e integración de modificaciones y desarrollo de nuevas versiones de los servicios. Finalmente, empaquetar las funciones con todas sus dependencias y librerías en contenedores ayudó a mantener servicios pequeños, inmutables, portables, seguros y estandarizados que permiten su ejecución independiente de la arquitectura subyacente. Incluir Kubernetes como Orquestador de contenedores, permitió que los servicios se puedan administrar, desplegar y escalar de manera integral y transparente, tanto a nivel local como en la Nube, garantizando un uso eficiente de la infraestructura, gastos y energía. Para validar el rendimiento, escalabilidad, consumo energético y flexibilidad del sistema, se ejecutaron diversos escenarios concurrentes de transcoding de video. De esta manera se pudo probar, por un lado, el comportamiento y rendimiento de diversos dispositivos móviles y x86 bajo diferentes condiciones de estrés. Por otro lado, se pudo mostrar cómo a través de una carga variable de tareas, la arquitectura se ajusta, flexibiliza y escala para dar respuesta a las necesidades de procesamiento. Los resultados experimentales, sobre la base de los diversos escenarios de rendimiento, carga y saturación planteados, muestran que se obtienen mejoras útiles sobre la línea de base de este estudio y que la arquitectura desarrollada es lo suficientemente robusta para considerarse una alternativa escalable, económica y elástica, respecto a los modelos tradicionales.Facultad de Informátic

    A reactive architecture for cloud-based system engineering

    Get PDF
    PhD ThesisSoftware system engineering is increasingly practised over globally distributed locations. Such a practise is termed as Global Software Development (GSD). GSD has become a business necessity mainly because of the scarcity of resources, cost, and the need to locate development closer to the customers. GSD is highly dependent on requirements management, but system requirements continuously change. Poorly managed change in requirements affects the overall cost, schedule and quality of GSD projects. It is particularly challenging to manage and trace such changes, and hence we require a rigorous requirement change management (RCM) process. RCM is not trivial in collocated software development; and with the presence of geographical, cultural, social and temporal factors, it makes RCM profoundly difficult for GSD. Existing RCM methods do not take into consideration these issues faced in GSD. Considering the state-of-the-art in RCM, design and analysis of architecture, and cloud accountability, this work contributes: 1. an alternative and novel mechanism for effective information and knowledge-sharing towards RCM and traceability. 2. a novel methodology for the design and analysis of small-to-medium size cloud-based systems, with a particular focus on the trade-off of quality attributes. 3. a dependable framework that facilitates the RCM and traceability method for cloud-based system engineering. 4. a novel methodology for assuring cloud accountability in terms of dependability. 5. a cloud-based framework to facilitate the cloud accountability methodology. The results show a traceable RCM linkage between system engineering processes and stakeholder requirements for cloud-based GSD projects, which is better than existing approaches. Also, the results show an improved dependability assurance of systems interfacing with the unpredictable cloud environment. We reach the conclusion that RCM with a clear focus on traceability, which is then facilitated by a dependable framework, improves the chance of developing a cloud-based GSD project successfully
    corecore