11 research outputs found

    Predictive analysis of incidents based on software deployments

    Get PDF
    A high number of information technology organizations face problems during and after deploying their services; this, together with the high number of services they provide daily, makes the Incident Management (IM) process quite demanding. An effective IM system needs to enable decision-makers to detect problems easily; otherwise, organizations can face unscheduled system downtime and/or unplanned costs. This study demonstrates that it is possible to introduce a predictive process that may improve the response time to incidents and reduce the number of incidents created by deployments. By predicting these problems, decision-makers can better allocate resources and mitigate costs. Therefore, this research investigates whether machine learning algorithms can help predict the number of incidents caused by a given deployment. The results showed, with some confidence, that it is possible to predict whether a given deployment will have an incident in the future.

    Predictive analysis of incidents based on software deployments

    A high number of IT organizations have problems when deploying their services; this, together with the high number of services that organizations operate daily, makes the Incident Management (IM) process quite demanding. An effective IM system needs to enable decision-makers to detect problems easily; otherwise, organizations can face unscheduled system downtime and/or unplanned costs. By predicting these problems, decision-makers can better allocate resources and mitigate costs. Therefore, this research aims to predict those problems by examining the history of past deployments and incident-ticket creation, relating them with machine learning algorithms to predict the number of incidents of a given deployment. The results are analyzed with the algorithms most commonly used in the literature.
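The prediction task described in the two abstracts above can be sketched as a toy nearest-neighbour classifier over past deployments. This is a minimal illustration, not the algorithms evaluated in the research; the deployment features (change size, services touched, prior incidents) are invented for the example.

```python
import math

def predict_incident(history, candidate, k=3):
    """Vote among the k past deployments closest to the candidate.
    history: list of (feature_vector, label) pairs, label 1 = had incident."""
    dists = sorted(
        (math.dist(feats, candidate), label) for feats, label in history
    )
    votes = [label for _, label in dists[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0

# Hypothetical features: [files changed, services touched, prior incidents]
history = [
    ([3, 1, 0], 0),   # small change, clean history -> no incident
    ([40, 5, 2], 1),  # large change, prior incidents -> incident
    ([2, 1, 0], 0),
    ([55, 7, 3], 1),
    ([4, 2, 1], 0),
]
print(predict_incident(history, [45, 6, 2]))  # a large, risky deployment
```

In practice the studies relate real deployment records to incident tickets and compare several learners from the literature; the point here is only the shape of the data and the prediction step.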

    A top-down strategy to reverse architecting execution views for a large and complex software-intensive system:An experience report

    This article is an experience report about the application of a top-down strategy to use and embed an architecture reconstruction approach in the incremental software development process of the Philips MRI scanner, a representative large and complex software-intensive system. The approach is an iterative process to construct execution views without being overwhelmed by the system's size and complexity. An execution view contains architectural information that describes what the software of a software-intensive system does at runtime and how it does this. The application of the strategy is illustrated with a case study: the construction of an up-to-date execution view for the start-up process of the Philips MRI scanner. The construction of this view helped the development organization quickly reduce the start-up time of the scanner by about 30%, and set up a new system benchmark for assuring system performance through future evolution steps. The report provides detailed information about the application of the top-down strategy, including how it supports top-down analysis and communication within the development organization, and the aspects that influence the use of the top-down strategy in other contexts. (C) 2010 Elsevier B.V. All rights reserved.

    From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files

    Abstract: Log files generated by computational systems contain relevant and essential information. In some application areas, such as the design of integrated circuits, log files generated by design tools contain information that can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises challenges concerning the extraction of information from log files. Log files are usually multi-source and multi-format, and have a heterogeneous and evolving structure. Moreover, they usually do not respect natural language grammar and structure even though they are written in English. Classical methods of information extraction, such as terminology extraction methods, are ill-suited to this context. In this paper, we introduce our approach, Exterlog, to extract terminology from log files, and detail how it deals with the specific features of such textual data. Performance is improved by favoring the most relevant terms of the domain based on a scoring function that uses a Web- and context-based measure. The experiments show that Exterlog is a well-adapted approach for terminology extraction from log files.
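The extraction step described above can be sketched as simple bigram harvesting from log lines, ranked by frequency. This is a crude stand-in for Exterlog, whose real pipeline handles multi-format logs and scores terms with a Web- and context-based measure; the log lines below are invented.

```python
import re
from collections import Counter

def candidate_terms(log_lines):
    """Collect candidate bigram terms from log lines.
    Tokens are alphabetic/underscore runs; numbers and symbols are skipped,
    since log lines rarely follow natural-language grammar."""
    terms = Counter()
    for line in log_lines:
        tokens = re.findall(r"[A-Za-z_]+", line.lower())
        terms.update(zip(tokens, tokens[1:]))
    return terms

logs = [
    "ERROR timing_violation on clock_net clk_main",
    "WARN timing_violation slack -0.12 on path u1/u2",
    "INFO placement done for clock_net clk_main",
]
# Rank bigrams by raw frequency as a toy relevance score
for (a, b), n in candidate_terms(logs).most_common(3):
    print(f"{a} {b}: {n}")
```

A real scoring function would then filter these candidates against domain context rather than trusting raw counts.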

    Fault Prediction and Localization with Test Logs

    Software testing is an integral part of modern software development. However, test runs produce thousands of lines of logged output, which makes it difficult to find the cause of a fault in the logs. This problem is exacerbated by environmental failures that distract from product faults. In this thesis, we present techniques that reduce the number of log lines that testers manually investigate while still finding a maximal number of faults. We observe that the location of a fault should be contained in the lines of a failing log. In contrast, a passing log should not contain the lines related to a failure. Lines that occur in both a passing and a failing log introduce noise when attempting to find the fault in a failing log. We introduce a novel approach in which we remove the lines that occur in the passing log from the failing log. After removing these lines, we use information retrieval techniques to flag the most probable lines for investigation. We modify TF-IDF to identify the most relevant log lines related to past product failures. We then vectorize the logs and develop an exclusive version of KNN to identify which logs are likely to lead to product faults and which lines are the most probable indication of the failure. Our best approach, FaultFlagger, finds 89% of the total faults while flagging only 0.5% of lines for inspection, drastically outperforming the previous work CAM. We implemented FaultFlagger as a tool at Ericsson, where it presents daily fault-prediction summaries to testers.
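The core diff-then-rank idea above can be sketched in a few lines. This is a simplification, not the actual FaultFlagger implementation (which modifies TF-IDF and uses an exclusive KNN); the log lines are invented, and rarity across runs stands in for the real relevance scoring.

```python
from collections import Counter

def flag_lines(failing, passing_logs, top_k=2):
    """Drop failing-log lines seen in any passing log, then rank the
    remainder by how rarely each line appears across all runs."""
    seen_passing = set().union(*map(set, passing_logs))
    suspects = [ln for ln in failing if ln not in seen_passing]
    # Document frequency of each line across all logs (crude IDF stand-in)
    df = Counter()
    for log in passing_logs + [failing]:
        df.update(set(log))
    return sorted(suspects, key=lambda ln: df[ln])[:top_k]

passing = [["boot ok", "load cfg", "run tests", "done"],
           ["boot ok", "load cfg", "run tests", "done"]]
failing = ["boot ok", "load cfg", "run tests", "timeout in db pool", "abort"]
print(flag_lines(failing, passing))
```

The filtering step alone already removes every line shared with passing runs, which is what lets the real tool flag so small a fraction of lines.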

    Formato de representação de eventos de segurança de informação

    Master's dissertation in Engineering and Management of Information Systems. In recent years, the growing use of Information Systems and Technologies by organizations, together with their increasing dependence on the Internet, has brought a set of threats that compromise their Information Systems (IS). The spread of these threats has grown, both in number and in sophistication, at an alarming rate, drawing the attention of everyone who wants to protect and safeguard their systems; doing so is a priority for organizations. To address this problem, there are several approaches to preventing and detecting threats to IS, but the existing approaches on their own are not enough to meet the real needs of the problem. A growing solution is log management, specifically event analysis as a fundamental technique/tool for detecting system and network failures, and for detecting and preventing activities that compromise IS. The contribution that events can make is extremely important in supporting and guiding the fight against threats to IS. However, exploring logs is not an easy or trivial task, owing to the heterogeneity and dispersion of the events and the inconsistency of their content and format. Logs must have a standardized format so that their potential can be exploited efficiently and intelligently. This dissertation proposes a data representation format suited to integrated management of information security events, and an interface capable of transforming the data obtained from several log record systems into useful information.
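The normalization step such a format implies can be sketched as mapping a raw log line onto a common event schema. The schema fields and the syslog-style input below are hypothetical, in the spirit of the standardized representation the dissertation proposes, not its actual format.

```python
import json
import re

def normalize(raw, source):
    """Parse a syslog-style line and map it onto a common event schema,
    so events from heterogeneous sources share one structure."""
    m = re.match(r"(\w{3} +\d+ [\d:]+) (\S+) (\w+): (.*)", raw)
    ts, host, facility, msg = m.groups()
    return {
        "source": source,      # which log system produced the event
        "timestamp": ts,
        "host": host,
        "category": facility,
        "message": msg,
    }

event = normalize("Feb  3 14:02:11 fw01 kernel: dropped packet from 10.0.0.9",
                  source="syslog")
print(json.dumps(event, sort_keys=True))
```

Once every source is mapped into the same fields, correlation and querying across log systems become straightforward.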

    Comparison of Different Clustering Algorithms for Diagnosing Memory-Related Performance Issues Using a Distributed Computing System

    Failures in popular systems of technological giants illustrate that load testing is a necessary procedure for assuring the quality of software systems. However, the diagnosis of memory-related issues is a major challenge for developers. To address them, developers often apply automated analysis techniques that still require considerable manual effort and a high degree of system knowledge. One solution to this problem is the application of machine learning techniques to diagnose abnormal system behavior. Mark D. Syer et al. propose an automated approach that combines performance counters and execution logs, applying hierarchical clustering to group the data. This clustering, however, fails on large data sets owing to its high computational complexity. We apply a different approach to Syer's algorithm by using the Spark framework, which offers parallelism across processes. Based on a previous corporate implementation of the algorithm, we apply the k-means algorithm in the clustering phase instead of hierarchical clustering, in order to evaluate the behavior of the two algorithms on large data sets and to validate k-means as part of the overall Syer approach. Our case studies use performance counters and execution logs from two systems. For the evaluation, we use synthetic data from a program created by Software Competitiveness International and real data from Apache Tomcat with an injected memory spike. Our approach detects individual memory spikes, or the clusters containing them, with satisfactory precision. Finally, on large data sets, the k-means algorithm outperforms hierarchical clustering in terms of execution time and performance.
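The clustering step above can be illustrated with a minimal one-dimensional k-means over memory-usage samples. This is a toy stand-in for the Spark-based implementation the thesis evaluates; the memory values, including the injected spike, are invented.

```python
import random

def kmeans_1d(values, k=2, iters=20, seed=0):
    """Minimal 1-D k-means: alternate assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Memory-usage samples (MB) with an injected spike, as in the Tomcat study
mem = [210, 205, 215, 208, 212, 900, 880, 910]
print(kmeans_1d(mem))
```

With k fixed in advance, each iteration is a single pass over the data, which is why k-means parallelizes well under Spark where hierarchical clustering's pairwise-distance matrix does not.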