
    Logging Statements Analysis and Automation in Software Systems with Data Mining and Machine Learning Techniques

    Log files are widely used to record runtime information of software systems, such as the timestamp of an event, the name or ID of the component that generated the log, and parts of the state of a task execution. The rich information in logs enables system developers (and operators) to monitor the runtime behavior of their systems and to track down system problems in development and production settings. With the ever-increasing scale and complexity of modern computing systems, the volume of logs is rapidly growing. For example, eBay reported in 2018 that its servers generate logs at a rate on the order of several petabytes per day [17]. The traditional approach to log analysis, which relies largely on manual inspection (e.g., searching for error/warning keywords with grep), has therefore become inefficient, labor-intensive, and error-prone. The growth of logs has spurred the emergence of automated tools and approaches for log mining and analysis. In parallel, embedding logging statements in source code is a manual and error-prone task, and developers may forget to add a logging statement where one is needed. To address this challenge, many efforts have aimed to automate logging statements in the source code, and many tools have been proposed for large-scale log file analysis using machine learning and data mining techniques. However, the current logging process is still mostly manual, so the proper placement and content of logging statements remain open challenges. Methods that automate log placement and content prediction, i.e., 'where and what to log', are therefore of high interest, as are approaches that automatically mine and extract insight from large-scale logs. In this research, we focus on predicting log statements, and for this purpose we perform an experimental study on open-source Java projects. We introduce a log-aware code-clone detection method to predict the location and description of logging statements. Additionally, we incorporate natural language processing (NLP) and deep learning methods to further improve the prediction of log statement descriptions. We also introduce deep-learning-based approaches for the automated analysis of software logs. In particular, we analyze execution logs and extract natural language characteristics of logs to enable the application of natural language models to automated log file analysis. We then propose automated tools for analyzing log files and measuring the information gain from logs for different log analysis tasks such as anomaly detection. Finally, we extend our NLP-enabled approach by leveraging state-of-the-art language models, i.e., Transformers, to perform automated log parsing.
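    As a rough illustration of what automated log parsing aims to do, the sketch below masks variable fields in raw log lines so that lines generated by the same logging statement collapse to one constant template. The regexes and sample lines are invented for illustration; this is not the Transformer-based parser developed in the thesis.

        import re

        # Illustrative patterns for common variable fields in log messages.
        # These regexes and sample lines are hypothetical, not from the thesis.
        PATTERNS = [
            (re.compile(r"\b\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\b"), "<TIMESTAMP>"),
            (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<IP>"),
            (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
            (re.compile(r"\b\d+\b"), "<NUM>"),
        ]

        def parse(line: str) -> str:
            """Reduce a raw log line to its constant message template."""
            for pattern, token in PATTERNS:
                line = pattern.sub(token, line)
            return line

        logs = [
            "2018-05-02 10:15:01 Connection from 10.0.0.7 failed after 3 retries",
            "2018-05-02 10:15:09 Connection from 10.0.0.9 failed after 5 retries",
        ]
        for line in logs:
            print(parse(line))
        # Both lines map to the same template:
        # <TIMESTAMP> Connection from <IP> failed after <NUM> retries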

    Finding the online cry for help: automatic text classification for suicide prevention

    Successful prevention of suicide, a serious public health concern worldwide, hinges on the adequate detection of suicide risk. While online platforms are increasingly used for expressing suicidal thoughts, manually monitoring for such signals of distress is practically infeasible, given the information overload suicide prevention workers are confronted with. In this thesis, the automatic detection of suicide-related messages is studied. It presents the first classification-based approach to online suicidality detection, and focuses on Dutch user-generated content. In order to evaluate the viability of such a machine learning approach, we developed a gold standard corpus consisting of message board and blog posts. These were manually labeled according to a newly developed annotation scheme, grounded in suicide prevention practice. The scheme provides for the annotation of a post's relevance to suicide, and of the subject and severity of a suicide threat, if any. This allowed us to derive two tasks: the detection of suicide-related posts, and of severe, high-risk content. In a series of experiments, we sought to determine how well these tasks can be carried out automatically, and which information sources and techniques contribute to classification performance. The experimental results show that both types of messages can be detected with high precision. The amount of noise generated by the system is therefore minimal, even on very large datasets, making it usable in a real-world prevention setting. Recall is high for the relevance task, but at around 60% it is considerably lower for severity. This is mainly attributable to implicit references to suicide, which often go undetected. We found a variety of information sources to be informative for both tasks, including token and character n-gram bags of words, features based on LSA topic models, polarity lexicons, named entity recognition, and suicide-related terms extracted from a background corpus. To improve classification performance, the models were optimized using feature selection, hyperparameter optimization, or a combination of both. A distributed genetic algorithm approach proved successful in finding good solutions for this complex search problem, and resulted in more robust models. Experiments with cascaded classification of the severity task did not reveal performance benefits over direct classification (in terms of F1-score), but its structure allows the use of slower, memory-based learning algorithms that considerably improved recall. At the end of this thesis, we address a problem typical of user-generated content: noise in the form of misspellings, phonetic transcriptions, and other deviations from the linguistic norm. We developed an automatic text normalization system, using a cascaded statistical machine translation approach, and applied it to normalize the data for the suicidality detection tasks. Subsequent experiments revealed that, compared to the original data, normalized data resulted in fewer and more informative features and improved classification performance. This extrinsic evaluation demonstrates the utility of automatic normalization for suicidality detection and, more generally, for text classification on user-generated content.
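    For readers unfamiliar with the feature setup, a minimal sketch of a token plus character n-gram bag-of-words classifier is given below, using scikit-learn. The toy posts, labels, and settings are illustrative assumptions, not the thesis's actual corpus or system.

        # Illustrative sketch of token + character n-gram text classification;
        # the toy data and settings are assumptions, not the thesis's features.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline, make_union
        from sklearn.svm import LinearSVC

        texts = ["I can't go on anymore", "Great concert last night"]  # toy posts
        labels = [1, 0]  # 1 = suicide-related, 0 = not

        features = make_union(
            TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),     # token n-grams
            TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams
        )
        model = make_pipeline(features, LinearSVC())
        model.fit(texts, labels)
        print(model.predict(["no reason to keep living"]))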

    NASA space station automation: AI-based technology review

    Research and development projects in automation for the Space Station are discussed. Artificial intelligence (AI) based automation technologies are planned to enhance crew safety through a reduced need for extravehicular activity (EVA), increase crew productivity through the reduction of routine operations, increase Space Station autonomy, and augment Space Station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures.

    Management: A bibliography for NASA managers (supplement 21)

    This bibliography lists 664 reports, articles, and other documents introduced into the NASA scientific and technical information system in 1986. Items are selected and grouped according to their usefulness to the manager as manager. Citations are grouped into ten subject categories: human factors and personnel issues; management theory and techniques; industrial management and manufacturing; robotics and expert systems; computers and information management; research and development; economics, costs, and markets; logistics and operations management; reliability and quality control; and legality, legislation, and policy.

    Search and multi-frequency follow-up studies of radio transients: novel approaches and large campaigns

    One of the most fascinating phenomena in modern radio astronomy is the Fast Radio Burst (FRB). FRBs are Jy-intense, ms-duration radio transients of extragalactic origin whose nature has not yet been established. To date, FRBs have been observed only in the radio band. Although their origin has not yet been determined, observational evidence points towards highly magnetised neutron stars, such as magnetars, as the putative sources behind at least some of them. In particular, in April 2020 the Galactic magnetar SGR J1935+2154 emitted a radio flash closely resembling those produced by FRBs, with simultaneous detections in the X-rays. This result remarkably strengthened the FRB/magnetar link and strongly motivates panchromatic campaigns towards FRB sources in order to find, as in the case of SGR J1935+2154, their high-energy and possibly also optical/infrared counterparts. On the one hand, a multi-wavelength (MWL) detection of an FRB would confirm the aforementioned FRB/magnetar connection and/or provide evidence that other classes of astrophysical objects or events are responsible for a fraction of these radio transients. On the other hand, the observation of an MWL burst outside the radio band might discriminate between the various proposed emission models, some of which predict MWL emission simultaneous with the radio burst, while others prescribe that the MWL emission should occur well before or after it. Any panchromatic detection of an FRB, however, ultimately relies on detecting a burst in the radio band, which in turn requires dedicated instruments and tailored search algorithms. MWL observations of FRBs are the main driver of this PhD Thesis. Accordingly, the work presented here revolves around three main themes: (i) the analysis and possible improvement of FRB detection algorithms; (ii) the development of radio facilities tailored to FRB observations; and (iii) a series of large MWL campaigns targeting some of the FRB sources.
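    For context on what an FRB search algorithm must undo: interstellar dispersion delays a burst at frequency nu (in GHz) by roughly 4.15 ms x DM x nu^-2, with the dispersion measure DM in pc cm^-3, so low frequencies arrive later than high ones. A minimal incoherent-dedispersion sketch under those standard assumptions follows; the array shapes and parameters are illustrative and are not the pipeline developed in the thesis.

        # Minimal incoherent dedispersion sketch (illustrative parameters,
        # not the thesis's search pipeline).
        import numpy as np

        K_DM = 4.15  # ms; standard dispersion constant for frequencies in GHz

        def dedisperse(dynspec, freqs_ghz, dm, dt_ms):
            """Shift each frequency channel to undo the dispersion delay.

            dynspec: (n_chan, n_time) array, row i sampled at freqs_ghz[i]
            dm: trial dispersion measure in pc cm^-3
            dt_ms: time resolution of one sample in ms
            """
            f_ref = freqs_ghz.max()  # align everything to the highest frequency
            out = np.empty_like(dynspec)
            for i, f in enumerate(freqs_ghz):
                delay_ms = K_DM * dm * (f**-2 - f_ref**-2)
                shift = int(round(delay_ms / dt_ms))
                out[i] = np.roll(dynspec[i], -shift)  # undo the delay
            return out

        # A search would dedisperse at many trial DMs and look for peaks
        # in the frequency-summed time series, e.g.:
        # series = dedisperse(dynspec, freqs, dm, dt).sum(axis=0)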

    Enhancing knowledge acquisition systems with user generated and crowdsourced resources

    This thesis is about leveraging knowledge acquisition systems with collaborative data and crowdsourced work from the internet. We propose two strategies and apply them to building effective entity linking and question answering (QA) systems. The first strategy is to integrate an information extraction system with online collaborative knowledge bases, such as Wikipedia and Freebase. We construct a Cross-Lingual Entity Linking (CLEL) system to connect Chinese entities, such as people and locations, with the corresponding English pages in Wikipedia. The main focus is to break the language barrier between Chinese entities and the English KB, and to resolve the synonymy and polysemy of Chinese entities. To address these problems, we create a cross-lingual taxonomy and a Chinese knowledge base (KB). We investigate two methods of connecting the query representation with the KB representation. Based on our CLEL system's participation in the TAC KBP 2011 evaluation, we finally propose a simple and effective generative model, which achieved much better performance. The second strategy is to create annotations for QA systems with the help of crowdsourcing, i.e., distributing a task via the internet and recruiting many people to complete it simultaneously. Various annotated data are required to train the data-driven statistical machine learning algorithms underlying the components of our QA system. This thesis demonstrates how to convert the annotation task into crowdsourcing micro-tasks, investigates different statistical methods for enhancing the quality of crowdsourced annotation, and finally uses the enhanced annotations to train learning-to-rank models for passage ranking in QA.
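    One of the simplest statistical methods for improving crowdsourced annotation quality is to collect redundant labels per item and aggregate them; the majority-vote sketch below is a generic illustration of that idea, not the specific method investigated in the thesis.

        # Generic majority-vote aggregation of redundant crowd labels
        # (an illustration of label aggregation, not the thesis's method).
        from collections import Counter

        def majority_vote(labels_per_item):
            """Map each item id to the label most workers agreed on."""
            return {
                item: Counter(votes).most_common(1)[0][0]
                for item, votes in labels_per_item.items()
            }

        crowd = {
            "q1-passage3": ["relevant", "relevant", "irrelevant"],
            "q1-passage7": ["irrelevant", "irrelevant", "relevant"],
        }
        print(majority_vote(crowd))
        # {'q1-passage3': 'relevant', 'q1-passage7': 'irrelevant'}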

    The Talking Heads experiment: Origins of words and meanings

    The Talking Heads Experiment, conducted in the years 1999-2001, was the first large-scale experiment in which open populations of situated embodied agents created a new shared vocabulary by playing language games about real-world scenes in front of them. The agents could teleport between different physical sites in the world through the Internet. Sites in Antwerp, Brussels, Paris, Tokyo, London, Cambridge, and several other locations were linked into the network. Humans could interact with the robotic agents either on site or remotely through the Internet and thus influence the evolving ontologies and languages of the artificial agents. The present book describes in detail the motivation, the cognitive mechanisms used by the agents, the various installations of the Talking Heads, the experimental results that were obtained, and the interaction with humans. It also provides a perspective on what happened in the field after these initial groundbreaking experiments. The book is invaluable reading for anyone interested in the history of agent-based models of language evolution and the future of Artificial Intelligence.
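    The core mechanism behind such experiments is commonly abstracted as a naming game, in which a speaker and a hearer repeatedly try to agree on a name for an object. The simulation below is a textbook-style sketch under that abstraction, not the actual Talking Heads implementation.

        # Minimal naming-game simulation (a textbook-style abstraction of the
        # language games described above, not the Talking Heads codebase).
        import random
        import string

        def new_word():
            return "".join(random.choices(string.ascii_lowercase, k=5))

        agents = [set() for _ in range(20)]  # each agent's word inventory

        for _ in range(3000):
            speaker, hearer = random.sample(range(len(agents)), 2)
            if not agents[speaker]:
                agents[speaker].add(new_word())  # invent a name if none known
            word = random.choice(sorted(agents[speaker]))
            if word in agents[hearer]:
                # Success: both agents align on the winning word.
                agents[speaker] = {word}
                agents[hearer] = {word}
            else:
                agents[hearer].add(word)  # failure: hearer adopts the word

        # Over many games the population tends to converge on one shared name.
        print(len(set().union(*agents)))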
