Learning, transferring, and recommending performance knowledge with Monte Carlo tree search and neural networks
Making changes to a program to optimize its performance is an unscalable task
that relies entirely upon human intuition and experience. In addition,
companies operating at large scale have reached a stage where no single
individual understands the code controlling their systems, and for this
reason, making changes to improve performance can become intractably
difficult. In this paper,
a learning system is introduced that provides AI assistance for finding
recommended changes to a program. Specifically, it is shown how the evaluative
feedback, delayed-reward performance programming domain can be effectively
formulated via the Monte Carlo tree search (MCTS) framework. It is then shown
that established methods from computational games for using learning to
expedite tree-search computation can be adapted to speed up computing
recommended program alterations. Estimates of expected utility from MCTS trees
built for previous problems are used to learn a sampling policy that remains
effective across new problems, thus demonstrating transferability of
optimization knowledge. This formulation is applied to the Apache Spark
distributed computing environment, and a preliminary result is observed that
the time required to build a search tree for finding recommendations is reduced
by up to a factor of 10.
Comment: 8 pages, 2 figures
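The abstract does not give the tree-search details, but the core selection step of MCTS can be sketched with the standard UCT rule. The action names and reward numbers below are hypothetical stand-ins for Spark program alterations, not taken from the paper:

```python
import math

def uct_select(children, total_visits, c=1.4):
    """Pick the child action maximizing the UCT score (exploitation plus
    exploration). `children` maps an action to (visit_count, total_reward)."""
    def score(stats):
        visits, reward = stats
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        return reward / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda a: score(children[a]))

# Hypothetical program-alteration actions with (visits, cumulative speedup reward).
children = {"cache_rdd": (10, 7.0), "repartition": (5, 4.5), "broadcast_join": (0, 0.0)}
best = uct_select(children, total_visits=15)  # unvisited action is tried first
```

A learned sampling policy, as the paper describes, would replace or bias this uniform exploration term so that trees for new problems concentrate samples on alterations that paid off in previous trees.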
State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"
Several Networks of Excellence have been set up in the framework of the
European FP5 research program. Among these Networks of Excellence, the NEMIS
project focuses on the field of Text Mining.
Within this field, document processing and visualization was identified as
one of the key topics and the WG1 working group was created in the NEMIS
project, to carry out a detailed survey of techniques associated with the text
mining process and to identify the relevant research topics in related research
areas.
In this document we present the results of this comprehensive survey. The
report includes a description of the current state-of-the-art and practice, a
roadmap for follow-up research in the identified areas, and recommendations for
anticipated technological development in the domain of text mining.
Comment: 54 pages, Report of Working Group 1 for the European Network of
Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS)
Semi-Automatic Terminology Ontology Learning Based on Topic Modeling
Ontologies provide features like a common vocabulary, reusability, and
machine-readable content; they also allow for semantic search, facilitate
agent interaction, and support the ordering and structuring of knowledge for
Semantic Web (Web 3.0) applications. However, the challenge in ontology
engineering is automatic learning, i.e., there is still no fully automatic
approach for forming an ontology from a text corpus or dataset of various
topics using machine learning techniques. In this paper, two topic modeling
algorithms are explored, namely LSI & SVD and Mr.LDA, for learning a topic
ontology. The objective is to determine the statistical relationship between
documents and terms to build a topic ontology and ontology graph with minimum
human intervention. Experimental analysis on building a topic ontology and
semantically retrieving the corresponding topic ontology for the user's query
demonstrates the effectiveness of the proposed approach.
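The document-term relationship that LSI exploits can be illustrated with a minimal truncated-SVD sketch. The tiny count matrix and term list below are invented for illustration and are not from the paper:

```python
import numpy as np

# Tiny document-term count matrix (rows = documents, columns = terms).
# Hypothetical terms: ["ontology", "semantic", "web", "gene", "protein"]
X = np.array([
    [3, 2, 2, 0, 0],   # document about the Semantic Web
    [2, 3, 1, 0, 0],   # document about ontologies
    [0, 0, 0, 3, 2],   # document about biology
    [0, 0, 1, 2, 3],   # document about biology
], dtype=float)

# LSI: a truncated SVD projects documents and terms into a shared latent
# topic space; documents on the same subject land close together there.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                            # number of latent topics to keep
doc_topics = U[:, :k] * s[:k]    # document coordinates in topic space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Grouping terms and documents by their coordinates in this latent space is one way a topic ontology's concept hierarchy can be seeded with minimal human intervention.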
Towards a Context Knowledge Taxonomy. Combined Methodologies to Improve a Fast-Search Concept Extraction for an Ontology Population
Context in Architectural Design can be defined as something related and comparable to hypotheses and boundary conditions in mathematics: an eco-system that influences the design by means of natural and artificial events and the dimensions of space and time. The research aims to analyze the critical issues related to Context by contributing to the study of interactions between Context Knowledge and Architectural Design, and to show how such knowledge can be used to improve the performance of buildings and reduce design mistakes. Focusing on formal ontologies, the research has developed a model that enables a semantic approach to design application programs, to manage information, to answer design questions, and to maintain a clear relation between the formal representation of the context domain and its meanings. This context model advances the state of the art in simplified design assumptions, in terms of reducing ontology ambiguity and complexity, by using algorithms to extract and optimize branches of the graph. The extraction does not limit the number of relations, which can be extended to improve the coherency and accuracy of the context taxonomy.
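The branch-extraction step the abstract mentions can be sketched as a simple traversal over a concept hierarchy. The taxonomy below, with its concept names, is a hypothetical illustration rather than the paper's actual ontology:

```python
def extract_branch(taxonomy, root):
    """Return the sub-taxonomy (branch) rooted at `root` via a depth-first
    walk. `taxonomy` maps each concept to its list of child concepts."""
    branch = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = taxonomy.get(node, [])
        branch[node] = children
        stack.extend(children)
    return branch

# Invented context-knowledge taxonomy for illustration.
taxonomy = {
    "Context": ["NaturalEvents", "ArtificialEvents"],
    "NaturalEvents": ["Wind", "Sunlight"],
    "ArtificialEvents": ["Traffic"],
}
branch = extract_branch(taxonomy, "NaturalEvents")
```

An optimization pass, as the abstract describes, would then prune or merge nodes of such a branch to reduce ambiguity and complexity before populating the ontology.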
Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Dependency parsing research, which has made significant gains in recent
years, typically focuses on improving the accuracy of single-tree predictions.
However, ambiguity is inherent to natural language syntax, and communicating
such ambiguity is important for error analysis and better-informed downstream
applications. In this work, we propose a transition sampling algorithm to
sample from the full joint distribution of parse trees defined by a
transition-based parsing model, and demonstrate the use of the samples in
probabilistic dependency analysis. First, we define the new task of dependency
path prediction, inferring syntactic substructures over part of a sentence, and
provide the first analysis of performance on this task. Second, we demonstrate
the usefulness of our Monte Carlo syntax marginal method for parser error
analysis and calibration. Finally, we use this method to propagate parse
uncertainty to two downstream information extraction applications: identifying
persons killed by police and semantic role assignment.
Comment: To appear in Proceedings of NAACL 201
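The Monte Carlo marginal idea can be sketched independently of any particular parser: sample whole trees from the model's distribution, then estimate each arc's marginal probability by its frequency across samples. The toy "model" below, a fixed distribution over two candidate trees for a three-word sentence, is invented for illustration:

```python
import random
from collections import Counter

# Toy stand-in for a transition-based parsing model over the sentence
# ["the", "dog", "barks"] (token indices 1..3, 0 = root). Each tree is a
# frozenset of (head, dependent) arcs; the probabilities are illustrative.
TREES = [
    (frozenset({(0, 3), (3, 2), (2, 1)}), 0.7),   # "the" attaches to "dog"
    (frozenset({(0, 3), (3, 1), (3, 2)}), 0.3),   # flat attachment to "barks"
]

def sample_tree(rng):
    trees, probs = zip(*TREES)
    return rng.choices(trees, weights=probs)[0]

def monte_carlo_arc_marginals(n_samples, seed=0):
    """Estimate P(arc in tree) by counting arcs across sampled parses."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        for arc in sample_tree(rng):
            counts[arc] += 1
    return {arc: c / n_samples for arc, c in counts.items()}

marginals = monte_carlo_arc_marginals(10000)
```

In the paper's setting the samples come from rolling out stochastic transitions of the parser itself, but the marginalization over samples works the same way, and arcs shared by all high-probability trees get marginals near 1 while ambiguous attachments get intermediate values.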
Empower Entity Set Expansion via Language Model Probing
Entity set expansion, aiming at expanding a small seed entity set with new
entities belonging to the same semantic class, is a critical task that benefits
many downstream NLP and IR applications, such as question answering, query
understanding, and taxonomy construction. Existing set expansion methods
bootstrap the seed entity set by adaptively selecting context features and
extracting new entities. A key challenge for entity set expansion is to avoid
selecting ambiguous context features which will shift the class semantics and
lead to accumulative errors in later iterations. In this study, we propose a
novel iterative set expansion framework that leverages automatically generated
class names to address the semantic drift issue. In each iteration, we select
one positive and several negative class names by probing a pre-trained language
model, and further score each candidate entity based on selected class names.
Experiments on two datasets show that our framework generates high-quality
class names and outperforms previous state-of-the-art methods significantly.
Comment: ACL 202
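The class-guided scoring step can be sketched with a simplified stand-in: represent each candidate entity and class name by a bag of context features and rank candidates by how much better they fit the positive class name than the best negative one. The feature sets and class names below are invented for illustration; the paper obtains them by probing a pre-trained language model:

```python
def jaccard(a, b):
    """Set-overlap similarity between two feature bags."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score_candidate(entity_feats, pos_class_feats, neg_class_feats_list):
    """Reward fit to the positive class name, penalize fit to negatives
    (penalizing negatives is what guards against semantic drift)."""
    pos = jaccard(entity_feats, pos_class_feats)
    neg = max((jaccard(entity_feats, n) for n in neg_class_feats_list), default=0.0)
    return pos - neg

# Hypothetical context features for candidate entities.
features = {
    "Illinois": {"state", "governor", "midwest"},
    "Chicago":  {"city", "mayor", "midwest"},
    "Ohio":     {"state", "governor", "senate"},
}
pos = {"state", "governor", "senate", "midwest"}   # positive class: "U.S. states"
negs = [{"city", "mayor", "downtown"}]             # negative class: "cities"

ranking = sorted(features, key=lambda e: score_candidate(features[e], pos, negs),
                 reverse=True)
```

Here "Chicago" shares a context feature with the seed states, so a purely positive score could drift toward it; subtracting its strong match to the negative class name pushes it to the bottom of the ranking.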
A new procedure to analyze RNA non-branching structures
RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with existing software to obtain a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder).
Comparative Opinion Mining: A Review
Opinion mining refers to the use of natural language processing, text
analysis and computational linguistics to identify and extract subjective
information in textual material. Opinion mining, also known as sentiment
analysis, has received a lot of attention in recent times, as it provides a
number of tools to analyse the public opinion on a number of different topics.
Comparative opinion mining is a subfield of opinion mining that deals with
identifying and extracting information that is expressed in a comparative form
(e.g. "paper X is better than Y"). Comparative opinion mining plays a very
important role when one tries to evaluate something, as it provides a
reference point for the comparison. This paper provides a review of the area
of comparative opinion mining. It is the first review that covers this topic
specifically, as all previous reviews dealt mostly with general opinion
mining. The survey covers comparative opinion mining from two different
angles: one from the perspective of techniques and the other from the
perspective of comparative opinion elements. It also incorporates
preprocessing tools as well as datasets that were used by past researchers
and that can be useful to future researchers in the field of comparative
opinion mining.
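The simplest technique-side baseline such a review covers is lexical cue matching for detecting sentences in comparative form. The cue list below is a minimal illustrative sketch, not a method from the review:

```python
import re

# Flag sentences in comparative form by matching comparative cue patterns.
# The cue list is deliberately tiny and illustrative; real systems combine
# larger lexicons with sequence labeling or classification.
COMPARATIVE_CUES = re.compile(
    r"\b(better|worse|faster|slower|more|less|cheaper)\s+than\b",
    re.IGNORECASE,
)

def is_comparative(sentence):
    return bool(COMPARATIVE_CUES.search(sentence))
```

Beyond detection, comparative opinion mining must also extract the comparison's elements (the two entities, the compared aspect, and the preferred entity), which is where the review's second angle comes in.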
Intelligent Agents for Active Malware Analysis
The main contribution of this thesis is to give a novel perspective on Active Malware Analysis, modeled as a decision-making process between intelligent agents. We propose solutions aimed at extracting the behaviors of malware agents with advanced Artificial Intelligence techniques. In particular, we devise novel action selection strategies for the analyzer agents that make it possible to analyze malware by selecting sequences of triggering actions aimed at maximizing the information acquired. The goal is to create informative models representing the behaviors of the malware agents observed while interacting with them during the analysis process. Such models can then be used to effectively compare a malware sample against others and to correctly identify the malware family.
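One standard way to select the triggering action that maximizes information acquired is to pick the action minimizing the expected posterior entropy over malware families. The families, actions, and response model below are hypothetical, invented to make the idea concrete:

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief over malware families."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def expected_entropy_after(action, belief, response_model):
    """Expected posterior entropy after triggering `action`, given the
    likelihoods P(response | family, action) in `response_model`."""
    responses = {r for f in belief for r in response_model[f][action]}
    total = 0.0
    for r in responses:
        p_r = sum(belief[f] * response_model[f][action].get(r, 0.0) for f in belief)
        if p_r == 0:
            continue
        # Posterior over families by Bayes' rule, weighted by P(response).
        posterior = {f: belief[f] * response_model[f][action].get(r, 0.0) / p_r
                     for f in belief}
        total += p_r * entropy(posterior)
    return total

def most_informative_action(actions, belief, response_model):
    return min(actions, key=lambda a: expected_entropy_after(a, belief, response_model))

# Hypothetical setup: "open_socket" separates the two families perfectly,
# while "list_files" provokes the same response from both (uninformative).
belief = {"ransomware": 0.5, "spyware": 0.5}
response_model = {
    "ransomware": {"open_socket": {"connect": 1.0}, "list_files": {"scan": 1.0}},
    "spyware":    {"open_socket": {"silent": 1.0},  "list_files": {"scan": 1.0}},
}
best_action = most_informative_action(["open_socket", "list_files"], belief, response_model)
```

Repeating this greedy choice and updating the belief after each observed response yields a sequence of triggering actions that drives the family posterior toward certainty.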
Probabilistic Clustering of Sequences: Inferring new bacterial regulons by comparative genomics
Genome wide comparisons between enteric bacteria yield large sets of
conserved putative regulatory sites on a gene by gene basis that need to be
clustered into regulons. Using the assumption that regulatory sites can be
represented as samples from weight matrices we derive a unique probability
distribution for assignments of sites into clusters. Our algorithm, 'PROCSE'
(probabilistic clustering of sequences), uses Monte-Carlo sampling of this
distribution to partition and align thousands of short DNA sequences into
clusters. The algorithm internally determines the number of clusters from the
data, and assigns significance to the resulting clusters. We place theoretical
limits on the ability of any algorithm to correctly cluster sequences drawn
from weight matrices (WMs) when these WMs are unknown. Our analysis suggests
that the set of all putative sites for a single genome (e.g. E. coli) is
largely inadequate for clustering. When sites from different genomes are
combined and all the homologous sites from the various species are used as a
block, clustering becomes feasible. We predict 50-100 new regulons as well as
many new members of existing regulons, potentially doubling the number of known
regulatory sites in E. coli.
Comment: 27 pages including 9 figures and 3 tables
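The weight-matrix representation at the heart of this clustering can be sketched as follows: build a position weight matrix (PWM) from the aligned sites of a cluster, then score a candidate site by its log-odds under the PWM versus a uniform background. The toy binding sites below are invented for illustration:

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def pwm_from_sites(sites, pseudocount=1.0):
    """Column-wise base frequencies over aligned sites, with pseudocounts
    so unseen bases keep nonzero probability."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(s[pos] for s in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def log_odds(site, pwm, background=0.25):
    """Log-likelihood ratio of the site under the PWM vs. uniform background."""
    return sum(math.log(pwm[i][b] / background) for i, b in enumerate(site))

cluster = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]   # toy aligned sites
pwm = pwm_from_sites(cluster)
```

A Monte Carlo clustering scheme in this spirit repeatedly proposes moving a site between clusters and accepts or rejects the move according to how the cluster likelihoods, computed from PWMs like the one above, change.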