Learning, transferring, and recommending performance knowledge with Monte Carlo tree search and neural networks
Making changes to a program to optimize its performance is an unscalable task
that relies entirely upon human intuition and experience. In addition,
companies operating at large scale have reached a stage where no single
individual understands the code controlling their systems, and for this
reason, making changes to improve performance can become intractably
difficult. In this paper,
a learning system is introduced that provides AI assistance for finding
recommended changes to a program. Specifically, it is shown how the evaluative
feedback, delayed-reward performance programming domain can be effectively
formulated via the Monte Carlo tree search (MCTS) framework. It is then shown
that established methods from computational games for using learning to
expedite tree-search computation can be adapted to speed up computing
recommended program alterations. Estimates of expected utility from MCTS trees
built for previous problems are used to learn a sampling policy that remains
effective across new problems, thus demonstrating transferability of
optimization knowledge. This formulation is applied to the Apache Spark
distributed computing environment, and a preliminary result is observed that
the time required to build a search tree for finding recommendations is reduced
by up to a factor of 10.
Comment: 8 pages, 2 figures
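The abstract does not give the tree-search details, but the core selection step of MCTS can be sketched with the standard UCT rule. The action names and reward numbers below are hypothetical stand-ins for Spark program alterations, not taken from the paper:

```python
import math

def uct_select(children, total_visits, c=1.4):
    """Pick the child action maximizing the UCT score (exploitation plus
    exploration). `children` maps an action to (visit_count, total_reward)."""
    def score(stats):
        visits, reward = stats
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        return reward / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda a: score(children[a]))

# Hypothetical program-alteration actions with (visits, cumulative speedup reward).
children = {"cache_rdd": (10, 7.0), "repartition": (5, 4.5), "broadcast_join": (0, 0.0)}
best = uct_select(children, total_visits=15)  # unvisited action is tried first
```

A learned sampling policy, as the paper describes, would replace or bias this uniform exploration term so that trees for new problems concentrate samples on alterations that paid off in previous trees.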
State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"
Several Networks of Excellence have been set up in the framework of the
European FP5 research program. Among these Networks of Excellence, the NEMIS
project focuses on the field of Text Mining.
Within this field, document processing and visualization was identified as
one of the key topics and the WG1 working group was created in the NEMIS
project, to carry out a detailed survey of techniques associated with the text
mining process and to identify the relevant research topics in related research
areas.
In this document we present the results of this comprehensive survey. The
report includes a description of the current state-of-the-art and practice, a
roadmap for follow-up research in the identified areas, and recommendations for
anticipated technological development in the domain of text mining.
Comment: 54 pages, Report of Working Group 1 for the European Network of
Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS)
Semi-Automatic Terminology Ontology Learning Based on Topic Modeling
Ontologies provide features like a common vocabulary, reusability, and
machine-readable content; they also allow for semantic search, facilitate
agent interaction, and support the ordering and structuring of knowledge for
Semantic Web (Web 3.0) applications. However, the challenge in ontology
engineering is automatic learning, i.e., there is still no fully automatic
approach for forming an ontology from a text corpus or dataset of various
topics using machine learning techniques. In this paper, two topic modeling
algorithms are explored, namely LSI & SVD and Mr.LDA, for learning a topic
ontology. The objective is to determine the statistical relationship between
documents and terms to build a topic ontology and ontology graph with minimum
human intervention. Experimental analysis on building a topic ontology and
semantically retrieving the corresponding topic ontology for the user's query
demonstrates the effectiveness of the proposed approach.
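The document-term relationship that LSI exploits can be illustrated with a minimal truncated-SVD sketch. The tiny count matrix and term list below are invented for illustration and are not from the paper:

```python
import numpy as np

# Tiny document-term count matrix (rows = documents, columns = terms).
# Hypothetical terms: ["ontology", "semantic", "web", "gene", "protein"]
X = np.array([
    [3, 2, 2, 0, 0],   # document about the Semantic Web
    [2, 3, 1, 0, 0],   # document about ontologies
    [0, 0, 0, 3, 2],   # document about biology
    [0, 0, 1, 2, 3],   # document about biology
], dtype=float)

# LSI: a truncated SVD projects documents and terms into a shared latent
# topic space; documents on the same subject land close together there.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                            # number of latent topics to keep
doc_topics = U[:, :k] * s[:k]    # document coordinates in topic space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Grouping terms and documents by their coordinates in this latent space is one way a topic ontology's concept hierarchy can be seeded with minimal human intervention.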
Towards a Context Knowledge Taxonomy. Combined Methodologies to Improve a Fast-Search Concept Extraction for an Ontology Population
Context in Architectural Design can be defined as something related and comparable to hypotheses and boundary conditions in mathematics: an eco-system that influences the design by means of natural and artificial events and the dimensions of space and time. The research aims to analyze the critical issues related to Context by contributing to the study of interactions between Context Knowledge and Architectural Design, and to show how such knowledge can be used to improve the performance of buildings and reduce design mistakes. Focusing on formal ontologies, the research has developed a model that enables a semantic approach to design application programs, to manage information, to answer design questions, and to maintain a clear relation between the formal representation of the context domain and its meanings. This context model advances the state of the art in simplified design assumptions, in terms of reducing ontology ambiguity and complexity, by using algorithms to extract and optimize branches of the graph. The extraction does not limit the number of relations, which can be extended to improve the coherency and accuracy of the context taxonomy.
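The branch-extraction step the abstract mentions can be sketched as a simple traversal over a concept hierarchy. The taxonomy below, with its concept names, is a hypothetical illustration rather than the paper's actual ontology:

```python
def extract_branch(taxonomy, root):
    """Return the sub-taxonomy (branch) rooted at `root` via a depth-first
    walk. `taxonomy` maps each concept to its list of child concepts."""
    branch = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = taxonomy.get(node, [])
        branch[node] = children
        stack.extend(children)
    return branch

# Invented context-knowledge taxonomy for illustration.
taxonomy = {
    "Context": ["NaturalEvents", "ArtificialEvents"],
    "NaturalEvents": ["Wind", "Sunlight"],
    "ArtificialEvents": ["Traffic"],
}
branch = extract_branch(taxonomy, "NaturalEvents")
```

An optimization pass, as the abstract describes, would then prune or merge nodes of such a branch to reduce ambiguity and complexity before populating the ontology.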
Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Dependency parsing research, which has made significant gains in recent
years, typically focuses on improving the accuracy of single-tree predictions.
However, ambiguity is inherent to natural language syntax, and communicating
such ambiguity is important for error analysis and better-informed downstream
applications. In this work, we propose a transition sampling algorithm to
sample from the full joint distribution of parse trees defined by a
transition-based parsing model, and demonstrate the use of the samples in
probabilistic dependency analysis. First, we define the new task of dependency
path prediction, inferring syntactic substructures over part of a sentence, and
provide the first analysis of performance on this task. Second, we demonstrate
the usefulness of our Monte Carlo syntax marginal method for parser error
analysis and calibration. Finally, we use this method to propagate parse
uncertainty to two downstream information extraction applications: identifying
persons killed by police and semantic role assignment.
Comment: To appear in Proceedings of NAACL 201
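The Monte Carlo marginal idea can be sketched independently of any particular parser: sample whole trees from the model's distribution, then estimate each arc's marginal probability by its frequency across samples. The toy "model" below, a fixed distribution over two candidate trees for a three-word sentence, is invented for illustration:

```python
import random
from collections import Counter

# Toy stand-in for a transition-based parsing model over the sentence
# ["the", "dog", "barks"] (token indices 1..3, 0 = root). Each tree is a
# frozenset of (head, dependent) arcs; the probabilities are illustrative.
TREES = [
    (frozenset({(0, 3), (3, 2), (2, 1)}), 0.7),   # "the" attaches to "dog"
    (frozenset({(0, 3), (3, 1), (3, 2)}), 0.3),   # flat attachment to "barks"
]

def sample_tree(rng):
    trees, probs = zip(*TREES)
    return rng.choices(trees, weights=probs)[0]

def monte_carlo_arc_marginals(n_samples, seed=0):
    """Estimate P(arc in tree) by counting arcs across sampled parses."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        for arc in sample_tree(rng):
            counts[arc] += 1
    return {arc: c / n_samples for arc, c in counts.items()}

marginals = monte_carlo_arc_marginals(10000)
```

In the paper's setting the samples come from rolling out stochastic transitions of the parser itself, but the marginalization over samples works the same way, and arcs shared by all high-probability trees get marginals near 1 while ambiguous attachments get intermediate values.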
Empower Entity Set Expansion via Language Model Probing
Entity set expansion, aiming at expanding a small seed entity set with new
entities belonging to the same semantic class, is a critical task that benefits
many downstream NLP and IR applications, such as question answering, query
understanding, and taxonomy construction. Existing set expansion methods
bootstrap the seed entity set by adaptively selecting context features and
extracting new entities. A key challenge for entity set expansion is to avoid
selecting ambiguous context features which will shift the class semantics and
lead to accumulative errors in later iterations. In this study, we propose a
novel iterative set expansion framework that leverages automatically generated
class names to address the semantic drift issue. In each iteration, we select
one positive and several negative class names by probing a pre-trained language
model, and further score each candidate entity based on selected class names.
Experiments on two datasets show that our framework generates high-quality
class names and outperforms previous state-of-the-art methods significantly.
Comment: ACL 202
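The class-guided scoring step can be sketched with a simplified stand-in: represent each candidate entity and class name by a bag of context features and rank candidates by how much better they fit the positive class name than the best negative one. The feature sets and class names below are invented for illustration; the paper obtains them by probing a pre-trained language model:

```python
def jaccard(a, b):
    """Set-overlap similarity between two feature bags."""
    return len(a & b) / len(a | b) if a | b else 0.0

def score_candidate(entity_feats, pos_class_feats, neg_class_feats_list):
    """Reward fit to the positive class name, penalize fit to negatives
    (penalizing negatives is what guards against semantic drift)."""
    pos = jaccard(entity_feats, pos_class_feats)
    neg = max((jaccard(entity_feats, n) for n in neg_class_feats_list), default=0.0)
    return pos - neg

# Hypothetical context features for candidate entities.
features = {
    "Illinois": {"state", "governor", "midwest"},
    "Chicago":  {"city", "mayor", "midwest"},
    "Ohio":     {"state", "governor", "senate"},
}
pos = {"state", "governor", "senate", "midwest"}   # positive class: "U.S. states"
negs = [{"city", "mayor", "downtown"}]             # negative class: "cities"

ranking = sorted(features, key=lambda e: score_candidate(features[e], pos, negs),
                 reverse=True)
```

Here "Chicago" shares a context feature with the seed states, so a purely positive score could drift toward it; subtracting its strong match to the negative class name pushes it to the bottom of the ranking.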
A new procedure to analyze RNA non-branching structures
RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with existing software to obtain a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder).
Comparative Opinion Mining: A Review
Opinion mining refers to the use of natural language processing, text
analysis and computational linguistics to identify and extract subjective
information in textual material. Opinion mining, also known as sentiment
analysis, has received a lot of attention in recent times, as it provides a
number of tools to analyse the public opinion on a number of different topics.
Comparative opinion mining is a subfield of opinion mining that deals with
identifying and extracting information that is expressed in a comparative form
(e.g. "paper X is better than Y"). Comparative opinion mining plays a very
important role when one tries to evaluate something, as it provides a
reference point for the comparison. This paper provides a review of the area
of comparative opinion mining. It is the first review that covers this topic
specifically, as all previous reviews dealt mostly with general opinion
mining. The survey covers comparative opinion mining from two different
angles: one from the perspective of techniques and the other from the
perspective of comparative opinion elements. It also incorporates
preprocessing tools as well as datasets that were used by past researchers
and that can be useful to future researchers in the field of comparative
opinion mining.
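The simplest technique-side baseline such a review covers is lexical cue matching for detecting sentences in comparative form. The cue list below is a minimal illustrative sketch, not a method from the review:

```python
import re

# Flag sentences in comparative form by matching comparative cue patterns.
# The cue list is deliberately tiny and illustrative; real systems combine
# larger lexicons with sequence labeling or classification.
COMPARATIVE_CUES = re.compile(
    r"\b(better|worse|faster|slower|more|less|cheaper)\s+than\b",
    re.IGNORECASE,
)

def is_comparative(sentence):
    return bool(COMPARATIVE_CUES.search(sentence))
```

Beyond detection, comparative opinion mining must also extract the comparison's elements (the two entities, the compared aspect, and the preferred entity), which is where the review's second angle comes in.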
Intelligent Agents for Active Malware Analysis
The main contribution of this thesis is to give a novel perspective on Active Malware Analysis, modeled as a decision-making process between intelligent agents. We propose solutions aimed at extracting the behaviors of malware agents with advanced Artificial Intelligence techniques. In particular, we devise novel action selection strategies for the analyzer agents that make it possible to analyze malware by selecting sequences of triggering actions aimed at maximizing the information acquired. The goal is to create informative models representing the behaviors of the malware agents observed while interacting with them during the analysis process. Such models can then be used to effectively compare a malware sample against others and to correctly identify the malware family.
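One standard way to select the triggering action that maximizes information acquired is to pick the action minimizing the expected posterior entropy over malware families. The families, actions, and response model below are hypothetical, invented to make the idea concrete:

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a belief over malware families."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def expected_entropy_after(action, belief, response_model):
    """Expected posterior entropy after triggering `action`, given the
    likelihoods P(response | family, action) in `response_model`."""
    responses = {r for f in belief for r in response_model[f][action]}
    total = 0.0
    for r in responses:
        p_r = sum(belief[f] * response_model[f][action].get(r, 0.0) for f in belief)
        if p_r == 0:
            continue
        # Posterior over families by Bayes' rule, weighted by P(response).
        posterior = {f: belief[f] * response_model[f][action].get(r, 0.0) / p_r
                     for f in belief}
        total += p_r * entropy(posterior)
    return total

def most_informative_action(actions, belief, response_model):
    return min(actions, key=lambda a: expected_entropy_after(a, belief, response_model))

# Hypothetical setup: "open_socket" separates the two families perfectly,
# while "list_files" provokes the same response from both (uninformative).
belief = {"ransomware": 0.5, "spyware": 0.5}
response_model = {
    "ransomware": {"open_socket": {"connect": 1.0}, "list_files": {"scan": 1.0}},
    "spyware":    {"open_socket": {"silent": 1.0},  "list_files": {"scan": 1.0}},
}
best_action = most_informative_action(["open_socket", "list_files"], belief, response_model)
```

Repeating this greedy choice and updating the belief after each observed response yields a sequence of triggering actions that drives the family posterior toward certainty.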
Probabilistic Clustering of Sequences: Inferring new bacterial regulons by comparative genomics
Genome wide comparisons between enteric bacteria yield large sets of
conserved putative regulatory sites on a gene by gene basis that need to be
clustered into regulons. Using the assumption that regulatory sites can be
represented as samples from weight matrices we derive a unique probability
distribution for assignments of sites into clusters. Our algorithm, 'PROCSE'
(probabilistic clustering of sequences), uses Monte-Carlo sampling of this
distribution to partition and align thousands of short DNA sequences into
clusters. The algorithm internally determines the number of clusters from the
data, and assigns significance to the resulting clusters. We place theoretical
limits on the ability of any algorithm to correctly cluster sequences drawn
from weight matrices (WMs) when these WMs are unknown. Our analysis suggests
that the set of all putative sites for a single genome (e.g. E. coli) is
largely inadequate for clustering. When sites from different genomes are
combined and all the homologous sites from the various species are used as a
block, clustering becomes feasible. We predict 50-100 new regulons as well as
many new members of existing regulons, potentially doubling the number of known
regulatory sites in E. coli.
Comment: 27 pages including 9 figures and 3 tables
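The weight-matrix representation at the heart of this clustering can be sketched as follows: build a position weight matrix (PWM) from the aligned sites of a cluster, then score a candidate site by its log-odds under the PWM versus a uniform background. The toy binding sites below are invented for illustration:

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def pwm_from_sites(sites, pseudocount=1.0):
    """Column-wise base frequencies over aligned sites, with pseudocounts
    so unseen bases keep nonzero probability."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(s[pos] for s in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def log_odds(site, pwm, background=0.25):
    """Log-likelihood ratio of the site under the PWM vs. uniform background."""
    return sum(math.log(pwm[i][b] / background) for i, b in enumerate(site))

cluster = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]   # toy aligned sites
pwm = pwm_from_sites(cluster)
```

A Monte Carlo clustering scheme in this spirit repeatedly proposes moving a site between clusters and accepts or rejects the move according to how the cluster likelihoods, computed from PWMs like the one above, change.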