8,216 research outputs found
Expert System for Crop Disease based on Graph Pattern Matching: A proposal
Para la agroindustria, las enfermedades en cultivos constituyen uno de los problemas más frecuentes que generan grandes pérdidas económicas y baja calidad en la producción. Por otro lado, desde las ciencias de la computación, han surgido diferentes herramientas cuya finalidad es mejorar la prevención y el tratamiento de estas enfermedades. En este sentido, investigaciones recientes proponen el desarrollo de sistemas expertos para resolver este problema haciendo uso de técnicas de minería de datos e inteligencia artificial, como inferencia basada en reglas, árboles de decisión, redes bayesianas, entre otras. Además, los grafos pueden ser usados para el almacenamiento de los diferentes tipos de variables que se encuentran presentes en un ambiente de cultivos, permitiendo la aplicación de técnicas de minería de datos en grafos, como el emparejamiento de patrones en los mismos. En este artículo presentamos una visión general de las temáticas mencionadas y una propuesta de un sistema experto para enfermedades en cultivos, basado en emparejamiento de patrones en grafos.For agroindustry, crop diseases constitute one of the most common problems that generate large economic losses and low production quality. On the other hand, from computer science, several tools have emerged in order to improve the prevention and treatment of these diseases. In this sense, recent research proposes the development of expert systems to solve this problem, making use of data mining and artificial intelligence techniques like rule-based inference, decision trees, Bayesian network, among others. Furthermore, graphs can be used for storage of different types of variables that are present in an environment of crops, allowing the application of graph data mining techniques like graph pattern matching. Therefore, in this paper we present an overview of the above issues and a proposal of an expert system for crop disease based on graph pattern matching
Performance Guarantees for Distributed Reachability Queries
In the real world a graph is often fragmented and distributed across
different sites. This highlights the need for evaluating queries on distributed
graphs. This paper proposes distributed evaluation algorithms for three classes
of queries: reachability for determining whether one node can reach another,
bounded reachability for deciding whether there exists a path of a bounded
length between a pair of nodes, and regular reachability for checking whether
there exists a path connecting two nodes such that the node labels on the path
form a string in a given regular expression. We develop these algorithms based
on partial evaluation, to explore parallel computation. When evaluating a query
Q on a distributed graph G, we show that these algorithms possess the following
performance guarantees, no matter how G is fragmented and distributed: (1) each
site is visited only once; (2) the total network traffic is determined by the
size of Q and the fragmentation of G, independent of the size of G; and (3) the
response time is decided by the largest fragment of G rather than the entire G.
In addition, we show that these algorithms can be readily implemented in the
MapReduce framework. Using synthetic and real-life data, we experimentally
verify that these algorithms are scalable on large graphs, regardless of how
the graphs are distributed.Comment: VLDB201
Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
This paper describes the winning entry to the IJCNN 2011 Social Network
Challenge run by Kaggle.com. The goal of the contest was to promote research on
real-world link prediction, and the dataset was a graph obtained by crawling
the popular Flickr social photo sharing website, with user identities scrubbed.
By de-anonymizing much of the competition test set using our own Flickr crawl,
we were able to effectively game the competition. Our attack represents a new
application of de-anonymization to gaming machine learning contests, suggesting
changes in how future competitions should be run.
We introduce a new simulated annealing-based weighted graph matching
algorithm for the seeding step of de-anonymization. We also show how to combine
de-anonymization with link prediction---the latter is required to achieve good
performance on the portion of the test set not de-anonymized---for example by
training the predictor on the de-anonymized portion of the test set, and
combining probabilistic predictions from de-anonymization and link prediction.Comment: 11 pages, 13 figures; submitted to IJCNN'201
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge exhibits itself is
through keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, leading
up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
View Selection in Semantic Web Databases
We consider the setting of a Semantic Web database, containing both explicit
data encoded in RDF triples, and implicit data, implied by the RDF semantics.
Based on a query workload, we address the problem of selecting a set of views
to be materialized in the database, minimizing a combination of query
processing, view storage, and view maintenance costs. Starting from an existing
relational view selection method, we devise new algorithms for recommending
view sets, and show that they scale significantly beyond the existing
relational ones when adapted to the RDF context. To account for implicit
triples in query answers, we propose a novel RDF query reformulation algorithm
and an innovative way of incorporating it into view selection in order to avoid
a combinatorial explosion in the complexity of the selection process. The
interest of our techniques is demonstrated through a set of experiments.Comment: VLDB201
Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty
There is a growing need for methods which can capture uncertainties and
answer queries over graph-structured data. Two common types of uncertainty are
uncertainty over the attribute values of nodes and uncertainty over the
existence of edges. In this paper, we combine those with identity uncertainty.
Identity uncertainty represents uncertainty over the mapping from objects
mentioned in the data, or references, to the underlying real-world entities. We
propose the notion of a probabilistic entity graph (PEG), a probabilistic graph
model that defines a distribution over possible graphs at the entity level. The
model takes into account node attribute uncertainty, edge existence
uncertainty, and identity uncertainty, and thus enables us to systematically
reason about all three types of uncertainties in a uniform manner. We introduce
a general framework for constructing a PEG given uncertain data at the
reference level and develop highly efficient algorithms to answer subgraph
pattern matching queries in this setting. Our algorithms are based on two novel
ideas: context-aware path indexing and reduction by join-candidates, which
drastically reduce the query search space. A comprehensive experimental
evaluation shows that our approach outperforms baseline implementations by
orders of magnitude
- …