On The Effect of Hyperedge Weights On Hypergraph Learning
Hypergraph is a powerful representation in several computer vision, machine
learning and pattern recognition problems. In the last decade, many researchers
have been keen to develop different hypergraph models. In contrast, not much
attention has been paid to the design of hyperedge weights. However, many
studies on pairwise graphs show that the choice of edge weight can
significantly influence the performance of such graph algorithms. We argue
that this also applies to hypergraphs. In this paper, we empirically examine the
influence of hyperedge weights on hypergraph learning by proposing three novel
hyperedge weights from the perspectives of geometry, multivariate statistical
analysis and linear regression. Extensive experiments on ORL, COIL20, JAFFE,
Sheffield, Scene15 and Caltech256 databases verify our hypothesis. As in
graph learning, our experimental studies identify several representative
hyperedge weighting schemes. Moreover, the experiments also
demonstrate that combining such weighting schemes with conventional
hypergraph models yields very promising classification and clustering
performance in comparison with some recent state-of-the-art algorithms.
A Scalable Graph-Coarsening Based Index for Dynamic Graph Databases
The graph is a commonly used data structure for modeling complex data such as chemical molecules, images, social networks, and XML documents. This complex data is stored using a set of graphs, known as a graph database D. To speed up query answering on graph databases, indexes are commonly used. State-of-the-art graph database indexes do not adapt or scale well to dynamic graph database use; they are static, and their ability to prune possible search responses to meet user needs worsens over time as databases change and grow. Users can re-mine indexes to gain some improvement, but doing so is time-consuming. Users must also tune numerous parameters on an ongoing basis to optimize performance and can inadvertently worsen the query response time if they do not choose parameters wisely. Recently, a one-pass algorithm has been developed to enhance the performance of these indexes in part by using the algorithm to update them regularly. However, there are some drawbacks, most notably the need to make updates as the query workload changes.
We propose a new index based on graph-coarsening to speed up query answering time in dynamic graph databases. Our index is parameter-free, query-independent, scalable, small enough to store in the main memory, and is simpler and less costly to maintain for database updates.
We conducted an extensive set of experiments on two types of databases, i.e., chemical and social network databases, to compare our graph-coarsening based index vs. hybrid-indexes as follows. First, we considered no database updates or query workload changes (static graph databases) and compared the indexes according to query answering time and index size for different minSup values. Second, we compared the indexes in the case of dynamic graph databases, i.e., when graphs are added to or removed from the database. Third, we compared the indexes with regard to query workload changes. Fourth, we studied the scalability of our index vs. hybrid-indexes.
Experimental results show that our index outperforms hybrid-indexes (i.e. indexes updated with one-pass) for query answering time in the case of social network databases, and is comparable with these indexes for frequent and infrequent queries on chemical databases. Our graph-coarsening index can be updated up to 60 times faster in comparison to one-pass on dynamic graph databases. Moreover, our index is independent of the query workload for index update and is up to 15 times better after hybrid indexes are attuned to query workload for social network databases.
This work is also published in the 26th ACM International Conference on Information and Knowledge Management (CIKM), held in Singapore [18].
Data integration for biological network databases: MetNetDB labeled graph model and graph matching algorithm
Understanding the cellular functions of genes requires investigating a variety of biological data, including experimental data, annotation from online databases and literature, information about cellular interactions, and domain knowledge from biologists. These requirements demand a flexible and powerful biological data management system. MetNetDB is the biological database component of the MetNet platform (http://metnetdb.org/), a software platform for Arabidopsis systems biology. This work describes a labeled graph model that addresses the challenges associated with biological network databases, and discusses the implementation of this model in MetNetDB.
MetNetDB integrates the most recent data from various sources, including biological networks, gene annotation, metabolite information, and protein localization data. The integration consists of four steps: data model transformation and integration; semantic mapping; data conversion and integration; and conflict resolution. MetNetDB is established as a labeled graph model. The graph structure supports network data storage and the application of graph analysis algorithms. The node and edge labels have the same extension capability as an object data model. In addition, rules are used to guarantee the integrity of the biological network data, and operations are defined for graph editing and comparison.
To facilitate the integration of network data, which is often inaccurate or incomplete, a subgraph extraction algorithm is designed for MetNetDB. This algorithm allows subgraph querying based on user-specified biomolecules. Both exact matching and approximate matching with biomolecules in networks are supported. The similarity among biomolecules is inferred from expression patterns, gene ontology, chemical ontology, and protein-gene relationships. Combined with the implementation of Messmer's approximate subgraph isomorphism algorithm, MetNetDB supports exact and approximate graph matching.
Based on the MetNetDB labeled graph model and the graph matching algorithms, the MetNetDB curator tool is built with several innovative features, including active biological rule checking during network curation, tracking of data change history, and a biologist-friendly visual graph query system.
Graph-based data management system for efficient information storage, retrieval and processing
Data management systems rely on a correct design of data representation and software components. The data representation scheme plays a vital role in how the data are stored, which influences the efficiency of their processing and retrieval. The design of the system components applies software engineering concepts to achieve qualities such as scalability, efficiency, flexibility, maintainability, and extendibility. This paper presents a data management system that uses a graph-based data representation scheme to achieve efficient data retrieval when using graph-based databases. Input data are transformed into vertices, edges, and labels as they are inserted into the database. The proposed system consists of three layers: the system beans layer, the data access layer, and the database engine. Healthcare data are used to evaluate the system in comparison with resource description framework (RDF) semantics. Extensive experiments are conducted to compare different scenarios of data storage and retrieval using Neo4J, OrientDB, and RDF4J. Experimental results show that the proposed graph-based approach outperforms the RDF4J framework in terms of insertion and retrieval time.
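The transformation of input data into vertices, edges, and labels can be illustrated with a minimal sketch. Everything named here (the `PropertyGraph` class, the `insert_record` helper, the sample patient record, and the `has_` edge-label convention) is a hypothetical illustration and is not taken from the paper, which relies on engines such as Neo4J and OrientDB rather than an in-memory structure.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Hypothetical in-memory labeled graph (not the paper's implementation)."""
    vertices: dict = field(default_factory=dict)  # vertex id -> vertex label
    edges: list = field(default_factory=list)     # (source id, edge label, target id)

    def add_vertex(self, vertex_id, label):
        self.vertices[vertex_id] = label

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def neighbors(self, source, label):
        # Retrieval by following edges with a given label from one vertex.
        return [t for s, l, t in self.edges if s == source and l == label]


def insert_record(graph, record):
    """Turn one flat record into a root vertex plus one labeled vertex per field."""
    root_id = f"patient:{record['id']}"
    graph.add_vertex(root_id, "Patient")
    for key, value in record.items():
        if key == "id":
            continue
        value_id = f"{key}:{value}"
        graph.add_vertex(value_id, key)                 # vertex labeled by field name
        graph.add_edge(root_id, f"has_{key}", value_id)  # edge labeled by relation
    return root_id


graph = PropertyGraph()
pid = insert_record(graph, {"id": 7, "diagnosis": "asthma", "ward": "B2"})
print(graph.neighbors(pid, "has_diagnosis"))
```

In this sketch a lookup follows labeled edges from a known vertex instead of scanning records, which is the general idea behind graph-based retrieval; a real graph database would add indexing and persistence.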
Robust Principal Component Analysis on Graphs
Principal Component Analysis (PCA) is the most widely used tool for linear
dimensionality reduction and clustering. Still, it is highly sensitive to
outliers and does not scale well with respect to the number of data samples.
Robust PCA solves the first issue with a sparse penalty term. The second issue
can be handled with the matrix factorization model, which is however
non-convex. Besides, PCA-based clustering can also be enhanced by using a graph
of data similarity. In this article, we introduce a new model called "Robust
PCA on Graphs" which incorporates spectral graph regularization into the Robust
PCA framework. Our proposed model benefits from 1) the robustness of principal
components to occlusions and missing values, 2) enhanced low-rank recovery, 3)
improved clustering property due to the graph smoothness assumption on the
low-rank matrix, and 4) convexity of the resulting optimization problem.
Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with
corruptions clearly reveal that our model outperforms 10 other state-of-the-art
models in clustering and low-rank recovery tasks.
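As a sketch of what such a model can look like, one way to write a Robust PCA objective with a spectral graph regularizer is given below. The notation is an assumption and does not come from the abstract: $X$ is the data matrix with samples as columns, $L$ is the low-rank component, $S$ is the sparse component, $\Phi$ is the Laplacian of a similarity graph built over the samples, and $\lambda, \gamma > 0$ are trade-off weights.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1} + \gamma \operatorname{tr}\!\left(L \Phi L^{\top}\right)
\quad \text{subject to} \quad X = L + S
```

Under these assumptions the nuclear norm encourages low rank, the $\ell_1$ term absorbs sparse corruptions, and the trace term penalizes low-rank representations that vary sharply between neighboring samples in the graph. Because $\Phi$ is positive semidefinite, every term is convex, which is consistent with the convexity claimed in the abstract.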
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. This challenge manifests in two specific ways:
keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system, Templar, that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, leading
to up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information. Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
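To make the join path inference task concrete, the following minimal sketch finds the shortest chain of relations linking two tables in a schema graph. It is a plain breadth-first search baseline, not Templar's method, and it uses no query log; the `join_path` function and the four-table schema are hypothetical examples that do not appear in the paper.

```python
from collections import deque


def join_path(schema, start, goal):
    """Breadth-first search for the shortest chain of joinable relations."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for nxt in sorted(schema.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two relations are not connected


# Hypothetical schema: each relation maps to the relations it can join with.
schema = {
    "author": {"writes"},
    "writes": {"author", "publication"},
    "publication": {"writes", "venue"},
    "venue": {"publication"},
}

path = join_path(schema, "author", "venue")
print(path)  # ['author', 'writes', 'publication', 'venue']
```

A shortest-path baseline like this picks the intermediate tables that a user cannot name explicitly, but it treats all joins as equally likely. Evidence from past queries is the kind of signal that could be used to prefer one candidate path over another.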