2,151 research outputs found

    Keyword search in graphs, relational databases and social networks

    Get PDF
    Keyword search, a well known mechanism for retrieving relevant information from a set of documents, has recently been studied for extracting information from structured data (e.g., relational databases and XML documents). It offers an alternative way to query languages (e.g., SQL) to explore databases, which is effective for lay users who may not be familiar with the database schema or the query language. This dissertation addresses some issues in keyword search in structured data. Namely, novel solutions to existing problems in keyword search in graphs or relational databases are proposed. In addition, a problem related to graph keyword search, team formation in social networks, is studied. The dissertation consists of four parts. The first part addresses keyword search over a graph which finds a substructure of the graph containing all or some of the query keywords. Current methods for keyword search over graphs may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, current methods explore both content and non-content nodes while searching for the result and are thus both time and memory consuming for large graphs. To address the above problems, we propose algorithms for finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and the distance between each pair of nodes is less than or equal to r. Two approximation algorithms that produce r-cliques with a bounded approximation ratio in polynomial delay are proposed. In the second part, the problem of duplication-free and minimal keyword search in graphs is studied. Current methods for keyword search in graphs may produce duplicate answers that contain the same set of content nodes. In addition, an answer found by these methods may not be minimal in the sense that some of the nodes in the answer may contain query keywords that are all covered by other nodes in the answer. Removing these nodes does not change the coverage of the answer but can make the answer more compact. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Meaningful keyword search in relational databases is the subject of the third part of this dissertation. Keyword search over relational databases returns a join tree spanning tuples containing the query keywords. As many answers of varying quality can be found, and the user is often only interested in seeing the·top-k answers, how to gauge the relevance of answers to rank them is of paramount importance. This becomes more pertinent for databases with large and complex schemas. We focus on the relevance of join trees as the fundamental means to rank the answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. The problem of keyword search over graph data is similar to the problem of team formation in social networks. In this setting, keywords represent skills and the nodes in a graph represent the experts that possess skills. Given an expert network, in which a node represents an expert that has a cost for using the expert service and an edge represents the communication cost between the two corresponding experts, we tackle the problem of finding a team of experts that covers a set of required skills and also minimizes the communication cost as well as the personnel cost of the team. We propose two types of approximation algorithms to solve this bi-criteria problem in the fourth part of this dissertation

    Minmax regret combinatorial optimization problems: an Algorithmic Perspective

    Get PDF
    Candia-Vejar, A (reprint author), Univ Talca, Modeling & Ind Management Dept, Curico, Chile.Uncertainty in optimization is not a new ingredient. Diverse models considering uncertainty have been developed over the last 40 years. In our paper we essentially discuss a particular uncertainty model associated with combinatorial optimization problems, developed in the 90's and broadly studied in the past years. This approach named minmax regret (in particular our emphasis is on the robust deviation criteria) is different from the classical approach for handling uncertainty, stochastic approach, where uncertainty is modeled by assumed probability distributions over the space of all possible scenarios and the objective is to find a solution with good probabilistic performance. In the minmax regret (MMR) approach, the set of all possible scenarios is described deterministically, and the search is for a solution that performs reasonably well for all scenarios, i.e., that has the best worst-case performance. In this paper we discuss the computational complexity of some classic combinatorial optimization problems using MMR. approach, analyze the design of several algorithms for these problems, suggest the study of some specific research problems in this attractive area, and also discuss some applications using this model

    Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

    Get PDF
    Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM

    GRASP/VND Optimization Algorithms for Hard Combinatorial Problems

    Get PDF
    Two hard combinatorial problems are addressed in this thesis. The first one is known as the ”Max CutClique”, a combinatorial problem introduced by P. Martins in 2012. Given a simple graph, the goal is to find a clique C such that the number of links shared between C and its complement C C is maximum. In a first contribution, a GRASP/VND methodology is proposed to tackle the problem. In a second one, the N P-Completeness of the problem is mathematically proved. Finally, a further generalization with weighted links is formally presented with a mathematical programming formulation, and the previous GRASP is adapted to the new problem. The second problem under study is a celebrated optimization problem coming from network reliability analysis. We assume a graph G with perfect nodes and imperfect links, that fail independently with identical probability ρ ∈ [0,1]. The reliability RG(ρ), is the probability that the resulting subgraph has some spanning tree. Given a number of nodes and links, p and q, the goal is to find the (p,q)-graph that has the maximum reliability RG(ρ), uniformly in the compact set ρ ∈ [0,1]. In a first contribution, we exploit properties shared by all uniformly most-reliable graphs such as maximum connectivity and maximum Kirchhoff number, in order to build a novel GRASP/VND methodology. Our proposal finds the globally optimum solution under small cases, and it returns novel candidates of uniformly most-reliable graphs, such as Kantor-Mobius and Heawood graphs. We also offer a literature review, š and a mathematical proof that the bipartite graph K4,4 is uniformly most-reliable. Finally, an abstract mathematical model of Stochastic Binary Systems (SBS) is also studied. It is a further generalization of network reliability models, where failures are modelled by a general logical function. A geometrical approximation of a logical function is offered, as well as a novel method to find reliability bounds for general SBS. This bounding method combines an algebraic duality, Markov inequality and Hahn-Banach separation theorem between convex and compact sets

    Survey of Vector Database Management Systems

    Full text link
    There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search a staggering half century and more. Driving this shift from algorithms to systems are new data intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs, however there is no comprehensive survey to thoroughly review these techniques and systems. We start by identifying five main obstacles to vector data management, namely vagueness of semantic similarity, large size of vectors, high cost of similarity comparison, lack of natural partitioning that can be used for indexing, and difficulty of efficiently answering hybrid queries that require both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learning partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work.Comment: 25 page
    • 

    corecore