46 research outputs found

    Risk-Averse Matchings over Uncertain Graph Databases

    A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs, also introduced in this work. Our model generalizes probabilistic models used in prior work and captures both continuous and discrete probability distributions, which allows it to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For uncertain weighted graphs, we provide a $\frac{1}{3}$-approximation algorithm and a $\frac{1}{5}$-approximation algorithm with near-optimal run time. For uncertain weighted hypergraphs, we provide an $\Omega(\frac{1}{k})$-approximation algorithm, where $k$ is the rank of the hypergraph (i.e., any hyperedge includes at most $k$ nodes), that runs in almost linear time (modulo log factors). We complement our theoretical results by testing our approximation algorithms in a wide variety of synthetic experiments, where we observe, in a controlled setting, interesting findings on the trade-off between reward and risk. We also apply our formulation to recommend teams that are likely to collaborate and have high impact. Comment: 25 pages

    Reviewing the integration of patient data: how systems are evolving in practice to meet patient needs

    Background: The integration of Information Systems (IS) is essential to support shared care and to provide consistent, patient-centred care to individuals. This paper identifies, appraises and summarises studies examining different approaches to integrating patient data from heterogeneous IS. Methods: Literature published between 1995 and 2005 was systematically reviewed to identify articles mentioning patient records, computers and data integration or sharing. Results: Of 3124 articles, 84 were included, describing 56 distinct projects. Most of the projects were on a regional scale. Integration was most commonly accomplished by messaging with pre-defined templates and middleware solutions. HL7 was the most widely used messaging standard. Direct database access and web services were the most common communication methods. The user interface for most systems was a Web browser. Regarding the type of medical data shared, 77% of projects integrated diagnosis and problems, 67% medical images and 65% lab results. More recently, significantly more IS are extending to primary care and integrating referral letters. Conclusion: Information Systems are evolving to meet people's needs by implementing regional networks, allowing patient access, and integrating ever more items of patient data. Many distinct technological solutions coexist to integrate patient data, using differing standards and data architectures, which may hinder further interoperability.

    Nearest neighbor retrieval using distance-based hashing

    A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as Locality Sensitive Hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multibit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
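    The abstract does not spell out how the binary hash functions are built. As a rough illustration only, the sketch below shows one way a binary hash function could be derived from nothing but a distance measure: project an object onto the "line" through two pivot objects and threshold the result at the median over a sample. The names `line_projection` and `make_binary_hash`, the pivot-pair projection, and the single median threshold are all assumptions made for this example, not the construction reported in the paper.

```python
def line_projection(dist, x, p1, p2):
    """Pseudo-projection of x onto the 'line' through pivots p1 and p2.

    Uses only pairwise distances, so it works for any distance function
    (assumed here to satisfy dist(p1, p2) > 0).
    """
    d12 = dist(p1, p2)
    return (dist(x, p1) ** 2 + d12 ** 2 - dist(x, p2) ** 2) / (2 * d12)


def make_binary_hash(dist, p1, p2, sample):
    """Build a 0/1 hash function from a pivot pair and a data sample.

    Assumption for this sketch: the threshold is the median projection of
    the sample, so roughly half of the sample hashes to each bit value.
    """
    values = sorted(line_projection(dist, s, p1, p2) for s in sample)
    threshold = values[len(values) // 2]

    def hash_fn(x):
        return 1 if line_projection(dist, x, p1, p2) >= threshold else 0

    return hash_fn


# Toy usage with absolute difference as the distance measure.
abs_dist = lambda a, b: abs(a - b)
h = make_binary_hash(abs_dist, 0, 10, sample=[1, 2, 3, 4, 5, 6, 7, 8])
bits = [h(2), h(5), h(9)]
```

    Concatenating several such bits, each built from a different pivot pair, would give a multibit key for one hash table.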

    Modeling susceptibility to periodontitis

    Chronic inflammatory diseases like periodontitis have a complex pathogenesis and a multifactorial etiology, involving complex interactions between multiple genetic loci and infectious agents. We aimed to investigate the influence of genetic polymorphisms and bacteria on chronic periodontitis risk. We determined the prevalence of 12 single-nucleotide polymorphisms (SNPs) in immune response candidate genes and 7 bacterial species of potential relevance to periodontitis etiology, in chronic periodontitis patients and non-periodontitis control individuals (N = 385). Using decision tree analysis, we identified the presence of bacterial species Tannerella forsythia, Porphyromonas gingivalis, Aggregatibacter actinomycetemcomitans, and SNPs TNF -857 and IL-1A -889 as discriminators between periodontitis and non-periodontitis. The model reached an accuracy of 80%, sensitivity of 85%, specificity of 73%, and AUC of 73%. This pilot study shows that, on the basis of 3 periodontal pathogens and SNPs, patterns may be recognized to identify patients at risk for periodontitis. Modern bioinformatics tools are valuable in modeling the multifactorial and complex nature of periodontitis.

    Embedding-based subsequence matching in time-series databases

    We propose an embedding-based framework for subsequence matching in time-series databases that improves the efficiency of processing subsequence matching queries under the Dynamic Time Warping (DTW) distance measure. This framework partially reduces subsequence matching to vector matching, using an embedding that maps each query sequence to a vector and each database time series into a sequence of vectors. The database embedding is computed offline, as a preprocessing step. At runtime, given a query object, an embedding of that object is computed online. Relatively few areas of interest are efficiently identified in the database sequences by comparing the embedding of the query with the database vectors. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. We apply the proposed framework to define two specific methods. The first method focuses on time-series subsequence matching under unconstrained Dynamic Time Warping. The second method targets subsequence matching under constrained Dynamic Time Warping (cDTW), where warping paths are not allowed to stray too much off the diagonal. In our experiments, good trade-offs between retrieval accuracy and retrieval efficiency are obtained for both methods, and the results are competitive with respect to current state-of-the-art methods. © 2011 ACM
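    For readers unfamiliar with the distance measure named in this abstract, here is a minimal sketch of the standard dynamic-programming computation of DTW between two whole sequences, with an optional band that keeps the warping path near the diagonal (the constrained variant). It does not implement the embedding framework or the subsequence search described above. The function name `dtw_distance`, the `window` parameter, and the absolute-difference local cost are choices made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW cost between numeric sequences a and b.

    window=None gives unconstrained DTW. window=w restricts the warping
    path to cells with |i - j| <= w (a band around the diagonal); the band
    is widened to |len(a) - len(b)| when needed so a path always exists.
    """
    n, m = len(a), len(b)
    if window is not None:
        window = max(window, abs(n - m))

    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        lo = 1 if window is None else max(1, i - window)
        hi = m if window is None else min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost: absolute difference
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


x = [0, 1, 1, 1]
y = [0, 0, 0, 1]
unconstrained = dtw_distance(x, y)       # free warping
banded = dtw_distance(x, y, window=1)    # path within one cell of diagonal
diagonal = dtw_distance(x, y, window=0)  # no warping at all
```

    On this toy pair, tightening the band can only keep the cost the same or raise it, which is the accuracy-versus-cost trade-off the constrained variant accepts.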