The Flexible Group Spatial Keyword Query

Author: D Papadias
G Cong
GR Hjaltason
K Yao
ME Ali
N Roussopoulos
X Cao
Z Li
Publication date: 24/04/2017

We present a new class of service for location-based social networks, called the Flexible Group Spatial Keyword Query, which enables a group of users to collectively find a point of interest (POI) that optimizes an aggregate cost function combining both spatial distances and keyword similarities. In addition, our query service allows users to consider the tradeoffs between obtaining a sub-optimal solution for the entire group and obtaining an optimized solution for only a subgroup. We propose algorithms to process three variants of the query: (i) the group nearest neighbor with keywords query, which finds a POI that optimizes the aggregate cost function for the whole group of size n, (ii) the subgroup nearest neighbor with keywords query, which finds the optimal subgroup and a POI that optimizes the aggregate cost function for a given subgroup size m (m <= n), and (iii) the multiple subgroup nearest neighbor with keywords query, which finds optimal subgroups and corresponding POIs for each of the subgroup sizes in the range [m, n]. We design query processing algorithms based on branch-and-bound and best-first paradigms. Finally, we provide theoretical bounds and conduct extensive experiments with two real datasets, which verify the effectiveness and efficiency of the proposed algorithms.

Comment: 12 pages
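To make the query variants in the abstract above concrete, here is a minimal brute-force sketch. It is not the branch-and-bound or best-first algorithm the authors propose. The cost function is an assumption: a weighted sum of Euclidean distance and Jaccard keyword dissimilarity, aggregated by summation, with a hypothetical weight `alpha`. The record layout (`loc`, `keywords`, `name`) is likewise invented for illustration.

```python
import math

def cost(user, poi, alpha=0.5):
    """Assumed per-user cost: alpha * distance + (1 - alpha) * keyword dissimilarity."""
    dist = math.dist(user["loc"], poi["loc"])
    uk, pk = set(user["keywords"]), set(poi["keywords"])
    union = uk | pk
    jaccard = len(uk & pk) / len(union) if union else 0.0
    return alpha * dist + (1 - alpha) * (1 - jaccard)

def group_nn(users, pois, alpha=0.5):
    """Variant (i): the POI minimizing the summed cost over the whole group."""
    return min(pois, key=lambda p: sum(cost(u, p, alpha) for u in users))

def subgroup_nn(users, pois, m, alpha=0.5):
    """Variant (ii): best POI and best subgroup of size m.

    Under a sum aggregate, the best subgroup for a fixed POI is simply
    the m users with the smallest individual costs.
    """
    best = None
    for p in pois:
        ranked = sorted((cost(u, p, alpha), i) for i, u in enumerate(users))[:m]
        total = sum(c for c, _ in ranked)
        if best is None or total < best[0]:
            best = (total, p, sorted(i for _, i in ranked))
    return best  # (aggregate cost, POI, indices of chosen users)

users = [
    {"loc": (0, 0), "keywords": ["pizza"]},
    {"loc": (2, 0), "keywords": ["pizza"]},
    {"loc": (10, 0), "keywords": ["sushi"]},
]
pois = [
    {"name": "A", "loc": (1, 0), "keywords": ["pizza"]},
    {"name": "B", "loc": (9, 0), "keywords": ["sushi"]},
]
whole_group_choice = group_nn(users, pois)
pair_total, pair_poi, pair_members = subgroup_nn(users, pois, m=2)
```

Variant (iii) would repeat `subgroup_nn` for every size in [m, n]. The exhaustive scan costs time proportional to the number of POIs times the number of users, which is the cost an index-based method aims to avoid.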

arXiv.org e-Print Archive

Crossref

Bulk Insertions into xBR+-trees

Author: G Roumelis
GR Hjaltason
L Arge
L Chen
R Choubey
S Shekhar
T Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole rather than inserting them one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a new index from scratch when the dataset to be indexed is available beforehand. The xBR+-tree is a balanced, disk-resident, Quadtree-based index for point data, which is very efficient for processing spatial queries. In this paper, we present the first algorithm for bulk insertion into xBR+-trees. This algorithm incorporates extensions of techniques that we have recently developed for bulk loading xBR+-trees. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. inserting items one-by-one for updating xBR+-trees, regarding performance (I/O and execution time) and the characteristics of the resulting trees. We also present experimental results regarding the query-processing efficiency of xBR+-trees built by bulk insertions vs. xBR+-trees built by inserting items one-by-one.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

Spatial Queries in the Presence of Obstacles

Author: E. Dijkstra
E. Welzl
G. Hjaltason
M. de Berg
T. Asano
T. Lozano-Pérez
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Institutional Knowledge at Singapore Management University

Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

Author: B Naidan
DD Lewis
DM Blei
DW Jacobs
E Chávez
G Chechik
GR Hjaltason
GT Toussaint
H Samet
L Boytsov
M Aumüller
S Kullback
S Robertson
T Skopal
Y Malkov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/10/2019

We demonstrate that a graph-based search algorithm, relying on the construction of an approximate neighborhood graph, can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves the way to a new line of research on designing index-specific graph-construction distance functions.

arXiv.org e-Print Archive

Crossref

Efficient k-nearest neighbor searching in nonordered discrete data spaces

Author: Berchtold S.
Ciaccia P.
Dashiell Kolbe
Guttman A.
Hjaltason G.
Kolahdouzan M.
Kolbe D.
Korn F.
Qian G.
Qiang Zhu
Sakti Pramanik
Seidl T.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Anonymisation of geographical distance matrices via Lipschitz embedding

Author: AS Whittemore
BS Everitt
CD Lloyd
DR Helsel
G Duncan
GR Hjaltason
GT Duncan
H-W Jung
J Bourgain
J Höhne
J Konc
JJ Trinckes
K El Emam
K Kenthapadi
K Riesen
KC Clarke
KH Hampton
L Sweeney
LA Waller
M Kroll
Martin Kroll
MM Merener
MP Armstrong
MP Gutmann
Rainer Schnell
RS Bivand
S Dray
SC Wieland
T Dalenius
Ö Uzuner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

BACKGROUND: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas the research focus has been on the anonymisation of point locations, the disclosure risk arising from the publishing of inter-point distances and corresponding anonymisation methods have not been studied systematically. METHODS: We propose a new anonymisation method for the release of geographical distances between records of a microdata file, for example patients in a medical database. We discuss a data release scheme in which microdata without coordinates and an additional distance matrix between the corresponding rows of the microdata set are released. In contrast to most other approaches, this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding. RESULTS: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters. CONCLUSION: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific-use-files and as an additional tool for record-linkage studies.
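The abstract above does not say which variant of Lipschitz embedding the authors use, so the following sketch shows only the textbook form, under stated assumptions. Each record is mapped to a vector whose j-th coordinate is its distance to the nearest member of a randomly chosen reference set, and the released matrix holds Chebyshev (maximum-coordinate) distances between those vectors. The function names and the parameters `num_sets`, `set_size` and `seed` are hypothetical.

```python
import math
import random

def lipschitz_embed(D, num_sets=4, set_size=2, seed=0):
    """Map each record i to [min distance from i to reference set A_j] for each j."""
    rng = random.Random(seed)
    n = len(D)
    ref_sets = [rng.sample(range(n), set_size) for _ in range(num_sets)]
    return [[min(D[i][a] for a in ref) for ref in ref_sets] for i in range(n)]

def embedded_distances(E):
    """Chebyshev distance between every pair of embedded vectors."""
    n = len(E)
    return [
        [max(abs(a - b) for a, b in zip(E[i], E[j])) for j in range(n)]
        for i in range(n)
    ]

# Illustrative coordinates only; in the release scheme they would stay private.
points = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 1)]
D = [[math.dist(p, q) for q in points] for p in points]
released = embedded_distances(lipschitz_embed(D))
```

Because every coordinate is 1-Lipschitz with respect to the original metric, the Chebyshev distance between two embedded vectors never exceeds their true distance. The modified matrix therefore perturbs the distances without inflating any of them.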

City Research Online

Crossref

Springer - Publisher Connector

PubMed Central

Using metric space indexing for complete and efficient record linkage

Author: A Reid
B Ramadan
C Li
D Hand
G Papadakis
GR Hjaltason
H Newcombe
IP Fellegi
L Bo
P Christen
P Zezula
Q Wang
R Connor
RC Steorts
V Levenshtein
XL Dong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.

Postprint
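The idea behind metric-space linkage can be sketched with a single pivot and the triangle inequality. This is an illustrative simplification, not the index structure evaluated in the paper. It assumes records are plain strings compared by Levenshtein distance, and that a pair is a link when its distance is at most a chosen `threshold`. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming; it satisfies the metric axioms."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def link_records(left, right, threshold):
    """Return all (l, r) pairs with distance <= threshold, pruning with one pivot."""
    pivot = right[0]  # assumption: any record can serve as the pivot
    pivot_dist = [levenshtein(pivot, r) for r in right]
    links = []
    for q in left:
        dq = levenshtein(q, pivot)
        for r, dr in zip(right, pivot_dist):
            # Triangle inequality: d(q, r) >= |d(q, pivot) - d(r, pivot)|,
            # so a gap above the threshold rules the pair out without comparing it.
            if abs(dq - dr) > threshold:
                continue
            if levenshtein(q, r) <= threshold:
                links.append((q, r))
    return links

left = ["jon smith", "mary jones"]
right = ["john smith", "marie jones", "peter brown"]
links = link_records(left, right, threshold=2)
```

The pruning step never discards a true match, because the lower bound it uses is exact for any metric. That property is what lets a metric index return complete results while skipping most comparisons.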

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Neuregulin 1 and susceptibility to schizophrenia

The cause of schizophrenia is unknown, but it has a significant genetic component. Pharmacologic studies, studies of gene expression in man, and studies of mouse mutants suggest involvement of glutamate and dopamine neurotransmitter systems. However, so far, strong association has not been found between schizophrenia and variants of the genes encoding components of these systems. Here, we report the results of a genomewide scan of schizophrenia families in Iceland; these results support previous work, done in five populations, showing that schizophrenia maps to chromosome 8p. Extensive fine-mapping of the 8p locus and haplotype-association analysis, supplemented by a transmission/disequilibrium test, identifies neuregulin 1 (NRG1) as a candidate gene for schizophrenia. NRG1 is expressed at central nervous system synapses and has a clear role in the expression and activation of neurotransmitter receptors, including glutamate receptors. Mutant mice heterozygous for either NRG1 or its receptor, ErbB4, show a behavioral phenotype that overlaps with mouse models for schizophrenia. Furthermore, NRG1 hypomorphs have fewer functional NMDA receptors than wild-type mice. We also demonstrate that the behavioral phenotypes of the NRG1 hypomorphs are partially reversible with clozapine, an atypical antipsychotic drug used to treat schizophrenia.

Landspítali University Hospital Research Archive

Fourteen sequence variants that associate with multiple sclerosis discovered by meta-analysis informed by genetic correlations

Author: Aarsland Dag
Alfredsson Lars
Andreassen Ole A.
Benediktsson Rafn
Bjarnason Ragnar
Bjornsdottir Unnur S.
Bjornsson Einar S
Bjornsson Sigurdur
Bos Steffan Daniël
Celius Elisabeth G.
Djurovic Srdjan
Euesden Jack
Fladby Tormod
Geirsson Arni J.
Gislason Thorarinn
Grondal Gerdur
Gudbjartsson Daniel F
Gustafsson Omar
Harbo Hanne F.
Hillert Jan
Hjaltason Haukur
Ingason Andres
Johannesson Ari
Jonasson Jon G.
Jonsdottir Ingileif
Jonsson Stefan
Knudsen Gun Peggy
Kockum Ingrid Skelton
Kristjansdottir Helga
Kristjansdottir Sjofn
Ludviksson Bjorn Runar
Masson Gisli
Mikaelsdottir Evgenia
Myhr Kjell-Morten
Nilsson Bjorn
Olafsson Elias
Olafsson Jon Hjaltalin
Olafsson Sigurgeir
Olsson Tomas
Orvar Kjartan B.
Rafnar Thorunn
Sigurdsson Snaevar
Sigurgeirsson Bardur
Stefansson Hreinn
Stefansson Kari
Steinsson Kristjan
Stridh Pernilla
Sulem Patrick
Thorleifsson Gudmar
Thorsson Arni V.
Thorsteinsdottir Unnur
Valdimarsson Helgi
Valdimarsson Trausti
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

A meta-analysis of publicly available summary statistics on multiple sclerosis combined with three Nordic multiple sclerosis cohorts (21,079 cases, 371,198 controls) revealed seven sequence variants associating with multiple sclerosis, not reported previously. Using polygenic risk scores based on public summary statistics of variants outside the major histocompatibility complex region, we quantified genetic overlap between common autoimmune diseases in Icelanders and identified disease clusters characterized by autoantibody presence/absence. As multiple sclerosis polygenic risk scores capture the risk of primary biliary cirrhosis and vice versa (P = 1.6 × 10^-7 and 4.3 × 10^-9), we used primary biliary cirrhosis as a proxy phenotype for multiple sclerosis, the idea being that variants conferring risk of primary biliary cirrhosis have a prior probability of conferring risk of multiple sclerosis. We tested 255 variants forming the primary biliary cirrhosis polygenic risk score and found seven multiple sclerosis-associating variants not correlated with any previously established multiple sclerosis variants. Most of the variants discovered are close to or within immune-related genes. One is a low-frequency missense variant in TYK2, another is a missense variant in MTHFR that reduces the function of the encoded enzyme affecting methionine metabolism, reported to be dysregulated in multiple sclerosis brain.

Swedish Research Council; Knut and Alice Wallenberg Foundation; AFA Foundation; Swedish Brain Foundation

Landspítali University Hospital Research Archive

Lund University Publications

Crossref

Opin visindi

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Author: AGK Janacek
André Skupin
BC Vanteru
Bob Schijvenaars
Colin Allen
David Newman
DJ Newman
DK Harman
DM Blei
EM Voorhees
EP Jiang
F Janssens
G Gorrell
G Salton
GL Poulter
GR Hjaltason
HM Müller
J Lewis
J Lin
Joseph R. Biberstine
K Börner
K Järvelin
K Sparck Jones
Katy Börner
Kevin W. Boyack
KW Boyack
MA Hearst
MD Cao
Michael Patek
MW Berry
N Jardine
Nianli Ma
NJ Belkin
P Ahlgren
P Calado
P Castells
R Kassab
R Klavans
Richard Klavans
Russell J. Duhon
S Deerwester
S Martin
SE Robertson
T Couto
T Hofmann
T Kohonen
T Theodosiou
TG Kolda
TK Landauer
WS Cooper
Y Aphinyanaphongs
Y Yamamoto
Publication venue: Public Library of Science
Publication date: 01/01/2011

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
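The Jensen-Shannon divergence that underlies the coherence measure can be sketched as follows. The abstract does not describe how the study derives a cluster-level coherence score from it, so this shows only the divergence between two word distributions. It assumes whitespace tokenization and base-2 logarithms, and the function names are hypothetical.

```python
import math
from collections import Counter

def word_distribution(text):
    """Relative word frequencies of a text (lowercased, whitespace tokens)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL divergence of p and q from their mixture."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Words absent from a contribute nothing; m[w] > 0 wherever a[w] > 0.
        return sum(pw * math.log2(pw / m[w]) for w, pw in a.items() if pw > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

same = js_divergence(word_distribution("gene expression in mice"),
                     word_distribution("gene expression in mice"))
disjoint = js_divergence(word_distribution("gene expression"),
                         word_distribution("galaxy clusters"))
partial = js_divergence(word_distribution("gene expression in mice"),
                        word_distribution("gene therapy in humans"))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared vocabulary), so a document whose wording is close to that of its cluster scores near 0.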

Public Library of Science (PLOS)

Crossref

IUScholarWorks (University of Indiana)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California