9,028 research outputs found

A 2D based Partition Strategy for Solving Ranking under Team Context (RTP)

Author: Feng Ling
Li Dongxu
Li Xiang
Lu Xiaolu
Publication date: 14/04/2014

In this paper, we propose a 2D-based partition method for solving the problem of Ranking under Team Context (RTC) on datasets without a priori knowledge. We first map the data into 2D space using each tuple's minimum and maximum values among all dimensions. Then we construct window queries that take the current team context into account. In addition, during the query mapping procedure, we can pre-prune tuples that cannot be top ranked. This pre-classification step defers the processing of those tuples and saves cost while still providing solutions to the problem. Experiments show that our algorithm performs well, especially on large datasets, while remaining correct.
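The mapping and pre-pruning steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the names `map_to_2d` and `prune_candidates` are hypothetical, the window queries and team context are left out, and the pruning rule assumes a scoring function whose value for a tuple always lies between that tuple's minimum and maximum attribute values (for example, a weighted average with non-negative weights summing to one).

```python
def map_to_2d(values):
    """Map a multi-dimensional tuple to the 2D point (min, max)."""
    return (min(values), max(values))


def prune_candidates(points, k):
    """Return indices of tuples that may still appear in the top-k.

    Assumption (not from the paper): every score lies in [min, max] of
    its tuple. A tuple can then be dropped when at least k other tuples
    have a minimum strictly greater than its maximum.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of tuples")
    # k-th largest minimum: k tuples are guaranteed to score at least this.
    threshold = sorted((lo for lo, _ in points), reverse=True)[k - 1]
    return [i for i, (_, hi) in enumerate(points) if hi >= threshold]


data = [(5, 9, 7), (1, 2, 3), (8, 8, 9), (4, 6, 2)]
points = [map_to_2d(t) for t in data]
survivors = prune_candidates(points, k=2)
```

With this toy input, the tuple `(1, 2, 3)` is pruned for `k=2` because two other tuples have minimum values above its maximum, so it never needs to be scored.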

arXiv.org e-Print Archive

CiteSeerX

Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

Author: Blott Stephen
Boydell Oisín
Camous Fabrice
Ferguson Paul
Gaughan Georgina
Gurrin Cathal
Jones Gareth J.F.
Murphy Noel
O'Connor Noel E.
Smeaton Alan F.
Smyth Barry
Wilkins Peter
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2004

In TREC 2004, Dublin City University took part in three tracks: Terabyte (in collaboration with University College Dublin), Genomic, and Novelty. In this paper we discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments in large-scale, distributed information retrieval, and which underlies all of the track experiments described in this document.

Irish Universities

DCU Online Research Access Service

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Author: Arora Akhil
Bhattacharya Arnab
Kumar Piyush
Sinha Sakshi
Publication venue: 'VLDB Endowment'
Publication date: 23/04/2018

Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion scale) high-dimensional (up to 1000+) datasets show that HD-Index is effective, efficient, and scalable.

Comment: PVLDB 11(8):906-919, 201
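The distance filters mentioned in the abstract rest on two standard lower bounds, which a short Python sketch can make concrete. This is an illustration under stated assumptions, not the HD-Index implementation: the function names are hypothetical, the RDB-tree and Hilbert-key machinery is omitted, and Euclidean distance is assumed (it satisfies both inequalities). For a query q, object o, and reference objects p and s, the triangle inequality gives d(q, o) >= |d(q, p) - d(o, p)|, and the Ptolemaic inequality gives d(q, o) >= |d(q, p) d(o, s) - d(q, s) d(o, p)| / d(p, s).

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triangle_lower_bound(dq, do):
    """Lower bound on d(query, object) from the triangle inequality.

    dq[i] = d(query, ref_i), do[i] = d(object, ref_i).
    """
    return max(abs(q - o) for q, o in zip(dq, do))


def ptolemaic_lower_bound(dq, do, dref):
    """Lower bound on d(query, object) from the Ptolemaic inequality.

    dref[i][j] = d(ref_i, ref_j); pairs of coincident references are skipped.
    """
    best = 0.0
    for i, j in combinations(range(len(dq)), 2):
        if dref[i][j] > 0:
            bound = abs(dq[i] * do[j] - dq[j] * do[i]) / dref[i][j]
            best = max(best, bound)
    return best


# Hypothetical toy data: three reference objects, one query, one object.
refs = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
query, obj = (1.0, 1.0), (3.0, 2.0)
dq = [euclidean(query, r) for r in refs]
do = [euclidean(obj, r) for r in refs]   # precomputed and stored in an index
dref = [[euclidean(a, b) for b in refs] for a in refs]

true_dist = euclidean(query, obj)
lb = max(triangle_lower_bound(dq, do), ptolemaic_lower_bound(dq, do, dref))
```

In a filter-and-refine search, an object whose lower bound `lb` already exceeds the distance of the current k-th best candidate can be discarded without computing `true_dist`.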

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

Author: Robertson David L.
Tapinos Avraam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017

Sequencing technologies allow for an in-depth analysis of biological species, but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we expand our approach to de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads into big contigs or complete genomes.

Crossref

Enlighten

Unsupervised Graph-based Rank Aggregation for Improved Retrieval

Author: Dourado Icaro Cavalcante
Pedronette Daniel Carlos Guimarães
Torres Ricardo da Silva
Publication venue: 'Elsevier BV'
Publication date: 18/03/2019

This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating that the proposal can successfully represent queries with a unified graph-based model of rank fusions.

arXiv.org e-Print Archive

Repositorio da Producao Cientifica e Intelectual da Unicamp