Search CORE

10,465 research outputs found

Compressed k2-Triples for Full-In-Memory RDF Engines

Author: Brisaboa Nieves R.
Fernández Javier D.
Martínez-Prieto Miguel A.
Álvarez-García Sandra
Publication venue
Publication date: 19/05/2011
Field of study

Current "data deluge" has flooded the Web of Data with very large RDF datasets. They are hosted and queried through SPARQL endpoints which act as nodes of a semantic net built on the principles of the Linked Data project. Although this is a realistic philosophy for global data publishing, its query performance is diminished when the RDF engines (behind the endpoints) manage these huge datasets. Their indexes cannot be fully loaded in main memory, hence these systems need to perform slow disk accesses to solve SPARQL queries. This paper addresses this problem by a compact indexed RDF structure (called k2-triples) applying compact k2-tree structures to the well-known vertical-partitioning technique. It obtains an ultra-compressed representation of large RDF graphs and allows SPARQL queries to be full-in-memory performed without decompression. We show that k2-triples clearly outperforms state-of-the-art compressibility and traditional vertical-partitioning query resolution, remaining very competitive with multi-index solutions.Comment: In Proc. of AMCIS'201

arXiv.org e-Print Archive

AIS Electronic Library (AISeL)

An information-driven framework for image mining

Author: Hsu Wynne
Lee Mong Li
Zhang Ji
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001
Field of study

[Abstract]: Image mining systems that can automatically extract semantically meaningful information (knowledge) from image data are increasingly in demand. The fundamental challenge in image mining is to determine how low-level, pixel representation contained in a raw image or image sequence can be processed to identify high-level spatial objects and relationships. To meet this challenge, we propose an efficient information-driven framework for image mining. We distinguish four levels of information: the Pixel Level, the Object Level, the Semantic Concept Level, and the Pattern and Knowledge Level. High-dimensional indexing schemes and retrieval techniques are also included in the framework to support the flow of information among the levels. We believe this framework represents the first step towards capturing the different levels of information present in image data and addressing the issues and challenges of discovering useful patterns/knowledge from each level

CiteSeerX

Language-based multimedia information retrieval

Author: Gauvain J.L.
Hiemstra D.
Jong F.M.G. de
Netter K.
Publication venue
Publication date: 01/01/2000
Field of study

This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these project aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality

CiteSeerX

University of Twente Research Information

Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

Author: Aste Tomaso
Di Matteo Tiziana
Musmeci Nicolo
Publication venue: 'Infopro Digital Services Ltd'
Publication date: 01/01/2015
Field of study

We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing it with the underlying industrial activity structure. Specifically, we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. In particular, by taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging.Comment: 31 pages, 17 figure

arXiv.org e-Print Archive

Directory of Open Access Journals

King's Research Portal

FigShare

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Author: Arora Akhil
Bhattacharya Arnab
Kumar Piyush
Sinha Sakshi
Publication venue: 'VLDB Endowment'
Publication date: 23/04/2018
Field of study

Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to triangular inequality, we also use Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion scale) high-dimensional (up to 1000+) datasets show that HD-Index is effective, efficient, and scalable.Comment: PVLDB 11(8):906-919, 201

arXiv.org e-Print Archive