Search-Optimized Suffix-Tree Storage for Biological Applications
Suffix-trees are popular indexing structures for various sequence processing problems in biological data management. We investigate here the possibility of enhancing the search efficiency of disk-resident suffix-trees through customized layouts of tree-nodes to disk-pages. Specifically, we propose a new layout strategy, called Stellar, that provides significantly improved search performance on a representative set of real genomic sequences. Further, Stellar supports both standard root-to-leaf lookup queries and sophisticated sequence search algorithms that exploit the suffix-links of suffix-trees. Our results are encouraging with regard to the ultimate objective of seamlessly integrating sequence processing in database engines.
Interview with GraceAnne DeCandido
Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times that are several orders of magnitude faster than traditional path computations while keeping the estimation errors between 0% and 1% on average.
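To make the idea of a path-returning distance sketch concrete, here is a minimal illustration in Python. It is a simplified landmark-based scheme, not the index structure from the paper: it assumes an undirected, unweighted graph held in memory as an adjacency dictionary, and the function names (`build_sketches`, `approx_shortest_path`) and the choice of landmarks are hypothetical. Each node's sketch is its breadth-first path to every landmark; a query joins the two paths at a shared landmark and trims any common tail, which yields an actual path whose length is an upper bound on the true distance.

```python
from collections import deque


def bfs_parents(graph, source):
    """Return BFS-tree parent pointers for every node reachable from source."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def build_sketches(graph, landmarks):
    """Precompute one BFS parent map per landmark (the illustrative 'sketch')."""
    return {landmark: bfs_parents(graph, landmark) for landmark in landmarks}


def path_to_landmark(parents, node):
    """Follow parent pointers from node to the landmark: [node, ..., landmark]."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def approx_shortest_path(sketches, u, v):
    """Return the shortest u-v path obtainable via any landmark, or None."""
    best = None
    for parents in sketches.values():
        if u not in parents or v not in parents:
            continue  # this landmark cannot reach one of the endpoints
        pu = path_to_landmark(parents, u)
        pv = path_to_landmark(parents, v)
        # Both paths end at the landmark; drop the tail they share so the
        # joined path turns around at the first common node instead.
        while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
            pu.pop()
            pv.pop()
        candidate = pu + list(reversed(pv[:-1]))
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


# Small undirected example graph (assumed data, for illustration only).
graph = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
sketches = build_sketches(graph, landmarks=[0])
print(approx_shortest_path(sketches, 4, 5))
```

With more landmarks the upper bound usually tightens, at the cost of a larger index; this trade-off is a property of the simplified scheme shown here and is not a claim about the paper's results.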
Antourage: Mining Distance-constrained Trips from Flickr
We study how to automatically extract tourist trips from large volumes of geo-tagged photographs. Working with more than 8 million of these photographs that are publicly available via photo-sharing communities such as Flickr and Panoramio, our goal is to satisfy the needs of a tourist who specifies a starting location (typically a hotel) together with a bounded travel distance and demands a tour that visits the popular sites along the way. Our system, named ANTOURAGE, solves this intractable problem using a novel adaptation of the max-min ant system (MMAS) meta-heuristic. Experiments using GPS metadata crawled from Flickr show that ANTOURAGE can generate high-quality tours.
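For readers unfamiliar with MMAS, the following sketch shows the pheromone update that distinguishes it from a basic ant system: trails evaporate, only the best tour is reinforced, and every trail is clamped to a fixed interval. It is a generic illustration, not the ANTOURAGE adaptation; the function name `mmas_update`, the edge-keyed dictionary, the parameter defaults, and the use of the tour score as the deposit amount are all assumptions made for this example.

```python
def mmas_update(pheromone, best_tour, best_score,
                rho=0.1, tau_min=0.01, tau_max=1.0):
    """Return updated pheromone trails after one MMAS iteration.

    pheromone  : dict mapping directed edges (a, b) to trail strength
    best_tour  : list of sites visited by the best ant, in order
    best_score : quality of that tour (assumed: larger is better)
    rho        : evaporation rate in (0, 1)
    """
    best_edges = set(zip(best_tour, best_tour[1:]))
    updated = {}
    for edge, tau in pheromone.items():
        tau *= (1.0 - rho)           # evaporation on every edge
        if edge in best_edges:
            tau += best_score        # only the best tour deposits pheromone
        # Clamping to [tau_min, tau_max] is the "max-min" part of MMAS.
        updated[edge] = min(tau_max, max(tau_min, tau))
    return updated


# Hypothetical trails around a hotel "h" and two sites "a" and "b".
trails = {("h", "a"): 0.5, ("a", "h"): 0.5, ("h", "b"): 0.005, ("a", "b"): 0.5}
trails = mmas_update(trails, best_tour=["h", "a", "h"], best_score=2.0)
print(trails)
```

The bounds keep any single edge from dominating or vanishing entirely, which helps the search keep exploring alternative tours.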
Temporal Knowledge for Timely Intelligence
Knowledge bases about entities and their relationships are a great asset for business intelligence. Major advances in information extraction and the proliferation of knowledge-sharing communities like Wikipedia have enabled the largely automated construction of rich knowledge bases. Such knowledge about entity-oriented facts can greatly improve the output quality, and possibly also the efficiency, of processing business-relevant documents and event logs. This holds for information within the enterprise as well as in Web communities such as blogs. However, no knowledge base will ever be fully complete, and real-world knowledge is continuously changing: new facts supersede old facts, knowledge grows in various dimensions, and completely new classes, relation types, or knowledge structures will arise. This leads to a number of difficult research questions regarding temporal knowledge and the life-cycle of knowledge bases. This short paper outlines challenging issues and research opportunities, and provides references to technical literature.