Search CORE

27,125 research outputs found

Substring filtering for low-cost linked data interfaces

Author: E Minack
I Ermilov
J Van Herwegen
L Rietveld
M Arias Gallego
M Nelson
MP Ferguson
NR Brisaboa
O Erling
R Li
R Verborgh
S van Hooland
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Recently, Triple Pattern Fragments (TPFS) were introduced as a low-cost server-side interface when high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPFS interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPFS interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elastic Search and case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries

Crossref

Ghent University Academic Bibliography

RCSB PDB Mobile: iOS and Android mobile apps to provide data access and visualization to the RCSB Protein Data Bank.

Author: Berman Helen M
Bi Chunxiao
Bourne Philip E
Christie Cole H
Nakane Takanori
Pang Kyle
Prlić Andreas
Quinn Gregory B
Rose Peter W
Voigt Maria
Zardecki Christine
Publication venue: eScholarship, University of California
Publication date: 02/09/2014
Field of study

SummaryThe Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB's integrated MyPDB service.Availability and implementationRCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org)

PubMed Central

eScholarship - University of California

Report of MIRACLE team for the Ad-Hoc track in CLEF 2006

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2006
Field of study

This paper presents the 2006 MIRACLE’s team approach to the AdHoc Information Retrieval track. The experiments for this campaign keep on testing our IR approach. First, a baseline set of runs is obtained, including standard components: stemming, transforming, filtering, entities detection and extracting, and others. Then, a extended set of runs is obtained using several types of combinations of these baseline runs. The improvements introduced for this campaign have been a few ones: we have used an entity recognition and indexing prototype tool into our tokenizing scheme, and we have run more combining experiments for the robust multilingual case than in previous campaigns. However, no significative improvements have been achieved. For the this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, French, Hungarian, and Portuguese. - Bilingual: English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese. - Robust monolingual: German, English, Spanish, French, Italian, and Dutch. - Robust bilingual: English to German, Italian to Spanish, and French to Dutch. - Robust multilingual: English to robust monolingual languages. We still need to work harder to improve some aspects of our processing scheme, being the most important, to our knowledge, the entities recognition and normalization

Archivo Digital UPM

Derandomized Parallel Repetition via Structured PCPs

Author: Dinur Irit
Meir Or
Publication venue
Publication date: 01/01/2014
Field of study

A PCP is a proof system for NP in which the proof can be checked by a probabilistic verifier. The verifier is only allowed to read a very small portion of the proof, and in return is allowed to err with some bounded probability. The probability that the verifier accepts a false proof is called the soundness error, and is an important parameter of a PCP system that one seeks to minimize. Constructing PCPs with sub-constant soundness error and, at the same time, a minimal number of queries into the proof (namely two) is especially important due to applications for inapproximability. In this work we construct such PCP verifiers, i.e., PCPs that make only two queries and have sub-constant soundness error. Our construction can be viewed as a combinatorial alternative to the "manifold vs. point" construction, which is the only construction in the literature for this parameter range. The "manifold vs. point" PCP is based on a low degree test, while our construction is based on a direct product test. We also extend our construction to yield a decodable PCP (dPCP) with the same parameters. By plugging in this dPCP into the scheme of Dinur and Harsha (FOCS 2009) one gets an alternative construction of the result of Moshkovitz and Raz (FOCS 2008), namely: a construction of two-query PCPs with small soundness error and small alphabet size. Our construction of a PCP is based on extending the derandomized direct product test of Impagliazzo, Kabanets and Wigderson (STOC 09) to a derandomized parallel repetition theorem. More accurately, our PCP construction is obtained in two steps. We first prove a derandomized parallel repetition theorem for specially structured PCPs. Then, we show that any PCP can be transformed into one that has the required structure, by embedding it on a de-Bruijn graph

arXiv.org e-Print Archive

CiteSeerX

Toward Entity-Aware Search

Author: Cheng Tao
Publication venue
Publication date: 01/12/2010
Field of study

As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability

Illinois Digital Environment for Access to Learning and Scholarship Repository

Reify Your Collection Queries for Modularity and Speed!

Author: Eichberg Michael
Giarrusso Paolo G.
Kästner Christian
Mitschke Ralf
Ostermann Klaus
Rendel Tillmann
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/10/2012
Field of study

Modularity and efficiency are often contradicting requirements, such that programers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections need often to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed. We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the Findbugs tool using SQuOpt, show average speedups of 12x with a maximum of 12800x and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page

arXiv.org e-Print Archive

TUbiblio

Crossref