27,454 research outputs found
Enhancing Content-And-Structure Information Retrieval using a Native XML Database
Three approaches to content-and-structure XML retrieval are analysed in this
paper: first by using Zettair, a full-text information retrieval system; second
by using eXist, a native XML database, and third by using a hybrid XML
retrieval system that uses eXist to produce the final answers from likely
relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics
can be classified in two categories: the first retrieving full articles as
final answers, and the second retrieving more specific elements within articles
as final answers. We show that for both topic categories our initial hybrid
system improves the retrieval effectiveness of a native XML database. For
ranking the final answer elements, we propose and evaluate a novel retrieval
model that utilises the structural relationships between the answer elements of
a native XML database and retrieves Coherent Retrieval Elements. The final
results of our experiments show that when the XML retrieval task focusses on
highly relevant elements our hybrid XML retrieval system with the Coherent
Retrieval Elements module is 1.8 times more effective than Zettair and 3 times
more effective than eXist, and yields an effective content-and-structure XML
retrieval
PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.
The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net
Matching Queries to Frequently Asked Questions: Search Functionality for the MRSA Web-Portal
As part of the long-term EUREGIO MRSA-net project a system was developed which enables health care workers and the general public to quickly find answers to their questions regarding the MRSA pathogen. This paper focuses on how these questions can be answered using Information Retrieval (IR) and Natural Language Processing (NLP) techniques on a Frequently-Asked-Questions-style (FAQ) database
Social Search with Missing Data: Which Ranking Algorithm?
Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services which can do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who can best match a user's search requirements specified in a term-based query, even in the absence of stored user-profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods
Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database
This paper investigates the impact of three approaches to XML retrieval:
using Zettair, a full-text information retrieval system; using eXist, a native
XML database; and using a hybrid system that takes full article answers from
Zettair and uses eXist to extract elements from those articles. For the
content-only topics, we undertake a preliminary analysis of the INEX 2003
relevance assessments in order to identify the types of highly relevant
document components. Further analysis identifies two complementary sub-cases of
relevance assessments ("General" and "Specific") and two categories of topics
("Broad" and "Narrow"). We develop a novel retrieval module that for a
content-only topic utilises the information from the resulting answer list of a
native XML database and dynamically determines the preferable units of
retrieval, which we call "Coherent Retrieval Elements". The results of our
experiments show that -- when each of the three systems is evaluated against
different retrieval scenarios (such as different cases of relevance
assessments, different topic categories and different choices of evaluation
metrics) -- the XML retrieval systems exhibit varying behaviour and the best
performance can be reached for different values of the retrieval parameters. In
the case of INEX 2003 relevance assessments for the content-only topics, our
newly developed hybrid XML retrieval system is substantially more effective
than either Zettair or eXist, and yields a robust and a very effective XML
retrieval.Comment: Postprint version. The editor version can be accessed through the DO
Towards a Cloud-Based Service for Maintaining and Analyzing Data About Scientific Events
We propose the new cloud-based service OpenResearch for managing and
analyzing data about scientific events such as conferences and workshops in a
persistent and reliable way. This includes data about scientific articles,
participants, acceptance rates, submission numbers, impact values as well as
organizational details such as program committees, chairs, fees and sponsors.
OpenResearch is a centralized repository for scientific events and supports
researchers in collecting, organizing, sharing and disseminating information
about scientific events in a structured way. An additional feature currently
under development is the possibility to archive web pages along with the
extracted semantic data in order to lift the burden of maintaining new and old
conference web sites from public research institutions. However, the main
advantage is that this cloud-based repository enables a comprehensive analysis
of conference data. Based on extracted semantic data, it is possible to
determine quality estimations, scientific communities, research trends as well
the development of acceptance rates, fees, and number of participants in a
continuous way complemented by projections into the future. Furthermore, data
about research articles can be systematically explored using a content-based
analysis as well as citation linkage. All data maintained in this
crowd-sourcing platform is made freely available through an open SPARQL
endpoint, which allows for analytical queries in a flexible and user-defined
way.Comment: A completed version of this paper had been accepted in SAVE-SD
workshop 2017 at WWW conferenc
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
,
measures how well other pairs A:B fit in with the set . Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in ? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS321 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- ā¦