10,427 research outputs found
Report on the first Twente Data Management Workshop on XML Databases and Information Retrieval
The Database Group of the University of Twente initiated a new series of workshops called Twente Data Management workshops (TDM), starting with one on XML Databases and Information Retrieval which took place on 21 June 2004 at the University of Twente. We have set ourselves two goals for the workshop series: i) To provide a forum to share original ideas as well as research results on data management problems; ii) To bring together researchers from the database community and researchers from related research fields
PFTijah: text search in an XML database system
This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery
DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks
We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives
of INEX ad-hoc topics is shown to improve retrieval eectiveness
Use of Wikipedia Categories in Entity Ranking
Wikipedia is a useful source of knowledge that has many applications in
language processing and knowledge representation. The Wikipedia category graph
can be compared with the class hierarchy in an ontology; it has some
characteristics in common as well as some differences. In this paper, we
present our approach for answering entity ranking queries from the Wikipedia.
In particular, we explore how to make use of Wikipedia categories to improve
entity ranking effectiveness. Our experiments show that using categories of
example entities works significantly better than using loosely defined target
categories
Capturing k-ary Existential Second Order Logic with k-ary Inclusion-Exclusion Logic
In this paper we analyze k-ary inclusion-exclusion logic, INEX[k], which is
obtained by extending first order logic with k-ary inclusion and exclusion
atoms. We show that every formula of INEX[k] can be expressed with a formula of
k-ary existential second order logic, ESO[k]. Conversely, every formula of
ESO[k] with at most k-ary free relation variables can be expressed with a
formula of INEX[k]. From this it follows that, on the level of sentences,
INEX[k] captures the expressive power of ESO[k].
We also introduce several useful operators that can be expressed in INEX[k].
In particular, we define inclusion and exclusion quantifiers and so-called term
value preserving disjunction which is essential for the proofs of the main
results in this paper. Furthermore, we present a novel method of relativization
for team semantics and analyze the duality of inclusion and exclusion atoms.Comment: Extended version of a paper published in Annals of Pure and Applied
Logic 169 (3), 177-21
Evaluation campaigns and TRECVid
The TREC Video Retrieval Evaluation (TRECVid) is an
international benchmarking activity to encourage research
in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video
corpus, automatic detection of a variety of semantic and
low-level video features, shot boundary detection and the
detection of story boundaries in broadcast TV news. This
paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether
such campaigns are a good thing or a bad thing. There are
arguments for and against these campaigns and we present
some of them in the paper concluding that on balance they
have had a very positive impact on research progress
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserve the content of summaries produced by this
representation, but often the performances of the systems can be dramatically
improved. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in the performance, regardless of summarizer
system used.Comment: 22 pages, 12 figures, 9 table
- …