Search CORE

10,427 research outputs found

Report on the first Twente Data Management Workshop on XML Databases and Information Retrieval

Author: Hiemstra D.
Mihajlovic V.
Publication venue: ACM Press
Publication date: 01/01/2004
Field of study

The Database Group of the University of Twente initiated a new series of workshops called Twente Data Management workshops (TDM), starting with one on XML Databases and Information Retrieval which took place on 21 June 2004 at the University of Twente. We have set ourselves two goals for the workshop series: i) To provide a forum to share original ideas as well as research results on data management problems; ii) To bring together researchers from the database community and researchers from related research fields

Radboud Repository

University of Twente Research Information

PFTijah: text search in an XML database system

Author: Flokstra J.
Hiemstra D.
Os R. van
Rode H.
Publication venue: Ecole Nationale Supérieure des Mines de Saint-Etienne
Publication date: 01/01/2006
Field of study

This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery

CiteSeerX

Radboud Repository

University of Twente Research Information

DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication venue
Publication date: 01/12/2010
Field of study

We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives of INEX ad-hoc topics is shown to improve retrieval eectiveness

Irish Universities

DCU Online Research Access Service

Use of Wikipedia Categories in Entity Ranking

Author: Pehcevski Jovan
Thom James A.
Vercoustre Anne-Marie
Publication venue
Publication date: 01/01/2007
Field of study

Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking queries from the Wikipedia. In particular, we explore how to make use of Wikipedia categories to improve entity ranking effectiveness. Our experiments show that using categories of example entities works significantly better than using loosely defined target categories

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Capturing k-ary Existential Second Order Logic with k-ary Inclusion-Exclusion Logic

Author: Rönnholm Raine
Publication venue
Publication date: 01/01/2018
Field of study

In this paper we analyze k-ary inclusion-exclusion logic, INEX[k], which is obtained by extending first order logic with k-ary inclusion and exclusion atoms. We show that every formula of INEX[k] can be expressed with a formula of k-ary existential second order logic, ESO[k]. Conversely, every formula of ESO[k] with at most k-ary free relation variables can be expressed with a formula of INEX[k]. From this it follows that, on the level of sentences, INEX[k] captures the expressive power of ESO[k]. We also introduce several useful operators that can be expressed in INEX[k]. In particular, we define inclusion and exclusion quantifiers and so-called term value preserving disjunction which is essential for the proofs of the main results in this paper. Furthermore, we present a novel method of relativization for team semantics and analyze the duality of inclusion and exclusion atoms.Comment: Extended version of a paper published in Annals of Pure and Applied Logic 169 (3), 177-21

arXiv.org e-Print Archive

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Evaluation campaigns and TRECVid

Author: Kraaij Wessel
Over Paul
Smeaton Alan F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

Author: Torres-Moreno Juan-Manuel
Publication venue
Publication date: 14/09/2012
Field of study

In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even using normalization on large texts, the curse of dimensionality can disturb the performance of summarizers. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserve the content of summaries produced by this representation, but often the performances of the systems can be dramatically improved. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in the performance, regardless of summarizer system used.Comment: 22 pages, 12 figures, 9 table

arXiv.org e-Print Archive

CiteSeerX