CLEF 2005: Ad Hoc track overview
We describe the objectives and organization of the CLEF 2005 ad hoc track and discuss the main characteristics of the tasks offered to test monolingual, bilingual and multilingual textual document retrieval. The performance achieved for each task is presented and a preliminary analysis of results is given. The paper focuses in particular on the multilingual tasks, which reused the test collection created in CLEF 2003 in an attempt to see whether an improvement in system performance over time could be measured, and also to examine the multilingual results merging problem.
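The abstract names the multilingual results merging problem without describing a solution. As a hedged illustration only, one common baseline is to min-max normalise the scores of each per-language run and then sort all documents by normalised score. The `Run` type, the function names and the cut-off `k` below are hypothetical, and this is not presented as the method used by CLEF 2005 participants.

```python
from typing import Dict, List, Tuple

# Hypothetical type: one ranked run per language, as (doc_id, score) pairs.
Run = List[Tuple[str, float]]


def min_max_normalise(run: Run) -> Run:
    """Rescale the scores of one run into [0, 1] so runs become comparable."""
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as maximally scored.
        return [(doc, 1.0) for doc, _ in run]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in run]


def merge_runs(runs: Dict[str, Run], k: int = 1000) -> List[str]:
    """Merge per-language runs into one multilingual ranking (top k)."""
    pooled: Run = []
    for run in runs.values():
        pooled.extend(min_max_normalise(run))
    # sorted() is stable, so equal normalised scores keep their pooling order.
    pooled = sorted(pooled, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in pooled[:k]]


runs = {
    "de": [("de1", 12.0), ("de2", 8.0), ("de3", 4.0)],
    "fr": [("fr1", 0.9), ("fr2", 0.3)],
}
print(merge_runs(runs))  # ['de1', 'fr1', 'de2', 'de3', 'fr2']
```

Score normalisation of this kind ignores how many relevant documents each collection actually holds, which is one reason merging remains a problem rather than a solved step.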
Principles of content analysis for information retrieval systems: an overview
"Unquestionably, the content analysis which has emerged as part of Information Retrieval Systems (IRS, e.g. literature databases) over the past 20 years has much in common with the content analysis used by linguists or in the social sciences. However, its intrinsic value stems from the special context in which it is used: a) Close interdependencies link the selected content analysis with the retrieval situation. The user’s retrieval strategies, which are intended to obtain information relevant to the current problem situation, and the available aids (e.g. expansion lists or user-friendly browsing tools) affect the efficacy of some analysis techniques (e.g. noun phrase analysis from computer linguistics) to a considerable extent. b) Normally, a commercial IRS handles mass data, thus necessitating the use of a reduced content analysis even today. Full morphological, syntactic, semantic and pragmatic text analyses are unthinkable, not only for efficiency reasons but also for knowledge reasons. Content analysis in IRS is therefore a component part of a special type of restricted system which obeys its own laws. Against the backdrop of these considerations, forms of content analysis in present-day commercial retrieval systems are studied and promising expansions and alternatives are proposed." (author's abstract)
Extracting knowledge from web communities and linked data for case-based reasoning systems
Web communities and the Web 2.0 provide a huge amount of experiences, and Linked Open Data is increasingly available. Making these experiences and data available as knowledge for case-based reasoning (CBR) systems is a current research effort. Extracting such knowledge from the diverse data types used in web communities, transforming data obtained from Linked Data sources, and then formalising it for CBR is not an easy task. In this paper, we present a prototype, the Knowledge Extraction Workbench (KEWo), which supports the knowledge engineer in this task. We integrated KEWo into the open-source case-based reasoning tool myCBR Workbench. We describe how KEWo extracts vocabularies from Linked Data sources and generates taxonomies both from Linked Data and from web community data in the form of semi-structured texts.
Relevance distributions across Bradford Zones: Can Bradfordizing improve search?
The purpose of this paper is to describe the evaluation of the effectiveness of the bibliometric technique Bradfordizing in an information retrieval (IR) scenario. Bradfordizing is used to re-rank topical document sets from conventional abstracting & indexing (A&I) databases into core and more peripheral document zones. Bradfordized lists of journal articles and monographs are tested in a controlled scenario consisting of different A&I databases from the social and political sciences, economics, psychology and medical science, 164 standardized IR topics and intellectual assessments of the listed documents. Does Bradfordizing improve the ratio of relevant documents in the first third (core) compared to the second and last third (zone 2 and zone 3, respectively)? The IR tests show that relevance distributions after re-ranking improve at a significant level if documents in the core are compared with documents in the succeeding zones. After Bradfordizing of document pools, the core has a significantly better average precision than zone 2, zone 3 and the baseline. This paper should be seen as an argument in favour of alternative non-textual (bibliometric) re-ranking methods which can be easily applied in text-based retrieval systems and in particular in A&I databases.
Comment: 11 pages, 2 figures, preprint of a full paper at the 14th International Society of Scientometrics and Informetrics Conference (ISSI 2013).
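Bradfordizing, as summarised above, sorts sources by productivity and splits the article pool into zones of roughly equal size. The following is a minimal sketch under stated assumptions: each document carries a single source field (a journal, or a publisher for monographs), zones are cut by cumulative article share, and the per-zone score is simple precision rather than the average precision reported in the paper. The `Doc` type and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set


class Doc(NamedTuple):
    """Hypothetical record: document id plus its source (journal or publisher)."""
    id: str
    source: str


def bradford_zones(sources: List[str], n_zones: int = 3) -> Dict[str, int]:
    """Assign each source a zone (1 = core) so each zone holds ~1/n_zones of the articles."""
    counts = Counter(sources)
    # Most productive sources first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = len(sources)
    zone_of: Dict[str, int] = {}
    cumulative = 0
    for source, n in ranked:
        # A source's zone is set by the article share already placed before it.
        zone_of[source] = min(n_zones, cumulative * n_zones // total + 1)
        cumulative += n
    return zone_of


def bradfordize(docs: List[Doc]) -> List[Doc]:
    """Re-rank: core-zone documents first, then zone 2, then zone 3."""
    zone_of = bradford_zones([d.source for d in docs])
    # sorted() is stable, so the original order survives inside each zone.
    return sorted(docs, key=lambda d: zone_of[d.source])


def precision_by_zone(docs: List[Doc], relevant: Set[str]) -> Dict[int, float]:
    """Share of relevant documents in each Bradford zone."""
    zone_of = bradford_zones([d.source for d in docs])
    hits: Counter = Counter()
    sizes: Counter = Counter()
    for d in docs:
        zone = zone_of[d.source]
        sizes[zone] += 1
        hits[zone] += int(d.id in relevant)
    return {zone: hits[zone] / sizes[zone] for zone in sorted(sizes)}


journals = ["A", "B", "A", "C", "A", "D", "B", "E", "A", "F"]
docs = [Doc(f"d{i}", j) for i, j in enumerate(journals)]
print([d.id for d in bradfordize(docs)])
# ['d0', 'd2', 'd4', 'd8', 'd1', 'd3', 'd6', 'd5', 'd7', 'd9']
print(precision_by_zone(docs, {"d0", "d2", "d4", "d1", "d5"}))
# {1: 0.75, 2: 0.3333333333333333, 3: 0.3333333333333333}
```

Because the sort is stable, the re-ranking only moves documents from productive sources ahead of peripheral ones and otherwise keeps the original text-based order, which is what makes the technique easy to bolt onto an existing retrieval system.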
Formation of microtubule-based traps controls the sorting and concentration of vesicles to restricted sites of regenerating neurons after axotomy
Transformation of a transected axonal tip into a growth cone (GC) is a critical step in the cascade leading to neuronal regeneration. Critical to the regrowth is the supply and concentration of vesicles at restricted sites along the cut axon. The mechanisms underlying these processes are largely unknown. Using online confocal imaging of transected, cultured Aplysia californica neurons, we report that axotomy leads to reorientation of the microtubule (MT) polarities and formation of two distinct MT-based vesicle traps at the cut axonal end. Approximately 100 μm proximal to the cut end, a plus end trap forms that selectively captures anterogradely transported vesicles. Distally, a minus end trap forms that exclusively captures retrogradely transported vesicles. The concentration of anterogradely transported vesicles in the former trap optimizes the formation of a GC after axotomy.
Deriving case base vocabulary from web community data
This paper presents an approach to knowledge extraction for Case-Based Reasoning systems. The recent development of the WWW, especially the Web 2.0, shows that many successful applications are web based. Moreover, the Web 2.0 offers many experiences, and our approach uses those experiences to fill the knowledge containers. We focus especially on vocabulary knowledge and use forum posts to create domain-dependent taxonomies that can be used directly in Case-Based Reasoning systems. This paper introduces the applied knowledge extraction process, which is based on the KDD process, and explains its application to a web forum for travelers.
Time series classification with ensembles of elastic distance measures
Several alternative distance measures for comparing time series have recently been proposed and evaluated on time series classification (TSC) problems. These include variants of dynamic time warping (DTW), such as weighted and derivative DTW, and edit distance-based measures, including longest common subsequence, edit distance with real penalty, time warp with edit, and move–split–merge. These measures have the common characteristic that they operate in the time domain and compensate for potential localised misalignment through some elastic adjustment. Our aim is to experimentally test two hypotheses related to these distance measures. Firstly, we test whether there is any significant difference in accuracy for TSC problems between nearest neighbour classifiers using these distance measures. Secondly, we test whether combining these elastic distance measures through simple ensemble schemes gives significantly better accuracy. We test these hypotheses by carrying out one of the largest experimental studies ever conducted into time series classification. Our first key finding is that there is no significant difference between the elastic distance measures in terms of classification accuracy on our data sets. Our second finding, and the major contribution of this work, is to define an ensemble classifier that significantly outperforms the individual classifiers. We also demonstrate that the ensemble is more accurate than approaches not based in the time domain. Nearly all TSC papers in the data mining literature cite DTW (with the warping window set through cross-validation) as the benchmark for comparison. We believe that our ensemble is the first ever classifier to significantly outperform DTW and as such raises the bar for future work in this area.
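To make the terms in this abstract concrete, the following sketch shows DTW with a Sakoe-Chiba warping window, a derivative variant, a 1-nearest-neighbour classifier, and an unweighted majority vote over one classifier per measure. It is an illustration under assumptions, not the authors' ensemble: the paper combines more elastic measures and its combination scheme may weight votes differently, and the plain first-difference derivative used here is a simplification. All names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def dtw(a: Series, b: Series, window: int) -> float:
    """DTW with squared pointwise cost, restricted to a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach (n, m)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def first_difference(x: Series) -> List[float]:
    """Simplified derivative estimate (assumption: plain first differences)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def derivative_dtw(a: Series, b: Series, window: int) -> float:
    """DTW applied to the derivative series instead of the raw values."""
    return dtw(first_difference(a), first_difference(b), window)


def one_nn(train_x: List[Series], train_y: List[str], query: Series, dist: Distance) -> str:
    """Label of the training series closest to `query` under `dist`."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def ensemble_predict(train_x: List[Series], train_y: List[str], query: Series,
                     measures: List[Distance]) -> str:
    """Unweighted majority vote over one 1-NN classifier per distance measure."""
    votes = Counter(one_nn(train_x, train_y, query, d) for d in measures)
    return votes.most_common(1)[0][0]


train_x = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
train_y = ["up", "down"]
measures = [
    lambda a, b: dtw(a, b, window=1),
    lambda a, b: derivative_dtw(a, b, window=1),
]
print(ensemble_predict(train_x, train_y, [0, 0, 0, 1, 1], measures))  # up
```

Here the window is fixed at 1 for brevity; as the abstract notes for the DTW benchmark, it would normally be set by cross-validation, and a fuller ensemble would add further elastic measures (for example LCSS or move–split–merge) to `measures`.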
Opinion Holder and Target Extraction on Opinion Compounds – A Linguistic Approach
We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inferencing between different compounds, and the categorization of the sentiment view that the head conveys.
- …