
    CLEF 2005: Ad Hoc track overview

    We describe the objectives and organization of the CLEF 2005 ad hoc track and discuss the main characteristics of the tasks offered to test monolingual, bilingual and multilingual textual document retrieval. The performance achieved for each task is presented and a preliminary analysis of results is given. The paper focuses in particular on the multilingual tasks, which reused the test collection created in CLEF 2003 in an attempt to see whether an improvement in system performance over time could be measured, and also to examine the multilingual results merging problem.

    Principles of content analysis for information retrieval systems: an overview

    "Unquestionably, the content analysis which has emerged as part of Information Retrieval Systems (IRS, e.g. literature databases) over the past 20 years has much in common with the content analysis used by linguists or in the social sciences. However, its intrinsic value stems from the special context in which it is used: a) Close interdependencies link the selected content analysis with the retrieval situation. The user’s retrieval strategies, which are intended to obtain information relevant to the current problem situation, and the available aids (e.g. expansion lists or user-friendly browsing tools) affect the efficacy of some analysis techniques (e.g. noun phrase analysis from computational linguistics) to a considerable extent. b) Normally, a commercial IRS handles mass data, thus necessitating the use of a reduced content analysis even today. Full morphological, syntactic, semantic and pragmatic text analyses are unthinkable, for reasons of both efficiency and knowledge. Content analysis in IRS is therefore a component part of a special type of restricted system which obeys its own laws. Against the backdrop of these considerations, forms of content analysis in present-day commercial retrieval systems are studied and promising expansions and alternatives are proposed." (author's abstract)

    Extracting knowledge from web communities and linked data for case-based reasoning systems

    Web communities and the Web 2.0 provide a huge amount of experiences, and there has been a growing availability of Linked Open Data. Making these experiences and data available as knowledge to be used in case-based reasoning (CBR) systems is a current research effort. The process of extracting such knowledge from the diverse data types used in web communities, transforming data obtained from Linked Data sources, and then formalising it for CBR is not an easy task. In this paper, we present a prototype, the Knowledge Extraction Workbench (KEWo), which supports the knowledge engineer in this task. We integrated the KEWo into the open-source case-based reasoning tool myCBR Workbench. We provide details on the abilities of the KEWo to extract vocabularies from Linked Data sources and to generate taxonomies from Linked Data as well as from web community data in the form of semi-structured texts.
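    The abstract mentions generating taxonomies from Linked Data. The paper's actual pipeline is not reproduced here; as a rough, self-contained illustration (concept names and data entirely hypothetical), SKOS-style broader/narrower triples fetched from a Linked Data source can be folded into a child-to-parent taxonomy:

    ```python
    def build_taxonomy(triples):
        """Turn (child, "skos:broader", parent) triples into a
        child -> parent taxonomy dict. A minimal sketch of what a
        KEWo-like tool might do after fetching concepts."""
        taxonomy = {}
        for subj, pred, obj in triples:
            if pred == "skos:broader":
                taxonomy[subj] = obj
        return taxonomy

    def ancestors(taxonomy, concept):
        """Walk the broader-chain from a concept up to the root."""
        chain = []
        while concept in taxonomy:
            concept = taxonomy[concept]
            chain.append(concept)
        return chain

    # Hypothetical triples, as might come from a SPARQL endpoint:
    tax = build_taxonomy([
        ("Hostel", "skos:broader", "Accommodation"),
        ("Hotel", "skos:broader", "Accommodation"),
        ("Accommodation", "skos:broader", "TravelConcept"),
    ])
    # ancestors(tax, "Hostel") → ["Accommodation", "TravelConcept"]
    ```

    A real implementation would also have to handle multiple broader concepts and cycles, which this sketch ignores.
    
    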

    Relevance distributions across Bradford Zones: Can Bradfordizing improve search?

    The purpose of this paper is to describe the evaluation of the effectiveness of the bibliometric technique Bradfordizing in an information retrieval (IR) scenario. Bradfordizing is used to re-rank topical document sets from conventional abstracting & indexing (A&I) databases into core and more peripheral document zones. Bradfordized lists of journal articles and monographs are tested in a controlled scenario consisting of different A&I databases from the social and political sciences, economics, psychology and medical science, 164 standardized IR topics, and intellectual assessments of the listed documents. Does Bradfordizing improve the ratio of relevant documents in the first third (core) compared to the second and last third (zone 2 and zone 3, respectively)? The IR tests show that relevance distributions after re-ranking improve at a significant level when documents in the core are compared with documents in the succeeding zones. After Bradfordizing of document pools, the core has significantly better average precision than zone 2, zone 3 and the baseline. This paper should be seen as an argument in favour of alternative non-textual (bibliometric) re-ranking methods which can be simply applied in text-based retrieval systems, and in particular in A&I databases.

    Comment: 11 pages, 2 figures. Preprint of a full paper at the 14th International Society of Scientometrics and Informetrics Conference (ISSI 2013).
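    The core idea of Bradfordizing is a simple re-ranking by journal productivity: documents from journals that contribute many hits to the result set move to the front. A minimal sketch of that re-ranking step (data hypothetical; the paper's zone-partitioning and evaluation are not reproduced):

    ```python
    from collections import Counter

    def bradfordize(docs):
        """Re-rank a result set by journal productivity (Bradfordizing).

        docs: list of (doc_id, journal) tuples from an A&I database.
        Returns doc_ids ordered so that documents from the most
        productive journals (the Bradford core) come first; the sort
        is stable, so the original order is kept within a journal tier.
        """
        freq = Counter(journal for _, journal in docs)
        ranked = sorted(docs, key=lambda d: -freq[d[1]])
        return [doc_id for doc_id, _ in ranked]

    # Hypothetical result set: "Core J." contributes three of five hits.
    ranked = bradfordize([
        ("d1", "J. Minor"), ("d2", "Core J."), ("d3", "Core J."),
        ("d4", "J. Minor2"), ("d5", "Core J."),
    ])
    # → ["d2", "d3", "d5", "d1", "d4"]
    ```

    Splitting the re-ranked list into thirds then yields the core, zone 2 and zone 3 compared in the paper's relevance analysis.
    
    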

    Formation of microtubule-based traps controls the sorting and concentration of vesicles to restricted sites of regenerating neurons after axotomy

    Transformation of a transected axonal tip into a growth cone (GC) is a critical step in the cascade leading to neuronal regeneration. Critical to the regrowth is the supply and concentration of vesicles at restricted sites along the cut axon. The mechanisms underlying these processes are largely unknown. Using online confocal imaging of transected, cultured Aplysia californica neurons, we report that axotomy leads to reorientation of the microtubule (MT) polarities and formation of two distinct MT-based vesicle traps at the cut axonal end. Approximately 100 μm proximal to the cut end, a selective trap for anterogradely transported vesicles is formed, which is the plus end trap. Distally, a minus end trap is formed that exclusively captures retrogradely transported vesicles. The concentration of anterogradely transported vesicles in the former trap optimizes the formation of a GC after axotomy.

    Deriving case base vocabulary from web community data

    This paper presents an approach to knowledge extraction for Case-Based Reasoning systems. The recent development of the WWW, especially the Web 2.0, shows that many successful applications are web based. Moreover, the Web 2.0 offers many experiences, and our approach uses those experiences to fill the knowledge containers. We focus especially on vocabulary knowledge and use forum posts to create domain-dependent taxonomies that can be directly used in Case-Based Reasoning systems. This paper introduces the applied knowledge extraction process, based on the KDD process, and explains its application on a web forum for travelers.
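    The first step of filling a vocabulary container from forum posts is harvesting candidate terms from the raw text. The paper's KDD-based pipeline is more elaborate; as a crude, self-contained stand-in (stopword list and data hypothetical), one can tokenize posts and keep terms above a frequency threshold:

    ```python
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "had"}

    def extract_vocabulary(posts, min_freq=2):
        """Derive candidate case-base vocabulary terms from forum posts.

        Tokenizes each post, drops stopwords, and keeps terms that
        occur at least min_freq times across the collection.
        """
        tokens = Counter()
        for post in posts:
            for tok in re.findall(r"[a-z]+", post.lower()):
                if tok not in STOPWORDS:
                    tokens[tok] += 1
        return {term for term, n in tokens.items() if n >= min_freq}

    # Hypothetical travel-forum posts:
    vocab = extract_vocabulary([
        "Great hotel near the beach",
        "The beach hotel had friendly staff",
    ])
    # → {"hotel", "beach"}
    ```

    Recurring terms like these would then be organized into the domain-dependent taxonomies the paper describes.
    
    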

    Time series classification with ensembles of elastic distance measures

    Several alternative distance measures for comparing time series have recently been proposed and evaluated on time series classification (TSC) problems. These include variants of dynamic time warping (DTW), such as weighted and derivative DTW, and edit distance-based measures, including longest common subsequence, edit distance with real penalty, time warp with edit, and move–split–merge. These measures have the common characteristic that they operate in the time domain and compensate for potential localised misalignment through some elastic adjustment. Our aim is to experimentally test two hypotheses related to these distance measures. Firstly, we test whether there is any significant difference in accuracy for TSC problems between nearest neighbour classifiers using these distance measures. Secondly, we test whether combining these elastic distance measures through simple ensemble schemes gives significantly better accuracy. We test these hypotheses by carrying out one of the largest experimental studies ever conducted into time series classification. Our first key finding is that there is no significant difference between the elastic distance measures in terms of classification accuracy on our data sets. Our second finding, and the major contribution of this work, is to define an ensemble classifier that significantly outperforms the individual classifiers. We also demonstrate that the ensemble is more accurate than approaches not based in the time domain. Nearly all TSC papers in the data mining literature cite DTW (with warping window set through cross-validation) as the benchmark for comparison. We believe that our ensemble is the first classifier to significantly outperform DTW, and as such it raises the bar for future work in this area.
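    The building blocks described here are 1-NN classifiers, one per elastic distance measure, combined by a simple voting scheme. A minimal sketch with two measures (full-window DTW and squared Euclidean; the paper's actual ensemble weighting and measure set are not reproduced):

    ```python
    import numpy as np

    def dtw(a, b):
        """Full-window dynamic time warping distance (squared costs)."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = (a[i - 1] - b[j - 1]) ** 2
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def euclid(a, b):
        """Squared Euclidean distance between equal-length series."""
        return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

    def nn_predict(train, labels, query, dist):
        """1-nearest-neighbour label under a given distance measure."""
        dists = [dist(t, query) for t in train]
        return labels[int(np.argmin(dists))]

    def ensemble_predict(train, labels, query, measures):
        """Majority vote over 1-NN classifiers, one per distance measure."""
        votes = [nn_predict(train, labels, query, m) for m in measures]
        return max(set(votes), key=votes.count)

    # Hypothetical two-class toy data:
    train = [[0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]]
    labels = ["flat-then-up", "high-then-down"]
    print(ensemble_predict(train, labels, [0.0, 0.0, 0.0, 1.0], [dtw, euclid]))
    # → flat-then-up
    ```

    The elastic measures named in the abstract (WDTW, ERP, TWE, MSM, etc.) would each slot in as another entry in `measures`.
    
    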

    Opinion Holder and Target Extraction on Opinion Compounds – A Linguistic Approach

    We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inference between different compounds, and the categorization of the sentiment view that the head conveys.