6,531 research outputs found

    Unity in diversity : integrating differing linguistic data in TUSNELDA

    This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. The collection contains a number of linguistically annotated corpora that differ in various respects, such as language, text sorts / data types, encoded annotation levels, and the linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on how these heterogeneous data are integrated into one resource on the other.

    Use of Solr and Xapian in the Invenio document repository software

    Invenio is a free, comprehensive, web-based document repository and digital library software suite originally developed at CERN. It can serve a variety of use cases, from an institutional repository or digital library to a web journal. To make full use of full-text documents for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes the extracted full texts and the most relevant metadata. Consequently, Invenio takes advantage of Solr's efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio, its capabilities and features. We then present our open source Solr integration as well as the scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter to an alternative Xapian adapter using the same generic bridge. Both integrations are distributed with the Invenio package and are ready to be used by institutions using or adopting Invenio.

    ONTOLOGY BASED TECHNICAL SKILL SIMILARITY

    Online job boards have become a major platform for technical talent procurement and job search. These job portals have given rise to challenging matching and search problems. The core matching or search happens between the technical skills in the job requirements and the candidate's profile or keywords. The extensive list of technical skills and their polyonymous nature make direct keyword matching less effective. This results in substandard job matching or search results that miss a closely matching candidate because the profile does not list the exact skills. It is important to use a semantic similarity measure between skills to improve the relevance of the results. This paper proposes a semantic similarity measure between technical skills using a knowledge-based approach. The approach builds an ontology using DBpedia and uses it to derive a similarity score. Feature-based ontology similarity measures are used to derive a similarity score between two skills. The ontology also helps in resolving a base skill from its multiple representations. The paper discusses the implementation of the custom ontology, the similarity measuring system, and the performance of the system in comparing technical skills. The proposed approach performs better than the Resumatcher system in finding the similarity between skills.
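As a rough illustration of what a feature-based similarity between two skills can look like, the sketch below uses Tversky's ratio model over sets of ontology features. This is an assumption chosen for illustration, not necessarily the measure used in the paper; the function name, the `alpha`/`beta` weights, and the example feature sets are all hypothetical and are not taken from DBpedia.

```python
def tversky_similarity(features_a, features_b, alpha=0.5, beta=0.5):
    """Feature-based similarity in [0, 1] using Tversky's ratio model.

    features_a, features_b: iterables of ontology features (e.g. category
    labels) attached to each skill. alpha and beta weight the features
    that are unique to the first and second skill respectively.
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)   # features shared by both skills
    only_a = len(a - b)   # features only the first skill has
    only_b = len(b - a)   # features only the second skill has
    denominator = common + alpha * only_a + beta * only_b
    if denominator == 0:
        return 0.0        # neither skill has any features
    return common / denominator


# Hypothetical feature sets, invented for this example.
java_features = {"programming language", "object-oriented", "JVM"}
kotlin_features = {"programming language", "object-oriented", "JVM", "Android"}
sql_features = {"query language", "relational database"}

print(tversky_similarity(java_features, kotlin_features))
print(tversky_similarity(java_features, sql_features))
```

With `alpha = beta = 0.5` the measure is symmetric and equals the Dice coefficient; unequal weights would make it asymmetric, which can matter when one skill is treated as the query and the other as the candidate.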

    A semantic-based system for querying personal digital libraries

    This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-28640-0_4. Copyright © Springer 2004. The decreasing cost and increasing availability of new technologies are enabling people to create their own digital libraries. One of the main topics in personal digital libraries is allowing people to select interesting information among all the different digital formats available today (pdf, html, tiff, etc.). Moreover, the increasing availability of these on-line libraries, as well as the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system that enables the user to interact with and query documents independently of the digital formats in which they are represented. To achieve this independence from the format, we consider all the digital documents contained in a digital library as images. Our system tries to automatically detect the layout of the digital documents and recognize the geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that users can query their digital library by typing free text or browsing the ontology.

    Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project

    The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated into the SCHEMA module-based, expandable reference system.