
    Ontology-Based MEDLINE Document Classification

    An increasing and overwhelming amount of biomedical information is available in the research literature, mainly in the form of free text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the existing MeSH-based representation of MEDLINE documents. The extension method is evaluated within a document triage task organized by the Genomics track of the 2005 Text REtrieval Conference (TREC). Our method for extending the representation of documents leads to an improvement of 17% over a non-extended baseline in terms of normalized utility, the metric defined for the task. The SVMlight software is used to classify documents.

    Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007

    These are the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24, 2007 in Tartu, Estonia.

    Multiple Retrieval Models and Regression Models for Prior Art Search

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German), producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.

    Semantic Matchmaking as Non-Monotonic Reasoning: A Description Logic Approach

    Matchmaking arises when supply and demand meet in an electronic marketplace, or when agents search for a web service to perform some task, or even when recruiting agencies match curricula and job profiles. In such open environments, the objective of a matchmaking process is to discover the best available offers for a given request. We address the problem of matchmaking from a knowledge representation perspective, with a formalization based on Description Logics. We devise Concept Abduction and Concept Contraction as non-monotonic inferences in Description Logics suitable for modeling matchmaking in a logical framework, and prove some related complexity results. We also present reasonable algorithms for semantic matchmaking based on the devised inferences, and prove that they obey some commonsense properties. Finally, we report on the implementation of the proposed matchmaking framework, which has been used both as a mediator in e-marketplaces and for semantic web services discovery.

    Using distributional similarity to organise biomedical terminology

    We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.

    Generating global products of LAI and FPAR from SNPP-VIIRS data: theoretical background and implementation

    Leaf area index (LAI) and fraction of photosynthetically active radiation (FPAR) absorbed by vegetation have been successfully generated from the Moderate Resolution Imaging Spectroradiometer (MODIS) data since early 2000. As the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument onboard the Suomi National Polar-orbiting Partnership (SNPP) has inherited the scientific role of MODIS, the development of a continuous, consistent, and well-characterized VIIRS LAI/FPAR data set is critical to continue the MODIS time series. In this paper, we build the radiative transfer-based VIIRS-specific lookup tables by achieving minimal difference with the MODIS data set and maximal spatial coverage of retrievals from the main algorithm. The theory of spectral invariants provides the configurable physical parameters, i.e., single scattering albedos (SSAs) that are optimized for VIIRS-specific characteristics. The effort finds a set of smaller red-band SSA and larger near-infrared band SSA for VIIRS compared with the MODIS heritage. The VIIRS LAI/FPAR is evaluated through comparisons with one year of MODIS product in terms of both spatial and temporal patterns. Further validation efforts are still necessary to ensure the product quality. Current results, however, instill confidence in the VIIRS data set and suggest that the efforts described here meet the goal of achieving operationally consistent multisensor LAI/FPAR data sets. Moreover, the strategies of parametric adjustment and LAI/FPAR evaluation applied to SNPP-VIIRS can also be applied to the subsequent Joint Polar Satellite System VIIRS or other instruments.

    Pruning-based identification of domain ontologies

    We present a novel approach to extracting a domain ontology from large-scale thesauri. Concepts are identified as relevant for a domain based on their frequent occurrence in domain texts. The approach allows bootstrapping the ontology engineering process from given legacy thesauri and identifies an initial domain ontology that may easily be refined by experts at a later stage. We present a thorough evaluation of the results obtained in building a biosecurity ontology for the UN FAO AOS project.

    Spatio-textual indexing for geographical search on the web

    Many web documents refer to specific geographic localities and many people include geographic context in queries to web search engines. Standard web search engines treat the geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing based on multiple spatially indexed text indexes, attaching spatial indexes to the document occurrences of a text index, and merging text index access results with results of access to a spatial index of documents. These schemes are compared experimentally with a conventional text index search engine, using a collection of geo-tagged web documents, and are shown to be able to compete in speed and storage performance with pure text indexing.