Ontology-Based MEDLINE Document Classification
An increasing and overwhelming amount of biomedical information is available in the research literature, mainly in the form of free text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the existing MeSH-based representation of MEDLINE documents. The extension method is evaluated within a document triage task organized by the Genomics track of the 2005 Text REtrieval Conference (TREC). Our method for extending the representation of documents leads to an improvement of 17% over a non-extended baseline in terms of normalized utility, the metric defined for the task. The SVMlight software is used to classify documents.
Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007
This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24, 2007 in Tartu, Estonia.
Multiple Retrieval Models and Regression Models for Prior Art Search
This paper presents the system called PATATRAS (PATent and Article Tracking,
Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach
presents three main characteristics: 1. The usage of multiple retrieval models
(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
languages considered in the present track (English, French, German) producing
ten different sets of ranked results. 2. The merging of the different results
based on multiple regression models using an additional validation set created
from the patent collection. 3. The exploitation of patent metadata and of the
citation structures for creating restricted initial working sets of patents and
for producing a final re-ranking regression model. As we exploit specific
metadata of the patent documents and the citation relations only at the
creation of initial working sets and during the final post-ranking step, our
architecture remains generic and easy to extend.
Semantic Matchmaking as Non-Monotonic Reasoning: A Description Logic Approach
Matchmaking arises when supply and demand meet in an electronic marketplace,
or when agents search for a web service to perform some task, or even when
recruiting agencies match curricula and job profiles. In such open
environments, the objective of a matchmaking process is to discover the best
available offers to a given request. We address the problem of matchmaking from
a knowledge representation perspective, with a formalization based on
Description Logics. We devise Concept Abduction and Concept Contraction as
non-monotonic inferences in Description Logics suitable for modeling
matchmaking in a logical framework, and prove some related complexity results.
We also present reasonable algorithms for semantic matchmaking based on the
devised inferences, and prove that they obey some commonsense properties.
Finally, we report on the implementation of the proposed matchmaking framework,
which has been used both as a mediator in e-marketplaces and for semantic web
services discovery.
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
Generating global products of LAI and FPAR from SNPP-VIIRS data: theoretical background and implementation
Leaf area index (LAI) and fraction of photosynthetically active radiation (FPAR) absorbed by vegetation have been successfully generated from the Moderate Resolution Imaging Spectroradiometer (MODIS) data since early 2000. As the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument onboard the Suomi National Polar-orbiting Partnership (SNPP) has inherited the scientific role of MODIS, the development of a continuous, consistent, and well-characterized VIIRS LAI/FPAR data set is critical to continue the MODIS time series. In this paper, we build the radiative transfer-based VIIRS-specific lookup tables by achieving minimal difference with the MODIS data set and maximal spatial coverage of retrievals from the main algorithm. The theory of spectral invariants provides the configurable physical parameters, i.e., single scattering albedos (SSAs) that are optimized for VIIRS-specific characteristics. The effort finds a set of smaller red-band SSA and larger near-infrared band SSA for VIIRS compared with the MODIS heritage. The VIIRS LAI/FPAR is evaluated through comparisons with one year of MODIS product in terms of both spatial and temporal patterns. Further validation efforts are still necessary to ensure the product quality. Current results, however, inspire confidence in the VIIRS data set and suggest that the efforts described here meet the goal of achieving operationally consistent multisensor LAI/FPAR data sets. Moreover, the strategies of parametric adjustment and LAI/FPAR evaluation applied to SNPP-VIIRS can also be applied to the subsequent Joint Polar Satellite System VIIRS or other instruments.
Pruning-based identification of domain ontologies
We present a novel approach to extracting a domain ontology from large-scale thesauri. Concepts are identified as relevant for a domain based on their frequent occurrence in domain texts. The approach allows the ontology engineering process to be bootstrapped from given legacy thesauri and identifies an initial domain ontology that may easily be refined by experts at a later stage. We present a thorough evaluation of the results obtained in building a biosecurity ontology for the UN FAO AOS project.
Spatio-textual indexing for geographical search on the web
Many web documents refer to specific geographic localities and many
people include geographic context in queries to web search engines. Standard
web search engines treat the geographical terms in the same way as other terms.
This can result in failure to find relevant documents that refer to the place of
interest using alternative related names, such as those of included or nearby
places. This can be overcome by associating text indexing with spatial indexing
methods that exploit geo-tagging procedures to categorise documents with
respect to geographic space. We describe three methods for spatio-textual
indexing based on multiple spatially indexed text indexes, attaching spatial
indexes to the document occurrences of a text index, and merging text index
access results with results of access to a spatial index of documents. These
schemes are compared experimentally with a conventional text index search
engine, using a collection of geo-tagged web documents, and are shown to be
able to compete in speed and storage performance with pure text indexing.