42,351 research outputs found
Ontology-Based MEDLINE Document Classification
An increasing and overwhelming amount of biomedical information is available in the research literature mainly in the form of free-text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free-text. Ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the existing MeSH-based representation of MEDLINE documents. The extension method is evaluated within a document triage task organized by the Genomics track of the 2005 Text REtrieval Conference (TREC). Our method for extending the representation of documents leads to an improvement of 17% over a non-extended baseline in terms of normalized utility, the metric defined for the task. The SVMlight software is used to classify documents
Multiple hierarchies : new aspects of an old solution
In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a base for several applications
Extending, trimming and fusing WordNet for technical documents
This paper describes a tool for the automatic
extension and trimming of a multilingual
WordNet database for cross-lingual retrieval
and multilingual ontology building in
intranets and domain-specific document
collections. Hierarchies, built from
automatically extracted terms and combined
with the WordNet relations, are trimmed
with a disambiguation method based on the
document salience of the words in the
glosses. The disambiguation is tested in a
cross-lingual retrieval task, showing
considerable improvement (7%-11%). The
condensed hierarchies can be used as
browse-interfaces to the documents
complementary to retrieval
Recommended from our members
Automatic Annotation and Semantic Search from Protégé
Semantic search has been one of the major envisioned benefits of the Semantic Web since its emergence in the late 90âs [1]. Our demo shows a proposal towards this goal. One way to view a semantic search engine is as a tool that gets formal queries (e.g. in RDQL, RQL, SPARQL, or the like) from a client, executes them against an ontology-based knowledge base, and returns tuples of ontology values (resources) that satisfy the query [2]. While this conception of semantic search brings enormous advantages already, our work aims at taking a step beyond this. In our view of Information Retrieval in the Semantic Web, a search engine returns documents, rather than (or in addition to) exact values, in response to user queries. The engine should rank the documents, according to concept-based relevance criteria. The overall retrieval process is illustrated in Figure 1 (see [3] for more details of our research)
- âŠ