915 research outputs found
Automatic extraction of paraphrastic phrases from medium size corpora
This paper presents a versatile system intended to acquire paraphrastic
phrases from a representative corpus. In order to decrease the time spent on
the elaboration of resources for NLP systems (for example Information
Extraction, IE hereafter), we suggest using a machine learning system that
helps define new templates and associated resources. This knowledge is
automatically derived from the text collection, in interaction with a large
semantic network.
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
has become a major application domain for IE.
Rewriting and suppressing UMLS terms for improved biomedical term identification
Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact on the number of terms identified by the different rules on a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
Enlightened Romanticism: Mary Gartside’s colour theory in the age of Moses Harris, Goethe and George Field
The aim of this paper is to evaluate the work of Mary Gartside, a British female colour theorist, active in London between 1781 and 1808. She published three books between 1805 and 1808. In chronological and intellectual terms Gartside can cautiously be regarded as an exemplary link between Moses Harris, who published a short but important theory of colour in the second half of the eighteenth century, and J.W. von Goethe’s highly influential Zur Farbenlehre, published in Germany in 1810. Gartside’s colour theory was published privately under the disguise of a traditional water colouring manual, illustrated with stunning abstract colour blots (see example above). Until well into the twentieth century, she remained the only woman known to have published a theory of colour. In contrast to Goethe and other colour theorists of the late 18th and early 19th centuries, Gartside was less inclined to follow the anti-Newtonian attitudes of the Romantic movement.
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM.
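
The VSM approach described in this abstract, where each word pair is represented by a vector of pattern frequencies, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the pattern strings and counts are invented, cosine is an assumed choice of similarity measure, and the three LRA extensions (automatic pattern derivation, SVD smoothing, synonym variation) are omitted.

```python
import math

# Hypothetical pattern frequencies per word pair (illustrative values only).
# Columns correspond to the invented patterns: "X cuts Y", "X works with Y", "X eats Y".
PAIR_VECTORS = {
    ("mason", "stone"): [12, 30, 0],
    ("carpenter", "wood"): [15, 28, 0],
    ("cat", "mouse"): [0, 1, 25],
}


def cosine(u, v):
    """Cosine of the angle between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relational_similarity(pair_a, pair_b):
    """Similarity between the relations of two word pairs, via their pattern vectors."""
    return cosine(PAIR_VECTORS[pair_a], PAIR_VECTORS[pair_b])


def best_analogy(stem, choices):
    """Pick the choice pair whose relation is most similar to the stem pair's relation."""
    return max(choices, key=lambda choice: relational_similarity(stem, choice))
```

Answering a multiple-choice analogy question then amounts to calling `best_analogy` with the stem pair and the candidate pairs.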
An Architecture for Data and Knowledge Acquisition for the Semantic Web: the AGROVOC Use Case
We are surrounded by ever-growing volumes of unstructured and weakly structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a reasonable amount of time. Moreover, different user categories have different expectations: final users need easy-to-use tools and services for specific tasks, knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resources development, while semantic application developers demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries complying with UIMA standards. For our experiment a document set has been processed to populate the AGROVOC thesaurus with two new relationships.
OntoAna: Domain Ontology for Human Anatomy
Today, we can find many search engines which provide us with information
which is more operational in nature. None of the search engines provide domain
specific information. This becomes very troublesome to a novice user who wishes
to have information in a particular domain. In this paper, we have developed an
ontology which can be used by a domain specific search engine. We have
developed an ontology on human anatomy, which captures information regarding
the cardiovascular system, digestive system, skeleton and nervous system. This
information can be used by people working in the medical and health care domain.
Comment: Proceedings of 5th CSI National Conference on Education and Research.
Organized by Lingayay University, Faridabad. Sponsored by Computer Society of
India and IEEE Delhi Chapter. Proceedings published by Lingayay University
Press
Towards a Knowledge Graph based Speech Interface
Applications which use human speech as an input require a speech interface
with high recognition accuracy. The words or phrases in the recognised text are
annotated with a machine-understandable meaning and linked to knowledge graphs
for further processing by the target application. These semantic annotations of
recognised words can be represented as subject-predicate-object triples which
collectively form a graph often referred to as a knowledge graph. This type of
knowledge representation facilitates the use of speech interfaces with any
spoken input application: since the information is represented in logical,
semantic form, it can be retrieved and stored using any web standard query
language. In this work, we develop a methodology for linking speech input to
knowledge graphs and study the impact of recognition errors in the overall
process. We show that for a corpus with lower WER, the annotation and linking
of entities to the DBpedia knowledge graph is considerable. DBpedia Spotlight,
a tool to interlink text documents with the linked open data is used to link
the speech recognition output to the DBpedia knowledge graph. Such a
knowledge-based speech recognition interface is useful for applications such as
question answering or spoken dialog systems. Comment: Under Review in International Workshop on Grounding Language
Understanding, Satellite of Interspeech 201
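
The subject-predicate-object representation mentioned in this abstract can be illustrated with a small sketch. It is only an assumption of how such annotations might be held in memory: the `Triple` type, the `match` helper, and the example triples are hypothetical and do not reproduce DBpedia Spotlight output or the authors' pipeline.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement in the graph."""
    subject: str
    predicate: str
    obj: str


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that match the given fields; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Hypothetical annotations for a recognised utterance such as
# "Berlin is the capital of Germany" (identifiers are illustrative).
graph = [
    Triple("dbr:Berlin", "rdf:type", "dbo:City"),
    Triple("dbr:Berlin", "dbo:country", "dbr:Germany"),
    Triple("dbr:Germany", "dbo:capital", "dbr:Berlin"),
]
```

A call such as `match(graph, predicate="dbo:capital")` plays the role that a pattern in a standard query language would play against a real triple store.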
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
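
Of the three matrix classes named in this abstract, the term-document matrix is the simplest to illustrate. The sketch below is a minimal example under stated assumptions: whitespace tokenisation, lowercasing, and raw counts are simplifications chosen here, and the function name is hypothetical rather than taken from any project the survey discusses.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    # One bag of words per document, using naive whitespace tokenisation.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Rows follow the sorted vocabulary so the layout is deterministic.
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix
```

Each column is then a document vector and each row a term vector, so documents or terms can be compared with any vector similarity measure.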