Distributional Measures of Semantic Distance: A Survey
The ability to mimic human notions of semantic distance has widespread
applications. Some measures rely only on raw text (distributional measures) and
some rely on knowledge sources such as WordNet. Although extensive studies have
been performed to compare WordNet-based measures with human judgment, the use
of distributional measures as proxies to estimate semantic distance has
received little attention. Even though they have traditionally performed poorly
when compared to WordNet-based measures, they lay claim to certain uniquely
attractive features, such as their applicability in resource-poor languages and
their ability to mimic both semantic similarity and semantic relatedness.
Therefore, this paper presents a detailed study of distributional measures.
Particular attention is paid to fleshing out the strengths and limitations of both
WordNet-based and distributional measures, and how distributional measures of
distance can be brought more in line with human notions of semantic distance.
We conclude with a brief discussion of recent work on hybrid measures.
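As a rough illustration of what a distributional measure computes from raw text alone, the sketch below builds co-occurrence vectors from a toy corpus and compares two words by cosine distance. The function names, the window size, and the choice of cosine are assumptions made for this example; the survey covers a range of measures and this is not presented as any one of them.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no context evidence: treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)
```

With a toy corpus such as `["the cat drinks milk", "the dog drinks water", "stocks fell sharply today"]`, the words "cat" and "dog" share context words and come out closer to each other than "cat" and "stocks", which share none.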
Measuring the Sentence Level Similarity
This article describes a method for calculating the similarity between short English texts, specifically texts of sentence length. The algorithm calculates the semantic similarity and the word-order similarity of two sentences, using a structured lexical knowledge base and statistical information from a corpus. The method works well in determining sentence similarity for most sentence pairs; consequently, the implementation can be used in automated sentence similarity measurement and other text-mining problems. We encapsulated the implemented algorithm in a .NET library to simplify the task of calculating sentence similarity for end users.
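The idea of combining a semantic score with a word-order score can be sketched as follows. This is a simplified stand-in, not the article's algorithm: exact word matching replaces the lexical knowledge base and corpus statistics, and the weight `delta`, the function names, and the specific order formula are all assumptions made for illustration.

```python
import math


def joint_words(tokens_a, tokens_b):
    """Distinct words of both sentences, in order of first appearance."""
    seen = []
    for word in tokens_a + tokens_b:
        if word not in seen:
            seen.append(word)
    return seen


def semantic_vector(tokens, joint):
    # Assumption: exact match scores 1.0; a real system would consult a
    # lexical knowledge base and corpus statistics here instead.
    return [1.0 if word in tokens else 0.0 for word in joint]


def order_vector(tokens, joint):
    # 1-based position of each joint word in the sentence, 0 if absent.
    return [tokens.index(word) + 1 if word in tokens else 0 for word in joint]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def order_similarity(r1, r2):
    # 1 - ||r1 - r2|| / ||r1 + r2||: equals 1.0 when word order is identical.
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


def sentence_similarity(sentence_a, sentence_b, delta=0.85):
    """Weighted mix of semantic and word-order similarity (delta is assumed)."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    joint = joint_words(tokens_a, tokens_b)
    semantic = cosine(semantic_vector(tokens_a, joint),
                      semantic_vector(tokens_b, joint))
    order = order_similarity(order_vector(tokens_a, joint),
                             order_vector(tokens_b, joint))
    return delta * semantic + (1.0 - delta) * order
```

In this sketch, "dog bites man" and "man bites dog" contain the same words, so only the word-order term separates them; the pair scores below an identical pair but well above two sentences with no words in common.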
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.
Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations
Case Law has a significant impact on the proceedings of legal cases.
Therefore, the information that can be obtained from previous court cases is
valuable to lawyers and other legal officials when performing their duties.
This paper describes a methodology of applying discourse relations between
sentences when processing text documents related to the legal domain. In this
study, we developed a mechanism to classify the relationships that can be
observed among sentences in transcripts of United States court cases. First, we
defined relationship types that can be observed between sentences in court case
transcripts. Then we classified pairs of sentences according to the
relationship type by combining a machine learning model and a rule-based
approach. The results obtained through our system were evaluated using human
judges. To the best of our knowledge, this is the first study where discourse
relationships between sentences have been used to determine relationships among
sentences in legal court case transcripts.
Comment: Conference: 2018 International Conference on Advances in ICT for
Emerging Regions (ICTer)
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful at disambiguating nouns than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high dimensional feature set.
Comment: 11 pages, latex, uses aclap.st
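For readers unfamiliar with the clustering methods named above, here is a minimal sketch of agglomerative clustering with the update rule commonly identified with McQuitty's method (WPGMA): when two clusters merge, the distance from the merged cluster to any other cluster is the plain average of the two old distances. The precomputed dissimilarity matrix and the function name are assumptions for illustration; the paper's feature sets are not given in this listing and are not reproduced here.

```python
def mcquitty_cluster(dist, n_clusters):
    """Agglomerative clustering over a symmetric dissimilarity matrix.

    dist is a square list of lists; returns sorted lists of item indices.
    """
    clusters = {i: [i] for i in range(len(dist))}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(dist)
    while len(clusters) > max(1, n_clusters):
        a, b = min(d, key=d.get)  # closest pair of current clusters
        merged = clusters.pop(a) + clusters.pop(b)
        updated = {}
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            # McQuitty/WPGMA update: unweighted mean of the old distances.
            updated[(k, next_id)] = (d_ak + d_bk) / 2.0
        # Drop every distance that involved the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(updated)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())
```

In a word sense setting, each item would be one occurrence of the ambiguous word, and the dissimilarities would be computed from its contextual features; each resulting cluster is then read as one sense.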
Identity and Granularity of Events in Text
In this paper we describe a method to detect event descriptions in
different news articles and to model the semantics of events and their
components using RDF representations. We compare these descriptions to solve a
cross-document event coreference task. Our component approach to event
semantics defines identity and granularity of events at different levels. It
performs close to state-of-the-art approaches on the cross-document event
coreference task, while outperforming other works when assuming similar quality
of event detection. We demonstrate how granularity and identity are
interconnected and we discuss how semantic anomaly could be used to define
differences between coreference, subevent and topical relations.
Comment: Invited keynote speech by Piek Vossen at Cicling 201