    Terminology mining in social media

    The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining.
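
    The abstract stays at a high level, so the sketch below illustrates one common way to "model similarity and relatedness of terms" from observed data: distributional similarity over co-occurrence counts. The toy corpus, window size, and function names are illustrative assumptions, not the measures proposed in the paper.

```python
# Minimal sketch (not the paper's method): distributional relatedness of terms,
# estimated from co-occurrence counts in a tiny, hypothetical corpus.
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Map each term to a Counter of terms seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical user-generated snippets standing in for blog/social media text.
corpus = [
    "my new phone battery drains fast".split(),
    "the phone screen cracked again".split(),
    "battery life on this new phone is great".split(),
]
vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["battery"], vectors["screen"]))
```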

    Comparing knowledge sources for nominal anaphora resolution

    We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora by means of shallow lexico-semantic patterns. As corpora we use the British National Corpus (BNC), as well as the Web, which has not been previously used for this task. Our results show that (a) the knowledge encoded in WordNet is often insufficient, especially for anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite NP coreference, the Web-based method yields results comparable to those obtained using WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the dataset; (d) in both case studies, the BNC-based method is worse than the other methods because of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge gap often encountered in anaphora resolution, and handled examples with context-dependent relations between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems.
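
    The "shallow lexico-semantic patterns" mined from the BNC and the Web are, roughly, surface templates whose hit counts vote for a candidate antecedent. The sketch below is a heavily simplified illustration of that idea; the two patterns, the scoring, and the toy "corpus" are assumptions, not the paper's actual patterns or counts.

```python
# Minimal sketch (assumed details): scoring candidate antecedents for an anaphor
# by counting shallow lexico-semantic pattern matches in raw text, in the spirit
# of corpus-/Web-based methods. The patterns below are only illustrative.
import re

PATTERNS = [
    "{ante}s such as {ana}",     # e.g. "fruits such as apples"
    "{ana} and other {ante}s",   # e.g. "apples and other fruits"
]

def pattern_score(anaphor, antecedent, corpus_text):
    """Count how often the instantiated patterns occur in the corpus text."""
    score = 0
    for template in PATTERNS:
        phrase = template.format(ana=anaphor, ante=antecedent)
        score += len(re.findall(re.escape(phrase), corpus_text, re.IGNORECASE))
    return score

def best_antecedent(anaphor, candidates, corpus_text):
    """Pick the candidate noun with the highest pattern-match count."""
    return max(candidates, key=lambda c: pattern_score(anaphor, c, corpus_text))

# Tiny hypothetical text standing in for BNC/Web counts.
text = "They sell apples and other fruits. Fruits such as apples are cheap."
print(best_antecedent("apples", ["fruit", "vehicle"], text))  # -> 'fruit'
```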

    A Corpus-Based Investigation of Definite Description Use

    We present the results of a study of definite description use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K=0.63) that we obtained using versions of Hawkins' and Prince's classification schemes; better results (K=0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement about antecedents was also not complete. These findings raise questions concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation. From a linguistic point of view, the most interesting observations were the great number of discourse-new definites in our corpus (in one of our experiments, about 50% of the definites in the collection were classified as discourse-new, 30% as anaphoric, and 18% as associative/bridging) and the presence of definites which did not seem to require a complete disambiguation.
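
    The agreement figures (K=0.63, K=0.76) are chance-corrected coefficients of the kappa family. As a reference point only, the sketch below computes Cohen's kappa for two annotators; the label set and the ten example judgements are hypothetical, and the study's own annotator setup and statistic may differ in detail.

```python
# Minimal sketch: Cohen's kappa for two annotators, the kind of chance-corrected
# agreement statistic behind figures like K=0.63 / K=0.76. The labels below are
# invented for illustration.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """kappa = (p_observed - p_expected) / (1 - p_expected)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical classifications of 10 definite descriptions by two annotators.
a = ["new", "anaphoric", "new", "bridging", "new", "anaphoric", "new", "new", "bridging", "anaphoric"]
b = ["new", "anaphoric", "new", "anaphoric", "new", "anaphoric", "new", "new", "bridging", "new"]
print(round(cohen_kappa(a, b), 2))  # ~0.66 for this toy example
```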

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both forward and backward directions, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use are still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
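
    The dissertation centres on the IR-based automated tracing process, in which candidate trace links are retrieved by scoring the textual similarity between artifacts. As a point of reference only, the sketch below ranks hypothetical code artifacts against a requirement with a basic TF-IDF/cosine baseline; the artifact names and texts are invented, and the dissertation's own techniques go well beyond this baseline.

```python
# Minimal sketch (assumed artifact texts): IR-style trace-link retrieval, ranking
# candidate code artifacts against a requirement using TF-IDF weights and cosine
# similarity. This is a common baseline, not the dissertation's specific method.
from collections import Counter
from math import log, sqrt

def tfidf(doc_tokens, all_docs):
    """Weight each term by raw frequency times inverse document frequency."""
    tf = Counter(doc_tokens)
    n = len(all_docs)
    return {t: c * log(n / sum(t in d for d in all_docs)) for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical artifacts: one requirement, two candidate code modules.
requirement = "user login requires password validation".split()
artifacts = {
    "auth.py": "validate password hash user login session".split(),
    "report.py": "generate monthly sales report chart".split(),
}
docs = [requirement] + list(artifacts.values())
req_vec = tfidf(requirement, docs)
ranked = sorted(artifacts, key=lambda a: cosine(req_vec, tfidf(artifacts[a], docs)), reverse=True)
print(ranked)  # candidate trace links, most similar artifact first
```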

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and we can expect that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract such knowledge, and even though many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to text mining for under-resourced or less-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research directions.

    Towards automated knowledge-based mapping between individual conceptualisations to empower personalisation of Geospatial Semantic Web

    The geospatial domain is characterised by vagueness, especially in the semantic disambiguation of the concepts in the domain, which makes defining a universally accepted geo-ontology an onerous task. This is compounded by the lack of appropriate methods and techniques with which individual semantic conceptualisations can be captured and compared to each other. With multiple user conceptualisations, efforts towards a reliable Geospatial Semantic Web therefore require personalisation, where user diversity can be incorporated. The work presented in this paper is part of our ongoing research on applying commonsense reasoning to elicit and maintain models that represent users' conceptualisations. Such user models will enable taking into account the users' perspective of the real world and will empower personalisation algorithms for the Semantic Web. Intelligent information processing over the Semantic Web can be achieved if different conceptualisations can be integrated in a semantic environment and mismatches between different conceptualisations can be outlined. In this paper, a formal approach for detecting mismatches between a user's and an expert's conceptual model is outlined. The formalisation is used as the basis to develop algorithms to compare models defined in OWL. The algorithms are illustrated in a geographical domain using concepts from the SPACE ontology developed as part of the SWEET suite of ontologies for the Semantic Web by NASA, and are evaluated by comparing test cases of possible user misconceptions.
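
    To make the idea of detecting mismatches between a user's and an expert's conceptual model concrete, the sketch below compares two models reduced to sets of (subclass, superclass) edges and flags edges that only one party asserts, plus outright direction conflicts. The concept names are hypothetical (they are not taken from the SPACE/SWEET ontologies), and a real implementation would first parse the OWL models, e.g. with rdflib, before running a comparison of this kind.

```python
# Minimal sketch (hypothetical concepts, not the paper's algorithms): detecting
# mismatches between a user's and an expert's conceptual model, each reduced to
# a set of (subclass, superclass) edges extracted from an OWL class hierarchy.
def subclass_mismatches(user_edges, expert_edges):
    """Return edges only the user asserts, edges only the expert asserts, and
    conflicts where the two models orient the same concept pair oppositely."""
    only_user = user_edges - expert_edges
    only_expert = expert_edges - user_edges
    conflicts = {(a, b) for (a, b) in only_user if (b, a) in expert_edges}
    return only_user, only_expert, conflicts

# Invented geographic concepts standing in for classes from a geo-ontology.
expert = {("Creek", "Stream"), ("Stream", "WaterBody"), ("Lake", "WaterBody")}
user = {("Stream", "Creek"), ("Lake", "WaterBody"), ("Pond", "Lake")}

only_user, only_expert, conflicts = subclass_mismatches(user, expert)
print("user-only edges:   ", only_user)
print("expert-only edges: ", only_expert)
print("direction conflicts:", conflicts)  # e.g. user treats Stream as a kind of Creek
```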

    The art of chicken sexing

    Expert chick sexers are able to quickly and reliably determine the sex of day-old chicks on the basis of very subtle perceptual cues. They claim that in many cases they have no idea how they make their decisions. They just look at the rear end of a chick, and ‘see’ that it is either male or female. This is somewhat reminiscent of those expert chess players, often cited in the psychological literature, who can just ‘see’ what the next move should be; similar claims have been made for expert wine tasters and experts at medical diagnosis. All of these skills are hard-earned and not accessible to introspection. But is there really anything unusual about the chicken sexer, the chess grand master, the wine buff or the medical expert? I argue that there is not. In fact, we are all constantly making categorizations of this sort: we are highly accurate at categorizing natural kinds, substances, artefacts, and so on. We do so quickly and subconsciously, and the process is completely inaccessible to introspection. The question is: why is it so difficult to acquire skills such as chicken sexing, when we automatically acquire the ability to categorize other objects? In this paper, I argue that we have mechanisms for learning the cues necessary for categorization, but that these mechanisms require selective attention to be given to the relevant features. We automatically acquire the ability to categorize certain objects because we have inbuilt attention directors causing us to attend to diagnostic cues. In cases such as chicken sexing, where we do not automatically develop categorization abilities, our inbuilt attention directors do not cause us to attend to diagnostic cues, and our attention therefore has to be drawn to these cues in another way, such as through training.

    Levels of Development in the Language of Deaf Children: ASL Grammatical Processes, Signed English Structures, Semantic Features

    This study describes the spontaneous sign language of six deaf children (6 to 16 years old) of hearing parents, who were exposed to Signed English when, after the age of six, they first attended a school for the deaf. Samples of their language taken at three times over a 15-month period were searched for processes and structures representative or not representative of Signed English. The nature of their developing semantics was described as the systematic acquisition of features of meaning in signs from selected lexical categories (kinship terms, negation, time expression, wh-questions, descriptive terms, and prepositions/conjunctions). Processes not representative of Signed English were found to conform with grammatical processes of American Sign Language (ASL) and were so described. Five levels of increasing complexity of these ASL processes and of the structures representative of Signed English were hypothesized. Levels of increasing complexity of semantic feature acquisition of signs within the lexical categories named above were also hypothesized. The development of ASL grammatical processes was found to be orderly and characterized by the appearance of new processes and structures and by increases in utterance length and complexity resulting from the coordination and expansion of formerly used processes. By the end of Level 2, subjects displayed knowledge of most basic sentence forms, simultaneous processes, and some grammatical uses of repetition. A significant change appeared at Level 3, when processes that could express meaning simultaneously were used to portray communication between the signer and others as well as to express two different ideas simultaneously. These uses provided the foundation for Level 4 and Level 5 processes, which functioned to set up locations in space in increasingly complex ways. These processes decreased the need for total dependence on visual field content (i.e. immediate context), since persons and objects were first established, next assigned a position in front of the signers' bodies (i.e. set up in a particular location), and then commented about -- as opposed to the earlier process of first pointing out and then naming and describing elements of the visible picture. Thus as the subjects became more linguistically mature, they developed scene-setting (typically ASL) ways of establishing who or what they were referring to, by making efficient use of signing space. This developmental change also revealed that the children were beginning to conceptualize events as wholes and to relate information about them in logical ways. The development of structures representative of Signed English was likewise an orderly process, characterized by the appearance of new relations and the coordination and expansion of formerly used relations. By the end of Level 2, most basic semantic and grammatical relations were expressed by signs in English order. Preposition-object, appositive, genitive, and disjunctive relations were later occurring relations (Levels 3 and 4), as were dative and indirect object relations (Level 5). There was some evidence to indicate that the consistent use of Signed English grammatical morphemes followed the order of acquisition of these morphemes by hearing children, presumably in both cases a function of the grammatical or semantic complexity of the morphemes themselves.
    Learning the meanings for signs was, for the most part, a process in which labels (signs) for referents took on additional features and thereby became more specific. This process is compared with the general process of perception, in which initially single, usually broad or attribute features of meaning are perceived and labelled before more defining features are added to form configurations of features for a particular referent. A brief contrast between the development of ASL grammatical processes and the development of structures representative of Signed English revealed both differences and similarities in development. Quite clearly, the contrast showed that the children were more linguistically competent using ASL grammatical processes -- processes, be it noted, for which they had no adult model.

    Word Meaning
