20,711 research outputs found

A Stochastic Approach for Finding of Semantically Related Words

Author: Agustini Alexandre
Gamallo Pablo
Lopes Gabriel
Noncheva Veska
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2004

2000 Mathematics Subject Classification: 62P99, 68T50. Semantically related words are modelled as words having the same probability distribution over the set of syntactic contexts occurring in text corpora. A learning algorithm for finding clusters of semantically related words is developed. In that algorithm, the chi-squared statistic is used as a performance measure.

Bulgarian Digital Mathematics Library at IMI-BAS

On the Effect of Semantically Enriched Context Models on Software Modularization

Author: Hage Jurriaan
Jansen Slinger
Khadka Ravi
Saeidi Amir
Publication venue: Aspect-Oriented Software Association (AOSA)
Publication date: 04/08/2017

Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used for both deriving the topics that run through the system as well as their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that inferred topics through performing topic analysis on the contextual representations are more meaningful compared to the plain representation of the documents. 
The proposed approach of introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

ZENODO

Utrecht University Repository

FigShare

Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007

Author: Knutsson Ola
Sahlgren Magnus
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2007

This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24 2007 in Tartu, Estonia.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

A distributional model of semantic context effects in lexical processing

Author: Brew Chris
McDonald Scott
Publication date: 01/01/2002

One of the most robust findings of experimental psycholinguistics is that the context in which a word is presented influences the effort involved in processing that word. We present a novel model of contextual facilitation based on word co-occurrence probability distributions, and empirically validate the model through simulation of three representative types of context manipulation: single word priming, multiple-priming and contextual constraint. In our simulations the effects of semantic context are modeled using general-purpose techniques and representations from multivariate statistics, augmented with simple assumptions reflecting the inherently incremental nature of speech understanding. The contribution of our study is to show that special-purpose mechanisms are not necessary in order to capture the general pattern of the experimental results, and that a range of semantic context effects can be subsumed under the same principled account.

CogPrints Cognitive Sciences Eprint Archive

Evaluation of automatic hypernym extraction from technical corpora in English and Dutch

Author: Hoste Veronique
Lefever Els
Van de Kauter Marjan
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

In this research, we evaluate different approaches for the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure automatically obtained term lists from domain- and user-specific data. We investigated three different hypernymy extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model and a morpho-syntactic method. To test the performance of the different approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domain. The experimental results show that especially the morpho-syntactic approach obtains good results for automatic hypernym extraction from technical and domain-specific texts.

Ghent University Academic Bibliography

From deep dyslexia to agrammatic comprehension on silent reading

Author: Dési Martine
Rosenthal Victor
Publication venue: Medsport Press
Publication date: 02/12/2005

We report on a case of a French-speaking patient whose performance on reading aloud single words was characteristically deep dyslexic (in spite of preserved ability to identify letters), and whose comprehension on silent sentence reading was agrammatic and strikingly poorer than on oral reading. The first part of the study is mainly informative as regards (i) the relationship between letter identification, semantic paralexias and the ability to read nonwords, (ii) the differential character of silent and oral reading tasks, and (iii) the potential modality-dependent character of the deficits in comprehension encountered. In the second part of the study we examine the patient's sensitivity to verb-noun ambiguity and probe her skills in the comprehension of indexical structures by exploring her ability to cope with number agreement and temporal and prepositional relations. The results indicate the patient's sensitivity to certain dimensions of these linguistic categories, reveal a partly correct basis for certain incorrect responses, and, on the whole, favor a definition of the patient's disorders in terms of a deficit in integrating indexical information in language comprehension. More generally, the present study substantiates a microgenetic approach to neuropsychology, where the pathological behavior due to brain damage is described as an arrest of microgenesis at an early stage of development, so that the patient's responses take the form of unfinished "products" which would normally undergo further development.

CogPrints Cognitive Sciences Eprint Archive

Usage Effects on the Cognitive Routinization of Chinese Resultative Verbs

Author: Wang Ben Pin-Yun
Publication venue: Walter de Gruyter GmbH
Publication date: 01/12/2012

The present study adopts a corpus-oriented usage-based approach to the grammar of Chinese resultative verbs. Zooming in on a specific class of V-kai constructions, this paper aims to elucidate the effect of frequency in actual usage events on shaping the linguistic representations of resultative verbs. Specifically, it will be argued that while high token frequency results in more lexicalized V-kai complex verbs, high type frequency gives rise to more schematized V-kai constructions. The routinized patterns pertinent to V-kai resultative verbs varying in their extent of specificity and generality accordingly serve as a representative illustration of the continuum between lexicon and grammar that characterizes a usage-based conception of language.

Crossref

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

Bilingual language processing

Author: Desmet Timothy
Duyck Wouter
Publication venue: Wiley
Publication date: 01/01/2007

Ghent University Academic Bibliography