
    Discovery of Linguistic Relations Using Lexical Attraction

    This work has been motivated by two long-term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. I introduce a new class of probabilistic language models, named lexical attraction models, which can represent long-distance relations between words, and I formalize this new class of models using information theory. Within the framework of lexical attraction, I developed an unsupervised language acquisition program that learns to identify linguistic relations in a given sentence. The only explicitly represented linguistic knowledge in the program is lexical attraction. There is no initial grammar or lexicon built in, and the only input is raw text. Learning and processing are interdigitated. The processor uses the regularities detected by the learner to impose structure on the input. This structure enables the learner to detect higher-level regularities. Using this bootstrapping procedure, the program was trained on 100 million words of Associated Press material and was able to achieve 60% precision and 50% recall in finding relations between content words. Using knowledge of lexical attraction, the program can identify the correct relations in syntactically ambiguous sentences such as ``I saw the Statue of Liberty flying over New York.''
    Comment: dissertation, 56 pages
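The abstract formalizes lexical attraction information-theoretically as the likelihood of a relation between two words. As a rough, hypothetical sketch (not the dissertation's actual algorithm), pairwise attraction can be approximated by pointwise mutual information over words that co-occur in the same sentence:

```python
from collections import Counter
from itertools import combinations
from math import log2

def lexical_attraction(sentences):
    """Estimate pairwise lexical attraction as pointwise mutual
    information (in bits) between co-occurring words -- a toy
    stand-in for the information-theoretic likelihood the thesis
    formalizes."""
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for sent in sentences:
        words = sent.lower().split()
        word_counts.update(words)
        total += len(words)
        # count unordered within-sentence co-occurrences
        pair_counts.update(frozenset(p) for p in combinations(words, 2)
                           if p[0] != p[1])
    n_pairs = sum(pair_counts.values())
    scores = {}
    for pair, c in pair_counts.items():
        w1, w2 = tuple(pair)
        p_pair = c / n_pairs
        p1 = word_counts[w1] / total
        p2 = word_counts[w2] / total
        scores[pair] = log2(p_pair / (p1 * p2))
    return scores
```

On this estimate, pairs that co-occur more often than their individual frequencies predict receive high positive scores, which is the kind of regularity a learner can bootstrap from.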

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of a system; it relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a flat collection of tokens, however, loses the semantic information embedded within the identifiers. We address this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering them. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers: a module is represented as a dependency graph whose nodes correspond to identifiers and whose edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects and show that introducing contexts for identifiers improves the quality of the modularization of the software systems. Both context models give results that are superior to the plain vector representation of documents; in some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on jEdit, an open source editor, demonstrates that the topics inferred from the contextual representations are more meaningful than those from the plain representation of the documents.
    By introducing a context model for source code identifiers, the proposed approach paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.
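To illustrate the second context model, a data-dependency graph over identifiers can be built from a map of each identifier to the identifiers whose values flow into it; an identifier's context is then its neighbourhood in the graph. This is a simplified, hypothetical sketch (the `assignments` input stands in for the paper's actual data-flow analysis):

```python
def dependency_graph(assignments):
    """Build an undirected data-dependency graph over identifiers.
    `assignments` maps each identifier to the identifiers whose
    values flow into it (a hypothetical, pre-computed stand-in
    for a real data-flow analysis)."""
    graph = {}
    for target, sources in assignments.items():
        graph.setdefault(target, set())
        for src in sources:
            graph.setdefault(src, set())
            # data flows between src and target, so they share context
            graph[target].add(src)
            graph[src].add(target)
    return graph

def context(graph, identifier):
    """The context of an identifier: its neighbourhood in the graph."""
    return sorted(graph.get(identifier, set()))
```

For example, from `{"total": ["price", "tax"], "tax": ["price", "rate"]}` the context of `price` is `["tax", "total"]`: the identifiers its value flows into.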

    Phonological recoding in error detection: a cross-sectional study in beginning readers of Dutch

    The present cross-sectional study investigated the development of phonological recoding in beginning readers of Dutch, using a proofreading task with pseudohomophones and control misspellings. In Experiment 1, children in grades 1 to 3 rejected fewer pseudohomophones (e.g., wein, sounding like wijn 'wine') as spelling errors than control misspellings (e.g., wijg). The size of this pseudohomophone effect was larger in grade 1 than in grade 2 and did not differ between grades 2 and 3. In Experiment 2, we replicated the pseudohomophone effect in beginning readers and tested how orthographic knowledge may modulate this effect. Children in grades 2 to 4 again detected fewer pseudohomophones than control misspellings, and this effect decreased between grades 2 and 3 and between grades 3 and 4. The magnitude of the pseudohomophone effect was modulated by the development of orthographic knowledge: it decreased much more between grades 2 and 3 for more advanced spellers than for less advanced spellers. The persistence of the pseudohomophone effect across all grades illustrates the importance of phonological recoding in Dutch readers. At the same time, the decreasing pseudohomophone effect across grades indicates the increasing influence of orthographic knowledge as reading develops.

    Weakly-supervised appraisal analysis

    This article is concerned with the computational treatment of Appraisal, a Systemic Functional Linguistic theory of the types of language employed to communicate opinion in English. The theory considers aspects such as Attitude (how writers communicate their point of view), Engagement (how writers align themselves with respect to the opinions of others) and Graduation (how writers amplify or diminish their attitudes and engagements). To analyse text according to the theory, we employ a weakly-supervised approach to text classification, which involves comparing the similarity of words with prototypical examples of classes. We evaluate the method's performance using a collection of book reviews annotated according to the Appraisal theory.
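A minimal sketch of the weakly-supervised idea, assuming each word and each class's prototypical seed words already have vector representations (the toy two-dimensional vectors and class names below are invented for illustration): a word is assigned to the class whose prototypes it is most similar to on average.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(word_vec, prototypes):
    """Assign a word vector to the class whose prototype vectors
    it is most similar to, averaging similarity over each class's
    seeds (a sketch of prototype-based weak supervision)."""
    def score(cls):
        vecs = prototypes[cls]
        return sum(cosine(word_vec, v) for v in vecs) / len(vecs)
    return max(prototypes, key=score)
```

In practice the vectors would come from a distributional model trained on a large corpus, and the prototypes from hand-picked seed words for each Appraisal category.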

    Qualitative market research and product development: representations of food and marketing challenges

    A new method for analysing social representations from sentences in natural language is presented. The basic nuclei of the social representation of "eating" are extracted from two corpora: one from a large set of dictionary definitions, the other from free associations produced by 2000 French adult subjects. The method shows that "eating", as a mental model, is the connection of "libido", "intake", "foodstuffs", "meal", "filling up" and "living". Further analysis of free associations on "eating well" yields some pragmatic scripts, showing how consumers assemble the basic nuclei into action rules. The results uncover an archaeology of social knowledge, revealing some of the psychological and cultural bases on which contemporary representations of eating rest. As important marketing issues in the food business today concern the psychological determinants of food behaviour, our method may provide new tools for market research and open new data fields to systematic investigation. A paper from an international symposium 'Enjeux actuels du marketing dans l'alimentation et la restauration' held in Montreal, Canada, May 24th to 27th, 1994.

    Teaching of multi-word expressions to second language learners

    …