13 research outputs found

    A Machine learning approach to POS tagging

    We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (part-of-speech tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a quite simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. In this direction, we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
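    The core idea above — a language model as tag distributions conditioned on relevant context — can be sketched minimally. This is only an illustration, not the paper's system: the toy corpus and tags are invented, and a single context feature (the previous tag) stands in for a full statistical decision tree.

    ```python
    from collections import Counter, defaultdict

    # Toy tagged corpus; the paper's experiments used the WSJ corpus.
    corpus = [
        [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
        [("they", "PRP"), ("can", "MD"), ("swim", "VB")],
        [("a", "DT"), ("can", "NN"), ("fell", "VBD")],
    ]

    # "Language model": for each ambiguous word, a distribution of tags
    # conditioned on one relevant context feature (the previous tag) --
    # a one-question stand-in for a statistical decision tree.
    context_dist = defaultdict(Counter)
    for sent in corpus:
        prev = "<s>"
        for word, tag in sent:
            context_dist[(word, prev)][tag] += 1
            prev = tag

    def disambiguate(word, prev_tag):
        """Pick the most frequent tag for `word` given the previous tag."""
        dist = context_dist.get((word, prev_tag))
        if not dist:
            return None
        return dist.most_common(1)[0][0]

    print(disambiguate("can", "DT"))   # after a determiner -> noun reading: NN
    print(disambiguate("can", "PRP"))  # after a pronoun -> modal reading: MD
    ```

    A real decision tree would ask a sequence of such context questions and store a tag distribution at each leaf, which is what makes the acquired model usable both as disambiguation rules and as probability estimates.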

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems — text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Cognitive Complexity Applied to Software Development: An Automated Procedure to Reduce the Comprehension Effort

    The cognitive complexity of a software application determines the amount of human effort required to comprehend its internal logic, making it an inherently subjective measurement. Quantifying cognitive complexity as a metric is problematic, since the factors used in the computation do not capture exact human cognition. Therefore, the determination of cognitive complexity requires expansion beyond its quantification. The human comprehension effort associated with a software application arises in each phase of its development process. Correct requirements identification and accurate logical-diagram generation prior to code implementation can lead to proper logical identification of software applications. Moreover, human comprehension is essential for software maintenance: defect identification, defect correction, and the handling of code quality issues cannot be carried out without good comprehension. Therefore, cognitive complexity can be effectively applied to capture human understandability within the phases of requirements analysis, design, defect tracking, and code quality optimization. This study automated the above-mentioned phases to reduce the manual cognitive load and, with it, cognitive complexity. The proposed system improved the average accuracy of requirements analysis and class-diagram generation by 14.44%, and achieved a 9.89% average accuracy improvement in defect tracking and code-quality issue handling, compared with manual procedures.
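    To make the notion of a quantified comprehension metric concrete, here is a rough sketch — not the paper's procedure — of a crude nesting-weighted count of control-flow constructs over Python source, loosely inspired by published cognitive-complexity metrics: each branch or loop costs 1, plus 1 per level of nesting.

    ```python
    import ast

    def cognitive_complexity(source: str) -> int:
        """Crude cognitive-complexity estimate: each control-flow construct
        costs 1, plus 1 per level of nesting (a simplified illustration,
        not an implementation of any specific published metric)."""
        score = 0

        def walk(node, depth):
            nonlocal score
            for child in ast.iter_child_nodes(node):
                if isinstance(child, (ast.If, ast.For, ast.While, ast.Try)):
                    score += 1 + depth       # nested constructs cost more
                    walk(child, depth + 1)
                else:
                    walk(child, depth)

        walk(ast.parse(source), 0)
        return score

    flat = "if a:\n    x = 1\nif b:\n    x = 2\n"
    nested = "if a:\n    if b:\n        x = 2\n"
    print(cognitive_complexity(flat), cognitive_complexity(nested))  # 2 3
    ```

    The point of the nesting weight is exactly the paper's premise: two sequential conditions are easier to comprehend than the same two conditions nested, even though a plain count (e.g. cyclomatic complexity) would score them identically.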

    Semantic annotation for educational content recommendation (Anotação semântica para recomendação de conteúdos educacionais)

    Advisor: Julio Cesar dos Reis. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract (translated from the Portuguese): Learning support systems exploit diverse multimedia resources to accommodate student individualities as well as different learning styles. However, the growing amount of educational content, available in different formats and in fragmented form, makes it harder to access and understand the concepts under study. Although the literature has proposed recommendation approaches that allow explicit representation of semantics through artifacts such as ontologies, this line has not been fully explored and still requires considerable research effort. This research aims to design a method for recommending educational content by exploring semantic annotations over textual transcriptions of video lectures. The annotations serve as metadata expressing the meaning of lecture segments. The recommendation technique, the main expected contribution, builds on the available annotations to define ranking strategies over the available content, based on the semantic proximity of concepts combined with machine learning techniques. The contribution involves developing functional software prototypes for experimental validation on real video-lecture content, highlighting the main advantages and limitations of the approach. The results obtained will give access to better-suited recommendations that improve the learning process, offering students the possibility of a more satisfying experience.
    English abstract: Learning support systems explore several audio-visual resources to consider individual needs and learning styles, aiming to stimulate learning experiences. However, the large amount of online educational content in different formats, and the possibility of it being made available in a fragmented way, makes it difficult to access these resources and understand the concepts under study. Although the literature has proposed approaches to explore explicit semantic representation through artifacts such as ontologies in learning support systems, this research line still requires further investigation. In this M.Sc. dissertation, we propose a method for recommending educational content by exploring the use of semantic annotations over textual transcriptions from video lectures. Our investigation addresses the difficulties of extracting entities from natural language texts in video subtitles. We study how to refine concepts in a domain ontology to support the semantic annotation of video-lecture subtitles, and report on the design of a video-lecture recommendation system which explores the extracted semantic annotations. Our solution uses videos semantically annotated with an ontology in the Computer Science domain. The results obtained indicate that our recommendation mechanism is suited to filtering relevant video content in different use scenarios.
    Degree: Mestre em Ciência da Computação. Grants: 2017/02325-5; 2018/00313-2 (FAPES)
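    The ranking-by-semantic-proximity idea can be sketched under stated assumptions: the mini-ontology and video annotations below are invented stand-ins for the dissertation's Computer Science ontology and real subtitle annotations, and graph distance in an is-a hierarchy stands in for whatever proximity measure the system actually uses.

    ```python
    from collections import defaultdict, deque

    # Hypothetical mini-ontology (concept -> parent concept).
    ontology = {
        "quicksort": "sorting",
        "mergesort": "sorting",
        "sorting": "algorithms",
        "hash table": "data structures",
        "data structures": "algorithms",
    }

    def distance(a, b):
        """Shortest path (in edges) between two concepts, treating the
        is-a hierarchy as an undirected graph."""
        graph = defaultdict(set)
        for child, parent in ontology.items():
            graph[child].add(parent)
            graph[parent].add(child)
        seen, queue = {a}, deque([(a, 0)])
        while queue:
            node, d = queue.popleft()
            if node == b:
                return d
            for nxt in graph[node] - seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
        return float("inf")

    # Each video carries semantic annotations (concepts found in its subtitles).
    videos = {
        "lecture-03": ["quicksort", "mergesort"],
        "lecture-07": ["hash table"],
    }

    def rank(query_concept):
        """Order videos by their closest annotated concept to the query."""
        return sorted(videos,
                      key=lambda v: min(distance(query_concept, c)
                                        for c in videos[v]))

    print(rank("sorting"))  # lecture-03 is semantically closer
    ```

    In the full system this distance signal would be combined with machine-learned features rather than used alone, but the sketch shows why annotating subtitles with ontology concepts makes content rankable at all.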

    Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees

    The study and application of general Machine Learning (ML) algorithms to the classical ambiguity problems in the area of Natural Language Processing (NLP) is currently a very active area of research. This trend is sometimes called Natural Language Learning. Within this framework, the present work explores the application of a concrete machine-learning technique, namely decision-tree induction, to a very basic NLP problem, namely part-of-speech disambiguation (POS tagging). Its main contributions fall in the NLP field, while the topics addressed are approached from the artificial intelligence perspective rather than from a linguistic point of view. A relevant property of the system we propose is the clear separation between the acquisition of the language model and its application within a concrete disambiguation algorithm, with the aim of constructing two components which are as independent as possible. Such an approach has many advantages; for instance, the language models obtained can be easily adapted to previously existing tagging formalisms, and the two modules can be improved and extended separately. As a first step, we have experimentally proven that decision trees (DTs) provide a flexible (by allowing a rich feature representation), efficient and compact way of acquiring, representing and accessing the information about POS ambiguities. In addition, DTs provide proper estimations of conditional probabilities for tags and words in their particular contexts. Additional machine learning techniques, based on the combination of classifiers, have been applied to address some particular weaknesses of our tree-based approach, and to further improve the accuracy in the most difficult cases. As a second step, the acquired models have been used to construct simple, accurate and effective taggers based on different paradigms.
    In particular, we present three different taggers that include the tree-based models: RTT, STT, and RELAX, which have shown different properties regarding speed, flexibility, accuracy, etc. The idea is that the particular user needs and environment will define which is the most appropriate tagger in each situation. Although we have observed slight differences, the accuracy results for the three taggers, tested on the WSJ benchmark corpus, are uniformly very high and at least as good as those of a number of current taggers based on automatic acquisition (a qualitative comparison with the most relevant current work is also reported). Additionally, our approach has been adapted to annotate a general Spanish corpus, with the particular limitation of learning from small training sets. A new technique, based on tagger combination and bootstrapping, has been proposed to address this problem and to improve accuracy. Experimental results showed that very high accuracy is possible for Spanish tagging with relatively low manual effort. Moreover, the success of this real application has confirmed the validity of our approach, and of the previously presented portability argument in favour of automatically acquired taggers.
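    The relaxation-labelling side of this work can be sketched as iterative weight updates driven by constraint compatibilities. The bigram compatibilities and toy sentence below are invented for illustration, not the thesis's learned constraints: each ambiguous position holds a weight per candidate tag, and weights grow in proportion to the support they receive from neighbouring labels.

    ```python
    # Two-word toy sentence; each position holds weights over candidate tags.
    weights = [
        {"DT": 1.0},               # "the" is unambiguous
        {"NN": 0.5, "VB": 0.5},    # "can" is ambiguous
    ]

    # Compatibility of (previous tag, current tag) pairs -- the "constraints".
    # Bigrams are just one of the information sources such a tagger can consume.
    compat = {("DT", "NN"): 1.0, ("DT", "VB"): 0.1}

    for _ in range(10):  # iterate until the weights stabilise
        # Support for each candidate tag from the neighbour's current weights.
        support = {t: sum(wp * compat.get((tp, t), 0.0)
                          for tp, wp in weights[0].items())
                   for t in weights[1]}
        # Multiplicative update, renormalised to keep a distribution.
        total = sum(weights[1][t] * (1 + support[t]) for t in weights[1])
        weights[1] = {t: weights[1][t] * (1 + support[t]) / total
                      for t in weights[1]}

    best = max(weights[1], key=weights[1].get)
    print(best)  # NN: after "the", the noun reading of "can" wins
    ```

    This update scheme is what lets such a tagger mix heterogeneous constraints — n-grams, machine-learned trees translated into rules, hand-written linguistic constraints — since each source simply contributes to the support term.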

    Modeling second language learners' interlanguage and its variability: a computer-based dynamic assessment approach to distinguishing between errors and mistakes

    Despite a long history, interlanguage variability research remains a debated topic, as most paradigms do not distinguish between competence and performance. While interlanguage performance has been shown to be variable, determining whether interlanguage competence is subject to random and/or systematic variation is complex, given that a distinction between competence-dependent errors and performance-related mistakes must be established to best represent interlanguage competence. This thesis proposes a dynamic assessment model grounded in sociocultural theory to distinguish between errors and mistakes in texts written by learners of French, and then investigates the extent to which interlanguage competence varies across time, text types, and students. The key outcomes include: 1. An expanded model based on dynamic assessment principles to distinguish between errors and mistakes, which also provides the structure to create and observe learners’ zone of proximal development; 2. A method to increase the accuracy of the part-of-speech tagging procedure, whose reliability correlates with the number of incorrect words in learners’ texts; 3. A sociocultural insight into interlanguage variability research. Results demonstrate that interlanguage competence is as variable as performance. The main finding shows that knowledge over time is subject to not only systematic but also unsystematic variation.
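    One hypothetical operationalisation of the error/mistake distinction under dynamic assessment — the prompt budget, labels, and example forms below are assumptions for illustration, not the thesis's actual model: a form the learner can self-correct within a budget of graduated prompts counts as a performance mistake; one they cannot counts as a competence error.

    ```python
    def classify(incorrect_form, attempts_after_prompts, max_prompts=3):
        """Label a learner's incorrect form as a performance 'mistake'
        (self-corrected within the prompt budget) or a competence 'error'
        (not corrected even with the most explicit prompt)."""
        for i, attempt in enumerate(attempts_after_prompts[:max_prompts]):
            if attempt == "correct":
                # Self-corrected after i + 1 increasingly explicit prompts.
                return ("mistake", i + 1)
        return ("error", None)

    # "le fille" is a hypothetical gender-agreement slip by a learner of French.
    print(classify("le fille", ["incorrect", "correct"]))
    print(classify("le fille", ["incorrect", "incorrect", "incorrect"]))
    ```

    The number of prompts needed before self-correction is also what makes the learner's zone of proximal development observable, which is why the model records it rather than just the binary label.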