83 research outputs found
System combination using machine learning in NLP tasks
System combination is a widely studied research area within Pattern Recognition, where many techniques have been developed to exploit the diversity of classification methods now available thanks to Machine Learning. This PhD thesis surveys existing combination techniques and the extent to which they have been applied to NLP tasks. It also presents work on specific tasks, together with a comparative study of the results obtained by many of these techniques when implemented and applied to the POS-tagging task. The use of many different corpora and the experiments carried out allowed us to draw conclusions that we believe will be very useful for applying these techniques to NLP in the future.
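As an illustration of what combining taggers can look like, the following minimal sketch applies plurality voting to the outputs of several POS taggers. Voting is only one of many combination techniques and is used here as an assumption; the abstract does not state which techniques the thesis evaluates. The function name `majority_vote` and the toy tag sequences are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences from several taggers by plurality vote.

    predictions: list of tag sequences, one per tagger, all of equal length.
    Ties go to the tag proposed by the earliest tagger in the list.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag (in tagger order) that reaches the top count wins.
        combined.append(next(t for t in token_tags if counts[t] == best))
    return combined


# Toy outputs of three hypothetical taggers for a three-token sentence.
tagger_a = ["DET", "NOUN", "VERB"]
tagger_b = ["DET", "VERB", "VERB"]
tagger_c = ["DET", "NOUN", "NOUN"]
combined = majority_vote([tagger_a, tagger_b, tagger_c])
```

Each token receives the tag chosen by most taggers, so an error made by a single tagger is outvoted when the others agree.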
A Technique for Distributed Systems Specification
In this paper we show how an object-oriented specification
language is useful for the specification of distributed
systems. The main constructs in this language
are the objects. An object consists of a state, a
behaviour and a set of transition rules between states.
The specification is composed of three sections: definition
of algebraic data types to represent the domain
of object attributes, definition of classes that group objects
with common features, and definition of relationships
among classes. We show two possible styles for
defining the behaviour of objects: on the one hand, we use
a transition system (state-oriented), and on the other
hand, we use an algebraic model of process description
(constraint-oriented). We illustrate the paper with
the specification of the dining philosophers problem, a
typical example in distributed programming.
InstanceRank: Bringing order to datasets
In this paper we present InstanceRank, a ranking algorithm that reflects the relevance of the instances
within a dataset. InstanceRank applies a solution similar to that used by PageRank, the web page ranking
algorithm of the Google search engine. We also present ISR, an instance selection technique that uses
InstanceRank. This algorithm chooses the most representative instances from a learning database. Experiments
show that the ISR algorithm, with InstanceRank as the ranking criterion, achieves accuracy similar
to that of other instance reduction techniques while noticeably reducing the size of the instance set.
Ministerio de Educación y Ciencia HUM2007-66607-C04-0
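The idea of ranking instances in a PageRank-like way can be sketched as follows. This is not the authors' InstanceRank or ISR: the abstract does not describe how the instance graph is built, so the similarity function, the damping value, and the names `similarity_rank` and `select_top` are assumptions made for illustration only.

```python
import math


def similarity_rank(instances, damping=0.85, iterations=50):
    """PageRank-style power iteration over a fully connected similarity graph.

    Assumption: the edge weight between two instances decreases with their
    Euclidean distance. The real InstanceRank graph may be built differently.
    """
    n = len(instances)
    if n == 0:
        return []
    if n == 1:
        return [1.0]
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + math.dist(instances[i], instances[j]))
         for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]  # always > 0 when n >= 2
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        ranks = [
            (1.0 - damping) / n
            + damping * sum(ranks[i] * weights[i][j] / out_sums[i] for i in range(n))
            for j in range(n)
        ]
    return ranks


def select_top(instances, ranks, k):
    """Keep the k highest-ranked instances (a simple stand-in for selection)."""
    order = sorted(range(len(instances)), key=lambda i: ranks[i], reverse=True)
    return [instances[i] for i in order[:k]]


# Three clustered points and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ranks = similarity_rank(points)
kept = select_top(points, ranks, k=3)
```

In this toy dataset the distant point receives the lowest rank and is the one discarded, which mirrors the goal of keeping the most representative instances.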
On the Reusability of User Interface Declarative Models
The automatic generation of user interfaces based on declarative models
achieves a significant reduction of the development effort. In this paper, we analyze
the feasibility of using two well-known techniques, XInclude and Packaging,
in the new context of reusing user-interface model specifications. After analyzing the
suitability of each technique for UI reuse and implementing both techniques
in a real system, we show that both techniques are suited to be used within the context
of today’s existing model-based user interfaces.
Reusing UI elements with Model-Based User Interface Development
This paper introduces the potential for reusing UI elements in the context of Model-Based UI Development (MBUID) and provides guidance for future MBUID systems with enhanced reuse capabilities. Our study is based upon the development of six inter-related projects with a specific MBUID environment which supports standard techniques for reuse such as parametrization and sub-specification, inclusion or shared repositories.
We analyze our experience and discuss the benefits and limitations of each technique supported by our MBUID environment. The system architecture, the structure and composition of UI elements and the model specification languages have a decisive impact on reusability. In our case, more than 40% of the elements defined in the UI specifications were reused, resulting in a reduction of 55% of the specification size. Inclusion, parametrization and sub-specification have facilitated modularity and internal reuse of UI specifications at development time, whereas the reuse of UI elements between applications has greatly benefited from sharing repositories of UI elements at run time.
Ministerio de Ciencia e Innovación DPI2010-19154; Junta de Andalucía TIC-633
Aproximación léxica basada en recursos para la tarea TWEET-NORM
This paper proposes a resource-based lexical approach for addressing the
TWEET-NORM task. The proposed system exposes a simple but extensible modular
architecture in which each analysis module independently proposes correction
candidates for each OOV word. Each one of these analysis modules tries to address a
specific problem and each one works in a very different way. The resources are used
as the main component for the OOV detection system and they work as support for
the validation and filtering of candidates.
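The modular architecture described above can be sketched as follows: a lexicon detects OOV words, independent modules each propose candidates, and the lexicon is used again to validate and filter them. The lexicon contents, the abbreviation table, and the two modules (`collapse_repeats`, `expand_abbreviation`) are hypothetical toy examples and not the resources or modules of the actual system.

```python
import re

# Hypothetical toy resources; a real system would load large lexicons.
LEXICON = {"que", "hola", "por", "favor"}
ABBREVIATIONS = {"q": "que", "xfa": "por favor"}


def collapse_repeats(word):
    """Module 1: reduce every run of a repeated character to one character."""
    return {re.sub(r"(.)\1+", r"\1", word)}


def expand_abbreviation(word):
    """Module 2: look the word up in a table of known abbreviations."""
    return {ABBREVIATIONS[word]} if word in ABBREVIATIONS else set()


# Adding a new analysis module only requires appending it to this list.
MODULES = [collapse_repeats, expand_abbreviation]


def is_oov(word):
    """A word is out-of-vocabulary when the lexicon does not contain it."""
    return word not in LEXICON


def candidates(word):
    """Collect proposals from all modules, then keep those the lexicon accepts."""
    if not is_oov(word):
        return set()
    proposed = set()
    for module in MODULES:
        proposed |= module(word)
    return {c for c in proposed if all(part in LEXICON for part in c.split())}


repeated = candidates("holaaa")
abbreviated = candidates("xfa")
```

Because each module works independently, one module's failure to propose anything does not affect the others, and the shared lexicon filter keeps invalid proposals out of the final candidate set.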
Dynamic Topic-Related Tweet Retrieval
Twitter is a social network in which people publish brief,
publicly accessible instant messages. With its exponential
growth and the public nature and transversality
of its contents, more researchers are using Twitter as a
source of data for multiple purposes. In this context, the
ability to retrieve those messages (tweets) related to a
certain topic becomes critical. In this work, we define the
topic-related tweet retrieval task and propose a dynamic,
graph-based method with which to address it. We have
applied our method to capture a data set containing
tweets related to the participation of the Spanish team in
the Euro 2012 soccer competition, measuring the precision
and recall against other simple but commonly used
approaches. The results demonstrate the effectiveness
of our method, which significantly increases coverage of
the chosen topic and is able to capture related subtopics
that were unknown a priori.
An approach to the use of word embeddings in an opinion classification task
In this paper we show how a vector-based word representation obtained via word2vec can help to improve the results of a document classifier based on bags of words. Both models allow obtaining numeric representations from texts, but they do it very differently. The bag of words model can represent documents by means of widely dispersed vectors in which the indices are words or groups of words. word2vec generates word level representations building vectors that are much more compact, where indices implicitly contain information about the context of word occurrences. Bags of words are very effective for document classification and in our experiments no representation using only word2vec vectors is able to improve their results. However, this does not mean that the information provided by word2vec is not useful for the classification task. When this information is used in combination with the bags of words, the results are improved, showing its complementarity and its contribution to the task. We have also performed cross-domain experiments in which word2vec has shown much more stable behavior than bag of words models.
Junta de Andalucía P11-TIC-7684 M
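One simple way to use both representations together is to concatenate a sparse bag-of-words vector with a dense document vector obtained by averaging word vectors. The abstract does not say how the two are combined, so concatenation and averaging are assumptions here, and the two-dimensional `toy_embeddings` are hand-written placeholders rather than real word2vec output.

```python
def bag_of_words(tokens, vocabulary):
    """Sparse representation: one count per vocabulary word."""
    return [float(tokens.count(word)) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Dense representation: average of the vectors of the known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def combined_features(tokens, vocabulary, embeddings, dim):
    """Concatenate both representations into a single feature vector."""
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


vocabulary = ["good", "bad", "film"]
# Hand-made placeholder vectors, not trained embeddings.
toy_embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "film": [0.0, 1.0]}
features = combined_features(["good", "film"], vocabulary, toy_embeddings, dim=2)
```

The resulting vector has one block of word counts and one block of averaged embedding dimensions, and can be passed to any standard classifier.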
Obtaining Adaptation of Virtual Courses by Using a Collaborative Tool and Learning Design
This work describes the Learning Activity Management
System, LAMS (Macquarie University, Australia), a
collaborative tool developed for
designing, managing and delivering online collaborative
learning activities. It provides teachers with a highly
intuitive visual authoring environment for creating
sequences of learning activities. These activities can
include a range of individual tasks, small group work and
whole class activities based on both content and
collaboration. A methodology for applying this tool is
then described.