Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process
Hans Hjelm. Cross-language Ontology Learning: Incorporating and Exploiting Cross-language Data in the Ontology Learning Process. NEALT Monograph Series, Vol. 1 (2009), 159 pages. © 2009 Hans Hjelm. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/10126
A structural computing model for dynamic service-based systems
Traditional techniques for Programming in the Large, especially Object-Oriented approaches, have been used for a considerable time and with great success in the implementation of service-based information systems. However, the systems for which these techniques have been used are static, in that the services and the data available to users are fixed by the system, with a strict separation between system and user. Our interest lies in currently emerging dynamic systems, where both the data and the services available to users are freely extensible by the users, and the strict distinction between system and user no longer exists. We describe why traditional object-oriented approaches are not suitable for modelling such dynamic systems. We discuss a new architectural model, the Information Unit Hypermedia Model (IUHM), which we have designed for modelling and implementing such dynamic systems. IUHM is based upon the application of structural computing to a hypermedia-like structure, which thereby operates as a service-based architecture. We discuss the details of this model and illustrate its features by describing some aspects of a large-scale system built using this architecture. Event: International Symposium, MIS 2003 (Austria, 17–20 September 2003). Laboratorio de Investigación y Formación en Informática Avanzada
An Approach for Automatic Generation of on-line Information Systems based on the Integration of Natural Language Processing and Adaptive Hypermedia Techniques
Unpublished doctoral thesis read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defence: 29-05-200
Natural language processing for semiautomatic semantics extraction: encyclopedic entry disambiguation and relationship extraction using Wikipedia and WordNet
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, September 200
Guessing Hierarchies and Symbols for Word Meanings through Hyperonyms and Conceptual Vectors
The NLP team of LIRMM currently works on lexical disambiguation and thematic text analysis [Lafourcade, 2001]. We have built a system with automated learning capabilities, based on conceptual vectors for meaning representation. Vectors are supposed to encode the ideas associated with words or expressions. In the framework of knowledge and lexical meaning representation, we devise conceptual-vector-based strategies to automatically construct hierarchical taxonomies and to validate (or invalidate) hyperonymy (or superordinate) relations among terms. Conceptual vectors are used through the thematic distance for decision making and link quality assessment.
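The thematic-distance test described above can be sketched as an angle between vectors. The dimensions, toy vectors and threshold below are illustrative assumptions, not the actual LIRMM conceptual-vector resources:

```python
import math

def angular_distance(u, v):
    """Thematic distance between two conceptual vectors, measured as the
    angle between them (0 = same themes; larger = thematically unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to the valid acos domain to absorb floating-point drift
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

# Toy vectors over three assumed concept dimensions: (animal, vehicle, motion)
cat = [0.9, 0.0, 0.3]
animal = [1.0, 0.0, 0.2]
car = [0.0, 0.9, 0.6]

THRESHOLD = 0.5  # radians; an assumed tuning parameter

def plausible_hyperonym(term_vec, hyper_vec, threshold=THRESHOLD):
    """Keep a candidate hyperonymy link only if the terms are thematically close."""
    return angular_distance(term_vec, hyper_vec) <= threshold
```

Under these toy vectors, `animal` is accepted as a plausible hyperonym of `cat`, while `car` is rejected; real validation would of course use learned vectors of much higher dimensionality.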
A Hybrid Environment for Syntax-Semantic Tagging
The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. The constraints enable the use of a real value stating "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models. Comment: PhD Thesis, 120 pages.
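Relaxation labelling for POS tagging can be sketched as an iterative re-weighting of candidate tags by the support they receive from neighbouring words' tags. The lexicon weights and compatibility values below are toy assumptions standing in for the thesis's much richer constraint models:

```python
# Toy lexicon: word -> possible tags with initial weights
candidates = {
    "the": {"DET": 1.0},
    "can": {"NOUN": 0.5, "VERB": 0.3, "AUX": 0.2},
    "rusts": {"VERB": 0.7, "NOUN": 0.3},
}
sentence = ["the", "can", "rusts"]

# Real-valued "compatibility" between adjacent tags, a simplified stand-in
# for Constraint Grammar style context constraints with graded values.
compat = {
    ("DET", "NOUN"): 1.0, ("DET", "VERB"): 0.1, ("DET", "AUX"): 0.1,
    ("NOUN", "VERB"): 0.9, ("NOUN", "NOUN"): 0.3,
    ("VERB", "VERB"): 0.1, ("VERB", "NOUN"): 0.5,
    ("AUX", "VERB"): 0.9, ("AUX", "NOUN"): 0.2,
}

def relax(sentence, candidates, compat, iterations=10):
    """Iteratively boost each tag by the weighted support of its neighbours,
    renormalise, and finally pick the highest-weighted tag per word."""
    weights = [dict(candidates[w]) for w in sentence]
    for _ in range(iterations):
        new = []
        for i, dist in enumerate(weights):
            updated = {}
            for tag, w in dist.items():
                support = 0.0
                if i > 0:  # support from the left neighbour's tags
                    support += sum(pw * compat.get((ptag, tag), 0.0)
                                   for ptag, pw in weights[i - 1].items())
                if i < len(weights) - 1:  # support from the right neighbour
                    support += sum(nw * compat.get((tag, ntag), 0.0)
                                   for ntag, nw in weights[i + 1].items())
                updated[tag] = w * (1.0 + support)
            total = sum(updated.values())
            new.append({t: v / total for t, v in updated.items()})
        weights = new
    return [max(d, key=d.get) for d in weights]
```

With these toy values the ambiguous "can" settles on NOUN because the determiner context supports it, illustrating how several disambiguation decisions are resolved simultaneously rather than word by word.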
Automatic extraction of definitions
Tese de doutoramento, Informática (Engenharia Informática), Universidade de Lisboa, Faculdade de Ciências, 2014

This doctoral research work provides a set of methods and heuristics for building a definition extractor or for fine-tuning an existing one. In order to develop and test the architecture, a generic definition extractor for the Portuguese language was built. Furthermore, the methods were tested in the construction of an extractor for two languages other than Portuguese, namely English and, less extensively, Dutch. The approach presented in this work makes the proposed extractor completely different in nature from the other works in the field. In fact, most systems that automatically extract definitions have been constructed with a specific corpus on a specific topic in mind, and are based on the manual construction of a set of rules or patterns capable of identifying a definition in a text.

This research focused on three types of definitions, characterized by the connector between the defined term and its description. The strategy adopted can be seen as a "divide and conquer" approach. Differently from the other works representing the state of the art, specific heuristics were developed to deal with the different types of definitions, namely copula, verbal and punctuation definitions.

We used a different methodology for each type of definition: we propose rule-based methods to extract punctuation definitions, machine learning with sampling algorithms for copula definitions, and machine learning with a method to increase the number of positive examples for verbal definitions. This architecture is justified by the increasing linguistic complexity that characterizes the different types of definitions. Numerous experiments have led to the conclusion that punctuation definitions are easily described using a set of rules. These rules can be easily adapted to the relevant context and translated into other languages. However, in order to deal with the other two definition types, the exclusive use of rules is not enough to achieve good performance, and more advanced methods are called for, in particular a machine learning based approach.
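The sampling step for copula definitions addresses class imbalance: candidate sentences are heavily skewed toward non-definitions. A minimal sketch of one such algorithm, random oversampling of the positive class, is shown below; the function name, ratio and data are illustrative assumptions, not the thesis's actual sampling scheme:

```python
import random

def oversample_positives(examples, labels, ratio=1.0, seed=13):
    """Duplicate positive (definition) examples until the positive class
    reaches roughly ratio * the number of negatives, so a classifier is
    not overwhelmed by non-definition sentences."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    pos = [x for x, y in zip(examples, labels) if y == 1]
    neg = [x for x, y in zip(examples, labels) if y == 0]
    target = int(ratio * len(neg))
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    new_examples = neg + pos + extra
    new_labels = [0] * len(neg) + [1] * (len(pos) + len(extra))
    return new_examples, new_labels
```

Oversampling only duplicates existing positives; the complementary method mentioned for verbal definitions instead increases the number of genuinely new positive examples.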
Unlike other similar systems, which were built with a specific corpus or a specific domain in mind, the one reported here is meant to obtain good results regardless of the domain or context. All the decisions made in the construction of the definition extractor take into consideration this central
objective.

This doctoral work aims to provide a set of methods and heuristics for building a definition extractor, or for improving the performance of an existing system when used with a specific corpus. In order to develop and test the architecture, a generic definition extractor for the Portuguese language was built. In addition, the methods were tested in the construction of an extractor for a language other than Portuguese, namely English; some heuristics were also tested with a third language, Dutch. The approach presented in this work makes the proposed extractor completely different from the other works in the area. It is a fact that most automatic definition extraction systems were built with a specific corpus on a well-determined topic in mind, and are based on the manual construction of a set of rules or patterns capable of identifying a definition in a text from a specific domain.

This research focused on three types of definitions, characterized by the connection between the defined term and its description. The strategy adopted can be seen as a "divide and conquer" approach. Unlike other research in this area, specific heuristics were developed to deal with the different typologies of definitions, namely copula, verbal and punctuation definitions.

The present work proposes a different methodology for each type of definition: rule-based methods to extract punctuation definitions, machine learning with sampling algorithms for copula definitions, and machine learning with a method to automatically increase the number of positive examples for verbal definitions. This architecture is justified by the increasing linguistic complexity that characterizes the different types of definitions. Numerous experiments have led to the conclusion that punctuation definitions are easily described using a set of rules. These rules can easily be adapted to the relevant context and translated into other languages. However, to deal with the other two types of definitions, the exclusive use of rules is not enough to obtain good performance, and more advanced methods are needed, in particular those based on machine learning.

Unlike other similar systems, which were built with a specific corpus or domain in mind, the system presented here was developed to obtain good results regardless of the domain or the language. All the decisions made in the construction of the definition extractor took this central objective into consideration. Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/36732/2007)
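The rule-based strategy for punctuation definitions could, in the simplest case, be expressed as a small set of surface patterns over the punctuation connector. The single regular expression below is a hypothetical illustration of that idea, not the thesis's actual rule set, and the term/description shapes it assumes are simplifications:

```python
import re

# Assumed pattern: a capitalized term, then a colon or dash connector,
# then a sentence-like description ending in a period.
PUNCT_DEF = re.compile(
    r"^(?P<term>[A-Z][\w -]{1,40}?)\s*[:\u2013-]\s+(?P<definition>[A-Z].+\.)$"
)

def extract_punctuation_definitions(lines):
    """Return (term, definition) pairs for lines matching the pattern."""
    pairs = []
    for line in lines:
        m = PUNCT_DEF.match(line.strip())
        if m:
            pairs.append((m.group("term").strip(), m.group("definition")))
    return pairs
```

Because such patterns key on punctuation rather than on lexical content, porting them to another language or context mostly means adjusting the term and description shapes, which is consistent with the abstract's claim that these rules translate easily across languages.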
Word Knowledge and Word Usage
Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies, requiring the synergic integration of a wide range of methods, techniques, and empirical and experimental findings. The present book intends to approach a few central issues concerning the organization, structure and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and discuss the advantages of developing converging perspectives. The book explores the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information-theoretical measures of word families, statistical correlations across psycho-linguistic and cognitive evidence, principles of machine learning, and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula, and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.
Cognitive design. Creating the sets of categories and labels that structure our shared experience
A Dissertation submitted to the Graduate School — New Brunswick, Rutgers, The State University of New Jersey, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Following in the tradition of studies of categorization in everyday life, this dissertation
focuses on the specific case of sets of categories. The concept of the "contrast set," developed
by cognitive anthropologists in the 1950s, is the central focus of analysis. Canonical examples
of everyday life contrast sets include alphabets, identification numbers, standard pitches,
and the elements of geographical categorizations. This dissertation focuses on the design
issues surrounding the deliberate, conscious construction of such sets (rather than on
contrast sets which are natural or emergent). The chapters focus respectively on the creation
of contrast sets; the way contrast sets are used as labels for other contrast sets; the use of
rules, principles, and set topologies in this labeling process; the standardization and
institutionalization of contrast sets; the way in which people justify, legitimate, and attempt
to change standardized contrast sets; and the ways people learn about unfamiliar contrast
sets.
The dissertation uses the method of pattern analysis. It identifies and describes
abstract social forms, gives numerous concrete examples of each form, and includes sixty
images. The goal is to understand a recurrent type of human activity that affects and
structures many everyday life experiences. The dissertation is practically oriented as well,
and directly addresses the concerns of those responsible for designing contrast sets for public
use.