58 research outputs found
A Computational Lexicon and Representational Model for Arabic Multiword Expressions
The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations.
This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of the adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for the AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods to discover multiple types of AMWEs. Third, this thesis presents a representational system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions.
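The data-driven side of such a hybrid approach typically ranks candidate word combinations by a lexical association measure. The abstract does not name the measure used, so the following is only an illustrative sketch, using pointwise mutual information over adjacent word pairs on a toy English corpus:

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).
    High-PMI pairs are candidate multiword expressions."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        p_pair = c / (n - 1)
        scores[(w1, w2)] = math.log2(
            p_pair / ((unigrams[w1] / n) * (unigrams[w2] / n)))
    return scores

# Toy corpus: "new york" recurs as a unit, so it scores highest
tokens = ("new york is big and new york is busy "
          "and big is big and busy is new").split()
scores = pmi_bigrams(tokens)
best = max(scores, key=scores.get)
```

In a real pipeline the statistical ranking would be filtered through the knowledge-based side (morphosyntactic patterns, lexicon lookup) rather than used on its own.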
This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena
Neurobiology of incremental speech comprehension
Understanding spoken language requires the rapid transition from perceptual processing of the auditory input through a variety of cognitive processes involved in constructing the mental representation of the message that the speaker is intending to convey. Listeners carry out these complex processes very rapidly and accurately as they hear each word incrementally unfolding in a sentence. However, little is known about the specific spatiotemporal patterning of this wide range of incremental processing operations that underpin the dynamic transitions from the speech input to the development of a meaning interpretation of an utterance. This thesis aims to address this set of issues by investigating the spatiotemporal dynamics of brain activity as spoken sentences unfold over time in order to illuminate the neurocomputational properties of the human language processing system and determine how the representation of a spoken sentence develops incrementally as each upcoming word is heard.
Using a novel application of multidimensional probabilistic modelling combined with models from computational linguistics, I developed models of a variety of computational processes associated with accessing and processing the syntactic and semantic properties of sentences and tested these models at various points as sentences unfolded over time. Since a wide range of incremental processes occur very rapidly during speech comprehension, it is crucial to keep track of the temporal dynamics of the neural computations involved. To do this, I used combined electroencephalography and magnetoencephalography (EMEG) to record neural activity with millisecond resolution and analyzed the recordings in source space using univariate and/or multivariate approaches. The results confirm the value of this combination of methods in examining the properties of incremental speech processing. My findings corroborate the predictive nature of human speech comprehension and demonstrate that the effects of early semantic constraint are not dependent on explicit syntactic knowledge
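One common way to test computational models against millisecond-resolved source-space recordings is representational similarity analysis (RSA): the pairwise geometry of model predictions is correlated with the pairwise geometry of neural patterns at each time point. The abstract does not name the exact multivariate method, so the sketch below, with invented four-condition data, is illustrative only:

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rdm(patterns):
    """Condensed representational dissimilarity matrix:
    1 - correlation for every pair of condition patterns."""
    return [1 - pearson(u, v) for u, v in combinations(patterns, 2)]

def rsa_fit(model_patterns, neural_patterns):
    """Second-order similarity: how well the model's representational
    geometry predicts the neural geometry at one point in time."""
    return pearson(rdm(model_patterns), rdm(neural_patterns))

# Invented example: conditions 1-2 are similar, conditions 3-4 are similar,
# and the neural patterns roughly mirror that structure
model = [(1, 2, 3, 4), (1, 2, 3, 5), (4, 3, 2, 1), (5, 3, 2, 1)]
neural = [(2, 3, 4, 5), (1, 2, 4, 5), (4, 3, 1, 1), (5, 4, 2, 1)]
fit = rsa_fit(model, neural)
```

Repeating this fit at successive latencies as the word unfolds yields the kind of spatiotemporal profile the thesis investigates.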
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages presents 17 papers from the conference organised in Dubrovnik, Croatia, 4-6 October 2010
Automated Semantic Understanding of Human Emotions in Writing and Speech
Affective Human Computer Interaction (A-HCI) will be critical to the success of new technologies that will be prevalent in the 21st century. If cell phones and the internet are any indication, there will be continued rapid development of automated assistive systems that help humans live better, more productive lives. These will not be just passive systems such as cell phones, but active assistive systems such as robot aides in hospitals, homes, entertainment rooms, offices, and other work environments. Such systems will need to properly deduce a person's emotional state before they determine how best to interact with them. This dissertation explores and extends the body of knowledge related to affective HCI. New semantic methodologies are developed and studied for reliable and accurate detection of human emotional states and magnitudes in written and spoken speech, and for mapping emotional states and magnitudes to 3-D facial expression outputs. The automatic detection of affect in language is based on natural language processing and machine learning approaches. Two affect corpora were developed to perform this analysis. Emotion classification is performed at the sentence level using a step-wise approach that incorporates sentiment flow and sentiment composition features. For emotion magnitude estimation, a regression model was developed to predict the evolving emotional magnitude of actors. Emotional magnitudes at any point during a story or conversation are determined by 1) the previous emotional state magnitude; 2) new text and speech inputs that might act upon that state; and 3) information about the context the actors are in. Acoustic features are also used to capture additional information from the speech signal. The automatic understanding of affect is evaluated by testing the model on a test subset of the newly extended corpus.
To visualize actor emotions as perceived by the system, a methodology was also developed to map predicted emotion class magnitudes to 3-D facial parameters using vertex-level mesh morphing. The sentence-level emotion state detection approach achieved classification accuracies as high as 71% on the neutral vs. emotion classification task in a test corpus of children's stories. After class re-sampling, the step-wise classification methodology achieved accuracies in the 56% to 84% range for each emotion class and polarity on a test subset of a medical drama corpus. For emotion magnitude prediction, the recurrent (prior-state feedback) regression model using both text-based and acoustic-based features achieved correlation coefficients in the range of 0.69 to 0.80. This prediction function was modeled using a non-linear approach based on Support Vector Regression (SVR) and performed better than approaches based on Linear Regression or Artificial Neural Networks
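The prior-state feedback structure of the magnitude model can be sketched as follows. The weights and inputs below are invented for illustration, and a fixed linear stand-in replaces the trained SVR; only the recurrent wiring (previous prediction fed back as an input feature) reflects the described design:

```python
def predict_magnitudes(text_feats, acoustic_feats, model, init_state=0.0):
    """Recurrent (prior-state feedback) magnitude prediction:
    each step's input vector includes the previous predicted magnitude."""
    state = init_state
    preds = []
    for t, a in zip(text_feats, acoustic_feats):
        x = [state, t, a]      # prior state + new text/acoustic evidence
        state = model(x)       # stand-in for the trained SVR
        preds.append(state)
    return preds

# Illustrative stand-in model: a weighted blend of prior state and new
# evidence (weights invented for the sketch; the thesis trains an SVR)
toy_svr = lambda x: 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[2]

# Emotional evidence in the first two utterances, none in the third:
# the predicted magnitude builds up, then decays
preds = predict_magnitudes([0.8, 0.8, 0.0], [0.6, 0.6, 0.0], toy_svr)
```

The feedback term is what lets the model carry an actor's emotional state across utterances rather than scoring each sentence in isolation.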
Interactive Technologies for the Public Sphere: Toward a Theory of Critical Creative Technology
Digital media cultural practices continue to address the social, cultural and aesthetic contexts of the global information economy, perhaps better called ecology, by inventing new methods and genres that encourage interactive engagement, collaboration, exploration and learning. The theoretical framework for critical creative technology evolved from the confluence of the arts, human computer interaction, and critical theories of technology. Molding this nascent theoretical framework from these seemingly disparate disciplines was a reflexive process in which the influence of each component on the others spiraled into the theory and practice, as illustrated through the Constructed Narratives project. Research that evolves from an arts perspective encourages experimental processes of making as a method for defining research principles. The traditional reductionist approach to research requires that all confounding variables be eliminated or silenced using statistical methods. However, that noise in the data, those confounding variables, provides the rich context, media, and processes by which creative practices thrive. As research in the arts gains recognition for its contributions of new knowledge, the traditional reductive practice in search of general principles will be respectfully joined by methodologies for defining living principles that celebrate and build from the confounding variables, the data noise. The movement to develop research methodologies from the noisy edges of human interaction has been explored in the research and practices of ludic design and ambiguity (Gaver, 2003); the affective gap (Sengers et al., 2005b; 2006); embodied interaction (Dourish, 2001); the felt life (McCarthy & Wright, 2004); and reflective HCI (Dourish et al., 2004). The theory of critical creative technology examines the relationships between critical theories of technology, society and aesthetics, information technologies, and contemporary practices in interaction design and creative digital media. The theory of critical creative technology is aligned with theories and practices in social navigation (Dourish, 1999) and community-based interactive systems (Stathis, 1999) in the development of smart appliances and network systems that support people in engaging in social activities, promoting communication and enhancing the potential for learning in a community-based environment. The theory of critical creative technology amends these community-based and collaborative design theories by emphasizing methods to facilitate face-to-face dialogical interaction when the exchange of ideas, observations, dreams, concerns, and celebrations may be silenced by societal norms about how to engage others in public spaces.
The Constructed Narratives project is an experiment in the design of a critical creative technology that emphasizes the collaborative construction of new knowledge about one's lived world through computer-supported collaborative play (CSCP). To construct is to creatively invent one's world by engaging in creative decision-making, problem solving and acts of negotiation. The metaphor of construction is used to demonstrate how a simple artefact (a building block) can provide an interactive platform to support discourse between collaborating participants. The technical goal for this project was the development of a software and hardware platform for the design of critical creative technology applications that can process a dynamic flow of logistical and profile data from multiple users, to be used in applications that facilitate dialogue between people in a real-time playful interactive experience
Exploring a Bioinformatics Clustering Algorithm
This thesis explores and evaluates MAXCCLUS, a bioinformatics clustering algorithm designed to cluster genes from microarray experimental data. MAXCCLUS clusters genes according to the textual data that describe them. It generates candidate clusters and retains only those that pass a statistical significance test, then attempts to generalise these clusters using a simple greedy generalisation algorithm. We explore the behaviour of MAXCCLUS by running several clustering experiments that investigate various modifications to the algorithm and its data. The thesis shows (a) that the simple generalisation algorithm of MAXCCLUS gives better results than an exhaustive search algorithm for generalisation, (b) that the significance test MAXCCLUS uses needs to be modified to account for the functional dependency of some genes on others, (c) that it is advantageous to delete non-domain-relevant textual data describing the genes but disadvantageous to add more textual data, and (d) that MAXCCLUS performs poorly when it attempts to cluster genes that have adjacent categories rather than only two distinct categories
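The abstract does not specify which significance test MAXCCLUS runs; a common choice for deciding whether a gene cluster shares a textual term more often than chance is the hypergeometric test, sketched here with invented counts:

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): the probability that a
    cluster of n genes contains k or more genes carrying a term purely by
    chance, given that K of the N genes in the population carry it."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Invented example: a 10-gene cluster in which 8 genes share a term that
# only 15 of the 200 genes carry overall -- far more than chance predicts
p = hypergeom_pvalue(8, 10, 15, 200)
significant = p < 0.01
```

A per-cluster test like this would normally be combined with a multiple-testing correction, since many candidate clusters are scored.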
A method for ontology- and knowledge-base-assisted text mining for a diabetes discussion forum
Social media offers researchers a vast amount of unstructured text as a source from which to discover hidden knowledge and insights. However, social media poses new challenges to text mining and knowledge discovery because its texts are short, temporal in nature and written in informal language.
In order to identify the main requirements for analysing unstructured text in social media, this research takes as a case study a large discussion forum in the diabetes domain. It then reviews and evaluates existing text mining methods against the requirements for analysing such a domain. Using domain background knowledge to bridge the semantic gap in traditional text mining methods was identified as a key requirement for analysing text in discussion forums. Existing ontology engineering methodologies encounter difficulties in deriving domain knowledge with the appropriate breadth and depth of domain-specific concepts and a rich relationship structure. These limitations usually originate from a reliance on human domain experts.
This research developed a novel semantic text mining method. It can identify the concepts and topics being discussed and the strength of the relationships between them, and then display the emergent knowledge from a discussion forum. The derived method has a modular design consisting of three main components: the ontology building process, semantic annotation and topic identification, and visualisation tools. The ontology building process generates a domain ontology quickly with little need for domain experts. The topic identification component uses a hybrid of the domain ontology and a general knowledge base for text enrichment and annotation, while the visualisation methods of dynamic tag clouds and a co-occurrence network for pattern discovery enable flexible visualisation of the results and can help uncover hidden knowledge.
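The co-occurrence network component can be illustrated with a minimal sketch: concepts annotated in the same post are linked, and each edge's weight counts how often that pair co-occurs. The posts and the min_weight threshold below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(annotated_posts, min_weight=2):
    """Build weighted concept co-occurrence edges: two concepts gain an
    edge increment each time they are annotated in the same post."""
    edges = Counter()
    for concepts in annotated_posts:
        # sort so (a, b) and (b, a) count as the same undirected edge
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    return {e: w for e, w in edges.items() if w >= min_weight}

# Invented forum posts, each reduced to its annotated concepts
posts = [
    ["insulin", "blood sugar", "diet"],
    ["insulin", "blood sugar"],
    ["diet", "exercise"],
    ["insulin", "diet", "blood sugar"],
]
edges = cooccurrence_edges(posts)
```

Thresholding by weight keeps the network readable, which matters when the same structure also drives the tag-cloud visualisation.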
Application of the derived text mining method within the case study helped identify trending topics in the forum and how they change over time. The derived method performed better in semantic annotation of the text compared to the other systems evaluated.
The new text mining method appears to be generalisable to domains other than diabetes. Future work needs to confirm this and to evaluate the method's applicability to other types of social media text sources
Disambiguation of the English verbs open and send within an object-oriented approach
The subject of this doctoral dissertation is the disambiguation of two English causative verbs, open and send, within a project to create electronic morphological, syntactic and lexical databases for use in building electronic dictionaries of the modifie - modifieur type for general language as well as for specialised languages.
For the disambiguation and analysis of the selected verbs, Wiesław Banyś's object-oriented model was applied; its parameters make it possible to describe each lexical unit precisely, completely and in accordance with the requirements of machine translation.
The key concept of the adopted method of lexicographic description is the object class, which contains elements distinguished on the basis of attributes and operators specific to a given class; these make it possible to reveal the polysemy of predicates and to distinguish their individual uses.
Using the object-oriented model, a set of uses of the analysed verbs is established in a corpus, with reference to traditional dictionaries; the occurrences of uses found are then grouped into sets sharing common syntactic, semantic and lexical features; translations in the target language are assigned to the individual sets of uses; and the conclusions of the analysis are recorded both in descriptive format and as tables.
From the point of view presented in this dissertation, it follows that a given word in the source language has as many meanings as it has translations in the target language
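The core idea of disambiguating a verb by the object class of its argument can be illustrated with a deliberately tiny sketch. The classes, nouns and Polish translations below are invented examples for illustration, not entries from the dissertation's actual databases:

```python
# Hypothetical miniature of an object-oriented description: each use of a
# polysemous verb is keyed by the object class of its argument, and each
# use carries its target-language (here Polish) equivalent.
OBJECT_CLASSES = {
    "door": "concrete_openable",
    "window": "concrete_openable",
    "account": "institutional",
    "meeting": "event",
}

VERB_USES = {
    ("open", "concrete_openable"): "otworzyć",
    ("open", "institutional"): "założyć",
    ("open", "event"): "rozpocząć",
}

def disambiguate(verb, noun):
    """Select a translation for a verb via the object class of its
    argument; returns None when no use matches."""
    cls = OBJECT_CLASSES.get(noun)
    return VERB_USES.get((verb, cls))
```

So "open a door" and "open an account" resolve to different Polish verbs purely from the class of the object noun, which is what makes the description usable for machine translation.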
Interacting with Philosophy Through Natural Language Conversation