58 research outputs found

    Acta Cybernetica : Volume 18. Number 2.


    A Computational Lexicon and Representational Model for Arabic Multiword Expressions

    The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high-precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representation. This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, the study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of the adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for the AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, the thesis presents a representation system for AMWEs that consists of multilayer encoding of extensive linguistic descriptions. The project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research concern the vital role of the AMWE lexicon, as a new lexical resource, in improving various ANLP tasks and the opportunities this lexicon offers linguists for analysing and exploring AMWE phenomena.
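
    The abstract describes a hybrid extraction approach without implementation detail, so the sketch below is only a rough illustration of the data-driven side: ranking adjacent word pairs in a tokenised corpus by pointwise mutual information (PMI), a common association measure in MWE discovery. The tokenisation, threshold, and function names are assumptions for illustration, not the thesis's actual pipeline.

    import math
    from collections import Counter

    def pmi_bigram_candidates(tokens, min_count=3):
        """Rank adjacent word pairs by pointwise mutual information (PMI).

        A generic data-driven step for MWE discovery, shown only as a
        sketch; the thesis combines this kind of statistical evidence
        with knowledge-based methods.
        """
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n_uni = sum(unigrams.values())
        n_bi = sum(bigrams.values())

        scored = []
        for (w1, w2), count in bigrams.items():
            if count < min_count:
                continue  # skip rare pairs, for which PMI is unreliable
            p_pair = count / n_bi
            p_w1 = unigrams[w1] / n_uni
            p_w2 = unigrams[w2] / n_uni
            scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
        return sorted(scored, key=lambda item: item[1], reverse=True)

    # Hypothetical usage on a pre-tokenised Arabic corpus:
    # candidates = pmi_bigram_candidates(corpus_tokens)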

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers presented at the conference, organised in Dubrovnik, Croatia, on 4-6 October 2010.

    Automated Semantic Understanding of Human Emotions in Writing and Speech

    Affective Human Computer Interaction (A-HCI) will be critical for the success of new technologies that will be prevalent in the 21st century. If cell phones and the internet are any indication, there will be continued rapid development of automated assistive systems that help humans to live better, more productive lives. These will not be just passive systems such as cell phones, but active assistive systems like robot aides in use in hospitals, homes, entertainment rooms, offices, and other work environments. Such systems will need to be able to properly deduce human emotional state before they determine how best to interact with people. This dissertation explores and extends the body of knowledge related to Affective HCI. New semantic methodologies are developed and studied for reliable and accurate detection of human emotional states and magnitudes in written and spoken language, and for mapping emotional states and magnitudes to 3-D facial expression outputs. The automatic detection of affect in language is based on natural language processing and machine learning approaches. Two affect corpora were developed to perform this analysis. Emotion classification is performed at the sentence level using a step-wise approach which incorporates sentiment flow and sentiment composition features. For emotion magnitude estimation, a regression model was developed to predict the evolving emotional magnitude of actors. Emotional magnitudes at any point during a story or conversation are determined by 1) the previous emotional state magnitude; 2) new text and speech inputs that might act upon that state; and 3) information about the context the actors are in. Acoustic features are also used to capture additional information from the speech signal. Evaluation of the automatic understanding of affect is performed by testing the model on a test subset of the newly extended corpus. To visualize actor emotions as perceived by the system, a methodology was also developed to map predicted emotion class magnitudes to 3-D facial parameters using vertex-level mesh morphing. The developed sentence-level emotion state detection approach achieved classification accuracies as high as 71% for the neutral vs. emotion classification task in a test corpus of children's stories. After class re-sampling, the results of the step-wise classification methodology on a test subset of a medical drama corpus achieved accuracies in the 56% to 84% range for each emotion class and polarity. For emotion magnitude prediction, the developed recurrent (prior-state feedback) regression model using both text-based and acoustic-based features achieved correlation coefficients in the range of 0.69 to 0.80. This prediction function was modeled using a non-linear approach based on Support Vector Regression (SVR) and performed better than approaches based on Linear Regression or Artificial Neural Networks.
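
    The abstract names a recurrent (prior-state feedback) Support Vector Regression model but gives no implementation details. The sketch below shows one plausible way to wire that idea with scikit-learn: the previous predicted magnitude is appended to the feature vector for the next step. The class name, feature layout, and kernel choice are assumptions, not the dissertation's actual model.

    import numpy as np
    from sklearn.svm import SVR

    class RecurrentMagnitudeSVR:
        """Predict an evolving emotion magnitude, feeding the previous
        prediction back in as a feature (prior-state feedback).

        A minimal sketch of the idea described in the abstract, not the
        dissertation's actual model or feature set.
        """

        def __init__(self, **svr_kwargs):
            self.model = SVR(kernel="rbf", **svr_kwargs)

        def fit(self, features, prev_magnitudes, targets):
            # At training time the true previous magnitude is available.
            X = np.hstack([np.asarray(features),
                           np.asarray(prev_magnitudes).reshape(-1, 1)])
            self.model.fit(X, targets)
            return self

        def predict_sequence(self, feature_sequence, initial_magnitude=0.0):
            # At prediction time, each output becomes the next prior state.
            state = initial_magnitude
            outputs = []
            for feats in feature_sequence:
                x = np.append(np.asarray(feats), state).reshape(1, -1)
                state = float(self.model.predict(x)[0])
                outputs.append(state)
            return outputs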

    Interactive Technologies for the Public Sphere Toward a Theory of Critical Creative Technology

    Digital media cultural practices continue to address the social, cultural and aesthetic contexts of the global information economy, perhaps better called ecology, by inventing new methods and genres that encourage interactive engagement, collaboration, exploration and learning. The theoretical framework for critical creative technology evolved from the confluence of the arts, human-computer interaction, and critical theories of technology. Molding this nascent theoretical framework from these seemingly disparate disciplines was a reflexive process in which the influence of each component on the others spiraled into the theory and practice, as illustrated through the Constructed Narratives project. Research that evolves from an arts perspective encourages experimental processes of making as a method for defining research principles. The traditional reductionist approach to research requires that all confounding variables be eliminated or silenced using statistical methods. However, that noise in the data, those confounding variables, provides the rich context, media, and processes by which creative practices thrive. As research in the arts gains recognition for its contributions of new knowledge, the traditional reductive practice in search of general principles will be respectfully joined by methodologies for defining living principles that celebrate and build from the confounding variables, the data noise. The movement to develop research methodologies from the noisy edges of human interaction has been explored in the research and practices of ludic design and ambiguity (Gaver, 2003); the affective gap (Sengers et al., 2005b; 2006); embodied interaction (Dourish, 2001); the felt life (McCarthy & Wright, 2004); and reflective HCI (Dourish et al., 2004). The theory of critical creative technology examines the relationships between critical theories of technology, society and aesthetics, information technologies, and contemporary practices in interaction design and creative digital media. The theory of critical creative technology is aligned with theories and practices in social navigation (Dourish, 1999) and community-based interactive systems (Stathis, 1999) in the development of smart appliances and network systems that support people in engaging in social activities, promoting communication and enhancing the potential for learning in a community-based environment. The theory of critical creative technology amends these community-based and collaborative design theories by emphasizing methods to facilitate face-to-face dialogical interaction when the exchange of ideas, observations, dreams, concerns, and celebrations may be silenced by societal norms about how to engage others in public spaces. The Constructed Narratives project is an experiment in the design of a critical creative technology that emphasizes the collaborative construction of new knowledge about one's lived world through computer-supported collaborative play (CSCP). To construct is to creatively invent one's world by engaging in creative decision-making, problem solving and acts of negotiation. The metaphor of construction is used to demonstrate how a simple artefact - a building block - can provide an interactive platform to support discourse between collaborating participants.
The technical goal of this project was the development of a software and hardware platform for the design of critical creative technology applications that can process a dynamic flow of logistical and profile data from multiple users, for use in applications that facilitate dialogue between people in a real-time, playful interactive experience.
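
    The abstract describes this platform only at a high level, so the snippet below is a heavily hedged sketch of one possible reading: per-participant profile data and a stream of block-placement events are merged into a shared session state that a dialogue or visualisation layer could consume. All class names and fields are hypothetical illustrations, not the project's actual software.

    from dataclasses import dataclass, field

    @dataclass
    class PlayerProfile:
        """Illustrative per-participant profile data (fields are hypothetical)."""
        player_id: str
        interests: list

    @dataclass
    class BlockEvent:
        """A single block placement reported by the tangible hardware."""
        player_id: str
        block_id: str
        position: tuple

    @dataclass
    class SessionState:
        """Shared state that a visualisation or dialogue layer could consume."""
        profiles: dict = field(default_factory=dict)
        placements: list = field(default_factory=list)

        def register(self, profile: PlayerProfile):
            self.profiles[profile.player_id] = profile

        def handle(self, event: BlockEvent):
            # Pair each placement with the acting player's profile so the
            # application layer can tailor prompts to the collaborators present.
            self.placements.append((event, self.profiles.get(event.player_id)))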

    Exploring a Bioinformatics Clustering Algorithm

    This thesis explores and evaluates MAXCCLUS, a bioinformatics clustering algorithm designed to cluster genes from microarray experimental data. MAXCCLUS clusters genes based on the textual data that describe them. It attempts to create clusters, of which it retains only the statistically significant ones by running a significance test, and then attempts to generalise these clusters using a simple greedy generalisation algorithm. We explore the behaviour of MAXCCLUS by running several clustering experiments that investigate various modifications to MAXCCLUS and its data. The thesis shows (a) that using the simple generalisation algorithm of MAXCCLUS gives better results than using an exhaustive search algorithm for generalisation, (b) that the significance test MAXCCLUS uses needs to be modified to take into account the functional dependency of some genes on other genes, (c) that it is advantageous to delete non-domain-relevant textual data describing the genes but disadvantageous to add more textual data, and (d) that MAXCCLUS behaves poorly when it attempts to cluster genes that have adjacent categories instead of only two distinct categories.
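
    The abstract refers to MAXCCLUS's significance test without spelling it out, so the sketch below only illustrates the general idea of filtering clusters by statistical significance, here with a generic hypergeometric enrichment test over a single annotation term. The test, the data layout, and the threshold are assumptions; the actual MAXCCLUS test may differ.

    from scipy.stats import hypergeom

    def cluster_enrichment_pvalue(cluster_genes, all_genes, term_genes):
        """P-value that `term_genes` is over-represented in `cluster_genes`.

        A generic hypergeometric enrichment test, shown only to illustrate
        the kind of significance filtering the abstract describes.
        """
        M = len(all_genes)                             # population size
        n = len(set(all_genes) & set(term_genes))      # term-annotated genes overall
        N = len(cluster_genes)                         # cluster size
        k = len(set(cluster_genes) & set(term_genes))  # term-annotated in cluster
        # Probability of seeing at least k annotated genes by chance.
        return hypergeom.sf(k - 1, M, n, N)

    def significant_clusters(clusters, all_genes, term_genes, alpha=0.05):
        """Keep only clusters whose enrichment p-value passes the threshold."""
        return [c for c in clusters
                if cluster_enrichment_pvalue(c, all_genes, term_genes) < alpha]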

    A method for ontology and knowledgebase assisted text mining for diabetes discussion forum

    Social media offers researchers a vast amount of unstructured text as a source for discovering hidden knowledge and insights. However, social media poses new challenges to text mining and knowledge discovery due to its short texts, temporal nature and informal language. In order to identify the main requirements for analysing unstructured text in social media, this research takes as a case study a large discussion forum in the diabetes domain. It then reviews and evaluates existing text mining methods against the requirements for analysing such a domain. Using domain background knowledge to bridge the semantic gap in traditional text mining methods was identified as a key requirement for analysing text in discussion forums. Existing ontology engineering methodologies encounter difficulties in deriving suitable domain knowledge with the appropriate breadth and depth of domain-specific concepts and a rich relationship structure. These limitations usually originate from a reliance on human domain experts. This research developed a novel semantic text mining method. It can identify the concepts and topics being discussed and the strength of the relationships between them, and then display the emergent knowledge from a discussion forum. The derived method has a modular design consisting of three main components: the Ontology Building Process, Semantic Annotation and Topic Identification, and Visualisation Tools. The ontology building process generates a domain ontology quickly with little need for domain experts. The topic identification component utilises a hybrid system of domain ontology and a general knowledge base for text enrichment and annotation, while the visualisation methods of dynamic tag clouds and co-occurrence networks for pattern discovery enable a flexible visualisation of the results and can help uncover hidden knowledge. Application of the derived text mining method within the case study helped identify trending topics in the forum and how they change over time. The derived method performed better in semantic annotation of the text than the other systems evaluated. The new text mining method appears to be "generalisable" to domains other than diabetes. Future work needs to confirm this ability and to evaluate the method's applicability to other types of social media text sources.
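
    The abstract mentions co-occurrence networks for pattern discovery without implementation details. As a hedged illustration, the sketch below builds a weighted concept co-occurrence graph from per-post concept annotations with networkx; the input format and the example concepts are assumptions, not the thesis's actual annotation layer.

    from itertools import combinations
    import networkx as nx

    def build_cooccurrence_network(annotated_posts):
        """Build a weighted concept co-occurrence graph.

        `annotated_posts` is assumed to be an iterable of concept sets,
        one per forum post (the real annotation layer is richer).
        """
        graph = nx.Graph()
        for concepts in annotated_posts:
            for a, b in combinations(sorted(set(concepts)), 2):
                if graph.has_edge(a, b):
                    graph[a][b]["weight"] += 1
                else:
                    graph.add_edge(a, b, weight=1)
        return graph

    # Hypothetical usage:
    # posts = [{"insulin", "dosage", "exercise"}, {"insulin", "diet"}]
    # g = build_cooccurrence_network(posts)
    # strongest = sorted(g.edges(data="weight"), key=lambda e: e[2], reverse=True)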

    Disambiguation of the English Verbs open and send within the Object-Oriented Approach

    The subject of this doctoral dissertation is the disambiguation of two English causative verbs, open (otworzyć/otwierać) and send (wysłać/wysyłać), as part of a project to create electronic morphological, syntactic and lexical databases for use in building electronic dictionaries of the modifié - modifieur type for general language as well as for specialised languages. The disambiguation and analysis of the selected verbs apply Wiesław Banyś's object-oriented model, whose parameters make it possible to describe every lexical unit precisely, completely and in accordance with the requirements of machine translation. The key notion of the adopted method of lexicographic description is the object class, which contains elements distinguished on the basis of the attributes and operators specific to a given class; these make it possible to expose the polysemy of predicates and to distinguish their individual uses. Using the object-oriented model, the set of uses of the analysed verbs is established in a corpus, with traditional dictionaries taken into account; the occurrences found are then grouped into sets sharing common syntactic, semantic and lexical features; each set of uses is assigned a translation in the target language; and the conclusions of the analysis are recorded both in descriptive format and in the form of tables. From the point of view presented in this dissertation, a word in the source language has as many meanings as it has translations in the target language.
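
    The object-oriented model ties each use of a polysemous verb to the object class of its arguments. The toy sketch below renders that idea as a lookup from the direct object's class to a target-language equivalent; the object classes shown are hypothetical placeholders, and only the equivalents otworzyć and wysłać are taken from the abstract itself.

    # A toy rendering of object-class-driven disambiguation: the equivalent
    # chosen for a verb use depends on the object class of its argument.
    # Classes and example nouns below are illustrative placeholders only.

    OBJECT_CLASS = {
        "door": "<concrete: openable artefact>",
        "window": "<concrete: openable artefact>",
        "letter": "<message carrier>",
        "parcel": "<message carrier>",
    }

    VERB_USES = {
        # (verb, object class) -> target-language equivalent
        ("open", "<concrete: openable artefact>"): "otworzyć",
        ("send", "<message carrier>"): "wysłać",
    }

    def translate_use(verb, direct_object):
        """Pick an equivalent based on the object class of the direct object."""
        obj_class = OBJECT_CLASS.get(direct_object)
        return VERB_USES.get((verb, obj_class), "<no recorded use>")

    # translate_use("open", "door")   -> "otworzyć"
    # translate_use("send", "letter") -> "wysłać"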

    Interacting with Philosophy Through Natural Language Conversation

    Ph.D. (Doctor of Philosophy)