1,814 research outputs found

    A Synonym Contextual-based Process for Handling Word Similarity in Malay Sentence

    In this paper, we describe a method for finding similar words within a Malay sentence. The list of similar words is produced by searching for the appropriate context within the sentence. The context is determined by looking up rules in a rule-based phrase database. To implement this approach, we describe a working prototype application that can be used as a tool for improving written text in the Malay language, and that is especially well suited to the requirements of teaching and learning the language in primary and secondary schools. The overall concept presented in this paper helps identify the basic components that the process requires and their specifications. It is also important to point out the possible drawbacks and constraints of the suggested practical approach.

    A computational analysis of short sentences based on ensemble similarity model

    The rapid development of the Internet, along with the wide use of social media applications, produces a huge volume of unstructured data in short text form, such as tweets, text snippets and instant messages. This form of data rarely contains repeated words. This presents a challenge for sentence similarity analysis, because standard text similarity models rely merely on word occurrence counts, which often results in unreliable similarity values. In addition, the use of abbreviations, acronyms, slang, smileys, jargon, symbols and non-standard short forms also contributes to the difficulty of similarity analysis. An extended ensemble similarity model is therefore proposed. An experimental study was conducted using datasets of English short sentences. The findings are very encouraging for improving the similarity values of short sentences.
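The abstract above does not say which measures make up the ensemble, so the following is only a minimal sketch of the general idea, assuming two component scores (word-level Jaccard and character-trigram Jaccard) combined by a weighted average. The function names and the default weights are illustrative assumptions, not the authors' model.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (assumed tokenisation)."""
    return set(text.lower().split())


def char_ngrams(text, n=3):
    """Character n-grams over the whitespace-normalised, lowercased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ensemble_similarity(s1, s2, weights=(0.5, 0.5)):
    """Weighted average of a word-level and a character-level score.

    The character-level score still gives partial credit when short texts
    share no whole words, e.g. because of abbreviations or slang.
    """
    word_score = jaccard(tokens(s1), tokens(s2))
    char_score = jaccard(char_ngrams(s1), char_ngrams(s2))
    w_word, w_char = weights
    return w_word * word_score + w_char * char_score
```

For short texts that rarely repeat words, the character-level component is what keeps the score from collapsing to zero when spelling variants are used; other component measures could be swapped in the same way.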

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improvements in performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management

    Recognizing Emotions in a Foreign Language

    Expressions of basic emotions (joy, sadness, anger, fear, disgust) can be recognized pan-culturally from the face and it is assumed that these emotions can be recognized from a speaker's voice, regardless of an individual's culture or linguistic ability. Here, we compared how monolingual speakers of Argentine Spanish recognize basic emotions from pseudo-utterances ("nonsense speech") produced in their native language and in three foreign languages (English, German, Arabic). Results indicated that vocal expressions of basic emotions could be decoded in each language condition at accuracy levels exceeding chance, although Spanish listeners performed significantly better overall in their native language ("in-group advantage"). Our findings argue that the ability to understand vocally-expressed emotions in speech is partly independent of linguistic ability and involves universal principles, although this ability is also shaped by linguistic and cultural variables

    Relationship Analysis of Keyword and Chapter in Malay-Translated Tafseer of Al-Quran

    A number of studies have examined the unseen knowledge categories and relationships among the subject matters discussed in the Al-Quran or the Tafseer. This research investigates the relationships between verses and chapters at the keyword level in a Malay-translated Tafseer. A technique combining text mining and network analysis is developed to discover non-trivial patterns and relationships among verses and chapters in the Tafseer. This is achieved through keyword extraction, keyword-chapter relationship discovery and keyword-chapter network analysis. A total of 130 keywords were extracted from six chapters in the Tafseer. The keywords and their relative importance to a chapter are computed using term weighting. A network analysis map was generated to visualize and analyze the relationship between keyword and chapter in the Tafseer. The relationships between the verses and chapters at the keyword level are successfully portrayed through the combined technique of text mining and network analysis. The novelty of this approach lies in the discovery of relationships between verses and chapters, which is useful for grouping related chapters together.
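The abstract above mentions term weighting and a keyword-chapter network but does not name the weighting scheme. As a rough illustration only, the sketch below assumes plain TF-IDF and represents the network as weighted (keyword, chapter) edges; `keyword_chapter_weights` and `top_keywords` are hypothetical helper names, and the input is assumed to be already tokenised.

```python
import math
from collections import Counter


def keyword_chapter_weights(chapters):
    """Map (keyword, chapter) pairs to a TF-IDF weight.

    `chapters` is a dict of chapter name -> list of tokens (assumed input).
    TF-IDF is an assumption here, not the scheme named in the abstract.
    """
    n_chapters = len(chapters)
    doc_freq = Counter()
    for toks in chapters.values():
        doc_freq.update(set(toks))  # count each term once per chapter

    weights = {}
    for name, toks in chapters.items():
        term_freq = Counter(toks)
        for term, count in term_freq.items():
            idf = math.log(n_chapters / doc_freq[term])
            weights[(term, name)] = (count / len(toks)) * idf
    return weights


def top_keywords(weights, k=2):
    """Keep the k heaviest keyword edges per chapter (ties broken by name)."""
    by_chapter = {}
    for (term, chapter), w in weights.items():
        by_chapter.setdefault(chapter, []).append((w, term))
    return {
        chapter: [t for _, t in sorted(items, key=lambda x: (-x[0], x[1]))[:k]]
        for chapter, items in by_chapter.items()
    }
```

The resulting edge list can be passed to any graph tool for visualisation. A keyword that appears in every chapter gets weight zero under this scheme, so it would not help to distinguish chapters, while chapters that share mid-weight keywords end up linked through them.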

    Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity

    Involving linguistic professionals in resolving the ambiguity of a word within a particular context produces a concise meaning for the words found in a lexical knowledge base collection. Motivated by this issue, we employed a lexical knowledge and machine learning approach that integrates data and/or information from the lexical knowledge base, that is, the Malay collections linked to the ambiguous words. We retained most open-class words and removed the stop words from the targeted sentences. Experiments were conducted with and without lexical knowledge on 50 ambiguous words. The word sense disambiguation (WSD) method is determined by machine learning and corpus-based approaches, namely a Malay-Malay corpus and an English-Malay corpus. The results show that the proposed method improves precision in resolving ambiguity. Keywords: ambiguity; lexical knowledge; machine learning; Malay word

    Investigating spoken emotion: the interplay of language and facial expression

    This thesis investigates how spoken expressions of emotion are influenced by the characteristics of spoken language and by facial emotion expression. The first three chapters examine how the production and perception of emotions differ between Cantonese (a tone language) and English (a non-tone language). The rationale for this contrast was that the acoustic property of fundamental frequency (F0) may be used differently in the production and perception of spoken expressions in tone languages, as F0 may be preserved as a linguistic resource for the production of lexical tones. To test this idea, I first developed the Cantonese Audio-visual Emotional Speech (CAVES) database, which was then used as stimuli in all the studies presented in this thesis (Chapter 1). An emotion perception study was then conducted to examine how three groups of participants (Australian English, Malaysian Malay and Hong Kong Cantonese speakers) identified spoken expressions of emotion that were produced in either English or Cantonese (Chapter 2). As one of the aims of this study was to separate the effects of language from those of culture, these participants were selected on the basis that they shared either a language type (non-tone language: Malay and English) or a culture (collectivist culture: Cantonese and Malay). The results showed greater similarity in emotion perception between those who spoke a similar type of language than between those who shared a similar culture. This suggests that some intergroup differences in emotion perception may be attributable to cross-language differences. Following up on these findings, an acoustic analysis study (Chapter 3) showed that, compared to English spoken expressions of emotion, Cantonese expressions had fewer F0-related cues (median and flatter F0 contour) and that the use of F0 cues was also different.
Taken together, these results show that language characteristics (in F0 usage) interact with the production and perception of spoken expressions of emotion. The expression of disgust was used to investigate how facial expressions of emotion affect speech articulation. The rationale for selecting disgust was that the facial expression of disgust involves changes to the mouth region, such as closure and retraction of the lips, and these changes are likely to have an impact on speech articulation. To test this idea, an automatic lip segmentation and measurement algorithm was developed to quantify the configuration of the lips from images (Chapter 5). A comparison of neutral and disgust expressive speech showed that disgust expressive speech is produced with a significantly smaller vertical mouth opening, a greater horizontal mouth opening and lower first and second formant frequencies (F1 and F2). Overall, this thesis provides insight into how aspects of expressive speech may be shaped by specific (language type) and universal (facial emotion expression) factors.

    Feature extraction using regular expression in detecting proper noun for Malay news articles based on KNN algorithm

    No abstract. Keywords: data mining; named entity recognition; regular expression; natural language processing

    Semantic Types, Lexical Sorts and Classifiers

    We propose a cognitively and linguistically motivated set of sorts for lexical semantics in a compositional setting: the classifiers in languages that do have such pronouns. These sorts are needed to include lexical considerations in a semantical analyser such as Boxer or Grail. Indeed, all proposed lexical extensions of usual Montague semantics to model restriction of selection, felicitous and infelicitous copredication require a rich and refined type system whose base types are the lexical sorts, the basis of the many-sorted logic in which semantical representations of sentences are stated. However, none of those approaches define precisely the actual base types or sorts to be used in the lexicon. In this article, we shall discuss some of the options commonly adopted by researchers in formal lexical semantics, and defend the view that classifiers in the languages which have such pronouns are an appealing solution, both linguistically and cognitively motivated