1,560 research outputs found

    Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study

    The occurrence of code-switching in online communication, when a writer switches among multiple languages, presents a challenge for natural language processing tools, since they are designed for texts written in a single language. To answer the challenge, this paper presents detailed research on ways to detect code-switching in Arabic text automatically. We compare the prediction by partial matching (PPM) compression-based classifier, implemented in Tawa, with a traditional machine learning classifier, sequential minimal optimization (SMO), implemented in the Waikato Environment for Knowledge Analysis, working specifically on Arabic text taken from Facebook. Three experiments were conducted in order to: (1) detect code-switching between the Egyptian dialect and English; (2) detect code-switching among the Egyptian dialect, the Saudi dialect, and English; and (3) detect code-switching among the Egyptian dialect, the Saudi dialect, Modern Standard Arabic (MSA), and English. Our experiments showed that PPM achieved a higher accuracy rate than SMO, with 99.8% versus 97.5% in the first experiment and 97.8% versus 80.7% in the second. In the third experiment, PPM achieved a lower accuracy rate than SMO, with 53.2% versus 60.2%. Code-switching between Egyptian Arabic and English text is easiest to detect because Arabic and English are generally written in different character sets. It is more difficult to distinguish between Arabic dialects and MSA, as these use the same character set, and most users of Arabic, especially Saudis and Egyptians, frequently mix MSA with their dialects. We also note that the MSA corpus used for training the MSA model may not represent MSA Facebook text well, being built from news websites. This paper also describes in detail the new Arabic corpora created for this research and our experiments.
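
    The abstract's remark that Arabic and English are easy to separate because they use different character sets can be illustrated with a minimal sketch. This is not the PPM or SMO classifier from the paper; it is a simple heuristic that labels each token by the Unicode script of its letters. The function names `script_of` and `switch_points` are hypothetical. By construction it cannot tell Arabic dialects from MSA, which matches the difficulty reported for the third experiment.

```python
import unicodedata


def script_of(token: str) -> str:
    """Label a token 'arabic', 'latin', or 'other' by the majority script of its letters."""
    counts = {"arabic": 0, "latin": 0}
    for ch in token:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, emoji
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["arabic"] == 0 and counts["latin"] == 0:
        return "other"
    return "arabic" if counts["arabic"] >= counts["latin"] else "latin"


def switch_points(text: str) -> list:
    """Return indices of whitespace-separated tokens whose script differs
    from that of the previous Arabic or Latin token."""
    points = []
    previous = None
    for index, token in enumerate(text.split()):
        script = script_of(token)
        if script == "other":
            continue  # tokens without Arabic or Latin letters never mark a switch
        if previous is not None and script != previous:
            points.append(index)
        previous = script
    return points
```

    For a mixed sentence, `switch_points` returns the token positions where the script changes, for example at an English word embedded in Arabic text and again at the Arabic word that follows it.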

    CODE-SWITCHING IN SUNNYDAHYE’S INSTAGRAM CAPTIONS

    Instagram has become one of the most widely used social-networking websites. Instagram lets users communicate through pictures and videos, with a caption added to explain the media in words. To express themselves on Instagram, some users combine their first language with English, and thus the phenomenon of code-switching occurs. This paper aims to analyse the types and functions of code-switching used in one Instagram account, sunnydahye. It uses a qualitative approach. The data were selected using a purposive sampling method, by checking sunnydahye’s Instagram posts one by one. From the 6 Instagram posts taken as the sample, 22 sentences were identified as containing code-switching. The analysis shows that the most frequent type of code-switching in sunnydahye’s Instagram captions is intra-sentential switching, while the most frequent function is message qualification.

    High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC’11), San Antonio, Texas, USA, February 2011; (in conjunction with HPCA-17)

    High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach.

    Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

    Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referred to as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e., linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method does not require extensive feature engineering, nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.

    Advancement and applications of the template matching approach to indexing electron backscatter patterns

    Electron backscatter diffraction is a well-established characterisation technique used to determine the orientation and crystal phase of a crystalline material. A pattern is formed by dynamical interaction of electrons with the crystal lattice, which can be understood and simulated using Bloch wave theory. The conventional method of indexing a diffraction pattern is to use a Hough transform to convert the lines of the pattern to points that are easily accessible to a computer. As the bands of the pattern are direct projections of the crystal planes, the interplanar angles can then be computed and compared to a look-up table to determine phase and orientation. This method works well for most examples but is not well suited to more complex unit cells, because it ignores more subtle features of the patterns. This thesis proposes a refined template matching approach which uses efficient pattern matching algorithms, such as those used in the field of computer vision, for phase determination and orientation analysis. This thesis introduces the method and demonstrates its efficacy, as well as introducing advanced methods for pseudosymmetry analysis and phase mapping. A new metric for phase confidence is also proposed and the refined method is shown to be able to correctly determine phases and pseudosymmetric orientations. Finally, preliminary work on a direct electron detector stage is presented. Work on its development, tests of pattern centre reliability and modulation transfer, and an example map are shown.
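
    The general idea of template matching described in this abstract can be sketched as follows. The abstract does not state which similarity measure the thesis uses, so this sketch assumes normalized cross-correlation, a common choice in computer vision. The function names `ncc` and `best_match`, the phase labels, and the flattened-list pattern representation are all hypothetical.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flattened patterns.
    Returns a value in [-1, 1]; 0.0 if either pattern has no variation."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(pattern, templates):
    """Return (label, score) for the template most similar to the pattern.
    `templates` maps a label (e.g. a candidate phase) to a flattened template."""
    scored = ((label, ncc(pattern, template)) for label, template in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data: two 4-pixel "templates" and a noisy measured pattern.
templates = {"phase_a": [0, 1, 0, 1], "phase_b": [1, 1, 0, 0]}
label, score = best_match([0.1, 0.9, 0.2, 1.1], templates)
```

    In practice the templates would be simulated patterns for each candidate phase and orientation, and the measured pattern would be assigned the label of the highest-scoring template.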