1,264 research outputs found

    Clustering Arabic Tweets for Sentiment Analysis

    This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic Twitter tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764, while the second-best purity was 0.719. These results are important because they contrast with findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.
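The two quantities named in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation. `purity` follows the standard definition (the share of items that fall in the majority class of their cluster). `averaged_kl` uses one common formulation of the averaged Kullback-Leibler divergence from the text clustering literature; the abstract does not give the exact formula, so that choice is an assumption, as are the function names and toy inputs.

```python
import math
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


def averaged_kl(p, q):
    """Averaged KL divergence between two non-negative term-weight vectors.

    Assumed formulation: for each term with weights a and b, mix the two
    weights with pi1 = a / (a + b) and pi2 = b / (a + b), then sum the
    weighted divergences of a and b from the mixture m.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a + b == 0:
            continue  # term absent from both vectors contributes nothing
        pi1, pi2 = a / (a + b), b / (a + b)
        m = pi1 * a + pi2 * b
        if a > 0:
            total += pi1 * a * math.log(a / m)
        if b > 0:
            total += pi2 * b * math.log(b / m)
    return total


# Toy example: four tweets in two clusters, with hypothetical gold labels.
score = purity([0, 0, 1, 1], ["pos", "pos", "neg", "pos"])
```

In the toy example, cluster 0 is pure and cluster 1 has one item in its majority class, so three of four items count toward purity. A lower `averaged_kl` value means two vectors are more alike, and identical vectors give zero.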

    Automatic speech feature extraction using a convolutional restricted boltzmann machine

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science 2017. Restricted Boltzmann Machines (RBMs) are a statistical learning concept that can be interpreted as Artificial Neural Networks. They are capable of learning, in an unsupervised fashion, a set of features with which to describe a data set. Connected in series, RBMs form a model called a Deep Belief Network (DBN), learning abstract feature combinations from lower layers. Convolutional RBMs (CRBMs) are a variation on the RBM architecture in which the learned features are kernels that are convolved across spatial portions of the input data to generate feature maps identifying if a feature is detected in a portion of the input data. Features extracted from speech audio data by a trained CRBM have recently been shown to compete with the state of the art for a number of speaker identification tasks. This project implements a similar CRBM architecture in order to verify previous work, as well as gain insight into Digital Signal Processing (DSP), Generative Graphical Models, unsupervised pre-training of Artificial Neural Networks, and Machine Learning classification tasks. The CRBM architecture is trained on the TIMIT speech corpus and the learned features verified by using them to train a linear classifier on tasks such as speaker genetic sex classification and speaker identification. The implementation is quantitatively proven to successfully learn and extract a useful feature representation for the given classification tasks. MT 201
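The idea of a kernel convolved across the input to produce a feature map can be sketched in a few lines. This is a simplified one-dimensional illustration under stated assumptions, not the dissertation's architecture: the function name, the toy signal, and the kernel are hypothetical, and CRBM training and any pooling step are omitted.

```python
import math


def feature_map(signal, kernel, bias=0.0):
    """Slide a kernel over a 1-D signal and return activation probabilities.

    Each output value is sigmoid(convolution response + bias), so it can be
    read as how strongly the feature is detected at that position.
    """
    k = len(kernel)
    flipped = kernel[::-1]  # true convolution flips the kernel
    out = []
    for i in range(len(signal) - k + 1):  # "valid" positions only
        response = sum(signal[i + j] * flipped[j] for j in range(k)) + bias
        out.append(1.0 / (1.0 + math.exp(-response)))
    return out


# Hypothetical example: a rising ramp and a kernel that responds to slope.
activations = feature_map([1, 2, 3, 4, 5], [1, 0, -1])
```

A signal of length n and a kernel of length k give n - k + 1 output positions. In a trained CRBM the kernels and biases would be learned from data rather than set by hand.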

    High resolution DNA copy number analysis of constitutional chromosomal aberrations in human genomic disorders

    About one to three percent of the human population is afflicted by mild to severe mental retardation, often in association with congenital abnormalities (MR/CA). These abnormalities in normal human morphogenesis may express themselves as subtle dysmorphic signs not causing any harm or present as severe disabling and life-threatening malformations such as congenital heart defects. It is well established that constitutional chromosomal aberrations are an important cause of MR/CA. Screening for such chromosomal rearrangements is done by the widely used routine analysis of banded metaphase chromosomes (karyotyping). Given the limited resolution of such analyses (5-10 Mb), it was anticipated that a significant number of submicroscopic deletions or duplications (DNA copy number variations, CNV) were overlooked in patients with idiopathic mental retardation with or without congenital anomalies. This thesis represents one of the first exhaustive studies of this patient group using a new and more sensitive method for detection of CNVs. This technique, termed array comparative genomic hybridization (array CGH), allows genome-wide screening for submicroscopic aberrations in a single experiment. Array CGH uses reporter DNA molecules, more or less evenly spread throughout the entire genome, which are spotted or synthesized in an array on a glass slide. Each reporter is used to interrogate the DNA copy number of a specific genomic region through the competitive hybridization of differentially fluorescently labeled patient and control DNA. Alongside the laborious optimization of the technique, a web-based open-source (MySQL) database platform was developed for the analysis and visualization of large amounts of array CGH data (medgen.ugent.be/arrayCGHbase) (paper 6). 
A total of 140 carefully clinically selected patients with mental retardation and/or congenital abnormalities were analyzed for hidden chromosomal aberrations in a collaborative effort with the Center for Medical Genetics Leuven (KUL). This initial study, together with a review of other published investigations, made it possible for the first time to establish a reliable figure for the number of submicroscopic CNVs in this patient population. When patients with subtelomeric imbalances that could be identified through FISH or MLPA analyses were excluded, array CGH still detected CNVs in an additional ~8% of patients (paper 2). A major challenge resulting from this new flow of information is the search for and description of new microdeletion/microduplication syndromes. Although most CNVs seemed to be scattered across the entire genome, we were able to describe a new microdeletion syndrome characterized by osteopoikilosis, mental retardation and short stature. This observation was facilitated by the identification of LEMD3 as the causal gene for osteopoikilosis, Buschke-Ollendorff syndrome (BOS) and melorheostosis in the 12q14.3 deleted interval and, subsequently, the finding of two additional patients with a 12q14.3 microdeletion (paper 3). The present work also illustrates the possible contribution of array CGH to the delineation of the critical region for recurrent deletion syndromes. In this study we identified a small interstitial deletion on chromosome 18q12.3 in a patient with clinical features of the del(18)(q12.1q21.1) syndrome. We were able to narrow the critical region for this syndrome to an interval of 1.8 Mb, thereby enabling the determination of the crucial genes for this microdeletion syndrome (paper 4). This thesis also further illustrates the power of combined flow cytometry and array CGH for rapid identification of translocation breakpoints. 
Using this approach we were able to identify OPHN1 as the causal gene for the observed mental retardation and overgrowth in a girl with an apparently balanced t(X;9) translocation (paper 5). In conclusion, the presented work clearly illustrates several important applications of array CGH in the field of clinical cytogenetics. The use of this new high-performance methodology will greatly improve the diagnostic yield in patients with unexplained mental retardation, provide more insights into genotype-phenotype correlations and ultimately lead to the identification of the causal genes. Functional studies of these gene products will enhance our understanding of the genetic regulation of normal human morphogenesis, embryogenesis and brain functioning. Finally, it is my belief that implementation of array CGH will represent a major and perhaps last wave of innovation in cytogenetics, as the latter may become largely redundant. Perhaps earlier than we can anticipate, sequencing of the whole genome of a patient may emerge as the method of choice.

    Secure Automatic Speaker Verification Systems

    A growing number of voice-enabled devices and applications treat automatic speaker verification (ASV) as a fundamental component. However, ASV cannot reach its full potential in critical domains, e.g., financial services and health care, unless we overcome the security breaches caused by voice cloning and replayed audio, collectively known as spoofing attacks. Audio spoofing attacks on ASV systems strictly limit the usability of voice-enabled applications, and the counterfeiter also remains untraceable. To overcome these vulnerabilities, a secure ASV (SASV) system is presented in this dissertation. The proposed SASV system is based on novel sign modified acoustic local ternary pattern (sm-ALTP) features and an asymmetric bagging-based classifier ensemble. The proposed audio representation approach clusters the high- and low-frequency components in audio frames by normally distributing frequency components against a convex function. Neighborhood statistics are then applied to capture user-specific vocal tract information. This information is then used by the classifier ensemble, which is based on a weighted normalized voting rule, to detect various spoofing attacks. In contrast to existing ASV systems, the proposed SASV system detects not only conventional spoofing attacks (i.e., voice cloning and replays) but also new attacks that are still unexplored by the research community and will need to be addressed in the future. In this regard, the concept of cloned replays is presented in this dissertation, where replayed audio contains both microphone characteristics and voice cloning artifacts. This depicts the scenario in which voice cloning is applied in real time. 
The voice cloning artifacts suppress the microphone characteristics and thus defeat replay detection modules; similarly, the added microphone characteristics deceive voice cloning detection. Furthermore, the proposed scheme can be used to obtain a possible clue about the counterfeiter through a voice cloning algorithm detection module, which is also a novel concept proposed in this dissertation. This module determines the voice cloning algorithm used to generate the fake audio. Overall, the proposed SASV system simultaneously verifies bona fide speakers and detects the voice cloning attack, the cloning algorithm used to synthesize cloned audio (in the defined settings), and voice-replay attacks on the ASVspoof 2019 dataset. In addition, the proposed method detects voice replay and cloned voice replay attacks on the VSDC dataset. Rigorous experimentation against state-of-the-art approaches also confirms the robustness of the proposed research.

    Automatic Framework to Aid Therapists to Diagnose Children who Stutter


    cii Student Papers - 2021

    In this collection of papers, we, the Research Group Critical Information Infrastructures (cii) from the Karlsruhe Institute of Technology, present nine selected student research articles contributing to the design, development, and evaluation of critical information infrastructures. During our courses, students mostly work in groups and deal with problems and issues related to sociotechnical challenges in the realm of (critical) information systems. Student papers came from four different cii courses, namely Emerging Trends in Digital Health, Emerging Trends in Internet Technologies, Critical Information Infrastructures, and Digital Health in the winter term of 2020 and summer term of 2021

    A Multimodal and Multi-Algorithmic Architecture for Data Fusion in Biometric Systems

    Authentication software based on biometric traits

    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks