198 research outputs found

    Audio-assisted movie dialogue detection

    Get PDF
    An audio-assisted system is investigated that detects if a movie scene is a dialogue or not. The system is based on actor indicator functions. That is, functions which define if an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding the cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptions, radial basis function networks, random trees, and support vector machines for dialogue/non-dialogue detection. To boost classifier efficiency AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE

    ANALYZING CUSTOMER REVIEWS IN TURKISH USING MACHINE LEARNING AND DATA SCIENCE METHODOLOGIES

    Get PDF
    Digital life, especially after the introduction of Web 2.0, has significantly altered human relations, providing all people the “right of public speech”. Ideas, emotions, and opinions on many topics are generously shared in virtual environments. A new age global and digital Mouth of World is shaping the society where knowledge is the most influential power. Being fed by social media data highly dynamic in either amount or shape, automatic handling is indispensable. Natural Language Processing, in cooperation with Machine Language techniques, has an important say in analyzing written textual data. Traditional techniques exploited in the literature are empowered when hybrid ones are applied, in accordance also with the characteristic properties of the language used and the domain-specific data. Although all the subsequent steps of the text classification chain are important, adequate feature selecting has a notable huge impact on accurate classification prediction. In this study, a simple classification of the sentiment polarity of comments in document level of subjective texts in Turkish is done. Different domains include reviews of customers towards company products, movies, and healthcare services, deciding on the positivity or negativity of the comments. Another domain includes doctors’ notes on patients’ symptoms aiming to predict and thus recommend some of the most often used medical tests according to general doctors’ procedures. The features used included a part of or all distinct words roots together with their binary or frequency information. Linear or vector analysis of the feature sets was done employing Machine Learning algorithms provided by the Weka tool. Hybrid features set was proposed and found more efficient combining binary vectors and frequency meta-features from nodes and leaves of J48 tree classifier for all or a set of correlation based selected features, improving both prediction accuracy and classification performance

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    Get PDF
    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages, which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve the performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. Under application aspects, this thesis also includes research work on non-native and Code-Switching speech

    Machine Learning Approaches and Web-Based System to the Application of Disease Modifying Therapy for Sickle Cell

    Get PDF
    Sickle cell disease (SCD) is a common serious genetic disease, which has a severe impact due to red blood cell (RBCs) abnormality. According to the World Health Organisation, 7 million newborn babies each year suffer either from the congenital anomaly or from an inherited disease, primarily from thalassemia and sickle cell disease. In the case of SCD, recent research has shown the beneficial effects of a drug called hydroxyurea/hydroxycarbamide in modifying the disease phenotype. The clinical management of this disease-modifying therapy is difficult and time consuming for clinical staff. This includes finding an optimal classifier that can help to solve the issues with missing values, multi-class datasets, and features selection. For the classification and discriminant analysis of SCD datasets, 7 classifiers based on machine learning models are selected representing linear and non-linear methods. After running these classifiers with a single model, the results revealed that a single classifier has provided us with effective outcomes in terms of the classification performance evaluation metric. In order to produce such an optimal outcome, this research proposed and designed combined classifiers (ensemble classifiers) among the neural network’s models, the random forest classifier, and the K-nearest neighbour classifier. In this aspect, combining the levenberg-marquardt algorithm, the voted perceptron classifier, the radial basis neural classifier, and random forest classifier obtain the highest rate of performance and accuracy. This ensemble classifier receives better results during the training set and testing set process. Recent technology advances based on smart devices have improved the medical facilities and become increasingly popular in association with real-time health monitoring and remote/personal health-care. The web-based system developed under the supervision of the haematology specialist at the Alder Hey Children’s Hospital in order to produce such an effective and useful system for both patients and clinicians. To sum up, the simulation experiment concludes that using machine learning and the web-based system platforms represents an alternative procedure that could assist healthcare professionals, particularly for the specialist nurse and junior doctor to improve the quality of care with sickle cell disorder

    Contributions on distance-based algorithms, multi-classifier construction and pairwise classification

    Get PDF
    179 p.Aurkezten den ikerketa lan honetan saikapen atazak landu dira, non helburua,sailkapen gainbegiratuaren artearen-egoera aberastea izan den. Sailkapengainbegiratuaren zenbait estrategi analizatu dira, beraien ezaugarri etaahuleziak aztertuz. Beraz, ezaugarri positiboak mantenduz, ahuleziak hobetzekosaiakera egin da. Hau burutu ahal izateko, sailkapen gainbegiratuarenzenbait estrategi konbinatzeaz gain, zenbait bilaketa heuristiko ere erabili dira.Sailkapen gainbegiratuko 3 ikerketa lerro desberdinetan burutu dira ekarpenak.Aurkezten diren lehenengo proposamenak, K-NN algoritmoan zentratzendira, honen zenbait bertsio aurkezten direlarik. Ondoren sailkatzaileen konbinaketarekinerlazionatutako beste lan bat aurkezten da. Eta azkenik, binakakosailkapenaren zenbait estrategi berritzaile proposatzen dira. Ekarpenhauek aldizkari edo konferentzi internazionaletan publikatuak edo bidaliakizan dira.Buruturiko experimentuetan, proposatutako algoritmoak artearen-estatuanaurkituriko zenbait algoritmorekin konparatu dira, emaitza interesgarriak lortuaz.Honetaz gain, emaitza hauetatik ondorio esanguratsuak eskuratzeko asmoz,test estatistikoen erabilera ere burutu da

    Classification Techniques Using EHG Signals for Detecting Preterm Births

    Get PDF
    Premature birth is defined as an infant born before 37 weeks of gestation and can be sub-categorized into three phrases; late preterm delivery between 34 and 36 weeks of gestation; moderately preterm between 32 and 34 weeks, and extreme preterm less than 28 weeks of gestation. Globally, the rate of preterm births is increasing, thus resulting in significant health, development and economic problems. The current methods for the detection of preterm birth are inadequate due to the fact that the exact cause of premature uterine contractions leading to delivery is mostly unknown. Another problem is the interpretation of temporal and spectral characteristics of Electromyography (EMG), which is an electrodiagnostic medicine technique for recording and evaluating the electrical activity produced by uterine muscles during pregnancy and parturition – significant variability exists among obstetric care practitioners. Apart from a small number of potential causes for preterm birth, such as medication, uterine over-distension, preterm premature rupture of membranes (PPROM), intrauterine inflammation, precocious foetal endocrine activation, surgery, ethnicity and lifestyle, there is still a large amount of uncertainty about their specific risks. Hence, it is currently very difficult to make reliable predictions about preterm delivery risk. There has also been some evidence that the analysis of uterine electrical signals, collected from the abdominal surface, could provide an independent and easier way to diagnose true labour and detect the onset of preterm delivery. Early detection opens up new avenues for the development of an automated ambulatory system, based on uterine EMG, for patient monitoring during pregnancy. This can be made possible through the use of machine learning. The essence of machine learning is the utilisation of previously recorded data outcomes to train algorithms to ii stimulate software learning elements. Such learned models can, as a result, be used to detect and predict the early signs associated with the onset of preterm birth. Therefore in this thesis, Electrohysterography signals are used to classify uterine activity associated with preterm birth. This is achieved using an open dataset, which contains 262 records for women who delivered at term and 38 who delivered prematurely. Several new features from Electromyography studies are utilized, as well as feature-ranking techniques to determine their discriminative capabilities in detecting term and preterm records. The results illustrate that the combination of the Levenberg-Marquardt trained Feed-Forward Neural Network, Radial Basis Function Neural Network and the Random Neural Network classifiers performed the best, with 91% for sensitivity, 84% for specificity, 94% for the area under the curve and 12% for the mean error rate. Applying advanced machine learning algorithms, in conjunction with innovative signal processing techniques and the analysis of Electrohysterography signals shows significant benefits for use in clinical interventions for preterm birth assessments

    Selected Computing Research Papers Volume 5 June 2016

    Get PDF
    An Analysis of Current Computer Assisted Learning Techniques Aimed at Boosting Pass Rate Level and Interactivity of Students (Gilbert Bosilong) ........................................ 1 Evaluating the Ability of Anti-Malware to Overcome Code Obfuscation (Matthew Carson) .................................................................................................................................. 9 Evaluation of Current Research in Machine Learning Techniques Used in Anomaly-Based Network Intrusion Detection (Masego Chibaya) ..................................................... 15 A Critical Evaluation of Current Research on Techniques Aimed at Improving Search Efficiency over Encrypted Cloud Data (Kgosi Dickson) ........................................ 21 A Critical Analysis and Evaluation of Current Research on Credit Card Fraud Detection Methods (Lebogang Otto Gaboitaolelwe) .......................................................... 29 Evaluation of Research in Automatic Detection of Emotion from Facial Expressions (Olorato D. Gaonewe) ......................................................................................................... 35 A Critical Evaluation on Methods of Increasing the Detection Rate of Anti-Malware Software (Thomas Gordon) ................................................................................................ 43 An Evaluation of the Effectiveness of the Advanced Intrusion Detection Systems Utilizing Optimization on System Security Technologies (Carlos Lee) ............................ 49 An Evaluation of Current Research on Data Mining Techniques in Decision Support (Keamogetse Mojapelo) ...................................................................................................... 57 A Critical Investigation of the Cognitive Appeal and Impact of Video Games on Players (Kealeboga Charlie Mokgalo) ................................................................................ 65 Evaluation of Computing Research Aimed at Improving Virtualization Implementation in the Cloud (Keletso King Mooketsane) ................................................. 73 A Critical Evaluation of the Technology Used In Robotic Assisted Surgeries (Botshelo Keletso Mosekiemang) ....................................................................................... 79 An Evaluation of Current Bio-Metric Fingerprint Liveness Detection (George Phillipson) ........................................................................................................................... 85 A Critical Evaluation of Current Research into Malware Detection Using Neural-Network Classification (Tebogo Duduetsang Ramatebele) ................................................ 91 Evaluating Indirect Detection of Obfuscated Malware (Benjamin Stuart Roberts) ......... 101 Evaluation of Current Security Techniques for Online Banking Transactions (Annah Vickerman) ....................................................................................................................... 10

    Preference Learning for Machine Translation

    Get PDF
    Automatic translation of natural language is still (as of 2017) a long-standing but unmet promise. While advancing at a fast rate, the underlying methods are still far from actually being able to reliably capture syntax or semantics of arbitrary utterances of natural language, way off transporting the encoded meaning into a second language. However, it is possible to build useful translating machines when the target domain is well known and the machine is able to learn and adapt efficiently and promptly from new inputs. This is possible thanks to efficient and effective machine learning methods which can be applied to automatic translation. In this work we present and evaluate methods for three distinct scenarios: a) We develop algorithms that can learn from very large amounts of data by exploiting pairwise preferences defined over competing translations, which can be used to make a machine translation system robust to arbitrary texts from varied sources, but also enable it to learn effectively to adapt to new domains of data; b) We describe a method that is able to efficiently learn external models which adhere to fine-grained preferences that are extracted from a constricted selection of translated material, e.g. for adapting to users or groups of users in a computer-aided translation scenario; c) We develop methods for two machine translation paradigms, neural- and traditional statistical machine translation, to directly adapt to user-defined preferences in an interactive post-editing scenario, learning precisely adapted machine translation systems. In all of these settings, we show that machine translation can be made significantly more useful by careful optimization via preference learning

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields
    corecore