7,808 research outputs found

    Twitter gender classification using user unstructured information

    This paper describes an approach to automatically detect the gender of Twitter users, based only on clues provided by their profile information in an unstructured form. A number of features that capture phenomena specific to Twitter users are proposed and evaluated on a dataset of about 242K English-language users. Different supervised and unsupervised approaches are used to assess the performance of the proposed features, including Naive Bayes variants, Logistic Regression, Support Vector Machines, Fuzzy c-Means clustering, and K-means. An unsupervised approach based on Fuzzy c-Means proved to be very suitable for this task, returning the correct gender for about 96% of the users.
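    The abstract names Fuzzy c-Means but gives no implementation details. The following is a minimal sketch of the standard algorithm only, assuming Euclidean distance, a fuzzifier of m = 2, and fixed initial centers. The toy 2-D points and the function name `fuzzy_c_means` are hypothetical and do not reflect the paper's features or data.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Run standard fuzzy c-means from the given initial centers.

    Returns (centers, memberships), where memberships[j][i] is the degree
    to which point j belongs to cluster i. Each row sums to 1.
    """
    centers = [list(c) for c in centers]
    memberships = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2 / (m - 1)).
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(math.dist(p, c), eps) for c in centers]
            row = [
                1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                for d_i in dists
            ]
            memberships.append(row)
        # Center update: mean of the points weighted by u^m.
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ]
    return centers, memberships


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(points, centers=[(0.0, 0.0), (1.0, 1.0)])

# Harden the soft memberships by picking the cluster with the largest degree.
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

    Because the method is unsupervised, the cluster indices carry no meaning on their own; mapping each cluster to a gender label would need a separate step that the abstract does not describe.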

    A Comparative Study of Machine Learning Models for Tabular Data Through Challenge of Monitoring Parkinson's Disease Progression Using Voice Recordings

    People with Parkinson's disease must be regularly monitored by their physician to observe how the disease is progressing and potentially adjust treatment plans to mitigate the symptoms. Monitoring the progression of the disease through a voice recording captured by the patient at home can make the process faster and less stressful. Using a dataset of voice recordings of 42 people with early-stage Parkinson's disease over a time span of 6 months, we applied multiple machine learning techniques to find a correlation between the voice recording and the patient's motor UPDRS score. We approached this problem using both regression and classification techniques. Much of this paper is dedicated to mapping the voice data to motor UPDRS scores using regression techniques in order to obtain a more precise value for unknown instances. Through this comparative study of various machine learning methods, we found that some older machine learning methods, such as trees, outperform cutting-edge deep learning models on numerous tabular datasets. Comment: Accepted at "HIMS'20 - The 6th Int'l Conf on Health Informatics and Medical Systems"; https://americancse.org/events/csce2020/conferences/hims2

    Automatic classification of speaker characteristics

    A new tool for the evaluation of the rehabilitation outcomes in older persons. a machine learning model to predict functional status 1 year ahead

    Purpose: To date, disability in older people is assessed using a Comprehensive Geriatric Assessment (CGA). However, it is often difficult to understand which areas of the CGA are most predictive of disability. The aim of this study is to evaluate whether the disability level of a patient can be predicted 1 year ahead using machine learning models. Methods: Community-dwelling older people were enrolled in this study. The CGA was performed at baseline and at 1-year follow-up. After collecting the input (independent) variables (i.e., age, gender, schooling, body mass index, information on smoking, polypharmacy, functional status, cognitive performance, depression, nutritional status), we built two distinct Support Vector Machine models (SVMs) able to predict functional status 1 year ahead. To validate the choice of model, the results achieved with the SVMs were compared with the output produced by simple linear regression models. Results: 218 patients (mean age = 78.01; SD = 7.85; male = 39%) were recruited. The combination of the two SVMs achieves a higher prediction accuracy (more than 80% of instances correctly classified vs 67% for the combination of the two linear regression models). Furthermore, the SVMs are able to classify all three categories (self-sufficiency, disability risk, and disability), while the linear regression models separate the population into only two groups (self-sufficiency and disability) without identifying the intermediate category (disability risk), which turns out to be the most critical one. Conclusions: The development of such a model can contribute to the early detection of patients at risk of losing self-sufficiency.

    The comparison study of kernel KC-means and support vector machines for classifying schizophrenia

    Schizophrenia is a mental disorder that affects the mind, feelings, and behavior. Its treatment is usually permanent and quite complicated; therefore, early detection is important. Kernel KC-means and support vector machines are methods known to be good classifiers. This research therefore aims to compare kernel KC-means and support vector machines, using data obtained from Northwestern University consisting of 171 schizophrenia and 221 non-schizophrenia samples. Accuracy, F1-score, and running time were examined using 10-fold cross-validation. In the experiments, kernel KC-means with the sixth-order polynomial kernel gives 87.18 percent accuracy and a 93.15 percent F1-score, with a faster running time than support vector machines. However, with the same kernel, support vector machines provide better performance, with an accuracy of 88.78 percent and an F1-score of 94.05 percent.
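    The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy and F1) can be sketched as follows. This is an illustration under stated assumptions: contiguous, unshuffled folds; predictions pooled across folds before scoring; and a trivial threshold classifier on hypothetical 1-D data standing in for kernel KC-means or a support vector machine. None of the names or data come from the paper.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) with F1 computed for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def cross_validate(xs, ys, fit, predict, k=10):
    """Predict every sample from a model trained without its fold, then score."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        for i in fold:
            preds[i] = predict(model, xs[i])
    return accuracy_and_f1(ys, preds)


# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_threshold(train_x, train_y):
    mean0 = sum(x for x, y in zip(train_x, train_y) if y == 0) / train_y.count(0)
    mean1 = sum(x for x, y in zip(train_x, train_y) if y == 1) / train_y.count(1)
    return (mean0 + mean1) / 2.0


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical toy data: 20 samples, classes alternate and are well separated.
xs = [i * 0.1 if i % 2 == 0 else 5.0 + i * 0.1 for i in range(20)]
ys = [i % 2 for i in range(20)]
accuracy, f1 = cross_validate(xs, ys, fit_threshold, predict_threshold, k=10)
```

    Scoring pooled predictions is one common choice; averaging per-fold scores is another, and the abstract does not say which the authors used.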

    Using unstructured profile information for gender classification of Portuguese and English

    This paper reports experiments on automatically detecting the gender of Twitter users, based on unstructured information found on their Twitter profile. A set of previously proposed features is evaluated on two datasets of English and Portuguese users, and their performance is assessed using several supervised and unsupervised approaches, including Naive Bayes variants, Logistic Regression, Support Vector Machines, Fuzzy c-Means clustering, and k-means. Results show that the features perform well in both languages separately, but even better results were achieved when combining both languages. Supervised approaches reached 97.9% accuracy, but Fuzzy c-Means also proved suitable for this task, achieving 96.4% accuracy.

    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.