
    Machine Learning Methods for Finding Textual Features of Depression from Publications

    Depression is a common but serious mood disorder. In 2015, the WHO reported that about 322 million people were living with some form of depression, the leading cause of ill health and disability worldwide. In the USA, approximately 14.8 million adults (about 6.7% of the population) are affected by major depressive disorder. Most individuals with depression do not receive adequate care because the symptoms are easily neglected and most people are not even aware of their mental health problems. Therefore, a depression prescreening system would be greatly beneficial for people to understand their mental health status at an early stage. Diagnosing depression, however, is extremely challenging because of its complicated and varied symptoms. Fortunately, publications contain rich information about depression symptoms, and text mining methods can discover these symptoms from the literature. In order to extract depression symptoms from publications, machine learning approaches are proposed to overcome four main obstacles: (1) representing publications in a mathematical form; (2) extracting abstracts from publications; (3) removing noisy publications to improve data quality; and (4) extracting textual symptoms from publications. For the first obstacle, we integrate Word2Vec with LDA, either representing publications with document-topic distance distributions or augmenting the word-to-topic and word-to-word vectors. For the second obstacle, we calculate a document vector and its paragraph vectors by aggregating word vectors from Word2Vec; feature vectors are obtained by clustering word vectors, and paragraphs are selected by comparing their distances to the feature vectors with the document vector's distances to the feature vectors. For the third obstacle, a one-class SVM model is trained on the vectorized publications, and outlier publications are excluded by distance measurements. For the fourth obstacle, we evaluate the likelihood of a word being a symptom according to its frequency across all publications and its local relationship with surrounding words in a publication.
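    A minimal sketch of two of these steps, assuming gensim and scikit-learn (this is not the authors' code; the toy corpus, parameters, and the helper doc_vector are invented for illustration): publications are represented by averaged Word2Vec vectors, and a one-class SVM then filters out noisy, off-topic publications.

```python
# Minimal sketch (not the authors' code): averaged Word2Vec document vectors
# plus a one-class SVM to filter noisy/outlier publications; the toy corpus
# below is purely illustrative.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import OneClassSVM

abstracts = [
    ["depression", "symptoms", "include", "insomnia", "and", "fatigue"],
    ["major", "depressive", "disorder", "affects", "mood", "and", "sleep"],
    ["battery", "voltage", "measurements", "for", "electric", "vehicles"],  # off-topic
]

# Learn word embeddings on the (tokenized) publication corpus.
w2v = Word2Vec(abstracts, vector_size=50, min_count=1, seed=0)

def doc_vector(tokens):
    """Represent a publication by the mean of its word vectors."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.vstack([doc_vector(doc) for doc in abstracts])

# The one-class SVM flags publications far from the bulk of the corpus as noise.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.3).fit(X)
keep = ocsvm.predict(X) == 1          # +1 = inlier, -1 = outlier
clean_corpus = [doc for doc, k in zip(abstracts, keep) if k]
print(f"kept {len(clean_corpus)} of {len(abstracts)} publications")
```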

    A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy

    Text classification, the task of assigning metadata to documents, requires significant time and effort when done manually. Since online-generated content is growing explosively, manually annotating large-scale, unstructured data has become a challenge. Recently, various state-of-the-art text mining methods have been applied to the classification process based on keyword extraction. However, when these keywords are used as features in the classification task, the number of feature dimensions is commonly large. In addition, how to select keywords from documents as features is itself a major challenge, and with traditional machine learning algorithms on big data the computation time is very long. Moreover, about 80% of real-world data is unstructured and unlabeled, and conventional supervised feature selection methods cannot be used directly to select entities from such massive data; usually, statistical strategies are used to extract features from unlabeled data for classification tasks according to their importance scores. We propose a novel method to extract key features effectively before feeding them into the classification task. Another challenge in text classification is the multi-label problem, the assignment of multiple non-exclusive labels to documents, which makes text classification more complicated than single-label classification. To address these issues, we develop a framework for extracting data and reducing data dimensionality to solve the multi-label problem on labeled and unlabeled datasets. To reduce data dimensionality, we develop a hybrid feature selection method that extracts meaningful features according to the importance of each feature. Word2Vec is applied to represent each document by a feature vector for document categorization on the big dataset, and the unsupervised approach is used to extract features from real online-generated data for text classification. Our unsupervised feature selection method is applied to extract depression symptoms from social media such as Twitter; in the future, these depression symptoms will be used for depression self-screening and diagnosis.
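    As an illustration of the supervised side of such a pipeline (a sketch only, not the proposed framework; the corpus, labels, and the choice of chi-square scoring are assumptions), the snippet below scores TF-IDF features against each label, keeps the union of the top-scoring columns, and trains a one-vs-rest multi-label classifier with scikit-learn.

```python
# A minimal sketch: per-label chi-square feature selection whose union of top
# features feeds a one-vs-rest multi-label classifier; corpus and labels are toys.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

docs = [
    "feeling sad and hopeless with poor sleep",
    "new gpu architecture speeds up model training",
    "insomnia and loss of appetite reported by patients",
    "distributed systems scale machine learning workloads",
]
labels = [["depression"], ["computing"], ["depression"], ["computing", "ml"]]

Y = MultiLabelBinarizer().fit_transform(labels)   # binary indicator matrix
X = TfidfVectorizer().fit_transform(docs)         # high-dimensional features

# Score features against each label separately, then keep the union of the
# top-scoring columns to shrink the feature space.
selected = set()
for j in range(Y.shape[1]):
    scores, _ = chi2(X, Y[:, j])
    selected.update(np.argsort(np.nan_to_num(scores))[-5:])
X_reduced = X[:, sorted(selected)]

# One binary classifier per non-exclusive label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_reduced, Y)
print(clf.predict(X_reduced))
```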

    A Physiological Signal Processing System for Optimal Engagement and Attention Detection.

    In today's high-paced, high-tech, and high-stress environment, with extended work hours, long to-do lists, and neglected personal health, sleep deprivation has become common in modern culture. Coupled with these factors is the inherently repetitious and tedious nature of certain occupations and daily routines, which all add up to an undesirable fluctuation in individuals' cognitive attention and capacity. In certain critical professions, a momentary or prolonged lapse in attention can be catastrophic and sometimes deadly. This research proposes to develop a real-time monitoring system that uses fundamental physiological signals such as the electrocardiogram (ECG) to analyze and predict the presence or lack of cognitive attention in individuals during task execution. The primary focus of this study is to identify the correlation between fluctuating levels of attention and their implications on the physiological parameters of the body. The system is designed using only those physiological signals that can be collected easily with small, wearable, portable, and non-invasive monitors, and is thereby able to predict, well in advance, an individual's potential loss of attention and onset of sleepiness. Several advanced signal processing techniques have been implemented and investigated to derive multiple hidden and informative features. These features are then applied to machine learning algorithms to produce classification models capable of differentiating between the cases of a person being attentive and not being attentive. Furthermore, electroencephalogram (EEG) signals are also analyzed and classified for use as a benchmark for comparison with the ECG analysis. For the study, ECG and EEG signals of volunteer subjects were acquired in a controlled experiment designed to inculcate and sustain cognitive attention for a period of time, after which an attempt was made to reduce the subjects' cognitive attention. The data acquired during the experiment were decomposed and analyzed for feature extraction and classification. The presented results show that it is possible to detect, with fairly reasonable accuracy, the presence or lack of attention in individuals from their ECG signal alone, especially in comparison with the analysis done on EEG signals. The continuing work of this research includes other physiological signals such as galvanic skin response, heat flux, skin temperature, and video-based facial feature analysis.
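    A rough sketch of this kind of ECG-based pipeline, assuming scipy and scikit-learn (not the study's actual processing; the sampling rate, synthetic signal generator, and feature set are placeholders): R peaks are detected, simple heart-rate-variability features are computed, and an SVM separates attentive from inattentive windows.

```python
# Rough sketch: HRV features from an ECG trace (R-peak detection with scipy)
# fed to an SVM classifier; the synthetic signal and labels are placeholders.
import numpy as np
from scipy.signal import find_peaks
from sklearn.svm import SVC

FS = 250  # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)

def hrv_features(ecg, fs=FS):
    """Detect R peaks and summarize the RR-interval series."""
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=0.5)
    rr = np.diff(peaks) / fs                      # RR intervals in seconds
    if len(rr) < 2:
        return np.zeros(3)
    sdnn = np.std(rr)                             # overall HRV
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))    # short-term HRV
    mean_hr = 60.0 / np.mean(rr)                  # mean heart rate (bpm)
    return np.array([sdnn, rmssd, mean_hr])

def fake_ecg(bpm):
    """Crude placeholder signal: an R-peak train at the given heart rate plus noise."""
    t = np.arange(0, 30, 1 / FS)
    beats = np.zeros_like(t)
    beats[(t * bpm / 60 % 1) < 0.02] = 1.0
    return beats + 0.05 * rng.standard_normal(len(t))

# Windows of ECG labelled attentive (1) / inattentive (0).
X = np.vstack([hrv_features(fake_ecg(bpm)) for bpm in (60, 62, 85, 90)])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict(X))
```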

    Review of feature selection techniques in Parkinson's disease using OCT-imaging data

    Several spectral-domain optical coherence tomography (OCT) studies have reported a decrease in the thickness of the macular region of the retina in Parkinson's disease. Yet the relationship between retinal thinning and visual disability is still unclear. Macular scans acquired from patients with Parkinson's disease (n = 100) and a control group (n = 248) were used to train several supervised classification models. The goal was to determine the most relevant retinal layers and regions for diagnosis, for which univariate and multivariate filter and wrapper feature selection methods were used. In addition, we evaluated how well the patient group could be classified in order to assess the applicability of OCT measurements as a biomarker of the disease.
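    A hedged sketch of the two families of feature selection mentioned above, assuming scikit-learn (not the paper's experiments; the synthetic layer/region thickness data is invented, and only the group sizes mirror the abstract): a univariate ANOVA F-test filter and a recursive-feature-elimination wrapper each pick a small subset of features.

```python
# Hedged sketch: a univariate filter (ANOVA F-test) and a wrapper (recursive
# feature elimination) applied to OCT-style features on synthetic data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_patients, n_controls, n_features = 100, 248, 20   # e.g. layer x region thicknesses
X = rng.normal(size=(n_patients + n_controls, n_features))
y = np.array([1] * n_patients + [0] * n_controls)
X[y == 1, :3] -= 0.5                                 # pretend 3 measures are thinner in PD

# Filter: rank features by class-separation statistics, independent of any model.
filt = SelectKBest(f_classif, k=5).fit(X, y)
print("filter picks:", np.flatnonzero(filt.get_support()))

# Wrapper: let a classifier recursively discard the least useful features.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("wrapper picks:", np.flatnonzero(wrap.support_))
```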

    Machine learning techniques implementation in power optimization, data processing, and bio-medical applications

    The rapid progress and development of machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques are utilized to solve a wide spectrum of problems, extending from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area, consisting of three papers, focuses on developing and applying clustering algorithms to biomedical data. The first paper presents a novel modelling approach for demand-side management of electric water heaters using Q-learning and action-dependent heuristic dynamic programming. The implemented approaches provide an efficient load management mechanism that reduces the overall power cost and smooths the grid load profile. The second paper implements an ensemble statistical and subspace-clustering model for analyzing the heterogeneous data of autism spectrum disorder, using a novel k-dimensional algorithm that handles heterogeneous datasets efficiently. The third paper provides a unified learning model for clustering neuroimaging data to identify potential risk factors for suboptimal brain aging. In the last paper, clustering and clustering validation indices are utilized to identify the groups of compounds responsible for plant uptake and contaminant transport from roots to plants' edible parts.
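    As a minimal illustration of the Q-learning ingredient (a sketch under stated assumptions, not the paper's controller: the temperature discretization, price signal, and reward shaping below are all invented), a tabular agent learns when to switch a water heater on or off against a time-varying price.

```python
# Minimal illustrative sketch: tabular Q-learning for switching an electric
# water heater on/off against a time-varying electricity price.
import numpy as np

rng = np.random.default_rng(0)
n_temp_bins, n_actions = 10, 2            # discretized tank temperature; 0=off, 1=on
Q = np.zeros((n_temp_bins, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1        # learning rate, discount, exploration

def step(temp_bin, action, price):
    """Toy environment: heating raises the temperature bin, idling lowers it."""
    nxt = min(temp_bin + 1, n_temp_bins - 1) if action else max(temp_bin - 1, 0)
    comfort_penalty = 5.0 if nxt < 3 else 0.0        # too cold for the user
    reward = -(price * action) - comfort_penalty     # pay for energy, avoid discomfort
    return nxt, reward

state = 5
for t in range(5000):
    price = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24)   # crude daily price cycle
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    nxt, reward = step(state, action, price)
    # Standard Q-learning update toward the bootstrapped target.
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt

print("greedy policy per temperature bin:", np.argmax(Q, axis=1))
```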

    Imparting Systems Engineering Experience via Interactive Fiction Serious Games

    Serious games for education are becoming increasingly popular. Interactive fiction games are among the most popular in app stores and are also beginning to be heavily used in education to teach analysis and decision-making. Noting that it is difficult for systems engineers to experience all the situations that prepare them for the role of a chief engineer, in this paper we explore the use of interactive fiction serious games to impart systems engineering experience and to teach systems engineering principles. The results of a cognitive viability, qualitative viability, and replayability analysis of 14 systems engineering serious games developed in the interactive fiction genre are presented. The analysis demonstrates that students with a systems engineering background are able to learn the Twine gaming engine and, within a four-week period, create a serious game aligned to the Apply level of Bloom's Taxonomy that conveys a systems engineering experience and teaches a systems engineering principle. These quickly generated games' cognitive, quality, and replayability scores indicate that they provide some opportunity for high-level thinking, are of high quality, and, with above-average replayability, are likely to be played multiple times and/or recommended to others.

    Framework for data quality in knowledge discovery tasks

    The creation and consumption of data continue to grow by leaps and bounds. Due to advances in Information and Communication Technologies (ICT), the data explosion in the digital universe is now a clear trend, and Knowledge Discovery in Databases (KDD) has gained importance because of this abundance of data. A successful knowledge discovery process requires careful data preparation: experts affirm that the preprocessing phase takes 50% to 70% of the total time of a knowledge discovery process. Software tools based on popular knowledge discovery methodologies offer algorithms for data preprocessing. According to the Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx, and H2O.ai are the leading tools for knowledge discovery. These tools provide a variety of techniques and facilitate the evaluation of a dataset; however, they lack a user-oriented process for addressing data quality problems, and they give no guidance as to which techniques can or should be used in which contexts. Consequently, selecting suitable data cleaning techniques is a headache for inexpert users, who have no clear idea which methods can be used with confidence and often resort to trial and error. This thesis presents three contributions to address these problems: (i) a conceptual framework that offers a guided process for addressing data quality issues in knowledge discovery tasks, (ii) a case-based reasoning system that recommends suitable algorithms for data cleaning, and (iii) an ontology that represents knowledge about data quality problems and data cleaning algorithms. This ontology also supports the case-based reasoning system in its case representation and reuse phases.
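    A small sketch of the retrieve-and-reuse idea behind such a case-based reasoning recommender, assuming scikit-learn (this is not the thesis' system; the quality-profile features, case base, and recommended methods below are fictional):

```python
# Small sketch: case-based retrieval that matches a dataset's quality profile
# to stored cases and reuses the cleaning algorithm of the nearest case.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Each case: (missing-value ratio, duplicate ratio, outlier ratio) -> cleaning method.
case_profiles = np.array([
    [0.30, 0.01, 0.02],
    [0.02, 0.25, 0.01],
    [0.03, 0.02, 0.20],
])
case_solutions = ["knn_imputation", "record_deduplication", "iqr_outlier_removal"]

index = NearestNeighbors(n_neighbors=1).fit(case_profiles)

def recommend(profile):
    """Retrieve the most similar past case and reuse its cleaning algorithm."""
    _, idx = index.kneighbors(np.asarray(profile).reshape(1, -1))
    return case_solutions[idx[0][0]]

# New dataset with many missing values -> the imputation case is retrieved.
print(recommend([0.28, 0.03, 0.01]))
```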

    A new approach to securing passwords using a probabilistic neural network based on biometric keystroke dynamics

    Passwords are a common means of identifying an individual user on a computer system. However, they are only as secure as the computer user is vigilant in keeping them confidential. This thesis presents new methods for strengthening password security by employing the biometric feature of keystroke dynamics. Keystroke dynamics refers to the unique rhythm generated when keys are pressed as a person types on a computer keyboard. The aim is to make the positive identification of a computer user more robust by analysing the way in which a password is typed, not just the content of what is typed. Two new methods for implementing a keystroke dynamics system using neural networks are presented. The probabilistic neural network is shown to perform well and to be more suited to the application than the traditional backpropagation method; an improvement of 6% in the false acceptance and false rejection errors is observed, along with a significant decrease in training time. A novel time-sequenced method using a cascade-forward neural network is also demonstrated; this is a totally new approach to keystroke dynamics and is shown to be a very promising method. The problems encountered in the acquisition of keystroke dynamics, which are often ignored in other research in this area, are explored, including timing considerations and keyboard handling. The features inherent in keystroke data are examined, and a statistical technique for dealing with outlier data is implemented.
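    A hedged sketch of a probabilistic neural network over keystroke timing features, written directly in NumPy (not the thesis implementation; the hold times, latencies, and smoothing parameter sigma are invented for illustration): each training sample contributes a Gaussian kernel, kernels are summed per class, and the class with the larger density wins.

```python
# Hedged sketch: a minimal probabilistic neural network (Gaussian Parzen-window
# classifier) over keystroke timing features; all timing values are invented.
import numpy as np

class PNN:
    """One Gaussian kernel per training sample, averaged per class;
    classify by the largest class-conditional density."""
    def __init__(self, sigma=0.05):
        self.sigma = sigma
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self
    def predict(self, X):
        X = np.asarray(X, float)
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)   # squared distances
        k = np.exp(-d2 / (2 * self.sigma ** 2))                    # kernel activations
        scores = np.stack([k[:, self.y == c].mean(1) for c in self.classes], axis=1)
        return self.classes[np.argmax(scores, axis=1)]

# Feature vectors: [hold times..., latencies...] for the same password typed
# by the legitimate user (1) and by impostors (0).
genuine  = [[0.11, 0.09, 0.12, 0.21, 0.18], [0.10, 0.10, 0.13, 0.22, 0.17]]
impostor = [[0.20, 0.15, 0.08, 0.35, 0.30], [0.18, 0.16, 0.07, 0.33, 0.28]]
X = np.array(genuine + impostor)
y = np.array([1, 1, 0, 0])

pnn = PNN(sigma=0.05).fit(X, y)
print(pnn.predict([[0.11, 0.10, 0.12, 0.20, 0.18]]))  # expected: [1]
```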

    Natural Selection For Disease Resistance In Hybrid Poplars Targets Stomatal Patterning Traits And Regulatory Genes.

    The evolution of disease resistance in plants occurs within a framework of interacting phenotypes, balancing natural selection for life-history traits along a continuum of fast-growing and poorly defended, or slow-growing and well-defended, lifestyles. Plant populations connected by gene flow are physiologically limited to evolving along a single axis of the spectrum of the growth-defense trade-off, and strong local selection can purge phenotypic variance from a population or species, making it difficult to detect variation linked to the trade-off. Hybridization between two species that have evolved different growth-defense trade-off optima can reveal trade-offs hidden in either species by introducing phenotypic and genetic variance. Here, I investigated the phenotypic and genetic basis of variation in disease resistance in a set of naturally formed hybrid poplars. The focal species of this dissertation were the balsam poplar (Populus balsamifera), black balsam poplar (P. trichocarpa), narrowleaf cottonwood (P. angustifolia), and eastern cottonwood (P. deltoides). Vegetative cuttings were collected from natural populations and clonally replicated in a common garden, where ecophysiology and stomatal traits, as well as the severity of poplar leaf rust disease (Melampsora medusae), were recorded. To overcome the methodological bottleneck of manually phenotyping stomatal density for thousands of cuticle micrographs, I developed a publicly available tool to automatically identify and count stomata. A deep convolutional neural network was trained on over 4,000 cuticle images covering over 700 plant species; the network had an accuracy of 94.2% when applied to new cuticle images and phenotyped hundreds of micrographs in a matter of minutes. To understand how disease severity, stomata, and ecophysiology traits changed as a result of hybridization, statistical models were fit that included the expected proportion of the genome from either parental species in a hybrid. These models indicated that the ratio of stomata on the upper surface of the leaf to the total number of stomata was strongly linked to disease, was highly heritable, and was sensitive to hybridization. I further investigated the genomic basis of stomata-linked disease variation by performing an association genetic analysis that explicitly incorporated admixture. Positive selection was identified in genes involved in guard cell regulation, negative regulation of the immune system, detoxification, lipid biosynthesis, and cell wall homeostasis. Together, my dissertation incorporated advances in image-based phenotyping with evolutionary theory, directed at understanding how disease frequency changes when hybridization alters the genomes of a population.
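    A minimal sketch of the kind of convolutional classifier such a stomata-counting tool could build on, assuming PyTorch (this is not the published network; the patch size, architecture, and random placeholder data are illustrative only):

```python
# Minimal sketch: a small convolutional network that classifies cuticle image
# patches as stoma / not-stoma, the kind of detector a sliding-window counter
# could build on; random tensors stand in for real micrographs.
import torch
import torch.nn as nn

class StomaPatchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)   # for 64x64 grayscale patches

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = StomaPatchNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch: 8 grayscale 64x64 patches with stoma/no-stoma labels.
patches = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,))

logits = model(patches)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print("training loss on the toy batch:", float(loss))
```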