2,990 research outputs found

    Unconventional machine learning of genome-wide human cancer data

    Recent advances in high-throughput genomic technologies coupled with exponential increases in computer processing and memory have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classification of high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate the feasibility of using annealing-based ML to provide competitive classification of human cancer types and associated molecular subtypes and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences.

    Predicting Graph Categories from Structural Properties

    Complex networks are often categorized according to the underlying phenomena that they represent, such as molecular interactions, re-tweets, and brain activity. In this work, we investigate the problem of predicting the category (domain) of arbitrary networks. This includes complex networks from different domains as well as synthetically generated graphs from five different network models. A classification accuracy of 96.6% is achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, our results indicate that complex networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of a new, previously unseen network. Second, synthetic graphs are trivial to classify, as the classification model can predict with near-certainty the network model used to generate them. Overall, the results demonstrate that networks drawn from different domains (and network models) are trivial to distinguish using only a handful of simple structural properties.

    Computational models and approaches for lung cancer diagnosis

    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.

    Advanced Text Analytics and Machine Learning Approach for Document Classification

    Text classification is used in information extraction and retrieval, and it is considered an important step in managing the vast and growing number of records held in digital form. This thesis addresses the problem of classifying patent documents into fifteen categories or classes, some of which overlap for practical reasons. To develop the classification model using machine learning techniques, useful features were extracted from the given documents. The features are used to classify patent documents as well as to generate useful tag-words. The overall objective of this work is to systematize NASA’s patent management by developing a set of automated tools that can assist NASA in managing and marketing its portfolio of intellectual properties (IP), and to enable easier discovery of relevant IP by users. We have identified an array of methods that can be applied, such as k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this work consist of filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing potential models using effective classifiers. Further, the obstacles associated with the imbalanced data were mitigated by adding synthetic data wherever appropriate, which resulted in a superior SVM classifier-based model.

    Investigation of novel methodologies using reactive power reserves for online voltage stability margin monitoring and control

    As the amount of uncertainty in online power system operations grows, new methodologies need to be devised to monitor and control the power grid in a timely manner. In this work, novel techniques for online voltage stability margin monitoring and control have been developed with a focus on reactive power reserves. The maintenance of adequate reactive power reserves (RPRs) is a critical step in avoiding a voltage collapse. A thorough investigation is performed of different definitions of reactive power reserves and how they are related to voltage stability margin (VSM). Multi-linear regression models are used to relate RPRs and VSM. Several operating conditions and a large number of different network topologies, including NERC category B, C and D outages, are considered as well. A classification tool is then developed to identify which regression model needs to be used based on system conditions and network topology. The approach is tested in the IEEE 30 bus test system and in a reduced case of the eastern power system interconnection of the United States. Results have shown that the approach can monitor voltage stability margin in real time based on the amount of system-wide reactive power reserves. If degenerative system conditions are identified, control actions need to be put in place to increase the amount of RPRs and system VSM. A novel control method is proposed here to identify the location and amount of control necessary to recover RPRs and VSM and to remove existing voltage violations. The approach is based on the identification of a critical set of generators that, if exhausted, will directly contribute to a voltage collapse. Potential control actions are investigated to recover those critical reactive power reserves, namely: active power re-dispatch, capacitor switching, and active and reactive power load shedding. The effectiveness of each control variable on RPRs is calculated using reactive power reserve sensitivities, a concept introduced in this work. Once these sensitivities are calculated, the problem of recovering RPRs and VSM is formulated as a convex quadratic optimization problem with a reduced dimension. Results on the IEEE 30 bus test system and the IEEE 118 bus test system are used to illustrate the efficacy of the approach.

    Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce the study's analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.

    Advances in power quality analysis techniques for electrical machines and drives: a review

    Electric machines are the most widely used elements in industry, and they account for the major share of power consumption in productive processes. Among all electric machines, motors and their drives play a key role since they enable motion in industrial processes; they can be said to be the backbone for moving the rest of the mechanical parts. Hence, their proper operation must be guaranteed in order to raise their efficiency as much as possible and, as a consequence, bring out the economic benefits. This review presents a general overview of the reported works that address efficiency in motors and drives and the power quality of the electric grid. The study discusses how motors and drives induce electric disturbances into the grid, affecting its power quality, and how these power disturbances in the electrical network in turn adversely affect the motors and drives. In addition, the reported techniques that tackle the detection, classification, and mitigation of power quality disturbances are discussed. Several works are also reviewed in order to present a panorama of the evolution and advances in techniques and tendencies in both senses: motors and drives affecting the power source quality, and power quality disturbances affecting the efficiency of motors and drives. A discussion of trends in techniques and future work on power quality analysis from the viewpoint of motor and drive efficiency is provided. Finally, some suggestions are made about alternative methods that could help overcome the gaps detected so far in the reported approaches to the detection, classification and mitigation of power disturbances, with a view toward improving the efficiency of motors and drives. Peer reviewed. Postprint (published version).