
    Knowledge Generation with Rule Induction in Cancer Omics

    The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance for many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as for the classification of clinically relevant sub-groups of patients and the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the usage of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems. Peer reviewed
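To make the idea of a human-readable associative rule concrete, the following is a minimal sketch of scoring one rule by support and confidence. The toy samples, the gene names `GENE_A` and `GENE_B`, and the function `rule_metrics` are hypothetical illustrations and do not come from the review.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def rule_metrics(rows: List[Row], antecedent: Row, consequent: Row) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent."""

    def matches(row: Row, condition: Row) -> bool:
        # A row satisfies a condition when every attribute has the required value.
        return all(row.get(key) == value for key, value in condition.items())

    n_rows = len(rows)
    n_antecedent = sum(1 for r in rows if matches(r, antecedent))
    n_both = sum(1 for r in rows if matches(r, antecedent) and matches(r, consequent))

    # Support: fraction of all rows where the whole rule holds.
    support = n_both / n_rows if n_rows else 0.0
    # Confidence: fraction of antecedent-matching rows where the consequent also holds.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical discretised expression table (assumption, for illustration only).
SAMPLES: List[Row] = [
    {"GENE_A": "high", "GENE_B": "low", "phenotype": "tumor"},
    {"GENE_A": "high", "GENE_B": "high", "phenotype": "tumor"},
    {"GENE_A": "low", "GENE_B": "low", "phenotype": "normal"},
    {"GENE_A": "high", "GENE_B": "low", "phenotype": "normal"},
]

# Rule: IF GENE_A = high THEN phenotype = tumor
SUPPORT, CONFIDENCE = rule_metrics(SAMPLES, {"GENE_A": "high"}, {"phenotype": "tumor"})
```

Real rule induction algorithms search over many candidate rules and prune them with thresholds on measures like these; the sketch only evaluates a single rule that is given in advance.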

    Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

    Sparse representation based classification (SRC) methods have achieved remarkable results. SRC, however, still suffers from requiring enough training samples, insufficient use of test samples, and instability of representation. In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples. An IPR is first proposed and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented by interpreting microarray gene expression data, where a two-stage hybrid gene selection method is introduced to select informative genes. Finally, a functional analysis of candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods. Comment: 14 pages, 19 figures, 10 tables

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures are showing better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning models and ensemble learning, such that the final model has better generalization performance. This paper reviews the state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorised into bagging, boosting and stacking; negative correlation based deep ensemble models; explicit/implicit ensembles; homogeneous/heterogeneous ensembles; decision fusion strategies; and unsupervised, semi-supervised, reinforcement learning, online/incremental, and multilabel based deep ensemble models. Application of deep ensemble models in different domains is also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions.
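As a small illustration of one decision fusion strategy, the sketch below combines the label predictions of several base models by majority vote. The function `majority_vote` and the three prediction lists are hypothetical; the review itself covers many other fusion schemes that are not shown here.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse per-model label predictions into one label per sample.

    predictions[m][i] is the label that model m assigns to sample i.
    Ties go to the label seen first among the tied votes for that sample.
    """
    if not predictions:
        return []
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three base classifiers on four samples.
MODEL_A = [1, 0, 1, 0]
MODEL_B = [1, 1, 0, 1]
MODEL_C = [0, 0, 1, 1]
TRUTH = [1, 0, 1, 1]

ENSEMBLE = majority_vote([MODEL_A, MODEL_B, MODEL_C])
```

In this toy case each base model makes at least one mistake, but the mistakes fall on different samples, so the vote recovers every true label. That is the intuition behind the generalization benefit; it depends on the base models making sufficiently uncorrelated errors.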

    Feature Selection in the Contrastive Analysis Setting

    Contrastive analysis (CA) refers to the exploration of variations uniquely enriched in a target dataset as compared to a corresponding background dataset generated from sources of variation that are irrelevant to a given task. For example, a biomedical data analyst may wish to find a small set of genes to use as a proxy for variations in genomic data only present among patients with a given disease (target) as opposed to healthy control subjects (background). However, the problem of feature selection in the CA setting has so far received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for performing feature selection in the CA setting. We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. We find that our method consistently outperforms previously proposed state-of-the-art supervised and fully unsupervised feature selection methods not designed for the CA setting. An open-source implementation of our method is available at https://github.com/suinleelab/CFS. Comment: NeurIPS 202

    Application of remote sensing to state and regional problems

    The methods and procedures used, accomplishments, current status, and future plans are discussed for each of the following applications of LANDSAT in Mississippi: (1) land use planning in Lowndes County; (2) strip mine inventory and reclamation; (3) white-tailed deer habitat evaluation; (4) remote sensing data analysis support systems; (5) discrimination of unique forest habitats in potential lignite areas; (6) changes in gravel operations; and (7) determining freshwater wetlands for inventory and monitoring. The documentation of all existing software and the integration of the image analysis and data base software into a single package are now considered very high priority items.

    Click Fraud Detection in Online and In-app Advertisements: A Learning Based Approach

    Click Fraud is the fraudulent act of clicking on pay-per-click advertisements to increase a site’s revenue, to drain revenue from the advertiser, or to inflate the popularity of content on social media platforms. In-app advertisements on mobile platforms are among the most common targets for click fraud, which makes companies hesitant to advertise their products. Fraudulent clicks are supposed to be caught by ad providers as part of their service to advertisers, which is commonly done using machine learning methods. However: (1) there is a lack of research in current literature addressing and evaluating the different techniques of click fraud detection and prevention, (2) threat models composed of active learning systems (smart attackers) can mislead the training process of the fraud detection model by polluting the training data, (3) current deep learning models have significant computational overhead, (4) training data is often in an imbalanced state, and balancing it still results in noisy data that can train the classifier incorrectly, and (5) datasets with high dimensionality cause increased computational overhead and decreased classifier correctness -- while existing feature selection techniques address this issue, they have their own performance limitations. By extending the state-of-the-art techniques in the field of machine learning, this dissertation provides the following solutions: (i) To address (1) and (2), we propose a hybrid deep-learning-based model which consists of an artificial neural network, auto-encoder and semi-supervised generative adversarial network. (ii) As a solution for (3), we present Cascaded Forest and Extreme Gradient Boosting with less hyperparameter tuning. (iii) To overcome (4), we propose a row-wise data reduction method, KSMOTE, which filters out noisy data samples both in the raw data and the synthetically generated samples. (iv) For (5), we propose different column-reduction methods such as multi-time-scale Time Series analysis for fraud forecasting, using binary labeled imbalanced datasets and hybrid filter-wrapper feature selection approaches.
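For readers unfamiliar with how synthetic minority samples arise when balancing data, here is a minimal sketch of plain SMOTE-style interpolation. It is not the dissertation's KSMOTE, whose noise-filtering step is not described in this abstract; the function `smote_like`, its parameters, and the toy points are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote_like(minority: Sequence[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic: List[Point] = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        u = rng.random()
        synthetic.append(tuple(b + u * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority-class samples (assumption, for illustration only).
MINORITY: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
NEW_POINTS = smote_like(MINORITY, n_new=5, k=2, seed=42)
```

Because every synthetic point lies on a segment between two real minority samples, noisy or mislabeled minority samples can produce misleading synthetic ones, which is the kind of problem the abstract says a filtering step is meant to address.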