61 research outputs found

    Overlap-based undersampling method for classification of imbalanced medical datasets.

    Early diagnosis of some life-threatening diseases, such as cancers and heart disease, is crucial for effective treatment. Supervised machine learning has proved to be a very useful tool for this purpose. Historical patient data, including clinical and demographic information, is used to train learning algorithms. This builds predictive models that provide initial diagnoses. However, in the medical domain, it is common for the positive class to be under-represented in a dataset. In such a scenario, a typical learning algorithm tends to be biased towards the negative class, which is the majority class, and to misclassify positive cases. This is known as the class imbalance problem. In this paper, a framework for predictive diagnostics of diseases with imbalanced records is presented. To reduce the classification bias, we propose using an overlap-based undersampling method to improve the visibility of minority class samples in the region where the two classes overlap. This is achieved by detecting and removing negative class instances from the overlapping region, which improves class separability in the data space. Experimental results show high accuracy on the positive class, which is highly preferable in the medical domain, while good trade-offs between sensitivity and specificity were obtained. Results also show that the method often outperformed other state-of-the-art and well-established techniques.
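The idea of removing negative instances from the overlapping region can be illustrated with a minimal sketch. This is not the method from the paper, whose detection rule is not given in the abstract; it assumes a simple stand-in rule in which a negative instance counts as "overlapping" when at least one of its `k` nearest neighbours is positive. The function name `overlap_undersample` and its parameters are hypothetical.

```python
import math

def overlap_undersample(X, y, k=3, positive=1):
    """Drop negative (majority) points that sit in the class-overlap region.

    Assumption for illustration: a negative point is "in the overlap" when
    any of its k nearest neighbours belongs to the positive class.
    All positive points are always kept.
    """
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == positive:
            kept_X.append(xi)
            kept_y.append(yi)
            continue
        # Indices of the k nearest other points (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        in_overlap = any(y[j] == positive for j in neighbours)
        if not in_overlap:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y

# Four negatives far from the positives, one negative among them.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (5.2, 5.2)]
y = [0, 0, 0, 0, 1, 1, 0]
X_res, y_res = overlap_undersample(X, y, k=2)
```

In this toy data only the negative point at `(5.2, 5.2)` is removed, so the positive cluster is no longer crowded by a majority-class neighbour.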

    An insight into imbalanced Big Data classification: outcomes and challenges

    Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way adapted to commodity hardware. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The reasons behind this are mainly the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style. This paper is built on three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current state of research in this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we will carry out a discussion on the challenges and future directions for the topic. This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795.
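The remark that lack of data is accentuated by partitioning can be made concrete with a small sketch. It uses plain Python rather than a real MapReduce runtime; the round-robin `partition` helper, the partition count, and the class sizes are all assumptions chosen for illustration.

```python
from collections import Counter
from functools import reduce

def partition(labels, n_parts):
    """Round-robin split, standing in for distributing data across mappers."""
    return [labels[i::n_parts] for i in range(n_parts)]

def map_count(part):
    """Map step: class counts seen by a single mapper."""
    return Counter(part)

def reduce_counts(a, b):
    """Reduce step: merge the class counts of two mappers."""
    return a + b

# Hypothetical dataset: 8 positive and 992 negative labels.
labels = [1] * 8 + [0] * 992
parts = partition(labels, 16)
per_part = [map_count(p) for p in parts]
total = reduce(reduce_counts, per_part, Counter())

# Number of positive examples each mapper actually sees.
minority_per_part = [c[1] for c in per_part]
```

With 16 partitions, half of the mappers in this toy split receive no positive example at all, even though the global counts recovered in the reduce step are unchanged. Any per-partition pre-processing or model has to cope with that.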

    Predicting no-show medical appointments using machine learning

    Health care centers face many issues due to the limited availability of resources, such as funds, equipment, beds, physicians, and nurses. Appointment absences waste hospital resources and endanger patient health. This makes unattended medical appointments both socially expensive and economically costly. This research aimed to build a predictive model to identify whether an appointment would be a no-show or not, in order to reduce its consequences. This paper proposes a multi-stage framework to build an accurate predictor that also tackles the imbalance that the data exhibits. The first stage includes dimensionality reduction to compress the data into its most important components. The second stage deals with the imbalanced nature of the data. Different machine learning algorithms were used to build the classifiers in the third stage. Various evaluation metrics are also discussed, and an evaluation scheme that fits the problem at hand is described. The work presented in this paper will help decision makers at health care centers to implement effective strategies to reduce the number of no-shows.
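The abstract does not say which evaluation scheme the authors settled on. As a hedged illustration of why the choice of metric matters for imbalanced no-show data, the sketch below compares plain accuracy with sensitivity, specificity, and balanced accuracy. The helper names and the toy labels are assumptions, with `1` standing for a no-show.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def rates(y_true, y_pred, positive=1):
    """Accuracy alongside metrics that treat both classes separately."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Toy example: 2 no-shows out of 10 appointments, and a model that
# always predicts "will attend".
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
scores = rates(y_true, y_pred)
```

Here the trivial predictor reaches an accuracy of 0.8 while its sensitivity is 0.0 and its balanced accuracy is 0.5, which is why accuracy alone is a poor fit for this kind of problem.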

    Solid-Phase Peptide Synthesis Using a Four-Dimensional (Safety-Catch) Protecting Group Scheme

    Peptides of importance to both academia and industry are mostly synthesized in the solid-phase mode using a two-dimensional scheme. The so-called Fmoc/tBu strategy, where the groups are removed by piperidine and TFA, respectively, is currently the method of choice for peptide synthesis. However, as the molecular diversity of cyclic and branched peptides becomes a challenging interest, a high level of orthogonal dimensionality is required, such as through triorthogonal protection schemes. Here we present a fourth category of orthogonal protecting groups that are stable under cleavage conditions, including the TFA treatment that removes the tBu-based groups. At the end of the synthetic process and upon some chemical manipulation, the groups in this fourth category were removed with TFA. This new concept of protecting groups could facilitate the synthesis and manipulation of difficult peptides. This work was partially funded by the National Research Foundation (NRF) (Blue Sky’s Research Program no. 120386). We thank Geraldo. A. Acosta, University of Barcelona, for the HRMS and NMR characterization. Peer reviewed.

    Learning from Imbalanced Datasets with Cross-View Cooperation-Based Ensemble Methods

    In this paper, we address the problem of learning from imbalanced multi-class datasets in a supervised setting when multiple descriptions of the data (also called views) are available. Each view incorporates various information on the examples and, depending on the task at hand, each view might be better at recognizing only a subset of the classes. Establishing a sort of cooperation between the views is needed for all the classes to be equally recognized, a crucial problem particularly for imbalanced datasets. The novelty of our work consists in capitalizing on the complementarity of the views so that each class can be processed by the most appropriate view(s), thus improving the per-class performance of the final classifier. The main contributions of this paper are two ensemble learning methods based on recent theoretical works on the use of the confusion matrix's norm as an error measure, while empirical results show the benefits of the proposed approaches.

    A Preprocessing Approach for Class-Imbalanced Data Using SMOTE and Belief Function Theory

    Dealing with imbalanced datasets at the preprocessing level is an efficient strategy used by many methods to re-balance the data and improve classification performance. Specifically, SMOTE is a popular oversampling technique which modifies the training data by adding artificial minority samples. However, SMOTE may create instances in noisy and overlapping areas, far from safe regions. To tackle this issue, we propose SMOTE-BFT, in which we use the belief function theory to remove generated minority instances that are not in safe regions. After applying SMOTE, each generated minority instance is represented by an evidential membership structure, which provides detailed information about class memberships. Rules based on the belief function theory are then enforced to detect and remove generated instances that are in noisy and overlapping regions. Experiments on noisy artificial datasets show that our proposal significantly outperforms other popular oversampling methods.
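The two-step shape of this approach, generate with SMOTE and then filter, can be sketched as follows. The SMOTE step uses the standard interpolation between a minority point and one of its minority neighbours. The filtering step is only a stand-in: it is not the belief-function rule set of SMOTE-BFT, which the abstract does not spell out, but a simple assumed rule that keeps a synthetic point only if its nearest original point is a minority one. All function names are hypothetical.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def filter_unsafe(synthetic, minority, majority):
    """Stand-in filter (NOT the belief-function rules): keep a synthetic
    point only if it is at least as close to a minority point as to any
    majority point."""
    safe = []
    for s in synthetic:
        d_min = min(math.dist(s, m) for m in minority)
        d_maj = min(math.dist(s, m) for m in majority)
        if d_min <= d_maj:
            safe.append(s)
    return safe

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(0.5, 0.05), (5.0, 5.0)]
generated = smote(minority, n_new=10)
kept = filter_unsafe(generated, minority, majority)
```

Synthetic points that land near the majority instance at `(0.5, 0.05)` are discarded, while points generated close to the original minority instances survive the filter.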

    Expression of Human Complement Factor H Prevents Age-Related Macular Degeneration–Like Retina Damage and Kidney Abnormalities in Aged Cfh Knockout Mice

    Complement factor H (CFH) is an important regulatory protein in the alternative pathway of the complement system, and CFH polymorphisms increase the genetic risk of age-related macular degeneration dramatically. These same human CFH variants have also been associated with dense deposit disease. To mechanistically study the function of CFH in the pathogenesis of these diseases, we created transgenic mouse lines using human CFH bacterial artificial chromosomes expressing full-length human CFH variants and crossed these to Cfh knockout (Cfh−/−) mice. Human CFH protein inhibited cleavage of mouse complement component 3 and factor B in plasma and in retinal pigment epithelium/choroid/sclera, establishing that human CFH regulates activation of the mouse alternative pathway. One of the mouse lines, which expresses relatively higher levels of CFH, demonstrated functional and structural protection of the retina against the damage caused by the Cfh deletion. Impaired visual function, detected as a deficit in the scotopic electroretinographic response, was improved in this transgenic mouse line compared with Cfh−/− mice, and transgenics had a thicker outer nuclear layer and less sub–retinal pigment epithelium deposit accumulation. In addition, expression of human CFH also completely protected the mice from developing kidney abnormalities associated with loss of CFH. These humanized CFH mice present a valuable model for study of the molecular mechanisms of age-related macular degeneration and dense deposit disease and for testing therapeutic targets.