
    Action following the discovery of a global association between the whole genome and adverse event risk in a clinical drug-development programme

    Observation of adverse drug reactions during drug development can cause closure of the whole programme. However, if an association between genotype and the risk of an adverse event is discovered, it might suffice to exclude patients of certain genotypes from future recruitment. Various sequential and non-sequential procedures are available to identify an association between the whole genome, or at least a portion of it, and the incidence of adverse events. In this paper we start with a suspected association between genotype and the risk of an adverse event and suppose that the genetic subgroups with elevated risk can be identified. Our focus is on determining whether the patients identified as being at risk should be excluded from further studies of the drug. We propose using a utility function to determine the appropriate action, taking into account the relative costs of suffering an adverse reaction and of failing to alleviate the patient's disease. Two illustrative examples are presented: one compares patients who suffer an adverse event with contemporary patients who do not, and the other makes use of a reference control group. We also illustrate two classification methods, LASSO and CART, for identifying patients at risk, but we stress that any appropriate classification method could be used in conjunction with the proposed utility function. Our emphasis is on determining the action to take rather than on providing definitive evidence of an association.
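The abstract does not specify the form of the utility function, so the following is only an illustrative sketch of how such a decision rule could work. It assumes a per-patient loss in which an adverse reaction costs `cost_adverse`, failing to relieve the disease costs `cost_no_relief`, and an excluded patient receives no relief from the drug. Under these assumptions, including a subgroup has expected loss `p_adverse * cost_adverse + (1 - p_benefit) * cost_no_relief` against `cost_no_relief` for exclusion, so exclusion is preferred exactly when `p_adverse * cost_adverse > p_benefit * cost_no_relief`. The names `Subgroup`, `expected_loss` and `decide`, and all probabilities and costs, are hypothetical; in practice the at-risk subgroups would come from a classifier such as LASSO or CART.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    """A genetic subgroup flagged by some classifier (hypothetical structure)."""
    name: str
    p_adverse: float  # assumed probability of an adverse event if treated
    p_benefit: float  # assumed probability that the drug alleviates the disease


def expected_loss(include: bool, g: Subgroup, cost_adverse: float, cost_no_relief: float) -> float:
    """Expected loss per patient under a simple illustrative utility.

    Excluded patients get no relief from the drug (loss = cost_no_relief).
    Included patients risk an adverse event and may still fail to gain relief.
    """
    if not include:
        return cost_no_relief
    return g.p_adverse * cost_adverse + (1.0 - g.p_benefit) * cost_no_relief


def decide(g: Subgroup, cost_adverse: float, cost_no_relief: float) -> str:
    """Return 'exclude' when inclusion carries the larger expected loss."""
    loss_in = expected_loss(True, g, cost_adverse, cost_no_relief)
    loss_out = expected_loss(False, g, cost_adverse, cost_no_relief)
    return "exclude" if loss_in > loss_out else "include"


if __name__ == "__main__":
    groups = [
        Subgroup("low-risk genotype", p_adverse=0.01, p_benefit=0.6),
        Subgroup("high-risk genotype", p_adverse=0.30, p_benefit=0.6),
    ]
    # Illustrative costs: an adverse reaction judged five times as bad as no relief
    for g in groups:
        print(g.name, decide(g, cost_adverse=5.0, cost_no_relief=1.0))
```

With these illustrative numbers, the low-risk group is included (expected loss 0.45 versus 1.0 for exclusion) and the high-risk group is excluded (1.9 versus 1.0). Changing the cost ratio shifts the threshold on `p_adverse` at which exclusion becomes preferable.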

    A rare event classification in the advanced manufacturing system: focused on imbalanced datasets

    In many industrial applications, classification tasks involve imbalanced class labels in the training data. Imbalanced datasets can severely degrade the accuracy of class predictions, so they need appropriate preprocessing before analysis, since most machine learning techniques assume that the input data are balanced. When the imbalance problem arises in a high-dimensional space, feature extraction can be applied. In Chapter 2, we present two feature extraction techniques for time series data, called CL-LNN and RD-LNN, which combine the nearest neighbor with machine learning algorithms to detect failures of paper manufacturing machinery before they occur, using multi-stream system monitoring data. To address the curse of dimensionality, the nearest neighbor is applied to each feature separately rather than to all 61 features at once. The skewness between class labels can also be addressed by either oversampling the minority class or downsampling the majority class. In Chapter 3, we seek a better way of downsampling by selecting the most informative samples in the given imbalanced dataset through an active learning strategy, to mitigate the effect of imbalanced class labels. Samples are selected for downsampling using a criterion from optimal experimental design, which sequentially minimizes the generalization error of the trained model, with penalized logistic regression as the classification model. We also suggest that performance is significantly improved when hyper-parameter tuning and a cost-weight method are applied to the active downsampling technique, especially for highly imbalanced datasets (e.g., an imbalance ratio greater than ten).
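The abstract does not define CL-LNN or RD-LNN, so the following is only a generic sketch of the stated idea of applying the nearest neighbor to each feature separately rather than to all 61 features jointly. It assumes the monitoring data have been cut into fixed-length windows and that a set of reference windows (for example, from normal operation) is available; for each query window and each feature, it returns the Euclidean distance to the closest reference window in that feature alone. The function name `per_feature_nn_distance` and the `(samples, window, features)` array layout are hypothetical.

```python
import numpy as np


def per_feature_nn_distance(ref: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Nearest-neighbour distance computed separately for each sensor feature.

    ref:   (n_ref, window, n_features)   reference windows, e.g. normal operation
    query: (n_query, window, n_features) windows to transform
    Returns (n_query, n_features): for each query window and feature j, the
    Euclidean distance over the time axis to the closest reference window,
    using feature j only.
    """
    # diff[q, r, t, j] = query[q, t, j] - ref[r, t, j]
    diff = query[:, None, :, :] - ref[None, :, :, :]
    # Distance over the time axis: one value per (query, reference, feature)
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Keep the closest reference window, independently for every feature
    return dist.min(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(50, 10, 61))   # 50 reference windows, 10 time steps, 61 features
    query = rng.normal(size=(5, 10, 61))  # 5 windows to transform
    features = per_feature_nn_distance(ref, query)
    print(features.shape)  # (5, 61)
```

The result has one column per sensor feature and could be passed to any downstream classifier. Because each search runs in the window-length space of a single feature, it avoids nearest-neighbor search in the joint 61-feature space.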
    The research is further extended to cover nonlinearity using nonparametric logistic regression, and performance-based active learning (PBAL) is proposed to improve performance relative to existing criteria such as D-optimality and A-optimality. Includes bibliographical references.
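The following sketch illustrates the general idea of D-optimal active downsampling under L2-penalized logistic regression; it is not the thesis's procedure and not PBAL. It assumes that all minority samples are kept and that majority samples are added one at a time: after each refit, the candidate that most increases the log-determinant of the penalized Fisher information, `M = sum_i p_i (1 - p_i) x_i x_i^T + lam * I`, is selected. By the matrix determinant lemma, adding a row `x` with weight `w` increases `log det M` by `log(1 + w x^T M^{-1} x)`, which is the score computed below. The function names, the Newton solver, the penalty `lam`, and the random starting point are illustrative assumptions; rows of `X_min` and `X_maj` are assumed to include a constant column for the intercept, and `n_keep` must not exceed the number of majority rows.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_ridge_logistic(X, y, lam=1.0, iters=25):
    """L2-penalised logistic regression fitted by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (p - y) + lam * beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(X.shape[1])
        beta = beta - np.linalg.solve(hess, grad)
    return beta


def d_optimal_downsample(X_min, X_maj, n_keep, lam=1.0, seed=0):
    """Return indices of n_keep majority rows chosen greedily by D-optimality.

    All minority rows are always kept; majority rows are added one at a time.
    """
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X_maj)))]  # random starting majority row
    while len(chosen) < n_keep:
        X = np.vstack([X_min, X_maj[chosen]])
        y = np.r_[np.ones(len(X_min)), np.zeros(len(chosen))]
        beta = fit_ridge_logistic(X, y, lam)
        # Penalised Fisher information of the current design
        p = sigmoid(X @ beta)
        M = X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(X.shape[1])
        M_inv = np.linalg.inv(M)
        # Weight each candidate by its Bernoulli variance under the current fit
        p_pool = sigmoid(X_maj @ beta)
        w = p_pool * (1.0 - p_pool)
        # log det(M + w x x^T) - log det(M) = log(1 + w x^T M^{-1} x)
        gain = np.log1p(w * np.einsum("ij,jk,ik->i", X_maj, M_inv, X_maj))
        gain[chosen] = -np.inf  # never select a row twice
        chosen.append(int(np.argmax(gain)))
    return np.array(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 500 majority vs 25 minority rows (imbalance ratio 20); column 0 is the intercept
    X_maj = np.c_[np.ones(500), rng.normal(0.0, 1.0, size=(500, 2))]
    X_min = np.c_[np.ones(25), rng.normal(1.5, 1.0, size=(25, 2))]
    idx = d_optimal_downsample(X_min, X_maj, n_keep=50)
    print(len(idx), len(np.unique(idx)))
```

An A-optimal variant would need only a different `gain` line: by the Sherman-Morrison formula, adding `x` with weight `w` reduces `trace(M^{-1})` by `w * ||M^{-1} x||^2 / (1 + w * x^T M^{-1} x)`.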