12 research outputs found

    Improving Malware Detection Accuracy by Extracting Icon Information

    Detecting PE malware files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose an innovative machine learning approach to extract information from icons. Our proposed approach consists of two steps: 1) extracting icon features using summary statistics, histogram of oriented gradients (HOG), and a convolutional autoencoder, and 2) clustering icons based on the extracted icon features. Using publicly available data and machine learning experiments, we show that our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10% when icon clusters are used in the prediction model. Comment: Full version. IEEE MIPR 201
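The two steps named in the abstract above (feature extraction, then clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: it assumes grayscale icons, replaces full HOG with a single global gradient-orientation histogram, omits the convolutional autoencoder entirely, and uses a plain k-means written inline. The function names `icon_features` and `kmeans`, the striped synthetic icons, and all parameter values are hypothetical.

```python
import numpy as np

def icon_features(icon, n_bins=9):
    """Summary statistics plus a global gradient-orientation histogram
    (a simplified stand-in for HOG) for one grayscale icon."""
    icon = icon.astype(float)
    stats = np.array([icon.mean(), icon.std(), icon.min(), icon.max()])
    gy, gx = np.gradient(icon)                        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned angle in [0, pi]
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, np.pi), weights=magnitude)
    hist = hist / (np.linalg.norm(hist) + 1e-12)      # L2-normalise the histogram
    return np.concatenate([stats, hist])

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                   # leave empty clusters unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic stand-ins for icons: 10 with horizontal stripes, 10 with vertical stripes.
rng = np.random.default_rng(42)
icons = []
for i in range(20):
    icon = np.zeros((32, 32))
    if i < 10:
        icon[::4, :] = 255.0
    else:
        icon[:, ::4] = 255.0
    icons.append(icon + rng.normal(0.0, 10.0, size=icon.shape))

features = np.array([icon_features(ic) for ic in icons])
# Standardise columns so that no single feature dominates the distances.
features_std = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
labels = kmeans(features_std, k=2)
print(labels)
```

In a setup like the one the abstract describes, the resulting cluster label would be passed to the malware classifier as one additional categorical feature alongside the structural PE features.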

    A Flexible Joint Longitudinal-Survival Model for Analysis of End-Stage Renal Disease Data

    We propose a flexible joint longitudinal-survival framework to examine the association between longitudinally collected biomarkers and a time-to-event endpoint. More specifically, we use our method to analyze the survival outcome of end-stage renal disease patients with time-varying serum albumin measurements. Our proposed method is robust to common parametric assumptions in that it avoids explicit distributional assumptions on longitudinal measures and allows for a subject-specific baseline hazard in the survival component. Fully joint estimation is performed to account for the uncertainty in the estimated longitudinal biomarkers included in the survival model.

    A Bayesian Framework for Non-Collapsible Models

    In this paper, we discuss the concept of non-collapsibility and propose a new approach based on Dirichlet process mixtures to estimate the conditional effect of covariates in non-collapsible models. Using synthetic data, we evaluate the performance of our proposed method and examine its sensitivity under different settings. We also apply our method to real data on access failure among hemodialysis patients.
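Non-collapsibility, as discussed in the abstract above, can be seen in a small numeric example. The sketch below is only an illustration under assumed values, not the paper's method: it takes a logistic model with a binary exposure X and a binary covariate Z that is independent of X (so Z is not a confounder), and compares the odds ratio for X conditional on Z with the odds ratio obtained after averaging over Z. The coefficients `b0`, `b1`, `b2` and the probability `p_z` are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def odds(p):
    return p / (1.0 - p)

# Hypothetical model: logit P(Y=1 | X, Z) = b0 + b1*X + b2*Z
b0, b1, b2 = -1.0, 1.0, 2.0
p_z = 0.5  # P(Z=1); Z is independent of X by assumption

# Conditional odds ratio for X: identical within each stratum of Z.
conditional_or = math.exp(b1)

def marginal_risk(x):
    """P(Y=1 | X=x) after averaging over the distribution of Z."""
    return (1.0 - p_z) * sigmoid(b0 + b1 * x) + p_z * sigmoid(b0 + b1 * x + b2)

# Marginal odds ratio for X: computed from the Z-averaged risks.
marginal_or = odds(marginal_risk(1)) / odds(marginal_risk(0))

print(f"conditional OR: {conditional_or:.3f}")
print(f"marginal OR:    {marginal_or:.3f}")
```

With these values the marginal odds ratio is smaller than the conditional one even though there is no confounding, which is why a model that ignores latent subgroups estimates a different quantity from the conditional effect.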

    A New Class of Bayesian Semi-Parametric Joint Longitudinal-Survival Models for Biomarker Discovery

    In studying the progression of a disease and to better predict time to death (survival data), investigators often collect repeated measures over time (longitudinal data) and are interested in testing the association between risk factors, including collected repeated measures, and time to death. One such example is testing the association between time to death and the biomarker serum albumin, which is measured repeatedly on end-stage renal disease (ESRD) patients. A modeling framework that is capable of modeling longitudinal and survival outcomes simultaneously is called a joint longitudinal-survival model. Joint longitudinal-survival models have received a great deal of attention over the past years, and many different joint models have been proposed. Joint models commonly make parametric assumptions on either the functional form of the repeated measures or on the distribution of survival times. In this dissertation we are interested in joint models that are robust to common parametric and semi-parametric survival and longitudinal assumptions. We propose a flexible Bayesian joint longitudinal-survival framework that avoids common parametric and semi-parametric assumptions. More specifically, our modeling framework incorporates a flexible longitudinal component by utilizing a Gaussian process (GP) technique. This technique avoids any explicit functional assumption on the trajectory of the repeated measures. Our modeling framework also uses a Dirichlet process (DP) prior to avoid explicit distributional assumptions on survival times. We further extend our framework to modeling multiple longitudinal processes simultaneously. We propose a multivariate joint longitudinal-survival technique to jointly model the association between multiple longitudinal processes and survival outcomes. Our proposed technique is capable of taking correlation between longitudinal processes into account.
    This is particularly useful when observed measures from different longitudinal processes are taken at different frequencies, that is, when some longitudinal processes are observed less frequently than others. By jointly modeling these processes, one can take the correlation between the processes into account and hence better estimate the trajectories of the processes, including the less frequently observed ones. Our proposed joint modeling frameworks use Dirichlet process techniques. Therefore, understanding parameter estimation in these models is vital. Using synthetic longitudinal and survival data, we compare parameter estimation under Dirichlet process mixture (DPM) models with estimation under commonly used parametric techniques. We are particularly interested in evaluating the performance of the model in parameter estimation when a population consists of subpopulations with latent features that differ across subgroups. We propose a Dirichlet process mixture survival model that is capable of detecting latent subpopulations characterized by differing baseline risks for mortality. Our proposed technique is particularly useful when interest lies in estimation of the conditional effect of covariates as opposed to estimates that are marginalized across all subpopulations. Throughout, our work is motivated by data on patients with end-stage renal disease (ESRD), a condition in which the kidneys are no longer capable of cleaning blood sufficiently to sustain life. In this context, a modeling framework capable of finding mortality-related biomarkers, which are measured longitudinally over time, can significantly help physicians and practitioners to lower mortality among these patients.
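The idea of a Gaussian process longitudinal component, which places no explicit functional form on a biomarker trajectory, can be sketched with standard GP regression in NumPy. This is a generic textbook construction under assumed settings, not the dissertation's joint model: it handles a single subject, uses a squared-exponential kernel with fixed hyperparameters, and ignores the survival component. The measurement times, the albumin-like values, and the names `rbf_kernel` and `gp_posterior` are all hypothetical and synthetic.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two vectors of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(t_obs, y_obs, t_new, noise_sd=0.1, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at the times t_new."""
    K = rbf_kernel(t_obs, t_obs, **kernel_args) + noise_sd**2 * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs, **kernel_args)
    K_ss = rbf_kernel(t_new, t_new, **kernel_args)
    L = np.linalg.cholesky(K)                          # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov

# Synthetic, irregularly spaced measurements for one hypothetical patient.
t_obs = np.array([0.0, 1.5, 2.0, 5.0, 9.0, 11.5])      # months since baseline
y_obs = np.array([3.9, 3.7, 3.8, 3.4, 3.1, 3.2])       # albumin-like values
y_centered = y_obs - y_obs.mean()                      # zero-mean GP prior

t_new = np.linspace(0.0, 12.0, 25)
mean, cov = gp_posterior(t_obs, y_centered, t_new,
                         noise_sd=0.1, length_scale=3.0, variance=0.25)
trajectory = mean + y_obs.mean()                       # back on the original scale
band = 1.96 * np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise 95% band
print(np.round(trajectory, 2))
```

The posterior band widens in the stretches with few measurements, which is the kind of uncertainty a fully joint model would carry into the survival component rather than treating the fitted trajectory as known.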
