
    Improved comprehensibility and reliability of explanations via restricted halfspace discretization

    A number of two-class classification methods first discretize each attribute of two given training sets and then construct a propositional DNF formula that evaluates to True for one of the two discretized training sets and to False for the other one. The formula is not just a classification tool but constitutes a useful explanation for the differences between the two underlying populations if it can be comprehended by humans and is reliable. This paper shows that comprehensibility as well as reliability of the formulas can sometimes be improved using a discretization scheme where linear combinations of a small number of attributes are discretized.
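
    The idea can be illustrated with a toy comparison between single-attribute cuts and cuts on linear combinations of a few attributes (restricted halfspaces). The sketch below is only a minimal illustration of that encoding; the cut points, weights, and the DNF itself are placeholders, not the construction used in the paper.

```python
# Minimal sketch: attribute-wise cuts vs. restricted-halfspace cuts.
# The cut points, weights, and DNF below are illustrative placeholders.
import numpy as np

def single_attribute_cut(X, j, t):
    """Boolean feature from one attribute: True where X[:, j] <= t."""
    return X[:, j] <= t

def halfspace_cut(X, idx, w, t):
    """Boolean feature from a linear combination of a few attributes:
    True where w . X[:, idx] <= t (a restricted halfspace)."""
    return X[:, idx] @ np.asarray(w) <= t

# Two toy training sets (rows = records, columns = attributes).
A = np.array([[1.0, 2.0], [1.5, 2.5]])   # class "A"
B = np.array([[3.0, 0.5], [3.5, 1.0]])   # class "B"
X = np.vstack([A, B])

# Literals produced by the two discretization schemes.
lit1 = single_attribute_cut(X, j=0, t=2.0)                  # x0 <= 2.0
lit2 = halfspace_cut(X, idx=[0, 1], w=[1.0, -1.0], t=0.0)   # x0 - x1 <= 0

# A DNF formula over such literals; here a single conjunction suffices.
dnf = lit1 & lit2
print(dnf)   # expected: True for the A rows, False for the B rows
```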

    Application of an efficient Bayesian discretization method to biomedical data

    Background: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization.
    Results: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD rather than FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust than FI, though not statistically significantly so, and produced slightly more complex discretizations than FI.
    Conclusions: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.
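
    The two components named above, an interval score and a dynamic programming search over cut points, can be sketched generically. The following is a minimal sketch assuming a Dirichlet-multinomial marginal likelihood as the interval score; the exact EBD score and its prior over discretizations from the paper are not reproduced here.

```python
# Hedged sketch of Bayesian-scored discretization searched by dynamic
# programming, in the spirit of EBD.  The interval score below is a generic
# Dirichlet-multinomial marginal likelihood, not the paper's exact score.
import numpy as np
from math import lgamma

def interval_score(class_counts, alpha=1.0):
    """Log marginal likelihood of the class counts in one interval
    under a symmetric Dirichlet(alpha) prior."""
    counts = np.asarray(class_counts, dtype=float)
    k, n = len(counts), counts.sum()
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

def discretize(x, y, n_classes):
    """Return interior interval boundaries (as indices into the sorted
    sample) maximizing the summed interval scores, via dynamic programming."""
    order = np.argsort(x)
    y = np.asarray(y)[order]
    n = len(y)
    best = np.full(n + 1, -np.inf)   # best[i] = best score for first i records
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for i in range(1, n + 1):
        for j in range(i):           # candidate last interval covers records j..i-1
            counts = np.bincount(y[j:i], minlength=n_classes)
            s = best[j] + interval_score(counts)
            if s > best[i]:
                best[i], back[i] = s, j
    # Recover the cut positions by walking the back-pointers.
    cuts, i = [], n
    while i > 0:
        cuts.append(back[i])
        i = back[i]
    return sorted(cuts)[1:]          # drop the leading 0 boundary

# Toy example: class 0 dominates the low values and class 1 the high values.
x = np.array([0.1, 0.2, 0.3, 0.4, 2.0, 2.1, 2.2, 2.3])
y = np.array([0,   0,   0,   1,   1,   1,   1,   1])
print(discretize(x, y, n_classes=2))
```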

    DeepFGSS: Anomalous Pattern Detection using Deep Learning

    University of Minnesota M.S. thesis. May 2019. Major: Computer Science. Advisor: Edward McFowland III. 1 computer file (PDF); vii, 67 pages.
    Anomaly detection refers to finding observations which do not conform to expected behavior. It is widely applied in many domains such as image processing, fraud detection, intrusion detection, healthcare, etc. However, most anomaly detection techniques focus on detecting a single anomalous instance. Such techniques fail when there is only a slight difference between an anomalous instance and a non-anomalous one. Various collective anomaly detection techniques (based on clustering, deep learning, etc.) have been developed that determine whether a group of records forms an anomaly even though its members are only slightly anomalous individually. However, they do not provide any information about the attributes that make the group anomalous. In other words, they are focused only on detecting records that are collectively anomalous and are not able to detect anomalous patterns in general. FGSS is a scalable anomalous pattern detection technique that searches over both records and attributes. However, FGSS has several limitations preventing it from functioning on continuous, unstructured, and high-dimensional data such as images. We propose a general framework called DeepFGSS, which uses an autoencoder, enabling it to operate on any kind of data. We evaluate its performance using four experiments on both structured and unstructured data to determine its accuracy in detecting anomalies and its efficiency in distinguishing between datasets that contain anomalies and those that do not.
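
    One way to picture the approach: an autoencoder supplies per-cell reconstruction errors, these are converted to empirical p-values against a clean reference set, and subsets of records and attributes are scored for an excess of small p-values. The sketch below covers only that scoring step; the autoencoder training and the full FGSS search over subsets are omitted, and the function names are illustrative.

```python
# Hedged sketch of the scoring step suggested by DeepFGSS: per-cell
# reconstruction errors are turned into empirical p-values, and a subset of
# records x attributes is scored with a simple likelihood-ratio statistic.
import numpy as np

def empirical_pvalues(errors, reference_errors):
    """Column-wise p-value of each test-cell error against a clean
    reference sample of errors for the same attribute."""
    ref = np.sort(reference_errors, axis=0)
    n_ref = ref.shape[0]
    # rank of each error among the reference errors of its column
    ranks = np.stack([np.searchsorted(ref[:, j], errors[:, j])
                      for j in range(errors.shape[1])], axis=1)
    return (n_ref - ranks + 1) / (n_ref + 1)

def subset_score(pvals, records, attributes, alpha=0.05):
    """Berk-Jones-style score: how surprising is the number of small
    p-values inside the chosen record/attribute subset?"""
    block = pvals[np.ix_(records, attributes)]
    n = block.size
    n_small = (block <= alpha).sum()
    if n_small <= alpha * n:
        return 0.0
    p_hat = n_small / n
    kl = p_hat * np.log(p_hat / alpha)
    if p_hat < 1.0:
        kl += (1 - p_hat) * np.log((1 - p_hat) / (1 - alpha))
    return n * kl

# Toy usage with random "reconstruction errors" standing in for an autoencoder.
rng = np.random.default_rng(0)
reference = rng.exponential(size=(200, 5))      # errors on clean data
test = rng.exponential(size=(20, 5))
test[:4, :2] += 3.0                             # inject an anomalous block
p = empirical_pvalues(test, reference)
# the injected block is expected to score higher than a background block
print(subset_score(p, records=range(4), attributes=range(2)))
print(subset_score(p, records=range(4, 8), attributes=range(2, 4)))
```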

    Facilitating and Enhancing Biomedical Knowledge Translation: An in Silico Approach to Patient-centered Pharmacogenomic Outcomes Research

    Current research paradigms such as traditional randomized controlled trials mostly rely on relatively narrow efficacy data, which results in high internal validity but low external validity. Given this and the need to address many complex real-world healthcare questions in short periods of time, alternative research designs and approaches should be considered in translational research. In silico modeling studies, along with longitudinal observational studies, are considered appropriate and feasible means of addressing the slow pace of translational research. There is therefore a need for an approach that tests newly discovered genetic tests via an in silico enhanced translational research model (iS-TR) to conduct patient-centered outcomes research and comparative effectiveness research (PCOR CER) studies. In this dissertation, it was hypothesized that retrospective analysis of electronic medical records (EMRs), followed by mathematical modeling and simulation-based prediction, could facilitate and accelerate the generation and translation of pharmacogenomic knowledge on the comparative effectiveness of anticoagulation treatment plans tailored to well-defined target populations, eventually decreasing overall adverse risk and improving individual and population outcomes. To test this hypothesis, a simulation modeling framework (iS-TR) was proposed that takes advantage of longitudinal EMRs to provide an effective approach to translating pharmacogenomic anticoagulation knowledge and conducting PCOR CER studies. The accuracy of the model was demonstrated by reproducing the outcomes of two major randomized clinical trials for individualizing warfarin dosing. A substantial hospital healthcare use case demonstrating the value of iS-TR in addressing real-world anticoagulation PCOR CER challenges was also presented.

    Bayesian networks in additive manufacturing and reliability engineering

    A Bayesian network (BN) is a powerful tool for representing the quantitative and qualitative features of a system in an intuitive yet sophisticated manner. The qualitative aspect is represented with a directed acyclic graph (DAG) depicting the dependency relations between the random variables of the system: the variables are shown as a set of nodes and the dependencies between them as directed edges. Under certain circumstances, the DAG of a Bayesian network can be a causal graph. The quantitative aspect is the set of local conditional probabilities associated with each variable, which together form a factorization of the joint probability distribution of the system's variables based on the dependency relations represented in the DAG. In this study, the benefits of using BNs in reliability engineering and additive manufacturing are investigated.
    In the case of reliability engineering, there are several methods for creating predictive models of the reliability features of a system, and predicting the possibility and timing of a failure is one of the important tasks in the field. The quality of the corrective maintenance after each failure affects consecutive failure times. If maintenance after each failure involves replacing all the components of a piece of equipment (perfect maintenance), the equipment is considered restored to an "as good as new" (AGAN) condition, and on that basis consecutive failure times are treated as independent. However, not only is maintenance imperfect in most cases, but the environment of the equipment and its usage patterns also have a significant effect on consecutive failure times. In this study, this effect is investigated by using Bayesian network structural learning algorithms to learn a BN from the failure data of an industrial water pump.
    In the additive manufacturing (AM) field, manufacturing systems are normally a complex combination of multiple components. This complexity and the associated uncertainties in design and manufacturing parameters promote the need for models that can handle uncertainty and are computationally efficient. Moreover, practitioners' lack of AM knowledge is one of the main obstacles to democratizing the technology. In this study, a method is developed for creating Bayesian network models of AM systems that incorporate expert and domain knowledge. To form the structure of the model, a causal graph obtained through the dimensional analysis conceptual modeling (DACM) framework is used, after some modifications, as the DAG of a Bayesian network; DACM is a framework for extracting the causal graph and the governing equations between the variables of a complex system. The experts' knowledge is elicited through a probability assessment process, the analytic hierarchy process (AHP), and encoded into the local probability tables associated with the independent variables of the model. To complete the model, a sampling technique is used along with the governing equations between the intermediate and output variables to obtain the remaining probability tables. Such models can be used in many ways, namely domain knowledge representation, defect prognosis and diagnosis, and design space exploration.
    The qualitative aspect of the model is obtained from the physical phenomena in the system and the quantitative aspect from the experts' knowledge, so the model can interactively represent both the domain and the experts' knowledge. In prognosis tasks, the probability distribution over the values that an output variable can take is calculated from the values chosen for the input variables. In diagnosis tasks, the designer can investigate which inputs explain a specific value of an output variable. Finally, the model reduces the design space to a discretized, interactive Bayesian network space that is very convenient for design space exploration.
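
    A toy example of the factorization and the prognosis/diagnosis queries described above, using a made-up three-node network (this is not the AM or reliability model of the study; the variables and probabilities are purely illustrative):

```python
# Toy BN: process parameter P -> defect D <- material M, with made-up CPTs,
# plus inference by brute-force enumeration over the joint factorization.
from itertools import product

# Each variable is binary (0/1).  CPTs are dicts keyed by parent values.
p_P = {(): [0.7, 0.3]}                     # P(P)
p_M = {(): [0.6, 0.4]}                     # P(M)
p_D = {(0, 0): [0.95, 0.05],               # P(D | P, M)
       (0, 1): [0.80, 0.20],
       (1, 0): [0.70, 0.30],
       (1, 1): [0.40, 0.60]}

def joint(p, m, d):
    """Joint probability via the DAG factorization P(P) P(M) P(D | P, M)."""
    return p_P[()][p] * p_M[()][m] * p_D[(p, m)][d]

def query(target, evidence):
    """P(target variable | evidence), by enumerating the joint distribution."""
    dist = [0.0, 0.0]
    for p, m, d in product([0, 1], repeat=3):
        assign = {"P": p, "M": m, "D": d}
        if all(assign[k] == v for k, v in evidence.items()):
            dist[assign[target]] += joint(p, m, d)
    total = sum(dist)
    return [x / total for x in dist]

# Prognosis: probability of a defect given chosen input settings.
print(query("D", {"P": 1, "M": 1}))        # approx. [0.4, 0.6]
# Diagnosis: which input setting best explains an observed defect.
print(query("P", {"D": 1}))
```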

    Probabilistic Modeling of Process Systems with Application to Risk Assessment and Fault Detection

    Three new methods of joint probability estimation (modeling) were developed: a maximum-likelihood maximum-entropy method, a constrained maximum-entropy method, and a copula-based method called the rolling pin (RP) method. Compared to many existing probabilistic modeling methods such as Bayesian networks and copulas, the developed methods yield models with better flexibility, interpretability, and computational tractability. These methods can readily be used to model process systems and perform risk analysis and fault detection at steady-state conditions, and can be coupled with appropriate mathematical tools to develop dynamic probabilistic models. Also, a method of performing probabilistic inference using RP-estimated joint probability distributions was introduced; this method is superior to Bayesian networks in several respects. The RP method was also applied successfully to identify regression models that have a high level of flexibility and are appealing in terms of computational cost.
    Ph.D., Chemical Engineering -- Drexel University, 201
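
    As a generic illustration of one idea named above, the sketch below fits a constrained maximum-entropy distribution on a discrete grid subject to moment constraints by minimizing the convex dual; the specific maximum-likelihood maximum-entropy and rolling pin formulations of the thesis are not reproduced.

```python
# Hedged sketch: maximum-entropy distribution on a grid subject to given
# first and second moments, found by minimizing the convex dual objective.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

x = np.linspace(0.0, 10.0, 101)              # discrete support of a process variable
features = np.vstack([x, x ** 2])            # constraint functions f1(x) = x, f2(x) = x^2
targets = np.array([4.0, 20.0])              # required E[x] and E[x^2]

def dual(lam):
    """Convex dual of the max-entropy problem: log Z(lam) - lam . targets."""
    return logsumexp(lam @ features) - lam @ targets

res = minimize(dual, x0=np.zeros(2), method="BFGS")
logits = res.x @ features
p = np.exp(logits - logsumexp(logits))       # max-entropy distribution on the grid

print(p @ x, p @ x ** 2)                     # should be close to 4.0 and 20.0
```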