
    Criteria of efficiency for conformal prediction

    We study optimal conformity measures for various criteria of efficiency of classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in the literature on conformal prediction are not probabilistic unless the classification problem is binary. We consider both unconditional and label-conditional conformal prediction.
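    The mechanics the abstract refers to can be sketched in a few lines. This is a minimal, illustrative label-conditional conformal classifier: for each candidate label, a p-value is computed from calibration nonconformity scores, and the prediction set keeps the labels whose p-value exceeds the significance level. The scores and labels below are toy values, not the conformity measures the paper actually studies.

```python
# Minimal sketch of label-conditional conformal classification.
# The nonconformity scores here are illustrative placeholders; the paper's
# question of WHICH conformity measure is optimal is not settled by this toy.

def p_value(cal_scores, test_score):
    """Smoothed-free p-value: fraction of calibration nonconformity
    scores at least as large as the test score (with the +1 correction)."""
    n = len(cal_scores)
    return (sum(1 for s in cal_scores if s >= test_score) + 1) / (n + 1)

def prediction_set(cal_scores_by_label, test_scores_by_label, eps):
    """Keep every candidate label whose p-value exceeds the level eps."""
    return {y for y, s in test_scores_by_label.items()
            if p_value(cal_scores_by_label[y], s) > eps}

# Toy per-label calibration scores and test-object scores:
cal = {"a": [0.1, 0.2, 0.3, 0.9], "b": [0.5, 0.6, 0.7, 0.8]}
test = {"a": 0.25, "b": 0.95}
print(prediction_set(cal, test, eps=0.2))  # → {'a'}
```

    Criteria of efficiency such as the ones the paper analyses then measure, e.g., the average size of these prediction sets.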

    Building predictive unbound brain-to-plasma concentration ratio (Kp,uu,brain) models

    Abstract The blood-brain barrier (BBB) constitutes a dynamic membrane that primarily evolved to protect the brain from exposure to harmful xenobiotics. The distribution of synthesized drugs across the BBB is a vital parameter to consider in drug discovery projects involving a central nervous system (CNS) target, since the molecules must be capable of crossing this major hurdle. In contrast, peripherally acting drugs have to be designed to minimize brain exposure, which could otherwise result in undue side effects. It is thus important to establish the BBB permeability of molecules early in the drug discovery pipeline. Previously, most in silico attempts to predict brain exposure have relied on the total drug distribution between the blood plasma and the brain. However, it is now understood that the unbound brain-to-plasma concentration ratio (Kp,uu,brain) is the parameter that precisely indicates the BBB availability of compounds. Kp,uu,brain describes the free (unbound) concentration of the drug in the brain, which, according to the free drug hypothesis, is what drives the pharmacological response at the target site. The current work revisits a model built in 2011 and deployed on an in-house server, checking its performance on the data collected since then. The results were satisfying and demonstrated the stability of the model. The old dataset was then extended with this temporal dataset in order to update the model; maintaining an up-to-date chemical space is important to ensure good predictivity for unknown data. Using methods and descriptors not used in the previous study, a further improvement in model performance was achieved.
Attempts were also made to interpret the model by identifying its most influential descriptors. Popular science summary: Predictive model for unbound brain-to-plasma concentration ratio The blood-brain barrier (BBB) is a dynamic interface that evolved to protect the brain from exposure to toxic xenobiotics and to maintain homeostasis. The distribution of drugs across the BBB is critical for any drug discovery project. A drug designed for a target in the brain has to pass through the BBB in sufficient concentration to elicit the desired therapeutic effect. On the other hand, a drug designed for a non-CNS target should be kept away from the brain to avoid fatal side effects. The unbound brain-to-plasma concentration ratio, Kp,uu,brain, is a parameter that describes the distribution of a molecule across the BBB. It represents the free drug concentration in the brain, which is the fraction that elicits the pharmacological effect on the CNS. The experimental measurement of this parameter is time-consuming and laborious. Computational prediction of such properties thus proves to be of great utility, reducing the time and resources spent by aiding the early elimination of compounds with undesirable qualities. This helps reduce late-stage compound attrition (failure rate), which has always been a major problem for the pharmaceutical industry. Quantitative Structure-Activity Relationship (QSAR) modelling is an approach that attempts to establish a meaningful relationship between the chemical structure of a molecule and its chemical or biological activity. Once established, this relationship can be used to predict the activity of a new compound based on its chemical structure. In a typical QSAR experiment, the chemical structures are represented by numerical values called molecular descriptors. The thesis work utilized machine learning algorithms (Support Vector Machine and Random Forest) to define the structure-activity relationship.
A predictive model for estimating the unbound brain-to-plasma concentration ratio (Kp,uu,brain) was developed based on a training set of in-house compounds and was mounted in an in-house program (C-lab) in 2011 for routine use. The thesis project involved validating the existing model and updating it by extending the dataset with the data collected since 2011. Different combinations of machine learning algorithms, modeling approaches, and molecular descriptors (calculated numerical values representing chemical structures) were used to build the models. Further, by combining the predictions from these models, consensus models were built and validated. Two-class classification models were also evaluated, categorizing compounds as BBB positive (crosses the BBB) or negative (does not cross the BBB). The validation of the old model using a temporal test set (Kp,uu,brain data collected since 2011) gave a promising result, showing stability and good predictive power. However, it is very important to keep the chemical space updated, which motivated updating the model. The new model (a consensus model with five components) shows a significant improvement in predictive power along with improved classification performance. This model will be uploaded to C-lab and will be accessible for use within AstraZeneca. Advisors: Hongming Chen, Ola Engkvist (Computational Chemistry, AstraZeneca R&D Mölndal). Master's Degree Project, 60 credits, in Bioinformatics (2014), Department of Biology, Lund University
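    The two modelling steps described above, consensus prediction followed by a two-class BBB call, can be sketched as follows. The component models, descriptor vector, and the classification cutoff are all illustrative assumptions for this sketch; they are not the thesis's actual five components or decision threshold.

```python
# Hedged sketch: consensus prediction of Kp,uu,brain by averaging the
# outputs of several component models, then a two-class BBB+/BBB- call.
# The stand-in models and the 0.3 cutoff are illustrative, not the
# parameters of the AstraZeneca model described in the abstract.

def consensus_predict(models, descriptors):
    """Average the predictions of the component models."""
    preds = [m(descriptors) for m in models]
    return sum(preds) / len(preds)

def classify_bbb(kpuu, cutoff=0.3):
    """Label a compound BBB positive if the predicted ratio exceeds the cutoff."""
    return "BBB+" if kpuu >= cutoff else "BBB-"

# Hypothetical stand-ins for trained SVM / Random Forest components:
models = [lambda x: 0.4 * x[0], lambda x: 0.5 * x[0] + 0.1]
kpuu = consensus_predict(models, [1.0])   # 0.5 for this toy input
print(classify_bbb(kpuu))                 # → BBB+
```

    In practice each component would be a trained regressor over molecular descriptors; averaging reduces the variance of any single model's errors, which is the usual rationale for a consensus model.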

    MACHINE LEARNING AND THE CONSTRUCTION OF A SEISMIC ATTRIBUTE-SEISMIC FACIES ANALYSIS DATA BASE

    Currently, seismic facies and structural analysis requires a significant amount of time and effort by skilled interpreters. With the advances made by companies such as Amazon and Google in AI (artificial intelligence) and machine learning, many geoscientists (and perhaps more so, many geoscience managers) have identified opportunities to apply such technologies to the seismic interpretation workflow. Machine learning-based interpretation techniques, such as self-organizing maps (SOM), principal component analysis (PCA), and independent component analysis (ICA), will both accelerate and quantify the seismic interpretation process. Seismic attributes highlight subtle features in the seismic data that help identify architectural elements that can be used to further define the environment of deposition. Likewise, seismic attributes delineate subtle faults, folds, and flexures that better define the history of tectonic deformation. However, understanding “which attribute best illuminates which feature” requires either considerable experience or a tedious, years-long search for published analogues. The objective of this thesis is to identify the seismic facies of interest through a prototype web-based seismic attribute-seismic facies analysis database that can be used not only as a guide for human interpreters, but also to select attributes for machine learning. I propose a rule-based decision tree application that suggests which attributes are good candidates for machine learning applications. There are many seismic facies; this thesis illustrates the objectives and a prototype web application using only two of them: marine mass transport deposits and karst collapses. After initial validation, this product can be improved and expanded upon by a larger user community to provide an interactive attribute selection platform for interpreters at large.
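    The core of such a rule-based application is a lookup from facies to candidate attributes. A minimal sketch, with the caveat that the facies-to-attribute pairings below are illustrative placeholders chosen for this example, not the thesis's validated rules:

```python
# Hedged sketch of a rule-based attribute suggester: map a target seismic
# facies to attributes worth trying as machine-learning inputs.
# The pairings are illustrative, not the thesis's curated database.

FACIES_RULES = {
    "marine mass transport deposit": ["coherence", "GLCM texture", "dip magnitude"],
    "karst collapse": ["coherence", "curvature", "spectral decomposition"],
}

def suggest_attributes(facies):
    """Return candidate attributes for a facies, or [] if no rule exists."""
    return FACIES_RULES.get(facies.lower().strip(), [])

print(suggest_attributes("Karst collapse"))
# → ['coherence', 'curvature', 'spectral decomposition']
```

    A fuller decision tree would also branch on data quality, bandwidth, and target depth before suggesting attributes, which is where the "rule-based" part earns its name.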

    CLASSIFICATION OF CYBERSECURITY INCIDENTS IN NIGERIA USING MACHINE LEARNING METHODS

    Cybercrime has become more likely as a result of technological advancements and the increased use of the internet and computer systems. As a result, there is an urgent need to develop effective methods of dealing with these cyber threats or incidents in order to adequately identify and combat the associated cybercrimes in Nigerian cyberspace. It is therefore desirable to build models that will enable the Nigeria Computer Emergency Response Team (ngCERT) and law enforcement agencies to gain valuable insights from the available data to detect, identify, and efficiently classify the most prevalent cyber incidents within Nigerian cyberspace, and to predict future threats. This study applied machine learning methods to the cybercrime incidents and threats recorded by ngCERT to build models that characterize cybercrime incidents in Nigeria, classify cybersecurity incidents by mode of attack, and identify the most prevalent incidents within Nigerian cyberspace. Seven different machine learning methods were used to build the classification and prediction models: the Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (CART), and Random Forest (RF) algorithms were used to discover the relationships between the relevant attributes of the datasets and then classify the threats into several categories. The RF, CART, and KNN models were shown to be the most effective in classifying the data, with an accuracy score of 99% each, while the others had accuracy scores of 98% (SVM), 89% (NB), 88% (LR), and 88% (LDA). The results of this classification will help organizations in Nigeria understand the threats that could affect their assets.
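    To make the comparison concrete, here is a pure-Python sketch of one of the seven methods, k-nearest neighbours, together with the accuracy score used to rank them. The one-dimensional features and incident labels are invented stand-ins; the study's actual ngCERT features are not reproduced here.

```python
# Hedged sketch: a tiny KNN classifier plus the accuracy metric used to
# compare models. Toy 1-D features and labels; not the ngCERT dataset.
from collections import Counter

def knn_predict(train, k, x):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(pairs, predict):
    """Fraction of (feature, label) pairs the classifier gets right."""
    return sum(predict(x) == y for x, y in pairs) / len(pairs)

# Invented incident records: (feature, incident class)
train = [(0.1, "phishing"), (0.2, "phishing"), (0.9, "malware"), (1.0, "malware")]
held_out = [(0.15, "phishing"), (0.95, "malware")]
print(accuracy(held_out, lambda x: knn_predict(train, 3, x)))  # → 1.0
```

    The same accuracy function, applied to each of the seven fitted models on a held-out split, yields the 88-99% ranking reported in the abstract.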

    MSAT-X: A technical introduction and status report

    A technical introduction and status report for the Mobile Satellite Experiment (MSAT-X) program is presented. The concept of a Mobile Satellite System (MSS) and its unique challenges are introduced. MSAT-X's role and objectives are delineated, with a focus on its achievements. An outline of MSS design philosophy is followed by a presentation and analysis of the MSAT-X results, which are cast in the broader context of an MSS. The current phase of MSAT-X has focused notably on the ground segment of the MSS. The accomplishments in the four critical technology areas of vehicle antennas, modem and mobile terminal design, speech coding, and networking are presented. A concise evolutionary trace is incorporated in each area to elucidate the rationale leading to the current design choices. The findings in the area of propagation channel modeling are also summarized and their impact on system design discussed. To facilitate the assessment of the MSAT-X results, technology and subsystem recommendations are also included and integrated with a quantitative first-generation MSS design

    Uncertainty estimation for QSAR models using machine learning methods
