51 research outputs found

    Quantum-inspired attribute selection algorithm: A Fidelity-based Quantum Decision Tree

    A classical decision tree is based entirely on splitting measures, which use the occurrence of random events in correspondence with their class labels to optimally segregate datasets. However, the splitting measures follow a greedy strategy, which leads to the construction of an imbalanced tree and hence decreases the prediction accuracy of the classical decision tree algorithm. An intriguing approach is to use the foundational aspects of quantum computing to enhance the decision tree algorithm. Therefore, in this work, we propose to use fidelity as a quantum splitting criterion to construct an efficient and balanced quantum decision tree. For this, we construct a quantum state using the occurrence of random events in a feature and its corresponding class. The quantum state is then used to compute the fidelity that determines the splitting attribute among all features. Using numerical analysis, our results clearly demonstrate that the proposed algorithm cooperatively ensures the construction of a balanced tree. We further compared the efficiency of our proposed quantum splitting criterion with that of different classical splitting criteria on balanced and imbalanced datasets. Our simulation results show that the proposed splitting criterion outperforms all classical splitting criteria on all evaluation metrics.
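The abstract does not spell out how the quantum state is built, so the following is only a minimal classical sketch of the general idea, not the authors' construction. It assumes binary (0/1) features, encodes the class distribution of each branch as amplitudes sqrt(p_i), and scores a candidate split by the fidelity (sum_i sqrt(p_i q_i))^2 between the two branches; treating the lowest fidelity as the best split is an assumption made here for illustration. All function names are hypothetical.

```python
import math
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in a non-empty list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def fidelity(p, q):
    """Fidelity |<psi|phi>|^2 of two states with amplitudes sqrt(p_i), sqrt(q_i)."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return overlap ** 2


def split_fidelity(feature_values, labels):
    """Fidelity between the class distributions of the two branches (assumed 0/1 feature)."""
    classes = sorted(set(labels))
    left = [y for x, y in zip(feature_values, labels) if x == 0]
    right = [y for x, y in zip(feature_values, labels) if x == 1]
    if not left or not right:
        return 1.0  # a split that separates nothing is treated as worst
    return fidelity(class_distribution(left, classes),
                    class_distribution(right, classes))


def best_feature(rows, labels):
    """Index of the feature with the lowest branch fidelity (assumption), plus all scores."""
    n_features = len(rows[0])
    scores = [split_fidelity([r[j] for r in rows], labels)
              for j in range(n_features)]
    return min(range(n_features), key=scores.__getitem__), scores


# Hypothetical toy data: feature 0 determines the class, feature 1 is noise.
toy_rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
toy_labels = [0, 0, 1, 1]
toy_best, toy_scores = best_feature(toy_rows, toy_labels)
```

On the toy data, the informative feature yields orthogonal branch states, while the noise feature yields identical ones, which is why a low fidelity is read here as a clean split.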

    Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework

    (1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a condition from electronic health records (EHRs) via a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. Then, we treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. This framework comprises patient representation generation, feature selection, and optimal phenotyping algorithm development to tackle the imbalanced nature of the data. This framework was extensively evaluated by identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients with 1484 cases of RA and 204 cases of AS, this framework achieved accuracy and positive predictive values of 86.19% and 88.46%, respectively, for RA and 99.23% and 97.75% for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.
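For readers less familiar with the two metrics reported above, the short sketch below shows how accuracy and positive predictive value (PPV) are computed from a confusion matrix. The counts in the example are hypothetical and are not taken from the study.

```python
def accuracy_and_ppv(tp, fp, tn, fn):
    """Return (accuracy, positive predictive value) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # PPV is undefined when the classifier predicts no positives at all.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, ppv


# Hypothetical counts for an imbalanced cohort (100 cases among 1000 patients).
example_accuracy, example_ppv = accuracy_and_ppv(tp=80, fp=20, tn=880, fn=20)
```

With imbalanced data, accuracy is dominated by the many true negatives, so PPV gives a separate view of how trustworthy a positive prediction is.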

    Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification

    Decision trees are widely used in machine learning due to their simplicity in construction and interpretability. However, as data sizes grow, traditional methods for constructing and retraining decision trees become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce a novel quantum algorithm, named Des-q, for constructing and retraining decision trees in regression and binary classification tasks. Assuming the data stream produces small increments of new training examples, we demonstrate that our Des-q algorithm significantly reduces the time required for tree retraining, achieving a poly-logarithmic time complexity in the number of training examples, even accounting for the time needed to load the new examples into quantum-accessible memory. Our approach involves building a decision tree algorithm to perform k-piecewise linear tree splits at each internal node. These splits simultaneously generate multiple hyperplanes, dividing the feature space into k distinct regions. To determine the k suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm of Kerenidis et al. Des-q first efficiently estimates each feature weight using a novel quantum technique to estimate the Pearson correlation. Subsequently, we employ weighted distance estimation to cluster the training examples in k disjoint regions and then proceed to expand the tree using the same procedure. We benchmark the performance of the simulated version of our algorithm against the state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features. Further, we showcase that the proposed algorithm exhibits similar performance to the state-of-the-art decision tree while significantly speeding up the periodic tree retraining. Comment: 48 pages, 4 figures, 4 tables
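As a purely classical illustration of the two steps named in the abstract (Pearson-based feature weights, then weighted-distance assignment to k anchor points), the sketch below may help. It is not the Des-q quantum procedure: using the absolute Pearson correlation as the weight, the squared weighted Euclidean distance, and all function names are assumptions made for this example, and only a single assignment step is shown rather than a full clustering loop.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def feature_weights(rows, targets):
    """Assumed weighting: |Pearson correlation| between each feature and the target."""
    return [abs(pearson([r[j] for r in rows], targets))
            for j in range(len(rows[0]))]


def weighted_sq_distance(a, b, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def assign_regions(rows, anchors, weights):
    """Assign every example to its nearest anchor under the weighted distance."""
    return [min(range(len(anchors)),
                key=lambda k: weighted_sq_distance(r, anchors[k], weights))
            for r in rows]


# Hypothetical toy data: feature 0 tracks the target, feature 1 only weakly.
toy_rows = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 2.0]]
toy_targets = [0.0, 1.0, 2.0, 3.0]
toy_weights = feature_weights(toy_rows, toy_targets)
toy_regions = assign_regions(toy_rows, [[0.5, 3.0], [2.5, 3.0]], toy_weights)
```

Weighting the distance by correlation lets features that track the target dominate the partition, so the k regions at a node are shaped by the label rather than by raw geometry alone.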

    Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification

    The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights from the data. Feature selection is a technique for reducing dimensionality in order to prune the feature space, thereby lowering the computational cost and enhancing classification accuracy. This work presents a hybrid filter-wrapper method (PCA-GWO) that uses Principal Component Analysis (PCA) as a filter approach to select an appropriate and informative subset of features and the Grey Wolf Optimizer (GWO) as a wrapper approach to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed PCA-GWO method outperforms the baseline classifiers with and without feature selection, as well as other feature selection approaches, in terms of classification accuracy.

    A computational study of the substrate conversion and selective inhibition of aldosterone synthase

    When a functional or structural impairment of cardiac output has occurred, the cardiovascular system will attempt to compensate for the reduced blood flow. Unfortunately, many of the resulting processes, such as the renin angiotensin aldosterone system, will progressively weaken the heart, resulting in the condition called heart failure. The renin angiotensin aldosterone regulatory system is currently targeted with medicine for heart failure. Many successes in prolonging patient life have been achieved by inhibition of angiotensin II synthesis and action. It has become apparent, however, that this approach is suboptimal. Antagonists of aldosterone have provided better treatment options; however, side effects are still observed. In the search for an alternative therapeutic application, we have studied a novel treatment involving the selective inhibition of aldosterone biosynthesis. The scope of this study has involved the in silico design and prediction of novel inhibitors, the synthesis of these inhibitors and analogues, and finally the in vitro measurement of their potency. The biosynthesis of aldosterone is performed by two cytochrome P450 enzymes, 11B1 and 11B2, denoted as CYP11B1 and CYP11B2, respectively. Of these two family members, only CYP11B2 can perform the final synthesis step that converts 18-hydroxycorticosterone into aldosterone. CYP11B1 performs the synthesis of glucocorticoids that are responsible for metabolic, immunologic and homeostatic functions. Because these glucocorticoid actions should not be inhibited, the newly designed medicine must be CYP11B2 selective. Since CYP11B1 is highly homologous to CYP11B2, we have performed an in silico study that allows us to model the interactions of substrates and inhibitors in the active sites of both CYP11B1 and CYP11B2. Using comparative modelling, we have constructed models for the three-dimensional architecture of both proteins.
These models have been validated by investigating the torsional properties of the protein backbone and residue side chains, the overall protein packing and the dynamic behaviour of the protein models. Subsequently, the models have been used to evaluate the binding mechanisms and conversion mechanisms for the natural steroidal ligands of CYP11B1 and CYP11B2. A hypothetical binding mode has been proposed for 18-hydroxycorticosterone in CYP11B2, featuring the presence of stabilising hydrogen bonding interactions required for its conversion. Quantum mechanical analyses on the conversion of the steroids involved have shown a favourable conversion for this conformation, thereby supporting our hypothesis. In addition, the quantum mechanical analyses have provided insights on steroid conformations in the active sites during conversion. The suitability of the protein models for inhibitor design has been tested by subjecting the models to a case study with four known inhibitors of CYP11B1 and CYP11B2. Using molecular dynamics and molecular docking, the inhibitor potencies for CYP11B1 and CYP11B2 have been predicted, and their interactions with the proteins have been evaluated. The trends in inhibitor potency found by these computational methods have been confirmed by in vitro inhibition measurements. As a next step, the molecular docking study has been expanded to improve the confidence in the predictive power of the models. Using the protein states evaluated by the molecular dynamics study, the molecular docking results of inhibitor analogues have been investigated and the predictive power of the models has been qualitatively improved. In a final approach, we have performed a ligand-based investigation of the inhibitor analogues to determine which ligand characteristics are important for the potency for CYP11B1 and CYP11B2. 
To this end, we have conducted decision tree analyses on the physico-chemical properties of inhibitor substituents, resulting in a collection of descriptors that can be used for the prediction and design of novel inhibitors. We have shown that a combination of synthesis, molecular modelling and experimental measurements forms a promising approach towards the design of potentially new inhibitors.

    Developing a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated multiprecision weights of hybrid classifiers

    A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology. With the advent of machine learning (ML) techniques, various algorithms have been applied in previous studies to develop models for predicting soil fertility status. However, these models use varying fertility target classes, and variations have been reported in their predictive performance. As a result, practical application of these models to obtain the most accurate predictions may be hindered. While the weighted voting ensemble (WVE) ML technique can improve soil fertility status prediction by aggregating the predictions of individual models, guaranteeing that an optimal WVE weight assignment is found is challenging. Although a brute exhaustive search procedure can be applied to this task, little work has explored the use of automated, precise classifier weight combinations as search spaces for successful optimization. This research aims to develop a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated 1EXP(-)Z+ multi-precision weights of hybrid classifiers. Soil chemical properties and ML modeling algorithms for modeling soil fertility status were identified. Base hybrid ML classification models for predicting soil fertility status were evaluated using Tanzania as a case study. Finally, the WVE models built from the base ML hybrids were optimized using a brute exhaustive search procedure with a newly developed search-space generation algorithm that guarantees finding the optimal solution.
The research followed the design science research methodology. The unsupervised K-means algorithm with a knee detection method was applied to find the optimal number of soil fertility status target classes, and supervised learning algorithms were applied to model classifiers for those optimal classes. Three soil fertility target classes were identified by the clustering technique. On test data, the model achieved a predictive accuracy of 98.93%, with respective AUC of 82%, 83%, and 87% for the low, medium, and high soil fertility target classes. While this performance is higher than that of models in previous studies, 92% correct classifications were obtained on validation against external, unseen, laboratory-tested soil results. Therefore, soil testing laboratories and farmers should consider using the model to smartly manage soil fertility, which may lead to improved crop growth and productivity. The government could set agriculture-related policies that require farmers to use the model, with the provision of agricultural input subsidies. Future work could develop an integrated real-time web and mobile application for providing farmers with soil fertility status information.
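To make the idea of a brute exhaustive search over voting weights concrete, here is a minimal sketch. It is not the dissertation's search-space generation algorithm: the weight grid (all multiples of 10^-decimals in [0, 1]), the soft-voting rule, accuracy as the objective, and every name below are assumptions chosen for illustration, and the class probabilities are hypothetical.

```python
from itertools import product


def weighted_vote(prob_sets, weights):
    """Soft voting. prob_sets[m][i][c] is model m's probability of class c for sample i."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        scores = [sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def exhaustive_weight_search(prob_sets, labels, decimals=1):
    """Try every weight tuple on a grid with the given decimal precision (assumed grid)."""
    steps = 10 ** decimals
    grid = [k / steps for k in range(steps + 1)]
    best_weights, best_acc = None, -1.0
    for weights in product(grid, repeat=len(prob_sets)):
        if sum(weights) == 0:
            continue  # the all-zero tuple casts no vote
        acc = accuracy(weighted_vote(prob_sets, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical class probabilities from two base classifiers on three samples.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.9, 0.1], [0.4, 0.6]]
true_labels = [0, 0, 1]
found_weights, found_acc = exhaustive_weight_search([model_a, model_b], true_labels)
```

The search is guaranteed to find the best tuple on the grid, but its cost grows as (10^decimals + 1)^M for M classifiers, which is why the choice of precision and search space matters.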

    Predicting plant environmental exposure using remote sensing

    Wheat is one of the most important crops globally, with 776.4 million tonnes produced in 2019 alone. However, 10% of all wheat yield is predicted to be lost to Septoria Tritici Blotch (STB), caused by Zymoseptoria tritici (Z. tritici). Throughout Europe, farmers spend £0.9 billion annually on preventative fungicide regimes to protect wheat against Z. tritici. A preventative fungicide regime is used because Z. tritici has a 9-16 day asymptomatic latent phase, which makes it difficult to detect before symptoms develop, after which point fungicide intervention is ineffective. In the second chapter of my thesis I use hyperspectral sensing and imaging techniques, analysed with machine learning, to detect and predict symptomatic Z. tritici infection in winter wheat, in UK-based field trials, with high accuracy. This has the potential to improve detection and monitoring of symptomatic Z. tritici infection and could facilitate precision agriculture methods, for use in the subsequent growing season, that optimise fungicide use and increase yield. In the third chapter of my thesis, I develop a multispectral imaging system which can detect and utilise non-visible shifts in plant leaf reflectance to distinguish plants based on the nitrogen source applied. Currently, plants are treated with nitrogen sources to increase growth and yield, the most common being calcium ammonium nitrate. However, some nitrogen sources are used in illicit activities. Ammonium nitrate is used in explosive manufacture and ammonium sulphate in the cultivation and extraction of the narcotic cocaine from Erythroxylum spp. In my third chapter I show that hyperspectral sensing, multispectral imaging, and machine learning image analysis can be used to visualise and differentiate plants exposed to different nefarious nitrogen sources. Metabolomic analysis of leaves from plants exposed to different nitrogen sources reveals shifts in colourful metabolites that may contribute to altered reflectance signatures.
This suggests that different nitrogen feeding regimes alter plant secondary metabolism, leading to changes in plant leaf reflectance that are detectable via machine learning of multispectral data but not by the naked eye. These results could facilitate the development of technologies to monitor illegal activities involving various nitrogen sources and further inform nitrogen application requirements in agriculture. In my fourth chapter I implement and adapt the hyperspectral sensing, multispectral imaging and machine learning image analysis developed in the third chapter to detect asymptomatic (and symptomatic) Z. tritici infection in winter wheat, in UK-based field trials, with high accuracy. This has the potential to improve detection and monitoring of all stages of Z. tritici infection and could facilitate precision agriculture methods to be used during the current growing season that optimise fungicide use and increase yield.

    The 2nd International Electronic Conference on Applied Sciences

    This book focuses on the works presented at the 2nd International Electronic Conference on Applied Sciences, organized by Applied Sciences from 15 to 31 October 2021 on the MDPI Sciforum platform. Two decades have passed since the start of the 21st century. The development of sciences and technologies is advancing ever faster today than in the previous century. The field of science is expanding, and the structure of science is becoming ever richer. Because of this expansion and the growth of fine structure, researchers may lose themselves in the deep forest of the ever-increasing frontiers and sub-fields being created. This international conference on the Applied Sciences was started to help scientists conduct their own research into the growth of these frontiers by breaking down barriers and connecting the many sub-fields to cut through this vast forest. These functions will allow researchers to see these frontiers and their surrounding (or quite distant) fields and sub-fields, and give them the opportunity to incubate and develop their knowledge even further with the aid of this multi-dimensional network.