51 research outputs found

Quantum-inspired attribute selection algorithm: A Fidelity-based Quantum Decision Tree

Author: Kumar Atul
Sharma Diksha
Singh Parvinder
Publication date: 27/10/2023

A classical decision tree relies entirely on splitting measures, which use the occurrence of random events in correspondence with their class labels to segregate datasets optimally. However, these splitting measures follow a greedy strategy, which leads to the construction of an imbalanced tree and hence decreases the prediction accuracy of the classical decision tree algorithm. An intriguing approach is to use the foundational aspects of quantum computing to enhance the decision tree algorithm. Therefore, in this work, we propose to use fidelity as a quantum splitting criterion to construct an efficient and balanced quantum decision tree. For this, we construct a quantum state using the occurrence of random events in a feature and its corresponding class. The quantum state is then used to compute fidelity for determining the splitting attribute among all features. Using numerical analysis, our results demonstrate that the proposed algorithm cooperatively ensures the construction of a balanced tree. We further compare the efficiency of our proposed quantum splitting criterion with different classical splitting criteria on balanced and imbalanced datasets. Our simulation results show that the proposed splitting criterion outperforms all classical splitting criteria on all evaluation metrics.
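As a rough illustration of the quantity named in this abstract, the sketch below computes the textbook fidelity |<psi|phi>|^2 between two pure states whose amplitudes are the square roots of empirical class frequencies. The abstract does not say how the paper builds its quantum state or how fidelity is turned into a split decision, so the encoding and the function names `amplitude_state` and `fidelity` are assumptions made only for this example.

```python
import math


def amplitude_state(counts):
    """Map non-negative counts to real amplitudes sqrt(p_i) of a normalized state.

    Assumption for illustration: the state encodes empirical frequencies only.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [math.sqrt(c / total) for c in counts]


def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two pure states with real amplitudes."""
    if len(psi) != len(phi):
        raise ValueError("states must have the same dimension")
    overlap = sum(a * b for a, b in zip(psi, phi))
    return overlap ** 2


if __name__ == "__main__":
    # Hypothetical class counts observed under two values of one feature.
    left = amplitude_state([8, 2])
    right = amplitude_state([2, 8])
    print(fidelity(left, right))  # approximately 0.64
```

Under this encoding, a fidelity near 1 means the two class distributions are almost identical, while a fidelity near 0 means they barely overlap. Whether the paper selects the attribute with the lowest or highest fidelity is not stated in the abstract.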

arXiv.org e-Print Archive

Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework

Author: Atkinson M
Brophy S
Choy E
Cooksey R
Fernández-Gutiérrez F
Huo L
Kennedy JI
Zhou S-M
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

(1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a condition from electronic health records (EHRs) via a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. Then, we treated patient identification as a problem of text classification and proposed a transparent disease-phenotyping framework. This framework comprises the generation of a patient representation, feature selection, and optimal phenotyping algorithm development to tackle the imbalanced nature of the data. The framework was extensively evaluated by identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients with 1484 cases of RA and 204 cases of AS, this framework achieved accuracy and positive predictive values of 86.19% and 88.46%, respectively, for RA and 99.23% and 97.75% for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.

Online Research @ Cardiff

Plymouth Electronic Archive and Research Library

Cronfa at Swansea University

Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification

Author: Kumar Niraj
Li Changhao
Minssen Pierre
Pistoia Marco
Yalovetzky Romina
Publication date: 22/09/2023

Decision trees are widely used in machine learning due to their simplicity in construction and interpretability. However, as data sizes grow, traditional methods for constructing and retraining decision trees become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce a novel quantum algorithm, named Des-q, for constructing and retraining decision trees in regression and binary classification tasks. Assuming the data stream produces small increments of new training examples, we demonstrate that our Des-q algorithm significantly reduces the time required for tree retraining, achieving a poly-logarithmic time complexity in the number of training examples, even accounting for the time needed to load the new examples into quantum-accessible memory. Our approach involves building a decision tree algorithm to perform k-piecewise linear tree splits at each internal node. These splits simultaneously generate multiple hyperplanes, dividing the feature space into k distinct regions. To determine the k suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm of Kerenidis et al. Des-q first efficiently estimates each feature weight using a novel quantum technique to estimate the Pearson correlation. Subsequently, we employ weighted distance estimation to cluster the training examples in k disjoint regions and then proceed to expand the tree using the same procedure. We benchmark the performance of the simulated version of our algorithm against the state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features. Further, we show that the proposed algorithm exhibits similar performance to the state-of-the-art decision tree while significantly speeding up the periodic tree retraining. Comment: 48 pages, 4 figures, 4 tables
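For orientation, here is a purely classical sketch of the feature-weighting step mentioned in this abstract: each feature receives a weight derived from its Pearson correlation with the label. This is not the quantum estimation routine of Des-q. Taking the absolute value as the weight, and the names `pearson` and `feature_weights`, are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def feature_weights(rows, labels):
    """One weight per feature column: |Pearson correlation with the label|.

    Assumption: using the absolute value as the weight is illustrative only.
    """
    columns = list(zip(*rows))
    return [abs(pearson(col, labels)) for col in columns]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 tracks the label, feature 1 is constant.
    rows = [(1, 5), (2, 5), (3, 5), (4, 5)]
    labels = [0, 0, 1, 1]
    print(feature_weights(rows, labels))
```

Weights of this kind could then scale each coordinate in a weighted distance, so that features more correlated with the label dominate the clustering.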

arXiv.org e-Print Archive

Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification

Author: Afyouni Imad
Alomari Osama Ahmad
Elnagar Ashraf
Hashem Ibrahim Abaker
Nassif Ali Bou
Shahin Ismail
Tubishat Mohammad
Publication venue: ZU Scholars
Publication date: 17/11/2022

The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text-document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques like feature extraction, feature selection, and classification to derive meaningful insights from the data. Feature selection is a technique used for reducing dimensionality in order to prune the feature space and, as a result, lower the computational cost and enhance classification accuracy. This work presents a hybrid filter-wrapper method based on Principal Component Analysis (PCA) as a filter approach to select an appropriate and informative subset of features and Grey Wolf Optimizer (GWO) as a wrapper approach (PCA-GWO) to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed method based on PCA-GWO outperforms the baseline classifiers with/without feature selection and other feature selection approaches in terms of classification accuracy.

ZU Scholars (Zayed University)

Molding the Symbiosis between Human and Machine: Contributions to Anomaly Detection, Model Evaluation, and Active Learning

Author: Klein Jan Gerard
Publication venue: Ipskamp
Publication date: 07/09/2022

VU Research Portal

A computational study of the substrate conversion and selective inhibition of aldosterone synthase

Author: Roumen L.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2008

When a functional or structural impairment of cardiac output has occurred, the cardiovascular system will attempt to compensate for the reduced blood flow. Unfortunately, many of the resulting processes, such as the renin angiotensin aldosterone system, will progressively weaken the heart, resulting in the condition called heart failure. The renin angiotensin aldosterone regulatory system is currently targeted with medicine for heart failure. Considerable success in prolonging patient lifespan has been achieved by inhibition of angiotensin II synthesis and action. It has become apparent that this approach is suboptimal. Antagonists of aldosterone have provided better treatment options; however, side effects are still observed. In the search for an alternative therapeutic application, we have studied a novel treatment involving the selective inhibition of aldosterone biosynthesis. The scope of this study has involved the in silico design and prediction of novel inhibitors, the synthesis of these inhibitors and analogues, and finally the in vitro measurement of their potency. The biosynthesis of aldosterone is performed by two cytochrome P450 enzymes, 11B1 and 11B2, denoted as CYP11B1 and CYP11B2, respectively. Of these two family members, only CYP11B2 can perform the final synthesis step that converts 18-hydroxycorticosterone into aldosterone. CYP11B1 performs the synthesis of glucocorticoids that are responsible for metabolic, immunologic and homeostatic functions. Because these glucocorticoid actions should not be inhibited, the newly designed medicine must be CYP11B2 selective. Since CYP11B1 is highly homologous to CYP11B2, we have performed an in silico study that allows us to model the interactions of substrates and inhibitors in both the active sites of CYP11B1 and CYP11B2. Using comparative modelling, we have constructed models for the three-dimensional architecture of both proteins.
These models have been validated by investigating the torsional properties of the protein backbone and residue side chains, the overall protein packing and the dynamic behaviour of the protein models. Subsequently, the models have been used to evaluate the binding mechanisms and conversion mechanisms for the natural steroidal ligands of CYP11B1 and CYP11B2. A hypothetical binding mode has been proposed for 18-hydroxycorticosterone in CYP11B2, featuring the presence of stabilising hydrogen bonding interactions required for its conversion. Quantum mechanical analyses on the conversion of the steroids involved have shown a favourable conversion for this conformation, thereby supporting our hypothesis. In addition, the quantum mechanical analyses have provided insights on steroid conformations in the active sites during conversion. The suitability of the protein models for inhibitor design has been tested by subjecting the models to a case study with four known inhibitors of CYP11B1 and CYP11B2. Using molecular dynamics and molecular docking, the inhibitor potencies for CYP11B1 and CYP11B2 have been predicted, and their interactions with the proteins have been evaluated. The trends in inhibitor potency found by these computational methods have been confirmed by in vitro inhibition measurements. As a next step, the molecular docking study has been expanded to improve the confidence in the predictive power of the models. Using the protein states evaluated by the molecular dynamics study, the molecular docking results of inhibitor analogues have been investigated and the predictive power of the models has been qualitatively improved. In a final approach, we have performed a ligand-based investigation of the inhibitor analogues to determine which ligand characteristics are important for the potency for CYP11B1 and CYP11B2. 
To this end, we have conducted decision tree analyses on the physico-chemical properties of inhibitor substituents, resulting in a collection of descriptors that can be used for the prediction and design of novel inhibitors. We have shown that a combination of synthesis, molecular modelling and experimental measurements forms a promising approach towards the design of potentially new inhibitors.

Repository TU/e

Pure OAI Repository

Developing a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated multiprecision weights of hybrid classifiers

Author: Josephat Augustine
Publication venue: NM-AIST
Publication date: 01/08/2023

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology. With the advent of machine learning (ML) techniques, various algorithms have been applied in previous studies to develop models for predicting soil fertility status. However, these models use varying fertility target classes, and variations have been reported in their predictive performance. As a result, practical application of these models to obtain the most accurate predictions may be hindered. While the weighted voting ensemble (WVE) ML technique can improve soil fertility status prediction by aggregating individual models' predictions, guaranteeing that an optimal WVE weight assignment is found is challenging. Although a brute exhaustive search procedure can be applied to this task, the use of automated combinations of classifiers' precise weights as search spaces for successful optimization has not been explored. This research aims to develop a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated 1EXP(-)Z+ multi-precision weights of hybrid classifiers. Soil chemical properties and ML modeling algorithms for modeling soil fertility status were identified. Base hybrid ML classification models for predicting soil fertility status were evaluated using Tanzania as a case study. Finally, the WVE models of the base ML hybrids were optimized using a newly developed search-space generation algorithm for the brute exhaustive search procedure, which guarantees finding the optimal solution.
The research was designed using design science research methodology. The unsupervised K-means algorithm with a knee detection method was applied to find the optimal number of soil fertility status target classes, and supervised learning algorithms were applied to model classifiers for those optimal classes. Three soil fertility target classes were identified by the clustering technique. The model achieved on test data a predictive accuracy of 98.93%, with respective AUC of 82%, 83%, and 87% for the low, medium, and high soil fertility target classes. While this performance is higher than that of models in previous studies, 92% correct classifications were obtained on validation against external, unseen, laboratory-tested soil results. Therefore, soil testing laboratories and farmers should consider using the model to manage soil fertility smartly, which may lead to improved crop growth and productivity. The government could set agriculture-related policies that require farmers to use the model alongside the provision of agricultural input subsidies. Future work could develop an integrated real-time web and mobile application for providing farmers with soil fertility status information.
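The idea of exhaustively searching weighted-voting weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's search-space generation algorithm: the weight grid controlled by the hypothetical `decimals` parameter, the tie-breaking rule, and all function names are choices made for this example.

```python
from itertools import product


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight (ties go to the smallest label)."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(sorted(scores), key=lambda label: scores[label])


def accuracy(per_model_preds, weights, truth):
    """Fraction of samples on which the weighted vote matches the true label."""
    correct = 0
    for i, expected in enumerate(truth):
        votes = [preds[i] for preds in per_model_preds]
        if weighted_vote(votes, weights) == expected:
            correct += 1
    return correct / len(truth)


def exhaustive_weight_search(per_model_preds, truth, decimals=1):
    """Try every weight vector on a grid with `decimals` decimal places in [0, 1].

    Assumption: `decimals` is a hypothetical stand-in for weight precision.
    The grid has (10**decimals + 1) ** n_models points, so keep both small.
    """
    steps = 10 ** decimals
    grid = [k / steps for k in range(steps + 1)]
    best_weights, best_acc = None, -1.0
    for weights in product(grid, repeat=len(per_model_preds)):
        if sum(weights) == 0:
            continue  # an all-zero vector carries no vote
        acc = accuracy(per_model_preds, weights, truth)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


if __name__ == "__main__":
    # Hypothetical predictions of three base classifiers on four samples.
    truth = ["low", "high", "med", "high"]
    models = [
        ["low", "high", "low", "high"],
        ["low", "med", "med", "med"],
        ["high", "high", "med", "low"],
    ]
    print(exhaustive_weight_search(models, truth, decimals=1))
```

Because every grid point is evaluated, the result is optimal for the chosen grid, at a cost that grows exponentially with the number of classifiers and with the weight precision.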

NM-AIST Repository

Predicting plant environmental exposure using remote sensing

Author: Adams Christopher
Publication venue: Department of Life Sciences (Silwood Park), Imperial College London
Publication date: 01/03/2021

Wheat is one of the most important crops globally, with 776.4 million tonnes produced in 2019 alone. However, 10% of all wheat yield is predicted to be lost to Septoria Tritici Blotch (STB) caused by Zymoseptoria tritici (Z. tritici). Throughout Europe, farmers spend £0.9 billion annually on preventative fungicide regimes to protect wheat against Z. tritici. A preventative fungicide regime is used because Z. tritici has a 9-16 day asymptomatic latent phase, which makes it difficult to detect before symptoms develop, after which point fungicide intervention is ineffective. In the second chapter of my thesis, I use hyperspectral sensing and imaging techniques, analysed with machine learning, to detect and predict symptomatic Z. tritici infection in winter wheat, in UK-based field trials, with high accuracy. This has the potential to improve detection and monitoring of symptomatic Z. tritici infection and could facilitate precision agriculture methods, for use in the subsequent growing season, that optimise fungicide use and increase yield. In the third chapter of my thesis, I develop a multispectral imaging system which can detect and utilise non-visible shifts in plant leaf reflectance to distinguish plants based on the nitrogen source applied. Currently, plants are treated with nitrogen sources to increase growth and yield, the most common being calcium ammonium nitrate. However, some nitrogen sources are used in illicit activities. Ammonium nitrate is used in explosive manufacture and ammonium sulphate in the cultivation and extraction of the narcotic cocaine from Erythroxylum spp. In my third chapter, I show that hyperspectral sensing, multispectral imaging, and machine learning image analysis can be used to visualise and differentiate plants exposed to different nefarious nitrogen sources. Metabolomic analysis of leaves from plants exposed to different nitrogen sources reveals shifts in colourful metabolites that may contribute to altered reflectance signatures.
This suggests that different nitrogen feeding regimes alter plant secondary metabolism, leading to changes in plant leaf reflectance that are detectable via machine learning of multispectral data but not by the naked eye. These results could facilitate the development of technologies to monitor illegal activities involving various nitrogen sources and further inform nitrogen application requirements in agriculture. In my fourth chapter, I implement and adapt the hyperspectral sensing, multispectral imaging and machine learning image analysis developed in the third chapter to detect asymptomatic (and symptomatic) Z. tritici infection in winter wheat, in UK-based field trials, with high accuracy. This has the potential to improve detection and monitoring of all stages of Z. tritici infection and could facilitate precision agriculture methods, to be used during the current growing season, that optimise fungicide use and increase yield. Open Access

Spiral - Imperial College Digital Repository

The 2nd International Electronic Conference on Applied Sciences

Publication venue: 'MDPI AG'
Publication date: 12/08/2022

This book focuses on the works presented at the 2nd International Electronic Conference on Applied Sciences, organized by Applied Sciences from 15 to 31 October 2021 on the MDPI Sciforum platform. Two decades have passed since the start of the 21st century. Science and technology are developing faster today than in the previous century. The field of science is expanding, and the structure of science is becoming ever richer. Because of this expansion and fine structure growth, researchers may lose themselves in the deep forest of the ever-increasing frontiers and sub-fields being created. This international conference on the Applied Sciences was started to help scientists conduct their own research into the growth of these frontiers by breaking down barriers and connecting the many sub-fields to cut through this vast forest. These functions will allow researchers to see these frontiers and their surrounding (or quite distant) fields and sub-fields, and give them the opportunity to incubate and develop their knowledge even further with the aid of this multi-dimensional network.

Directory of Open Access Books (DOAB)