51 research outputs found
Quantum-inspired attribute selection algorithm: A Fidelity-based Quantum Decision Tree
A classical decision tree relies entirely on splitting measures, which use the
occurrence of random events with respect to the class labels to optimally
segregate a dataset. However, these splitting measures follow a greedy
strategy, which leads to the construction of an imbalanced tree and hence
decreases the prediction accuracy of the classical decision tree algorithm. An
intriguing approach is to use the foundational aspects of quantum computing to
enhance the decision tree algorithm. Therefore, in this work, we propose using
fidelity as a quantum splitting criterion to construct an efficient and
balanced quantum decision tree. For this, we construct a quantum state from the
occurrence of random events in a feature and its corresponding class. This
quantum state is then used to compute the fidelity that determines the
splitting attribute among all features. Our numerical analysis demonstrates
that the proposed algorithm ensures the construction of a balanced tree. We
further compare the efficiency of the proposed quantum splitting criterion with
different classical splitting criteria on balanced and imbalanced datasets. Our
simulation results show that the proposed splitting criterion outperforms all
classical splitting criteria on all evaluation metrics considered.
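To make the idea of a fidelity-based splitting criterion concrete, here is a minimal classical simulation under stated assumptions; it is not the authors' construction. It assumes binary class labels, amplitude-encodes each class-conditional distribution of a feature's values as the real pure state sum_v sqrt(p_v)|v>, and scores the feature by the fidelity F = |<psi|phi>|^2 between the two class states. Choosing the feature with the lowest fidelity (least overlap between classes) as the split is also an assumption made for illustration; the helper names are hypothetical.

```python
import math
from collections import Counter


def amplitude_state(counts, support):
    """Amplitude-encode a frequency table as the real pure state sum_v sqrt(p_v)|v>."""
    total = sum(counts.values())
    return [math.sqrt(counts.get(v, 0) / total) for v in support]


def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two real, normalised pure states."""
    overlap = sum(a * b for a, b in zip(psi, phi))
    return overlap ** 2


def feature_fidelity(column, labels):
    """Fidelity between the two class-conditional states of one feature (hypothetical helper)."""
    support = sorted(set(column))
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes binary class labels")
    states = []
    for c in classes:
        counts = Counter(v for v, y in zip(column, labels) if y == c)
        states.append(amplitude_state(counts, support))
    return fidelity(*states)


def select_split_attribute(rows, labels):
    """Pick the feature whose class-conditional states overlap least (lowest fidelity)."""
    n_features = len(rows[0])
    scores = [feature_fidelity([r[j] for r in rows], labels) for j in range(n_features)]
    best = min(range(n_features), key=scores.__getitem__)
    return best, scores
```

For example, with rows `[("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]` and labels `[0, 0, 1, 1]`, feature 0 separates the classes perfectly (fidelity 0), while feature 1 carries no class information (fidelity approximately 1), so feature 0 is selected.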
Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework
(1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a condition from electronic health records (EHRs) via a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. We then treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. The framework comprises generation of patient representations, feature selection, and development of an optimal phenotyping algorithm, designed to tackle the imbalanced nature of the data. It was extensively evaluated by identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients, with 1484 cases of RA and 204 cases of AS, the framework achieved an accuracy of 86.19% and a positive predictive value of 88.46% for RA, and 99.23% and 97.75%, respectively, for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.
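Because the data are imbalanced, accuracy alone can look high even for a useless classifier, which is one reason positive predictive value (PPV) is reported alongside it. The sketch below uses the standard confusion-matrix definitions; the "label everyone a non-case" baseline and the assumption that it is scored on the full linked dataset of 9657 patients are hypothetical, for illustration only.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all patients that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def ppv(tp, fp):
    """Positive predictive value: fraction of predicted cases that are true cases."""
    # Undefined (NaN) when the classifier predicts no cases at all.
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Hypothetical trivial baseline: label every patient in the linked dataset as a non-case.
n_patients, n_ra = 9657, 1484
baseline_acc = accuracy(tp=0, tn=n_patients - n_ra, fp=0, fn=n_ra)
print(f"All-negative accuracy for RA: {baseline_acc:.4f}")  # PPV is undefined here
```

The all-negative baseline reaches about 84.6% accuracy for RA while finding no cases, so the framework's accuracy is best read together with its PPV.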
Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification
Decision trees are widely used in machine learning due to their simplicity in
construction and interpretability. However, as data sizes grow, traditional
methods for constructing and retraining decision trees become increasingly
slow, scaling polynomially with the number of training examples. In this work,
we introduce a novel quantum algorithm, named Des-q, for constructing and
retraining decision trees in regression and binary classification tasks.
Assuming the data stream produces small increments of new training examples, we
demonstrate that our Des-q algorithm significantly reduces the time required
for tree retraining, achieving a poly-logarithmic time complexity in the number
of training examples, even accounting for the time needed to load the new
examples into quantum-accessible memory. Our approach involves building a
decision tree algorithm to perform k-piecewise linear tree splits at each
internal node. These splits simultaneously generate multiple hyperplanes,
dividing the feature space into k distinct regions. To determine the k suitable
anchor points for these splits, we develop an efficient quantum-supervised
clustering method, building upon the q-means algorithm of Kerenidis et al.
Des-q first efficiently estimates each feature weight using a novel quantum
technique to estimate the Pearson correlation. Subsequently, we employ weighted
distance estimation to cluster the training examples into k disjoint regions and
then proceed to expand the tree using the same procedure. We benchmark the
performance of the simulated version of our algorithm against the
state-of-the-art classical decision tree for regression and binary
classification on multiple data sets with numerical features. Further, we
showcase that the proposed algorithm exhibits similar performance to the
state-of-the-art decision tree while significantly speeding up the periodic
tree retraining.
Comment: 48 pages, 4 figures, 4 tables
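The node-level procedure described above (weight each feature by its correlation with the target, then cluster the examples into k regions under a weighted distance) can be illustrated with a purely classical analogue. This is a sketch under assumptions, not Des-q: the quantum Pearson-correlation estimator and the q-means-based supervised clustering are replaced by ordinary classical computation, weighting by |r| is an assumed choice, and the k-piecewise linear splits and the poly-logarithmic retraining cost are not reproduced. All function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Classical Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def feature_weights(X, y):
    """Weight each feature by |Pearson r| with the target (assumed weighting)."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]


def weighted_dist2(a, b, w):
    """Squared Euclidean distance with per-feature weights."""
    return sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w))


def weighted_kmeans(X, w, k, iters=20, seed=0):
    """Lloyd-style k-means under the weighted distance; returns labels and centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    assign = []
    for _ in range(iters):
        # Assign each example to its nearest centroid under the weighted distance.
        assign = [min(range(k), key=lambda c: weighted_dist2(x, centroids[c], w)) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids
```

A feature that is uncorrelated with the target receives weight 0 and therefore has no influence on which of the k regions an example falls into; in a tree, each resulting region would become a child node that is expanded with the same procedure.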
Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification
The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text-document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights. Feature selection is a dimensionality-reduction technique that prunes the feature space, thereby lowering computational cost and enhancing classification accuracy. This work presents a hybrid filter-wrapper method (PCA-GWO) that uses Principal Component Analysis (PCA) as a filter to select an appropriate and informative subset of features and the Grey Wolf Optimizer (GWO) as a wrapper to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of the candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed PCA-GWO method outperforms the baseline classifiers with and without feature selection, as well as other feature selection approaches, in terms of classification accuracy.
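The wrapper stage can be sketched as a binary grey wolf optimizer that searches over 0/1 feature masks. This is a hedged illustration, not the authors' implementation: it uses the standard GWO position update (the three best wolves of the current population guide the rest, with coefficient a decreasing linearly from 2 toward 0) plus a sigmoid transfer function to binarize positions, which is one common binary variant among several. The PCA filter stage is omitted, and `fitness` is a hypothetical placeholder; in the described method it would be the LR evaluator's classification accuracy on the selected features.

```python
import math
import random


def binary_gwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Binary grey wolf optimizer sketch: maximise fitness(mask) over 0/1 feature masks."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    # Continuous wolf positions in [0, 1]; one coordinate per candidate feature.
    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]

    def to_mask(x):
        # Sigmoid transfer (steepness 10 and centre 0.5 are arbitrary choices), then a
        # stochastic threshold turns each coordinate into keep (1) or drop (0).
        return tuple(1 if rng.random() < 1 / (1 + math.exp(-10 * (v - 0.5))) else 0 for v in x)

    best_mask, best_fit = None, -math.inf
    for t in range(n_iter):
        scored = []
        for x in pos:
            mask = to_mask(x)
            f = fitness(mask)
            scored.append((f, x))
            if f > best_fit:  # remember the best subset seen so far
                best_mask, best_fit = mask, f
        scored.sort(key=lambda s: s[0], reverse=True)
        leaders = [s[1] for s in scored[:3]]  # alpha, beta, delta of this iteration
        a = 2 - 2 * t / n_iter  # decreases linearly from 2 toward 0
        new_pos = []
        for x in pos:
            nx = []
            for j in range(n_features):
                est = 0.0
                for lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * lead[j] - x[j])
                    est += lead[j] - A * D
                nx.append(min(1.0, max(0.0, est / 3)))  # mean of the three estimates, clipped
            new_pos.append(nx)
        pos = new_pos
    return best_mask, best_fit
```

Large |A| early in the run pushes wolves away from the leaders (exploration); as a shrinks, |A| falls below 1 and the pack converges on the leaders (exploitation).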
A computational study of the substrate conversion and selective inhibition of aldosterone synthase
When a functional or structural impairment of cardiac output has occurred, the cardiovascular system will attempt to compensate for the reduced blood flow. Unfortunately, many of the resulting processes, such as the renin angiotensin aldosterone system, progressively weaken the heart, resulting in the condition called heart failure. The renin angiotensin aldosterone regulatory system is currently targeted by heart failure medication. Inhibition of angiotensin II synthesis and action has achieved many successes in prolonging patients' lives. However, it has become apparent that this approach is suboptimal. Aldosterone antagonists have provided better treatment options; however, side effects are still observed. In the search for an alternative therapeutic application, we have studied a novel treatment involving the selective inhibition of aldosterone biosynthesis. The scope of this study has involved the in silico design and prediction of novel inhibitors, the synthesis of these inhibitors and analogues, and finally the in vitro measurement of their potency. The biosynthesis of aldosterone is performed by two cytochrome P450 enzymes, 11B1 and 11B2, denoted CYP11B1 and CYP11B2, respectively. Of these two family members, only CYP11B2 can perform the final synthesis step that converts 18-hydroxycorticosterone into aldosterone. CYP11B1 performs the synthesis of glucocorticoids, which are responsible for metabolic, immunologic and homeostatic functions. Because these glucocorticoid actions should not be inhibited, the newly designed medicine must be CYP11B2 selective. Since CYP11B1 is highly homologous to CYP11B2, we have performed an in silico study that allows us to model the interactions of substrates and inhibitors in the active sites of both CYP11B1 and CYP11B2. Using comparative modelling, we have constructed models of the three-dimensional architecture of both proteins. 
These models have been validated by investigating the torsional properties of the protein backbone and residue side chains, the overall protein packing and the dynamic behaviour of the protein models. Subsequently, the models have been used to evaluate the binding mechanisms and conversion mechanisms for the natural steroidal ligands of CYP11B1 and CYP11B2. A hypothetical binding mode has been proposed for 18-hydroxycorticosterone in CYP11B2, featuring stabilising hydrogen-bonding interactions required for its conversion. Quantum mechanical analyses of the conversion of the steroids involved have shown a favourable conversion for this conformation, thereby supporting our hypothesis. In addition, the quantum mechanical analyses have provided insights into steroid conformations in the active sites during conversion. The suitability of the protein models for inhibitor design has been tested by subjecting the models to a case study with four known inhibitors of CYP11B1 and CYP11B2. Using molecular dynamics and molecular docking, the inhibitor potencies for CYP11B1 and CYP11B2 have been predicted, and their interactions with the proteins have been evaluated. The trends in inhibitor potency found by these computational methods have been confirmed by in vitro inhibition measurements. As a next step, the molecular docking study has been expanded to improve confidence in the predictive power of the models. Using the protein states evaluated by the molecular dynamics study, the molecular docking results of inhibitor analogues have been investigated, and the predictive power of the models has been qualitatively improved. In a final approach, we have performed a ligand-based investigation of the inhibitor analogues to determine which ligand characteristics are important for potency against CYP11B1 and CYP11B2. 
To this end, we have conducted decision tree analyses on the physico-chemical properties of inhibitor substituents, resulting in a collection of descriptors that can be used for the prediction and design of novel inhibitors. We have shown that a combination of synthesis, molecular modelling and experimental measurements forms a promising approach towards the design of potentially new inhibitors.
Developing a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated multiprecision weights of hybrid classifiers
A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology
With the advent of machine learning (ML) techniques, various algorithms have been applied in
previous studies to develop models for predicting soil fertility status. However, these models
use varying fertility target classes, and variations have been reported in their predictive
performances. As a result, practical use of these models for obtaining the most accurate
predictions may be hindered. While the weighted voting ensemble (WVE) ML technique can improve
soil fertility status prediction by aggregating the predictions of individual models,
guaranteeing that optimal WVE weights are found is challenging. Although a brute exhaustive
search procedure can be applied to this task, the use of automatically generated combinations
of precise classifier weights as search spaces for optimization remains largely unexplored.
This research aims to develop a high-performance soil fertility status prediction voting
ensemble using brute exhaustive optimization in automated multi-precision weights
(1 × 10^(−z), z ∈ Z+) of hybrid classifiers. Soil chemical properties and ML algorithms for
modeling soil fertility status were identified. Base hybrid ML classification models for
predicting soil fertility status were evaluated using Tanzania as a case study. Finally, the
WVE models built from the base ML hybrids were optimized using a brute exhaustive search with a
newly developed search-space generation algorithm that guarantees finding the optimal solution.
The research followed the design science research methodology. The unsupervised K-means
algorithm, combined with a knee detection method, was applied to find the optimal number of
soil fertility status target classes, and supervised learning algorithms were applied to model
classifiers for those classes. The clustering identified three soil fertility target classes.
On test data, the model achieved a predictive accuracy of 98.93%, with AUCs of 82%, 83%, and
87% for the low, medium, and high soil fertility target classes, respectively. These
performances are higher than those of models in previous studies; on validation against
external, unseen laboratory-tested soil results, 92% of classifications were correct.
Therefore, soil testing laboratories and farmers should consider using the model to manage
soil fertility smartly, which may lead to improved crop growth and productivity. The government
could set agriculture-related policies that require farmers to use the model, coupled with the
provision of agricultural input subsidies. Future work could develop an integrated real-time
web and mobile application that provides farmers with soil fertility status information.
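A brute exhaustive search over ensemble weights can be sketched as follows. This is a minimal illustration under assumptions, not the dissertation's search-space generation algorithm: it assumes each base classifier outputs class probabilities, that the ensemble uses weighted soft voting, that weight vectors lie on a grid with precision step 10^(−z) and sum to 1, and that accuracy is the objective. All function names are hypothetical.

```python
from itertools import product


def weight_grid(n_models, z):
    """Yield every weight vector on a 10**-z grid whose entries sum to 1."""
    steps = 10 ** z
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def weighted_vote(prob_rows, weights):
    """Weighted soft vote: combine each model's class probabilities, return the top class."""
    n_classes = len(prob_rows[0])
    scores = [sum(w * p[c] for w, p in zip(weights, prob_rows)) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


def exhaustive_search(model_probs, y_true, z=1):
    """Try every grid weight vector; model_probs[m][i] is model m's probabilities for sample i."""
    n_models = len(model_probs)
    best_w, best_acc = None, -1.0
    for w in weight_grid(n_models, z):
        correct = sum(
            weighted_vote([model_probs[m][i] for m in range(n_models)], w) == y
            for i, y in enumerate(y_true)
        )
        acc = correct / len(y_true)
        if acc > best_acc:  # strict '>' keeps the first optimum found
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Because every grid point is evaluated, the best weights on that grid are guaranteed to be found, but the grid size grows as C(10^z + n − 1, n − 1) for n models (66 vectors for three models at z = 1), so each extra digit of precision is costly.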
Predicting plant environmental exposure using remote sensing
Wheat is one of the most important crops globally, with 776.4 million tonnes produced in
2019 alone. However, 10% of all wheat yield is predicted to be lost to Septoria Tritici
Blotch (STB) caused by Zymoseptoria tritici (Z. tritici). Throughout Europe, farmers spend
£0.9 billion annually on preventative fungicide regimes to protect wheat against Z. tritici.
Preventative regimes are used because Z. tritici has a 9- to 16-day asymptomatic latent
phase, which makes it difficult to detect before symptoms develop, and once symptoms appear,
fungicide intervention is ineffective.
In the second chapter of my thesis, I use hyperspectral sensing and imaging techniques,
analysed with machine learning, to detect and predict symptomatic Z. tritici infection in
winter wheat in UK-based field trials with high accuracy. This has the potential to
improve detection and monitoring of symptomatic Z. tritici infection and could facilitate
precision agriculture methods that optimise fungicide use and increase yield in the
subsequent growing season.
In the third chapter of my thesis, I develop a multispectral imaging system that can detect
and utilise non-visible shifts in plant leaf reflectance to distinguish plants based on the
nitrogen source applied. Currently, plants are treated with nitrogen sources to increase
growth and yield, the most common being calcium ammonium nitrate. However, some
nitrogen sources are used in illicit activities. Ammonium nitrate is used in explosive
manufacture and ammonium sulphate in the cultivation and extraction of the narcotic
cocaine from Erythroxylum spp. In my third chapter, I show that hyperspectral sensing,
multispectral imaging, and machine learning image analysis can be used to visualise and
differentiate plants exposed to different nefarious nitrogen sources. Metabolomic analysis
of leaves from plants exposed to different nitrogen sources reveals shifts in colourful
metabolites that may contribute to altered reflectance signatures. This suggests that
different nitrogen feeding regimes alter plant secondary metabolism leading to changes in
plant leaf reflectance detectable via machine learning of multispectral data but not by the
naked eye. These results could facilitate the development of technologies to monitor illegal
activities involving various nitrogen sources and further inform nitrogen application
requirements in agriculture.
In my fourth chapter, I implement and adapt the hyperspectral sensing, multispectral
imaging and machine learning image analysis developed in the third chapter to detect
asymptomatic (and symptomatic) Z. tritici infection in winter wheat in UK-based field
trials with high accuracy. This has the potential to improve detection and monitoring of all
stages of Z. tritici infection and could facilitate precision agriculture methods, used
during the current growing season, that optimise fungicide use and increase yield.
Open Access
The 2nd International Electronic Conference on Applied Sciences
This book is focused on the works presented at the 2nd International Electronic Conference on Applied Sciences, organized by Applied Sciences from 15 to 31 October 2021 on the MDPI Sciforum platform. Two decades have passed since the start of the 21st century. Science and technology are developing faster today than in the previous century. The field of science is expanding, and its structure is becoming ever richer. Because of this expansion and growth in fine structure, researchers may lose themselves in the deep forest of ever-increasing frontiers and newly created sub-fields. This international conference on the Applied Sciences was started to help scientists conduct their own research into the growth of these frontiers by breaking down barriers and connecting the many sub-fields to cut through this vast forest. These connections will allow researchers to see these frontiers and their surrounding (or quite distant) fields and sub-fields, and give them the opportunity to incubate and develop their knowledge even further with the aid of this multi-dimensional network.
- …