Gene selection and classification in autism gene expression data
Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour and observable deficits in communication and social functioning. Although a variety of candidate genes have been linked to the disorder, no single gene applies to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in analysing autism gene expression data are the limited number of samples and the substantial noise arising from experimental measurement error and natural variation. In this study, a systematic combination of three filters, namely the t-test (TT), Wilcoxon rank sum (WRS) and feature correlation (COR), is applied together with an efficient wrapper algorithm based on geometric binary particle swarm optimization and a support vector machine (GBPSO-SVM), with the aim of selecting and classifying the genes most strongly associated with autism. A new approach based on the criteria of median ratio, mean ratio and variance deviation is also applied to reduce the initial dataset before it enters the selection steps. The most discriminative genes identified in the first and last selection steps shared one recurring gene (CAPS2), which was therefore assigned as the highest-risk ASD gene. Fusing the gene subsets selected by the GBPSO-SVM algorithm increased the classification accuracy to about 92.10%, which is higher than the values reported in the literature for the same autism dataset. Notably, an ensemble using random forest (RF) also performed better than previous studies. However, the ensemble that used an SVM to integrate the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator. The overall improvement was ascribed to the selection strategies used to reduce the dataset and to the efficient wrapper-based GBPSO-SVM algorithm.
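As a rough illustration of the filter stage described in this abstract, the sketch below ranks genes by the absolute Welch t-statistic between two classes and keeps the top few. It is a minimal sketch, not the authors' pipeline: the function names, the toy expression values and the `top_k` cut-off are assumptions made for illustration, and the Wilcoxon, correlation and GBPSO-SVM stages are left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr, labels, top_k):
    """Return the top_k gene names ordered by |t| between class 1 and class 0.

    expr   : dict mapping gene name -> list of expression values, one per sample
    labels : list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(case, ctrl))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three genes, three cases (1) and three controls (0).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "gene_a": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],  # strongly separated
    "gene_b": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # no separation
    "gene_c": [1.0, 1.4, 0.8, 1.9, 2.3, 2.1],  # moderately separated
}
selected = rank_genes(expr, labels, top_k=2)
print(selected)
```

A filter of this kind only reduces the candidate set cheaply. In the approach summarised above, the surviving genes would then be passed to a wrapper method for the final selection.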
Evolutionary Computation
This book presents several recent advances in Evolutionary Computation, especially evolution-based optimization methods and hybrid algorithms for applications ranging from optimization and learning to pattern recognition and bioinformatics. It also presents new algorithms based on several analogies and metaphors, one of which draws on philosophy, specifically the philosophy of praxis and dialectics. The book further presents applications in bioinformatics, notably the use of particle swarms to discover gene expression patterns in DNA microarrays. It therefore features representative work in the field of evolutionary computation and applied sciences. The intended audience is graduate and undergraduate students, researchers, and anyone who wishes to become familiar with the latest research in this field.
Discovering Biomarkers of Alzheimer's Disease by Statistical Learning Approaches
In this work, statistical learning approaches are exploited to discover biomarkers for Alzheimer's disease (AD). Contributions have been made in both biomarker-driven and software-driven studies. Surprising discoveries were made in the search for blood-based biomarkers. With the inclusion of existing biological knowledge and a proposed novel feature selection method, several blood-based protein models were found to have promising ability to separate AD patients from healthy individuals. A new statistical pattern was discovered which may serve as a new guideline for diagnosis methodology. For brain-based biomarkers, the positive contribution of covariates such as age, gender and APOE genotype to an AD classifier was verified, and a panel of highly informative biomarkers comprising 26 RNA transcripts was discovered. The classifier trained on this panel of genes shows excellent capacity for discriminating patients from controls. Apart from the biomarker-driven studies, statistical packages and applications were also developed. The R package metaUnion was designed and developed to provide an advanced meta-analytic approach applicable to microarray data. This package overcomes defects of previous meta-analytic packages: 1) the neglect of missing data, 2) the inflexibility of feature dimension, and 3) the lack of functions to support post-analysis summary. metaUnion has been applied in a published study as part of an integrated genomic approach and resulted in significant findings. To provide dementia researchers with benchmark references on the significance of features, a web-based platform, AlzExpress, was built to provide granular differential expression test and meta-analysis results. A combination of current big data technologies and robust data mining algorithms makes AlzExpress a flexible, scalable and comprehensive platform of valuable bioinformatics in dementia research. Plymouth Universit
Assigning function to genome wide association study variants associated with complex gastrointestinal disease
PhD
The genome-wide association study era has identified numerous loci associated with many
common polygenic diseases. The next challenge is to identify the functional consequences
of these variants and elucidate how they impact on disease risk. Using a combination of
protein-based assays, large-scale microarrays and high-throughput sequencing
platforms, this thesis aims to identify the functional effects of disease loci, with particular
focus on Crohn's disease and coeliac disease, two common complex gastrointestinal
diseases.
Variants located within the Interleukin 23 receptor are associated with both susceptibility
and protection from Crohn's disease, a debilitating chronic inflammatory disease of the
bowel. A study was undertaken to investigate the effect of these variants, at the mRNA as
well as the protein level, on both cytokine and receptor levels.
Coeliac disease is a dietary intolerance to the gluten component of wheat, barley and rye
and has an estimated prevalence of approximately 1%. Genome-wide association studies
have identified eight different genomic loci as associated with coeliac disease but none
have been functionally characterised. To investigate the effect that genotype has on gene
transcript levels, a genetical genomics study was undertaken in patients with coeliac
disease generating results with relevance to a range of autoimmune disorders.
Before disease based effects can be identified, it is first important to fully characterise the
normal human transcriptome and methylome. To this end, CD4+ T cells were studied using
novel high-throughput sequencing techniques, with the aim of providing some insight into
novel genomic properties that may illuminate current and future disease associated loci.
Given the base-pair resolution of high-throughput sequencing, a novel method of
assaying for SNP effects on gene expression was developed. This allele-specific method,
using whole transcriptome sequencing, is capable of identifying alterations in transcript
expression on a genome-wide scale.
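One simple way to picture an allele-specific expression test is to count the RNA-seq reads carrying each allele at a heterozygous SNP and ask whether the split departs from the 50:50 expected under balanced expression. The sketch below uses an exact two-sided binomial test for this. It is an illustrative assumption, not the method developed in the thesis: the function names, the read counts and the choice of a binomial null are all hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k. Intended for modest read depths; very large n would
    underflow in this naive form.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference-allele fraction, p-value against a 50:50 null)."""
    n = ref_reads + alt_reads
    return ref_reads / n, binom_two_sided_p(ref_reads, n)


# Hypothetical read counts at two heterozygous SNPs.
balanced_ratio, balanced_p = allelic_imbalance(50, 50)
skewed_ratio, skewed_p = allelic_imbalance(18, 2)
print(balanced_ratio, balanced_p)
print(skewed_ratio, skewed_p)
```

In practice a genome-wide analysis would also need to handle mapping bias towards the reference allele and to correct for multiple testing, neither of which is attempted here.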
Mediation of triple-negative breast cancer cell fate via cellular redox and Wnt signalling
Breast cancer is the most common malignancy affecting women worldwide. This thesis focusses on the role of DDX20 in regulating Wnt/β-catenin signalling and its impact on cell fate in triple-negative breast cancer (TNBC). The results of this study demonstrated a new role for DDX20-mediated Wnt signalling in governing intracellular redox and mitochondrial function. Furthermore, we have determined that DDX20 is an essential regulator of Wnt/β-catenin signalling in TNBC stem cells.
Characterisation and structural studies of a superoxide dismutase and OmpA-like proteins from Borrelia burgdorferi sensu lato
Lyme borreliosis is the most common tick-borne, human infection across the Northern
hemisphere. The agent responsible, Borrelia burgdorferi sensu lato (s.l.), covers a family of
Spirochaetes with unique characteristics which are shared by both Gram-negative and Gram-positive
bacteria. The outer membrane (OM) is rich in lipoproteins but contains a relatively
low density of integral membrane proteins (OMPs); of these OMPs, very few have been
identified and even fewer are well characterised. The OmpA-like transmembrane domain
defined by the Pfam family PF01389 is an 8-stranded membrane-spanning β-barrel and is well
conserved among Gram-negative bacteria but to date remains unknown in Spirochaetes.
Building on previous computational work which had sought to identify possible OMPs from
B. burgdorferi s.l., four OmpA-like proteins, BAPKO_0422 (Borrelia afzelii), BB_0562,
BB_0406 (B. burgdorferi) and BG0408 (Borrelia garinii), have been identified and structurally
characterised. The four proteins are encoded by chromosomal genes, are highly conserved
between Borrelia species and may be of diagnostic or therapeutic value. Structural
characterisation by both circular dichroism and small angle X-ray scattering suggests these four
proteins adopt a compact globular structure rich in β-strand (~40%) with ab initio molecular
envelopes resembling a cylindrical peanut shape with dimensions of ~25 × 45 Å, consistent with
an 8-stranded β-barrel. The present work demonstrates that BAPKO_0422 can bind human
factor H (hfH) and provides some evidence for a further interaction between the BAPKO_0422 protein
and heparin. The interaction with hfH may contribute to the spirochaete's immune evasion
mechanisms by the inhibition of the complement response.
The zoonotic life-cycle of Borrelia and challenges by the host's immune system cause an
ever-changing environment which often leads to fluctuations of O2 exposure. Although B.
burgdorferi s.l. have a distinct lack of metabolic systems, including peroxidase and catalase
enzymes, the Spirochaete genome does encode a single superoxide dismutase gene (sodA -
bb_0157). Although this enzyme was previously assigned as an Fe-SOD, there has been some debate
whether it requires iron or manganese as a co-factor. The present work demonstrates that the B.
burgdorferi enzyme SodA requires manganese for activity and does not display cambialistic
behaviour. Structural and proteomic characterisation suggests the B. burgdorferi SodA enzyme
shares significant sequence similarity to a superoxide dismutase from Thermus thermophilus.
RFID Technology in Intelligent Tracking Systems in Construction Waste Logistics Using Optimisation Techniques
Construction waste disposal is an urgent issue
for protecting our environment. This paper proposes a
waste management system and illustrates the work
process using plasterboard waste as an example, which
creates a hazardous gas when landfilled with household
waste, and for which the recycling rate is less than 10%
in the UK. The proposed system integrates RFID
technology, Rule-Based Reasoning, Ant Colony
optimization and knowledge technology for auditing
and tracking plasterboard waste, guiding the operation
staff, arranging vehicles and planning schedules, and also
provides evidence to verify its disposal. It relies on
RFID equipment for collecting logistical data and uses
digital imaging equipment to give further evidence; the
reasoning core in the third layer is responsible for
generating schedules and route plans and guidance, and
the last layer delivers the result to inform users. The
paper first introduces the current plasterboard
disposal situation and addresses the logistical problem
that is now the main barrier to a higher recycling rate,
followed by discussion of the proposed system in terms
of both system level structure and process structure.
Finally, an example scenario is given to
illustrate the system's utilization.
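To give a sense of how Ant Colony optimization can produce a collection route, the sketch below plans a single round trip from a depot through a set of pick-up sites. It is a minimal sketch under stated assumptions, not the system described in the paper: the function names, parameter values and site coordinates are hypothetical, sites are assumed to be at distinct locations, and the RFID, rule-based reasoning and scheduling layers are not modelled.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour, including the return leg to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony_route(points, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                     evaporation=0.5, seed=0):
    """Basic ant-system search for a short closed tour starting at points[0]."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [0]  # index 0 is treated as the depot
            unvisited = set(range(1, n))
            while unvisited:
                cur = tour[-1]
                candidates = sorted(unvisited)
                # Prefer edges with more pheromone and shorter distance.
                weights = [pheromone[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)

        # Evaporate pheromone, with a small floor so no edge becomes impossible.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] = max(pheromone[i][j] * (1.0 - evaporation), 1e-9)

        # Each ant deposits pheromone in inverse proportion to its tour length.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length

    return best_tour, best_len


# Hypothetical coordinates: depot first, then three collection sites.
sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
best_tour, best_len = ant_colony_route(sites)
print(best_tour, best_len)
```

The fixed random seed makes the run repeatable. A real deployment would replace straight-line distances with road travel times and add constraints such as vehicle capacity.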