46 research outputs found

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation of a DNA sequence affects how well its biological properties are reflected in the numerical domain for the detection and identification of the characteristics of special regions of interest within the sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results and possible future developments are also included. Another research focus is promoter prediction in human (Homo sapiens) DNA sequences with a neural network based multi-classifier system using DNA numerical representation methods. Despite the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of feature-based approaches decreases prediction reliability and leads to erroneous results in gene annotation. To improve prediction accuracy and reliability, DigiPromPred, a numerical representation based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that our proposed system maintains a good balance between prediction accuracy and reliability.
To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99% for human, Drosophila, and Arabidopsis sequences respectively, with reconfigurable capability compared to the existing system.
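As a rough illustration of what a DNA numerical representation looks like in practice, the sketch below shows two common mappings: the four binary indicator sequences usually called the Voss representation, and a fixed integer encoding. The abstract does not state which representations DigiPromPred or SDigiPromPred use, so the function names and the integer values here are illustrative assumptions, not the dissertation's method.

```python
from typing import Dict, List

ALPHABET = "ACGT"


def voss_indicators(seq: str) -> Dict[str, List[int]]:
    """Return one binary indicator sequence per base (Voss representation).

    Position i of the sequence for base X is 1 if seq[i] == X, else 0.
    Characters outside A/C/G/T (for example N) map to 0 in all four sequences.
    """
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in ALPHABET}


def integer_encoding(seq: str) -> List[int]:
    """Map each base to a fixed integer.

    The values 0..3 are an arbitrary, hypothetical choice for illustration;
    a KeyError is raised for characters outside A/C/G/T.
    """
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [mapping[s] for s in seq.upper()]


if __name__ == "__main__":
    example = "GATTACA"
    for base, indicator in voss_indicators(example).items():
        print(base, indicator)
    print(integer_encoding(example))
```

Indicator sequences avoid imposing an artificial ordering on the bases, whereas a single integer track is more compact but implies that, say, G is "closer" to T than to A. This is one concrete way the choice of representation can affect what a downstream classifier sees.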

    Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors

    As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and use them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well in cross-validation experiments while performing drastically worse on independent test data. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments.
Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods. NPRP grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).
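The evaluation point made in this abstract, that cross-validation scores should be checked against independent test data, can be sketched as an index-splitting routine. This is a minimal sketch, not the authors' protocol: the function names, the 20% test fraction and the choice of five folds are assumptions for illustration.

```python
import random
from typing import Iterator, List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and reserve a fraction as an independent test set.

    Returns (development_indices, test_indices). The test indices should not
    be used for training or model selection.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices: List[int], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    development, test = holdout_split(100, test_fraction=0.2, seed=1)
    for train, validation in k_fold(development, k=5):
        print(len(train), len(validation))
    print("independent test size:", len(test))
```

Cross-validation is run only on the development indices; the held-out indices are scored once at the end, which is what exposes a model whose cross-validation estimate was over-optimistic.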

    Modelling DNA Methylation Dynamics


    Image Processing and Simulation Toolboxes of Microscopy Images of Bacterial Cells

    Recent advances in microscopy imaging technology have allowed the characterization of the dynamics of cellular processes at the single-cell and single-molecule level. Particularly in bacterial cell studies, and using E. coli as a case study, these techniques have been used to detect and track internal cell structures such as the Nucleoid and the Cell Wall, and fluorescently tagged molecular aggregates such as FtsZ proteins, Min system proteins, inclusion bodies and all the different types of RNA molecules. These studies have been performed using multi-modal, multi-process, time-lapse microscopy, producing both morphological and functional images. To facilitate the finding of relationships between cellular processes, from small-scale, such as gene expression, to large-scale, such as cell division, an image processing toolbox was implemented with several automatic and/or manual features such as cell segmentation and tracking, inter-modal and intra-modal image registration, as well as the detection, counting and characterization of several cellular components. Two segmentation algorithms for cellular components were implemented, the first based on the Gaussian Distribution and the second based on Thresholding and morphological structuring functions. These algorithms were used to perform the segmentation of Nucleoids and to identify the different stages of FtsZ Ring formation (allied with the use of machine learning algorithms), which made it possible to understand how temperature influences the physical properties of the Nucleoid and to correlate those properties with the exclusion of protein aggregates from the center of the cell. Another study used the segmentation algorithms to study how temperature affects the formation of the FtsZ Ring. The validation of the developed image processing methods and techniques has been based on benchmark databases manually produced and curated by experts.
When dealing with thousands of cells and hundreds of images, these manually generated datasets can become the biggest cost in a research project. To expedite these studies and lower the cost of manual labour, an image simulation toolbox was implemented to generate realistic artificial images. The proposed toolbox can generate biologically inspired objects that mimic the spatial and temporal organization of bacterial cells and their processes, such as cell growth and division, cell motility, and cell morphology (shape, size and cluster organization). The image simulation toolbox was shown to be useful in the validation of three cell tracking algorithms: Simple Nearest-Neighbour, Nearest-Neighbour with Morphology, and the DBSCAN cluster identification algorithm. It was shown that the Simple Nearest-Neighbour still performed with great reliability when simulating objects with small velocities, while the other algorithms performed better for higher velocities and when larger clusters were present.
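To make the idea of nearest-neighbour cell tracking concrete, the sketch below links cell centroids between two consecutive frames by distance. It is one plausible greedy variant written for illustration; the abstract does not describe how the thesis implements Simple Nearest-Neighbour, and the function name and the `max_dist` gating parameter are assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_neighbour_links(prev: List[Point], curr: List[Point],
                            max_dist: float) -> Dict[int, int]:
    """Link centroids in the previous frame to centroids in the current frame.

    All candidate pairs are sorted by Euclidean distance and accepted greedily,
    so each centroid is used at most once. Pairs farther apart than max_dist
    are never linked, which leaves room for cells that appear or disappear.
    Returns a dict mapping index in prev to index in curr.
    """
    candidates = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    links: Dict[int, int] = {}
    used_curr = set()
    for distance, i, j in candidates:
        if distance > max_dist:
            break  # remaining candidates are even farther apart
        if i in links or j in used_curr:
            continue
        links[i] = j
        used_curr.add(j)
    return links


if __name__ == "__main__":
    frame_a = [(0.0, 0.0), (10.0, 0.0)]
    frame_b = [(10.5, 0.0), (0.4, 0.3)]
    print(nearest_neighbour_links(frame_a, frame_b, max_dist=2.0))
```

A distance-only rule like this depends on displacements between frames being small relative to the spacing between cells, which is consistent with the abstract's observation that the simple method is reliable at small velocities and weaker at higher velocities or in large clusters.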

    Biosensors

    A biosensor is defined as a detecting device that combines a transducer with a biologically sensitive and selective component. When a specific target molecule interacts with the biological component, a signal proportional to the concentration of the substance is produced at the transducer level. Biosensors can therefore measure compounds present in the environment, chemical processes, food and the human body at low cost compared with traditional analytical techniques. This book covers a wide range of aspects and issues related to biosensor technology, bringing together researchers from 11 different countries. The book consists of 16 chapters written by 53 authors. The first four chapters describe several aspects of nanotechnology applied to biosensors. The subsequent section, comprising three chapters, is devoted to biosensor applications in the fields of drug discovery, diagnostics and bacteria detection. The principles behind optical biosensors and some of their applications are discussed in chapters 8 to 11. The last five chapters deal with microelectronics, interfacing circuits, signal transmission, biotelemetry and algorithms applied to biosensing.

    Identifying the molecular components that matter: a statistical modelling approach to linking functional genomics data to cell physiology

    Functional genomics technologies, in which thousands of mRNAs, proteins, or metabolites can be measured in single experiments, have contributed to reshaping biological investigations. One of the most important issues in the analysis of the large datasets generated is the selection of relatively small subsets of variables that are predictive of the physiological state of a cell or tissue. In this thesis, a truly multivariate variable selection framework using diverse functional genomics data has been developed, characterized, and tested. This framework has also been used to show that it is possible to predict the physiological state of the tumour from the molecular state of adjacent normal cells. This allows us to identify novel genes involved in cell to cell communication. Then, using a network inference technique, networks representing cell-cell communication in prostate cancer have been inferred. The analysis of these networks has revealed interesting properties that suggest a crucial role of directional signals in controlling the interplay between normal and tumour cell to cell communication. Experimental verification performed in our laboratory has provided evidence that one of the identified genes could be a novel tumour suppressor gene. In conclusion, the findings and methods reported in this thesis have contributed to further understanding of cell to cell interaction and multivariate variable selection, not only by applying and extending previous work but also by proposing novel approaches that can be applied to any functional genomics data.

    The structural characterisation of two DNA protectants during stress; the tandem RRM domains of mouse TDP-43 and E.coli DPS

    TAR DNA Binding protein (TDP-43) is a member of the heterogeneous nuclear ribonucleoprotein family with crucial splicing, transport and regulatory functions for genetic material inside mammalian cells. Unfortunately, TDP-43 positive cytoplasmic aggregates occurring with post-translational modifications are a common hallmark of neurodegenerative diseases, observed in Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis (ALS) and fronto-temporal lobar degeneration (FTLD). Mutations in the TARDBP gene, responsible for encoding TDP-43, have been directly correlated with onset of ALS and FTLD. Disease models describing TDP-43 proteinopathy suggest onset may derive from either cytoplasmic mis-localisation or a loss of nuclear function, but it is unclear if or how disease associated point-mutations contribute to these observations. In order to determine the effects these mutations have on the protein, a fragment containing the tandem RRM domains (residues 101-265), responsible for the protein’s nucleic acid binding function, was tested. Using small angle X-ray scattering, circular dichroism, isothermal titration calorimetry and thermal assay methodology, it was demonstrated that initial structures of all variants are similar but mutations (D169G and K263E) confer resistance to thermal denaturation by up to 4.9 ± 0.6 °C. This stability positively correlated with an increase in half-life when tested in the full-length variant using a neuron cell model, suggesting that protein turn-over is a contributing disease factor. This study was also concerned with solving an X-ray crystallographic DNA binding complex structure for E. coli DPS and mapping interactions with neighbouring DPS complexes. These mechanisms are important in DPS function to protect nucleic acids during prokaryote stress. DPS is conserved in almost all prokaryotes; however, not all species can interact with DNA.
Using X-ray crystallography, a model of E. coli DPS was built to 2.8 Å resolution from DNA-containing samples, showing that both DNA and N-terminal residues were absent. Stabilising polar interactions were shown to form between neighbouring dodecamer structures involving T12, R18, D20, N99, S100, S106 and K134. Polar contacts are observed in all compared crystallographic structures from different species, but the residues involved are poorly conserved despite strong similarities in sequence and structure. This suggests that these contacts may contribute to stabilising the DNA-DPS complexes but form indiscriminately between exposed polar residues available on the dodecamer surface. These interactions are likely to contribute to the thermal stability of DNA-DPS complexes to aid in the protein’s protective function.

    Growth influences the single cell variability of the DNA damage response in Escherichia coli

    The resilience of bacteria depends upon their capacity to proliferate and survive under different conditions, including in the human body, where some bacterial infections can be fatal. Many antibiotics used to treat infections cause direct and indirect DNA damage, in particular DNA double-strand breaks, which can lead to bacterial cell death. Bacteria respond to DNA damage by inducing the SOS response, which is an important process in the repair and tolerance of DNA damage. An additional consequence of SOS induction by antibiotic exposure is the potential increase in mutagenesis, horizontal gene transfer, and tolerance to other antibiotics. Therefore, identifying the factors involved in SOS induction is essential to understanding the dynamics of bacterial infections. Previous studies have indicated that bacterial susceptibility to DNA damaging agents is dependent on growth conditions, but the mechanisms involved are not well understood. Many physiological changes are associated with growth rate, including DNA replication (a major mechanism leading to DNA damage) and reallocation of resources towards growth-limiting processes, which could impair the capacity of cells to induce the SOS response. In addition, previous reports indicate that SOS expression is variable in single cells, and the effect of growth conditions on this variability has not been evaluated. In order to evaluate how changes in growth conditions influence the SOS response, we have quantified the levels of SOS induction by DNA damage in single cells using E. coli as a model organism. Our results show that cells with very high levels of SOS expression are more abundant in slow-growing conditions, that is, under spontaneous DNA damage, under damage induced by the antibiotic ciprofloxacin, and under replication-dependent chronic double-strand breaks.
We explain these observations as a combination of population dynamics, which contribute to enriching for slow dividing cells (high SOS) in slow-growing populations, and an influence of growth conditions on the variability of SOS induction, possibly because of influences on the DNA-repair process via an unknown mechanism. The population dynamics arguments presented here may be relevant to other antibiotics, and argue for the significance of studying the response to antibiotics in single cells. We believe the observations on variability in SOS expression may open new avenues for understanding the limiting factors for DNA repair.