2,333 research outputs found
Analysis of Genomic and Proteomic Signals Using Signal Processing and Soft Computing Techniques
Bioinformatics is a data-rich field that provides unique opportunities to use computational techniques to understand and organize information associated with biomolecules such as DNA, RNA, and proteins. It involves in-depth study in the areas of genomics and proteomics and requires techniques from computer science, statistics, and engineering to identify, model, and extract features and to process data for analysis and interpretation of results in a biologically meaningful manner. Among engineering methods, signal processing techniques such as transformation, filtering, and pattern analysis, and soft computing techniques such as the multilayer perceptron (MLP) and the radial basis function neural network (RBFNN), play a vital role in resolving many challenging issues in genomics and proteomics.
In this dissertation, an attempt has been made to investigate some challenging problems of bioinformatics by employing efficient signal processing and soft computing methods. The specific issues addressed are protein coding region identification in DNA sequences, hot spot identification in proteins, prediction of protein structural class, and classification of microarray gene expression data. The dissertation presents novel methods to measure and extract features from genomic sequences using time-frequency analysis and machine intelligence techniques. The problems investigated and the contributions made in the thesis are presented here in a concise manner. The S-transform, a powerful time-frequency representation technique, has an advantage over the wavelet transform and the short-time Fourier transform: its exponential function is fixed with respect to the time axis while the localizing, scalable Gaussian window dilates and translates. The S-transform uses an analysis window whose width decreases with frequency, providing frequency-dependent resolution. The invertibility of the S-transform makes it suitable for time-band filtering applications. Gene prediction and protein coding region identification have always been challenging tasks in computational biology, especially in eukaryotic genomes because of their complex structure. This issue is resolved using an S-transform based time-band filtering approach that localizes the period-3 property present in the DNA sequence, which forms the basis for the identification. Similarly, hot spot identification in proteins is a pressing issue in protein science because of its importance in binding and interaction between proteins. A novel S-transform based time-frequency filtering approach is proposed for efficient identification of the hot spots.
Prediction of the structural class of a protein has been a challenging problem in bioinformatics. A novel feature representation scheme is proposed to represent the protein efficiently, thereby improving the prediction accuracy. The high dimension and low sample size of microarray data lead to the curse of dimensionality, which affects classification performance. In this dissertation an efficient hybrid feature extraction method is proposed to overcome the dimensionality issue, and an RBFNN is introduced to classify the microarray samples efficiently.
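The period-3 property mentioned in this abstract can be illustrated with a much simpler tool than the S-transform. The sketch below is not the dissertation's method: it swaps in a plain sliding-window DFT evaluated at frequency 1/3 on binary indicator sequences, one per nucleotide. The function name, the window length of 351 bases, and the step of 3 are illustrative assumptions.

```python
import numpy as np

def period3_power(seq, window=351, step=3):
    """Sliding-window DFT power at frequency 1/3 (assumed baseline, not the S-transform).

    Returns the window start positions and, for each window, the power at
    frequency 1/3 summed over the four nucleotide indicator sequences.
    """
    seq = seq.upper()
    # One binary indicator sequence per nucleotide: 1 where the base occurs.
    indicators = np.array([[1.0 if c == b else 0.0 for c in seq] for b in "ACGT"])
    # Complex exponential at the period-3 frequency for one window.
    kernel = np.exp(-2j * np.pi * np.arange(window) / 3.0)
    starts = np.arange(0, len(seq) - window + 1, step)
    power = np.array([
        np.sum(np.abs(indicators[:, s:s + window] @ kernel) ** 2)
        for s in starts
    ])
    return starts, power
```

Windows that overlap a region with a strong three-base periodicity give large values, while a region without that periodicity gives values near the background level. A window length that is a multiple of 3 keeps a constant sequence from leaking into the 1/3 frequency.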
Determination of Characteristic Frequency for Identification of Hot Spots in Proteins
Identification of hot spots, or protein-target binding sites, in proteins using the resonant recognition model requires knowledge of the characteristic frequency. For a successful protein-target interaction, both the protein and the target signals must share the same characteristic frequency. The common characteristic frequency of a functional group of proteins is determined from the consensus spectrum obtained using the discrete Fourier transform (DFT). In this work an alternative approach for identifying the characteristic frequency using power spectral density is described. The performance of the proposed method is observed to be better than that of the DFT-based approach and is illustrated using simulation examples.
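The DFT-based consensus spectrum that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the paper's power spectral density method. It assumes each protein has already been converted to a numeric signal (the mapping from residues to numbers is not shown), and the function names and the zero-padding to a common length are assumptions.

```python
import numpy as np

def consensus_spectrum(signals):
    """Pointwise product of DFT magnitude spectra of several numeric signals."""
    n = max(len(s) for s in signals)  # zero-pad every signal to the longest length
    spectrum = np.ones(n // 2 + 1)
    for s in signals:
        x = np.asarray(s, dtype=float)
        x = x - x.mean()  # remove the mean so the zero-frequency bin does not dominate
        spectrum *= np.abs(np.fft.rfft(x, n=n))
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per residue
    return freqs, spectrum

def characteristic_frequency(signals):
    """Frequency at which the consensus spectrum peaks."""
    freqs, spectrum = consensus_spectrum(signals)
    return float(freqs[np.argmax(spectrum)])
```

Because the spectra are multiplied, only a frequency component present in every signal survives as a prominent peak, which is the sense in which the frequency is common to the group.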
Method and System for Identification of Metabolites Using Mass Spectra
A method and system are provided for mass spectrometry for identification of a specific elemental formula for an unknown compound, which includes but is not limited to a metabolite. The method includes calculating a natural abundance probability (NAP) of a given isotopologue for isotopes of non-labelling elements of an unknown compound. Molecular fragments for a subset of isotopes identified using the NAP are created and sorted into a requisite cache data structure to be subsequently searched. Peaks are obtained from raw spectrum data from mass spectrometry for an unknown compound. Sample-specific peaks of the unknown compound are separated from various spectral artifacts in ultra-high resolution Fourier transform mass spectra. A set of possible isotope-resolved molecular formulae (IMFs) is created by iteratively searching the molecular fragment caches, combining the results with additional isotopes, and then statistically filtering the results based on NAP and mass-to-charge (m/z) matching probabilities. An unknown compound and its corresponding elemental molecular formula (EMF) are identified from statistically significant caches of isotopologues with compatible IMFs.
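The abstract does not define how the natural abundance probability is calculated. One common way to express the probability of an isotopologue, assumed here purely for illustration, is a binomial term for each element that has two stable isotopes. The function name and the approximate carbon-13 abundance of 0.0107 used in the usage note are assumptions, not values taken from the source.

```python
from math import comb

def isotopologue_probability(n_atoms, n_heavy, heavy_abundance):
    """Binomial probability that n_heavy of n_atoms atoms are the heavy isotope.

    Assumed model: atoms of one element are independent, and the element has
    exactly two isotopes with natural abundances heavy_abundance and
    1 - heavy_abundance.
    """
    light_abundance = 1.0 - heavy_abundance
    return (comb(n_atoms, n_heavy)
            * heavy_abundance ** n_heavy
            * light_abundance ** (n_atoms - n_heavy))
```

For example, `isotopologue_probability(6, 1, 0.0107)` gives the chance that exactly one of six carbon atoms is carbon-13 under this model. Probabilities for several elements can be multiplied if the elements are treated as independent.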
Optical imaging methods for the study of disease models from the nano to the mesoscale
The visualisation of disease phenotypes allows scientists to study fundamental mechanisms of disease. Optical imaging methods are useful not only to observe anatomical features of biological samples, but also to infer interactions between molecular species using fluorescence labelling. This thesis presents the development of imaging and analysis tools to study biological questions in three models of disease, with samples ranging from the sub-cellular to the organ scale.
First, the role of the alpha-synuclein (a-syn) protein, whose dysfunction is a hallmark of Parkinson’s Disease, was studied with respect to vesicle trafficking at the synapse. Synaptic vesicles are ∼40 nm in diameter; imaging vesicles therefore requires methods with resolution below the diffraction limit. Single-molecule localisation microscopy (SMLM), which circumvents the diffraction limit by separating fluorophore emission in time to localise individual molecules in space with ∼20 nm precision, was thus implemented to study a-syn in purified synaptic boutons. A software package was developed to analyse the colocalisation of a-syn with internalised vesicles, and the clustering of a-syn under differing synaptic calcium levels. The colocalisation of a-syn and internalised vesicles was found to be temperature independent, suggesting that a-syn is involved in non-canonical trafficking mechanisms. Ground truth simulations from a synaptosome model were used to benchmark two cluster analysis methods. Both methods applied on the experimental data showed that a-syn becomes less clustered at low synaptic calcium levels.
Second, the spatiotemporal association of ESCRT-II, a protein complex whose role in the budding of the human immunodeficiency virus (HIV) was previously considered dispensable, and the HIV polyprotein Gag was studied during viral egress using novel image analysis tools. A nearest-neighbour analysis showed the ESCRT-II protein EAP45 colocalises with Gag similarly to ALIX, a protein well known to be involved in HIV budding. However, upon deletion of EAP45’s N-terminus, its colocalisation with Gag was significantly impaired, highlighting the importance of this EAP45 domain in linking to Gag. Single particle tracking was used to trace the trajectories of EAP45 and Gag in live cells, and an algorithm was developed to visualise the simultaneous motion of two particles; these analyses revealed three types of potential dynamic interaction between EAP45 and Gag.
Finally, an open-source instrument to visualise phenotypes from large organs in 3D was developed for the study of chronic obstructive pulmonary disease (COPD) models. The instrument implements Optical Projection Tomography (OPT), a technique that can reconstruct cross-sectional slices of a transparent object from its orthographic projections, using off-the-shelf components and novel ImageJ plugins for artefact correction and volume reconstruction. Excised and cleared mouse lungs were imaged, and high-order airways could be discerned with 50 μm resolution. The raw lung data, instructions for building the instrument, the free ImageJ plugins, and a detailed software manual are available in an online repository to encourage the widespread use of OPT for imaging large samples. Gates Cambridg
Improving the accuracy and efficiency of docking methods
Computational methods for predicting macromolecular complexes are useful tools for studying biological systems. They are used in areas such as drug design and the study of protein-protein interactions. While considerable progress has been made in this field over the decades, enhancing the speed and accuracy of these computational methods remains an important challenge. This work describes two enhancements to the accuracy of ClusPro, a method for performing protein-protein docking, as well as an enhancement to the efficiency of global rigid body docking. Small-angle X-ray scattering (SAXS) is a high-throughput technique in which data are collected for molecules in solution, and the data provide information about the shape and size of the molecules. ClusPro was enhanced with the ability to use SAXS data collected for protein complexes to guide docking by selecting conformations according to how well they match the experimental data, which improved docking accuracy when such data were available. Various other experimental techniques, such as NMR, FRET, or chemical cross-linking, can provide information about protein-protein interfaces, and such information can be used to generate distance-based restraints between pairs of residues across the interface. A second enhancement to ClusPro enables the use of such distance restraints to improve docking accuracy. Finally, an enhancement to the efficiency of FFT-based global docking programs was developed. This enhancement allows for the efficient search of multiple side-chain conformations, and the improved program was applied to the flexible computational solvent mapping program FTFlex. 2018-07-09T00:00:00
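The core idea behind FFT-based global docking is that a score for every translation of one grid against another can be computed at once as a cross-correlation. The sketch below shows only that idea, under strong simplifying assumptions: both molecules are given as same-shaped 3D grids, the score is a plain overlap sum, and translations are cyclic. It is not the ClusPro scoring function, and the function name is hypothetical.

```python
import numpy as np

def best_translation(receptor, ligand):
    """Cyclic shift of ligand that maximises its overlap with receptor.

    score[t] = sum over x of receptor[x + t] * ligand[x], evaluated for all
    shifts t at once through the correlation theorem.
    """
    receptor = np.asarray(receptor, dtype=float)
    ligand = np.asarray(ligand, dtype=float)
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    score = np.fft.ifftn(spectrum).real
    shift = np.unravel_index(np.argmax(score), score.shape)
    return tuple(int(s) for s in shift), float(score.max())
```

A full rigid body search would repeat this translational scan for each sampled rotation of the ligand, which is why the cost of a single scan matters.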
Unraveling the Thousand Word Picture: An Introduction to Super-Resolution Data Analysis
Super-resolution microscopy provides direct insight into fundamental biological processes occurring at length scales smaller than light’s diffraction limit. The analysis of data at such scales has brought statistical and machine learning methods into the mainstream. Here we provide a survey of data analysis methods, starting from an overview of the basic statistical techniques underlying the analysis of super-resolution and, more broadly, imaging data. We subsequently break down the analysis of super-resolution data into four problems: the localization problem, the counting problem, the linking problem, and what we’ve termed the interpretation problem.
Image Processing and Simulation Toolboxes of Microscopy Images of Bacterial Cells
Recent advances in microscopy imaging technology have allowed the characterization of the dynamics of cellular processes at the single-cell and single-molecule level. In bacterial cell studies in particular, with E. coli as a case study, these techniques have been used to detect and track internal cell structures such as the nucleoid and the cell wall, and fluorescently tagged molecular aggregates such as FtsZ proteins, Min system proteins, inclusion bodies, and the different types of RNA molecules. These studies have been performed using multi-modal, multi-process, time-lapse microscopy, producing both morphological and functional images.
To facilitate the finding of relationships between cellular processes, from small-scale, such as gene expression, to large-scale, such as cell division, an image processing toolbox was implemented with several automatic and/or manual features such as cell segmentation and tracking, intra-modal and inter-modal image registration, and the detection, counting, and characterization of several cellular components.
Two algorithms for segmenting cellular components were implemented, the first based on the Gaussian distribution and the second based on thresholding and morphological structuring functions. These algorithms were used to segment nucleoids and, together with machine learning algorithms, to identify the different stages of FtsZ ring formation. This made it possible to understand how temperature influences the physical properties of the nucleoid and to correlate those properties with the exclusion of protein aggregates from the center of the cell. Another study used the segmentation algorithms to examine how temperature affects the formation of the FtsZ ring.
The validation of the developed image processing methods and techniques has been based on benchmark databases manually produced and curated by experts. When dealing with thousands of cells and hundreds of images, these manually generated datasets can become the biggest cost in a research project. To shorten these studies and lower the cost of manual labour, an image simulation toolbox was implemented to generate realistic artificial images.
The proposed image simulation toolbox can generate biologically inspired objects that mimic the spatial and temporal organization of bacterial cells and their processes, such as cell growth, division, and motility, as well as cell morphology (shape, size, and cluster organization). The image simulation toolbox was shown to be useful in the validation of three cell tracking algorithms: Simple Nearest-Neighbour, Nearest-Neighbour with Morphology, and the DBSCAN cluster identification algorithm. The Simple Nearest-Neighbour algorithm performed with great reliability when simulating objects with small velocities, while the other algorithms performed better for higher velocities and when larger clusters were present.
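A simple nearest-neighbour tracker of the kind validated here can be sketched as a frame-to-frame linking step. The abstract does not give the toolbox's implementation, so the greedy closest-pair-first assignment, the function name, and the `max_distance` cut-off below are assumptions made for illustration.

```python
import numpy as np

def link_nearest_neighbour(prev_centroids, next_centroids, max_distance):
    """Greedily link cell centroids in one frame to centroids in the next frame.

    Returns a dict mapping indices in prev_centroids to indices in
    next_centroids. Each centroid is used at most once, and pairs farther
    apart than max_distance are never linked.
    """
    prev = np.asarray(prev_centroids, dtype=float)
    nxt = np.asarray(next_centroids, dtype=float)
    links = {}
    if len(prev) == 0 or len(nxt) == 0:
        return links
    dist = np.linalg.norm(prev[:, None, :] - nxt[None, :, :], axis=2)
    taken = set()
    # Visit candidate pairs from the closest to the farthest.
    for flat_index in np.argsort(dist, axis=None):
        i, j = (int(v) for v in np.unravel_index(flat_index, dist.shape))
        if dist[i, j] > max_distance:
            break  # all remaining pairs are even farther apart
        if i in links or j in taken:
            continue
        links[i] = j
        taken.add(j)
    return links
```

This kind of linking is only as reliable as the assumption that a cell moves less between frames than the distance to its neighbours, which is consistent with the reported behaviour at small velocities.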
A robust algorithm for segmenting fluorescence images and its application to single-molecule counting
Live-cell fluorescence microscopy produces large amounts of data with high variability in the shapes of the objects of interest and a low signal-to-noise ratio. An efficient design of image analysis pipelines requires a reliable and robust initial segmentation step that needs little parameter fine-tuning. Here, I present a segmentation algorithm called MinSeg for fluorescence image data that relies on minimal assumptions about the image and uses statistical considerations to distinguish signal from background. More importantly, the algorithm makes no assumptions about feature size or shape and is thus applicable to a wide variety of images. I also present a pipeline for the quantification of small complexes with single-molecule fluorescence microscopy that uses this segmentation algorithm as the first step of the workflow. This pipeline was used for the quantification of CENP-A, a histone H3 variant protein. We found that CENP-A is predominantly present as a dimer.
Plasmonic Nanoplatforms for Biochemical Sensing and Medical Applications
Plasmonics, the science of the excitation of surface plasmon polaritons (SPP) at the metal-dielectric interface under intense beam radiation, has been studied for its immense potential for developing numerous nanophotonic devices, optical circuits, and lab-on-a-chip devices. The key feature that makes plasmonic structures promising is their ability to support strong resonances with different behaviors and tunable localized hotspots, excitable in a wide spectral range. Therefore, a fundamental understanding of light-matter interactions at subwavelength nanostructures, and the use of this understanding to tailor plasmonic nanostructures that can sustain high-quality tunable resonant modes, are essential to the realization of highly functional devices with a wide range of applications from sensing to switching.
We investigated the excitation of various plasmonic resonance modes (i.e., Fano resonances and toroidal moments) using both optical and terahertz (THz) plasmonic metamolecules. By designing and fabricating various nanostructures, we successfully predicted, demonstrated, and analyzed the excitation of plasmonic resonances, numerically and experimentally. A simple comparison between the sensitivity and lineshape quality of various optically driven resonances reveals that nonradiative toroidal moments are exotic plasmonic modes with strong sensitivity to environmental perturbations. Employing toroidal plasmonic metasurfaces, we demonstrated ultrafast plasmonic switches and highly sensitive sensors. Focusing on the biomedical applications of toroidal moments, we developed plasmonic metamaterials for fast and cost-effective infection diagnosis using the THz range of the spectrum. We used the exotic behavior of toroidal moments for the identification of Zika virus (ZIKV) envelope proteins as the infectious nano-agents through two protocols: 1) direct binding of targeted biomarkers to the plasmonic metasurfaces, and 2) attaching gold nanoparticles to the plasmonic metasurfaces and binding the proteins to the particles to enhance the sensitivity. This led to the development of ultrasensitive THz plasmonic metasensors for detection of nanoscale and low-molecular-weight biomarkers at picomolar concentrations.
In summary, by using high-quality and pronounced toroidal moments as sensitive resonances, we have successfully designed, fabricated, and characterized novel plasmonic toroidal metamaterials for the detection of infectious biomarkers using different methods. The proposed approach allowed us to compare and analyze the binding properties, sensitivity, repeatability, and limit of detection of the metasensing device.