
    Genomics and proteomics: a signal processor's tour

    The theory and methods of signal processing are becoming increasingly important in molecular biology. Digital filtering techniques, transform-domain methods, and Markov models have played important roles in gene identification, biological sequence analysis, and alignment. This paper contains a brief review of molecular biology, followed by a review of the applications of signal processing theory. These include the problem of gene finding using digital filtering, and the use of transform-domain methods in the study of protein binding spots. The relatively new topic of noncoding genes, and the associated problem of identifying ncRNA buried in DNA sequences, are also described. This includes a discussion of hidden Markov models and context-free grammars. Several new directions in genomic signal processing are briefly outlined at the end.
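    A widely used signal-processing view of gene finding rests on the period-3 property of protein-coding DNA: the sequence is mapped to four binary indicator sequences (one per base), and their combined spectral content at frequency 2π/3 tends to be elevated in coding regions. The Python sketch below computes that measure over sliding windows. It can be read as a simple band-pass filter centred at 2π/3; it does not reproduce the specific filter designs surveyed in the paper, and the function names and the default window length of 351 bases are illustrative assumptions.

```python
import cmath
from typing import List


def period3_power(window: str) -> float:
    """Sum over bases b of |U_b(2*pi/3)|^2, where U_b is the DFT of the
    binary indicator sequence of b (1 where window[n] == b, else 0)."""
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, b in enumerate(window.upper())
            if b == base
        )
        total += abs(coeff) ** 2
    return total


def period3_profile(seq: str, window: int = 351, step: int = 3) -> List[float]:
    """Slide a window along seq and record the period-3 power of each window.
    The default window length is an assumption, not a value from the paper."""
    return [
        period3_power(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]
```

    A repeat such as "GCA" * 10 puts all of its indicator energy at 2π/3 and scores N²/3 = 300, whereas a homopolymer whose length is a multiple of three scores essentially zero; peaks in the profile of a long genomic sequence are candidate coding regions.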

    Analysis of Protein Sequences Using Time Frequency and Kolmogorov-Smirnov Methods

    The wealth of genomic data currently available has prompted a search for new algorithms and analysis techniques to interpret it. In this two-part study we explore techniques for locating critical amino acid residues in protein sequences and for estimating the similarity between proteins. We demonstrate the use of the Short-Time Fourier Transform and the Continuous Wavelet Transform, together with amino acid hydrophobicity, to locate important amino acid domains in proteins, and we show that the Kolmogorov-Smirnov statistic can be used as a metric of protein similarity.
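    The abstract does not state which hydrophobicity scale is used or exactly how the distributions are compared. Assuming the Kyte-Doolittle scale, the Python sketch below maps each protein to per-residue hydrophobicity values and computes the two-sample Kolmogorov-Smirnov statistic D = max over x of |F1(x) - F2(x)| between their empirical distribution functions; a smaller D indicates more similar hydrophobicity profiles. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List

# Kyte-Doolittle hydropathy values (assumed scale; the study may use another).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(seq: str) -> List[float]:
    """Numerical representation of a protein: one hydrophobicity value per residue."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs.
    The supremum of a difference of step functions is attained at a data point."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for v in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, v) / len(a_sorted)
        fb = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d
```

    For example, "AI" and "AR" share one residue value (1.8) and differ in the other, giving D = 0.5, while an all-isoleucine and an all-arginine peptide give D = 1.0.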

    The 1st Symposium on Chemical Evolution and the Origin and Evolution of Life

    This symposium gave all NASA Exobiology principal investigators an opportunity to present their most recent research in a scientific meeting forum. Papers were presented in the following exobiology areas: extraterrestrial chemistry, primitive Earth, information transfer, solar system exploration, planetary protection, geological record, and early biological evolution.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Bioactive peptide design using the Resonant Recognition Model

    With a large number of DNA and protein sequences already known, the crucial question is how the biological function of these macromolecules is "written" in the sequence of nucleotides or amino acids. Biological processes in any living organism are based on selective interactions between particular biomolecules, mostly proteins. The rules governing the coding of a protein's biological function, i.e. its ability to selectively interact with other molecules, have still not been elucidated. In addition, with the rapid accumulation of databases of protein primary structures, there is an urgent need for theoretical approaches capable of analysing protein structure-function relationships. The Resonant Recognition Model (RRM) [1,2] is one attempt to identify the selectivity of protein interactions within the amino acid sequence. The RRM [1,2] is a physico-mathematical approach that interprets the linear information of a protein sequence using digital signal processing methods. In the RRM, the protein primary structure is represented as a numerical series by assigning to each amino acid in the sequence a physical parameter value relevant to the protein's biological activity. The RRM concept is based on the finding that there is a significant correlation between the spectra of this numerical representation of amino acids and their biological activity. Once the characteristic frequency for a particular protein function or interaction is identified, the RRM approach can be used to predict the amino acids in the protein sequence that contribute predominantly to this frequency, and thus to the observed function, and to design de novo peptides having the desired periodicities. As shown in our previous studies of fibroblast growth factor (FGF) peptidic antagonists [2,3] and human immunodeficiency virus (HIV) envelope agonists [2,4], such de novo designed peptides express the desired biological function.
    This study applies the RRM computational approach to the analysis of oncogene and proto-oncogene proteins. The results show that the RRM is capable of identifying the differences between oncogenic and proto-oncogenic proteins, with the possibility of identifying "cancer-causing" features within their protein primary structure. In addition, the rational design of bioactive peptide analogues displaying oncogenic or proto-oncogenic-like activity is presented.
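    The RRM steps described above (numerical mapping, Fourier transform, comparison of spectra across a functional group of proteins) can be sketched in Python as follows. The sketch assumes the electron-ion interaction potential (EIIP) scale as commonly tabulated in the RRM literature (check the values against [1,2] before relying on them), subtracts each sequence's mean, zero-pads all sequences to a common length, and takes the characteristic frequency as the peak of the product of their magnitude spectra. Function names are hypothetical, and the direct O(N²) DFT is used only to keep the example dependency-free.

```python
import cmath
from typing import Dict, List

# EIIP values per residue, as commonly tabulated for the RRM (assumed; verify).
EIIP: Dict[str, float] = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(seq: str) -> List[float]:
    """Map residues to EIIP values and remove the mean (suppresses the DC term)."""
    values = [EIIP[aa] for aa in seq.upper()]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def magnitude_spectrum(x: List[float], n_points: int) -> List[float]:
    """|DFT| at bins k = 0..n_points//2 after zero-padding x to n_points samples."""
    padded = x + [0.0] * (n_points - len(x))
    return [
        abs(sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]


def characteristic_frequency(seqs: List[str]) -> float:
    """Peak of the product of magnitude spectra (a cross-spectral comparison),
    in cycles per residue, searched over 0 < f <= 0.5."""
    n_points = max(len(s) for s in seqs)
    cross = [1.0] * (n_points // 2 + 1)
    for s in seqs:
        spectrum = magnitude_spectrum(eiip_signal(s), n_points)
        cross = [c * m for c, m in zip(cross, spectrum)]
    k_peak = max(range(1, len(cross)), key=lambda k: cross[k])
    return k_peak / n_points
```

    A frequency shared by all proteins in the group survives the multiplication, while frequencies present in only some of them are suppressed; for two alternating Asp/Leu peptides the peak falls at 0.5 cycles per residue.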

    Robust Odorant Recognition in Biological and Artificial Olfaction

    Accurate detection and identification of gases poses a number of challenges for chemical sensory systems. The stimulus space is enormous: volatile compounds vary in size, charge, functional groups, and isomerization, among other properties. Furthermore, variability arises from intrinsic sources (poisoning of the sensors or degradation due to aging) and extrinsic, environmental sources (humidity, temperature, flow patterns). Nonetheless, biological olfactory systems have been refined over time to overcome these challenges. The main objective of this work is to understand how the biological olfactory system deals with these challenges and to translate these solutions to artificial olfaction to achieve comparable capabilities. In particular, this thesis focuses on the design and computing mechanisms that allow a relatively simple invertebrate olfactory system to robustly recognize odorants even though its sensory neuron inputs may vary due to the intrinsic or extrinsic factors identified above. In biological olfaction, signal processing in the central circuits is largely shielded from variations in the periphery arising from the constant replacement of older olfactory sensory neurons with newer ones. Inspired by this design principle, we developed an analytical method in which the operation of a temperature-programmed chemiresistor is treated as a mathematical input/output (I/O) transform. Results show that the I/O transform is unique for each analyte-transducer combination, robust to sensor aging, and highly reproducible across sensors of equal manufacture. This decouples the signal processing algorithms from the chemical transducer and thereby allows seamless replacement of the sensor array while the signal processing approach is kept constant. This is a key advance necessary for achieving long-term, non-invasive chemical sensing.
    Next, we explored how the biological system maintains invariance as environmental conditions change, particularly humidity levels. At the sensory level, odor-evoked responses did not vary with changes in humidity; however, spontaneous activity varied significantly. Nevertheless, in the central antennal lobe circuits, ensembles of projection neurons robustly encoded information about odorant identity and intensity irrespective of humidity. Interestingly, variations in humidity led to variable compression of intensity information, which was carried forward to behavior. Taken together, these results indicate how central neural circuits in the biological olfactory system diminish the influence of humidity. Finally, we explored a potential biomedical application where a robust chemical sensing approach would be immensely useful: a non-invasive assay for malaria diagnosis based on exhaled breath analysis. We developed a method to screen gas chromatography/mass spectrometry (GC/MS) traces of human breath and identified 6 compounds whose abundance changes in malaria-infected patients and that could potentially serve as exhaled-breath biomarkers for diagnosis. We conclude with a discussion of ongoing efforts to develop a non-invasive solution for diagnosing malaria based on breath volatiles. In sum, this work seeks to understand the basis for robust odor recognition in biological olfaction and proposes bioinspired and statistical solutions for achieving the same abilities in artificial chemical sensing systems.

    QSAR analysis on tacrine-related acetylcholinesterase inhibitors

    The evaluation of the clinical effects of tacrine has shown efficacy in delaying the deterioration of the symptoms of Alzheimer's disease, while confirming adverse events consisting mainly of elevated liver transaminase levels. The study of tacrine analogs remains of continuing interest, and for this reason we establish Quantitative Structure-Activity Relationships (QSARs) for their acetylcholinesterase (AChE) inhibitory activity. Ten groups of newly developed tacrine-related inhibitors are explored, whose activities were measured experimentally under different biochemical conditions and with different AChE sources. The number of descriptors included in each structure-activity relationship is determined by the 'Rule of Thumb'. The 1502 molecular descriptors applied could provide the best linear models for the selected Alzheimer's database, and the best QSAR model is reported for the considered data sets. The QSAR models developed in this work have satisfactory predictive ability and are obtained by selecting the most representative molecular descriptors of the chemical structure from more than a thousand descriptors of constitutional, topological, geometrical, quantum-mechanical, and electronic types.
    Affiliation: Wong, Kai Y. Imperial College London; United Kingdom
    Affiliation: Mercader, Andrew Gustavo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico la Plata. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Argentina. Universidad Nacional de La Plata; Argentina
    Affiliation: Saavedra Reyes, Laura Marcela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico la Plata. Centro de Investigación y Desarrollo En Ciencias Aplicadas; Argentina. Universidad Nacional de La Plata; Argentina
    Affiliation: Honarparvar, Bahareh. University of Kwazulu-Natal; South Africa
    Affiliation: Romanelli, Gustavo Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico la Plata. Centro de Investigación y Desarrollo En Ciencias Aplicadas; Argentina. Universidad Nacional de La Plata; Argentina
    Affiliation: Duchowicz, Pablo Román. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico la Plata. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Argentina. Universidad Nacional de La Plata; Argentina
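    To make the descriptor-count limit concrete, the Python sketch below fits multiple linear regression QSARs by ordinary least squares and adds descriptors greedily while a cap holds. Two assumptions are made: the 'Rule of Thumb' is taken to mean at least five compounds per descriptor, and descriptors are chosen by forward selection on R², which is not necessarily the selection procedure used by the authors. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float],
            cols: List[int]) -> Tuple[List[float], float]:
    """OLS of y on an intercept plus the chosen descriptor columns; returns (coefs, R^2)."""
    rows = [[1.0] + [float(x[c]) for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations X^T X beta = X^T y
    pred = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def forward_select(X: Sequence[Sequence[float]], y: Sequence[float],
                   compounds_per_descriptor: int = 5) -> Tuple[List[int], float]:
    """Greedy descriptor selection capped at len(y) // compounds_per_descriptor."""
    max_descriptors = max(1, len(y) // compounds_per_descriptor)
    chosen: List[int] = []
    best_r2 = float("-inf")
    while len(chosen) < max_descriptors:
        candidates = []
        for c in range(len(X[0])):
            if c in chosen:
                continue
            try:
                candidates.append((fit_mlr(X, y, chosen + [c])[1], c))
            except ValueError:  # collinear descriptor set
                continue
        if not candidates:
            break
        r2, c = max(candidates)
        if r2 <= best_r2 + 1e-9:  # no meaningful improvement
            break
        chosen.append(c)
        best_r2 = r2
    return chosen, best_r2
```

    With ten compounds the cap is two descriptors; if the activity is an exact linear function of one descriptor, selection stops after that descriptor because a second one cannot raise R² further.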

    New approaches for unsupervised transcriptomic data analysis based on Dictionary learning

    The era of high-throughput data generation enables new access to biomolecular profiles and their exploitation. However, the analysis of such biomolecular data, for example transcriptomic data, suffers from the so-called "curse of dimensionality", which arises when a dataset has far more variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can occur, which can lead to results that are not meaningful in application. A common way of counteracting this problem is to apply a dimension reduction method and then analyse the resulting low-dimensional representation, which has fewer variables. In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, an unsupervised dimension reduction approach. Unlike many dimension reduction approaches widely applied in transcriptomic data analysis, Dictionary learning does not impose constraints on the components to be derived. This allows great flexibility in adapting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods, which yield a model with few non-zero coefficients that is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured, and transcriptomic data are particularly structured, for example because genes are connected through pathways. Nonetheless, the application of Dictionary learning in medical data analysis has so far been mainly restricted to image analysis. Another advantage of Dictionary learning is that it is interpretable, and interpretability is a necessity in biomolecular data analysis for gaining a holistic understanding of the investigated processes.
    Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups of samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared with other methods widely applied in transcriptomic data analysis. Our methods achieve high performance and, overall, outperform the comparison methods.
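    To illustrate how a sparse dictionary representation can expose sample subgroups, the Python sketch below learns unit-norm atoms under the strongest sparsity constraint (one non-zero coefficient per sample). It alternates a sparse-coding step with a K-SVD-style rank-1 atom update, computed here by power iteration, and uses each sample's active atom as its subgroup label. This is a simplified illustration, not either of the thesis's methods: the farthest-first initialisation, the one-atom sparsity, and all function names are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vector) -> Vector:
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a] if norm > 0 else a[:]


def init_atoms(samples: List[Vector], k: int) -> List[Vector]:
    """Farthest-first start: each new atom is the sample least aligned with existing atoms."""
    atoms = [normalize(samples[0])]
    while len(atoms) < k:
        def alignment(x: Vector) -> float:
            u = normalize(x)
            return max(abs(dot(u, d)) for d in atoms)
        atoms.append(normalize(min(samples, key=alignment)))
    return atoms


def sparse_code(x: Vector, atoms: List[Vector]) -> Tuple[int, float]:
    """1-sparse coding: the atom with the largest |<x, d>| and its coefficient."""
    j = max(range(len(atoms)), key=lambda i: abs(dot(x, atoms[i])))
    return j, dot(x, atoms[j])


def learn_dictionary(samples: List[Vector], k: int, n_iter: int = 20,
                     power_steps: int = 10) -> Tuple[List[Vector], List[Tuple[int, float]]]:
    """Rows of `samples` are samples (e.g. expression profiles over genes)."""
    atoms = init_atoms(samples, k)
    for _ in range(n_iter):
        labels = [sparse_code(x, atoms)[0] for x in samples]
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            d = atoms[j]
            for _ in range(power_steps if members else 0):
                # Power iteration on sum_i x_i x_i^T: converges to the leading
                # eigenvector, i.e. the rank-1 atom update for a 1-sparse code.
                new = [sum(dot(x, d) * x[t] for x in members) for t in range(len(d))]
                if dot(new, new) == 0.0:
                    break
                d = normalize(new)
            atoms[j] = d
    return atoms, [sparse_code(x, atoms) for x in samples]
```

    With k atoms and one active atom per sample, this reduces to a gain-shape clustering of the samples; allowing several non-zero coefficients per sample recovers the more general sparse representation described in the abstract.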

    Antibiotic Molecular Design Using Artificial Bee Colony Algorithm

    Research is acutely needed to develop novel therapies for resistant infections. This project aims to design drug molecules via a computer-aided molecular design approach to provide lead candidates for the treatment of bacterial infections caused by Staphylococcus aureus. A recently published WHO report listed the bacteria that pose the greatest threat to human health; its purpose was to identify the most important resistant bacteria at the global level for which new treatments are urgently needed. Staphylococcus aureus, which is on this list, is a pathogen causing infections such as pneumonia and bone disorders. A methodology that determines the structures of candidate antibiotic molecules is described. In this work, the Artificial Bee Colony (ABC) algorithm is used for molecular design for the first time. Designing compounds requires predicting their physical and/or biological properties. Property prediction is performed using Quantitative Structure-Property Relationships (QSPRs), which are equations developed by regression analysis on reported data for the properties of interest. This work uses connectivity indices and 3D MoRSE descriptors to develop QSPRs; the properties considered are minimum inhibitory concentration and log P. 3D MoRSE descriptors are also used for molecular design for the first time in this work. The QSPRs are combined with structural feasibility and connectivity constraints to formulate an optimization problem, a mixed-integer nonlinear program (MINLP). Because of the large number of potential chemical structures and the uncertainty in the structure-property correlations, stochastic algorithms are preferred for solving the resulting MINLP.
    One stochastic algorithm that has shown promise for such problems is the Artificial Bee Colony algorithm, which relies on principles of swarm intelligence to find near-optimal solutions efficiently. The ABC algorithm described in this work is used to derive solutions that serve as lead compounds for a narrowed search for novel antibiotics. Results show that the ABC algorithm is very effective in finding near-optimal solutions to the MINLP, a combinatorial optimization problem. Molecular structures were obtained by optimizing the objective function for each property individually and for both properties simultaneously.
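    The abstract applies ABC to a mixed-integer molecular design problem. The Python sketch below shows only the basic continuous ABC loop (employed bees, fitness-proportional onlooker bees, and scouts that abandon a food source after `limit` failed improvement attempts) on a toy box-constrained objective. Encoding molecular structures, the QSPR objectives, and the feasibility constraints is not attempted, and the parameter defaults are illustrative assumptions rather than the values used in the work.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_sources: int = 20,
    limit: int = 50,
    max_cycles: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise f over a box with a basic Artificial Bee Colony loop."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def fitness(value: float) -> float:
        # Usual ABC transform: lower objective value -> higher fitness.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    foods = [random_source() for _ in range(n_sources)]
    values = [f(x) for x in foods]
    trials = [0] * n_sources
    first = min(range(n_sources), key=lambda i: values[i])
    best_x, best_v = foods[first][:], values[first]

    def explore(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        lo, hi = bounds[d]
        cand = foods[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (cand[d] - foods[k][d])
        cand[d] = min(hi, max(lo, cand[d]))
        v = f(cand)
        if v < values[i]:  # greedy selection keeps the better source
            foods[i], values[i], trials[i] = cand, v, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(n_sources):  # employed bee phase
            explore(i)
        fits = [fitness(v) for v in values]
        total = sum(fits)
        for _ in range(n_sources):  # onlooker phase: roulette-wheel choice
            r, acc, chosen = rng.uniform(0.0, total), 0.0, n_sources - 1
            for i, fit in enumerate(fits):
                acc += fit
                if acc >= r:
                    chosen = i
                    break
            explore(chosen)
        for i in range(n_sources):  # remember the best source found so far
            if values[i] < best_v:
                best_x, best_v = foods[i][:], values[i]
        for i in range(n_sources):  # scout phase: abandon exhausted sources
            if trials[i] > limit:
                foods[i] = random_source()
                values[i] = f(foods[i])
                trials[i] = 0
                if values[i] < best_v:
                    best_x, best_v = foods[i][:], values[i]
    return best_x, best_v
```

    For example, `abc_minimize(lambda x: sum(t * t for t in x), [(-5.0, 5.0)] * 2, seed=1)` searches the 2-D sphere function; the onlooker phase concentrates effort on good sources, while the scout phase restores diversity when a source stops improving.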