
    INVESTIGATION OF SOME POSSIBLE ORIGINS OF PROTEIN FAMILIES

    ABSTRACT. Title of document: INVESTIGATION OF SOME POSSIBLE ORIGINS OF PROTEIN FAMILIES. Nuttinee Teerakulkittipong, Ph.D., 2013. Directed by: Professor John Moult, Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular Genetics. The prevailing view of the evolutionary history of proteins has been that all protein domains are descendants of distinct evolutionary lines, and that these lines are all relatively ancient families. The primary basis for that view was that known protein structures could be grouped by similarity of topology into a small number of folds. However, two lines of evidence challenge that view of protein evolution. First, analysis of sequence relationships within and between sets of complete genomes has established that a large proportion of protein sequence families are narrowly distributed in phylogenetic space and so appear to be relatively recent in origin. Second, analysis of the relationship between known protein structures shows that there are many more than 1,000 distinct folds, appearing to imply many more evolutionary lines. There are four hypotheses for the discrepancy between the traditional view and the observed structural and sequence distributions within protein families. Specifically, these are that apparently young protein families may arise from (1) previously non-coding DNA, or frame-shifting of existing coding sequence, (2) recombination of structural fragments between proteins or recombination with non-coding DNA, (3) older families where the rapid rate of sequence change makes relatives hard to detect, and (4) lateral gene transfer (LGT) from other organisms. In the investigation of these hypotheses, phylogenetic analysis provides a means of estimating the relative age of protein families and of detecting lateral gene transfer effects.
Phylogeny-based investigation of prokaryotic species divergence has generally been performed using a small number of families, resulting in significant bias that affects age analysis. Therefore, we decided to use information from many protein families for constructing a species tree, utilizing a new procedure for combining these diverse sources. The resulting tree for 66 prokaryotic species incorporates information from 1,379 protein families. The families were selected on the basis of consistent family evolutionary rates obtained using three different methods. Noise-resistant methods were used to combat the effects of lateral gene transfer and some inevitable errors in protein sequence alignment and identification of orthologous families. Most topological features of the tree are robust as assessed by bootstrap testing, and previous distortions of inter-kingdom distances and poor determination of short branch lengths have been corrected. The tree is used to obtain estimates of the age of all protein families, key to the investigation of all four hypotheses. Proteins affected by LGT events were detected using a previously developed method and removed before the age calculation. We used the estimated family ages obtained from the phylogenetic analysis to examine five properties of proteins as a function of the age of the corresponding families. The goal here is to ascertain whether the age dependence of these properties supports hypotheses (1) and (2) for the origin of apparently young families, that is, that these are truly new open reading frames. The five properties are the mRNA expression level, relative evolutionary rate, predicted percentage of structural disorder, number of protein interaction partners and codon composition bias.
The results are consistent with the new open reading frame model: Expression is found to increase substantially as a function of family age, suggesting that young proteins are not yet adapted sufficiently to tolerate high concentration conditions. The rate of amino acid change is faster for young proteins, consistent with overall positive selection for improved structural and functional properties. The fraction of predicted disorder is highest in the youngest proteins, consistent with immature structural properties. The number of known protein-protein interactions increases steadily with age, with low levels for young proteins, suggesting an ongoing process of increasing functional complexity. Analysis of these four factors is reported in Chapter 3. Results for the final factor, codon compositional bias, are reported in Chapter 4. Here we found that the codon composition of young proteins is markedly different from that of old proteins and similar to that of proteins constructed with random codon assignment. Thus the results are consistent with a model in which many young proteins have newly formed open reading frames and, during subsequent evolution, the codon composition is gradually optimized to fit the specific genomic conditions of the organism concerned. Overall, results for all five properties lend statistical support to the new open reading frame hypotheses. Further investigation is needed, however. In particular, examination of the structural properties of young proteins, such as super-secondary structure composition and the distribution of use of rare and common structural fragments, should be useful.

    Automated Genome-Wide Protein Domain Exploration

    Exploiting the exponentially growing genomics and proteomics data requires high-quality, automated analysis. Protein domain modeling is a key area of molecular biology as it unravels the mysteries of evolution, protein structures, and protein functions. A plethora of sequences exist in protein databases with incomplete domain knowledge. Hence, this research explores automated bioinformatics tools for faster protein domain analysis. Automated tool chains described in this dissertation generate new protein domain models, thus enabling more effective genome-wide protein domain analysis. To validate the new tool chains, the Shewanella oneidensis and Escherichia coli genomes were processed, resulting in a new peptide domain database, detection of poor domain models, and identification of likely new domains. The automated tool chains require months or years to model a small genome when executing on a single workstation. Therefore, the dissertation investigates grid computing and parallel processing approaches to significantly accelerate these bioinformatics tool chains.

    Operon Prediction with Bayesian Classifiers

    In this work, we present an approach to predicting transcription units based on Bayesian classifiers. The predictor uses publicly available data to train the classifier, such as genome sequence data from GenBank, expression values from microarray experiments, and a collection of experimentally verified transcription units. We have studied the importance of each data source to the performance of the predictor by developing three classifier models and evaluating their outcomes. The predictor was trained and validated on the E. coli genome, but can be extended to other organisms. Using the full Bayesian classifier, we were able to correctly identify 80% of gene pairs belonging to operons.

    Bacterial detection using an anharmonic acoustic aptasensor

    Infectious diseases are currently one of the greatest global challenges in medicine. Rapid and precise diagnosis and identification of the pathogen is important for timely initiation of appropriate antimicrobial therapy. However, many patients with infectious diseases receive empirical treatment rather than appropriate pathogen-directed therapy. As a result, antimicrobials have been overused and/or misused, which has ultimately led to antimicrobial resistance (AMR). AMR is broadly considered the most significant public health threat facing the world today. Policy makers from all over the world have recognised the urgent need for rapid point-of-care (POC) diagnostics that would not only identify pathogens but also provide antimicrobial susceptibility profiles in a meaningful timeframe to initiate appropriate antimicrobial therapy and thereby prevent AMR. Traditional culture-dependent diagnostic methods are still considered the gold standard, but they are very slow and generally require 18 to 48 hours, with a further 8 to 48 hours to perform an antibiotic susceptibility test. Among culture-independent methods, PCR and ELISA are label-based, costly and laborious, and require specialised equipment and trained personnel to operate them. Lateral flow assays (LFAs), which are low-cost, simple, rapid, paper-based portable detection platforms, are very popular, as they can be applied at the POC. [Continues.]

    Trends in Infectious Diseases

    This book gives a comprehensive overview of recent trends in infectious diseases, ranging from general concepts of infection, immunopathology, diagnosis, treatment, epidemiology and etiology to current clinical recommendations for the management of infectious diseases, and highlights ongoing issues, recent advances and future directions in diagnostic approaches and therapeutic strategies. The book focuses on various aspects and properties of infectious diseases, a deep understanding of which is very important for safeguarding the human race from further loss of resources and economies due to pathogens.

    Time to Diagnosis and Persistence: The Two Major Determinants of Effective Tuberculosis Control

    The greatest challenge confronting effective tuberculosis (TB) eradication is the time to diagnosis and the duration of treatment of chronically infected individuals, who represent a pool of infection. In an attempt to help limit the spread of TB in New Zealand, a fast SNP-based diagnostic test was developed to quickly identify the highly transmissible and virulent endemic Rangipo strain. The role of VapBC toxin-antitoxin systems in M. tuberculosis has been the subject of great interest recently, due to their expanded number in the genome and links with virulence and the regulation of cell growth in response to environmental stress. Their ability to regulate growth under adverse conditions for presumed survival advantages, possibly leading to dormancy or persistence, makes them ideal candidates for the development of new M. tuberculosis treatments. To establish differential expression of vapC, and therefore identify possible pathways and functions of the VapBC proteins, RT-qPCR was used to assess the expression levels of vapB and vapC in M. smegmatis under conditions of stress. No consistent changes in vapC mRNA levels were observed, leading to the hypothesis that it is not transcriptional differences that are important in the regulation of VapC, but post-transcriptional factors. In order to investigate the function(s) of M. tuberculosis VapBCs, these VapBC proteins were expressed and purified in M. smegmatis, and the VapC toxin was tested for RNase activity. The purification, expression, RNase testing and bioinformatic analysis of M. tuberculosis VapCs suggested that VapCRv2530c, VapCRv0065 and VapCRv0617 may all target the same recognition sequence, UA*GG.
Bioinformatic analysis revealed an abundance of this target sequence in horizontal gene transfer and TA genes, raising the possibility that VapC toxins could be functioning as selfish elements, or initiating transcriptional regulation cascades when a rapid change in the proteomic response and metabolic state of the cell is required. It is intriguing that the three M. tuberculosis VapC proteins tested thus far appear to target the same recognition sequence, possibly suggesting that all 47 VapCs are RNases and are targeting the same sequence. Alternatively, VapCs may belong to sub-groups targeting different sequences, allowing M. tuberculosis to exert both gross and fine metabolic control; or they may share the same target but be regulated by different activators triggered in response to different environmental stimuli.

    Innovative slow sand filtration for use in developing countries

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2002. Includes bibliographical references (leaves 65-70). By Heather A. Lukacs. M.Eng.

    Formyl-methionine as a degradation signal at the N-termini of bacterial proteins

    In bacteria, all nascent proteins bear the pretranslationally formed N-terminal formyl-methionine (fMet) residue. The fMet residue is cotranslationally deformylated by a ribosome-associated deformylase. The formylation of N-terminal Met in bacterial proteins is not strictly essential for either translation or cell viability. Moreover, protein synthesis by the cytosolic ribosomes of eukaryotes does not involve the formylation of N-terminal Met. What, then, is the main biological function of this metabolically costly, transient, and not strictly essential modification of N-terminal Met, and why has Met formylation not been eliminated during bacterial evolution? One possibility is that the similarity of the formyl and acetyl groups, their identical locations in N-terminally formylated (Nt-formylated) and Nt-acetylated proteins, and the recently discovered proteolytic function of Nt-acetylation in eukaryotes might also signify a proteolytic role of Nt-formylation in bacteria. We addressed this hypothesis about fMet-based degradation signals, termed fMet/N-degrons, using specific E. coli mutants, pulse-chase degradation assays, and protein reporters whose deformylation was altered, through site-directed mutagenesis, to be either rapid or relatively slow. Our findings strongly suggest that the formylated N-terminal fMet can act as a degradation signal, largely a cotranslational one. One likely function of fMet/N-degrons is the control of protein quality. In bacteria, the rate of polypeptide chain elongation is nearly an order of magnitude higher than in eukaryotes. We suggest that the faster emergence of nascent proteins from bacterial ribosomes is one mechanistic and evolutionary reason for the pretranslational design of bacterial fMet/N-degrons, in contrast to the cotranslational design of analogous Ac/N-degrons in eukaryotes.