1,942 research outputs found

    Towards a genome-wide transcriptogram: the Saccharomyces cerevisiae case

    A modular classification of the genome that associates cellular processes with modules could lead to a method for quantifying differences in gene expression levels between cellular stages or conditions: the transcriptogram, a powerful tool for assessing cell performance, would then be at hand. Here we present a computational method that orders genes on a line so that strongly interacting genes cluster together, defining functional modules associated with gene ontology terms. The starting point is a list of genes and a matrix specifying their interactions, both available from large gene interaction databases. For the Saccharomyces cerevisiae genome, we produced a succession of plots of gene transcription levels during a fermentation process. These plots discriminate the fermentation stage the cell is going through and may be regarded as the first versions of a transcriptogram. The method is useful for extracting information from cell stimulus/response experiments and may be applied for diagnostic purposes to different organisms.
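As a rough illustration of how an ordered gene list could be turned into a plot of transcription levels, the sketch below averages expression over a sliding window along the ordering. The abstract does not state how its plots are computed, so the windowed average, the function name `windowed_profile`, and the `radius` parameter are all assumptions made for illustration only.

```python
from typing import List, Sequence


def windowed_profile(expression: Sequence[float], radius: int) -> List[float]:
    """Average expression over a sliding window along an already-ordered gene list.

    `expression[i]` is the expression level of the gene at position i of the
    ordering. The window is truncated at both ends of the list.
    NOTE: hypothetical helper; not taken from the paper.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(expression)
    profile = []
    for i in range(n):
        lo = max(0, i - radius)          # left edge, clipped at the start
        hi = min(n, i + radius + 1)      # right edge (exclusive), clipped at the end
        window = expression[lo:hi]
        profile.append(sum(window) / len(window))
    return profile
```

If the ordering really does place interacting genes next to each other, a profile of this kind would vary smoothly within a module, which is what makes profiles from different conditions comparable position by position.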

    Clustering Algorithms for Microarray Data Mining

    This thesis presents a systems engineering model of modern drug discovery processes and the related systems integration requirements. Challenging problems include integrating public information content with proprietary corporate content, supporting different types of scientific analyses, and providing automated analysis tools for diverse forms of biological data. To capture the requirements of the discovery system, we identify the processes, users, and scenarios to form a UML use case model. We then define the object-oriented system structure and attach behavioral elements. We also look at how object-relational database extensions can be applied for such analysis. The next portion of the thesis studies the performance of clustering algorithms based on LVQ, SVMs, and other machine learning algorithms, applied to two types of analysis: functional and phenotypic classification. We found that LVQ initialized with the LBG codebook yields performance comparable to that of the optimal separating surfaces generated by related SVM kernels. We also describe a novel similarity measure, called the unnormalized symmetric Kullback-Leibler measure, based on unnormalized expression values. Since the Mercer criterion cannot be applied to this measure, we compared its performance with that of the log-Euclidean distance in the LVQ algorithm. The two distance measures perform similarly on cDNA arrays, while the unnormalized symmetric Kullback-Leibler measure outperforms the log-Euclidean distance on certain phenotypic classification problems. Pre-filtering algorithms to find discriminating instances based on PCA, the Find Similar function, and IB3 were also investigated. The Find Similar method gives the best performance in terms of multiple criteria.
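The abstract names the unnormalized symmetric Kullback-Leibler measure without defining it. One plausible reading, offered here only as an assumption, is the symmetrized generalized KL divergence for positive vectors that do not sum to one, which simplifies to the sum of (x_i - y_i) * log(x_i / y_i). The sketch below implements that reading; the function name is hypothetical and the thesis may define the measure differently.

```python
import math
from typing import Sequence


def symmetric_kl_unnormalized(x: Sequence[float], y: Sequence[float]) -> float:
    """Symmetrized generalized KL divergence between two positive vectors.

    ASSUMPTION: uses D(x||y) = sum(x*log(x/y) - x + y), so that
    D(x||y) + D(y||x) = sum((x - y) * log(x / y)).
    This is an illustrative guess at the measure, not the thesis's definition.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        if a <= 0 or b <= 0:
            raise ValueError("all values must be strictly positive")
        # Each term is non-negative: (a - b) and log(a / b) share the same sign.
        total += (a - b) * math.log(a / b)
    return total
```

Under this reading the measure is symmetric and zero only for identical vectors, but it need not satisfy the triangle inequality, so it acts as a dissimilarity score rather than a true metric.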

    Evidence for suppression of immunity as a driver for genomic introgressions and host range expansion in races of Albugo candida, a generalist parasite

    How generalist parasites with wide host ranges can evolve is a central question in parasite evolution. Albugo candida is an obligate biotrophic parasite that consists of many physiological races that each specialize on distinct Brassicaceae host species. By analyzing genome sequence assemblies of five isolates, we show they represent three races that are genetically diverged by ~1%. Despite this divergence, their genomes are mosaic-like, with ~25% being introgressed from other races. Sequential infection experiments show that infection by adapted races enables subsequent infection of hosts by normally non-infecting races. This facilitates introgression and the exchange of effector repertoires, and may enable the evolution of novel races that can undergo clonal population expansion on new hosts. We discuss recent studies on hybridization in other eukaryotes such as yeast, Heliconius butterflies, Darwin's finches, sunflowers and cichlid fishes, and the implications of introgression for pathogen evolution in an agro-ecological environment.

    Characterization of PratA and Tic22 proteins for functions in membrane biogenesis in Synechocystis sp. PCC 6803

    Identity Elements of Archaeal tRNA

    Features unique to a transfer-RNA are recognized by the corresponding tRNA-synthetase. Keeping this in view we isolate the discriminating features of all archaeal tRNA. These are our identity elements. Further, we investigate tRNA-characteristics that delineate the different orders of archaea

    Distribution and phylogeny of the bacterial translational GTPases and the Mqsr/YgiT regulatory system

    The electronic version of the dissertation does not include the publications. Proteins are vital for the cell: they serve as building blocks and as catalysts for many different reactions. Bioinformatics has equipped us with powerful sequence analysis tools. According to sequence similarity, proteins can be grouped into families. A protein family is composed of homologous sequences, i.e. sequences that share a common ancestor. Proteins that belong to the same family often have the same or closely related functions. Our knowledge of the functional properties of proteins originates from experimental work on a limited number of model organisms. Scientists are often interested in how universal or how specific a described protein or function is. How and when does gene duplication followed by innovation give rise to a protein with a new function? How often have such events taken place on the evolutionary timescale? In my dissertation I have analyzed gene and protein sequences of the bacterial translational GTPases (trGTPases) and of the mqsR/ygiT toxin-antitoxin (TA) system. The common denominator of both is the protein synthesis machinery: both are connected to the ribosome and, through it, to the cell's capacity to produce proteins as needed. Two types of questions can be asked in this context: those related to a) the representation of a protein family, and b) the evolution and functional innovation of a protein family. For the trGTPases in bacteria, nine different protein families, i.e. nine different sets of functions, can be described. Analyses of completed bacterial genome sequences revealed that four of the nine families are conserved: IF2, EF-Tu, EFG, and LepA (EF4). Although RF3 is assigned a canonical role in translation termination in the classical model of protein synthesis, the RF3 gene was missing in approximately 40% of the analyzed bacterial genomes. Surprisingly, LepA, whose function is still not well understood, appears to be a bacteria-specific trGTPase.
The analysis also revealed a wide distribution of EFG paralogs: many bacterial genomes contained two to three relatively diverged EFG gene copies. The phylogenetic tree of EFG revealed four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. The EFG I subfamily is experimentally well characterized. The spdEFGs have also been studied: spdEFG1 was found to act as a translocase and spdEFG2 to take part in ribosome recycling, indicating a functional split between co-occurring paralogs. However, little research has been done on the widely distributed EFG II subfamily. Our phylogenetic analyses allow us to propose a hypothesis of an ancient origin of the four EFG subfamilies: they arose before, or at the same time as, the divergence of the eukaryotic cell form from archaea and bacteria. Functional innovation common to the whole subfamily is carried primarily by 12 positions that are specifically conserved in EFG II. Against the high overall divergence characteristic of EFG II, these conserved positions stand out in the GTPase domain and in domains II and IV. Conserved changes in the GTPase domain, some of which are located in the GTP-binding G1 motif, indicate changed conditions for GTP binding and hydrolysis. Increased charge at the tip of the protruding loop of domain IV, which in E. coli EFG inserts into the A-site, allows us to speculate about changes in the local conditions of the A-site during translocation. Conserved changes in domain II indicate a changed interaction between the ribosome and EFG domains I, II, and III. Phylogenetic and sequence analysis of the EFG II subfamily clearly reveals phylum/class-specific sub-subgroups. These sub-subgroups differ from each other in the conservation pattern of the G2 motif and in the insertion/deletion pattern detected in the multiple sequence alignment. This second level characterizes EFG II as a phylum/class-specific factor. What role EFG II actually performs, and how and under what conditions it complements EFG I, remains to be answered. The current study can serve as a framework for future experiments.

    Genomic data mining for the computational prediction of small non-coding RNA genes

    The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative-based methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative-based methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. Automatic identification of ncRNAs on a genome-wide scale is immensely powerful when incorporated into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies.
    Ph.D. Committee Chair: Dr. G. Tong Zhou; Committee Member: Dr. Arthur Koblasz; Committee Member: Dr. Eberhard Voit; Committee Member: Dr. Xiaoli Ma; Committee Member: Dr. Ying X

    High-resolution QTL mapping in Tetranychus urticae reveals acaricide-specific responses and common target-site resistance after selection by different METI-I acaricides

    Arthropod herbivores cause dramatic crop losses, and frequent pesticide use has led to widespread resistance in numerous species. One such species, the two-spotted spider mite, Tetranychus urticae, is an extreme generalist herbivore and a major worldwide crop pest with a history of rapidly developing resistance to acaricides. Mitochondrial Electron Transport Inhibitors of complex I (METI-Is) have been used extensively in the last 25 years to control T. urticae around the globe, and widespread resistance to each has been documented. METI-I resistance mechanisms in T. urticae are likely complex, as increased metabolism by cytochrome P450 monooxygenases as well as a target-site mutation have been linked with resistance. To identify loci underlying resistance to the METI-I acaricides fenpyroximate, pyridaben and tebufenpyrad without prior hypotheses, we crossed a highly METI-I-resistant strain of T. urticae to a susceptible one, propagated many replicated populations over multiple generations with and without selection by each compound, and performed bulked segregant analysis genetic mapping. Our results showed that while the known H92R target-site mutation was associated with resistance to each compound, a genomic region that included cytochrome P450-reductase (CPR) was associated with resistance to pyridaben and tebufenpyrad. Within CPR, a single nonsynonymous variant distinguished the resistant strain from the sensitive one. Furthermore, a genomic region linked with tebufenpyrad resistance harbored a non-canonical member of the nuclear hormone receptor 96 (NHR96) gene family. This NHR96 gene does not encode a DNA-binding domain (DBD), an uncommon feature in arthropods, and belongs to an expanded family of 47 NHR96 proteins lacking DBDs in T. urticae. 
Our findings suggest that although cross-resistance to METI-Is involves known detoxification pathways, structural differences in METI-I acaricides have also resulted in resistance mechanisms that are compound-specific