76 research outputs found

    Regional distribution of boreal and nemoral biome tree plants in Latvia

    The electronic version does not contain the appendices.
    ANNOTATION. Krampis I. (2010) Regional distribution of boreal and nemoral biome tree plants in Latvia. Dissertation. University of Latvia, Riga, 128 pp. The aim of the research was to determine the spatial distribution patterns of native and foreign tree species in Latvia, so that changes in tree species distribution could be predicted under changing environmental conditions. During the study, a unified GIS database was created that brings together a number of scientific and applied databases (the plant community database; inventories of concrete floras, dendrological plantings, and parks; the digital forest map database of the State Forest Register; etc.). A methodological system for species distribution mapping was developed, based on modern GIS technologies. In the course of the research, the author participated in recent major biogeographical mapping projects in Latvia. As a result, two species distribution atlases were published: the Atlas of the Flora of the Lake Engure Nature Park and the Atlas of Latvian Woody Plants. For the first time in Latvia, a woody plant distribution database was prepared for digital publication: a methodology was established for publishing the Atlas of Latvian Woody Plants as a web-based map built on the woody plant GIS database mentioned above.
    The study analyzed 20 nemoral biome broadleaf tree species and 14 boreal biome coniferous tree species to identify regional distribution patterns of tree plants in Latvia, using the GIS database and the map material of the Atlas of Latvian Woody Plants as source data. Regularities in distribution were determined by assessing the influence of climatic and anthropogenic factors. For this purpose, gradient analysis of changes in species occurrence was carried out sectorally (west-east direction) and zonally (south-north direction). For stand-forming species, changes in occurrence were also assessed along changes in elevation. The distribution of native and naturalized foreign tree species marks out the regions of nemoral and boreal biome influence in Latvia: nemoral biome stand indicator species make up a larger share in the western part of the country, whereas those of the boreal coniferous biome do so in the north-eastern part. The distribution of broadleaf species is characterized by a gradient from the Baltic Sea coast to the eastern border of Latvia (west-east direction). The distribution of boreal biome woody plant species in Latvia is influenced by changes in altitude (hypsometric level). Key words: distribution of tree plants, biogeographical mapping, atlas of species, geographic information systems, Latvia

    Advantages of distributed and parallel algorithms that leverage Cloud Computing platforms for large-scale genome assembly

    Background: The transition to Next Generation Sequencing (NGS) technologies has had numerous applications in plant, microbial, and human genomics during the past decade. However, NGS trades high read throughput for shorter read length, which makes genome assembly more difficult. This research presents a comparison of traditional versus Cloud computing-based genome assembly software, using as examples the Velvet and Contrail assemblers and reads from the genome sequence of the zebrafish (Danio rerio) model organism. Results: The first phase of the analysis involved a subset of the zebrafish data set (2x coverage). The best results were obtained using a k-mer size of 65, and Velvet took less time than Contrail to complete the assembly. In the next phase, genome assembly was attempted using the full dataset at 192x read coverage. Velvet failed to complete on a compute server with 256 GB of memory, while Contrail completed but required 240 hours of computation. Conclusion: When deciding which assembler to use, the size of the dataset and the available computing hardware should be taken into consideration. For a relatively small sequencing dataset, such as a microbial or small eukaryotic genome, the Velvet assembler is a good option. However, for larger datasets Velvet requires large-memory compute servers on the order of 1000 GB or more. Contrail, on the other hand, is implemented using Hadoop, which performs the assembly in parallel across the nodes of a compute cluster. Furthermore, Hadoop clusters can be rented on demand from Cloud computing providers, and therefore Contrail can provide a simple and cost-effective way to assemble genomes from data generated at laboratories that lack the infrastructure or funds to build their own clusters.
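    To make the role of the k-mer size concrete, here is a minimal Python sketch of the k-mer decomposition that de Bruijn graph assemblers such as Velvet and Contrail build on. It is an illustration only, not code from either assembler; the function names and the toy reads are assumptions, and a small k is used in place of the k = 65 reported above.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping substring of length k from a read.

    A read of length L yields L - k + 1 k-mers (none if L < k).
    """
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Toy reads (hypothetical); real reads must be at least k bases long.
reads = ["ACGTACGT", "CGTACGTA"]
counts = count_kmers(reads, k=5)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

    The number of distinct k-mers that must be held in memory grows with genome size and sequencing error rate, which is one reason a single-server assembler can run out of memory on a large, deeply sequenced genome.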

    Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, so that from a user perspective running a pipeline is as simple as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.

    In Vitro Mutational and Bioinformatics Analysis of the M71 Odorant Receptor and Its Superfamily

    We performed an extensive mutational analysis of the canonical mouse odorant receptor (OR) M71 to determine the properties of ORs that inhibit plasma membrane trafficking in heterologous expression systems. We used the M71::GFP fusion protein to directly assess plasma membrane localization and functionality of M71 in heterologous cells in vitro or in olfactory sensory neurons (OSNs) in vivo. OSN expression of M71::GFP shows only small differences in activity compared to untagged M71. However, M71::GFP could not traffic to the plasma membrane even in the presence of the proposed accessory proteins RTP1S or mβ2AR. To ask whether ORs contain an internal “kill sequence”, we mutated ~15 of the most highly conserved OR-specific amino acids not found amongst the trafficking non-OR GPCR superfamily; none of these mutants rescued trafficking. Addition of various amino terminal signal sequences or different glycosylation motifs all failed to produce trafficking. Neither the addition of the amino and carboxy terminal domains of mβ2AR nor the mutation Y289A in the highly conserved GPCR motif NPxxY rescued plasma membrane trafficking. The failure of targeted mutagenesis to rescue plasma membrane localization in heterologous cells suggests that OR trafficking deficits may not be attributable to conserved collinear motifs, but rather to the overall amino acid composition of the OR family. Thus, we performed an in silico analysis comparing the OR and other amine receptor superfamilies. We find that ORs contain fewer charged residues and more hydrophobic residues distributed throughout the protein, and a conserved overall amino acid composition. From our analysis, we surmise that it may be difficult to traffic ORs at high levels to the cell surface in vitro without making significant amino acid modifications. Finally, we observed specific increases in methionine and histidine residues as well as a marked decrease in tryptophan residues, suggesting that these changes provide ORs with special characteristics needed for them to function in olfactory neurons.

    RSEQREP: RNA-Seq Reports, an open-source cloud-enabled framework for reproducible RNA-Seq data processing, analysis, and result reporting

    RNA-Seq is increasingly being used to measure human RNA expression on a genome-wide scale. Expression profiles can be interrogated to identify and functionally characterize treatment-responsive genes. Ultimately, such controlled studies promise to reveal insights into molecular mechanisms of treatment effects, identify biomarkers, and realize personalized medicine. RNA-Seq Reports (RSEQREP) is a new open-source cloud-enabled framework that allows users to execute start-to-end gene-level RNA-Seq analysis on a preconfigured RSEQREP Amazon Virtual Machine Image (AMI) hosted by AWS or on their own Ubuntu Linux machine. The framework works with unstranded, stranded, and paired-end sequence FASTQ files stored locally, on Amazon Simple Storage Service (S3), or at the Sequence Read Archive (SRA). RSEQREP automatically executes a series of customizable steps including reference alignment, CRAM compression, reference alignment QC, data normalization, multivariate data visualization, identification of differentially expressed genes, heatmaps, co-expressed gene clusters, enriched pathways, and a series of custom visualizations. The framework outputs a file collection that includes a dynamically generated PDF report using R, knitr, and LaTeX, as well as publication-ready table and figure files. A user-friendly configuration file handles sample metadata entry, processing, analysis, and reporting options. The configuration supports time series RNA-Seq experimental designs with at least one pre- and one post-treatment sample for each subject, as well as multiple treatment groups and specimen types. All RSEQREP analysis components are built using open-source R code and R/Bioconductor packages, allowing for further customization. As a use case, we provide RSEQREP results for a trivalent influenza vaccine (TIV) RNA-Seq study that collected 1 pre-TIV and 10 post-TIV vaccination samples (days 1-10) for 5 subjects and two specimen types (peripheral blood mononuclear cells and B-cells).

    Fibronectin and androgen receptor expression data in prostate cancer obtained from a RNA-sequencing bioinformatics analysis

    Prostate cancer is the second most commonly diagnosed male cancer in the world. The molecular mechanisms underlying its development and progression are still unclear. Here we show an analysis of a prostate cancer RNA-sequencing dataset that was originally generated by Ren et al. [3] from the prostate tumor and adjacent normal tissues of 14 patients. The data presented here were analyzed using our RNA-sequencing bioinformatics analysis pipeline implemented on the bioinformatics web platform Galaxy. The relative expression of fibronectin (FN1) and the androgen receptor (AR) was calculated in fragments per kilobase of transcript per million mapped reads (FPKM). A subanalysis is also shown for data from three patients, which includes the relative expression of FN1 and AR and their fold change. For interpretation and discussion, please refer to the article “miR-1207-3p regulates the androgen receptor in prostate cancer via FNDC1/fibronectin” [1] by Das et al.
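    For readers unfamiliar with the unit, the following Python sketch shows the standard FPKM formula and a plain tumor-to-normal fold change. It is not the Galaxy pipeline used for the dataset; the function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def fold_change(tumor_fpkm, normal_fpkm):
    """Ratio of tumor to normal expression; undefined when normal is zero."""
    if normal_fpkm == 0:
        raise ValueError("fold change is undefined for zero normal expression")
    return tumor_fpkm / normal_fpkm


# Hypothetical counts for one gene in a matched tumor/normal pair.
normal = fpkm(fragments=500, transcript_length_bp=2000,
              total_mapped_fragments=20_000_000)
tumor = fpkm(fragments=1000, transcript_length_bp=2000,
             total_mapped_fragments=20_000_000)
print(normal, tumor, fold_change(tumor, normal))
```

    Dividing by transcript length and by sequencing depth is what makes FPKM values comparable between genes of different lengths and between samples sequenced to different depths.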

    Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data

    Background: Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases, and sometimes isolated hard disks that are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search, and compute on it. Results: To address the above challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, which is used upstream of a High-performance Integrated Virtual Environment (HIVE) that encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program, The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous Single Nucleotide Variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP, and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (hive.biochemistry.gwu.edu/tools/csr/SRARecords_Curated.php). Conclusions: The availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual-level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
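    As a small illustration of what "non-synonymous" means in this workflow, the Python sketch below classifies a single-base change within one codon using the standard genetic code. It is not the HIVE or SNVDis implementation; the function name, its arguments, and the example codons are assumptions made for this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(codon, position, alt_base):
    """Classify a single-base substitution inside a coding-strand codon.

    position is the 0-based index (0, 1, or 2) of the changed base.
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_snv("GAA", 2, "G"))  # GAA -> GAG, both encode glutamate
print(classify_snv("GAG", 1, "T"))  # GAG -> GTG, glutamate to valine
```

    A real pipeline would first map each variant position onto a transcript and reading frame, and would treat changes to or from a stop codon separately; this sketch assumes the affected codon is already known.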