15 research outputs found

    Experiences with workflows for automating data-intensive bioinformatics

    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks at large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

    Monte Carlo Simulations of the Equilibrium Properties of Semi-stiff Polymer Chains : Efficient Sampling from Compact to Extended Structures

    Polymers are a class of molecules which can have many different structures due to a large number of degrees of freedom. Many biopolymers, e.g. DNA, but also synthetic macromolecules, have special structural features due to their backbone stiffness. Since such structural properties are important for, e.g., biological function, a lot of effort has been put into the investigation of the configurational properties of semi-stiff molecules. A theoretical treatment of these systems is often accompanied by computer simulations. The main idea is to compare theoretically derived models with experimental results for real polymers. Using Monte Carlo simulations, I have investigated how this computational technique can build a bridge between theoretical models and experimentally observed phenomena. The effort was mainly directed at developing sampling techniques for efficiently exploring the configurational space of semi-stiff chains over a wide range of structures. The work was concentrated on compact conformations, since they, as is well known from previous studies, are difficult to sample using conventional methods. In my studies I have shown that the simple and, at first glance, time-consuming method of bead-by-bead regrowth as a way of changing the configuration of a semi-stiff chain gave very promising results when combined with modern simulation techniques, such as Entropic Sampling with the Wang-Landau algorithm. The resulting simulation package was also suitable for parallelization, which resulted in a further speed-up of the calculations. In addition to the more elaborate sampling methods, I also investigated external conditions that induce compaction of a semi-stiff polymer. In the case of a polyampholyte the condensing agent could be a multivalent salt, creating effective attraction between the loops of the chain, while for neutral polymers an external field and the geometry of the confining volume can induce compaction.

    The fallacy of the closest antenna: Towards an adequate view of device location in the mobile network

    The partition of the Mobile Phone Network (MPN) service area into the cell towers' Voronoi polygons (VP) may serve as a coordinate system for representing the location of mobile phone devices. This view is shared by numerous papers that exploit mobile phone data for studying human spatial mobility. We investigate the credibility of this view by comparing volunteers' locational data of two kinds: (1) the cell towers that served the volunteers' connections and (2) the GPS tracks of the users at the time of connection. In more than 60% of connections, the user's mobile device was found outside the VP of the cell tower that served the connection. We demonstrate that the area of a device's possible location is many times larger than the area of the cell tower's VP. To comprise 90% of the possible locations of the devices that may be connected to a cell tower, one has to consider the tower's VP together with the two rings of VPs adjacent to the tower's VP. An additional, third ring of adjacent VPs is necessary to comprise 95% of the possible locations of the devices that can be connected to a cell tower. The revealed location uncertainty is in the nature of the MPN structure and service and entails essential overlap between the cell towers' service areas. We discuss the far-reaching consequences of this uncertainty with regard to estimating locational privacy and urban mobility. Our results undermine today's dominant opinion that an adversary who obtains access to the database of Call Detail Records maintained by the MPN operator can identify a mobile device without knowing its number, based on a very short sequence of time-stamped field observations of the user's connection. Comment: 17 pages, 10 figures, 1 table
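The comparison described in this abstract rests on a simple geometric fact: a device lies inside a tower's Voronoi polygon exactly when that tower is the nearest one. The sketch below illustrates that test only. It is not the authors' code; the function names, the planar (already projected) coordinates and the toy data are all assumptions made for illustration.

```python
import math

def nearest_tower(device, towers):
    """Index of the tower closest to the device (planar Euclidean distance)."""
    return min(range(len(towers)), key=lambda i: math.dist(device, towers[i]))

def in_serving_voronoi_cell(device, towers, serving_index):
    """True if the device lies in the Voronoi polygon of its serving tower."""
    return nearest_tower(device, towers) == serving_index

def share_outside_cell(connections, towers):
    """Fraction of connections whose device is outside the serving tower's polygon."""
    outside = sum(
        1
        for device, serving_index in connections
        if not in_serving_voronoi_cell(device, towers, serving_index)
    )
    return outside / len(connections)

# Hypothetical toy data: three towers and four (GPS position, serving tower) pairs.
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
connections = [
    ((1.0, 1.0), 0),  # served by its nearest tower
    ((6.0, 1.0), 0),  # nearest tower is 1, but served by 0
    ((2.0, 7.0), 2),  # served by its nearest tower
    ((4.0, 4.0), 1),  # nearest tower is 0, but served by 1
]

print(share_outside_cell(connections, towers))
```

Real GPS tracks would first have to be projected to a planar coordinate system, and ties between equidistant towers would need an explicit rule; both are left out of this sketch.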

    GPCR-ModSim : A comprehensive web based solution for modeling G-protein coupled receptors

    GPCR-ModSim (http://open.gpcr-modsim.org) is a centralized and easy-to-use service dedicated to the structural modeling of G-protein Coupled Receptors (GPCRs). 3D molecular models can be generated from an amino acid sequence by homology-modeling techniques, considering different receptor conformations. GPCR-ModSim includes a membrane insertion and molecular dynamics (MD) equilibration protocol, which can be used to refine the generated model or any GPCR structure uploaded to the server, including, if desired, non-protein elements such as orthosteric or allosteric ligands, structural waters or ions. We herein revise the main characteristics of GPCR-ModSim and present new functionalities. The templates used for homology modeling have been updated considering the latest structural data, with separate profile structural alignments built for inactive, partially-active and active groups of templates. We have also added the possibility to perform multiple-template homology modeling in a unique and flexible way. Finally, our new MD protocol considers a series of distance restraints derived from a recently identified conserved network of helical contacts, allowing for a smoother refinement of the generated models, which is particularly advised when there is low homology to the available templates. GPCR-ModSim has been tested on the GPCR Dock 2013 competition with satisfactory results. The first two authors share first authorship.

    Arabidopsis replacement histone variant H3.3 occupies promoters of regulated genes

    Background: Histone variants establish structural and functional diversity of chromatin by affecting nucleosome stability and histone-protein interactions. H3.3 is an H3 histone variant that is incorporated into chromatin outside of S-phase in various eukaryotes. In animals, H3.3 is associated with active transcription and possibly maintenance of transcriptional memory. Plant H3 variants, which evolved independently of their animal counterparts, are much less well understood. Results: We profile the H3.3 distribution in Arabidopsis at mono-nucleosomal resolution using native chromatin immunoprecipitation. This results in the precise mapping of H3.3-containing nucleosomes, which are not only enriched in gene bodies as previously reported, but also at a subset of promoter regions and downstream of the 3′ ends of active genes. While H3.3 presence within transcribed regions is strongly associated with transcriptional activity, H3.3 at promoters is often independent of transcription. In particular, promoters with GA motifs carry H3.3 regardless of the gene expression levels. H3.3 on promoters of inactive genes is associated with H3K27me3 at gene bodies. In addition, H3.3-enriched plant promoters often contain RNA Pol II considerably upstream of the transcriptional start site. H3.3 and RNA Pol II are found on active as well as on inactive promoters and are enriched at strongly regulated genes. Conclusions: In animals and plants, H3.3 organizes chromatin in transcribed regions and in promoters. The results suggest a function of H3.3 in transcriptional regulation and support a model that a single ancestral H3 evolved into H3 variants with similar sub-functionalization patterns in plants and animals.

    BRR2a Affects Flowering Time via FLC Splicing

    Several pathways control time to flowering in Arabidopsis thaliana through transcriptional and posttranscriptional gene regulation. In recent years, mRNA processing has gained interest as a critical regulator of flowering time control in plants. However, the molecular mechanisms linking RNA splicing to flowering time are not well understood. In a screen for Arabidopsis early flowering mutants we identified an allele of BRR2a. BRR2 proteins are components of the spliceosome and highly conserved in eukaryotes. Arabidopsis BRR2a is ubiquitously expressed in all analyzed tissues and involved in the processing of flowering time gene transcripts, most notably FLC. A missense mutation of threonine 895 in BRR2a caused defects in FLC splicing and greatly reduced FLC transcript levels. Reduced FLC expression increased transcription of FT and SOC1, leading to early flowering in both short and long days. Genome-wide experiments established that only a small set of introns was not correctly spliced in the brr2a mutant. Compared to control introns, retained introns were often shorter and GC-poor, had low H3K4me1 and CG methylation levels, and were often derived from genes with a high-H3K27me3-low-H3K36me3 signature. We propose that BRR2a is specifically needed for efficient splicing of a subset of introns characterized by a combination of factors including intron size, sequence and chromatin, and that FLC is most sensitive to splicing defects.

    CÄÖ encodes the ATP-dependent RNA helicase protein BRR2a.

    (A) SNP annotations in the identified region with reduced recombination on the left arm of chromosome 1. (B) Schematic representation of the protein domain structure of BRR2a. A detailed description of the protein domains can be found in the main text. (C) Threonine 895 is conserved among eukaryotic BRR2 proteins. Sequence alignment of the end of helicase domain 1 in BRR2A proteins from yeast, animals and plants. The asterisk highlights threonine 895, which is altered to an isoleucine in brr2a-2. Conserved amino acid residues are highlighted in black. Residues not identical but similar are highlighted in gray.

    FLC splicing efficiency is reduced in brr2a-2.

    Intron retention was calculated as the ratio of unspliced to total (spliced + unspliced) transcripts for three representative FLC introns (A) and for intron 1 in MAF1 and SEP3, and intron 2 in AG (B) in both Col and brr2a-2. For FLC and MAF1, RNA was extracted from 15-day-old seedlings grown under SD conditions at ZT = 7. For SEP3 and AG, RNA was extracted from inflorescences of LD-grown plants. Results were normalized to PP2a; shown are mean ± SE (n = 3). Note the different scales used for each gene, showing that the majority of MAF1, SEP3 and AG transcripts are correctly spliced even in the brr2a-2 mutant.
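The intron retention measure defined in this caption can be written compactly as follows. The symbols are assumptions introduced here for illustration, not notation from the source: U is the measured abundance of the unspliced transcript and S that of the spliced transcript for a given intron.

```latex
\[
  \mathrm{IR} \;=\; \frac{U}{S + U}, \qquad 0 \le \mathrm{IR} \le 1
\]
```

Under this definition, a value near 0 means almost all transcripts are spliced at that intron, and a value near 1 means almost all retain it.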