    SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments

    Abstract
    Background: High-throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires.
    Results: Existing software solutions provide static, well-established algorithms in a restrictive package. However, because high-throughput sequencing is a rapidly evolving field, such static approaches cannot readily adopt the latest advances and techniques that researchers often need. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code.
    Conclusion: The system is based on the Addama service architecture and is available on our website as a demonstration web application, as a single installable download, and as a collection of individual, customizable services.
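To illustrate what "loosely coupled" buys when integrating a new algorithm, here is a minimal in-process sketch of a service registry: the pipeline looks up steps by name, so adding a step means registering a function rather than editing the pipeline. This is an assumption-laden stand-in, not SeqAdapt's or Addama's interface; the registry, the service names `trim_n` and `min_length`, and the "list of reads in, list of reads out" contract are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a service registry. In SeqAdapt the
# services belong to the Addama service architecture; the names, signatures
# and the "list of reads in, list of reads out" contract are assumptions.
Service = Callable[[List[str]], List[str]]
REGISTRY: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that publishes a function in the registry under `name`."""
    def wrap(fn: Service) -> Service:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("trim_n")
def trim_n(reads: List[str]) -> List[str]:
    # Strip trailing ambiguous base calls ('N') from each read.
    return [r.rstrip("N") for r in reads]


@register("min_length")
def min_length(reads: List[str], threshold: int = 5) -> List[str]:
    # Drop reads shorter than `threshold` bases (threshold is arbitrary).
    return [r for r in reads if len(r) >= threshold]


def run_pipeline(steps: List[str], reads: List[str]) -> List[str]:
    # The pipeline knows services only by name, so a new algorithm can be
    # plugged in by registering it, without editing this function.
    for step in steps:
        reads = REGISTRY[step](reads)
    return reads
```

For example, `run_pipeline(["trim_n", "min_length"], ["ACGTACGNN", "ACGN", "TTTTTT"])` returns `["ACGTACG", "TTTTTT"]`. A real service-oriented system would put a network boundary (for instance HTTP) between the pipeline and each service; the lookup-by-name idea is the same.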

    mspecLINE: bridging knowledge of human disease with the proteome

    Abstract
    Background: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack the information about human disease that researchers studying disease-related proteins need. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary, using an established semantic distance measure based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database.
    Results: The mspecLINE web application allows researchers to explore relationships between human diseases and the parts of the proteome that are detectable by mass spectrometry. Given a disease, the tool displays proteins and peptides from PeptideAtlas that may be associated with it, together with relevant literature from MEDLINE. mspecLINE also lets researchers select proteotypic peptides for specific protein targets in a mass spectrometry assay.
    Conclusions: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it differs from previous MEDLINE query tools in combining the knowledge expressed in the scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.
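The abstract does not name its co-occurrence-based distance, so as an illustration this sketch uses one well-known measure of that kind, the Normalized Google Distance of Cilibrasi and Vitanyi, computed from document counts. Whether it matches mspecLINE's measure exactly is an assumption, and the helper `rank_by_distance` and all counts are hypothetical.

```python
import math


def normalized_cooccurrence_distance(f_x: int, f_y: int, f_xy: int, n: int) -> float:
    """Normalized Google Distance (Cilibrasi and Vitanyi) from document counts.

    f_x, f_y: number of documents indexed with term x (resp. term y)
    f_xy:     number of documents indexed with both terms
    n:        total number of documents in the corpus
    Assumes 0 <= f_xy <= min(f_x, f_y) and min(f_x, f_y) < n.
    Smaller values indicate a stronger association (0 when every document
    indexed with either term is indexed with both); infinity means the
    terms never co-occur.
    """
    if f_xy == 0:
        return math.inf
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_by_distance(disease_count, protein_counts, joint_counts, n):
    # Hypothetical helper: order protein terms by their distance to one
    # disease term, closest (most strongly associated) first.
    return sorted(
        protein_counts,
        key=lambda p: normalized_cooccurrence_distance(
            disease_count, protein_counts[p], joint_counts.get(p, 0), n),
    )
```

With made-up counts, `normalized_cooccurrence_distance(100, 100, 10, 10000)` evaluates to 0.5 up to floating-point rounding: the numerator is log 100 - log 10 and the denominator is log 10000 - log 100, which is twice as large.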

    Model-Based Clustering for Social Networks

    Network models are widely used to represent relations among interacting units or actors. Network data often exhibit transitivity (two actors that have ties to a third actor are more likely to be tied than actors that do not), homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the Latent Position Cluster Model (LPCM), under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean “social space,” and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a Bayesian MCMC method; the former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters present, using approximate conditional Bayes factors. The LPCM models transitivity, homophily by attributes, and clustering simultaneously, and does not require the number of clusters to be known in advance. The model makes it easy to simulate realistic networks with clustering, potentially …
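Since the abstract highlights that the LPCM makes it easy to simulate clustered networks, here is a minimal NumPy sketch of that generative process. It assumes spherical Gaussian mixture components with one shared standard deviation `sigma`, latent dimension taken from the length of each mean vector, and intercept-only log-odds `beta0 - ||z_i - z_j||` with no dyad covariates; the function name and parameters are illustrative, not from the paper. It covers simulation only, not the two-stage maximum likelihood or Bayesian MCMC estimation.

```python
import numpy as np


def simulate_lpcm(n, weights, means, sigma, beta0, seed=0):
    """Simulate an undirected network from a latent position cluster model.

    Each actor's cluster is drawn with probabilities `weights`; its latent
    position is drawn from a spherical Gaussian centred on that cluster's
    mean. Ties are independent Bernoulli draws given the positions, with
    log-odds beta0 - ||z_i - z_j|| (intercept only, no covariates).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    labels = rng.choice(len(means), size=n, p=weights)
    z = means[labels] + rng.normal(scale=sigma, size=(n, means.shape[1]))

    # Pairwise Euclidean distances in the latent social space.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    prob = 1.0 / (1.0 + np.exp(-(beta0 - dist)))

    # Draw each pair once (strict upper triangle), then mirror for symmetry.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    y = (upper | upper.T).astype(int)
    return y, z, labels
```

With well-separated means such as `[[0, 0], [6, 0]]`, `sigma=0.5` and `beta0=1.0`, between-cluster pairs sit roughly 6 units apart, so their tie probability is near 1/(1 + e^5), and ties fall mostly within clusters. Transitivity arises naturally because nearby actors in the latent space tend to share neighbours.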