86 research outputs found

    Mining Social Media Data to Study the Consequences of Dementia Diagnosis on Caregivers and Relatives

    INTRODUCTION: Caregivers for people with dementia face a number of challenges, such as changing family relationships, social isolation, or financial difficulties. Internet usage and social media are increasingly recognised as resources for providing support and improving general public health. OBJECTIVE: Using automated analysis, this study aimed to explore (i) the age and sex of people who post about dementia diagnoses to the social media forum Reddit, (ii) the affected person and their diagnosis, (iii) which subreddits authors post to, (iv) the types of messages posted, and (v) the content of these posts. METHODS: We analysed Reddit posts concerning dementia diagnoses and used a previously developed text analysis pipeline to determine attributes of the posts and their authors. The posts were further examined through manual annotation of the diagnosis provided and the person affected. Lastly, we investigated the communities posters engage with and assessed the contents of the posts with an automated topic gathering/clustering technique. RESULTS: Five hundred and thirty-five Reddit posts were identified as relevant and further processed. The majority of posters in our dataset are female, and the people mentioned are predominantly close relatives, such as parents and grandparents. The communities frequented and topics gathered reflect not only the person's diagnosis but also potential outcomes, for example hardships experienced by the caregiver or the need for legal support. CONCLUSIONS: This work demonstrates the value of social media data as a resource for in-depth examination of caregivers' experiences after a dementia diagnosis. It is important to study groups actively posting online, in both topic-specific and general communities, as they are most likely to benefit from novel internet-based support systems or interventions.

    ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data

    RNA-Seq and microarray platforms have emerged as important tools for detecting changes in gene expression and RNA processing in biological samples. We present ExpressionPlot, a software package consisting of a default back end, which prepares raw sequencing or Affymetrix microarray data, and a web-based front end, which offers a biologically centered interface to browse, visualize, and compare different data sets. Download and installation instructions, a user's manual, a discussion group, and a prototype are available at http://expressionplot.com/. ALS Therapy Alliance

    SICTIN: Rapid footprinting of massively parallel sequencing data

    BACKGROUND: Massively parallel sequencing allows genome-wide, hypothesis-free investigation of, for instance, transcription factor binding sites or histone modifications. Although detailed information at nucleotide resolution can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. Constructing these footprints, however, is time-consuming. METHODS: The presented software generates a binary representation of the signals that enables fast and scalable lookup. This representation allows footprints to be generated in minutes on a desktop computer. Several input formats are accepted, e.g. SAM files, BED files and UCSC wiggle tracks. CONCLUSIONS: Hypothesis-free investigation of genome-wide interactions allows biological data mining at an unprecedented scale. Until recently, analysis of sequencing data has focused mainly on signal patterns around transcription start sites, which are manageable in number. Today, the focus is shifting to a wider perspective, and numerous genomic features are being studied. To this end, we provide a system that supports fast queries over hundreds of thousands of features.
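
A footprint, as described above, aggregates a signal over windows centred on many genomic features. The sketch below is a minimal illustration and not SICTIN's implementation: it assumes the signal for one chromosome has already been loaded into a dense per-base array (a simple stand-in for the binary representation the authors describe), and the function name `footprint` and the half-window width `flank` are hypothetical.

```python
from typing import List, Optional, Sequence


def footprint(signal: Sequence[float], positions: Sequence[int], flank: int,
              strands: Optional[Sequence[str]] = None) -> List[float]:
    """Average `signal` over the windows [pos - flank, pos + flank] around each feature.

    Hypothetical helper: `signal` is a dense, 0-based per-base array for one
    chromosome and `positions` are feature anchors (e.g. transcription start
    sites). Windows on the '-' strand are reversed so all profiles share one
    orientation. Windows that run off either chromosome end are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for i, pos in enumerate(positions):
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(signal):
            continue  # incomplete window: skip rather than pad
        window = list(signal[start:end])
        if strands is not None and strands[i] == "-":
            window.reverse()  # orient minus-strand features 5' to 3'
        for j, value in enumerate(window):
            totals[j] += value
        used += 1
    if used == 0:
        return totals  # no usable features: all-zero profile
    return [t / used for t in totals]
```

For example, `footprint(list(range(10)), [2, 7], 1)` averages the windows `[1, 2, 3]` and `[6, 7, 8]`. The work per feature is proportional to the window width, so the total cost in this sketch grows linearly with the number of features.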

    Integrating sequence and structural biology with DAS.

    BACKGROUND: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences. RESULTS: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and define a convention for providing different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources. CONCLUSION: Our DAS extensions are essential for managing the growing number of services and for exchanging diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at http://www.dasregistry.org. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
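
To make the protocol concrete, here is a small client-side sketch of the core DAS 1.x `features` request (not of the new alignment, structure, or registry extensions described above). The request form `/das/<source>/features?segment=ref:start,stop` and the `DASGFF/GFF/SEGMENT/FEATURE` response layout are our reading of the DAS 1.5 specification and should be checked against it; the server URL, source name, and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import quote


def features_url(server: str, source: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS 1.x 'features' request for one segment, written ref:start,stop."""
    segment = quote(f"{ref}:{start},{stop}", safe=":,")
    return f"{server.rstrip('/')}/das/{quote(source)}/features?segment={segment}"


def parse_features(xml_text: str) -> List[Dict[str, object]]:
    """Flatten the SEGMENT/FEATURE elements of a DASGFF document into dicts."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, object]] = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            type_el: Optional[ET.Element] = feature.find("TYPE")
            rows.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": type_el.get("id") if type_el is not None else None,
                # Missing coordinates default to 0 (an assumption of this sketch).
                "start": int((feature.findtext("START") or "0").strip()),
                "end": int((feature.findtext("END") or "0").strip()),
            })
    return rows


# Hypothetical response from a toy server, following the assumed DASGFF layout.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="https://example.org/das/mysource/features">
    <SEGMENT id="1" start="100" stop="200" version="1.0">
      <FEATURE id="f1" label="exon 1">
        <TYPE id="exon" category="transcription">exon</TYPE>
        <METHOD id="toy">toy</METHOD>
        <START>120</START>
        <END>150</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""
```

Calling `features_url("https://example.org/", "mysource", "1", 100, 200)` yields `https://example.org/das/mysource/features?segment=1:100,200`, and `parse_features(SAMPLE_RESPONSE)` returns one feature record. Network access is deliberately left out so that the parsing logic can be tested offline.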

    Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development

    BACKGROUND: The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation. RESULTS: We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology. CONCLUSIONS: Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.
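
The kind of check described above can be illustrated on a toy scale. The sketch below is not the authors' inference system: the taxonomy and GO fragments are heavily simplified and hypothetical (for instance, Mammalia is placed directly under Eukaryota), every parent link is assumed to propagate constraints, and the relation names are modelled on GO's `only_in_taxon` and `never_in_taxon`. An annotation of a gene from a given species to a term is flagged if any constraint on the term or its ancestors excludes that species' lineage.

```python
from typing import Dict, List, Optional, Set, Tuple

# Heavily simplified toy taxonomy (child -> parent; None marks the root).
TAXON_PARENT: Dict[str, Optional[str]] = {
    "cellular organisms": None,
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
    "Mammalia": "Eukaryota",
    "Homo sapiens": "Mammalia",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Escherichia coli": "Bacteria",
}

# Toy GO fragment: term -> parent terms. Treating every parent link as
# constraint-propagating is a simplifying assumption of this sketch.
TERM_PARENTS: Dict[str, List[str]] = {
    "lactation": [],
    "photosynthesis": [],
    "mitochondrion": [],
    "mitochondrial matrix": ["mitochondrion"],
}

# (term, relation, taxon) triples, modelled on GO's taxon constraint relations.
CONSTRAINTS: List[Tuple[str, str, str]] = [
    ("lactation", "only_in_taxon", "Mammalia"),
    ("mitochondrion", "only_in_taxon", "Eukaryota"),
    ("photosynthesis", "never_in_taxon", "Mammalia"),
]


def taxon_lineage(taxon: str) -> Set[str]:
    """Return the taxon together with all of its ancestors."""
    lineage: Set[str] = set()
    node: Optional[str] = taxon
    while node is not None:
        lineage.add(node)
        node = TAXON_PARENT[node]
    return lineage


def term_ancestors(term: str) -> Set[str]:
    """Return the term together with every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(TERM_PARENTS.get(current, []))
    return seen


def violations(term: str, taxon: str) -> List[str]:
    """List constraint violations for annotating a gene from `taxon` to `term`."""
    lineage = taxon_lineage(taxon)
    found: List[str] = []
    for source in sorted(term_ancestors(term)):
        for c_term, relation, c_taxon in CONSTRAINTS:
            if c_term != source:
                continue
            if relation == "only_in_taxon" and c_taxon not in lineage:
                found.append(f"{term}: {relation} {c_taxon} (from {source})")
            elif relation == "never_in_taxon" and c_taxon in lineage:
                found.append(f"{term}: {relation} {c_taxon} (from {source})")
    return found
```

Here an E. coli gene annotated to "mitochondrial matrix" is flagged because the constraint on "mitochondrion" is inherited, while a human gene annotated to "lactation" passes.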

    Statistical Modeling of Transcription Factor Binding Affinities Predicts Regulatory Interactions

    Recent experimental and theoretical efforts have highlighted that the binding of transcription factors to DNA can be described more accurately by continuous measures of binding affinity than by a discrete description in terms of binding sites. While binding affinities can be predicted from a physical model, it is often desirable to know the distribution of binding affinities for specific sequence backgrounds. In this paper, we present a statistical approach to derive the exact distribution for sequence models with fixed GC content. We demonstrate that the affinity distribution of almost all known transcription factors can be effectively parameterized by a class of generalized extreme value distributions. Moreover, this parameterization also describes the affinity distribution for sequence backgrounds with variable GC content, such as human promoter sequences. Our approach is applicable to arbitrary sequences and to all transcription factors whose binding preferences can be described by a motif matrix. The statistical treatment also provides a proper framework to directly compare transcription factors with very different affinity distributions. This is illustrated by our analysis of human promoters with known binding sites, for many of which we could identify the known regulators as those with the highest affinity. The combination of physical model and statistical normalization provides a quantitative measure that ranks transcription factors for a given sequence and can be compared directly with large-scale binding data. Its successful application to human promoter sequences is an encouraging example of how the method can be applied to other sequences.
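
As a much simpler relative of the exact distribution described above (and not the authors' derivation), the sketch below computes the exact distribution of an additive motif-matrix score for a single motif-length window. It assumes bases are drawn independently with GC content `gc` and that matrix entries are integers, so the distribution can be tabulated exactly by convolving one column at a time. The function names and example matrices are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

BASES = "ACGT"


def background(gc: float) -> Dict[str, float]:
    """i.i.d. base probabilities for a sequence model with fixed GC content."""
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}


def score_distribution(matrix: List[Dict[str, int]], gc: float) -> Dict[int, float]:
    """Exact P(score = s) for a window of len(matrix) independent bases.

    Each column adds an integer score depending on the base at that position,
    so the score distribution is the convolution of the per-column distributions.
    """
    probs = background(gc)
    dist: Dict[int, float] = {0: 1.0}
    for column in matrix:
        nxt: Dict[int, float] = defaultdict(float)
        for score, p in dist.items():
            for base in BASES:
                nxt[score + column[base]] += p * probs[base]
        dist = dict(nxt)
    return dist


def tail_probability(dist: Dict[int, float], threshold: int) -> float:
    """P(score >= threshold) under the tabulated distribution."""
    return sum(p for score, p in dist.items() if score >= threshold)
```

For two GC-preferring columns `{"A": 0, "C": 1, "G": 1, "T": 0}` and `gc=0.6`, the score is 2 with probability 0.36, 1 with probability 0.48 and 0 with probability 0.16. The cost is linear in motif length times the number of distinct scores. Fitting the generalized extreme value parameterization mentioned in the abstract is not attempted here.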

    Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data

    Background: High-throughput gene expression data can predict gene function through the 'guilt by association' principle: coexpressed genes are likely to be functionally associated. Methodology/Principal Findings: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are Ranked Coexpression Groups (RCGs), small sets of tightly coexpressed genes that are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. Conclusions/Significance: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several geneti
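
The 'guilt by association' principle and majority rule can be illustrated with a small sketch. Genes are ranked by Pearson correlation with a query gene, the top `k` form a coexpression group, and any annotation carried by a strict majority of that group is predicted for the query. This is a simplification under stated assumptions, not the authors' Ranked Coexpression Group pipeline: it uses a single platform, a single similarity measure, and hypothetical names (`predict_annotation`, `k`).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def predict_annotation(query: str, profiles: Dict[str, List[float]],
                       annotations: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Annotations shared by a strict majority of the query's k most coexpressed genes."""
    others = [g for g in profiles if g != query]
    ranked = sorted(others, key=lambda g: pearson(profiles[query], profiles[g]),
                    reverse=True)
    group = ranked[:k]  # hypothetical stand-in for a ranked coexpression group
    counts = Counter(term for g in group for term in annotations.get(g, set()))
    return sorted(term for term, votes in counts.items() if votes > len(group) / 2)
```

With toy profiles in which genes `a`, `b` and `c` rise with the query and `d` falls, and with `a` and `b` annotated "ribosome", the query is predicted to be "ribosome". Treating the strict majority over the whole group as the vote threshold is a design choice made for this sketch.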

    Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

    BACKGROUND: Regulation of gene expression is relevant to many areas of biology and medicine, such as the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression levels of thousands of mRNAs at the same time, allowing insight into, or comparison of, different cellular conditions. Data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can become intricate. Although programs for the statistical analysis of microarray data exist, most of them do not integrate analysis results with biological interpretation. RESULTS: We have developed GEPAT, the Genome Expression Pathway Analysis Tool, which analyzes gene expression data in a genomic, proteomic and metabolic context. We integrate statistical methods for data import and data analysis with biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, after which data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear-model-based t-test, or chromosomal profile comparison. The results of the analysis can be interpreted through enrichment of biological terms, pathway analysis or interaction networks. Several biological databases are included to provide information for each probe on the chip. GEPAT does not impose a linear workflow; any subset of probes and samples can be used as the starting point for a new analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to support a large number of users. It is freely available under the LGPL open source license for academic and commercial users at http://gepat.sourceforge.net. CONCLUSION: GEPAT is modular, scalable, professional-grade software that integrates the analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at http://gepat.bioapps.biozentrum.uni-wuerzburg.de.
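
k-means is one of the clustering methods listed above. The following is a minimal pure-Python sketch of Lloyd's k-means on expression profiles (one row per probe, one column per sample). It is a generic illustration, not GEPAT's code. The deterministic initialisation from the first `k` profiles is a simplifying assumption; practical tools usually use random or k-means++ seeding.

```python
from typing import List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[List[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in profiles[:k]]  # assumed deterministic seeding
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = []
        for p in profiles:
            dists = [_sq_dist(p, c) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On the four toy profiles `[0, 0], [10, 10], [0, 1], [10, 11]` with `k=2`, the probes split into two clusters with centroids `[0.0, 0.5]` and `[10.0, 10.5]`.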

    Multiple organism algorithm for finding ultraconserved elements

    Get PDF
    BACKGROUND: Ultraconserved elements are nucleotide or protein sequences with 100% identity (no mismatches, insertions, or deletions) within the same organism or between two or more organisms. Studies indicate that these conserved regions are associated with microRNAs, mRNA processing, development and transcription regulation. Identifying and characterizing these elements across genomes is necessary to further understand their function. RESULTS: We describe an algorithm, and provide freely available software, that can find all of the ultraconserved sequences between the genomes of multiple organisms. Our algorithm takes a combinatorial approach that finds all such sequences without requiring the genomes to be aligned. The algorithm is significantly faster than BLAST and is designed to handle very large genomes efficiently. We ran our algorithm on several large comparative analyses to evaluate its effectiveness: one compared 17 vertebrate genomes, finding 123 ultraconserved elements longer than 40 bp shared by all of the organisms; another compared the human body louse, Pediculus humanus humanus, against itself and selected insects, finding thousands of non-coding, potentially functional sequences. CONCLUSION: Whole-genome comparative analysis of multiple organisms is both feasible and desirable in our search for biological knowledge. We argue that bioinformatic programs should be forward-thinking, assuming analysis of multiple (and possibly large) genomes in the design and implementation of algorithms. Our algorithm shows how a compromise design, trading disk space against memory space, allows efficient computation while requiring only modest computer resources, and at the same time provides benefits not available with other software.
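
Exact (100% identity) matches can be found without aligning genomes. The k-mer hashing example below shows one way to do this: any ultraconserved element of length at least k contains k-mers present in every genome, so intersecting the genomes' k-mer sets yields candidate seeds. This toy, in-memory sketch is not the authors' combinatorial algorithm. It ignores reverse complements, does not extend seeds to maximal elements, and does not implement their disk-versus-memory trade-off. The function names are hypothetical.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq, skipping any that contain an ambiguous 'N'."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if "N" not in seq[i:i + k]}


def shared_kmers(genomes: Dict[str, str], k: int) -> Set[str]:
    """k-mers present with 100% identity in every genome (no alignment needed)."""
    names = list(genomes)
    if not names:
        return set()
    shared = kmers(genomes[names[0]], k)
    for name in names[1:]:
        shared &= kmers(genomes[name], k)
        if not shared:
            break  # nothing left in common; later genomes cannot add any
    return shared
```

For example, among the toy sequences `ACGTACGTTT`, `GGACGTACGG` and `TTACGTACAA`, the only 6-mer common to all three is `ACGTAC`. In this sketch, memory grows with the number of distinct k-mers per genome. That cost is what a disk-backed design would reduce.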

    Gitools: Analysis and Visualisation of Genomic Data Using Interactive Heat-Maps

    Get PDF
    Intuitive visualization of data and results is very important in genomics, especially when many conditions are to be analyzed and compared. Heat-maps have proven very useful for representing biological data. Here we present Gitools (http://www.gitools.org), an open-source tool to perform analyses and visualize data and results as interactive heat-maps. Gitools contains data import systems for several sources (IntOGen, Biomart, KEGG, Gene Ontology), which facilitate the integration of novel data with previous knowledge.