
    Stochastic Blockmodeling for the Analysis of Big Data

    The aim of this paper is to use the stochastic blockmodel to obtain clusters of units with similar patterns of relations; we also want to analyze the relations between clusters. Blockmodeling is a technique usually applied in social network analysis, focusing on the relations between “actors”, i.e. units. Today, people and devices constantly generate data. The network generates location and other data that keep services running and ready to use at every moment. This rapid growth in the availability of and access to data has created a need for better analysis techniques to understand the various phenomena. Blockmodeling techniques and clustering algorithms can be used for this aim. In this paper application regards the Web
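As a rough illustration of the stochastic blockmodel idea in this abstract, the sketch below samples a network in which the probability of a tie depends only on the blocks the two units belong to. The function name `sample_blockmodel`, the undirected-tie assumption, and the toy probabilities are hypothetical choices for illustration, not taken from the paper.

```python
import random


def sample_blockmodel(block_of, edge_prob, rng):
    """Sample an undirected network from a stochastic blockmodel.

    block_of[i] is the block index of unit i (hypothetical input).
    edge_prob[r][s] is the tie probability between blocks r and s.
    Returns a symmetric 0/1 adjacency matrix as a list of lists.
    """
    n = len(block_of)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The tie probability depends only on the two block labels.
            p = edge_prob[block_of[i]][block_of[j]]
            if rng.random() < p:
                adjacency[i][j] = 1
                adjacency[j][i] = 1
    return adjacency


# Toy example: two blocks, dense within blocks and sparse between them.
blocks = [0, 0, 0, 1, 1, 1]
probabilities = [[0.9, 0.1], [0.1, 0.9]]
network = sample_blockmodel(blocks, probabilities, random.Random(0))
```

Fitting a blockmodel runs in the opposite direction: given an observed network, it looks for block labels and tie probabilities that explain it well.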

    Quantifying short-term dynamics of Parkinson's disease using self-reported symptom data from an internet social network

    Background: Parkinson’s disease (PD) is an incurable neurological disease with approximately 0.3% prevalence. The hallmark symptom is gradual movement deterioration. Current scientific consensus about disease progression holds that symptoms will worsen smoothly over time unless treated. Accurate information about symptom dynamics is of critical importance to patients, caregivers, and the scientific community for the design of new treatments, clinical decision making, and individual disease management. Long-term studies characterize the typical time course of the disease as an early linear progression gradually reaching a plateau in later stages. However, symptom dynamics over durations of days to weeks remain unquantified. Currently, there is a scarcity of objective clinical information about symptom dynamics at intervals shorter than 3 months stretching over several years, but Internet-based patient self-report platforms may change this. Objective: To assess the clinical value of online self-reported PD symptom data recorded by users of the health-focused Internet social research platform PatientsLikeMe (PLM), in which patients quantify their symptoms on a regular basis on a subset of the Unified Parkinson’s Disease Rating Scale (UPDRS). By analyzing these data, we aim for a scientific window on the nature of symptom dynamics for assessment intervals shorter than 3 months over durations of several years. Methods: Online self-reported data were validated against the gold standard Parkinson’s Disease Data and Organizing Center (PD-DOC) database, containing clinical symptom data at intervals greater than 3 months. The data were compared visually using quantile-quantile plots, and numerically using the Kolmogorov-Smirnov test. By using a simple piecewise linear trend estimation algorithm, the PLM data were smoothed to separate random fluctuations from continuous symptom dynamics. Subtracting the trends from the original data revealed random fluctuations in symptom severity. The average magnitude of fluctuations versus time since diagnosis was modeled by using a gamma generalized linear model. Results: Distributions of ages at diagnosis and UPDRS in the PLM and PD-DOC databases were broadly consistent. The PLM patients were systematically younger than the PD-DOC patients and showed increased symptom severity in the PD off state. The average fluctuation in symptoms (UPDRS Parts I and II) was 2.6 points at the time of diagnosis, rising to 5.9 points 16 years after diagnosis. This fluctuation exceeds the estimated minimal and moderate clinically important differences, respectively. Not all patients conformed to the current clinical picture of gradual, smooth changes: many patients had regimes where symptom severity varied in an unpredictable manner, or underwent large rapid changes in an otherwise more stable progression. Conclusions: This information about short-term PD symptom dynamics contributes new scientific understanding about the disease progression, currently very costly to obtain without self-administered Internet-based reporting. This understanding should have implications for the optimization of clinical trials into new treatments and for the choice of treatment decision timescales
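The numerical comparison named in the Methods, the two-sample Kolmogorov-Smirnov test, is built on one statistic: the largest gap between the empirical distribution functions of the two samples. A minimal standard-library sketch of that statistic follows; the function name `ks_statistic` and the toy numbers are illustrative assumptions, not data from the study, and the p-value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every observation from either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up example values, not study data.
gap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A statistic near 0 means the two samples have similar distributions; a value of 1 means they do not overlap at all.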

    Male-Specific Transfer and Fine Scale Spatial Differences of Newly Identified Cuticular Hydrocarbons and Triacylglycerides in a Drosophila Species Pair

    We analyzed epicuticular hydrocarbon variation in geographically isolated populations of D. mojavensis cultured on different rearing substrates and a sibling species, D. arizonae, with ultraviolet laser desorption/ionization mass spectrometry (UV-LDI MS). Different body parts, i.e. legs, proboscis, and abdomens, of both species showed qualitatively similar hydrocarbon profiles consisting mainly of long-chain monoenes, dienes, trienes, and tetraenes. However, D. arizonae had higher amounts of most hydrocarbons than D. mojavensis and females of both species exhibited greater hydrocarbon amounts than males. Hydrocarbon profiles of D. mojavensis populations were significantly influenced by sex and rearing substrates, and differed between body parts. Lab food–reared flies had lower amounts of most hydrocarbons than flies reared on fermenting cactus substrates. We discovered 48 male- and species-specific hydrocarbons ranging in size from C22 to C50 in the male anogenital region of both species, most not described before. These included several oxygen-containing hydrocarbons in addition to high intensity signals corresponding to putative triacylglycerides, amounts of which were influenced by larval rearing substrates. Some of these compounds were transferred to female cuticles in high amounts during copulation. This is the first study showing that triacylglycerides may be a separate class of courtship-related signaling molecules in drosophilids. This study also extends the kind and number of epicuticular hydrocarbons in these species and emphasizes the role of larval ecology in influencing amounts of these compounds, many of which mediate courtship success within and between species

    Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

    BACKGROUND: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters, and to what extent. This study aims to answer some of these questions. RESULTS: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups including those strongly correlated with CGI and those not correlated with CGI. CONCLUSION: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions
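To make the PWM-based prediction step concrete, the sketch below scans a sequence with a small matrix of positional base frequencies and reports windows whose log-odds score passes a threshold. The matrix `TOY_PWM`, the uniform background of 0.25, the threshold, and the function names are hypothetical; the abstract does not define the cluster score, so it is not reproduced here.

```python
import math

# Hypothetical 3-column matrix of positional base frequencies.
# Each column sums to 1 and has no zero entries (pseudocounts assumed).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds_score(window, pwm, background=0.25):
    """Sum of log2(frequency / background) over the window positions."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal the matrix width")
    return sum(
        math.log2(column[base] / background)
        for base, column in zip(window, pwm)
    )


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds_score(sequence[start:start + width], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits


# The only window matching the preferred bases A, G, T starts at index 2.
hits = scan("CCAGTCC", TOY_PWM, threshold=2.0)
```

On genomic DNA a scan like this returns many hits, which is why the abstract turns to local concentrations of predicted sites rather than single matches.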

    Effect of promoter architecture on the cell-to-cell variability in gene expression

    According to recent experimental evidence, the architecture of a promoter, defined as the number, strength and regulatory role of the operators that control the promoter, plays a major role in determining the level of cell-to-cell variability in gene expression. These quantitative experiments call for a corresponding modeling effort that addresses the question of how changes in promoter architecture affect noise in gene expression in a systematic rather than case-by-case fashion. In this article, we make such a systematic investigation, based on a simple microscopic model of gene regulation that incorporates stochastic effects. In particular, we show how operator strength and operator multiplicity affect this variability. We examine different modes of transcription factor binding to complex promoters (cooperative, independent, simultaneous) and how each of these affects the level of variability in transcription product from cell-to-cell. We propose that direct comparison between in vivo single-cell experiments and theoretical predictions for the moments of the probability distribution of mRNA number per cell can discriminate between different kinetic models of gene regulation. Comment: 35 pages, 6 figures, Submitte

    Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++

    Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments

    Tissue eosinophilia: a morphologic marker for assessing stromal invasion in laryngeal squamous neoplasms

    BACKGROUND: The assessment of tumor invasion of underlying benign stroma in neoplastic squamous proliferation of the larynx may pose a diagnostic challenge, particularly in small biopsy specimens that are frequently tangentially sectioned. We studied whether thresholds of an eosinophilic response to laryngeal squamous neoplasms provide an adjunctive histologic criterion for determining the presence of invasion. METHODS: Eighty-seven (n = 87) cases of invasive squamous cell carcinoma and preinvasive squamous neoplasia were evaluated. In each case, the number of eosinophils per high power field (eosinophils/hpf) and per 10 hpf in the tissue adjacent to the neoplastic epithelium was counted and tabulated. For statistical purposes, elevated eosinophil counts were defined and categorized as: focally and moderately elevated (5–9 eos/hpf), focally and markedly increased (>10/hpf), diffusely and moderately elevated (5–19 eos/10 hpf), and diffusely and markedly increased (>20/10 hpf). RESULTS: In invasive carcinoma, eosinophil counts were elevated focally and/or diffusely more frequently than in non-invasive neoplastic lesions. The increased eosinophil counts, specifically >10/hpf and >20/10 hpf, were all statistically significantly associated with stromal invasion. Greater than 10 eosinophils/hpf and/or >20 eosinophils/10 hpf had the highest predictive power, with a sensitivity, specificity and positive predictive value of 82%, 93%, 96% and 80%, 100% and 100%, respectively. In our series, greater than 20 eosinophils/10 hpf was virtually diagnostic for tumor invasion. CONCLUSION: Our study suggests for the first time that an elevated eosinophil count in squamous neoplasia of the larynx is a morphologic feature associated with tumor invasion. When the number of infiltrating eosinophils exceeds 10/hpf and/or 20/10 hpf in a laryngeal biopsy with squamous neoplasia, it represents an indicator for the possibility of tumor invasion. Similarly, the presence of eosinophils meeting these thresholds in an excisional specimen should prompt a thorough evaluation for invasiveness, when evidence of invasion is absent, or when invasion is suspected by conventional criteria in the initial sections
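For readers less familiar with the three measures reported in the Results, the sketch below shows how sensitivity, specificity and positive predictive value follow from a 2x2 table of threshold result against confirmed invasion. The function name `diagnostic_metrics` and the counts in the example are made up for illustration; they are not the study's case counts.

```python
def diagnostic_metrics(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity, specificity and PPV from 2x2 table counts.

    true_pos:  invasive cases above the eosinophil threshold
    false_neg: invasive cases below the threshold
    false_pos: non-invasive cases above the threshold
    true_neg:  non-invasive cases below the threshold
    """
    if true_pos + false_neg == 0:
        raise ValueError("no invasive cases: sensitivity is undefined")
    if true_neg + false_pos == 0:
        raise ValueError("no non-invasive cases: specificity is undefined")
    if true_pos + false_pos == 0:
        raise ValueError("no positive results: PPV is undefined")
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = diagnostic_metrics(true_pos=9, false_pos=3, false_neg=1, true_neg=7)
```

A threshold with no false positives gives a specificity and positive predictive value of 100%, which is the pattern the abstract reports for the >20/10 hpf cut-off.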

    Cross species comparison of C/EBPα and PPARγ profiles in mouse and human adipocytes reveals interdependent retention of binding sites

    Background: The transcription factors peroxisome proliferator activated receptor γ (PPARγ) and CCAAT/enhancer binding protein α (C/EBPα) are key transcriptional regulators of adipocyte differentiation and function. We and others have previously shown that binding sites of these two transcription factors show a high degree of overlap and are associated with the majority of genes upregulated during differentiation of murine 3T3-L1 adipocytes. Results: Here we have mapped all binding sites of C/EBPα and PPARγ in human SGBS adipocytes and compared these with the genome-wide profiles from mouse adipocytes to systematically investigate which biological features correlate with retention of sites in orthologous regions between mouse and human. Despite a limited interspecies retention of binding sites, several biological features make sites more likely to be retained. First, co-binding of PPARγ and C/EBPα in mouse is the most powerful predictor of retention of the corresponding binding sites in human. Second, vicinity to genes highly upregulated during adipogenesis significantly increases retention. Third, the presence of C/EBPα consensus sites correlates with retention of both factors, indicating that C/EBPα facilitates recruitment of PPARγ. Fourth, retention correlates with overall sequence conservation within the binding regions independent of C/EBPα and PPARγ sequence patterns, indicating that other transcription factors work cooperatively with these two key transcription factors. Conclusions: This study provides a comprehensive and systematic analysis of which biological features affect retention of binding sites between human and mouse. Specifically, we show that the binding of C/EBPα and PPARγ in adipocytes has evolved in a highly interdependent manner, indicating a significant cooperativity between these two transcription factors.