169 research outputs found

    Which Step Do I Take First? Troubleshooting with Bayesian Models

    Online discussion forums and community question-answering websites provide one of the primary avenues for online users to share information. In this paper, we propose text mining techniques that help users navigate troubleshooting-oriented data such as questions asked on forums and their suggested solutions. We introduce Bayesian generative models of the troubleshooting data and apply them to two interrelated tasks: (a) predicting the complexity of the solutions (e.g., plugging a keyboard into the computer is easier than installing a special driver) and (b) presenting them in ranked order from least to most complex. Experimental results show that our models are on par with human performance on these tasks, while outperforming baselines based on solution length or readability.

    Structured and Unstructured Cache Models for SMT Domain Adaptation

    We present a French to English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment’s topic rather than in biographies as a whole. We implement both systems using cache-based translation techniques. We show that a system trained on Europarl and news can be adapted for biographies with a 0.5 BLEU score improvement using our models. Further, the structure-aware model outperforms the system which treats the entire document as a single segment.
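
    The idea of encouraging word repetition across a document can be illustrated with a toy cache. This is only a sketch under stated assumptions, not the system described in the abstract: the function names, the additive bonus, and the `weight` parameter are all hypothetical, and a real SMT decoder scores phrases rather than single words.

```python
from collections import Counter


def cache_bonus(word, cache, weight=0.5):
    """Bonus proportional to how often `word` was already chosen in this document."""
    total = sum(cache.values())
    return weight * cache[word] / total if total else 0.0


def pick_translation(candidates, cache, weight=0.5):
    """`candidates` maps each candidate target word to a (hypothetical) base score."""
    return max(candidates, key=lambda w: candidates[w] + cache_bonus(w, cache, weight))


def translate_document(source_options, weight=0.5):
    """Choose one target word per source position, updating the cache as we go."""
    cache = Counter()
    output = []
    for candidates in source_options:
        choice = pick_translation(candidates, cache, weight)
        output.append(choice)
        cache[choice] += 1  # earlier choices bias later ones
    return output


options = [{"born": 0.6, "birth": 0.4}, {"birth": 0.55, "born": 0.45}]
print(translate_document(options, weight=0.5))
print(translate_document(options, weight=0.0))
```

    With a non-zero weight, the second position reuses the word chosen at the first position even though its base score is lower; with the weight set to zero, each position is decided independently.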

    Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under Length Constraints

    Length constraints impose implicit requirements on the type of content that can be included in a text. Here we propose the first model to computationally assess if a text deviates from these requirements. Specifically, our model predicts the appropriate length for texts based on content types present in a snippet of constant length. We consider a range of features to approximate content type, including syntactic phrasing, constituent compression probability, presence of named entities, sentence specificity and intersentence continuity. Weights for these features are learned using a corpus of summaries written by experts and high-quality journalistic writing. At test time, the difference between actual and predicted length allows us to quantify text verbosity. We use data from manual evaluation of summarization systems to assess the verbosity scores produced by our model. We show that the automatic verbosity scores are significantly negatively correlated with manual content quality scores given to the summaries.
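
    The verbosity measure described above, the difference between actual and predicted length, can be sketched as follows. The linear form, the feature names, the weights, and the sign convention (positive means longer than the content warrants) are assumptions made for illustration; the abstract does not specify them.

```python
def predict_length(features, weights, bias=0.0):
    """Predicted appropriate length from content-type features (assumed linear model)."""
    return bias + sum(weights[name] * value for name, value in features.items())


def verbosity_score(actual_length, features, weights, bias=0.0):
    """Actual minus predicted length; positive values suggest a verbose text."""
    return actual_length - predict_length(features, weights, bias)


# Hypothetical weights and feature values, chosen only to make the arithmetic easy to follow.
weights = {"named_entities": 10.0, "specificity": 50.0}
features = {"named_entities": 3, "specificity": 0.5}

print(predict_length(features, weights, bias=20.0))
print(verbosity_score(100, features, weights, bias=20.0))
```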

    A Bayesian Method to Incorporate Background Knowledge during Automatic Text Summarization

    In order to summarize a document, it is often useful to have a background set of documents from the domain to serve as a reference for determining new and important information in the input document. We present a model based on Bayesian surprise which provides an intuitive way to identify surprising information from a summarization input with respect to a background corpus. Specifically, the method quantifies the degree to which pieces of information in the input change one’s beliefs about the world represented in the background. We develop systems for generic and update summarization based on this idea. Our method provides competitive content selection performance with particular advantages in the update task where systems are given a small and topical background corpus.
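
    Bayesian surprise is commonly defined as the KL divergence between the posterior (beliefs after seeing the new data) and the prior (beliefs from the background). The sketch below applies that definition to smoothed unigram distributions. The unigram model, the add-`alpha` smoothing, and the function names are assumptions for illustration, not the formulation used in the paper.

```python
import math
from collections import Counter


def word_distribution(counts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def bayesian_surprise(background_tokens, input_tokens, alpha=1.0):
    """Return KL(posterior || prior) and each word's contribution to it."""
    bg = Counter(background_tokens)
    inp = Counter(input_tokens)
    vocab = set(bg) | set(inp)
    prior = word_distribution(bg, vocab, alpha)            # beliefs from background only
    posterior = word_distribution(bg + inp, vocab, alpha)  # beliefs after seeing the input
    per_word = {w: posterior[w] * math.log(posterior[w] / prior[w]) for w in vocab}
    return sum(per_word.values()), per_word


total, per_word = bayesian_surprise(["a", "a", "b"], ["c", "c", "c"])
print(total)
print(max(per_word, key=per_word.get))
```

    Words that are frequent in the input but rare in the background contribute most to the total, which is the intuition behind using surprise for content selection.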

    Conversation Trees: A Grammar Model for Topic Structure in Forums

    Online forum discussions proceed differently from face-to-face conversations and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a conversation tree of topics. We present models that jointly perform two tasks: segment a thread into subparts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform a number of tree generation baselines.

    Chromatin Profiles of Chromosomally Integrated Human Herpesvirus-6A

    Human herpesvirus-6A (HHV-6A) and 6B (HHV-6B) are two closely related betaherpesviruses that are associated with various diseases including seizures and encephalitis. The HHV-6A/B genomes have been shown to be present in an integrated state in the telomeres of latently infected cells. In addition, integration of HHV-6A/B in germ cells has resulted in individuals harboring this inherited chromosomally integrated HHV-6A/B (iciHHV-6) in every cell of their body. Until now, the viral transcriptome and the epigenetic modifications that contribute to the silencing of the integrated virus genome remain elusive. In the current study, we used a patient-derived iciHHV-6A cell line to assess the global viral gene expression profile by RNA-seq, and the chromatin profiles by MNase-seq and ChIP-seq analyses. In addition, we investigated an in vitro generated cell line (293-HHV-6A) that expresses GFP upon the addition of agents commonly used to induce herpesvirus reactivation such as TPA. No viral gene expression including miRNAs was detected from the HHV-6A genomes, indicating that the integrated virus is transcriptionally silent. Intriguingly, upon stimulation of the 293-HHV-6A cell line with TPA, only foreign promoters in the virus genome were activated, while all HHV-6A promoters remained completely silenced. The transcriptional silencing of latent HHV-6A was further supported by MNase-seq results, which demonstrate that the latent viral genome resides in a highly condensed nucleosome-associated state. We further explored the enrichment profiles of histone modifications via ChIP-seq analysis. Our results indicated that the HHV-6 genome is modestly enriched with the repressive histone marks H3K9me3/H3K27me3 and does not possess the active histone modifications H3K27ac/H3K4me3. Overall, these results indicate that HHV-6 genomes reside in a condensed chromatin state, providing insight into the epigenetic mechanisms associated with the silencing of the integrated HHV-6A genome

    Sensory Communication

    Contains table of contents for Section 2, an introduction and reports on twelve research projects. National Institutes of Health Grant R01 DC00117; National Institutes of Health Grant R01 DC02032; National Institutes of Health/National Institute of Deafness and Other Communication Disorders Grant 2 R01 DC00126; National Institutes of Health Grant 2 R01 DC00270; National Institutes of Health Contract N01 DC-5-2107; National Institutes of Health Grant 2 R01 DC00100; U.S. Navy - Office of Naval Research Grant N61339-96-K-0002; U.S. Navy - Office of Naval Research Grant N61339-96-K-0003; U.S. Navy - Office of Naval Research Grant N00014-97-1-0635; U.S. Navy - Office of Naval Research Grant N00014-97-1-0655; U.S. Navy - Office of Naval Research Subcontract 40167; U.S. Navy - Office of Naval Research Grant N00014-96-1-0379; U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0202; National Institutes of Health Grant RO1 NS33778; Massachusetts General Hospital, Center for Innovative Minimally Invasive Therapy Research Fellowship Gran

    New genetic loci link adipose and insulin biology to body fat distribution.

    Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.

    Genome-wide identification of basic helix-loop-helix and NF-1 motifs underlying GR binding sites in male rat hippocampus

    Glucocorticoids regulate hippocampal function in part by modulating gene expression through the glucocorticoid receptor (GR). GR binding is highly cell type specific, directed to accessible chromatin regions established during tissue differentiation. Distinct classes of GR binding sites are dependent on the activity of additional signal-activated transcription factors that prime chromatin toward context-specific organization. We hypothesized a stress context dependency for GR binding in hippocampus as a consequence of rapidly induced stress mediators priming chromatin accessibility. Using chromatin immunoprecipitation sequencing to interrogate GR binding, we found no effect of restraint stress context on GR binding, although analysis of sequences underlying GR binding sites revealed mechanistic detail for hippocampal GR function. We note enrichment of GR binding sites proximal to genes linked to structural and organizational roles, an absence of major tethering partners for GRs, and little or no evidence for binding at negative glucocorticoid response elements. A basic helix-loop-helix motif closely resembling a NeuroD1 or Olig2 binding site was found underlying a subset of GR binding sites and is proposed as a candidate lineage-determining transcription factor directing hippocampal chromatin access for GRs. Of our GR binding sites, 54% additionally contained half-sites for nuclear factor (NF)-1 that we propose as a collaborative or general transcription factor involved in hippocampal GR function. Our findings imply a dose-dependent and context-independent action of GRs in the hippocampus. Alterations in the expression or activity of NF-1/basic helix-loop-helix factors may play an as yet undetermined role in glucocorticoid-related disease susceptibility and outcome by altering GR access to hippocampal binding sites. Copyright for this article is retained by the author(s).

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.