105 research outputs found
A Bayesian Method to Incorporate Background Knowledge during Automatic Text Summarization
In order to summarize a document, it is often useful to have a background set of documents from the domain to serve as a reference for determining new and important information in the input document. We present a model based on Bayesian surprise which provides an intuitive way to identify surprising information from a summarization input with respect to a background corpus. Specifically, the method quantifies the degree to which pieces of information in the input change one’s beliefs’ about the world represented in the background. We develop systems for generic and update summarization based on this idea. Our method provides competitive content selection performance with particular advantages in the update task where systems are given a small and topical background corpus
Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under Length Constraints
Length constraints impose implicit requirements on the type of content that can be included in a text. Here we pro- pose the first model to computationally assess if a text deviates from these requirements. Specifically, our model predicts the appropriate length for texts based on content types present in a snippet of constant length. We consider a range of features to approximate content type, including syntactic phrasing, constituent compression probability, presence of named entities, sentence specificity and intersentence continuity. Weights for these features are learned using a corpus of summaries written by experts and on high quality journalistic writing. During test time, the difference between actual and predicted length allows us to quantify text verbosity. We use data from manual evaluation of summarization systems to assess the verbosity scores produced by our model. We show that the automatic verbosity scores are significantly negatively correlated with manual content quality scores given to the summaries
Which Step Do I Take First? Troubleshooting with Bayesian Models
Online discussion forums and community question-answering websites provide one of the primary avenues for online users to share information. In this paper, we propose text mining techniques which aid users navigate troubleshooting-oriented data such as questions asked on forums and their suggested solutions. We introduce Bayesian generative models of the troubleshooting data and apply them to two interrelated tasks (a) predicting the complexity of the solutions (e.g., plugging a keyboard in the computer is easier compared to installing a special driver) and (b) presenting them in a ranked order from least to most complex. Experimental results show that our models are on par with human performance on these tasks, while outperforming baselines based on solution length or readability
Structured and Unstructured Cache Models for SMT Domain Adaptation
We present a French to English translation system for Wikipedia biography articles. We use training data from out- of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment’s topic rather than in biographies as a whole. We implement both systems using cache based translation techniques. We show that a system trained on Europarl and news can be adapted for biographies with 0.5 BLEU score improvement using our models. Further the structure-aware model out performs the system which treats the entire document as a single segment
Conversation Trees: A Grammar Model for Topic Structure in Forums
Online forum discussions proceed differently from face-to-face conversations and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a conversation tree of topics. We present models that jointly per- form two tasks: segment a thread into sub- parts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform a number of tree generation baselines
Chromatin Profiles of Chromosomally Integrated Human Herpesvirus-6A
Human herpesvirus-6A (HHV-6A) and 6B (HHV-6B) are two closely related betaherpesviruses that are associated with various diseases including seizures and encephalitis. The HHV-6A/B genomes have been shown to be present in an integrated state in the telomeres of latently infected cells. In addition, integration of HHV-6A/B in germ cells has resulted in individuals harboring this inherited chromosomally integrated HHV-6A/B (iciHHV-6) in every cell of their body. Until now, the viral transcriptome and the epigenetic modifications that contribute to the silencing of the integrated virus genome remain elusive. In the current study, we used a patient-derived iciHHV-6A cell line to assess the global viral gene expression profile by RNA-seq, and the chromatin profiles by MNase-seq and ChIP-seq analyses. In addition, we investigated an in vitro generated cell line (293-HHV-6A) that expresses GFP upon the addition of agents commonly used to induce herpesvirus reactivation such as TPA. No viral gene expression including miRNAs was detected from the HHV-6A genomes, indicating that the integrated virus is transcriptionally silent. Intriguingly, upon stimulation of the 293-HHV-6A cell line with TPA, only foreign promoters in the virus genome were activated, while all HHV-6A promoters remained completely silenced. The transcriptional silencing of latent HHV-6A was further supported by MNase-seq results, which demonstrate that the latent viral genome resides in a highly condensed nucleosome-associated state. We further explored the enrichment profiles of histone modifications via ChIP-seq analysis. Our results indicated that the HHV-6 genome is modestly enriched with the repressive histone marks H3K9me3/H3K27me3 and does not possess the active histone modifications H3K27ac/H3K4me3. Overall, these results indicate that HHV-6 genomes reside in a condensed chromatin state, providing insight into the epigenetic mechanisms associated with the silencing of the integrated HHV-6A genome
Sensory Communication
Contains table of contents for Section 2, an introduction and reports on twelve research projects.National Institutes of Health Grant R01 DC00117National Institutes of Health Grant R01 DC02032National Institutes of Health/National Institute of Deafness and Other Communication Disorders Grant 2 R01 DC00126National Institutes of Health Grant 2 R01 DC00270National Institutes of Health Contract N01 DC-5-2107National Institutes of Health Grant 2 R01 DC00100U.S. Navy - Office of Naval Research Grant N61339-96-K-0002U.S. Navy - Office of Naval Research Grant N61339-96-K-0003U.S. Navy - Office of Naval Research Grant N00014-97-1-0635U.S. Navy - Office of Naval Research Grant N00014-97-1-0655U.S. Navy - Office of Naval Research Subcontract 40167U.S. Navy - Office of Naval Research Grant N00014-96-1-0379U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0202National Institutes of Health Grant RO1 NS33778Massachusetts General Hospital, Center for Innovative Minimally Invasive Therapy Research Fellowship Gran
New genetic loci link adipose and insulin biology to body fat distribution.
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10(-8)). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms
Burden of disease scenarios for 204 countries and territories, 2022–2050: a forecasting analysis for the Global Burden of Disease Study 2021
Background: Future trends in disease burden and drivers of health are of great interest to policy makers and the public at large. This information can be used for policy and long-term health investment, planning, and prioritisation. We have expanded and improved upon previous forecasts produced as part of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) and provide a reference forecast (the most likely future), and alternative scenarios assessing disease burden trajectories if selected sets of risk factors were eliminated from current levels by 2050. Methods: Using forecasts of major drivers of health such as the Socio-demographic Index (SDI; a composite measure of lag-distributed income per capita, mean years of education, and total fertility under 25 years of age) and the full set of risk factor exposures captured by GBD, we provide cause-specific forecasts of mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) by age and sex from 2022 to 2050 for 204 countries and territories, 21 GBD regions, seven super-regions, and the world. All analyses were done at the cause-specific level so that only risk factors deemed causal by the GBD comparative risk assessment influenced future trajectories of mortality for each disease. Cause-specific mortality was modelled using mixed-effects models with SDI and time as the main covariates, and the combined impact of causal risk factors as an offset in the model. At the all-cause mortality level, we captured unexplained variation by modelling residuals with an autoregressive integrated moving average model with drift attenuation. These all-cause forecasts constrained the cause-specific forecasts at successively deeper levels of the GBD cause hierarchy using cascading mortality models, thus ensuring a robust estimate of cause-specific mortality. For non-fatal measures (eg, low back pain), incidence and prevalence were forecasted from mixed-effects models with SDI as the main covariate, and YLDs were computed from the resulting prevalence forecasts and average disability weights from GBD. Alternative future scenarios were constructed by replacing appropriate reference trajectories for risk factors with hypothetical trajectories of gradual elimination of risk factor exposure from current levels to 2050. The scenarios were constructed from various sets of risk factors: environmental risks (Safer Environment scenario), risks associated with communicable, maternal, neonatal, and nutritional diseases (CMNNs; Improved Childhood Nutrition and Vaccination scenario), risks associated with major non-communicable diseases (NCDs; Improved Behavioural and Metabolic Risks scenario), and the combined effects of these three scenarios. Using the Shared Socioeconomic Pathways climate scenarios SSP2-4.5 as reference and SSP1-1.9 as an optimistic alternative in the Safer Environment scenario, we accounted for climate change impact on health by using the most recent Intergovernmental Panel on Climate Change temperature forecasts and published trajectories of ambient air pollution for the same two scenarios. Life expectancy and healthy life expectancy were computed using standard methods. The forecasting framework includes computing the age-sex-specific future population for each location and separately for each scenario. 95% uncertainty intervals (UIs) for each individual future estimate were derived from the 2·5th and 97·5th percentiles of distributions generated from propagating 500 draws through the multistage computational pipeline. Findings: In the reference scenario forecast, global and super-regional life expectancy increased from 2022 to 2050, but improvement was at a slower pace than in the three decades preceding the COVID-19 pandemic (beginning in 2020). Gains in future life expectancy were forecasted to be greatest in super-regions with comparatively low life expectancies (such as sub-Saharan Africa) compared with super-regions with higher life expectancies (such as the high-income super-region), leading to a trend towards convergence in life expectancy across locations between now and 2050. At the super-region level, forecasted healthy life expectancy patterns were similar to those of life expectancies. Forecasts for the reference scenario found that health will improve in the coming decades, with all-cause age-standardised DALY rates decreasing in every GBD super-region. The total DALY burden measured in counts, however, will increase in every super-region, largely a function of population ageing and growth. We also forecasted that both DALY counts and age-standardised DALY rates will continue to shift from CMNNs to NCDs, with the most pronounced shifts occurring in sub-Saharan Africa (60·1% [95% UI 56·8–63·1] of DALYs were from CMNNs in 2022 compared with 35·8% [31·0–45·0] in 2050) and south Asia (31·7% [29·2–34·1] to 15·5% [13·7–17·5]). This shift is reflected in the leading global causes of DALYs, with the top four causes in 2050 being ischaemic heart disease, stroke, diabetes, and chronic obstructive pulmonary disease, compared with 2022, with ischaemic heart disease, neonatal disorders, stroke, and lower respiratory infections at the top. The global proportion of DALYs due to YLDs likewise increased from 33·8% (27·4–40·3) to 41·1% (33·9–48·1) from 2022 to 2050, demonstrating an important shift in overall disease burden towards morbidity and away from premature death. The largest shift of this kind was forecasted for sub-Saharan Africa, from 20·1% (15·6–25·3) of DALYs due to YLDs in 2022 to 35·6% (26·5–43·0) in 2050. In the assessment of alternative future scenarios, the combined effects of the scenarios (Safer Environment, Improved Childhood Nutrition and Vaccination, and Improved Behavioural and Metabolic Risks scenarios) demonstrated an important decrease in the global burden of DALYs in 2050 of 15·4% (13·5–17·5) compared with the reference scenario, with decreases across super-regions ranging from 10·4% (9·7–11·3) in the high-income super-region to 23·9% (20·7–27·3) in north Africa and the Middle East. The Safer Environment scenario had its largest decrease in sub-Saharan Africa (5·2% [3·5–6·8]), the Improved Behavioural and Metabolic Risks scenario in north Africa and the Middle East (23·2% [20·2–26·5]), and the Improved Nutrition and Vaccination scenario in sub-Saharan Africa (2·0% [–0·6 to 3·6]). Interpretation: Globally, life expectancy and age-standardised disease burden were forecasted to improve between 2022 and 2050, with the majority of the burden continuing to shift from CMNNs to NCDs. That said, continued progress on reducing the CMNN disease burden will be dependent on maintaining investment in and policy emphasis on CMNN disease prevention and treatment. Mostly due to growth and ageing of populations, the number of deaths and DALYs due to all causes combined will generally increase. By constructing alternative future scenarios wherein certain risk exposures are eliminated by 2050, we have shown that opportunities exist to substantially improve health outcomes in the future through concerted efforts to prevent exposure to well established risk factors and to expand access to key health interventions
- …