52 research outputs found

    News Comments: Exploring, Modeling, and Online Prediction

    Abstract. Online news agents provide commenting facilities for their readers to express their opinions or sentiments with regard to news stories. The number of user-supplied comments on a news article may be indicative of its importance, interestingness, or impact. We explore the news comments space, and compare the log-normal and the negative binomial distributions for modeling comments from various news agents. These estimated models can be used to normalize raw comment counts and enable comparison across different news sites. We also examine the feasibility of online prediction of the number of comments, based on the volume observed shortly after publication. We report on solid performance for predicting news comment volume in the long run, after short observation. This prediction can be useful for identifying news stories with the potential to “take off,” and can be used to support front page optimization for news sites.
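The normalization idea in this abstract can be illustrated with a minimal sketch. It assumes a log-normal model fitted per site by taking the mean and standard deviation of the log counts; the function names and the sample counts are hypothetical and are not taken from the paper, which also considers the negative binomial distribution.

```python
import math
import statistics

def fit_lognormal(counts):
    """Estimate (mu, sigma) of the log comment counts for one site.

    Zero counts are skipped because log(0) is undefined (an assumption
    of this sketch, not a choice documented in the abstract).
    """
    logs = [math.log(c) for c in counts if c > 0]
    return statistics.mean(logs), statistics.stdev(logs)

def normalized_score(count, mu, sigma):
    """Z-score of a comment count in log space, comparable across sites."""
    return (math.log(count) - mu) / sigma

# Hypothetical comment counts for a low-traffic and a high-traffic site.
site_a = [5, 12, 30, 8, 100, 45]
site_b = [200, 950, 400, 3000, 1200, 650]

mu_a, sigma_a = fit_lognormal(site_a)
mu_b, sigma_b = fit_lognormal(site_b)

# The same raw count of 100 is high for site A and low for site B.
score_a = normalized_score(100, mu_a, sigma_a)
score_b = normalized_score(100, mu_b, sigma_b)
```

Under this sketch, a raw count is only meaningful relative to the fitted distribution of the site it came from, which is what makes cross-site comparison possible.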

    Recipient Recommendation in Enterprises using Communication Graphs and Email Content

    Abstract. We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who is most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.

    Blog feed search with a post index

    User-generated content forms an important domain for mining knowledge. In this paper, we address the task of blog feed search: to find blogs that are principally devoted to a given topic, as opposed to blogs that merely happen to mention the topic in passing. The large number of blogs makes the blogosphere a challenging domain, both in terms of effectiveness and of storage and retrieval efficiency. We examine the effectiveness of an approach to blog feed search that is based on individual posts as indexing units (instead of full blogs). Working in the setting of a probabilistic language modeling approach to information retrieval, we model the blog feed search task by aggregating over a blogger’s posts to collect evidence of relevance to the topic and persistence of interest in the topic. This approach achieves state-of-the-art performance in terms of effectiveness. We then introduce a two-stage model where a pre-selection of candidate blogs is followed by a ranking step. The model integrates aggressive pruning techniques as well as very lean representations of the contents of blog posts, resulting in substantial gains in efficiency while maintaining effectiveness at a very competitive level.
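The two-stage idea described in this abstract (pre-select candidate blogs from post-level evidence, then rank them) might be sketched as follows. This is an illustration under stated assumptions, not the paper's model: post scores are taken as given, a blog's score is the sum of its post scores divided by its total number of posts, and all names and numbers are hypothetical.

```python
from collections import defaultdict

def rank_blogs(post_scores, post_to_blog, blog_sizes, top_posts=3):
    """Two-stage blog ranking from post-level relevance scores.

    post_scores:  post id -> relevance score for the query (assumed given)
    post_to_blog: post id -> parent blog id
    blog_sizes:   blog id -> total number of posts in the blog
    """
    # Stage 1: keep only the parent blogs of the top-scoring posts.
    top = sorted(post_scores, key=post_scores.get, reverse=True)[:top_posts]
    candidates = {post_to_blog[p] for p in top}

    # Stage 2: aggregate post evidence per candidate blog. Dividing by the
    # blog size rewards blogs that stay on topic across many posts.
    totals = defaultdict(float)
    for post, score in post_scores.items():
        blog = post_to_blog[post]
        if blog in candidates:
            totals[blog] += score
    scored = [(blog, totals[blog] / blog_sizes[blog]) for blog in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical data: blog A is devoted to the topic, blog B mentions it once.
post_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.7, "p4": 0.95, "p5": 0.1}
post_to_blog = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "C"}
blog_sizes = {"A": 3, "B": 10, "C": 1}

ranking = rank_blogs(post_scores, post_to_blog, blog_sizes)
```

In this example blog B owns the single best post, but blog A ranks first because its relevance persists across all of its posts, which matches the distinction the abstract draws between blogs devoted to a topic and blogs that mention it in passing.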

    Clinical characteristics of women captured by extending the definition of severe postpartum haemorrhage with 'refractoriness to treatment': a cohort study

    Background: The absence of a uniform and clinically relevant definition of severe postpartum haemorrhage hampers comparative studies and optimization of clinical management. The concept of persistent postpartum haemorrhage, based on refractoriness to initial first-line treatment, was proposed as an alternative to common definitions that are either based on estimations of blood loss or transfused units of packed red blood cells (RBC). We compared characteristics and outcomes of women with severe postpartum haemorrhage captured by these three types of definitions. Methods: In this large retrospective cohort study in 61 hospitals in the Netherlands we included 1391 consecutive women with postpartum haemorrhage who received either ≥4 units of RBC or a multicomponent transfusion. Clinical characteristics and outcomes of women with severe postpartum haemorrhage defined as persistent postpartum haemorrhage were compared to definitions based on estimated blood loss or transfused units of RBC within 24 h following birth. Adverse maternal outcome was a composite of maternal mortality, hysterectomy, arterial embolisation and intensive care unit admission. Results: One thousand two hundred sixty out of 1391 women (90.6%) with postpartum haemorrhage fulfilled the definition of persistent postpartum haemorrhage. The majority, 820/1260 (65.1%), fulfilled this definition within 1 h following birth, compared to 819/1391 (58.7%) applying the definition of ≥1 L blood loss and 37/845 (4.4%) applying the definition of ≥4 units of RBC. The definition persistent postpartum haemorrhage captured 430/471 adverse maternal outcomes (91.3%), compared to 471/471 (100%) for ≥1 L blood loss and 383/471 (81.3%) for ≥4 units of RBC. Persistent postpartum haemorrhage did not capture all adverse outcomes because of missing data on timing of initial, first-line treatment. Conclusion: The definition persistent postpartum haemo

    A two-stage model for blog feed search

    Abstract. We consider blog feed search: identifying relevant blogs for a given topic. An individual's search behavior often involves a combination of exploratory behavior triggered by salient features of the information objects being examined plus goal-directed in-depth information seeking behavior. We present a two-stage blog feed search model that directly builds on this insight. We first rank blog posts for a given topic, and use their parent blogs as a selection of blogs that we rank using a blog-based model.

    A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections

    User-generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user’s information need and documents in a specific user-generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighting query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. Different instantiations of our model are discussed and make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user-generated content is effective; in addition, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.
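As a rough illustration of expanding a query from an external collection, the sketch below selects the external documents that best match the query and adds their most frequent non-query terms with relative-frequency weights. It is a simplified frequency-based stand-in, not the generative model proposed in the abstract; the function name, parameters, and toy documents are assumptions.

```python
from collections import Counter

def expand_query(query_terms, external_docs, n_docs=2, n_terms=2):
    """Add and reweight query terms using an external, edited collection.

    external_docs is a list of tokenized documents (lists of terms).
    Original query terms keep weight 1.0; expansion terms are weighted by
    their relative frequency in the selected external documents.
    """
    query = set(query_terms)

    # Condition the external collection on the query: rank documents by
    # how often they mention query terms and keep the best matches.
    ranked = sorted(
        external_docs,
        key=lambda doc: sum(doc.count(term) for term in query),
        reverse=True,
    )
    feedback = [doc for doc in ranked[:n_docs] if any(t in doc for t in query)]

    # Candidate expansion terms depend only on the selected documents.
    counts = Counter(t for doc in feedback for t in doc if t not in query)
    total = sum(counts.values())

    weights = {term: 1.0 for term in query_terms}
    if total:
        for term, count in counts.most_common(n_terms):
            weights[term] = count / total
    return weights

# Hypothetical external news collection.
news = [
    ["election", "vote", "poll", "candidate", "vote"],
    ["football", "goal", "match"],
    ["election", "candidate", "debate", "vote"],
]

expanded = expand_query(["election"], news)
```

The expanded query can then be run against the noisy target collection, so that posts using "vote" or "candidate" can match even when they never mention "election".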