14 research outputs found
Ranking, Labeling, and Summarizing Short Text in Social Media
One of the key features driving the growth and success of the Social Web is large-scale participation through user-contributed content – often through short text in social media. Unlike traditional long-form documents – e.g., Web pages, blog posts – these short text resources are typically quite brief (on the order of 100s of characters), often of a personal nature (reflecting opinions and reactions of users), and being generated at an explosive rate. Coupled with this explosion of short text in social media is the need for new methods to organize, monitor, and distill relevant information from these large-scale social systems, even in the face of the inherent “messiness” of short text, considering the wide variability in quality, style, and substance of short text generated by a legion of Social Web participants.
Hence, this dissertation seeks to develop new algorithms and methods to ensure the continued growth of the Social Web by enhancing how users engage with short text in social media. Concretely, this dissertation takes a three-fold approach:
First, this dissertation develops a learning-based algorithm to automatically rank short text comments associated with a Social Web object (e.g., Web document, image, video) based on the expressed preferences of the community itself, so that low-quality short text may be filtered and user attention may be focused on highly-ranked short text.
Second, this dissertation organizes short text through labeling, via a graph- based framework for automatically assigning relevant labels to short text. In this way meaningful semantic descriptors may be assigned to short text for improved classification, browsing, and visualization.
Third, this dissertation presents a cluster-based summarization approach for extracting high-quality viewpoints expressed in a collection of short text, while maintaining diverse viewpoints. By summarizing short text, user attention may quickly assess the aggregate viewpoints expressed in a collection of short text, without the need to scan each of possibly thousands of short text items
Alexandria: Extensible Framework for Rapid Exploration of Social Media
The Alexandria system under development at IBM Research provides an
extensible framework and platform for supporting a variety of big-data
analytics and visualizations. The system is currently focused on enabling rapid
exploration of text-based social media data. The system provides tools to help
with constructing "domain models" (i.e., families of keywords and extractors to
enable focus on tweets and other social media documents relevant to a project),
to rapidly extract and segment the relevant social media and its authors, to
apply further analytics (such as finding trends and anomalous terms), and
visualizing the results. The system architecture is centered around a variety
of REST-based service APIs to enable flexible orchestration of the system
capabilities; these are especially useful to support knowledge-worker driven
iterative exploration of social phenomena. The architecture also enables rapid
integration of Alexandria capabilities with other social media analytics
system, as has been demonstrated through an integration with IBM Research's
SystemG. This paper describes a prototypical usage scenario for Alexandria,
along with the architecture and key underlying analytics.Comment: 8 page
Mapping 123 million neonatal, infant and child deaths between 2000 and 2017
Since 2000, many countries have achieved considerable success in improving child survival, but localized progress remains unclear. To inform efforts towards United Nations Sustainable Development Goal 3.2—to end preventable child deaths by 2030—we need consistently estimated data at the subnational level regarding child mortality rates and trends. Here we quantified, for the period 2000–2017, the subnational variation in mortality rates and number of deaths of neonates, infants and children under 5 years of age within 99 low- and middle-income countries using a geostatistical survival model. We estimated that 32% of children under 5 in these countries lived in districts that had attained rates of 25 or fewer child deaths per 1,000 live births by 2017, and that 58% of child deaths between 2000 and 2017 in these countries could have been averted in the absence of geographical inequality. This study enables the identification of high-mortality clusters, patterns of progress and geographical inequalities to inform appropriate investments and implementations that will help to improve the health of all populations
Summarizing User-Contributed Comments
User-contributed comments are one of the hallmarks of the Social Web, widely adopted across social media sites and mainstream news providers alike. While comments encourage higher-levels of user engagement with online media, their wide success places new burdens on users to process and assimilate the perspectives of a huge number of user-contributed perspectives. Toward overcoming this problem we study in this paper the comment summarization problem: for a set of n user-contributed comments associated with an online resource, select the best top-k comments for summarization. In this paper we propose (i) a clustering-based approach for identifying correlated groups of comments; and (ii) a precedence-based ranking framework for automatically selecting informative user-contributed comments. We find that in combination, these two salient features yield promising results
Probabilistic generative models of the social annotation process
Abstract—With the growth in the past few years of social tagging services like Delicious and CiteULike, there is growing interest in modeling and mining these social systems for deriving implicit social collective intelligence. In this paper, we propose and explore two probabilistic generative models of the social annotation (or tagging) process with an emphasis on user participation. These models leverage the inherent social communities implicit in these tagging services. We compare the proposed models to two prominent probabilistic topic models (Latent Dirichlet Allocation and Pachinko Allocation) via an experimental study of the popular Delicious tagging service. We find that the proposed community-based annotation models identify more coherent implicit structures than the alternatives and are better suited to handle unseen social annotation data. I