Alexandria: Extensible Framework for Rapid Exploration of Social Media
The Alexandria system under development at IBM Research provides an
extensible framework and platform for supporting a variety of big-data
analytics and visualizations. The system is currently focused on enabling rapid
exploration of text-based social media data. The system provides tools to help
with constructing "domain models" (i.e., families of keywords and extractors to
enable focus on tweets and other social media documents relevant to a project),
to rapidly extract and segment the relevant social media and its authors, to
apply further analytics (such as finding trends and anomalous terms), and
to visualize the results. The system architecture is centered around a variety
of REST-based service APIs to enable flexible orchestration of the system
capabilities; these are especially useful to support knowledge-worker driven
iterative exploration of social phenomena. The architecture also enables rapid
integration of Alexandria capabilities with other social media analytics
systems, as has been demonstrated through an integration with IBM Research's
SystemG. This paper describes a prototypical usage scenario for Alexandria,
along with the architecture and key underlying analytics.
Comment: 8 page
Statistical Inferences for Polarity Identification in Natural Language
Information forms the basis for all human behavior, including the ubiquitous
decision-making that people constantly perform in their everyday lives. It is
thus the mission of researchers to understand how humans process information to
reach decisions. In order to facilitate this task, this work proposes a novel
method of studying the reception of granular expressions in natural language.
The approach utilizes LASSO regularization as a statistical tool to extract
decisive words from textual content and draw statistical inferences based on
the correspondence between the occurrences of words and an exogenous response
variable. Accordingly, the method immediately suggests significant implications
for social sciences and Information Systems research: everyone can now identify
text segments and word choices that are statistically relevant to authors or
readers and, based on this knowledge, test hypotheses from behavioral research.
We demonstrate the contribution of our method by examining how authors
communicate subjective information through narrative materials. This allows us
to answer the question of which words to choose when communicating negative
information. In addition, we show that investors trade not only upon
facts in financial disclosures but are also distracted by filler words and
non-informative language. Practitioners - for example those in the fields of
investor communications or marketing - can exploit our insights to enhance
their writings based on the true perception of word choice
- …
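The word-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain bag-of-words count representation, solves the LASSO objective with a standard coordinate-descent loop, and uses hypothetical names (`soft_threshold`, `lasso_word_weights`) and invented toy sentences purely for illustration. Words whose coefficient is shrunk to exactly zero are dropped; the remaining words are the candidates for being statistically related to the response.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_word_weights(docs, response, lam=0.1, n_iter=200):
    """Fit a LASSO regression of `response` on word counts (hypothetical helper).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent,
    where X holds centered word counts and y is the centered response.
    Returns only the words with a nonzero coefficient.
    """
    n = len(docs)
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})

    # Centered count column for every word (assumption: whitespace tokenization).
    columns = {}
    for w in vocab:
        counts = [float(t.count(w)) for t in tokens]
        mean = sum(counts) / n
        columns[w] = [c - mean for c in counts]

    y_mean = sum(response) / n
    resid = [y - y_mean for y in response]  # residual y - X b, with b = 0
    coef = {w: 0.0 for w in vocab}

    for _ in range(n_iter):
        for w in vocab:
            x = columns[w]
            z = sum(v * v for v in x) / n
            if z == 0.0:
                continue  # word occurs equally often in every document
            # Correlation of the column with the residual, excluding this word.
            rho = sum(xi * (ri + xi * coef[w]) for xi, ri in zip(x, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - coef[w]
            if delta != 0.0:
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                coef[w] = new

    return {w: b for w, b in coef.items() if b != 0.0}


# Invented toy data: short texts and an exogenous response (e.g. a return).
toy_docs = [
    "profit growth strong", "loss decline weak", "profit strong the",
    "loss weak the", "the growth profit", "the decline loss",
]
toy_response = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(lasso_word_weights(toy_docs, toy_response, lam=0.1))
```

A larger `lam` removes more words from the result; a penalty large enough relative to the data leaves no words selected at all. Drawing the statistical inferences the abstract mentions would need additional machinery beyond this sketch.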