72,426 research outputs found
Mining association language patterns using a distributional semantic model for negative life event classification
AbstractPurposeNegative life events, such as the death of a family member, an argument with a spouse or the loss of a job, play an important role in triggering depressive episodes. Therefore, it is worthwhile to develop psychiatric services that can automatically identify such events. This study describes the use of association language patterns, i.e., meaningful combinations of words (e.g., <loss, job>), as features to classify sentences with negative life events into predefined categories (e.g., Family, Love, Work).MethodsThis study proposes a framework that combines a supervised data mining algorithm and an unsupervised distributional semantic model to discover association language patterns. The data mining algorithm, called association rule mining, was used to generate a set of seed patterns by incrementally associating frequently co-occurring words from a small corpus of sentences labeled with negative life events. The distributional semantic model was then used to discover more patterns similar to the seed patterns from a large, unlabeled web corpus.ResultsThe experimental results showed that association language patterns were significant features for negative life event classification. Additionally, the unsupervised distributional semantic model was not only able to improve the level of performance but also to reduce the reliance of the classification process on the availability of a large, labeled corpus
Social media mining for identification and exploration of health-related information from pregnant women
Widespread use of social media has led to the generation of substantial
amounts of information about individuals, including health-related information.
Social media provides the opportunity to study health-related information about
selected population groups who may be of interest for a particular study. In
this paper, we explore the possibility of utilizing social media to perform
targeted data collection and analysis from a particular population group --
pregnant women. We hypothesize that we can use social media to identify cohorts
of pregnant women and follow them over time to analyze crucial health-related
information. To identify potentially pregnant women, we employ simple
rule-based searches that attempt to detect pregnancy announcements with
moderate precision. To further filter out false positives and noise, we employ
a supervised classifier using a small number of hand-annotated data. We then
collect their posts over time to create longitudinal health timelines and
attempt to divide the timelines into different pregnancy trimesters. Finally,
we assess the usefulness of the timelines by performing a preliminary analysis
to estimate drug intake patterns of our cohort at different trimesters. Our
rule-based cohort identification technique collected 53,820 users over thirty
months from Twitter. Our pregnancy announcement classification technique
achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user
timelines. Analysis of the timelines revealed that pertinent health-related
information, such as drug-intake and adverse reactions can be mined from the
data. Our approach to using user timelines in this fashion has produced very
encouraging results and can be employed for other important tasks where
cohorts, for which health-related information may not be available from other
sources, are required to be followed over time to derive population-based
estimates.Comment: 9 page
Mining Event Logs to Support Workflow Resource Allocation
Workflow technology is widely used to facilitate the business process in
enterprise information systems (EIS), and it has the potential to reduce design
time, enhance product quality and decrease product cost. However, significant
limitations still exist: as an important task in the context of workflow, many
present resource allocation operations are still performed manually, which are
time-consuming. This paper presents a data mining approach to address the
resource allocation problem (RAP) and improve the productivity of workflow
resource management. Specifically, an Apriori-like algorithm is used to find
the frequent patterns from the event log, and association rules are generated
according to predefined resource allocation constraints. Subsequently, a
correlation measure named lift is utilized to annotate the negatively
correlated resource allocation rules for resource reservation. Finally, the
rules are ranked using the confidence measures as resource allocation rules.
Comparative experiments are performed using C4.5, SVM, ID3, Na\"ive Bayes and
the presented approach, and the results show that the presented approach is
effective in both accuracy and candidate resource recommendations.Comment: T. Liu et al., Mining event logs to support workflow resource
allocation, Knowl. Based Syst. (2012), http://dx.doi.org/
10.1016/j.knosys.2012.05.01
Discovering conversational topics and emotions associated with Demonetization tweets in India
Social media platforms contain great wealth of information which provides us
opportunities explore hidden patterns or unknown correlations, and understand
people's satisfaction with what they are discussing. As one showcase, in this
paper, we summarize the data set of Twitter messages related to recent
demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights
from Twitter's data. Our proposed system automatically extracts the popular
latent topics in conversations regarding demonetization discussed in Twitter
via the Latent Dirichlet Allocation (LDA) based topic model and also identifies
the correlated topics across different categories. Additionally, it also
discovers people's opinions expressed through their tweets related to the event
under consideration via the emotion analyzer. The system also employs an
intuitive and informative visualization to show the uncovered insight.
Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI),
to select the best LDA models. The obtained LDA results show that the tool can
be effectively used to extract discussion topics and summarize them for further
manual analysis.Comment: 6 pages, 11 figures. arXiv admin note: substantial text overlap with
arXiv:1608.02519 by other authors; text overlap with arXiv:1705.08094 by
other author
- …