4,544 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Extracting News Events from Microblogs
The Twitter stream has become a large source of information for many people, but
the sheer volume of tweets and the noisy nature of their content have long made
harvesting knowledge from Twitter a challenging task for researchers. To
overcome some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach to detecting
current (real) news events delivers state-of-the-art performance.
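The third step of the approach above, ranking events by cluster size and growth speed of tweet frequencies, can be illustrated with a minimal sketch. The abstract does not say how the two signals are combined, so the weighted sum below, the `EventCluster` class, the per-window count representation, and the sample data are all assumptions made for illustration and not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventCluster:
    """Hypothetical container for one detected news event."""
    label: str
    # Tweet counts per consecutive time window, oldest first (assumed representation).
    counts_per_window: List[int]

    @property
    def size(self) -> int:
        # Cluster size: total number of tweets assigned to the event.
        return sum(self.counts_per_window)

    @property
    def growth(self) -> float:
        # Growth speed: average change in tweet count per window.
        counts = self.counts_per_window
        if len(counts) < 2:
            return 0.0
        return (counts[-1] - counts[0]) / (len(counts) - 1)


def rank_events(clusters, size_weight=1.0, growth_weight=1.0):
    """Sort events by an assumed weighted sum of size and non-negative growth."""
    def score(cluster):
        # Shrinking events get no growth bonus (an assumption, not from the source).
        return size_weight * cluster.size + growth_weight * max(cluster.growth, 0.0)
    return sorted(clusters, key=score, reverse=True)


# Illustrative data only; not taken from the paper's corpus.
events = [
    EventCluster("storm", [5, 10, 40]),
    EventCluster("match", [30, 20, 10]),
    EventCluster("launch", [2, 3, 4]),
]
ranked = [cluster.label for cluster in rank_events(events)]
print(ranked)
```

With these sample counts, a fast-growing event can outrank a slightly larger but shrinking one; adjusting `size_weight` and `growth_weight` changes that trade-off.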
Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions
In this work, we explore video lecture interaction in Massive Open Online
Courses (MOOCs), which is central to student learning experience on these
educational platforms. As a research contribution, we operationalize video
lecture clickstreams of students into cognitively plausible higher level
behaviors, and construct a quantitative information processing index, which can
help instructors better understand MOOC hurdles and reason about
unsatisfactory learning outcomes. Our results illustrate how such a metric
inspired by cognitive psychology can help answer critical questions regarding
students' engagement, their future click interactions and participation
trajectories that lead to in-video & course dropouts. Implications for research
and practice are discussed.
Text Data Analysis in Chinese Folk Music with Effective Clustering Model toward Feature Identification of Inheritance
Big data analysis of folk music can provide valuable insights into the history, culture, and evolution of traditional music. By understanding the historical and cultural contexts of folk music, one can better appreciate its value and contribute to its continued development and inheritance. Big data analysis can help identify patterns and trends in the performance, distribution, and reception of folk music across time and space. This paper designs a Weighted Clustering Euclidean Feature (WCEF) model to evaluate folk music with respect to the development of inheritance. First, text data is extracted from folk music for feature estimation in the big data analysis. Second, the WCEF model clusters a subset of the folk music dataset using Weighted Non-Negative Matrix Factorization (WNMF). On the clustered data, feature extraction is performed with Named Entity Recognition (NER). The NER model uses Euclidean distance estimation to compute features in the folk data analysis. Finally, the WCEF model uses a deep learning model to classify inheritance in folk music. The experimental analysis indicates that the WCEF model effectively classifies folk music words and their contribution to inheritance.
Can Conversations on Reddit Forecast Future Economic Uncertainty? An Interpretable Machine Learning Approach
In recent years, social media has become an indispensable source of information through which public attitudes, opinions, and concerns can be studied and quantified. This paper proposes an interpretable machine learning framework for predicting the Equity Market-related Economic Uncertainty Index using features generated from a popular discussion forum on Reddit. Our framework consists of a series of custom preprocessing and analytics methods, including BERTopic for latent topic identification and regularized linear models. Using our framework, we evaluate explanatory models with different configurations over a large corpus of Reddit posts belonging to the personal finance category. Our analysis generates valuable insights about discussion topics on Reddit and their efficacy in accurately predicting future economic uncertainty. The study demonstrates the potential of using social media data and interpretable machine learning to inform economic forecasting research.