4,544 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
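    As a small illustration of the class imbalance challenge named above, the sketch below computes inverse-frequency class weights, one common heuristic for making a rare class count more during training. It is not taken from the review; the function name and the "case"/"control" labels are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration; rare classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative, made-up cohort: 10 cases versus 90 controls.
labels = ["case"] * 10 + ["control"] * 90
weights = balanced_class_weights(labels)
```

    With these made-up labels the rare "case" class receives a weight nine times larger than "control", so each class contributes equally in aggregate to a weighted loss.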

    Extracting News Events from Microblogs

    The Twitter stream has become a major source of information for many people, but the volume of tweets and the noisy nature of their content have long made harvesting knowledge from Twitter a challenging task for researchers. To overcome some of the main challenges of extracting hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning model to detect news-relevant tweets in the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and the growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach to detecting current (real) news events delivers state-of-the-art performance.
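    The third step, ranking events by cluster size and growth speed, can be sketched as follows. The abstract does not give the scoring formula, so the growth definition, the equal weighting, and all names here are assumptions for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventCluster:
    label: str
    counts: List[int]  # tweets per time window, oldest first

    @property
    def size(self) -> int:
        # Total number of tweets assigned to the event cluster.
        return sum(self.counts)

    @property
    def growth(self) -> float:
        # Assumed definition: latest window relative to the mean of earlier
        # windows, with +1 smoothing to avoid division by zero.
        if len(self.counts) < 2:
            return 0.0
        earlier = self.counts[:-1]
        baseline = sum(earlier) / len(earlier)
        return self.counts[-1] / (baseline + 1.0)


def rank_events(clusters, size_weight=0.5):
    """Sort clusters by a weighted sum of normalized size and growth."""
    max_size = max((c.size for c in clusters), default=0) or 1
    max_growth = max((c.growth for c in clusters), default=0.0) or 1.0

    def score(c):
        return (size_weight * c.size / max_size
                + (1.0 - size_weight) * c.growth / max_growth)

    return sorted(clusters, key=score, reverse=True)


# Made-up clusters: steady, bursting, and small.
clusters = [
    EventCluster("steady", [10, 10, 10]),
    EventCluster("bursting", [1, 2, 30]),
    EventCluster("small", [1, 1, 1]),
]
ranked = [c.label for c in rank_events(clusters)]
```

    In this toy input the bursting cluster ranks first because it is both the largest and the fastest growing; changing `size_weight` shifts the balance between the two signals.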

    Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions

    In this work, we explore video lecture interaction in Massive Open Online Courses (MOOCs), which is central to the student learning experience on these educational platforms. As a research contribution, we operationalize students' video lecture clickstreams into cognitively plausible higher-level behaviors and construct a quantitative information processing index, which can help instructors better understand MOOC hurdles and reason about unsatisfactory learning outcomes. Our results illustrate how such a metric, inspired by cognitive psychology, can help answer critical questions regarding students' engagement, their future click interactions, and the participation trajectories that lead to in-video and course dropouts. Implications for research and practice are discussed.

    Text Data Analysis in Chinese Folk Music with Effective Clustering Model toward Feature Identification of Inheritance

    Big data analysis of folk music can provide valuable insights into the history, culture, and evolution of traditional music. By understanding the historical and cultural contexts of folk music, one can better appreciate its value and contribute to its continued development and inheritance. Big data analysis can help identify patterns and trends in the performance, distribution, and reception of folk music across time and space. This paper designs a Weighted Clustering Euclidean Feature (WCEF) model to evaluate folk music with respect to the development of inheritance. First, text data is extracted from folk music to estimate features for the big data analysis. Second, the WCEF model clusters a subset of the folk music dataset with Weighted Non-Negative Matrix Factorization (WNMF). On the clustered output, features are extracted with Named Entity Recognition (NER); the NER model uses Euclidean distance estimation to compute features in the folk data analysis. Finally, the WCEF model uses a deep learning model to classify inheritance in folk music. The experimental analysis indicates that the WCEF model effectively classifies folk music words and their contribution to inheritance.

    Can Conversations on Reddit Forecast Future Economic Uncertainty? An Interpretable Machine Learning Approach

    In recent years, social media has become an indispensable source of information through which public attitudes, opinions, and concerns can be studied and quantified. This paper proposes an interpretable machine learning framework for predicting the Equity Market-related Economic Uncertainty Index using features generated from a popular discussion forum on Reddit. Our framework consists of a series of custom preprocessing and analytics methods, including BERTopic for latent topic identification and regularized linear models. Using our framework, we evaluate explanatory models with different configurations over a large corpus of Reddit posts belonging to the personal finance category. Our analysis generates valuable insights about discussion topics on Reddit and their efficacy in accurately predicting future economic uncertainty. The study demonstrates the potential of using social media data and interpretable machine learning to inform economic forecasting research.