705 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    MetaRec: Meta-Learning Meets Recommendation Systems

    Get PDF
    Artificial neural networks (ANNs) have recently received increasing attention as powerful modeling tools to improve the performance of recommendation systems. Meta-learning, on the other hand, is a paradigm that has re-surged in popularity within the broader machine learning community over the past several years. In this thesis, we will explore the intersection of these two domains and work on developing methods for integrating meta-learning to design more accurate and flexible recommendation systems. In the present work, we propose a meta-learning framework for the design of collaborative filtering methods in recommendation systems, drawing from ideas, models, and solutions from modern approaches in both the meta-learning and recommendation system literature, applying them to recommendation tasks to obtain improved generalization performance. Our proposed framework, MetaRec, includes and unifies the main state-of-the-art models in recommendation systems, extending them to be flexibly configured and efficiently operate with limited data. We empirically test the architectures created under our MetaRec framework on several recommendation benchmark datasets using a plethora of evaluation metrics and find that by taking a meta-learning approach to the collaborative filtering problem, we observe notable gains in predictive performance

    Learning from Structured Data with High Dimensional Structured Input and Output Domain

    Get PDF
    Structured data is accumulated rapidly in many applications, e.g. Bioinformatics, Cheminformatics, social network analysis, natural language processing and text mining. Designing and analyzing algorithms for handling these large collections of structured data has received significant interests in data mining and machine learning communities, both in the input and output domain. However, it is nontrivial to adopt traditional machine learning algorithms, e.g. SVM, linear regression to structured data. For one thing, the structural information in the input domain and output domain is ignored if applying the normal algorithms to structured data. For another, the major challenge in learning from many high-dimensional structured data is that input/output domain can contain tens of thousands even larger number of features and labels. With the high dimensional structured input space and/or structured output space, learning a low dimensional and consistent structured predictive function is important for both robustness and interpretability of the model. In this dissertation, we will present a few machine learning models that learn from the data with structured input features and structured output tasks. For learning from the data with structured input features, I have developed structured sparse boosting for graph classification, structured joint sparse PCA for anomaly detection and localization. Besides learning from structured input, I also investigated the interplay between structured input and output under the context of multi-task learning. In particular, I designed a multi-task learning algorithms that performs structured feature selection & task relationship Inference. We will demonstrate the applications of these structured models on subgraph based graph classification, networked data stream anomaly detection/localization, multiple cancer type prediction, neuron activity prediction and social behavior prediction. Finally, through my intern work at IBM T.J. Watson Research, I will demonstrate how to leverage structural information from mobile data (e.g. call detail record and GPS data) to derive important places from people's daily life for transit optimization and urban planning

    USER PROFILING AND PRIVACY PRESERVING FROM MULTIPLE SOCIAL NETWORKS

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Data Mining

    Get PDF
    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining

    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Get PDF
    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that address these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves comparable accuracy to traditional POI recommender systems that do not consider privacy while providing significant privacy guarantees for users. The research\u27s contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems and offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems
    corecore