
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
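    As a concrete illustration of two of these challenges, the sketch below shows early (feature-level) integration of two omics layers followed by SVD-based dimensionality reduction. The cohort size, layer names, and component count are illustrative assumptions, not details from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: 100 patients with two omics layers measured on each.
genome = rng.normal(size=(100, 5000))         # e.g., SNP dosages
transcriptome = rng.normal(size=(100, 2000))  # e.g., gene expression

# Early (feature-level) integration: standardize each layer, then concatenate.
def zscore(x):
    return (x - x.mean(axis=0)) / x.std(axis=0)

fused = np.hstack([zscore(genome), zscore(transcriptome)])  # 100 x 7000

# Curse of dimensionality: far more features than samples, so project the
# fused matrix onto its top principal components via SVD.
centered = fused - fused.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
k = 10
reduced = centered @ vt[:k].T  # 100 x 10 representation for downstream ML

print(reduced.shape)  # (100, 10)
```

Downstream classifiers or survival models would then operate on `reduced` rather than on the 7000-dimensional fused matrix.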

    Collaborative Filtering Based Recommendation System: A survey

    Abstract—The most common technique used for recommendations is collaborative filtering. Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships from a group of users who share the same preferences and tastes. In this paper, we explore various aspects of collaborative filtering recommendation systems. We categorize collaborative filtering recommendation systems and show how similarity is computed. The desired criteria for data set selection are also listed. The measures used for evaluating the performance of collaborative filtering recommendation systems are discussed, along with the challenges these systems face. The types of rating that can be collected from users are also discussed, along with the uses of collaborative filtering recommendation systems.
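    A minimal sketch of the core computation described above: user-based collaborative filtering with cosine similarity over co-rated items. The toy rating matrix is invented for illustration and is not from the survey.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values are illustrative only.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity computed only over items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    va, vb = a[mask], b[mask]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def predict(R, user, item):
    """Predict a rating as a similarity-weighted average over co-rating users."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue
        s = cosine_sim(R[user], R[other])
        num += s * R[other, item]
        den += abs(s)
    return num / den if den else 0.0

print(round(predict(R, 1, 1), 2))
```

Item-based variants transpose the same idea, comparing item columns instead of user rows; Pearson correlation is a common drop-in replacement for cosine similarity when rating scales differ between users.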

    Enhance NMF-Based Recommendation Systems with Auxiliary Information Imputation

    This dissertation studies the factors that negatively impact the accuracy of collaborative filtering recommendation systems based on nonnegative matrix factorization (NMF). The keystone of a recommendation system is the rating, which expresses the user's opinion about an item. One of the most significant issues in recommendation systems is the lack of ratings. This is called the cold-start issue, and it appears most clearly with New-Users, who have not rated any items, and New-Items, which have not received any ratings. Traditional recommendation systems assume that users are independent and identically distributed and ignore the connections among users, whereas recommendation is actually a social activity. This dissertation aims to enhance NMF-based recommendation systems by utilizing imputation while limiting the errors it introduces into the system. External information, such as a trust network and item categories, is incorporated into NMF-based recommendation systems through imputation. The proposed approaches impute various subsets of the missing ratings. The subsets are defined based on the total number of ratings a user or item has before imputation; for example, imputing the missing ratings of New-Users, New-Items, or cold-start users and items that suffer from a lack of ratings. In addition, several factors that affect prediction accuracy when imputation is used with NMF-based recommendation systems are analyzed. These factors include the total number of ratings of the user or item before imputation, the total number of imputed ratings for each user and item, the average of the imputed rating values, and the imputed rating values themselves. Several strategies are also applied to select the subset of missing ratings for imputation in a way that increases prediction accuracy and limits imputation error.
    Moreover, a comparison is conducted with some popular methods that, like the proposed method, use imputation to handle the lack of ratings but differ in the source of the imputed ratings. Experiments on several large datasets are conducted to examine the proposed approaches and analyze the effects of imputation on accuracy. Users and items are divided into three groups based on their total number of ratings before imputation, and the recommendation accuracy for each group is calculated. The results show that imputation enhances the recommendation system by enabling it to recommend items to New-Users, introduce New-Items to users, and increase accuracy for cold-start users and items. The analyzed factors, however, play important roles in recommendation accuracy and in limiting the error introduced by imputation.
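    The general mechanism, imputing a cold-start user's missing ratings before factorization, can be sketched as follows. This is a simplified stand-in: it imputes with per-item means rather than the trust-network and item-category information used in the dissertation, and it applies plain multiplicative-update NMF to the imputed matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy rating matrix with missing entries as 0; the last row is a "New-User".
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 0, 0],   # cold-start user with no ratings at all
], dtype=float)

# Imputation step (stand-in for trust/category side information): fill the
# cold-start user's row with per-item means of the observed ratings.
item_means = R.sum(axis=0) / np.maximum((R > 0).sum(axis=0), 1)
R_imp = R.copy()
R_imp[3] = item_means

# Plain multiplicative-update NMF on the imputed matrix (rank k = 2).
k = 2
W = rng.random((R.shape[0], k)) + 0.1
H = rng.random((k, R.shape[1])) + 0.1
for _ in range(200):
    H *= (W.T @ R_imp) / np.maximum(W.T @ W @ H, 1e-9)
    W *= (R_imp @ H.T) / np.maximum(W @ H @ H.T, 1e-9)

pred = W @ H  # predicted ratings, now defined even for the New-User row
print(np.round(pred[3], 1))
```

Without the imputation step, the New-User row would be all zeros and the factorization could say nothing about that user; with it, the system can produce recommendations immediately, at the cost of whatever error the imputed values introduce.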

    A Novel Nonparametric Test for Heterogeneity Detection and Assessment of Fluid Removal Among CRRT Patients in ICU

    Over the past decade, acute kidney injury (AKI) has occurred in 20%-50% of patients admitted to intensive care units (ICUs) in the United States. Continuous renal replacement therapy (CRRT) has become a popular treatment for these critically ill patients. However, implementing this treatment involves multiple complications, including discrepancies between prescribed and delivered fluid removal, possibly related to the heterogeneity among these patients. Several mixture-modeling techniques exist for detecting heterogeneity, each with its own limitations. In this dissertation, a novel nonparametric ‘d test’ is used to detect heterogeneity among CRRT patients in the ICU. Alongside heterogeneity detection, this dissertation also seeks to understand ongoing issues with fluid removal and discrepancies in treatment implementation.

    Cooperative multi-sensor tracking of vulnerable road users in the presence of missing detections

    This paper presents a vulnerable road user (VRU) tracking algorithm capable of handling noisy and missing detections from heterogeneous sensors. We propose a cooperative fusion algorithm for matching and reinforcing radar and camera detections using their proximity and positional uncertainty. The belief in the existence and position of objects is then maximized by temporal integration of the fused detections in a multi-object tracker. By switching between observation models, the tracker adapts to the detection noise characteristics, making it robust to individual sensor failures. The main novelty of this paper is an improved imputation sampling function for updating the state when detections are missing. The proposed function uses a likelihood without association that is conditioned on the sensor information instead of the sensor model. The benefits of the proposed solution are two-fold: first, particle updates become computationally tractable; second, the problem of imputing samples from a state predicted without an associated detection is bypassed. Experimental evaluation shows a significant improvement in both detection and tracking performance over multiple control algorithms. In low-light situations, cooperative fusion outperforms intermediate fusion by as much as 30%, while increases in tracking performance are most significant in complex traffic scenes.
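    The overall predict/update/miss-handling loop can be sketched in one dimension as below. Note that this baseline simply propagates particles under the motion model when a detection is missing; the paper's contribution, an imputation sampling function conditioned on sensor information, is a more informed proposal than this fallback, and all numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal 1-D particle tracker: state = [position, velocity].
N = 500
particles = rng.normal(0.0, 1.0, size=(N, 2))
dt, meas_std = 1.0, 0.5

detections = [0.9, None, 2.8, None, 5.1]  # None = missing detection

for z in detections:
    # Predict: constant-velocity motion with process noise.
    particles[:, 0] += particles[:, 1] * dt + rng.normal(0, 0.1, N)
    particles[:, 1] += rng.normal(0, 0.1, N)
    if z is not None:
        # Update: weight particles by the detection likelihood, then resample.
        w = np.exp(-0.5 * ((z - particles[:, 0]) / meas_std) ** 2)
        idx = rng.choice(N, size=N, p=w / w.sum())
        particles = particles[idx]
    # On a miss, particles simply propagate under the motion model.

estimate = particles[:, 0].mean()
print(round(estimate, 2))
```

Even with two of five detections missing, the velocity component learned from the resampling steps keeps the position estimate advancing toward the final detection, which is the behavior the improved imputation sampling aims to make both tractable and better informed.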

    Deep Clustering: A Comprehensive Survey

    Cluster analysis plays an indispensable role in machine learning and data mining, and learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied to a wide range of clustering tasks. Existing surveys of deep clustering mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. Based on the data sources and initial conditions, we systematically distinguish clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories: traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss open challenges and potential future opportunities in the different fields of deep clustering.

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Research into combining data mining and machine learning technology with web-based education systems (known as educational data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of information and communication technology (ICT), data are becoming available in significantly large volumes, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied in Apache Spark, using large volumes of data from online cognitive learning systems to explore the scalability and efficiency of Spark. Various volumes of data are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning parameters to take full advantage of Apache Spark's in-memory processing for data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.
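    The resource-allocation strategies referred to above typically reduce to arithmetic like the following. The cluster dimensions and rules of thumb here are illustrative assumptions, not figures from the thesis.

```python
# Back-of-envelope executor sizing for a hypothetical Spark cluster.
nodes = 4
cores_per_node = 16
mem_per_node_gb = 64

cores_per_executor = 5   # a common rule of thumb to keep HDFS I/O efficient
reserved_cores = 1       # leave one core per node for the OS and daemons
executors_per_node = (cores_per_node - reserved_cores) // cores_per_executor
total_executors = nodes * executors_per_node - 1   # minus one for the driver

# Leave headroom (~10%) for per-executor memory overhead, plus 8 GB per node
# for the OS, before dividing the remainder among the executors on a node.
mem_per_executor_gb = int((mem_per_node_gb - 8) / executors_per_node * 0.9)

print(total_executors, mem_per_executor_gb)
```

With these assumed numbers, the corresponding submit flags would look like `spark-submit --num-executors 11 --executor-cores 5 --executor-memory 16g ...`; the thesis's point is that such settings must be tuned against the actual data volume to keep Spark's in-memory execution from spilling to disk.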

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, notably computer vision (CV), speech recognition, and natural language processing. Whereas remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools, and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial, and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL systems.
    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Causal effects of green infrastructure on stormwater hydrology and water quality

    Applications of green infrastructure to stormwater management continue to increase in urban landscapes. There are numerous studies of individual stormwater management sites, but few meta-analyses that synthesize and explore design variables for stormwater control structures within a robust statistical framework. The lack of a standardized framework is due to the complexity of stormwater infrastructure designs: locally customized designs fitted to diverse site conditions create datasets that are messy, non-uniform, and difficult to analyze across multiple sites. In this dissertation, I first examine how hydrologic processes govern the function of various stormwater infrastructure technologies using water budget data from the published literature. The hydrologic observations are displayed on a Water Budget Triangle, a ternary plot tool developed to visualize simplified water budgets, to enable direct functional comparisons of green and grey approaches to stormwater management. The findings are used to generate a suite of observable site characteristics, which are then mapped to a set of stormwater control and treatment sites reported in the International Stormwater Best Management Practice (BMP) database. These mapped site characteristics provide context for the runoff and water quality observations in the database. Drawing on these contextual observations of design variables, I then examine the functional design of different stormwater management technologies by quantifying the differences among varied structural features and comparing their causal effects on hydrologic and water quality performance. This stormwater toolbox provides a framework for comparing the overall performance of different system types to understand the causal implications of stormwater design.
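    Placing a site on a ternary plot like the Water Budget Triangle reduces to normalizing the budget components and mapping them to Cartesian coordinates. The three component names and example values below are assumptions for illustration, not the dissertation's definitions.

```python
import math

# Sketch of the ternary-plot idea: a site's simplified water budget is
# normalized into three fractions that sum to one, then mapped onto a
# unit triangle with corners runoff=(0,0), ET=(1,0), infiltration=(0.5, sqrt(3)/2).
def ternary_xy(runoff, evapotranspiration, infiltration):
    total = runoff + evapotranspiration + infiltration
    r, e, i = (v / total for v in (runoff, evapotranspiration, infiltration))
    # Standard ternary-to-Cartesian mapping.
    x = e + 0.5 * i
    y = (math.sqrt(3) / 2) * i
    return x, y

# A hypothetical bioretention cell: little runoff, mostly infiltration.
print(ternary_xy(runoff=10, evapotranspiration=25, infiltration=65))
```

Plotting many sites this way makes the functional contrast immediate: grey infrastructure clusters near the runoff corner, while infiltration-dominated green infrastructure sits near the opposite vertex.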