2,584 research outputs found

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research

    KCRC-LCD: Discriminative Kernel Collaborative Representation with Locality Constrained Dictionary for Visual Categorization

    We consider the image classification problem via kernel collaborative representation classification with locality constrained dictionary (KCRC-LCD). Specifically, we propose a kernel collaborative representation classification (KCRC) approach in which the kernel method is used to improve the discrimination ability of collaborative representation classification (CRC). We then measure the similarities between the query and atoms in the global dictionary in order to construct a locality constrained dictionary (LCD) for KCRC. In addition, we discuss several similarity measure approaches in LCD and further present a simple yet effective unified similarity measure whose superiority is validated in experiments. There are several appealing aspects associated with LCD. First, LCD can be nicely incorporated under the framework of KCRC. The LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and LCD under the kernel method. Second, KCRC-LCD becomes more scalable to both the training set size and the feature dimension. An example shows that KCRC is able to perfectly classify data with a certain distribution, while conventional CRC fails completely. Comprehensive experiments on many public datasets also show that KCRC-LCD is a robust discriminative classifier with both excellent performance and good scalability, comparable to or outperforming many other state-of-the-art approaches.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    The Emerging Trends of Multi-Label Learning

    Exabytes of data are generated daily by humans, leading to the growing need for new efforts in dealing with the grand challenges for multi-label learning brought by big data. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks with an extremely large number of classes or labels; utilizing massive data with limited supervision to build a multi-label classification model becomes valuable for practical applications, etc. Besides these, there are tremendous efforts on how to harvest the strong learning capability of deep learning to better capture the label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, it is noted that there has been a lack of systemic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data. It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications. Comment: Accepted to TPAMI 202

    Personalized large scale classification of public tenders on hadoop

    This project was completed as part of an innovation partnership between Fujitsu Canada and Université Laval. The needs and objectives of the project were centered on a business problem defined jointly with Fujitsu. The project aimed to classify a corpus of electronic public tenders using state-of-the-art Hadoop big data technology. The objective was to identify, with high recall, the public tenders relevant to the IT services business of Fujitsu Canada. A small-scale prototype based on the BNS (Bi-Normal Separation) algorithm was shown empirically to classify the public tender corpus with high recall (93%). The prototype was then re-implemented on a full-scale Hadoop cluster, using Apache Pig for the data preparation pipeline and Apache Mahout for classification. Our experiments show that the large-scale system not only maintains high recall (91%) on the classification task, but can also readily take advantage of the massive scalability gains made possible by Hadoop's distributed architecture
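
    As a rough illustration of the Bi-Normal Separation score named in the abstract above, the sketch below scores one feature from its document counts as the absolute difference between the inverse standard normal CDF of the true positive rate and that of the false positive rate. The function name `bns_score`, its arguments, and the clamping value `eps` are assumptions made for this example; it is not the project's Pig or Mahout implementation.

```python
from statistics import NormalDist

# Standard normal distribution, used only for its inverse CDF.
_STD_NORMAL = NormalDist()


def bns_score(tp, fn, fp, tn, eps=0.0005):
    """Bi-Normal Separation score of one feature (illustrative sketch).

    tp: positive documents containing the feature
    fn: positive documents not containing it
    fp: negative documents containing the feature
    tn: negative documents not containing it
    Assumes both classes are non-empty. eps is an assumed clamping value
    that keeps the rates away from 0 and 1, where the inverse CDF is
    undefined.
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # Clamp the rates into [eps, 1 - eps] before applying the inverse CDF.
    tpr = min(max(tpr, eps), 1 - eps)
    fpr = min(max(fpr, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


# A feature that appears equally often in both classes carries no signal.
uninformative = bns_score(tp=50, fn=50, fp=50, tn=50)
# A feature concentrated in the positive class scores much higher.
informative = bns_score(tp=90, fn=10, fp=10, tn=90)
```

    Features would then be ranked by this score, with the highest-scoring ones kept for the classifier; how the project used the scores in practice is not stated in the abstract.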

    A study on machine vision techniques for the inspection of health personnels' protective suits for the treatment of patients in extreme isolation

    The examination of Personal Protective Equipment (PPE) to assure the complete integrity of health personnel in contact with infected patients is one of the most necessary tasks when treating patients affected by infectious diseases, such as Ebola. This work focuses on the study of machine vision techniques for the detection of possible defects on the PPE that could arise after contact with such patients. A preliminary study on the use of image classification algorithms to identify blood stains on PPE after the treatment of an infected patient is presented. To produce training data for these algorithms, a synthetic dataset was generated from a simulated model of a PPE suit with blood stains. The study then proceeded with images of the PPE bearing a physical emulation of blood stains, taken by a real prototype. The dataset shows a large imbalance between positive and negative samples; therefore, all the classification algorithms selected are ones able to handle this kind of data. Classifiers range from Logistic Regression and Support Vector Machines to bagging and boosting techniques such as Random Forest, Adaptive Boosting, Gradient Boosting and eXtreme Gradient Boosting. All these algorithms were evaluated on accuracy, precision, recall and F1 score; execution times were also considered. The results are promising for all the classifiers; in particular, Logistic Regression proved to be the most suitable classification algorithm in terms of F1 score and execution time, considering both datasets. The research leading to these results received funding from: Inspección robotizada de los trajes de proteccion del personal sanitario de pacientes en aislamiento de alto nivel, incluido el ébola, Programa Explora Ciencia, Ministerio de Ciencia, Innovación y Universidades (DPI2015-72015-EXP); the RoboCity2030-DIH-CM Madrid Robotics Digital Innovation Hub (“Robótica aplicada a la mejora de la calidad de vida de los ciudadanos. fase IV”; S2018/NMT-4331), funded by “Programas de Actividades I+D en la Comunidad de Madrid” and cofunded by Structural Funds of the EU; and ROBOESPAS: Active rehabilitation of patients with upper limb spasticity using collaborative robots, Ministerio de Economía, Industria y Competitividad, Programa Estatal de I+D+i Orientada a los Retos de la Sociedad (DPI2017-87562-C2-1-R)
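
    The PPE study above reports accuracy, precision, recall and F1 score on a dataset with far fewer positive than negative samples. The sketch below, with a hypothetical helper `classification_metrics` and made-up counts that are not taken from the study, shows how these four metrics follow from confusion-matrix counts and why accuracy alone can look good on imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed
    convention for this sketch).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced set: 50 stained images, 950 clean ones.
# A classifier that always answers "clean" still reaches 95% accuracy,
# yet it finds no stains at all, so recall and F1 are both zero.
always_negative = classification_metrics(tp=0, fp=0, fn=50, tn=950)

# A classifier that finds 40 of the 50 stains with 20 false alarms.
useful = classification_metrics(tp=40, fp=20, fn=10, tn=930)
```

    With the second set of counts, accuracy rises only slightly while recall and F1 change substantially, which is why F1 is the more informative figure under class imbalance.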