339,351 research outputs found

    The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

    Full text link
    Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy

    A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    Full text link
    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as multilinguality, scalability, and issues which are related to diversity and inconsistency in content of different web pages. Due to the wide range of domains and the dynamic environments that the Semantic Annotation systems must be performed on, the problem of automating annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like, semi-supervised learning and active learning have been utilized. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. Also, we review and analyze machine learning applications for solving semantic annotation problems. For this goal, the article tries to closely study and categorize related researches for better understanding and to reach a framework that can map machine learning techniques into the Semantic Annotation challenges and requirements

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    Full text link
    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.Comment: 14 pages, 5 figures, submitted for review to JML

    Progressive growing of self-organized hierarchical representations for exploration

    Get PDF
    Designing agent that can autonomously discover and learn a diversity of structures and skills in unknown changing environments is key for lifelong machine learning. A central challenge is how to learn incrementally representations in order to progressively build a map of the discovered structures and re-use it to further explore. To address this challenge, we identify and target several key functionalities. First, we aim to build lasting representations and avoid catastrophic forgetting throughout the exploration process. Secondly we aim to learn a diversity of representations allowing to discover a "diversity of diversity" of structures (and associated skills) in complex high-dimensional environments. Thirdly, we target representations that can structure the agent discoveries in a coarse-to-fine manner. Finally, we target the reuse of such representations to drive exploration toward an "interesting" type of diversity, for instance leveraging human guidance. Current approaches in state representation learning rely generally on monolithic architectures which do not enable all these functionalities. Therefore, we present a novel technique to progressively construct a Hierarchy of Observation Latent Models for Exploration Stratification, called HOLMES. This technique couples the use of a dynamic modular model architecture for representation learning with intrinsically-motivated goal exploration processes (IMGEPs). The paper shows results in the domain of automated discovery of diverse self-organized patterns, considering as testbed the experimental framework from Reinke et al. (2019)
    • …
    corecore