The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
Training of neural networks for automated diagnosis of pigmented skin lesions
is hampered by the small size and lack of diversity of available datasets of
dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human
Against Machine with 10000 training images") dataset. We collected
dermatoscopic images from different populations acquired and stored by
different modalities. Given this diversity, we had to apply different
acquisition and cleaning methods and developed semi-automatic workflows
utilizing specifically trained neural networks. The final dataset consists of
10015 dermatoscopic images, which are released as a training set for academic
machine learning purposes and are publicly available through the ISIC archive.
This benchmark dataset can be used for machine learning and for comparisons
with human experts. Cases include a representative collection of all important
diagnostic categories in the realm of pigmented lesions. More than 50% of
lesions have been confirmed by pathology, while the ground truth for the rest
of the cases was either follow-up, expert consensus, or confirmation by in vivo
confocal microscopy.
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The aim of the Semantic Web is to improve the
quality and intelligence of the current web by converting its contents into a
machine-understandable form. Semantic-level information is therefore one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. Semantic Annotation faces many
obstacles, such as multilinguality, scalability, and the diversity and
inconsistency of content across different web pages. Because Semantic
Annotation systems must operate across a wide range of domains and in dynamic
environments, automating the annotation process is one of the most significant
challenges in this field. To overcome this problem, different machine learning
approaches have been used, such as supervised learning, unsupervised learning,
and more recent ones like semi-supervised learning and active learning. In this
paper we present an inclusive layered classification of Semantic Annotation
challenges and discuss the most important issues in this field. We also review
and analyze machine learning applications for solving semantic annotation
problems. To this end, the article closely studies and categorizes related
research, both for better understanding and to arrive at a framework that maps
machine learning techniques onto Semantic Annotation challenges and requirements.
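The abstract lists active learning among the approaches used to automate annotation. As an illustration only, and not a method taken from the surveyed paper, the sketch below shows least-confidence uncertainty sampling: given class probabilities from some current annotation model, it picks the k items the model is least sure about, which would then go to a human annotator. The function name, item identifiers and probabilities are hypothetical.

```python
def least_confident(probabilities, k):
    """Select the k unlabeled items whose top predicted class has the lowest probability.

    probabilities: dict mapping a (hypothetical) item id to the list of class
    probabilities produced by the current annotation model for that item.
    """
    # Confidence of an item = probability of its most likely class.
    # Sort ascending by confidence; break ties by id so the result is deterministic.
    ranked = sorted(probabilities, key=lambda item: (max(probabilities[item]), item))
    return ranked[:k]


# Usage with made-up model outputs for three unannotated web pages.
pool = {
    "page_1": [0.90, 0.05, 0.05],
    "page_2": [0.40, 0.35, 0.25],
    "page_3": [0.60, 0.30, 0.10],
}
print(least_confident(pool, 2))  # ['page_2', 'page_3']
```

In a full active-learning loop, the selected items would be annotated manually, added to the training set, and the model retrained before the next query round.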
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task, depending on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Comment: 14 pages, 5 figures, submitted for review to JML
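The final analysis step described above runs several methods over every dataset and then examines how datasets cluster by performance. It can be sketched as follows. This illustration rests on assumptions and is not the PMLB authors' analysis. It takes a hypothetical table of scores, with one row per dataset and one column per algorithm. Each row is centered so that datasets are grouped by which algorithms do relatively well on them rather than by overall difficulty. Plain average-linkage agglomerative clustering with Euclidean distance is then applied. Dataset names and scores are made up.

```python
import math
from itertools import combinations


def center(row):
    # Subtract the dataset's mean score so only relative algorithm performance remains.
    mean = sum(row) / len(row)
    return [x - mean for x in row]


def average_linkage(c1, c2, vectors):
    # Mean pairwise Euclidean distance between the members of two clusters.
    total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_datasets(scores, k):
    """Group datasets into k clusters by their centered per-algorithm scores.

    scores: dict mapping dataset name -> list of scores, one per algorithm,
    with the same algorithm order in every list.
    """
    names = list(scores)
    vectors = [center(scores[name]) for name in names]
    clusters = [[i] for i in range(len(names))]  # start with singleton clusters
    while len(clusters) > k:
        # Merge the closest pair of clusters (a < b because combinations is ordered).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], vectors),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return sorted(sorted(names[i] for i in c) for c in clusters)


# Hypothetical accuracies of three algorithms on four datasets.
toy_scores = {
    "d1": [0.90, 0.60, 0.70],
    "d2": [0.88, 0.58, 0.71],
    "d3": [0.50, 0.80, 0.60],
    "d4": [0.52, 0.79, 0.61],
}
print(cluster_datasets(toy_scores, 2))  # [['d1', 'd2'], ['d3', 'd4']]
```

The same table can be transposed to cluster algorithms by their behavior across datasets instead.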
Progressive growing of self-organized hierarchical representations for exploration
Designing agents that can autonomously discover and learn a diversity of
structures and skills in unknown changing environments is key for lifelong
machine learning. A central challenge is how to incrementally learn
representations in order to progressively build a map of the discovered
structures and re-use it to explore further. To address this challenge, we
identify and target several key functionalities. First, we aim to build lasting
representations and avoid catastrophic forgetting throughout the exploration
process. Second, we aim to learn a diversity of representations that allows the
agent to discover a "diversity of diversity" of structures (and associated
skills) in complex high-dimensional environments. Third, we target
representations that can structure the agent's discoveries in a coarse-to-fine
manner. Finally, we target the reuse of such representations to drive
exploration toward an "interesting" type of diversity, for instance by
leveraging human guidance. Current approaches in state representation learning
generally rely on monolithic architectures, which do not enable all these functionalities.
Therefore, we present a novel technique to progressively construct a Hierarchy
of Observation Latent Models for Exploration Stratification, called HOLMES.
This technique couples the use of a dynamic modular model architecture for
representation learning with intrinsically-motivated goal exploration processes
(IMGEPs). The paper shows results in the domain of automated discovery of
diverse self-organized patterns, considering as testbed the experimental
framework from Reinke et al. (2019).
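HOLMES couples representation learning with intrinsically-motivated goal exploration processes (IMGEPs). The core IMGEP loop can be sketched as below. The sketch makes simplifying assumptions that are not the paper's:

- The goal space is a fixed, hand-defined 2-D outcome space rather than a learned hierarchy of latent models.
- Goals are sampled uniformly.
- New parameters come from Gaussian mutation of the parameters whose outcome landed closest to the sampled goal.

`toy_environment`, `imgep` and all their parameters are hypothetical. `toy_environment` stands in for a self-organizing system such as the one in Reinke et al. (2019).

```python
import math
import random


def toy_environment(params):
    # Hypothetical stand-in for a self-organizing system: 2 parameters -> 2-D outcome.
    a, b = params
    return (math.sin(3 * a) * b, a * a - b)


def imgep(env, n_params, n_random, n_iterations, goal_low, goal_high, sigma=0.1, seed=0):
    """Minimal goal-exploration loop with a fixed (not learned) goal space.

    Parameters live in [-1, 1]^n_params; goals are sampled uniformly in the box
    [goal_low, goal_high]. Returns the history as a list of (params, outcome).
    """
    rng = random.Random(seed)
    history = []
    for i in range(n_iterations):
        if i < n_random or not history:
            # Bootstrap phase: purely random parameters.
            params = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        else:
            # Sample a goal, then reuse the parameters whose outcome came closest to it.
            goal = [rng.uniform(lo, hi) for lo, hi in zip(goal_low, goal_high)]
            best_params, _ = min(history, key=lambda h: math.dist(h[1], goal))
            # Mutate those parameters with Gaussian noise, clipped to the bounds.
            params = [min(1.0, max(-1.0, p + rng.gauss(0.0, sigma))) for p in best_params]
        outcome = env(params)
        history.append((params, outcome))
    return history


history = imgep(toy_environment, n_params=2, n_random=20, n_iterations=200,
                goal_low=(-1.0, -1.0), goal_high=(1.0, 2.0))
print(len(history))  # 200
```

In HOLMES, the outcome space used for goal sampling is itself learned and grown progressively. In this sketch it is fixed by hand, which keeps the example short.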