A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The goal of the Semantic Web is to improve the
quality and intelligence of the current web by converting its content into a
machine-understandable form. Semantic-level information is therefore one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. Semantic Annotation faces many
obstacles, such as multilinguality, scalability, and the diversity and
inconsistency of content across web pages. Because Semantic Annotation systems
must operate over a wide range of domains and in dynamic environments,
automating the annotation process is one of the most significant challenges in
this field. To overcome this problem, various machine learning approaches have
been used, including supervised and unsupervised learning and more recent ones
such as semi-supervised learning and active learning. In this paper we present
a comprehensive layered classification of Semantic Annotation challenges and
discuss the most important issues in this field. We also review and analyze
applications of machine learning to semantic annotation problems. To this end,
the article closely studies and categorizes related research in order to reach
a framework that maps machine learning techniques to Semantic Annotation
challenges and requirements.
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time-consuming and important steps in a
data analysis process (e.g., a classification task). A given data pre-processing
operator (e.g., a transformation) can have a positive, negative, or zero impact
on the final result of the analysis. Expert users have the knowledge required to
find the right pre-processing operators. Non-experts, however, are overwhelmed
by the number of available pre-processing operators, and it is challenging for
them to find operators that would positively impact their analysis (e.g.,
increase the predictive accuracy of a classifier). Existing solutions either
assume that users have expert knowledge, or they recommend pre-processing
operators that are only "syntactically" applicable to a dataset, without taking
into account their impact on the final analysis. In this work, we aim to assist
non-expert users by recommending data pre-processing operators ranked according
to their impact on the final analysis. We developed a tool, PRESISTANT, that
uses Random Forests to learn the impact of pre-processing operators on the
performance (e.g., predictive accuracy) of five classification algorithms:
J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive
evaluations of the tool's recommendations show that PRESISTANT can effectively
help non-experts achieve better results in their analytical tasks.
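The abstract states only that PRESISTANT learns operator impact with Random Forests; the sketch below illustrates that general idea and is not the tool's implementation. It assumes each past experiment is encoded as a few dataset meta-features plus a one-hot code for the operator, with the change in a classifier's accuracy as the regression target. The operator names, meta-features, and numbers are hypothetical. A small pure-Python random forest (bootstrap samples, a random feature subset at each split, averaged regression trees) predicts each operator's impact on a new dataset, and the operators are then ranked by that prediction.

```python
import random
from statistics import mean


def build_tree(X, y, depth, max_depth, max_features, rng):
    """Grow a regression tree; each leaf stores the mean target of its samples."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)
    best = None  # (sum of squared errors, feature, threshold, left rows, right rows)
    for f in rng.sample(range(len(X[0])), max_features):  # random feature subset
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of the split are always non-empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            sse = 0.0
            for idx in (left, right):
                m = mean(y[i] for i in idx)
                sse += sum((y[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    if best is None:  # all sampled features were constant: make a leaf
        return mean(y)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, max_features, rng)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    """Follow (feature, threshold, left, right) splits down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class RandomForest:
    """Bagged regression trees with per-split random feature subsets."""

    def __init__(self, n_trees=30, max_depth=3, max_features=2, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.max_features, self.rng = max_features, random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in X]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.max_features, self.rng))
        return self

    def predict(self, x):
        return mean(predict_tree(t, x) for t in self.trees)


def rank_operators(model, meta_features, operators):
    """Score each operator for one dataset; sort by predicted accuracy change."""
    scored = []
    for j, op in enumerate(operators):
        one_hot = [1.0 if k == j else 0.0 for k in range(len(operators))]
        scored.append((op, model.predict(meta_features + one_hot)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical history: (meta-features, operator index, accuracy change).
# Meta-features: [log10(#instances), fraction of numeric attributes].
OPS = ["normalize", "discretize", "drop_outliers"]
history = [
    ([3.0, 0.9], 0, 0.04), ([3.5, 0.8], 0, 0.03), ([2.5, 0.2], 0, 0.00),
    ([3.0, 0.9], 1, -0.02), ([3.5, 0.8], 1, -0.01), ([2.5, 0.2], 1, 0.01),
    ([3.0, 0.9], 2, 0.01), ([3.5, 0.8], 2, 0.02), ([2.5, 0.2], 2, -0.01),
]
X = [meta + [1.0 if k == op else 0.0 for k in range(len(OPS))] for meta, op, _ in history]
y = [delta for _, _, delta in history]
model = RandomForest(n_trees=30, max_depth=3, max_features=3, seed=1).fit(X, y)
ranking = rank_operators(model, [3.2, 0.85], OPS)
print(ranking)
```

The hand-written forest only keeps the example dependency-free; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used for the same role.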
A classification of data quality assessment and improvement methods
Data quality (DQ) assessment and improvement in larger
information systems would often not be feasible without suitable “DQ
methods”, which are algorithms that computer systems can execute automatically
to detect and/or correct problems in datasets. These methods are already
essential, and they will become even more important as the quantity of data in
organisational systems grows. This paper reviews existing methods for both DQ
assessment and improvement and classifies them according to the DQ problem and
problem context. The classification reveals six gaps where no DQ methods
currently exist; these show where new methods are required and can guide future
research and DQ tool development.
This is the accepted manuscript. It is currently embargoed pending publication by Inderscience.
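As a concrete, hypothetical illustration of the kind of automatically executable "DQ method" the review classifies, the sketch below implements one assessment method (per-field completeness) and one detection-plus-improvement method (duplicate detection on normalised key fields, followed by removal). The function names, the normalisation rule, and the record layout are assumptions for illustration, not methods taken from the paper.

```python
def _normalise(value):
    """Canonical form used for matching: trimmed, case-folded text; None -> ''."""
    return "" if value is None else str(value).strip().casefold()


def completeness(records, fields):
    """Assessment: share of records with a non-missing value for each field."""
    n = len(records)
    return {f: sum(1 for r in records if _normalise(r.get(f)) != "") / n
            for f in fields}


def find_duplicates(records, key_fields):
    """Detection: indices of records whose normalised key repeats an earlier one."""
    seen, duplicates = set(), []
    for i, record in enumerate(records):
        key = tuple(_normalise(record.get(f)) for f in key_fields)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def drop_duplicates(records, key_fields):
    """Improvement: keep only the first record of each duplicate group."""
    dup = set(find_duplicates(records, key_fields))
    return [r for i, r in enumerate(records) if i not in dup]


customers = [  # hypothetical records
    {"id": 1, "name": "Ann Lee", "email": "ann@example.org"},
    {"id": 2, "name": "Bob Ray", "email": ""},
    {"id": 3, "name": "ann lee ", "email": "ANN@example.org"},
    {"id": 4, "name": None, "email": "cy@example.org"},
]
print(completeness(customers, ["name", "email"]))     # {'name': 0.75, 'email': 0.75}
print(find_duplicates(customers, ["name", "email"]))  # [2]
```

Record 3 is flagged only because of the normalisation step; a stricter or fuzzier matching rule would change which problems such a method detects.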
On Interpretability of Deep Learning based Skin Lesion Classifiers using Concept Activation Vectors
Deep-learning-based medical image classifiers have shown remarkable prowess
in various application areas such as ophthalmology, dermatology, pathology, and
radiology. However, the acceptance of these Computer-Aided Diagnosis (CAD)
systems in real clinical settings is severely limited, primarily because their
decision-making process remains largely obscure. This work aims to elucidate
a deep-learning-based medical image classifier by verifying that the model
learns and uses disease-related concepts similar to those described and
employed by dermatologists. We used a well-trained, high-performing neural
network developed by the REasoning for COmplex Data (RECOD) Lab to classify
three skin tumours, i.e. Melanocytic Naevi, Melanoma, and Seborrheic Keratosis,
and performed a detailed analysis of its latent space. Two well-established,
publicly available skin disease datasets, PH2 and derm7pt, are used for
experimentation. Human-understandable concepts are mapped to the RECOD image
classification model with the help of Concept Activation Vectors (CAVs),
introducing a novel training and significance testing paradigm for CAVs. Our
results on an independent evaluation set clearly show that the classifier
learns and encodes human-understandable concepts in its latent representation.
Additionally, TCAV scores (Testing with CAVs) suggest that the neural network
indeed makes use of disease-related concepts in the correct way when making
predictions. We anticipate that this work can not only increase medical
practitioners' confidence in CAD but also serve as a stepping stone for the
further development of CAV-based neural network interpretation methods.
Comment: Accepted for the IEEE International Joint Conference on Neural
Networks (IJCNN) 202
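For readers unfamiliar with CAVs, the sketch below follows the standard TCAV recipe, not the paper's novel training and significance-testing paradigm, which the abstract does not detail (no significance testing is shown). A linear classifier (here, logistic regression trained by per-sample gradient descent) separates a layer's activations for concept examples from those for random examples; the unit-normalised weight vector is taken as the CAV; and the TCAV score is the fraction of class examples whose class logit increases along the CAV. The 2-D activations and the `head` function, which stands in for the network layers above the probed layer, are hypothetical, and the directional derivative is estimated by finite differences rather than automatic differentiation.

```python
import math
import random


def train_cav(concept_acts, random_acts, epochs=200, lr=0.1):
    """Fit logistic regression (concept = 1, random = 0) on activations.

    The unit-normalised weight vector, which points towards the concept
    examples, is returned as the Concept Activation Vector (CAV).
    """
    data = [(a, 1.0) for a in concept_acts] + [(a, 0.0) for a in random_acts]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for a, label in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label  # d(log-loss)/dz
            w = [wi - lr * err * ai for wi, ai in zip(w, a)]
            b -= lr * err
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    return [wi / norm for wi in w]


def directional_derivative(head, act, cav, eps=1e-4):
    """Finite-difference slope of the class logit head(act) along the CAV."""
    shifted = [a + eps * c for a, c in zip(act, cav)]
    return (head(shifted) - head(act)) / eps


def tcav_score(head, class_acts, cav):
    """Fraction of class examples whose logit increases in the concept direction."""
    positive = sum(1 for a in class_acts if directional_derivative(head, a, cav) > 0)
    return positive / len(class_acts)


def head(act):
    """Hypothetical stand-in for the layers between the probed layer and the logit."""
    return math.tanh(act[0]) - 0.2 * act[1]


# Hypothetical 2-D activations at the probed layer.
rng = random.Random(0)
concept = [[rng.gauss(2.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(20)]
randoms = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(20)]
lesions = [[rng.gauss(0.5, 0.3), rng.gauss(0.0, 0.3)] for _ in range(20)]

cav = train_cav(concept, randoms)
print("CAV:", cav, "TCAV score:", tcav_score(head, lesions, cav))
```

In the original TCAV formulation, scores are additionally compared against CAVs trained on several random-versus-random splits before a concept is deemed significant; that step is omitted here for brevity.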
The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms
Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for improving our understanding of how marine ecosystems function and for conserving their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is undergoing an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main obstacles to the effective application of these technologies is the processing, storage, and treatment of the complex ecological information they acquire. Here, we provide a conceptual overview of the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a multi-step pipeline for ecological data acquisition and processing that is amenable to automation. We also give an example of computing population biomass, community richness, and biodiversity (as indicators of ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for such automated data processing at the cyber-infrastructure level, including sensor calibration and control, data banking, and ingestion into large data portals.
Peer Reviewed. Postprint (published version)
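The abstract does not specify how the indicators are computed, so the sketch below assumes common textbook definitions: richness S as the number of taxa observed, biodiversity as the Shannon index H' = -Σ p_i·ln(p_i), with p_i the relative abundance of taxon i, and population biomass as the count multiplied by an assumed mean individual mass. The taxon labels, counts, and masses are hypothetical stand-ins for counts extracted from crawler imagery.

```python
import math


def richness(counts):
    """Taxon richness S: number of taxa with at least one observed individual."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def biomass(counts, mean_mass_g):
    """Population biomass per taxon: count times an assumed mean individual mass."""
    return {taxon: c * mean_mass_g[taxon] for taxon, c in counts.items()}


# Hypothetical counts pooled from one day of crawler image annotations.
counts = {"taxon_A": 12, "taxon_B": 5, "taxon_C": 3, "taxon_D": 0}
mean_mass_g = {"taxon_A": 40.0, "taxon_B": 15.0, "taxon_C": 120.0, "taxon_D": 8.0}

print(richness(counts))  # 3
print(round(shannon(counts), 3))
print(biomass(counts, mean_mass_g))
```

Because H' is bounded above by ln(S), reporting the evenness H'/ln(S) alongside richness is a common way to separate "how many taxa" from "how evenly individuals are spread among them".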