Deep learning from crowds
Over the last few years, deep learning has revolutionized the field of
machine learning by dramatically improving the state-of-the-art in various
domains. However, as the size of supervised artificial neural networks grows,
typically so does the need for larger labeled datasets. Recently, crowdsourcing
has established itself as an efficient and cost-effective solution for labeling
large sets of data in a scalable manner, but it often requires aggregating
labels from multiple noisy contributors with different levels of expertise. In
this paper, we address the problem of learning deep neural networks from
crowds. We begin by describing an EM algorithm for jointly learning the
parameters of the network and the reliabilities of the annotators. Then, a
novel general-purpose crowd layer is proposed, which allows us to train deep
neural networks end-to-end, directly from the noisy labels of multiple
annotators, using only backpropagation. We empirically show that the proposed
approach is able to internally capture the reliability and biases of different
annotators and achieve new state-of-the-art results for various crowdsourced
datasets across different settings, namely classification, regression and
sequence labeling.
Comment: 10 pages, The Thirty-Second AAAI Conference on Artificial
Intelligence (AAAI), 201
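The abstract does not spell out the form of the crowd layer, so the following is only a minimal sketch of one plausible reading: the network's shared class probabilities are passed through a separate transformation for each annotator, and the loss skips annotators who did not label the item. The per-annotator weight matrices, the function names `crowd_layer_forward` and `masked_cross_entropy`, and the use of `None` for a missing label are all assumptions made for illustration. In practice the weights would be learned by backpropagation together with the rest of the network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def crowd_layer_forward(class_probs, annotator_weights):
    """Map shared class probabilities to one label distribution per annotator.

    annotator_weights holds one square matrix (list of rows) per annotator;
    this parameterization is an assumption, not taken from the abstract.
    """
    outputs = []
    for weights in annotator_weights:
        scores = [sum(w * p for w, p in zip(row, class_probs)) for row in weights]
        outputs.append(softmax(scores))
    return outputs


def masked_cross_entropy(annotator_probs, labels):
    """Mean cross-entropy over the annotators who labeled this item.

    A label of None marks an annotator who did not label the item.
    """
    losses = [
        -math.log(probs[label])
        for probs, label in zip(annotator_probs, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical example: three classes, two annotators.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
swaps_first_two = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
per_annotator = crowd_layer_forward([0.7, 0.2, 0.1], [identity, swaps_first_two])
loss = masked_cross_entropy(per_annotator, [0, None])  # second annotator is missing
```

An annotator matrix close to the identity corresponds to an annotator who tends to agree with the network, while off-diagonal mass models systematic confusions between classes.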
A methodology for peripheral nerve segmentation using a multiple annotators approach based on Centered Kernel Alignment
Peripheral Nerve Blocking (PNB) is a technique commonly used to perform regional
anesthesia and for pain management. The success of PNB procedures depends on the accurate
location of the target nerve. Recently, ultrasound imaging has been widely used to locate
nerve structures for PNB because it enables non-invasive visualization of the
target nerve and the anatomical structures around it. However, ultrasound images are
affected by several artifacts that make accurate delineation of nerves difficult. In the
literature, several approaches have been proposed to carry out automatic or semi-automatic
segmentation. Nevertheless, these methods are designed assuming that the gold standard
is available, and for this segmentation problem the gold standard cannot be obtained
because it depends on subjective interpretation. Consequently, when building such
segmentation models, we do not have access to the actual label, only to a set of subjective
annotations provided by multiple experts. To deal with this drawback, we use concepts
from a relatively new area of machine learning known as “Learning from crowds”, which
deals with supervised learning problems in which the gold standard is not
available.
In this project, we develop a nerve segmentation system that includes: a preprocessing
stage, feature extraction methodology based on adaptive methods, and a Centered Kernel
Alignment (CKA) based representation to measure the annotators' performance for building
a classifier with multiple annotators in order to support peripheral nerve segmentation.
Our approach to classification with multiple annotators based on CKA was tested on both
simulated data and real data; similarly, the proposed automatic segmentation methodology
was tested on ultrasound images labeled by a set of specialists who gave their
opinion about the location of nerve structures. According to the results, we conclude that
our methodology can be used to locate nerve structures in ultrasound images even if the
gold standard (the actual location of nerve structures) is not available in the training stage.
Moreover, we determine that the approach proposed in this work could be implemented as
a guiding tool for the anesthesiologist to carry out PNB procedures assisted by ultrasound
imaging.
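The abstract does not describe how CKA is turned into an annotator score, so the sketch below only illustrates the standard Centered Kernel Alignment measure between two kernel matrices: both are centered, then their Frobenius inner product is normalized by the product of their Frobenius norms. The helper `label_kernel`, which builds a kernel from one annotator's labels, is a hypothetical choice for illustration and is not taken from the work.

```python
import math


def center(kernel):
    """Double-center a square kernel matrix (equivalent to H K H)."""
    n = len(kernel)
    row_means = [sum(row) / n for row in kernel]
    col_means = [sum(kernel[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [kernel[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(a, b):
    """Frobenius inner product of two matrices of the same shape."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def cka(kernel_a, kernel_b):
    """Centered Kernel Alignment; returns 0.0 if either centered kernel is zero."""
    a, b = center(kernel_a), center(kernel_b)
    denom = math.sqrt(frobenius_inner(a, a) * frobenius_inner(b, b))
    return frobenius_inner(a, b) / denom if denom > 0 else 0.0


def label_kernel(labels):
    """Hypothetical label kernel: 1.0 where two samples share a label, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


# Hypothetical example: alignment between two annotators' labelings of four samples.
alignment = cka(label_kernel([0, 0, 1, 1]), label_kernel([0, 0, 1, 0]))
```

Under this reading, a higher alignment between a kernel computed on the input features and a kernel built from an annotator's labels would suggest that the annotator's labels follow the structure of the data more closely.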
Scalable and Ensemble Learning for Big Data
University of Minnesota Ph.D. dissertation. May 2019. Major: Electrical/Computer Engineering. Advisor: Georgios Giannakis. 1 computer file (PDF); xi, 126 pages.
The turn of the decade has marked society and computing research with a “data deluge.” As the number of smart, highly accurate and Internet-capable devices increases, so does the amount of data that is generated and collected. While this sheer amount of data has the potential to enable high-quality inference and mining of information, it introduces numerous challenges in processing and pattern analysis, since available statistical inference and machine learning approaches do not necessarily scale well with the volume of data and their dimensionality. In addition to the challenges related to scalability, the data gathered are often noisy, dynamic, contaminated by outliers or corrupted specifically to inhibit the inference task. Moreover, many machine learning approaches have been shown to be susceptible to adversarial attacks. At the same time, the cost of cloud and distributed computing is rapidly declining. Therefore, there is a pressing need for statistical inference and machine learning tools that are robust to attacks and scale with the volume and dimensionality of the data by efficiently harnessing the available computational resources. This thesis is centered on analytical and algorithmic foundations that aim to enable statistical inference and data analytics from large volumes of high-dimensional data. The vision is to establish a comprehensive framework based on state-of-the-art machine learning, optimization and statistical inference tools to enable truly large-scale inference, which can tap into the available (possibly distributed) computational resources and be resilient to adversarial attacks. The ultimate goal is to demonstrate, both analytically and numerically, how valuable insights from signal processing can lead to markedly improved and accelerated learning tools.
To this end, the present thesis investigates two main research thrusts: (i) large-scale subspace clustering; and (ii) unsupervised ensemble learning. Both thrusts introduce novel algorithms that aim to tackle the issues of large-scale learning. The potential of the proposed algorithms is showcased by rigorous theoretical results and extensive numerical tests.
Physician Participation in Crowdsourcing: Effect of Intrinsic and Extrinsic Motivation
Physicians must participate in developing medical protocols to ensure that medical best practices are adopted for patients' social benefit. Healthcare leaders have struggled to gain sufficient physician participation in developing medical protocols. Using technology-based crowdsourcing to assimilate knowledge from physicians may help healthcare managers improve medical protocol development. Using self-determination theory, this quantitative causal-comparative design aimed to determine whether differences in intrinsic and extrinsic motivation existed among the 132 participating physicians who did or did not participate in developing medical protocols in a crowdsourcing environment. Participants were recruited by e-mail through an independent physician association. Motivation levels were measured by the Aspirations Index via an online survey. A total of 55.3% of respondents participated in developing medical protocols. Differences were anticipated in the levels of participation in developing medical protocols between intrinsically and extrinsically motivated physicians. Rank correlations were computed between the number of protocols completed and all of the motivation scores. Personal growth and community contribution were significantly correlated with the number of addressed protocols. Positive social change may occur through improving medical protocols and healthcare outcomes by informing healthcare leaders about physicians' motivation to participate in developing medical protocols. By understanding these motivators, leaders can highlight the benefits of protocol development to encourage physician participation. If participation is enhanced, protocol quality and healthcare effectiveness may be improved, benefitting patients and healthy individuals.
Gamifying Language Resource Acquisition
PhD Thesis
Natural Language Processing is an important collection of methods for processing the vast
amounts of available natural language text we continually produce. These methods make
use of supervised learning, an approach that learns from large amounts of annotated
data. As humans, we’re able to provide information about text that such systems can learn from.
Historically, this was carried out by small groups of experts. However, this did not scale. This led
to various crowdsourcing approaches being taken that used large pools of non-experts.
The traditional form of crowdsourcing was to pay users small amounts of money to complete
tasks. As time progressed, gamification approaches such as GWAPs showed various benefits
over the micro-payment methods used before. These included a cost saving, worker training
opportunities, increased worker engagement and potential to far exceed the scale of crowdsourcing.
While these were successful in domains such as image labelling, they struggled in the domain
of text annotation, which wasn’t such a natural fit. Despite many challenges, there were also
clearly many opportunities and benefits to applying this approach to text annotation. Many of
these are demonstrated by Phrase Detectives. Based on lessons learned from Phrase Detectives
and investigations into other GWAPs, in this work, we attempt to create full GWAPs for NLP,
extracting the benefits of the methodology. This includes training, high quality output from
non-experts and a truly game-like GWAP design that players are happy to play voluntarily
Sequence labeling with multiple annotators
The increasingly popular use of crowdsourcing as a resource to obtain labeled data has raised awareness in the machine learning community of the problem of supervised learning from multiple annotators. Several approaches have been proposed to deal with this issue, but they disregard sequence labeling problems, even though these are very common, for example, in the Natural Language Processing and Bioinformatics communities. In this paper, we present a probabilistic approach for sequence labeling using Conditional Random Fields (CRF) for situations where label sequences from multiple annotators are available but there is no actual ground truth. The approach uses the Expectation-Maximization algorithm to jointly learn the CRF model parameters, the reliability of the annotators and the estimated ground truth. In terms of performance, the proposed method (CRF-MA) significantly outperforms typical approaches such as majority voting.
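For context, the majority-voting baseline mentioned above can be sketched at the token level as follows. This is only an illustration of the baseline, not of CRF-MA itself; the function name `majority_vote_sequences`, the tie-breaking rule (the label seen first among the annotators wins) and the example tags are assumptions made here.

```python
from collections import Counter


def majority_vote_sequences(annotations):
    """Aggregate label sequences from several annotators by per-token majority vote.

    annotations: one label sequence per annotator, all of the same length.
    Ties are broken in favor of the label that appears first among the annotators.
    """
    if not annotations:
        return []
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all annotators must label the same number of tokens")
    voted = []
    for token_labels in zip(*annotations):
        counts = Counter(token_labels)
        # max() returns the first maximal key in insertion order, which gives
        # the tie-breaking rule described in the docstring.
        voted.append(max(counts, key=lambda label: counts[label]))
    return voted


# Hypothetical example: three annotators tagging a three-token sentence.
consensus = majority_vote_sequences([
    ["B-PER", "O", "O"],
    ["B-PER", "B-LOC", "O"],
    ["O", "B-LOC", "O"],
])
```

Because each token is voted on independently and every annotator counts equally, this baseline ignores both annotator reliability and dependencies between neighboring labels, which are the two aspects the probabilistic approach models jointly.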