More efficient manual review of automatically transcribed tabular data
Machine learning methods have proven useful in transcribing historical data.
However, results from even highly accurate methods require manual verification
and correction. Such manual review can be time-consuming and expensive;
therefore, the objective of this paper was to make it more efficient.
Previously, we used machine learning to transcribe 2.3 million handwritten
occupation codes from the Norwegian 1950 census with high accuracy (97%). We
manually reviewed the 90,000 (3%) codes with the lowest model confidence. We
allocated those 90,000 codes to human reviewers, who used our annotation tool
to review the codes. To assess reviewer agreement, some codes were assigned to
multiple reviewers. We then analyzed the review results to understand the
relationship between accuracy improvements and effort. Additionally, we
interviewed the reviewers to improve the workflow. The reviewers corrected
62.8% of the labels and agreed with the model label in 31.9% of cases. About
0.2% of the images could not be assigned a label, while for 5.1% the reviewers
were uncertain, or they assigned an invalid label. 9,000 images were
independently reviewed by multiple reviewers, resulting in an agreement of
86.43% and disagreement of 8.96%. We learned that our automatic transcription
is biased towards the most frequent codes, with a higher degree of
misclassification for the lowest frequency codes. Our interview findings show
that the reviewers did internal quality control and found our custom tool
well-suited. Thus, only one reviewer is needed, but they should report
uncertainty. Comment: 19 pages, 5 figures, 1 table
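The review workflow described in this abstract (send the lowest-confidence predictions to human reviewers, then measure agreement on items reviewed twice) can be sketched as follows. This is only an illustrative sketch, not the authors' tool: the function names, the toy confidence values, and the toy occupation codes are all hypothetical.

```python
def select_for_review(confidences, n_review):
    """Return indices of the n_review predictions with the lowest confidence."""
    if n_review < 0:
        raise ValueError("n_review must be non-negative")
    # Sort indices by ascending confidence; sorted() is stable, so ties
    # keep their original order.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:n_review]


def agreement_rate(labels_a, labels_b):
    """Share of doubly reviewed items where both reviewers gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical model confidences for ten transcribed codes.
confidences = [0.99, 0.42, 0.97, 0.88, 0.15, 0.999, 0.93, 0.76, 0.98, 0.95]
to_review = select_for_review(confidences, n_review=3)
print(to_review)  # indices of the three least confident predictions

# Hypothetical labels from two reviewers for the same four images.
reviewer_a = ["111", "222", "333", "444"]
reviewer_b = ["111", "222", "999", "444"]
print(agreement_rate(reviewer_a, reviewer_b))
```

Passing the review budget as a count rather than a fraction keeps the selection exact and avoids floating-point rounding when the budget is derived from a percentage.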
Rank-Aware Negative Training for Semi-Supervised Text Classification
Semi-supervised text classification (SSTC) paradigms typically follow
the spirit of self-training. The key idea is to train a deep classifier on
limited labeled texts and then iteratively predict pseudo-labels for the
unlabeled texts for further training. However, the performance is largely
affected by the accuracy of the pseudo-labels, which may not be high in
real-world scenarios. This paper presents a Rank-aware Negative Training (RNT)
framework to address SSTC in a learning-with-noisy-labels manner. To alleviate the
noisy information, we adapt a reasoning-with-uncertainty approach to rank
the unlabeled texts based on the evidential support received from the labeled
texts. Moreover, we propose the use of negative training to train RNT based on
the concept that "the input instance does not belong to the complementary
label". A complementary label is randomly selected from all labels except the
on-target label. Intuitively, the probability of a true label serving as a
complementary label is low and thus provides less noisy information during the
training, resulting in better performance on the test data. Finally, we
evaluate the proposed solution on various text classification benchmark
datasets. Our extensive experiments show that it consistently outperforms the
state-of-the-art alternatives in most scenarios and achieves competitive
performance in the others. The code of RNT is publicly available
at: https://github.com/amurtadha/RNT. Comment: TACL 202
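The idea of training on "the input instance does not belong to the complementary label" can be illustrated with a small sketch. It uses a common formulation of negative learning, a loss of -log(1 - p_k) for the complementary label k; whether RNT uses exactly this loss is an assumption, and all function names below are hypothetical rather than taken from the RNT repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_complementary_label(on_target_label, num_classes, rng):
    """Pick a label uniformly at random from all labels except the on-target one."""
    candidates = [c for c in range(num_classes) if c != on_target_label]
    return rng.choice(candidates)


def negative_loss(logits, complementary_label, eps=1e-12):
    """Negative-learning loss -log(1 - p_k) for complementary label k.

    The loss is small when the model assigns low probability to the
    complementary label, i.e. when it agrees that the instance does
    NOT belong to that class.
    """
    probs = softmax(logits)
    return -math.log(max(1.0 - probs[complementary_label], eps))


rng = random.Random(0)
comp = sample_complementary_label(on_target_label=2, num_classes=4, rng=rng)
logits = [2.0, 0.0, 0.0, 0.0]  # hypothetical classifier output
print(comp, negative_loss(logits, comp))
```

With many classes, a randomly drawn complementary label is unlikely to coincide with the true label, which is why this signal tends to be less noisy than a possibly wrong pseudo-label.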
Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates
The study of cerebral anatomy in developing neonates is of great importance for
the understanding of brain development during the early period of life. This
dissertation therefore focuses on three challenges in the modelling of cerebral
anatomy in neonates during brain development. The methods that have been
developed all use Magnetic Resonance Images (MRI) as source data.
To facilitate study of vascular development in the neonatal period, a set of image
analysis algorithms is developed to automatically extract and model cerebral
vessel trees. The whole process consists of cerebral vessel tracking from
automatically placed seed points, vessel tree generation, and vasculature
registration and matching. These algorithms have been tested on clinical
Time-of-Flight (TOF) MR angiographic datasets.
To facilitate study of the neonatal cortex, a complete cerebral cortex segmentation
and reconstruction pipeline has been developed. Segmentation of the neonatal
cortex is not effectively done by existing algorithms designed for the adult brain
because the contrast between grey and white matter is reversed. This causes pixels
containing tissue mixtures to be incorrectly labelled by conventional methods. The
neonatal cortical segmentation method that has been developed is based on a novel
expectation-maximization (EM) method with explicit correction for mislabelled
partial volume voxels. Based on the resulting cortical segmentation, an implicit
surface evolution technique is adopted for the reconstruction of the cortex in
neonates. The performance of the method is investigated by performing a detailed
landmark study.
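As background for the EM-based segmentation mentioned above, the following is a minimal sketch of plain expectation-maximization for a one-dimensional Gaussian mixture over voxel intensities. It does not include the dissertation's explicit correction for mislabelled partial volume voxels, and the function names, initial parameters, and toy intensities are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(intensities, means, variances, weights):
    """Posterior probability of each tissue class for each intensity."""
    resp = []
    for x in intensities:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint) or 1e-300  # guard against division by zero
        resp.append([p / total for p in joint])
    return resp


def em_segment(intensities, means, variances, weights, n_iter=50):
    """Fit a Gaussian mixture with EM and return parameters plus hard labels."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        resp = e_step(intensities, means, variances, weights)
        # M-step: re-estimate each class from its soft assignments.
        for j in range(len(means)):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty class unchanged
            weights[j] = nj / len(intensities)
            means[j] = sum(r[j] * x for r, x in zip(resp, intensities)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, intensities)) / nj
            variances[j] = max(var, 1e-6)  # keep the variance strictly positive
    resp = e_step(intensities, means, variances, weights)
    labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


# Hypothetical intensities drawn from two tissue classes.
intensities = [10, 11, 12, 9, 10.5, 30, 31, 29, 30.5, 32]
means, variances, weights, labels = em_segment(
    intensities, means=[8.0, 35.0], variances=[25.0, 25.0], weights=[0.5, 0.5]
)
print(means, labels)
```

A pure intensity model like this one labels a voxel by its intensity alone, so a voxel containing a mixture of two tissues can take on the intensity of a third class; that is the kind of partial volume mislabelling the method described above corrects for.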
To facilitate study of cortical development, an algorithm for registering and
aligning cortical surfaces is developed. The method first inflates extracted
cortical surfaces and then performs a non-rigid surface registration using free-form
deformations (FFDs) to remove residual misalignment. Validation experiments using
data labelled by an expert observer demonstrate that the method can capture local
changes and follow the growth of specific sulcus
Large Area Crop Inventory Experiment (LACIE). Transition year Classification And Mensuration Sub system (CAMS) detailed analysis procedures
There are no author-identified significant results in this report
Automatic Schaeffer's gestures recognition system
Schaeffer's sign language consists of a reduced set of gestures designed to help children with autism or cognitive learning disabilities to develop adequate communication skills. Our automatic recognition system for Schaeffer's gesture language uses the information provided by an RGB-D camera to capture body motion and recognizes gestures using dynamic time warping combined with k-nearest neighbors methods. The learning process is reinforced by interaction with the proposed system, which accelerates learning and thus helps both children and educators. To demonstrate the validity of the system, a set of qualitative experiments with children was carried out. As a result, a system able to recognize a subset of 11 gestures of Schaeffer's sign language online was achieved. This work has been supported by the Spanish Government DPI2013-40534-R Grant, supported with FEDER funds
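The combination of dynamic time warping (DTW) and k-nearest neighbors named in this abstract can be sketched as below. This is a generic textbook version, not the authors' system: the gesture names, the 2D toy trajectories standing in for tracked joint positions, and the function names are all hypothetical.

```python
import math
from collections import Counter


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of equal-dimension points."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, templates, k=1):
    """Label a query sequence by majority vote among its k DTW-nearest templates."""
    ranked = sorted(templates, key=lambda t: dtw_distance(query, t[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled templates: (gesture name, hand trajectory).
templates = [
    ("raise", [(0, 0), (0, 1), (0, 2), (0, 3)]),
    ("swipe", [(0, 0), (1, 0), (2, 0), (3, 0)]),
]
# A slower, slightly noisy upward motion with a different number of frames.
query = [(0, 0), (0, 0.9), (0, 1.1), (0, 2.1), (0, 3.0)]
print(knn_classify(query, templates, k=1))
```

DTW lets sequences of different lengths be compared, which matters here because the same gesture can be performed at different speeds.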