Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection
Automatic heart sound abnormality detection can play a vital role in the
early diagnosis of heart diseases, particularly in low-resource settings. The
state-of-the-art algorithms for this task utilize a set of Finite Impulse
Response (FIR) band-pass filters as a front-end followed by a Convolutional
Neural Network (CNN) model. In this work, we propose a novel CNN architecture
that integrates the front-end bandpass filters within the network using
time-convolution (tConv) layers, which enables the FIR filter-bank parameters
to become learnable. Different initialization strategies for the learnable
filters, including random parameters and a set of predefined FIR filter-bank
coefficients, are examined. Using the proposed tConv layers, we add constraints
to the learnable FIR filters to ensure linear and zero phase responses.
Experimental evaluations are performed on a balanced 4-fold cross-validation
task prepared using the PhysioNet/CinC 2016 dataset. Results demonstrate that
the proposed models yield superior performance compared to the state-of-the-art
system, while the linear-phase FIR filter-bank method provides an absolute
improvement of 9.54% over the baseline in overall accuracy.
Comment: 4 pages, 6 figures, IEEE International Engineering in Medicine and
Biology Conference (EMBC)
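In a tConv layer the FIR taps are ordinary network weights, and the linear-phase constraint described above reduces to keeping the kernel symmetric. A minimal pure-Python sketch, with hypothetical function names (not the paper's code):

```python
def make_linear_phase_kernel(half):
    """Mirror the learnable half of the coefficients so the FIR kernel is
    symmetric; a symmetric impulse response has a linear phase response."""
    return half + half[-2::-1]  # e.g. [a, b, c] -> [a, b, c, b, a]

def tconv(signal, kernel):
    """'Same'-length 1D filtering standing in for a time-convolution
    (tConv) layer; written as cross-correlation, which is identical to
    convolution when the kernel is symmetric."""
    pad = len(kernel) // 2
    x = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * x[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]

kernel = make_linear_phase_kernel([0.25, 0.5])   # -> [0.25, 0.5, 0.25]
out = tconv([0.0, 0.0, 1.0, 0.0, 0.0], kernel)   # impulse response
```

In the actual model the half-kernel entries would be learnable parameters updated by backpropagation; initializing them from predefined FIR filter-bank coefficients corresponds to the second initialization strategy mentioned above.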
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
In this work, we propose an ensemble of classifiers to distinguish between
various degrees of abnormalities of the heart using Phonocardiogram (PCG)
signals acquired using digital stethoscopes in a clinical setting, for the
INTERSPEECH 2018 Computational Paralinguistics (ComParE) Heart Beats
SubChallenge. Our primary classification framework is a convolutional neural
network with 1D time-convolution (tConv) layers, which uses features
transferred from a model trained on the 2016 PhysioNet Heart Sound Database. We
also employ a Representation Learning (RL) approach to generate features in an
unsupervised manner using Deep Recurrent Autoencoders and use Support Vector
Machine (SVM) and Linear Discriminant Analysis (LDA) classifiers. Finally, we
utilize an SVM classifier on a high-dimensional segment-level feature extracted
using various functionals on short-term acoustic features, i.e., Low-Level
Descriptors (LLD). An ensemble of the three different approaches provides a
relative improvement of 11.13% compared to our best single sub-system in terms
of the Unweighted Average Recall (UAR) performance metric on the evaluation
dataset.
Comment: 5 pages, 5 figures, Interspeech 2018 accepted manuscript
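The UAR metric used above is the mean of per-class recalls, so rare abnormality classes weigh as much as common ones. A small pure-Python sketch on toy labels (illustrative only, not the challenge's scoring code):

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls: each class contributes equally to
    the score, regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Toy example: 3 samples of class 0, 1 sample of class 1.
# Per-class recalls are 2/3 and 1/1, so UAR = (2/3 + 1) / 2 = 5/6.
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1])
```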
A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes
The Latin script has historically led the state of the art in handwritten
optical character recognition (OCR) research. Adapting existing systems from Latin to
alpha-syllabary languages is particularly challenging due to a sharp contrast
between their orthographies. The segmentation of graphical constituents
corresponding to characters becomes significantly hard due to a cursive writing
system and frequent use of diacritics in the alpha-syllabary family of
languages. We propose a labeling scheme based on graphemes (linguistic segments
of word formation) that makes segmentation inside alpha-syllabary words linear
and present the first dataset of Bengali handwritten graphemes that are
commonly used in an everyday context. The dataset contains 411k curated samples
of 1295 unique commonly used Bengali graphemes. Additionally, the test set
contains 900 uncommon Bengali graphemes for out-of-dictionary performance
evaluation. The dataset is open-sourced as a part of a public Handwritten
Grapheme Classification Challenge on Kaggle to benchmark vision algorithms for
multi-target grapheme classification. The unique graphemes present in this
dataset are selected based on commonality in the Google Bengali ASR corpus.
From the competition proceedings, we see that deep-learning methods can
generalize to a large span of out-of-dictionary graphemes that are absent
during training. Dataset and starter code at www.kaggle.com/c/bengaliai-cv19.
Comment: 15 pages, 12 figures, 6 tables, submitted to CVPR-2
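To illustrate why grapheme (rather than character) targets make segmentation tractable, here is a deliberately naive cluster splitter using only the standard library: it glues combining marks and virama-joined conjunct consonants to their base letter. This is a sketch, not the dataset's labeling code, and it ignores many Bengali-specific cases:

```python
import unicodedata

HASANTA = "\u09cd"  # Bengali virama: joins consonants into a conjunct

def split_graphemes(word):
    """Group each base letter with its trailing combining marks, and pull
    a consonant into the previous cluster when a virama precedes it."""
    clusters = []
    for ch in word:
        is_mark = unicodedata.category(ch).startswith("M")
        joined = bool(clusters) and clusters[-1].endswith(HASANTA)
        if clusters and (is_mark or joined):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(split_graphemes("শক্তি"))  # the conjunct stays inside one grapheme
```

A robust implementation would follow the full Unicode grapheme-cluster rules; the point here is only that grapheme-level labels remove the need to segment inside these clusters.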
Abugida Normalizer and Parser for Unicode texts
This paper proposes two libraries to address common and uncommon issues with
Unicode-based writing schemes for Indic languages. The first is a normalizer
that corrects inconsistencies caused by the encoding scheme
(https://pypi.org/project/bnunicodenormalizer/). The second is a grapheme
parser for Abugida text (https://pypi.org/project/indicparser/). Both are more
efficient and effective than previously available tools. We report a 400%
increase in speed and significantly better performance on various
language-model-based downstream tasks.
Comment: 3 pages, 1 figure
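One family of inconsistencies such a normalizer repairs is the same vowel sign arriving either as a single codepoint or as a decomposed pair. For canonical cases the Python standard library's NFC normalization already demonstrates the idea; the libraries above additionally handle non-canonical, script-specific errors that NFC cannot fix:

```python
import unicodedata

# BENGALI VOWEL SIGN O (U+09CB) canonically decomposes into the E sign
# (U+09C7) plus the AA sign (U+09BE), so the same visible syllable can
# arrive as two different codepoint sequences.
decomposed = "\u0995\u09c7\u09be"  # KA + E sign + AA sign
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "\u0995\u09cb"  # KA + O sign, one codepoint
```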
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
While strides have been made in deep learning based Bengali Optical Character
Recognition (OCR) in the past decade, the absence of large Document Layout
Analysis (DLA) datasets has hindered the application of OCR in document
transcription, e.g., transcribing historical documents and newspapers.
Moreover, rule-based DLA systems that are currently being employed in practice
are not robust to domain variations and out-of-distribution layouts. To this
end, we present the first large multi-domain Bengali Document Layout Analysis
Dataset: BaDLAD. This dataset contains 33,695 human-annotated document samples
from six domains - i) books and magazines, ii) public domain govt. documents,
iii) liberation war documents, iv) newspapers, v) historical newspapers, and
vi) property deeds, with 710K polygon annotations for four unit types:
text-box, paragraph, image, and table. Through preliminary experiments
benchmarking the performance of existing state-of-the-art deep learning
architectures for English DLA, we demonstrate the efficacy of our dataset in
training deep-learning-based Bengali document digitization models.
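Each of the 710K annotations above is a polygon, i.e. an ordered list of vertices. As a small illustration of working with such annotations (the exact BaDLAD record layout is not reproduced here), the shoelace formula computes a region's area, useful for example when filtering degenerate regions before training:

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple (non-self-intersecting)
    polygon given as a list of (x, y) vertex pairs."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# A 100x50 text-box-like rectangle has area 5000.
area = polygon_area([(0, 0), (100, 0), (100, 50), (0, 50)])
```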