3 research outputs found
Efficient Sketching Algorithm for Sparse Binary Data
Recent advancement of the WWW, IOT, social network, e-commerce, etc. have
generated a large volume of data. These datasets are mostly represented by high
dimensional and sparse datasets. Many fundamental subroutines of common data
analytic tasks such as clustering, classification, ranking, nearest neighbour
search, etc. scale poorly with the dimension of the dataset. In this work, we
address this problem and propose a sketching (alternatively, dimensionality
reduction) algorithm -- \binsketch (Binary Data Sketch) -- for sparse binary
datasets. \binsketch preserves the binary version of the dataset after
sketching and maintains estimates for multiple similarity measures such as
Jaccard, Cosine, Inner-Product similarities, and Hamming distance, on the same
sketch. We present a theoretical analysis of our algorithm and complement it
with extensive experimentation on several real-world datasets. We compare the
performance of our algorithm with the state-of-the-art algorithms on the task
of mean-square-error and ranking. Our proposed algorithm offers a comparable
accuracy while suggesting a significant speedup in the dimensionality reduction
time, with respect to the other candidate algorithms. Our proposal is simple,
easy to implement, and therefore can be adopted in practice
Recommended from our members
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Gliomas are the most common primary brain malignancies, with different
degrees of aggressiveness, variable prognosis and various heterogeneous
histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic
core, active and non-enhancing core. This intrinsic heterogeneity is also
portrayed in their radio-phenotype, as their sub-regions are depicted by
varying intensity profiles disseminated across multi-parametric magnetic
resonance imaging (mpMRI) scans, reflecting varying biological properties.
Their heterogeneous shape, extent, and location are some of the factors that
make these tumors difficult to resect, and in some cases inoperable. The amount
of resected tumor is a factor also considered in longitudinal scans, when
evaluating the apparent tumor for potential diagnosis of progression.
Furthermore, there is mounting evidence that accurate segmentation of the
various tumor sub-regions can offer the basis for quantitative image analysis
towards prediction of patient overall survival. This study assesses the
state-of-the-art machine learning (ML) methods used for brain tumor image
analysis in mpMRI scans, during the last seven instances of the International
Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we
focus on i) evaluating segmentations of the various glioma sub-regions in
pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue
of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO
criteria, and iii) predicting the overall survival from pre-operative mpMRI
scans of patients that underwent gross total resection. Finally, we investigate
the challenge of identifying the best ML algorithms for each of these tasks,
considering that apart from being diverse on each instance of the challenge,
the multi-institutional mpMRI BraTS dataset has also been a continuously
evolving/growing dataset
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Gliomas are the most common primary brain malignancies, with different
degrees of aggressiveness, variable prognosis and various heterogeneous
histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic
core, active and non-enhancing core. This intrinsic heterogeneity is also
portrayed in their radio-phenotype, as their sub-regions are depicted by
varying intensity profiles disseminated across multi-parametric magnetic
resonance imaging (mpMRI) scans, reflecting varying biological properties.
Their heterogeneous shape, extent, and location are some of the factors that
make these tumors difficult to resect, and in some cases inoperable. The amount
of resected tumor is a factor also considered in longitudinal scans, when
evaluating the apparent tumor for potential diagnosis of progression.
Furthermore, there is mounting evidence that accurate segmentation of the
various tumor sub-regions can offer the basis for quantitative image analysis
towards prediction of patient overall survival. This study assesses the
state-of-the-art machine learning (ML) methods used for brain tumor image
analysis in mpMRI scans, during the last seven instances of the International
Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we
focus on i) evaluating segmentations of the various glioma sub-regions in
pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue
of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO
criteria, and iii) predicting the overall survival from pre-operative mpMRI
scans of patients that underwent gross total resection. Finally, we investigate
the challenge of identifying the best ML algorithms for each of these tasks,
considering that apart from being diverse on each instance of the challenge,
the multi-institutional mpMRI BraTS dataset has also been a continuously
evolving/growing dataset