Data Clustering and Partial Supervision with Some Parallel Developments
by Sameh A. Salem
Clustering is an important and irreplaceable step in the search for structures in data. Many different clustering algorithms have been proposed. Yet, the sources of variability in most clustering algorithms affect the reliability of their results. Moreover, the majority require the number of clusters as one of their input parameters. Unfortunately, there are many scenarios where this knowledge may not be available. In addition, clustering algorithms are computationally intensive, which makes scaling up to large datasets a major challenge. This thesis gives possible solutions for such problems.
First, new measures - called clustering performance measures (CPMs) - for assessing the reliability of a clustering algorithm are introduced. These CPMs can be used to evaluate: 1) clustering algorithms that have a structural bias towards certain types of data distribution as well as those that have no such biases; 2) clustering algorithms that have an initialisation dependency as well as those that have a unique solution for a given set of parameter values with no initialisation dependency.
Then, a novel clustering algorithm, the RAdius-based Clustering ALgorithm (RACAL), is proposed. RACAL uses a distance-based principle to map the distributions of the data, assuming that clusters are determined by a distance parameter, without having to specify the number of clusters. Furthermore, RACAL is enhanced by a validity index to choose the best clustering result, i.e. the result with compact clusters and wide cluster separations, for a given input parameter. Comparisons with other clustering algorithms indicate the applicability and reliability of the proposed clustering algorithm. Additionally, an adaptive partial supervision strategy is proposed for use in conjunction with RACAL to make it act as a classifier. Results from RACAL with partial supervision, RACAL-PS, indicate its robustness in classification. Additionally, a parallel version of RACAL (P-RACAL) is proposed. The parallel evaluations of P-RACAL indicate that P-RACAL is scalable in terms of speedup and scaleup, which gives it the ability to handle large datasets of high dimensions in a reasonable time.
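The abstract does not spell out RACAL's procedure, but the core idea of distance-parameter clustering, where the number of clusters emerges from a radius rather than being specified up front, can be sketched as follows (a minimal single-pass toy version; the assignment rule and the name `radius_clustering` are illustrative assumptions, not the thesis's actual algorithm):

```python
import numpy as np

def radius_clustering(X, radius):
    """Toy radius-based clustering: a point joins the nearest existing
    cluster if it lies within `radius` of that cluster's centre;
    otherwise it seeds a new cluster. The number of clusters is an
    outcome of the radius, not an input parameter."""
    centres, labels = [], []
    for x in X:
        if centres:
            d = np.linalg.norm(np.asarray(centres) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= radius:
                labels.append(j)
                continue
        centres.append(x)
        labels.append(len(centres) - 1)
    return np.array(labels), np.asarray(centres)

# Two well-separated groups on a line: one radius recovers both,
# with no cluster count supplied.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
labels, centres = radius_clustering(X, radius=1.0)
print(labels)  # → [0 0 0 1 1]
```

A validity index, as described above, would then score the result of each radius by cluster compactness versus separation and keep the best one.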
Next, a novel clustering algorithm, which achieves clustering without any control of cluster sizes, is introduced. This algorithm, called the Nearest Neighbour Clustering Algorithm (NNCA), uses the same concept as the K-Nearest Neighbour (KNN) classifier, with the advantage that it needs no training set and is completely unsupervised. Additionally, NNCA is augmented with a partial supervision strategy, NNCA-PS, to act as a classifier. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments in a parallel environment indicate the suitability and scalability of the parallel NNCA, P-NNCA, in handling large datasets.
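The abstract leaves NNCA's details out; one common unsupervised reading of the nearest-neighbour idea is to link each point to its single nearest neighbour and label the connected components of the resulting graph. The sketch below illustrates that reading only; it is an assumption, not NNCA's published procedure:

```python
import numpy as np

def nn_clustering(X):
    """Toy nearest-neighbour clustering: connect every point to its single
    nearest neighbour and label the connected components of that graph.
    No cluster count or training set is required (fully unsupervised)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)        # a point is not its own neighbour
    nn = d.argmin(axis=1)

    # Union-find over the nearest-neighbour links.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(nn):
        parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    remap = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [remap[r] for r in roots]

X = np.array([[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [4.1, 4.0]])
print(nn_clustering(X))  # → [0, 0, 1, 1]
```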
Further investigations on more challenging data are carried out. In this context, microarray data is considered. In such data, the number of clusters is not clearly defined. This points directly towards clustering algorithms that do not require knowledge of the number of clusters. Therefore, the efficacy of one of these algorithms is examined. Finally, a novel integrated clustering performance measure (ICPM) is proposed to be used as a guideline for choosing the proper clustering algorithm that has the ability to extract useful biological information in a particular dataset.
Supplied by The British Library - 'The world's knowledge'
Neuropathy Classification of Corneal Nerve Images Using Artificial Intelligence
Nerve variations in the human cornea have been associated with alterations in
the neuropathy state of a patient suffering from chronic diseases. For some diseases,
such as diabetes, detection of neuropathy prior to visible symptoms is important,
whereas for others, such as multiple sclerosis, early prediction of disease worsening is
crucial. While current methods fail to provide early diagnosis of neuropathy, in vivo corneal confocal microscopy enables very early insight into nerve damage by illuminating and magnifying the human cornea. This non-invasive method captures a
sequence of images from the corneal sub-basal nerve plexus. Current practices of
manual nerve tracing and classification impede the advancement of medical research in
this domain. Since corneal nerve analysis for neuropathy is in its initial stages, there is
a dire need for process automation.
To address this limitation, we seek to automate the two stages of this process:
nerve segmentation and neuropathy classification of images. For nerve segmentation,
we compare the performance of two existing solutions on multiple datasets to select the
appropriate method and proceed to the classification stage. Consequently, we approach
neuropathy classification of the images through artificial intelligence using Adaptive
Neuro-Fuzzy Inference System, Support Vector Machines, Naïve Bayes and k-nearest
neighbors. We further compare the performance of machine learning classifiers with
deep learning. We ascertained that nerve segmentation using convolutional neural networks provided a significant improvement in sensitivity and false negative rate by
at least 5% over the state-of-the-art software. For classification, ANFIS yielded the best
classification accuracy of 93.7% compared to other classifiers. Furthermore, for this
problem, machine learning approaches performed better in terms of classification
accuracy than deep learning.
Deep active learning for suggestive segmentation of biomedical image stacks via optimisation of Dice scores and traced boundary length
Manual segmentation of stacks of 2D biomedical images (e.g., histology) is a time-consuming task which can be sped up with semi-automated techniques. In this article, we present a suggestive deep active learning framework that seeks to minimise the annotation effort required to achieve a certain level of accuracy when labelling such a stack. The framework suggests, at every iteration, a specific region of interest (ROI) in one of the images for manual delineation. Using a deep segmentation neural network and a mixed cross-entropy loss function, we propose a principled strategy to estimate class probabilities for the whole stack, conditioned on heterogeneous partial segmentations of the 2D images, as well as on weak supervision in the form of image indices that bound each ROI. Using the estimated probabilities, we propose a novel active learning criterion based on predictions for the estimated segmentation performance and delineation effort, measured with average Dice scores and total delineated boundary length, respectively, rather than common surrogates such as entropy. The query strategy suggests the ROI that is expected to maximise the ratio between performance and effort, while considering the adjacency of structures that may have already been labelled – which decreases the length of the boundary to trace. We provide quantitative results on synthetically deformed MRI scans and real histological data, showing that our framework can reduce labelling effort by up to 60–70% without compromising accuracy.
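The selection rule can be illustrated with a toy scorer: each candidate ROI carries a predicted Dice-score gain and a predicted boundary length to trace (already discounted for boundaries shared with labelled neighbours), and the query picks the best gain-per-effort ratio. The candidate names and numbers below are hypothetical, for illustration only:

```python
def select_roi(candidates):
    """Toy version of the query criterion: pick the candidate ROI that
    maximises predicted Dice-score gain per unit of boundary to trace.
    Each candidate is (name, predicted_dice_gain, predicted_boundary_len)."""
    return max(candidates, key=lambda c: c[1] / c[2])

# Hypothetical candidates (not from the article).
candidates = [
    ("slice3_roi_a", 0.04, 120.0),   # small gain, short boundary
    ("slice7_roi_b", 0.10, 800.0),   # large gain, long boundary
    ("slice5_roi_c", 0.06, 150.0),   # best gain-per-effort ratio
]
print(select_roi(candidates)[0])  # → slice5_roi_c
```

The entropy surrogate the article argues against would ignore the effort term entirely; here the long-boundary candidate loses despite its larger predicted gain.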
Information Access Using Neural Networks For Diverse Domains And Sources
The ever-increasing volume of web-based documents poses a challenge in efficiently accessing specialized knowledge from domain-specific sources, requiring a profound understanding of the domain and substantial comprehension effort. Although natural language technologies, such as information retrieval and machine reading comprehension systems, offer rapid and accurate information retrieval, their performance in specific domains is hindered by training on general domain datasets. Creating domain-specific training datasets, while effective, is time-consuming, expensive, and heavily reliant on domain experts. This thesis presents a comprehensive exploration of efficient technologies to address the challenge of information access in specific domains, focusing on retrieval-based systems encompassing question answering and ranking.
We begin with a comprehensive introduction to the information access system. We demonstrate the structure of an information access system through a typical open-domain question-answering task. We outline its two major components, the retrieval and reader models, and the design choices for each part. We focus mainly on three points: 1) the design choice for connecting the two components; 2) the trade-off associated with the retrieval model and the best frontier in practice; and 3) a data augmentation method to adapt the reader model, trained initially on closed-domain datasets, to effectively answer questions in the retrieval-based setting.
Subsequently, we discuss various methods enabling system adaptation to specific domains. Transfer learning techniques are presented, including generation as data augmentation, further pre-training, and progressive domain-clustered training. We also present a novel zero-shot re-ranking method inspired by the compression-based distance. We summarize the conclusions and findings gathered from the experiments.
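The compression-based distance that inspires the re-ranking method is, in its best-known form, the normalised compression distance (NCD): two strings that share structure compress better together than apart. A minimal zero-shot sketch using zlib follows (the thesis's exact distance and ranking rule may differ):

```python
import zlib

def C(s):
    """Compressed length of a string, in bytes."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x, y):
    """Normalised compression distance: small when x and y share structure."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def rerank(query, docs):
    """Zero-shot re-ranking: order candidates by NCD to the query,
    with no training and no domain-specific model."""
    return sorted(docs, key=lambda d: ncd(query, d))

docs = [
    "the stock market fell sharply amid inflation fears last week",
    "neural networks learn their weights with gradient descent",
]
query = "how do neural networks learn their weights with gradient descent"
print(rerank(query, docs)[0])  # the overlapping document ranks first
```

Because the only model involved is a general-purpose compressor, the same ranking rule transfers to a new domain with zero training, which is what makes it attractive for domain adaptation.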
Moreover, the exploration extends to retrieval-based systems beyond textual corpora. We explore a search system for an e-commerce database, wherein natural language queries are combined with user preference data to facilitate the retrieval of relevant products. To address the challenges of the retrieval-based e-commerce ranking system, including noisy labels and cold-start problems, we enhance model training through cascaded training and adversarial sample weighting. Another scenario we investigate is the search system in the math domain, characterized by the unique role of formulas and distinct features compared to textual searches. We tackle the math-related search problem by combining neural ranking models with structurally optimized algorithms.
Finally, we summarize the research findings and future research directions.
Role of Imaging and AI in the Evaluation of COVID-19 Infection: A Comprehensive Survey
Coronavirus disease 2019 (COVID-19) is a respiratory illness that rapidly became the pandemic of the century, with the number of people infected globally exceeding 253.4 million. Over two years have passed since the beginning of the COVID-19 pandemic. During this hard period, the scientific community has tackled several challenges to understand this novel disease, evaluate it, and treat affected patients, all in an effort to push back the spread of the virus. This article provides a comprehensive review of the COVID-19 virus and its entry mechanism, its main repercussions on many organs and tissues of the body, its short- and long-term symptoms, and the role of diagnostic imaging in COVID-19. Notably, the rapid development of effective vaccines was an exceptional accomplishment that led to a decreased death rate worldwide; however, some hurdles still have to be overcome. Much evidence indicates that COVID-19 infection causes neurological dysfunction in a substantial proportion of affected patients; these symptoms appear severely during the infection, and little is yet known about the potential long-term consequences for the brain. Loss of smell, in particular, is a neurological sign and rudimentary symptom of COVID-19. Hence, we review the causes of olfactory bulb dysfunction and anosmia associated with COVID-19, the latest appropriate therapeutic strategies for COVID-19 treatment (e.g., the ACE2 strategy and the Ang II receptor), and the tests used through the follow-up phases. Additionally, we discuss the long-term complications of the virus and thus the possibility of improving therapeutic strategies.
Moreover, the main steps of the artificial intelligence methods that have been used to predict and diagnose COVID-19 early are presented. Artificial intelligence, especially machine learning, is emerging as an effective approach for diagnostic image analysis, with performance in the discriminative diagnosis of COVID-19 injuries on multiple organs comparable to that of human practitioners. The methodology followed in preparing the current survey was to search for related work on the mentioned topics across different publishers, such as Springer, Wiley, and Elsevier. Additionally, different studies have been compared, and their results collected and reported. The articles were selected based on year (i.e., the last three years), and different keywords were checked (e.g., COVID-19, COVID-19 Treatment, COVID-19 Symptoms, and COVID-19 and Anosmia).
Biomedical Image Processing and Classification
Biomedical image processing is an interdisciplinary field involving a variety of disciplines, e.g., electronics, computer science, physics, mathematics, physiology, and medicine. Several imaging techniques have been developed, providing many approaches to the study of the human body. Biomedical image processing is finding an increasing number of important applications in, for example, the study of the internal structure or function of an organ and the diagnosis or treatment of a disease. If associated with classification methods, it can support the development of computer-aided diagnosis (CAD) systems, which could help medical doctors refine their clinical picture.
Gland Instance Segmentation in Colon Histology Images
This thesis looks at approaches to gland instance segmentation in histology images. The aim is to find suitable local image representations to describe the gland structures in images with benign tissue and those with malignant tissue, and subsequently use them for the design of accurate, scalable and flexible gland instance segmentation methods. Gland instance segmentation is a clinically important and technically challenging problem, as the morphological structure and visual appearance of gland tissue are highly variable and complex. Glands are one of the most common organs in the human body. Glandular features are present in many cancer types, and histopathologists use these features to predict tumour grade. Accurate tumour grading is critical for prescribing suitable cancer treatment, resulting in improved outcome and survival rate. Different cancer grades are reflected by differences in gland morphology and structure. It is therefore important to accurately segment glands in histology images in order to get a valid prediction of tumour grade. Several segmentation methods, including segmentation with and without pre-classification, have been proposed and investigated as part of the research reported in this thesis. A number of feature spaces, including hand-crafted and deep features, have been investigated and experimentally validated to find a suitable set of image attributes for representation of benign and malignant gland tissue for the segmentation task. Furthermore, an exhaustive experimental examination of different combinations of features and classification methods has been carried out using both qualitative and quantitative assessments, including detection, shape and area fidelity metrics. It has been shown that the proposed hybrid method, combining image-level classification to identify images with benign and malignant tissue and pixel-level classification to perform gland segmentation, achieved the best results. It has been further shown that modelling benign glands using a three-class model, i.e. inside, outside and gland boundary, and malignant tissue using a two-class model is the best combination for achieving accurate and robust gland instance segmentation results. The deep learning features have been shown to overall outperform hand-crafted features; however, the proposed ring-histogram features still performed adequately, particularly for segmentation of benign glands. The adopted transfer-learning model with the proposed image augmentation has proven very successful, with 100% image classification accuracy on the available test dataset. It has been shown that the modified object-level Boundary Jaccard metric is more suitable for measuring shape similarity than the previously used object-level Hausdorff distance, as it is not sensitive to outliers and can be easily integrated with region-based metrics such as the object-level Dice index, since, contrary to the Hausdorff distance, it is bounded between 0 and 1. Dissimilar to most other reported research, this study provides comprehensive comparative results for gland segmentation, with a large collection of diverse types of image features, including hand-crafted and deep features. The novel contributions include a hybrid segmentation model superimposing image- and pixel-level classification, data augmentation for re-training deep learning models for the proposed image-level classification, and the object-level Boundary Jaccard metric adopted for evaluation of instance segmentation methods.
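The robustness argument can be seen in a toy boundary-overlap measure: counting boundary points matched within a tolerance and normalising by all points yields a score bounded in [0, 1] that a single outlier barely moves, whereas the Hausdorff distance is dominated by it. The tolerance-matching rule below is an illustrative assumption, not the thesis's exact Boundary Jaccard definition:

```python
def boundary_jaccard(b1, b2, tol=1.0):
    """Toy boundary-overlap score between two boundaries given as point
    sets. A boundary point counts as matched if some point of the other
    boundary lies within `tol`. Score = matched points / all points,
    bounded in [0, 1], so one outlier barely moves it (unlike the
    Hausdorff distance, which it would dominate)."""
    def matched(a, b):
        return sum(1 for p in a
                   if any((p[0]-q[0])**2 + (p[1]-q[1])**2 <= tol**2 for q in b))
    m = matched(b1, b2) + matched(b2, b1)
    return m / (len(b1) + len(b2))

# Border of a 4x4 grid as a toy gland boundary (12 points).
square = [(x, y) for x in range(4) for y in range(4)
          if x in (0, 3) or y in (0, 3)]
shifted = [(x + 0.5, y) for (x, y) in square]   # small shift: perfect match
outlier = square + [(100.0, 100.0)]             # one outlier: score barely drops

print(round(boundary_jaccard(square, shifted), 2))  # → 1.0
print(round(boundary_jaccard(square, outlier), 2))  # → 0.96
```

The same outlier would push the Hausdorff distance past 100 units, which is exactly the sensitivity the thesis argues against.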
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform, LigNFam enables users to interactively explore
similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.