
    Data Clustering and Partial Supervision with Some Parallel Developments

    Data Clustering and Partial Supervision with Some Parallel Developments by Sameh A. Salem. Clustering is an important and irreplaceable step in the search for structure in data. Many different clustering algorithms have been proposed, yet the sources of variability in most clustering algorithms affect the reliability of their results. Moreover, the majority tend to require the number of clusters as one of the input parameters; unfortunately, there are many scenarios where this knowledge is not available. In addition, clustering algorithms are computationally intensive, which makes scaling up to large datasets a major challenge. This thesis gives possible solutions for such problems. First, new measures, called clustering performance measures (CPMs), for assessing the reliability of a clustering algorithm are introduced. These CPMs can be used to evaluate: 1) clustering algorithms that have a structural bias towards a certain type of data distribution as well as those that have no such bias; 2) clustering algorithms that have an initialisation dependency as well as those that produce a unique solution for a given set of parameter values with no initialisation dependency. Then, a novel clustering algorithm, the RAdius-based Clustering ALgorithm (RACAL), is proposed. RACAL uses a distance-based principle to map the distributions of the data, assuming that clusters are determined by a distance parameter, without having to specify the number of clusters. Furthermore, RACAL is enhanced by a validity index to choose the best clustering result, i.e. the result with compact clusters and wide cluster separations, for a given input parameter. Comparisons with other clustering algorithms indicate the applicability and reliability of the proposed algorithm. Additionally, an adaptive partial supervision strategy is proposed for use in conjunction with RACAL to make it act as a classifier.
Results from RACAL with partial supervision, RACAL-PS, indicate its robustness in classification. Additionally, a parallel version of RACAL (P-RACAL) is proposed. The parallel evaluations of P-RACAL indicate that it is scalable in terms of speedup and scaleup, which gives the ability to handle large, high-dimensional datasets in a reasonable time. Next, a novel clustering algorithm, which achieves clustering without any control of cluster sizes, is introduced. This algorithm, called the Nearest Neighbour Clustering Algorithm (NNCA), uses the same concept as the K-Nearest Neighbour (KNN) classifier, with the advantage that it needs no training set and is completely unsupervised. Additionally, NNCA is augmented with a partial supervision strategy, NNCA-PS, to act as a classifier. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments in a parallel environment indicate the suitability and scalability of the parallel NNCA, P-NNCA, in handling large datasets. Further investigations on more challenging data are carried out. In this context, microarray data is considered. In such data, the number of clusters is not clearly defined, which points directly towards clustering algorithms that do not require knowledge of the number of clusters; therefore, the efficacy of one such algorithm is examined. Finally, a novel integrated clustering performance measure (ICPM) is proposed as a guideline for choosing the clustering algorithm best able to extract useful biological information from a particular dataset. Supplied by The British Library - 'The world's knowledge'
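The distance-based principle underlying RACAL, where a single radius parameter determines the clusters rather than a preset cluster count, can be illustrated with a minimal leader-style sketch in Python (an illustration of the general principle only, not the thesis's exact algorithm; the data points are hypothetical):

```python
import math

def radius_clustering(points, r):
    """Leader-style radius clustering: a point joins the nearest
    existing cluster if that cluster's centre lies within distance r,
    otherwise it seeds a new cluster. The number of clusters is an
    outcome of r, not an input parameter."""
    centres, labels = [], []
    for p in points:
        if centres:
            dists = [math.dist(p, c) for c in centres]
            j = min(range(len(dists)), key=dists.__getitem__)
            if dists[j] <= r:
                labels.append(j)
                continue
        centres.append(p)
        labels.append(len(centres) - 1)
    return centres, labels

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centres, labels = radius_clustering(points, r=1.0)
# two clusters emerge from the distance threshold alone
```

Varying r trades cluster granularity for cluster count, which is the knob a validity index can then tune to pick the best result.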

    Neuropathy Classification of Corneal Nerve Images Using Artificial Intelligence

    Nerve variations in the human cornea have been associated with alterations in the neuropathy state of a patient suffering from chronic diseases. For some diseases, such as diabetes, detection of neuropathy prior to visible symptoms is important, whereas for others, such as multiple sclerosis, early prediction of disease worsening is crucial. While current methods fail to provide early diagnosis of neuropathy, in vivo corneal confocal microscopy enables very early insight into the nerve damage by illuminating and magnifying the human cornea. This non-invasive method captures a sequence of images from the corneal sub-basal nerve plexus. Current practices of manual nerve tracing and classification impede the advancement of medical research in this domain. Since corneal nerve analysis for neuropathy is in its initial stages, there is a dire need for process automation. To address this limitation, we seek to automate the two stages of this process: nerve segmentation and neuropathy classification of images. For nerve segmentation, we compare the performance of two existing solutions on multiple datasets to select the appropriate method and proceed to the classification stage. Consequently, we approach neuropathy classification of the images through artificial intelligence, using an Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Machines, Naïve Bayes and k-nearest neighbours. We further compare the performance of machine learning classifiers with deep learning. We ascertained that nerve segmentation using convolutional neural networks provided a significant improvement in sensitivity and false negative rate, by at least 5%, over the state-of-the-art software. For classification, ANFIS yielded the best classification accuracy, 93.7%, compared to the other classifiers. Furthermore, for this problem, machine learning approaches performed better in terms of classification accuracy than deep learning.
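Among the compared classifiers, k-nearest neighbours is the simplest to sketch: a query is assigned the majority label among its k closest training samples. A minimal sketch with hypothetical two-dimensional feature vectors (real inputs would be features extracted from segmented nerve images, and the labels here are purely illustrative):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest
    training samples (Euclidean distance)."""
    ranked = sorted(range(len(X_train)),
                    key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# toy feature vectors standing in for image-derived nerve features
X = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9)]
y = [0, 0, 1, 1]  # 0 = control, 1 = neuropathy (hypothetical labels)
pred = knn_predict(X, y, (4.8, 5.1), k=3)
```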

    Deep active learning for suggestive segmentation of biomedical image stacks via optimisation of Dice scores and traced boundary length

    Manual segmentation of stacks of 2D biomedical images (e.g., histology) is a time-consuming task which can be sped up with semi-automated techniques. In this article, we present a suggestive deep active learning framework that seeks to minimise the annotation effort required to achieve a certain level of accuracy when labelling such a stack. The framework suggests, at every iteration, a specific region of interest (ROI) in one of the images for manual delineation. Using a deep segmentation neural network and a mixed cross-entropy loss function, we propose a principled strategy to estimate class probabilities for the whole stack, conditioned on heterogeneous partial segmentations of the 2D images, as well as on weak supervision in the form of image indices that bound each ROI. Using the estimated probabilities, we propose a novel active learning criterion based on predictions for the estimated segmentation performance and delineation effort, measured with average Dice scores and total delineated boundary length, respectively, rather than common surrogates such as entropy. The query strategy suggests the ROI that is expected to maximise the ratio between performance and effort, while considering the adjacency of structures that may have already been labelled, which decreases the length of the boundary to trace. We provide quantitative results on synthetically deformed MRI scans and real histological data, showing that our framework can reduce labelling effort by up to 60–70% without compromising accuracy.
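The query strategy, choosing the ROI expected to maximise the ratio of predicted Dice gain to delineation effort, can be sketched abstractly as follows (the gain and boundary-length numbers here are hypothetical placeholders; in the framework they come from the segmentation network's predictions):

```python
# Hypothetical candidate ROIs with model-derived estimates: the
# expected gain in average Dice if annotated, and the boundary
# length the annotator would have to trace.
candidates = [
    {"roi": "slice3_topleft", "dice_gain": 0.040, "boundary_len": 120.0},
    {"roi": "slice7_centre",  "dice_gain": 0.025, "boundary_len": 40.0},
    {"roi": "slice9_bottom",  "dice_gain": 0.010, "boundary_len": 15.0},
]

def select_roi(cands):
    """Suggest the ROI with the best expected Dice gain per unit
    of delineation effort (boundary length left to trace)."""
    return max(cands, key=lambda c: c["dice_gain"] / c["boundary_len"])

best = select_roi(candidates)
```

Note how the largest absolute Dice gain is not necessarily selected: adjacency to already-labelled structures shrinks the boundary left to trace, so a small, cheap ROI can win the ratio.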

    Information Access Using Neural Networks For Diverse Domains And Sources

    The ever-increasing volume of web-based documents poses a challenge in efficiently accessing specialized knowledge from domain-specific sources, requiring a profound understanding of the domain and substantial comprehension effort. Although natural language technologies, such as information retrieval and machine reading comprehension systems, offer rapid and accurate information retrieval, their performance in specific domains is hindered by training on general-domain datasets. Creating domain-specific training datasets, while effective, is time-consuming, expensive, and heavily reliant on domain experts. This thesis presents a comprehensive exploration of efficient technologies to address the challenge of information access in specific domains, focusing on retrieval-based systems encompassing question answering and ranking. We begin with a comprehensive introduction to information access systems, demonstrating the structure of an information access system through a typical open-domain question-answering task. We outline its two major components, the retrieval and reader models, and the design choices for each part. We focus mainly on three points: 1) the design choice for connecting the two components; 2) the trade-off associated with the retrieval model and the best frontier in practice; 3) a data augmentation method to adapt the reader model, trained initially on closed-domain datasets, to effectively answer questions in the retrieval-based setting. Subsequently, we discuss various methods enabling system adaptation to specific domains. Transfer learning techniques are presented, including generation as data augmentation, further pre-training, and progressive domain-clustered training. We also present a novel zero-shot re-ranking method inspired by compression-based distance, and summarize the conclusions and findings gathered from the experiments. Moreover, the exploration extends to retrieval-based systems beyond textual corpora.
We explored the search system for an e-commerce database, wherein natural language queries are combined with user preference data to facilitate the retrieval of relevant products. To address the challenges of the retrieval-based e-commerce ranking system, including noisy labels and cold-start problems, we enhanced model training through cascaded training and adversarial sample weighting. Another scenario we investigated is the search system in the math domain, characterized by the unique role of formulas and distinct features compared to textual searches. We tackle the math-related search problem by combining neural ranking models with structure-optimized algorithms. Finally, we summarize the research findings and future research directions.
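A well-known compression-based distance of the kind that inspired the zero-shot re-ranking method is the normalized compression distance (NCD), computable with any off-the-shelf compressor. A minimal sketch using zlib (the query and documents are toy examples, not the thesis's experimental setup):

```python
import zlib

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: texts that share structure
    compress better concatenated than they do separately."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

query = b"solve quadratic equation roots formula"
docs = [
    b"recipe for chocolate cake with vanilla frosting",
    b"the quadratic formula gives the roots of a quadratic equation",
]
# re-rank candidates by ascending distance to the query
ranked = sorted(docs, key=lambda d: ncd(query, d))
```

Because it needs no training data at all, a measure like this can re-rank candidates in a new domain zero-shot, using only the compressor's ability to exploit shared substrings.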

    Role of Imaging and AI in the Evaluation of COVID-19 Infection: A Comprehensive Survey

    Coronavirus disease 2019 (COVID-19) is a respiratory illness that emerged and rapidly became the pandemic of the century, with the number of people infected globally exceeding 253.4 million. Over two years have passed since the beginning of the COVID-19 pandemic. During this hard period, the scientific community has tackled several challenges to understand this novel disease, evaluate it, and treat affected patients, all in an effort to push back the spread of the virus. This article provides a comprehensive review of the COVID-19 virus and its entry mechanism, its main repercussions on many organs and tissues of the body, its short- and long-term symptoms, and the role of diagnostic imaging in COVID-19. Notably, the rapid development of effective vaccines was an exceptional accomplishment that led to a decreased death rate worldwide. However, some hurdles still have to be overcome. Much evidence indicates that COVID-19 infection causes neurological dysfunction in a substantial proportion of affected patients; these symptoms appear severely during the infection, and little is yet known about the potential long-term consequences for the brain. Loss of smell, in particular, is a neurological sign and a rudimentary symptom of COVID-19. Hence, we review the causes of olfactory bulb dysfunction and anosmia associated with COVID-19, the latest appropriate therapeutic strategies for COVID-19 treatment (e.g., the ACE2 strategy and the Ang II receptor), and the tests used through the follow-up phases. Additionally, we discuss the long-term complications of the virus and thus the possibility of improving therapeutic strategies.
Moreover, the main artificial intelligence approaches that have been used to predict and provide early diagnosis of COVID-19 are presented. Artificial intelligence, especially machine learning, is emerging as an effective approach for diagnostic image analysis, with performance in the discriminative diagnosis of COVID-19 injuries across multiple organs comparable to that of human practitioners. The methodology followed to prepare the current survey was to search for related work on the mentioned topics in different publishers' journals, such as Springer, Wiley, and Elsevier. Additionally, different studies have been compared, and the results collected and reported as shown. The articles were selected based on year (i.e., the last three years), and different keywords were checked (e.g., COVID-19, COVID-19 Treatment, COVID-19 Symptoms, and COVID-19 and Anosmia).

    Biomedical Image Processing and Classification

    Biomedical image processing is an interdisciplinary field involving a variety of disciplines, e.g., electronics, computer science, physics, mathematics, physiology, and medicine. Several imaging techniques have been developed, providing many approaches to the study of the human body. Biomedical image processing is finding an increasing number of important applications in, for example, the study of the internal structure or function of an organ and the diagnosis or treatment of a disease. If associated with classification methods, it can support the development of computer-aided diagnosis (CAD) systems, which could help medical doctors in refining their clinical picture.

    Gland Instance Segmentation in Colon Histology Images

    This thesis looks at approaches to gland instance segmentation in histology images. The aim is to find suitable local image representations to describe the gland structures in images with benign tissue and those with malignant tissue, and subsequently use them for the design of accurate, scalable and flexible gland instance segmentation methods. Gland instance segmentation is a clinically important and technically challenging problem, as the morphological structure and visual appearance of gland tissue are highly variable and complex. Glands are one of the most common organs in the human body. Glandular features are present in many cancer types, and histopathologists use these features to predict tumour grade. Accurate tumour grading is critical for prescribing suitable cancer treatment, resulting in improved outcome and survival rate. Different cancer grades are reflected by differences in gland morphology and structure. It is therefore important to accurately segment glands in histology images in order to get a valid prediction of tumour grade. Several segmentation methods, including segmentation with and without pre-classification, have been proposed and investigated as part of the research reported in this thesis. A number of feature spaces, including hand-crafted and deep features, have been investigated and experimentally validated to find a suitable set of image attributes for representing benign and malignant gland tissue for the segmentation task. Furthermore, an exhaustive experimental examination of different combinations of features and classification methods has been carried out using both qualitative and quantitative assessments, including detection, shape and area fidelity metrics. It has been shown that the proposed hybrid method, combining image-level classification to identify images with benign and malignant tissue with pixel-level classification to perform gland segmentation, achieved the best results.
It has been further shown that modelling benign glands using a three-class model, i.e. inside, outside and gland boundary, and malignant tissue using a two-class model is the best combination for achieving accurate and robust gland instance segmentation results. The deep learning features have been shown to overall outperform hand-crafted features; however, the proposed ring-histogram features still performed adequately, particularly for segmentation of benign glands. The adopted transfer-learning model with the proposed image augmentation proved very successful, with 100% image classification accuracy on the available test dataset. It has been shown that the modified object-level Boundary Jaccard metric is more suitable for measuring shape similarity than the previously used object-level Hausdorff distance, as it is not sensitive to outliers and can easily be integrated with region-based metrics such as the object-level Dice index since, contrary to the Hausdorff distance, it is bounded between 0 and 1. Unlike most other reported research, this study provides comprehensive comparative results for gland segmentation with a large collection of diverse types of image features, including hand-crafted and deep features. The novel contributions include a hybrid segmentation model superimposing image- and pixel-level classification, data augmentation for re-training deep learning models for the proposed image-level classification, and the object-level Boundary Jaccard metric adopted for evaluation of instance segmentation methods.
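The Boundary Jaccard idea, matching boundary pixels within a small tolerance and reporting a score bounded between 0 and 1, can be sketched for binary masks (a simplified illustration of the general metric, not the exact modified object-level variant used in the thesis):

```python
def boundary(mask):
    """Boundary pixels of a binary mask: foreground pixels with at
    least one 4-neighbour outside the foreground (or off the grid)."""
    fg = {(r, c) for r, row in enumerate(mask)
          for c, v in enumerate(row) if v}
    return {(r, c) for (r, c) in fg
            if any((r + dr, c + dc) not in fg
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))}

def boundary_jaccard(pred, gt, tol=1.0):
    """Jaccard-style boundary overlap: a boundary pixel counts as
    matched if it lies within tol of the other boundary. Bounded
    in [0, 1], unlike the Hausdorff distance."""
    bp, bg = boundary(pred), boundary(gt)

    def near(p, other):
        return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tol ** 2
                   for q in other)

    tp = sum(near(p, bg) for p in bp)      # matched predicted boundary
    fp = len(bp) - tp                      # unmatched predicted boundary
    fn = sum(not near(q, bp) for q in bg)  # unmatched reference boundary
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0

# identical masks match perfectly
perfect = boundary_jaccard([[0, 1, 1, 0], [0, 1, 1, 0]],
                           [[0, 1, 1, 0], [0, 1, 1, 0]])
```

Because every term is a bounded count, the score degrades gracefully with outliers, which is the property that makes it easier to combine with region-based metrics like the Dice index.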

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates have led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.
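Similarity between ‘drug-like’ molecules, of the kind explored interactively in platforms such as LigNFam, is commonly quantified with the Tanimoto coefficient over fingerprint bit sets. A minimal sketch with hypothetical fingerprints (real pipelines would derive them with a cheminformatics toolkit such as RDKit; the bit indices below are made up for illustration):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprint bit sets:
    shared on-bits over total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# hypothetical on-bit indices for three molecules
mol_a = {1, 4, 9, 12, 20}   # e.g. an NSAID-like scaffold
mol_b = {1, 4, 9, 15, 22}   # a structurally related molecule
mol_c = {2, 7, 30}          # an unrelated molecule

sim_close = tanimoto(mol_a, mol_b)  # 3 shared bits / 7 distinct bits
sim_far = tanimoto(mol_a, mol_c)    # no shared bits
```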