
    HEp-2 Cell Classification with heterogeneous classes-processes based on K-Nearest Neighbours

    We present a scheme for the feature extraction and classification of the fluorescence staining patterns of HEp-2 cells in IIF images. We propose a set of complementary processes specific to each class of patterns to be detected. Our set of processes consists of preprocessing, feature extraction and classification. The choice of methods, features and parameters was performed automatically, using the Mean Class Accuracy (MCA) as a figure of merit. We extract a large number (108) of features able to fully characterize the staining pattern of HEp-2 cells. We propose a classification approach based on two steps: the first step follows the one-against-all (OAA) scheme, while the second step follows the one-against-one (OAO) scheme. To do this, we needed to implement 21 KNN classifiers: 6 OAA and 15 OAO. The leave-one-out image cross-validation method was used for the evaluation of the results.
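The two-step scheme above can be sketched with a toy k-NN voter; the classifier count (6 OAA plus 15 OAO for six pattern classes) follows directly from the number of classes. The data and function names below are illustrative, not taken from the paper.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs; plain Euclidean k-NN vote
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# For C pattern classes the cascade needs C one-against-all (OAA)
# classifiers plus C*(C-1)//2 one-against-one (OAO) classifiers:
C = 6
n_oaa, n_oao = C, C * (C - 1) // 2
print(n_oaa, n_oao, n_oaa + n_oao)  # 6 15 21
```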

    Computer-Assisted Classification Patterns in Autoimmune Diagnostics: The AIDA Project

    Antinuclear antibodies (ANAs) are significant biomarkers in the diagnosis of autoimmune diseases in humans, performed by means of the Indirect ImmunoFluorescence (IIF) method through the analysis of staining patterns and fluorescence intensity. This paper introduces the AIDA project (AutoImmunity: Diagnosis Assisted by computer), developed in the framework of an Italy-Tunisia cross-border cooperation, and its preliminary results. A database of interpreted IIF images is being collected through the exchange of images and double reporting, and a Gold Standard database, containing around 1000 double-reported images, has been established. The Gold Standard database is used for the optimization of a CAD (Computer Aided Detection) solution and for the assessment of its added value, so that it can be applied alongside an immunologist as a second reader in the detection of autoantibodies. This CAD system is able to identify the fluorescence intensity and the fluorescence pattern in IIF images. Preliminary results show that the CAD, used as a second reader, performed better than junior immunologists and hence may significantly improve their efficacy; compared with two junior immunologists, the CAD system showed higher Intensity Accuracy (85.5% versus 66.0% and 66.0%), higher Pattern Accuracy (79.3% versus 48.0% and 66.2%), and higher Mean Class Accuracy (79.4% versus 56.7% and 64.2%).
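Mean Class Accuracy, the figure of merit quoted above, is simply the unweighted average of the per-class recalls, so minority staining patterns weigh as much as common ones. A minimal sketch (the labels are illustrative, not from the study):

```python
def mean_class_accuracy(y_true, y_pred):
    """Average of per-class recalls over the classes present in y_true."""
    classes = set(y_true)
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(classes)

y_true = ["homogeneous", "homogeneous", "speckled", "nucleolar"]
y_pred = ["homogeneous", "speckled", "speckled", "nucleolar"]
print(round(mean_class_accuracy(y_true, y_pred), 3))  # 0.833
```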

    Preliminary results of the project A.I.D.A. (Auto Immunity: Diagnosis Assisted by computer)

    This paper presents the preliminary results of the A.I.D.A. (Auto Immunity: Diagnosis Assisted by computer) project, developed in the framework of the Italy-Tunisia cross-border cooperation. In line with the main objectives of the project, a database of interpreted Indirect ImmunoFluorescence (IIF) images of HEp-2 cells is being collected thanks to the contribution of Italian and Tunisian experts involved in the routine diagnosis of autoimmune diseases. Through the exchange of images and double reporting, a Gold Standard database, containing around 1000 double-reported IIF images with different patterns including negative tests, has been established. This Gold Standard database has been used for the optimization of a computing solution (CAD, Computer Aided Detection) and for the assessment of its added value, so that it can be used alongside an immunologist as a second reader in the detection of autoantibodies for autoimmune disease diagnosis. The preliminary results show that the CAD performed better than the junior immunologists used as second readers and may significantly improve their efficacy.

    A text mining based approach for biomarker discovery

    Master's dissertation in Bioinformatics. Biomarkers have long been heralded as potential motivators for the emergence of new treatment and diagnostic procedures for disease conditions. However, for many years, the biomarker discovery process could only be achieved through experimental means, serving as a deterrent to their increase in popularity, as the usually large number of candidates resulted in a costly and time-consuming discovery process. The increase in computational capabilities has led to a change in the paradigm of biomarker discovery, migrating from the clinical laboratory to in silico environments. Furthermore, text mining, the act of automatically extracting information from text through computational means, has seen a rise in popularity in the biomedical fields. The number of studies and clinical trials in these fields has greatly increased in recent years, making the task of manually examining and annotating them, at the very least, incredibly cumbersome. Adding to this, even though the development of efficient and thorough natural language processing is still an ongoing process, the potential for the discovery of commonly reported and hidden behaviours in the scientific literature is too high to be ignored. Several tools, technologies, pipelines and frameworks already exist capable of, at least, giving a glimpse of how the analysis of the available pile of scientific literature can pave the way for the development of novel medical techniques that might help in the prevention, diagnosis and treatment of diseases.
    As such, a novel approach for biomarker discovery is presented in this work, one that integrates gene-disease associations extracted from current biomedical literature with RNA-Seq gene expression data in an L1-regularized mixed-integer linear programming model for identifying potential biomarkers, potentially providing an optimal and robust genetic signature for disease diagnosis and helping identify novel biomarker candidates. This analysis was carried out on five publicly available RNA-Seq datasets obtained from the Genomic Data Commons Data Portal, related to breast, colon, lung and prostate cancer, and head and neck squamous cell carcinoma. Hyperparameter optimization was also performed for this approach, and the performance of the optimal set of parameters was compared against other machine learning methods.
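The full L1-regularised MILP is beyond a short snippet, but the underlying idea, blending a text-mined gene-disease association score with an expression signal to rank candidate biomarkers, can be sketched as a simple heuristic. All gene names, weights and the scoring rule below are illustrative assumptions, not the thesis model.

```python
import math

def rank_biomarker_candidates(lit_score, expr_tumor, expr_normal, alpha=0.5):
    # Illustrative heuristic only: blend a text-mined gene-disease
    # association score with a log2 fold-change magnitude. The thesis
    # itself solves an L1-regularised mixed-integer linear program.
    scores = {}
    for gene in lit_score:
        fold = abs(math.log2((expr_tumor[gene] + 1) / (expr_normal[gene] + 1)))
        scores[gene] = alpha * lit_score[gene] + (1 - alpha) * fold
    return sorted(scores, key=scores.get, reverse=True)

lit = {"BRCA1": 0.9, "GAPDH": 0.1}        # hypothetical literature scores
tumor = {"BRCA1": 300.0, "GAPDH": 1000.0}  # hypothetical expression values
normal = {"BRCA1": 50.0, "GAPDH": 980.0}
print(rank_biomarker_candidates(lit, tumor, normal))  # BRCA1 ranked first
```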

    Multi-particle reconstruction with dynamic graph neural networks

    The task of finding the incident particles from the sensor deposits they leave on particle detectors is called event or particle reconstruction. The sensor deposits can be represented generically as a point cloud, with each point corresponding to three spatial dimensions of the sensor location, the energy deposit, and occasionally, also the time of the deposit. As particle detectors become increasingly more complex, ever-more sophisticated methods are needed to perform particle reconstruction. An example is the ongoing High Luminosity (HL) upgrade of the Large Hadron Collider (HL-LHC). The HL-LHC is the most significant milestone in experimental particle physics and aims to deliver an order of magnitude higher data rate compared to the current LHC. As part of the upgrade, the endcap calorimeters of the Compact Muon Solenoid (CMS) experiment – one of the two largest, general-purpose detectors at the LHC – will be replaced by the radiation-hard High Granularity Calorimeter (HGCAL). The HGCAL will contain ∼ 6 million sensors to achieve the spatial resolution required for reconstructing individual particles in HL-LHC conditions. It has an irregular geometry due to its hexagonal sensors, with sizes varying across the longitudinal and transverse axes. Further, it generates sparse data, as less than 10% of the sensors register positive energy. Reconstruction in this environment, where highly irregular patterns of hits are left by the particles, is an unprecedentedly intractable and compute-intensive pattern recognition problem. This motivates the use of parallelisation-friendly deep learning approaches. More traditional deep learning methods, however, are not feasible for the HGCAL because a regular grid-like structure is assumed in those approaches. In this thesis, a reconstruction algorithm based on a dynamic graph neural network called GravNet is presented.
    The network is paired with a segmentation technique, Object Condensation, to first perform point-cloud segmentation on the detector hits. The property-prediction capability of the Object Condensation approach is then used for energy regression of the reconstructed particles. A range of experiments are conducted to show that this method works well in conditions expected in the HGCAL, i.e. with 200 simultaneous proton-proton collisions. Parallel algorithms based on Nvidia CUDA are also presented to address the computational challenges of the graph neural network discussed in this thesis. With the optimisations, reconstruction can be performed by this method in approximately 2 seconds, which is suitable considering the computational constraints at the LHC. The presented method is the first-ever example of deep-learning-based end-to-end calorimetric reconstruction in high-occupancy environments. This sets the stage for the next era of particle reconstruction, which is expected to be end-to-end. While this thesis is focused on the HGCAL, the method discussed is general and can be extended not only to other calorimeters but also to other tasks such as track reconstruction.
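GravNet-style layers repeatedly rebuild a k-nearest-neighbour graph over the point cloud in a learned coordinate space and aggregate messages along its edges. A minimal sketch of the graph-building step in plain Euclidean space follows; the real layer learns the coordinates and uses distance-weighted aggregation, neither of which is shown here.

```python
import math

def knn_graph(points, k=2):
    """Directed edges (i -> j) from every point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        edges.extend((i, j) for j in neighbours[:k])
    return edges

# Toy "detector hits" in 2-D; a GravNet layer would use learned coordinates.
hits = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(knn_graph(hits, k=1))
```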

    Machine learning based soil maps for a wide range of soil properties for the forested area of Switzerland

    Spatial soil information in forests is crucial to assess ecosystem services such as carbon storage, water purification or biodiversity. However, spatially continuous information on soil properties at adequate resolution is rare in forested areas, especially in mountain regions. Therefore, we aimed to build high-resolution soil property maps for pH, soil organic carbon, clay, sand, gravel and soil density for six depth intervals, as well as for soil thickness, for the entire forested area of Switzerland. We used legacy data from 2071 soil profiles and evaluated six different modelling approaches of digital soil mapping, namely lasso, robust external-drift kriging, geoadditive modelling, quantile regression forest (QRF), cubist and support vector machines. Moreover, we combined the predictions of the individual models by applying a weighted model averaging approach. All models were built from a large set of potential covariates, which included, e.g., multi-scale terrain attributes and remote sensing data characterizing vegetation cover. Model performances, evaluated against an independent dataset, were similar for all methods. However, QRF achieved the best prediction performance in most cases (18 out of 37 models), while model averaging outperformed the individual models in five cases. For the final soil property maps we therefore used the QRF predictions. Prediction performance showed large differences for the individual soil properties. While for fine earth density the R2 of QRF varied between 0.51 and 0.64 across all depth intervals, soil organic carbon content was more difficult to predict (R2 = 0.19–0.32). Since QRF was used for map prediction, we assessed the 90% prediction intervals from which we derived uncertainty maps. The latter are valuable to better interpret the predictions and provide guidance for future mapping campaigns to improve the soil maps.
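Quantile regression forests keep the training targets that fall into each leaf rather than only their mean, so arbitrary quantiles, and hence the 90% prediction intervals behind the uncertainty maps, can be read off the pooled leaf samples. A minimal sketch of that last step (the forest itself is omitted; the interpolation rule is one common convention, not necessarily the one used in the study):

```python
def prediction_interval(leaf_samples, lower=0.05, upper=0.95):
    # leaf_samples: training targets collected from the leaves a query
    # point falls into, pooled across all trees of the forest.
    s = sorted(leaf_samples)

    def quantile(q):
        pos = q * (len(s) - 1)      # linear interpolation between ranks
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        frac = pos - lo
        return s[lo] * (1 - frac) + s[hi] * frac

    return quantile(lower), quantile(upper)

print(prediction_interval(list(range(101))))  # (5.0, 95.0)
```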

    Investigation of artificial immune systems and variable selection techniques for credit scoring

    Most lending institutions are aware of the importance of having a well-performing credit scoring model or scorecard and know that, in order to remain competitive in the credit industry, it is necessary to continuously improve their scorecards. This is because better scorecards result in substantial monetary savings that can be stated in terms of millions of dollars. Thus, there has been increasing interest in the application of new classifiers in credit scoring from both practitioners and researchers in the last few decades. Most of the recent work in this field has focused on the use of new and innovative techniques to classify applicants as either 'credit-worthy' or 'non-credit-worthy', with the aim of improving scorecard performance. In this thesis, we investigate the suitability of intelligent systems techniques for credit scoring. In particular, intelligent systems that use immunological metaphors are examined and used to build a learning and evolutionary classification algorithm. Our model, named Simple Artificial Immune System (SAIS), is based on the concepts of the natural immune system. The model uses applicants' credit details to classify them as either 'credit-worthy' or 'non-credit-worthy'. As part of the model development, we also investigate several techniques for selecting variables from the applicants' credit details. Variable selection is important as choosing the best set of variables can have a significant effect on the performance of scorecards. Interestingly, our results demonstrate that the traditional stepwise regression variable selection technique seems to perform better than many of the more recent techniques. A further contribution offered by this thesis is a detailed description of the scorecard development process. A detailed explanation of this process is not readily available in the literature and our description of the process is based on our own experiences and discussions with industry credit risk practitioners. 
    We evaluate our model using both publicly available datasets as well as a very large set of real-world consumer credit scoring data obtained from a leading Australian bank. The evaluation results reveal that SAIS is a competitive classifier and is appropriate for developing scorecards which require a class decision as an outcome. Another conclusion reached is one confirmed by the existing literature: even though more sophisticated scorecard development techniques, including SAIS, perform well compared to the traditional statistical methods, their performances are not statistically significantly different from the statistical methods. As with other intelligent systems techniques, SAIS is not explicitly designed to develop practical scorecards which require the generation of a score that represents the degree of confidence that an applicant will belong to a particular group. However, it is comparable to other intelligent systems techniques which are outperformed by statistical techniques for generating practical scorecards. Our final remark on this research is that even though SAIS does not seem to be quite suitable for developing practical scorecards, we still believe that there is room for improvement and that the natural immune system of the body has a number of avenues yet to be explored which could assist with the development of practical scorecards.
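The stepwise variable selection that performed well in the comparison above can be sketched as a greedy forward search over candidate variables. The scoring callback stands in for whatever scorecard-performance measure is used in practice, and all variable names below are illustrative assumptions.

```python
def forward_stepwise(variables, score_fn, min_gain=1e-4):
    # Greedily add the variable that most improves score_fn(subset)
    # until no single addition gains at least min_gain.
    selected = []
    best = score_fn([])
    remaining = list(variables)
    while remaining:
        gain, var = max(
            (score_fn(selected + [v]) - best, v) for v in remaining
        )
        if gain < min_gain:
            break
        selected.append(var)
        remaining.remove(var)
        best += gain
    return selected

# Toy score: count how many truly useful variables are in the subset.
useful = {"income", "age"}
score = lambda subset: len(useful.intersection(subset))
print(forward_stepwise(["income", "age", "shoe_size"], score))
```

In a real scorecard pipeline `score_fn` would be, e.g., cross-validated classification accuracy of a model refitted on the candidate subset.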

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST dataset with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
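The CORF operator itself models configurations of oriented receptive-field responses, but the push-pull idea can be illustrated in one dimension: a "push" filter responds to the preferred contrast, a "pull" filter of opposite polarity responds to the anti-preferred one, and the rectified pull response is subtracted from the push response. This is a heavily simplified toy analogue, not the actual CORF implementation; kernels and signals below are illustrative.

```python
def correlate(signal, kernel):
    # Same-size 1-D correlation with zero padding at the borders.
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def push_pull(signal, push_kernel, pull_kernel, inhibition=1.0):
    # Rectified pull response inhibits the rectified push response.
    push = [max(r, 0.0) for r in correlate(signal, push_kernel)]
    pull = [max(r, 0.0) for r in correlate(signal, pull_kernel)]
    return [max(p - inhibition * q, 0.0) for p, q in zip(push, pull)]

edge = [0.0] * 5 + [1.0] * 5   # a clean step edge
push_k = [-1.0, 0.0, 1.0]      # preferred-contrast detector
pull_k = [1.0, 0.0, -1.0]      # opposite polarity
print(push_pull(edge, push_k, pull_k))
```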