28 research outputs found

    Implementazione ed ottimizzazione di algoritmi per l'analisi di Biomedical Big Data

    Big Data Analytics poses many challenges to the research community, which has to handle several computational problems related to the vast amount of data. There is growing interest in biomedical data, with the aim of achieving so-called personalized medicine, in which therapy plans are designed around the specific genotype and phenotype of an individual patient; algorithm optimization plays a key role in this. In this work we discuss several topics related to Biomedical Big Data Analytics, with special attention to the numerical issues and algorithmic solutions related to them. We introduce a novel feature selection algorithm tailored to omics datasets and demonstrate its efficiency on synthetic and real high-throughput genomic datasets. We tested our algorithm against other state-of-the-art methods, obtaining better or comparable results. We also implemented and optimized different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three novel frameworks for developing deep learning neural network models are discussed and used to describe the numerical improvements proposed on various topics. In the first implementation we optimize two Super Resolution models, showing their results on NMR images and demonstrating that they generalize without retraining. The second optimization involves a state-of-the-art Object Detection neural network architecture, for which we obtain a significant speedup in computational performance. In the third application we discuss the femur head segmentation problem on CT images using deep learning algorithms. The last section of this work involves the implementation of a novel biomedical database, obtained by harmonizing multiple data sources, that provides network-like relationships between biomedical entities. Data on diseases and other related biological entities were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources involved in this project.

    Implementazione e benchmarking dell'algoritmo QDANet PRO per l'analisi di big data genomici

    Given the recent advent of NGS technologies, which can sequence entire human genomes at reduced time and cost, the ability to extract information from data plays a fundamental role in the development of research. The computational problems connected to such analyses currently fall under the topic of Big Data, with databases containing many types of experimental data of ever-increasing size. This thesis deals with the implementation and benchmarking of the QDANet PRO algorithm, developed by the Biophysics group of the University of Bologna. The method processes high-dimensional data to extract a low-dimensional Signature of features with high classification performance, through an analysis pipeline that includes dimensionality reduction algorithms. The method can also be generalized to the analysis of non-biological data that are nonetheless characterized by high volume and complexity, factors typical of Big Data. The QDANet PRO algorithm evaluates the performance of all possible pairs of features, estimating their discriminating power with a Naive Bayes Quadratic Classifier, and then ranks them. Once a performance threshold has been selected, a network of features is built and its connected components are determined. Each subgraph is analyzed separately and reduced using methods based on network theory until the final Signature is extracted. The method, previously tested with positive results on some datasets available to the research group, was compared against results obtained on omics databases available in the literature, which constitute a reference in the field, and against existing algorithms that perform similar tasks. To reduce computation time, the algorithm was implemented in C++ on HPC, with the most critical parts parallelized using the OpenMP libraries.
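
    The pair-scoring step described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis code (which is written in C++ with OpenMP): the function names are hypothetical, a diagonal-covariance Gaussian classifier is assumed as a stand-in for the Naive Bayes Quadratic Classifier, and training-set accuracy is used for brevity where a real benchmark would use cross-validation.

```python
from itertools import combinations

import numpy as np


def gaussian_nb_accuracy(X, y):
    """Training-set accuracy of a diagonal-covariance Gaussian classifier."""
    classes = np.unique(y)
    log_post = []
    for c in classes:
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-9  # small floor avoids division by zero
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
        log_post.append(log_lik + np.log(len(Xc) / len(X)))
    pred = classes[np.argmax(np.vstack(log_post), axis=0)]
    return float((pred == y).mean())


def rank_feature_pairs(X, y):
    """Score every pair of feature columns and return them best first."""
    scores = {
        (i, j): gaussian_nb_accuracy(X[:, [i, j]], y)
        for i, j in combinations(range(X.shape[1]), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (made up): features 0 and 1 separate the classes, 2 and 3 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 4))
X[:, 0] += 10 * y
X[:, 1] -= 10 * y
ranking = rank_feature_pairs(X, y)
```

    Scoring all pairs costs on the order of the square of the number of features, which is consistent with the abstract's emphasis on parallelizing the most critical parts.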

    COVID-19 Lung Segmentation

    The COVID-19 Lung Segmentation project provides a novel, unsupervised, and fully automated pipeline for the semantic segmentation of ground-glass opacity (GGO) areas in chest Computed Tomography (CT) scans of patients affected by COVID-19. The project provides a series of scripts and functions for the automated segmentation of 3D lung regions, the segmentation of GGO areas, and the estimation of radiomic features.

    A network approach for low dimensional signatures from high throughput data

    One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods can perform with high accuracy and easy interpretability. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best-performing feature pairs. The algorithm is easily scalable, allowing efficient computation for a high number of observables ([Formula: see text]-[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
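
    As a rough illustration of the network-based heuristic mentioned in this abstract, the sketch below links features whose pair score reaches a threshold and returns the connected components as candidate signatures. The function name, the feature names, and the scores are made up for illustration, and the further reduction of each component is not shown; this is a sketch under those assumptions, not the published DNetPRO implementation.

```python
from collections import defaultdict


def signature_candidates(pair_scores, threshold):
    """Group features linked by pairs scoring at or above the threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Largest components first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical pair scores (feature names and accuracies are made up).
scores = {
    ("g1", "g2"): 0.95,
    ("g2", "g3"): 0.91,
    ("g4", "g5"): 0.90,
    ("g1", "g6"): 0.60,
}
components = signature_candidates(scores, threshold=0.9)
print(components)  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

    Features that appear only in pairs below the threshold (here `g6`) never enter the network, so they cannot end up in any candidate signature.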

    3D Virtual Modeling for Morphological Characterization of Pituitary Tumors: Preliminary Results on Its Predictive Role in Tumor Resection Rate

    Among the potential factors affecting surgical resection of pituitary tumors, the role of tumor three-dimensional (3D) features is still unexplored. The aim of this study is to introduce the use of 3D virtual modeling for the geometrical and morphological characterization of pituitary tumors and to evaluate its role as a predictor of total tumor removal. A total of 75 patients operated on for a pituitary tumor were retrospectively reviewed. Starting from patient imaging, a 3D tumor model was reconstructed, and a 3D characterization based on tumor volume (Vol), area, sphericity (Spher), and convexity (Conv) was provided. The extent of tumor removal was then evaluated on post-operative imaging. Mean values were obtained for Vol (9117 ± 8423 mm³), area (2352 ± 1571 mm²), Spher (0.86 ± 0.08), and Conv (0.88 ± 0.08). Total tumor removal was achieved in 57 (75%) cases. The standard prognostic Knosp grade, Vol, and Conv were found to be independent factors significantly predicting the extent of tumor removal. Total tumor resection correlated with lower Knosp grades (p = 0.032) and smaller Vol (p = 0.015). Conversely, tumors with a more irregular shape (low Conv) have an increased chance of incomplete tumor removal (p = 0.022). 3D geometrical and morphological features represent significant independent prognostic factors for pituitary tumor resection, and they should be considered in pre-operative planning to allow a more accurate decision-making process.
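
    The abstract does not state how sphericity was computed. Assuming the common Wadell definition (the surface area of a sphere with the same volume as the object, divided by the object's actual surface area), a shape descriptor of this kind can be sketched in Python as follows; the function name is hypothetical.

```python
import math


def sphericity(volume, area):
    """Wadell sphericity: pi^(1/3) * (6 V)^(2/3) / A, equal to 1 for a sphere."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area


# Sanity checks on shapes with known geometry.
r = 1.0
sphere = sphericity(4 / 3 * math.pi * r**3, 4 * math.pi * r**2)
cube = sphericity(1.0, 6.0)  # unit cube, roughly 0.806
```

    Values below 1 indicate departure from a spherical shape, so under this assumed definition a more irregular surface lowers the score.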

    Impact of Freeze-Drying on Cord Blood (CB), Serum (S), and Platelet-Rich Plasma (CB-PRP) Preparations on Growth Factor Content and In Vitro Cell Wound Healing

    Blood-based preparations are used in clinical practice for the treatment of several eye disorders. The aim of this study is to analyze the effect of freeze-drying blood-based preparations on the levels of growth factors and wound healing behaviors in an in vitro model. Platelet-rich plasma (PRP) and serum (S) preparations from the same Cord Blood (CB) sample, prepared in both fresh frozen (FF) and freeze-dried (FD) forms (and then reconstituted), were analyzed for EGF and BDNF content (ELISA Quantikine kit). The human MIO-M1 glial cell line (Moorfield/Institute of Ophthalmology, London, UK) was incubated with FF and FD products and evaluated for cell migration with scratch-induced wounding (IncuCyte S3 Essen BioScience), proliferation with cyclin A2 and D1 gene expression, and activation with vimentin and GFAP gene expression. The FF and FD forms showed similar concentrations of EGF and BDNF in both the S and PRP preparations. The wound healing assay showed no significant difference between the FF and FD forms for both S and PRP. Additionally, cell migration, proliferation, and activation did not appear to change in the FD forms compared to the FF ones. Our study showed that reconstituted FD products maintained the growth factor concentrations and biological properties of FF products and could be used as a functional treatment option

    Characterization of Pupillary Light Response Features for the Classification of Patients with Optic Neuritis

    Pupillometry is a promising technique for the potential diagnosis of several neurological pathologies. However, its potential has not yet been fully explored, especially for prediction purposes and the interpretation of results. In this work, we analyzed 100 pupillometric curves obtained from 12 subjects, applying both advanced signal processing techniques and physics methods to extract typically collected features and newly proposed ones. We used machine learning techniques for the classification of Optic Neuritis (ON) vs. Healthy subjects, controlling for overfitting and ranking the features by random permutation according to their importance in prediction. All the extracted features, except one, turned out to have significant importance for prediction, with an average accuracy of 76%, showing the complexity of the processes involved in the pupillary light response. Furthermore, we provided a possible neurological interpretation of this new set of pupillometry features in relation to ON vs. Healthy classification.
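
    Ranking features by random permutation, as mentioned in this abstract, is commonly done by shuffling one feature column at a time and measuring the drop in accuracy. The sketch below illustrates that general idea on made-up data with a toy predictor; the function name and parameters are assumptions, and the paper's actual classifier and validation scheme are not reproduced here.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            drops[j] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats


# Toy data (made up): the label depends only on feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)


def predict(data):
    """Toy predictor that thresholds feature 0 and ignores the rest."""
    return (data[:, 0] > 0).astype(int)


importances = permutation_importance(predict, X, y)
```

    A feature the predictor never uses gets an importance of exactly zero, while shuffling the informative feature reduces accuracy toward chance level.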

    Intraspecies characterization of bacteria via evolutionary modeling of protein domains

    The ability to detect and characterize bacteria within a biological sample is crucial for the monitoring of infections and epidemics, as well as for the study of human health and its relationship with commensal microorganisms. A commonly used technique for this purpose is targeted sequencing of the 16S rRNA gene. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Then, representative OTU sequences are compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and use an ecological approach to model their Relative Species Abundance distribution. Based on the model parameters, we then derive a new measurement of phylogenetic distance. Finally, we show that this model-based distance is capable of detecting differences between bacteria in cases in which the 16S rRNA-based method fails, providing a possibly complementary approach that is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.

    Combinatorial Discriminant Analysis Applied to RNAseq Data Reveals a Set of 10 Transcripts as Signatures of Exposure of Cattle to Mycobacterium avium subsp. paratuberculosis

    Paratuberculosis, or Johne's disease, in cattle is a chronic granulomatous gastroenteritis caused by infection with Mycobacterium avium subspecies paratuberculosis (MAP). Paratuberculosis is not treatable; therefore, the early identification and isolation of infected animals is key to reducing its incidence. In this paper, we analyse RNAseq experimental data from 5 ELISA-negative cattle exposed to MAP in a positive herd, compared to 5 negative-unexposed controls. The purpose was to find a small set of differentially expressed genes able to discriminate exposed animals in a preclinical phase from non-exposed controls. Our results identified 10 transcripts that differentiate between ELISA-negative, clinically healthy, and exposed animals belonging to paratuberculosis-positive herds and negative-unexposed animals. Of the 10 transcripts, five (TRPV4, RIC8B, IL5RA, ERF, CDC40) showed significant differential expression between the three groups, while the remaining five (RDM1, EPHX1, STAU1, TLE1, ASB8) did not show a significant difference in at least one of the pairwise comparisons. When tested in a larger cohort, these findings may contribute to the development of a new diagnostic test for paratuberculosis based on a gene expression signature. Such a diagnostic tool could allow early interventions to reduce the risk of the infection spreading.

    Unraveling pedestrian mobility on a road network using ICTs data during great tourist events

    Tourist flows in historical cities are continuously growing in a globalized world, and adequate governance processes, policies, and tools are necessary in order to reduce impacts on urban livability and to guarantee the preservation of cultural heritage. ICTs make it possible to collect large amounts of data that can reveal and quantify statistical and dynamic properties of human mobility that emerge from individual behavior and refer to a whole road network. In this paper we analyze a new dataset collected by the Italian mobile phone company TIM, which contains the GPS positions of a relevant sample of mobile devices when they actively connected to the cell phone network. Our aim is to propose innovative tools for studying the properties of pedestrian mobility on the whole road network. Venice is a paradigmatic example of the impact of tourist flows on residents' quality of life and on the preservation of cultural heritage. The GPS data provide anonymized georeferenced information on the displacements of the devices. After a filtering procedure, we develop specific algorithms able to reconstruct the daily mobility paths on the whole Venice road network. The statistical analysis of the mobility paths suggests the existence of a travel time budget for mobility and points out the role of rest times in the empirical relation between the mobility time and the corresponding path length. We highlight two connected mobility subnetworks, extracted from the whole road network, that are able to explain the majority of the observed mobility. Our approach shows the existence of characteristic mobility paths in Venice for tourists and for residents. Moreover, the data analysis highlights the different mobility features of the considered case studies and makes it possible to detect the mobility paths associated with different points of interest. Finally, we have disaggregated the Italian and foreigner categories to study their different mobility behaviors.