57 research outputs found

    Parallel selective sampling method for imbalanced and large data classification

    We proposed a new algorithm to preprocess huge and imbalanced data. This algorithm, based on distance calculations, reduces both size and imbalance. The selective sampling method was conceived for parallel and distributed computing. It was combined with SVM, obtaining optimized classification performance. Synthetic and real data sets were used to evaluate the classifiers' performance. Several applications aim to identify rare events from very large data sets. Classification algorithms may present great limitations on large data sets and show a performance degradation due to class imbalance. Many solutions have been presented in the literature to deal separately with the problem of huge amounts of data or of class imbalance. In this paper we assessed the performance of a novel method, Parallel Selective Sampling (PSS), able to select data from the majority class to reduce imbalance in large data sets. PSS was combined with Support Vector Machine (SVM) classification. PSS-SVM showed excellent performance on synthetic data sets, much better than SVM. Moreover, we showed that on real data sets PSS-SVM classifiers performed slightly better than SVM and RUSBoost classifiers, with reduced processing times. In fact, the proposed strategy was conceived and designed for parallel and distributed computing. In conclusion, PSS-SVM is a valuable alternative to SVM and RUSBoost for the classification of huge and imbalanced data, due to its accurate statistical predictions and low computational complexity.

    The Immune Landscapes of Polypoid and Nonpolypoid Precancerous Colorectal Lesions.

    Little is known about the immunoediting process in precancerous lesions. We explored this aspect of benign colorectal adenomas with a descriptive analysis of the immune pathways and immune cells whose regulation is linked to the morphology and size of these lesions. Two series of polypoid and nonpolypoid colorectal adenomas were used in this study: 1) 84 samples (42 lesions, each with matched samples of normal mucosa) whose gene expression data were used to quantify the tumor morphology- and size-related dysregulation of immune pathways collected in the Molecular Signature Database, using Gene Set Enrichment Analysis; 2) 40 other lesions examined with immunohistochemistry to quantify the presence of immune cells in the stromal compartment. In the analysis of transcriptomic data, 429 immune pathways displayed significant differential regulation in neoplasms of different morphology and size. Most pathways were significantly upregulated or downregulated in polypoid lesions versus nonpolypoid lesions (regardless of size). Differential pathway regulation associated with lesion size was observed only in polypoid neoplasms. These findings were mirrored by tissue immunostaining with CD4, CD8, FOXP3, MHC-I, CD68, and CD163 antibodies: stromal immune cell counts (mainly T lymphocytes and macrophages) were significantly higher in polypoid lesions. Certain markers displayed significant size-related differences regardless of lesion morphology. Multivariate analysis of variance showed that the marker panel clearly discriminated between precancerous lesions of different morphologies and sizes. Statistical analysis of immunostained cell counts fully supports the results of the transcriptomic data analysis: the density of infiltration of most immune cells in the stroma of polypoid precancerous lesions was significantly higher than that observed in nonpolypoid lesions. Large neoplasms also have more immune cells in their stroma than small lesions.
    Immunoediting in precancerous colorectal tumors may vary with lesion morphology and stage of development, and this variability could influence a given lesion's trajectory to cancer.

    Regularized Least Squares Cancer Classifiers from DNA microarray data

    BACKGROUND: The advent of DNA microarray technology constitutes an epochal change in the classification and discovery of different types of cancer, because the information provided by DNA microarrays allows an approach to the problem of cancer analysis from a quantitative rather than qualitative point of view. Cancer classification requires well-founded mathematical methods that are able to predict the status of new specimens with high significance levels starting from a limited number of data. In this paper we assess the performance of Regularized Least Squares (RLS) classifiers, originally proposed in regularization theory, by comparing them with Support Vector Machines (SVM), the state-of-the-art supervised learning technique for cancer classification by DNA microarray data. The performance of both approaches has also been investigated with respect to the number of selected genes and different gene selection strategies. RESULTS: We show that RLS classifiers perform comparably to SVM classifiers, as shown by the Leave-One-Out (LOO) error evaluated on three different data sets. The main advantage of RLS machines is that, to solve a classification problem, they use a linear system of order equal to either the number of features or the number of training examples. Moreover, RLS machines make it possible to obtain an exact measure of the LOO error with a single training run. CONCLUSION: RLS classifiers are a valuable alternative to SVM classifiers for the problem of cancer classification by gene expression data, due to their simplicity and low computational complexity. Moreover, RLS classifiers show generalization ability comparable to that of SVM classifiers even when the classification of new specimens involves very few gene expression levels.
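The two properties highlighted in the abstract above (a linear system whose order is the number of features, and an exact LOO error from a single training run) can be illustrated with a minimal sketch. It assumes a linear RLS model without intercept and uses the standard closed-form leave-one-out identity for ridge-type estimators; the function name `rls_fit_loo` and the parameter `lam` are illustrative and not taken from the paper.

```python
import numpy as np

def rls_fit_loo(X, y, lam):
    """Linear regularized least squares with exact leave-one-out predictions.

    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}, lam: penalty > 0.
    Returns (w, loo_pred), where loo_pred[i] is the prediction for sample i
    made by the model trained on all samples except i.
    """
    n, d = X.shape
    # Primal form: one d x d linear system (order = number of features).
    A = X.T @ X + lam * np.eye(d)
    w = np.linalg.solve(A, X.T @ y)
    # Diagonal of the hat matrix H = X A^{-1} X^T.
    h = np.einsum("ij,ji->i", X, np.linalg.solve(A, X.T))
    f = X @ w
    # Closed-form identity: y_i - f_loo_i = (y_i - f_i) / (1 - H_ii).
    loo_pred = y - (y - f) / (1.0 - h)
    return w, loo_pred

def loo_error(y, loo_pred):
    """Fraction of samples misclassified by their leave-one-out prediction."""
    return float(np.mean(np.sign(loo_pred) != y))
```

When there are far fewer training examples than features, as is typical for microarray data, the equivalent dual form solves an n x n system instead; this sketch only shows the primal case.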

    Behavioral Pattern of Risso’s Dolphin (Grampus griseus) in the Gulf of Taranto (Northern Ionian Sea, Central-Eastern Mediterranean Sea)

    Relatively scant information is available on the Risso’s dolphin in comparison to the other species regularly present in the Mediterranean Sea. Recently, its conservation status in this sea has been updated to Endangered by the International Union for Conservation of Nature (IUCN). Therefore, the need to increase information on its biology and ecology is even more urgent. This study reports the first preliminary information on the behavioral traits of the species occurring in the Gulf of Taranto (Northern Ionian Sea). Data on predominant behavioral activity states and on a set of group composition variables (group formation, cruising speed, dive duration and interaction between individuals) were collected from April 2019 to September 2021, applying the focal-group protocol with instantaneous scan sampling. Group size, depth and group composition variables were compared between activity states. Results highlight that both group size and the other variables considered varied significantly depending on activity state. Group size was significantly smaller during feeding than during resting and traveling, and a characterization in terms of group formation, cruising speed, dive duration and interaction between animals is provided for the different activity states. Moreover, a list of the behavioral events that occurred, as well as their relative frequency of distribution among activity states, is reported. Finally, details on the sympatric occurrences of Risso’s and striped dolphins, as well as the repetitive interaction observed between adult individuals and plastic bags floating on the sea surface, are reported and discussed.

    Comparative study of gene set enrichment methods

    BACKGROUND: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited. RESULTS: The simulation study highlights that none of the three methods consistently outperforms the others. GSEA and RS are able to detect weak signals of deregulation, and they perform differently when genes in a gene set are both differentially up- and down-regulated. GLAPA is more conservative, and large differences between the two phenotypes are required to allow the method to detect differential deregulation in gene sets. This is because the enrichment statistic in GLAPA is prediction error, which is a stronger criterion than the classical two-sample statistic used in RS and GSEA. This was reflected in the analysis of real data sets, as GSEA and RS were seen to be significant for particular gene sets while GLAPA was not, suggesting a small effect size. We find that the ranking of gene set enrichment induced by GLAPA is more similar to that of RS than to that of GSEA. More importantly, the rankings of the three methods share significant overlap. CONCLUSION: The three methods considered in our study recover relevant gene sets known to be deregulated in the experimental conditions and pathologies analyzed. There are differences between the three methods, and GSEA seems to be more consistent in finding enriched gene sets, although no method uniformly dominates over all data sets. Our analysis highlights the deep difference existing between associative and predictive methods for detecting enrichment, and the value of using both to better interpret the results of pathway analysis. We close with suggestions for users of gene set methods.
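For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of Fisher's exact test applied to gene set over-representation, written as a one-sided hypergeometric tail probability. The function and argument names are illustrative assumptions, and the sketch presumes that a list of differentially expressed genes has already been chosen by some threshold; it is not the implementation used in the study.

```python
from math import comb

def fisher_enrichment_pvalue(n_genes, n_de, set_size, overlap):
    """One-sided Fisher's exact test for over-representation.

    n_genes:  total number of genes measured
    n_de:     number of differentially expressed (DE) genes
    set_size: number of genes in the gene set
    overlap:  number of gene-set members that are DE
    Returns P(X >= overlap) when set_size genes are drawn without
    replacement from n_genes genes, n_de of which are DE.
    """
    total = comb(n_genes, set_size)
    upper = min(n_de, set_size)
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Example: 10 genes, 4 of them DE, a gene set of 3 genes that are all DE.
p = fisher_enrichment_pvalue(10, 4, 3, 3)
```

Because the test only sees whether each gene passed the threshold, it discards the magnitude of differential expression, which is one commonly cited reason threshold-free methods such as GSEA can detect weaker signals.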

    Assessment of cetacean–fishery interactions in the marine food web of the Gulf of Taranto (Northern Ionian Sea, Central Mediterranean Sea)

    The exploitation of fishery resources acts as a driving force on cetaceans both directly, by determining their fishing mortality or injury as by-catch species, and indirectly, by lowering the availability of their prey. This competitive overlap between fishing and cetaceans often results in inadequate solutions, and in some cases cetaceans have been intentionally culled to maximize fishing production. A modelling approach applied to investigate the ecological roles of cetaceans in the food web could prove more effective to integrate ecological and fishing aspects and to provide suggestions for management. The comparative analysis carried out in the Gulf of Taranto (Northern Ionian Sea, Central Mediterranean Sea) showed that the impacts of fishing exploitation on the investigated food web are greater than those due to cetacean predation. Trawling was estimated to be the most negatively impacting fishing gear considering the mortality rates and consumption flows. On the other hand, the striped dolphin had the main impact on the food web due to its highest consumption flows. Analysis showed a negative and non-selective impact on the exploited species due to the fishing gears, while the odontocetes proved to select their prey species and provide a positive impact in the assemblage. In particular, while the fishing gears are primarily size selective, targeting mostly large and economically valuable fish, the odontocetes seem to follow a co-evolution process with their prey, developing a specialization in their resources, providing control of the meso-consumers and ensuring trophic stability in the ecosystem.

    Glycopatterns of the foregut in the striped dolphin Stenella coeruleoalba, Meyen 1833 from the Mediterranean Sea

    The glycopatterns of the glycans secreted by the mucosa of stomach and duodenal ampulla of the striped dolphin, Stenella coeruleoalba, were studied by histochemical (Periodic acid‐Schiff, Alcian Blue pH 2.5, High Iron Diamine) and lectin‐binding (SBA, DBA, PNA, WGA, MAA‐II, SNA, ConA, UEA‐I, AAA, LTA) techniques. The stomach can be divided into four compartments: main stomach, two connecting chambers and pylorus. The pylorus is followed by the duodenal ampulla. Mucins are secreted by surface cells and intramucosal glands specific for each compartment. In the main stomach glands, neck cells were weakly sulphated, with prevailing glycosaminylated, glycosylated/mannosylated, and fucosylated residuals. Parietal and chief cells in general were scarcely reactive. In the connecting chambers glands, there were high levels of sulphation, glycosaminylation, glycosylation/mannosylation, and fucosylation, the latter with more complex patterns than those observed in the main stomach glands. In the pyloric glands sulphated, glycosaminylated and fucosylated residuals decreased, whereas the opposite was observed for galactosyl/galactosaminylated residuals. Glycosylation patterns in the glands of the duodenal ampulla differed from those of the pyloric ones, with similar levels of sulphation, lower levels of galactosyl/galactosaminylation and glycosaminylation, and higher levels of fucosylation. The results are compared with those available in the literature.

    Promoter methylation correlates with reduced NDRG2 expression in advanced colon tumour

    BACKGROUND: Aberrant DNA methylation of CpG islands of cancer-related genes is among the earliest and most frequent alterations in cancerogenesis and might be of value for either diagnosing cancer or evaluating recurrent disease. This mechanism usually leads to inactivation of tumour-suppressor genes. We have designed the current study to validate our previous microarray data and to identify novel hypermethylated gene promoters. METHODS: The validation assay was performed in a different set of 8 patients with colorectal cancer (CRC) by means of quantitative reverse-transcriptase polymerase chain reaction analysis. The differential RNA expression profiles of three CRC cell lines before and after 5-aza-2'-deoxycytidine treatment were compared to identify the hypermethylated genes. The DNA methylation status of these genes was evaluated by means of bisulphite genomic sequencing and methylation-specific polymerase chain reaction (MSP) in the 3 cell lines and in tumour tissues from 30 patients with CRC. RESULTS: Data from our previous genome search were confirmed in the new set of 8 patients with CRC. In this validation set six genes showed a high induction after drug treatment in at least two of three CRC cell lines. Among them, the N-myc downstream-regulated gene 2 (NDRG2) promoter was found methylated in all CRC cell lines. NDRG2 hypermethylation was also detected in 8 out of 30 (27%) primary CRC tissues and was significantly associated with advanced AJCC stage IV. Normal colon tissues were not methylated. CONCLUSION: The findings highlight the usefulness of combining gene expression patterns and epigenetic data to identify tumour biomarkers, and suggest that NDRG2 silencing might bear influence on tumour invasiveness, being associated with a more advanced stage.

    Automated hippocampal segmentation in 3D MRI using random undersampling with boosting algorithm

    The automated identification of brain structure in Magnetic Resonance Imaging is very important both in neuroscience research and as a possible clinical diagnostic tool. In this study, a novel strategy for fully automated hippocampal segmentation in MRI is presented. It is based on a supervised algorithm, called RUSBoost, which combines random undersampling of the data with a boosting algorithm. RUSBoost is an algorithm specifically designed for imbalanced classification, suitable for large data sets because it uses random undersampling of the majority class. The performance of RUSBoost was compared with that of AdaBoost, Random Forest and the publicly available brain segmentation package, FreeSurfer. This study was conducted on a data set of 50 T1-weighted structural brain images. The RUSBoost-based segmentation tool achieved the best results with a Dice’s index of (Formula presented.) (Formula presented.) for the left (right) brain hemisphere. An independent data set of 50 T1-weighted structural brain scans was used for an independent validation of the fully trained strategies. Again the RUSBoost segmentations compared favorably with manual segmentations, with the highest performance among the four tools. Moreover, the Pearson correlation coefficient between hippocampal volumes computed by manual and RUSBoost segmentations was 0.83 (0.82) for the left (right) side, statistically significant, and higher than those computed by AdaBoost, Random Forest and FreeSurfer. The proposed method may be suitable for accurate, robust and statistically significant segmentations of hippocampi.
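Two ingredients mentioned in the abstract above, random undersampling of the majority class and Dice's index, can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the 1:1 class balance after undersampling is an assumption (the ratio is a tunable choice), and the boosting loop in which RUSBoost repeats the undersampling at each round is omitted.

```python
import random

def random_undersample(labels, majority_label, seed=0):
    """Return sorted indices that keep every minority sample and an
    equal-sized random subset of the majority class (assumed 1:1 ratio)."""
    rng = random.Random(seed)
    majority = [i for i, lab in enumerate(labels) if lab == majority_label]
    minority = [i for i, lab in enumerate(labels) if lab != majority_label]
    kept = rng.sample(majority, min(len(minority), len(majority)))
    return sorted(minority + kept)

def dice_index(seg_a, seg_b):
    """Dice's index 2|A ∩ B| / (|A| + |B|) for two collections of voxel
    coordinates, e.g. an automated and a manual segmentation."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty segmentations agree
    return 2 * len(a & b) / (len(a) + len(b))

# Toy usage: 8 background voxels (label 0) and 2 hippocampus voxels (label 1).
labels = [0] * 8 + [1] * 2
balanced = random_undersample(labels, majority_label=0)
overlap = dice_index({(0, 0, 0), (0, 0, 1)}, {(0, 0, 1), (0, 1, 1)})
```

A Dice's index of 1 means the two segmentations coincide exactly, and 0 means they share no voxels.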