
26 research outputs found

Machine Learning Approaches Identify Genes Containing Spatial Information From Single-Cell Transcriptomics Data.

Author: Karathanasis Nestoras
Loher Phillipe
Publication venue: Jefferson Digital Commons
Publication date: 01/02/2021

The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profiles of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim of addressing two problems: (a) identifying the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) reconstructing the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, based on least absolute shrinkage and selection operator (Lasso) models and deep neural networks (NNs), which are applied to high-dimensional single-cell sequencing data in order to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, combines the Lasso with ranking statistics and lets the user specify exactly how many features to select. The NN approach uses weak supervision for linear regression to accommodate uncertain or probabilistic training labels. We show, individually for both techniques, that we can identify a stable, user-defined number of important genes that contain the most spatial information. Both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how the indirect use of information from the full dataset can lead to data leakage and cause the model's performance to be overestimated.
Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/
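
A minimal sketch of the Lasso.TopX idea, fitting a Lasso and then ranking genes so that the user can request exactly X of them, is shown below in plain Python. It is not the authors' implementation (that is in the repository linked above). The ranking statistic used here (bootstrap selection frequency, ties broken by mean absolute coefficient), the names `lasso_cd` and `top_x_features`, the penalty `alpha`, and the toy target standing in for one spatial coordinate are all assumptions made for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def centre(X, y):
    """Return column-centred copies of X (list of rows) and of y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]


def lasso_cd(X, y, alpha, n_iter=100):
    """Lasso by cyclic coordinate descent on centred data (no intercept).

    Minimises (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:  # constant feature: never selected
                continue
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def top_x_features(X, y, x, alpha=0.05, n_boot=20, seed=0):
    """Hypothetical Top-X ranking: bootstrap the Lasso, rank features by
    selection frequency (ties broken by mean |coefficient|), keep the first x."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    freq, mag = [0] * p, [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xc, yc = centre([X[i] for i in idx], [y[i] for i in idx])
        for j, coef in enumerate(lasso_cd(Xc, yc, alpha)):
            freq[j] += int(coef != 0.0)
            mag[j] += abs(coef) / n_boot
    ranking = sorted(range(p), key=lambda j: (-freq[j], -mag[j]))
    return ranking[:x]


# Toy data: the target depends on features 0 and 2 only; 1, 3 and 4 are noise.
data_rng = random.Random(1)
X_demo = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y_demo = [3 * r[0] - 2 * r[2] + data_rng.gauss(0, 0.1) for r in X_demo]
selected = top_x_features(X_demo, y_demo, x=2)
```

Because the ranking is computed from the data it is given, it should be nested inside any cross-validation loop when estimating performance; selecting genes on the full dataset first is the kind of indirect information use that the abstract warns can cause data leakage.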

Jefferson Digital Commons

Reproducibility Efforts as a Teaching Tool: A Pilot Study

Author: Abhimannyu Rimal
Buchel Gina
Frisbie Victoria
Heng Vibol
Hwang Daniel
Karathanasis Nestoras
Kryoneriti Dafni
Li Peiyao
Rigoutsos Isidore
Slogoff-Sevilla Phillip
Publication venue: Jefferson Digital Commons
Publication date: 01/11/2022

The replication crisis is a methodological problem in which many scientific research findings have proven difficult or impossible to replicate. Because the reproducibility of empirical results is an essential aspect of the scientific method, such failures endanger the credibility of theories built on them and possibly of significant portions of scientific knowledge. One instance of the replication crisis, analytic replication, concerns reproducing published results through computational reanalysis of the authors' original data. However, direct replications are costly, time-consuming, and unrewarded under today's publishing standards. We propose that bioinformatics and computational biology students replicate recent discoveries as part of their curriculum. With this in mind, we performed a pilot study in one of the graduate-level courses we developed and taught at our University. The course, entitled Intro to R Programming, is meant for students in our Master's and PhD programs who have little or no programming skills. Because the course emphasized real-world data analysis, we considered it an appropriate setting for this study. The primary objective was to expose the students to real biological data analysis problems. These include locating and downloading the needed datasets, understanding the underlying conventions and annotations, understanding the analytical methods, and regenerating multiple graphs from their assigned article. The secondary goal was to determine whether the assigned articles contained sufficient information for a graduate-level student to replicate their figures. Overall, the students successfully reproduced 39% of the figures. The main obstacles were the need for more advanced programming skills and incomplete documentation of the applied methods. Students were engaged, enthusiastic, and focused throughout the semester.
We believe that this teaching approach will allow students to make fundamental scientific contributions under appropriate supervision. It will also teach them about the scientific process, the importance of reporting standards, and the importance of openness.

Directory of Open Access Journals

PubMed Central

Jefferson Digital Commons

Non-parametric combination analysis of multiple data types enables detection of novel regulatory mechanisms in T cells of multiple sclerosis patients

Author: Ewing Ewoud
Fernandes Sunjay Jude
Gomez-Cabrero David
Jagodic Maja
Joshi Rubin Narayan
Karathanasis Nestoras
Khademi Mohsen
Kockum Ingrid
Lagani Vincenzo
Morikawa Hiromasa
Olsson Tomas
Piehl Fredrik
Planell Nuria
Ruhrmann Sabrina
Schmidt Angelika
Tegner Jesper
Tsamardinos Ioannis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Multiple sclerosis (MS) is an autoimmune disease of the central nervous system with prominent neurodegenerative components. The triggering and progression of MS are associated with transcriptional and epigenetic alterations in several tissues, including peripheral blood. However, the combined influence of the transcriptional and epigenetic changes associated with MS has not been assessed in the same individuals. Here we generated paired transcriptomic (RNA-seq) and DNA methylation (Illumina 450K array) profiles of CD4+ and CD8+ T cells (CD4, CD8), using clinically accessible blood from healthy donors and from MS patients in the initial relapsing-remitting and the subsequent secondary-progressive stage. By integrating the output of a differential expression test with a permutation-based non-parametric combination methodology, we identified 149 differentially expressed (DE) genes in both CD4 and CD8 cells collected from MS patients. Moreover, by leveraging the methylation-dependent regulation of gene expression, we identified the gene SH3YL1, which displayed significant and correlated expression and methylation changes in MS patients. Importantly, silencing SH3YL1 in primary human CD4 cells demonstrated its influence on T cell activation. Collectively, our strategy, based on paired sampling of several cell types, provides a novel approach for increasing the sensitivity to identify MS-relevant mechanisms shared by CD4 and CD8 cells, even in small clinical sample sets.
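
The permutation-based non-parametric combination (NPC) step can be sketched as follows for a single gene measured in CD4 and CD8 cells of the same subjects. This is a hypothetical illustration, not the study's pipeline: the partial statistic (absolute difference in group means), Fisher's combining function, the function names and the toy values are assumptions. The key property it shows is that one relabelling of subjects is applied to both cell types at once, so the dependence between the paired measurements is preserved in the null distribution.

```python
import bisect
import math
import random


def mean_diff(values, labels):
    """Partial test statistic: absolute difference of group means (labels are 0/1)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def npc_fisher(data_types, labels, n_perm=999, seed=0):
    """Permutation-based NPC of K partial tests with Fisher's combining function.

    data_types: K lists with one value per subject, in the same subject order
                (e.g. one gene's expression in CD4 cells and in CD8 cells).
    labels:     0/1 group label per subject (e.g. 0 = healthy donor, 1 = MS).
    Returns (observed partial p-values, combined p-value).
    """
    rng = random.Random(seed)
    perms = [list(labels)]  # index 0 holds the observed labelling
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        perms.append(shuffled)  # the same relabelling is applied to every data type
    n_total = len(perms)

    def perm_pvalues(stats):
        # p_b = fraction of permutations whose statistic is >= stats[b]; never 0.
        ordered = sorted(stats)
        return [(n_total - bisect.bisect_left(ordered, t)) / n_total for t in stats]

    partial = [perm_pvalues([mean_diff(v, lab) for lab in perms]) for v in data_types]
    # Fisher combination evaluated for the observed data and for every permutation.
    combined = [-2.0 * sum(math.log(p[b]) for p in partial) for b in range(n_total)]
    global_p = sum(c >= combined[0] for c in combined) / n_total
    return [p[0] for p in partial], global_p


# Toy paired data for one gene (values are made up for illustration).
labels = [0] * 6 + [1] * 6
cd4 = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 2.0, 2.1, 1.9, 2.2, 2.0, 1.8]
cd8 = [0.5, 0.7, 0.6, 0.4, 0.5, 0.6, 1.4, 1.5, 1.3, 1.6, 1.2, 1.4]
partial_p, global_p = npc_fisher([cd4, cd8], labels)
```

Both the partial and the combined p-values are read off the same set of permutations, which is what lets NPC combine dependent tests without modelling their correlation explicitly.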

Open Access LMU

Jefferson Digital Commons

STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline

Author: Arozarena Imanol
Conesa Ana
Ewing Ewoud
Gomez-Cabrero David
Jagodic Maja
Karathanasis Nestoras
Lagani Vincenzo
Planell Nuria
Sebastian-Leon Patricia
Tarazona Sonia
Tegner Jesper
Tsamardinos Ioannis
Urdangarin Arantxa
van der Kloet Frans
Publication venue: Jefferson Digital Commons
Publication date: 04/03/2021

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch, with an arbitrary decision about which tools to use and how to combine them. There is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, that combines available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and multi-omics exploratory analysis) in a step-wise manner. Although we have combined these integrative tools in several previous studies, here we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both the glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate that the framework has an enhanced capacity (beyond that of the individual tools) to identify features and pathways compared with single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as open source in the STATegRa Bioconductor package.

Jefferson Digital Commons

FAIR+E pathogen data for surveillance and research: lessons from COVID-19

Author: Aitana Neves
David Salgado
Erik Hjerde
Isabel Cuesta
Jacques van Helden
Nadim Rahman
Nazeefa Fatima
Nestoras Karathanasis
Niklas Blomberg
Pawel Zmora
Sushma Nagaraja Grellscheid
Terje Klemetsen
Wolmar Nyberg Åkerström
Zahra Waheed
Publication venue: Frontiers Media S.A.
Publication date: 01/11/2023

The COVID-19 pandemic has exemplified the importance of interoperable and equitable data sharing for global surveillance and for supporting research. While many challenges were overcome, at least in some countries, many hurdles in the organizational, scientific, technical and cultural realms remain to be tackled in order to prepare for future threats. We propose to (i) continue supporting global efforts that have proven efficient and trustworthy in addressing challenges in pathogen molecular data sharing; (ii) establish a distributed network of Pathogen Data Platforms to (a) ensure high-quality data, metadata standardization and data analysis, (b) perform data brokering on behalf of data providers for both research and surveillance, and (c) foster capacity building and continuous improvement, including for pandemic preparedness; and (iii) establish an International One Health Pathogens Portal, connecting pathogen data isolated from various sources (human, animal, food, environment) in a truly One Health approach and following FAIR principles. To address these challenging endeavors, we have started an ELIXIR Focus Group, and we invite all interested experts to join in a concerted, expert-driven effort toward sustaining and ensuring high-quality data for global surveillance and research.

Directory of Open Access Journals

Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data.

Author: Ahsen Mehmet Eren
Banda Peter
Bhatti Gaurav
Chen Yang
Consortium DREAM SCTC
Gabor Attila
Glaab Enrico
Guinney Justin
Hafemeister Christoph
Hu Ying
Karaiskos Nikos
Karathanasis Nestoras
Krause Roland
Le Thuc D
Liang Xiaoyu
Loher Phillipe
Mao Disheng
Meyer Pablo
Nguyen Thin
Nguyen Tin
Ouyang Zhengqing
Pham Hoang Vv
Qiu Peng
Rajewsky Nikolaus
Romero Roberto
Ruan Jianhua
Saez-Rodriguez Julio
Shu Chang
Stolovitzky Gustavo
Tanevski Jovan
Tarca Adi L
Tran Duc
Truong Buu
Xiaomei Li
Xu Ke
Yu Thomas
Zand Maryam
Zhang Xinyu
Zhang Yuping
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/01/2020

Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, standard scRNAseq experiments lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq data to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge, focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, using as a silver standard the genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction and were able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded similar performance, and the selected genes had statistical properties similar to those in the Drosophila data. This suggests that the methods developed in this challenge can extract generalizable properties of genes that are useful for accurately reconstructing the spatial arrangement of cells in tissues.
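
The abstract notes that predictor genes had relatively high expression entropy. The exact entropy definition used in the challenge is not given here, so the sketch below assumes one common choice: discretise each gene's expression across cells into equal-width bins and take the Shannon entropy of the bin frequencies. The function names, the bin count and the toy genes are hypothetical.

```python
import math
from collections import Counter


def expression_entropy(values, n_bins=4):
    """Shannon entropy (bits) of one gene's expression across cells.

    Expression is discretised into n_bins equal-width bins (an assumption of
    this sketch); a gene expressed at a constant level has entropy 0.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    # The maximum value would fall in bin n_bins, so clamp it into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def rank_by_entropy(expr, n_bins=4):
    """expr maps gene name -> expression values across cells; highest entropy first."""
    return sorted(expr, key=lambda g: expression_entropy(expr[g], n_bins), reverse=True)


# Toy expression values for eight cells (made up for illustration).
expr = {
    "uniform_gene": [5.0] * 8,
    "on_off_gene": [0, 0, 0, 0, 4, 4, 4, 4],
    "graded_gene": [0, 1, 2, 3, 4, 5, 6, 7],
}
ranking = rank_by_entropy(expr)
```

A gene with constant expression scores 0 and cannot help separate cells, whereas genes whose expression spreads over several levels score higher. Entropy alone does not capture where in the tissue those levels occur, so it complements rather than replaces a spatial-clustering measure.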

The Jackson Laboratory: The Mouseion at the JAXlibrary

MDC Repository

Open Repository and Bibliography - Luxembourg

Jefferson Digital Commons

Study of cancer-related microRNA-mRNA interactions (Μελέτη microRNA-mRNA αλληλεπιδράσεων σχετιζόμενων με καρκίνο)

Author: Karathanasis Nestoras
Καραθανάσης Νέστορας
Publication venue: National Documentation Centre (EKT)
Publication date: 01/01/2013

MicroRNAs (miRNAs) belong to the large family of small non-coding RNAs. They regulate protein synthesis by binding to the 3′ untranslated region (3′ UTR) of their target mRNAs, causing translational repression and/or mRNA degradation. A large number of miRNAs have been associated with cancer because they are often located within cancer-associated genomic regions (CAGRs/FRA), because they target cancer-related genes, and because they are differentially expressed in tumor compared to normal tissues. Previous work in Dr. Poirazi's Computational Biology lab had identified four new putative miRNA genes (precursors) located within CAGRs. However, their mature molecules and their association with cancer phenotypes were unknown. My thesis focuses on resolving these two issues, using a combination of theoretical and experimental techniques.
The specific aims of this work are: (a) the development of a computational tool, a mature miRNA prediction algorithm, for predicting the mature molecules of miRNAs (Chapters II and III); (b) the identification of the mature miRNA molecules produced by the four newly identified miRNA precursors via a combination of computational and experimental methods (Chapter IV); and (c) the use of a target prediction algorithm to computationally predict and experimentally verify interactions between the mature molecules and cancer-related genes (Chapter IV).
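
The thesis aims involve predicting interactions between mature miRNAs and cancer-related genes. The abstract does not specify the target prediction algorithm, so the sketch below shows only the generic canonical seed-match rule that many target predictors build on: a site in the 3′ UTR that is the reverse complement of miRNA nucleotides 2-8 (7mer-m8), optionally followed by an A opposite nucleotide 1 (8mer). The function names and the UTR fragment are made up; the let-7a sequence serves only as an example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Find canonical seed-match sites for a miRNA in a 3' UTR.

    7mer-m8: the UTR contains the reverse complement of miRNA nucleotides 2-8.
    8mer:    a 7mer-m8 site followed (3') by an A opposite miRNA nucleotide 1.
    Returns (0-based start of the site in the UTR, site type) pairs.
    DNA input (T instead of U) is accepted.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mirna[1:8])  # nucleotides 2-8 in 1-based numbering
    hits = []
    start = utr.find(site)
    while start != -1:
        end = start + len(site)
        kind = "8mer" if utr[end:end + 1] == "A" else "7mer-m8"
        hits.append((start, kind))
        start = utr.find(site, start + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a mature sequence, used as an example
toy_utr = "AAGCUACCUCAGGGCUACCUCUU"  # made-up UTR fragment with two sites
sites = seed_sites(let7a, toy_utr)
```

Seed matching alone over-predicts targets, which is why predictors typically add further evidence (for example conservation or site context) and why experimental verification, as in aim (c), remains necessary.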

Hellenic National Archive of Doctoral Dissertations

MiRduplexSVM: A High-Performing MiRNA-Duplex Prediction and Evaluation Methodology

Author: Ioannis Tsamardinos (244260)
Nestoras Karathanasis (737388)
Panayiota Poirazi (21391)
Publication date: 11/05/2015

We address the problem of predicting the position of a miRNA duplex on a microRNA hairpin via the development and application of a novel SVM-based methodology. Our method combines a unique problem representation and an unbiased optimization protocol to learn an accurate predictive model, termed MiRduplexSVM, from miRBase 19.0. This is the first model that provides precise information about all four ends of the miRNA duplex. We show that (a) our method outperforms four state-of-the-art tools, namely MaturePred, MiRPara, MatureBayes and MiRdup, as well as a Simple Geometric Locator, when applied to the same training datasets employed for each tool and evaluated on a common blind test set; (b) in all comparisons, MiRduplexSVM shows superior performance, achieving up to a 60% increase in prediction accuracy for mammalian hairpins, and it generalizes very well to plant hairpins without any special optimization; and (c) the tool has a number of important applications, such as the ability to accurately predict the miRNA or the miRNA* given the opposite strand of a duplex. Its performance on this task is superior to that of the 2-nt overhang rule commonly used in computational studies and similar to that of a comparative genomic approach, without the need for prior knowledge or the complexity of performing multiple alignments. Finally, it is able to evaluate novel, potential miRNAs found either computationally or experimentally. With respect to recent confidence evaluation methods used in miRBase, MiRduplexSVM successfully identified high-confidence potential miRNAs.
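
The 2-nt overhang rule that MiRduplexSVM is compared against can be sketched as follows: given one strand of the duplex and the hairpin's secondary structure in dot-bracket notation, it places the opposite strand so that both strands end with 2-nt 3′ overhangs. This is a simplified, hypothetical helper, not part of MiRduplexSVM; the function names, the 0-based inclusive coordinates and the toy hairpin (with unrealistically short strands) are assumptions.

```python
def pair_table(dot_bracket):
    """Map each hairpin position to its base-pairing partner (None if unpaired)."""
    partner = [None] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return partner


def star_by_two_nt_rule(dot_bracket, start, end):
    """Predict the opposite strand of a duplex with the 2-nt 3' overhang rule.

    start, end: 0-based inclusive coordinates of the known strand.
    The opposite strand's 5' end pairs with position end - 2, and its 3' end
    lies 2 nt beyond the partner of the known strand's 5' end.
    """
    partner = pair_table(dot_bracket)
    p_start, p_end = partner[start], partner[end - 2]
    if p_start is None or p_end is None:
        raise ValueError("anchor positions are unpaired; the rule does not apply")
    star_start, star_end = p_end, p_start + 2
    if star_end >= len(dot_bracket):
        raise ValueError("predicted strand extends past the end of the hairpin")
    return star_start, star_end


# A toy, perfectly paired hairpin: 10 bp stem and a 4-nt loop.
hairpin = "((((((((((....))))))))))"
five_prime_arm = star_by_two_nt_rule(hairpin, 2, 7)     # known strand on the 5' arm
three_prime_arm = star_by_two_nt_rule(hairpin, 16, 21)  # known strand on the 3' arm
```

Because the rule relies only on the pairing partners of two anchor positions, bulges or mismatches near the duplex ends shift its prediction directly, which is one plausible reason a model that learns from the whole hairpin representation can do better.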

Directory of Open Access Journals

FigShare

omicsNPC: Applying the Non-Parametric Combination Methodology to the Integrative Analysis of Heterogeneous Omics Data

Author: Ioannis Tsamardinos (244260)
Nestoras Karathanasis (737388)
Vincenzo Lagani (3331419)
Publication date: 03/11/2016

Background: The advance of omics technologies has made it possible to measure several data modalities on a system of interest. In this work, we illustrate how the Non-Parametric Combination (NPC) methodology can be used to simultaneously assess the association of different molecular quantities with an outcome of interest. We argue that NPC methods have several potential applications in integrating heterogeneous omics technologies, for example identifying genes whose methylation and transcriptional levels are jointly deregulated, or finding proteins whose abundance follows the same trends as the expression of their encoding genes.
Results: We implemented the NPC methodology within “omicsNPC”, an R function specifically tailored to the characteristics of omics data. We compare omicsNPC against a range of alternative methods on simulated as well as real data. Comparisons on simulated data show that omicsNPC produces unbiased/calibrated p-values and performs equally well as or significantly better than the other methods included in the study; furthermore, the analysis of real data shows that omicsNPC (a) exhibits higher statistical power than other methods, (b) is easily applicable in a number of different scenarios, and (c) yields results with improved biological interpretability.
Conclusions: The omicsNPC function performs competitively in all comparisons conducted in this study. Taking into account that the method (i) requires minimal assumptions, (ii) can be used with different study designs, and (iii) captures the dependencies among heterogeneous data modalities, omicsNPC provides a flexible and statistically powerful solution for the integrative analysis of different omics data.
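
NPC needs a combining function that merges the partial p-values of each data modality into one statistic. The sketch below implements three functions commonly used with NPC (Fisher, Liptak and Tippett), each oriented so that larger values mean stronger evidence against the null. The names, the clamping constant and the toy p-values are assumptions; in NPC the combined statistic is then referred to its permutation distribution rather than to a parametric one.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher: -2 * sum(log p); rewards several moderately small p-values."""
    return -2.0 * sum(math.log(p) for p in pvals)


def liptak(pvals, eps=1e-12):
    """Liptak: sum of Phi^-1(1 - p), equivalent up to scaling to Stouffer's method.

    p is clamped to [eps, 1 - eps] (an assumption of this sketch) because
    permutation p-values can equal 1, where the inverse normal is infinite.
    """
    nd = NormalDist()
    return sum(nd.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in pvals)


def tippett(pvals):
    """Tippett: 1 - min(p); driven entirely by the strongest single signal."""
    return 1.0 - min(pvals)


# Two made-up patterns of partial p-values for the same gene
# (e.g. methylation and expression): a consistent moderate signal
# versus one very strong signal paired with a null one.
consistent = [0.04, 0.04]
one_strong = [0.001, 0.9]
scores = {
    name: (f(consistent), f(one_strong))
    for name, f in [("Fisher", fisher), ("Liptak", liptak), ("Tippett", tippett)]
}
```

With these toy numbers, Liptak favours the consistent moderate pattern while Fisher and Tippett favour the single strong signal, so the choice of combining function should match whether a gene is expected to be deregulated in all modalities or in any one of them.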

Directory of Open Access Journals

PubMed Central

FigShare

Identification of high confidence miRNAs.

Author: Ioannis Tsamardinos (244260)
Nestoras Karathanasis (737388)
Panayiota Poirazi (21391)

As shown in the figure, MiRduplexSVM assigned higher scores to 554 high-confidence miRNAs (blue bars, median = 0.53 and mean = 0.44) than to 554 randomly selected miRNAs (red bars, median = -0.24 and mean = -0.14), and the observed differences are statistically significant (rank-sum test: p = 8.3084e-47; t-test: p = 3.9577e-50). The x axis shows MiRduplexSVM's scores and the y axis shows the percentage of hairpins assigned the respective scores.
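
The comparison in the figure relies on a rank-sum test. A minimal pure-Python version of the two-sided Wilcoxon rank-sum (Mann-Whitney U) test with the normal approximation is sketched below; the function name and the toy scores are hypothetical and do not reproduce the study's data. With 554 scores per group the normal approximation is appropriate; for very small samples an exact test would be preferable.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Tied values receive average ranks and the variance uses the standard tie
    correction. Assumes the pooled sample is not entirely tied.
    Returns (U statistic of x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_x - mean_u) / sd_u
    # erfc keeps precision for very small p-values (e.g. the 1e-47 range).
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Made-up scores for illustration only.
high_conf_scores = [0.9, 0.7, 0.6, 0.8, 0.5]
random_scores = [-0.3, -0.1, 0.0, -0.5, 0.2]
u, p = rank_sum_test(high_conf_scores, random_scores)
```

Computing the tail with `math.erfc` rather than `1 - cdf` avoids the cancellation that would otherwise round p-values as small as those reported in the caption down to zero.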

FigShare