24 research outputs found

A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing

Author: Aevermann B.
Bakken T.
Hodge R.
Keshk M.
Lein E.
Lelieveldt B.
Miller J.
Novotny M.
Scheuermann R.H.
Zhang Y.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/10/2021

Single-cell genomics is rapidly advancing our knowledge of the diversity of cell phenotypes, including both cell types and cell states. Driven by single-cell/-nucleus RNA sequencing (scRNA-seq), comprehensive cell atlas projects characterizing a wide range of organisms and tissues are currently underway. As a result, it is critical that the transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by their molecular drivers. Here, we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the nonlinear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that optimally capture the cell type identity represented in complete scRNA-seq transcriptional profiles. The marker genes selected provide an expression barcode that serves as both a useful tool for downstream biological investigation and the necessary and sufficient characteristics for semantic cell type definition. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and noncoding RNAs in neuronal cell type identity.
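
The abstract mentions a "binary expression scoring approach" without giving a formula. The sketch below is one plausible reading, not necessarily the scoring used by NS-Forest: it assumes a gene scores highly when its median expression is high in the target cluster and close to zero in every other cluster. The function name, the cluster labels, and the expression values are all hypothetical.

```python
from statistics import median


def binary_expression_score(expr_by_cluster, target):
    """Illustrative on/off score for one gene (an assumption, not the published formula).

    expr_by_cluster maps a cluster label to that gene's expression values
    in the cells of the cluster. The score is the mean, over non-target
    clusters, of max(0, 1 - median_other / median_target).
    """
    target_median = median(expr_by_cluster[target])
    others = [c for c in expr_by_cluster if c != target]
    # A gene that is silent in the target cluster cannot mark it.
    if target_median <= 0 or not others:
        return 0.0
    total = 0.0
    for cluster in others:
        ratio = median(expr_by_cluster[cluster]) / target_median
        total += max(0.0, 1.0 - ratio)
    return total / len(others)


# Hypothetical expression values for a single gene in three clusters.
gene_expr = {
    "A": [10, 12, 8],  # median 10
    "B": [0, 0, 1],    # median 0
    "C": [5, 5, 5],    # median 5
}
score_a = binary_expression_score(gene_expr, "A")
score_b = binary_expression_score(gene_expr, "B")
```

Under this assumed definition, a score of 1.0 would mean the gene is expressed only in the target cluster, and a score of 0.0 would mean it is at least as highly expressed everywhere else.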

Leiden University Scholary Publications

Influenza Research Database: An integrated bioinformatics resource for influenza virus research

Author: Aevermann BD
Anderson TK
Burke DF
Dauphin G
Gu Z
He S
Klem EB
Kumar S
Larsen CN
Lee AJ
Li X
Macken C
Mahaffey C
Pickett BE
Reardon B
Scheuermann RH
Smith T
Stewart L
Suloway C
Sun G
Tong L
Vincent AL
Walters B
Zaremba S
Zhang Y
Zhao H
Zhou L
Zmasek C
Publication venue: Nucleic Acids Research
Publication date: 04/01/2017

The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. Funding: National Institutes of Health/National Institute for Allergy and Infectious Diseases [HHSN272201400028C]. Funding for open access charge: J. Craig Venter Institute.

Apollo (Cambridge)

Global Transcriptome Profiling of the Pine Shoot Beetle, Tomicus yunnanensis (Coleoptera: Scolytinae)

Author: A Horn
AA Enayati
AA Knowlton
AA Michels
B Ewen-Campen
BD Aevermann
Bin Yang
C Bass
Christian Schönbach
D Gullipalli
D Picard
ER Waters
F Hamajima
G Aparicio
G Pertea
H Chen
H Jia
H Liu
H Yang
HS Wang
J Xue
J Zhou
JD Thompson
JE Crawford
Jia-Ying Zhu
JP Der
K Ohtsuka
K Tamura
K Yamamoto
KK Kim
LH Huang
LR Kirkendall
M Faccoli
M Hata
M Saraste
ME Cheetham
MS Clark
N Karatolos
Ning Zhao
O Choresh
R Arya
R Feyereisen
R Friedman
R Li
RA Haack
RLM van Montfort
S Sonoda
TA Eager
WJ Welch
X Bai
X Li
XB Qiu
XW Wang
Y Ding
Y Duan
Y Sun
Y Tazir
Z Hegedus
ZW Li
Publication venue: Public Library of Science
Publication date: 01/01/2012

Background: The pine shoot beetle Tomicus yunnanensis (Coleoptera: Scolytinae) is an economically important pest of Pinus yunnanensis in southwestern China. Resistance to insecticides, developed through long-term use of chemical pesticides, contributes to the serious damage it causes and poses a challenge for management. In addition, highly efficient adaptation to divergent environmental ecologies makes this pest a great potential threat to pine forests. However, the molecular mechanisms remain unknown, as only limited nucleotide sequence data are available for this species. Methodology/Principal Findings: In this study, we applied next generation sequencing (Illumina sequencing) to sequence the adult transcriptome of T. yunnanensis. A total of 51,822,230 reads were obtained. They were assembled into 140,702 scaffolds and 60,031 unigenes. The unigenes were further functionally annotated with gene descriptions, Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG). In total, 80,932 unigenes were classified into GO, 13,599 unigenes were assigned to COG, and 33,875 unigenes were found in KO categories. A biochemical pathway database containing 219 predicted pathways was also created based on the annotations. In-depth analysis of the data revealed a large number of genes related to insecticide resistance and heat shock protein genes associated with environmental stress. Conclusions/Significance: The results facilitate the investigations of molecular resistance mechanisms to insecticides an

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

A comprehensive collection of systems biology data characterizing the host response to viral infection

The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.

Carolina Digital Repository

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Author: Aevermann B.D.
Aldridge A.I.
Ament S.A.
Bakken T.E.
Bartlett A.
Behrens M.M.
Bertagnolli D.
Bravo H.C.
Casper T.
Castanon R.G.
Chun J.
Crichton K.
Crow M.
Daigle T.L.
Dalley R.
Dee N.
Dembrow N.
Diep D.
Ding S.L.
Dobin A.
Dong W.X.
Ecker J.R.
Eggermont J.
Fang R.X.
Feng G.P.
Fischer S.
Gillis J.
Goldman M.
Goldy J.
Graybuck L.T.
Hawrylycz M.
Herb B.R.
Hertzano R.
Hodge R.D.
Hof P.R.
Hollt T.
Horwitz G.D.
Hou X.M.
Hu Q.W.
Jorstad N.L.
Kalmbach B.E.
Kancherla J.
Keene C.D.
Kharchenko P.V.
Ko A.L.
Koch C.
Krienen F.M.
Kroll M.
Lake B.B.
Lathia K.
Lein E.S.
Lelieveldt B.P.
Lew B. van
Li Y.E.
Linnarsson S.
Liu C.E.S.
Liu H.Q.
Lucero J.D.
Luo C.Y.
Macosko E.Z.
Mahurkar A.
McCarroll S.A.
McMillen D.
Miller J.A.
Moussa M.
Mukamel E.A.
Nery J.R.
Nicovich P.R.
Niu S.Y.
Orvis J.
Osteen J.K.
Owen S.
Palmer C.R.
Pham T.
Pinto-Duarte A.
Plongthongkum N.
Poirion O.
Preissl S.
Reed N.M.
Regev A.
Ren B.
Rimorin C.
Rivkin A.
Romanow W.J.
Scheuermann R.H.
Sedeno-Cortes A.E.
Siletti K.
Smith K.
Somasundaram S.
Sorensen S.A.
Spain W.J.
Sulc J.
Tasic B.
Tian W.
Tieu M.
Ting J.T.
Torkelson A.
Tung H.R.
Wang X.N.
White O.R.
Xie F.M.
Yanny A.M.
Yao Z.Z.
Zeng H.K.
Zhang K.
Zhang R.E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/10/2021

The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals (1). Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.

Leiden University Scholary Publications

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Author: Aevermann Brian D
Aldridge Andrew I
Ament Seth A
Bakken Trygve E
Bartlett Anna
Behrens M Margarita
Bertagnolli Darren
Bravo Hector Corrada
Casper Tamara
Castanon Rosa G
Chun Jerold
Crichton Kirsten
Crow Megan
Daigle Tanya L
Dalley Rachel
Dee Nick
Dembrow Nikolai
Diep Dinh
Ding Song-Lin
Dobin Alexander
Dong Weixiu
Ecker Joseph R
Eggermont Jeroen
Fang Rongxin
Feng Guoping
Fischer Stephan
Gillis Jesse
Goldman Melissa
Goldy Jeff
Graybuck Lucas T
Hawrylycz Michael
Herb Brian R
Hertzano Ronna
Hodge Rebecca D
Hof Patrick R
Horwitz Gregory D
Hou Xiaomeng
Hu Qiwen
Höllt Thomas
Jorstad Nikolas L
Kalmbach Brian E
Kancherla Jayaram
Keene C Dirk
Kharchenko Peter V
Ko Andrew L
Koch Christof
Krienen Fenna M
Kroll Matthew
Lake Blue B
Lathia Kanan
Lein Ed S
Lelieveldt Boudewijn P
Li Yang Eric
Linnarsson Sten
Liu Christine S
Liu Hanqing
Lucero Jacinta D
Luo Chongyuan
Macosko Evan Z
Mahurkar Anup
McCarroll Steven A
McMillen Delissa
Miller Jeremy A
Moussa Marmar
Mukamel Eran A
Nery Joseph R
Nicovich Philip R
Niu Sheng-Yong
Orvis Joshua
Osteen Julia K
Owen Scott
Palmer Carter R
Pham Thanh
Pinto-Duarte António
Plongthongkum Nongluk
Poirion Olivier
Preissl Sebastian
Reed Nora M
Regev Aviv
Ren Bing
Rimorin Christine
Rivkin Angeline
Romanow William J
Scheuermann Richard H
Sedeño-Cortés Adriana E
Siletti Kimberly
Smith Kimberly
Somasundaram Saroja
Sorensen Staci A
Spain William J
Sulc Josef
Tasic Bosiljka
Tian Wei
Tieu Michael
Ting Jonathan T
Torkelson Amy
Tung Herman
van Lew Baldur
Wang Xinxin
White Owen R
Xie Fangming
Yanny Anna Marie
Yao Zizhen
Zeng Hongkui
Zhang Kun
Zhang Renee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2021

The primary motor cortex (M1) is essential for voluntary fine-motor control and is functionally conserved across mammals (1). Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

eScholarship - University of California

Reference-based cell type matching of in situ image-based spatial transcriptomics data on primary visual cortex of mouse brain

Author: Abdelaal T.
Aevermann B.D.
Biancalani T.
Comiter C.
Dzyubachyk O.
Eggermont J.
Langseth C.M.
Lein E.S.
Lelieveldt B.P.
Long B.
Miller J.A.
Park J.
Petukhov V.
Scalia G.
Scheuermann R.H.
Vaishnav E.D.
Zhang Y.
Zhao Y.L.
Publication date: 13/06/2023

With the advent of multiplex fluorescence in situ hybridization (FISH) and in situ RNA sequencing technologies, spatial transcriptomics analysis is advancing rapidly, providing spatial location and gene expression information about cells in tissue sections at single cell resolution. Cell type classification of these spatially-resolved cells can be inferred by matching the spatial transcriptomics data to reference atlases derived from single cell RNA-sequencing (scRNA-seq) in which cell types are defined by differences in their gene expression profiles. However, robust cell type matching of the spatially-resolved cells to reference scRNA-seq atlases is challenging due to the intrinsic differences in resolution between the spatial and scRNA-seq data. In this study, we systematically evaluated six computational algorithms for cell type matching across four image-based spatial transcriptomics experimental protocols (MERFISH, smFISH, BaristaSeq, and ExSeq) conducted on the same mouse primary visual cortex (VISp) brain region. We find that many cells are assigned as the same type by multiple cell type matching algorithms and are present in spatial patterns previously reported from scRNA-seq studies in VISp. Furthermore, by combining the results of individual matching strategies into consensus cell type assignments, we see even greater alignment with biological expectations. We present two ensemble meta-analysis strategies used in this study and share the consensus cell type matching results in the Cytosplore Viewer (https://viewer.cytosplore.org) for interactive visualization and data exploration. The consensus matching can also guide spatial data analysis using SSAM, allowing segmentation-free cell type assignment.
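
The abstract does not say how the two ensemble strategies combine the per-algorithm calls. As a minimal sketch of the general idea, the example below assumes a plain majority vote: a cell receives a consensus label only when more than half of the matching algorithms agree. The function name, the `min_agreement` threshold, the cell identifiers, and the labels are hypothetical.

```python
from collections import Counter


def consensus_assignment(calls, min_agreement=0.5):
    """Return the most common label among per-algorithm calls.

    Returns None when there are no calls or when the leading label is
    supported by no more than `min_agreement` of the algorithms.
    Majority voting is an assumption made for this sketch.
    """
    if not calls:
        return None
    label, votes = Counter(calls).most_common(1)[0]
    return label if votes / len(calls) > min_agreement else None


# Hypothetical calls from six matching algorithms for three cells.
cells = {
    "cell_1": ["L2/3 IT", "L2/3 IT", "L2/3 IT", "L4", "L2/3 IT", "L2/3 IT"],
    "cell_2": ["Pvalb", "Sst", "Pvalb", "Sst", "Vip", "Lamp5"],
    "cell_3": ["Astro", "Astro", "Astro", "Astro", "Oligo", "Astro"],
}
consensus = {cell: consensus_assignment(calls) for cell, calls in cells.items()}
```

Leaving a cell unassigned when the algorithms disagree, rather than forcing a label, keeps ambiguous cells out of downstream spatial analyses; a weighted vote would be an equally reasonable alternative.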

Leiden University Scholary Publications

Transcriptomic and morphophysiological evidence for a specialized human cortical GABAergic cell type.

Author: Aevermann B. D.
Baka Judith
Bakken T. E.
Barzó Pál
Boldog Eszter
Bordé Sándor
Faragó Nóra
Hodge R. D.
Kocsis Ágnes Katalin
Kovács Balázs
Molnár Gábor
Novotny M.
Oláh Gáspár
Ozsvár Attila
Puskás László
Rózsa Márton
Tamás Gábor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

We describe convergent evidence from transcriptomics, morphology, and physiology for a specialized GABAergic neuron subtype in human cortex. Using unbiased single-nucleus RNA sequencing, we identify ten GABAergic interneuron subtypes with combinatorial gene signatures in human cortical layer 1 and characterize a group of human interneurons with anatomical features never described in rodents, having large 'rosehip'-like axonal boutons and compact arborization. These rosehip cells show an immunohistochemical profile (GAD1(+)CCK(+), CNR1(-)SST(-)CALB2(-)PVALB(-)) matching a single transcriptomically defined cell type whose specific molecular marker signature is not seen in mouse cortex. Rosehip cells in layer 1 make homotypic gap junctions, predominantly target apical dendritic shafts of layer 3 pyramidal neurons, and inhibit backpropagating pyramidal action potentials in microdomains of the dendritic tuft. These cells are therefore positioned for potent local control of distal dendritic computation in cortical pyramidal neurons

Repository of the Academy's Library

Cross-comparison of inflammatory skin disease transcriptomics identifies PTEN as a pathogenic disease classifier in cutaneous lupus erythematosus.

Author: Aevermann B.D.
Armstrong J.M.
Bachelez H.
Barker J.
Gilliet M.
Griffiths CEM
Haniffa M.
Homey B.
Julia V.
Juul K.
Krishnaswamy J.K.
Litman T.
Olah P.
Parsons I.
Saidoune F.
Sarin K.Y.
Scheuermann R.H.
Schmuth M.
Sierra M.
Simpson M.

Tissue transcriptomics is used to uncover molecular dysregulations underlying diseases. However, the majority of transcriptomics studies focus on single diseases with limited relevance for understanding the molecular relationship between diseases or for identifying disease-specific markers. Here, we used a normalization approach to compare gene expression across nine inflammatory skin diseases. The normalized datasets were found to retain differential expression signals that allowed unsupervised disease clustering and identification of disease-specific gene signatures. Using the NS-Forest algorithm, we identified a minimal set of biomarkers and validated their use as a diagnostic disease classifier. Among them, PTEN was identified as being a specific marker for cutaneous lupus erythematosus (CLE) and found to be strongly expressed by lesional keratinocytes in association with pathogenic type I interferons (IFNs). In fact, PTEN facilitated expression of IFN-β and IFN-κ in keratinocytes by promoting activation and nuclear translocation of IRF3. Thus, cross-comparison of tissue transcriptomics is a valid strategy to establish a molecular disease classification and identify pathogenic disease biomarkers.

Serveur académique lausannois

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX

Author: AEVERMANN BRIAN
Altman Russ B
BAKKEN TRYGVE
CHRISTIANSEN LENA
DIEZFUERTES FRANCISCO
Dunker A Keith
HODGE REBECCA
Hunter Lawrence
Klein Teri E
LASKEN ROGER S
LEIN ED
MCCORRISON JAMISON
MILLER JEREMY
Murray Tiffany A
NOVOTNY MARK
Ritchie Marylyn D
SCHEUERMANN RICHARD H
SCHORK NICHOLAS
STEEMERS FRANK
TRAN DANNY N
VENEPALLY PRATAP
ZHANG FAN
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Next generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, the fact that the procedure begins with a relatively small amount of starting material, thereby pushing the limits of the laboratory procedures required, dictates that careful approaches for sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We've applied this initial framework to a dataset comprised of 2272 single nuclei RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor quality samples provides for an objective and scalable approach to sc/nRNA-seq quality control

eScholarship - University of California