Search CORE

64 research outputs found

Abstract P-22: Enhanced Crosslinking and Immunoprecipitation (eCLIP) Data Reveal Interactions of RNA Binding Proteins with the Human Ribosome

Author: Andrey Buyan
Ivan V. Kulakovskiy
Sergey E. Dmitriev
Publication venue: 'International Medical Research and Development Corporation'
Publication date: 01/06/2021
Field of study

Background: The ribosome is a protein-synthesizing molecular machine composed of four ribosomal RNAs (rRNAs) and dozens of ribosomal proteins. In mammals, the ribosome has a complicated structure with an additional outer layer of rRNA, including large tentacle-like extensions. A number of RNA binding proteins (RBPs) interact with this layer to assist ribosome biogenesis, nuclear export and decay, or to modulate translation. Many methods for studying such protein-RNA interactions have been developed over the last decade, including RNA pulldown and crosslinking-immunoprecipitation (CLIP) assays. Methods: Using publicly available enhanced CLIP (eCLIP) data for 223 proteins studied in the ENCODE project, we found a number of RBPs that bind rRNAs in human cells. To locate their binding sites in rRNAs, we used a newly developed computational protocol for mapping and evaluating eCLIP data with respect to repetitive sequences. Results: For two proteins with known ribosomal localization, uS3/RPS3 and uS17/RPS11, the identified sites were in good agreement with structural data, validating our approach. We then identified rRNA contacts for 22 RBPs in total, involved in rRNA processing and ribosome maturation (DDX21, DDX51, DDX52, NIP7, SBDS, UTP18, UTP3, WDR3, and WDR43), translational control during stress (SERBP1, G3BP1, SND1), IRES activity (PCBP1/hnRNPE1), and other translation-related functions. In many cases, the identified proteins interact with the rRNA expansion segments (ES) of the human ribosome, suggesting that these segments play an important role in protein synthesis. Conclusion: Our study identifies a number of RBPs as interacting partners of the human ribosome and sheds light on the role of rRNA expansion segments in translation.
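eCLIP binding sites are commonly called by comparing the immunoprecipitate (IP) with a size-matched input control. The sketch below shows only that generic idea: a depth-normalized log2 fold enrichment per window along an rRNA. It is not the authors' repeat-aware protocol; it assumes reads have already been assigned to rRNA windows (for example, to a single rDNA unit), and the pseudocount and the cutoff of 2.0 are hypothetical choices.

```python
import math

def fold_enrichment(ip_counts, input_counts, ip_total, input_total, pseudo=1.0):
    """Per-window log2 enrichment of eCLIP IP over size-matched input.

    Counts are scaled to reads per million of each library before taking the
    ratio; the pseudocount keeps empty windows from dividing by zero.
    """
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input tracks must cover the same windows")
    scores = []
    for ip, inp in zip(ip_counts, input_counts):
        ip_rpm = (ip + pseudo) / ip_total * 1e6
        input_rpm = (inp + pseudo) / input_total * 1e6
        scores.append(math.log2(ip_rpm / input_rpm))
    return scores

def candidate_sites(scores, min_log2fc=2.0):
    """Indices of windows passing a hypothetical enrichment cutoff."""
    return [i for i, s in enumerate(scores) if s >= min_log2fc]

scores = fold_enrichment([5, 120, 8], [4, 10, 9], 10**6, 10**6)
print(candidate_sites(scores))  # only the middle window is strongly enriched
```

A real pipeline would also attach a significance estimate to each window; the fold change alone ignores how many reads support it.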

Directory of Open Access Journals

MIXALIME: MIXture models for ALlelic IMbalance Estimation in high-throughput sequencing data

Author: Abramov Sergey
Boytsov Aleksandr
Buyan Andrey I.
Kulakovskiy Ivan V.
Makeev Vsevolod J.
Meshcheryakov Georgy
Publication venue
Publication date: 01/04/2024
Field of study

Modern high-throughput sequencing assays efficiently capture not only gene expression and different levels of gene regulation but also a multitude of genome variants. Focused analysis of alternative alleles of variable sites at homologous chromosomes of the human genome reveals allele-specific gene expression and allele-specific gene regulation by assessing allelic imbalance of read counts at individual sites. Here we formally describe an advanced statistical framework for detecting allelic imbalance in allelic read counts at single-nucleotide variants detected in diverse omics studies (ChIP-Seq, ATAC-Seq, DNase-Seq, CAGE-Seq, and others). MIXALIME accounts for copy-number variants, aneuploidy, and reference read mapping bias, and provides several scoring models that balance sensitivity and specificity when scoring data with varying levels of overdispersion caused by experimental noise.
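To make "allelic imbalance" and "overdispersion" concrete, here is a minimal sketch of a two-sided test on reference/alternative read counts at one SNV. It is not MIXALIME's mixture model: it swaps in a plain binomial test and a single beta-binomial with a hypothetical concentration parameter, and it encodes copy number only through the expected reference fraction `p` (assumed to lie strictly between 0 and 1).

```python
from math import comb, exp, lgamma, log

def _two_sided(pmf, observed):
    """Sum the probability of every outcome no more likely than the observed one."""
    cutoff = pmf[observed] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def _lbeta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(n, a, b):
    return [exp(log(comb(n, k)) + _lbeta(k + a, n - k + b) - _lbeta(a, b))
            for k in range(n + 1)]

def imbalance_p(ref, alt, p=0.5, concentration=None):
    """Two-sided p-value for allelic imbalance at one SNV.

    p: expected reference-allele fraction (0.5 for a balanced diploid site;
       e.g. 2/3 if the reference allele is carried by two of three copies).
    concentration: if given, use a beta-binomial with a = p*c, b = (1-p)*c,
       which widens the null distribution; smaller c = more overdispersion.
    """
    n = ref + alt
    if concentration is None:
        pmf = binomial_pmf(n, p)
    else:
        pmf = beta_binomial_pmf(n, p * concentration, (1 - p) * concentration)
    return _two_sided(pmf, ref)  # pmf[k] = probability of k reference reads

print(imbalance_p(30, 10), imbalance_p(30, 10, concentration=20))
```

The second p-value is larger than the first: allowing for overdispersion makes the same 30:10 split look less surprising, which is the sensitivity/specificity trade-off the abstract refers to.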

arXiv.org e-Print Archive

EpiFactors: a comprehensive database of human epigenetic factors and complexes

Author: Andreas Lennartsson
Finn Drabløs
Grigory Khimulya
Ilya E. Vorontsov
Ivan V. Kulakovskiy
Jaenisch
Pouda Panahandeh
Rezvan Ehsani
Takeya Kasukawa
Yulia A. Medvedeva
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2015
Field of study

Other funding: Russian Fund For Basic Research (RFFI) grants 14-04-0018 and 15-34-20423, Ake Olsson's foundation, Swedish Cancer foundation, Swedish Childhood cancer foundation, Dynasty Foundation Fellowship, RIKEN Omics Science Center, RIKEN Preventive Medicine and Diagnosis Innovation Program, and RIKEN Center for Life Science Technologies.

Abstract: Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has created a need for a resource that compiles, organizes and presents curated information to researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (approximately 200 cell types, in many cases from three individual donors) covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes), and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression technique. EpiFactors also contains information on 69 protein complexes involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central

Diposit Digital de Documents de la UAB

NORA - Norwegian Open Research Archives

Negative selection maintains transcription factor binding motifs in human cancer

Author: A Mathelier
B Vernot
C Melton
CA Zahnow
Daria D. Nikolaeva
E Khurana
E Wingender
Elena N. Lukianova
F Sanchez-Garcia
FW Huang
G Macintyre
GD Stormo
Grigory Khimulya
I Landa
IE Vorontsov
Ilya E. Vorontsov
Irina A. Eliseeva
IV Kulakovskiy
Ivan V. Kulakovskiy
J Crocker
K Lawrenson
L Arbiza
LB Alexandrov
M Dabrowski
OG Berg
P Jiang
P Polak
RE Thurman
RJA Bell
SL Ostrow
SS Myatt
T Manke
V Charoensawan
VG Levitsky
Vsevolod J. Makeev
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Systematic identification of post-transcriptional regulatory modules

Author: Baratam Rithvik
Buyan Andrey
Carpenter Christopher
Choi Benedict
Corces M. Ryan
Dodel Martin
Doty Anthony
Garcia Kristle
Goodarzi Hani
Joshi Tanvi
Khoroshkin Matvei
Kulakovskiy Ivan V.
Lee Sean B.
Mardakheh Faraz K.
Markett Daniel
Miglani Sohit
Modi Hailey
Navickas Albertas
Subramanyam Vishvak
Trejo Fathima
Yu Johnny
Zhou Shaopu
Publication venue: Nature Research
Publication date: 09/09/2024
Field of study

In our cells, a limited number of RNA binding proteins (RBPs) are responsible for all aspects of RNA metabolism across the entire transcriptome. To accomplish this, RBPs form regulatory units that act on specific target regulons. However, the landscape of RBP combinatorial interactions remains poorly explored. Here, we perform a systematic annotation of RBP combinatorial interactions via multimodal data integration. We build a large-scale map of RBP protein neighborhoods by generating in vivo proximity-dependent biotinylation datasets for 50 human RBPs. In parallel, we use CRISPR interference with single-cell readout to capture transcriptomic changes upon RBP knockdowns. By combining these physical and functional interaction readouts with the atlas of RBP mRNA targets from eCLIP assays, we generate an integrated map of functional RBP interactions. We then use this map to match RBPs to their context-specific functions and validate the predicted functions biochemically for four RBPs. This study provides a detailed map of RBP interactions and deconvolves them into distinct regulatory modules with annotated functions and target regulons. This multimodal and integrative framework provides a principled approach for studying post-transcriptional regulatory processes and enriches our understanding of their underlying mechanisms.

Oxford University Research Archive

Recommended from our members

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay

Author: Adato Orit
Adhikari Aashish N.
Ahituv Nadav
Beer Michael A.
Boyle Alan P.
Dong Shengcheng
Hawkins‐hooker Alex
Inoue Fumitaka
Juven‐gershon Tamar
Kenlay Henry
Kircher Martin
Kreimer Anat
Kulakovskiy Ivan V.
Martin Beth
Patra Ayoti
Penzar Dmitry D.
Reid John
Schubach Max
Shendure Jay
Shigaki Dustin
Unger Ron
Xiong Chenling
Yan Zhongxia
Yosef Nir
Publication venue: 'Wiley'
Publication date: 01/09/2019
Field of study

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge, in which participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis, and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
Peer Reviewed
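The design described above (every possible substitution, activity measured as RNA relative to plasmid DNA) can be sketched as follows. This is an illustration, not the CAGI 5 scoring code: it assumes RNA and DNA counts are already normalized to library depth, and the pseudocount of 0.5 is a hypothetical choice.

```python
import math

BASES = "ACGT"

def saturation_variants(seq):
    """Every single-nucleotide substitution of seq as (position, ref, alt)."""
    return [(i, ref, alt)
            for i, ref in enumerate(seq.upper())
            for alt in BASES if alt != ref]

def activity(rna, dna, pseudo=0.5):
    """Reporter activity as log2(RNA / plasmid DNA), with a pseudocount."""
    return math.log2((rna + pseudo) / (dna + pseudo))

def variant_effect(var_rna, var_dna, ref_rna, ref_dna):
    """Change in log2 activity of a variant relative to the reference construct."""
    return activity(var_rna, var_dna) - activity(ref_rna, ref_dna)

print(len(saturation_variants("ACGT")))     # 3 alternatives at each of 4 positions
print(variant_effect(50, 100, 100, 100))    # negative: the variant lowers expression
```

A sequence of length L yields 3L variants, which is why saturation libraries for whole enhancers and promoters need massively parallel readout.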

eScholarship - University of California

Deep Blue Documents

Functional annotation of human long noncoding RNAs via molecular phenotyping

Author: Abugessaisa Imad
Agrawal Saumya
Akalin Altuna
Antonov Ivan V.
Arner Erik
Baillie J. Kenneth
Bonetti Alessandro
Bono Hidemasa
Borsari Beatrice
Brombacher Frank
Cameron Chris J. F.
Cannistraci Carlo Vittorio
Cardenas Ryan
Cardon Melissa
Carninci Piero
Chang Howard
Chang Jen-Chien
Ciani Yari
Dostie Josee
Ducoli Luca
Favorov Alexander
Forrest Alistair R. R.
Fort Alexandre
Garrido Diego
Gil Noa
Gimenez Juliette
Guigo Roderic
Guler Reto
Handoko Lusy
Harshbarger Jayson
Hasegawa Akira
Hasegawa Yuki
Hashimoto Kosuke
Hayatsu Norihito
Heutink Peter
Hirose Tetsuro
Hoffman Michael M.
Hon Chung Chau
Hoon Michiel J. L. de
Imada Eddie L.
Itoh Masayoshi
Kaczkowski Bogumil
Kanhere Aditi
Kasukawa Takeya
Kauppinen Sakari
Kawabata Emily
Kawaji Hideya
Kawashima Tsugumi
Kelly S. Thomas
Kere Juha
Kojima Miki
Kondo Naoto
Koseki Haruhiko
Kouno Tsukasa
Kratz Anton
Kulakovskiy Ivan V.
Kurowska-Stolarska Mariola
Kwon Andrew Tae Jun
Leek Jeffrey
Lenhard Boris
Lennartsson Andreas
Lizio Marina
Lopez-Redondo Fernando
Luginbuhl Joachim
Maeda Shiori
Makeev Vsevolod J.
Marchionni Luigi
Martinez Diego Fernando Sanchez
Medvedeva Yulia A.
Mendez Mickael
Minoda Aki
Mueller Ferenc
Munoz-Aguirre Manuel
Murata Mitsuyoshi
Nishiyori Hiromi
Nitta Kazuhiro R.
Noguchi Shuhei
Noro Yukihiko
Nurtdinov Ramil
Okazaki Yasushi
Ooi Jasmine Li Ching
Orlando Valerio
Ouyang John F.
Paquette Denis
Parkinson Nick
Parr Callum J. C.
Petri Andreas
Rackham Owen J. L.
Ramilowski Jordan A.
Rizzu Patrizia
Roos Leonie
Sandelin Albin
Sanjana Pillay
Schneider Claudio
Semple Colin A. M.
Severin Jessica
Shibayama Youtaro
Shin Jay W.
Sivaraman Divya M.
Suzuki Harukazu
Suzuki Takahiro
Szumowski Suzannah C.
Tagami Michihira
Taylor Martin S.
Terao Chikashi
Thodberg Malte
Thongjuea Supat
Tripathi Vidisha
Ulitsky Igor
Verardo Roberto
Vorontsov Ilya E.
Yagi Ken
Yamamoto Chinatsu
Yasuzawa Kayoko
Yip Chi Wai
Young Robert S.
Publication venue
Publication date: 01/01/2020
Field of study

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest lncRNA knockdown data set to date with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
Peer reviewed
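One simple way to quantify "global concordance" between two antisense oligonucleotides (ASOs) against the same lncRNA is to correlate their genome-wide fold-change profiles. The sketch below computes a plain Pearson correlation; it is an assumed illustration, not the FANTOM6 analysis, and the input vectors are hypothetical per-gene log2 fold changes over a shared gene list.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length fold-change profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log2 fold changes of four genes after two ASOs against one lncRNA.
aso1 = [1.2, -0.5, 0.3, 2.0]
aso2 = [1.0, -0.4, 0.1, 1.8]
print(round(pearson(aso1, aso2), 3))
```

Concordance would then show up as same-target ASO pairs correlating more strongly, on average, than pairs targeting different lncRNAs.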

University of Liverpool Repository

Southampton (e-Prints Soton)

Archivio istituzionale della ricerca - Università degli Studi di Udine

University of Birmingham Research Portal

Publikationsserver der Universität Tübingen

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

Enlighten

UPF Digital Repository

Helsingin yliopiston digitaalinen arkisto

NORA - Norwegian Open Research Archives

Repository for Publications and Research Data

University of Bergen

Copenhagen University Research Information System

VBN

MDC Repository

The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome.

Author: 't Hoen Peter A C
Alam Intikhab
Albanese Davide
Altschuler Gabriel M.
Andersson Robin
Arakawa Takahiro
Archer John A C
Arner Erik
Arner Peter
Babina Magda
Bajic Vladimir B.
Baker Sarah
Balwierz Piotr J.
Beckhouse Anthony G.
Bertin Nicolas
Blake Judith A.
Blumenthal Antje
Bodega Beatrice
Bonetti Alessandro
Briggs James
Brombacher Frank
Califano Andrea
Cannistraci Carlo V.
Carbajo Daniel
Carninci Piero
Chen Yun
Chierici Marco
Ciani Yari
Clevers Hans C.
Dalla Emiliano
Daub Carsten O.
Davis Carrie A.
de Hoon Michiel J L
de Lima Morais David A.
Detmar Michael
Diehl Alexander D.
Dimont Emmanuel
Dohi Taeko
Drabløs Finn
Edge Albert S B
Edinger Matthias
Ekwall Karl
Endoh Mitsuhiro
Enomoto Hideki
Fagiolini Michela
Fairbairn Lynsey
Fang Hai
Farach-Carson Mary C.
Faulkner Geoffrey J.
Favorov Alexander V.
Fisher Malcolm E.
Forrest Alistair R R
Francescatto Margherita
Freeman Tom C.
Frith Martin C.
Fujita Rie
Fukuda Shiro
Furlanello Cesare
Furuno Masaaki
Furusawa Jun ichi
Geijtenbeek Teunis B.
Ghanbarian Avazeh T.
Gibson Andrew
Gingeras Thomas
Goldowitz Daniel
Gough Julian
Guhl Sven
Guler Reto
Gustincich Stefano
Ha Thomas J.
Haberle Vanja
Hamaguchi Masahide
Hara Mitsuko
Harbers Matthias
Harshbarger Jayson
Hasegawa Akira
Hasegawa Yuki
Hashimoto Takehiro
Hayashizaki Yoshihide
Herlyn Meenhard
Heutink Peter
Hide Winston
Hitchens Kelly J.
Ho Sui Shannan J.
Hofmann Oliver M.
Hoof Ilka
Hori Fumi
Hume David A.
Huminiecki Lukasz
Hurst Laurence D.
Iida Kei
Ikawa Tomokatsu
Ishizu Yuri
Itoh Masayoshi
Jankovic Boris R.
Jia Hui
Joshi Anagha
Jurman Giuseppe
Jørgensen Mette
Kaczkowski Bogumil
Kai Chieko
Kaida Kaoru
Kaiho Ai
Kajiyama Kazuhiro
Kanamori Mutsumi Katayama
Kasianov Artem S.
Kasukawa Takeya
Katayama Shintaro
Kato Sachi
Kawaguchi Shuji
Kawai Jun
Kawamoto Hiroshi
Kawamura Yuki I.
Kawashima Tsugumi
Kempfle Judith S.
Kenna Tony J.
Kenneth Baillie J.
Kere Juha
Khachigian Levon M.
Kitamura Toshio
Knox Alan J.
Kojima Miki
Kojima Soichi
Kondo Naoto
Koseki Haruhiko
Koyasu Shigeo
Krampitz Sarah
Kubosaki Atsutaka
Kulakovskiy Ivan V.
Kwon Andrew T.
Laros Jeroen F J
Lassmann Timo
Lee Weonju
Lenhard Boris
Lennartsson Andreas
Li Kang
Lilje Berit
Lipovich Leonard
Lizio Marina
Mackay Alan sim
Makeev Vsevolod J.
Manabe Riichiroh
Mar Jessica C.
Marchand Benoit
Mathelier Anthony
Maxwell Burroughs A.
Medvedeva Yulia A.
Meehan Terrence F.
Mejhert Niklas
Meynert Alison
Mizuno Yosuke
Morikawa Hiromasa
Morimoto Mitsuru
Moro Kazuyo
Motakis Efthymios
Motohashi Hozumi
Mummery Christine L.
Mungall Christopher J.
Murata Mitsuyoshi
Nagao Sayaka Sato
Nakachi Yutaka
Nakahara Fumio
Nakamura Toshiyuki
Nakamura Yukio
Nakazato Kenichi
Ninomiya Noriko
Nishiyori Hiromi
Noma Shohei
Nozaki Tadasuke
Ogishima Soichi
Ohkura Naganari
Ohmiya Hiroko
Ohno Hiroshi
Ohshima Mitsuhiro
Okada Mariko Hatakeyama
Okazaki Yasushi
Orlando Valerio
Ovchinnikov Dmitry A.
Pain Arnab
Passier Robert
Patrikakis Margaret
Persson Helena
Peter Klinken S.
Piazza Silvano
Plessy Charles
Pradhan Swati Bhatt
Prendergast James G D
Rackham Owen J L
Ramilowski Jordan A.
Rashid Mamoon
Ravasi Timothy
Rehli Michael
Rizzu Patrizia
Roncador Marco
Roy Sugata
Rye Morten B.
Saijyo Eri
Sajantila Antti
Saka Akiko
Sakaguchi Shimon
Sakai Mizuho
Sandelin Albin Gustav
Sato Hiroki
Satoh Hironori
Savvi Suzana
Saxena Alka
Schaefer Ulf
Schmeier Sebastian
Schmidl Christian
Schneider Claudio
Schultes Erik A.
Schulze-Tanzil Gundula G.
Schwegmann Anita
Semple Colin A.
Sengstag Thierry
Severin Jessica
Sheng Guojun
Shimoji Hisashi
Shimoni Yishai
Shin Jay W.
Simon Christophe
Sugiyama Daisuke
Sugiyama Takaaki
Summers Kim M.
Suzuki Harukazu
Suzuki Masanori
Suzuki Naoko
Swoboda Rolf K.
Tagami Michihira
Takahashi Naoko
Takai Jun
Tanaka Hiroshi
Tatsukawa Hideki
Tatum Zuotian
Taylor Martin S.
Thompson Mark
Toyoda Hiroo
Toyoda Tetsuro
Valen Eivind
van de Wetering Marc
van den Berg Linda M.
van Nimwegen Erik
Verardo Roberto
Vijayan Dipti
Vitezic Morana
Vorontsov Ilya E.
Wasserman Wyeth W.
Watanabe Shoko
Wells Christine A.
Winteringham Louise N.
Wolvetang Ernst
Wood Emily J.
Yamaguchi Yoko
Yamamoto Masayuki
Yoneda Misako
Yonekura Yohei
Yoshida Shigehiro
Young Robert S.
Zabierowski Suzan E.
Zhang Peter G.
Zhao Xiaobei
Zucchelli Silvia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015
Field of study

X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may be more profoundly shaped by limits imposed on gene expression owing to haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) prior to X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using data from The Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than is seen for X-to-autosome retrogenes on autosomes. Also as expected, highly expressed X-linked genes show a smaller increase in expression than autosomal ones (relative to the human/chimpanzee common ancestor), whereas lowly expressed genes do not. The traffic jam model also explains the known lower breadth of expression for genes on the X (and the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As further predicted, highly expressed tissue-specific genes are also rare on the X, and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level, not by the breadth of expression per se.
Importantly, a limit to the maximal expression level explains the biased tissue-of-expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which gene expression is relatively rare on the X chromosome. These trends cannot be fully accounted for by alternative models of biased expression. In conclusion, the notion that it is hard for genes on the Therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale for many peculiar features of the X's gene content, gene expression, and evolution.
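The central comparison (maximal expression of X-linked genes versus autosomal genes) can be illustrated with a toy computation: take each gene's maximum across samples, then compare group means. This is one plausible reading of "on average, three times lower", not the paper's exact statistic; the gene names, values and chromosome labels below are made up, and sex chromosomes other than X plus mitochondrial genes are excluded by assumption.

```python
def max_expression_ratio(expr, chrom):
    """Mean autosomal maximal expression divided by mean X-linked maximal expression.

    expr:  gene -> list of expression values across samples/tissues
    chrom: gene -> chromosome name such as "chrX" or "chr1"
    """
    x_max = [max(v) for g, v in expr.items() if chrom[g] == "chrX"]
    auto_max = [max(v) for g, v in expr.items()
                if chrom[g] not in ("chrX", "chrY", "chrM")]
    if not x_max or not auto_max:
        raise ValueError("need at least one X-linked and one autosomal gene")
    return (sum(auto_max) / len(auto_max)) / (sum(x_max) / len(x_max))

# Toy data only: X maxima 10 and 20 (mean 15), autosomal maxima 30 and 60 (mean 45).
expr = {"g1": [1, 10], "g2": [20, 5], "g3": [30, 1], "g4": [60, 2]}
chrom = {"g1": "chrX", "g2": "chrX", "g3": "chr1", "g4": "chr2"}
print(max_expression_ratio(expr, chrom))
```

Using the per-gene maximum rather than the mean is what ties this statistic to the traffic-jam argument: the constraint bites in the tissue where a gene is most highly expressed.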

Cold Spring Harbor Laboratory Institutional Repository

ZENODO

Directory of Open Access Journals

Edinburgh Research Explorer

Electronic Archiving System

FigShare

Public Library of Science (PLOS)

Repository for Publications and Research Data

Crossref

edoc

Dryad Digital Repository (Duke University)

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Utrecht University Repository

DSpace at Rice University

University of Melbourne Institutional Repository

ScholarBank@NUS

Jaccard index based similarity measure to compare transcription factor binding site models

Author: Ilya E Vorontsov
Ivan V Kulakovskiy
Vsevolod J Makeev
Publication venue: Springer Nature
Publication date: 01/01/2013
Field of study

BACKGROUND: The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBS and thus may differ without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs with raw positional counts and log-odds PWMs. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS: MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query. AVAILABILITY AND IMPLEMENTATION: MACRO-APE is implemented in Ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
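The idea of comparing TFBS models as sets can be shown with a brute-force sketch: enumerate every word of the PWM's length, keep those scoring at or above each threshold, and take the Jaccard index of the two sets. This is a deliberate simplification, not MACRO-APE's algorithm: it assumes equal-length PWMs, a uniform background (so counting words equals weighting them by probability), no alignment shifts, and no reverse-complement strand, and it only scales to short matrices.

```python
from itertools import product

def score(pwm, word):
    """Sum of per-position scores; pwm is a list of {base: score} columns."""
    return sum(column[base] for column, base in zip(pwm, word))

def site_set(pwm, threshold):
    """All words of the PWM's length scoring at or above the threshold."""
    return {"".join(word) for word in product("ACGT", repeat=len(pwm))
            if score(pwm, word) >= threshold}

def jaccard_similarity(pwm_a, thr_a, pwm_b, thr_b):
    """|A & B| / |A | B| for the site sets of two (PWM, threshold) models."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("this sketch only compares PWMs of equal length")
    a, b = site_set(pwm_a, thr_a), site_set(pwm_b, thr_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

pwm_a = [{"A": 1, "C": 0, "G": 0, "T": 0}, {"A": 1, "C": 0, "G": 0, "T": 0}]
pwm_b = [{"A": 1, "C": 0, "G": 0, "T": 0}, {"A": 1, "C": 1, "G": 0, "T": 0}]
print(jaccard_similarity(pwm_a, 2, pwm_b, 2))  # {"AA"} vs {"AA", "AC"}
```

Because the comparison works on recognized sites rather than matrix elements, it is insensitive to differences in strongly negative entries that never push a word above threshold, which is the motivation given in the abstract. The corresponding distance is 1 minus the index.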

Springer - Publisher Connector

PubMed Central

Brain-related genes are specifically enriched with long phase 1 introns.

Author: Eugene F Baulin
Ivan V Kulakovskiy
Mikhail A Roytberg
Tatiana V Astakhova
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2020
Field of study

Intronic gene regions are mostly considered in the scope of gene expression regulation, such as alternative splicing. However, relations between basic statistical properties of introns are much more rarely studied in detail, despite vast available data. In particular, little is known about the relationship between intron length and intron phase. The intron phase distribution differs significantly across intron length thresholds. In this study, we performed GO enrichment analysis of gene sets with a particular intron phase at varying intron length thresholds using a list of 13823 orthologous human-mouse gene pairs. We found a specific group of 153 genes with phase 1 introns longer than 50 kilobases that were specifically expressed in the brain, functionally related to synaptic signaling, and strongly associated with schizophrenia and other mental disorders. We propose that the prevalence of long phase 1 introns arises from the presence of the signal peptide sequence and is connected with 1-1 exon shuffling.
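Intron phase is the position of an intron relative to the reading frame: phase 0 between codons, phase 1 after the first base of a codon, phase 2 after the second. It follows from the cumulative coding length upstream of the intron modulo 3. The sketch below computes phases and then selects long phase 1 introns using the 50 kb cutoff from the abstract; it assumes coding exon lengths are given in transcript order and ignores introns in UTRs, which have no phase. The function names are hypothetical.

```python
def intron_phases(cds_exon_lengths):
    """Phase of each intron between consecutive coding exons.

    The phase is the number of coding bases before the intron, modulo 3:
    0 = between codons, 1 = after the 1st codon base, 2 = after the 2nd.
    """
    phases = []
    cumulative = 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases

def long_phase1_introns(cds_exon_lengths, intron_lengths, min_length=50_000):
    """Indices of phase 1 introns longer than min_length bases."""
    phases = intron_phases(cds_exon_lengths)
    if len(phases) != len(intron_lengths):
        raise ValueError("expected one intron length per exon junction")
    return [i for i, (ph, ln) in enumerate(zip(phases, intron_lengths))
            if ph == 1 and ln > min_length]

print(intron_phases([100, 50, 30]))                           # 100 % 3 = 1, 150 % 3 = 0
print(long_phase1_introns([100, 50, 30], [60_000, 70_000]))   # only the first intron
```

A "1-1" exon, in this vocabulary, is flanked by phase 1 introns on both sides, so its length is a multiple of 3 and it can be shuffled between genes without shifting the downstream reading frame.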

Directory of Open Access Journals