12 research outputs found
Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
Queer In AI: A Case Study in Community-Led Participatory AI
We present Queer in AI as a case study for community-led participatory design
in AI. We examine how participatory design and intersectional tenets started
and shaped this community's programs over the years. We discuss different
challenges that emerged in the process, look at ways this organization has
fallen short of operationalizing participatory and intersectional principles,
and then assess the organization's impact. Queer in AI provides important
lessons and insights for practitioners and theorists of participatory methods
broadly through its rejection of hierarchy in favor of decentralization,
success at building aid and programs by and for the queer community, and effort
to change actors and institutions outside of the queer community. Finally, we
theorize how communities like Queer in AI contribute to the participatory
design in AI more broadly by fostering cultures of participation in AI,
welcoming and empowering marginalized participants, critiquing poor or
exploitative participatory practices, and bringing participation to
institutions outside of individual research projects. Queer in AI's work serves
as a case study of grassroots activism and participatory methods within AI,
demonstrating the potential of community-led participatory methods and
intersectional praxis, while also providing challenges, case studies, and
nuanced insights to researchers developing and using participatory methods.
Comment: To appear at FAccT 202
Synthesis of Phthalyl Substituted Imidazolones and Schiff Bases as Antimicrobial Agents
A new series of phthalyl-substituted imidazolones (4a–g) and Schiff bases (5a–d) was synthesized from 2-methyl-(m-nitro-1,3-dioxo-1,3-dihydro-(2H)-isoindole-2-yl)-5-amino-1,3,4-thiadiazole (3a–b). Compounds (3a–b) were prepared by cyclisation of 2-(m-nitro-1,3-dioxo-1,3-dihydro-(2H)-isoindole-2-yl)methyl ethanoate (2) with thiosemicarbazide. 2-(m-nitro-1,3-dioxo-1,3-dihydro-(2H)-isoindole-2-yl)ethanoic acid (1) in the presence of thionyl chloride and methanol gave the ester (2), while compound (1) was synthesized by aminolysis of phthalic anhydride with glycine. The compounds were characterized by IR, 1H NMR, mass spectrometry, and elemental analysis. All the synthesized compounds (4a–g) and (5a–d) were screened for their antibacterial activity against the pathogenic strains E. coli, P. aureus, and C. freundii, while antifungal activity was evaluated against A. niger, A. flavus, Penicillium sp., and C. albicans.
Biologically relevant transfer learning improves transcription factor binding prediction
Background
Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.
Results
We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~ 500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF.
Conclusions
Our results confirm that transfer learning is a powerful technique for TF binding prediction.
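The two-step strategy described in this abstract (pre-train a multi-task model on several TFs, then initialize a single-task model with the learned weights and train it at a lower learning rate) can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it uses a tiny dense network in pure Python rather than a deep sequence model, all function names and hyperparameters are hypothetical, and the choice to copy the shared hidden layer while giving the target TF a freshly initialized output head is an assumption, since the abstract does not say how the output layer is handled.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, head_w):
    """Shared tanh hidden layer followed by one sigmoid output per task."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in head_w]
    return h, y

def train(data, hidden_w, head_w, lr, epochs):
    """Plain SGD on binary cross-entropy; updates both weight matrices in place."""
    for _ in range(epochs):
        for x, targets in data:
            h, y = forward(x, hidden_w, head_w)
            # For sigmoid + cross-entropy, the output-layer error is (y - t).
            d_out = [yi - ti for yi, ti in zip(y, targets)]
            # Backpropagate to the hidden layer before the head weights change.
            d_hid = [
                (1.0 - h[j] ** 2)
                * sum(d_out[k] * head_w[k][j] for k in range(len(head_w)))
                for j in range(len(h))
            ]
            for k, row in enumerate(head_w):
                for j in range(len(row)):
                    row[j] -= lr * d_out[k] * h[j]
            for j, row in enumerate(hidden_w):
                for i in range(len(row)):
                    row[i] -= lr * d_hid[j] * x[i]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def pretrain_then_finetune(multi_task_data, target_data, n_inputs, n_hidden,
                           n_tasks, pretrain_lr=0.1, finetune_lr=0.01,
                           epochs=50, seed=0):
    """Hypothetical helper: returns (pretrained hidden, fine-tuned hidden, head)."""
    rng = random.Random(seed)
    # Pre-training step: one model with one output per TF.
    hidden_w = random_matrix(n_hidden, n_inputs, rng)
    multi_head = random_matrix(n_tasks, n_hidden, rng)
    train(multi_task_data, hidden_w, multi_head, pretrain_lr, epochs)
    # Fine-tuning step: copy the shared weights, attach a new single-output
    # head (an assumption of this sketch), and train at a lower learning rate.
    ft_hidden = [row[:] for row in hidden_w]
    ft_head = random_matrix(1, n_hidden, rng)
    train(target_data, ft_hidden, ft_head, finetune_lr, epochs)
    return hidden_w, ft_hidden, ft_head
```

Each training example here is a pair `(features, targets)`, where `targets` holds one 0/1 label per TF during pre-training and a single label for the target TF during fine-tuning. Copying the hidden layer rather than sharing it keeps the pre-trained weights available for fine-tuning further TFs.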
Evidence of transcription at polyT short tandem repeats
Background: Using the Cap Analysis of Gene Expression technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ∼72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Results: Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGEs are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (∼81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also by instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription. Conclusions: Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.
Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probed these unassigned TSSs and showed that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we developed Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We trained sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveiled the importance of STR surrounding sequences not only to distinguish STR classes, as defined by the repeated DNA motif, from one another, but also to predict their transcription. Excitingly, our models predicted that genetic variants linked to human diseases affect STR-associated transcription and correspond precisely to the key positions identified by our models to predict transcription. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.