4,925 research outputs found

    Linking Motif Sequences with Tale Types by Machine Learning

    Abstract units of narrative content called motifs constitute sequences, also known as tale types. However, whereas the dependency of tale types on their constituent motifs is clear, the strength of their bond has not been measured thus far. Based on the observation that differences between such motif sequences are reminiscent of nucleotide and chromosome mutations in genetics, i.e., constitute "narrative DNA", we used sequence mining methods from bioinformatics to learn more about the nature of tale types as a corpus. 94% of the Aarne-Thompson-Uther catalogue (2249 tale types in 7050 variants) was listed as individual motif strings based on the Thompson Motif Index, and scanned for similar subsequences. Next, using machine learning algorithms, we built and evaluated a classifier which predicts the tale type of a new motif sequence. Our findings indicate that, due to the size of the available samples, the classification model was best able to predict magic tales, novelles and jokes.
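
As a rough illustration of the idea in the abstract above (treating tale-type variants as motif strings, comparing their subsequences, and predicting the type of a new sequence), here is a minimal sketch. It is not the authors' method: the bigram features, the Jaccard similarity, the 1-nearest-neighbour rule, the motif codes `M1`..`M8`, and the labels `type_A`/`type_B` are all assumptions made up for illustration.

```python
def ngrams(seq, n=2):
    """Return the set of contiguous n-grams in a motif sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict(train, query, n=2):
    """Return (label, score) of the training variant most similar to query."""
    query_grams = ngrams(query, n)
    best_label, best_score = None, -1.0
    for label, seq in train:
        score = jaccard(query_grams, ngrams(seq, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy, invented data: each variant is a list of placeholder motif codes.
TRAIN = [
    ("type_A", ["M1", "M2", "M3", "M4"]),
    ("type_A", ["M1", "M2", "M4"]),
    ("type_B", ["M5", "M6", "M7"]),
    ("type_B", ["M5", "M7", "M6", "M8"]),
]

label, score = predict(TRAIN, ["M1", "M2", "M3"])
print(label, round(score, 3))
```

With real catalogue data, tale types that have only a few variants would give such a classifier little to match against, which is in line with the abstract's remark that sample size limited which genres could be predicted.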

    Learning a Better Motif Index: Toward Automated Motif Extraction

    Motifs are distinctive recurring elements found in folklore, and are used by folklorists to categorize and find tales across cultures and track the genetic relationships of tales over time. Motifs have significance beyond folklore as communicative devices found in news, literature, press releases, and propaganda that concisely imply a large constellation of culturally-relevant information. Until now, folklorists have only extracted motifs from narratives manually, and the conceptual structure of motifs has not been formally laid out. In this short paper we propose that it is possible to automate the extraction of both existing and new motifs from narratives using supervised learning techniques and thereby possible to learn a computational model of how folklorists determine motifs. Automatic extraction would enable the construction of a truly comprehensive motif index, which does not yet exist, as well as the automatic detection of motifs in cultural materials, opening up a new world of narrative information for analysis by anyone interested in narrative and culture. We outline an experimental design, and report on our efforts to produce a structured form of Thompson's motif index, as well as a development annotation of motifs in a small collection of Russian folklore. We propose several initial computational, supervised approaches, and describe several possible metrics of success. We describe lessons learned and difficulties encountered so far, and outline our plan going forward.

    Computational analysis of transcriptional regulation in metazoans

    This HDR thesis presents my work on transcriptional regulation in metazoans (animals). As a computational biologist, my research activities cover both the development of new bioinformatics tools, and contributions to a better understanding of biological questions. The first part focuses on transcription factors, with a study of the evolution of Hox and ParaHox gene families across metazoans, for which I developed HoxPred, a bioinformatics tool to automatically classify these genes into their groups of homology. Transcription factors regulate their target genes by binding to short cis-regulatory elements in DNA. The second part of this thesis introduces the prediction of these cis-regulatory elements in genomic sequences, and my contributions to the development of user-friendly computational tools (RSAT software suite and TRAP). The third part covers the detection of these cis-regulatory elements using high-throughput sequencing experiments such as ChIP-seq or ChIP-exo. The bioinformatics developments include reusable pipelines to process these datasets, and novel motif analysis tools adapted to these large datasets (RSAT peak-motifs and ExoProfiler). As all these approaches are generic, I naturally apply them to diverse biological questions, in close collaboration with experimental groups. In particular, this third part presents the studies uncovering new DNA sequences that are driving or preventing the binding of the glucocorticoid receptor. Finally, my research perspectives are introduced, especially regarding further developments within the RSAT suite enabling cross-species conservation analyses, and new collaborations with experimental teams, notably to tackle the epigenomic remodelling during osteoporosis.

    Communicating with Culture: How Humans and Machines Detect Narrative Elements

    To understand how people communicate, we must understand how they leverage shared stories and all the knowledge, information, and associations contained within those stories. I examine three classes of narrative elements that convey a wealth of cultural knowledge: Propp's morphology, motifs, and discourse structure. Propp's morphology communicates how roles and actions drive a narrative forward; motifs fill those roles and actions with specific, remarkable events; discourse groups these into a coherent structure to convey a point. My thesis has three aims: first, to demonstrate that people can reliably detect and identify all three of these narrative elements; second, to develop automatic detectors for discourse and motifs; third, to demonstrate the deep relation between these narrative elements and other theories of narrative structure and knowledge representation that I refer to as the "continuum of communication". The first step of my work answers two key questions about Propp's morphology by demonstrating the reliability of annotators applying Propp's scheme across a variety of experiments, in a double-blind annotation study. Additionally, I demonstrate a shortcoming in Propp's scheme, showing that the folktales he analyzed contain elements that are not part of his morphology. The second step of my work, showing that people familiar with motifs can reliably detect when they are being used to share information and associations, approaches this problem by performing a large-scale annotation study of 21,000 examples into four categories performed by three pairs of annotators over a period of 11 weeks. I show that, in a double-blind annotation study, people familiar with the motifs had a moderate to high degree of agreement, demonstrating the reliability of humans at this task. The third step demonstrates the reliability of applying a theory of news discourse structure to news articles via a double-blind annotation study and, using the results of this annotation, demonstrates a preliminary detector of the news discourse function of paragraphs in news articles. The fourth step of my work, detecting motific usage automatically, consists of a large-scale pipeline that achieves moderate performance. This pipeline is the first work towards automatically detecting motific usage of motifs and beats out simple baselines while comparing favorably to and generalizing better than a simple neural network baseline system. Additionally, the pipeline uses explainable features that can be used in future work to further develop our understanding of how humans automatically detect motifs. Finally, I describe an exploration of the broader scope of narrative elements that communicate information between individuals who share a cultural or sub-cultural background. This work is based on a small-scale, in-lab annotation of posts from the "incel" subculture, a niche internet community with extremist elements and, at times, disturbing content. This small annotation has revealed a complex landscape encompassing fourteen categories, more than three times as many elements as the large-scale annotation, many of which resemble the moving parts of other theories on narrative structure and cognition, including Vladimir Propp's morphology of folktales and Silvan Tomkins' script theory. I describe these relations and provide a rough continuum of the landscape of narrative communication.

    Predicting Off-Target Potential of CRISPR-Cas9 Single Guide RNA

    With advancements in the field of genome engineering, researchers have come up with potential ways for site-specific gene editing. One of the methods uses the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas technology. It consists of a Cas9 nuclease and a single guide RNA (sgRNA) that directs the nuclease to cleave the DNA at the intended target site. However, the target genome could contain multiple potential off-target sites, and cleaving an off-target site can have deleterious effects in the case of gene editing in humans. Lab-based assays have been developed to test the off-target effects of guide RNAs. However, it is not feasible to scale these assays for reasons related to cost and labor. The use of machine learning models to compute the off-target potential makes these calculations cheaper and scalable. Both classification and regression can be used to solve this problem. In this project, we explore three classification models: Support Vector Machines (SVM), Logistic Regression and Convolutional Neural Networks (CNN).

    Integrating Cultural Knowledge into Artificially Intelligent Systems: Human Experiments and Computational Implementations

    With the advancement of Artificial Intelligence, it seems as if every aspect of our lives is impacted by AI in one way or the other. As AI is used for everything from driving vehicles to criminal justice, it becomes crucial that it overcome any biases that might hinder its fair application. We are constantly trying to make AI be more like humans. But most AI systems so far fail to address one of the main aspects of humanity: our culture and the differences between cultures. We cannot truly consider AI to have understood human reasoning without understanding culture. So it is important for cultural information to be embedded into AI systems in some way, as well as for the AI systems to understand the differences across these cultures. The main way I have chosen to do this is by using two cultural markers: motifs and rituals. This is because they are both so inherently part of any culture. Motifs are things that are repeated often and are grounded in well-known stories, and tend to be very specific to individual cultures. Rituals are part of every culture in some way, and while some are constant across all cultures, others are very specific to individual ones. This makes them great to compare and to contrast. The first two parts of this dissertation describe two cognitive psychology studies I conducted. The first examines how people understand motifs. Is it true that in-culture people identify motifs better than out-culture people? My study shows this to indeed be the case. The second study attempts to test if motifs are recognizable in texts, regardless of whether or not people might understand their meaning. Our results confirm our hypothesis that motifs are recognizable. The third part of my work discusses the survey and data collection effort around rituals. I collected data about rituals from people from various national groups, and observed the differences in their responses. The main results of this effort were twofold: first, a demonstration that cultural differences across groups are quantifiable, and that they are prevalent and observable with proper effort; and second, the collection and curation of a substantial culturally sensitive dataset that can have a wide variety of uses across various AI systems. The fourth part of the dissertation focuses on a system I built, called the motif association miner, which provides information about motifs present in input text, such as associations, sources of motifs, and connotations. This information will be highly useful because future systems can use my output as their input and gain a better understanding of motifs, especially as this shows an approach for bringing the meaning of motifs specific to a certain culture into wider usage. As the final contribution, this thesis details my efforts to use the curated ritual data to improve an existing question answering system, and shows that this method helps systems perform better in situations which vary by culture. This data and approach, which will be made publicly available, will enable others in the field to take advantage of the information contained within to try and combat some bias in their systems.

    Identification of StBEL5 RNA as a long-distance mobile signal in short-day facilitated tuber formation

    In potato, the BEL1-like transcription factor StBEL5 and its protein partner POTH1 regulate tuber formation by mediating hormone levels in the stolon tip. Heterografting experiments show that StBEL5 mRNA can move across the graft union to localize in stolon tips and enhance tuber formation. Over-expression of StBEL5 full-length transcripts including the untranslated regions (UTRs) endows transgenic lines with the capacity to overcome the long-day inhibitory effects on tuber formation. In this study, the precise localization of endogenous StBEL5 mRNA and other gene-specific transcripts in the vascular tissues was determined by laser microdissection coupled to laser pressure catapulting (LMPC) followed by RT-PCR. The results demonstrate the presence of StBEL5 mRNA in phloem cells, which is consistent with its role as a mobile RNA. StBEL5 full-length transcripts exhibited better mobility than the UTR-truncated form and StBEL14 transcripts in agroinfiltration experiments. No translation enhancement was observed for full-length StBEL5 transcripts compared to the UTR-truncated transcripts in an in vitro translation assay. This indicates involvement of the StBEL5 UTRs in facilitating its long-distance movement. Further studies applying EMSA and northwestern blotting to identify StBEL5 mRNA binding proteins may provide pivotal cues for understanding the mechanism of phloem-delivered StBEL5 RNA in this long-distance signaling pathway during tuber formation.