123 research outputs found

    Advanced Techniques for the Decipherment of Ancient Scripts

    This contribution explores modern and traditional approaches to the decipherment of ancient writing systems. It surveys methods used by paleographers and epigraphers, as well as state-of-the-art applications of computational linguistics, such as models based on neural networks. It frames the contextual problems scholars encounter in dealing with ancient codes, the situations and preconditions of the unknown codes, their idiosyncrasies and peculiarities, and the potential solutions afforded by both traditional and novel methods of investigation.

    Itzulpen automatiko gainbegiratu gabea (Unsupervised machine translation)

    192 p. Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work, along with other contemporaneous developments, is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.

    Exploitation and engineering of lipopeptide biosynthesis in myxobacteria

    To gain a deep understanding of lipopeptide biosynthesis in myxobacteria, a comprehensive screening of myxobacterial genomes was first carried out in the course of this thesis, leading to the identification and characterization of novel lipopeptide biosynthetic pathways. Following this strategy, four previously unknown lipopeptide cores were predicted and then structurally characterized, confirming the predicted structures. On the basis of detailed sequence analyses of the underlying biosynthetic pathways, the structural differences between the lipopeptide cores could be rationalized on a genetic basis. These studies also contributed to the elucidation of the genetic mechanisms by which the different biosynthetic pathways have evolved. Furthermore, the identified lipopeptide biosynthetic pathways were used as model systems to establish synthetic expression platforms. In the course of this thesis, a versatile assembly strategy for the construction of artificial lipopeptide gene clusters was developed, which allowed the generation and heterologous expression of unnatural lipopeptide biosynthetic pathways based on an established gene library via combinatorial biosynthesis. These studies led to the production of five novel lipopeptide scaffolds and impressively demonstrate the huge potential of synthetic biology techniques compared to classical approaches. Moreover, the described strategy allows the rapid modification of the artificial biosynthetic pathways.

    The potential of automatic word comparison for historical linguistics

    The amount of data from languages spoken all over the world is rapidly increasing. Traditional manual methods in historical linguistics need to face the challenges brought by this influx of data. Automatic approaches to word comparison could provide invaluable help in pre-analyzing data, which can later be enhanced by experts. In this way, computational approaches can take care of the repetitive and schematic tasks, leaving experts to concentrate on answering interesting questions. Here we test the potential of automatic methods to detect etymologically related words (cognates) in cross-linguistic data. Using a newly compiled database of expert cognate judgments across five different language families, we compare how well different automatic approaches distinguish related from unrelated words. Our results show that automatic methods can identify cognates with a very high degree of accuracy, reaching 89% for the best-performing method, Infomap. We identify the specific strengths and weaknesses of these different methods and point to major challenges for future approaches. Current automatic approaches for cognate detection, although not perfect, could become an important component of future research in historical linguistics. As part of the GlottoBank Project, this work was supported by the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand Marsden Fund grant 13-UOA-121. This paper was further supported by the DFG research fellowship grant 261553824 “Vertical and lateral aspects of Chinese dialect history” (JML), and the Australian Research Council’s Discovery Projects funding scheme (project number DE120101954, SJG).
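
As a rough illustration of what automatic cognate detection involves, the sketch below groups word forms by normalized edit distance and single-linkage clustering. This is a deliberately simple baseline chosen for illustration; it is not Infomap and not necessarily any of the methods compared in the paper. The function names, the threshold value, and the sample word forms are all assumptions made for the example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_cognates(words, threshold=0.4):
    """Link every pair at or below the threshold; return connected components."""
    parent = list(range(len(words)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(words)), 2):
        if normalized_distance(words[i], words[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)
    return sorted(groups.values())
```

Calling `cluster_cognates(["hand", "hant", "mano", "mao"])` separates the two pairs of similar forms. Real systems typically work on phonetic transcriptions and sound-class-aware alignments rather than raw spelling, so this sketch should be read only as a picture of the pairwise-score-then-cluster workflow.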

    Regulation of the kRAS Promoter in Pancreatic Cancer by Proteins and Small Molecules

    DNA-binding proteins play a pivotal role in cell biology. The major class of DNA-binding proteins is transcription factors (TFs). TFs are central to almost every fundamental cellular process, such as cell development, differentiation, cell growth, and gene expression. They account for 10% of the genes in eukaryotes. In mammals, more than 700 TFs have been identified as DNA-binding TFs. They bind to the TF binding sites (TFBSs) in the genome and regulate the expression of their target genes. kRAS is a proto-oncogene with intrinsic GTPase activity that contributes to cell proliferation, division, and apoptosis. kRAS mutations are observed in >95% of pancreatic adenocarcinomas and in 30% of all human tumors. Pancreatic cancer is the fourth most deadly cancer, with a 5-year survival rate of ~6%. When kRAS is mutated, it becomes constitutively active and drives uncontrolled proliferation, which results in increased tumorigenicity and poor prognosis. Other than mutation, kRAS gene amplification, overexpression, or increased upstream activation is also observed. Downregulating kRAS expression has been shown to halt proliferation and lead to cellular death in pancreatic cancer models, but to date no small molecule capable of silencing expression has been described. Moreover, the kRAS promoter region is G-rich and is a hot spot for binding of TFs. TF binding and function with respect to kRAS transcription is not yet mapped, leaving a gap in the understanding of kRAS transcriptional regulation. In the current study, our purpose was to (a) identify and evaluate the function and binding interactions of TFs in the regulation of kRAS, with a particular focus on two putative G-quadruplex (G4)-forming regions (herein termed near and mid) and the core region from 0 to +50 relative to the transcriptional start site, and (b) evaluate the effect of novel G4-stabilizing compounds on kRAS expression. 
This study evaluated biological effects both in an isolated system with plasmids in HEK-293 cells by luciferase assay, and in complex in vitro milieus within pancreatic cancer cell lines by RT-qPCR. Protein changes were evaluated by western blotting. TF binding to the kRAS promoter was predicted from consensus binding sites using online tools; direct binding was probed by Qiagen and by us using a promoter binding array kit, and by DNA pulldown followed by LC-MS/MS. EMSA was utilized for binding studies, and effects on G4 formation and stability profile were probed by ECD. For the identification of kRAS-G4 interactive molecules we used FRET, ECD, luciferase assay and RT-qPCR. This mapping of TF binding to the kRAS promoter, the demonstration of their function as transcriptional silencers and activators, and the identification of G4-interactive molecules form an important piece of the puzzle associated with kRAS regulation.

    Systems protobiology: Origin of life in lipid catalytic networks

    Life is that which replicates and evolves, but there is no consensus on how life emerged. We advocate a systems protobiology view, whereby the first replicators were assemblies of spontaneously accreting, heterogeneous and mostly non-canonical amphiphiles. This view is substantiated by rigorous chemical kinetics simulations of the graded autocatalysis replication domain (GARD) model, based on the notion that the replication or reproduction of compositional information predated that of sequence information. GARD reveals the emergence of privileged non-equilibrium assemblies (composomes), which portray catalysis-based homeostatic (concentration-preserving) growth. Such a process, along with occasional assembly fission, embodies cell-like reproduction. GARD pre-RNA evolution is evidenced in the selection of different composomes within a sparse fitness landscape, in response to environmental chemical changes. These observations refute claims that GARD assemblies (or other mutually catalytic networks in the metabolism-first scenario) cannot evolve. Composomes represent both a genotype and a selectable phenotype, anteceding present-day biology in which the two are mostly separated. Detailed GARD analyses show attractor-like transitions from random assemblies to self-organized composomes, with negative entropy change, thus establishing composomes as dissipative systems, hallmarks of life. We show a preliminary new version of our model, metabolic GARD (M-GARD), in which lipid covalent modifications are orchestrated by non-enzymatic lipid catalysts, themselves compositionally reproduced. M-GARD fills the gap of the lack of true metabolism in basic GARD, and is rewardingly supported by a published experimental instance of a lipid-based mutually catalytic network. 
Anticipating near-future far-reaching progress of molecular dynamics, M-GARD is slated to quantitatively depict elaborate protocells, with orchestrated reproduction of both lipid bilayer and lumenal content. Finally, a GARD analysis in a whole-planet context offers the potential for estimating the probability of life's emergence. The invigorated GARD scrutiny presented in this review enhances the validity of autocatalytic sets as a bona fide early evolution scenario and provides essential infrastructure for a paradigm shift towards a systems protobiology view of life's origin.
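
To make the idea of catalysis-based compositional growth more concrete, here is a minimal sketch of one explicit Euler step for a small assembly. The abstract does not state the model equations, so the rate law in the docstring is an assumption based on the basic GARD formulation as commonly presented in the literature. The function names, parameter names (`kf`, `kb`, `rho`, `beta`), and numerical values are hypothetical and chosen only for illustration; fission and stochastic effects are left out.

```python
def gard_step(n, rho, beta, kf, kb, dt):
    """Advance molecule counts n by one explicit Euler step.

    Assumed rate law (not given in the abstract):
        dn_i/dt = (kf * rho_i * N - kb * n_i) * (1 + sum_j beta[i][j] * n_j / N)
    where N = sum(n) is the current assembly size, rho_i is the environmental
    concentration of species i, and beta[i][j] is the rate enhancement that
    species j exerts on the exchange of species i.
    """
    total = sum(n)
    if total <= 0:
        raise ValueError("assembly must contain some material")
    updated = []
    for i, n_i in enumerate(n):
        # Mutual catalysis by the molecules already present in the assembly.
        enhancement = 1.0 + sum(b * n_j for b, n_j in zip(beta[i], n)) / total
        # Uncatalyzed joining minus leaving for species i.
        base_rate = kf * rho[i] * total - kb * n_i
        updated.append(max(n_i + dt * base_rate * enhancement, 0.0))
    return updated


def composition(n):
    """Return the fraction of each species in the assembly."""
    total = sum(n)
    return [x / total for x in n]
```

With `rho = [0.5, 0.5]` and `beta = [[0.0, 10.0], [0.0, 0.0]]`, a call such as `gard_step([1.0, 1.0], rho, beta, kf=2.0, kb=1.0, dt=0.1)` grows species 0 faster than species 1, because species 1 catalyzes the uptake of species 0. Repeating such steps, and splitting the assembly when it reaches a size limit, is the kind of loop in which compositional biases can be inherited.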