
    Multi-Domain Neural Machine Translation

    In this thesis we present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating. The core idea is to treat text domains as distinct languages and use multilingual NMT methods to create multi-domain translation systems; we show that this approach yields significant translation quality gains over fine-tuning. We also propose an approach to unsupervised domain assignment and explore whether knowledge of pre-specified text domains is necessary. It turns out that it is, but also that when the domains are not known, quite high translation quality can still be reached, in some cases even higher than with known domains. Additionally, we explore the possibility of intra-language style adaptation through zero-shot translation. We show that this approach is able to adapt style, although with unresolved text deterioration issues.

    Description and Application of Gene Expression Data Analysis Method Barcode

    The main goals of this thesis are to verify whether the gene expression data analysis method Barcode offers an improvement over the method fRMA and to visualise the differences clearly. The first, descriptive part of the thesis focuses on the gene expression data analysis method Barcode. Barcode is explained by presenting an overview of its different versions. For each version, a description of its functionalities and possible uses is given, with emphasis on functionalities that are new compared to older versions. The second, practical part of the thesis compares Barcode and fRMA (the output of fRMA is the starting point for Barcode analysis). To compare the two methods, a human gene expression dataset of DNA microarray experiment results is used. The dataset E-TABM-145 contains expression data from 158 human tissue samples. The tissue samples are first grouped manually, and these groups serve as the reference for evaluating both methods. The data are then processed with both Barcode and fRMA. To visualise and compare the results, two statistical methods are used separately: principal component analysis and hierarchical clustering. The outputs of both statistical methods are analysed and compared between Barcode and fRMA. The comparison leads to the conclusion that Barcode does offer an improvement over fRMA: Barcode allows tissue samples to be classified into the correct clusters more accurately, and samples of the same tissue type are separated better from the other samples than with fRMA.

    Multi-Domain Neural Machine Translation

    We present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating. The core idea is to treat text domains as distinct languages and use multilingual NMT methods to create multi-domain translation systems; we show that this approach results in significant translation quality gains over fine-tuning. We also explore whether knowledge of pre-specified text domains is necessary. It turns out that it is, but also that when the domains are not known, quite high translation quality can still be reached. Comment: Accepted to EAMT'2018. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation (EAMT'2018).

    Structure of the Borrelia burgdorferi ATP-dependent metalloprotease FtsH in its functionally relevant hexameric form

    Publisher Copyright: © 2023 The Authors. ATP-dependent FtsH proteases are conserved in bacteria, mitochondria, and chloroplasts, where they play an essential role in the degradation of misfolded or unneeded membrane and cytosolic proteins. It has also been demonstrated that the FtsH homologous protein BB0789 is crucial for mouse and tick infectivity and for in vitro growth of the Lyme disease-causing agent Borrelia burgdorferi. This is not surprising considering the complex life cycle of B. burgdorferi, which resides in both mammals and ticks and requires a wide range of membrane proteins and short-lived cytosolic regulatory proteins to invade and persist in the host organism. In the current study, we have solved the crystal structure of the cytosolic region of BB0789 (residues 166-614), which lacks both the N-terminal transmembrane α-helices and the small periplasmic domain. The structure revealed the arrangement of the AAA+ ATPase and the zinc-dependent metalloprotease domains in a hexameric ring, which is essential for ATPase and proteolytic activity. The AAA+ domain was found in an ADP-bound state, while the protease domain showed coordination of a zinc ion by two histidine residues and one aspartic acid residue. The loop region that forms the central pore in the oligomer was poorly defined in the crystal structure and was therefore predicted with AlphaFold to supply the missing structural details, providing a complete picture of the functionally relevant hexameric form of BB0789. We confirmed that BB0789 is functionally active, possessing both protease and ATPase activities, thus providing novel structural-functional insights into a protein that is known to be absolutely necessary for B. burgdorferi to survive and cause Lyme disease. Peer reviewed.

    Multi-Domain Neural Machine Translation

    We present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating. The core idea is to treat text domains as distinct languages and use multilingual NMT methods to create multi-domain translation systems; we show that this approach results in significant translation quality gains over fine-tuning. We also explore whether knowledge of pre-specified text domains is necessary. It turns out that it is, but also that when the domains are not known, quite high translation quality can still be reached, in some cases even higher than with known domains. This work was supported by the Estonian Research Council grant no. 1226.

    Survival analysis of oropharyngeal squamous cell carcinoma patients linked to histopathology, disease stage, tumor stage, risk factors, and received therapy

    Publisher Copyright: © Experimental Oncology, 2020. Background: Survival of oropharyngeal squamous cell carcinoma (OSCC) patients depends on risk and environmental factors, tumor biology, and advances in diagnostics and treatment approaches. Aim: To perform a survival analysis of patients with OSCC treated over a 10-year period in a single hospital in Latvia, linking these data to histopathological findings, risk factors, and received therapy. Materials and Methods: The main outcome measures were overall and disease-specific survival (OS and DS), along with histopathology analysis. Results: Kaplan-Meier survival analysis showed better survival for females, younger patients, patients without bad habits, those who underwent surgery and received radiotherapy, and those with lower T grade and disease stage. Cox regression showed a reduced risk of early death in patients with lower T grade, no regional metastases (N0), and no bad habits, and in those who underwent surgery and received radiotherapy. The vast majority of tumors were localized in the palatine tonsils and the base of the tongue. Localization did not correlate with mean survival time or survival. Lower OS (p = 0.03) and DS (p = 0.026) were estimated for patients with pharyngeal wall and tonsillar involvement compared to tumors localized in the soft palate. The histological variant of the tumor appeared irrelevant for estimating OS and DS, whereas therapeutic modalities significantly affected survival. Conclusions: OSCC patients with lower T grade and N0 status, without bad habits, and treated surgically had better survival. Peer reviewed.

    The dimerization mechanism of the N-terminal domain of spider silk proteins is conserved despite extensive sequence divergence

    The N-terminal (NT) domain of spider silk proteins (spidroins) is crucial for their storage at high concentrations and also regulates silk assembly. NTs from the major ampullate spidroin (MaSp) and the minor ampullate spidroin are monomeric at neutral pH and confer solubility to spidroins, whereas at lower pH, they dimerize to interconnect spidroins in a fiber. This dimerization is known to result from modulation of electrostatic interactions by protonation of well-conserved glutamates, although it is undetermined if this mechanism applies to other spidroin types as well. Here, we determine the solution and crystal structures of the flagelliform spidroin NT, which shares only 35% identity with MaSp NT, and investigate the mechanisms of its dimerization. We show that flagelliform spidroin NT is structurally similar to MaSp NT and that the electrostatic intermolecular interaction between Asp 40 and Lys 65 residues is conserved. However, the protonation events involve a different set of residues than in MaSp, indicating that an overall mechanism of pH-dependent dimerization is conserved but can be mediated by different pathways in different silk types.

    Crystal structure and proteomics analysis of empty virus-like particles of Cowpea mosaic virus

    Empty virus-like particles (eVLPs) of Cowpea mosaic virus (CPMV) are currently being utilized as reagents in various biomedical and nanotechnology applications. Here, we report the crystal structure of CPMV eVLPs determined using X-ray crystallography at 2.3 Å resolution and compare it with the previously reported cryo-electron microscopy (cryo-EM) structure of eVLPs and with virion crystal structures. Although the X-ray and cryo-EM structures of eVLPs are mostly similar, there are significant differences at the C-terminus of the small (S) subunit. The intact C-terminus of the S subunit plays a critical role in enabling the efficient assembly of CPMV virions and eVLPs, but undergoes proteolysis after particle formation. In addition, we report the results of mass spectrometry-based proteomics analysis of coat protein subunits from CPMV eVLPs and virions, which show that the C-termini of S subunits undergo proteolytic cleavage at multiple sites rather than at the single cleavage site previously observed.

    Production and purification of chimeric HBc virus-like particles carrying influenza virus LAH domain as vaccine candidates

    Background: The lack of a universal influenza vaccine is a global health problem. Interest is now focused on structurally conserved protein domains capable of eliciting protection against a broad range of influenza virus strains. The long alpha helix (LAH) is an attractive vaccine component since it is one of the most conserved influenza hemagglutinin (HA) stalk regions. For an improved immune response, the LAH domain from an H3N2 strain has been incorporated into virus-like particles (VLPs) derived from hepatitis B virus core protein (HBc) using the recently developed tandem core technology. Results: Fermentation conditions for recombinant HBc-LAH were established in the yeast Pichia pastoris, and a rapid and efficient purification method for chimeric VLPs was developed to match the requirements for industrial scale-up. Purified VLPs induced strong antibody responses against both group 1 and group 2 HA proteins in mice. Conclusion: Our results indicate that the tandem core technology is a useful tool for incorporating the highly hydrophobic LAH domain into HBc VLPs. Chimeric VLPs can be successfully produced in a bioreactor using a yeast expression system. Immunologic data indicate that HBc VLPs carrying the LAH antigen represent a promising universal influenza vaccine component.