    Towards shared datasets for normalization research

    In this paper we present Dutch and English datasets that can serve as a gold standard for evaluating text normalization approaches. Combining text messages, message board posts and tweets, these datasets represent a variety of user-generated content. All data were manually normalized to their standard form using newly developed guidelines. We perform automatic lexical normalization experiments on these datasets using statistical machine translation techniques. We focus on both the word and character level and find that we can improve the BLEU score by ca. 20% for both languages. Before this user-generated content can be released publicly to the research community, some issues first need to be resolved. These are discussed in closer detail by focussing on the current legislation and by investigating previous similar data collection projects. With this discussion we hope to shed some light on the various difficulties researchers face when trying to share social media data.

    Normalization of Dutch user-generated content

    This paper describes a phrase-based machine translation approach to normalizing Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to obtain a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system, in which a token-based module is followed by a translation at the character level, gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.

    Design and synthesis of a series of truncated neplanocin fleximers

    In an effort to study the effects of flexibility on enzyme recognition and activity, we have developed several different series of flexible nucleoside analogues in which the purine base is split into its respective imidazole and pyrimidine components. The focus of this particular study was to synthesize the truncated neplanocin A fleximers and to investigate their potential anti-protozoan activities by inhibition of S-adenosylhomocysteine hydrolase (SAHase). The three fleximers tested displayed poor trypanocidal activity, with EC50 values around 200 μM. Further studies of the corresponding ribose fleximers, most closely related to the natural nucleoside substrates, revealed low affinity for the known T. brucei nucleoside transporters P1 and P2, which may be the reason for the lack of trypanocidal activity observed.

    DMRT5, DMRT3, and EMX2 Cooperatively Repress Gsx2 at the Pallium-Subpallium Boundary to Maintain Cortical Identity in Dorsal Telencephalic Progenitors

    Specification of dorsoventral regional identity in progenitors of the developing telencephalon is a first pivotal step in the development of the cerebral cortex and basal ganglia. Previously, we demonstrated that the two zinc finger doublesex and mab-3 related (Dmrt) genes, Dmrt5 (Dmrta2) and Dmrt3, which are coexpressed in high caudomedial to low rostrolateral gradients in the cerebral cortical primordium, are separately needed for normal formation of the cortical hem, hippocampus, and caudomedial neocortex. We have now addressed the role of Dmrt3 and Dmrt5 in controlling dorsoventral division of the telencephalon in mice of either sex by comparing the phenotypes of single knock-out (KO) with double KO embryos and by misexpressing Dmrt5 in the ventral telencephalon. We find that DMRT3 and DMRT5 act as critical regulators of progenitor cell dorsoventral identity by repressing ventralizing regulators. Early ventral fate transcriptional regulators expressed in the dorsal lateral ganglionic eminence, such as Gsx2, are upregulated in the dorsal telencephalon of Dmrt3;Dmrt5 double KO embryos and downregulated when ventral telencephalic progenitors express ectopic Dmrt5. Conditional overexpression of Dmrt5 throughout the telencephalon produces gene expression and structural defects that are highly consistent with reduced GSX2 activity. Further, Emx2;Dmrt5 double KO embryos show a phenotype similar to Dmrt3;Dmrt5 double KO embryos, and DMRT3, DMRT5, and the homeobox transcription factor EMX2 all bind to a ventral telencephalon-specific enhancer in the Gsx2 locus. Together, our findings uncover cooperative functions of DMRT3, DMRT5, and EMX2 in dividing dorsal from ventral in the telencephalon. Significance statement: We identified the DMRT3 and DMRT5 zinc finger transcription factors as novel regulators of dorsoventral patterning in the telencephalon. Our data indicate that they have overlapping functions and compensate for one another. The double, but not the single, knock-out produces a dorsal telencephalon that is ventralized, and olfactory bulb tissue takes over most remaining cortex. Conversely, overexpressing Dmrt5 throughout the telencephalon causes expanded expression of dorsal gene determinants and smaller olfactory bulbs. Furthermore, we show that the homeobox transcription factor EMX2, which is coexpressed with DMRT3 and DMRT5 in cortical progenitors, cooperates with them to maintain dorsoventral patterning in the telencephalon. Our study suggests that DMRT3/5 function with EMX2 in positioning the pallial-subpallial boundary by antagonizing the ventral homeobox transcription factor GSX2.

    Inhibition of Monkeypox virus replication by RNA interference

    The Orthopoxvirus genus of the Poxviridae family comprises several human pathogens, including cowpox (CPXV), vaccinia (VACV), monkeypox (MPV) and variola (VARV) viruses. Species of this genus cause human diseases of varying severity, with outcomes ranging from mild conditions to death in fulminating cases. Currently, vaccination is the only protective measure against infection with these viruses and no licensed antiviral drug therapy is available. In this study, we investigated the potential of the RNA interference (RNAi) pathway as a therapeutic approach for orthopoxvirus infections using MPV as a model. Based on genome-wide expression studies and bioinformatic analysis, we selected 12 viral genes and targeted them with small interfering RNA (siRNA). Forty-eight siRNA constructs were developed and evaluated in vitro for their ability to inhibit viral replication. Two genes, each targeted with four different siRNA constructs in one pool, were limiting to viral replication. Seven siRNA constructs from these two pools, targeting either an essential gene for viral replication (A6R) or an important gene in viral entry (E8L), inhibited viral replication in cell culture by 65-95% with no apparent cytotoxicity. Further analysis with wild-type and recombinant MPV expressing green fluorescent protein demonstrated that one of these constructs, siA6-a, was the most potent and inhibited viral replication for up to 7 days at a concentration of 10 nM. These results emphasize the essential role of the A6R gene in viral replication, and demonstrate the potential of RNAi as a therapeutic approach for developing oligonucleotide-based drug therapy for MPV and other orthopoxviruses.

    b3galt6 Knock-out zebrafish recapitulate β3GalT6-deficiency disorders in human and reveal a trisaccharide proteoglycan linkage region

    Proteoglycans are structurally and functionally diverse biomacromolecules found abundantly on cell membranes and in the extracellular matrix. They consist of a core protein linked to glycosaminoglycan chains via a tetrasaccharide linkage region. Here, we show that CRISPR/Cas9-mediated b3galt6 knock-out zebrafish, lacking galactosyltransferase II, which adds the third sugar in the linkage region, largely recapitulate the phenotypic abnormalities seen in human β3GalT6-deficiency disorders. These comprise craniofacial dysmorphism, generalized skeletal dysplasia, skin involvement and indications of muscle hypotonia. In-depth TEM analysis revealed disturbed collagen fibril organization as the most consistent ultrastructural characteristic throughout different affected tissues. Strikingly, despite a strong reduction in glycosaminoglycan content, as demonstrated by anion-exchange HPLC, subsequent LC-MS/MS analysis revealed a small amount of proteoglycans containing a unique linkage region consisting of only three sugars. This implies that formation of glycosaminoglycans with an immature linkage region is possible in a pathogenic context. Our study therefore unveils a novel rescue mechanism for proteoglycan production in the absence of galactosyltransferase II, thereby opening new avenues for therapeutic intervention.

    Efficient mouse transgenesis using Gateway-compatible ROSA26 locus targeting vectors and F1 hybrid ES cells

    The ability to rapidly and efficiently generate reliable Cre/loxP conditional transgenic mice would greatly complement global high-throughput gene targeting initiatives aimed at identifying gene function in the mouse. We report here the generation of Cre/loxP conditional ROSA26-targeted ES cells within 3–4 weeks by using Gateway® cloning to build the target vectors. The cDNA of the gene of interest can be expressed either directly by the ROSA26 promoter, providing a moderate level of expression, or by a CAGG promoter placed in the ROSA26 locus, providing higher transgene expression. Utilization of F1 hybrid ES cells with exceptional developmental potential allows the production of germ line transmitting, fully or highly ES cell-derived mice by aggregation of cells with diploid embryos. The streamlined procedures presented here accelerate the examination of the phenotypic consequences of transgene expression. They also provide a unique tool for comparing the biological activity of polymorphic or splice variants of a gene, or of products of different genes functioning in the same or parallel pathways in an overlapping manner.