25 research outputs found

    NRC Russian-English Machine Translation System for WMT 2016

    We describe the statistical machine translation system developed at the National Research Council of Canada (NRC) for the Russian-English news translation task of the First Conference on Machine Translation (WMT 2016). Our submission is a phrase-based SMT system that tackles the morphological complexity of Russian through comprehensive use of lemmatization. The core of our lemmatization strategy is to use different views of Russian for different SMT components: word alignment and bilingual neural network language models use lemmas, while sparse features and reordering models use fully inflected forms. Some components, such as the phrase table, use both views of the source. Russian words that remain out-of-vocabulary (OOV) after lemmatization are transliterated into English using a statistical model trained on examples mined from the parallel training corpus. The NRC Russian-English MT system achieved the highest uncased BLEU and the lowest TER scores among the eight participants in WMT 2016.

    Complete Sequencing of pNDM-HK Encoding NDM-1 Carbapenemase from a Multidrug-Resistant Escherichia coli Strain Isolated in Hong Kong

    BACKGROUND: The emergence of plasmid-mediated carbapenemases, such as NDM-1, in Enterobacteriaceae is a major public health issue. Since they mediate resistance to virtually all β-lactam antibiotics and there is often co-resistance to other antibiotic classes, the therapeutic options for infections caused by these organisms are very limited. METHODOLOGY: We characterized the first NDM-1 producing E. coli isolate recovered in Hong Kong. The plasmid encoding the metallo-β-lactamase gene was sequenced. PRINCIPAL FINDINGS: The plasmid, pNDM-HK, readily transferred to E. coli J53 at high frequencies. It belongs to the broad host range IncL/M incompatibility group and is 88803 bp in size. Sequence alignment showed that pNDM-HK has a 55 kb backbone, which shared 97% homology with pEL60 originating from the plant pathogen Erwinia amylovora in Lebanon, and a 28.9 kb variable region. The plasmid backbone includes the mucAB genes mediating ultraviolet light resistance. The 28.9 kb region has a composite transposon-like structure which includes intact or truncated genes associated with resistance to β-lactams (bla(TEM-1), bla(NDM-1), Δbla(DHA-1)), aminoglycosides (aacC2, armA), sulphonamides (sul1) and macrolides (mel, mph2). It also harbors the following mobile elements: IS26, ISCR1, tnpU, tnpAcp2, tnpD, ΔtnpATn1 and insL. Certain blocks within the 28.9 kb variable region had homology with the corresponding sequences in the widely disseminated plasmids pCTX-M3, pMUR050 and pKP048, originating from bacteria in Poland in 1996, in Spain in 2002 and in China in 2006, respectively. SIGNIFICANCE: The genetic support of the NDM-1 gene suggests that it has evolved through complex pathways. Its association with a broad host range plasmid and multiple mobile genetic elements explains its observed horizontal mobility in multiple bacterial taxa.

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
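    As a rough illustration of the calibration idea in the abstract above, the sketch below fits a single particles-per-OD scale factor to a serial dilution and reports fold residuals. It is not the study's protocol: the starting particle count, the two-fold dilution factor, the geometric-mean fit, and all function names are assumptions made for this example, and the OD readings are idealized synthetic values.

```python
import numpy as np


def dilution_series(start_count, steps, factor=2.0):
    """Particle counts along a serial dilution: each step divides by `factor`."""
    return start_count / factor ** np.arange(steps)


def fit_particles_per_od(counts, od):
    """Fit one scale factor (particles per OD unit).

    Uses the geometric mean of count/OD ratios, so every dilution
    level carries equal weight on a fold (log) scale.
    """
    return float(np.exp(np.mean(np.log(counts / od))))


def fold_residuals(counts, od, scale):
    """Fold error per point: max(pred/actual, actual/pred), always >= 1."""
    ratio = (scale * od) / counts
    return np.maximum(ratio, 1.0 / ratio)


# Hypothetical numbers, for illustration only.
counts = dilution_series(start_count=3.0e8, steps=6)
od = counts / 1.0e9  # idealized instrument: OD exactly proportional to count
scale = fit_particles_per_od(counts, od)
residuals = fold_residuals(counts, od, scale)
```

    With real plate-reader data, points whose fold residual grows at the ends of the series would suggest readings outside the instrument's linear range.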

    Using Semantic Role Labels to Reorder Statistical Machine Translation Output

    No full text
    In memory of my grandfather, LAW Yuk Kau, who passed away several months before I started this study. He gave me strength, kept me determined in my research and helped me realize my career goal. He will always live in my heart. ACKNOWLEDGMENTS: I would like to express my deepest thanks to my supervisor, Professor Dekai Wu. Not only has he shared with me his insightful ideas and comments on my research, he has also supported me throughout the whole program of my study. From the time I approached him showing an interest in pursuing a postgraduate degree to now as I graduate, he has always been positive about my ability and work. His encouragement and indulgence have enabled me to complete my study. I would also like to thank Professor Pascale Fung. This thesis work has been developed on top of her research group's product, C-ASSERT. Her generosity in sharing research ideas has been a key factor in the completion of this thesis. I am also grateful to her and Professor Brian Mak for sparing their valuable time to join my defense committee. Thanks go to Ms. Shuana Dalton and Miss Joanne Ng for helping me to review my thesis.

    Supporting Data for "Advancing Biophysical Cytometry with Deep-learning (ABCD) for New-Generation Cell-based Diagnostics"

    No full text
    There are 7 types of lung cancer cells in total (i.e. H69, H358, H520, H526, H1975, H2170 and HCC827). All the data were collected on 7 days using the multi-ATOM setup, giving 3 batches per cell line. Both single-cell brightfield and quantitative phase images (QPI) were collected. For training and testing the beGAN model, the data were separated into "Train", "Valid" and "Test" sets. They are subsampled and contain 500 cells each as a demonstration in this repository (Folder Dataset). Data were uploaded in .mat format, with brightfield images in _BF.mat and QPI in _QPI.mat. The images are stored in the format ImageHeight * ImageWidth * NoOfCells with a field of view of 45 μm.
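    The ImageHeight * ImageWidth * NoOfCells layout described above puts the cell index on the last axis, so per-cell processing usually starts by moving that axis to the front. The sketch below shows this with NumPy on a placeholder array. The image size (4 x 5) and the function name are hypothetical; the description gives neither the image dimensions nor the variable names stored inside the .mat files, so file loading is left out.

```python
import numpy as np


def cells_first(stack):
    """Reorder an (ImageHeight, ImageWidth, NoOfCells) stack to (NoOfCells, H, W).

    np.moveaxis returns a view, so no image data is copied.
    """
    if stack.ndim != 3:
        raise ValueError("expected a 3-D array: height x width x cells")
    return np.moveaxis(stack, -1, 0)


# Placeholder stack: 500 cells (as in the demonstration subsets),
# with a hypothetical 4 x 5 pixel image size.
stack = np.zeros((4, 5, 500))
stack[:, :, 7] = 1.0  # mark one cell so the reordering can be checked

cells = cells_first(stack)
first_marked = cells[7]  # the full 4 x 5 image of cell index 7
```

    After reordering, `cells[i]` is the image of cell `i`, which is the layout most Python image and deep-learning tooling expects for a batch.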