    Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

    Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). But many non-Latin languages have hieroglyphic writing systems, involving a big alphabet with thousands or millions of characters. Each character is composed of even smaller parts, which are often ignored by previous work. In this paper, we propose a novel architecture employing two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character level representations and capture a deeper level of semantic meaning. To build a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a case study. Chinese is a typical example of such languages: every character contains several components called radicals. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without extra Traditional-to-Simplified conversion; this end-to-end approach significantly simplifies word segmentation compared to previous work. Radical-level embeddings can also capture deeper semantic meaning below the character level and improve learning performance. By tying radical and character embeddings together, the parameter count is reduced while semantic knowledge is shared and transferred between the two levels, substantially boosting performance. On 3 out of 4 Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to 0.4%. Our results are reproducible; source code and corpora are available on GitHub. Comment: Accepted & forthcoming at ITNG-201

    catena-Poly[[bis(μ-3-carboxybenzoato)bis(1,10-phenanthroline)tricopper(II)]-di-μ3-isophthalato]

    The title copper coordination polymer, [Cu3(C8H4O4)2(C8H5O4)2(C10H8N2)2]n, was synthesized by reacting Cu(NO3)2, isophthalic acid and 1,10-phenanthroline under hydrothermal conditions. The trinuclear unit presents a central, almost planar CuO4 chromophore with the cation on a symmetry center, and two symmetry-related CuN2O3 groups with the metal centre in a distorted square-pyramidal environment. These units are bridged by isophthalate ligands into one-dimensional double-chain coordination polymers which are, in turn, connected by various π–π stacking interactions (face-to-face distance ca. 3.45 Å) and O—H⋯O hydrogen bonds, forming a three-dimensional supramolecular network.

    Expanding CRISPR/Cas9 Genome Editing Capacity in Zebrafish Using SaCas9.

    The type II CRISPR/Cas9 system has been used widely for genome editing in zebrafish. However, the requirement for the 5'-NGG-3' protospacer-adjacent motif (PAM) of Cas9 from Streptococcus pyogenes (SpCas9) limits the sequences it can target. Here, we report that a Cas9 ortholog from Staphylococcus aureus (SaCas9), and its KKH variant, successfully induced targeted mutagenesis with high frequency in zebrafish. Confirming previous findings, the SpCas9 variant, VQR, can also induce targeted mutations in zebrafish. Bioinformatics analysis of these new Cas targets suggests that the number of available target sites in the zebrafish genome can be greatly expanded. Collectively, the expanded target repertoire of Cas9 in zebrafish should further facilitate the utility of this organism for genetic studies of vertebrate biology.
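To illustrate how additional PAM specificities can enlarge the set of candidate target sites, here is a minimal sketch that counts PAM matches on the forward strand of a DNA string. Only the 5'-NGG-3' motif for SpCas9 comes from the abstract above; the other motifs (NGA for VQR, NNGRRT for SaCas9, NNNRRT for KKH) are assumptions taken from the general literature, and the function names are hypothetical. It is not the authors' bioinformatics pipeline, which would also need to scan the reverse strand and check protospacer length.

```python
import re

# IUPAC degenerate nucleotide codes needed for the motifs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# PAM motifs: NGG is stated in the abstract; the rest are assumptions
# from the general literature, listed here for illustration only.
PAMS = {
    "SpCas9": "NGG",
    "SpCas9-VQR": "NGA",
    "SaCas9": "NNGRRT",
    "SaCas9-KKH": "NNNRRT",
}


def pam_regex(pam):
    """Compile a PAM motif into a regex that also finds overlapping sites."""
    body = "".join(IUPAC[base] for base in pam.upper())
    # A zero-width lookahead lets consecutive matches overlap.
    return re.compile(f"(?=({body}))")


def count_pam_sites(seq, pam):
    """Count forward-strand positions in seq where the PAM motif matches."""
    return len(pam_regex(pam).findall(seq.upper()))


if __name__ == "__main__":
    toy = "ACGGTGAGTTAGGATCGAAT"  # made-up sequence, not zebrafish data
    for name, pam in PAMS.items():
        print(name, pam, count_pam_sites(toy, pam))
```

On a real genome the same counting would be run over both strands; the toy sequence only shows that each added motif contributes sites that an NGG-only search would miss.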