359 research outputs found

    ANETAC: Arabic named entity transliteration and classification dataset

    In this paper, we make freely accessible ANETAC, our English-Arabic named entity transliteration and classification dataset, which we built from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which can be Person, Location, or Organization. The ANETAC dataset is mainly aimed at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification. This dataset was developed and used as part of a previous research study by Hadj Ameur et al. [1].
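
    The triplet structure described above can be sketched as a small data type. This is only an illustration: the abstract does not state the dataset's file format, so the names `EntityClass`, `Instance`, and `class_counts` are hypothetical, and the three sample rows are made-up examples rather than entries taken from ANETAC.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class EntityClass(Enum):
    """The three classes named in the abstract."""
    PERSON = "Person"
    LOCATION = "Location"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class Instance:
    """One (e, a, c) triplet; field names are assumptions, not the dataset's own."""
    english: str        # e: English named entity
    arabic: str         # a: Arabic transliteration
    label: EntityClass  # c: entity class


def class_counts(instances: Iterable[Instance]) -> Counter:
    """Count how many instances fall into each class."""
    return Counter(inst.label for inst in instances)


# Illustrative rows only (not drawn from the dataset).
SAMPLE = [
    Instance("London", "لندن", EntityClass.LOCATION),
    Instance("Obama", "أوباما", EntityClass.PERSON),
    Instance("Microsoft", "مايكروسوفت", EntityClass.ORGANIZATION),
]

if __name__ == "__main__":
    for label, count in class_counts(SAMPLE).items():
        print(label.value, count)
```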

    Review of Temephos Discriminating Concentration for Monitoring the Susceptibility of Anopheles labranchiae (Falleroni, 1926), Malaria Vector in Morocco

    In Morocco, resistance of Anopheles labranchiae larvae to temephos is monitored using a discriminating concentration of 0.125 mg, which is half of the WHO recommended dose for Anopheles. However, this dosage seemed too high to allow early detection of resistance, and its revision was found necessary. The present study was carried out during May-June 2008 and 2009 in nine provinces in the north-west of the country. The aim was to determine the lethal concentration LC100 of temephos for the most susceptible populations and to define the discriminating dosage as double this value. The bioassays were conducted according to the WHO standard operating protocol to establish the dose-mortality relationship and derive the LC50 and LC95. The results of this study indicated that the LC100 obtained for the most susceptible populations was close to 0.05 mg/L. Therefore, the temephos discriminating dosage for susceptibility monitoring of An. labranchiae larvae in Morocco was set at 0.1 mg/L.

    Improving Arabic neural machine translation via n-best list re-ranking

    Even though the rise of the Neural Machine Translation (NMT) paradigm has brought a great deal of improvement to the machine translation field, current translation results are still not perfect. One of the main reasons for this imperfection is the complexity of the decoding task. Indeed, finding the single best translation in the space of all possible translations was, and still is, a challenging problem. One of the most successful ways to address it is n-best list re-ranking, which attempts to reorder the decoder's n-best translations according to a set of defined features. In this paper, we propose a set of new re-ranking features that can be extracted directly from the parallel corpus without needing any external tools. The proposed feature set takes into account lexical, syntactic, and even semantic aspects of the n-best list translations. We also present a method for feature weight optimization that uses a Quantum-behaved Particle Swarm Optimization (QPSO) algorithm. Our system has been evaluated on multiple English-to-Arabic and Arabic-to-English machine translation test sets, and the re-ranking results yield noticeable improvements over the baseline NMT systems.
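
    The general idea of n-best list re-ranking can be sketched as a weighted sum of per-hypothesis feature scores. This is a generic illustration, not the paper's system: the feature names (`decoder_logprob`, `lexical`) and the fixed weights are hypothetical, and the QPSO weight tuning mentioned in the abstract is not implemented here.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]


def score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(
    nbest: Sequence[Tuple[str, Features]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return hypotheses sorted by descending combined score.

    sorted() is stable, so tied hypotheses keep their original decoder order.
    """
    scored = [(hypothesis, score(features, weights)) for hypothesis, features in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 2-best list in decoder order, with made-up feature values.
    nbest = [
        ("hypothesis A", {"decoder_logprob": -1.0, "lexical": 0.2}),
        ("hypothesis B", {"decoder_logprob": -1.2, "lexical": 0.9}),
    ]
    weights = {"decoder_logprob": 1.0, "lexical": 1.0}
    for hypothesis, total in rerank(nbest, weights):
        print(f"{total:.2f}  {hypothesis}")
```

    With the decoder score alone, hypothesis A stays first; once the extra feature is weighted in, hypothesis B moves to the top, which is the effect re-ranking aims for.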

    Synchronous Primary Tumors of the Kidney and Pancreas: Case Report

    The simultaneous presence of primary carcinomas in the same patient is uncommon, and synchronous primary tumors involving the kidney and pancreas are extremely rare. There are few reports in the English literature of synchronous primary malignancies of the kidney and pancreas. We present a 62-year-old man who had a weight loss of 9 kg and epigastric pain. Findings showed a Fuhrman grade II renal papillary carcinoma confined to the kidney and a synchronous well-differentiated pancreatic ductal adenocarcinoma. Key Words: Synchronous double cancer, renal cell carcinoma, pancreatic carcinoma

    Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.

    Background: To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia (CLL) specimens, a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Methods: Ten CLL specimens and five normal peripheral blood CD19+ B cell samples were analyzed by HTS RNA-seq. Libraries were prepared with the Illumina TruSeq RNA kit and sequenced on the Illumina HiSeq 2000 system. Results: An average of 48.5 million reads for B cells and 50.6 million reads for CLL specimens were obtained, with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens, respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEGs) between B cells and CLL specimens were identified based on FPKM (fragments per kilobase of transcript per million reads), with a false discovery rate (FDR) q < 0.05 and fold change > 2. Expression of selected DEGs (n = 32) that were up-regulated or down-regulated in CLL according to the RNA-seq data was also analyzed by qRT-PCR in a test cohort of CLL specimens. Although the fold expression of DEGs varied between RNA-seq and qRT-PCR, more than 90% of the analyzed genes were validated by qRT-PCR. Analysis of the RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS). Skipped exon was the most frequent splicing alteration in CLL specimens, with 128 significant events (P-value < 0.05, minimum inclusion level difference > 0.1). Conclusion: The RNA-seq analysis of CLL specimens identifies novel DEGs and alternatively spliced genes that are potential prognostic markers and therapeutic targets. The high level of validation by qRT-PCR for a number of DEGs supports the accuracy of this analysis. Global comparison of the transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL (M-CLL) specimens with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes, but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome-wide analyses.
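
    The selection thresholds quoted in the abstract (FDR q < 0.05 and fold change > 2) can be illustrated with a small filter. This is a sketch under stated assumptions, not the Cuffdiff procedure: the record type `GeneResult`, the gene names, the FPKM values, and the pseudocount used to avoid division by zero are all hypothetical, and "fold change > 2" is read here as applying in either direction.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeneResult:
    """One gene's summary; the field names are assumptions for illustration."""
    gene: str
    fpkm_normal: float  # mean FPKM in normal B cells
    fpkm_cll: float     # mean FPKM in CLL specimens
    q_value: float      # FDR-adjusted p-value


def log2_fold_change(result: GeneResult, pseudocount: float = 1.0) -> float:
    """log2(CLL / normal); the pseudocount is an assumption, not from the source."""
    return math.log2(
        (result.fpkm_cll + pseudocount) / (result.fpkm_normal + pseudocount)
    )


def is_differentially_expressed(
    result: GeneResult, q_max: float = 0.05, min_fold: float = 2.0
) -> bool:
    """True if q < q_max and the fold change exceeds min_fold, up or down."""
    return (
        result.q_value < q_max
        and abs(log2_fold_change(result)) > math.log2(min_fold)
    )


def select_degs(results: Iterable[GeneResult]) -> List[str]:
    """Names of the genes that pass both thresholds."""
    return [r.gene for r in results if is_differentially_expressed(r)]


if __name__ == "__main__":
    # Made-up values, chosen only to exercise each branch of the filter.
    results = [
        GeneResult("GENE_A", 3.0, 15.0, 0.01),    # up-regulated, significant
        GeneResult("GENE_B", 10.0, 12.0, 0.001),  # significant, small change
        GeneResult("GENE_C", 1.0, 31.0, 0.2),     # large change, not significant
        GeneResult("GENE_D", 15.0, 3.0, 0.01),    # down-regulated, significant
    ]
    print(select_degs(results))
```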

    PeakRegressor Identifies Composite Sequence Motifs Responsible for STAT1 Binding Sites and Their Potential rSNPs

    How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present “PeakRegressor,” a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.
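
    Why an L1 penalty helps with motif selection can be illustrated with a generic lasso solver: the penalty drives the weights of uninformative features to exactly zero. The sketch below is plain L1-penalized least squares solved by coordinate descent, not PeakRegressor's implementation; the abstract does not specify the exact model, so it is assumed here that the targets are already log-transformed peak values, and the tiny feature matrix is made up.

```python
import math
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; results within the threshold become 0."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    n_iter: int = 100,
) -> List[float]:
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


if __name__ == "__main__":
    # Hypothetical data: column 0 drives the target, column 1 is irrelevant.
    X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
    y = [2.0, 4.0, 6.0, 8.0]  # assumed to be log-scale peak values
    print(lasso_coordinate_descent(X, y, lam=0.1))
```

    In this toy setting the irrelevant column receives a weight of exactly zero, while the informative column keeps a weight slightly below its unpenalized value of 2.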