343 research outputs found

    Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments

    To uncover interesting and actionable information from natural language documents authored by software developers, many researchers rely on "out-of-the-box" NLP libraries. However, software artifacts written in natural language are different from other textual documents due to the technical language used. In this paper, we first analyze the state of the art through a systematic literature review in which we find that only a small minority of papers justify their choice of an NLP library. We then report on a series of experiments in which we applied four state-of-the-art NLP libraries to publicly available software artifacts from three different sources. Our results show low agreement between different libraries (only between 60% and 71% of tokens were assigned the same part-of-speech tag by all four libraries) as well as differences in accuracy depending on source: for example, spaCy achieved the best accuracy on Stack Overflow data with nearly 90% of tokens tagged correctly, while it was clearly outperformed by Google's SyntaxNet when parsing GitHub ReadMe files. Our work implies that researchers should make an informed decision about the particular NLP library they choose and that customizations to libraries might be necessary to achieve good results when analyzing software artifacts written in natural language.
    Fouad Nasser A Al Omran, Christoph Treud
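The agreement figure quoted in this abstract (the share of tokens given the same part-of-speech tag by all four libraries) can be illustrated with a minimal sketch. It assumes the libraries' outputs have already been aligned to one shared tokenization, which the abstract does not describe; the library names, tag labels, and example sentence below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def full_agreement_rate(tags_by_library: Dict[str, List[str]]) -> float:
    """Return the fraction of tokens that every library tagged identically.

    Assumes each list holds one tag per token of the SAME tokenization.
    """
    sequences = list(tags_by_library.values())
    if not sequences or not sequences[0]:
        raise ValueError("need at least one non-empty tag sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all libraries must tag the same number of tokens")
    # zip(*sequences) yields, per token, the tags assigned by each library.
    agreed = sum(1 for column in zip(*sequences) if len(set(column)) == 1)
    return agreed / length


# Hypothetical tags for the five tokens "Run npm install first ."
example = {
    "library_a": ["VERB", "NOUN", "VERB", "ADV", "PUNCT"],
    "library_b": ["VERB", "PROPN", "NOUN", "ADV", "PUNCT"],
    "library_c": ["VERB", "NOUN", "NOUN", "ADV", "PUNCT"],
    "library_d": ["NOUN", "NOUN", "NOUN", "ADV", "PUNCT"],
}
rate = full_agreement_rate(example)
print(f"{rate:.0%} of tokens received the same tag from all libraries")
```

In this toy input only the last two tokens are tagged identically by all four hypothetical libraries, so the rate is 2 out of 5.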

    Synthesis of some nucleosides derivatives from L- rhamnose with expected biological activity

    Practical procedures for the production of variously blocked compounds from L-rhamnose have been developed. These compounds are highly useful as indirect β-L-rhamnosyl donors. This approach represents a new method for the synthesis of aromatic nucleoside analogues and the synthesis of (3S,4S,5S,6R)-3,4,5-triacetoxy-2-methyl-7,9-diaza-1-oxa-spiro[4,5]decane-10-one-8-thione (7).

    Assessment of heavy metal pollution in the Great Al-Mussaib irrigation channel

    The Great Al-Mussaib Channel (GMC), in Babylon province, Iraq, was selected as a case study to measure the concentrations of nine heavy metals (Pb, Ni, Zn, Fe, Cd, Cr, Cu, Mn and Co) in both its water and its sediments. The GMC is used as a raw water source for two cities, which underlines the importance of the study: any heavy metal pollution could cause significant health problems for the population of these cities. The results revealed that the concentrations of the studied heavy metals in the water of the GMC were below pollution levels and followed the order: Pb < Ni < Cu < Cr < Mn < Zn < Fe. The concentrations of Co and Cd were below the detection limits. Additionally, the analyses of the sediment samples showed, according to the values of the Pollution Load Index (PLI) and Geo-accumulation Index (Igeo), that the concentrations of the studied metals were below pollution levels (except in a few cases) and followed the order: Cd < Co < Cu < Pb < Ni < Cr < Zn < Mn < Fe.
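The two sediment indices named in this abstract are commonly computed as Igeo = log2(C / (1.5 × B)) and PLI = (CF1 × CF2 × ... × CFn)^(1/n), where C is the measured concentration, B is a background value, and CF = C / B. The sketch below uses those textbook definitions; the abstract does not state which formulas or background values the authors used, and every concentration here is a hypothetical placeholder, not data from the study.

```python
import math
from typing import Dict


def geoaccumulation_index(concentration: float, background: float) -> float:
    """Igeo = log2(C / (1.5 * B)); values <= 0 are usually read as unpolluted."""
    return math.log2(concentration / (1.5 * background))


def pollution_load_index(
    concentrations: Dict[str, float], backgrounds: Dict[str, float]
) -> float:
    """PLI = geometric mean of the contamination factors CF = C / B."""
    factors = [concentrations[m] / backgrounds[m] for m in concentrations]
    return math.prod(factors) ** (1.0 / len(factors))


# Hypothetical sediment concentrations and background values (mg/kg).
sediment = {"Zn": 95.0, "Cu": 11.25}
background = {"Zn": 95.0, "Cu": 45.0}

pli = pollution_load_index(sediment, background)
igeo_cu = geoaccumulation_index(sediment["Cu"], background["Cu"])
print(f"PLI = {pli:.2f}, Igeo(Cu) = {igeo_cu:.2f}")
```

A PLI below 1 and a negative Igeo are the usual readings for "below pollution levels", which is the situation the abstract reports for most samples.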

    ENVIRONMENTAL ASSESSMENT OF AL-HILLAH RIVER POLLUTION AT BABIL GOVERNORATE (IRAQ)

    In this study, the environmental characteristics of the Al-Hillah River were studied using geoinformatics applications, one class of geospatial techniques (GST). Applying this methodology, a geographic information system was developed and supplied with laboratory data on 16 physical and chemical parameters for 2021. These data were linked to their spatial locations using radar imagery from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model and a Landsat 7 ETM+ satellite image. The results indicated that the Al-Hillah River was affected by liquid discharges from the factories, cities, and farms along its banks, especially in the cities of Sadat Al-Hindiya, Al-Hillah, and Al-Hashimiyah. Seasonal changes in climate affected some characteristics, including water temperature, pH, turbidity, total dissolved solids, and total hardness. The study showed that the concentration of sulfate (SO4) has risen above the permissible limits for the waters of Iraqi rivers. Hardness and alkalinity values were relatively high but within the permissible limits. The study also showed that most of the environmental parameters measured in the laboratory were within the permissible limits for Iraqi waters, except for sulfates. The justification for conducting this study is to help government agencies and decision-makers adopt a sound vision for development projects that serve Babil Governorate. It is also the first time that the environmental characteristics of the Al-Hillah River have been studied using geoinformatics applications.

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
    Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
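To make the three properties in this abstract concrete (linear in the number of points, deterministic, order-invariant), here is a minimal sketch of a maximin-style initialization. It is shown purely as an illustration and is not claimed to be one of the six methods the chapter evaluates, nor the Su and Dy or Erisoglu et al. methods. The function name, tie-breaking rule, and sample points are assumptions made for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def maximin_init(points: List[Point], k: int) -> List[Point]:
    """Pick k initial centers deterministically in O(n * k) distance evaluations.

    The first center is the point with the largest Euclidean norm; each later
    center is the point farthest from its nearest already-chosen center.
    Ties are broken by coordinate values, so input order does not matter.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    first = max(points, key=lambda p: (math.hypot(*p), p))
    centers = [first]
    # nearest[i] = distance from points[i] to its closest chosen center.
    nearest = [math.dist(p, first) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: (nearest[i], points[i]))
        centers.append(points[idx])
        nearest = [
            min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)
        ]
    return centers


# Hypothetical 2-D data with three visually separate groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
centers = maximin_init(points, 3)
print(centers)
```

Because no random numbers are drawn and ties are resolved by coordinate values, shuffling `points` yields the same centers, so a single k-means run from this start is reproducible without the repeated restarts the abstract describes.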

    The prevalence of adaptive immunity to COVID-19 and reinfection after recovery - a comprehensive systematic review and meta-analysis.

    This study aims to estimate the prevalence and longevity of detectable SARS-CoV-2 antibodies and T and B memory cells after recovery. In addition, the prevalence of COVID-19 reinfection and the preventive efficacy of previous infection with SARS-CoV-2 were investigated. A synthesis of existing research was conducted. The Cochrane Library, the China Academic Journals Full Text Database, PubMed, Scopus, and preprint servers were searched for studies conducted between 1 January 2020 and 1 April 2021. Included studies were assessed for methodological quality, and pooled estimates of relevant outcomes were obtained in a meta-analysis using a bias-adjusted synthesis method. Proportions were synthesized with the Freeman-Tukey double arcsine transformation and binary outcomes using the odds ratio (OR). Heterogeneity was assessed using the I² and Cochran's Q statistics, and publication bias was assessed using Doi plots. Fifty-four studies from 18 countries, with around 12,000,000 individuals, followed up to 8 months after recovery, were included. At 6-8 months after recovery, the prevalence of SARS-CoV-2 specific immunological memory remained high: IgG 90.4% (95% CI 72.2-99.9, I² = 89.0%), CD4+ 91.7% (95% CI 78.2-97.1), and memory B cells 80.6% (95% CI 65.0-90.2). The pooled prevalence of reinfection was 0.2% (95% CI 0.0-0.7, I² = 98.8%). Individuals previously infected with SARS-CoV-2 had an 81% reduction in odds of a reinfection (OR 0.19, 95% CI 0.1-0.3, I² = 90.5%). Around 90% of recovered individuals had evidence of immunological memory to SARS-CoV-2 at 6-8 months after recovery and had a low risk of reinfection.
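The statistics named in this abstract follow standard formulas, sketched here under stated assumptions. The Freeman-Tukey transform is written in one common form, t = asin(√(x/(n+1))) + asin(√((x+1)/(n+1))); I² is derived from Cochran's Q as max(0, (Q − df)/Q) × 100; and an odds ratio below 1 converts to a percentage reduction in odds as (1 − OR) × 100. The abstract does not say which software or variant the authors used. Apart from the reported OR of 0.19, all inputs below are hypothetical.

```python
import math


def freeman_tukey(events: int, n: int) -> float:
    """Double arcsine transform of the proportion events / n (one common form)."""
    return math.asin(math.sqrt(events / (n + 1))) + math.asin(
        math.sqrt((events + 1) / (n + 1))
    )


def i_squared(q: float, num_studies: int) -> float:
    """I^2 in percent from Cochran's Q with df = num_studies - 1."""
    df = num_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def odds_reduction_percent(odds_ratio: float) -> float:
    """Percentage reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


reduction = odds_reduction_percent(0.19)  # OR reported in the abstract
heterogeneity = i_squared(q=100.0, num_studies=11)  # hypothetical Q and k
transformed = freeman_tukey(events=5, n=10)  # hypothetical study
print(f"{reduction:.0f}% lower odds, I^2 = {heterogeneity:.0f}%")
```

The first call reproduces the arithmetic behind the abstract's "81% reduction in odds" from OR 0.19.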

    Diagnostic Accuracy of the Leishmania OligoC-TesT and NASBA-Oligochromatography for Diagnosis of Leishmaniasis in Sudan

    The leishmaniases are a group of vector-borne diseases caused by protozoan parasites of the genus Leishmania. The parasites are transmitted by phlebotomine sand flies and can cause, depending on the infecting species, three clinical manifestations of leishmaniasis: visceral leishmaniasis (VL), post kala-azar dermal leishmaniasis (PKDL) and cutaneous leishmaniasis (CL) including the mucocutaneous form. VL, PKDL as well as CL are endemic in several parts of Sudan, and VL especially represents a major health problem in this country. Molecular tests such as the polymerase chain reaction (PCR) or nucleic acid sequence based assay (NASBA) are powerful techniques for accurate detection of the parasite in clinical specimens, but broad use is hampered by their complexity and lack of standardisation. Recently, the Leishmania OligoC-TesT and NASBA-Oligochromatography were developed as simplified and standardised PCR and NASBA formats. In this study, both tests were phase II evaluated for diagnosis of VL, PKDL and CL in Sudan

    Expanding the genetic heterogeneity of intellectual disability

    Intellectual disability (ID) is a common morbid condition with a wide range of etiologies. The list of monogenic forms of ID has increased rapidly in recent years thanks to the implementation of genomic sequencing techniques. In this study, we describe the phenotypic and genetic findings of 68 families (105 patients) all with novel ID-related variants. In addition to established ID genes, including ones for which we describe unusual mutational mechanism, some of these variants represent the first confirmatory disease-gene links following previous reports (TRAK1, GTF3C3, SPTBN4 and NKX6-2), some of which were based on single families. Furthermore, we describe novel variants in 14 genes that we propose as novel candidates (ANKHD1, ASTN2, ATP13A1, FMO4, MADD, MFSD11, NCKAP1, NFASC, PCDHGA10, PPP1R21, SLC12A2, SLK, STK32C and ZFAT). We highlight MADD and PCDHGA10 as particularly compelling candidates in which we identified biallelic likely deleterious variants in two independent ID families each. We also highlight NCKAP1 as another compelling candidate in a large family with autosomal dominant mild intellectual disability that fully segregates with a heterozygous truncating variant. The candidacy of NCKAP1 is further supported by its biological function, and our demonstration of relevant expression in human brain. Our study expands the locus and allelic heterogeneity of ID and demonstrates the power of positional mapping to reveal unusual mutational mechanisms

    Characterizing the morbid genome of ciliopathies

    Background: Ciliopathies are clinically diverse disorders of the primary cilium. Remarkable progress has been made in understanding the molecular basis of these genetically heterogeneous conditions; however, our knowledge of their morbid genome, pleiotropy, and variable expressivity remains incomplete. Results: We applied genomic approaches on a large patient cohort of 371 affected individuals from 265 families, with phenotypes that span the entire ciliopathy spectrum. Likely causal mutations in previously described ciliopathy genes were identified in 85% (225/265) of the families, adding 32 novel alleles. Consistent with a fully penetrant model for these genes, we found no significant difference in their “mutation load” beyond the causal variants between our ciliopathy cohort and a control non-ciliopathy cohort. Genomic analysis of our cohort further identified mutations in a novel morbid gene TXNDC15, encoding a thiol isomerase, based on independent loss of function mutations in individuals with a consistent ciliopathy phenotype (Meckel-Gruber syndrome) and a functional effect of its deficiency on ciliary signaling. Our study also highlighted seven novel candidate genes (TRAPPC3, EXOC3L2, FAM98C, C17orf61, LRRCC1, NEK4, and CELSR2), some of which have established links to ciliogenesis. Finally, we show that the morbid genome of ciliopathies encompasses many founder mutations, the combined carrier frequency of which accounts for a high disease burden in the study population. Conclusions: Our study increases our understanding of the morbid genome of ciliopathies. We also provide the strongest evidence, to date, in support of the classical Mendelian inheritance of Bardet-Biedl syndrome and other ciliopathies.