69 research outputs found

    The design and construction of the 50 million words KSUCCA

    In this paper, we report the design and construction of the King Saud University Corpus of Classical Arabic (KSUCCA), which is part of ongoing research that attempts to study the meanings of words used in the holy Quran through analysis of their distributional semantics in contemporaneous texts. The holy Quranic text was revealed in pure Classical Arabic, which forms the basis of Arabic linguistic theory and which is well understood by the educated Arabic reader. Therefore, it is necessary to investigate the distributional lexical semantics of the Quran's words in the light of similar texts (a corpus) written in pure Classical Arabic. To the best of our knowledge, there exist only two corpora of Classical Arabic: one is part of the King Abdulaziz City for Science and Technology Arabic Corpus (KACST Arabic Corpus) and the other is the Classical Arabic Corpus (CAC) (Elewa, 2009). However, neither of the two corpora is adequate for our research. The former does not cover many genres, such as Linguistics, Literature, Science, Sociology and Biography, and it contains only 17+ million words, so it is not very large; the latter is even smaller, with only 5 million words. Therefore, we made an effort to carefully design and compose our own corpus, bearing in mind that it should be large enough, balanced, and representative so that any result obtained from it can be generalized to Classical Arabic. In addition, we tried to make the design general enough to make the corpus appropriate for other research as well

    An empirical study on the Holy Quran based on a large classical Arabic corpus

    Distributional semantics is one of the empirical approaches to natural language processing and acquisition; it is mainly concerned with modeling word meaning using word distribution statistics gathered from huge corpora. Many distributional semantic models are available in the literature, but none of them has so far been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will be used as a base to study the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus in an attempt to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure achieved the highest results in extracting significant co-occurrences and collocations from small and large Classical Arabic corpora, while the mutual information association measure achieved the worst results
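
The two association measures named in the abstract above can be illustrated with a minimal sketch. It assumes one common formulation, in which MI.log_freq is pointwise mutual information multiplied by the logarithm of the co-occurrence frequency plus one; the abstract does not state the exact formula or window size the authors used, so `association_scores`, its `window` parameter, and the `+ 1` smoothing are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def association_scores(tokens, window=2):
    """Return {(w, v): (mi, mi_log_freq)} for pairs where v follows w
    within `window` tokens. Illustrative only, not the paper's code."""
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, w in enumerate(tokens):
        # Count every token in the right-hand window as a co-occurrence.
        for v in tokens[i + 1 : i + 1 + window]:
            pair_freq[(w, v)] += 1

    n = len(tokens)
    scores = {}
    for (w, v), f_wv in pair_freq.items():
        # Pointwise mutual information: log2( f(w,v) * N / (f(w) * f(v)) ).
        mi = math.log2(f_wv * n / (word_freq[w] * word_freq[v]))
        # Assumed MI.log_freq: MI weighted by the log of (1 + pair frequency).
        scores[(w, v)] = (mi, mi * math.log(f_wv + 1))
    return scores


if __name__ == "__main__":
    sample = "a b a b c d".split()
    for pair, (mi, mi_log_freq) in sorted(association_scores(sample, window=1).items()):
        print(pair, round(mi, 3), round(mi_log_freq, 3))
```

Plain mutual information is known to favour rare pairs, since a pair seen once between two words that each occur once receives the maximum possible score. Weighting by log frequency gives repeated co-occurrences more influence, which is one plausible reason a frequency-weighted measure could outperform plain mutual information on collocation extraction.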

    KSUCCA: a key to exploring Arabic historical linguistics

    Classical Arabic forms the basis of Arabic linguistic theory and is well understood by the educated Arabic reader. It differs in many ways from Modern Standard Arabic, which is more simplified in its lexical, syntactic, morphological, phraseological and semantic structure. The King Saud University Corpus of Classical Arabic is a pioneering corpus of around 50 million words of Classical Arabic. It was initially constructed for the purpose of studying the distributional lexical semantics of the Quran and Classical Arabic; however, it is designed in a general way, making it also appropriate for other research in Linguistics and Computational Linguistics. In this paper, we briefly describe the structure of our corpus, and then demonstrate how it can be used to depict some aspects of Arabic language change between the classical and the modern periods

    Thymoquinone and curcumin modify inducible nitric oxide synthase, caspase 3, and thioredoxin immunohistochemical expression in acetaminophen hepatotoxicity

    Background: Acetaminophen (APAP) hepatotoxicity is characterised by extensive oxidative stress due to depletion of glutathione (GSH), which results in massive lipid peroxidation and subsequent liver injury. The current paradigm suggests that mitochondria are the main source of reactive oxygen species (ROS), which impair mitochondrial function and are responsible for cell signalling resulting in cell death. This study was designed to compare the potential impact of thymoquinone (THQ) and/or curcumin (CURC) on liver injury induced by APAP toxicity in rats. Materials and methods: Serum levels of alanine transaminase, aspartate transaminase, total bilirubin, and total protein were measured. In addition, liver nitric oxide (NO), malondialdehyde, reduced glutathione (GSH), and superoxide dismutase (SOD) were estimated. Moreover, these biochemical parameters were confirmed by histopathological and immunohistochemical investigations for the expression of thioredoxin, iNOS and caspase 3. Results: Acetaminophen toxicity elevated most of the above-mentioned parameters but decreased GSH, SOD, and total protein levels. Histologically, liver sections demonstrated liver injury characterised by hepatocellular necrosis with nuclear pyknosis, karyorrhexis and karyolysis. The immunohistochemical study revealed increased expression of iNOS and caspase 3 proteins, while thioredoxin protein expression was decreased. Conclusions: Treatment with THQ and CURC regulated the biochemical and histopathological alterations induced by APAP toxicity. It was concluded that the combination of THQ and CURC might be considered a potential antidote for combating liver injury induced by APAP, with minimal side effects

    Implementation of Fourier transform infrared spectroscopy for the rapid typing of uropathogenic Escherichia coli

    In this paper, we demonstrate that Fourier transform infrared (FT-IR) spectroscopy is able to discriminate rapidly between uropathogenic Escherichia coli (UPEC) of key lineages with only relatively simple sample preparation. A total of 95 bacteria from six different epidemiologically important multilocus sequence types (ST10, ST69, ST95, ST73, ST127 and ST131) were used in this project and principal component-discriminant function analysis (PC-DFA) of these samples produced clear separate clustering of isolates, based on the ST. Analysis of data using partial least squares-discriminant analysis (PLS-DA), incorporating cross-validation, indicated a high prediction accuracy of 91.19% for ST131. These results suggest that FT-IR spectroscopy could be a useful method for the rapid identification of members of important UPEC STs

    Phenotypic Signatures Arising from Unbalanced Bacterial Growth

    Fluctuations in the growth rate of a bacterial culture during unbalanced growth are generally considered undesirable in quantitative studies of bacterial physiology. Under well-controlled experimental conditions, however, these fluctuations are not random but instead reflect the interplay between intra-cellular networks underlying bacterial growth and the growth environment. Therefore, these fluctuations could be considered quantitative phenotypes of the bacteria under a specific growth condition. Here, we present a method to identify “phenotypic signatures” by time-frequency analysis of unbalanced growth curves measured with high temporal resolution. The signatures are then applied to differentiate amongst different bacterial strains or the same strain under different growth conditions, and to identify the essential architecture of the gene network underlying the observed growth dynamics. Our method has implications for both basic understanding of bacterial physiology and for the classification of bacterial strains

    Genomic investigations of unexplained acute hepatitis in children

    Since its first identification in Scotland, over 1,000 cases of unexplained paediatric hepatitis have been reported worldwide, including 278 cases in the UK. Here we report an investigation of 38 cases, 66 age-matched immunocompetent controls and 21 immunocompromised comparator participants, using a combination of genomic, transcriptomic, proteomic and immunohistochemical methods. We detected high levels of adeno-associated virus 2 (AAV2) DNA in the liver, blood, plasma or stool from 27 of 28 cases. We found low levels of adenovirus (HAdV) and human herpesvirus 6B (HHV-6B) in 23 of 31 and 16 of 23, respectively, of the cases tested. By contrast, AAV2 was infrequently detected and at low titre in the blood or the liver from control children with HAdV, even when profoundly immunosuppressed. AAV2, HAdV and HHV-6 phylogeny excluded the emergence of novel strains in cases. Histological analyses of explanted livers showed enrichment for T cells and B lineage cells. Proteomic comparison of liver tissue from cases and healthy controls identified increased expression of HLA class 2, immunoglobulin variable regions and complement proteins. HAdV and AAV2 proteins were not detected in the livers. Instead, we identified AAV2 DNA complexes reflecting both HAdV-mediated and HHV-6B-mediated replication. We hypothesize that high levels of abnormal AAV2 replication products aided by HAdV and, in severe cases, HHV-6B may have triggered immune-mediated hepatic disease in genetically and immunologically predisposed children

    An Arabic Semantic Parser and Meaning Analyzer

    The Arabic language is very rich in derivations, vocabulary, and grammatical structures. Determining the correct meaning of a word in a non-vowelized Arabic sentence is not a trivial task, since Arabic is very rich in polysemy. This paper attempts to resolve word sense ambiguity by building a semantic parser powered by a statistical semantic analyzer, which may aid in the improvement of machine translation, question answering and other Arabic NLP systems. The parser was built in three steps. The first step was to acquire the grammatical rules for Arabic that were covered in an Arabic grammar textbook, and to develop constraints that helped resolve part of the parsing ambiguity. The grammar and the constraints were then written in an XML format to make them readable and available for future use. The second step was to build the semantic parser, which assigns a grammatical structure to an input sentence. The final step was to apply a statistical semantic technique to the resulting grammatical structures to determine the most accurate structure, the one that resolves the word sense ambiguity and determines the most accurate meaning of the word