    The contribution of corpus linguistics to lexicography and the future of Tibetan dictionaries

    The first alphabetized dictionary of Tibetan appeared in 1829 (cf. Bray 2008), and the intervening 184 years have witnessed the publication of scores of other Tibetan dictionaries (cf. Simon 1964). Hundreds of Tibetan dictionaries are now available; these include bilingual dictionaries, both to and from languages such as English, French, German, Latin, and Japanese, and specialized dictionaries focusing on medicine, plants, dialects, archaic terms, neologisms, etc. (cf. Walter 2006, McGrath 2008). However, if one classifies Tibetan dictionaries by their methods of compilation, the accomplishments of Tibetan lexicography are less impressive. Methodologies of dictionary compilation divide heuristically into three types. First, some dictionaries lack an explicit methodology; these works assemble words in an ad hoc manner and illustrate them with invented examples. Second, there are dictionaries compiled over very long periods of time on the basis of collections of slips recording attestations of words as used in context. Third, more recent dictionaries are compiled on the basis of electronic text corpora, which are processed computationally to improve the precision, consistency and speed of dictionary compilation. These methods may be called, respectively, the 'informal method', the 'traditional method', and the 'modern method'. The overwhelming majority of Tibetan dictionaries were compiled with the informal method. Only five Tibetan dictionaries use the traditional method. No Tibetan dictionary yet compiled makes use of the modern method.

    Sanskrit Sandhi Splitting using seq2(seq)^2

    In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exist in the language, it is highly challenging to identify the location of the splits in a compound word. Existing Sandhi splitting systems incorporate these pre-defined splitting rules, but they have low accuracy because the same compound word can be broken down in multiple ways that each yield syntactically correct splits. In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. Additionally, we show the generalization capability of our deep learning model by reporting competitive results on the problem of Chinese word segmentation.
    Comment: Accepted in EMNLP 201
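The two subtasks named in the abstract (locating the split, then recovering the constituent words) can be illustrated with a toy, rule-based sketch. This is not the DD-RNN model: the rule table `REVERSE_RULES` and the function `candidate_splits` are hypothetical names introduced here, the table covers only three merged vowels, and the split location is assumed to be given. It mainly shows why one compound can admit several candidate splits.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize so that each long vowel such as 'ā' is a single code point."""
    return unicodedata.normalize("NFC", text)


# Hypothetical, deliberately tiny rule table (an assumption, not from the paper):
# merged vowel -> possible (final vowel of left word, initial vowel of right word).
REVERSE_RULES = {
    nfc("ā"): [("a", "a"), ("a", "ā"), ("ā", "a"), ("ā", "ā")],
    nfc("e"): [("a", "i"), ("a", "ī"), ("ā", "i"), ("ā", "ī")],
    nfc("o"): [("a", "u"), ("a", "ū"), ("ā", "u"), ("ā", "ū")],
}


def candidate_splits(compound: str, location: int) -> list[tuple[str, str]]:
    """Undo the vowel merge at `location` and return every candidate word pair."""
    compound = nfc(compound)
    merged = compound[location]
    prefix, suffix = compound[:location], compound[location + 1:]
    return [
        (prefix + nfc(left_end), nfc(right_start) + suffix)
        for left_end, right_start in REVERSE_RULES.get(merged, [])
    ]


# The split location (index 4) is supplied by hand here; in the abstract,
# predicting it is the first of the two learned subtasks.
for left, right in candidate_splits("vidyālaya", 4):
    print(left, "+", right)
```

Each of the four printed pairs is consistent with the single rule applied, and only one of them (vidyā + ālaya) is the intended analysis; a purely rule-based splitter has no way to choose among them without further information.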

    ANALYTICAL STANDARDIZATION OF TAMRA YOGA

    Rasa Shastra is a specialized branch of Ayurveda which mainly deals with the pharmaceutics of unique and potent preparations. Tamra Yoga is an important Rasa Oushadi mentioned in Rasa Tantra Sara Va Siddha Prayoga Sangraha; it contains Tamra Bhasma, Yashtimadhu, Chincha Kshara, Trikatu, Sauvarchala lavana and Hingu. Shodhana, Bhavana, Marana, Amrutikarana, Chincha Kshara nirmana and Churna nirmana are the main pharmaceutical procedures employed in the preparation of Tamra Yoga. To assess its toxicity and safety and to understand its structural and chemical composition, the preparation was analysed using modern analytical techniques: X-ray diffraction (XRD), scanning electron microscopy (SEM), energy dispersive X-ray spectroscopy (EDS), particle size analysis (PSA), zeta potential (ZP), UV spectroscopy, Fourier transform infrared spectroscopy (FTIR) and inductively coupled plasma optical emission spectrometry (ICP-OES). XRD of Tamra Yoga showed major peaks of KCl (potassium chloride) and CuS (copper sulphide), and minor peaks of HgS (cinnabar), NaCl (sodium chloride), CaS (calcium sulphide), ZnP4 (zinc phosphide) and K2Fe2O4 (potassium iron oxide). SEM micrographs showed an agglomeration of crystalline, irregularly shaped particles. EDS analysis confirmed the significant presence of the elements O (27.91%), S (21.83%), Cu (26.87%), Hg (14.29%) and K (3.46%). The particle size was found to be 337.9 nm and the zeta potential -12.1 mV. The UV spectrum of Tamra Yoga showed maximum absorption at 307 nm; FTIR analysis showed 11 peaks between wavenumbers 3356.21 and 418.34 cm⁻¹; and ICP-OES analysis revealed potassium as the main constituent at 14376.50 ppm.