
    Applying digital content management to support localisation

    The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how they can support localisation and localised content, before concluding with some impressions of future directions in DCM.

    EUSMT: incorporating linguistic information to SMT for a morphologically rich language. Its use in SMT-RBMT-EBMT hybridation

    148 pp., with graphs. This thesis is defined in the framework of machine translation for Basque. Having developed a Rule-Based Machine Translation (RBMT) system for Basque in the IXA group (Mayor, 2007), we decided to tackle the Statistical Machine Translation (SMT) approach and experiment on how we could adapt it to the peculiarities of the Basque language. First, we analyzed the impact of the agglutinative nature of Basque and the best way to deal with it. To deal with these problems, we split Basque words into the lemma and some tags that represent the morphological information expressed by the inflection. By dividing each Basque word in this way, we aim to reduce the sparseness produced by the agglutinative nature of Basque and the small amount of training data. We also studied the differences in word order between Spanish and Basque, examining different techniques for dealing with them. We confirm the weakness of basic SMT in dealing with large word order differences between the source and target languages. Distance-based reordering, which is the technique used by the baseline system, does not have enough information to properly handle large word order differences, so any of the techniques tested in this work (based on both statistics and manually generated rules) outperforms the baseline. Once we had obtained a more accurate SMT system, we made the first attempts to combine different MT systems into a hybrid one that would allow us to get the best of the different paradigms. The hybridization attempts carried out in this PhD dissertation are preliminary, but even so, this work can help us to determine the next steps. Carried out with a researcher training grant from the Basque Government (BFI05.326).

    Entropy involved in fidelity of DNA replication

    Information has an entropic character which can be analyzed within the Statistical Theory in molecular systems. R. Landauer and C.H. Bennett showed that a logical copy can be carried out in the limit of no dissipation if the computation is performed sufficiently slowly. Structural and recent single-molecule assays have provided dynamic details of polymerase machinery with insight into information processing. We introduce a rigorous characterization of Shannon Information in biomolecular systems and apply it to DNA replication in the limit of no dissipation. Specifically, we devise an equilibrium pathway in DNA replication to determine the entropy generated in copying the information from a DNA template in the absence of friction. Both the initial state, the free nucleotides randomly distributed in certain concentrations, and the final state, a polymerized strand, are mesoscopic equilibrium states for the nucleotide distribution. We use empirical stacking free energies to calculate the probabilities of incorporation of the nucleotides. The copied strand is, to first order of approximation, a state of independent and non-identically distributed random variables for which the nucleotide that is incorporated by the polymerase at each step is dictated by the template strand, and to second order of approximation, a state of non-uniformly distributed random variables with nearest-neighbor interactions for which the recognition of secondary structure by the polymerase in the resultant double-stranded polymer determines the entropy of the replicated strand. Two incorporation mechanisms arise naturally and their biological meanings are explained. It is known that replication occurs far from equilibrium and therefore the Shannon entropy here derived represents an upper bound for replication to take place. Likewise, this entropy sets a universal lower bound for the copying fidelity in replication. Comment: 25 pages, 5 figures
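
The first-order picture in this abstract (independent, non-identically distributed positions whose incorporation probabilities come from free energies) can be illustrated with a small sketch. This is not the authors' calculation: the Boltzmann weighting, the function names, and the free-energy values below are assumptions chosen only to show how per-position Shannon entropies add up for independent variables.

```python
import math

# k_B * T in kcal/mol at roughly 298 K (assumed temperature for this sketch).
K_B_T = 0.593


def boltzmann_probabilities(free_energies):
    """Turn free energies (kcal/mol) into normalized Boltzmann probabilities."""
    weights = {nt: math.exp(-g / K_B_T) for nt, g in free_energies.items()}
    total = sum(weights.values())
    return {nt: w / total for nt, w in weights.items()}


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a single discrete distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def strand_entropy(per_position_energies):
    """Entropy of a strand of independent positions: the sum of per-position entropies."""
    return sum(
        shannon_entropy(boltzmann_probabilities(energies))
        for energies in per_position_energies
    )


# Hypothetical free energies, NOT empirical stacking values from the paper.
unbiased = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0}
favours_a = {"A": -3.0, "C": 0.0, "G": 0.0, "T": 0.0}

print(strand_entropy([unbiased, favours_a, unbiased]))
```

With equal free energies every nucleotide is equally likely and a position carries 2 bits; the more strongly one nucleotide is favoured at a position, the less entropy that position contributes.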

    Introduction to the special issue on cross-language algorithms and applications

    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)

    Computing vs. Genetics

    This chapter first presents the interrelations between computing and genetics, both of which are based on information and, particularly, self-reproducing artificial systems. It goes on to examine genetic code from a computational viewpoint. This raises a number of important questions about genetic code. These questions are stated in the form of an as yet unpublished working hypothesis. This hypothesis suggests that many genetic alterations are caused by the last base of certain codons. If this hypothesis were to be confirmed through experimentation, it would be a significant advance for treating many genetic diseases.

    Review on DNA Cryptography

    Cryptography is the science that secures data and communication over the network by applying mathematics and logic to design strong encryption methods. In the modern era of e-business and e-commerce, protecting the confidentiality, integrity and availability (CIA triad) of stored information as well as of transmitted data is crucial. DNA molecules, having the capacity to store, process and transmit information, inspire the idea of DNA cryptography. This combination of the chemical characteristics of biological DNA sequences and classical cryptography ensures the non-vulnerable transmission of data. In this paper we review the present state of the art of DNA cryptography. Comment: 31 pages, 12 figures, 6 tables

    MicroRNAs in the stressed heart: Sorting the signal from the noise

    The short noncoding RNAs, known as microRNAs, are of undisputed importance in cellular signaling during differentiation and development, and during adaptive and maladaptive responses of adult tissues, including those that comprise the heart. Cardiac microRNAs are regulated by hemodynamic overload resulting from exercise or hypertension, in the response of surviving myocardium to myocardial infarction, and in response to environmental or systemic disruptions to homeostasis, such as those arising from diabetes. A large body of work has explored microRNA responses in both physiological and pathological contexts, but there is still much to learn about their integrated actions on individual mRNAs and signaling pathways. This review will highlight key studies of microRNA regulation in cardiac stress and suggest possible approaches for more precise identification of microRNA targets, with a view to exploiting the resulting data for therapeutic purposes.