124,746 research outputs found

    An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

    Full text link
    Due to the huge availability of documents in digital form, and the deception possibility raise bound to the essence of digital documents and the way they are spread, the authorship attribution problem has constantly increased its relevance. Nowadays, authorship attribution,for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent driven machine learning technique that has been developed for authorship attribution. By means of a preprocessing for word-grouping and time-period related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of the words within analysed texts, and then train a Radial Basis Neural Networks (RBPNN)-based classifier to identify the correct author. The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains, without requiring any modification. Moreover, the proposed system is able to incorporate an external input, meant to tune the classifier, and then self-adjust by means of continuous learning reinforcement.Comment: Published on: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti" (WOA 2014), Catania, Italy, Sepember. 25-26, 201

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    Full text link
    A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous possibly large vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing in a {\em word level}, which is obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on a {\em morpheme-level} speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with a Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experiment results show that the speaker-dependent continuous {\em eojeol} (Korean word) recognition and integrated morphological analysis can be achieved with over 80.6% success rate directly from speech inputs for the middle-level vocabularies.Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journa

    Content enrichment through dynamic annotation

    Get PDF
    This paper describes a technique for interceding between users and the information that they browse. This facility, that we term 'dynamic annotation', affords a means of editing Web page content 'on-the-fly' between the source Web server and the requesting client. Thereby, we have a generic way of modifying the content displayed to local users by addition, removal or reorganising any information sourced from the World-Wide Web, whether this derives from local or remote pages. For some time, we have been exploring the scope for this device and we believe that it affords many potential worthwhile applications. Here, we describe two varieties of use. The first variety focuses on support for individual users in two contexts (second-language support and second language learning). The second variety of use focuses on support for groups of users. These differing applications have a common goal which is content enrichment of the materials placed before the user. Dynamic annotation provides a potent and flexible means to this end

    A cascaded approach to normalising gene mentions in biomedical literature

    Get PDF
    Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, where pre-processing of both gene synonyms in the databases and gene mentions in text are first applied. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results with F-measure of up to 81.20%
    corecore