An agent-driven semantical identifier using radial basis neural networks and reinforcement learning
Because documents are now widely available in digital form, and because the
nature of digital documents and the way they are distributed increases the
potential for deception, the authorship attribution problem has steadily grown
in relevance. Authorship attribution, for both information retrieval and
analysis, has gained great importance in the context of security, trust and
copyright preservation. This work proposes an innovative multi-agent driven
machine learning technique that has been developed for authorship attribution.
By means of a preprocessing step for word grouping and a time-period-related
analysis of the common lexicon, we determine a bias reference level for the
recurrence frequency of the words within analysed texts, and then train a
Radial Basis Neural Networks (RBPNN)-based classifier to identify the correct author. The
main advantage of the proposed approach lies in the generality of the semantic
analysis, which can be applied to different contexts and lexical domains,
without requiring any modification. Moreover, the proposed system is able to
incorporate an external input, meant to tune the classifier, and then
self-adjust by means of continuous learning reinforcement.
Comment: Published in: Proceedings of the XV Workshop "Dagli Oggetti agli
Agenti" (WOA 2014), Catania, Italy, September 25-26, 201
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous, possibly large-vocabulary, speech
recognition system for Korean. Unlike popular n-best techniques, which were
developed mainly for integrating HMM-based speech recognition and natural
language processing at the {\em word level} and are inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on {\em morpheme-level} speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with a
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
{\em eojeol} (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs
for the middle-level vocabularies.
Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journa
Content enrichment through dynamic annotation
This paper describes a technique for interceding between users and the information that they browse. This facility, which we term 'dynamic annotation', affords a means of editing Web page content 'on-the-fly' between the source Web server and the requesting client. We thereby have a generic way of modifying the content displayed to local users by adding, removing or reorganising any information sourced from the World-Wide Web, whether this derives from local or remote pages. For some time, we have been exploring the scope for this device and we believe that it affords many potentially worthwhile applications. Here, we describe two varieties of use. The first variety focuses on support for individual users in two contexts (second-language support and second-language learning). The second variety of use focuses on support for groups of users. These differing applications have a common goal, which is content enrichment of the materials placed before the user. Dynamic annotation provides a potent and flexible means to this end.
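As a rough illustration of the 'on-the-fly' editing step described in this abstract, the sketch below rewrites a page before it would be passed on to the client, appending a gloss after each known word (in the spirit of the second-language support use case). It is not the authors' implementation: the `annotate` function, the `gloss` class name and the glossary format are all hypothetical, and the regex-based tag splitting is a simplification that ignores scripts, comments and character entities.

```python
import html
import re

# Splits a page into alternating text and tag pieces (simplified; not a full parser).
TAG_SPLIT = re.compile(r"(<[^>]+>)")
WORD = re.compile(r"\b\w+\b")


def annotate(page, glossary):
    """Return `page` with a gloss appended after each glossary word.

    `glossary` maps a word to its gloss; matching is case-insensitive.
    Only text between tags is touched, so attribute values stay intact.
    """
    lookup = {word.lower(): gloss for word, gloss in glossary.items()}

    def add_gloss(match):
        word = match.group(0)
        gloss = lookup.get(word.lower())
        if gloss is None:
            return word
        return f'{word} <span class="gloss">({html.escape(gloss)})</span>'

    pieces = TAG_SPLIT.split(page)
    # With one capturing group, odd-indexed pieces are the tags themselves.
    return "".join(
        piece if i % 2 else WORD.sub(add_gloss, piece)
        for i, piece in enumerate(pieces)
    )


if __name__ == "__main__":
    print(annotate('<p title="cat">The cat sat.</p>', {"cat": "chat"}))
```

In a real deployment this function would sit inside a proxy between the Web server and the browser; the proxy itself is left out here because the abstract does not describe it.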
A cascaded approach to normalising gene mentions in biomedical literature
Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first pre-processed. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
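The cascade named in this abstract (exact, then exact-like, then token-based approximate matching) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' rule set: the normalisation rule, the use of Jaccard token overlap, the `min_overlap` threshold, and the identifiers in the usage example are all hypothetical choices made here.

```python
import re


def normalise(name):
    """Lowercase and drop punctuation and spaces (assumed 'exact-like' form)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tokens(name):
    """Split a name into a set of letter runs and digit runs."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))


def link_mention(mention, synonyms, min_overlap=0.5):
    """Map a gene mention to an identifier via a three-stage cascade.

    `synonyms` maps synonym strings to identifiers. Returns a pair
    (identifier or None, name of the stage that produced the match).
    """
    # Stage 1: exact string match.
    if mention in synonyms:
        return synonyms[mention], "exact"

    # Stage 2: exact-like match after normalisation.
    key = normalise(mention)
    for synonym, gene_id in synonyms.items():
        if normalise(synonym) == key:
            return gene_id, "exact-like"

    # Stage 3: token-based approximate match (Jaccard overlap, an assumption).
    # Token sets ignore order, so permuted name components still match.
    mention_tokens = tokens(mention)
    best_id, best_score = None, 0.0
    for synonym, gene_id in synonyms.items():
        synonym_tokens = tokens(synonym)
        union = mention_tokens | synonym_tokens
        score = len(mention_tokens & synonym_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = gene_id, score
    if best_score >= min_overlap:
        return best_id, "token"
    return None, "unmatched"


if __name__ == "__main__":
    # Hypothetical dictionary; the identifiers are placeholders, not real database IDs.
    demo = {"BRCA1": "GENE:A", "tumor necrosis factor alpha": "GENE:B"}
    for text in ["BRCA1", "Brca-1", "necrosis factor alpha tumor", "kinase"]:
        print(text, "->", link_mention(text, demo))
```

Ordering the stages from strictest to loosest means a cheap, high-precision match short-circuits the more permissive ones, which is the usual motivation for a cascade of this kind.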