
    Hybrid Approach to English-Hindi Name Entity Transliteration

    Machine translation (MT) research in Indian languages is still in its infancy, and little work has been done on the proper transliteration of named entities in this domain. In this paper we address this issue. We use the English-Hindi language pair for our experiments and adopt a hybrid approach: first, a rule-based step processes each English word and extracts its individual phonemes; then, a statistical step converts each English phoneme into its equivalent Hindi phoneme and, in turn, the corresponding Hindi word. Through this approach we attain 83.40% accuracy. Comment: Proceedings of IEEE Students' Conference on Electrical, Electronics and Computer Sciences 201
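The two-stage pipeline lends itself to a compact illustration. Below is a minimal, hypothetical sketch of the idea: a rule-based grapheme-to-phoneme split followed by a statistical (most-probable) mapping to Hindi phonemes. The rule table, probability table and example output are illustrative assumptions, not the authors' resources.

```python
# Minimal sketch of the hybrid idea, assuming toy rule and probability tables
# (not the authors' resources): a rule-based grapheme-to-phoneme split
# followed by a most-probable statistical mapping to Hindi phonemes.

RULES = {"sh", "ch", "th", "ph", "aa", "ee", "oo"}   # digraphs are split first

def to_phonemes(word):
    """Rule-based split of an English word into phoneme-like units."""
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        if word[i:i + 2] in RULES:
            phonemes.append(word[i:i + 2])
            i += 2
        else:
            phonemes.append(word[i])
            i += 1
    return phonemes

# Hypothetical phoneme translation probabilities, as if estimated from
# aligned English-Hindi name pairs.
PROB = {
    "sh": {"श": 0.9, "ष": 0.1},
    "r":  {"र": 1.0},
    "m":  {"म": 1.0},
    "a":  {"": 1.0},          # inherent vowel, dropped in this toy table
    "aa": {"ा": 0.8, "आ": 0.2},
}

def transliterate(word):
    """Map each English phoneme to its most probable Hindi counterpart."""
    out = []
    for p in to_phonemes(word):
        table = PROB.get(p, {p: 1.0})          # fall back to the raw symbol
        out.append(max(table, key=table.get))  # pick the most probable option
    return "".join(out)

print(transliterate("sharmaa"))  # -> शरमा under these toy tables
```

In the actual system the probability table would be learned from parallel name data rather than written by hand; the sketch only shows how the rule-based and statistical stages fit together.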

    From Semantic Search & Integration to Analytics


    Adaptive Semantic Annotation of Entity and Concept Mentions in Text

    Recent years have seen increased interest in knowledge repositories that are useful across applications, in contrast to the creation of ad hoc or application-specific databases. These knowledge repositories serve as a central provider of unambiguous identifiers and semantic relationships between entities. As such, these shared entity descriptions serve as a common vocabulary to exchange and organize information in different formats and for different purposes. Therefore, there has been remarkable interest in systems that are able to automatically tag textual documents with identifiers from shared knowledge repositories so that the content in those documents is described in a vocabulary that is unambiguously understood across applications. Tagging textual documents according to these knowledge bases is a challenging task. It involves recognizing the entities and concepts that have been mentioned in a particular passage and attempting to resolve the ambiguity of language in order to choose one of many possible meanings for a phrase. There has been substantial work on recognizing and disambiguating entities for specialized applications, or constrained to limited entity types and particular types of text. In the context of shared knowledge bases, since each application has potentially very different needs, systems must have unprecedented breadth and flexibility to ensure their usefulness across applications. Documents may exhibit different language and discourse characteristics, discuss very diverse topics, or require the focus on parts of the knowledge repository that are inherently harder to disambiguate. In practice, for developers looking for a system to support their use case, it is often unclear whether an existing solution is applicable, leading those developers to trial-and-error and ad hoc usage of multiple systems in an attempt to achieve their objective. In this dissertation, I propose a conceptual model that unifies related techniques in this space under a common multi-dimensional framework that enables the elucidation of strengths and limitations of each technique, supporting developers in their search for a suitable tool for their needs. Moreover, the model serves as the basis for the development of flexible systems that have the ability of supporting document tagging for different use cases. I describe such an implementation, DBpedia Spotlight, along with extensions that we performed to the knowledge base DBpedia to support this implementation. I report evaluations of this tool on several well known data sets, and demonstrate applications to diverse use cases for further validation.
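As a concrete illustration of the tagging task the dissertation addresses, the snippet below queries the public DBpedia Spotlight REST endpoint to annotate a sentence with DBpedia identifiers. The endpoint URL, parameter names and JSON fields reflect the commonly documented public service and are assumptions here; a self-hosted deployment may use different settings.

```python
# Hedged sketch: annotate text with DBpedia Spotlight over HTTP.
# The endpoint, parameters and response fields below are assumptions based on
# the commonly documented public service and may differ per deployment.
import requests

def annotate(text, confidence=0.5):
    """Return (surface form, DBpedia URI) pairs found in `text`."""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    resources = resp.json().get("Resources", [])
    return [(r["@surfaceForm"], r["@URI"]) for r in resources]

if __name__ == "__main__":
    for surface, uri in annotate("Berlin is the capital of Germany."):
        print(f"{surface} -> {uri}")
```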

    Multi-Faceted Search and Navigation of Biological Databases


    Cognitive aspects-based short text representation with named entity, concept and knowledge

    Short text is widely seen in applications including the Internet of Things (IoT). The appropriate representation and classification of short text can be severely hampered by its sparsity and shortness. One important solution is to enrich short text representation with cognitive aspects of text, including semantic concept, knowledge, and category. In this paper, we propose an Entity-based Concept Knowledge-Aware (ECKA) representation model which incorporates semantic information into short text representation. ECKA is a multi-level short text semantic representation model, which extracts semantic features at the word, entity, concept and knowledge levels using CNNs. Since the words, entities, concepts and knowledge entities in the same short text carry different cognitive informativeness for short text classification, attention networks capture category-related attentive representations from each level of textual features. The final multi-level semantic representation is formed by concatenating all of these individual-level representations and is used for text classification. Experiments on three tasks demonstrate that our method significantly outperforms the state-of-the-art methods.
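The multi-level design described above can be sketched roughly as follows: one CNN encoder plus an attention layer per level (word, entity, concept, knowledge), with the attended level representations concatenated for classification. This is an illustrative PyTorch sketch under assumed dimensions, not the paper's exact architecture.

```python
# Illustrative sketch of a multi-level CNN + attention + concatenation model.
# Layer sizes, the attention form and the class count are assumptions.
import torch
import torch.nn as nn

class LevelEncoder(nn.Module):
    """CNN over one level's embedded sequence, pooled by soft attention."""
    def __init__(self, emb_dim=100, n_filters=64, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=kernel // 2)
        self.att = nn.Linear(n_filters, 1)       # scores each position

    def forward(self, x):                         # x: (batch, seq, emb_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        w = torch.softmax(self.att(h), dim=1)     # attention weights over seq
        return (w * h).sum(dim=1)                 # (batch, n_filters)

class ECKAStyleClassifier(nn.Module):
    """One encoder per level; concatenate attended representations."""
    def __init__(self, n_classes, n_levels=4, n_filters=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [LevelEncoder(n_filters=n_filters) for _ in range(n_levels)])
        self.fc = nn.Linear(n_levels * n_filters, n_classes)

    def forward(self, levels):                    # list of embedded inputs
        reps = [enc(x) for enc, x in zip(self.encoders, levels)]
        return self.fc(torch.cat(reps, dim=-1))   # concatenated representation

# Toy usage: four levels, batch of 2, sequences of length 10, 100-dim embeddings.
model = ECKAStyleClassifier(n_classes=5)
levels = [torch.randn(2, 10, 100) for _ in range(4)]
print(model(levels).shape)  # torch.Size([2, 5])
```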

    Inter-Personal Relation Extraction Model Based on Bidirectional GRU and Attention Mechanism

    Interpersonal relation extraction is an important part of knowledge extraction and is fundamental to constructing knowledge graphs of people's relationships. Compared with traditional pattern recognition methods, deep learning methods perform markedly better on relation extraction (RE) tasks. At present, research on Chinese relation extraction is mainly based on kernel methods and distant supervision. In this paper, we propose a Chinese relation extraction model based on a Bidirectional GRU network and an Attention mechanism. To reflect the structural characteristics of the Chinese language, the input is represented as word vectors. To address the problem of context memory, a Bidirectional GRU network fuses the input vectors. Word-level features are extracted from each sentence, and the sentence-level feature is obtained through a word-level Attention mechanism. To verify the feasibility of this method, we use distant supervision to collect data from websites and compare our model with existing relation extraction methods. The experimental results show that the Bidirectional GRU with Attention model makes full use of the feature information in sentences, and its accuracy is significantly higher than that of neural network models without an Attention mechanism.
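The architecture the abstract describes, a bidirectional GRU with word-level attention feeding a relation classifier, can be sketched as below. Vocabulary size, hidden dimensions and the exact attention form are illustrative assumptions, not the paper's reported settings.

```python
# Hedged PyTorch sketch of a BiGRU + word-level attention relation classifier.
# Dimensions and the attention form are assumptions for illustration only.
import torch
import torch.nn as nn

class BiGRUAttentionRE(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=128, n_relations=10):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)           # scores each time step
        self.fc = nn.Linear(2 * hidden, n_relations)

    def forward(self, token_ids):                     # (batch, seq)
        h, _ = self.gru(self.emb(token_ids))          # (batch, seq, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)         # word-level attention
        sent = (w * h).sum(dim=1)                     # attended sentence vector
        return self.fc(sent)                          # relation logits

# Toy usage: batch of 2 sentences, 20 word ids each.
model = BiGRUAttentionRE()
logits = model(torch.randint(0, 5000, (2, 20)))
print(logits.shape)  # torch.Size([2, 10])
```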