
    Automatic extraction of paraphrastic phrases from medium size corpora

    This paper presents a versatile system intended to acquire paraphrastic phrases from a representative corpus. In order to decrease the time spent elaborating resources for NLP systems (for example Information Extraction, IE hereafter), we suggest using a machine learning system that helps define new templates and associated resources. This knowledge is automatically derived from the text collection, in interaction with a large semantic network.

    Context Aware Textual Entailment

    In conversations, stories, news reporting, and other forms of natural language, understanding requires participants to make assumptions (hypotheses) based on background knowledge, a process called entailment. These assumptions may then be supported, contradicted, or refined as a conversation or story progresses, additional facts become known, and context changes. It is often the case that we do not know an aspect of the story with certainty but rather believe it to be the case; i.e., what we know is associated with uncertainty or ambiguity. In this research a method has been developed to identify the different contexts of the input raw text along with specific features of those contexts, such as time, location, and objects. The method includes a two-phase SVM classifier with a voting mechanism in the second phase to identify the contexts. Rule-based algorithms were utilized to extract the context elements. This research also develops a new context-aware text representation. This representation maintains the semantic aspects of sentences as well as textual contexts and context elements. The method can offer both a graph representation and a First-Order Logic representation of the text. This research also extracts First-Order Logic (FOL) and XML representations of a text or series of texts. The method includes entailment using background knowledge from external sources (VerbOcean and WordNet), with resolution of conflicts between extracted clauses, and handling of the role of context in resolving uncertain truth.

    Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories

    The strict character of most existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes particularly important. Given that most metrics look for exact correspondences, the evaluation results are often misleading concerning translation quality. Apart from that, existing metrics do not permit one to make a conclusive estimation of the impact of Word Sense Disambiguation techniques on MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR’s synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and in the correlation with human judgments of translation quality, compared to precision-based metrics.

    DLSITE-1: lexical analysis for solving textual entailment recognition

    This paper discusses the recognition of textual entailment in a text-hypothesis pair by applying a wide variety of lexical measures. We consider that the entailment phenomenon can be tackled at three general levels: lexical, syntactic and semantic. The main goals of this research are to deal with this phenomenon from a lexical point of view and to achieve high results using only this kind of knowledge. To accomplish this, the information provided by the lexical measures is used as a set of features for a Support Vector Machine, which decides whether the entailment relation holds. A study of the most relevant features and a comparison with the best state-of-the-art textual entailment systems are presented throughout the paper. Finally, the system has been evaluated using the Second PASCAL Recognising Textual Entailment Challenge data and evaluation methodology, obtaining an accuracy rate of 61.88%. (Funding: QALL-ME consortium, European Union Sixth Framework Programme, project reference FP6-IST-033860; Government of Spain, CICyT project number TIN2006-1526-C06-01.)
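    The abstract does not enumerate the lexical measures fed into the SVM; purely as an illustration, the sketch below computes two common text-hypothesis overlap features (token overlap and a longest-common-subsequence ratio) of the kind such a lexical entailment system might use. All function names here are our own, not the authors'.

```python
# Illustrative sketch only (not the DLSITE-1 implementation): lexical
# overlap features between a text T and a hypothesis H that could feed
# an SVM-based entailment classifier.

def token_overlap(text, hypothesis):
    """Fraction of hypothesis tokens that also appear in the text."""
    t = set(text.lower().split())
    h = hypothesis.lower().split()
    if not h:
        return 0.0
    return sum(1 for w in h if w in t) / len(h)

def longest_common_subsequence(a, b):
    """Length of the longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def lexical_features(text, hypothesis):
    """Bundle the measures into a feature dict for a downstream classifier."""
    t, h = text.lower().split(), hypothesis.lower().split()
    return {
        "overlap": token_overlap(text, hypothesis),
        "lcs_ratio": longest_common_subsequence(t, h) / max(len(h), 1),
    }

feats = lexical_features("A man is playing a guitar on stage",
                         "A man plays a guitar")
```

    In a full system, a vector of many such scores per text-hypothesis pair would be passed to an SVM trained on annotated entailment pairs.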

    Twitter Event Summarization Using Phrase Reinforcement Algorithm and NLP Features

    Abstract: Social networking sites now deliver news to users faster than newspapers and television. Among the many social networking sites, Twitter allows a large number of users to share and post their views and ideas on any particular event. According to a recent survey, 340 million tweets are sent on Twitter daily across many different topics, and only 4% of posts on Twitter contain relevant news data. It is not possible for any human to read all the posts to extract meaningful information about a specific event. One solution to this problem is to apply a summarization technique. In this paper we use an algorithm based on a frequency-count technique, together with several NLP features, to summarize an event specified by the user. This automatic summarization algorithm handles the numerous, short, dissimilar, and noisy nature of tweets. We believe our novel approach helps users as well as researchers. DOI: 10.17762/ijritcc2321-8169.15020
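    The frequency-count idea described above can be sketched as follows: score each tweet by the average corpus-wide frequency of its words and return the highest-scoring tweet as a one-line summary. This is a toy illustration under our own assumptions, not the paper's Phrase Reinforcement Algorithm.

```python
# Toy frequency-count summarizer: the tweet whose words are most frequent
# across the whole collection is taken as the most representative one.
from collections import Counter

def summarize_tweets(tweets):
    # Corpus-wide word frequencies over all tweets.
    freq = Counter(w for t in tweets for w in t.lower().split())

    def score(tweet):
        toks = tweet.lower().split()
        # Average frequency, so long tweets are not favored automatically.
        return sum(freq[w] for w in toks) / len(toks)

    return max(tweets, key=score)

tweets = [
    "earthquake hits the city center",
    "huge earthquake felt in the city",
    "my cat is sleeping again",
]
summary = summarize_tweets(tweets)
```

    A real system would add the NLP features the paper mentions (e.g., filtering noisy tokens) on top of this scoring step.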

    Machine translation evaluation resources and methods: a survey

    We introduce a Machine Translation (MT) evaluation survey that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: the lexical similarity scenario and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only recently. Subsequently, we also introduce the meta-evaluation methods for MT evaluation, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works (e.g., the GALE program report, 2009, and the EuroMatrix project survey, 2007) in several respects: it introduces recent developments in MT evaluation measures, classifies measures from manual to automatic, covers the recent QE tasks for MT, and organizes the content concisely.
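    As a minimal illustration of the lexical-similarity family the survey describes (precision, recall, F-measure), the sketch below computes clipped unigram precision, recall, and F1 between a candidate translation and a reference. It is a toy example of the general idea, not any specific metric from the survey.

```python
# Clipped unigram precision/recall/F1 between candidate and reference.
from collections import Counter

def unigram_prf(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per word ("clipping"),
    # so a repeated candidate word cannot match more than it appears
    # in the reference.
    matches = sum((cand & ref).values())
    precision = matches / max(sum(cand.values()), 1)
    recall = matches / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = unigram_prf("the cat sat on the mat", "the cat is on the mat")
```

    Metrics such as BLEU extend this idea to higher-order n-grams with a brevity penalty; METEOR adds stemming and synonym matching on top of unigram alignment.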

    TiFi: Taxonomy Induction for Fictional Domains [Extended version]

    Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases and highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest; (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption; and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision, and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.

    An Effective Sentence Ordering Approach For Multi-Document Summarization Using Text Entailment

    With the rapid development of modern technology, electronically available textual information has increased considerably. Summarizing textual information from unstructured text sources manually creates overhead for the user, so a systematic approach is required. Summarization is an approach that focuses on providing the user with a condensed version of the original text, but real-time applications require extended document summarization, i.e., summarizing text drawn from multiple documents. The main focus of multi-document summarization is sentence ordering and ranking, which arranges the sentences collected from multiple documents in order to generate a well-organized summary. An improper order of extracted sentences significantly degrades the readability and understandability of the summary. Existing systems perform multi-document summarization by combining several preference measures, such as chronology, probabilistic, precedence, succession, and topical-closeness experts, to calculate the preference value between sentences. This approach to sentence ordering and ranking does not address context-based similarity between sentences, which is essential for effective summarization. The proposed system addresses this issue through a textual entailment expert system. It builds an entailment model that incorporates cause and effect between sentences in the documents, using a symmetric measure (cosine similarity) and non-symmetric measures such as unigram match, bigram match, longest common subsequence, skip-gram match, and stemming. The proposed system efficiently provides the user with a contextual summary, which significantly improves the readability and understandability of the final coherent summary.
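    Two of the measures named above can be sketched directly: cosine similarity over bag-of-words vectors (symmetric) and bigram match (non-symmetric, since it is computed relative to one sentence's bigrams). This is an illustrative implementation of the general measures only, not the paper's code.

```python
# Sentence-pair similarity measures: symmetric cosine similarity and a
# non-symmetric bigram match, both over simple whitespace tokenization.
import math
from collections import Counter

def cosine_similarity(s1, s2):
    """Symmetric: cosine of the two bag-of-words count vectors."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

def bigram_match(s1, s2):
    """Non-symmetric: fraction of s2's bigrams that also occur in s1."""
    def bigrams(s):
        toks = s.lower().split()
        return set(zip(toks, toks[1:]))
    b1, b2 = bigrams(s1), bigrams(s2)
    return len(b1 & b2) / len(b2) if b2 else 0.0
```

    Unigram match, longest common subsequence, and skip-gram match follow the same pattern, each yielding one score that the entailment model can combine.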