
    NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

    This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting the future impact of scientific publications using NLP-driven features.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd
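
    The retrieval step sketched in this abstract, matching a topic query string against candidate articles, can be illustrated with a simple TF-IDF ranking. The following is a minimal sketch under assumed names, not the thesis implementation, which additionally exploits lexical and discourse structure of the retrieved articles:

        # Minimal sketch (illustrative only): rank candidate articles by TF-IDF
        # cosine similarity to a topic query string.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_articles(topic_query, article_texts, top_k=20):
            vec = TfidfVectorizer(stop_words="english")
            doc_matrix = vec.fit_transform(article_texts)   # one row per candidate article
            query_vec = vec.transform([topic_query])        # query in the same vocabulary
            scores = cosine_similarity(query_vec, doc_matrix)[0]
            ranked = sorted(range(len(article_texts)), key=lambda i: scores[i], reverse=True)
            return ranked[:top_k]                           # indices of the most relevant articles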

    Studying, developing, and experimenting contextual advertising systems

    The World Wide Web has grown rapidly over the last decade and is today a vital part of people's daily lives. The Internet is used for many purposes by an ever-growing number of users, mostly for daily activities, tasks, and services. To meet the needs of users, efficient and effective access to information is required, and the adoption of Information Retrieval and Information Filtering techniques is continuously growing. Information Retrieval (IR) is the field concerned with searching for documents, information within documents, and metadata about documents, as well as searching structured storage, relational databases, and the World Wide Web. Information Filtering deals with the problem of selecting relevant information for a given user, according to her/his preferences and interests. Nowadays, Web advertising is one of the major sources of income for a large number of websites; its main goal is to suggest products and services to the still ever-growing population of Internet users. A significant part of Web advertising consists of textual ads, the ubiquitous short text messages usually marked as sponsored links. There are two primary channels for distributing ads: Sponsored Search (or Paid Search Advertising) and Contextual Advertising (or Content Match). Sponsored Search advertising is the task of displaying ads on the page returned by a Web search engine following a query. Contextual Advertising (CA) displays ads within the content of a generic, third-party webpage. In this thesis I study, develop, and evaluate novel solutions in the field of Contextual Advertising. In particular, I studied and developed novel text summarization techniques, I adopted a novel semantic approach, I studied and adopted collaborative approaches, I began a joint study of Contextual Advertising and Geo-Localization, and I studied the task of advertising in the field of Multi-Modal Aggregation. The thesis is organized as follows. In Chapter 1, we briefly describe the main aspects of Information Retrieval. Chapter 2 then presents the problem of Contextual Advertising and describes the main contributions in the literature. Chapter 3 sketches a typical approach and the evaluation metrics of a Contextual Advertising system. Chapter 4 is devoted to the syntactic aspects, with a focus on text summarization. In Chapter 5 the semantic aspects are taken into account, and a novel approach based on ConceptNet is proposed. Chapter 6 proposes a novel view of CA through the adoption of a collaborative filtering approach. Chapter 7 presents a preliminary study of Geo-Localization, performed in collaboration with the Yahoo! Research center in Barcelona; the goal is to study several techniques for suggesting localized advertising in the field of mobile applications and search engines. Chapter 8 presents joint work with the RAI Centre for Research and Technological Innovation, whose main goal is to study and propose a system of advertising for Multimodal Aggregation data. Chapter 9 ends this work with conclusions and future directions.
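
    The core matching step of a Contextual Advertising system is selecting the textual ads most related to a target page, which may be represented by its summary. The sketch below is an assumed TF-IDF/cosine baseline with illustrative names, not one of the systems developed in the thesis, which add semantic (ConceptNet) and collaborative signals on top of such a baseline:

        # Assumed baseline sketch: rank candidate text ads against a page (or its summary).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_ads(page_text, ads, top_k=3):
            vec = TfidfVectorizer(stop_words="english")
            ad_matrix = vec.fit_transform(ads)        # one row per ad (title + creative + bid phrase)
            page_vec = vec.transform([page_text])     # the target page, or its summary
            scores = cosine_similarity(page_vec, ad_matrix)[0]
            order = scores.argsort()[::-1][:top_k]    # highest-scoring ads first
            return [(ads[i], float(scores[i])) for i in order]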

    Neural Graph Transfer Learning in Natural Language Processing Tasks

    Natural language is essential in our daily lives as we rely on languages to communicate and exchange information. A fundamental goal of natural language processing (NLP) is to let machines understand natural language in order to help or replace human experts in mining knowledge and completing tasks. Many NLP tasks deal with sequential data; for example, a sentence is considered a sequence of words. Very recently, deep learning-based language models (e.g., BERT \citep{devlin2018bert}) achieved significant improvements on many existing tasks, including text classification and natural language inference. However, not all tasks can be formulated using sequence models. Specifically, graph-structured data is also fundamental in NLP, including entity linking, entity classification, relation extraction, abstract meaning representation, and knowledge graphs \citep{santoro2017simple,hamilton2017representation,kipf2016semi}. In this scenario, BERT-based pretrained models may not be suitable. The Graph Convolutional Network (GCN) \citep{kipf2016semi} is a deep neural network model designed for graphs. It has shown great potential in text classification, link prediction, question answering, and more. This dissertation presents novel graph models for NLP tasks, including text classification, prerequisite chain learning, and coreference resolution. We focus on different perspectives of graph convolutional network modeling: for text classification, a novel graph construction method is proposed which allows interpretability of the prediction; for prerequisite chain learning, we propose multiple aggregation functions that utilize neighbors for better information exchange; for coreference resolution, we study how graph pretraining can help when labeled data is limited. Moreover, an important branch is to apply pretrained language models to the mentioned tasks, so this dissertation also focuses on transfer learning methods that generalize pretrained models to other domains, including medical, cross-lingual, and web data. Finally, we propose a new task called unsupervised cross-domain prerequisite chain learning and study novel graph-based methods to transfer knowledge over graphs.
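
    For readers unfamiliar with the GCN cited above, the propagation rule of Kipf and Welling fits in a few lines. This is a minimal NumPy sketch of that standard layer, not code from the dissertation; the shapes and the ReLU nonlinearity are illustrative choices:

        # One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)
        import numpy as np

        def gcn_layer(A, H, W):
            # A: (n, n) adjacency matrix, H: (n, f) node features, W: (f, f') weights
            A_hat = A + np.eye(A.shape[0])              # add self-loops
            d = A_hat.sum(axis=1)
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric degree normalisation
            H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
            return np.maximum(H_next, 0.0)              # ReLU activation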

    Text Summarization Using Semantic Technique

    This work proposes a semantic technique as a new approach for text summarization of online news/journal articles. The text summarization project contains two parts: parsing and semantic analysis. First, for parsing, we set up criteria, such as position and length, to evaluate the importance of sentences within the text. Only sentences with a high importance score are selected; the number of sentences selected depends on how compact users expect the summary output to be. The system combines those sentences into a summary draft. The second part is semantic analysis, where we use a lexical semantic approach. In this part, the system uses the WordNet lexical database to analyze words within the summary draft. This database links words through semantic relations such as synonymy, antonymy, hyponymy, and more. The project concludes with an evaluation process: we evaluate the accuracy of the summary output and compare it with human-written summaries to identify differences.
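
    The two-stage pipeline described above, surface-level sentence scoring followed by WordNet-based analysis of the draft, can be sketched as follows. The weights, the compression ratio, and the helper names are illustrative assumptions rather than the project's actual implementation; NLTK is used here only as a convenient interface to WordNet:

        # Stage 1: score sentences by position and length, keep the top fraction.
        # Stage 2: look up semantically related words in WordNet.
        # May require: nltk.download("punkt"); nltk.download("wordnet")
        import nltk
        from nltk.corpus import wordnet

        def extractive_draft(text, ratio=0.3):
            sents = nltk.sent_tokenize(text)
            scores = []
            for i, s in enumerate(sents):
                position = 1.0 - i / max(len(sents) - 1, 1)   # earlier sentences score higher
                length = min(len(s.split()) / 20.0, 1.0)      # favour reasonably long sentences
                scores.append(0.6 * position + 0.4 * length)  # illustrative weights
            keep = max(1, int(len(sents) * ratio))            # compression chosen by the user
            top = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)[:keep]
            return [sents[i] for i in sorted(top)]            # original order preserved

        def related_words(word):
            # WordNet links words through synonymy, antonymy, hypernymy, ...
            return {lemma.name() for syn in wordnet.synsets(word) for lemma in syn.lemmas()}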

    Research on the automatic construction of the resource space model for scientific literature

    The resource space model is a semantic data model for organizing Web resources based on a classification of resources. The scientific resource space is an application of the resource space model to massive scientific literature resources. The construction of a scientific resource space requires building a category (or concept) hierarchy and classifying resources; manual design suffers from a heavy workload and low efficiency. In this thesis, we propose novel methods to solve the following two problems in the construction of a scientific resource space. 1. Automatic maintenance of a category hierarchy. A category hierarchy needs to evolve dynamically as new resources continually arrive, so as to satisfy the dynamic requirements of the organization and management of resources. We propose an automatic maintenance approach that modifies the category hierarchy according to the hierarchical clustering of resources, and we show the effectiveness of this method through a series of comparison experiments on multiple datasets. 2. Automatic construction of a concept hierarchy. We propose a joint extraction model based on a deep neural network to extract entities and relations from scientific articles and build a concept hierarchy. Experimental results show the effectiveness of the joint model on the SemEval 2017 Task 10 dataset. We also implement a prototype system of the scientific resource space. The prototype system enables comparative summarization of scientific articles. A set of novel comparative summarization methods based on differential topic models (dTM) is proposed in this thesis, and the effectiveness of the dTM-based methods is shown by a series of experimental results.
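
    The first problem above relies on hierarchical clustering of resources to suggest changes to the category hierarchy. The following is an assumed baseline sketch of that idea (TF-IDF vectors plus agglomerative clustering), not the maintenance approach proposed in the thesis; all names and parameters are illustrative:

        # Assumed baseline: cluster resource texts hierarchically and read the
        # dendrogram as a candidate category tree; cut it for a flat view.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from scipy.cluster.hierarchy import linkage, fcluster

        def cluster_resources(resource_texts, n_categories=10):
            X = TfidfVectorizer(stop_words="english").fit_transform(resource_texts).toarray()
            Z = linkage(X, method="ward")                 # agglomerative hierarchy over resources
            labels = fcluster(Z, t=n_categories, criterion="maxclust")
            return Z, labels                              # dendrogram + flat cut into categories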