75,658 research outputs found
Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.
Comment: 8 pages, uses eps
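To make the idea concrete, here is a minimal sketch of pattern-based appositive extraction of the kind such a system could perform; the regex and function below are hypothetical illustrations, not PROFILE's actual method.

```python
import re

def extract_descriptions(text):
    # Toy appositive pattern: "Proper Name, the description," — a crude
    # stand-in for the noun-phrase description extraction described above.
    pattern = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)+), (the [^,.]*)[,.]")
    return {m.group(1): m.group(2) for m in pattern.finditer(text)}
```

A pattern like this fires on newswire sentences such as "Yasser Arafat, the Palestinian leader, arrived in Oslo." and yields the entity-description pair directly.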
Document Filtering for Long-tail Entities
Filtering relevant documents with respect to entities is an essential task in
the context of knowledge base construction and maintenance. It entails
processing a time-ordered stream of documents that might be relevant to an
entity in order to select only those that contain vital information.
State-of-the-art approaches to document filtering for popular entities are
entity-dependent: they rely on and are also trained on the specifics of
differentiating features for each specific entity. Moreover, these approaches
tend to use so-called extrinsic information such as Wikipedia page views and
related entities, which is typically available only for popular head
entities. Entity-dependent approaches based on such signals are therefore
ill-suited as filtering methods for long-tail entities. In this paper we
propose a document filtering method for long-tail entities that is
entity-independent and thus also generalizes to unseen or rarely seen entities.
It is based on intrinsic features, i.e., features that are derived from the
documents in which the entities are mentioned. We propose a set of features
that capture informativeness, entity-saliency, and timeliness. In particular,
we introduce features based on entity aspect similarities, relation patterns,
and temporal expressions and combine these with standard features for document
filtering. Experiments following the TREC KBA 2014 setup on a publicly
available dataset show that our model is able to improve the filtering
performance for long-tail entities over several baselines. Results of applying
the model to unseen entities are promising, indicating that the model is able
to learn the general characteristics of a vital document. The overall
performance across all entities---i.e., not just long-tail entities---improves
upon the state-of-the-art without depending on any entity-specific training
data.
Comment: CIKM2016, Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 201
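As a sketch of what entity-independent, intrinsic features might look like, the code below derives simple saliency, timeliness, and informativeness signals from a document alone; the concrete patterns and weights are invented for illustration and are not the paper's feature set.

```python
import re

def intrinsic_features(doc: str, entity: str) -> dict:
    """Illustrative intrinsic features: all are computed from the document
    itself, so they apply equally to head and long-tail entities."""
    pos = doc.lower().find(entity.lower())
    # Saliency: entities mentioned early in a document tend to be central.
    saliency = 1.0 - pos / len(doc) if pos >= 0 else 0.0
    # Timeliness: crude count of temporal expressions (four-digit years).
    timeliness = len(re.findall(r"\b(?:19|20)\d{2}\b", doc))
    # Informativeness: crude count of relation-pattern verbs.
    relations = len(re.findall(r"\b(?:appointed|founded|acquired|elected)\b",
                               doc, re.I))
    return {"saliency": saliency, "timeliness": timeliness,
            "relations": relations}

def vital_score(feats: dict, weights: dict) -> float:
    # Linear scorer standing in for the trained filtering model.
    return sum(weights[k] * feats[k] for k in weights)

doc = "In 2014 Jane Roe was appointed CEO of Acme. Jane Roe founded two startups."
feats = intrinsic_features(doc, "Jane Roe")
```

A document scoring above a learned threshold would then be kept as potentially vital for the entity, with no entity-specific training required.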
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
This paper presents the results of a study on the semantic constraints
imposed on lexical choice by certain contextual indicators. We show how such
indicators are computed and how correlations between them and the choice of a
noun phrase description of a named entity can be automatically established
using supervised learning. Based on this correlation, we have developed a
technique for automatic lexical choice of descriptions of entities in text
generation. We discuss the underlying relationship between the pragmatics of
choosing an appropriate description that serves a specific purpose in the
automatically generated text and the semantics of the description itself. We
present our work in the framework of the more general concept of reuse of
linguistic structures that are automatically extracted from large corpora. We
present a formal evaluation of our approach and we conclude with some thoughts
on potential applications of our method.
Comment: 7 pages, uses colacl.sty and acl.bst, uses epsfig. To appear in the Proceedings of the Joint 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL'98).
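The correlation learning described above can be pictured as a co-occurrence table between contextual indicators and chosen descriptions; the toy counting scheme and indicator names below are invented for illustration and stand in for the paper's supervised method.

```python
from collections import Counter, defaultdict

def learn_correlations(examples):
    """Count co-occurrences between contextual indicators and the
    noun-phrase description chosen in that context."""
    table = defaultdict(Counter)
    for indicators, description in examples:
        for ind in indicators:
            table[ind][description] += 1
    return table

def choose_description(table, indicators):
    # Vote across the active indicators; pick the description most
    # strongly correlated with them.
    votes = Counter()
    for ind in indicators:
        votes.update(table.get(ind, Counter()))
    return votes.most_common(1)[0][0] if votes else None

table = learn_correlations([
    ({"politics", "senate"}, "the senator"),
    ({"politics", "election"}, "the senator"),
    ({"sports"}, "the coach"),
])
```

At generation time, the indicators computed from the surrounding context select the description whose learned correlation is strongest.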
WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking
We present WISER, a new semantic search engine for expert finding in
academia. Our system is unsupervised and it jointly combines classical language
modeling techniques, based on textual evidence, with the Wikipedia Knowledge
Graph, via entity linking.
WISER indexes each academic author through a novel profiling technique which
models her expertise with a small, labeled and weighted graph drawn from
Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the
author's publications, whereas the weighted edges express the semantic
relatedness among these entities computed via textual and graph-based
relatedness functions. Every node is also labeled with a relevance score which
models the pertinence of the corresponding entity to the author's expertise, and is
computed by means of a proper random-walk calculation over that graph; and with
a latent vector representation which is learned via entity and other kinds of
structural embeddings derived from Wikipedia.
At query time, experts are retrieved by combining classic document-centric
approaches, which exploit the occurrences of query terms in the author's
documents, with a novel set of profile-centric scoring strategies, which
compute the semantic relatedness between the author's expertise and the query
topic via the above graph-based profiles.
The effectiveness of our system is established over a large-scale
experimental test on a standard dataset for this task. We show that WISER
achieves better performance than all the other competitors, thus proving the
effectiveness of modelling the author's profile via our "semantic" graph of
entities. Finally, we comment on the use of WISER for indexing and profiling
the whole research community within the University of Pisa, and its application
to technology transfer in our University.
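One plausible instance of the random-walk relevance scoring mentioned above is a weighted PageRank over the author's entity graph; the formulation and the tiny example graph below are illustrative, not the paper's exact calculation.

```python
def entity_relevance(nodes, edges, damping=0.85, iters=100):
    """Weighted PageRank-style random walk: an entity's relevance grows
    when it is strongly related to other relevant entities."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # Total outgoing weight per node, for transition normalization.
    out = {v: sum(w for (a, _, w) in edges if a == v) for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for a, b, w in edges:
            if out[a] > 0:
                new[b] += damping * rank[a] * w / out[a]
        rank = new
    return rank

# Tiny symmetric graph; edge weights play the role of semantic relatedness.
pairs = [("Entity_Linking", "Wikipedia", 0.9),
         ("Entity_Linking", "Information_Retrieval", 0.5),
         ("Wikipedia", "Information_Retrieval", 0.3)]
nodes = ["Entity_Linking", "Wikipedia", "Information_Retrieval"]
edges = pairs + [(b, a, w) for (a, b, w) in pairs]
rank = entity_relevance(nodes, edges)
```

Because relatedness is symmetric, each undirected pair is added in both directions; the stationary scores then favor entities that sit at the center of the author's expertise graph.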
Together we stand, Together we fall, Together we win: Dynamic Team Formation in Massive Open Online Courses
Massive Open Online Courses (MOOCs) offer a new scalable paradigm for
e-learning by providing students with global exposure and opportunities for
connecting and interacting with millions of people all around the world. Very
often, students work as teams to effectively accomplish course related tasks.
However, due to the lack of face-to-face interaction, it becomes difficult for MOOC
students to collaborate. Additionally, the instructor also faces challenges in
manually organizing students into teams because students flock to these MOOCs
in huge numbers. Thus, the proposed research is aimed at developing a robust
methodology for dynamic team formation in MOOCs, the theoretical framework for
which is grounded at the confluence of organizational team theory, social
network analysis and machine learning. A prerequisite for such an undertaking
is the understanding that each informal tie established among students offers
an opportunity to influence and be influenced.
Therefore, we aim to extract value from the inherent connectedness of students
in the MOOC. These connections carry with them radical implications for the way
students understand each other in the networked learning community. Our
approach will enable course instructors to automatically group students in
teams that have fairly balanced social connections with their peers, well
defined in terms of appropriately selected qualitative and quantitative network
metrics.
Comment: In Proceedings of the 5th IEEE International Conference on Application of Digital Information & Web Technologies (ICADIWT), India, February 2014 (6 pages, 3 figures).
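A minimal sketch of network-aware team assignment is a greedy pass that balances both team size and total social degree; this toy heuristic and its inputs are invented for illustration, whereas the proposed method combines richer qualitative and quantitative network metrics.

```python
def form_teams(students, friendships, n_teams):
    """Greedy assignment: place high-degree students first, always into
    the smallest team, breaking ties by lightest total degree so social
    connections end up roughly balanced across teams."""
    degree = {s: 0 for s in students}
    for a, b in friendships:
        degree[a] += 1
        degree[b] += 1
    teams = [[] for _ in range(n_teams)]
    load = [0] * n_teams  # total degree already assigned to each team
    for s in sorted(students, key=lambda s: -degree[s]):
        smallest = min(len(t) for t in teams)
        i = min((j for j in range(n_teams) if len(teams[j]) == smallest),
                key=lambda j: load[j])
        teams[i].append(s)
        load[i] += degree[s]
    return teams

students = ["ana", "bo", "cy", "di", "ed", "fay"]
friendships = [("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("di", "ed")]
teams = form_teams(students, friendships, 2)
```

Keeping team sizes within one of each other while spreading well-connected students apart is exactly the kind of balance an instructor cannot maintain by hand at MOOC scale.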
A Formal Approach to Exploiting Multi-Stage Attacks based on File-System Vulnerabilities of Web Applications (Extended Version)
Web applications require access to the file-system for many different tasks.
When analyzing the security of a web application, security analysts should
thus consider the impact that file-system operations have on the security of
the whole application. Moreover, the analysis should take into consideration
how file-system vulnerabilities might interact with other vulnerabilities
leading an attacker to break into the web application. In this paper, we first
propose a classification of file-system vulnerabilities, and then, based on
this classification, we present a formal approach that allows one to exploit
file-system vulnerabilities. We give a formal representation of web
applications, databases and file-systems, and show how to reason about
file-system vulnerabilities. We also show how to combine file-system
vulnerabilities and SQL-Injection vulnerabilities for the identification of
complex, multi-stage attacks. We have developed an automatic tool that
implements our approach and we show its efficiency by discussing several
real-world case studies, which are witness to the fact that our tool can
generate, and exploit, complex attacks that, to the best of our knowledge, no
other state-of-the-art tool for the security of web applications can find.
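To illustrate one class that such a vulnerability taxonomy covers, the check below tests whether a user-supplied path escapes the web root (path traversal); this toy predicate assumes POSIX paths and is far simpler than the formal model the paper builds, which also covers databases and multi-stage interactions.

```python
import posixpath

def escapes_web_root(base, user_path):
    """Toy path-traversal check: does a user-supplied relative path
    resolve outside the web root after normalization?"""
    target = posixpath.normpath(posixpath.join(base, user_path))
    return not target.startswith(posixpath.normpath(base) + "/")
```

A request for `../etc/passwd` relative to `/var/www` normalizes to `/etc/passwd` and is flagged; in the paper's setting, such a primitive would be one building block that the formal reasoner chains with, say, an SQL injection to construct a multi-stage attack.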