
1,580 research outputs found

Relation Extraction Using Convolution Tree Kernel Expanded with Entity Features

Author: Qian Longhua
Qian Peide
Zhou Guodong
Zhu Qiaomin
Publication venue: The Korean Society for Language and Information (KSLI)
Publication date: 01/01/2007

PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

Waseda University Repository

Survey on Kernel-Based Relation Extraction

Author: Choi Sung-Pil
Jung Hanmin
Lee Seungwoo
Song Sa-Kwang
Publication venue: 'IntechOpen'
Publication date: 21/11/2012

IntechOpen

Document Layout Analysis and Recognition Systems

Author: Kosaraju Sai
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 08/11/2019

Automatically extracting knowledge relevant to domain-specific questions from Optical Character Recognition (OCR) documents is critical for developing intelligent systems such as document search engines, sentiment analysis, and information retrieval, because manual knowledge extraction by a domain expert from a large volume of documents is labor-intensive, unscalable, and time-consuming. A number of studies and tools, such as ABBYY and Stanford Natural Language Processing (NLP), have automatically extracted relevant knowledge from OCR documents. Despite this progress, limitations remain to be solved. For instance, NLP often fails to analyze a large document. In this thesis, we propose a knowledge extraction framework that takes domain-specific questions as input and returns the sentence or paragraph in the document most relevant to each question. Our proposed framework has two phases. First, an OCR document is reconstructed into a semi-structured document (a document with a hierarchical structure of (sub)sections and paragraphs). Then, the relevant sentence or paragraph for a given question is identified in the reconstructed semi-structured document. Specifically, we propose (1) a method that converts an OCR document into a semi-structured document using text attributes such as font size, font height, and boldface (in Chapter 2), (2) an image-based machine learning method that extracts the Table of Contents (TOC) to provide an overall structure of the document (in Chapter 3), (3) a document texture-based deep learning method (DoT-Net) that classifies types of blocks such as text, image, and table (in Chapter 4), and (4) a Question & Answer (Q&A) system that retrieves the most relevant sentence or paragraph for a domain-specific question. A large number of document intelligence systems can benefit from our proposed automatic knowledge extraction system to construct a Q&A system for OCR documents.
Our Q&A system has been applied to extract domain-specific information from business contracts at GE Power.
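
The first phase described in this abstract, turning OCR output into a hierarchy of (sub)sections and paragraphs from text attributes, can be illustrated with a minimal sketch. This is not the thesis's method: the `Line` and `Section` types, the `build_outline` function, and the heuristic (the most frequent font size is body text, larger sizes are headings, bold body-size lines are the lowest heading level) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Line:
    """One OCR text line with the attributes the abstract mentions (hypothetical type)."""
    text: str
    font_size: float
    bold: bool = False


@dataclass
class Section:
    """A node in the reconstructed semi-structured document (hypothetical type)."""
    title: str
    level: int
    paragraphs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_outline(lines):
    """Group lines into nested sections using font size and boldface only."""
    root = Section("ROOT", 0)
    if not lines:
        return root

    # Assumption: the most frequent font size belongs to body text.
    body_size = Counter(l.font_size for l in lines).most_common(1)[0][0]
    # Larger sizes are headings; the largest size is level 1.
    heading_sizes = sorted(
        {l.font_size for l in lines if l.font_size > body_size}, reverse=True
    )

    def level_of(line):
        if line.font_size > body_size:
            return heading_sizes.index(line.font_size) + 1
        if line.bold:
            # Bold text at body size is treated as the deepest heading level.
            return len(heading_sizes) + 1
        return None  # plain body text

    stack = [root]
    for line in lines:
        level = level_of(line)
        if level is None:
            stack[-1].paragraphs.append(line.text)
            continue
        # Close sections at the same or a deeper level before opening a new one.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(line.text, level)
        stack[-1].children.append(section)
        stack.append(section)
    return root


sample = [
    Line("Contract", 18),
    Line("1 Scope", 14),
    Line("Body a", 10),
    Line("Body b", 10),
    Line("Terms", 10, bold=True),
    Line("Body c", 10),
    Line("2 Payment", 14),
    Line("Body d", 10),
    Line("Body e", 10),
]
outline = build_outline(sample)
```

A real OCR pipeline would need tolerance for noisy font measurements, which this sketch ignores by comparing sizes exactly.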

DigitalCommons@Kennesaw State University

Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Author: Becker Hila
Flora
Ji Heng
Khandpur Rupinder P.
Lee Wenke
Li Frank
Liu Yang
Modi A.
Muthiah Sathappan
Ovelgonne Michael
Rehurek Radim
Sabottke Carl
Soska Kyle
Tanev Hristo
Weller-Fahy David J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2017

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods. Comment: 13 single column pages, 5 figures, submitted to KDD 201

arXiv.org e-Print Archive

Crossref

Protein-Protein Interactions Extraction from Biomedical Literatures

Author: Hongfei Lin
Yanpeng Li
Zhihao Yang
Publication venue: 'IntechOpen'
Publication date: 08/01/2011

IntechOpen

PPI-IRO: A two-stage method for protein-protein interaction extraction based on interaction relation ontology

Author: Chen P
Li CX
Li J
Su YR
Wang RJ
Wang XJ
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2014

Mining Protein-Protein Interactions (PPIs) from the fast-growing biomedical literature has proven to be an effective approach for the identification of biological regulatory networks. This paper presents a novel method based on the idea of Interaction Relation Ontology (IRO), which specifies and organises words describing various protein interaction relationships. Our method is a two-stage PPI extraction method. First, IRO is applied in a binary classifier to determine whether sentences contain a relation or not. Then, IRO is used to guide PPI extraction by building a sentence dependency parse tree. Comprehensive and quantitative evaluations and detailed analyses are used to demonstrate the significant performance of IRO on relation sentence classification and PPI extraction. Our PPI extraction method yielded a recall of around 80% and 90% and an F1 of around 54% and 66% on the AIMed and BioInfer corpora, respectively, which are superior to most existing extraction methods. Copyright © 2014 Inderscience Enterprises Ltd
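
The abstract reports recall and F1 but not precision. Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the implied precision can be recovered as P = F1 · R / (2R − F1). The sketch below applies that identity to the rounded figures quoted above; the function name is hypothetical, and since the abstract gives only approximate values ("around"), the results are rough estimates and not numbers reported by the paper.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given recall R and F1 as fractions."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        # No valid precision exists: F1 can never reach or exceed 2R.
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


# Rounded values taken from the abstract (recall, F1).
reported = {"AIMed": (0.80, 0.54), "BioInfer": (0.90, 0.66)}
implied = {
    corpus: precision_from_recall_f1(recall, f1)
    for corpus, (recall, f1) in reported.items()
}
```

With these inputs the implied precision comes out at roughly 41% for AIMed and 52% for BioInfer, which suggests the method trades precision for high recall.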

OPUS - University of Technology Sydney