Search CORE

72,683 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Abel
Amalfitano
Balduzzi
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Baumgartner
Berger
Berthold
Bettencourt
Califf
Catanese
Chang
Chen
Chen
Chen
Collins
Conover
Crandall
Crescenzi
Crescenzi
Dalvi
Dalvi
De Meo
De Meo
Doan
Emilio Ferrara
Ferrara
Ferrara
Ferrara
Ferrara
Ferrara
Flesca
Freitag
Furche
Gatterbauer
Gatterbauer
Giacomo Fiumara
Gjoka
Gkotsis
Gottlob
Gottlob
Hammersley
Han
Hecht
Hsu
Irmak
Khare
Kim
Kinsella
Kleinberg
Kleinberg
Kohlschütter
Kokkoras
Kokkoras
Kokkoras
Krüpl
Kushmerick
Kwak
Laender
Liu
Manning
Masanès
Mathes
Meng
Mislove
Monge
Muslea
Oro
Pan
Pasquale De Meo
Perito
Phan
Plake
Rahm
Rahm
Reis
Robert Baumgartner
Sahuguet
Sarawagi
Schifanella
Selkow
Shi
Soderland
Szomszor
Turmo
Vosecky
Wang
Wang
Weikum
Wilson
Winograd
Yang
Ye
Zafarani
Zanasi
Zhai
Zhang
Zhang
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014
Field of study

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Tracking Cyber Adversaries with Adaptive Indicators of Compromise

Author: Ching-Lung Cheung (5591246)
Danny Chan (8936)
Dino Samartzis (118537)
Jaro Karppinen (81295)
Kathryn Song-Eng Cheah (5591255)
Kazuhiro Chiba (453665)
Kenneth Man-Chee Cheung (3128883)
Pak Chung Sham (181109)
Shiro Ikegawa (106723)
Tatsuki Karasugi (5591249)
Timothy Shin-Heng Mak (5591252)
Xueya Zhou (3990752)
Yan Li (23143)
Yi-Hsiang Hsu (247531)
Yoshiharu Kawaguchi (166748)
You-Qiang Song (165465)
Publication venue
Publication date: 20/12/2017
Field of study

A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution, but remains constant for the naive solution. However, the difference in overall detection performance, as measured by the area under the curve (AUC), between the two methods is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas, Nevada, US

arXiv.org e-Print Archive

FigShare

Tracking Cyber Adversaries with Adaptive Indicators of Compromise

Author: Aimone James B.
Cox Jonathan A.
Dixon Kevin R.
Doak Justin E.
Follett David R.
Ingram Joe B.
James Conrad D.
Mulder Sam A.
Naegle John H.
Publication venue
Publication date: 20/12/2017
Field of study

arXiv.org e-Print Archive

Crossref

A Framework for Specifying and Monitoring User Tasks

Author: Bailey Brian P.
Chang Tony Y.
Chilson Neil A.
Publication venue
Publication date: 01/09/2004
Field of study

Knowledge about user task execution can help systems better reason about when to interrupt users. To enable recognition and forecasting of task execution, we develop a novel framework for specifying and monitoring user task sequences. For task specification, our framework provides an XML-based language with tags inspired by regular expressions. For task monitoring, our framework provides an event handler that manages events from any instrumented application and a monitor that observes a user's transitions within and among specified tasks. The monitor supports multiple active tasks and multiple instances of the same task. The use of our framework will enable systems to consider a user's position within a task model when reasoning about when to interrupt

Illinois Digital Environment for Access to Learning and Scholarship Repository

Terminology Extraction for and from Communications in Multi-disciplinary Domains

Author: Ahmad Khurshid
Musacchio MARIA TERESA
Panizzon Raffaella
Zhang Xiubo
Publication venue
Publication date: 01/01/2016
Field of study

Terminology extraction generally refers to methods and systems for identifying term candidates in a uni-disciplinary and uni-lingual environment such as engineering, medical, physical and geological sciences, or administration, business and leisure. However, as human enterprises get more and more complex, it has become increasingly important for teams in one discipline to collaborate with others from not only a non-cognate discipline but also speaking a different language. Disaster mitigation and recovery, and conflict resolution are amongst the areas where there is a requirement to use standardised multilingual terminology for communication. This paper presents a feasibility study conducted to build terminology (and ontology) in the domain of disaster management and is part of the broader work conducted for the EU project Sland \ub4 ail (FP7 607691). We have evaluated CiCui (for Chinese name \ub4 \u8bcd\u8403, which translates to words gathered), a corpus-based text analytic system that combine frequency, collocation and linguistic analyses to extract candidates terminologies from corpora comprised of domain texts from diverse sources. CiCui was assessed against four terminology extraction systems and the initial results show that it has an above average precision in extracting terms

Archivio istituzionale della ricerca - Università di Trieste

Archivio istituzionale della ricerca - Università di Padova

Teaching Listening Comprehension Through Video in First-Year College Spanish

Author: Liskin-Gasparro Judith E.
Veguez Roberto A.
Publication venue: 'The University of Kansas'
Publication date: 18/02/1990
Field of study

The University of Kansas: Journals@KU

Biodiversity Informatics

Using Social Media to Promote STEM Education: Matching College Students with Role Models

Author: A Bandura
A Bandura
AB Heldman
C Merrill
CJ Owen
D Karunanayake
DM Blei
EA Ensher
G Salton
HV Emmerik
JE Lydon
K Weber
L Tsui
M Hall
P Jaccard
P Lockwood
S Metz
TA Judge
VI Levenshtein
Publication venue
Publication date: 01/07/2016
Field of study

STEM (Science, Technology, Engineering, and Mathematics) fields have become increasingly central to U.S. economic competitiveness and growth. The shortage in the STEM workforce has brought promoting STEM education upfront. The rapid growth of social media usage provides a unique opportunity to predict users' real-life identities and interests from online texts and photos. In this paper, we propose an innovative approach by leveraging social media to promote STEM education: matching Twitter college student users with diverse LinkedIn STEM professionals using a ranking algorithm based on the similarities of their demographics and interests. We share the belief that increasing STEM presence in the form of introducing career role models who share similar interests and demographics will inspire students to develop interests in STEM related fields and emulate their models. Our evaluation on 2,000 real college students demonstrated the accuracy of our ranking algorithm. We also design a novel implementation that recommends matched role models to the students.Comment: 16 pages, 8 figures, accepted by ECML/PKDD 2016, Industrial Trac

arXiv.org e-Print Archive

Crossref