Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches instead heavily reuse
techniques and algorithms developed in the field of Information Extraction.
This survey aims to provide a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
applications at the Social Web level. At the Enterprise level, Web Data
Extraction techniques emerge as a key tool for performing data analysis in
Business and Competitive Intelligence systems, as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather the large amounts of structured data continuously generated
and disseminated by Web 2.0, Social Media and Online Social Network users,
which offers unprecedented opportunities to analyze human behavior at a very
large scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed for
one domain in other domains.
Comment: Knowledge-Based Systems
Academic writing for IT students
This textbook is intended for Master's and PhD Information Technology students (B1-C1 level of English proficiency). It provides instructions on how to write a research paper in English, together with relevant exercises. The peculiarities of each section of a paper are presented. The exercises are based on real scientific materials taken from peer-reviewed journals. The subject area covers a wide range of Information Technology domains.
The Repeatability Experiment of SIGMOD 2008
SIGMOD 2008 was the first database conference that offered to test submitters' programs against their data to verify the published experiments. This paper discusses the rationale for this effort, the community's reaction, our experiences, and advice for future similar efforts.
Tapjacking Threats and Mitigation Techniques for Android Applications
With the increased dependency on web applications through mobile devices, malicious attack techniques have shifted from traditional web applications running on desktops or laptops (allowing mouse-click-based interactions) to mobile applications running on mobile devices (allowing touch-based interactions). Clickjacking is a type of malicious attack originating in web applications, where victims are lured into clicking seemingly benign objects in web pages; when clicked, unintended actions are performed without the user's knowledge. On mobile devices, users are similarly lured into touching an object of an application, triggering actions the victim never intended. This new form of clickjacking on mobile devices is called tapjacking. Little research thoroughly investigates tapjacking attacks and mitigation techniques on mobile devices. In this thesis, we identify coding practices that can help software practitioners avoid malicious attacks, and we define a detection technique to protect end users from their consequences. We first situate the tapjacking attack type within the broader literature on malware, in particular Android malware, and propose a classification of Android malware. Then, we propose a novel technique based on Kullback-Leibler Divergence (KLD) to identify possible tapjacking behavior in applications. We validate the approach with a set of benign and malicious Android applications. We also implemented a prototype tool for detecting tapjacking attack symptoms using the KLD-based measurement. The evaluation results show that tapjacking can be detected effectively with KLD.
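The abstract does not detail how the KLD measurement is computed, but the general idea of comparing a candidate app's feature distribution against a benign baseline can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the token names, the choice of API-call tokens as features, and the smoothing scheme are all assumptions made for the example.

```python
import math
from collections import Counter

def kld(p_counts, q_counts, smoothing=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two frequency
    distributions given as Counters. Additive smoothing keeps unseen
    items from producing infinite terms."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(vocab)
    q_total = sum(q_counts.values()) + smoothing * len(vocab)
    d = 0.0
    for item in vocab:
        p = (p_counts.get(item, 0) + smoothing) / p_total
        q = (q_counts.get(item, 0) + smoothing) / q_total
        d += p * math.log(p / q)
    return d

# Hypothetical API-call frequencies: a benign baseline vs. a sample that
# touches obscured-window machinery often associated with tapjacking.
baseline = Counter(["onClick", "setText", "onClick", "startActivity"])
sample = Counter(["onTouch", "FLAG_WINDOW_IS_OBSCURED", "onTouch", "onClick"])
print(kld(sample, baseline))  # larger divergence suggests more anomalous behavior
```

A high divergence from the benign profile would flag the sample for closer inspection; the actual feature set and threshold would come from the thesis's evaluation, not from this sketch.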
Achieving the Potential: The Future of Federal e-Rulemaking: A Report to Congress and the President
Federal regulations are among the most important and widely used tools for implementing the laws of the land – affecting the food we eat, the air we breathe, the safety of consumer products, the quality of the workplace, the soundness of our financial institutions, the smooth operation of our businesses, and much more. Despite the central role of rulemaking in executing public policy, both regulated entities (especially small businesses) and the general public find it extremely difficult to follow the regulatory process; actively participating in it is even harder.
E-rulemaking is the use of technology (particularly, computers and the World Wide Web) to: (i) help develop proposed rules; (ii) make rulemaking materials broadly available online, along with tools for searching, analyzing, explaining and managing the information they contain; and (iii) enable more effective and diverse public participation. E-rulemaking has transformative potential to increase the comprehensibility, transparency and accountability of the regulatory process. Specifically, e-rulemaking – effectively implemented – can open the rulemaking process to a broader range of participants, offer easier access to rulemaking and implementation materials, facilitate dialogue among interested parties about policy and enforcement, enhance regulatory coordination, and help produce better decisions that lead to more effective, accepted and enforceable rules. If realized, this vision would greatly strengthen civic participation and our democratic form of government.
Supporting Multi-Domain Model Management
Model-driven engineering has been used in different domains such as software engineering, robotics, and automotive. In this approach, models are the primary artifacts, and it is expected to improve the quality of system specification and design, as well as communication within the development team. Managing models that belong to the same domain might not be a complex task because of the features provided by the available development tools. However, managing interrelated models from different domains is challenging. A robot is an example of such a multi-domain system: to develop one, engineers might need to combine models created by experts from the mechanics, electronics and software domains. These models might be created using the domain-specific tools of each domain, and a change in a model of one domain might impact a model from a different domain, causing inconsistency in the entire system. This thesis therefore aims to facilitate the evolution of models in this multi-domain setting. It starts with a systematic literature review to identify the open issues and the strategies used to manage models from different domains. We identified that making the relationships between models from different domains explicit can support model maintenance, making it easier to recognize which models are affected by a change. The next step was to investigate ways of extracting information from engineering models created using different modeling notations. For this goal, we required a uniform approach that is independent of the peculiarities of the notations. Such an approach can only be based on elements typically present in various modeling notations, i.e., text, boxes, and lines. Thus, we investigated the suitability of optical character recognition (OCR) for extracting textual elements from models from different domains.
We also identified the common errors made by off-the-shelf OCR services, and we proposed two approaches to correct one of these errors. After that, we used name-matching techniques on the textual elements extracted by OCR to identify relationships between models from different domains. Finally, we created an infrastructure that combines all the previous elements into a single tool that can also store the relationships in a structured manner, making it easier to maintain the consistency of an entire system. We evaluated it by means of an observational study with a multidisciplinary team that builds autonomous robots designed to play football.
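The name-matching step described above can be sketched with a simple similarity measure. The thesis does not specify which matching technique it uses; this example assumes fuzzy string similarity (here, Python's standard-library `difflib`), and the element names are invented for illustration, including a typical OCR confusion of "0" for "o".

```python
from difflib import SequenceMatcher

def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None when no
    candidate clears the similarity threshold. Comparison is
    case-insensitive to tolerate notation-specific casing."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

# An OCR-extracted label from a mechanical diagram, matched against
# element names found in a software model (names are hypothetical).
print(best_match("Mot0rController", ["MotorController", "SensorDriver"]))
```

Fuzzy rather than exact matching is the natural choice here because the OCR output may contain the very recognition errors the thesis set out to correct.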