Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Nomenclature and Benchmarking Models of Text Classification Models: Contemporary Affirmation of the Recent Literature

Author: Dr. E. Kesavulu Reddy
Venkata Ramana.A
Publication venue: Global Journals Inc. (US)
Publication date: 15/05/2014

In this paper we present automated text classification in text mining, which is gaining greater relevance in various fields every day. Text mining primarily focuses on developing text classification systems able to automatically classify huge volumes of documents comprising unstructured and semi-structured data. The process of retrieval, classification and summarization simplifies the extraction of information by the user. Finding the ideal text classifier, feature generator and dominant feature selection technique, one that outperforms all previous research, has received attention from researchers in areas as diverse as information retrieval, machine learning and the theory of algorithms. To automatically classify and discover patterns from different types of documents, techniques such as Machine Learning, Natural Language Processing (NLP) and Data Mining are applied together. In this paper we review some effective feature selection research and show the results in a table for

Global Journal of Computer Science and Technology (GJCST)

Securely extending and running low-code applications with C#

Author: Brüggemann Lennart
Publication date: 12/07/2023

Low-code development platforms provide an accessible infrastructure for the creation of software by domain experts, also called "citizen developers", without the need for formal programming education. Development is facilitated through graphical user interfaces, although traditional programming can still be used to extend low-code applications, for example when external services or complex business logic need to be implemented that cannot be realized with the features available on a platform. Since citizen developers are usually not specifically trained in software development, they require additional support when writing code, particularly with regard to security and advanced techniques like debugging or versioning. In this thesis, several options to assist developers of low-code applications are investigated and implemented. A framework to quickly build code editor extensions is developed, and an approach to leverage the Roslyn compiler platform to implement custom static code analysis rules for low-code development platforms using the .NET platform is demonstrated. Furthermore, a sample application showing how Roslyn can be used to build a simple, integrated debugging tool, as well as an abstraction of the version control system Git for easier usage by citizen developers, is implemented. Security is a critical aspect when low-code applications are deployed. To provide an overview of possible options to ensure the secure and isolated execution of low-code applications, a threat model is developed and used as the basis for a comparison between OS-level virtualization, sandboxing, and runtime code security implementations.

arXiv.org e-Print Archive

A teachable semi-automatic web information extraction system based on evolved regular expression patterns

Author: Nor Zainah Siau (7169549)
Publication date: 01/01/2014

This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human as a teacher to identify and extract relevant information from semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements.

Loughborough University Institutional Repository

Preface of the Proceedings of WRAP 2004

Author: Thiran Philippe
van den Heuvel Willem-Jan
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004

Repository of the University of Namur

Proceedings of the Workshop on the Wrapper Techniques for Legacy Systems

Author: Thiran Philippe
van den Heuvel Willem-Jan
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004

Repository of the University of Namur

Mobile platforms and multi-mobile platform development

Author: Albert István
Charaf Hassan
Ekler Péter
Forstner Bertalan
Kelényi Imre
Kővári Bence
Lengyel László
Mészáros Tamás
Publication date: 01/01/2014

Mobile devices and mobile applications have a significant effect on the present and future of the software industry. The diversity of mobile platforms necessitates developing the same mobile application for all major mobile platforms, which requires considerable development effort. Mobile application developers are multiplatform developers, but they prioritize platforms, so not all platforms are equally important to them. Appropriate methods, processes and tools are required to support development in order to achieve better productivity. The main motivation of our research is to provide a method that increases development productivity and application quality and also reduces time to market. The paper discusses our model-driven results in the field of multi-mobile platform development.

University of Szeged

Mobile Museum Guides Applications based on Knowledge Graphs

Author: Muromtsev Dmitry
Zamula Dmitry
Zhukova Nataly
Publication venue: CORP – Competence Center of Urban and Regional Planning
Publication date: 01/01/2017

In this paper, we discuss our experience in the design and development of content-consumption-focused mobile applications that use Linked Data as their data source, using the example of a museum guide application developed for the State Russian Museum. We describe our approach to formalizing a model in a dynamically typed programming language (JavaScript) and a way to keep it consistent. The paper describes the system's main component: a framework for generating a model from an ontology. An ontology-based application architecture can facilitate the Domain-Driven Design approach, and we demonstrate how to combine these techniques in practice. Lastly, we discuss the challenges and problems we faced during development, then present our conclusions and future directions for exploration.

REAL CORP