1,257 research outputs found
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Nomenclature and Benchmarking Models of Text Classification Models: Contemporary Affirmation of the Recent Literature
In this paper we present automated text classification in text mining that is gaining greater relevance in various fields every day Text mining primarily focuses on developing text classification systems able to automatically classify huge volume of documents comprising of unstructured and semi structured data The process of retrieval classification and summarization simplifies extract of information by the user The finding of the ideal text classifier feature generator and distinct dominant technique of feature selection leading all other previous research has received attention from researchers of diverse areas as information retrieval machine learning and the theory of algorithms To automatically classify and discover patterns from the different types of the documents 1 techniques like Machine Learning Natural Language Processing NLP and Data Mining are applied together In this paper we review some effective feature selection researches and show the results in a table for
Securely extending and running low-code applications with C#
Low-code development platforms provide an accessible infrastructure for the
creation of software by domain experts, also called "citizen developers",
without the need for formal programming education. Development is facilitated
through graphical user interfaces, although traditional programming can still
be used to extend low-code applications, for example when external services or
complex business logic needs to be implemented that cannot be realized with the
features available on a platform. Since citizen developers are usually not
specifically trained in software development, they require additional support
when writing code, particularly with regard to security and advanced techniques
like debugging or versioning. In this thesis, several options to assist
developers of low-code applications are investigated and implemented. A
framework to quickly build code editor extensions is developed, and an approach
to leverage the Roslyn compiler platform to implement custom static code
analysis rules for low-code development platforms using the .NET platform is
demonstrated. Furthermore, a sample application showing how Roslyn can be used
to build a simple, integrated debugging tool, as well as an abstraction of the
version control system Git for easier usage by citizen developers, is
implemented. Security is a critical aspect when low-code applications are
deployed. To provide an overview over possible options to ensure the secure and
isolated execution of low-code applications, a threat model is developed and
used as the basis for a comparison between OS-level virtualization, sandboxing,
and runtime code security implementations
A teachable semi-automatic web information extraction system based on evolved regular expression patterns
This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. This uses a human as a teacher to identify and extract relevant information from the semi-structured HTML webpages.
Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements
Mobile platforms and multi-mobile platform development
Mobile devices and mobile applications have a significant effect on the present and on the future of the software industry. The diversity of mobile platforms necessitates the development of the same mobile application for all major mobile platforms, which requires considerable development effort. Mobile application developers are multiplatform developers, but they prioritize the platforms, therefore, not all platforms are equally important for them. Appropriate methods, processes and tools are required to support the development in order to achieve better productivity. The main motivation of our research activity is to provide a method, which increases the development productivity and the quality of the applications and also reduces the time to market. The paper discusses our model-driven results on the field of multi-mobile platform development
Mobile Museum Guides Applications based on Knowledge Graphs
In the paper, we discuss our experience in design and development of a content consumption focused mobile
applications with data sources in the form of Linked Data by the example of developing museum guide
application for The State Russian Museum. We describe our approach to formalizing a model in a dynamictyped
programming language (JavaScript) and the way to keep it consistent. The paper contains the
description of the system’s main component: a framework for generating a model from an ontology.
Ontology-based application architecture can facilitate Domain Driven Design approach, and we demonstrate
the ways how to combine these techniques in practice. Lastly, we discuss challenges and problems we faced
during the development, then present our conclusions and future direction for exploration
- …