1,257 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

    Nomenclature and Benchmarking Models of Text Classification Models: Contemporary Affirmation of the Recent Literature

    Get PDF
    In this paper we present automated text classification in text mining that is gaining greater relevance in various fields every day Text mining primarily focuses on developing text classification systems able to automatically classify huge volume of documents comprising of unstructured and semi structured data The process of retrieval classification and summarization simplifies extract of information by the user The finding of the ideal text classifier feature generator and distinct dominant technique of feature selection leading all other previous research has received attention from researchers of diverse areas as information retrieval machine learning and the theory of algorithms To automatically classify and discover patterns from the different types of the documents 1 techniques like Machine Learning Natural Language Processing NLP and Data Mining are applied together In this paper we review some effective feature selection researches and show the results in a table for

    Securely extending and running low-code applications with C#

    Full text link
    Low-code development platforms provide an accessible infrastructure for the creation of software by domain experts, also called "citizen developers", without the need for formal programming education. Development is facilitated through graphical user interfaces, although traditional programming can still be used to extend low-code applications, for example when external services or complex business logic needs to be implemented that cannot be realized with the features available on a platform. Since citizen developers are usually not specifically trained in software development, they require additional support when writing code, particularly with regard to security and advanced techniques like debugging or versioning. In this thesis, several options to assist developers of low-code applications are investigated and implemented. A framework to quickly build code editor extensions is developed, and an approach to leverage the Roslyn compiler platform to implement custom static code analysis rules for low-code development platforms using the .NET platform is demonstrated. Furthermore, a sample application showing how Roslyn can be used to build a simple, integrated debugging tool, as well as an abstraction of the version control system Git for easier usage by citizen developers, is implemented. Security is a critical aspect when low-code applications are deployed. To provide an overview over possible options to ensure the secure and isolated execution of low-code applications, a threat model is developed and used as the basis for a comparison between OS-level virtualization, sandboxing, and runtime code security implementations

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    Get PDF
    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. This uses a human as a teacher to identify and extract relevant information from the semi-structured HTML webpages. Regular expressions, which have been chosen as the pattern matching tool, are automatically generated based on the training data to provide an improved grammar and lexicon. This particularly benefits the GP system which may need to extend its lexicon in the presence of new tokens in the web pages. These tokens allow the GP method to produce new extraction patterns for new requirements

    Preface of the Proceedings of WRAP 2004

    Get PDF

    Mobile platforms and multi-mobile platform development

    Get PDF
    Mobile devices and mobile applications have a significant effect on the present and on the future of the software industry. The diversity of mobile platforms necessitates the development of the same mobile application for all major mobile platforms, which requires considerable development effort. Mobile application developers are multiplatform developers, but they prioritize the platforms, therefore, not all platforms are equally important for them. Appropriate methods, processes and tools are required to support the development in order to achieve better productivity. The main motivation of our research activity is to provide a method, which increases the development productivity and the quality of the applications and also reduces the time to market. The paper discusses our model-driven results on the field of multi-mobile platform development

    Mobile Museum Guides Applications based on Knowledge Graphs

    Get PDF
    In the paper, we discuss our experience in design and development of a content consumption focused mobile applications with data sources in the form of Linked Data by the example of developing museum guide application for The State Russian Museum. We describe our approach to formalizing a model in a dynamictyped programming language (JavaScript) and the way to keep it consistent. The paper contains the description of the system’s main component: a framework for generating a model from an ontology. Ontology-based application architecture can facilitate Domain Driven Design approach, and we demonstrate the ways how to combine these techniques in practice. Lastly, we discuss challenges and problems we faced during the development, then present our conclusions and future direction for exploration
    • …
    corecore