18,698 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

    Extracting corpus specific knowledge bases from Wikipedia

    Get PDF
    Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers--namely, those that are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually-defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval

    Emerging technologies for learning report (volume 3)

    Get PDF

    Integration of BPM systems

    Get PDF
    New technologies have emerged to support the global economy where for instance suppliers, manufactures and retailers are working together in order to minimise the cost and maximise efficiency. One of the technologies that has become a buzz word for many businesses is business process management or BPM. A business process comprises activities and tasks, the resources required to perform each task, and the business rules linking these activities and tasks. The tasks may be performed by human and/or machine actors. Workflow provides a way of describing the order of execution and the dependent relationships between the constituting activities of short or long running processes. Workflow allows businesses to capture not only the information but also the processes that transform the information - the process asset (Koulopoulos, T. M., 1995). Applications which involve automated, human-centric and collaborative processes across organisations are inherently different from one organisation to another. Even within the same organisation but over time, applications are adapted as ongoing change to the business processes is seen as the norm in today’s dynamic business environment. The major difference lies in the specifics of business processes which are changing rapidly in order to match the way in which businesses operate. In this chapter we introduce and discuss Business Process Management (BPM) with a focus on the integration of heterogeneous BPM systems across multiple organisations. We identify the problems and the main challenges not only with regards to technologies but also in the social and cultural context. We also discuss the issues that have arisen in our bid to find the solutions

    Improving maintenance strategies from experience feedback

    Get PDF
    A huge amount of rough data is available in companies on past maintenance activities as a result of the implementation of CMMS (Computerized Maintenance Management System). In that context, we focus on an experience feedback system dedicated to maintenance, allowing the capitalization of past interventions by means of a formal knowledge representation language, and the extraction from these interventions of new knowledge for future reuse

    A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system

    Get PDF
    The advancement of technology had encouraged mankind to design and create useful equipment and devices. These equipment enable users to fully utilize them in various applications. Pulp mill is one of the heavy industries that consumes large amount of electricity in its production. Due to this, any malfunction of the equipment might cause mass losses to the company. In particular, the breakdown of the generator would cause other generators to be overloaded. In the meantime, the subsequence loads will be shed until the generators are sufficient to provide the power to other loads. Once the fault had been fixed, the load shedding scheme can be deactivated. Thus, load shedding scheme is the best way in handling such condition. Selected load will be shed under this scheme in order to protect the generators from being damaged. Multi Criteria Decision Making (MCDM) can be applied in determination of the load shedding scheme in the electric power system. In this thesis two methods which are Analytic Hierarchy Process (AHP) and Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) were introduced and applied. From this thesis, a series of analyses are conducted and the results are determined. Among these two methods which are AHP and TOPSIS, the results shown that TOPSIS is the best Multi criteria Decision Making (MCDM) for load shedding scheme in the pulp mill system. TOPSIS is the most effective solution because of the highest percentage effectiveness of load shedding between these two methods. The results of the AHP and TOPSIS analysis to the pulp mill system are very promising
    • 

    corecore