Search CORE

Semi-monolayer covering rough set on set-valued information systems and its efficient computation

Author: Chen Ning
Luo Junwei
Wang Hui
Wu Zhengjiang
Publication venue: 'Elsevier BV'
Publication date: 01/03/2021

Queen's University Belfast Research Portal

Ulster University's Research Portal

Experiments on Incomplete Data Sets Using Modifications to Characteristic Relation

Author: Alalwani Sumiah A.
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2017

Rough set theory is a useful approach to decision rule induction and is applied to large real-life data sets. Lower and upper approximations of concepts are used to induce rules from incomplete data sets. In this research we study the validity of modifications suggested to the characteristic relation. We discuss the implementation of these modifications and the local definability of each modified set. We show that none of the suggested modification sets is locally definable, except for maximal consistent blocks restricted to data sets with "do not care" conditions. A comparative analysis was conducted for characteristic sets and their modifications in terms of the cardinality of the lower and upper approximations of each concept and the decision rules induced by each modification. Experiments were conducted on four incomplete data sets with lost values and "do not care" conditions. The LEM2 algorithm was implemented to induce certain and possible rules from the incomplete data sets. Ten-fold cross-validation was used to measure the average classification error rate of the induced rules. Our results show no significant difference in the quality of the rules induced by the different modifications.

KU ScholarWorks
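
The characteristic relation that the modifications start from can be illustrated with a small sketch. It assumes the usual definitions from the rough-set literature on incomplete data: "?" marks a lost value, "*" a "do not care" condition, the block of (a, v) holds cases with value v or "*", and the characteristic set K_B(x) intersects the blocks of x's specified values. The table, attribute names, and function names are hypothetical, and the modifications studied in the thesis (such as maximal consistent blocks) are not shown.

```python
LOST = "?"          # lost value: the case belongs to no attribute-value block
DO_NOT_CARE = "*"   # "do not care": the case belongs to every block of that attribute


def attribute_value_block(table, attr, value):
    """Block [(attr, value)]: cases whose value equals `value` or is '*'."""
    return {x for x, row in table.items() if row[attr] in (value, DO_NOT_CARE)}


def characteristic_set(table, x, attrs):
    """K_B(x): intersection of blocks for x's specified values ('?' and '*' impose no restriction)."""
    result = set(table)
    for a in attrs:
        v = table[x][a]
        if v not in (LOST, DO_NOT_CARE):
            result &= attribute_value_block(table, a, v)
    return result


def concept_approximations(table, attrs, concept):
    """Concept lower/upper approximations built from characteristic sets of the concept's members."""
    lower, upper = set(), set()
    for x in concept:
        k = characteristic_set(table, x, attrs)
        upper |= k
        if k <= concept:
            lower |= k
    return lower, upper


# Hypothetical incomplete table: case -> attribute values.
table = {
    1: {"Temperature": "high", "Headache": "?"},
    2: {"Temperature": "high", "Headache": "yes"},
    3: {"Temperature": "*", "Headache": "no"},
    4: {"Temperature": "normal", "Headache": "no"},
}
attrs = ["Temperature", "Headache"]
```

For the concept {1, 2}, only K(2) = {2} lies inside the concept, so the lower approximation is {2}, while the upper approximation is K(1) ∪ K(2) = {1, 2, 3}.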

Global discretization of continuous attributes as preprocessing for machine learning

Author: Chmielewski Michal R.
Grzymala-Busse Jerzy W.
Publication venue: Published by Elsevier Inc.
Publication date: 01/11/1996

Real-life data are usually represented in databases by real numbers. However, most inductive learning methods require attributes with a small number of values. It is therefore necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Discretization methods restricted to a single continuous attribute will be called local, while methods that convert all continuous attributes simultaneously will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method based on cluster analysis is presented and compared experimentally with three known local methods transformed into global ones. Experiments include tenfold cross-validation and leave-one-out methods on ten real-life data sets.

Elsevier - Publisher Connector

KU ScholarWorks
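
The local/global distinction can be made concrete with a sketch. It assumes a simple local method (equal-interval-width cut points on one attribute) and a naive whole-table criterion (the smallest shared number of intervals that keeps the table consistent). This is only an illustration of the terminology: it is not the cluster-analysis method or the local-to-global transformation from the paper, and all function names and data are hypothetical.

```python
from bisect import bisect_right


def equal_width_cutpoints(values, k):
    """Local step: cut one attribute's [min, max] range into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def discretize_table(rows, k):
    """Apply the local method to every attribute; each value becomes its interval index."""
    columns = list(zip(*rows))
    cuts = [equal_width_cutpoints(col, k) for col in columns]
    return [tuple(bisect_right(c, v) for c, v in zip(cuts, row)) for row in rows]


def is_consistent(discrete_rows, decisions):
    """Consistent if identical attribute vectors never carry different decisions."""
    seen = {}
    for row, d in zip(discrete_rows, decisions):
        if seen.setdefault(row, d) != d:
            return False
    return True


def smallest_consistent_k(rows, decisions, max_k=10):
    """Naive whole-table search for the smallest k (shared by all attributes) that stays consistent."""
    for k in range(1, max_k + 1):
        discrete = discretize_table(rows, k)
        if is_consistent(discrete, decisions):
            return k, discrete
    return None  # no k up to max_k works (e.g. the raw data are already inconsistent)


rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
decisions = ["a", "a", "b", "b"]
```

Here one interval per attribute merges all cases, so the table is inconsistent; two intervals (cut points 2.5 and 25.0) already separate the two decision classes.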

Article Segmentation in Digitised Newspapers

Author: Naoum Andrew
Publication venue: Faculty of Engineering and Information Technologies, School of Computer Science
Publication date: 01/01/2020

Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. 
We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with Viterbi (1967). Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities.

Sydney eScholarship
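
Spreading headline labels to similar blocks can be sketched with the iterative propagate-then-clamp scheme. The sketch assumes a precomputed, symmetric block-similarity matrix and uses row-normalised weights, a common variant of the Zhu and Ghahramani update. The matrix, seed labels, and function name are hypothetical; the thesis's similarity embeddings are not reproduced here.

```python
def propagate_labels(similarity, seeds, n_iter=100):
    """similarity: n x n non-negative weights between text blocks.
    seeds: {block index: article id} for headline blocks.
    Returns {block index: predicted article id}."""
    n = len(similarity)
    labels = sorted(set(seeds.values()))
    k = len(labels)

    def clamp(dist):
        # Headline blocks always keep a one-hot distribution on their own article.
        for i, lab in seeds.items():
            dist[i] = [1.0 if other == lab else 0.0 for other in labels]
        return dist

    # Unlabelled blocks start with a uniform distribution over articles.
    dist = clamp([[1.0 / k] * k for _ in range(n)])
    for _ in range(n_iter):
        new = []
        for i in range(n):
            total = sum(similarity[i][j] for j in range(n) if j != i)
            if total == 0:
                new.append(dist[i])  # isolated block: keep its current distribution
                continue
            # Weighted average of the neighbours' label distributions.
            new.append([
                sum(similarity[i][j] * dist[j][c] for j in range(n) if j != i) / total
                for c in range(k)
            ])
        dist = clamp(new)
    return {i: labels[max(range(k), key=lambda c: dist[i][c])] for i in range(n)}


# Blocks 0 and 1 are headlines; block 2 resembles block 0, block 3 resembles block 1.
sim = [
    [0.0, 0.1, 0.9, 0.1],
    [0.1, 0.0, 0.1, 0.9],
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.9, 0.2, 0.0],
]
```

Clamping the seeds after every sweep keeps headline labels fixed while body blocks drift towards the article of their most similar neighbours.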

Machine learning : techniques and foundations

Author: Carbonell Jaime G.
Langley Pat
Publication venue: eScholarship, University of California
Publication date: 30/03/1987

The field of machine learning studies computational methods for acquiring new knowledge, new skills, and new ways to organize existing knowledge. In this paper we present some of the basic techniques and principles that underlie AI research on learning, including methods for learning from examples, learning in problem solving, learning by analogy, grammar acquisition, and machine discovery. In each case, we illustrate the techniques with paradigmatic examples.

eScholarship - University of California

A Generic Library of Problem Solving Methods for Scheduling Applications

Author: Rajpathak Dnyanesh
Publication date: 01/01/2005

In this thesis we propose a generic library of scheduling problem-solving methods. As a first approximation, scheduling can be defined as the assignment of jobs and activities to resources and time ranges in accordance with a number of constraints and requirements. In some cases, optimisation criteria may also be included in the problem specification. Although several attempts have been made in the past to develop libraries of scheduling problem solvers, these provide only limited coverage. Many lack generality because they are tied to a particular scheduling domain. Others simply implement a particular problem-solving technique, which may be applicable only to a subset of the space of scheduling problems. In addition, most of these libraries fail to provide the degree of depth and precision needed both to obtain a formal account of scheduling problem solving and to effectively support the development of scheduling applications by reuse. Our library subscribes to the Task-Method-Domain-Application (TMDA) knowledge modelling framework, which provides a structured organisation for the different components of the library. In line with the organisation proposed by TMDA, we first developed a generic scheduling task ontology, which formalises the space of scheduling problems independently of any particular application domain or problem-solving method. We then constructed a task-specific but domain-independent model of scheduling problem solving, which generalises from the variety of approaches to scheduling problem solving found in the literature. The generic nature of this model was demonstrated by constructing seven scheduling methods as alternative specialisations of the model. Finally, we validated our library on a number of applications to demonstrate its generic nature and its effective support for the analysis and development of scheduling applications.

Open Research Online (The Open University)

OpenGrey Repository
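
The first-approximation definition of scheduling (jobs assigned to resources and time ranges under constraints) can be sketched as a tiny data model plus one problem-solving method. This is a hypothetical illustration: the `Job`/`Assignment` classes and the longest-job-first greedy method are assumptions, not the thesis's task ontology or any of its seven TMDA methods. The only constraints modelled are resource capability and no overlap on a resource.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration: int
    allowed: set[str]  # capability constraint: resources able to perform this job


@dataclass
class Assignment:
    job: str
    resource: str
    start: int
    end: int


def greedy_schedule(jobs, resources):
    """Assign jobs (longest first) to the allowed resource that becomes free earliest.
    Jobs on the same resource run back to back, so they never overlap."""
    free_at = {r: 0 for r in resources}
    plan = []
    for job in sorted(jobs, key=lambda j: -j.duration):
        candidates = [r for r in resources if r in job.allowed]
        if not candidates:
            raise ValueError(f"no resource can perform {job.name}")
        best = min(candidates, key=lambda r: free_at[r])  # ties go to the first listed resource
        start = free_at[best]
        free_at[best] = start + job.duration
        plan.append(Assignment(job.name, best, start, start + job.duration))
    return plan


resources = ["M1", "M2"]
jobs = [
    Job("cut", 3, {"M1", "M2"}),
    Job("weld", 2, {"M2"}),
    Job("paint", 1, {"M1"}),
]
```

Swapping `greedy_schedule` for another function over the same `Job`/`Assignment` model is the kind of method substitution a reusable library aims to support.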

Knowledge Based Systems: A Critical Survey of Major Concepts, Issues, and Techniques

Author: Dominick Wayne D.
Kavi Srinu

This Working Paper Series entry presents a detailed survey of knowledge-based systems. After many years in a relatively dormant state, Artificial Intelligence (AI), the branch of computer science that attempts to make machines emulate intelligent behavior, has only recently begun to achieve practical results. Most of these results can be attributed to the design and use of knowledge-based systems (KBSs, or expert systems): problem-solving computer programs that can reach a level of performance comparable to that of a human expert in a specialized problem domain. These systems can act as consultants for tasks such as medical diagnosis, military threat analysis, and project risk assessment. They possess knowledge that enables them to make intelligent decisions; however, they are not meant to replace human specialists in any particular domain. A critical survey of recent work on interactive KBSs is reported. A case study of a KBS (MYCIN), a list of existing KBSs, and an introduction to the Japanese Fifth Generation Computer Project are provided as appendices. Finally, an extensive set of KBS-related references is provided at the end of the report.

NASA Technical Reports Server
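
How a knowledge-based system reaches a conclusion from rules and facts can be sketched with goal-driven (backward-chaining) inference. The rule and fact names are made up for illustration, and the sketch deliberately omits the certainty factors and consultation dialogue associated with MYCIN; it shows only the basic chaining mechanism.

```python
# Hypothetical rule base: (conclusion, premises that must all hold).
RULES = [
    ("organism_is_bacterial", ["stain_gram_negative", "morphology_rod"]),
    ("suggest_therapy_x", ["organism_is_bacterial", "no_contraindication_x"]),
]


def prove(goal, facts, rules, seen=frozenset()):
    """Backward chaining: the goal holds if it is a known fact, or if some rule
    concludes it and every premise of that rule can itself be proved."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rules
        return False
    seen = seen | {goal}
    return any(
        all(prove(p, facts, rules, seen) for p in premises)
        for conclusion, premises in rules
        if conclusion == goal
    )


facts = {"stain_gram_negative", "morphology_rod", "no_contraindication_x"}
```

Starting from the goal and working back to known facts mirrors how a consultant system asks only about the findings relevant to the current hypothesis.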

User-centered visual analysis using a hybrid reasoning architecture for intensive care units

Author: Allart Laurent
Hubert Hervé
Kamsu-Foguem Bernard
Lemdani Mohamed
Mehdaoui Hossein
Ravaux Pierre
Tchuenté Foguem Germaine
Vilhelm Christian
Zennir Youcef
Zitouni Djamel
Publication venue: 'Elsevier BV'
Publication date: 01/12/2012

One problem with Intensive Care Unit information systems is that, in some cases, the data display can become very dense. To preserve overview and readability as data volumes grow, some special features are required (e.g., data prioritization, clustering, and selection mechanisms) together with analytical methods (e.g., temporal data abstraction, principal component analysis, and event detection). This paper addresses the problem of better integrating the visual and analytical methods applied to medical monitoring systems. We present a knowledge- and machine learning-based approach that supports the knowledge discovery process with appropriate analytical and visual methods. Its potential benefit lies in the development of user interfaces for intelligent monitors that can assist with the detection and explanation of new, potentially threatening medical events. The proposed hybrid reasoning architecture provides an interactive graphical user interface for adjusting the parameters of the analytical methods according to the user's task at hand. The action sequences performed by the user on the graphical user interface are consolidated in a dynamic knowledge base with a specific hybrid reasoning that integrates symbolic and connectionist approaches. These sequences of expert knowledge acquisition can be very efficient at helping knowledge emerge during a similar experience and can positively impact the monitoring of critical situations. The graphical user interface, which incorporates user-centered visual analysis, is used to facilitate a natural and effective representation of clinical information for patient care.

Open Archive Toulouse Archive Ouverte
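
Temporal data abstraction, one of the analytical methods listed above, can be sketched as state abstraction: numeric samples are mapped to qualitative states and consecutive samples in the same state are merged into intervals. The thresholds, sample values, and function name are hypothetical and are not clinical reference ranges or the paper's implementation.

```python
def state_abstraction(samples, low, high):
    """samples: (time, value) pairs sorted by time.
    Returns (state, start_time, end_time) intervals with state in {'low', 'normal', 'high'}."""
    intervals = []
    for t, v in samples:
        state = "low" if v < low else "high" if v > high else "normal"
        if intervals and intervals[-1][0] == state:
            # Same state as the previous sample: extend the current interval.
            intervals[-1] = (state, intervals[-1][1], t)
        else:
            intervals.append((state, t, t))
    return intervals


# Hypothetical heart-rate samples (minute, beats per minute) with illustrative thresholds.
samples = [(0, 72), (1, 75), (2, 110), (3, 115), (4, 80)]
```

Displaying a few labelled intervals instead of every raw sample is one way such abstractions reduce display density.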

Generating approximate region boundaries from heterogeneous spatial information: an evolutionary approach

Author: Schockaert Steven
Smart Philip
Twaroch Florian
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Spatial information takes different forms in different applications, ranging from accurate coordinates in geographic information systems to the qualitative abstractions used in artificial intelligence and spatial cognition. As a result, existing spatial information processing techniques tend to be tailored to one type of spatial information and cannot readily be extended to cope with the heterogeneity of spatial information that often arises in practice. In applications such as geographic information retrieval, on the other hand, approximate boundaries of spatial regions need to be constructed using whatever spatial information can be obtained. Motivated by this observation, we propose a novel methodology for generating spatial scenarios that are compatible with the available knowledge. By suitably discretizing space, this task is translated into a combinatorial optimization problem, which is solved using a hybrid of two well-known meta-heuristics: genetic algorithms and ant colony optimization. The result is a flexible method that can cope with both quantitative and qualitative information and can easily be adapted to the needs of specific applications. Experiments with geographic data demonstrate the potential of the approach.

Online Research @ Cardiff

Ghent University Academic Bibliography
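
The reduction of region-boundary generation to combinatorial optimization can be sketched under explicit assumptions: space is discretized into a grid, a candidate region is one bit per cell, and a hypothetical fitness rewards agreement with cells known to be inside or outside and penalizes isolated cells. Only a plain genetic algorithm (elitism, one-point crossover, bit-flip mutation) is shown; the ant colony optimization half of the paper's hybrid, and its actual encoding and fitness, are not reproduced.

```python
import random


def fitness(bits, width, inside, outside):
    """Score a candidate region encoded as one bit per grid cell (1 = in region)."""
    height = len(bits) // width
    # Reward agreement with cells known to lie inside / outside the region.
    score = sum(bits[i] for i in inside) + sum(1 - bits[i] for i in outside)
    # Penalise region cells with no 4-neighbour in the region (favours compact shapes).
    for i, b in enumerate(bits):
        if b:
            r, c = divmod(i, width)
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if not any(0 <= rr < height and 0 <= cc < width and bits[rr * width + cc]
                       for rr, cc in neighbours):
                score -= 1
    return score


def genetic_search(width, height, inside, outside,
                   pop_size=30, generations=60, p_mut=0.02, seed=0):
    """Return the best bit string found and the best fitness per generation."""
    rng = random.Random(seed)
    n = width * height

    def score(ind):
        return fitness(ind, width, inside, outside)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        next_pop = pop[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the better half
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score), history
```

Because the best individuals are carried over unchanged, the best fitness never decreases from one generation to the next; an ant colony component would typically add guided construction of new candidates on top of this loop.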