Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology

Author: Despeyroux Thierry
Lechevallier Yves
Trousse Brigitte
Vercoustre Anne-Marie
Publication date: 01/01/2005

This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria. Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliography
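
As an illustration of the final comparison step, the sketch below scores how well a clustering agrees with a reference partition using the Rand index. The abstract does not say which agreement measure the authors used, so the choice of Rand index, the function name `rand_index`, and the toy labels are all assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    items together, or both put them apart.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical data: cluster ids found for five projects, and their official themes.
clusters = [0, 0, 1, 1, 2]
themes = ["A", "A", "B", "B", "B"]
score = rand_index(clusters, themes)
```

A score of 1.0 would mean the clustering reproduces the reference grouping exactly; lower values mean more pairs are grouped differently.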

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Approaches to conceptual clustering

Author: Fisher Douglas
Langley Pat
Publication venue: eScholarship, University of California
Publication date: 12/07/1985

Methods for Conceptual Clustering may be explicated in two lights. Conceptual Clustering methods may be viewed as extensions to techniques of numerical taxonomy, a collection of methods developed by social and natural scientists for creating classification schemes over object sets. Alternatively, conceptual clustering may be viewed as a form of learning by observation or concept formation, as opposed to methods of learning from examples or concept identification. In this paper we survey and compare a number of conceptual clustering methods along dimensions suggested by each of these views. The point we most wish to clarify is that conceptual clustering processes can be explicated as being composed of three distinct but inter-dependent subprocesses: the process of deriving a hierarchical classification scheme; the process of aggregating objects into individual classes; and the process of assigning conceptual descriptions to object classes. Each subprocess may be characterized along a number of dimensions related to search, thus facilitating a better understanding of the conceptual clustering process as a whole

eScholarship - University of California

Methods of conceptual clustering and their relation to numerical taxonomy

Author: Fisher Douglas
Langley Pat
Publication venue: eScholarship, University of California
Publication date: 22/07/1985

Artificial Intelligence (AI) methods for machine learning can be viewed as forms of exploratory data analysis, even though they differ markedly from the statistical methods generally connoted by the term. The distinction between methods of machine learning and statistical data analysis is primarily due to differences in the way techniques of each type represent data and structure within data. That is, methods of machine learning are strongly biased toward symbolic (as opposed to numeric) data representations. We explore this difference within a limited context, devoting the bulk of our paper to the explication of conceptual clustering, an extension to the statistically based methods of numerical taxonomy. In conceptual clustering the formation of object clusters is dependent on the quality of 'higher-level' characterizations, termed concepts, of the clusters. The form of concepts used by existing conceptual clustering systems (sets of necessary and sufficient conditions) is described in some detail. This is followed by descriptions of several conceptual clustering techniques, along with sample output. We conclude with a discussion of how alternative concept representations might enhance the effectiveness of future conceptual clustering systems

eScholarship - University of California

Identifying smart design attributes for Industry 4.0 customization using a clustering Genetic Algorithm

Author: Chen Yi
Flores Saldivar Alfredo
Goh Cindy SF
Li Yun
Yu Hongnian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Industry 4.0 aims at achieving mass customization at a mass production cost. A key component to realizing this is accurate prediction of customer needs and wants, which is, however, a challenging issue due to the lack of smart analytics tools. This paper investigates this issue in depth and then develops a predictive analytic framework for integrating cloud computing, big data analysis, business informatics, communication technologies, and digital industrial production systems. Computational intelligence in the form of a cluster k-means approach is used to manage relevant big data for feeding potential customer needs and wants to smart designs for targeted productivity and customized mass production. The identification of patterns from big data is achieved with cluster k-means and with the selection of optimal attributes using genetic algorithms. A car customization case study shows how it may be applied and where to assign new clusters with growing knowledge of customer needs and wants. This approach offers a number of features suitable to smart design in realizing Industry 4.0
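
For readers unfamiliar with the k-means step mentioned in this abstract, here is a minimal sketch of plain Lloyd-style k-means in standard-library Python. It is not the paper's framework: the genetic-algorithm attribute selection is omitted, the centroids are seeded with the first `k` points purely to keep the example deterministic, and the `customers` data are invented.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Hypothetical two-attribute customer profiles.
customers = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids, labels = kmeans(customers, k=2)
```

In practice the seeding would usually be randomized or use a scheme such as k-means++, and the attribute subset fed to the clustering would be chosen by a separate search.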

Enlighten

Multicriteria mapping manual: version 1.0

Author: Coburn Josie
Stirling Andy
Publication venue: University of Sussex
Publication date: 01/01/2014

This Manual offers basic advice on how to do multicriteria mapping (MCM). It suggests how to: go about designing and building a typical MCM project; engage with participants and analyse results – and get the most out of the online MCM tool. Key terms are shown in bold italics and defined and explained in a final Annex. The online MCM software tool provides its own operational help. So this Manual is more focused on the general approach. There are no rigid rules. MCM is structured, but very flexible. It allows many more detailed features than can be covered here. MCM users are encouraged to think for themselves and be responsible and creative

Crossref

Sussex Research Online

Ant colony optimization approach for the capacitated vehicle routing problem with simultaneous delivery and pick-up

Author: Çatay Bülent
Publication venue: Technical University of Bari
Publication date: 01/09/2006

We propose an Ant Colony Optimization (ACO) algorithm for the NP-hard Vehicle Routing Problem with Simultaneous Delivery and Pick-up (VRPSDP). In VRPSDP, commodities are delivered to customers from a single depot utilizing a fleet of identical vehicles and empty packages are collected from the customers and transported back to the depot. The objective is to minimize the total distance traveled. The algorithm is tested with the well-known benchmark problems from the literature. The experimental study indicates that our approach produces comparable results to those of the benchmark problems in the literature
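
The problem definition in this abstract can be made concrete with a small sketch. The code below does not implement the ACO algorithm; it only shows the two ingredients any VRPSDP heuristic would need: a capacity check that tracks the load as deliveries leave and pick-ups enter the vehicle, and the route length being minimized. The function names, the Euclidean distances, and the toy data are assumptions for illustration.

```python
from math import dist


def route_is_feasible(route, demand, capacity):
    """route: customer ids in visiting order; demand: id -> (delivery, pickup)."""
    # The vehicle leaves the depot carrying every delivery on the route.
    load = sum(demand[c][0] for c in route)
    if load > capacity:
        return False
    for c in route:
        delivery, pickup = demand[c]
        # Drop off the delivery and take on the pick-up at the same stop.
        load = load - delivery + pickup
        if load > capacity:
            return False
    return True


def route_length(route, coords, depot=(0.0, 0.0)):
    """Total distance of depot -> customers in order -> depot."""
    stops = [depot] + [coords[c] for c in route] + [depot]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))


# Hypothetical instance: (delivery, pickup) per customer and planar coordinates.
demand = {"a": (4, 1), "b": (2, 6), "c": (3, 2)}
coords = {"a": (0.0, 3.0), "b": (4.0, 3.0), "c": (4.0, 0.0)}

ok_abc = route_is_feasible(["a", "b", "c"], demand, capacity=10)
ok_bac = route_is_feasible(["b", "a", "c"], demand, capacity=10)
length_abc = route_length(["a", "b", "c"], coords)
```

With these numbers the visiting order matters: serving `b` first overloads the vehicle because its large pick-up arrives before enough deliveries have been dropped off, which is what distinguishes this problem from a delivery-only routing problem.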

Sabanci University Research Database

Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems

Author: Carbonneau Marc-André
Gagnon Ghyslain
Granger Eric
Publication date: 06/10/2017

A growing number of applications, e.g. video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels of most informative instances. This paper focuses on AL methods for instance classification problems in multiple instance learning (MIL), where data is arranged into sets, called bags, that are weakly labeled. Most AL methods focus on single instance learning problems. These methods are not suitable for MIL problems because they cannot account for the bag structure of data. In this paper, new methods for bag-level aggregation of instance informativeness are proposed for multiple instance active learning (MIAL). The aggregated informativeness method identifies the most informative instances based on classifier uncertainty, and queries bags incorporating the most information. The other proposed method, called cluster-based aggregative sampling, clusters data hierarchically in the instance space. The informativeness of instances is assessed by considering bag labels, inferred instance labels, and the proportion of labels that remain to be discovered in clusters. Both proposed methods significantly outperform reference methods in extensive experiments using benchmark data from several application domains. Results indicate that using an appropriate strategy to address MIAL problems yields a significant reduction in the number of queries needed to achieve the same level of performance as single instance AL methods
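
The idea of aggregating instance informativeness at the bag level can be sketched as follows. This is one plausible reading of the description above, not the authors' exact formulation: the uncertainty measure `1 - |2p - 1|`, the use of a plain sum as the aggregate, and all names and numbers are assumptions for illustration.

```python
def instance_uncertainty(p):
    """Uncertainty of a binary classifier output p = P(positive).

    Equals 1 at p = 0.5 (least certain) and 0 at p = 0 or p = 1.
    """
    return 1.0 - abs(2.0 * p - 1.0)


def most_informative_bag(bag_probs):
    """bag_probs: bag id -> list of instance probabilities.

    Returns the bag whose instances carry the largest summed
    uncertainty, together with the score of every bag.
    """
    scores = {
        bag: sum(instance_uncertainty(p) for p in probs)
        for bag, probs in bag_probs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical classifier outputs for the instances of three unlabeled bags.
bag_probs = {
    "bag1": [0.95, 0.90, 0.05],
    "bag2": [0.50, 0.60, 0.90],
    "bag3": [0.10, 0.45],
}
best_bag, bag_scores = most_informative_bag(bag_probs)
```

Summing favours larger bags; averaging or taking the maximum would be alternative aggregates with different biases, and the abstract does not say which the paper uses.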

arXiv.org e-Print Archive