Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or, more generally, an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections with the official theme structure used by Inria.
Comment: (postprint); This version corrects a couple of errors in authors' names in the bibliography.
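Comparing a clustering with an official typology needs an external agreement measure. The abstract does not name the measure used, so the sketch below assumes two standard ones: purity (how often a cluster's majority theme matches each member's theme) and the adjusted Rand index (chance-corrected pair agreement). The function names and the toy labels are illustrative only.

```python
from collections import Counter
from math import comb


def purity(clusters, themes):
    """Fraction of items whose cluster's majority theme equals their own theme."""
    by_cluster = {}
    for c, t in zip(clusters, themes):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority_total / len(themes)


def adjusted_rand_index(clusters, themes):
    """Chance-corrected agreement between two partitions (1.0 = identical up to relabeling)."""
    n = len(themes)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(clusters, themes)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(clusters).values())
    sum_cols = sum(comb(v, 2) for v in Counter(themes).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate partitions: define agreement as 1.0 by convention
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: six projects, three official themes, three found clusters.
themes = ["A", "A", "B", "B", "C", "C"]
clusters = [0, 0, 1, 1, 1, 2]
print(purity(clusters, themes), adjusted_rand_index(clusters, themes))
```

Purity rewards many small clusters, so it is usually reported alongside a chance-corrected score such as the adjusted Rand index.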
Approaches to conceptual clustering
Methods for conceptual clustering may be explicated from two perspectives. Conceptual clustering methods may be viewed as extensions to techniques of numerical taxonomy, a collection of methods developed by social and natural scientists for creating classification schemes over object sets. Alternatively, conceptual clustering may be viewed as a form of learning by observation or concept formation, as opposed to methods of learning from examples or concept identification. In this paper we survey and compare a number of conceptual clustering methods along dimensions suggested by each of these views. The point we most wish to clarify is that conceptual clustering processes can be explicated as being composed of three distinct but interdependent subprocesses: the process of deriving a hierarchical classification scheme; the process of aggregating objects into individual classes; and the process of assigning conceptual descriptions to object classes. Each subprocess may be characterized along a number of dimensions related to search, thus facilitating a better understanding of the conceptual clustering process as a whole.
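To make the three subprocesses concrete, here is a deliberately small illustration over objects described by nominal attributes. It is an assumption-laden sketch, not any surveyed system: aggregation uses complete linkage on attribute-mismatch counts (a numerical-taxonomy-style choice, whereas conceptual clustering systems typically let the quality of the descriptions guide aggregation), the hierarchy is the sequence of merges, and characterization keeps the attribute-value pairs shared by all members. All function names are hypothetical.

```python
from itertools import combinations


def mismatch(a, b):
    """Number of attributes on which two objects disagree."""
    return sum(1 for attr in a if a[attr] != b[attr])


def characterize(members):
    """Characterization: attribute-value pairs shared by every member of a class."""
    first = members[0]
    return {attr: v for attr, v in first.items() if all(m[attr] == v for m in members)}


def build_hierarchy(objects):
    """Aggregation and hierarchy: repeatedly merge the two closest classes
    (complete linkage on mismatch counts), recording each merged class
    together with its characterization."""
    classes = [((name,), [obj]) for name, obj in objects.items()]
    tree = []
    while len(classes) > 1:
        def linkage(pair):
            (_, xs), (_, ys) = classes[pair[0]], classes[pair[1]]
            return max(mismatch(x, y) for x in xs for y in ys)

        i, j = min(combinations(range(len(classes)), 2), key=linkage)
        names = classes[i][0] + classes[j][0]
        members = classes[i][1] + classes[j][1]
        tree.append((names, characterize(members)))
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)] + [(names, members)]
    return tree


objects = {
    "a": {"shape": "round", "color": "red", "size": "small"},
    "b": {"shape": "round", "color": "red", "size": "large"},
    "c": {"shape": "square", "color": "blue", "size": "large"},
    "d": {"shape": "square", "color": "blue", "size": "small"},
}
for names, description in build_hierarchy(objects):
    print(names, description)
```

Each printed line is one node of the hierarchy with its conceptual description; the root's description is empty because no attribute value is shared by all four objects.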
Methods of conceptual clustering and their relation to numerical taxonomy
Artificial Intelligence (AI) methods for machine learning can be viewed as forms of exploratory data analysis, even though they differ markedly from the statistical methods generally connoted by the term. The distinction between methods of machine learning and statistical data analysis is primarily due to differences in the way techniques of each type represent data and structure within data. That is, methods of machine learning are strongly biased toward symbolic (as opposed to numeric) data representations. We explore this difference within a limited context, devoting the bulk of our paper to the explication of conceptual clustering, an extension to the statistically based methods of numerical taxonomy. In conceptual clustering the formation of object clusters is dependent on the quality of 'higher-level' characterizations, termed concepts, of the clusters. The form of concepts used by existing conceptual clustering systems (sets of necessary and sufficient conditions) is described in some detail. This is followed by descriptions of several conceptual clustering techniques, along with sample output. We conclude with a discussion of how alternative concept representations might enhance the effectiveness of future conceptual clustering systems.
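A concept given as a set of necessary and sufficient conditions can be checked mechanically. The sketch below assumes the simplest case, a conjunction of attribute = value tests: the conditions are necessary for a cluster if every member satisfies them, and sufficient if no object outside the cluster satisfies them. The function names and toy objects are illustrative.

```python
def satisfies(obj, concept):
    """True if the object meets every attribute = value condition of the concept."""
    return all(obj.get(attr) == value for attr, value in concept.items())


def is_necessary(concept, members):
    """Every member of the cluster satisfies the concept."""
    return all(satisfies(m, concept) for m in members)


def is_sufficient(concept, non_members):
    """Satisfying the concept implies membership: no outsider satisfies it."""
    return not any(satisfies(o, concept) for o in non_members)


cluster = [
    {"shape": "round", "color": "red", "size": "small"},
    {"shape": "round", "color": "red", "size": "large"},
]
others = [
    {"shape": "round", "color": "blue", "size": "small"},
    {"shape": "square", "color": "red", "size": "large"},
]
loose = {"shape": "round"}                 # necessary, but also covers a blue outsider
tight = {"shape": "round", "color": "red"}  # necessary and sufficient for this cluster
print(is_necessary(loose, cluster), is_sufficient(loose, others))
print(is_necessary(tight, cluster), is_sufficient(tight, others))
```

The first line prints `True False` and the second `True True`: only the tighter conjunction discriminates the cluster from the other objects.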
Identifying smart design attributes for Industry 4.0 customization using a clustering Genetic Algorithm
Industry 4.0 aims at achieving mass customization at a mass production cost. A key component to realizing this is accurate prediction of customer needs and wants, which remains challenging because smart analytics tools are lacking. This paper investigates this issue in depth and then develops a predictive analytic framework for integrating cloud computing, big data analysis, business informatics, communication technologies, and digital industrial production systems. Computational intelligence in the form of k-means clustering is used to manage relevant big data so that potential customer needs and wants feed into smart designs for targeted productivity and customized mass production. Patterns are identified in big data with k-means clustering, and optimal attributes are selected using genetic algorithms. A car customization case study shows how the approach may be applied and how new clusters may be assigned as knowledge of customer needs and wants grows. This approach offers a number of features suitable for smart design in realizing Industry 4.0.
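The combination of k-means clustering with genetic-algorithm attribute selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's framework: the GA encodes attribute subsets as bitmasks, uses the mean silhouette width of a k-means clustering on the selected columns as its fitness (an assumed criterion; the abstract does not say which one is used), and applies truncation selection, one-point crossover and bit-flip mutation. New records are placed with a simple nearest-centroid rule. `select_attributes` and `nearest` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid (also used to place new customer records)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def silhouette(points, labels):
    """Mean silhouette width; points alone in their cluster (or with no other
    cluster to compare against) contribute 0 by convention."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(sq_dist(p, q) ** 0.5)
        own = by_cluster.get(labels[i])
        others = [sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i]]
        if own and others:
            a, b = sum(own) / len(own), min(others)
            total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def select_attributes(rows, k, pop_size=8, generations=15, seed=0):
    """Hypothetical GA over attribute bitmasks; fitness = silhouette of k-means
    run on the selected columns only. Needs at least two attributes."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        if not any(mask):
            return -1.0  # selecting no attributes is invalid
        pts = [[r[i] for i in range(n) if mask[i]] for r in rows]
        _, labels = kmeans(pts, k, seed=seed)
        return silhouette(pts, labels)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            flip = rng.randrange(n)
            child[flip] = 1 - child[flip]  # bit-flip mutation
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)
```

Silhouette is only one possible fitness; it is comparable across attribute subsets but can favour low-dimensional selections, so a real framework might add a penalty or a domain-specific criterion.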
Multicriteria mapping manual: version 1.0
This Manual offers basic advice on how to do multicriteria mapping (MCM). It suggests how to: go about designing and building a typical MCM project; engage with participants and analyse results – and get the most out of the online MCM tool. Key terms are shown in bold italics and defined and explained in a final Annex.
The online MCM software tool provides its own operational help. So this Manual is more focused on the general approach. There are no rigid rules. MCM is structured, but very flexible. It allows many more detailed features than can be covered here.
MCM users are encouraged to think for themselves and be responsible and creative.
Ant colony optimization approach for the capacitated vehicle routing problem with simultaneous delivery and pick-up
We propose an Ant Colony Optimization (ACO) algorithm for the NP-hard Vehicle Routing Problem with Simultaneous Delivery and Pick-up (VRPSDP). In VRPSDP, commodities are delivered to customers from a single depot using a fleet of identical vehicles, and empty packages are collected from the customers and transported back to the depot. The objective is to minimize the total distance traveled. The algorithm is tested on well-known benchmark problems from the literature. The experimental study indicates that our approach produces results comparable to those reported for these benchmark problems in the literature.
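The defining constraint of VRPSDP is that a vehicle leaves the depot loaded with every delivery on its route and then, at each stop, unloads a delivery and loads a pick-up, so the on-board load changes along the route and must stay within capacity everywhere. The sketch below is a generic ant-system loop built around that check, not the authors' algorithm: node 0 is assumed to be the depot, ants pick the next feasible customer with probability proportional to tau^alpha * (1/distance)^beta, and only the best-so-far solution deposits pheromone. All parameter values and function names are illustrative assumptions.

```python
import math
import random


def route_feasible(route, delivery, pickup, capacity):
    """VRPSDP load check: start with all deliveries on board, then at each customer
    drop delivery[c] and collect pickup[c]; the load must never exceed capacity."""
    load = sum(delivery[c] for c in route)
    if load > capacity:
        return False
    for c in route:
        load += pickup[c] - delivery[c]
        if load > capacity:
            return False
    return True


def total_distance(routes, dist):
    """Objective: summed length of depot (node 0) -> customers -> depot tours."""
    return sum(dist[a][b] for r in routes for a, b in zip([0] + r, r + [0]))


def construct(dist, delivery, pickup, capacity, tau, rng, alpha=1.0, beta=2.0):
    """One ant builds a full solution, opening a new vehicle route whenever
    no remaining customer can be added feasibly."""
    unvisited = set(range(1, len(dist)))
    routes, route, cur = [], [], 0
    while unvisited:
        cands = sorted(c for c in unvisited
                       if route_feasible(route + [c], delivery, pickup, capacity))
        if not cands:
            if not route:
                raise ValueError("a customer exceeds vehicle capacity on its own")
            routes.append(route)  # return to the depot and start a new vehicle
            route, cur = [], 0
            continue
        weights = [tau[cur][c] ** alpha * (1.0 / max(dist[cur][c], 1e-9)) ** beta
                   for c in cands]
        nxt = rng.choices(cands, weights=weights)[0]
        route.append(nxt)
        unvisited.remove(nxt)
        cur = nxt
    if route:
        routes.append(route)
    return routes


def aco_vrpsdp(dist, delivery, pickup, capacity, ants=10, iters=50, rho=0.1, seed=0):
    """Basic ant-system loop: build solutions, evaporate, reinforce the best so far."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best, best_len = None, math.inf
    for _ in range(iters):
        for _ in range(ants):
            sol = construct(dist, delivery, pickup, capacity, tau, rng)
            length = total_distance(sol, dist)
            if length < best_len:
                best, best_len = sol, length
        for row in tau:  # evaporation on every edge
            for j in range(n):
                row[j] *= 1.0 - rho
        for r in best:  # deposit on the edges of the best-so-far solution
            for a, b in zip([0] + r, r + [0]):
                tau[a][b] += 1.0 / max(best_len, 1e-9)
    return best, best_len
```

Published ACO variants for VRPSDP typically add refinements such as local search or different pheromone update rules; this loop only shows the core construct, evaporate, and reinforce cycle.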
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data, while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels of the most informative instances. This paper focuses on AL methods for instance classification problems in multiple instance learning (MIL), where data is arranged into sets, called bags, that are weakly labeled. Most AL methods focus on single instance learning problems. These methods are not suitable for MIL problems because they cannot account for the bag structure of data. In this paper, new methods for bag-level aggregation of instance informativeness are proposed for multiple instance active learning (MIAL). The "aggregated informativeness" method identifies the most informative instances based on classifier uncertainty, and queries the bags incorporating the most information. The other proposed method, called "cluster-based aggregative sampling", clusters data hierarchically in the instance space. The informativeness of instances is assessed by considering bag labels, inferred instance labels, and the proportion of labels that remain to be discovered in clusters. Both proposed methods significantly outperform reference methods in extensive experiments using benchmark data from several application domains. Results indicate that using an appropriate strategy to address MIAL problems yields a significant reduction in the number of queries needed to achieve the same level of performance as single instance AL methods.
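The idea of scoring instances by classifier uncertainty and then querying the bag that carries the most information can be sketched as below. This is an illustration under assumptions, not the paper's exact formulation: instance uncertainty is taken as 1 - |2p - 1| for a predicted positive-class probability p, and a bag's informativeness is assumed to be the sum of its top_k instance uncertainties (all instances when top_k is None). The function names and bag contents are hypothetical; cluster-based aggregative sampling is not shown.

```python
def uncertainty(p):
    """Instance uncertainty from a positive-class probability: 1 at p = 0.5, 0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def bag_informativeness(probs, top_k=None):
    """Aggregate instance uncertainties into one bag score (sum of the top_k largest)."""
    scores = sorted((uncertainty(p) for p in probs), reverse=True)
    return sum(scores if top_k is None else scores[:top_k])


def select_bag_to_query(bags, top_k=None):
    """Return the id of the candidate bag with the highest aggregated informativeness."""
    return max(bags, key=lambda b: bag_informativeness(bags[b], top_k))


# Hypothetical classifier outputs for the instances of three unlabeled bags.
bags = {
    "b1": [0.9, 0.1, 0.95],
    "b2": [0.5, 0.55, 0.1],
    "b3": [0.6, 0.4, 0.45, 0.55],
}
print(select_bag_to_query(bags), select_bag_to_query(bags, top_k=1))
```

The choice of aggregation matters: summing over all instances favours the larger, uniformly uncertain bag `b3`, while keeping only the single most uncertain instance favours `b2`, which contains an instance at p = 0.5.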