A study on model selection of binary and non-Gaussian factor analysis.
An, Yujia. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 71-76). Abstracts in English and Chinese.

Contents:
Chapter 1  Introduction
  1.1  Background
    1.1.1  Review on BFA
    1.1.2  Review on NFA
    1.1.3  Typical model selection criteria
    1.1.4  New model selection criterion and automatic model selection
  1.2  Our contributions
  1.3  Thesis outline
Chapter 2  Combination of B and BI architectures for BFA with automatic model selection
  2.1  Implementation of BFA using BYY harmony learning with automatic model selection
    2.1.1  Basic issues of BFA
    2.1.2  B-architecture for BFA with automatic model selection
    2.1.3  BI-architecture for BFA with automatic model selection
  2.2  Local minima in B-architecture and BI-architecture
    2.2.1  Local minima in B-architecture
    2.2.2  One unstable result in BI-architecture
  2.3  Combination of B- and BI-architecture for BFA with automatic model selection
    2.3.1  Combine B-architecture and BI-architecture
    2.3.2  Limitations of BI-architecture
  2.4  Experiments
    2.4.1  Frequency of local minima occurring in B-architecture
    2.4.2  Performance comparison for several methods in B-architecture
    2.4.3  Comparison of local minima in B-architecture and BI-architecture
    2.4.4  Frequency of unstable cases occurring in BI-architecture
    2.4.5  Comparison of performance of three strategies
    2.4.6  Limitations of BI-architecture
  2.5  Summary
Chapter 3  A Comparative Investigation on Model Selection in Binary Factor Analysis
  3.1  Binary Factor Analysis and ML Learning
  3.2  Hidden Factor Number Determination
    3.2.1  Using Typical Model Selection Criteria
    3.2.2  Using BYY Harmony Learning
  3.3  Empirical Comparative Studies
    3.3.1  Effects of Sample Size
    3.3.2  Effects of Data Dimension
    3.3.3  Effects of Noise Variance
    3.3.4  Effects of Hidden Factor Number
    3.3.5  Computing Costs
  3.4  Summary
Chapter 4  A Comparative Investigation on Model Selection in Non-Gaussian Factor Analysis
  4.1  Non-Gaussian Factor Analysis and ML Learning
  4.2  Hidden Factor Determination
    4.2.1  Using Typical Model Selection Criteria
    4.2.2  Using BYY Harmony Learning
  4.3  Empirical Comparative Studies
    4.3.1  Effects of Sample Size on Model Selection Criteria
    4.3.2  Effects of Data Dimension on Model Selection Criteria
    4.3.3  Effects of Noise Variance on Model Selection Criteria
    4.3.4  Discussion on Computational Cost
  4.4  Summary
Chapter 5  Conclusions
Bibliography
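The "typical model selection criteria" the thesis compares (AIC, BIC, and the like) score each candidate number of hidden factors by penalized likelihood. A minimal sketch of that idea, using scikit-learn's Gaussian FactorAnalysis as a stand-in for the thesis's BFA/NFA models and a simplified parameter count (loadings plus noise variances; means and rotation indeterminacy ignored):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# synthetic data with 3 latent factors and 10 observed dimensions
n, d, m_true = 500, 10, 3
Z = rng.normal(size=(n, m_true))
A = rng.normal(size=(m_true, d))
X = Z @ A + 0.1 * rng.normal(size=(n, d))

def bic(model, X):
    """BIC = -2 log-likelihood + k log n, with a simplified k."""
    n, d = X.shape
    k = d * model.n_components + d        # loadings + per-dim noise variances
    loglik = model.score(X) * n           # score() is the mean log-likelihood
    return -2.0 * loglik + k * np.log(n)

scores = {m: bic(FactorAnalysis(n_components=m, random_state=0).fit(X), X)
          for m in range(1, 7)}
best = min(scores, key=scores.get)        # factor number with lowest BIC
```

With enough samples and low noise, the penalized score typically bottoms out near the true factor number; the thesis studies exactly how this behavior degrades with small samples, high dimension, and large noise.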
Detecting Family Resemblance: Automated Genre Classification.
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the role of visual layout, stylistic features, and language-model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool populated with documents of the nineteen most popular genres found in our experimental data set.
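As a toy illustration of the language-model-feature route (the snippets, genre labels, and query below are invented stand-ins, not the paper's data set or method), documents can be separated by word statistics alone:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# hypothetical snippets standing in for two of the five genres
docs = [
    "abstract introduction method results references doi",
    "we propose a model experiments show significant gains",
    "invoice quarter revenue fiscal year balance sheet",
    "annual report shareholders earnings quarterly revenue",
]
labels = ["article", "article", "report", "report"]

# bag-of-words tf-idf features feeding a Naive Bayes genre classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)
pred = clf.predict(["experiments and results in our method"])[0]
```

The paper's contribution is comparing such language-model features against visual-layout and stylistic features; this sketch covers only the first of the three.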
Anomaly Detection Based on Indicators Aggregation
Automatic anomaly detection is a major issue in various areas. Beyond mere detection, identifying the source of the problem that produced the anomaly is also essential. This is particularly true in aircraft engine health monitoring, where detecting early signs of failure (anomalies) and helping the engine owner efficiently implement the appropriate maintenance operations (fixing the source of the anomaly) are of crucial importance to reduce the costs attached to unscheduled maintenance. This paper introduces a general methodology for classifying monitoring signals into normal ones and several classes of abnormal ones. The main idea is to leverage expert knowledge by generating a very large number of binary indicators. Each indicator corresponds to a fully parametrized anomaly detector built from parametric anomaly scores designed by experts. A feature selection method keeps only the most discriminant indicators, which are then used as inputs to a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.

Comment: International Joint Conference on Neural Networks (IJCNN 2014), Beijing, China (2014). arXiv admin note: substantial text overlap with arXiv:1407.088
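The pipeline the abstract describes — many thresholded binary indicators, feature selection, then Naive Bayes — can be sketched as follows. This is a minimal illustration on synthetic data, assuming chi-squared scores for selection and invented anomaly scores; the paper's actual indicators come from expert-designed parametric detectors:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
n = 400
y = rng.integers(0, 3, size=n)            # 0 = normal, 1/2 = anomaly classes
# hypothetical expert anomaly scores: higher for more severe classes
anomaly_scores = y[:, None] * 0.8 + rng.normal(size=(n, 5))

# each (score, threshold) pair is one fully parametrized binary indicator
thresholds = np.linspace(-1.0, 3.0, 20)
indicators = (anomaly_scores[:, :, None] > thresholds).reshape(n, -1).astype(int)

# keep the most discriminant indicators, feed them to Naive Bayes
clf = make_pipeline(SelectKBest(chi2, k=10), BernoulliNB())
clf.fit(indicators[:300], y[:300])
acc = clf.score(indicators[300:], y[300:])
```

The result stays interpretable because each selected feature is a readable rule ("score f exceeds threshold t"), and the selection step indirectly tunes the detector parameters, as the abstract notes.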
Learning Determinantal Point Processes
Determinantal point processes (DPPs), which arise in random matrix theory and
quantum physics, are natural models for subset selection problems where
diversity is preferred. Among many remarkable properties, DPPs offer tractable
algorithms for exact inference, including computing marginal probabilities and
sampling; however, an important open question has been how to learn a DPP from
labeled training data. In this paper we propose a natural feature-based
parameterization of conditional DPPs, and show how it leads to a convex and
efficient learning formulation. We analyze the relationship between our model
and binary Markov random fields with repulsive potentials, which are
qualitatively similar but computationally intractable. Finally, we apply our
approach to the task of extractive summarization, where the goal is to choose a
small subset of sentences conveying the most important information from a set
of documents. In this task there is a fundamental tradeoff between sentences
that are highly relevant to the collection as a whole, and sentences that are
diverse and not repetitive. Our parameterization allows us to naturally balance
these two characteristics. We evaluate our system on data from the DUC 2003/04
multi-document summarization task, achieving state-of-the-art results
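The diversity property the abstract relies on can be checked numerically: for a DPP with marginal kernel K, P(S ⊆ Y) = det(K_S), so the joint inclusion probability of two items never exceeds the product of their singleton marginals (repulsion). A minimal sketch with a random synthetic L-ensemble kernel (not the paper's learned conditional DPP):

```python
import numpy as np

rng = np.random.default_rng(0)
# random positive semi-definite L-ensemble kernel over 5 items
B = rng.normal(size=(5, 5))
L = B @ B.T
# marginal kernel: K = L (L + I)^{-1}
K = L @ np.linalg.inv(L + np.eye(5))

def marginal(S):
    """P(S is a subset of the sampled set Y) = det(K_S)."""
    return np.linalg.det(K[np.ix_(S, S)])

p_i, p_j = marginal([0]), marginal([1])
p_ij = marginal([0, 1])
# det(K_{ij}) = K_ii K_jj - K_ij^2 <= K_ii K_jj: items repel each other
```

This negative correlation is exactly what makes DPPs favor diverse subsets, e.g. non-redundant sentence sets in the extractive summarization task above.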
- …