120,302 research outputs found
A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data
Gene expression analysis aims at identifying the genes able to accurately
predict biological parameters such as disease subtype or
progression. While accurate prediction can be achieved by means of many
different techniques, gene identification, due to gene correlation and the
limited number of available samples, is a much more elusive problem. Small
changes in the expression values often produce different gene lists, and
solutions which are both sparse and stable are difficult to obtain. We propose
a two-stage regularization method able to learn linear models characterized by
a high prediction performance. By varying a suitable parameter, these linear
models make it possible to trade sparsity for the inclusion of correlated
genes and to produce gene lists which are almost perfectly nested. Experimental results on
synthetic and microarray data confirm the interesting properties of the
proposed method and its potential as a starting point for further biological
investigations.
Comment: 17 pages, 8 PostScript figures
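A minimal sketch of the kind of two-stage, l1/l2-regularized selection the abstract describes, using scikit-learn's elastic net as a stand-in (the paper's exact two-stage algorithm is not reproduced here; the synthetic data and all parameter values are assumptions). Varying the `l1_ratio` parameter trades sparsity for the inclusion of correlated features, and the resulting gene lists tend to be nested:

```python
# Hedged illustration: elastic-net-style selection on synthetic "gene" data.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_samples, n_genes = 40, 100
base = rng.normal(size=(n_samples, 1))          # latent biological signal
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] += 3 * base                            # first five genes are correlated and relevant
y = base.ravel() + 0.1 * rng.normal(size=n_samples)

selected = {}
for l1_ratio in (1.0, 0.5, 0.1):                # from sparse (lasso-like) to dense (ridge-like)
    model = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=10000).fit(X, y)
    selected[l1_ratio] = set(np.flatnonzero(model.coef_))

# Denser settings include more correlated genes; sparser lists are (roughly) subsets.
print([len(selected[r]) for r in (1.0, 0.5, 0.1)])
```

In this toy setting the lasso-like end of the path picks a small list containing at least one of the correlated relevant genes, while lower `l1_ratio` values admit more of the correlated group.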
A general guide to applying machine learning to computer architecture
The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. These algorithms can detect complex patterns in data that would be extremely difficult to identify manually, which helps produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced very promising results.
The purpose of this paper is to serve as a foundational base and guide to future computer
architecture research seeking to make use of machine learning models for improving system efficiency.
We describe a method that highlights when, why, and how to utilize machine learning
models for improving system performance and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of generating
data at every execution quantum, together with parameter engineering. This is followed by a survey of a
set of popular machine learning models. We discuss their strengths and weaknesses and provide
an evaluation of implementations for the purpose of creating a workload performance predictor
for different core types in an x86 processor. The predictions can then be exploited by a scheduler
for heterogeneous processors to improve the system throughput. The algorithms of focus are
stochastic gradient descent based linear regression, decision trees, random forests, artificial neural
networks, and k-nearest neighbors.
This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).
Peer reviewed. Postprint (published version).
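A hedged sketch of the kind of workload-performance predictor the abstract evaluates: a regressor mapping per-quantum execution statistics (here, synthetic stand-ins for hardware counters such as small-core IPC and cache miss rate) to performance on a different core type. The feature names, the assumed relation between the counters, and the data are illustrative, not the paper's:

```python
# Hedged illustration: predict big-core performance from small-core counters.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
ipc_small = rng.uniform(0.2, 2.0, n)        # IPC measured on a small core
miss_rate = rng.uniform(0.0, 0.3, n)        # cache miss rate for the quantum
branch_mpki = rng.uniform(0.0, 20.0, n)     # branch mispredictions per kilo-instruction
X = np.column_stack([ipc_small, miss_rate, branch_mpki])

# Assumed relation: big-core IPC scales small-core IPC, penalized by misses.
y = 1.8 * ipc_small - 2.0 * miss_rate - 0.02 * branch_mpki + rng.normal(0, 0.05, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:400], y[:400])
score = model.score(X[400:], y[400:])       # R^2 on held-out quanta
print(round(score, 3))
```

A heterogeneous-processor scheduler could consult such a predictor to decide which core type each workload quantum should run on next.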
Identifying Real Estate Opportunities using Machine Learning
The real estate market is exposed to many fluctuations in prices because of
existing correlations with many variables, some of which cannot be controlled
or might even be unknown. Housing prices can increase rapidly (or in some
cases, also drop very fast), yet the numerous listings available online where
houses are sold or rented are not likely to be updated that often. In some
cases, individuals interested in selling a house (or apartment) might include
it in some online listing, and forget about updating the price. In other cases,
some individuals might be interested in deliberately setting a price below the
market price in order to sell the home faster, for various reasons. In this
paper, we aim at developing a machine learning application that identifies
opportunities in the real estate market in real time, i.e., houses that are
listed with a price substantially below the market price. This program can be
useful for investors interested in the housing market. We have focused on a use
case considering real estate assets located in the Salamanca district in Madrid
(Spain) and listed in the most relevant Spanish online site for home sales and
rentals. The application is formally implemented as a regression problem that
tries to estimate the market price of a house given features retrieved from
public online listings. For building this application, we have performed a
feature engineering stage in order to discover relevant features that allow
for attaining a high predictive performance. Several machine learning
algorithms have been tested, including regression trees, k-nearest neighbors,
support vector machines and neural networks, identifying the advantages and
drawbacks of each of them.
Comment: 24 pages, 13 figures, 5 tables
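An illustrative sketch (not the paper's actual pipeline or data) of the regression-based opportunity detector the abstract describes: estimate the market price of each listing from its features and flag listings priced well below the out-of-fold estimate. The feature set, the assumed price model, and the 0.8 threshold are all assumptions:

```python
# Hedged illustration: flag underpriced listings via an out-of-fold price estimate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n = 300
sqm = rng.uniform(40, 250, n)            # floor area in square metres
rooms = rng.integers(1, 6, n)            # number of rooms
storey = rng.integers(0, 10, n)          # storey of the apartment
X = np.column_stack([sqm, rooms, storey])
fair = 4000 * sqm + 15000 * rooms + 2000 * storey   # assumed fair value (EUR)
listed = fair * rng.uniform(0.7, 1.1, n)            # some listings are underpriced

# Out-of-fold predictions approximate the market price of each listing
# without letting the model memorize that listing's own price.
estimate = cross_val_predict(GradientBoostingRegressor(random_state=0), X, listed, cv=5)
opportunities = np.flatnonzero(listed < 0.8 * estimate)
print(len(opportunities))
```

Using cross-validated rather than in-sample predictions matters here: a model scored on its own training prices would reproduce each listing's price and flag nothing.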
Machine Learning Classification of SDSS Transient Survey Images
We show that multiple machine learning algorithms can match human performance
in classifying transient imaging data from the Sloan Digital Sky Survey (SDSS)
supernova survey into real objects and artefacts. This is a first step in any
transient science pipeline and is currently still done by humans, but future
surveys such as the Large Synoptic Survey Telescope (LSST) will necessitate
fully machine-enabled solutions. Using features trained from eigenimage
analysis (principal component analysis, PCA) of single-epoch g, r and
i-difference images, we can reach a completeness (recall) of 96 per cent, while
only incorrectly classifying at most 18 per cent of artefacts as real objects,
corresponding to a precision (purity) of 84 per cent. In general, random
forests performed best, followed by the k-nearest neighbour and the SkyNet
artificial neural net algorithms, compared to other methods such as na\"ive
Bayes and kernel support vector machine. Our results show that PCA-based
machine learning can match human success levels and can naturally be extended
by including multiple epochs of data, transient colours and host galaxy
information which should allow for significant further improvements, especially
at low signal-to-noise.
Comment: 14 pages, 8 figures. In this version extremely minor adjustments to
the paper were made, e.g. Figure 5 is now easier to view in greyscale.
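A minimal sketch of PCA-based real/artefact classification in the spirit of the abstract: project image cutouts onto principal components ("eigenimages"), then train a random forest and report completeness (recall) and purity (precision). The data here is simulated with a Gaussian PSF-like source standing in for a real transient; the paper's SDSS difference images and exact features are not used:

```python
# Hedged illustration: eigenimage features + random forest on synthetic cutouts.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(1)
n, side = 400, 16
labels = rng.integers(0, 2, n)              # 1 = real transient, 0 = artefact
images = rng.normal(size=(n, side * side))  # background noise cutouts
yy, xx = np.mgrid[:side, :side]
psf = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / 8.0).ravel()
images[labels == 1] += 3 * psf              # real objects carry a central PSF-like source

features = PCA(n_components=10).fit_transform(images)   # eigenimage coefficients
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(features[:300], labels[:300])
pred = clf.predict(features[300:])
print(recall_score(labels[300:], pred), precision_score(labels[300:], pred))
```

Completeness and purity on the held-out cutouts are both high in this toy setting because the PSF direction dominates the leading principal components.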
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because poor run-time
performance is not such a problem these days, given the computational power
that is available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures. An updated edition of an older tutorial on kNN
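A self-contained sketch of the basic nearest-neighbour rule the tutorial covers: classify a query by majority vote among its k closest training examples under Euclidean distance. (The tutorial's own Python code is in its Appendix; this is an independent illustration.)

```python
# Hedged illustration: k-nearest-neighbour classification from scratch.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Euclidean distance from the query to every training example.
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # majority class among the neighbours

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # prints "a"
```

The brute-force distance scan shown here is O(n) per query; the retrieval speed-up techniques the tutorial surveys exist precisely to avoid this scan for large training sets.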