527 research outputs found
A Fisher consistent multiclass loss function with variable margin on positive examples
The concept of pointwise Fisher consistency (or classification calibration) states necessary and sufficient conditions for Bayes consistency when a classifier minimizes a surrogate loss function instead of the 0-1 loss. We present a family of multiclass hinge loss functions defined by a continuous control parameter λ representing the margin on the positive points of a given class. The parameter λ allows shifting from classification-uncalibrated to classification-calibrated loss functions. Although previous results suggest that increasing the margin of positive points benefits the classification model, other approaches have failed to give increasing weight to the positive examples without losing the classification calibration property. Our λ-based loss function can give unlimited weight to the positive examples without breaking the classification calibration property. Moreover, when these loss functions are embedded into the Support Vector Machine framework (λ-SVM), the parameter λ defines different regions for the Karush-Kuhn-Tucker conditions. A large margin on positive points also speeds up convergence of the Sequential Minimal Optimization algorithm, leading to lower training times than other classification-calibrated methods. λ-SVM is easy to implement, and its practical use on different datasets not only supports our theoretical analysis but also provides good classification performance and fast training times. The authors acknowledge the referees' comments and suggestions that helped to improve the manuscript. This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Federal Bureau of Investigations, Finance Division. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. I.R.-L. acknowledges partial support by Spain's grants TIN2013-42351-P (MINECO) and S2013/ICE-2845 CASI-CAM-CM (Comunidad de Madrid). The authors gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at Universidad Autónoma de Madrid
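As a rough illustration of such a family (a plausible form under these assumptions, not necessarily the paper's exact definition), a margin parameter λ can be imposed on the true-class score while the other classes keep a unit margin:

    L_\lambda(f(x), y) \;=\; \max\bigl(0,\; \lambda - f_y(x)\bigr) \;+\; \sum_{j \neq y} \max\bigl(0,\; 1 + f_j(x)\bigr)

Increasing λ puts arbitrarily large weight on the positive example while leaving the negative-class terms untouched, which is the behavior the abstract ascribes to the λ family.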
On the equivalence of Kernel Fisher discriminant analysis and Kernel Quadratic Programming Feature Selection
This is the author’s version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, Vol. 32, Iss. 11, (2011), DOI: 10.1016/j.patrec.2011.04.007. We reformulate the Quadratic Programming Feature Selection (QPFS) method in a kernel space to obtain a vector which maximizes the quadratic objective function of QPFS. We demonstrate that the vector obtained by Kernel Quadratic Programming Feature Selection is equivalent to the Kernel Fisher vector; therefore, a new interpretation of Kernel Fisher discriminant analysis is given, which provides some computational advantages for highly unbalanced datasets. I.R.-L. is supported by an FPU grant from Universidad Autónoma de Madrid, and partially supported by the Universidad Autónoma de Madrid-IIC Chair and TIN2010-21575-C02-01. R.H. was partially supported by Grant ONR N00014-07-1-0741, and US Army Medical and Material Command under contract #W81XWH-10-C-0040 in collaboration with Elintrix, Inc
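For context, the original QPFS program (stated here in its usual formulation from the QPFS literature, not from this paper; Q is a feature-redundancy matrix, f a feature-relevance vector, and α a trade-off parameter) is the quadratic problem

    \min_{x} \;\; (1-\alpha)\, x^{\top} Q\, x \;-\; \alpha\, f^{\top} x \qquad \text{s.t.} \;\; x \ge 0,\;\; \textstyle\sum_i x_i = 1

The kernelized reformulation discussed here carries this quadratic objective into a kernel-induced feature space.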
Hierarchical linear support vector machine
This is the author’s version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition, Vol. 45, Iss. 12, (2012), DOI: 10.1016/j.patcog.2012.06.002. The increasing size and dimensionality of real-world datasets make it necessary to design algorithms that are efficient not only in the training process but also in the prediction phase. In applications such as credit card fraud detection, the classifier must predict an event in at most 10 ms. In these environments the constraints on prediction speed heavily outweigh the training costs. We propose a new classification method, called the Hierarchical Linear Support Vector Machine (H-LSVM), based on the construction of an oblique decision tree in which each node split is obtained as a Linear Support Vector Machine. Although other methods have been proposed to break the data space down into subregions to speed up Support Vector Machines, the H-LSVM algorithm is a very simple model that is efficient in training and, above all, in prediction for large-scale datasets. Only a few hyperplanes need to be evaluated in the prediction step, no kernel computation is required, and the tree structure makes parallelization possible. In experiments with medium and large datasets, H-LSVM reduces the prediction cost considerably while achieving classification results closer to those of the non-linear SVM than to those of the linear case. The authors would like to thank the anonymous reviewers for their comments, which helped to improve the manuscript. I.R.-L. is supported by an FPU grant from Universidad Autónoma de Madrid, and partially supported by the Universidad Autónoma de Madrid-IIC Chair and TIN2010-21575-C02-01. R.H. acknowledges partial support by ONR N00014-07-1-0741, USARIEM-W81XWH-10-C-0040 (ELINTRIX) and JPL-2012-1455933
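A minimal sketch of the prediction pass described above, assuming a hypothetical node layout (this is not the authors' code; only the traversal logic from the abstract is reproduced): each internal node stores the hyperplane (w, b) of its linear SVM split, so prediction evaluates just the few dot products along one root-to-leaf path and needs no kernel computation.

    import numpy as np

    class Node:
        # Internal nodes carry a linear-SVM split (w, b); leaves carry a class label.
        def __init__(self, w=None, b=0.0, left=None, right=None, label=None):
            self.w, self.b = w, b
            self.left, self.right = left, right
            self.label = label

    def predict(root, x):
        node = root
        while node.label is None:                  # descend until a leaf is reached
            side = np.dot(node.w, x) + node.b      # one hyperplane evaluation per level
            node = node.left if side <= 0 else node.right
        return node.label

    # Example: a depth-1 tree splitting on x[0] <= 0.5
    tree = Node(w=np.array([1.0]), b=-0.5, left=Node(label=0), right=Node(label=1))
    print(predict(tree, np.array([0.2])))          # -> 0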
Methodology and evaluation of sustainable housing retrofit by neighborhood clusters in the municipality of Guadalajara, Jal.
In the last decade, national public programs have been created in Mexico to develop sustainable housing, aimed both at the general population (such as INFONAVIT's Hipoteca Verde) and at developers (such as Eco Casa from the Sociedad Hipotecaria Federal), with the main objective of reducing the energy demand of a sector that accounts for 14.1% (SENER, 2018) of the country's total energy. Despite this, there is little concern for the sustainable improvement of the existing housing stock. Current subsidies and programs are offered only for retrofits within the individual dwelling, which yields impacts of little significance. It is therefore important to create programs that can be applied at the neighborhood scale, so that the benefits obtained from housing retrofit generate significant impacts in the economic, social, and environmental dimensions, compared with the impacts achievable through collective retrofit strategies. This work presents a methodology to identify the neighborhoods of the municipality of Guadalajara best suited for neighborhood-scale housing retrofit, classifies the housing into housing typologies, proposes sustainable retrofit by neighborhood clusters through unit-cluster actions, and evaluates the potential for energy generation, reduction of GHG emissions, and social improvements. ITESO, A.C
A practical view of large-scale classification: feature selection and real-time classification
Unpublished doctoral thesis, Universidad Autónoma de Madrid, Escuela Politécnica Superior, May 201
Characterization and simulation of dendritic arborizations with Bayesian networks including angular variables
The inner workings of the brain remain a mystery, and understanding them is one of the main challenges faced by modern science. The cerebral cortex is the region of the brain where the highest-level brain processes, such as imagination, judgment, and abstract reasoning, take place. Pyramidal neurons, a specific type of neuron, constitute approximately 80% of the roughly 10,000 million neurons that compose the cerebral cortex, which makes them a primary target in the study of how the brain works.

Neuronal morphology, and more specifically dendritic morphology, determines how neurons process information and the patterns of connection among them; computational models are essential tools for studying its role in brain function. In this work we have built a computational model, comprising more than 50 variables of dendritic morphology, that can simulate the growth of complete basal dendritic arborizations from reconstructions of real pyramidal neurons, from the number of basal dendrites through to the growth of the dendritic trees. Unlike previous work, our model, based on Bayesian networks, considers the dendritic arborization as a whole rather than focusing on individual trees, which allows it to take into account the interactions between dendrites and to automatically detect the relationships between the morphological variables that characterize the arborization. Moreover, analysis of the Bayesian networks can help to identify previously unknown relationships between morphological variables.

Motivated by the study of basal dendrite orientation, this work also introduces a generalized L1 regularization applied to the learning of the multivariate von Mises distribution, one of the main multivariate directional probability distributions. We also propose a multivariate circular distance that can be used to estimate the Kullback-Leibler divergence between two samples of circular data. Comparing the regularized and unregularized models on the orientation of basal dendrites in human neurons shows that, in general, the regularized model achieves better results. Sampling, fitting, and plotting functions for the multivariate von Mises distribution are implemented in a new R package called mvCircular
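For context, a sketch under the standard parameterization of the multivariate von Mises distribution (the usual notation with mean μ, concentrations κ, and coupling matrix Λ, following the literature; the thesis's exact notation may differ): the density on angles θ = (θ_1, …, θ_p), with sin and cos applied componentwise, and a lasso-style penalized fit take the form

    f(\boldsymbol{\theta}) \propto \exp\Bigl\{ \boldsymbol{\kappa}^{\top}\cos(\boldsymbol{\theta}-\boldsymbol{\mu}) + \tfrac{1}{2}\,\sin(\boldsymbol{\theta}-\boldsymbol{\mu})^{\top}\boldsymbol{\Lambda}\,\sin(\boldsymbol{\theta}-\boldsymbol{\mu}) \Bigr\}

    \max_{\boldsymbol{\mu},\,\boldsymbol{\kappa},\,\boldsymbol{\Lambda}} \;\; \ell(\boldsymbol{\mu},\boldsymbol{\kappa},\boldsymbol{\Lambda}) \;-\; \lambda \sum_{i<j} \lvert \Lambda_{ij} \rvert

Here the L1 penalty shrinks pairwise angular couplings Λ_ij towards zero, which is one natural reading of the generalized L1 regularization the abstract mentions.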
Dataset from chemical gas sensor array in turbulent wind tunnel
The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements sampled the environment directly. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to the dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data are related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al. [1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings. This work has been supported by the California Institute for Telecommunications and Information Technology (CALIT2) under Grant number 2014 CSRO 136
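As a quick sanity check, the measurement count follows from the full factorial design described above (all figures taken from the description; a minimal sketch in Python):

    # Full factorial design of the recordings described above
    locations, gases, temperatures, wind_speeds, repetitions = 6, 10, 5, 3, 20
    total = locations * gases * temperatures * wind_speeds * repetitions
    assert total == 18_000  # matches the 18,000 measurements reported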
Data set from chemical sensor array exposed to turbulent gas mixtures
A chemical detection platform composed of 8 chemo-resistive gas sensors was exposed to turbulent gas mixtures generated naturally in a wind tunnel. The acquired time series of the sensors are provided. The experimental setup was designed to test gas sensors in realistic environments. Traditionally, chemical detection systems based on chemo-resistive sensors include a gas chamber to control the sample air flow and minimize turbulence. Instead, we utilized a wind tunnel with two independent gas sources that generate two gas plumes. The plumes mix naturally along a turbulent flow and reproduce the gas concentration fluctuations observed in natural environments. Hence, the gas sensors can capture the spatio-temporal information contained in the gas plumes. The sensor array was exposed to binary mixtures of ethylene with either methane or carbon monoxide. Volatiles were released at four different rates to induce different concentration levels in the vicinity of the sensor array. Each configuration was repeated 6 times, for a total of 180 measurements. The data are related to "Chemical Discrimination in Turbulent Gas Mixtures with MOX Sensors Validated by Gas Chromatography-Mass Spectrometry", by Fonollosa et al. [1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+exposed+to+turbulent+gas+mixtures. This work has been supported by the California Institute for Telecommunications and Information Technology (CALIT2) under Grant Number 2014 CSRO 136
Practical values and uncertainty in regulatory decision making
Regulatory science, which generates knowledge relevant for regulatory decision-making, is different from standard academic science in that it is oriented mainly towards the attainment of non-epistemic (practical) aims. The role of uncertainty and the limits to the relevance of academic science are being recognized more and more explicitly in regulatory decision-making. This has led to the introduction of regulation-specific scientific methodologies in order to generate decision-relevant data. However, recent practical experience with such non-standard methodologies indicates that they, too, may be subject to important limitations. We argue that the attainment of non-epistemic values and aims (like the protection of human health and the environment) requires not only control of the quality of the data and the methodologies, but also the selection of the level of regulation deemed adequate in each specific case (including a decision about which of the two, under-regulation or over-regulation, would be more acceptable)