Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps
A new neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, resonance, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units", to meet accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category "boxes". Smaller vigilance values lead to larger category boxes. Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Four classes of simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; and (iv) a letter recognition database. The Fuzzy ARTMAP system is also compared to Salzberg's NGE system and to Simpson's FMMC system.
British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI 90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (90-0175)
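The complement coding, category choice, vigilance test, and weight update mentioned in the abstract can be sketched as follows. This is a minimal illustration using the commonly cited fuzzy ART formulas, not code from the paper; the function names and the default values of `alpha` and `beta` are assumptions, and the ARTMAP map field and match tracking are not included.

```python
from typing import List, Optional


def complement_code(a: List[float]) -> List[float]:
    """Map a in [0,1]^M to the 2M-vector (a, 1 - a); its city-block norm is always M."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x: List[float], y: List[float]) -> List[float]:
    """Component-wise MIN, the fuzzy AND operator."""
    return [min(p, q) for p, q in zip(x, y)]


def norm(x: List[float]) -> float:
    """City-block norm for non-negative vectors."""
    return sum(x)


def choose_category(I: List[float], weights: List[List[float]],
                    rho: float, alpha: float = 0.001) -> Optional[int]:
    """Return the highest-choice category that passes the vigilance test, else None.

    Choice value: |I ^ w_j| / (alpha + |w_j|); match value: |I ^ w_j| / |I|.
    """
    order = sorted(
        range(len(weights)),
        key=lambda j: norm(fuzzy_and(I, weights[j])) / (alpha + norm(weights[j])),
        reverse=True,
    )
    for j in order:
        if norm(fuzzy_and(I, weights[j])) / norm(I) >= rho:
            return j
    return None


def learn(I: List[float], w: List[float], beta: float = 1.0) -> List[float]:
    """Update w toward I ^ w; beta = 1 is fast learning. No weight ever increases."""
    m = fuzzy_and(I, w)
    return [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w)]


if __name__ == "__main__":
    w = complement_code([0.2, 0.7])   # category created from a first input
    I = complement_code([0.4, 0.5])   # a second input
    print(choose_category(I, [w], rho=0.75))
    print(learn(I, w))
```

With complement coding, a weight vector (u, 1 - v) describes the box with corners u and v, so the shrinking weights produced by `learn` correspond to a growing box, as the abstract states.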
Succinct and Self-Indexed Data Structures for the Exploitation and Representation of Moving Objects
Programa Oficial de Doutoramento en Computación. 5009V01
[Abstract]
This thesis deals with the efficient representation and exploitation of trajectories of
objects that move in space without any type of restriction (airplanes, birds, boats,
etc.). Currently, this is a very relevant problem due to the proliferation of GPS
devices, which makes it possible to collect a large number of trajectories. However,
until now there has been no efficient way to properly store and exploit them.
In this thesis, we propose eight structures that meet two fundamental objectives.
First, they are capable of storing space-time data, describing the trajectories, in a
reduced space, so that their exploitation takes advantage of the memory hierarchy.
Second, those structures support two kinds of queries: object queries, which, given
an object, retrieve the position or trajectory of that object over a time interval; and
space-time range queries, which, given a region of space and a time interval, return
the objects that are within the region during that interval. It should be noted that
state-of-the-art solutions are only capable of efficiently answering one of the two
types of queries.
All of these data structures share a common basis: they all use two elements,
snapshots and logs. Each snapshot works as a spatial index that periodically indexes
the absolute position of each object or the Minimum Bounding Rectangle (MBR) of
its trajectory. Snapshots serve to speed up the spatio-temporal range queries. We have
implemented two types of snapshots: based on k²-trees or R-trees.
With respect to the log, it represents the trajectory (sequence of movements) of
each object. It is the main element of the structures, and facilitates the resolution
of object and spatio-temporal range queries. Four strategies have been implemented
to represent the log in a compressed form: ScdcCT, GraCT, ContaCT and RCT.
With the combination of these two elements we build eight different structures for
the representation of trajectories. All of them have been implemented and evaluated
experimentally, showing that they reduce the space required by traditional methods
by up to two orders of magnitude. Furthermore, they are all competitive in solving
object queries as well as spatial-temporal ones.
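The snapshot-plus-log organization described in this abstract can be illustrated with a small sketch. It is only a toy model under stated assumptions: snapshots are plain dictionaries rather than k²-trees or R-trees, the log holds uncompressed relative movements rather than ScdcCT, GraCT, ContaCT or RCT encodings, all objects are assumed to report a position at every time instant, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


class SnapshotLogIndex:
    """Toy index: absolute positions every `period` instants, relative moves in between."""

    def __init__(self, trajectories: Dict[str, List[Point]], period: int):
        self.period = period
        # time instant -> {object id: absolute position}
        self.snapshots: Dict[int, Dict[str, Point]] = {}
        # object id -> movements; logs[obj][t] is the move from instant t to t + 1
        self.logs: Dict[str, List[Point]] = {}
        for obj, pts in trajectories.items():
            self.logs[obj] = [
                (pts[t][0] - pts[t - 1][0], pts[t][1] - pts[t - 1][1])
                for t in range(1, len(pts))
            ]
            for t in range(0, len(pts), period):
                self.snapshots.setdefault(t, {})[obj] = pts[t]

    def position(self, obj: str, t: int) -> Point:
        """Object query: start at the last snapshot at or before t, then replay the log."""
        base = (t // self.period) * self.period
        x, y = self.snapshots[base][obj]
        for dx, dy in self.logs[obj][base:t]:
            x, y = x + dx, y + dy
        return (x, y)

    def range_query(self, x1: int, y1: int, x2: int, y2: int,
                    t1: int, t2: int) -> List[str]:
        """Objects inside [x1, x2] x [y1, y2] at some instant in [t1, t2] (naive scan)."""
        hits = set()
        for obj in self.logs:
            for t in range(t1, t2 + 1):
                x, y = self.position(obj, t)
                if x1 <= x <= x2 and y1 <= y <= y2:
                    hits.add(obj)
                    break
        return sorted(hits)


if __name__ == "__main__":
    index = SnapshotLogIndex(
        {"a": [(0, 0), (1, 0), (2, 1), (3, 1), (3, 2)],
         "b": [(5, 5), (5, 4), (4, 4), (4, 3), (3, 3)]},
        period=2,
    )
    print(index.position("a", 3))
    print(index.range_query(3, 1, 4, 3, 3, 4))
```

In this sketch the range query scans every object; in the structures the abstract describes, the spatial index in each snapshot is what lets a query discard most objects before any log is replayed.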
Bio: A Multimodal biometric authentication system for person identification and verification
Not available
Bayesian Classification and Regression with High Dimensional Features
This thesis responds to the challenges of using a large number of features,
such as thousands, in regression and classification problems.
There are two situations where such high dimensional features arise. One is
when high dimensional measurements are available, for example, gene expression
data produced by microarray techniques. For computational or other reasons,
people may select only a small subset of features when modelling such data, by
looking at how relevant the features are to predicting the response, based on
some measure such as correlation with the response in the training data.
Although it is used very commonly, this procedure will make the response appear
more predictable than it actually is. In Chapter 2, we propose a Bayesian
method to avoid this selection bias, with application to naive Bayes models and
mixture models.
High dimensional features also arise when we consider high-order
interactions. The number of parameters will increase exponentially with the
order considered. In Chapter 3, we propose a method for compressing a group of
parameters into a single one, by exploiting the fact that many predictor
variables derived from high-order interactions have the same values for all the
training cases. The number of compressed parameters may have converged before
considering the highest possible order. We apply this compression method to
logistic sequence prediction models and logistic classification models.
We use both simulated data and real data to test our methods in both
chapters.
Comment: PhD Thesis Submitted to University of Toronto, 129 Pages
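The observation behind the compression method, that many predictor variables derived from high-order interactions take identical values on all training cases, can be illustrated with a short sketch. This is not the thesis's algorithm: it only groups interaction patterns by their value vector over the training cases, the function name is hypothetical, and patterns that match no training case are skipped as a simplification.

```python
from collections import defaultdict
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[Tuple[int, int], ...]  # ((position, value), ...)


def group_interaction_patterns(cases: Sequence[Sequence[int]],
                               max_order: int) -> Dict[Tuple[int, ...], List[Pattern]]:
    """Group interaction patterns whose indicator variables agree on every training case."""
    n_pos = len(cases[0])
    values = sorted({v for case in cases for v in case})
    groups: Dict[Tuple[int, ...], List[Pattern]] = defaultdict(list)
    for order in range(1, max_order + 1):
        for positions in combinations(range(n_pos), order):
            for vals in product(values, repeat=order):
                pattern = tuple(zip(positions, vals))
                # Indicator variable: 1 if the case matches the pattern, else 0.
                indicator = tuple(
                    int(all(case[p] == v for p, v in pattern)) for case in cases
                )
                if any(indicator):
                    groups[indicator].append(pattern)
    return dict(groups)


if __name__ == "__main__":
    training_cases = [(0, 1), (0, 1), (1, 0)]
    groups = group_interaction_patterns(training_cases, max_order=2)
    for indicator, patterns in groups.items():
        print(indicator, patterns)
```

Each group collects parameters that the training data cannot distinguish from one another, which is the kind of group the abstract proposes to compress into a single parameter.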
Spatial Data Mining Analytical Environment for Large Scale Geospatial Data
Nowadays, many applications are continuously generating large-scale geospatial data. Vehicle GPS tracking data, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high resolution optical or Synthetic Aperture Radar imagery data all generate a huge amount of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion is still limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-Tree) and HDFS to support geospatial data and demonstrate its analytical use with some common geospatial data types and data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and makes its outputs available to end users.
Computing MEMs and Relatives on Repetitive Text Collections
We consider the problem of computing the Maximal Exact Matches (MEMs) of a
given pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$,
which is represented as a (hopefully much smaller) run-length context-free
grammar of size $g_{rl}$. We show that the problem can be solved in time
$O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size
$O(g_{rl})$. Further, on a locally consistent grammar of size
$O(\delta \log\frac{n}{\delta})$, the time decreases to
$O(m \log m (\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring
complexity of $T$ and $\Omega(\delta \log\frac{n}{\delta})$ is a tight lower
bound on the compressibility of repetitive texts $T$, so our structure has
optimal size in terms of $n$ and $\delta$. We extend our results to several
related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications
- …
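To make the notion of a Maximal Exact Match concrete, the following brute-force sketch computes MEMs on plain strings. It is only a definition-level illustration with hypothetical function names; it works on uncompressed text and is unrelated to the grammar-based data structure of the abstract.

```python
from typing import List, Tuple


def longest_match_lengths(pattern: str, text: str) -> List[int]:
    """For each start i, the length of the longest prefix of pattern[i:] occurring in text."""
    m = len(pattern)
    lengths = []
    for i in range(m):
        length = 0
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        lengths.append(length)
    return lengths


def maximal_exact_matches(pattern: str, text: str) -> List[Tuple[int, int]]:
    """Return MEMs as (start, length) pairs over the pattern.

    A longest match starting at i is right-maximal by construction. It is also
    left-maximal unless the longest match starting at i - 1 already covers it,
    which happens exactly when lengths[i - 1] == lengths[i] + 1.
    """
    lengths = longest_match_lengths(pattern, text)
    mems = []
    for i, length in enumerate(lengths):
        if length > 0 and (i == 0 or lengths[i - 1] <= length):
            mems.append((i, length))
    return mems


if __name__ == "__main__":
    for start, length in maximal_exact_matches("anab", "banana"):
        print(start, "anab"[start:start + length])
```

For the pattern `anab` and the text `banana`, the matches reported are `ana` at position 0 and `b` at position 3; the shorter matches `na` and `a` are excluded because they can be extended to the left.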