137 research outputs found

    Multivariate Mixed Data Mining with Gifi System using Genetic Algorithm and Information Complexity

    Statistical analysis depends heavily on the quality and type of a data set. There are three types of data: continuous, categorical and mixed. Of these, statistical modeling of mixed data has long been challenging, because most traditional statistical techniques are defined either for purely continuous data or for purely categorical data, but not for mixed data. In reality, most data sets are neither purely continuous nor purely categorical but mixed, which makes statistical analysis quite difficult. For instance, in the medical sector, where classification of the data is very important, the presence of many categorical and continuous predictors results in a poor model. In the insurance and finance sectors, large amounts of categorical and continuous data are collected on customers for targeted marketing, detection of suspicious insurance claims, actuarial modeling, risk analysis, modeling of financial derivatives, detection of profitable zones, etc. In this work, we bring together several relatively new developments in statistical model selection and data mining, and we address two problems. The first problem is to determine the optimal number of mixtures from multivariate Bernoulli distributed data using a genetic algorithm and Bozdogan's information complexity, ICOMP. We show that maximum likelihood values alone are not sufficient to determine the optimal number of mixtures. We also address the issue of high dimensional binary data using a genetic algorithm to determine the optimal predictors. Finally, we show the results of our algorithm on one simulated and two real data sets. The second problem is to discover interesting patterns in a complicated mixed data set.
Since mixed data are a combination of continuous and categorical variables, we transform the nonlinear categorical variables to a linear scale by a mechanism called the Gifi transformation [Gifi, 1989]. Once the nonlinear variables are transformed to a linear scale (Euclidean space), we apply several classical multivariate techniques to the transformed continuous data to identify unusual patterns. The advantage of this transformation is that it is a one-to-one mapping. Hence, the transformed set of continuous value(s) in the Gifi space can be mapped back to a unique set of categorical value(s) in the original space. We also address the problem of high dimensional data using a genetic algorithm for variable selection, with Bozdogan's information complexity (ICOMP) as our fitness function. We present details of our newly developed Matlab toolbox, called Gifi System, which implements everything presented and can readily be extended with new functionality. Finally, results on both simulated and real world data sets are presented and discussed. Keywords: Gifi, homals, regression, multivariate logistic regression, fraud detection, medical diagnostics, supervised classification, unsupervised classification, variable selection, high dimensional data mining, stock market trading, detection of suspicious insurance claim estimates

    Action recognition in videos based on the fusion of visual rhythm representations (Reconhecimento de ações em vídeos baseado na fusão de representações de ritmos visuais)

    Advisors: Hélio Pedrini, David Menotti Gomes. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Advances in video acquisition and storage technologies have promoted a great demand for automatic recognition of actions. The use of cameras for security and surveillance purposes has applications in several scenarios, such as airports, parks, banks, stations, roads, hospitals, supermarkets, industries, stadiums and schools. An inherent difficulty of the problem is the complexity of the scene under usual recording conditions, which may contain complex and moving backgrounds, multiple people in the scene, interactions with other actors or objects, and camera motion. Most recent databases are built primarily from recordings shared on YouTube and from movie snippets, situations in which these obstacles are not restricted. Another difficulty is the impact of the temporal dimension, since it inflates the size of the data, increasing computational cost and storage space. In this work, we present a methodology for describing video volumes using the Visual Rhythm (VR) representation. This technique reshapes the original volume of the video into an image, on which two-dimensional descriptors are computed. We investigated different strategies for constructing the visual rhythm by combining configurations in several image domains and scanning directions of the video frames.
From this, we propose two original feature extraction methods, Naïve Visual Rhythm (Naïve VR) and Visual Rhythm Trajectory Descriptor (VRTD). The first approach is the straightforward application of the technique to the original video volume, forming a holistic descriptor that considers action events as patterns and formats in the visual rhythm image. The second variation focuses on the analysis of small neighborhoods obtained from the dense trajectories process, which allows the algorithm to capture details that go unnoticed by the global description. We tested our methods on eight public databases: one of hand gestures (SKIG), two in first person (DogCentric and JPL), and five in third person (Weizmann, KTH, MuHAVi, UCF11 and HMDB51). The results show that the developed techniques are able to extract motion elements along with format and appearance information, achieving competitive accuracy rates compared to state-of-the-art action recognition approaches. Level: Doctorate. Area: Computer Science. Degree: Doctor in Computer Science. 2015/03156-7 FAPES

    VLSI Design

    This book presents some recent advances in the design of nanometer VLSI chips. The selected topics present open problems and challenges in areas ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc.

    Cluster analysis based on density estimates and its application to LANDSAT imagery

    Includes bibliographical footnotes. This study was funded partly by the Federation of Rocky Mountain States, the U.S. Army Corps of Engineers, St. Paul District, Contract no. DAC 37-77-C-0133 and the Colorado State University Experiment Station 107

    The palaeontology of Ediacaran Avalonia: new insights using morphometrics and multivariate statistical analyses

    The Avalonian Ediacaran fossil assemblage of Newfoundland, Canada contains abundant fossils with a wide range of morphologies and preservational styles. Quantitative morphological and statistical analysis of Ediacaran fossil assemblages has recently been used to recognize natural morphological groupings, providing evidence for variability within and between taxa. This approach is first used herein to test the grouping of the serially arranged, millimeter-scale chambered organism known as Palaeopascichnus. The combined morphometric and statistical approach was applied to specimens collected from Ferryland, and demonstrates constrained, discrete growth patterns. The same technique was used to compare fossil palaeopascichnids with extant Protista, and the comparison supports a protistan affinity for the hitherto enigmatic palaeopascichnids. This thesis also statistically investigates an Ediacaran taxonomic dispute known as the Beothukis/Culmofrons problem. The two taxa (Beothukis mistakensis and Culmofrons plumosa) were established separately, but were later synonymized. To determine the validity of this taxonomic reassignment, this thesis investigates the clustering of specimens based on their morphology and morphometrics and assesses the validity of certain taxonomic characters within the specimen dataset. These findings validate the original genus-level differentiation of Beothukis and Culmofrons, while also showing evidence for previously unrecognized variation within the genus Beothukis. Overall, this technique has led to the finding that more morphotypes may exist within the Ediacaran biota than originally thought, and it demonstrates the utility of detailed statistical and morphological analysis in determining morphological diversity and disparity.

    Biometric Systems

    Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to present the life cycle of different biometric authentication systems, from their design and development to qualification and final application. The major systems discussed in this book include fingerprint identification, face recognition, iris segmentation and classification, signature verification, and other miscellaneous systems covering management policies of biometrics, reliability measures, pressure-based typing and signature verification, bio-chemical systems and behavioral characteristics. In summary, this book provides students and researchers with different approaches to developing biometric authentication systems, including state-of-the-art approaches to their design and development. The approaches have been thoroughly tested on standard databases and in real world applications.

    Statistical Modelling

    The book collects the proceedings of the 19th International Workshop on Statistical Modelling, held in Florence in July 2004. Statistical modelling is an important cornerstone in many scientific disciplines, and the workshop provided a rich environment for cross-fertilization of ideas from different disciplines. The proceedings consist of four invited lectures, 48 contributed papers and 47 posters. The contributions are arranged in sessions: Statistical Modelling; Statistical Modelling in Genomics; Semi-parametric Regression Models; Generalized Linear Mixed Models; Correlated Data Modelling; Missing Data, Measurement of Error and Survival Analysis; Spatial Data Modelling and Time Series and Econometrics.

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    LIPIcs, Volume 244, ESA 2022, Complete Volume