281 research outputs found

    Supervised Classification and Mathematical Optimization

    Data Mining techniques often require solving optimization problems. Supervised Classification, and Support Vector Machines in particular, can be seen as a paradigmatic instance. This paper emphasizes some links between Mathematical Optimization methods and Supervised Classification. It shows that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful for addressing important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers, or dealing with vagueness/noise in the data.
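    To make the link between Support Vector Machines and optimization concrete, the following is a minimal sketch, not the paper's own method: it assumes the standard soft-margin primal objective (half the squared norm of the weights plus C times the sum of hinge losses) and minimizes it with plain subgradient descent. The function names, the toy data, and the parameters `C`, `lr`, and `epochs` are illustrative assumptions.

```python
def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin SVM primal objective: 0.5*||w||^2 + C * sum of hinge losses."""
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge_total += max(0.0, 1.0 - yi * score)
    return 0.5 * sum(wj * wj for wj in w) + C * hinge_total


def train_svm_subgradient(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize the objective above with full-batch subgradient descent (illustrative only)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularization term 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # point violates the margin: hinge term is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1.0, 1.0, -1.0, -1.0]

w, b = train_svm_subgradient(X, y)
predictions = [1.0 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1.0 for xi in X]
```

    In practice this problem is usually solved as a quadratic program or through its dual; the subgradient loop is used here only because it fits in a few lines.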

    Supervised classification and mathematical optimization

    Data Mining techniques often ask for the resolution of optimization problems. Supervised Classification, and, in particular, Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful to address important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers or dealing with vagueness/noise in the data. Ministerio de Ciencia e Innovación, Junta de Andalucí

    Integrated Development and Parallelization of Automated Dicentric Chromosome Identification Software to Expedite Biodosimetry Analysis

    Manual cytogenetic biodosimetry cannot scale to mass casualty events. We present automated dicentric chromosome identification (ADCI) software that uses parallel computing technology. A parallelization strategy combining data and task parallelism, together with optimized I/O operations, has been designed, implemented, and incorporated into ADCI. Experiments on an eight-core desktop show that our algorithm speeds up ADCI at least fourfold. Experiments on Symmetric Computing, SHARCNET, and Blue Gene/Q multi-processor computers demonstrate that parallelized ADCI can process thousands of samples for cytogenetic biodosimetry in a few hours. This increase in speed underscores the effectiveness of parallelization in accelerating ADCI. Our software will be an important tool for handling the scale of mass casualty ionizing radiation events by expediting accurate detection of dicentric chromosomes.

    Collision Avoidance on Unmanned Aerial Vehicles using Deep Neural Networks

    Unmanned Aerial Vehicles (UAVs), although hardly a new technology, have recently gained a prominent role in many industries. They are widely used not only by enthusiastic consumers but also in highly demanding professional situations, and they are expected to have a massive societal impact over the coming years. However, the operation of UAVs carries serious safety risks, such as collisions with dynamic obstacles (birds, other UAVs, or randomly thrown objects). These collision scenarios are complex to analyze in real time and are sometimes computationally impossible to solve with existing State of the Art (SoA) algorithms, which makes the use of UAVs an operational hazard and significantly reduces their commercial applicability in urban environments. In this work, a conceptual framework for both stand-alone and swarm (networked) UAVs is introduced, focusing on the architectural requirements of the collision avoidance subsystem to achieve acceptable levels of safety and reliability. First, the SoA principles for collision avoidance against stationary objects are reviewed. Afterward, a novel image processing approach that uses deep learning and optical flow is presented. This approach is capable of detecting potential collisions with dynamic objects and generating escape trajectories. Finally, novel combinations of models and algorithms were tested, providing a new approach for the collision avoidance of UAVs using Deep Neural Networks. The feasibility of the proposed approach was demonstrated through experimental tests using a UAV built from scratch with the framework developed.

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Bayesian statistical approach for protein residue-residue contact prediction

    Despite continuous efforts in automating experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and solved 3D structures for proteins is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, thereby constantly increasing throughput while at the same time reducing costs, protein structure determination is still labour intensive, time-consuming and expensive. This trend illustrates the essential importance of complementary computational approaches in order to bridge the so-called sequence-structure gap. About half of the protein families lack structural annotation and therefore are not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches that in practice are often limited by the immense computational costs required to search the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been demonstrated to sufficiently constrain the overall protein fold and thereby extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves an echo of correlation signatures that can be traced down from the evolutionary record. Despite the success of contact prediction methods, there are several challenges. The most evident limitation lies in the requirement of deep alignments, which excludes the majority of protein families without associated structural information that are the focus for contact guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they omit available coevolutionary information. 
This work presents two different approaches for addressing the limitations of contact prediction methods. Instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved comparable precision, with minor improvements over the pseudo-likelihood methods for protein families with few homologous sequences. A Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.

    Rank-based Decomposable Losses in Machine Learning: A Survey

    Recent works have revealed an essential paradigm in designing loss functions that differentiate individual losses vs. aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both have a common procedure that aggregates a set of individual values to a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property of organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
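    As an illustration of what a rank-based aggregate loss can look like, the sketch below averages the k largest individual losses. This is one commonly discussed example of such an aggregator, chosen here as an assumption; the abstract does not say which formulas the survey uses, and the function name and sample values are hypothetical.

```python
def average_top_k(losses, k):
    """Aggregate individual losses by averaging the k largest ones.

    k = 1 recovers the maximum loss; k = len(losses) recovers the plain average.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and the number of losses")
    ranked = sorted(losses, reverse=True)  # ranking step: largest losses first
    return sum(ranked[:k]) / k             # decomposes into a sum of individual terms


# Hypothetical per-sample losses.
losses = [0.1, 2.0, 0.5, 1.5]

max_loss = average_top_k(losses, 1)             # focuses on the hardest sample
top2_loss = average_top_k(losses, 2)            # focuses on the two hardest samples
mean_loss = average_top_k(losses, len(losses))  # ordinary average loss
```

    The only rank-dependent step is the sort; after ranking, the aggregate is a sum of individual terms, which is the decomposability property the abstract refers to.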

    Mathematical optimization for the visualization of complex datasets

    This PhD dissertation focuses on developing new Mathematical Optimization models and solution approaches that help to gain insight into complex data structures arising in Information Visualization. The approaches developed in this thesis merge concepts from Multivariate Data Analysis and Mathematical Optimization, bridging theoretical mathematics with real-life problems. The usefulness of Information Visualization lies in its power to improve interpretability and decision making about the unknown phenomena described by raw data, as fully discussed in Chapter 1. In particular, this thesis studies datasets involving frequency distributions and proximity relations, both of which might even vary over time. Frameworks to visualize such information, which make use of Mixed Integer (Non)linear Programming and Difference of Convex tools, are formally proposed. Algorithmic approaches such as Large Neighborhood Search and the Difference of Convex Algorithm enable us to develop matheuristics to handle such models. More specifically, Chapter 2 addresses the problem of visualizing a frequency distribution and an adjacency relation attached to a set of individuals. This information is represented using a rectangular map, i.e., a subdivision of a rectangle into rectangular portions so that their areas reflect the frequencies, and the adjacencies between portions represent the adjacencies between the individuals. The visualization problem is formulated as a Mixed Integer Linear Programming model, and a matheuristic that has this model at its heart is proposed. Chapter 3 generalizes the model presented in the previous chapter by developing a visualization framework that simultaneously handles the representation of a frequency distribution and a dissimilarity relation.
This framework consists of a partition of a given rectangle into piecewise rectangular portions so that the areas of the regions represent the frequencies and the distances between them represent the dissimilarities. This visualization problem is formally stated as a Mixed Integer Nonlinear Programming model, which is solved by means of a matheuristic based on Large Neighborhood Search. In contrast to the previous chapters, in which a partition of the visualization region is sought, Chapter 4 addresses the problem of visualizing a set of individuals with an attached dissimilarity measure and frequency distribution, without necessarily covering the visualization region. In this visualization problem individuals are depicted as convex bodies whose areas are proportional to the given frequencies. The aim is to determine the location of the convex bodies in the visualization region. In order to solve this problem, which generalizes standard Multidimensional Scaling, Difference of Convex tools are used. In Chapter 5, the model stated in the previous chapter is extended to the dynamic case, namely considering that frequencies and dissimilarities are observed along a set of time periods. The solution approach combines Difference of Convex techniques with Nonconvex Quadratic Binary Optimization. All the approaches presented are tested on real datasets. Finally, Chapter 6 closes this thesis with general conclusions and future lines of research. Premio Extraordinario de Doctorado U

    State-of-the-art Assessment For Simulated Forces

    Summary of the review of the state of the art in simulated forces, conducted to support the research objectives of Research and Development for Intelligent Simulated Forces.