85 research outputs found

    A review of population-based metaheuristics for large-scale black-box global optimization: Part A

    Scalability of optimization algorithms is a major challenge in coping with the ever-growing size of optimization problems in a wide range of application areas, from high-dimensional machine learning to complex large-scale engineering problems. The field of large-scale global optimization is concerned with improving the scalability of global optimization algorithms, particularly population-based metaheuristics. Such metaheuristics have been successfully applied to continuous, discrete, and combinatorial problems ranging from several thousand dimensions to billions of decision variables. In this two-part survey, we review recent studies in the field of large-scale black-box global optimization to help researchers and practitioners gain a bird's-eye view of the field, learn about its major trends, and become familiar with its state-of-the-art algorithms. Part A of the series covers two major algorithmic approaches to large-scale global optimization: problem decomposition and memetic algorithms. Part B covers a range of other algorithmic approaches to large-scale global optimization, describes a wide range of problem areas, and finally touches upon the pitfalls and challenges of current research and identifies several potential areas for future research.
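
    Problem decomposition is often realized through cooperative co-evolution, in which the decision vector is split into subgroups that are optimized in turn against a shared context vector. The following is a minimal illustrative sketch of that general idea, not of the survey's specific algorithms; the fixed-size grouping, the inner random-search optimizer, and the sphere objective are all assumptions chosen for brevity.

```python
import numpy as np

def sphere(x):
    # Separable benchmark objective (assumed for illustration).
    return float(np.sum(x ** 2))

def cooperative_coevolution(f, dim, group_size=10, cycles=50, evals_per_group=100, seed=0):
    """Minimal cooperative co-evolution: optimize one variable group at a
    time while the rest of the shared context vector stays fixed."""
    rng = np.random.default_rng(seed)
    context = rng.uniform(-5.0, 5.0, dim)            # shared context vector
    groups = [range(i, min(i + group_size, dim)) for i in range(0, dim, group_size)]
    best = f(context)
    for _ in range(cycles):
        for g in groups:
            idx = np.fromiter(g, dtype=int)
            for _ in range(evals_per_group):
                trial = context.copy()
                trial[idx] += rng.normal(0.0, 0.5, idx.size)   # perturb one group only
                val = f(trial)
                if val < best:                       # greedy acceptance
                    best, context = val, trial
    return context, best

if __name__ == "__main__":
    _, best = cooperative_coevolution(sphere, dim=100)
    print(f"best objective after optimization: {best:.4f}")
```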

    Learning from Structured Data with High Dimensional Structured Input and Output Domain

    Structured data is accumulating rapidly in many applications, e.g., bioinformatics, cheminformatics, social network analysis, natural language processing, and text mining. Designing and analyzing algorithms for handling these large collections of structured data has received significant interest in the data mining and machine learning communities, in both the input and output domains. However, it is nontrivial to adapt traditional machine learning algorithms, e.g., SVMs or linear regression, to structured data. For one thing, the structural information in the input and output domains is ignored if standard algorithms are applied to structured data directly. For another, the major challenge in learning from high-dimensional structured data is that the input/output domains can contain tens of thousands or more features and labels. With a high-dimensional structured input space and/or structured output space, learning a low-dimensional and consistent structured predictive function is important for both the robustness and the interpretability of the model. In this dissertation, we present a few machine learning models that learn from data with structured input features and structured output tasks. For learning from data with structured input features, I have developed structured sparse boosting for graph classification and structured joint sparse PCA for anomaly detection and localization. Besides learning from structured input, I also investigated the interplay between structured input and output in the context of multi-task learning. In particular, I designed a multi-task learning algorithm that performs structured feature selection and task-relationship inference. We demonstrate the applications of these structured models to subgraph-based graph classification, networked data stream anomaly detection/localization, multiple cancer type prediction, neuron activity prediction, and social behavior prediction. Finally, drawing on my internship work at IBM T.J. Watson Research, I demonstrate how to leverage structural information from mobile data (e.g., call detail records and GPS data) to derive important places from people's daily lives for transit optimization and urban planning.
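
    Structured sparsity is commonly enforced with a group penalty whose proximal operator performs block soft-thresholding, zeroing out whole feature groups at once. Below is a minimal sketch of that generic building block (proximal gradient descent for group-lasso regularized least squares), offered as an assumed illustration rather than the dissertation's actual models; the data, group partition, and regularization weight are placeholders.

```python
import numpy as np

def block_soft_threshold(w, groups, tau):
    """Proximal operator of the group-lasso penalty: shrink each group's
    coefficient block toward zero, dropping whole groups whose norm <= tau."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= tau else (1.0 - tau / norm) * w[g]
    return out

def group_lasso(X, y, groups, lam=5.0, iters=500):
    """Proximal gradient descent for 0.5*||Xw - y||^2 + lam * sum_g ||w_g||."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2           # 1/L for the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)                     # gradient of the squared loss
        w = block_soft_threshold(w - step * grad, groups, step * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))
    true_w = np.r_[rng.normal(size=4), np.zeros(8)]  # only the first group active
    y = X @ true_w + 0.1 * rng.normal(size=200)
    groups = [slice(0, 4), slice(4, 8), slice(8, 12)]
    w = group_lasso(X, y, groups)
    print([float(np.round(np.linalg.norm(w[g]), 3)) for g in groups])
```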

    Data portability for activities of daily living and fall detection in different environments using radar micro-doppler

    The health status of an older or vulnerable person can be determined by looking into the additive effects of aging as well as any associated diseases. This status can lead the person to a situation of 'unstable incapacity' in normal aging and is determined by the decreased response to the environment and to specific pathologies, with an apparent decrease of independence in activities of daily living (ADL). In this paper, we use micro-Doppler images obtained with a frequency-modulated continuous-wave (FMCW) radar operating at 5.8 GHz with 400 MHz bandwidth as the sensor to assess this health status. The core idea is to develop a generalized system where the data obtained for ADL are portable across different environments and groups of subjects, and critical events such as falls in mature individuals can be detected. In this context, we conducted comprehensive experimental campaigns at nine different locations, including four laboratory environments and five elderly care homes. A total of 99 subjects participated in the experiments, where 1453 micro-Doppler signatures were recorded for six activities. Different machine learning and deep learning algorithms, as well as a transfer learning technique, were used to classify the ADL. The support vector machine (SVM), K-nearest neighbor (KNN), and convolutional neural network (CNN) provided adequate classification accuracies for particular scenarios; however, the autoencoder neural network outperformed these classifiers with a classification accuracy of ~88%. The proposed system for fall detection in elderly people can be deployed in care centers and is applicable to any indoor setting with various age groups. For future work, we would focus on monitoring multiple older adults concurrently in indoor settings using a continuous radar sensor data stream, which is a limitation of the present system.
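
    As a rough illustration of the baseline classifiers mentioned above, the sketch below trains SVM and KNN models on flattened micro-Doppler spectrograms with scikit-learn. The random synthetic data, the 64x64 image size, and the six-class setup stand in for the real radar signatures and are purely assumptions; accuracy on such random data will of course be near chance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder stand-in for real micro-Doppler signatures:
# 1453 spectrograms of 64x64 pixels, six activity classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1453, 64 * 64))
y = rng.integers(0, 6, size=1453)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

for name, clf in [
    ("SVM", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))),
    ("KNN", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]:
    clf.fit(X_tr, y_tr)
    print(f"{name} accuracy: {clf.score(X_te, y_te):.3f}")  # ~chance on random data
```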

    A New Approach to Investigate the Association between Brain Functional Connectivity and Disease Characteristics of Attention-Deficit/Hyperactivity Disorder: Topological Neuroimaging Data Analysis

    BACKGROUND: Attention-deficit/hyperactivity disorder (ADHD) is currently diagnosed by a diagnostic interview, mainly based on subjective reports from parents or teachers. It is necessary to develop methods that rely on objectively measurable neurobiological data to assess the brain-behavior relationship in patients with ADHD. We investigated the application of a topological data analysis tool, Mapper, to analyze brain functional connectivity data from ADHD patients. METHODS: To quantify disease severity using the neuroimaging data, individual functional networks were decomposed into normal and disease components by the healthy state model (HSM), and the magnitude of the disease component (MDC) was computed. Topological data analysis using Mapper was performed to distinguish children with ADHD (n = 196) from typically developing controls (TDC) (n = 214). RESULTS: In the topological data analysis, the partial clustering of patients with ADHD and normal subjects appeared as a chain-like graph. In the correlation analysis, the MDC showed a significant increase with lower intelligence scores in TDC. We also found that the rates of comorbidity in ADHD increased significantly when the deviation of the functional connectivity from the HSM was large. In addition, a significant correlation between ADHD symptom severity and the MDC was found in part of the dataset. CONCLUSIONS: The application of the HSM and topological data analysis methods to assessing brain functional connectivity seems to be a promising tool to quantify ADHD symptom severity and to reveal hidden relationships between clinical phenotypic variables and brain connectivity.
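
    Mapper builds a graph by covering the range of a filter (lens) function with overlapping intervals, clustering the data points that fall into each interval, and connecting clusters that share points. The snippet below is a bare-bones sketch of that generic construction on synthetic data, not the study's pipeline; the coordinate-projection lens, cover parameters, and DBSCAN clustering are all assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_graph(X, lens, n_intervals=8, overlap=0.3, eps=1.0):
    """Minimal Mapper: cover the lens range with overlapping intervals,
    cluster within each interval, link clusters that share data points."""
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        if idx.size == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X[idx])
        for lab in set(labels) - {-1}:               # -1 = DBSCAN noise
            nodes.append(set(idx[labels == lab]))
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:                  # shared points -> edge
                edges.add((i, j))
    return nodes, edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 2))                    # stand-in connectivity features
    lens = X[:, 0]                                   # assumed projection lens
    nodes, edges = mapper_graph(X, lens)
    print(len(nodes), "nodes,", len(edges), "edges")
```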

    An Approach for the Customized High-Dimensional Segmentation of Remote Sensing Hyperspectral Images

    This paper addresses three problems in the field of hyperspectral image segmentation: the fact that the way an image must be segmented depends on what the user requires and on the application; the lack and cost of appropriately labeled reference images; and, finally, the information-loss problem that arises in many algorithms when high-dimensional images are projected onto lower-dimensional spaces before the segmentation process starts. To address these issues, the Multi-Gradient based Cellular Automaton (MGCA) structure is proposed to segment multidimensional images without projecting them onto lower-dimensional spaces. The MGCA structure is coupled with an evolutionary algorithm (ECAS-II) in order to produce the transition rule sets required by MGCA segmenters. These sets are customized to specific segmentation needs as a function of a set of low-dimensional training images in which the user expresses their segmentation requirements. Constructing high-dimensional image segmenters from low-dimensional training sets alleviates the problem of the lack of labeled training images, as these can be generated online based on a parametrization of the desired segmentation extracted from a set of examples. The strategy has been tested in experiments carried out using synthetic and real hyperspectral images, and it has been compared to state-of-the-art segmentation approaches over benchmark images in the area of remote sensing hyperspectral imaging.
    Funding: Ministerio de Economía y Competitividad (TIN2015-63646-C5-1-R, RTI2018-101114-B-I00); Xunta de Galicia (ED431C 2017/1).
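
    The flavor of a gradient-driven cellular automaton segmenter can be conveyed with a simple label-propagation rule: each cell adopts a neighbor's label when the spectral difference across their shared edge falls below a threshold. This is a generic, assumed sketch of that idea, not the MGCA/ECAS-II method itself; the synthetic hyperspectral cube, the seeding, and the threshold are placeholders.

```python
import numpy as np

def ca_segment(cube, seeds, thresh=0.15, iters=50):
    """Toy cellular-automaton segmentation of a hyperspectral cube
    (H x W x bands): labels spread to spectrally similar 4-neighbors."""
    h, w, _ = cube.shape
    labels = seeds.copy()                            # 0 = unlabeled
    for _ in range(iters):
        changed = False
        for y in range(h):
            for x in range(w):
                if labels[y, x]:
                    continue
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx]:
                        # Spectral gradient across the shared edge.
                        if np.linalg.norm(cube[y, x] - cube[ny, nx]) < thresh:
                            labels[y, x] = labels[ny, nx]
                            changed = True
                            break
        if not changed:
            break
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two spectrally distinct regions stacked vertically, 20 bands each.
    cube = np.concatenate([np.full((8, 16, 20), 0.2), np.full((8, 16, 20), 0.8)], axis=0)
    cube += 0.01 * rng.normal(size=cube.shape)
    seeds = np.zeros((16, 16), dtype=int)
    seeds[0, 0], seeds[15, 15] = 1, 2                # one seed per region
    print(np.unique(ca_segment(cube, seeds), return_counts=True))
```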

    On Motion Parameterizations in Image Sequences from Fixed Viewpoints

    This dissertation addresses the problem of parameterizing object motion within a set of images taken with a stationary camera. We develop data-driven methods across all image scales: characterizing motion observed at the scale of individual pixels, along extended structures such as roads, and in whole-image deformations such as lungs deforming over time. The primary contributions include: a) fundamental studies of the relationship between spatio-temporal image derivatives accumulated at a pixel and the object motions at that pixel; b) data-driven approaches to parameterize breath motion and reconstruct lung CT data volumes; and c) defining and offering initial results for a new class of Partially Unsupervised Manifold Learning (PUML) problems, which often arise in medical imagery. Specifically, we create energy functions for measuring how consistent a given velocity vector is with observed spatio-temporal image derivatives. These energy functions are used to fit parametric snake models to roads using velocity constraints. We create an automatic data-driven technique for finding the breath phase of lung CT scans which is able to replace the external belt measurements currently in clinical use. This approach is extended to automatically create a full deformation model of a CT lung volume during breathing, or of heart MRI during breathing and heartbeat. Additionally, motivated by real use cases, we address a scenario in which a dataset is collected along with meta-data that describes some, but not all, aspects of the dataset. We create an embedding which displays the remaining variability in a dataset after accounting for the variability related to the meta-data.
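
    The pixel-scale energy functions described above are closely related to the classical brightness-constancy residual, E(u, v) = Σ (I_x u + I_y v + I_t)², accumulated over a neighborhood: low energy means the candidate velocity is consistent with the observed derivatives. The sketch below evaluates that standard energy using finite-difference derivatives; it is an assumed, textbook-style stand-in for the dissertation's specific formulations.

```python
import numpy as np

def velocity_energy(frame0, frame1, y, x, v, win=3):
    """Brightness-constancy energy E(v) = sum (Ix*u + Iy*v + It)^2 over a
    (2*win+1)^2 neighborhood; low energy = velocity consistent with data."""
    It = frame1 - frame0                             # temporal derivative
    Iy, Ix = np.gradient(frame0)                     # spatial derivatives (rows, cols)
    sl = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
    residual = Ix[sl] * v[0] + Iy[sl] * v[1] + It[sl]
    return float(np.sum(residual ** 2))

if __name__ == "__main__":
    # Synthetic pair: a smooth sinusoidal image shifted one pixel to the right.
    xs = np.linspace(0, 1, 64)
    frame0 = np.tile(np.sin(4 * np.pi * xs), (64, 1))
    frame1 = np.roll(frame0, 1, axis=1)
    good = velocity_energy(frame0, frame1, 32, 32, v=(1.0, 0.0))
    bad = velocity_energy(frame0, frame1, 32, 32, v=(-1.0, 0.0))
    print(f"E(true v) = {good:.3f}  <  E(wrong v) = {bad:.3f}")
```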

    Preserving Trustworthiness and Confidentiality for Online Multimedia

    Technology advancements in the areas of mobile computing, social networks, and cloud computing have rapidly changed the way we communicate and interact. The wide adoption of media-oriented mobile devices such as smartphones and tablets enables people to capture information in various media formats and offers them a rich platform for media consumption. The proliferation of online services and social networks makes it possible to store personal multimedia collections online and share them with family and friends anytime, anywhere. Considering the increasing impact of digital multimedia and the trend toward cloud computing, this dissertation explores the problem of how to evaluate the trustworthiness and preserve the confidentiality of online multimedia data. The dissertation consists of two parts. The first part examines the problem of evaluating the trustworthiness of multimedia data distributed online. Given the digital nature of multimedia data, editing and tampering with the multimedia content becomes very easy. Therefore, it is important to analyze and reveal the processing history of a multimedia document in order to evaluate its trustworthiness. We propose a new forensic technique called "Forensic Hash", which draws synergy between two related research areas: image hashing and non-reference multimedia forensics. A forensic hash is a compact signature capturing important information from the original multimedia document to assist forensic analysis and reveal the processing history of a multimedia document under question. Our proposed technique is shown to have the advantage of being compact and of offering efficient and accurate analysis for forensic questions that cannot be easily answered by conventional forensic techniques. The answers that we obtain from the forensic hash provide valuable information on the trustworthiness of online multimedia data. The second part of this dissertation addresses the confidentiality of multimedia data stored with online services. The emerging cloud computing paradigm makes it attractive to store private multimedia data online for easy access and sharing. However, the potential of cloud services cannot be fully reached unless the issue of how to preserve the confidentiality of sensitive data stored in the cloud is addressed. In this dissertation, we explore techniques that enable confidentiality-preserving search of encrypted multimedia, which can play a critical role in secure online multimedia services. Techniques from image processing, information retrieval, and cryptography are jointly and strategically applied to allow efficient rank-ordered search over an encrypted multimedia database while preserving data confidentiality against malicious intruders and service providers. We demonstrate the high efficiency and accuracy of the proposed techniques and provide a quantitative comparative study against conventional techniques based on heavyweight cryptographic primitives.
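
    In the spirit of the image-hashing side of this work, the sketch below computes a compact block-mean binary signature and compares two images by Hamming distance, so that a mildly processed copy stays close to the original while an unrelated image does not. This is a generic perceptual-hash illustration, assumed for exposition; it is not the proposed Forensic Hash construction.

```python
import numpy as np

def block_mean_hash(img, grid=8):
    """Compact signature: split the image into grid x grid blocks and
    record whether each block mean exceeds the global median."""
    h, w = img.shape
    bh, bw = h // grid, w // grid
    means = img[: bh * grid, : bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    return (means > np.median(means)).astype(np.uint8).ravel()

def hamming(h1, h2):
    # Number of differing signature bits.
    return int(np.count_nonzero(h1 != h2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.uniform(size=(128, 128))
    noisy = np.clip(original + 0.02 * rng.normal(size=original.shape), 0, 1)
    unrelated = rng.uniform(size=(128, 128))
    h = block_mean_hash(original)
    print("distance to mildly processed copy:", hamming(h, block_mean_hash(noisy)))
    print("distance to unrelated image:      ", hamming(h, block_mean_hash(unrelated)))
```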