
    Mapping Intrinsic Electromechanical Responses at the Nanoscale via Sequential Excitation Scanning Probe Microscopy Empowered by Deep Data

    Ever-increasing hardware capabilities and computational power have made the acquisition and analysis of big scientific data at the nanoscale routine, yet much of the data acquired turns out to be redundant, noisy, or irrelevant to the problems of interest, and it remains nontrivial to draw clear mechanistic insights from pure data analytics. In this work, we use scanning probe microscopy (SPM) as an example to demonstrate a deep data methodology, transitioning from brute-force analytics such as data mining, correlation analysis, and unsupervised classification to informed and targeted causative data analytics built on sound physical understanding. Three key ingredients of such deep data analytics are presented. A sequential excitation scanning probe microscopy (SE-SPM) technique is first adopted to acquire high-quality, efficient, and physically relevant data, and it can be easily implemented on any standard atomic force microscope (AFM). Brute-force physical analysis is then carried out using the simple harmonic oscillator (SHO) model, enabling us to derive the intrinsic electromechanical coupling of interest. Finally, principal component analysis (PCA) is carried out, which not only speeds up the analysis by four orders of magnitude but also allows a clear physical interpretation of its modes in combination with the SHO analysis. A rough piezoelectric material has been probed using this strategy, enabling us to map its intrinsic electromechanical properties at the nanoscale with high fidelity where conventional methods fail. SE in combination with the deep data methodology can be easily adapted to other SPM techniques to probe a wide range of functional phenomena at the nanoscale.
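    As a rough illustration of the workflow sketched in this abstract, the snippet below fits an SHO response model to synthetic per-pixel frequency sweeps and then runs PCA on the same spectra. The data shapes, parameter ranges, and noise level are assumptions for illustration, not the authors' pipeline.

    ```python
    # Minimal sketch of the SHO-fit-plus-PCA analysis, assuming synthetic
    # per-pixel frequency sweeps; shapes and parameters are illustrative.
    import numpy as np
    from scipy.optimize import curve_fit
    from sklearn.decomposition import PCA

    def sho_amplitude(w, a0, w0, q):
        """Amplitude response of a driven simple harmonic oscillator."""
        return a0 * w0**2 / np.sqrt((w0**2 - w**2)**2 + (w * w0 / q)**2)

    rng = np.random.default_rng(0)
    freqs = np.linspace(300e3, 400e3, 64)                      # excitation frequencies (Hz)

    # Stand-in for an SE-SPM data set: one noisy SHO spectrum per pixel.
    params = np.column_stack([rng.uniform(0.5, 2.0, 256),      # drive amplitude
                              rng.uniform(330e3, 370e3, 256),  # resonance frequency
                              rng.uniform(50, 200, 256)])      # quality factor
    spectra = np.array([sho_amplitude(freqs, *p) for p in params])
    spectra += 0.02 * spectra.max() * rng.standard_normal(spectra.shape)

    # Brute-force physical analysis: an SHO fit at every pixel yields
    # amplitude, resonance, and Q maps (the intrinsic response).
    p0 = (1.0, 350e3, 100.0)
    sho_maps = np.array([curve_fit(sho_amplitude, freqs, s, p0=p0, maxfev=5000)[0]
                         for s in spectra])

    # PCA on the raw spectra compresses the data set; combined with the SHO
    # fits, its leading modes admit a physical interpretation.
    pca = PCA(n_components=4)
    scores = pca.fit_transform(spectra)
    print(sho_maps.shape, scores.shape, pca.explained_variance_ratio_)
    ```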

    Web Content Classification: A Survey

    As the information contained within the web increases day by day, organizing this information becomes a necessary requirement. The data mining process extracts information from a data set and transforms it into an understandable structure for further use. Classification of web page content is essential to many tasks in web information retrieval, such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. In this paper, web classification is discussed in detail and its importance in the field of data mining is explored. Comment: 5 pages, 1 figure. arXiv admin note: text overlap with arXiv:1307.1024, arXiv:1310.4647 by other authors

    Hyperspectral Image Classification in the Presence of Noisy Labels

    Label information plays an important role in supervised hyperspectral image classification. However, current classification methods all ignore an important and inevitable problem: labels may be corrupted, and collecting clean labels for training samples is difficult and often impractical. Therefore, how to learn from a database with noisy labels is a problem of great practical importance. In this paper, we study the influence of label noise on hyperspectral image classification and develop a random label propagation algorithm (RLPA) to cleanse the label noise. The key idea of RLPA is to exploit knowledge (e.g., superpixel-based spectral-spatial constraints) from the observed hyperspectral images and apply it to the process of label propagation. Specifically, RLPA first constructs a spectral-spatial probability transfer matrix (SSPTM) that simultaneously considers spectral similarity and superpixel-based spatial information. It then randomly chooses some training samples as "clean" samples, sets the rest as unlabeled samples, and propagates the label information from the "clean" samples to the remaining unlabeled samples with the SSPTM. By repeating the random assignment (of "clean" labeled samples and unlabeled samples) and propagation, we can obtain multiple labels for each training sample. The final propagated label is then calculated by majority vote. Experimental studies show that RLPA can reduce the level of label noise and demonstrate the advantages of the proposed method over four major classifiers by a significant margin: the gains in average OA, AA, and Kappa are impressive, e.g., 9.18%, 9.58%, and 0.1043. The Matlab source code is available at https://github.com/junjun-jiang/RLPA. Comment: Accepted by IEEE TGRS. In this version, Table III is revised.
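    To make the propagate-and-vote loop concrete, here is a heavily simplified sketch. The toy Gaussian affinity stands in for the paper's superpixel-based SSPTM, and the sizes and hyper-parameters are assumptions rather than the released Matlab implementation.

    ```python
    # Illustrative sketch of random label propagation with majority voting.
    import numpy as np

    rng = np.random.default_rng(0)
    n, n_classes, n_rounds = 200, 3, 20

    features = rng.normal(size=(n, 8))                 # stand-in for spectral-spatial features
    noisy_labels = rng.integers(0, n_classes, size=n)  # possibly corrupted labels

    # Probability transfer matrix: a plain row-normalised Gaussian affinity
    # (the paper additionally enforces superpixel constraints).
    d2 = ((features[:, None] - features[None]) ** 2).sum(-1)
    W = np.exp(-d2 / d2.mean())
    np.fill_diagonal(W, 0.0)
    T = W / W.sum(1, keepdims=True)

    votes = np.zeros((n, n_classes))
    for _ in range(n_rounds):
        clean = rng.random(n) < 0.3                    # random "clean" subset
        Y = np.zeros((n, n_classes))
        Y[clean, noisy_labels[clean]] = 1.0
        F = Y.copy()
        for _ in range(30):                            # propagate, clamping the clean labels
            F = T @ F
            F[clean] = Y[clean]
        votes[np.arange(n), F.argmax(1)] += 1

    cleansed_labels = votes.argmax(1)                  # majority vote over the rounds
    ```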

    Active Learning for Crowd-Sourced Databases

    Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks where humans are more accurate than computers, e.g., labeling images, matching objects, or analyzing sentiment. However, relying solely on the crowd is often impractical, even for data sets with thousands of items, due to the time and cost constraints of acquiring human input (which costs pennies and minutes per label). In this paper, we propose algorithms for integrating machine learning into crowd-sourced databases, with the goal of allowing crowd-sourcing applications to scale, i.e., to handle larger datasets at lower cost. The key observation is that, in many of the above tasks, humans and machine learning algorithms can be complementary: humans are often more accurate but slow and expensive, while algorithms are usually less accurate but faster and cheaper. Based on this observation, we present two new active learning algorithms that combine humans and algorithms in a crowd-sourced database. Our algorithms are based on the theory of the non-parametric bootstrap, which makes our results applicable to a broad class of machine learning models. Our results, on three real-life datasets collected with Amazon's Mechanical Turk and on 15 well-known UCI data sets, show that our methods on average ask humans to label one to two orders of magnitude fewer items to achieve the same accuracy as a baseline that labels random images, and two to eight times fewer questions than previous active learning schemes. Comment: A shorter version of this manuscript has been published in the Proceedings of Very Large Data Bases 2015, entitled "Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning".
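    A hedged sketch of the kind of bootstrap-driven selection loop this abstract describes: an ensemble of bootstrap replicates is trained on the currently labeled items, and the items on which the replicates disagree most are sent to the crowd next. The dataset, classifier choice, and batch size below are illustrative assumptions, not the paper's algorithms.

    ```python
    # Toy active-learning loop using a non-parametric bootstrap ensemble.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 10))
    crowd_y = (X[:, 0] + X[:, 1] > 0).astype(int)        # stands in for crowd answers

    labeled = list(rng.choice(len(X), 40, replace=False))  # small seed set
    for _ in range(5):                                      # active-learning rounds
        preds = []
        for _ in range(10):                                 # bootstrap replicates
            idx = rng.choice(labeled, len(labeled), replace=True)
            clf = LogisticRegression().fit(X[idx], crowd_y[idx])
            preds.append(clf.predict_proba(X)[:, 1])
        disagreement = np.var(preds, axis=0)                # uncertainty across replicates
        disagreement[labeled] = -1.0                         # never re-ask labeled items
        ask = np.argsort(disagreement)[-50:]                 # send the most uncertain to the crowd
        labeled.extend(ask.tolist())

    print(len(labeled), "items sent to the crowd in total")
    ```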

    Gaussian Processes Semantic Map Representation

    In this paper, we develop a high-dimensional map building technique that incorporates raw pixelated semantic measurements into the map representation. The proposed technique uses Gaussian Process (GP) multi-class classification for map inference and is the natural extension of GP occupancy maps from binary to multi-class form. The technique exploits the continuous property of GPs and, as a result, the map can be inferred at any resolution. In addition, the proposed GP Semantic Map (GPSM) learns the structural and semantic correlation from measurements rather than resorting to assumptions, and can flexibly learn the spatial correlation as well as any additional non-spatial correlation between map points. We extend OctoMap to a Semantic OctoMap representation and compare its mapping performance with the GPSM using the NYU Depth V2 dataset. Evaluations of the proposed technique on multiple partially labeled RGBD scans and labels from noisy image segmentation show that the GP semantic map can handle sparse measurements, missing labels in the point cloud, and noise-corrupted labels. Comment: Accepted for the RSS 2017 Workshop on Spatial-Semantic Representations in Robotics
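    A minimal sketch of the core idea, GP multi-class classification over map locations queried at arbitrary resolution. The toy 2-D data, the three synthetic classes, and the kernel choice are assumptions for illustration, not the GPSM implementation.

    ```python
    # GP multi-class classification over a toy 2-D "semantic map".
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(2)
    pts = rng.uniform(0, 10, size=(150, 2))                    # observed (x, y) locations
    labels = (pts[:, 0] > 5).astype(int) + (pts[:, 1] > 5)     # three toy semantic classes

    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.5))
    gp.fit(pts, labels)

    # Because the GP is continuous, the map can be queried at any resolution.
    xs, ys = np.meshgrid(np.linspace(0, 10, 100), np.linspace(0, 10, 100))
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    semantic_map = gp.predict_proba(grid).reshape(100, 100, -1)
    print(semantic_map.shape)
    ```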

    Combining Feature Reduction and Case Selection in Building CBR Classifiers

    Abstract—CBR systems that are built for classification problems are called CBR classifiers. This paper presents a novel and fast approach to building efficient and competent CBR classifiers that combines both feature reduction (FR) and case selection (CS). It has three central contributions: 1) it develops a fast rough-set method based on relative attribute dependency among features to compute the approximate reduct; 2) it constructs and compares different case selection methods based on the similarity measure and the concepts of case coverage and case reachability; and 3) CBR classifiers built using a combination of the FR and CS processes can reduce the training burden as well as the need to acquire domain knowledge. The overall experimental results on four real-life data sets show that the combined FR and CS method can preserve, and may also improve, the solution accuracy while substantially reducing the storage space. The case retrieval time is also greatly reduced because the resulting CBR classifier contains a smaller number of cases with fewer features. The developed FR and CS combination method is also compared with kernel PCA and SVM techniques; their storage requirements, classification accuracy, and classification speed are presented and discussed. Index Terms—Case-based reasoning, CBR classifier, case selection, feature reduction, k-NN principle, rough sets.
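    A toy sketch of the FR + CS + k-NN pipeline outlined above. The rough-set reduct is approximated here by a simple consistency-based greedy search and case selection by a condensed nearest-neighbour pass; both are illustrative stand-ins, not the paper's exact algorithms.

    ```python
    # Feature reduction, then case selection, then a compact k-NN classifier.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    def dependency(cols):
        """Fraction of cases whose (rounded) feature values determine a unique class."""
        _, inv = np.unique(X[:, cols].round(1), axis=0, return_inverse=True)
        consistent = [np.unique(y[inv == g]).size == 1 for g in range(inv.max() + 1)]
        return np.mean([consistent[g] for g in inv])

    # Feature reduction: greedily add features until dependency stops improving.
    reduct, best = [], 0.0
    for _ in range(X.shape[1]):
        score, f = max((dependency(reduct + [f]), f) for f in range(X.shape[1]) if f not in reduct)
        if score <= best:
            break
        reduct.append(f)
        best = score

    # Case selection: keep only cases the current case base misclassifies (CNN-style).
    keep = [0]
    for i in range(1, len(X)):
        knn = KNeighborsClassifier(n_neighbors=1).fit(X[keep][:, reduct], y[keep])
        if knn.predict(X[i:i + 1, reduct])[0] != y[i]:
            keep.append(i)

    classifier = KNeighborsClassifier(n_neighbors=1).fit(X[keep][:, reduct], y[keep])
    print(len(reduct), "features and", len(keep), "cases retained")
    ```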

    Text-Independent Speaker Recognition for Low SNR Environments with Encryption

    Recognition systems are commonly designed to authenticate users at the access control level of a system. A number of voice recognition methods have been developed using a pitch estimation process, but these are very vulnerable in low signal-to-noise ratio (SNR) environments and thus fail to provide the desired level of accuracy and robustness. Also, most text-independent speaker recognition programs are incapable of coping with unauthorized attempts to gain access by tampering with the samples or the reference database. The proposed text-independent voice recognition system makes use of multilevel cryptography to preserve data integrity while in transit or storage. Encryption and decryption follow a transform-based approach layered with pseudorandom noise addition, whereas for pitch detection a modified version of the autocorrelation pitch extraction algorithm is used. The experimental results show that the proposed algorithm can decrypt the signal under test with exponentially decreasing mean square error over an increasing range of SNR. Further, it outperforms conventional algorithms in actual identification tasks, even in noisy environments. The recognition rate obtained using the proposed method is compared with other conventional methods used for speaker identification. Comment: Biometrics, Pattern Recognition, Security, Speaker Individuality, Text-independence, Pitch Extraction, Voice Recognition, Autocorrelation; Published by Foundation of Computer Science, New York, USA
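    For context, here is a minimal sketch of plain autocorrelation-based pitch extraction, the building block the abstract says it modifies. The synthetic 200 Hz test tone, frame length, and pitch search range are illustrative assumptions.

    ```python
    # Autocorrelation pitch estimation on one synthetic voiced frame.
    import numpy as np

    fs = 16000
    t = np.arange(0, 0.04, 1 / fs)
    frame = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.randn(t.size)  # noisy 200 Hz tone

    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]        # autocorrelation, lags >= 0
    lo, hi = int(fs / 400), int(fs / 60)                                 # plausible pitch lags (60-400 Hz)
    lag = lo + np.argmax(ac[lo:hi])
    print(f"estimated pitch: {fs / lag:.1f} Hz")
    ```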

    Parsing Geometry Using Structure-Aware Shape Templates

    Real-life man-made objects often exhibit strong and easily identifiable structure, as a direct result of their design or intended functionality. Structure typically appears in the form of individual parts and their arrangement. Knowledge of object structure can be an important cue for object recognition and scene understanding, a key goal for various AR and robotics applications. However, commodity RGB-D sensors used in these scenarios only produce raw, unorganized point clouds without structural information about the captured scene. Moreover, the generated data is commonly partial and susceptible to artifacts and noise, which makes inferring the structure of scanned objects challenging. In this paper, we organize large shape collections into parameterized shape templates to capture the underlying structure of the objects. The templates allow us to transfer the structural information onto new objects and incomplete scans. We employ a deep neural network that matches a partial scan with one of the shape templates, then matches and fits it to complete and detailed models from the collection. This allows us to faithfully label its parts and to guide the reconstruction of the scanned object. We showcase the effectiveness of our method by comparing it to other state-of-the-art approaches.
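    As a loose illustration of the template-fitting step, the sketch below optimises the centre and half-extents of a single axis-aligned box against a toy partial point cloud. The real templates are structured, part-based hierarchies matched by a neural network; everything here, including the synthetic scan, is an assumption.

    ```python
    # Fit one parameterised box template to a toy partial scan via least squares.
    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(3)

    # Toy "partial scan": noisy points sampled from two visible faces of a box.
    half_gt = np.array([0.3, 0.1, 0.5])
    pts = rng.uniform(-1, 1, size=(400, 3)) * half_gt
    pts[:200, 0] = half_gt[0]                                  # +x face
    pts[200:, 1] = half_gt[1]                                  # +y face
    pts += 0.005 * rng.standard_normal(pts.shape)              # scan noise

    def box_sdf(points, centre, half):
        """Signed distance from each point to an axis-aligned box surface."""
        d = np.abs(points - centre) - half
        outside = np.linalg.norm(np.maximum(d, 0.0), axis=1)
        inside = np.minimum(d.max(axis=1), 0.0)
        return outside + inside

    def residuals(params):
        return box_sdf(pts, params[:3], np.abs(params[3:]))

    fit = least_squares(residuals, x0=[0.0, 0.0, 0.0, 0.2, 0.2, 0.2])
    print("centre:", fit.x[:3].round(3), "half-extents:", np.abs(fit.x[3:]).round(3))
    ```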

    A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents

    This paper offers a multi-disciplinary review of knowledge acquisition methods in human activity systems. The review captures the degree of involvement of various types of agencies in the knowledge acquisition process and proposes a classification with three categories of methods: human agent, human-inspired agent, and autonomous machine agent methods. In the first two categories, the acquisition of knowledge is treated as a cognitive task analysis exercise, while in the third category knowledge acquisition is treated as an autonomous knowledge-discovery endeavour. The motivation for this classification stems from the continuous change over time of the structure, meaning, and purpose of human activity systems, which is seen as the factor that has fuelled researchers' and practitioners' efforts in knowledge acquisition for more than a century. We show through this review that the knowledge acquisition (KA) field remains increasingly active due to the ever-higher pace of change in human activity, and we conclude by discussing the emergence of a fourth category of knowledge acquisition methods based on red-teaming and co-evolution.

    HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images

    We propose an automatic method for generating high-quality annotations for depth-based hand segmentation and introduce a large-scale hand segmentation dataset. Existing datasets are typically limited to a single hand. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. This lowers the cost and complexity of creating high-quality datasets and makes it easy to expand the dataset in the future. We further show that existing datasets, even with data augmentation, are not sufficient to train a hand segmentation algorithm that can distinguish two hands. Source and datasets will be made publicly available.
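    A hedged sketch of the glove-based labelling idea: threshold two glove colours in HSV to obtain per-hand masks aligned with the depth map. The toy image, colour ranges, and label convention are illustrative assumptions, not the dataset-generation code.

    ```python
    # Colour-threshold two gloves to produce per-pixel hand labels on a toy RGBD frame.
    import numpy as np
    import cv2

    # Toy RGBD frame: a green "left glove" patch and a blue "right glove" patch.
    rgb = np.zeros((120, 160, 3), dtype=np.uint8)
    rgb[40:80, 20:60] = (0, 200, 0)                      # green region (BGR)
    rgb[40:80, 100:140] = (200, 0, 0)                    # blue region (BGR)
    depth = np.full((120, 160), 800, dtype=np.uint16)    # valid depth everywhere

    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    left_mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))      # green glove
    right_mask = cv2.inRange(hsv, (100, 80, 80), (130, 255, 255))   # blue glove

    # Per-pixel labels: 0 = background, 1 = left hand, 2 = right hand,
    # restricted to pixels with valid depth readings.
    labels = np.zeros(depth.shape, dtype=np.uint8)
    labels[(left_mask > 0) & (depth > 0)] = 1
    labels[(right_mask > 0) & (depth > 0)] = 2
    print(np.bincount(labels.ravel()))
    ```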