Mapping Intrinsic Electromechanical Responses at the Nanoscale via Sequential Excitation Scanning Probe Microscopy Empowered by Deep Data
Ever-increasing hardware capabilities and computational power have made the
acquisition and analysis of big scientific data at the nanoscale routine,
though much of the data acquired often turns out to be redundant, noisy, and/or
irrelevant to the problems of interest, and it remains nontrivial to draw
clear mechanistic insights from pure data analytics. In this work, we use
scanning probe microscopy (SPM) as an example to demonstrate deep data
methodology, transitioning from brute force analytics such as data mining,
correlation analysis, and unsupervised classification to informed and/or
targeted causative data analytics built on sound physical understanding. Three
key ingredients of such deep data analytics are presented. A sequential
excitation scanning probe microscopy (SE-SPM) technique is first adopted to
acquire high quality, efficient, and physically relevant data, which can be
easily implemented on any standard atomic force microscope (AFM). Brute force
physical analysis is then carried out using a simple harmonic oscillator (SHO)
model, enabling us to derive the intrinsic electromechanical coupling of
interest. Finally, principal component analysis (PCA) is carried out, which not
only speeds up the analysis by four orders of magnitude, but also allows a
clear physical interpretation of its modes in combination with the SHO
analysis. A rough piezoelectric material has been probed using this strategy,
enabling us to map its intrinsic electromechanical properties at the nanoscale
with high fidelity, where conventional methods fail. The SE technique in
combination with the deep data methodology can be easily adapted for other SPM
techniques to probe a wide range of functional phenomena at the nanoscale.
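
To make the pipeline concrete, here is a minimal Python sketch of the SHO-fit-then-PCA analysis described above, run on synthetic sequential-excitation spectra; it is an illustration of the methodology, not the authors' code, and all array shapes and parameter values are invented for the example.

```python
# Minimal sketch of the deep data pipeline (synthetic data, not the
# authors' code): fit an SHO model pixel by pixel, then compress the same
# spectra with PCA and interpret its modes against the SHO parameters.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.decomposition import PCA

def sho_amplitude(w, a0, w0, q):
    """Amplitude response of a simple harmonic oscillator."""
    return a0 * w0**2 / np.sqrt((w0**2 - w**2)**2 + (w * w0 / q)**2)

rng = np.random.default_rng(0)
w = np.linspace(250e3, 350e3, 128)  # drive frequencies (Hz)

# Fake a 100-pixel map: each pixel yields one noisy SHO spectrum.
spectra = np.array([
    sho_amplitude(w, a0, 300e3 + dw, 80.0) + rng.normal(0, 0.02, w.size)
    for a0, dw in zip(rng.uniform(0.5, 2.0, 100), rng.normal(0, 5e3, 100))
])

# Brute-force physical analysis: per-pixel SHO fits recover the intrinsic
# amplitude, resonance frequency, and quality factor.
sho_params = np.array([
    curve_fit(sho_amplitude, w, s, p0=(1.0, 300e3, 50.0))[0] for s in spectra
])

# Statistical analysis: PCA reduces the spectra to a few modes, which can
# then be interpreted physically via the SHO parameters above.
pca = PCA(n_components=3)
scores = pca.fit_transform(spectra)
print(np.round(pca.explained_variance_ratio_, 3))
```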
Web Content Classification: A Survey
As the information contained within the web increases day by day, organizing
this information becomes a necessary requirement. Data mining is the process of
extracting information from a data set and transforming it into an
understandable structure for further use. Classification of web page content is
essential to many tasks in web information retrieval, such as maintaining web
directories and focused crawling. The uncontrolled nature of web content
presents additional challenges for web page classification compared to
traditional text classification, but the interconnected nature of hypertext
also provides features that can assist the process. In this paper, web
classification is discussed in detail and its importance in the field of data
mining is explored.
Comment: 5 pages, 1 figure. arXiv admin note: text overlap with
arXiv:1307.1024, arXiv:1310.4647 by other authors.
Hyperspectral Image Classification in the Presence of Noisy Labels
Label information plays an important role in supervised hyperspectral image
classification. However, current classification methods largely ignore an
important and inevitable problem: labels may be corrupted, and collecting clean
labels for training samples is difficult and often impractical. Therefore, how
to learn from a database with noisy labels is a problem of great practical
importance. In this paper, we study the influence of label noise on
hyperspectral image classification and develop a random label propagation
algorithm (RLPA) to cleanse the label noise. The key idea of RLPA is to exploit
knowledge (e.g., superpixel-based spectral-spatial constraints) from the
observed hyperspectral images and apply it to the label propagation process.
Specifically, RLPA first constructs a spectral-spatial probability transfer
matrix (SSPTM) that simultaneously considers spectral similarity and
superpixel-based spatial information. It then randomly chooses some training
samples as "clean" samples, sets the rest as unlabeled samples, and propagates
the label information from the "clean" samples to the remaining unlabeled
samples with the SSPTM. By repeating the random assignment (of "clean" labeled
samples and unlabeled samples) and the propagation, we obtain multiple labels
for each training sample, and the final propagated label is determined by
majority vote. Experimental studies show that RLPA can reduce the level of
label noise, and that our method outperforms four major classifiers by a
significant margin: the average gains in OA, AA, and Kappa are 9.18%, 9.58%,
and 0.1043, respectively. The Matlab source code is available at
https://github.com/junjun-jiang/RLPA
Comment: Accepted by IEEE TGRS. In this version, Table III is revised.
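
A compact sketch of the random-label-propagation idea is given below. It is not the released Matlab code: the superpixel-based spatial term of the SSPTM is omitted, so the transfer matrix here uses spectral similarity only, and all function names are ours.

```python
# Sketch of random label propagation (RLPA) for noisy labels. Not the
# released Matlab code: the superpixel spatial term is omitted, so the
# transfer matrix below uses spectral similarity only.
import numpy as np

def transfer_matrix(X, sigma=1.0):
    """Row-normalized RBF similarity matrix over sample spectra X (n, d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    T = np.exp(-d2 / (2 * sigma**2))
    return T / T.sum(axis=1, keepdims=True)

def rlpa(X, noisy_labels, n_classes, n_rounds=20, clean_frac=0.5, seed=0):
    """noisy_labels: integer class labels, possibly corrupted."""
    rng = np.random.default_rng(seed)
    n = len(noisy_labels)
    T = transfer_matrix(X)
    votes = np.zeros((n, n_classes))
    for _ in range(n_rounds):
        clean = rng.random(n) < clean_frac        # random "clean" subset
        F = np.zeros((n, n_classes))
        F[clean, noisy_labels[clean]] = 1.0       # seed with "clean" labels
        for _ in range(30):                       # propagate to the rest
            F = T @ F
            F[clean] = 0.0                        # clamp the seeds each step
            F[clean, noisy_labels[clean]] = 1.0
        votes[np.arange(n), F.argmax(1)] += 1     # record this round's labels
    return votes.argmax(1)                        # majority vote over rounds
```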
Active Learning for Crowd-Sourced Databases
Crowd-sourcing has become a popular means of acquiring labeled data for a
wide variety of tasks where humans are more accurate than computers, e.g.,
labeling images, matching objects, or analyzing sentiment. However, relying
solely on the crowd is often impractical even for data sets with thousands of
items, due to time and cost constraints of acquiring human input (which cost
pennies and minutes per label). In this paper, we propose algorithms for
integrating machine learning into crowd-sourced databases, with the goal of
allowing crowd-sourcing applications to scale, i.e., to handle larger datasets
at lower costs. The key observation is that, in many of the above tasks, humans
and machine learning algorithms can be complementary, as humans are often more
accurate but slow and expensive, while algorithms are usually less accurate,
but faster and cheaper.
Based on this observation, we present two new active learning algorithms to
combine humans and algorithms together in a crowd-sourced database. Our
algorithms are based on the theory of non-parametric bootstrap, which makes our
results applicable to a broad class of machine learning models. Our results, on
three real-life datasets collected with Amazon's Mechanical Turk, and on 15
well-known UCI data sets, show that our methods on average ask humans to label
one to two orders of magnitude fewer items to achieve the same accuracy as a
baseline that labels random images, and two to eight times fewer questions than
previous active learning schemes.
Comment: A shorter version of this manuscript has been published in Proceedings
of Very Large Data Bases 2015, entitled "Scaling Up Crowd-Sourcing to Very
Large Datasets: A Case for Active Learning".
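
The following Python sketch illustrates the bootstrap-based uncertainty idea in the spirit of the paper, though it is not the paper's exact algorithm; the logistic regression model and all names are placeholders, and labels are assumed to be nonnegative integers.

```python
# Hedged sketch of bootstrap-driven active learning: items whose predicted
# labels vary most across bootstrap replicas are routed to the crowd. The
# model and all names are placeholders, not the paper's exact algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_disagreement(X_lab, y_lab, X_unlab, n_boot=20, seed=0):
    """y_lab: nonnegative integer labels. Returns per-item disagreement."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_lab), len(y_lab))  # resample with replacement
        if len(np.unique(y_lab[idx])) < 2:
            continue                                   # degenerate resample; skip
        model = LogisticRegression(max_iter=1000).fit(X_lab[idx], y_lab[idx])
        preds.append(model.predict(X_unlab))
    preds = np.array(preds)
    # Disagreement = fraction of replicas not matching the majority label.
    majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
    return (preds != majority).mean(axis=0)            # higher = ask a human

# Usage sketch: send the k most uncertain unlabeled items to the crowd.
# query = np.argsort(-bootstrap_disagreement(X_lab, y_lab, X_unlab))[:k]
```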
Gaussian Processes Semantic Map Representation
In this paper, we develop a high-dimensional map building technique that
incorporates raw pixelated semantic measurements into the map representation.
The proposed technique uses Gaussian Processes (GPs) multi-class classification
for map inference and is the natural extension of GP occupancy maps from binary
to multi-class form. The technique exploits the continuous property of GPs and,
as a result, the map can be inferred with any resolution. In addition, the
proposed GP Semantic Map (GPSM) learns the structural and semantic correlation
from measurements rather than resorting to assumptions, and can flexibly learn
the spatial correlation as well as any additional non-spatial correlation
between map points. We extend OctoMap to a Semantic OctoMap representation and
compare its mapping performance with that of the GPSM using the NYU Depth V2
dataset. Evaluations of the proposed technique on multiple partially labeled
RGBD scans and labels from noisy image segmentation show that the GP semantic
map can handle sparse measurements, missing labels in the point cloud, and
noise-corrupted labels.
Comment: Accepted for the RSS 2017 Workshop on Spatial-Semantic Representations
in Robotics.
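
As an illustration of the inference step, the sketch below trains a multi-class GP classifier on sparse 2-D semantic measurements and queries the posterior on a grid of arbitrary resolution; it uses scikit-learn's generic GP classifier rather than the authors' implementation, and the data are synthetic.

```python
# Illustrative sketch of GP-based semantic mapping (synthetic data, generic
# scikit-learn GP classifier rather than the authors' implementation).
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Sparse, noisy semantic measurements: 2-D locations with class labels
# (e.g., 0 = floor, 1 = wall, 2 = furniture).
X = rng.uniform(0, 10, size=(60, 2))
y = (X[:, 0] > 5).astype(int) + (X[:, 1] > 7).astype(int)

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=2.0)).fit(X, y)

# The posterior is continuous, so the map can be queried at any resolution.
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
class_map = gp.predict(grid).reshape(gx.shape)   # 50x50 semantic map
class_probs = gp.predict_proba(grid)             # per-class uncertainty
```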
Combining Feature Reduction and Case Selection in Building CBR Classifiers
CBR systems that are built for classification problems are called CBR classifiers. This paper presents a novel and fast approach to building efficient and competent CBR classifiers that combines both feature reduction (FR) and case selection (CS). It makes three central contributions: 1) it develops a fast rough-set method based on relative attribute dependency among features to compute the approximate reduct; 2) it constructs and compares different case selection methods based on the similarity measure and the concepts of case coverage and case reachability; and 3) CBR classifiers built using a combination of the FR and CS processes reduce the training burden as well as the need to acquire domain knowledge. The overall experimental results, demonstrated on four real-life data sets, show that the combined FR and CS method can preserve, and may also improve, the solution accuracy while substantially reducing the storage space. The case retrieval time is also greatly reduced because the resulting CBR classifier contains a smaller number of cases with fewer features. The developed FR and CS combination method is also compared with kernel PCA and SVM techniques; their storage requirements, classification accuracy, and classification speed are presented and discussed.
Index Terms: Case-based reasoning, CBR classifier, case selection, feature reduction, k-NN principle, rough sets.
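
For the feature reduction step, the following is a rough sketch of computing an approximate reduct from attribute dependency, in the rough-set spirit of contribution 1; it is a greedy approximation with invented names, not the paper's exact algorithm, and it assumes the data are already discretized.

```python
# Greedy sketch of rough-set feature reduction via attribute dependency
# (invented names, not the paper's exact reduct algorithm). Assumes the
# feature matrix X is already discretized to integer values.
import numpy as np
from collections import defaultdict

def dependency(X, y, feats):
    """Fraction of samples whose equivalence class (over feats) is
    consistent, i.e., maps to a single decision label."""
    groups = defaultdict(list)
    for row, label in zip(X[:, feats], y):
        groups[tuple(row)].append(label)
    consistent = sum(len(v) for v in groups.values() if len(set(v)) == 1)
    return consistent / len(y)

def approximate_reduct(X, y, tol=0.0):
    """Drop each feature whose removal keeps dependency within tol."""
    feats = list(range(X.shape[1]))
    base = dependency(X, y, feats)
    for f in range(X.shape[1]):
        trial = [g for g in feats if g != f]
        if trial and dependency(X, y, trial) >= base - tol:
            feats = trial               # f is redundant w.r.t. the decision
    return feats
```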
Text-Independent Speaker Recognition for Low SNR Environments with Encryption
Recognition systems are commonly designed to authenticate users at the access
control levels of a system. A number of voice recognition methods have been
developed using pitch estimation processes that are very vulnerable in low
Signal-to-Noise Ratio (SNR) environments; thus, these programs fail to provide
the desired level of accuracy and robustness. Also, most text-independent
speaker recognition programs are incapable of coping with unauthorized attempts
to gain access by tampering with the samples or reference database. The
proposed text-independent voice recognition system makes use of multilevel
cryptography to preserve data integrity while in transit or storage. Encryption
and decryption follow a transform based approach layered with pseudorandom
noise addition whereas for pitch detection, a modified version of the
autocorrelation pitch extraction algorithm is used. The experimental results
show that the proposed algorithm can decrypt the signal under test with
exponentially reducing Mean Square Error over an increasing range of SNR.
Further, it outperforms the conventional algorithms in actual identification
tasks even in noisy environments. The recognition rate thus obtained using the
proposed method is compared with other conventional methods used for speaker
identification.
Comment: Biometrics, Pattern Recognition, Security, Speaker Individuality,
Text-Independence, Pitch Extraction, Voice Recognition, Autocorrelation;
published by Foundation of Computer Science, New York, US.
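
As a point of reference for the pitch detection step, here is a minimal autocorrelation pitch estimator in Python; the paper's modifications to the algorithm and its encryption layers are not reproduced, and the frame length and frequency bounds are arbitrary choices.

```python
# Minimal autocorrelation pitch estimator in the spirit of the approach
# above; the paper's modifications and encryption layers are not reproduced.
import numpy as np

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search plausible pitch lags
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Example: a 200 Hz tone is recovered from a moderately noisy frame.
sr = 16000
t = np.arange(0, 0.04, 1 / sr)
frame = np.sin(2 * np.pi * 200 * t) \
    + 0.5 * np.random.default_rng(1).normal(size=t.size)
print(round(autocorr_pitch(frame, sr), 1))    # ~200.0
```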
Parsing Geometry Using Structure-Aware Shape Templates
Real-life man-made objects often exhibit strong and easily identifiable
structure, as a direct result of their design or their intended functionality.
Structure typically appears in the form of individual parts and their
arrangement. Knowing about object structure can be an important cue for object
recognition and scene understanding - a key goal for various AR and robotics
applications. However, commodity RGB-D sensors used in these scenarios only
produce raw, unorganized point clouds, without structural information about the
captured scene. Moreover, the generated data is commonly partial and
susceptible to artifacts and noise, which makes inferring the structure of
scanned objects challenging. In this paper, we organize large shape collections
into parameterized shape templates to capture the underlying structure of the
objects. The templates allow us to transfer the structural information onto new
objects and incomplete scans. We employ a deep neural network that matches the
partial scan with one of the shape templates, then match and fit it to complete
and detailed models from the collection. This allows us to faithfully label its
parts and to guide the reconstruction of the scanned object. We showcase the
effectiveness of our method by comparing it to other state-of-the-art
approaches.
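
A loose sketch of the retrieve-then-fit idea follows: the paper matches scans to templates with a deep neural network, but here a simple chamfer-distance fit stands in for both retrieval and fitting, and the parameterization (a scale plus translation per template) is an assumption made for illustration.

```python
# Loose sketch of retrieve-then-fit (the paper uses a deep network for
# matching; here a chamfer-distance fit stands in, and the scale-plus-
# translation parameterization is an assumption made for illustration).
import numpy as np
from scipy.optimize import minimize

def chamfer(a, b):
    """One-sided chamfer distance from point set a (n, 3) to b (m, 3)."""
    return np.mean(np.min(((a[:, None] - b[None]) ** 2).sum(-1), axis=1))

def fit_template(scan, template):
    """Fit a scale and translation of a template point set to a partial scan."""
    def cost(p):                                  # p = [scale, tx, ty, tz]
        return chamfer(scan, p[0] * template + p[1:])
    return minimize(cost, np.array([1.0, 0.0, 0.0, 0.0]),
                    method="Nelder-Mead").x

# Retrieval sketch: choose the template whose fitted chamfer cost is lowest,
# then transfer its per-part labels onto the scan's nearest points.
```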
A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents
This paper offers a multi-disciplinary review of knowledge acquisition
methods in human activity systems. The review captures the degree of
involvement of various types of agencies in the knowledge acquisition process,
and proposes a classification with three categories of methods: the human
agent, the human-inspired agent, and the autonomous machine agent methods. In
the first two categories, the acquisition of knowledge is seen as a cognitive
task analysis exercise, while in the third category knowledge acquisition is
treated as an autonomous knowledge-discovery endeavour. The motivation for this
classification stems from the continuous change over time of the structure,
meaning and purpose of human activity systems, which are seen as the factor
that fuelled researchers' and practitioners' efforts in knowledge acquisition
for more than a century.
We show through this review that the knowledge acquisition (KA) field is
increasingly active due to the ever-increasing pace of change in human
activity, and conclude by discussing the emergence of a fourth category of
knowledge acquisition methods, which are based on red-teaming and co-evolution.
HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images
We propose an automatic method for generating high-quality annotations for
depth-based hand segmentation, and introduce a large-scale hand segmentation
dataset. Existing datasets are typically limited to a single hand. By
exploiting the visual cues given by an RGBD sensor and a pair of colored
gloves, we automatically generate dense annotations for two-hand segmentation.
This lowers the cost/complexity of creating high quality datasets, and makes it
easy to expand the dataset in the future. We further show that existing
datasets, even with data augmentation, are not sufficient to train a hand
segmentation algorithm that can distinguish two hands. The source code and
datasets will be made publicly available.
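
The annotation idea can be sketched as simple color thresholding aligned with depth, as below; the HSV ranges, glove colors, and function names are hypothetical, not the dataset's actual pipeline, and the RGB and depth frames are assumed to be registered.

```python
# Sketch of glove-based automatic annotation (hypothetical HSV thresholds
# and names, not the dataset's actual pipeline; assumes the RGB and depth
# frames are registered to the same resolution).
import numpy as np
import cv2

def glove_labels(bgr, depth):
    """Per-pixel labels: 0 = background, 1 = left hand, 2 = right hand."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    left = cv2.inRange(hsv, (100, 80, 50), (130, 255, 255))  # blue glove
    right = cv2.inRange(hsv, (0, 80, 50), (10, 255, 255))    # red glove
    # (Red wraps around hue 180 in OpenCV; a second range would be needed
    # for a robust mask. Kept single-range here for brevity.)
    labels = np.zeros(depth.shape, np.uint8)
    labels[left > 0] = 1
    labels[right > 0] = 2
    labels[depth == 0] = 0          # drop pixels with invalid depth
    return labels
```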