16,294 research outputs found
Adaptive grid based localized learning for multidimensional data
Rapid advances in data-rich domains of science, technology, and business has amplified the computational challenges of Big Data synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, framework, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains which is particularly detrimental to the performance of unsupervised learning algorithms.
The motivation for the research presented in this dissertation is to develop novel data mining algorithms to address the challenges of high dimensionality, sparseness and large volumes of datasets by using a unique grid-based localized learning paradigm for data movement clustering and classification schema. The grid-based learning is recognized in data mining as these algorithms are inherently efficient since they reduce the search space by partitioning the feature space into effective partitions. However, these approaches have not been successfully devised for supervised learning algorithms or sparseness reduction algorithm as they require careful estimation of grid sizes, partitions and data movement error calculations. Grid-based localized learning algorithms can scale well with an increase in dimensionality and the size of the datasets.
To fulfill the goal of designing and developing learning algorithms that can handle data sparseness, high data dimensionality, and large size of data, in a concurrent manner to avoid the feature selection biases, a set of novel data mining algorithms using grid-based localized learning principles are developed and presented. The first algorithm is a unique computational framework for feature ranking that employs adaptive grid-based data shrinking for feature ranking. This method addresses the limitations of existing feature ranking methods by using a scoring function that discovers and exploits dependencies from all the features in the data. Data shrinking principles are established and metricized to capture and exploit dependencies between features. The second core algorithmic contribution is a novel supervised learning algorithm that utilizes grid-based localized learning to build a nonparametric classification model. In this classification model, feature space is divided using uniform/non-uniform partitions and data space subdivision is performed using a grid structure which is then used to build a classification model using grid-based nearest-neighbor learning. The third algorithm is an unsupervised clustering algorithm that is augmented with data shrinking to enhance the clustering performance of the algorithm. This algorithm addresses the limitations of the existing grid-based data shrinking and clustering algorithms by using an adaptive grid-based learning. Multiple experiments on a diversified set of datasets evaluate and discuss the effectiveness of dimensionality reduction, feature selection, unsupervised and supervised learning, and the scalability of the proposed methods compared to the established methods in the literature
DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and join of time series data streams has gained a lot of
relevance in today's world that has large streaming data. This process finds
wide scale application in the areas of location tracking, sensor networks,
object positioning and monitoring to name a few. However, as the size of the
data stream increases, the cost involved to retain all the data in order to aid
the process of similarity matching also increases. We develop a novel framework
to addresses the following objectives. Firstly, Dimension reduction is
performed in the preprocessing stage, where large stream data is segmented and
reduced into a compact representation such that it retains all the crucial
information by a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with the storage of large time-series data
streams. Secondly, it incorporates effective Similarity Matching technique to
analyze if the new data objects are symmetric to the existing data stream. And
finally, the Pruning Technique that filters out the pseudo data object pairs
and join only the relevant pairs. The computational cost for MSM is O(l*ni) and
the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competent in comparison with earlier
works.Comment: 20 pages,8 figures, 6 Table
A Convex Formulation for Spectral Shrunk Clustering
Spectral clustering is a fundamental technique in the field of data mining
and information processing. Most existing spectral clustering algorithms
integrate dimensionality reduction into the clustering process assisted by
manifold learning in the original space. However, the manifold in
reduced-dimensional subspace is likely to exhibit altered properties in
contrast with the original space. Thus, applying manifold information obtained
from the original space to the clustering process in a low-dimensional subspace
is prone to inferior performance. Aiming to address this issue, we propose a
novel convex algorithm that mines the manifold structure in the low-dimensional
subspace. In addition, our unified learning process makes the manifold learning
particularly tailored for the clustering. Compared with other related methods,
the proposed algorithm results in more structured clustering result. To
validate the efficacy of the proposed algorithm, we perform extensive
experiments on several benchmark datasets in comparison with some
state-of-the-art clustering approaches. The experimental results demonstrate
that the proposed algorithm has quite promising clustering performance.Comment: AAAI201
String theory duals of Lifshitz-Chern-Simons gauge theories
We propose candidate gravity duals for a class of non-Abelian z=2 Lifshitz
Chern-Simons (LCS) gauge theories studied by Mulligan, Kachru and Nayak. These
are nonrelativistic gauge theories in 2+1 dimensions in which parity and
time-reversal symmetries are explicitly broken by the presence of a
Chern-Simons term. We show that these field theories can be realized as
deformations of DLCQ N=4 super Yang-Mills theory. Using the holographic
dictionary, we identify the bulk fields that are dual to these deformations.
The geometries describing the groundstates of the non-Abelian LCS gauge
theories realized here exhibit a mass gap.Comment: 25+14 pages, 3 figures; v2: significant corrections regarding IR
geometry, resulting in new section 5; journal versio
- …