Large-Scale Multi-Label Learning with Incomplete Label Assignments
Multi-label learning deals with classification problems where each
instance can be assigned multiple labels simultaneously. Conventional
multi-label learning approaches mainly focus on exploiting label correlations.
It is usually assumed, explicitly or implicitly, that the label sets for
training instances are fully labeled without any missing labels. However, in
many real-world multi-label datasets, the label assignments for training
instances can be incomplete. Some ground-truth labels can be missed by the
labeler from the label set. This problem is especially typical when the number
of instances is very large and the labeling cost is very high, which makes it
almost impossible to get a fully labeled training set. In this paper, we study
the problem of large-scale multi-label learning with incomplete label
assignments. We propose an approach, called MPU, based upon positive and
unlabeled stochastic gradient descent and stacked models. Unlike prior works,
our method can effectively and efficiently consider missing labels and label
correlations simultaneously, and is highly scalable, with time complexity
linear in the size of the data. Extensive experiments on two real-world
multi-label datasets show that our MPU model consistently outperforms other
commonly used baselines.
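To illustrate the positive-and-unlabeled idea mentioned in the abstract above, here is a minimal sketch in plain Python. It is not the MPU algorithm: it is a generic down-weighted logistic loss trained with SGD, one independent model per label, with no stacking and no label correlations. The names `pu_sgd`, `predict_proba`, and the `unlabeled_weight` parameter are hypothetical and chosen only for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pu_sgd(X, observed, unlabeled_weight=0.3, lr=0.1, epochs=200, seed=0):
    """Train one logistic model per label with SGD.

    observed[i][k] == 1 means label k was annotated for instance i.
    observed[i][k] == 0 means "unlabeled" (possibly a missed positive),
    so its loss term is down-weighted by unlabeled_weight (an assumption
    of this sketch, not a value taken from the paper).
    """
    rng = random.Random(seed)
    n_features, n_labels = len(X[0]), len(observed[0])
    W = [[0.0] * n_features for _ in range(n_labels)]
    b = [0.0] * n_labels
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = X[i]
            for k in range(n_labels):
                y = observed[i][k]
                weight = 1.0 if y == 1 else unlabeled_weight
                p = sigmoid(sum(w * v for w, v in zip(W[k], x)) + b[k])
                g = weight * (p - y)  # gradient of the weighted log loss
                for j in range(n_features):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b


def predict_proba(W, b, x):
    # Per-label probabilities for a single instance x.
    return [sigmoid(sum(w * v for w, v in zip(W[k], x)) + b[k])
            for k in range(len(W))]
```

Each update touches one instance and one label at a time, so the cost of a pass is linear in the number of instances, which is the kind of scaling the abstract refers to.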
Information Theory-Guided Heuristic Progressive Multi-View Coding
Multi-view representation learning aims to capture comprehensive information
from multiple views of a shared context. Recent works intuitively apply
contrastive learning to different views in a pairwise manner, which still
leaves room for improvement: view-specific noise is not filtered in learning
view-shared representations; the fake negative pairs, where the negative terms
are actually within the same class as the positive, and the real negative pairs
are treated equally; evenly measuring the similarities between terms might
interfere with optimization. Importantly, few works study the theoretical
framework of generalized self-supervised multi-view learning, especially for
more than two views. To this end, we rethink the existing multi-view learning
paradigm from the perspective of information theory and then propose a novel
information theoretical framework for generalized multi-view learning. Guided
by it, we build a multi-view coding method with a three-tier progressive
architecture, namely Information theory-guided hierarchical Progressive
Multi-view Coding (IPMC). In the distribution-tier, IPMC aligns the
distribution between views to reduce view-specific noise. In the set-tier, IPMC
constructs self-adjusted contrasting pools, which are adaptively modified by a
view filter. Lastly, in the instance-tier, we adopt a designed unified loss to
learn representations and reduce the gradient interference. Theoretically and
empirically, we demonstrate the superiority of IPMC over state-of-the-art
methods.
Comment: This paper was accepted by the journal Neural Networks (Elsevier)
in 2023. A revised manuscript of arXiv:2109.0234
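For context, the pairwise contrastive baseline that the abstract above critiques can be sketched as follows. This is a generic InfoNCE loss averaged over all pairs of views, not IPMC itself; the function names and the `temperature` default are assumptions made for this example. Note that every non-matching sample is treated as an equally weighted negative, which is one of the shortcomings the abstract lists.

```python
import math
from itertools import combinations


def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor_view, other_view, temperature=0.5):
    """Mean InfoNCE loss between two views of the same batch.

    Sample i in anchor_view is the positive of sample i in other_view;
    all other samples in other_view serve as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchor_view):
        logits = [cosine(anchor, other) / temperature for other in other_view]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchor_view)


def pairwise_multiview_loss(views, temperature=0.5):
    # Average the two-view loss over every unordered pair of views.
    pairs = list(combinations(range(len(views)), 2))
    return sum(info_nce(views[i], views[j], temperature)
               for i, j in pairs) / len(pairs)
```

With V views this evaluates V(V-1)/2 pairwise terms, so the cost grows quadratically in the number of views.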
Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS
Many distributed machine learning frameworks have recently been built to
speed up the large-scale data learning process. However, most distributed
machine learning algorithms used in these frameworks still rely on an offline
model that cannot cope with data stream problems. In fact, large-scale data are
mostly generated by non-stationary data streams whose patterns evolve
over time. To address this problem, we propose a novel Evolving Large-scale
Data Stream Analytics framework based on a Scalable Parsimonious Network based
on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving
algorithm is distributed over the worker nodes in the cloud to learn from
large-scale data streams. The Scalable PANFIS framework incorporates an active
learning (AL) strategy and two model fusion methods. The AL accelerates the
distributed learning process to generate an initial evolving large-scale data
stream model (initial model), whereas the two model fusion methods aggregate an
initial model to generate the final model. The final model represents the
update of current large-scale data knowledge which can be used to infer future
data. The framework is validated through extensive experiments measuring the
accuracy and running time of four combinations of Scalable PANFIS and other
Spark-based built-in algorithms. The results indicate that Scalable PANFIS with
AL trains almost twice as fast as Scalable PANFIS without AL. The results also
show that the rule merging and voting mechanisms yield similar accuracy in
general among Scalable PANFIS algorithms, and that they are generally better
than Spark-based algorithms. In terms of running time, Scalable PANFIS
training outperforms all Spark-based algorithms when classifying numerous
benchmark datasets.
Comment: 20 pages, 5 figures
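One of the two fusion methods named in the abstract above is voting. A minimal sketch of unweighted majority voting over per-worker predictions is shown below; the paper's actual voting scheme may be weighted or otherwise different, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(worker_predictions):
    """Fuse predictions from several worker models by majority vote.

    worker_predictions is a list with one entry per worker; each entry
    is that worker's list of predicted labels for the same samples.
    Ties go to the label that appears first among the tied votes,
    which is how Counter.most_common orders equal counts.
    """
    fused = []
    for votes in zip(*worker_predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```

For example, three workers predicting `["a", "b", "a"]`, `["a", "a", "b"]`, and `["b", "b", "b"]` are fused sample by sample, each sample taking the label most workers agree on.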
A Scalable Deep Neural Network Architecture for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi Fingerprinting
One of the key technologies for future large-scale location-aware services
covering a complex of multi-story buildings --- e.g., a big shopping mall and a
university campus --- is a scalable indoor localization technique. In this
paper, we report the current status of our investigation on the use of deep
neural networks (DNNs) for scalable building/floor classification and
floor-level position estimation based on Wi-Fi fingerprinting. Exploiting the
hierarchical nature of the building/floor estimation and floor-level
coordinates estimation of a location, we propose a new DNN architecture
consisting of a stacked autoencoder for the reduction of feature space
dimension and a feed-forward classifier for multi-label classification of
building/floor/location, on which the multi-building and multi-floor indoor
localization system based on Wi-Fi fingerprinting is built. Experimental
results for the performance of building/floor estimation and floor-level
coordinates estimation of a given location demonstrate the feasibility of the
proposed DNN-based indoor localization system, which can provide near
state-of-the-art performance using a single DNN, allowing an implementation
with lower complexity and energy consumption on mobile devices.
Comment: 9 pages, 6 figures
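The multi-label classification of building/floor/location described in the abstract above can be pictured as a single output vector made of concatenated one-hot segments. The sketch below shows one way to encode and decode such a vector; it assumes this segment layout for illustration and does not reproduce the paper's network or its exact label scheme. The names `encode_target` and `decode_output` are hypothetical.

```python
def encode_target(indices, sizes):
    """Concatenate one one-hot segment per attribute.

    indices: chosen class per attribute, e.g. [building, floor, location].
    sizes:   number of classes per attribute, in the same order.
    """
    vec = []
    for idx, size in zip(indices, sizes):
        segment = [0] * size
        segment[idx] = 1
        vec.extend(segment)
    return vec


def decode_output(scores, sizes):
    # Take the argmax within each segment of the classifier's output scores.
    decoded, start = [], 0
    for size in sizes:
        segment = scores[start:start + size]
        decoded.append(max(range(size), key=segment.__getitem__))
        start += size
    return decoded
```

With `sizes = [3, 5, 4]`, a sample in building 1, floor 2, location 0 becomes a 12-element target, and a single classifier output of the same length can be decoded back into the three indices.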