Few Shot Learning With No Labels
Few-shot learners aim to recognize new categories given only a small number
of training samples. The core challenge is to avoid overfitting to the limited
data while ensuring good generalization to novel classes. Existing literature
makes use of vast amounts of annotated data by simply shifting the label
requirement from novel classes to base classes. Since data annotation is
time-consuming and costly, reducing the label requirement even further is an
important goal. To that end, our paper presents a more challenging few-shot
setting where no label access is allowed during training or testing. By
leveraging self-supervision for learning image representations and image
similarity for classification at test time, we achieve competitive baselines
while using zero labels, which is fewer labels than state-of-the-art methods
require. We hope that this work is a step towards developing few-shot
learning methods which do not depend on annotated data at all. Our code will be
publicly released.
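The test-time step described above, classifying by image similarity, can be illustrated with a minimal sketch. It assumes the images have already been mapped to embedding vectors by some self-supervised encoder (not shown). The function names and the choice of cosine similarity are illustrative assumptions, not the paper's published implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_support(query, support):
    """Return the index of the support embedding most similar to the query.

    No labels are involved: the query is simply matched to one support image.
    """
    return max(range(len(support)),
               key=lambda i: cosine_similarity(query, support[i]))


# Hypothetical 2-D embeddings for two support images and one query image.
support_embeddings = [[1.0, 0.0], [0.0, 1.0]]
query_embedding = [0.9, 0.1]
match = most_similar_support(query_embedding, support_embeddings)
```

In this sketch, labels would only be needed afterwards, to check whether the matched support image belongs to the same class as the query.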
Incorporating Word Embeddings into Open Directory Project based Large-scale Classification
Recently, implicit representation models, such as embedding or deep learning,
have been successfully adopted for text classification tasks due to their
outstanding performance. However, these approaches are limited to small- or
moderate-scale text classification. Explicit representation models are often
used in large-scale text classification, such as Open Directory Project
(ODP)-based text classification. However, the performance of these models is
limited by the associated knowledge bases. In this paper, we incorporate word
embeddings into the ODP-based large-scale classification. To this end, we first
generate category vectors, which represent the semantics of ODP categories by
jointly modeling word embeddings and the ODP-based text classification. We then
propose a novel semantic similarity measure, which utilizes the category and
word vectors obtained from the joint model and word embeddings, respectively.
The evaluation results clearly show the efficacy of our methodology in
large-scale text classification. The proposed scheme exhibits significant
improvements of 10% and 28% in terms of macro-averaging F1-score and precision
at k, respectively, over state-of-the-art techniques.
Comment: 12 pages, 2 figures, In proceedings of the 22nd Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD
Few-Shot Learning with a Strong Teacher
Few-shot learning (FSL) aims to train a strong classifier using limited
labeled examples. Many existing works take the meta-learning approach, sampling
few-shot tasks in turn and optimizing the few-shot learner's performance on
classifying the query examples. In this paper, we point out two potential
weaknesses of this approach. First, the sampled query examples may not provide
sufficient supervision for the few-shot learner. Second, the effectiveness of
meta-learning diminishes sharply with increasing shots (i.e., the number of
training examples per class). To resolve these issues, we propose a novel
objective to directly train the few-shot learner to perform like a strong
classifier. Concretely, we associate each sampled few-shot task with a strong
classifier, which is learned with ample labeled examples. The strong classifier
has a better generalization ability and we use it to supervise the few-shot
learner. We present an efficient way to construct the strong classifier, making
our proposed objective an easy plug-and-play term for existing meta-learning
based FSL methods. We validate our approach in combination with many
representative meta-learning methods. On several benchmark datasets including
miniImageNet and tieredImageNet, our approach leads to a notable improvement
across a variety of tasks. More importantly, with our approach, meta-learning
based FSL methods can consistently outperform non-meta-learning based ones,
even in a many-shot setting, greatly strengthening their applicability.
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and has been the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis.
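The reduction from a data-to-data matrix to a data-to-landmark matrix can be illustrated with a minimal sketch. This is not the GRAIL implementation. The Gaussian similarity on Euclidean distance, the `gamma` parameter, and the hand-picked landmarks are all assumptions made for illustration. GRAIL itself selects landmarks through time-series clustering and uses a similarity function tied to the chosen comparison method.

```python
import math


def similarity(a, b, gamma=1.0):
    """Gaussian similarity on squared Euclidean distance (an illustrative stand-in)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def data_to_landmark_matrix(series, landmarks, gamma=1.0):
    """Build an n-by-k matrix of similarities between n series and k landmarks.

    With k much smaller than n, this avoids the n-by-n data-to-data matrix.
    """
    return [[similarity(s, l, gamma) for l in landmarks] for s in series]


# Three toy series of length 2 and two hand-picked landmarks (hypothetical data).
toy_series = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
toy_landmarks = [[0.0, 0.0], [5.0, 5.0]]
features = data_to_landmark_matrix(toy_series, toy_landmarks, gamma=0.1)
```

Each row of `features` can then serve as a low-dimensional representation of one series for downstream tasks such as clustering or classification.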
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features had a greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.
Multiple Manifold Clustering Using Curvature Constrained Path
The problem of multiple surface clustering is a challenging task,
particularly when the surfaces intersect. Available methods such as Isomap fail
to capture the true shape of the surface near the intersection and produce
incorrect clusterings. The Isomap algorithm uses the shortest path between
points. The main drawback of the shortest path algorithm is that it lacks a
curvature constraint, which allows paths to connect points on different
surfaces. In this paper, we tackle this problem by imposing a curvature
constraint on the shortest path algorithm used in Isomap. The algorithm chooses
several landmark nodes at random and then checks whether there is a curvature
constrained path between each landmark node and every other node in the
neighbourhood graph. We build a binary feature vector for each point where each
entry represents the connectivity of that point to a particular landmark. The
binary feature vectors can then be used as input to a conventional clustering
algorithm such as hierarchical clustering. We apply our method to simulated and
some real datasets and show that it performs comparably to the best methods such as
K-manifold and spectral multi-manifold clustering.
Comment: arXiv admin note: text overlap with arXiv:1802.07416; text overlap
with arXiv:1509.00947 by other author
A Survey on Data Mining Techniques Applied to Energy Time Series Forecasting
Data mining has become an essential tool during the last decade to analyze large sets of data. The variety of techniques it includes and the successful results obtained in many application fields make this family of approaches powerful and widely used. In particular, this work explores the application of these techniques to time series forecasting. Although classical statistical methods provide reasonably good results, data mining techniques outperform them. Hence, this work faces two main challenges: (i) to provide a compact mathematical formulation of the most commonly used techniques; (ii) to review the latest works on time series forecasting and, as a case study, those related to electricity price and demand markets.
Ministerio de Economía y Competitividad TIN2014-55894-C2-R; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
Malgazer: An Automated Malware Classifier With Running Window Entropy and Machine Learning
This dissertation explores functional malware classification using running window entropy and machine learning classifiers. This topic was under-researched in the prior literature, but the implications are important for malware defense. This dissertation presents six new design science artifacts. The first artifact was a generalized machine learning based malware classifier model. This model was used to categorize and explain the gaps in the prior literature. This artifact was also used to compare the prior literature to the classifiers created in this dissertation, herein referred to as “Malgazer” classifiers.
Running window entropy data was required, but the algorithm was too slow to compute at scale. This dissertation presents an optimized version of the algorithm that requires less than 2% of the time of the original algorithm. Next, the classifications for the malware samples were required, but there was no one unified and consistent source for this information. One of the design science artifacts was the method to determine the classifications from publicly available resources.
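To make the term concrete, running window entropy can be sketched as Shannon entropy computed over each sliding window of bytes. The version below is a naive illustration and is not the optimized algorithm the dissertation presents. The function names and the step size of one byte are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy, in bits, of the byte values in one window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def running_window_entropy(data, window_size):
    """Entropy of every contiguous window of `window_size` bytes, step of one byte.

    Naive approach: each window is recounted from scratch, so the cost grows
    with len(data) * window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [shannon_entropy(data[i:i + window_size])
            for i in range(len(data) - window_size + 1)]


# Hypothetical input: four alternating bytes, windows of two bytes each.
entropies = running_window_entropy(b"\x00\x01\x00\x01", 2)
```

Recounting every window from scratch is what makes this approach slow at scale. An incremental variant would update the byte counts as the window slides.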
Once the running window entropy data was computed and the functional classifications were collected, the machine learning algorithms were trained at scale so that one individual could complete over 200 computationally intensive experiments for this dissertation. The method to scale the computations was an instantiation design science artifact. The trained classifiers were another design science artifact. Lastly, a web application was developed so that the classifiers could be utilized by those without a programming background. This was the last design science artifact created by this research.
Once the classifiers were developed, they were compared to prior literature theoretically and empirically. A malware classification method from prior literature was chosen (referred to herein as “GIST”) for an empirical comparison to the Malgazer classifiers. The best Malgazer classifier produced an accuracy of approximately 95%, which was around 0.76% more accurate than the GIST method on the same data sets. Then, the Malgazer classifier was compared to the prior literature theoretically, based upon the empirical analysis with GIST, and Malgazer performed at least as well as the prior literature. While the data, methods, and source code are open sourced from this research, most prior literature did not provide enough information or data to replicate and verify each method. This prevented a full and true comparison to prior literature, but it did not prevent recommending the Malgazer classifier for some use cases
Rethinking Kernel Methods for Node Representation Learning on Graphs
Graph kernels are kernel methods measuring graph similarity and serve as a
standard tool for graph classification. However, the use of kernel methods for
node classification, which is a related problem to graph representation
learning, is still ill-posed and the state-of-the-art methods are heavily based
on heuristics. Here, we present a novel theoretical kernel-based framework for
node classification that can bridge the gap between these two representation
learning problems on graphs. Our approach is motivated by graph kernel
methodology but extended to learn the node representations capturing the
structural information in a graph. We theoretically show that our formulation
is as powerful as any positive semidefinite kernel. To efficiently learn the
kernel, we propose a novel mechanism for node feature aggregation and a
data-driven similarity metric employed during the training phase. More
importantly, our framework is flexible and complementary to other graph-based
deep learning models, e.g., Graph Convolutional Networks (GCNs). We empirically
evaluate our approach on a number of standard node classification benchmarks,
and demonstrate that our model sets the new state of the art.
Comment: Accepted to NeurIPS 2019. The first two authors contributed equally.
The source code is publicly available at
https://github.com/bluer555/KernelGC
Similarity Models in Distributional Semantics using Task Specific Information
In distributional semantics, the unsupervised learning approach has been widely used for a large number of tasks. On the other hand, supervised learning has less coverage.
In this dissertation, we investigate the supervised learning approach for semantic relatedness tasks in distributional semantics. The investigation considers mainly semantic similarity and semantic classification tasks. Existing and newly-constructed datasets are used as an input for the experiments. The new datasets are constructed from thesauruses like Eurovoc. The Eurovoc thesaurus is a multilingual thesaurus maintained by the Publications Office of the European Union. The meaning of the words in the dataset is represented by using a distributional semantic approach.
The distributional semantic approach collects co-occurrence information from large texts and represents the words as high-dimensional vectors. The English words are represented using the ukWaC corpus, while German words are represented using the deWaC corpus. After representing each word by a high-dimensional vector, different supervised machine learning methods are applied to the selected tasks. The outputs of the supervised machine learning methods are evaluated by comparing their task performance and accuracy with the results of state-of-the-art unsupervised machine learning methods. In addition, multi-relational matrix factorization is introduced as a supervised learning method in distributional semantics. This dissertation shows that multi-relational matrix factorization is a good alternative method for integrating different sources of information about words in distributional semantics.
In the dissertation, some new applications are also introduced. One analyzes the text of a German company’s website and provides information about the company through a concept cloud visualization. The others are automatic recognition and disambiguation of Library of Congress Subject Headings and automatic identification of synonym relations in the Dutch Parliament thesaurus.