5 research outputs found


    In this paper, we propose a spectral-spatial feature extraction framework based on deep learning (DL) for hyperspectral image (HSI) classification. In this framework, a variational autoencoder (VAE) extracts spectral features from two widely used hyperspectral datasets: Kennedy Space Centre, Florida, and University of Pavia, Italy. Additionally, a convolutional neural network (CNN) is used to obtain spatial features. The spatial and spectral feature vectors are then stacked to form a joint feature vector. Finally, a multinomial logistic regression (softmax regression) classifier is trained on the joint feature vector to predict class labels. Classification performance is analyzed by generating the confusion matrix, which is then used to calculate Cohen’s kappa (κ) as a quantitative measure of classification performance. The results show that κ is higher than 0.99 for both HSI datasets.
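As a rough illustration of the evaluation step described in this abstract, the sketch below computes Cohen's kappa from a confusion matrix using the standard definition κ = (p_o − p_e) / (1 − p_e). The function name `cohens_kappa` and the toy matrix are assumptions made for illustration; they are not taken from the paper.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, not data from the paper.
toy_matrix = [[20, 5],
              [10, 15]]
print(cohens_kappa(toy_matrix))  # about 0.4
```

A perfectly diagonal matrix gives κ = 1, which is the regime the abstract reports (κ above 0.99).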

    Benchmarking Top-K Keyword and Top-K Document Processing with T²K² and T²K²D²

    Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T²K², a top-k keywords and documents benchmark, and its decision support-oriented evolution T²K²D². Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks' relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
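To make the top-k document task concrete, here is a minimal in-memory sketch of Okapi BM25 scoring followed by top-k selection. It is not the benchmark's implementation: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the function names are all assumptions chosen for illustration.

```python
import heapq
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant); always positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k_documents(query, docs, k):
    """Return the k best (document index, score) pairs, best first."""
    scores = bm25_scores(query, docs)
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Hypothetical mini-corpus standing in for the tweet dataset.
example_docs = [
    "storm warning for the coast",
    "coast guard rescues sailors",
    "football match tonight",
]
print(top_k_documents("coast storm", example_docs, k=2))
```

In the setting the abstract describes, the same ranking would be expressed as queries against a relational or document-oriented database rather than computed in memory.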

    A Transferable and Automatic Tuning of Deep Reinforcement Learning for Cost Effective Phishing Detection

    Many challenging real-world problems require the deployment of ensembles of multiple complementary learning models to reach acceptable performance levels. While effective, applying the entire ensemble to every sample is costly and often unnecessary. Deep Reinforcement Learning (DRL) offers a cost-effective alternative, where detectors are dynamically chosen based on the output of their predecessors, with their usefulness weighted against their computational cost. Despite their potential, DRL-based solutions are not widely used in this capacity, partly due to the difficulties in configuring the reward function for each new task, the unpredictable reactions of the DRL agent to changes in the data, and the inability to use common performance metrics (e.g., TPR/FPR) to guide the algorithm's performance. In this study we propose methods for fine-tuning and calibrating DRL-based policies so that they can meet multiple performance goals. Moreover, we present a method for transferring effective security policies from one dataset to another. Finally, we demonstrate that our approach is highly robust against adversarial attacks.

    How to Rank Answers in Text Mining

    In this thesis, we mainly focus on case studies about answers. We present the CEW-DTW methodology and assess its ranking quality. We then improve CEW-DTW by combining it with Kullback-Leibler divergence, which measures the difference between the probability distributions of two sequences. However, CEW-DTW and KL-CEW-DTW do not account for the effect of noise and keywords from the viewpoint of probability distributions. Therefore, we develop a new methodology, the General Entropy, to see how the probabilities of noise and keywords affect answer quality. We first analyze some properties of the General Entropy, such as its value range. In particular, we look for an objective goal that can serve as a standard for assessing answers, and to that end we introduce the maximum general entropy. We use the general entropy methodology to find an imaginary answer with the maximum entropy from the mathematical viewpoint (though this answer may not exist). This answer can also be regarded as an “ideal” answer. Comparing maximum entropy probabilities with the global probabilities of noise and keywords, we find that the maximum entropy probability of noise is smaller than the global probability of noise, and that the maximum entropy probabilities of chosen keywords are larger than the global probabilities of keywords under some conditions. This allows us to select the maximum number of keywords deterministically. We also use an Amazon dataset and a small survey group to assess the general entropy. Although these methodologies can analyze answer quality, they do not incorporate the inner connections among keywords and noise. Based on the Markov transition matrix, we therefore develop the Jump Probability Entropy. We again use the Amazon dataset to compare maximum jump entropy probabilities with the global jump probabilities of noise and keywords.
Finally, we give the steps for obtaining answers from the Amazon dataset, including extracting the original answers, removing stop words, and removing collinearity. We compare our methodologies to see whether they are consistent. We also introduce the Wald–Wolfowitz runs test and compare it with the developed methodologies to verify their relationships. Based on the results of these comparisons, we draw conclusions about the consistency of these methodologies and outline future plans.
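The abstract relies on Kullback-Leibler divergence to compare the probability distributions of two sequences. The sketch below shows only that ingredient, D(P‖Q) = Σ p(w) log(p(w)/q(w)), over word distributions; it does not reproduce CEW-DTW or KL-CEW-DTW. The additive smoothing (`alpha`), the shared vocabulary, and the function names are assumptions introduced so the divergence stays finite on toy inputs.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Smoothed word probabilities over a shared vocabulary (alpha is assumed)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """D(P || Q) in nats; p and q map the same keys to probabilities, q > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical answers, tokenized by whitespace.
answer_a = "battery life is great and battery charges fast".split()
answer_b = "screen is great but battery life is short".split()
vocabulary = sorted(set(answer_a) | set(answer_b))

p = word_distribution(answer_a, vocabulary)
q = word_distribution(answer_b, vocabulary)
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values generally differ because KL divergence is not symmetric, so the order in which two answers are compared matters.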

    Machine Learning Morphisms: A Framework for Designing and Analyzing Machine Learning Workflows, Applied to Separability, Error Bounds, and 30-Day Hospital Readmissions

    A machine learning workflow is the sequence of tasks necessary to implement a machine learning application, including data collection, preprocessing, feature engineering, exploratory analysis, and model training/selection. In this dissertation we propose the Machine Learning Morphism (MLM) as a mathematical framework to describe the tasks in a workflow. The MLM is a tuple consisting of: Input Space, Output Space, Learning Morphism, Parameter Prior, Empirical Risk Function. This contains the information necessary to learn the parameters of the learning morphism, which represents a workflow task. In Chapter 1, we give a short review of typical tasks present in a workflow, as well as motivation for and innovations in the MLM framework. In Chapter 2, we first define data as realizations of an unknown probability space. Then, after a brief introduction to statistical learning, the MLM is formally defined. Examples of MLM's are presented, including linear regression, standardization, and the Naive Bayes Classifier. Asymptotic equality is defined between MLM's by analyzing the parameters in the limit of infinite training data. Two definitions of composition are proposed, output and structural. Output composition is a sequential optimization of MLM's, for example standardization followed by regression. Structural composition is a joint optimization inspired by backpropagation from neural nets. While structural compositions yield better overall performance, output compositions are easier to compute and interpret. In Chapter 3, we define the property of separability, where an MLM can be optimized by solving lower dimensional sub problems. A separable MLM represents a divide and conquer strategy for learning without sacrificing optimality. We show three cases of separable MLM's for mean-squared error with increasing complexity. First, if the input space consists of centered, independent random variables, OLS Linear Regression is separable. 
This is extended to linear combinations of uncorrelated ensembles, and ensembles of non-linear, uncorrelated learning morphisms. The example of principal component regression is explored thoroughly as a separable workflow, and the choice between equivalent linear regressions is discussed. These separability results apply to a wide variety of problems via asymptotic equality. Functions which can be represented as power series can be learned via polynomial regression. Further, independent and centered power series can be generated using an orthogonal extension of principal component analysis (PCA). In Chapter 4, we explore the connection between generalization error and lower bounds used in estimation. We start by defining the “Bayes MLM”, the best possible MLM for a given problem. When the loss function is mean-squared error, Cramer-Rao lower bounds exist for an MLM which depend on the bias of the MLM and the underlying probability distribution. This can be used as a design tool when selecting candidate MLM's, or as a tool for sensitivity analysis to examine the error of an MLM across a variety of parameterizations. A lower bound on the composition of MLM's is constructed by applying a nonlinear filtering framework to the composition. Examples are presented for centering, PCA, ordinary least-squares linear regression, and the composition of these MLM's. In Chapter 5 we apply the MLM framework to design a workflow that predicts 30-day hospital readmissions. Hospital readmissions occur when a patient is admitted less than 30 days after a previous hospital stay. We examine readmissions for a group of Medicare/Medicaid patients with the four most common diagnoses at Barnes Jewish Hospital. Using MLM's, we incorporate the Mapper algorithm from topological data analysis into the predictive workflow in a novel ensemble. This ensemble first performs fuzzy clustering on the training set, and then trains models independently on each cluster. 
We compare an assortment of workflows predicting readmissions, and workflows featuring Mapper outperform other standard models and current tools used for risk prediction at Barnes Jewish. Finally, we examine the separability of this workflow. Mapper workflows incorporating AdaBoost and logistic regression create node models with low correlation. When PCA is applied to each node, Random Forest node models also become decorrelated. Support Vector Machine node models are highly correlated, and do not converge when PCA is applied. This is consistent with their worse performance. In Chapter 6 we provide final comments and future work.
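The separability claim for OLS in this abstract can be illustrated with a small finite-sample analogue: when the input columns are centered and exactly orthogonal, solving the joint normal equations gives the same coefficients as fitting each feature on its own. This is only a sketch under that assumption; the toy data and helper names are hypothetical, and sample orthogonality stands in for the independence condition stated in the dissertation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(values):
    mean = sum(values) / len(values)
    return [value - mean for value in values]


def joint_ols_2d(x1, x2, y):
    """Solve the 2x2 normal equations for y ~ b1*x1 + b2*x2 with Cramer's rule."""
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return [(r1 * d - b * r2) / det, (a * r2 - b * r1) / det]


def separate_ols(columns, y):
    """Fit each feature alone: one one-dimensional problem per column."""
    return [dot(x, y) / dot(x, x) for x in columns]


# Toy design: both columns have zero mean and zero cross product.
x1 = [-1.0, 1.0, -1.0, 1.0]
x2 = [-1.0, -1.0, 1.0, 1.0]
y = center([1.0, 2.0, 4.0, 7.0])

joint = joint_ols_2d(x1, x2, y)
separate = separate_ols([x1, x2], y)
print(joint, separate)
```

With correlated columns the cross term `b` is nonzero and the two fits generally disagree, which is why the abstract ties separability to uncorrelated or independent inputs.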