    Discrimination Analysis for Predicting Defect-Prone Software Modules

    Software defect prediction studies usually build models without analyzing the data used in the procedure. As a result, the same approach performs differently on different data sets. In this paper, we introduce discrimination analysis as a method for gaining insight into the inherent properties of software data. Based on this analysis, we find that the data sets used in this field suffer from two problems: they are not linearly separable, and they are class-imbalanced. Unlike prior work, we exploit the kernel method to nonlinearly map the data into a high-dimensional feature space. To combat these two problems, we propose an algorithm based on kernel discrimination analysis, called KDC, to build a more effective prediction model. Experimental results on data sets from different organizations indicate that KDC is more accurate in terms of F-measure than the state-of-the-art methods. We are optimistic that our discrimination analysis method can guide more studies on data structure, which may derive useful knowledge from data science for building more accurate prediction models.
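The abstract above reports results in terms of F-measure on class-imbalanced data. The following minimal sketch illustrates why F-measure is a more informative score than accuracy in that setting. The data, the two hypothetical predictors, and the function names are assumptions made for illustration; none of them come from the paper, and KDC itself is not implemented here.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the positive (defect-prone) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of modules whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data set: 10 defect-prone modules (1) out of 100.
y_true = [1] * 10 + [0] * 90

# Hypothetical predictor A: labels every module as clean.
always_clean = [0] * 100

# Hypothetical predictor B: finds 6 of the 10 defects, raises 4 false alarms.
some_defects = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86

for name, pred in [("always clean", always_clean), ("some defects", some_defects)]:
    _, _, f = f_measure(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, F-measure={f:.2f}")
```

Predictor A never flags a defect yet still scores high on accuracy, because most modules are clean; its F-measure is zero. This is the kind of effect that makes F-measure a natural choice when the defect-prone class is rare.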


    The objective of this research work is to design compact and discriminative dictionaries for effective classification. The motivation stems from the fact that dictionaries inherently contain redundant dictionary atoms, because the aim of dictionary learning is reconstruction, not classification. In this thesis, we propose methods to obtain a minimum number of discriminative dictionary atoms for effective classification and reduced computational time. First, we propose a classification scheme in which an example is assigned to a class based on weights assigned to both the maximum projection and the minimum reconstruction error. Here, the dictionary is learned from the input data by K-SVD, which alternates between sparse coding and dictionary update. For sparse coding, orthogonal matching pursuit (OMP) is used, and for the dictionary update, singular value decomposition is used. Although this classification scheme is effective, there is still scope to improve dictionary learning by removing redundant atoms, because our goal is not reconstruction. To remove such redundant atoms, we propose two approaches based on information theory to obtain compact discriminative dictionaries. In the first approach, we remove redundant atoms from the dictionary while maintaining discriminative information. Specifically, we propose a constrained optimization problem which minimizes the mutual information between the optimized dictionary and the initial dictionary while maximizing the mutual information between the class labels and the optimized dictionary. This helps to determine the information loss incurred by the dictionary optimization. To compute the information loss, we use Jensen-Shannon divergence with adaptive weights to compare the class distributions of each dictionary atom. The advantage of Jensen-Shannon divergence is that it is computationally cheaper than calculating the information loss from mutual information.
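As a rough illustration of the divergence mentioned above, the sketch below computes the standard weighted Jensen-Shannon divergence between two discrete class distributions, JS(P, Q) = H(w_p P + w_q Q) - w_p H(P) - w_q H(Q). The abstract does not say how the adaptive weights are chosen, so here they are plain parameters; the example distributions and function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between discrete distributions p and q.

    w_p and w_q must be non-negative and sum to 1. How they would be adapted
    per dictionary atom is NOT specified in the abstract; this is an assumption.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if min(w_p, w_q) < 0 or abs(w_p + w_q - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


# Hypothetical class distributions of two dictionary atoms over three classes.
atom_a = [0.7, 0.2, 0.1]
atom_b = [0.1, 0.2, 0.7]

print(js_divergence(atom_a, atom_a))              # identical distributions
print(js_divergence(atom_a, atom_b))              # equal weights
print(js_divergence(atom_a, atom_b, 0.25, 0.75))  # unequal weights
```

With equal weights and base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), which makes it convenient for comparing atoms on a common scale.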


    Developments in sensing and communication technologies have led to an explosion in the availability of visual data from multiple sources and modalities. Millions of cameras capable of capturing multimodal information such as light, depth, and heat have been installed in buildings, streets, and airports around the world. These data are potentially a tremendous resource for building robust visual detectors and classifiers. However, the data are often large, mostly unlabeled, and increasingly of mixed modality. To extract useful information from these heterogeneous data, one needs to exploit the underlying physical, geometrical, or statistical structure across data modalities. For instance, in computer vision, the number of pixels in an image can be rather large, but most inference or representation models use only a few parameters to describe the appearance, geometry, and dynamics of a scene. This has motivated researchers to develop a number of techniques for finding a low-dimensional representation of a high-dimensional dataset. The dominant methodology for modeling and exploiting the low-dimensional structure in high-dimensional data is sparse dictionary-based modeling. While discriminative dictionary learning methods have demonstrated tremendous success in computer vision applications, their performance is often limited by the amount and type of labeled data available for training. In this dissertation, we extend the sparse dictionary learning framework to weakly supervised learning problems such as semi-supervised learning, ambiguously labeled learning, and Multiple Instance Learning (MIL). Furthermore, we present nonlinear extensions of these methods using the kernel trick. We also address the problem of choosing the optimal kernel for sparse representation-based classification using Multiple Kernel Learning (MKL) methods. Finally, in order to deal with heterogeneous multimodal data, we present a feature-level fusion method based on quadratic programming.
The dissertation is divided into the following four parts: 1) In the first part, we develop a discriminative non-linear dictionary learning technique which utilizes both labeled and unlabeled data for learning dictionaries. We compute a probability distribution over class labels for all the unlabeled samples, which is updated together with the dictionary and the sparse coefficients. The algorithm is also extended to ambiguously labeled data, where part of the data contains multiple labels for a training sample. 2) Using non-linear dictionaries, we present a multi-class Multiple Instance Learning (MIL) algorithm where the data is given in the form of bags. Each bag contains multiple samples, called instances, out of which at least one belongs to the class of the bag. We propose a noisy-OR model and a generalized mean-based optimization framework for learning the dictionaries in the feature space. The proposed method can be viewed as a generalized dictionary learning algorithm since it reduces to a novel discriminative dictionary learning framework when there is only one instance in each bag. 3) We propose a Multiple Kernel Learning (MKL) algorithm that is based on the Sparse Representation-based Classification (SRC) method. Taking advantage of the non-linear kernel SRC in efficiently representing the non-linearities in the high-dimensional feature space, we propose an MKL method based on the kernel alignment criterion. Our method uses a two-step training procedure to learn the kernel weights and the sparse codes. At each iteration, the sparse codes are updated first while fixing the kernel mixing coefficients, and then the kernel mixing coefficients are updated while fixing the sparse codes. These two steps are repeated until a stopping criterion is met. 4) Finally, using a linear classification model, we study the problem of fusing information from multiple modalities.
Many current recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time. We describe an algorithm that perturbs test features so that all modalities predict the same class. We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features. To efficiently solve the MIP, we provide a greedy algorithm and empirically show that its solution is very close to that of a state-of-the-art MIP solver.
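Part 2 above refers to a noisy-OR model and a generalized mean for aggregating instances within a bag. The sketch below shows the textbook forms of both aggregators applied to per-instance probabilities. It is an illustration under assumptions, not the dissertation's formulation: the instance probabilities are hypothetical numbers, and the dictionary-based model that would produce them is not shown.

```python
def noisy_or(instance_probs):
    """Standard noisy-OR: P(bag positive) = 1 - prod_i (1 - p_i)."""
    prod = 1.0
    for p in instance_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        prod *= 1.0 - p
    return 1.0 - prod


def generalized_mean(instance_probs, r):
    """Generalized (power) mean of order r != 0; approaches max(p_i) as r grows."""
    if r == 0:
        raise ValueError("r must be non-zero")
    n = len(instance_probs)
    return (sum(p ** r for p in instance_probs) / n) ** (1.0 / r)


# Hypothetical per-instance probabilities of belonging to the bag's class.
bag = [0.2, 0.8]

print(noisy_or(bag))                # the bag is positive if any instance is
print(generalized_mean(bag, r=1))   # arithmetic mean
print(generalized_mean(bag, r=50))  # close to the maximum instance probability
print(noisy_or([0.8]))              # one instance: bag score is the instance score
```

Both aggregators express "at least one instance belongs to the class of the bag" in a smooth way, and both return the instance probability itself when a bag holds a single instance, which matches the single-instance special case described in part 2.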

    Design of Non-Linear Discriminative Dictionaries for Image Classification

