26 research outputs found
On the Performance of Latent Semantic Indexing-based Information Retrieval
Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space computed via Singular Value Decomposition. The IR literature states that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons, and to the best of our knowledge the significance of the LSI model's performance has not been analyzed so far. The focus of this paper is to address this issue. We discuss the trade-offs of VSM, GVSM and LSI, empirically evaluate the differences in performance on four test document collections, and then analyze the statistical significance of these performance differences.
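The LSI projection described in the abstract can be sketched in a few lines: documents and a query are mapped into a k-dimensional latent space via truncated SVD and compared by cosine similarity. The term-document matrix, vocabulary and query below are made-up toy data, not from the paper.

```python
import numpy as np

# Toy term-document matrix (terms x documents); counts are illustrative.
# Terms: ["car", "auto", "engine", "flower"]; documents are columns.
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 3]], dtype=float)

# LSI: truncated SVD projects documents into a k-dimensional latent space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_k = (s[:k, None] * Vt[:k]).T        # document coordinates in latent space

# Fold the query into the same space: q_k = Sigma_k^{-1} U_k^T q
q = np.array([1, 0, 0, 0], dtype=float)  # query: "car"
q_k = np.diag(1 / s[:k]) @ U[:, :k].T @ q

# Cosine similarity between the query and each document in latent space.
cos = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
```

The car-related documents score high, while the document containing only the unrelated term scores near zero, illustrating how LSI ranks by latent-topic similarity rather than raw term overlap.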
Improving Accuracy of Intrusion Detection Model Using PCA and optimized SVM
Intrusion detection is essential for securing network domains and is mostly used for locating and tracing intruders. Traditional intrusion detection systems (IDS) suffer from several problems, such as low detection capability against unknown network attacks, high false-alarm rates and insufficient analysis capability. Hence, the major goal of research in this domain is to develop an intrusion detection model with improved accuracy and reduced training time. This paper proposes a hybrid intrusion detection model that integrates principal component analysis (PCA) and a support vector machine (SVM). The novelty of the paper is the optimization of the kernel parameters of the SVM classifier using an automatic parameter selection technique. This technique optimizes the punishment factor (C) and the kernel parameter gamma (γ), thereby improving the accuracy of the classifier and reducing the training and testing time. Experimental results obtained on the NSL-KDD and gureKDDcup datasets show that the proposed technique achieves higher accuracy, faster convergence and better generalization. Minimal resources are consumed because the classifier requires only a reduced feature set as input for optimum classification. A comparative analysis of hybrid models against the proposed model is also performed.
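The PCA-plus-SVM pipeline with automatic selection of C and gamma can be sketched with scikit-learn's grid search. The synthetic data stands in for a network-traffic feature matrix; the dataset shape, component count and parameter grid are illustrative assumptions, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for intrusion-detection data (the paper uses NSL-KDD
# and gureKDDcup; sizes here are made up for the sketch).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA reduces the feature set; the grid search automates selection of the
# punishment factor C and kernel parameter gamma of the RBF-kernel SVM.
pipe = Pipeline([("pca", PCA(n_components=10)),
                 ("svm", SVC(kernel="rbf"))])
grid = GridSearchCV(pipe, {"svm__C": [1, 10, 100],
                           "svm__gamma": [0.001, 0.01, 0.1]}, cv=3)
grid.fit(X_tr, y_tr)
acc = grid.score(X_te, y_te)
```

Because the SVM only sees the PCA-reduced input, both training time and resource use shrink, which is the efficiency argument the abstract makes.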
Synthetic Data for Feature Selection
Feature selection is an important and active field of research in machine
learning and data science. Our goal in this paper is to propose a collection of
synthetic datasets that can be used as a common reference point for feature
selection algorithms. Synthetic datasets allow for precise evaluation of
selected features and control of the data parameters for comprehensive
assessment. The proposed datasets are based on applications from electronics in
order to mimic real life scenarios. To illustrate the utility of the proposed
data we employ one of the datasets to test several popular feature selection
algorithms. The datasets are made publicly available on GitHub and can be used
by researchers to evaluate feature selection algorithms.
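The advantage of synthetic data that the abstract highlights, namely that the ground-truth relevant features are known exactly, can be sketched as follows. The dataset and the simple correlation-based selector are hypothetical illustrations, not the paper's datasets or algorithms.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression dataset: only features 0-2 carry signal, so the
# "ground truth" relevant set is known exactly by construction.
X = rng.normal(size=(500, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=500)

# A simple filter-style selector: rank features by absolute Pearson
# correlation with the target and keep the top k.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
selected = set(np.argsort(corr)[::-1][:3])
```

Comparing `selected` against the known set {0, 1, 2} gives the precise evaluation of selected features that real-world data cannot offer.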
Exploring Attributes with Domain Knowledge in Formal Concept Analysis
Recent literature reports growing interest in data analysis using Formal Concept Analysis (FCA), in which data is represented in the form of object–attribute relations. FCA analyzes and then visualizes the data based on a duality called the Galois connection. Attribute exploration is a knowledge acquisition process in FCA that interactively determines the implications holding between the attributes. The objective of this paper is to demonstrate attribute exploration as a way to understand the dependencies among the attributes in the data. While performing this process, we add domain experts' knowledge as background knowledge. We demonstrate the method through experiments on two real-world healthcare datasets. The results show that the knowledge acquired through the exploration process, coupled with domain expert knowledge, yields better classification accuracy.
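The Galois connection and attribute implications the abstract refers to can be sketched on a toy formal context. The healthcare-style objects and attributes below are invented for illustration and are not the paper's datasets.

```python
# Toy formal context: objects (patients) mapped to their attribute sets.
context = {
    "p1": {"fever", "cough", "flu"},
    "p2": {"fever", "flu"},
    "p3": {"cough"},
}
attrs = {"fever", "cough", "flu"}

def extent(B):
    """Objects having all attributes in B (one side of the Galois connection)."""
    return {g for g, a in context.items() if B <= a}

def intent(G):
    """Attributes shared by all objects in G (the other side)."""
    return set.intersection(*(context[g] for g in G)) if G else set(attrs)

def holds(premise, conclusion):
    """An implication premise -> conclusion holds in the context iff
    conclusion is contained in intent(extent(premise))."""
    return conclusion <= intent(extent(premise))

# Every object with "flu" also has "fever", so the implication holds;
# "fever" -> "cough" fails because p2 has fever but no cough.
flu_implies_fever = holds({"flu"}, {"fever"})
fever_implies_cough = holds({"fever"}, {"cough"})
```

Attribute exploration iterates over candidate implications like these, asking a domain expert to confirm each one or supply a counterexample object, which is where the background knowledge enters.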
A study and analysis of recommendation systems for location-based social network (LBSN) with big data
Recommender systems play an important role in our day-to-day life: a recommender system automatically suggests items that a user might be interested in. Small-scale datasets are commonly used to provide location-based recommendations, but in real time the volume of data is large. We selected the Foursquare dataset to study the need for big data in recommendation systems for location-based social networks (LBSNs). A few quality parameters, such as parallel processing and multimodal interfaces, were selected to study the need for big data in recommender systems. This paper provides a study and analysis of quality parameters of recommendation systems for LBSNs with big data.
Revisiting Fully Homomorphic Encryption Schemes
Homomorphic encryption is a sophisticated encryption technique that allows
computations on encrypted data to be done without the requirement for
decryption. This trait makes homomorphic encryption appropriate for safe
computation in sensitive data scenarios, such as cloud computing, medical data
exchange, and financial transactions. The data is encrypted using a public key
in homomorphic encryption, and the calculation is conducted on the encrypted
data using an algorithm that retains the encryption. The computed result is
then decrypted with a private key to acquire the final output. This abstract
notion protects data while allowing complicated computations to be done on the
encrypted data, resulting in a secure and efficient approach to analysing
sensitive information. This article is intended to give a clear idea about the
various fully Homomorphic Encryption Schemes present in the literature and
analyse and compare the results of each of these schemes. Further, we also
provide applications and open-source tools of homomorphic encryption schemes.
Comment: A quick summary of Fully Homomorphic Encryption Schemes along with
their background, concepts, applications and open-source libraries.
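The encrypt-with-a-public-key, compute-on-ciphertexts, decrypt-with-a-private-key flow described above can be sketched with a toy Paillier cryptosystem. Note the hedges: Paillier is only additively (partially) homomorphic, not fully homomorphic like the surveyed schemes, and the tiny primes here are for illustration only and offer no security.

```python
import math
import random

# Toy Paillier setup (illustrative; real deployments use large primes).
p, q = 101, 113
n = p * q
n2 = n * n
g = n + 1                          # standard choice of generator
lam = math.lcm(p - 1, q - 1)       # private key component

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse, private key component

def encrypt(m):
    """Encrypt m under the public key (n, g) with fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Decrypt c with the private key (lam, mu)."""
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts.
c_sum = (encrypt(5) * encrypt(7)) % n2
```

Here `decrypt(c_sum)` yields 12 even though the addition was performed entirely on encrypted values, which is the core trait the abstract describes; fully homomorphic schemes extend this to arbitrary computation.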
LATENT SEMANTIC INDEXING USING EIGENVALUE ANALYSIS FOR EFFICIENT INFORMATION RETRIEVAL
Text retrieval using Latent Semantic Indexing (LSI) with truncated Singular Value Decomposition (SVD) has been studied intensively in recent years. However, the expensive computation of the truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how matrix rank approximation can influence the effectiveness of information retrieval systems. In addition, we present an implementation of the LSI method based on eigenvalue analysis for rank approximation, without computing the truncated SVD, along with its computational details. Significant improvements in computational time, while maintaining retrieval accuracy, are observed over the tested document collections.
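The eigenvalue route can be sketched as follows: the eigenvalues of the Gram matrix AᵀA are the squared singular values of A, and its eigenvectors are the right singular vectors, so a rank-k approximation can be assembled from an eigendecomposition of the smaller documents-by-documents matrix without calling an SVD routine. The random matrix below is toy data, and this is a generic sketch of the idea, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 8))   # term-document matrix (terms x documents)
k = 3                     # target rank

# Eigendecomposition of the small Gram matrix A^T A: eigenvalues are the
# squared singular values, eigenvectors are the right singular vectors V.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1][:k]     # indices of the top-k eigenvalues
sigma = np.sqrt(evals[order])           # top-k singular values
Vk = V[:, order]
Uk = A @ Vk / sigma                     # recover the left singular vectors

A_k = (Uk * sigma) @ Vk.T               # rank-k approximation of A
```

For a corpus with far more terms than documents, AᵀA is much smaller than A, which is where the computational savings over a full truncated SVD come from.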