3,473 research outputs found
Intrusion Detection Systems Using Adaptive Regression Splines
The past few years have witnessed a growing recognition of intelligent techniques
for the construction of efficient and reliable intrusion detection systems. Due
to the increasing incidence of cyber attacks, building effective intrusion
detection systems (IDS) is essential for protecting information systems
security, and yet it remains an elusive goal and a great challenge. In this
paper, we report a performance comparison of Multivariate Adaptive
Regression Splines (MARS), neural networks and support vector machines. The
MARS procedure builds flexible regression models by fitting separate splines to
distinct intervals of the predictor variables. A brief comparison of different
neural network learning algorithms is also given
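The idea of fitting separate pieces on distinct intervals of a predictor can be illustrated with the mirrored hinge basis functions that MARS models are built from, max(0, x - t) and max(0, t - x) for a knot t. What follows is only a minimal sketch under stated assumptions: a single predictor, one fixed knot, and hand-picked coefficients. The function names and the numbers are hypothetical, and the adaptive knot search that gives MARS its name is not shown.

```python
def hinge_pair(x, knot):
    """Return the two mirrored hinge basis values for one predictor value."""
    return max(0.0, x - knot), max(0.0, knot - x)


def piecewise_linear(x, intercept, knot, right_slope, left_slope):
    """Evaluate intercept + right_slope*max(0, x-knot) + left_slope*max(0, knot-x)."""
    right, left = hinge_pair(x, knot)
    return intercept + right_slope * right + left_slope * left


# Hypothetical coefficients chosen by hand for illustration, not fitted to data.
model = dict(intercept=1.0, knot=2.0, right_slope=3.0, left_slope=0.5)

# The response has a different slope on each side of the knot at x = 2.
values = [piecewise_linear(x, **model) for x in (0.0, 2.0, 4.0)]
```

Each hinge is zero on one side of its knot, so every term only affects the interval where it is active.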
Malware Analysis on Android Using Supervised Machine Learning Techniques
In recent years, the growth of malware has prompted widespread research on malware analysis and detection for Android devices. Android, a mobile operating system with more than one billion active users and a high market impact, has inspired cyber criminals to expand their malware efforts. Android implements a distinct architecture and security controls to counter malware, such as a unique user ID (UID) for each application, system permissions, and its distribution platform, Google Play. There are numerous ways to breach these defenses, and the complexity of creating new solutions grows as cybercriminals advance their malware development skills. A community of developers and researchers has been developing alternatives aimed at improving the level of safety, and numerous machine learning algorithms have already been proposed or applied to classify or cluster malware, alongside analysis techniques, frameworks, sandboxes, and systems security. One of the most promising techniques is the implementation of artificial intelligence solutions for malware analysis. In this paper, we evaluate numerous supervised machine learning algorithms by implementing a static analysis framework to make predictions for detecting malware on Android
Improving Database Quality through Eliminating Duplicate Records
Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key to resolving the problem, identifying semantically equivalent string values in syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework to generalize the field matching problem across different domains. By introducing the concept of String Matching Points (SMP) in string comparison, string matching accuracy and efficiency are improved compared with other commonly applied field matching algorithms. The paper discusses the development of field matching algorithms from the general framework. The framework and corresponding algorithm are tested on a public data set of the NASA publication abstract database. The approach can be applied to address similar problems in other databases
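Token-based approximate field matching in general can be sketched with a simple token-set overlap. The example below uses Jaccard similarity as a stand-in and is not the String Matching Points method, which the abstract does not specify. The function names, the tokenization rule, and the 0.6 threshold are all assumptions made for illustration.

```python
import re


def tokens(field):
    """Lowercase a field value and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", field.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1]; two empty fields count as equal."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(a, b, threshold=0.6):
    """Flag two field values as probable duplicates (threshold is an assumption)."""
    return jaccard(a, b) >= threshold
```

Because the comparison works on token sets, reordered or re-punctuated values such as "Smith, J." and "J. Smith" match, whereas a plain string comparison would treat them as different.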
Quasi-phase-matched Faraday rotation in semiconductor waveguides with a magnetooptic cladding for monolithically integrated optical isolators
Strategies are developed for obtaining nonreciprocal polarization mode conversion, also known as Faraday rotation, in waveguides in a format consistent with silicon-on-insulator or III–V semiconductor photonic integrated circuits. Fabrication techniques are developed using liftoff lithography and sputtering to obtain garnet segments as upper claddings, which have an evanescent wave interaction with the guided light. A mode solver approach is used to determine the modal Stokes parameters for such structures, and design considerations indicate that quasi-phase-matched Faraday rotation for optical isolator applications could be obtained with devices on the millimeter length scale
Enhancing Machine Learning Performance with Continuous In-Session Ground Truth Scores: Pilot Study on Objective Skeletal Muscle Pain Intensity Prediction
Machine learning (ML) models trained on subjective self-report scores
struggle to objectively classify pain accurately due to the significant
variance between real-time pain experiences and scores recorded afterwards.
This study developed two devices for acquisition of real-time, continuous
in-session pain scores and gathering of ANS-modulated electrodermal activity
(EDA). The experiment recruited N = 24 subjects who underwent a post-exercise
circulatory occlusion (PECO) with stretch, inducing discomfort. Subject data
were stored in a custom pain platform, facilitating extraction of time-domain
EDA features and in-session ground truth scores. Moreover, post-experiment
visual analog scale (VAS) scores were collected from each subject. Machine
learning models, namely Multi-layer Perceptron (MLP) and Random Forest (RF),
were trained using corresponding objective EDA features combined with
in-session scores and post-session scores, respectively. Over a 10-fold
cross-validation, the macro-averaged geometric mean score revealed MLP and RF
models trained with objective EDA features and in-session scores achieved
superior performance (75.9% and 78.3%) compared to models trained with
post-session scores (70.3% and 74.6%) respectively. This pioneering study
demonstrates that using continuous in-session ground truth scores significantly
enhances ML performance in pain intensity characterization, overcoming ground
truth sparsity-related issues, data imbalance, and high variance. This study
informs future objective-based ML pain system training.
Comment: 18 pages, 2-page Appendix, 7 figures
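The abstract does not define its macro-averaged geometric mean score. One common reading, assumed here, computes a one-vs-rest geometric mean of sensitivity and specificity for each class and then averages over classes. The sketch below follows that assumed definition, with a hypothetical function name, and may differ from the metric the study actually used.

```python
from math import sqrt


def macro_gmean(y_true, y_pred):
    """Average over classes of sqrt(sensitivity * specificity), one-vs-rest.

    Assumed definition; a class with no positives or no negatives scores 0.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        scores.append(sqrt(sensitivity * specificity))
    return sum(scores) / len(scores)
```

A score of this kind penalizes a model that does well on one class only, which is why it is often preferred to plain accuracy on imbalanced data.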
Supervised Learning-Based tagSNP Selection for Genome-Wide Disease Classifications
Background: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. Results: We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. Conclusions: We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions
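The general shape of a recursive feature addition loop can be sketched as greedy forward selection: at each step, add the candidate that gives the best supervised score, and use a statistical measure to choose among equally good candidates. This is a minimal sketch and not the authors' SRFA or SVRFA implementation. The tie-break by lowest mean absolute Pearson correlation, the function names, and the user-supplied `score` callable are assumptions.

```python
def pearson(u, v):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0


def redundancy(columns, candidate, selected):
    """Mean absolute correlation between a candidate and the selected features."""
    if not selected:
        return 0.0
    total = sum(abs(pearson(columns[candidate], columns[s])) for s in selected)
    return total / len(selected)


def recursive_feature_addition(columns, score, k):
    """Greedily select up to k feature names.

    columns: dict mapping feature name -> list of values
    score:   callable taking a list of feature names and returning a number,
             e.g. cross-validated accuracy of any classifier (hypothetical hook)
    """
    selected = []
    while len(selected) < min(k, len(columns)):
        remaining = [f for f in columns if f not in selected]
        scored = {f: score(selected + [f]) for f in remaining}
        best = max(scored.values())
        tied = [f for f in remaining if scored[f] == best]
        # Assumed tie-break: prefer the least redundant candidate.
        selected.append(min(tied, key=lambda f: redundancy(columns, f, selected)))
    return selected
```

In a SNP setting, `columns` would hold genotype codes per marker and `score` would wrap whichever classifier is being evaluated.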
Smartphone Sensor-Based Activity Recognition by Using Machine Learning and Deep Learning Algorithms
Article originally published in International Journal of Machine Learning and Computing.
Smartphones are widely used today, and it has become possible to detect changes in the user's environment by using smartphone sensors. In this paper, we propose a method to identify human activities with reasonably high accuracy by using smartphone sensor data. The raw smartphone sensor data are collected from two categories of human activity: motion-based, e.g., walking and running; and phone movement-based, e.g., left-right, up-down, clockwise and counterclockwise movement. First, two types of feature extraction are designed from the raw sensor data, and activity recognition is analyzed using machine learning classification models based on these features. Second, the activity recognition performance is analyzed through the Convolutional Neural Network (CNN) model using only the raw data. Our experiments show substantial improvement in the results with the addition of features and the use of the CNN model based on smartphone sensor data, with judicious learning techniques and good feature designs
Supervised learning-based tagSNP selection for genome-wide disease classifications
The article was originally published by BMC Genomics. doi:10.1186/1471-2164-9-S1-S6
Comprehensive evaluation of common genetic variations through association of
single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale
is an active area in human genome research. One of the fundamental questions in a SNP-disease
association study is to find an optimal subset of SNPs with predicting power for disease status. To
find that subset while reducing study burden in terms of time and costs, one can potentially
reconcile information redundancy from associations between SNP markers.
Research support received from ICASA (Institute for Complex Additive
Systems Analysis, a division of New Mexico Tech) and the Radiology
Department of Brigham and Women's Hospital (BWH) is gratefully
acknowledged. The authors highly appreciate Dr. Liang at SUNY-Buffalo for
her invaluable help and insightful discussion during this study and Ms. Kim
Lawson at BWH Radiology Department for her manuscript editing and very
constructive comments.
Keywords: Supervised Recursive Feature Addition; Support Vector based Recursive Feature Addition; complex disease; genetics; disease prediction
Influence of Machine Learning vs. Ranking Algorithm on the Critical Dimension
Article originally published in International Journal of Future Computer and Communication.
The critical dimension is the minimum number of features required for a learning machine to perform with “high” accuracy, which for a specific dataset is dependent upon the learning machine and the ranking algorithm. Discovering the critical dimension, if one exists for a dataset, can help to reduce the feature size while maintaining the learning machine’s performance. It is important to understand the influence of learning machines and ranking algorithms on the critical dimension to reduce the feature size effectively. In this paper we experiment with three ranking algorithms and three learning machines on several datasets to study their combined effect on the critical dimension. Results show the ranking algorithm has greater influence on the critical dimension than the learning machine.
ICASA (Institute for Complex Additive Systems Analysis) of New Mexico Tech and the National Institute of Justice, U.S. Department of Justice (Award No. 2010-DN-BX-K223)
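The definition of the critical dimension suggests a simple search: rank the features, evaluate the learning machine on the top-k features for growing k, and report the smallest k that reaches the chosen accuracy level. The sketch below assumes exactly that procedure. The function name, the `evaluate` callable, and the numeric threshold standing in for “high” accuracy are hypothetical, and the paper's own procedure may differ.

```python
def critical_dimension(ranked_features, evaluate, threshold):
    """Return the smallest k whose top-k features reach the threshold, else None.

    ranked_features: feature names ordered best-first by some ranking algorithm
    evaluate:        callable taking a list of feature names and returning the
                     accuracy of a learning machine (hypothetical hook)
    threshold:       accuracy level treated as "high" (an assumption)
    """
    for k in range(1, len(ranked_features) + 1):
        if evaluate(ranked_features[:k]) >= threshold:
            return k
    return None  # no critical dimension exists at this threshold
```

Running the same search with different rankings and different learning machines on one dataset gives the kind of comparison the abstract describes.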
Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data
Microarray data has a high dimension of variables but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods
- …