689 research outputs found

    Approximate Disassembly using Dynamic Programming

    Most commercial anti-virus software uses signature-based techniques to detect whether a file is infected by a virus. However, signature-based detection systems are unable to detect metamorphic viruses, since such viruses change their internal structure from generation to generation. Previous work has shown that hidden Markov models (HMMs) can be used to detect metamorphic viruses. In this technique, the code is disassembled and the resulting opcode sequences are used for training and detection. Because of the disassembly step, this process is not efficient enough to use when a decision has to be made in real time. In this project, we explore whether dynamic programming can be used to speed up disassembly with minimal loss of accuracy. Dynamic programming is generally used to solve problems that have two key attributes: optimal substructure and overlapping subproblems. During each iteration, our algorithm reads part of the input stream from the executable file and determines assembly instructions, thus dividing the problem into subproblems. We have created a score matrix representing digraphs of the most common opcode instructions, and we have implemented a dynamic program based on this scoring matrix. For various file sizes, we measure the time taken by our dynamic program and show that our approach is significantly faster than a standard disassembler (OllyDbg). Finally, we analyze the accuracy of our results.
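
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a digraph score matrix could drive a dynamic program over a byte stream. The opcode table, the scores, the `<data>` fallback, and the function name `approximate_disassemble` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Hypothetical toy instruction set: opcode byte -> (mnemonic, total length in bytes).
OPCODES = {
    0x90: ("nop", 1),
    0x50: ("push", 1),
    0x58: ("pop", 1),
    0xB8: ("mov", 5),   # opcode byte plus a 4-byte immediate
    0xC3: ("ret", 1),
}

# Hypothetical digraph scores: a higher value means the opcode pair is more common.
DIGRAPH_SCORE = {
    ("push", "mov"): 3,
    ("mov", "pop"): 3,
    ("pop", "ret"): 4,
    ("nop", "nop"): 1,
}
DEFAULT_SCORE = 0    # score for a pair that is not in the matrix
SKIP_PENALTY = -2    # assumed cost of treating one byte as undecodable data


def approximate_disassemble(code: bytes) -> list[str]:
    """Return the highest-scoring sequence of mnemonics covering `code`."""
    n = len(code)
    # best[i] maps "last mnemonic" -> (best score, back pointer) over all
    # decodings of code[:i] that end with that mnemonic.
    best = [dict() for _ in range(n + 1)]
    best[0]["<start>"] = (0, None)

    for i in range(n):
        for prev, (score, _) in best[i].items():
            # Option 1: skip this byte as data.
            candidates = [("<data>", 1, SKIP_PENALTY)]
            # Option 2: decode an instruction starting at this byte, if it fits.
            entry = OPCODES.get(code[i])
            if entry is not None and i + entry[1] <= n:
                name, length = entry
                gain = DIGRAPH_SCORE.get((prev, name), DEFAULT_SCORE)
                candidates.append((name, length, gain))
            for name, length, gain in candidates:
                j = i + length
                new_score = score + gain
                if name not in best[j] or new_score > best[j][name][0]:
                    best[j][name] = (new_score, (i, prev))

    # Trace back from the best final state to the start state.
    last = max(best[n], key=lambda m: best[n][m][0])
    pos, out = n, []
    while best[pos][last][1] is not None:
        out.append(last)
        pos, last = best[pos][last][1]
    out.reverse()
    return out


# push; mov with a 4-byte immediate that happens to look like nops; pop; ret
sample = bytes([0x50, 0xB8, 0x90, 0x90, 0x90, 0x90, 0x58, 0xC3])
print(approximate_disassemble(sample))
```

Each state depends only on the byte offset and the previous mnemonic, which is what gives the problem optimal substructure: the best decoding of a prefix is reused by every longer decoding that extends it.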

    Classification of twitter trends using feature ranking and forward feature selection

    Twitter handles 500 million tweets per day and has 316 million monthly active users. The majority of tweets are written in natural language, which makes Twitter's data difficult to understand programmatically. In our research, we attempt to solve this challenge using various machine learning techniques. This thesis presents a new approach for classifying Twitter trends by adding a layer of feature selection and feature ranking. A variety of feature ranking algorithms, such as TF-IDF and bag-of-words, are used to facilitate the feature selection process. This helps surface the important features while reducing the feature space and making the classification process more efficient. Four Naïve Bayes text classifiers (one for each class), backed by these feature ranking and feature selection techniques, are used to categorize Twitter trends. Using the bag-of-words and TF-IDF rankings, our research provides an average class precision improvement over current methodologies of 33.14% and 28.67%, respectively.
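
As a rough illustration of TF-IDF feature ranking, the sketch below scores each term by its summed TF-IDF weight across a small tokenized corpus and sorts the terms from most to least distinctive. The function name `rank_terms_by_tfidf`, the toy documents, and the choice of summing per-document weights are assumptions made for this example; the thesis may rank features differently.

```python
import math
from collections import Counter


def rank_terms_by_tfidf(docs: list[list[str]]) -> list[tuple[str, float]]:
    """Rank terms by summed TF-IDF over non-empty, pre-tokenized documents."""
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])         # 0 for terms in every document
            scores[term] += (count / len(doc)) * idf  # relative term frequency * idf

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (hypothetical): "the" occurs everywhere, so it ranks last.
docs = [
    ["goal", "match", "the"],
    ["election", "vote", "the"],
    ["goal", "the", "the"],
]
ranked = rank_terms_by_tfidf(docs)
top_features = [term for term, _ in ranked[:3]]
print(top_features)
```

Keeping only the top-ranked terms shrinks the feature space before the classifiers are trained, which is the efficiency gain the abstract refers to.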

    Opinion-Mining on Marglish and Devanagari Comments of YouTube Cookery Channels Using Parametric and Non-Parametric Learning Models

    YouTube is a boon through which people can educate, entertain, and express themselves on various topics. YouTube India currently has millions of active users, so the volume of data on the platform is correspondingly large. Because India is a very diverse country, many people are multilingual and express their opinions in code-mixed form, that is, by mixing two or more languages. Sentiment analysis on code-mixed languages has become a necessity, as there is little research on Indian code-mixed language data. In this paper, sentiment analysis (SA) is carried out on Marglish (Marathi + English) as well as Devanagari Marathi comments extracted through the YouTube API from top Marathi channels. Several machine-learning models are applied to the dataset along with three different vectorizing techniques. Multilayer Perceptron (MLP) with the Count vectorizer provides the best accuracy, 62.68%, on the Marglish dataset, and Bernoulli Naïve Bayes with the Count vectorizer gives the best accuracy, 60.60%, on the Devanagari dataset. Multilayer Perceptron and Bernoulli Naïve Bayes are therefore considered the best-performing algorithms. 10-fold cross-validation and statistical testing were also carried out on the dataset to confirm the results.

    A Hybrid Approach for Depression Classification: Random Forest-ANN Ensemble on Motor Activity Signals

    Given the rising number of people suffering from mental health illnesses in today's society, the importance of mental health cannot be overstated. Wearable sensors, which are increasingly widely available, provide a potential way to track and understand mental health issues. These devices not only monitor everyday activities but also continuously record vital signs such as heart rate, potentially providing information on a person's mental state. Recent research has used these sensors in conjunction with machine learning methods to identify patterns relating to different mental health conditions, highlighting the potential of this data beyond simple activity monitoring. In this research, we present a novel algorithm called the Hybrid Random forest - Neural network that has been tailored to evaluate sensor data from depressed patients. Our method achieves an accuracy of 80% when evaluated on a dataset that included both unipolar and bipolar depressive patients as well as healthy controls. The findings highlight the algorithm's potential for reliably determining a person's depression condition using sensor data, making a substantial contribution to the area of mental health diagnostics.