2 research outputs found

    Detecting Malicious Websites Using Machine Learning

    The growing use of the internet has resulted in new websites emerging every day (Total number of Websites - Internet Live Stats, 2020). Web surfing has become important for everyone, regardless of occupation, age or location. However, as internet use increases, so does the vulnerability to malware attacks through malicious websites (Softpedia, 2016). Identifying and dealing with such malicious websites has been difficult in the past because it is challenging to separate good websites from bad ones. However, by applying machine learning algorithms to large datasets, it is now possible to detect such websites beforehand. Classifiers trained using algorithms such as logistic regression and Support Vector Machine (SVM) can be used to detect malicious websites, and users can be warned about the risk before they visit such sites. This project uses a variety of classification algorithms to determine whether a website is malicious, using the Kaggle Malicious and Benign Website Dataset. We show that it is possible to detect malicious websites with a reasonable amount of certainty (90% of the 75 malicious websites in the test set were identified) using machine learning models. We have also determined the features that were critical in predicting the likelihood of a website being malicious. Most of our key features are easily available (URL length, number of special characters, country, age of website).
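The classification idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes only two of the named features (URL length and number of special characters), trains a plain logistic regression by gradient descent using the standard library, and reports recall on the malicious class, which is the kind of figure the abstract quotes. All function names, the learning rate, and the epoch count are hypothetical choices made for illustration.

```python
import math

def url_features(url):
    """Return [URL length, number of non-alphanumeric characters] (assumed feature set)."""
    special = sum(1 for ch in url if not ch.isalnum())
    return [float(len(url)), float(special)]

def standardize(rows):
    """Scale each column to zero mean and unit variance; also return the means and stds."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (malicious) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def recall(y_true, y_pred):
    """Fraction of truly malicious samples (label 1) that were predicted as malicious."""
    positives = sum(1 for t in y_true if t == 1)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0
```

To use it, one would build feature rows with `url_features`, pass them through `standardize`, fit with `train_logistic`, and compare `predict` outputs against held-out labels with `recall`. A real study would use the full feature set and a tested library implementation rather than this hand-rolled loop.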

    Finding Outliers in Satellite Patterns by Learning Pattern Identities

    Spacecraft provide a large set of information about their on-board components, such as temperature, power and pressure. This information is constantly monitored by engineers, who capture the outliers and determine whether the situation is abnormal. However, due to the large quantity of information, only a small part of the data is processed or used to perform anomaly prediction. A commonly accepted research concept for anomaly prediction described in the literature relies on projections based on probabilities estimated from patterns learned from the past (Fujimaki et al., 2005) and on data mining methods to enhance the conventional diagnosis approach (Li et al., 2010). Most of these works conclude that a status vector needs to be built. We propose an algorithm for efficient outlier detection that builds an identity chart of the patterns from past data, based on their curve fitting information. It detects the functional units of the patterns without a priori knowledge, with the intent to learn their structure and to reconstruct the sequence of events described by the signal. In addition to statistical elements, each pattern is allotted a characteristics chart. This pattern identity enables fast pattern matching across the data. The extracted features allow classification with standard machine learning methods such as support vector machines (SVM). The algorithm has been tested and evaluated using real satellite telemetry data. The outcome and performance show promising results for faster anomaly prediction.
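The abstract does not specify the algorithm, so the following is only a rough sketch of the general idea of describing patterns by curve-fitting information and flagging the unusual ones. It assumes fixed-length windows, a straight-line fit per window as a stand-in for the "pattern identity", and a median-based (MAD) score for outliers. None of these choices, names, or thresholds come from the paper.

```python
import statistics

def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept, rms_residual) as a tiny, assumed 'identity' of the window.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    rms = (sum((v - (slope * x + intercept)) ** 2
               for x, v in enumerate(values)) / n) ** 0.5
    return slope, intercept, rms

def pattern_identities(signal, window):
    """Split the signal into consecutive non-overlapping windows and fit each one."""
    return [fit_line(signal[i:i + window])
            for i in range(0, len(signal) - window + 1, window)]

def outlier_windows(identities, threshold=3.5):
    """Return sorted indices of windows whose slope or residual is far from the median.

    Uses the modified z-score 0.6745 * |v - median| / MAD; when MAD is zero,
    any window that differs from the median at all is flagged.
    """
    flagged = set()
    for k in (0, 2):  # 0 = slope, 2 = rms residual
        vals = [ident[k] for ident in identities]
        med = statistics.median(vals)
        mad = statistics.median([abs(v - med) for v in vals])
        for idx, v in enumerate(vals):
            dev = abs(v - med)
            if mad > 0:
                score = 0.6745 * dev / mad
            else:
                score = float("inf") if dev > 1e-12 else 0.0
            if score > threshold:
                flagged.add(idx)
    return sorted(flagged)
```

For example, on a signal made of repeated ramps in which one window rises much faster than the rest, `outlier_windows(pattern_identities(signal, window))` returns the index of that window. The per-window tuples could equally be fed to a classifier such as an SVM, as the abstract suggests.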