418,683 research outputs found

    Spam Email Detection on Data Mining: A Review

    Email is an effective communication tool: it is the fastest way to send information from one place to another, and it saves both time and cost. However, email is exposed to attacks, including spam. Spam is unwanted bulk email that floods the internet with many copies of similar messages in an attempt to force them on people who would not otherwise choose to receive them. As spam on the internet has grown, interest in spam filtering has grown accordingly. In this paper we review various spam detection techniques. We apply the techniques both with and without a feature selection algorithm, and we apply all the classifiers of each data mining tool. In this study we analyze the classifier algorithms using two data mining tools, WEKA and TANAGRA. Data mining is the discovery of knowledge from large databases; it is the technique of finding new patterns in huge data sets. Both data mining tools use different classification algorithms, such as K-Nearest Neighbor (K-NN), Naïve Bayes (NB) and others. Finally, the best classifier for email spam is identified based on the accuracy of the algorithm in each data mining tool. Keywords: Classifier, Feature selection, Spam E-mail. DOI: 10.7176/JIEA/9-2-01 Publication date: April 30th 201
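
    As a rough illustration of one classifier family named in the abstract above, the following is a minimal Naïve Bayes spam filter in plain Python. It is a sketch under stated assumptions, not the paper's WEKA or TANAGRA setup: the toy messages, the whitespace tokenizer, and the function names `train` and `predict` are all hypothetical, and no feature selection is applied.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokens are a good enough feature set here.
    return text.lower().split()

def train(messages):
    """Count words per class. `messages` is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (both classes must be seen in training)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(TRAIN)
print(predict(model, "claim your free money"))
print(predict(model, "team meeting monday"))
```

    Comparing classifiers by accuracy, as the abstract describes, would mean running such a model and alternatives like K-NN on the same held-out messages and counting correct predictions.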

    Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

    This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach that uses unsupervised learning as the basis for solving this problem. The SENC problem can be decomposed into three subproblems: detecting emerging new classes, classifying known classes, and updating models to enable classification of instances of the new class and detection of more emerging new classes. The proposed method employs completely random trees, which have been shown in the literature to work well in unsupervised learning and supervised learning independently. This is the first time, as far as we know, that completely random trees are used as a single common core to solve all three subproblems: unsupervised learning, supervised learning and model update in data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods.
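
    To make the first subproblem concrete, here is a small sketch of how completely random trees can flag instances that look unlike the known classes. It is an isolation-style path-length score written under our own assumptions, not the authors' method: the functions `build_tree`, `path_length` and `mean_path_length`, the tree depth limit, the subsample size and the synthetic data are all hypothetical.

```python
import random

def build_tree(points, rng, depth=0, max_depth=8):
    """Grow one completely random tree: both the split attribute and the cut are random."""
    if len(points) <= 1 or depth >= max_depth:
        return None  # leaf
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None  # cannot split on a constant attribute
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return (dim, cut,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def path_length(tree, x):
    """Number of splits x passes through before reaching a leaf."""
    depth = 0
    while tree is not None:
        dim, cut, left, right = tree
        tree = left if x[dim] < cut else right
        depth += 1
    return depth

def mean_path_length(forest, x):
    # Short average paths suggest x is easy to isolate, i.e. unlike the known data.
    return sum(path_length(t, x) for t in forest) / len(forest)

rng = random.Random(0)
# Hypothetical "known class" data: a 2-D Gaussian blob around the origin.
known = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)]
forest = [build_tree(rng.sample(known, 64), rng) for _ in range(100)]

print(mean_path_length(forest, (0.0, 0.0)))  # typical known instance
print(mean_path_length(forest, (8.0, 8.0)))  # candidate emerging-class instance
```

    A stream learner built on this idea would still need a threshold on the score, a classifier for known classes, and a model update step, none of which this sketch covers.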

    Sentiment Analysis over Online Product Reviews: A Survey

    Before the internet, people buying a product would ask family and friends for their opinions on it. Nowadays, with the rapid growth in internet usage, more users are motivated to write about their feelings on particular items in the form of comments on sites such as Facebook, Twitter, online shopping sites, blogs, etc. These comments express the sentiments of the users, which may be positive, negative or neutral. Various techniques are used for summarizing customer comments, such as data mining, text classification, information retrieval, and text summarization. People tend to write reviews of a product across different sites. It is difficult to draw a conclusion from most of the reviews, which limits the usefulness of the information. If anyone wants to know the impact of a particular post or product, it becomes difficult to read all the comments and classify them. Sentiment analysis, also referred to as opinion mining, is an ongoing research field within data mining. This field mainly deals with classifying the sentiments in the different types of comments written by various users. This paper discusses different techniques, challenges and applications related to sentiment analysis.

    INVESTORS BEHAVIOR IN INDONESIA

    This study aims to describe investor behavior in stocks, mutual funds, and bank deposits. The psychological elements used in this research are mental accounting, representativeness, familiarity, considering the past, overconfidence, data mining, social interaction, fear and greed, status quo, and emotion. This research uses primary data collected with the help of a questionnaire. The total number of respondents is 110. Data were collected by distributing the questionnaire manually and online with the help of Google Docs. The results show that most of the respondents gave a positive response to all of the elements. The element with the highest mean value is familiarity, which means that the respondents think that before they invest in something, they first need to know about that investment.

    Analysis of Dynamic Mode Decomposition

    In this master's thesis, a study was conducted on a method known as dynamic mode decomposition (DMD), an equation-free technique that does not require knowledge of the underlying governing equations of complex data. Massive datasets from various sources, such as experiments, simulations, and historical records, have led to an increasing demand for efficient data mining and analysis techniques. The main goals of data mining are description and prediction. Description involves finding patterns in the data, and prediction involves predicting the system dynamics. An important aspect of analyzing an algorithm is testing. In this work, DMD, a data-based technique, is applied to different test cases to find the underlying patterns, predict the system dynamics, and reconstruct the original data. Using real data to analyze a new algorithm may not be appropriate because the algorithm's performance in various cases is not yet known. Testing is therefore done on synthetic data for all the cases discussed in this work, as it is useful for visualization and for assessing the robustness of the new algorithm. Finally, this work attempts to better understand DMD's performance and limitations for future applications with real data.
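
    The core idea behind DMD can be sketched on synthetic data, in the spirit of the testing approach described above. The example below is a simplification under stated assumptions: it fits the least-squares linear operator A with x[k+1] ≈ A x[k] for a two-dimensional state and reads off its eigenvalues, skipping the SVD-based rank truncation that DMD normally uses on high-dimensional data. The system matrix `A_TRUE`, the initial state and all function names are hypothetical.

```python
import cmath

def fit_linear_operator(snapshots):
    """Least-squares A = (Y X^T)(X X^T)^-1 for 2-D snapshots, where Y is X shifted by one step."""
    X, Y = snapshots[:-1], snapshots[1:]
    pairs = list(zip(X, Y))
    # G = X X^T (symmetric 2x2)
    g11 = sum(x[0] * x[0] for x in X)
    g12 = sum(x[0] * x[1] for x in X)
    g22 = sum(x[1] * x[1] for x in X)
    # C = Y X^T
    c11 = sum(y[0] * x[0] for x, y in pairs)
    c12 = sum(y[0] * x[1] for x, y in pairs)
    c21 = sum(y[1] * x[0] for x, y in pairs)
    c22 = sum(y[1] * x[1] for x, y in pairs)
    det = g11 * g22 - g12 * g12
    # A = C G^-1, with G^-1 = [[g22, -g12], [-g12, g11]] / det
    return [[(c11 * g22 - c12 * g12) / det, (c12 * g11 - c11 * g12) / det],
            [(c21 * g22 - c22 * g12) / det, (c22 * g11 - c21 * g12) / det]]

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def simulate(A, x0, steps):
    """Generate noise-free synthetic snapshots x[k+1] = A x[k]."""
    states = [x0]
    for _ in range(steps):
        x, y = states[-1]
        states.append((A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y))
    return states

# Hypothetical damped rotation; its eigenvalues are 0.9 +/- 0.2i.
A_TRUE = [[0.9, 0.2], [-0.2, 0.9]]
snapshots = simulate(A_TRUE, (1.0, 0.0), 20)
A_fit = fit_linear_operator(snapshots)
lam_plus, lam_minus = eigenvalues_2x2(A_fit)
print(A_fit, lam_plus, lam_minus)
```

    The eigenvalue magnitudes indicate growth or decay and their angles indicate oscillation frequency, which is what makes the fitted operator usable for prediction and reconstruction.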

    Precision Agriculture System

    The purpose of this project is agricultural land suitability evaluation for crop production. Based on the weather conditions and the type of soil, the system predicts whether a crop is suitable for a farmer's land. Normally, when soil testing is performed, farmers only learn about the land properties and which fertilizers they should use to increase crop production. They do not learn whether the crop they are going to grow is suitable for their land. Our system tells them the suitability level of their land with respect to environmental factors and crop type. It is data mining software that analyzes data consisting of land properties and environmental properties for crops. After a data mining algorithm is applied to that data, the user gets the land suitability level for the crop the farmer wants to grow. The project has a user-friendly interface, so users can use it easily, and it is cost efficient.

    Real-time tracking and mining of users’ actions over social media

    © 2020, ComSIS Consortium. All rights reserved. With the advent of Web 2.0 technologies and social media, companies are actively looking for ways to know and understand what users think and say about their products and services. Indeed, it has become common practice for users to go online using social media like Facebook to raise concerns, make comments, and share recommendations. All these actions can be tracked in real time and then mined using advanced techniques like data analytics and sentiment analysis. This paper discusses such tracking and mining through a system called Social Miner that allows companies to make decisions about what, when, and how to respond to users’ actions over social media. Questions that Social Miner helps answer include what actions were frequently executed and why certain actions were executed more than others.

    Bridging the Gap Between the Least and the Most Influential Twitter Users

    Social networks play an increasingly important role in shaping the behaviour of users of the Web. Twitter arguably stands out from the others, not only for the platform's simplicity but also for the great influence that the messages sent over the network can have. The impact of such messages determines the influence of a Twitter user and is what tools such as Klout, PeerIndex or TwitterGrader aim to calculate. Reducing all the factors that make a person influential into a single number is not an easy task, and the effort involved could become useless if Twitter users do not know how to improve that number. In this paper we identify what specific actions a Twitterer should carry out to increase their influence in each of the above-mentioned tools. For this purpose, we apply data mining techniques based on classification and regression algorithms to the information collected from a set of Twitter users. This work has been partially funded by the European Commission Project “SiSOB: An Observatorium for Science in Society based in Social Models” (http://sisob.lcc.uma.es) (Contract no.: FP7 266588), “Sistemas Inalámbricos de Gestión de Información Crítica” (with code number TIN2011-23795 and granted by the MEC, Spain) and “3DTUTOR: Sistema Interoperable de Asistencia y Tutoría Virtual e Inteligente 3D” (with code number IPT-2011-0889-900000 and granted by the MINECO, Spain).

    Comparative analysis of efficiency in the economic sectors of Lima stock exchange

    The Lima Stock Exchange is considered one of the smallest capital markets in Latin America, despite its favorable growth in the last five years. The performance of listed companies in the stock market may be compromised by national macroeconomic gaps and may affect the development of the economic sector. This study therefore aimed to estimate the financial efficiency of companies listed on the Lima Stock Exchange in order to assess their performance by economic sector during the period 2015-2020. The non-parametric technique of Data Envelopment Analysis was applied to a set of 76 companies belonging to the Agrarian, Industrial, Public Services, and Mining sectors; the change in performance was then estimated through the Malmquist Productivity Index. The results indicated that 2016 was the most efficient year for companies and 2018 the least efficient. The most efficient sector was Mining, with an efficiency of 0.56; the Agrarian sector was the least efficient and had the highest volatility. Likewise, the productivity results showed that technological change did not contribute to productivity, while efficiency change contributed positively in all sectors. In addition, the Mining sector showed a trend of annual growth and stability: in the face of the economic crisis, its productivity fell only slightly, by 1.7%, unlike the other sectors, which were notably affected. The results of this study suggest that the country's macroeconomic indicators often do not affect the performance of the economic sector; to understand the performance of the companies, it is necessary to analyze the characteristic factors of each sector. It is recommended to use the results of this study as a complementary instrument for making investment decisions in companies listed on the Lima Stock Exchange.
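
    The distinction the abstract draws between efficiency change and technological change corresponds to the usual decomposition of the Malmquist Productivity Index. The form below is the standard textbook one, given here as an assumption since the abstract does not state the orientation or notation the study used; D^t denotes the distance function measured against the period-t frontier, and (x^t, y^t) the inputs and outputs of a company in period t.

```latex
M\left(x^{t+1}, y^{t+1}, x^{t}, y^{t}\right)
  = \underbrace{\frac{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}{D^{t}\left(x^{t}, y^{t}\right)}}_{\text{efficiency change}}
    \times
    \underbrace{\left[
      \frac{D^{t}\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}
      \cdot
      \frac{D^{t}\left(x^{t}, y^{t}\right)}{D^{t+1}\left(x^{t}, y^{t}\right)}
    \right]^{1/2}}_{\text{technological change}}
```

    Under this convention a value above 1 for a factor indicates improvement between periods t and t+1, and a value below 1 indicates decline.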