2,967 research outputs found

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but often require sophisticated fine-tuning and a detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques that add self-learning capabilities to state-of-the-art scalable approaches, together with parameter auto-selection to relieve the network expert of parameter tuning. We combine unsupervised and supervised approaches to mine data in a scalable way. SeLINA embeds mechanisms to check whether new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA's current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses

    Data Mining Techniques in Telecommunication Company

    Because of the large volume of data emerging from a variety of sources, data mining has become a major trend in computer science. Data mining extracts useful patterns and information from large amounts of existing data with the help of machine learning algorithms, which can help solve many sophisticated problems. Telecommunication companies also generate large amounts of data while providing services to their customers, and they suffer from problems such as fraud and customer churn. The data these companies generate can help them address such problems, customer churn among them. Customer churn is the event in which a customer stops using the service of one company and starts using the service of another. Churn is critical to the sustainable business development of a telecommunication company, since attracting new customers does not profit a company unless it also retains the old ones. Data mining can address the problem by predicting customer churn in a telecom company, which lets the company act proactively and gives it the chance to retain customers before the churn occurs. In this study, I chose two open telecom churn data sets and applied Support Vector Machine, Logistic Regression, and Decision Tree machine learning algorithms to each data set independently, which brings my work to six experiments. I used k-fold cross-validation as the validation technique and a confusion matrix to calculate the accuracy of each algorithm; the experiments give the accuracy of each algorithm in churn prediction for each data set. At the end, a general comparison table covering all six experiments summarizes the algorithms' performance and indicates which algorithm outperforms the others
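
    The validation procedure described in this abstract (k-fold cross-validation, with accuracy read off a confusion matrix) can be sketched roughly as follows. This is a minimal illustration, not the study's code: the single numeric feature, the synthetic data, and the `fit_threshold` classifier are hypothetical stand-ins for the real churn data sets and for the SVM, logistic regression, and decision tree models.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'churned'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(cm):
    """Accuracy = (tp + tn) / total, computed from a confusion matrix."""
    tp, fp, fn, tn = cm
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: pick the cut-off on one feature that
    maximizes training accuracy, predicting churn when x >= cut-off."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        preds = [1 if x >= t else 0 for x in xs]
        acc = accuracy(confusion_matrix(ys, preds))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, average the accuracies."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        t = fit_threshold([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        preds = [1 if xs[j] >= t else 0 for j in test_idx]
        cm = confusion_matrix([ys[j] for j in test_idx], preds)
        scores.append(accuracy(cm))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic data (assumption): the feature is a number of support calls,
    # and customers with 10 or more calls are labeled as churned.
    calls = list(range(20))
    churned = [1 if c >= 10 else 0 for c in calls]
    print("mean 5-fold accuracy:", cross_validate(calls, churned, k=5))
```

    Repeating `cross_validate` once per model and per data set would give one accuracy figure for each of the six experiments, which is the shape of the comparison table the abstract mentions.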

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Text mining, also known as Intelligent Text Analysis, is an important research area. The high dimensionality of text data makes it very difficult to focus on the most appropriate information. Feature extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from such large amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, as well as the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with collections of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace the duplicate data in order to deliver fine-grained knowledge to the user

    Intelligent Management and Efficient Operation of Big Data

    This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous, and very often unstructured, large data sources; the enhancement of the performance of the processing and networking (cloud) infrastructures that are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies that support the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    Machine Learning Based Crop Prediction on Region Wise Weather Data

    Agriculture is a primordial occupation of human civilization, in which farmers cultivate domesticated species for food. It refers to farming in general, an art and science that attempts to reform a component of the Earth's exterior through the cultivation of plants and other crops, as well as the raising of livestock for sustenance, other necessities, and economic gain. Because sustainable agriculture plays a vital role in the overall health of the nation, this sector of the economy has been the incubator for some of the most cutting-edge technological advances in recent history. Scientists and farmers have been working together to discover new methods that increase crop production while decreasing water consumption and lessening negative effects on the environment. Machine learning, deep learning, and a number of other methodologies are examples of these approaches. A crop's growth and maturation are both heavily influenced by the climate in which it is grown. The local climate, namely its wind speed, temperature, rainfall, and humidity, is the most critical factor in determining the success or failure of crop production. If the weather is predicted prior to crop cultivation, the farmer benefits. Machine learning is a technique in which a machine learns from experience and from different types of data, and it can help solve people's real-life problems. Nowadays, agriculture is one of the fields in which different types of machine learning algorithms are used to predict crop production from climate data, which can help farmers increase the production of their crops. In this study, we predict crop yield using LSTM based on predicted weather data