247 research outputs found

    Advances in Data Mining Knowledge Discovery and Applications

    Get PDF
    Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. Although similar themes recur across chapters, the same approaches and techniques can prove useful in different fields and areas of expertise. The book presents knowledge discovery and data mining applications in two sections. Data mining draws on statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and related areas, and most of these are covered here through different data mining applications. The eighteen chapters are organized in two parts: Knowledge Discovery and Data Mining Applications.

    Machine Learning

    Get PDF
    Machine Learning can be defined in various ways; broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools that allow the building of systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.

    Surveying port scans and their detection methodologies

    Get PDF
    Scanning of ports on a computer occurs frequently on the Internet. An attacker performs port scans of IP addresses to find vulnerable hosts to compromise, while system administrators and other network defenders find it useful to detect port scans as possible preliminaries to more serious attacks. Recognizing instances of malicious port scanning is a very difficult task, since in general a given scan may originate either from attackers or from network defenders. In this survey, we present research and development trends in this area. Our presentation includes a discussion of common port scan attacks and a comparison of port scan detection methods based on scan type, mode of detection, mechanism used for detection, and other characteristics. The survey also reports on the available datasets and evaluation criteria for port scan detection approaches.
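The detection mechanisms surveyed here range from simple threshold rules to learning-based approaches. As a point of reference only, the following is a minimal sketch of a classic threshold-style detector that flags a source contacting many distinct destination ports within a time window; the flow-record format, threshold, and window length are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of a threshold-based port scan detector: a source IP is
# flagged if it contacts more than PORT_THRESHOLD distinct (host, port)
# targets within a sliding time window. Records and thresholds are assumed.
from collections import defaultdict, deque

PORT_THRESHOLD = 20      # distinct targets before a source is flagged (assumed)
WINDOW_SECONDS = 60      # sliding window length in seconds (assumed)

def detect_scanners(flows):
    """flows: iterable of (timestamp, src_ip, dst_ip, dst_port) tuples,
    assumed sorted by timestamp."""
    recent = defaultdict(deque)   # src_ip -> deque of (timestamp, (dst_ip, dst_port))
    flagged = set()
    for ts, src, dst, dport in flows:
        q = recent[src]
        q.append((ts, (dst, dport)))
        # Drop observations that fell out of the time window.
        while q and ts - q[0][0] > WINDOW_SECONDS:
            q.popleft()
        distinct_targets = {target for _, target in q}
        if len(distinct_targets) > PORT_THRESHOLD:
            flagged.add(src)
    return flagged

if __name__ == "__main__":
    # One source sweeping 30 ports on the same host within the window.
    demo = [(t, "10.0.0.5", "10.0.0.9", 1000 + t) for t in range(30)]
    print(detect_scanners(demo))   # flags '10.0.0.5' after 20+ distinct ports
```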

    Novel applications of Machine Learning to Network Traffic Analysis and Prediction

    Get PDF
    It is now clear that machine learning will be widely used in future telecommunication networks, as it is increasingly used in today's networks. However, despite its growing adoption and enormous potential, there are still many areas in which newly developed machine learning techniques are not yet fully exploited. The aim of this thesis is to present the application of innovative machine learning (ML) techniques in the field of telecommunications, and specifically to problems related to network traffic analysis and prediction (NTAP). The applications of NTAP are very broad, so this thesis focuses on the following five specific areas:
    - Prediction of connectivity of wireless devices.
    - Security intrusion detection, using network traffic information.
    - Classification of network traffic, using the headers of the transmitted network packets.
    - Estimation of the quality of experience (QoE) perceived by the user when viewing multimedia streaming, using aggregate information from the network packets.
    - Generation of synthetic traffic associated with security attacks, and use of that synthetic traffic to improve security intrusion detection algorithms.
    The final intention is to create prediction and analysis models that improve the NTAP areas mentioned above. With this objective, this thesis provides advances in the application of machine learning techniques to NTAP. These advances consist of:
    - Development of new machine learning models and architectures for NTAP.
    - Definition of new ways to structure and transform training data so that existing machine learning models can be applied to specific NTAP problems.
    - Definition of algorithms for the creation of synthetic network traffic associated with specific events in the operation of the network (for example, specific types of intrusions), ensuring that the new synthetic data can be used as new training data.
    - Extension and application of classic machine learning models to NTAP, obtaining improvements in classification or regression metrics and/or in the performance of the algorithms (e.g. training time, prediction time, memory needs).
    Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática. Doctorado en Tecnologías de la Información y las Telecomunicaciones.
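One of the NTAP areas listed above is the classification of network traffic from packet headers. The sketch below illustrates that general idea with a classic classifier over assumed per-flow header features and synthetic data; it is not the thesis's actual models or datasets, only a baseline of the kind such work builds on.

```python
# Minimal sketch of flow classification from header-derived features using a
# classic ML model. Feature names and synthetic data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 1000
# Assumed per-flow features: mean packet size, mean inter-arrival time,
# destination port, and flow duration.
X = np.column_stack([
    rng.normal(600, 150, n),            # mean packet size (bytes)
    rng.exponential(0.05, n),           # mean inter-arrival time (s)
    rng.choice([80, 443, 53, 22], n),   # destination port
    rng.exponential(5.0, n),            # flow duration (s)
])
y = (X[:, 2] == 443).astype(int)  # toy label: "encrypted web" vs "other"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```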

    Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks

    Get PDF
    According to the report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of Internet transactions in North America to 7 percent in Africa, most of it involving credit card fraud, identity theft, and account takeover or hacking attempts. This kind of crime keeps growing due to the advantages offered by a non face-to-face channel in which an increasing number of unsuspecting victims divulge sensitive information. Interpol classifies these illegal activities into three types:
    • Attacks against computer hardware and software.
    • Financial crimes and corruption.
    • Abuse, in the form of grooming or “sexploitation”.
    Most research efforts have focused on the target of the crime, developing different strategies depending on the case. Thus, for the well-known phishing attacks, stored blacklists or crime signals in the text are employed, eventually yielding ad hoc detectors that are hard to carry over to other scenarios even when the underlying problem is widely shared. Identity theft or masquerading can be described as a criminal activity oriented towards the misuse of stolen credentials to obtain goods or services by deception. On March 4, 2005, a million items of personal and sensitive information, such as credit card and social security numbers, were collected by White Hat hackers at Seattle University who surfed the Web for less than 60 minutes using the Google search engine. They thereby demonstrated the vulnerability and lack of protection exposed by a mere handful of sophisticated search terms typed into an engine whose large data warehouse still revealed temporarily cached data from company or government websites. As noted above, platforms that connect distant people, in which the interaction is undirected, offer an entry point for unauthorized third parties who impersonate the legitimate user in an attempt to go unnoticed with some malicious, not necessarily economic, interest. In fact, the last point in the list above regarding abuse has become a major and terrible risk, along with bullying; both operate through threats, harassment or even self-incrimination, and are likely to drive someone to suicide, depression or helplessness. California Penal Code Section 528.5 states: “Notwithstanding any other provision of law, any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense punishable pursuant to subdivision [...]”. Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts as his or her assumed character with intent to obtain a pecuniary benefit or cause some harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template of the attributes relevant to the field at hand, for specific purposes. User profiling is often employed as a mechanism for recommending items or useful information that the client has not yet considered. Nevertheless, derived user tendencies or preferences can also be exploited to characterize a user's inherent behavior and address the problem of impersonation by detecting outliers or strange deviations likely to signal a potential attack.
This dissertation elaborates on impersonation attacks from a profiling perspective, eventually developing a two-stage environment that embraces two levels of privacy intrusion and provides the following contributions:
• The inference of behavioral patterns from connection time traces, aiming at avoiding the usurpation of more confidential information. Compared to previous approaches, this procedure abstains from impinging on user privacy by taking over message content, since it relies only on time statistics of the user sessions rather than on their content.
• The application and subsequent discussion of two selected algorithms for the previous problem:
  – A commonly employed supervised algorithm executed as a binary classifier, which subsequently forced us to devise a method to deal with the absence of labeled instances representing an identity theft.
  – A meta-heuristic algorithm that searches for the most convenient parameters to arrange the instances within a high-dimensional space into properly delimited clusters, so that an unsupervised clustering algorithm can finally be applied.
• The analysis of message content, encroaching on more private information but easing user identification by mining discriminative features with Natural Language Processing (NLP) techniques, and consequently the development of a new feature extraction algorithm based on linguistic theories, motivated by the massive quantity of features typically gathered when working with text.
In summary, this dissertation goes beyond the typical ad hoc approaches adopted by previous identity theft and authorship attribution research. Rather than tailored solutions to this particular and extensively studied paradigm, it introduces a generic approach from a profiling view, not tightly bound to a unique application field. In addition, technical contributions have been made in the course of formulating the solution, optimizing familiar methods for better versatility towards the problem at hand. Overall, this thesis establishes an encouraging research basis towards unveiling subtle impersonation attacks in social networks by means of intelligent learning techniques.
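The first contribution above builds behavioral profiles from session-time statistics alone, without touching message content. The sketch below illustrates that general idea with a One-Class SVM used as a stand-in outlier detector; the feature choice (login hour, session duration), the synthetic data, and the algorithm itself are illustrative assumptions, not the dissertation's actual supervised and meta-heuristic methods.

```python
# Minimal sketch of flagging masquerading from connection-time statistics
# only. A One-Class SVM stands in for the dissertation's methods; the
# per-session features and data are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Assumed per-session features for the legitimate user: login hour and
# session duration in minutes.
legit = np.column_stack([rng.normal(20, 1.5, 300),   # usually logs in around 8 pm
                         rng.normal(45, 10, 300)])   # sessions of roughly 45 minutes
scaler = StandardScaler().fit(legit)
model = OneClassSVM(nu=0.05, gamma="scale").fit(scaler.transform(legit))

# New sessions: one typical, one at an unusual hour with an unusual duration.
new_sessions = np.array([[20.5, 50.0], [4.0, 5.0]])
print(model.predict(scaler.transform(new_sessions)))  # expected: [ 1 -1 ], second flagged
```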

    Detection of threat propagation and data exfiltration in enterprise networks (Deteção de propagação de ameaças e exfiltração de dados em redes empresariais)

    Get PDF
    Modern corporations nowadays face multiple threats within their networks. In an era where companies are tightly dependent on information, these threats can seriously compromise the safety and integrity of sensitive data. Unauthorized access and illicit programs constitute a way of penetrating corporate networks, capable of traversing and propagating to other terminals across the private network in search of confidential data and business secrets. The efficiency of traditional security defenses is being questioned given the number of data breaches occurring nowadays, making it essential to develop new active monitoring systems with artificial intelligence capable of achieving almost perfect detection in very short time frames. However, network monitoring and the storage of network activity records are restricted and limited by legislation and by privacy strategies, such as encryption, aiming to protect the confidentiality of private parties. This dissertation proposes methodologies to infer behavior patterns and disclose anomalies from network traffic analysis, detecting slight variations compared with the normal profile. Bounded by OSI layers 1 to 4, raw data are modeled into features representing network observations and subsequently processed by machine learning algorithms to classify network activity. Assuming the inevitability of a network terminal being compromised, this work comprises two scenarios: a self-propagating attack that spreads over the internal network, and a data exfiltration attempt that dispatches confidential information to the public network. Although the features and modeling processes were tested for these two cases, the operation is generic and can be used in more complex scenarios as well as in different domains. The last chapter describes the proof-of-concept scenario and how the data were generated, along with some evaluation metrics to gauge the models' performance. The tests showed promising results, ranging from 96% to 99% for the propagation case and from 86% to 97% for data exfiltration. Mestrado em Engenharia de Computadores e Telemática.
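As a rough illustration of the pipeline described above (per-host features derived from layer 2-4 observations, then a machine learning classifier separating benign activity, propagation, and exfiltration), the sketch below uses an assumed three-feature representation and synthetic windows; neither the features nor the classifier is taken from the dissertation.

```python
# Minimal sketch: model per-host observations over a time window as features,
# then classify the window as benign, self-propagation, or exfiltration.
# Features, labels, and data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)

def make_windows(n, kind):
    # Assumed features: distinct internal peers contacted, upload/download
    # byte ratio, and new-connection rate per minute.
    if kind == "benign":
        f = [rng.poisson(3, n), rng.normal(0.2, 0.05, n), rng.normal(5, 2, n)]
    elif kind == "propagation":
        f = [rng.poisson(40, n), rng.normal(0.3, 0.05, n), rng.normal(60, 10, n)]
    else:  # exfiltration
        f = [rng.poisson(2, n), rng.normal(5.0, 1.0, n), rng.normal(4, 2, n)]
    return np.column_stack(f)

X = np.vstack([make_windows(200, k) for k in ("benign", "propagation", "exfiltration")])
y = np.repeat(["benign", "propagation", "exfiltration"], 200)

clf = GradientBoostingClassifier().fit(X, y)
# Two unseen windows: a fan-out burst and a heavy-upload host.
print(clf.predict([[45, 0.3, 55], [2, 6.0, 3]]))  # expected: propagation, exfiltration
```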

    Recognizing Teamwork Activity In Observations Of Embodied Agents

    Get PDF
    This thesis presents contributions to the theory and practice of team activity recognition. A particular focus of our work is to improve our ability to collect and label representative samples, thus making team activity recognition more efficient. A second focus is improving the robustness of the recognition process in the presence of noisy and distorted data. The main contributions of this thesis are as follows. We developed a software tool, the Teamwork Scenario Editor (TSE), for the acquisition, segmentation and labeling of teamwork data. Using the TSE, we acquired a corpus of labeled team actions from both synthetic and real-world sources. We developed an approach through which representations of idealized team actions can be acquired in the form of Hidden Markov Models, trained using a small set of representative examples segmented and labeled with the TSE. We developed a set of team-oriented feature functions, which extract discrete features from the high-dimensional continuous data; the features were chosen such that they mimic the features used by humans when recognizing teamwork actions. We developed a technique to recognize the likely roles played by agents in teams even before the team action is recognized. Through experimental studies we show that the feature functions and the role recognition module significantly increase recognition accuracy, while allowing arbitrarily shuffled inputs and noisy data.
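Since the idealized team actions are represented as Hidden Markov Models, a recognizer can score an observed sequence of discrete feature values against each action model and pick the best match. The sketch below shows the forward algorithm for that scoring step with toy parameters; the two-state model and symbol set are illustrative assumptions, not models trained from the TSE corpus.

```python
# Minimal sketch of scoring an observation sequence against a discrete HMM
# with the forward algorithm, as one might do to recognize a team action from
# a sequence of discrete feature-function outputs. Parameters are toy values.
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    """obs: sequence of observation symbol indices.
    start: (S,) initial state probabilities, trans: (S, S), emit: (S, O)."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]   # classic forward recursion
    return np.log(alpha.sum())

# Toy two-state model of an "approach then engage" action over 3 symbols.
start = np.array([0.9, 0.1])
trans = np.array([[0.7, 0.3],
                  [0.1, 0.9]])
emit  = np.array([[0.80, 0.15, 0.05],   # state 0 mostly emits symbol 0
                  [0.05, 0.15, 0.80]])  # state 1 mostly emits symbol 2

typical  = [0, 0, 1, 2, 2, 2]
atypical = [2, 2, 0, 0, 2, 0]
print(forward_log_likelihood(typical, start, trans, emit))   # higher log-likelihood
print(forward_log_likelihood(atypical, start, trans, emit))  # lower log-likelihood
```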

    Big data-driven multimodal traffic management : trends and challenges

    Get PDF

    Combining Text Classification and Fact Checking to Detect Fake News

    Get PDF
    Due to the widespread use of fake news in social and news media, fake news detection is an emerging research topic gaining attention in today's world. In news media and social media, information spreads at high speed but without accuracy, so detection mechanisms should be able to assess news quickly enough to combat the spread of fake news, which has the potential for a negative impact on individuals and society. Detecting fake news is therefore important, and also a technically challenging problem. The challenge is to use text classification to combat fake news: this includes determining appropriate text classification methods and evaluating how good these methods are at distinguishing between fake and non-fake news. Machine learning is helpful for building artificial intelligence systems based on tacit knowledge because it can help us solve complex problems using real-world data. For this reason, I propose that integrating text classification with fact checking of check-worthy statements can be helpful in detecting fake news. I used text processing and three classifiers, Passive Aggressive, Naïve Bayes, and Support Vector Machine, to classify the news data. Text classification mainly focuses on extracting various features from texts and then incorporating these features into the classification. The big challenge in this area is the lack of an efficient method to distinguish between fake news and non-fake news, due to the lack of corpora. I applied the three machine learning classifiers to two publicly available datasets, and experimental analysis on these datasets shows very encouraging and improved performance. Simple classification alone is not accurate enough at detecting fake news because the classification methods are not specialized for fake news, so I added a system that checks the news in depth, sentence by sentence. Fact checking is a multi-step process that begins with the extraction of check-worthy statements. Identification of check-worthy statements is a subtask in the fact checking process, the automation of which would reduce the time and effort required to fact check a statement. In this thesis I propose an approach that classifies statements into check-worthy and not check-worthy, while also taking into account the context around a statement. This work shows that inclusion of context makes a significant contribution to classification, while at the same time using more general features to capture information from sentences. The aim of this part is to propose an approach that automatically identifies check-worthy statements for fact checking, including the context around a statement. The results are analyzed by examining which features contribute most to classification, and also how well the approach performs. For this work, a dataset was created by consulting different fact checking organizations; it contains debates and speeches in the domain of politics, and the capability of the approach is evaluated in this domain. The approach starts by extracting sentence and context features from the sentences, and then classifies the sentences based on these features. The feature set and context features were selected after several experiments, based on how well they differentiate check-worthy statements. Fact checking has received increasing attention since the 2016 United States presidential election, and many efforts have been made to develop a viable automated fact checking system.
I introduce a web-based approach for fact checking that compares the full news text and headline with known facts such as names, locations, and places. The challenge is to develop an automated application that takes claims directly from mainstream news media websites and fact checks the news after applying the classification and fact checking components. For fact checking, a dataset was constructed that contains 2146 news articles labelled fake, non-fake, and unverified. I include forty mainstream news media sources to compare the results, and also Wikipedia for double verification. This work shows that a combination of text classification and fact checking makes a considerable contribution to the detection of fake news, while also using more general features to capture information from sentences.
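The text-classification component can be illustrated with one of the three classifiers named above. The sketch below combines TF-IDF features with a Passive Aggressive classifier on a tiny made-up dataset; the training texts and labels are illustrative, not the public datasets used in the thesis.

```python
# Minimal sketch of the text-classification component: TF-IDF features with a
# Passive Aggressive classifier. The inline dataset is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

train_texts = [
    "Government report confirms the new policy after official review",
    "Scientists publish peer-reviewed study on vaccine effectiveness",
    "Shocking secret cure they don't want you to know about",
    "Celebrity spotted with aliens, anonymous sources reveal everything",
]
train_labels = ["real", "real", "fake", "fake"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
model.fit(train_texts, train_labels)

# Likely predicted 'fake' on this toy data, given the shared vocabulary.
print(model.predict(["Anonymous sources reveal a shocking secret about aliens"]))
```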

    New Fundamental Technologies in Data Mining

    Get PDF
    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving the problems discussed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant developments in the field of data mining.