328 research outputs found

    Enabling Auditing and Intrusion Detection of Proprietary Controller Area Networks

    The goal of this dissertation is to provide automated methods for security researchers to overcome 'security through obscurity' used by manufacturers of proprietary Industrial Control Systems (ICS). 'White hat' security analysts waste significant time reverse engineering these systems' opaque network configurations instead of performing meaningful security auditing tasks. Automating the process of documenting proprietary protocol configurations is intended to improve independent security auditing of ICS networks. The major contributions of this dissertation are a novel approach for unsupervised lexical analysis of binary network data flows and analysis of the time series data extracted as a result. We demonstrate the utility of these methods using Controller Area Network (CAN) data sampled from passenger vehicles.

    Data-Driven Understanding of Smart Service Systems Through Text Mining

    Smart service systems are everywhere, in homes and in the transportation, energy, and healthcare sectors. However, such systems have yet to be fully understood in the literature. Given the widespread applications of and research on smart service systems, we used text mining to develop a unified understanding of such systems in a data-driven way. Specifically, we used a combination of metrics and machine learning algorithms to preprocess and analyze text data related to smart service systems, including text from the scientific literature and news articles. By analyzing 5,378 scientific articles and 1,234 news articles, we identify important keywords, 16 research topics, 4 technology factors, and 13 application areas. We define "smart service system" based on the analytics results. Furthermore, we discuss the theoretical and methodological implications of our work, such as the 5Cs (connection, collection, computation, and communications for co-creation) of smart service systems and the text mining approach to understand service research topics. We believe this work, which aims to establish common ground for understanding these systems across multiple disciplinary perspectives, will encourage further research and development of modern service systems.

    Real-Time Social Network Data Mining For Predicting The Path For A Disaster

    Unlike social networks such as Twitter, traditional communication channels such as news channels cannot provide spontaneous information about disasters. This work proposes a framework that mines real-time disaster data from Twitter to predict the path that a disaster such as a tornado will take. Twitter users act as sensors that provide useful information about the disaster by posting first-hand experience, warnings, or the location of a disaster. The steps in the framework are data collection, data preprocessing, geo-locating the tweets, data filtering, and extrapolation of the disaster curve to predict susceptible locations. The framework is validated by analyzing past events. It has the potential to be developed into a full-fledged system to predict disasters and warn people about them. The warnings can be sent to news channels or broadcast for proactive action.
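
The final step of the framework, extrapolating the disaster curve, can be illustrated with a minimal sketch. The abstract does not say which curve model is used, so this example assumes a straight-line least-squares fit of latitude and longitude against time; the function names and the sample coordinates are hypothetical and stand in for tweets that have already been filtered and geo-located.

```python
from statistics import mean


def fit_line(ts, ys):
    """Ordinary least-squares fit y = a + b*t; returns (a, b)."""
    t_bar, y_bar = mean(ts), mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_bar - b * t_bar, b


def extrapolate_path(tweets, future_times):
    """Predict (lat, lon) at future times.

    tweets: list of (minutes_since_start, lat, lon) tuples.
    Assumption: the disaster moves roughly along a straight line.
    """
    ts = [t for t, _, _ in tweets]
    lat_a, lat_b = fit_line(ts, [lat for _, lat, _ in tweets])
    lon_a, lon_b = fit_line(ts, [lon for _, _, lon in tweets])
    return [(lat_a + lat_b * t, lon_a + lon_b * t) for t in future_times]


# Made-up observations drifting north-east over 30 minutes.
tweets = [
    (0, 35.00, -97.50),
    (10, 35.05, -97.40),
    (20, 35.10, -97.30),
    (30, 35.15, -97.20),
]
predicted = extrapolate_path(tweets, [40, 50])
print(predicted)
```

A real system would likely need a more flexible curve and some handling of noisy or sparse locations; the linear fit is only the simplest model that makes the extrapolation step concrete.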

    Mining Behavioral Patterns from Mobile Big Data

    Mobile devices connected to the Internet are a ubiquitous platform that can easily record a large amount of data describing human behavior. Specifically, the data collected from mobile devices, referred to as mobile big data, reveal important social and economic information. Therefore, analyzing mobile big data is valuable for several stakeholders, ranging from smartphone manufacturers to network operators and app developers. This thesis aims to discover and understand behavioral patterns from mobile big data based on large real-world datasets. Specifically, this thesis reveals patterns from three domains: people, time, and location. First, we explore mobile big data from the people domain and propose a framework to discover users' daily activity patterns from their mobile app usage. By applying the framework to a real-world dataset consisting of 653,092 users, we successfully extract five common patterns among millions of people, including commuting, pervasive socializing, nightly entertainment, afternoon reading, and nightly socializing. Second, still from the people domain, we derive group health conditions by using their smartphone usage data. In particular, we collect mobile usage records of 452 users in North America. We then demonstrate the potential for inferring group health conditions (i.e., COVID-19 outbreak stages) by leveraging less privacy-sensitive smartphone data, including CPU usage, memory usage, and network connections. Third, we mine the behavior patterns from the time domain. We reveal the evolution of mobile app usage by conducting a longitudinal study on 1,465 users from 2012 to 2017. The results show that users' app usage significantly changes over time. However, the evolution in app-category usage and individual app usage are different in terms of popularity distribution, usage diversity, and correlations. 
Last, with respect to the location domain, we leverage city-scale spatiotemporal mobile app usage data to reveal urban land usage patterns. We prove the strong correlation between mobile usage behavior and location features, which brings a new angle to urban analytics.

    Estimating Movement from Mobile Telephony Data

    Mobile-enabled devices are ubiquitous in modern society. The information gathered by their normal service operations has become one of the primary data sources used in the understanding of human mobility, social connection and information transfer. This thesis investigates techniques that can extract useful information from anonymised call detail records (CDR). CDR consist of mobile subscriber data describing people's connections with the network operator: the nature of the communication activity (voice, SMS, data, etc.), its duration and starting time, and the servicing cell identification numbers of both the sender and the receiver when available. The main contributions of the research are a methodology for distance measurements which enables the identification of mobile subscriber travel paths and a methodology for population density estimation based on significant mobile subscriber regions of interest. In addition, insights are given into how a mobile network operator may use geographically located subscriber data to create new revenue streams and improve network performance. A range of novel algorithms and techniques underpin the development of these methodologies. These include, among others, techniques for CDR feature extraction, data visualisation and CDR data cleansing. The primary data source used in this body of work was the CDR of Meteor, a mobile network operator in the Republic of Ireland. The Meteor network under investigation has just over 1 million customers, which represents approximately a quarter of the country’s 4.6 million inhabitants, and operates using both 2G and 3G cellular telephony technologies. Results show that the steady state vector analysis of modified Markov chain mobility models can return population density estimates comparable to population estimates obtained through a census. 
Evaluated on a test dataset, the developed distance measurements classified the routes taken by CDR journey trajectories more accurately than traditional trajectory distance measurements. Results from subscriber segmentation indicate that subscribers who have perceived similar relationships to geographical features can be grouped based on weighted steady state mobility vectors. Overall, this thesis proposes novel algorithms and techniques for the estimation of movement from mobile telephony data, addressing practical issues related to sampling, privacy and spatial uncertainty.
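
The steady state vector mentioned in the abstract can be illustrated with a short sketch. It assumes a plain (unmodified) Markov chain whose states are cell regions and whose row-stochastic matrix `P` gives the probability of a subscriber moving from one region to another; the thesis's own modifications to the model are not described here, and the three-region matrix is invented for illustration. The stationary distribution is found by power iteration and read as the long-run share of subscribers in each region.

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Stationary distribution pi with pi = pi * P, found by power iteration.

    P is a row-stochastic matrix given as a list of lists.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical transition probabilities between three cell regions.
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]
pi = steady_state(P)
print(pi)  # long-run share of subscribers per region
```

Multiplying each entry of `pi` by an estimate of the total population would turn the shares into per-region counts, which is one plausible way such a vector could be compared with census figures.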

    Detection of traffic events from Finnish social media data

    Social media has gained significant popularity and importance during the past few years and has become an essential part of many people's everyday lives. As social media users write about a broad range of topics, popular social networking sites can serve as a perfect base for various data mining and information extraction applications. One possibility among these could be the real-time detection of unexpected traffic events or anomalies, which could be used to help traffic managers to discover and mitigate problematic spots in a timely manner or to assist passengers with making informed decisions about their travel route. The purpose of this study is to develop a Finnish traffic information system that relies on social media data. The potential of using social network streams in traffic information extraction has been demonstrated in several big cities, but no study has so far investigated the possible use in smaller communities such as towns in Finland. The complexity of the Finnish language also poses further challenges. The aim of the research is to investigate what methods would be the most suitable to analyse and extract information from Finnish social media messages and to incorporate these into the implementation of a practical application. In order to determine the most effective methods for the purposes of this study, an extensive literature review was performed in the fields of social media mining and textual and linguistic analysis with a special focus on frameworks and methods designed for the Finnish language. In addition, a website and a mobile application were developed for data collection, analysis and demonstration. The implemented traffic event detection system is able to detect and classify incidents from the public Twitter stream. Tests of the analysis methods showed high accuracy in both textual and cluster analysis. 
Although certain limitations and possible improvements should be considered in the future, the completed traffic information system has already demonstrated satisfactory performance and lays the foundation for further studies and research.

    Aplicação de técnicas de Clustering ao contexto da Tomada de Decisão em Grupo

    Nowadays, decisions made by executives and managers are primarily made in a group. Therefore, group decision-making is a process where a group of people called participants work together to analyze a set of variables, considering and evaluating a set of alternatives to select one or more solutions. There are many problems associated with group decision-making, namely when the participants cannot meet for any reason, ranging from schedule incompatibility to being in different countries with different time zones. To support this process, Group Decision Support Systems (GDSS) evolved to what today we call web-based GDSS. In GDSS, argumentation is ideal since it makes it easier to use justifications and explanations in interactions between decision-makers so they can sustain their opinions. Aspect Based Sentiment Analysis (ABSA) is a subfield of Argument Mining closely related to Natural Language Processing. It intends to classify opinions at the aspect level and identify the elements of an opinion. Applying ABSA techniques to the Group Decision Making context results in the automatic identification of alternatives and criteria, for example. This automatic identification is essential to reduce the time decision-makers take to set themselves up on Group Decision Support Systems and to offer them various insights and knowledge on the discussion in which they participate. One of these insights can be the arguments used by the decision-makers about an alternative. Therefore, this dissertation proposes a methodology that uses an unsupervised technique, Clustering, and aims to segment the participants of a discussion based on the arguments used so it can produce knowledge from the current information in the GDSS. This methodology can be hosted in a web service that follows a micro-service architecture and utilizes Data Preprocessing and Intra-sentence Segmentation in addition to Clustering to achieve the objectives of the dissertation. 
Word Embedding is needed when we apply clustering techniques to natural language text, to transform the text into vectors usable by the clustering techniques. In addition to Word Embedding, Dimensionality Reduction techniques were tested to improve the results. Keeping the same Preprocessing steps while varying the Clustering techniques, Word Embedders, and Dimensionality Reduction techniques led to the best approach. This approach consisted of the KMeans++ clustering technique, using SBERT as the word embedder with UMAP dimensionality reduction, reducing the number of dimensions to 2. This experiment achieved a Silhouette Score of 0.63 with 8 clusters on the baseball dataset, which yielded good clusters according to manual review and word clouds. The same approach obtained a Silhouette Score of 0.59 with 16 clusters on the car brand dataset, which we used as an approach validation dataset.
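
The clustering and evaluation steps named in this abstract can be sketched in plain Python. This is not the dissertation's code: it assumes the text has already been embedded (for example with SBERT) and reduced to two dimensions (for example with UMAP), and uses invented 2-D points in their place. The helper names are hypothetical. It shows k-means with k-means++ seeding followed by the mean silhouette coefficient.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, rng, iters=50):
    """Lloyd's algorithm started from k-means++ seeds; returns labels."""
    centres = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented 2-D points standing in for reduced sentence embeddings.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = kmeans(points, 2, random.Random(0))
score = silhouette(points, labels)
print(labels, round(score, 3))
```

In practice the same two steps are available in scikit-learn as `KMeans(init="k-means++")` and `silhouette_score`; the hand-written version is used here only to keep the example free of third-party dependencies.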