44 research outputs found

    Clustering algorithm for D2D communication in next generation cellular networks : thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering, Massey University, Auckland, New Zealand

    Get PDF
    Next generation cellular networks will support many complex services for smartphones, vehicles, and other devices. To accommodate such services, cellular networks need to go beyond the capabilities of their previous generations. Device-to-Device communication (D2D) is a key technology that can help fulfil some of the requirements of future networks. The telecommunication industry expects a significant increase in the density of mobile devices which puts more pressure on centralized schemes and poses risk in terms of outages, poor spectral efficiencies, and low data rates. Recent studies have shown that a large part of the cellular traffic pertains to sharing popular contents. This highlights the need for decentralized and distributive approaches to managing multimedia traffic. Content-sharing via D2D clustered networks has emerged as a popular approach for alleviating the burden on the cellular network. Different studies have established that D2D communication in clusters can improve spectral and energy efficiency, achieve low latency while increasing the capacity of the network. To achieve effective content-sharing among users, appropriate clustering strategies are required. Therefore, the aim is to design and compare clustering approaches for D2D communication targeting content-sharing applications. Currently, most of researched and implemented clustering schemes are centralized or predominantly dependent on Evolved Node B (eNB). This thesis proposes a distributed architecture that supports clustering approaches to incorporate multimedia traffic. A content-sharing network is presented where some D2D User Equipment (DUE) function as content distributors for nearby devices. Two promising techniques are utilized, namely, Content-Centric Networking and Network Virtualization, to propose a distributed architecture, that supports efficient content delivery. We propose to use clustering at the user level for content-distribution. A weighted multi-factor clustering algorithm is proposed for grouping the DUEs sharing a common interest. Various performance parameters such as energy consumption, area spectral efficiency, and throughput have been considered for evaluating the proposed algorithm. The effect of number of clusters on the performance parameters is also discussed. The proposed algorithm has been further modified to allow for a trade-off between fairness and other performance parameters. A comprehensive simulation study is presented that demonstrates that the proposed clustering algorithm is more flexible and outperforms several well-known and state-of-the-art algorithms. The clustering process is subsequently evaluated from an individual user’s perspective for further performance improvement. We believe that some users, sharing common interests, are better off with the eNB rather than being in the clusters. We utilize machine learning algorithms namely, Deep Neural Network, Random Forest, and Support Vector Machine, to identify the users that are better served by the eNB and form clusters for the rest of the users. This proposed user segregation scheme can be used in conjunction with most clustering algorithms including the proposed multi-factor scheme. A comprehensive simulation study demonstrates that with such novel user segregation, the performance of individual users, as well as the whole network, can be significantly improved for throughput, energy consumption, and fairness

    Stochastic information granules extraction for graph embedding and classification

    Get PDF
    3noopenGraphs are data structures able to efficiently describe real-world systems and, as such, have been extensively used in recent years by many branches of science, including machine learning engineering. However, the design of efficient graph-based pattern recognition systems is bottlenecked by the intrinsic problem of how to properly match two graphs. In this paper, we investigate a granular computing approach for the design of a general purpose graph-based classification system. The overall framework relies on the extraction of meaningful pivotal substructures on the top of which an embedding space can be build and in which the classification can be performed without limitations. Due to its importance, we address whether information can be preserved by performing stochastic extraction on the training data instead of performing an exhaustive extraction procedure which is likely to be unfeasible for large datasets. Tests on benchmark datasets show that stochastic extraction can lead to a meaningful set of pivotal substructures with a much lower memory footprint and overall computational burden, making the proposed strategies suitable also for dealing with big datasets.openAccademicoBaldini, Luca; Martino, Alessio; Rizzi, AntonelloBaldini, Luca; Martino, Alessio; Rizzi, Antonell

    Behaviour modelling with data obtained from the Internet and contributions to cluster validation

    Get PDF
    [EN]This PhD thesis makes contributions in modelling behaviours found in different types of data acquired from the Internet and in the field of clustering evaluation. Two different types of Internet data were processed, on the one hand, internet traffic with the objective of attack detection and on the other hand, web surfing activity with the objective of web personalization, both data being of sequential nature. To this aim, machine learning techniques were applied, mostly unsupervised techniques. Moreover, contributions were made in cluster evaluation, in order to make easier the selection of the best partition in clustering problems. With regard to network attack detection, first, gureKDDCup database was generated which adds payload data to KDDCup99 connection attributes because it is essential to detect non-flood attacks. Then, by modelling this data a network Intrusion Detection System (nIDS) was proposed where context-independent payload processing was done obtaining satisfying detection rates. In the web mining context web surfing activity was modelled for web personalization. In this context, generic and non-invasive systems to extract knowledge were proposed just using the information stored in webserver log files. Contributions were done in two senses: in problem detection and in link suggestion. In the first application a meaningful list of navigation attributes was proposed for each user session to group and detect different navigation profiles. In the latter, a general and non-invasive link suggestion system was proposed which was evaluated with satisfactory results in a link prediction context. With regard to the analysis of Cluster Validity Indices (CVI), the most extensive CVI comparison found up to a moment was carried out using a partition similarity measure based evaluation methodology. Moreover, we analysed the behaviour of CVIs in a real web mining application with elevated number of clusters in which they tend to be unstable. We proposed a procedure which automatically selects the best partition analysing the slope of different CVI values.[EU]Doktorego-tesi honek internetetik eskuratutako datu mota ezberdinetan aurkitutako portaeren modelugintzan eta multzokatzeen ebaluazioan egiten ditu bere ekarpenak. Zehazki, bi mota ezberdinetako interneteko datuak prozesatu dira: batetik, interneteko trafikoa, erasoak hautemateko helburuarekin; eta bestetik, web nabigazioen jarduera, weba pertsonalizatzeko helburuarekin; bi datu motak izaera sekuentzialekoak direlarik. Helburu hauek lortzeko, ikasketa automatikoko teknikak aplikatu dira, nagusiki gainbegiratu-gabeko teknikak. Testuinguru honetan, multzokatzeen partizio onenaren aukeraketak dakartzan arazoak gutxitzeko multzokatzeen ebaluazioan ere ekarpenak egin dira. Sareko erasoen hautemateari dagokionez, lehenik gureKDDCup datubasea eratu da KDDCup99-ko konexio atributuei payload-ak (sareko paketeen datu eremuak) gehituz, izan ere, ez-flood erasoak (pakete gutxi erabiltzen dituzten erasoak) hautemateko ezinbestekoak baitira. Ondoren, datu hauek modelatuz testuinguruarekiko independenteak diren payload prozesaketak oinarri dituen sareko erasoak hautemateko sistema (network Intrusion Detection System (nIDS)) bat proposatu da maila oneko eraso hautemate-tasak lortuz. Web meatzaritzaren testuinguruan, weba pertsonalizatzeko helburuarekin web nabigazioen jarduera modelatu da. Honetarako, web zerbizarietako lorratz fitxategietan metatutako informazioa soilik erabiliz ezagutza erabilgarria erauziko duen sistema orokor eta ez-inbasiboak proposatu dira. Ekarpenak bi zentzutan eginaz: arazoen hautematean eta esteken iradokitzean. Lehen aplikazioan sesioen nabigazioa adierazteko atributu esanguratsuen zerrenda bat proposatu da, gero nabigazioak multzokatu eta nabigazio profil ezberdinak hautemateko. Bigarren aplikazioan, estekak iradokitzeko sistema orokor eta ez-inbasibo bat proposatu da, eta berau, estekak aurresateko testuinguruan ebaluatu da emaitza onak lortuz. Multzokatzeak balioztatzeko indizeen (Cluster Validity Indices (CVI)) azterketari dagokionez, gaurdaino aurkitu den CVI-en konparaketa zabalena burutu da partizioen antzekotasun neurrian oinarritutako ebaluazio metodologia erabiliz. Gainera, CVI-en portaera aztertu da egiazko web meatzaritza aplikazio batean normalean baino multzo kopuru handiagoak dituena, non CVI-ek ezegonkorrak izateko joera baitute. Arazo honi aurre eginaz, CVI ezberdinek partizio ezberdinetarako lortzen dituzten balioen maldak aztertuz automatikoki partiziorik onena hautatzen duen prozedura proposatu da.[ES]Esta tesis doctoral hace contribuciones en el modelado de comportamientos encontrados en diferentes tipos de datos adquiridos desde internet y en el campo de la evaluación del clustering. Dos tipos de datos de internet han sido procesados: en primer lugar el tráfico de internet con el objetivo de detectar ataques; y en segundo lugar la actividad generada por los usuarios web con el objetivo de personalizar la web; siendo los dos tipos de datos de naturaleza secuencial. Para este fin, se han aplicado técnicas de aprendizaje automático, principalmente técnicas no-supervisadas. Además, se han hecho aportaciones en la evaluación de particiones de clusters para facilitar la selección de la mejor partición de clusters. Respecto a la detección de ataques en la red, primero, se generó la base de datos gureKDDCup que añade el payload (la parte de contenido de los paquetes de la red) a los atributos de la conexión de KDDCup99 porque el payload es esencial para la detección de ataques no-flood (ataques que utilizan pocos paquetes). Después, se propuso un sistema de detección de intrusos (network Intrusion Detection System (IDS)) modelando los datos de gureKDDCup donde se propusieron varios preprocesos del payload independientes del contexto obteniendo resultados satisfactorios. En el contexto de la minerı́a web, se ha modelado la actividad de la navegación web para la personalización web. En este contexto se propondrán sistemas genéricos y no-invasivos para la extracción del conocimiento, utilizando únicamente la información almacenada en los ficheros log de los servidores web. Se han hecho aportaciones en dos sentidos: en la detección de problemas y en la sugerencia de links. En la primera aplicación, se propuso una lista de atributos significativos para representar las sesiones de navegación web para después agruparlos y detectar diferentes perfiles de navegación. En la segunda aplicación, se propuso un sistema general y no-invasivo para sugerir links y se evaluó en el contexto de predicción de links con resultados satisfactorios. Respecto al análisis de ı́ndices de validación de clusters (Cluster Validity Indices (CVI)), se ha realizado la más amplia comparación encontrada hasta el momento que utiliza la metodologı́a de evaluación basada en medidas de similitud de particiones. Además, se ha analizado el comportamiento de los CVIs en una aplicación real de minerı́a web con un número elevado de clusters, contexto en el que los CVIs tienden a ser inestables, ası́ que se propuso un procedimiento para la selección automática de la mejor partición en base a la pendiente de los valores de diferentes CVIs.Grant of the Basque Government (ref.: BFI08.226); Grant of Ministry of Economy and Competitiveness of the Spanish Government (ref.: BES-2011-045989); Research stay grant of Spanish Ministry of Economy and Competitiveness (ref.: EEBB-I-14-08862); University of the Basque Country UPV/EHU (BAILab, grant UFI11/45); Department of Education, Universities and Research of the Basque Government (grant IT-395-10); Ministry of Economy and Competitiveness of the Spanish Government and by the European Regional Development Fund - ERDF (eGovernAbility, grant TIN2014-52665-C2-1-R)

    Exploratory approach for network behavior clustering in LoRaWAN

    Get PDF
    The interest in the Internet of Things (IoT) is increasing both as for research and market perspectives. Worldwide, we are witnessing the deployment of several IoT networks for different applications, spanning from home automation to smart cities. The majority of these IoT deployments were quickly set up with the aim of providing connectivity without deeply engineering the infrastructure to optimize the network efficiency and scalability. The interest is now moving towards the analysis of the behavior of such systems in order to characterize and improve their functionality. In these IoT systems, many data related to device and human interactions are stored in databases, as well as IoT information related to the network level (wireless or wired) is gathered by the network operators. In this paper, we provide a systematic approach to process network data gathered from a wide area IoT wireless platform based on LoRaWAN (Long Range Wide Area Network). Our study can be used for profiling IoT devices, in order to group them according to their characteristics, as well as detecting network anomalies. Specifically, we use the k-means algorithm to group LoRaWAN packets according to their radio and network behavior. We tested our approach on a real LoRaWAN network where the entire captured traffic is stored in a proprietary database. Quite important is the fact that LoRaWAN captures, via the wireless interface, packets of multiple operators. Indeed our analysis was performed on 997, 183 packets with 2169 devices involved and only a subset of them were known by the considered operator, meaning that an operator cannot control the whole behavior of the system but on the contrary has to observe it. We were able to analyze clusters’ contents, revealing results both in line with the current network behavior and alerts on malfunctioning devices, remarking the reliability of the proposed approach

    Exploratory approach for network behavior clustering in LoRaWAN

    Get PDF
    The interest in the Internet of Things (IoT) is increasing both as for research and market perspectives. Worldwide, we are witnessing the deployment of several IoT networks for different applications, spanning from home automation to smart cities. The majority of these IoT deployments were quickly set up with the aim of providing connectivity without deeply engineering the infrastructure to optimize the network efficiency and scalability. The interest is now moving towards the analysis of the behavior of such systems in order to characterize and improve their functionality. In these IoT systems, many data related to device and human interactions are stored in databases, as well as IoT information related to the network level (wireless or wired) is gathered by the network operators. In this paper, we provide a systematic approach to process network data gathered from a wide area IoT wireless platform based on LoRaWAN (Long Range Wide Area Network). Our study can be used for profiling IoT devices, in order to group them according to their characteristics, as well as detecting network anomalies. Specifically, we use the k-means algorithm to group LoRaWAN packets according to their radio and network behavior. We tested our approach on a real LoRaWAN network where the entire captured traffic is stored in a proprietary database. Quite important is the fact that LoRaWAN captures, via the wireless interface, packets of multiple operators. Indeed our analysis was performed on 997, 183 packets with 2169 devices involved and only a subset of them were known by the considered operator, meaning that an operator cannot control the whole behavior of the system but on the contrary has to observe it. We were able to analyze clusters’ contents, revealing results both in line with the current network behavior and alerts on malfunctioning devices, remarking the reliability of the proposed approach

    Calibration techniques for binary classification problems: A comparative analysis

    Get PDF
    Calibrating a classification system consists in transforming the output scores, which somehow state the confidence of the classifier regarding the predicted output, into proper probability estimates. Having a well-calibrated classifier has a non-negligible impact on many real-world applications, for example decision making systems synthesis for anomaly detection/fault prediction. In such industrial scenarios, risk assessment is certainly related to costs which must be covered. In this paper we review three state-of-the-art calibration techniques (Platt’s Scaling, Isotonic Regression and SplineCalib) and we propose three lightweight procedures based on a plain fitting of the reliability diagram. Computational results show that the three proposed techniques have comparable performances with respect to the three state-of-the-art approaches

    Exploratory approach for network behavior clustering in LoRaWAN

    Get PDF
    The interest in the Internet of Things (IoT) is increasing both as for research and market perspectives. Worldwide, we are witnessing the deployment of several IoT networks for different applications, spanning from home automation to smart cities. The majority of these IoT deployments were quickly set up with the aim of providing connectivity without deeply engineering the infrastructure to optimize the network efficiency and scalability. The interest is now moving towards the analysis of the behavior of such systems in order to characterize and improve their functionality. In these IoT systems, many data related to device and human interactions are stored in databases, as well as IoT information related to the network level (wireless or wired) is gathered by the network operators. In this paper, we provide a systematic approach to process network data gathered from a wide area IoT wireless platform based on LoRaWAN (Long Range Wide Area Network). Our study can be used for profiling IoT devices, in order to group them according to their characteristics, as well as detecting network anomalies. Specifically, we use the k-means algorithm to group LoRaWAN packets according to their radio and network behavior. We tested our approach on a real LoRaWAN network where the entire captured traffic is stored in a proprietary database. Quite important is the fact that LoRaWAN captures, via the wireless interface, packets of multiple operators. Indeed our analysis was performed on 997, 183 packets with 2169 devices involved and only a subset of them were known by the considered operator, meaning that an operator cannot control the whole behavior of the system but on the contrary has to observe it. We were able to analyze clusters\u2019 contents, revealing results both in line with the current network behavior and alerts on malfunctioning devices, remarking the reliability of the proposed approach

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Get PDF
    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose an hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field-experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations are drawn from the graph-based protein folded description. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system also able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system

    Fundamentals

    Get PDF
    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
    corecore