32 research outputs found
Using Graph Theoretical Methods and Traceroute to Visually Represent Hidden Networks
A Wide Area Network (WAN) is a large geographical communication network in which a collection of networking devices communicate data to each other; an example is the Internet, a communication network that spans continents. Within WANs exists a collection of routers that transfer network packets to other devices. An issue pertinent to WANs is their immeasurable size and density: we do not know the number, or the scope, of all the devices that exist within the network. By tracing the routes and transits of data that traverse the WAN, we can identify routers and construct both the paths and weights between communicating devices. However, there is the issue of hidden routers, which transfer data but do not identify themselves to identification requests like Traceroute, and of undocumented edges between routers. As with a black-box function whose output we observe without knowing its interior mechanics, we do not know all the internal components that manage the traffic within the WAN. Discovering them is called the Anonymous Routing Blackbox Problem, and we will use labelled graphs, vertex and edge coloring, and pathfinding to derive solutions
Advances in Unsupervised Learning and Applications: Subspace Clustering with Background Knowledge, Semantic Password Guessing, and Learned Index Structures
Over the past few years, advances in data science, machine learning and, in particular, unsupervised learning have enabled significant progress in many scientific fields and even in everyday life. Unsupervised learning methods are usually successful whenever they can be tailored to specific applications using appropriate requirements based on domain expertise. This dissertation shows how purely theoretical research can lead to circumstances that favor overly optimistic results, and the advantages of application-oriented research based on specific background knowledge. These observations apply to traditional unsupervised learning problems such as clustering, anomaly detection and dimensionality reduction. Therefore, this thesis presents extensions of these classical problems, such as subspace clustering and principal component analysis, as well as several specific applications with relevant interfaces to machine learning. Examples include password guessing using semantic word embeddings and learning spatial index structures using statistical models. In essence, this thesis shows that application-oriented research has many advantages for current and future research
A Graph Theoretic Perspective on Internet Topology Mapping
Understanding the topological characteristics of the Internet is an important research issue as the Internet grows with no central authority. Internet topology mapping studies help better understand the structure and dynamics of the Internet backbone. Knowing the underlying topology, researchers can better develop new protocols and services or fine-tune existing ones. Subnet-level Internet topology measurement studies involve three stages: topology collection, topology construction, and topology analysis. Each of these stages contains challenging tasks, especially when large-scale backbone topologies of millions of nodes are studied. In this dissertation, I first discuss issues in subnet-level Internet topology mapping and review state-of-the-art approaches to handle them. I propose a novel graph data indexing approach to efficiently process large-scale topology data. I then conduct an experimental study to understand how the responsiveness of routers has changed over the last decade and how it differs based on the probing mechanism. I then propose an efficient unresponsive resolution approach by incorporating our structural graph indexing technique. Finally, I introduce Cheleby, an integrated Internet topology mapping system. Cheleby first dynamically probes observed subnetworks using a team of PlanetLab nodes around the world to obtain comprehensive backbone topologies. Then, it utilizes efficient algorithms to resolve subnets, IP aliases, and unresponsive routers in the collected data sets to construct comprehensive subnet-level topologies. Sample topologies are provided at http://cheleby.cse.unr.edu
Graph Mining for Cybersecurity: A Survey
The explosive growth of cyber attacks, such as malware, spam, and
intrusions, has had severe consequences for society. Securing cyberspace has
become an utmost concern for organizations and governments. Traditional Machine
Learning (ML) based methods are extensively used in detecting cyber threats,
but they hardly model the correlations between real-world cyber entities. In
recent years, with the proliferation of graph mining techniques, many
researchers investigated these techniques for capturing correlations between
cyber entities and achieving high performance. It is imperative to summarize
existing graph-based cybersecurity solutions to provide a guide for future
studies. Therefore, as a key contribution of this paper, we provide a
comprehensive review of graph mining for cybersecurity, including an overview
of cybersecurity tasks, the typical graph mining techniques, and the general
process of applying them to cybersecurity, as well as various solutions for
different cybersecurity tasks. For each task, we probe into relevant methods
and highlight the graph types, graph approaches, and task levels in their
modeling. Furthermore, we collect open datasets and toolkits for graph-based
cybersecurity. Finally, we outline potential directions of this field for
future research
Combination Methods for Automatic Document Organization
Automatic document classification and clustering are useful for a wide range of applications such as organizing Web, intranet, or portal pages into topic directories, filtering news feeds or mail, focused crawling on the Web or in intranets, and many more. This thesis presents ensemble-based meta methods for supervised learning (i.e., classification based on a small amount of hand-annotated training documents). In addition, we show how these techniques can be carried forward to clustering based on unsupervised learning (i.e., automatic structuring of document corpora without training data). The algorithms are applied in a restrictive manner, i.e., by leaving out some 'uncertain' documents (rather than assigning them to inappropriate topics or clusters with low confidence). We show how restrictive meta methods can be used to combine different document representations in the context of Web document classification and author recognition. As another application for meta methods, we study the combination of different information sources in distributed environments, such as peer-to-peer information systems. Furthermore, we address the problem of semi-supervised classification on document collections using retraining. A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. The results of our systematic evaluation on real-world data show the viability of the proposed approaches
Integrated Sensing and Communications: Recent Advances and Ten Open Challenges
It is anticipated that integrated sensing and communications (ISAC) would be
one of the key enablers of next-generation wireless networks (such as beyond 5G
(B5G) and 6G) for supporting a variety of emerging applications. In this paper,
we provide a comprehensive review of the recent advances in ISAC systems, with
a particular focus on their foundations, system design, networking aspects and
ISAC applications. Furthermore, we discuss the open questions that
emerge in each of these areas. Hence, we commence with the information
theory of sensing and communications (SC), followed by the
information-theoretic limits of ISAC systems by shedding light on the
fundamental performance metrics. Next, we discuss their clock synchronization
and phase offset problems, the associated Pareto-optimal signaling strategies,
as well as the associated super-resolution ISAC system design. Moreover, we
envision that ISAC ushers in a paradigm shift for the future cellular networks
relying on network sensing, transforming the classic cellular architecture,
cross-layer resource management methods, and transmission protocols. In ISAC
applications, we further highlight the security and privacy issues of wireless
sensing. Finally, we close by studying the recent advances in a representative
ISAC use case, namely the multi-object multi-task (MOMT) recognition problem
using wireless signals. Comment: 26 pages, 22 figures, resubmitted to IEEE Journal. Appreciation for
the outstanding contributions of coauthors in the paper
Analyzing Granger causality in climate data with time series classification methods
Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested
Sense and Respond
Over the past century, the manufacturing industry has undergone a number of paradigm shifts: from the Ford assembly line (1900s) and its focus on efficiency to the Toyota production system (1960s) and its focus on effectiveness and JIDOKA; from flexible manufacturing (1980s) to reconfigurable manufacturing (1990s) (both following the trend of mass customization); and from agent-based manufacturing (2000s) to cloud manufacturing (2010s) (both deploying the value stream complexity into the material and information flow, respectively). The next natural evolutionary step is to provide value by creating industrial cyber-physical assets with human-like intelligence. This will only be possible by further integrating strategic smart sensor technology into the manufacturing cyber-physical value-creating processes in which industrial equipment is monitored and controlled for analyzing compression, temperature, moisture, vibrations, and performance. For instance, in the new wave of the ‘Industrial Internet of Things’ (IIoT), smart sensors will enable the development of new applications by interconnecting software, machines, and humans throughout the manufacturing process, thus enabling suppliers and manufacturers to rapidly respond to changing standards. This reprint of “Sense and Respond” aims to cover recent developments in the field of industrial applications, especially smart sensor technologies that increase the productivity, quality, reliability, and safety of industrial cyber-physical value-creating processes
Machine Learning
Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience