93 research outputs found

    On Maximal Unbordered Factors

    Given a string $S$ of length $n$, its maximal unbordered factor is the longest factor which does not have a border. In this work we investigate the relationship between $n$ and the length of the maximal unbordered factor of $S$. We prove that for an alphabet of size $\sigma \ge 5$ the expected length of the maximal unbordered factor of a string of length $n$ is at least $0.99n$ (for sufficiently large values of $n$). As an application of this result, we propose a new algorithm for computing the maximal unbordered factor of a string. Comment: Accepted to the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015).
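To make the definition concrete, here is a minimal brute-force sketch. It is not the algorithm proposed in the paper; it only illustrates what "maximal unbordered factor" means. It assumes the standard notion of a border (a non-empty proper prefix that is also a suffix), and the function names `has_border` and `maximal_unbordered_factor` are hypothetical.

```python
def has_border(w: str) -> bool:
    """Return True if w has a border, i.e. a non-empty proper prefix
    of w that is also a suffix of w."""
    # Prefix function (KMP failure function): pi[i] is the length of the
    # longest border of w[:i + 1], so pi[-1] is the longest border of w.
    pi = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return bool(w) and pi[-1] > 0


def maximal_unbordered_factor(s: str) -> str:
    """Brute force: try factor lengths from longest to shortest and return
    the first (leftmost) factor of that length that has no border."""
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            factor = s[start:start + length]
            if not has_border(factor):
                return factor
    return ""  # only reached for the empty string


print(maximal_unbordered_factor("abab"))   # ab
print(maximal_unbordered_factor("abaab"))  # baa
```

This sketch runs in cubic time in the worst case, so it is only suitable for checking small examples by hand.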

    A suffix tree or not a suffix tree?

    In this paper we study the structure of suffix trees. Given an unlabeled tree τ on n nodes and suffix links of its internal nodes, we ask the question "Is τ a suffix tree?", i.e., is there a string S whose suffix tree has the same topological structure as τ? We place no restrictions on S; in particular, we do not require that S ends with a unique symbol. This corresponds to considering the more general definition of implicit or extended suffix trees. Such general suffix trees have many applications and are, for example, needed to allow efficient updates when suffix trees are built online. Deciding if τ is a suffix tree is not an easy task because, with no restrictions on the final symbol, we cannot guess the length of a string that realizes τ from the number of leaves. And without an upper bound on the length of such a string, it is not even clear how to solve the problem by an exhaustive search. In this paper, we prove that τ is a suffix tree if and only if it is realized by a string S of length n−1, and we give a linear-time algorithm for inferring S when the first letter on each edge is known. This generalizes the work of I et al. [Discrete Appl. Math. 163, 2014].

    Development of efficient algorithms for identifying users in computer access

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 26/05/2017. European-format thesis (compendium of articles).
    Cyber-attacks are currently a serious problem and are becoming increasingly frequent in organizations, companies and institutions worldwide. They can be defined as the unauthorized access to, or transfer or manipulation of, information on a computer or in a data center. Confidential data in companies and organizations include intellectual property, financial information, medical information, personal credit card data and other types of information, depending on the business and industry involved. In this thesis, several contributions are made within the fields of Anomaly Detection (AD), Intrusion Detection Systems (IDS) and Data Leak Detection (DLD). One of the main contributions, common to the three fields, is the development of a dynamic data structure that represents the real and unique behaviour of each user, which gives each user a digital fingerprint that identifies them. Other contributions concern the application of artificial intelligence (AI) techniques, both in data processing and in the development of meta-classifiers (combinations of several AI techniques), for example C4.5 and UCS decision trees, support vector machines (SVM), neural networks and nearest neighbours (K-NN), among others. These have been applied with good results to intrusion detection and have been validated against public databases such as Unix and KDD99, and against a government database of the Republic of Ecuador.
    In the field of anomaly detection, bio-inspired algorithms such as artificial immune systems and negative selection have been used to identify anomalous user behaviour, together with sequence-matching algorithms such as the Knuth-Morris-Pratt (KMP) string matching algorithm, to identify potentially fraudulent subsequences. Lastly, in the field of data leak detection, algorithms have been developed that apply statistical techniques such as Markov chains to the sequence of tasks a user executes in an information system, obtaining good results which have been verified against public and private sequential databases.
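As a rough illustration of how Knuth-Morris-Pratt matching can locate a suspicious subsequence inside a user's action log, the following minimal sketch searches a list of actions for a pattern. It is the textbook KMP algorithm, not the thesis's implementation; the function names and the command names in the example log are hypothetical.

```python
from typing import List, Sequence


def failure_table(pattern: Sequence[str]) -> List[int]:
    """table[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(sequence: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return the start index of every (possibly overlapping) occurrence
    of pattern in sequence, in time linear in the two lengths."""
    if not pattern:
        return []
    table = failure_table(pattern)
    matches = []
    k = 0  # number of pattern items currently matched
    for i, item in enumerate(sequence):
        while k > 0 and item != pattern[k]:
            k = table[k - 1]
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going to find overlapping matches
    return matches


# Hypothetical action log and a hypothetical "suspicious" subsequence.
log = ["ls", "cd", "cat", "ls", "cd", "cat", "rm"]
print(kmp_find_all(log, ["cd", "cat"]))  # [1, 4]
```

Working on sequences of tokens rather than characters lets the same routine match whole commands or task identifiers.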

    Ressourcen Optimierung von SOA-Technologien in eingebetteten Netzwerken

    Embedded networks are fundamental infrastructures of many different kinds of domains, such as home or industrial automation, the automotive industry, and future smart grids. Yet they can be very heterogeneous, containing wired and wireless nodes with different kinds of resources and service capabilities, such as sensing, acting, and processing. Driven by new opportunities and business models, embedded networks will play an ever more important role in the future, interconnecting more and more devices, even from other network domains. Realizing applications for such types of networks, however, is a highly challenging task, since various aspects have to be considered, including communication between a diverse assortment of resource-constrained nodes, such as microcontrollers, as well as flexible node infrastructure. Service Oriented Architecture (SOA) with Web services would perfectly meet these unique characteristics of embedded networks and ease the development of applications. Standardized Web services, however, are based on plain-text XML, which is not suitable for microcontroller-based devices with their very limited resources due to XML's verbosity, its memory and bandwidth usage, as well as its associated significant processing overhead. This thesis presents methods and strategies for realizing efficient XML-based Web service communication in embedded networks by means of binary XML using EXI format. We present a code generation approach to create optimized and dedicated service applications in resource-constrained embedded networks. In so doing, we demonstrate how EXI grammar can be optimally constructed and applied to the Web service and service requester context. In addition, so as to realize an optimized service interaction in embedded networks, we design and develop an optimized filter-enabled service data dissemination that takes into account the individual resource capabilities of the nodes and the connection quality within embedded networks. 
We show different approaches for efficiently evaluating binary XML data and applying it to resource-constrained devices, such as microcontrollers. Furthermore, we present the effective placement of binary XML filters in embedded networks with the aim of reducing both the computational load of constrained nodes and the network traffic. Various evaluation results of V2G applications demonstrate the efficiency of our approach as compared to existing solutions, as well as the seamless and successful applicability of SOA-based technologies in the microcontroller-based environment.