On Maximal Unbordered Factors
Given a string of length n, its maximal unbordered factor is the longest factor which does not have a border. In this work we investigate the relationship between n and the length of the maximal unbordered factor of a string. We prove that for an alphabet of size σ ≥ 5 the expected length of the maximal unbordered factor of a string of length n is at least 0.99n (for sufficiently large values of n). As an application of this result, we propose a new algorithm for computing the maximal unbordered factor of a string.
Comment: Accepted to the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015).
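A border of a string is a proper prefix that is also a suffix, and borders can be read off the classical KMP failure function. As a concrete illustration of the object studied above, here is a minimal quadratic-time baseline for computing a maximal unbordered factor (illustrative only; function names are ours, and this is not the paper's proposed algorithm):

```python
def border_array(w):
    # b[k] = length of the longest proper border of the prefix w[:k+1]
    # (the classical KMP failure function).
    b = [0] * len(w)
    for k in range(1, len(w)):
        j = b[k - 1]
        while j > 0 and w[k] != w[j]:
            j = b[j - 1]
        if w[k] == w[j]:
            j += 1
        b[k] = j
    return b

def maximal_unbordered_factor(s):
    # A factor s[i:i+k+1] is unbordered iff the border array of s[i:]
    # is 0 at position k. Try every starting position, keep the longest.
    best = s[:1]
    for i in range(len(s)):
        b = border_array(s[i:])
        for k in range(len(b)):
            if b[k] == 0 and k + 1 > len(best):
                best = s[i:i + k + 1]
    return best
```

For example, "banana" is itself unbordered, while for "abab" the answer has length 2, matching the intuition that long strings over larger alphabets tend to have long unbordered factors.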
A suffix tree or not a suffix tree?
In this paper we study the structure of suffix trees. Given an unlabeled tree τ on n nodes and suffix links of its internal nodes, we ask the question "Is τ a suffix tree?", i.e., is there a string S whose suffix tree has the same topological structure as τ? We place no restrictions on S; in particular we do not require that S ends with a unique symbol. This corresponds to considering the more general definition of implicit or extended suffix trees. Such general suffix trees have many applications and are for example needed to allow efficient updates when suffix trees are built online. Deciding if τ is a suffix tree is not an easy task, because, with no restrictions on the final symbol, we cannot guess the length of a string that realizes τ from the number of leaves. And without an upper bound on the length of such a string, it is not even clear how to solve the problem by an exhaustive search. In this paper, we prove that τ is a suffix tree if and only if it is realized by a string S of length n−1, and we give a linear-time algorithm for inferring S when the first letter on each edge is known. This generalizes the work of I et al. [Discrete Appl. Math. 163, 2014].
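To make the object in question concrete: the topology of the implicit suffix tree of a known string S can be recovered naively by building a trie of all suffixes and keeping only the nodes that stay explicit after path compression (the root, branching nodes, and trie leaves). The sketch below is quadratic-space and illustrative only; the paper's decision problem asks the converse, whether a given topology arises this way for some S:

```python
def suffix_tree_shape(s):
    # Build a trie of all suffixes of s. In the implicit suffix tree the
    # explicit nodes are the root, the branching trie nodes, and the
    # trie leaves (suffixes that are prefixes of other suffixes end
    # mid-edge and contribute no node).
    nodes = [{}]  # node 0 is the root; each node maps char -> child id
    for i in range(len(s)):
        cur = 0
        for c in s[i:]:
            if c not in nodes[cur]:
                nodes.append({})
                nodes[cur][c] = len(nodes) - 1
            cur = nodes[cur][c]
    branching = sum(1 for n in nodes if len(n) >= 2)
    leaves = sum(1 for n in nodes if not n)
    return branching, leaves
```

For "aab" this yields 2 branching nodes and 3 leaves; note that, exactly as the abstract points out, the number of leaves alone does not determine the length of a realizing string.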
Development of efficient algorithms for identifying users in computer access
Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended 26/05/2017. European-format thesis (compendium of articles).
Cyber-attacks are currently a serious problem and are becoming increasingly frequent in organizations, companies and institutions worldwide. They can be defined as the unauthorized access, transfer or manipulation of information in a computer or data center. Confidential data in companies and organizations include intellectual property, financial information, medical information, personal credit card data and other information depending on the business and industry involved. This thesis makes several contributions within the fields of Anomaly Detection (AD), Intrusion Detection Systems (IDS) and Data Leak Detection (DLD). One of the main contributions, common to the three aforementioned fields, is the development of a dynamic data structure that represents the real and unique behaviour of each user, giving every user a digital fingerprint that identifies them. Other contributions concern the application of artificial intelligence (AI) techniques, both in data processing and in the development of meta-classifiers (combinations of several AI techniques), for example C4.5 and UCS decision trees, support vector machines (SVM), neural networks, and k-nearest neighbours (K-NN), among others.
These have been successfully applied to intrusion detection and validated against public databases such as UNIX and KDD99, as well as a government database of the Republic of Ecuador. In the field of anomaly detection, bio-inspired algorithms such as artificial immune systems and negative selection have been used to identify anomalous user behaviour, alongside sequence alignment algorithms such as the Knuth-Morris-Pratt (KMP) string matching algorithm to identify potentially fraudulent subsequences. Finally, in the field of data leak detection, algorithms have been developed that apply statistical techniques such as Markov chains to the sequence of tasks a user executes on a computer system, obtaining good results that have been verified against public and private sequential databases.
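The Markov-chain idea mentioned above can be illustrated with a first-order transition model over a user's task sequence: train on normal sessions, then score new sessions by their average negative log-likelihood. This is a generic sketch under our own assumptions (function names, Laplace-style smoothing, score threshold), not the thesis' actual implementation:

```python
import math
from collections import defaultdict

def train_markov(sequences, smoothing=1e-3):
    # First-order transition probabilities estimated from observed
    # consecutive task pairs, with additive smoothing so unseen
    # transitions get a small nonzero probability.
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def anomaly_score(model, seq):
    # Average negative log-likelihood of the transitions in seq;
    # higher scores indicate behaviour unlike the training sessions.
    nll, n = 0.0, 0
    for a, b in zip(seq, seq[1:]):
        p = model.get(a, {}).get(b, 1e-9)
        nll += -math.log(p)
        n += 1
    return nll / max(n, 1)
```

A session that replays the user's usual task order scores far lower than one with the same tasks in an unusual order, which is what lets the transition model act as a behavioural fingerprint.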
The interlocutory tool box: techniques for curtailing coincidental correctness
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Eliminating faults in software systems is important, because they can have catastrophic consequences. This can be achieved by testing and debugging. Testing involves executing the system with a test case to obtain an output. The output is evaluated against the tester's expectations; deviation from these expectations indicates that a fault has been detected. Debugging involves using information about the fault that was gleaned during testing to isolate the fault in the system. Coincidental correctness is a widespread phenomenon in which a fault corrupts a program state and, despite this, the system produces an output that satisfies the tester's expectations. Coincidental correctness can compromise the effectiveness of testing and debugging techniques.
This thesis investigated methods for alleviating coincidental correctness in testing and debugging. The investigation culminated in four techniques. The first technique is called Interlocutory Testing. Interlocutory Testing is a framework for the development of test oracles that are referred to as Interlocutory Relations. Interlocutory Relations are the first type of oracle that has been specifically designed to operate effectively in the presence of coincidental correctness. Metamorphic Testing was pioneered for testing non-testable systems. However, the effectiveness of this technique can be compromised by coincidental correctness. The second technique, Interlocutory Metamorphic Testing, is a version of Metamorphic Testing that has been integrated with Interlocutory Testing, to alleviate the impact of coincidental correctness on Metamorphic Testing. Interlocutory Mutation Testing is the third technique. This technique uses similar principles to Interlocutory Testing to alleviate the Equivalent Mutant Problem in the presence of coincidental correctness and non-determinism. Finally, the fourth technique is Interlocutory Spectrum-based Fault Localisation. This technique uses Interlocutory Relations to ameliorate the effects of coincidental correctness on fault localisation.
Each technique was empirically evaluated. The results were promising, and indicated that these techniques were capable of mitigating the impact of coincidental correctness.
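For readers unfamiliar with the Metamorphic Testing that the thesis builds on: instead of checking a single output against an expected value, one checks a relation between outputs of related inputs. Below is a minimal sketch of one standard metamorphic relation for sorting (plain Metamorphic Testing, not the thesis' Interlocutory variant; names and parameters are ours):

```python
import random

def metamorphic_test_sort(sort_fn, trials=100, seed=0):
    # Metamorphic relation: permuting the input must not change the
    # output of a correct sorting function. No expected output is
    # needed, which is why MT suits "non-testable" systems.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        follow_up = xs[:]
        rng.shuffle(follow_up)
        if sort_fn(xs) != sort_fn(follow_up):
            return False  # relation violated: a fault was exposed
    return True
```

A coincidentally correct implementation can still satisfy such a relation on some runs, which is precisely the weakness the Interlocutory techniques above aim to address.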
GPU-Acceleration of In-Memory Data Analytics
Hardware advances strongly influence database system design. The flattening speed of CPU cores makes many-core accelerators, such as GPUs, a vital alternative to explore for processing the ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs, but their cores are simpler. As a result, they do not face the power constraints limiting the parallelism of CPUs. Their trade-off, however, is increased implementation complexity. This thesis adapts and redesigns data analytics operators to better exploit the GPU's special memory and threading model. Due to the increasing memory capacity, and the user's need for fast interaction with the data, we focus on in-memory analytics.
Our techniques span different steps of the data processing pipeline: (1) data preprocessing, (2) query compilation, and (3) algorithmic optimization of the operators. Our data preprocessing techniques adapt the data layout for numeric and string columns to maximize the achieved GPU memory bandwidth. Our query compilation techniques compute the optimal execution plan for conjunctive filters. We formulate memory divergence for string matching algorithms and suggest how to eliminate it. Finally, we parallelize decompression algorithms in our compression framework Gompresso to fit more data into the limited GPU memory. Gompresso achieves high speed-ups on GPUs over state-of-the-art multi-core CPU libraries and is suitable for any massively parallel processor.
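The conjunctive-filter planning problem mentioned above has a well-known core: with independent predicates, evaluating them in increasing rank (selectivity − 1)/cost minimizes expected work per row, since later predicates only run on surviving rows. This is the textbook rank-ordering rule, sketched here as an illustration; the thesis' GPU-specific plan computation may differ:

```python
def order_conjunctive_filters(preds):
    # preds: list of (name, selectivity in [0,1], cost per row).
    # Classic rank ordering for independent conjunctive predicates:
    # sort by (selectivity - 1) / cost, ascending.
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])

def expected_cost_per_row(ordered):
    # Each predicate is charged only for the fraction of rows that
    # survived all earlier predicates.
    cost, surviving = 0.0, 1.0
    for _name, sel, c in ordered:
        cost += surviving * c
        surviving *= sel
    return cost
```

Note that the cheap, highly selective predicate goes first, while an expensive predicate can be worth delaying even if it is the most selective one.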
Resource Optimization of SOA Technologies in Embedded Networks
Embedded networks are fundamental infrastructures of many different kinds of domains, such as home or industrial automation, the automotive industry, and future smart grids. Yet they can be very heterogeneous, containing wired and wireless nodes with different kinds of resources and service capabilities, such as sensing, acting, and processing. Driven by new opportunities and business models, embedded networks will play an ever more important role in the future, interconnecting more and more devices, even from other network domains. Realizing applications for such types of networks, however, is a highly challenging task, since various aspects have to be considered, including communication between a diverse assortment of resource-constrained nodes, such as microcontrollers, as well as flexible node infrastructure. Service Oriented Architecture (SOA) with Web services would perfectly meet these unique characteristics of embedded networks and ease the development of applications. Standardized Web services, however, are based on plain-text XML, which is not suitable for microcontroller-based devices with their very limited resources due to XML's verbosity, its memory and bandwidth usage, as well as its associated significant processing overhead. This thesis presents methods and strategies for realizing efficient XML-based Web service communication in embedded networks by means of binary XML using EXI format. We present a code generation approach to create optimized and dedicated service applications in resource-constrained embedded networks. In so doing, we demonstrate how EXI grammar can be optimally constructed and applied to the Web service and service requester context. In addition, so as to realize an optimized service interaction in embedded networks, we design and develop an optimized filter-enabled service data dissemination that takes into account the individual resource capabilities of the nodes and the connection quality within embedded networks. 
We show different approaches for efficiently evaluating binary XML data and applying it on resource-constrained devices, such as microcontrollers. Furthermore, we present the effective placement of binary XML filters in embedded networks with the aim of reducing both the computational load of constrained nodes and the network traffic. Various evaluation results from V2G applications prove the efficiency of our approach as compared to existing solutions, and they also prove the seamless and successful applicability of SOA-based technologies in microcontroller-based environments.
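The size argument against plain-text XML on microcontrollers is easy to demonstrate. The toy comparison below contrasts a textual XML message with a fixed binary layout whose schema both sides know in advance; real EXI uses grammar-driven streaming rather than a fixed struct, so this is only an illustration of the bandwidth gap (message format and field names are our invention):

```python
import struct

def xml_reading(sensor_id, temp_c):
    # Plain-text XML: tag names and decimal digits travel on the wire.
    return (f'<reading><sensor>{sensor_id}</sensor>'
            f'<tempC>{temp_c:.2f}</tempC></reading>').encode()

def binary_reading(sensor_id, temp_c):
    # Toy fixed schema: 2-byte unsigned id + 4-byte float, little-endian.
    # The schema is shared out of band, the same idea EXI exploits with
    # shared grammars, so none of the structure is transmitted.
    return struct.pack('<Hf', sensor_id, temp_c)
```

A single reading shrinks from dozens of bytes to six, and the binary form also avoids the text parsing that dominates XML processing cost on constrained nodes.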