107 research outputs found

    An incremental prototyping methodology for distributed systems based on formal specifications

    This thesis presents a new incremental prototyping methodology for formally specified distributed systems. The objective of this methodology is to fill the gap that currently exists between the phase where a specification is simulated, generally using some sequential logical inference tool, and the phase where the modeled system has a reliable, efficient and maintainable distributed implementation in a mainstream object-oriented programming language. This objective is realized by applying a methodology we call Mixed Prototyping with Object-Orientation (in short: OOMP). It extends an existing approach, Mixed Prototyping, which we have adapted to the object-oriented paradigm in order to exploit that paradigm's flexibility and inherent capacity for modeling abstract entities. The OOMP process proceeds as follows. First, the source specifications are automatically translated into a class-based object-oriented language, thus providing a portable and high-level initial implementation. The generated class hierarchy is designed so that the developer may independently derive new sub-classes in order to make the prototype more efficient or to add functionality that could not be specified with the given formalism. This prototyping process is performed incrementally in order to safely validate the modifications against the semantics of the specification. The resulting prototype can finally be considered as the end-user implementation of the specified software. The originality of our approach is that we exploit object-oriented programming techniques in the implementation of formal specifications in order to gain flexibility in the development process. At the same time, the object paradigm gives the means to harness this newly acquired freedom by allowing automatic generation of test routines that verify the conformance of the hand-written code with the specifications. 
We demonstrate the generality of our prototyping scheme by applying it to a distributed collaborative diary program within the frame of CO-OPN (Concurrent Object-Oriented Petri Nets), a very powerful specification formalism that allows concurrent and non-deterministic behaviours to be expressed, and that provides structuring facilities such as modularity, encapsulation and genericity. Considerable effort has also gone into developing or adapting distributed algorithms for cooperative symbolic resolution. These algorithms are used in the run-time support of the generated CO-OPN prototypes.

    Security Testing: A Survey

    Identifying vulnerabilities and ensuring security functionality by security testing is a widely applied measure to evaluate and improve the security of software. Due to the openness of modern software-based systems, applying appropriate security testing techniques is of growing importance and essential to perform effective and efficient security testing. Therefore, an overview of current security testing techniques is of high value both for researchers to evaluate and refine the techniques and for practitioners to apply and disseminate them. This chapter fulfills this need and provides an overview of recent security testing techniques. For this purpose, it first summarizes the required background of testing and security engineering. Then, basics and recent developments of security testing techniques applied during the secure software development lifecycle, i.e., model-based security testing, code-based testing and static analysis, penetration testing and dynamic analysis, as well as security regression testing, are discussed. Finally, the security testing techniques are illustrated by applying them to an example three-tiered web-based business application.

    G-Tric: enhancing triclustering evaluation using three-way synthetic datasets with ground truth

    Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2020. Three-dimensional datasets, or three-way data, started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimension, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the number of missing values, noise, and errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. 
Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches. Besides reviewing the current state of the art regarding triclustering approaches, comparison studies and evaluation metrics, this work also analyzes how the lack of frameworks to generate synthetic data influences existing evaluation methodologies, limiting the scope of performance insights that can be extracted from each algorithm. It also exemplifies how the decisions made in these evaluations can affect the quality and validity of their results. Alternatively, a different methodology that takes advantage of synthetic data with ground truth is presented. This approach, combined with a proposed extension to an existing extrinsic clustering measure, makes it possible to assess the quality of solutions from new perspectives.

    Numerical aerodynamic simulation facility feasibility study

    Three major issues were examined in the feasibility study. First, the ability of the proposed system architecture to support the anticipated workload was evaluated. Second, the throughput of the computational engine (the flow model processor) was studied using real application programs. Third, the availability, reliability, and maintainability of the system were modeled. The evaluations were based on the baseline systems. The results show that the implementation of the Numerical Aerodynamic Simulation Facility, in the form considered, would indeed be a feasible project with an acceptable level of risk. The technology required (both hardware and software) either already exists or, in the case of a few parts, is expected to be announced this year. Facets of the work described include the hardware configuration, software, user language, and fault tolerance.

    On Clustering and Evaluation of Narrow Domain Short-Test Corpora

    This doctoral thesis investigates the problem of clustering a special kind of document collection: short texts from narrow domains. To carry out this task, several corpora and clustering methods were analyzed. Moreover, a number of corpus evaluation measures, term selection techniques, and cluster validity measures were introduced in order to study the following problems: - Determining the relative difficulty of clustering a corpus and studying some of its characteristics, such as text length, domain broadness, stylometry, class imbalance, and structure. - Contributing to the state of the art in clustering corpora composed of narrow-domain short texts. The research carried out is partially focused on "short-text clustering". This topic is considered relevant given the current and future way in which people tend to use a "reduced language" made up of short texts (for example, blogs, snippets, news, and text messages such as e-mail and chat). In addition, the domain broadness of corpora is studied. In this sense, a corpus can be considered narrow or wide if its degree of vocabulary overlap is high or low, respectively. In the categorization task, it is quite difficult to deal with narrow-domain corpora such as scientific papers, technical reports, patents, etc. The main aim of this work is to study possible strategies for tackling the following two problems: a) the low frequencies of vocabulary terms in short texts, and b) the high vocabulary overlap associated with narrow domains. 
Although each of these problems is challenging enough on its own, when dealing with narrow-domain short texts the complexity of the problem increases [...]. Pinto Avendaño, DE. (2008). On Clustering and Evaluation of Narrow Domain Short-Test Corpora [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2641

    Controlling for confounding network properties in hypothesis testing and anomaly detection

    An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that depend on confounding factors like the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that confounding factors pose for anomaly detection in dynamic network streams. We also provide two solutions for avoiding errors due to confounding factors: the first is a randomization testing method that controls for confounding factors, and the second is a set of size-consistent network statistics which avoid confounding due to the most common factors, edge count and node count.
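The confounding effect described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's method: the snapshot figures are made up, edge density stands in for a "size-consistent" statistic, and a plain two-sample permutation test stands in for randomization testing. All names below are hypothetical.

```python
import random


def edge_density(num_nodes, num_edges):
    """Fraction of possible undirected edges that are present."""
    possible = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible if possible else 0.0


def permutation_p_value(a, b, trials=5000, seed=0):
    """Two-sample permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Made-up (nodes, edges) snapshots: the network doubles in size
# but keeps roughly the same density.
early = [(100, 495), (100, 500), (100, 490), (100, 505)]
late = [(200, 1990), (200, 2000), (200, 1980), (200, 2010)]

# Raw edge count is confounded by network size.
p_count = permutation_p_value([e for _, e in early], [e for _, e in late])

# Density normalizes by the number of possible edges.
p_density = permutation_p_value(
    [edge_density(n, e) for n, e in early],
    [edge_density(n, e) for n, e in late],
)

print(f"edge count p-value: {p_count:.3f}")
print(f"density    p-value: {p_density:.3f}")
```

With these made-up numbers the raw edge count flags the later window as different simply because the network grew, while the density statistic does not.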

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. 
Our results show the efficacy of our framework in a variety of everyday environments. Ph.D. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
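The idea of encoding an activity through its fixed-length event subsequences can be sketched as follows. This is only an illustration under stated assumptions: the event names are invented, contiguous n-grams stand in for the thesis's subsequence representations, and the overlap score is a generic generalized-Jaccard measure rather than the similarity the thesis actually uses.

```python
from collections import Counter


def event_ngrams(events, n):
    """Count every contiguous length-n event subsequence of one activity."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def histogram_similarity(a, b):
    """Generalized Jaccard overlap of two n-gram histograms (1.0 = identical)."""
    shared = sum((a & b).values())  # element-wise minimum of counts
    total = sum((a | b).values())   # element-wise maximum of counts
    return shared / total if total else 1.0


# Hypothetical activities, each a finite sequence of discrete events.
make_tea = ["enter", "fill_kettle", "boil", "pour", "leave"]
make_coffee = ["enter", "fill_kettle", "boil", "grind", "pour", "leave"]

tea_bigrams = event_ngrams(make_tea, 2)
coffee_bigrams = event_ngrams(make_coffee, 2)

print(histogram_similarity(tea_bigrams, coffee_bigrams))  # 0.5
```

Pairwise scores of this kind could serve as edge weights in a graph of activities, which is the setting in which the abstract speaks of finding maximally similar cliques.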

    Speededness in Achievement Testing: Relevance, Consequences, and Control

    As examinations and assessments are often used to control access to educational programs and to assess successful participation in an educational program, their fairness and validity are of great importance. A controversially discussed aspect of standardized tests is setting time limits on tests and how this practice can result in test speededness. Regardless of whether a test should be speeded or not, being able to deliberately control the speededness of tests is desirable. For this purpose, van der Linden (2011a, 2011b) proposed an approach to control the speededness of tests in automated test assembly (ATA) using mixed integer linear programming and a lognormal response time model. However, the approach by van der Linden (2011a, 2011b) has an important limitation, in that it is restricted to the two-parameter lognormal response time model, which assumes equal speed sensitivities (i.e., factor loadings) across items. This thesis demonstrates that otherwise parallel test forms with differential speed sensitivities are indeed unfair for specific test-takers. Furthermore, an extension of the van der Linden approach is introduced, which incorporates speed sensitivities in ATA. Additionally, test speededness can undermine the fairness of a test if identical but differently ordered test forms are used. To prevent test-takers' scores from depending on whether easy or difficult items are located at the end of a test form, it is proposed that the same, most time-intensive items be placed at the end of all test forms. The thesis also provides introductions and tutorials on using the R package eatATA for ATA and using Stan and rstan for Bayesian hierarchical response time modeling. 
Finally, the thesis discusses alternatives, practical implications, and limitations of the proposed approaches and provides an outlook on future related research topics.

    Manuscript: You Can't Patent Software: Patenting Software is Wrong

