834 research outputs found

    Big Data Analytics in Static and Streaming Provenance

    Get PDF
    Thesis (Ph.D.) - Indiana University, Informatics and Computing,, 2016With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces relationships of entities over time, thus providing a unique view on over-time behavior under study. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rules mining; and the temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making

    Graph edit distance from spectral seriation

    Get PDF
    This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph-matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems

    Feature-based methodology for supporting architecture refactoring and maintenance of long-life software systems

    Get PDF
    Zusammenfassung Langlebige Software-Systeme durchlaufen viele bedeutende Veraenderungen im Laufe ihres Lebenszyklus, um der Weiterentwicklung der Problemdomaenen zu folgen. Normalerweise ist es schwierig eine Software-Systemarchitektur den schnellen Weiterentwicklungen einer Problemdomaene anzupassen und mit der Zeit wird der Unterschied zwischen der Problemdomaene und der Software-Systemarchitektur zu groß, um weitere Softwareentwicklung sinnvoll fortzufuehren. Fristgerechte Refactorings der Systemarchitektur sind notwendig, um dieses Problem zu vermeiden. Aufgrund des verhaeltnismaeßig hohen Gefahrenpotenzials und des zeitlich stark verzoegerten Nutzens von Refactorings, werden diese Maßnahmen normalerweise bis zum letztmoeglichen Zeitpunkt hinausgeschoben. In der Regel ist das Management abgeneigt Architektur-Refactorings zu akzeptieren, außer diese sind absolut notwendig. Die bevorzugte Vorgehensweise ist, neue Systemmerkmale ad hoc hinzuzufuegen und nach dem Motto ”Aendere nie etwas an einem funktionierenden System!” vorzugehen. Letztlich ist das Ergebnis ein Architekturzerfall (Architekturdrift). Die Notwendigkeit kleiner Refactoring-Schritte fuehrt zur Notwendigkeit des Architektur-Reengineerings. Im Gegensatz zum Refactoring, das eine normale Entwicklungstaetigkeit darstellt, ist Reengineering eine Form der Software- ”Revolution”. Reengineeringprojekte sind sehr riskant und kostspielig. Der Nutzen des Reengineerings ist normalerweise nicht so hoch wie erwartet. Wenn nach dem Reengineering schließlich die erforderlichen Architekturaenderungen statt.nden, kann dies zu spaet sein. Trotz der enormen in das Projekt gesteckten Bemuehungen erfuellen die Resultate des Reengineerings normalerweise nicht die Erwartungen. Es kann passieren, dass sehr bald ein neues, kostspieliges Reengineering erforderlich wird. In dieser Arbeit werden das Problem der Softwareevolution und der Zerfall von Softwarearchitekturen behandelt. Eine Methode wird vorgestellt, welche die Softwareentwicklung in ihrer entscheidenden Phase, dem Architekturrefactoring, unterstuetzt. Die Softwareentwicklung wird sowohl in technischer als auch organisatorischer Hinsicht unterstuetzt. Diese Arbeit hat neue Techniken entwickelt, welche die Reverse-Engineering-, Architecture-Recovery- und Architecture-Redesign-Taetigkeiten unterst uetzen. Sie schlaegt auch Aenderungen des Softwareentwicklungsprozesses vor, die fristgerechte Architekturrefactorings erzwingen koennen und damit die Notwendigkeit der Durchfuehrung eines Architektur- Reengineerings vermeiden. In dieser Arbeit wird die Merkmalmodellierung als Hauptinstrument verwendet. Merkmale werden genutzt, um die Abstraktionsluecke zwischen den Anforderungen der Problemdomaene und der Systemarchitektur zu fuellen. Merkmalmodelle werden auch als erster Grundriss fr die Wiederherstellung der verlorenen Systemarchitektur genutzt. Merkmalbasierte Analysen fuehren zu diversen, nuetzlichen Hinweisen fuer den erneuten Entwurf (das Re-Design) einer Architektur. Schließlich wird die Merkmalmodellierung als Kommunikationsmittel zwischen unterschiedlichen Projektbeteiligten (Stakeholdern) im Verlauf des Softwareengineering-Prozesses verwendet und auf dieser Grundlage wird ein neuer Anforderungsde.nitionsprozess vorgeschlagen, der die erforderlichen Architekturrefactorings erzwingt.The long-life software systems withstand many significant changes throughout their life-cycle in order to follow the evolution of the problem domains. Usually, the software system architecture can not follow the rapid evolution of a problem domain and with time, the diversion of the architecture in respect to the domain features becomes prohibiting for software evolution. For avoiding this problem, periodical refactorings of the system architecture are required. Usually, architecture refactorings are postponed until the very last moment, because of the relatively high risk involved and the lack of short-term profit. As a rule, the management is unwilling to accept architecture refactorings unless they become absolutely necessary. The preferred way of working is to add new system features in an ad-hoc manner and to keep the rule ”Never touch a running system!”. The final result is an architecture decay. The need of performing small refactoring activities turns into need for architecture reengineering. In contrast to refactoring, which is a normal evolutionary activity, reengineering is a kind of software ”revolution”. Reengineering projects are risky and expensive. The effectiveness of reengineering is also usually not as high as expected. When finally after reengineering the required architecture changes take place, it can be too late. Despite the enormous invested efforts, the results of the reengineering usually do not satisfy the expectations. It might happen that very soon a new expensive reengineering is required. This thesis deals with the problem of software evolution and the decay of software architectures. It presents a method, which assists software evolution in its crucial part, the architecture refactoring. The assistance is performed for both technical and organizational aspects of the software evolution. The thesis provides new techniques for supporting reverse engineering, architecture recovery and redesigning activities. It also proposes changes to the software engineering process, which can force timely architecture refactorings and thus avoid the need of performing architecture reengineering. For the work in this thesis feature modeling is utilized as a main asset. Features are used to fill the abstraction gap between domain requirements and system architecture. Feature models are also used as an outline for recovering of lost system architectures. Through feature-based analyses a number of useful hints and clues for architecture redesign are produced. Finally, feature modeling is used as a communication between different stakeholders of the software engineering process and on this basis a new requirements engineering process is proposed, which forces the needed architecture refactorings

    Inventory Optimization Model Design with Machine Learning Approach in Feed Mill Company

    Get PDF
    This article aims to address the impacts that companies can have with the application of machine learning to carry out their demand forecasts, knowing that a more accurate demand forecast improves the performance of companies, making them more competitive. The methodology used was a literature review through descriptive, qualitative and with bibliographical surveys in International Journal from 2010 – 2022 by different authors. Findings show that the references prove that demand forecasting with the use of machine learning brings many benefits to organizations, for example, since the results are more accurate, there is better inventory management, consequently customer satisfaction for having the product at the right time and place. Further, this article concludes and suggests that the use of machine learning is able to identify variables that affect the demands, with this it makes a forecast closer to reality and helps managers to make more accurate decisions, improving strategic planning and supply chain management. of company supplies

    Domain architecture a design framework for system development and integration

    Get PDF
    The ever growing complexity of software systems has revealed many short-comings in existing software engineering practices and has raised interest in architecture-driven software development. A system\u27s architecture provides a model of the system that suppresses implementation detail, allowing the architects to concentrate on the analysis and decisions that are most critical to structuring the system to satisfy its requirements. Recently, interests of researchers and practi-tioners have shifted from individual system architectures to architectures for classes of software systems which provide more general, reusable solutions to the issues of overall system organization, interoperability, and allocation of services to system components. These generic architectures, such as product line architectures and domain architectures, promote reuse and interoperability, and create a basis for cost effective construction of high-quality systems. Our focus in this dissertation is on domain architectures as a means of development and integration of large-scale, domain-specific business software systems. Business imperatives, including flexibility, productivity, quality, and ability to adapt to changes, have fostered demands for flexible, coherent and enterprise--wide integrated business systems. The components of such systems, developed separately or purchased off the shelf, need to cohesively form an overall compu-tational environment for the business. The inevitable complexity of such integrated solutions and the highly-demanding process of their construction, management, and evolution support require new software engineering methodologies and tools. Domain architectures, prescribing the organization of software systems in a business domain, hold a promise to serve as a foundation on which such integrated business systems can be effectively constructed. To meet the above expectations, software architectures must be properly defined, represented, and applied, which requires suitable methodologies as well as process and tool support. Despite research efforts, however, state-of-the-art methods and tools for architecture-based system development do not yet meet the practical needs of system developers. The primary focus of this dissertation is on developing methods and tools to support domain architecture engineering and on leveraging architectures to achieve improved system development and integration in presence of increased complexity. In particular, the thesis explores issues related to the following three aspects of software technology: system complexity and software architectures as tools to alleviate complexity; domain architectures as frameworks for construction of large scale, flexible, enterprise-wide software systems; and architectural models and representation techniques as a basis for good” design. The thesis presents an archi-tectural taxonomy to help categorize and better understand architectural efforts. Furthermore, it clarifies the purpose of domain architectures and characterizes them in detail. To support the definition and application of domain architectures we have developed a method for domain architecture engineering and representation: GARM-ASPECT. GARM, the Generic Architecture Reference Model, underlying the method, is a system of modeling abstractions, relations and recommendations for building representations of reference software architectures. The model\u27s focus on reference and domain architectures determines its main distinguishing features: multiple views of architectural elements, a separate rule system to express constraints on architecture element types, and annotations such as “libraries” of patterns and “logs” of guidelines. ASPECT is an architecture description language based on GARM. It provides a normalized vocabulary for representing the skeleton of an architecture, its structural view, and establishes a framework for capturing archi-tectural constraints. It also allows extensions of the structural view with auxiliary information, such as behavior or quality specifications. In this respect, ASPECT provides facilities for establishing relationships among different specifications and gluing them together within an overall architectural description. This design allows flexibility and adaptability of the methodology to the specifics of a domain or a family of systems. ASPECT supports the representation of reference architectures as well as individual system architectures. The practical applicability of this method has been tested through a case study in an industrial setting. The approach to architecture engineering and representation, presented in this dissertation, is pragmatic and oriented towards software practitioners. GARM-ASPECT, as well as the taxonomy of architectures are of use to architects, system planners and system engineers. Beyond these practical contributions, this thesis also creates a more solid basis for expbring the applicability of architectural abstractions, the practicality of representation approaches, and the changes required to the devel-opment process in order to achieve the benefits from an architecture-driven software technology

    Relationship analysis : improving the systems analysis process

    Get PDF
    A significant aspect of systems analysis involves discovering and representing entities and their inter-relationships. Guidelines exist to identify entities but do not provide a rigorous and comprehensive process to explicitly capture the relationship structure of the problem domain. Whereas, other analysis techniques lightly address the relationship discovery process, Relationship Analysis is the only systematic, domain-independent analysis technique focusing exclusively on a domain\u27s relationship structure. The quality of design artifacts, such as class diagrams, and development time necessary to generate these artifacts can be improved by first representing the complete relationship structure of the problem domain. The Relationship Analysis Model is the first theory-based taxonomy to classify relationships. A rigorous evaluation was conducted, including a formal experiment comparing novice and experienced analysts with and without Relationship Analysis. It was shown that the Relationship Analysis Process based on the model does provide a fuller and richer systems analysis, resulting in improved quality of and reduced time in generating class diagrams. It also was shown that Relationship Analysis enables analysts of varying experience levels to achieve a similar level of quality of class diagrams. Relationship Analysis significantly enhances the systems analyst\u27s effectiveness, especially in the area of relationship discovery and documentation resulting in improved analysis and design artifacts

    DANIEL: Towards Automated Bug Discovery By Black Box Test Case Generation & Recommendation

    Get PDF
    Finding and documenting bugs in software systems is an essential component of the software development process. A bug is defined as a series of steps that produces behavior which differs from the software specification and requirements. Finding steps to produce such behavior requires expert knowledge of the possible operations of the software in development as well as intuition and creativity. This thesis proposes the Directed Action Node Input Execution Language (DANIEL), a language that represents test cases as directed graphs, where each node represents an action, and possible input arguments for each action are represented along the incoming directed edges. With this representation, it is possible to form a union of all recorded test cases, making a combined directed graph which represents all of the paths of interaction with the developing software. This thesis demonstrates how DANIEL can generate prioritized test cases for a web form application, while also preserving workflow context. Using a graph built on Selenium test cases, we evaluate a random walk, a weighted walk, and model-weighted walks integrating logistic regression and XGBoost to compute the relevant probabilities. We find that the weighted walk discovers the most bugs while the model-weighted walk provides the most meaningful coverage
    corecore