
    FrozenNode: Static Linking of Node.js Applications

    Web applications are ubiquitous. Accessible from almost anywhere, web applications support multiple platforms and can be easily customized. Most people interact with web applications daily for social media, communication, research, purchases, etc. Node.js has gained popularity as a platform for web applications. As a server-side JavaScript runtime, Node.js allows both the front end and the back end to be written in JavaScript. Node.js offers many features, such as dynamic inclusion of other modules through a built-in function named require, which locates and loads code at run time. To be effective, web applications must perform actions quickly while avoiding unexpected interruptions. However, dynamically linked libraries can cause delays, and thus downtime, because dynamically linked code must load multiple files, often from disk. Because disk I/O is among the slowest operations a computer performs, seeking files on disk can degrade performance and make the server feel less responsive to users. Dynamically linked code can also break when an underlying library is updated. Normally, developers use test servers when updating a server; however, if a developer accidentally updates a library in a dynamically linked system, it may become incompatible with another portion of the program. Statically linked code is more reliable and loads faster than dynamically linked code. The static linking process varies by programming language, so different static linkers must be developed for different languages. This thesis describes the creation of a static linker, called FrozenNode, for the popular back-end platform Node.js. FrozenNode resolves a Node.js application into a single file that does not rely on dynamic libraries. FrozenNode was built on top of Closure Compiler to accurately process JavaScript. We found that the resolved application was faster and self-contained, yielding significant advantages over the dynamically loaded application, while producing identical output. Vulnerabilities in web applications can be found using static analysis tools; however, such tools must reason about dynamically linked applications. FrozenNode can be used to statically link a Node.js application before it is analyzed by a JavaScript static analysis tool.
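
    As a rough illustration of what source-level static linking of require() calls involves, the following sketch (in Python for brevity; the regex, module paths, and wrapping scheme are illustrative assumptions, not FrozenNode's actual Closure-Compiler-based algorithm) inlines statically resolvable requires into one self-contained file:

        # Conceptual sketch only: inline require("...") calls with literal,
        # relative arguments into a single source file. Handles only local
        # .js modules and assumes there are no circular requires.
        import re
        from pathlib import Path

        REQUIRE_RE = re.compile(r"""require\(\s*['"]([./\w-]+)['"]\s*\)""")

        def bundle(entry: Path, seen=None) -> str:
            """Recursively replace statically resolvable require() calls with an
            immediately-invoked wrapper containing the required module's source."""
            seen = seen if seen is not None else {}
            source = entry.read_text()

            def inline(match):
                name = match.group(1)
                target = (entry.parent / name).with_suffix(".js")
                if name not in seen:
                    seen[name] = bundle(target, seen)   # resolve transitively
                # Wrap the module body so its exports become the expression value.
                return ("(function(){var module={exports:{}};var exports=module.exports;\n"
                        + seen[name]
                        + "\nreturn module.exports;})()")

            return REQUIRE_RE.sub(inline, source)

        if __name__ == "__main__":
            print(bundle(Path("app.js")))   # emits one file with no dynamic loads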

    Software Engineering with Incomplete Information

    Information may be the common currency of the universe, the stuff of creation. As the physicist John Wheeler claimed, we get "it from bit". Measuring information, however, is a hard problem. Knowing the meaning of information is a hard problem. Directing the movement of information is a hard problem. This hardness arises when our information about information is incomplete. Yet we need to offer decision-making guidance, to the computer or the developer, in the face of this incompleteness. This work addresses that insufficiency within the universe of software engineering. The thesis addresses the first problem by demonstrating that obtaining the relative magnitude of information flow is computationally less expensive than an exact measurement. We propose ranked information flow, or RIF, where different flows are ordered according to their FlowForward, a new measure designed for ease of ordering. To demonstrate the utility of FlowForward, we introduce information contour maps: heat-mapped call graphs of information flow within software. These maps serve multiple engineering uses, such as security and refactoring. By combining a type system with RIF, we address the problem of meaning. Information security is a common concern in software engineering. We present OaST, the world's first gradual security type system that replaces dynamic monitoring with information-theoretic risk assessment. OaST contextualises FlowForward within a formally verified framework: secure program components communicate over insecure channels ranked by how much information flows through them. This context helps the developer interpret the flows and enables security policy discovery, adaptation and refactoring. Finally, we introduce safestrings, a type-based system for controlling how the information embedded within a string moves through a program. This takes a structural approach, whereby a string subtype is a more precise, information-limited subset of string, i.e. a string that contains an email address rather than anything else.
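
    A rough analogue of the safestrings idea described above, sketched in Python (the thesis's actual system and its interface are not reproduced here): a string subtype whose values are guaranteed to be email addresses, so the type itself limits what information can flow through it:

        # Illustrative only: a str subtype that admits only email-shaped values.
        import re

        class EmailAddress(str):
            """A str subtype whose values are guaranteed to look like email addresses."""
            _PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified, assumed pattern

            def __new__(cls, value: str) -> "EmailAddress":
                if not cls._PATTERN.match(value):
                    raise ValueError(f"not an email address: {value!r}")
                return super().__new__(cls, value)

        def send_report(recipient: EmailAddress) -> None:
            # A static type checker flags callers that pass a plain str here,
            # so arbitrary attacker-controlled strings cannot flow into this sink.
            print(f"sending report to {recipient}")

        send_report(EmailAddress("alice@example.org"))   # ok
        # EmailAddress("rm -rf /")  would raise ValueError at construction time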

    Building a Typed Scripting Language

    Since the 1990s, scripting languages (e.g. Python, Ruby, JavaScript, and many others) have gained widespread popularity. Features such as ad-hoc data manipulation, dynamic structural typing, and terse syntax permit rapid engineering and improve developer productivity. Unfortunately, programs written in scripting languages execute slower and are less scalable than those written in traditional languages (such as C or Java) due to the challenge of statically analyzing scripting languages' semantics. Although various research projects have made progress on this front, corner cases in the semantics of existing scripting languages continue to defy static analysis and software engineers must generally still choose between program performance and programmer performance when selecting a language. We address that dichotomy in this dissertation by designing a scripting language with the intent of statically analyzing it. We select a set of core primitives in which common language features such as object-orientation and case analysis can be encoded and give a sound and decidable type inference system for it. Our type theory is based on subtype constraint systems but is also closely related to abstract interpretation; we use this connection to guide development of the type system and to employ a novel type soundness proof strategy based on simulation. At the heart of our approach is a type indexed record we call the onion which supports asymmetric concatenation and dispatch; we use onions to formally encode a variety of features, including records, operator overloading, objects, and mixins. An optimistic call-site polymorphism model defined herein captures the ad-hoc, case-analysis-based reasoning often used in scripting languages. Although the language in this dissertation uses a particular set of core primitives, the strategy we use to design it is general: we demonstrate a simple, formulaic process for adding features such as integers and state
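
    The following toy sketch (illustrative only; the real onion calculus, its primitives, and which operand takes priority on concatenation are defined formally in the dissertation) shows a type-indexed record with asymmetric concatenation and dispatch by type:

        # A toy, type-indexed record in the spirit of the dissertation's onions.
        class Onion:
            def __init__(self, *components):
                self._by_type = {}
                for c in components:
                    self._by_type.setdefault(type(c), c)   # first occurrence wins here

            def __and__(self, other: "Onion") -> "Onion":
                # Asymmetric concatenation: components of `self` take priority over
                # same-typed components of `other` (an assumption of this sketch).
                merged = Onion()
                merged._by_type = {**other._by_type, **self._by_type}
                return merged

            def project(self, t: type):
                """Dispatch: select the component indexed by type t."""
                return self._by_type[t]

        point = Onion(3, "label") & Onion(2.5)
        print(point.project(int), point.project(str), point.project(float))   # 3 label 2.5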

    A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

    The rapid growth in data volume and complexity has necessitated the adoption of advanced technologies to extract valuable insights for decision-making. This project aims to address this need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. Therefore, a Big Data architecture is implemented for a real use case of the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution for processing and analysing World Bank data. The implemented framework shows improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing the World Bank data. The project findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights. The application of these techniques contributes to the field by demonstrating the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.
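
    A minimal sketch of the Extract, Transform and Load step described above, using PySpark with Hive support; the file path, column names, and target table are illustrative assumptions rather than the project's actual schema:

        # Extract raw indicator data, clean it, and persist it as a Hive table
        # on which a Kylin cube could then be built.
        from pyspark.sql import SparkSession, functions as F

        spark = (SparkSession.builder
                 .appName("worldbank-etl")
                 .enableHiveSupport()          # lets Spark write Hive tables
                 .getOrCreate())

        # Extract: load an assumed CSV export of World Bank indicator data.
        raw = spark.read.csv("hdfs:///data/worldbank/indicators.csv",
                             header=True, inferSchema=True)

        # Transform: keep well-formed rows and normalise column names/types.
        clean = (raw
                 .filter(F.col("value").isNotNull())
                 .withColumnRenamed("Country Name", "country")
                 .withColumn("year", F.col("year").cast("int")))

        # Load: persist as a Hive table for downstream multidimensional modelling.
        clean.write.mode("overwrite").saveAsTable("worldbank.indicators_clean")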

    Code-Injection Vulnerabilities in Web Applications, Using the Example of Cross-site Scripting

    The majority of all security problems in today's Web applications is caused by string-based code injection, with Cross-site Scripting (XSS) being the dominant representative of this vulnerability class. This thesis discusses XSS and suggests defense mechanisms. We do so in three stages: First, we conduct a thorough analysis of JavaScript's capabilities and explain how these capabilities are utilized in XSS attacks. We subsequently design a systematic, hierarchical classification of XSS payloads. In addition, we present a comprehensive survey of publicly documented XSS payloads, structured according to our proposed classification scheme. Secondly, we explore defensive mechanisms which dynamically prevent the execution of some payload types without eliminating the actual vulnerability. More specifically, we discuss the design and implementation of countermeasures against the XSS payloads "Session Hijacking", "Cross-site Request Forgery", and attacks that target intranet resources. We build upon this and introduce a general methodology for developing such countermeasures: through an analysis of the targeted payload type, we determine a necessary set of basic capabilities an adversary needs for successfully executing an attack. The resulting countermeasure relies on revoking one of these capabilities, which in turn renders the payload infeasible. Finally, we present two language-based approaches that prevent XSS and related vulnerabilities: We identify the implicit mixing of data and code during string-based syntax assembly as the root cause of string-based code injection attacks. Consequently, we explore data/code separation in web applications. For this purpose, we propose a novel methodology for token-level data/code partitioning of a computer language's syntactical elements. This forms the basis for our two distinct techniques: For one, we present an approach to detect data/code confusion at run time and demonstrate how this can be used for attack prevention. Furthermore, we show how vulnerabilities can be avoided by altering the underlying programming language. We introduce a dedicated datatype for syntax assembly instead of using string datatypes themselves for this purpose. We develop a formal, type-theoretical model of the proposed datatype and prove that it provides reliable separation between data and code, hence preventing code injection vulnerabilities. We verify our approach's applicability using a practical implementation for the J2EE application server.
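
    To illustrate the dedicated-datatype idea, the following sketch (in Python rather than the thesis's J2EE setting, and not its actual implementation) builds markup from typed pieces so that untrusted data can enter only as data, never as code:

        # Illustrative data/code separation: markup is assembled from typed
        # fragments, and untrusted input is escaped at its single entry point.
        import html

        class HtmlMarkup:
            def __init__(self, fragment: str = ""):
                self._code = fragment            # trusted syntactic content only

            @staticmethod
            def text(untrusted: str) -> "HtmlMarkup":
                # Data enters through exactly one door, and is escaped there.
                return HtmlMarkup(html.escape(untrusted, quote=True))

            def __add__(self, other: "HtmlMarkup") -> "HtmlMarkup":
                if not isinstance(other, HtmlMarkup):
                    raise TypeError("raw strings cannot be concatenated into markup")
                return HtmlMarkup(self._code + other._code)

            def render(self) -> str:
                return self._code

        comment = "<script>steal(document.cookie)</script>"
        page = HtmlMarkup("<p>") + HtmlMarkup.text(comment) + HtmlMarkup("</p>")
        print(page.render())   # the injected script is rendered as inert text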

    Security Analysis of Software-Intensive Critical Embedded Systems by Abstract Interpretation

    This thesis is dedicated to the analysis of low-level software, such as operating systems, by abstract interpretation. Analyzing OSes is a crucial issue for guaranteeing the safety of software systems, since they are the layer immediately above the hardware and all applicative tasks rely on them. For critical applications, we want to prove that the OS does not crash and that it ensures the isolation of programs, so that an untrusted program cannot disrupt a trusted one. The analysis of this kind of program raises specific issues, because OSes must control hardware using instructions that are meaningless in ordinary programs. In addition, because hardware features are outside the scope of C, the source code includes assembly blocks mixed with C code. These are the two main axes of this thesis: handling mixed C and assembly, and precisely abstracting instructions that are specific to low-level software. This work is motivated by the analysis of a case study from an industrial partner, which required implementing the proposed methods in the static analyzer Astrée. The first part formalizes a language mixing simplified models of C and assembly, from syntax to semantics. This specification is crucial to define what is legal and what is a bug, while taking into account the intricacy of the interactions between C and assembly, in terms of both data flow and control flow. The second part is a short introduction to abstract interpretation, focusing on what is useful thereafter. The third part proposes an abstraction of the semantics of mixed C and assembly; it is in fact a series of parametric abstractions, each handling one aspect of the semantics. The fourth part addresses the abstraction of instructions specific to low-level software. The properties of interest can easily be proven using ghost variables, but for technical reasons it is difficult to design a reduced product of abstract domains that handles ghost variables satisfactorily. This part builds such a general framework, together with domains that allow us to solve our problem and many others. The final part details properties that remain to be proven in order to guarantee the isolation of programs; these have not been treated here, since they raise many complicated questions. We also give some suggestions for improving the product of domains with ghost variables introduced in the previous part, in terms of both features and performance.
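
    As a flavour of the abstract-interpretation machinery involved (the thesis's parametric domains and ghost-variable product are far richer), a minimal interval domain with an abstract operation, join, and widening might look like:

        # A minimal abstract domain (intervals) illustrating the core operations
        # an analyzer such as Astrée composes; far simpler than the thesis's domains.
        from dataclasses import dataclass
        import math

        @dataclass(frozen=True)
        class Interval:
            lo: float
            hi: float

            def add(self, other: "Interval") -> "Interval":
                # Abstract counterpart of concrete addition.
                return Interval(self.lo + other.lo, self.hi + other.hi)

            def join(self, other: "Interval") -> "Interval":
                # Over-approximate the union of two sets of concrete values.
                return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

            def widen(self, other: "Interval") -> "Interval":
                # Jump unstable bounds to infinity so loop analysis terminates.
                return Interval(self.lo if other.lo >= self.lo else -math.inf,
                                self.hi if other.hi <= self.hi else math.inf)

        x = Interval(0, 0)
        x = x.join(x.add(Interval(1, 1)))      # one loop iteration: x in [0, 1]
        x = x.widen(x.add(Interval(1, 1)))     # widening: x in [0, +inf)
        print(x)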

    Efficient processing of large-scale spatio-temporal data

    Millions of location-aware devices, such as mobile phones, cars, and environmental sensors, constantly report their positions, often in combination with a timestamp and further payload data, to a server for different kinds of analyses. While the location information of the devices and the reported events is represented as points and polygons, raster data is another type of spatial data, produced for example by cameras and sensors. This big spatio-temporal data needs to be processed on scalable platforms such as Hadoop and Apache Spark, which, however, are unaware of, e.g., spatial neighborhood, making certain queries practically impossible to execute efficiently. The repeated executions of analysis programs during development and by different users result in long execution times and potentially high costs in rented clusters, which can be reduced by reusing commonly computed intermediate results. Within this thesis, we tackle the two challenges described above. First, we present the STARK framework for processing spatio-temporal vector and raster data on the Apache Spark stack.
For its operators, we identify several possible algorithms and study how they can benefit from the underlying platform's properties. We further investigate how indexes can be realized in the distributed and parallel architecture of Big Data processing engines and compare methods for data partitioning, which differ in how well they cope with data skew and data set size. Furthermore, we present an approach to reduce the amount of data to process at the operator level as early as possible. In order to reduce execution times, we introduce an approach to transparently recycle intermediate results of dataflow programs, based on operator costs. To compute these costs, we instrument the programs with profiling code that gathers the execution time and result size of each operator. In the evaluation, we first compare the various implementation and configuration possibilities in STARK and identify scenarios in which, and how, partitioning and indexing should be applied. We further compare STARK to related systems and show that we achieve significantly better execution times, not only when exploiting existing partitioning information. In the second part of the evaluation, we show that the transparent cost-based materialization and recycling of intermediate results can reduce program execution times significantly.
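
    A deliberately simplified stand-in for the kind of cost-based materialization decision described above (STARK's actual decision model is more elaborate; the throughput figures are assumptions): materialize an intermediate result only if the time saved over its expected reuses exceeds the cost of writing and re-reading it:

        # Decide whether to materialize an intermediate result based on profiled
        # operator cost, result size, and expected reuse.
        from dataclasses import dataclass

        @dataclass
        class OperatorProfile:
            compute_seconds: float      # measured execution time up to this operator
            result_bytes: int           # measured size of the intermediate result

        def should_materialize(profile: OperatorProfile,
                               expected_reuses: int,
                               write_mb_per_s: float = 100.0,
                               read_mb_per_s: float = 200.0) -> bool:
            size_mb = profile.result_bytes / 1e6
            write_cost = size_mb / write_mb_per_s
            read_cost = size_mb / read_mb_per_s
            saved = expected_reuses * (profile.compute_seconds - read_cost)
            return saved > write_cost

        # Example: a 2 GB result that takes 300 s to recompute and is expected
        # to be reused three times is clearly worth materializing.
        print(should_materialize(OperatorProfile(300.0, 2_000_000_000), expected_reuses=3))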

    The Appsmiths: Community, Identity, Affect and Ideology among Cocoa Developers from NeXT to iPhone

    This dissertation is an ethnographic study, accomplished through semi-structured interviews and participant observation, of the cultural world of third-party Apple software developers who use Apple's Cocoa libraries to create apps. It answers the questions: what motivates Apple developers' devotion to Cocoa technology, and why do they believe it is a superior programming environment? What does it mean to be a "good" Cocoa programmer, technically and morally, in the Cocoa community of practice, and how do people become one? I argue that in this culture, ideologies, normative values, identities, affects, and practices interact with each other and with Cocoa technology in a seamless web, which I call a "techno-cultural frame." This frame includes the construction of a developer's identity as a vocational craftsman, and a utopian vision of software being developed by millions of small-scale freelance developers, or "indies," rather than corporations. This artisanal production is made possible by the productivity gains of Cocoa technology, which ironically makes indies dependent on Apple for tools. This contradiction is reconciled through quasi-religious narratives about Apple and Steve Jobs, which enroll developers into seeing themselves as partners in a shared mission with Apple to empower users with technology. Although Cocoa helps make software production easier, it is not a deskilling technology but requires extensive learning, because its design heavily incorporates patterns unfamiliar to many programmers. These concepts can only be understood holistically after learning has been achieved, which means that learners must undergo a process of conversion in their mindset. This involves learning to trust that Cocoa will benefit developers before they fully understand it. Such technical and normative lessons occur at sites where Cocoa is taught, such as the training company Big Nerd Ranch. Sharing of technical knowledge and normative practices also occurs in the Cocoa community, online through blog posts, at local club meetings, and at conferences such as Apple's WWDC, which help to enroll developers into the Cocoa techno-cultural frame. Apple's relationship with developers is symbiotic but asymmetrical; despite Apple's coercive power, members of the Cocoa community can influence Apple's policies.

    Community memories for sustainable societies: The case of environmental noise

    Sustainability is the main challenge faced by humanity today, on global and local scales. Most environmental problems can be seen as the tragic overexploitation of a commons. In this dissertation we investigate how the latest developments within computer science and ICT can be applied to establish participatory, low-cost tools and practices that enable citizens to monitor, raise awareness about, and contribute to the sustainable management of the commons they rely on, and thereby protect or improve their quality of life. As a general approach we propose the use of community memories – as central data repositories and points of interaction for community members and other stakeholders – and the novel combination of participatory mobile sensing and social tagging – as a low-cost means to collect quantitative and qualitative data about the state of the commons and the health, well-being, behaviour and opinion of those that depend on it. Through applied, interdisciplinary research we develop a concrete solution for a specific, socially relevant problem, namely that of environmental noise, commonly referred to as noise pollution. Under the name NoiseTube we present an operational system that enables a participatory, low-cost approach to the assessment of environmental noise and its impact on citizens' quality of life. This approach can be applied in the scope of citizen- or authority-led initiatives. The NoiseTube system consists of a sensing application – which turns mobile phones into sound level meters and allows users to comment on their experience via social tagging – and a community memory – which aggregates and processes data collected by participants anywhere. The system supports, and has been tested and deployed at, different levels of scale: personal, group and mass sensing. Since May 2009 NoiseTube has been used by hundreds, if not thousands, of people all around the world, allowing us to draw lessons regarding the feasibility of different deployment, collaboration and coordination scenarios for participatory sensing in general. While similar systems have been proposed, ours is the most complete and most widely used participatory noise mapping solution to date. Our validation experiments demonstrate that the accuracy of mobile phones as sound level meters can be brought to an acceptable level through calibration and statistical reasoning. Through coordinated NoiseTube campaigns with volunteering citizens we establish that participatory noise mapping is a suitable alternative to, or a valuable complement to, conventional methods applied by authorities.
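
    An illustrative sketch of the calibration step mentioned above (assumed sample data; not NoiseTube's published calibration procedure): fit a linear correction that maps raw phone readings onto reference sound level meter readings taken side by side:

        # Fit a per-device linear correction from paired phone/reference readings.
        import numpy as np

        phone_db     = np.array([48.0, 55.0, 62.0, 70.0, 78.0])   # raw phone readings
        reference_db = np.array([52.0, 60.0, 66.5, 75.0, 83.0])   # calibrated meter readings

        slope, offset = np.polyfit(phone_db, reference_db, deg=1)

        def calibrate(raw_db: float) -> float:
            """Correct a raw phone reading using the fitted linear model."""
            return slope * raw_db + offset

        print(f"corrected estimate for a raw reading of 65.0 dB: {calibrate(65.0):.1f}")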