301 research outputs found

    HathiTrust Research Center: Computational Research on the HathiTrust Repository

    PIs (executive management team): Beth A. Plale, Indiana University; Marshall Scott Poole, University of Illinois Urbana-Champaign; Robert McDonald (IU); John Unsworth (UIUC).
    Senior investigators: Loretta Auvil (UIUC); Johan Bollen (IU); Randy Butler (UIUC); Dennis Cromwell (IU); Geoffrey Fox (IU); Eileen Julien (IU); Stacy Kowalczyk (IU); Danny Powell (UIUC); Beth Sandore (UIUC); Craig Stewart (IU); John Towns (UIUC); Carolyn Walters (IU); Michael Welge (UIUC); Eric Wernert (IU).

    Proceedings of the ECSCW'95 Workshop on the Role of Version Control in CSCW Applications

    The workshop entitled "The Role of Version Control in Computer Supported Cooperative Work Applications" was held on September 10, 1995 in Stockholm, Sweden, in conjunction with the ECSCW'95 conference. Version control, the ability to manage relationships between successive instances of artifacts, organize those instances into meaningful structures, and support navigation and other operations on those structures, is an important problem in CSCW applications. It has long been recognized as a critical issue for inherently cooperative tasks such as software engineering, technical documentation, and authoring. The primary challenge for versioning in these areas is to support opportunistic, open-ended design processes, which require preserving historical perspectives in the design process, reusing previous designs, and exploiting alternative designs. The primary goal of this workshop was to bring together a diverse group of individuals interested in examining the role of versioning in Computer Supported Cooperative Work. Participation was encouraged from members of the research community currently investigating the versioning process in CSCW, as well as from application designers and developers who are familiar with the real-world requirements for versioning in CSCW. Both groups were represented at the workshop, resulting in an exchange of ideas and information that helped familiarize developers with the most recent research results in the area and provide researchers with an updated view of the needs and challenges faced by application developers. In preparing for this workshop, the organizers were able to build upon the results of their previous workshop, "The Workshop on Versioning in Hypertext", held in conjunction with the ECHT'94 conference. The following section of this report contains a summary in which the workshop organizers report the major results of the workshop. The summary is followed by a section that contains the position papers accepted to the workshop. The position papers provide more detailed information describing the recent research efforts of the workshop participants, as well as the current challenges being encountered in the development of CSCW applications. A list of workshop participants is provided at the end of the report. The organizers would like to thank all of the participants for their contributions, which were, of course, vital to the success of the workshop. We would also like to thank the ECSCW'95 conference organizers for providing a forum in which this workshop was possible.
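
    For readers new to the topic, the core notion the workshop revolves around (successive instances of artifacts organized into meaningful structures, with support for alternative designs and their reconciliation) can be captured in a few lines of code. The sketch below is purely illustrative and is not taken from any workshop contribution:

```python
# A minimal sketch (hypothetical, not from any workshop paper) of a
# version graph: each instance of an artifact records its parent versions,
# so alternative designs live on branches and history is preserved.
from dataclasses import dataclass

@dataclass
class Version:
    content: str
    parents: tuple = ()   # empty for the root; two parents after a merge
    note: str = ""

class VersionedArtifact:
    def __init__(self, content: str):
        self.versions = [Version(content, note="initial")]

    def derive(self, parent: int, content: str, note: str = "") -> int:
        """Create a successor of any earlier version (supports alternatives)."""
        self.versions.append(Version(content, (parent,), note))
        return len(self.versions) - 1

    def merge(self, a: int, b: int, content: str) -> int:
        """Record the reconciliation of two alternative designs."""
        self.versions.append(Version(content, (a, b), note="merge"))
        return len(self.versions) - 1

    def history(self, idx: int):
        """Walk back to the root along first parents (historical view)."""
        while True:
            v = self.versions[idx]
            yield idx, v.note
            if not v.parents:
                return
            idx = v.parents[0]

doc = VersionedArtifact("draft A")
alt = doc.derive(0, "draft B", note="alternative design")
rev = doc.derive(0, "draft A.1", note="revision")
final = doc.merge(alt, rev, "final draft")
assert [i for i, _ in doc.history(final)] == [final, alt, 0]
```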

    SDSF: social-networking-trust-based distributed data storage and co-operative information fusion

    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. This data can be stored on external hard drives, on unused space in peer-to-peer (P2P) networks, or, in the currently more popular approach, in the Cloud. When users store their data in the Cloud, the entire data set is exposed to the administrators of the services, who can view and possibly misuse it. With the growing popularity and usage of Cloud storage services like Google Drive and Dropbox, concerns about privacy and security are increasing. Given the rate of data generation, searching for content or documents in this distributed stored data is a big challenge. Information fusion is used to extract information based on the user's query and to combine the data to learn useful information. This problem is challenging when the data sources are distributed and heterogeneous in nature and the trustworthiness of the documents varies. This thesis proposes two innovative solutions to these problems. First, to address the security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust-based co-operative Information Fusion framework (SDSF). The main objective is to create a framework that provides a secure storage system without overloading a single system, using a P2P-like approach. This framework allows users to share storage resources among friends and acquaintances without compromising security or privacy, while enjoying all the benefits that Cloud storage offers. The system fragments the data and encodes it to store it securely on the unused storage capacity of the data owner's friends' resources. The system thus gives the user centralized control over the selection of peers to store the data. Second, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query used to retrieve and combine the data, improving the accuracy and efficiency of information fusion over the heterogeneous, distributed, and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when trust scores are used while retrieving the data sources.

    The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and, through the introduction of redundancy, resolves issues resulting from device failures. Second, we exploit the inherent concept of trust embedded in social networks to determine the nodes and build a secure network where the fragmented data should be stored, since a social network consists of friends, family, and acquaintances. The trust between friends and the availability of devices allow the user to make an informed choice about where the information should be stored, using 'k' optimal paths. Third, to retrieve this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract documents based on context rather than just a bag of words, also considering the trust score), and MapReduce (NSM) algorithms. Lastly, we evaluate the performance of SDSF's distributed storage using erasure coding, identify the social storage providers based on trust, and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems. The system built on the SDSF framework thus implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of both. The multi-layered encryption ensures that no other user, including the system administrators, can decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion, since a large number of genuine documents is retrieved for fusion.
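
    The erasure-coding step can be illustrated with the simplest possible scheme: k data fragments plus one XOR parity fragment (RAID-5 style), which survives the loss of any single fragment. The thesis does not prescribe this particular scheme; a real deployment would use a stronger code such as Reed-Solomon to tolerate multiple peer failures. A minimal Python sketch:

```python
# Minimal single-parity erasure coding sketch (illustrative assumption,
# not the thesis' actual scheme).
def encode(data: bytes, k: int):
    """Split data into k padded fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    frags = [bytearray(data[i * size:(i + 1) * size].ljust(size, b"\0"))
             for i in range(k)]
    parity = bytearray(size)
    for frag in frags:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return frags + [parity]

def recover(frags, lost: int):
    """Rebuild the single fragment at index `lost` by XOR-ing the survivors."""
    size = len(next(f for i, f in enumerate(frags) if i != lost))
    out = bytearray(size)
    for i, frag in enumerate(frags):
        if i == lost:
            continue
        for j, b in enumerate(frag):
            out[j] ^= b
    return out

shards = encode(b"hello world", k=4)   # store each shard on a different peer
shards[1] = None                       # simulate one failed/unavailable peer
shards[1] = recover(shards, lost=1)
# A real system would store the original length instead of stripping pads.
assert b"".join(bytes(f) for f in shards[:4]).rstrip(b"\0") == b"hello world"
```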

    AUTOMATED NETWORK SECURITY WITH EXCEPTIONS USING SDN

    Campus networks have recently experienced a proliferation of devices ranging from personal-use devices (e.g. smartphones, laptops, tablets) to special-purpose network equipment (e.g. firewalls, network address translation boxes, network caches, load balancers, virtual private network servers, and authentication servers), as well as special-purpose systems (badge readers, IP phones, cameras, location trackers, etc.). To establish directives and regulations regarding the ways in which these heterogeneous systems are allowed to interact with each other and the network infrastructure, organizations typically appoint policy writing committees (PWCs) to create acceptable use policy (AUP) documents describing the rules and behavioral guidelines that all campus network interactions must abide by. While users are the audience for the AUP documents produced by an organization's PWC, network administrators are the responsible party enforcing the contents of such policies, using low-level CLI instructions and configuration files that are typically difficult to understand and make it almost impossible to show that they do, in fact, enforce the AUPs. In other words, mapping the contents of imprecise, unstructured sentences into technical configurations is a challenging task that relies on the interpretation and expertise of the network operator carrying out the policy enforcement. Moreover, there are multiple places where policy enforcement can take place. For example, policies governing servers (e.g., web, mail, and file servers) are often encoded into the server's configuration files. However, from a security perspective, conflating policy enforcement with server configuration is a dangerous practice, because minor server misconfigurations could open up avenues for security exploits. On the other hand, policies that are enforced in the network tend to change rarely over time and are often based on one-size-fits-all policies that can severely limit the fast-paced dynamics of emerging research workflows found in campus networks. This dissertation addresses the above problems by leveraging recent advances in Software-Defined Networking (SDN) to build systems that enable novel in-network approaches to enforcing an organization's network security policies. Namely, we introduce PoLanCO, a human-readable yet technically precise policy language that serves as a middle ground between the imprecise statements found in AUPs and the technical low-level mechanisms used to implement them. Real-world examples show that PoLanCO is capable of implementing a wide range of policies found in campus networks. In addition, we present the concept of Network Security Caps, an enforcement layer that separates server/device functionality from policy enforcement. A Network Security Cap intercepts packets coming from, and going to, servers and ensures policy compliance before allowing network devices to process packets using the traditional forwarding mechanisms. Lastly, we propose the on-demand security exceptions model to cope with the dynamics of emerging research workflows that are not suited to a one-size-fits-all security approach. In the proposed model, network users and providers establish trust relationships that can be used to temporarily bypass the policy compliance checks applied to general-purpose traffic -- typically performed by network appliances that carry out Deep Packet Inspection, thereby creating network bottlenecks. We describe the components of a prototype exception system, as well as experiments showing that, through short-lived exceptions, researchers can realize significant improvements for their special-purpose traffic.
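
    The interplay of a Network Security Cap and on-demand exceptions can be sketched as follows. All names and the policy API below are hypothetical, invented for illustration; this is neither PoLanCO nor the dissertation's prototype:

```python
# Illustrative sketch: packets pass a policy check, or a temporary
# trust-based exception, before the server's normal forwarding path
# may process them.
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int

class SecurityCap:
    """Keeps policy enforcement separate from server configuration."""
    def __init__(self):
        self.rules = []          # (predicate, allow) pairs; first match wins
        self.exceptions = set()  # (src, dst) flows with a trust-based bypass

    def add_rule(self, predicate, allow: bool):
        self.rules.append((predicate, allow))

    def grant_exception(self, src_ip: str, dst_ip: str):
        """On-demand exception; a real system would attach an expiry time."""
        self.exceptions.add((src_ip, dst_ip))

    def admit(self, pkt: Packet) -> bool:
        if (pkt.src_ip, pkt.dst_ip) in self.exceptions:
            return True          # trusted research flow bypasses the checks
        for predicate, allow in self.rules:
            if predicate(pkt):
                return allow
        return False             # default-deny

# Policy: only the campus subnet may reach the mail server on port 25.
cap = SecurityCap()
cap.add_rule(lambda p: p.dst_port == 25 and p.src_ip.startswith("10.0."), True)
assert cap.admit(Packet("10.0.3.7", "10.0.0.5", 25))
assert not cap.admit(Packet("203.0.113.9", "10.0.0.5", 25))
cap.grant_exception("203.0.113.9", "10.0.0.5")   # temporary trusted bypass
assert cap.admit(Packet("203.0.113.9", "10.0.0.5", 25))
```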

    Enhancing the Programmability of Cloud Object Storage

    In a world that is increasingly dependent on technology, digital data is generated at an unprecedented scale. This leads companies that require large storage space, such as Netflix or Dropbox, to use cloud object storage solutions, mainly thanks to their built-in characteristics of simplicity, scalability, and high availability. However, cloud object stores face three main challenges: 1) Flexible management of multi-tenant workloads. Commonly, cloud object stores are multi-tenant systems, meaning that all tenants share the same system resources, which can lead to interference problems. Furthermore, it is complex to manage heterogeneous storage policies in them at massive scale. 2) Data self-management. Cloud object stores do not offer much flexibility regarding data self-management by tenants. Typically, they are rigid systems, which prevents tenants from handling the specific requirements of their objects. 3) Elastic computation close to the data. Placing computations close to the data can be useful to reduce data transfers, but the challenge is how to achieve elasticity in those computations without provoking resource contention and interference in the storage layer.

    In this thesis, we present three novel research contributions that solve these challenges. First, we introduce the first Software-Defined Storage (SDS) architecture for cloud object stores that separates the control plane from the data plane, allowing multi-tenant workloads to be managed in a flexible and dynamic way, for example by applying different bandwidth service levels to different tenants. Second, we design a novel policy abstraction called the microcontroller, which transforms common objects into smart objects, enabling tenants to programmatically manage their behavior; for example, a content-level access control microcontroller attached to a specific object can filter its content depending on who is accessing it. Finally, we present the first elastic data-driven serverless computing platform, which mitigates the resource contention problem of placing computation close to the data.
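
    The microcontroller abstraction, including the content-level access control example mentioned above, can be sketched in a few lines. The on_get hook and every class name below are invented here for illustration and are not the platform's actual API:

```python
# Minimal sketch of the "microcontroller" policy abstraction.
class Microcontroller:
    def on_get(self, requester: str, content: str) -> str:
        return content  # default: pass content through unchanged

class RedactSalaries(Microcontroller):
    """Content-level access control: hide salary lines from non-HR users."""
    def on_get(self, requester, content):
        if requester.startswith("hr/"):
            return content
        return "\n".join(line for line in content.splitlines()
                         if "salary" not in line.lower())

class SmartObject:
    """A stored object whose behavior tenants can program."""
    def __init__(self, content: str):
        self.content = content
        self.microcontrollers = []

    def attach(self, mc: Microcontroller):
        self.microcontrollers.append(mc)

    def get(self, requester: str) -> str:
        data = self.content
        for mc in self.microcontrollers:  # policies run as a pipeline
            data = mc.on_get(requester, data)
        return data

obj = SmartObject("name: Ada\nsalary: 120000")
obj.attach(RedactSalaries())
assert "salary" in obj.get("hr/alice")      # HR sees the full object
assert "salary" not in obj.get("eng/bob")   # others get filtered content
```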

    The technology of casually connected collaboration

    Since the early eighties, researchers have been studying the use of technology that supports collaboration among co-workers and group members. This field of computer science became known as Computer Supported Cooperative Work (CSCW). With the advent of wireless and mobile Internet communication technologies, research in the CSCW field has focused on providing "access, anytime and anywhere". The main contribution of this study is to introduce and analyze the technology required to support casually connected collaboration. Firstly, we define casually connected collaboration as having "access, anytime and anywhere" to collaborators and resources without having explicit control or knowledge over the environment and its technical abilities. In order to distinguish between connected, mobile, and casually connected collaboration, we introduce a conceptual model of collaboration that unpacks the term "access, anytime and anywhere". We then aim to prove the soundness of our model by using it to classify some well-known collaboration scenarios. Furthermore, by evaluating the functional and non-functional requirements for a casually connected collaboration solution, we argue that current commercial and CSCW research implementations do not sufficiently meet these demands. We then present Nomad: a peer-to-peer framework specifically designed to overcome the challenges encountered in casually connected collaboration. We study the technology requirements and highlight the implementation details that enabled us to conform to the requirements set by casually connected collaboration. Finally, we pave the way for future work by investigating new features introduced in the Microsoft .NET Framework version 4.0 and Visual Studio 2010, and language enhancements made to C# version 4.0. Dissertation (MSc), University of Pretoria, 2009.

    Dynamic Upgrade of Distributed Software Components

    Upgrading complex telecommunication software systems, characterized by their inherent distribution and a very high cost of system unavailability, is a difficult and error-prone maintenance activity. Even more challenging are software upgrades that must not compromise system availability. Dynamic upgrade is a maintenance technique that automates performing and managing upgrades so that the software system remains operational during the upgrade. In this thesis, the dynamic upgrade is considered a special case of software deployment in which a running service is replaced with its new version. The problems of dynamic upgrades are introduced using a novel taxonomy that classifies the design issues to be solved when building support for dynamic upgrade with regard to three system aspects: deployment, evolution, and dependability; the taxonomy also provides a reference for comparing systems that support dynamic upgrades. An extensive and thorough survey of existing approaches to dynamic upgrades follows and serves as a starting point for designing a solution supporting dynamic upgrades in distributed component-based software systems.

    Derived from the problem analysis, a model called the Deployment and Upgrade Facility, describing the capabilities needed for managing and performing dynamic upgrades as well as the properties of upgradable software components, is developed using the Unified Process approach. The model is platform independent and can be used with a range of underlying middleware technologies. The model is implemented in a Java-based prototypical framework and extended with platform-specific mechanisms on top of the JGroup/ARM middleware. The framework captures common design solutions and patterns for building support for dynamic upgrade. It allows for controlling the life cycle and coordination of upgrade processes in the system, and it defines a number of supporting mechanisms and algorithms for the upgrade process, which may need to run with different objectives and under different constraints. Special attention is given to an upgrade algorithm for replicated software components, which achieves a synergy of replication techniques and dynamic upgrade. The developed framework is used to validate the feasibility of the approach and to measure, in a number of practical experiments, the overhead of the mechanisms supporting dynamic upgrade with regard to the performance of the system being upgraded. This quantitative evaluation of the experiments leads to the specification of a simple benchmark for comparing systems that support dynamic upgrades.
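
    The flavor of an upgrade algorithm for replicated components can be conveyed with a generic rolling-upgrade sketch: replicas are replaced one at a time so that the service never loses more than one member. Nothing below is JGroup/ARM's actual API, and the real algorithm coordinates group membership and state transfer far more carefully:

```python
# Hypothetical rolling upgrade of replicated components (illustration only).
class Replica:
    def __init__(self, name: str, version: str):
        self.name, self.version = name, version
        self.active = False     # a replica joins the service group explicitly
        self.state: dict = {}

    def activate(self):
        self.active = True

    def quiesce(self):
        # Stop accepting new requests and drain in-flight ones (elided here).
        self.active = False

    def transfer_state_to(self, other: "Replica"):
        other.state = dict(self.state)

def rolling_upgrade(replicas, new_version: str):
    """Replace replicas one at a time; the rest of the group keeps serving."""
    upgraded = []
    for old in replicas:
        new = Replica(old.name, new_version)
        old.quiesce()                # old replica leaves the group
        old.transfer_state_to(new)   # hand over application state
        new.activate()               # new version rejoins the group
        upgraded.append(new)
    return upgraded

group = [Replica(f"r{i}", "1.0") for i in range(3)]
for r in group:
    r.activate()
group = rolling_upgrade(group, "2.0")
assert all(r.version == "2.0" and r.active for r in group)
```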

    Improving reproducibility and reuse of modelling results in the life sciences

    Research results are complex and include a variety of heterogeneous data. This entails major computational challenges: (i) managing simulation studies, (ii) ensuring model exchangeability, stability, and validity, and (iii) fostering communication between partners. I describe techniques to improve the reproducibility and reuse of modelling results. First, I introduce a method to characterise differences between computational models. Second, I present approaches to obtain shareable and reproducible research results. Altogether, my methods and tools foster the exchange and reuse of modelling results; my implementations have been successfully integrated into international applications and promote the sharing of scientific results.
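
    One simple way to characterise differences between two versions of a computational model, sketched here as an illustration rather than the thesis' actual algorithm, is to diff their entity sets (e.g. reactions or species) and report changed attributes:

```python
# Hypothetical model diff: each model maps entity id -> attribute dict.
def model_diff(old: dict, new: dict):
    """Report entities added, removed, and changed between two models."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {eid: {k: (old[eid].get(k), new[eid].get(k))
                     for k in set(old[eid]) | set(new[eid])
                     if old[eid].get(k) != new[eid].get(k)}
               for eid in set(old) & set(new)
               if old[eid] != new[eid]}
    return {"added": added, "removed": removed, "changed": changed}

v1 = {"R1": {"rate": "k1*S"}, "R2": {"rate": "k2*P"}}
v2 = {"R1": {"rate": "k1*S*E"}, "R3": {"rate": "k3*Q"}}
d = model_diff(v1, v2)
assert d["added"] == ["R3"] and d["removed"] == ["R2"]
assert d["changed"]["R1"]["rate"] == ("k1*S", "k1*S*E")
```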