67 research outputs found

    Can Common Crawl Reliably Track Persistent Identifier (PID) Use Over Time

    We report here on the results of two studies using two and four monthly web crawls respectively from the Common Crawl (CC) initiative between 2014 and 2017, whose initial goal was to provide empirical evidence for the changing patterns of use of so-called persistent identifiers. This paper focusses on the tooling needed for dealing with CC data, and the problems we found with it. The first study is based on over $10^{12}$ URIs from over $5 \times 10^9$ pages crawled in April 2014 and April 2017, the second study adds a further $3 \times 10^9$ pages from the April 2015 and April 2016 crawls. We conclude with suggestions on specific actions needed to enable studies based on CC to give reliable longitudinal information. Comment: 7 pages, 1 figure, submitted to TempWeb201

    Towards Understanding Systems Through User Interactions

    Modern computer systems are complex. Even in the best of conditions, it can be difficult to understand the behavior of the system and identify why certain actions are occurring. Existing systems attempt to provide insight by reviewing the effects of actions on the system and estimating their cause. As computer systems are strongly driven by actions of the user, we propose an approach to identify processes which have interacted with the user and provide data on which system behaviors were caused by the user. We implement three sensors within the graphical user interface capable of extracting the necessary information to identify these processes. We show our instrumentation is effective in characterizing applications with an on-screen presence, and provide data towards the determination of user intentions. We prove that our method for obtaining the information from the user interface can be done in an efficient manner with minimal overheads.

    Proceedings of the 12th International Conference on Digital Preservation

    The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase

    Linked Research on the Decentralised Web

    This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving. The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems. From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts, and how they can be stored, linked and shared in a way that is controlled by individual researchers, or delegated to trusted parties. Further, I find that the ecosystem was lacking a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination or collaboration scenarios, as well as for archival activities and scientific experiments. Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers. The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data. Technical solutions alone do not suffice of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer-review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud). Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy-aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.

    Building information modeling – A game changer for interoperability and a chance for digital preservation of architectural data?

    Digital data associated with the architectural design-and-construction process is an essential resource alongside, and even past, the lifecycle of the construction object it describes. Despite this, digital architectural data remains largely neglected in digital preservation research and, vice versa, digital preservation is so far neglected in the design-and-construction process. In the last 5 years, Building Information Modeling (BIM) has seen growing adoption in the architecture and construction domains, marking a large step towards much-needed interoperability. The open standard IFC (Industry Foundation Classes) is one way in which data is exchanged in BIM processes. This paper presents a first digital-preservation-based look at BIM processes, highlighting the history and adoption of the methods as well as the open file format standard IFC as one way to store and preserve BIM data.

    Using the Web Infrastructure for Real Time Recovery of Missing Web Pages

    Given the dynamic nature of the World Wide Web, missing web pages, or 404 Page not Found responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence it just needs to be (re-)discovered. We evaluate several methods for a just-in-time approach to web page preservation. We investigate the suitability of lexical signatures and web page titles to rediscover missing content. It is understood that web pages change over time, which implies that the performance of these two methods depends on the age of the content. We therefore conduct a temporal study of the decay of lexical signatures and titles and estimate their half-life. We further propose the use of tags that users have created to annotate pages as well as the most salient terms derived from a page's link neighborhood. We utilize the Memento framework to discover previous versions of web pages and to execute the above methods. We provide a workflow including a set of parameters that is most promising for the (re-)discovery of missing web pages. We introduce Synchronicity, a web browser add-on that implements this workflow. It works while the user is browsing and detects the occurrence of 404 errors automatically. When activated by the user, Synchronicity offers a total of six methods to either rediscover the missing page at its new URI or discover an alternative page that satisfies the user's information need. Synchronicity depends on user interaction, which enables it to provide results in real time.

    Establishing cyber situational awareness in industrial control systems

    The cyber threat to industrial control systems is an acknowledged security issue, but a qualified dataset to quantify the risk remains largely unavailable. Senior executives of facilities that operate these systems face competing requirements for investment budgets, but without an understanding of the nature of the threat, cyber security may not be a high priority. Operational managers and cyber incident responders at these facilities face a similarly complex situation. They must plan for the defence of critical systems, often unfamiliar to IT security professionals, from potentially capable, adaptable and covert antagonists who will actively attempt to evade detection. The scope of the challenge requires a coherent, enterprise-level awareness of the threat, such that organisations can assess their operational priorities, plan their defensive posture, and rehearse their responses prior to such an attack. This thesis proposes a novel combination of concepts found in risk assessment, intrusion detection, education, exercising, safety and process models, fused with experiential learning through serious games. It progressively builds a common set of shared mental models across an ICS operation to frame the nature of the adversary and establish enterprise situational awareness that permeates through all levels of teams involved in addressing the threat. This is underpinned by a set of coping strategies that identifies probable targets for advanced threat actors, proactively determining antagonistic courses of action to derive an appropriate response strategy.

    Analyzing & designing the security of shared resources on smartphone operating systems

    Smartphone penetration surpassed 80% in the US and nears 70% in Western Europe. In fact, smartphones became the de facto devices users leverage to manage personal information and access external data and other connected devices on a daily basis. To support such multi-faceted functionality, smartphones are designed with a multi-process architecture, which enables third-party developers to build smartphone applications which can utilize smartphone internal and external resources to offer creative utility to users. Unfortunately, such third-party programs can exploit security inefficiencies in smartphone operating systems to gain unauthorized access to available resources, compromising the confidentiality of rich, highly sensitive user data. The smartphone ecosystem is designed such that users can readily install and replace applications on their smartphones. This facilitates users' efforts in customizing the capabilities of their smartphones to their needs. Statistics report an increasing number of available smartphone applications: in 2017 there were approximately 3.5 million third-party apps on the official application store of the most popular smartphone platform. In addition, we expect users to have approximately 95 such applications installed on their smartphones at any given point. However, mobile apps are developed by untrusted sources. On Android, which enjoys 80% of the smartphone OS market share, application developers are identified based on self-signed certificates. Thus there is no good way of holding a developer accountable for malicious behavior. This creates an issue of multi-tenancy on smartphones where principals from diverse untrusted sources share internal and external smartphone resources. Smartphone OSs rely on traditional operating system process isolation strategies to confine untrusted third-party applications. However, this approach is insufficient because incidental, seemingly harmless resources can be utilized by untrusted tenants as side-channels to bypass the process boundaries. Smartphones also introduced a permission model to allow their users to govern third-party application access to system resources (such as camera, microphone and location functionality). However, this permission model is coarse-grained and does not distinguish whether a permission has been declared by a trusted or an untrusted principal. This allows malicious applications to perform privilege escalation attacks on the mobile platform. To make things worse, applications might include third-party libraries, for advertising or common recognition tasks. Such libraries share the process address space with their host apps and as such can inherit all the privileges the host app does. Identifying and mitigating these problems on smartphones is not a trivial process. Manual analysis on its own of all mobile apps is cumbersome and impractical, code analysis techniques suffer from scalability and coverage issues, ad-hoc approaches are impractical and susceptible to mistakes, while sometimes vulnerabilities are well hidden at the interplays between smartphone tenants and resources. In this work I follow an analytical approach to discover major security and privacy issues on smartphone platforms. I utilize the Android OS as a use case, because of its open-source nature but also its popularity. In particular I focus on the multi-tenancy characteristic of smartphones and identify the resources each tenant within a process, across processes and across devices can access. I design analytical tools to automate the discovery process, attacks to better understand the adversary models, and introduce design changes to the participating systems to enable robust fine-grained access control of resources. My approach revealed a new understanding of the threats introduced from third-party libraries within an application process; it revealed new capabilities of the mobile application adversary exploiting shared filesystem and permission resources; and it shows how a mobile app adversary can exploit shared communication mediums to compromise the confidentiality of the data collected by external devices (e.g. fitness and medical accessories, NFC tags etc.). Moreover, I show how we can eradicate these problems following an architectural design approach to introduce backward-compatible, effective and efficient modifications in operating systems to achieve fine-grained application access to shared resources. My work has led to security changes in the official release of Android by Google.