Can Common Crawl Reliably Track Persistent Identifier (PID) Use Over Time?
We report here on the results of two studies using two and four monthly web crawls respectively from the Common Crawl (CC) initiative between 2014 and 2017, whose initial goal was to provide empirical evidence for the changing patterns of use of so-called persistent identifiers. This paper focusses on the tooling needed for dealing with CC data, and the problems we found with it. The first study is based on URIs from pages crawled in April 2014 and April 2017; the second study adds further pages from the April 2015 and April 2016 crawls. We conclude with suggestions on specific actions needed to enable studies based on CC to give reliable longitudinal information.
Comment: 7 pages, 1 figure, submitted to TempWeb201
Towards Understanding Systems Through User Interactions
Modern computer systems are complex. Even in the best of conditions, it can be difficult to understand the behavior of the system and identify why certain actions are occurring. Existing systems attempt to provide insight by reviewing the effects of actions on the system and estimating their cause. As computer systems are strongly driven by actions of the user, we propose an approach to identify processes which have interacted with the user and provide data on which system behaviors were caused by the user. We implement three sensors within the graphical user interface capable of extracting the necessary information to identify these processes. We show our instrumentation is effective in characterizing applications with an on-screen presence, and provide data towards the determination of user intentions. We prove that our method for obtaining the information from the user interface can be performed efficiently with minimal overhead.
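The core idea of attributing system behavior to user interaction can be caricatured as follows. This is a toy model, not the paper's instrumentation: it assumes GUI-level sensors have already produced a stream of (pid, event) pairs, and the event names and threshold are illustrative.

```python
# Toy sketch: decide which processes are "user-driven" from a stream of
# GUI events attributed to process ids. Event vocabulary and the
# min_interactions threshold are made up for illustration.
from collections import Counter

def user_driven_processes(events, min_interactions=2):
    """Return pids whose windows received enough direct user events."""
    counts = Counter(pid for pid, event in events
                     if event in {"focus", "click", "keypress"})
    return {pid for pid, n in counts.items() if n >= min_interactions}
```

Processes outside the returned set would then be candidates for behavior that was *not* caused by the user.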
Proceedings of the 12th International Conference on Digital Preservation
The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase
A large-scale temporal measurement of Android malicious apps: persistence, migration, and lessons learned
CNS-2127232 - National Science Foundation. Accepted manuscript.
Linked Research on the Decentralised Web
This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving.
The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems.
From a technical standpoint, this thesis takes a deep look at semantic structure of research artifacts, and how they can be stored, linked and shared in a way that is controlled by individual researchers, or delegated to trusted parties. Further, I find that the ecosystem was lacking a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation) which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination or collaboration scenarios, as well as for archival activities and scientific experiments.
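In essence, LDN only requires a sender to POST a JSON-LD document to an inbox discovered from the target resource. A minimal sketch of building such a notification body follows; the vocabulary (ActivityStreams `Announce`) and the IRIs are illustrative choices, not mandated by the Recommendation:

```python
# Hedged sketch of an LDN notification payload. LDN itself is agnostic to
# the vocabulary; Announce/actor/object/target here are one common pattern.
import json

def build_announce(actor, object_iri, target_iri):
    """Build a minimal JSON-LD notification body for an LDN inbox."""
    return json.dumps({
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": actor,
        "object": object_iri,
        "target": target_iri,
    })

# A real sender would discover the receiver's ldp:inbox (via an HTTP Link
# header or the RDF body) and POST this payload there with
# Content-Type: application/ld+json.
```

This decoupling of sender, receiver and consumer is what lets independent applications interoperate over the same notifications.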
Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.
The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data.
Technical solutions alone do not suffice of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer-review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud).
Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication
Building information modeling – A game changer for interoperability and a chance for digital preservation of architectural data?
Digital data associated with the architectural design-and-construction process is an essential resource alongside, and even past, the lifecycle of the construction object it describes. Despite this, digital architectural data remains largely neglected in digital preservation research, and vice versa: digital preservation is so far neglected in the design-and-construction process. In the last 5 years, Building Information Modeling (BIM) has seen growing adoption in the architecture and construction domains, marking a large step towards much-needed interoperability. The open standard IFC (Industry Foundation Classes) is one way in which data is exchanged in BIM processes. This paper presents a first digital-preservation-based look at BIM processes, highlighting the history and adoption of the methods, as well as the open file format standard IFC as one way to store and preserve BIM data.
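IFC data is serialized in the STEP (ISO 10303-21) text format, which makes basic inspection, of the kind a preservation workflow might do for format characterization, straightforward. A rough sketch, with hypothetical sample entities:

```python
# Illustrative sketch: count IFC entity types in a STEP data section.
# Real IFC tooling (e.g. IfcOpenShell) does far more; this only shows
# how accessible the text serialization is to simple analysis.
import re
from collections import Counter

ENTITY_RE = re.compile(r"#(\d+)\s*=\s*(IFC[A-Z0-9]+)\s*\(")

def count_entity_types(step_text):
    """Count occurrences of each IFC entity type, e.g. IFCWALL."""
    return Counter(match.group(2) for match in ENTITY_RE.finditer(step_text))
```

Such an inventory of entity types is one plausible input to preservation planning for BIM models.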
Using the Web Infrastructure for Real Time Recovery of Missing Web Pages
Given the dynamic nature of the World Wide Web, missing web pages, or "404 Page Not Found" responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence it just needs to be (re-)discovered. We evaluate several methods for a just-in-time approach to web page preservation. We investigate the suitability of lexical signatures and web page titles to rediscover missing content. It is understood that web pages change over time, which implies that the performance of these two methods depends on the age of the content. We therefore conduct a temporal study of the decay of lexical signatures and titles and estimate their half-life. We further propose the use of tags that users have created to annotate pages as well as the most salient terms derived from a page's link neighborhood. We utilize the Memento framework to discover previous versions of web pages and to execute the above methods. We provide a workflow including a set of parameters that is most promising for the (re-)discovery of missing web pages. We introduce Synchronicity, a web browser add-on that implements this workflow. It works while the user is browsing and detects the occurrence of 404 errors automatically. When activated by the user, Synchronicity offers a total of six methods to either rediscover the missing page at its new URI or discover an alternative page that satisfies the user's information need. Synchronicity depends on user interaction, which enables it to provide results in real time.
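A lexical signature is essentially the handful of terms that best distinguish a page, typically ranked by TF-IDF against a background corpus, which can then be fed to a search engine to rediscover the moved content. A simplified sketch (the tokenisation and weighting here are illustrative; the thesis evaluates several signature variants):

```python
# Rough sketch: top-k TF-IDF terms of a page as its lexical signature.
import math
import re
from collections import Counter

def lexical_signature(page, corpus, k=5):
    """Return the k highest TF-IDF terms of `page` given a background corpus."""
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())
    docs = [set(tokenize(d)) for d in corpus]       # document term sets
    tf = Counter(tokenize(page))                    # term frequencies in page
    n = len(docs)
    def tfidf(term):
        df = sum(term in d for d in docs)           # document frequency
        return tf[term] * math.log((n + 1) / (df + 1))
    return sorted(tf, key=tfidf, reverse=True)[:k]
```

The decay the thesis measures corresponds to these top terms drifting as the page's content changes over time.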
Establishing cyber situational awareness in industrial control systems
The cyber threat to industrial control systems is an acknowledged security issue, but a qualified dataset to quantify the risk remains largely unavailable. Senior executives of facilities that operate these systems face competing requirements for investment budgets, but without an understanding of the nature of the threat, cyber security may not be a high priority. Operational managers and cyber incident responders at these facilities face a similarly complex situation. They must plan for the defence of critical systems, often unfamiliar to IT security professionals, from potentially capable, adaptable and covert antagonists who will actively attempt to evade detection. The scope of the challenge requires a coherent, enterprise-level awareness of the threat, such that organisations can assess their operational priorities, plan their defensive posture, and rehearse their responses prior to such an attack.
This thesis proposes a novel combination of concepts found in risk assessment, intrusion detection, education, exercising, safety and process models, fused with experiential learning through serious games. It progressively builds a common set of shared mental models across an ICS operation to frame the nature of the adversary and establish enterprise situational awareness that permeates through all levels of teams involved in addressing the threat. This is underpinned by a set of coping strategies that identifies probable targets for advanced threat actors, proactively determining antagonistic courses of action to derive an appropriate response strategy.
Analyzing & designing the security of shared resources on smartphone operating systems
Smartphone penetration surpassed 80% in the US and nears 70% in Western Europe. In fact, smartphones became the de facto devices users leverage to manage personal information and access external data and other connected devices on a daily basis. To support such multi-faceted functionality, smartphones are designed with a multi-process architecture, which enables third-party developers to build smartphone applications which can utilize smartphone internal and external resources to offer creative utility to users. Unfortunately, such third-party programs can exploit security inefficiencies in smartphone operating systems to gain unauthorized access to available resources, compromising the confidentiality of rich, highly sensitive user data.
The smartphone ecosystem is designed such that users can readily install and replace applications on their smartphones. This facilitates users' efforts in customizing the capabilities of their smartphones, tailored to their needs. Statistics report an increasing number of available smartphone applications: in 2017 there were approximately 3.5 million third-party apps on the official application store of the most popular smartphone platform. In addition, we expect users to have approximately 95 such applications installed on their smartphones at any given point. However, mobile apps are developed by untrusted sources. On Android, which enjoys 80% of the smartphone OS market share, application developers are identified based on self-signed certificates. Thus there is no good way of holding a developer accountable for malicious behavior. This creates an issue of multi-tenancy on smartphones, where principals from diverse untrusted sources share internal and external smartphone resources. Smartphone OSs rely on traditional operating system process isolation strategies to confine untrusted third-party applications. However, this approach is insufficient because incidental, seemingly harmless resources can be utilized by untrusted tenants as side-channels to bypass the process boundaries. Smartphones also introduced a permission model to allow their users to govern third-party application access to system resources (such as camera, microphone and location functionality). However, this permission model is both coarse-grained and does not distinguish whether a permission has been declared by a trusted or an untrusted principal. This allows malicious applications to perform privilege escalation attacks on the mobile platform. To make things worse, applications might include third-party libraries, for advertising or common recognition tasks. Such libraries share the process address space with their host apps and as such inherit all the privileges the host app does.
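The in-process library problem can be captured in a toy model: because grants are made per application (per UID), every library bundled into an app sees exactly the host's privilege set. The class and permission names below are made up for illustration and do not model any real Android API:

```python
# Toy model of coarse-grained, per-application permission grants:
# an in-process third-party library inherits the host app's privileges.

class App:
    def __init__(self, name, permissions, libraries=()):
        self.name = name
        self.permissions = set(permissions)   # granted to the whole process
        self.libraries = list(libraries)      # code sharing that process

def library_privileges(app):
    """Privileges visible to each bundled library: identical to the host's."""
    return {lib: set(app.permissions) for lib in app.libraries}
```

Fine-grained access control would instead key the grant on the principal (app code vs. each library) rather than on the process as a whole.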
Identifying and mitigating these problems on smartphones is not a trivial process. Manual analysis on its own of all mobile apps is cumbersome and impractical, code analysis techniques suffer from scalability and coverage issues, ad-hoc approaches are impractical and susceptible to mistakes, while sometimes vulnerabilities are well hidden at the interplays between smartphone tenants and resources.
In this work I follow an analytical approach to discover major security and privacy issues on smartphone platforms. I utilize the Android OS as a use case, because of its open-source nature but also its popularity. In particular I focus on the multi-tenancy characteristic of smartphones and identify the resources each tenant within a process, across processes and across devices can access. I design analytical tools to automate the discovery process, attacks to better understand the adversary models, and introduce design changes to the participating systems to enable robust fine-grained access control of resources. My approach revealed a new understanding of the threats introduced from third-party libraries within an application process; it revealed new capabilities of the mobile application adversary exploiting shared filesystem and permission resources; and it showed how a mobile app adversary can exploit shared communication mediums to compromise the confidentiality of the data collected by external devices (e.g. fitness and medical accessories, NFC tags, etc.). Moreover, I show how we can eradicate these problems following an architectural design approach to introduce backward-compatible, effective and efficient modifications in operating systems to achieve fine-grained application access to shared resources. My work has led to security changes in the official release of Android by Google.