
    Predictive Coding Techniques with Manual Review to Identify Privileged Documents in E-Discovery

    In twenty-first-century civil litigation, discovery focuses on the retrieval of electronically stored information. Lawsuits may be won or lost because of incorrect production of electronic evidence. As organizations generate fewer paper documents, the volume of electronic documents has grown many times over. Litigants face the task of searching millions of electronic records for responsive, non-privileged documents, making the e-discovery process burdensome and expensive. To ensure that material that must be withheld is not inadvertently revealed, electronic evidence found to be responsive to a production request is typically subjected to an exhaustive manual review for privilege. Although the budgetary constraints on review for responsiveness can be met to some degree using automation, attorneys have been hesitant to adopt similar technology to support the privilege review process. This dissertation draws attention to the potential for adopting predictive coding technology for the privilege review phase of the discovery process. Two main questions central to building a privilege classifier are addressed. The first asks which set of annotations can serve as a reliable basis for evaluation. The second asks which of the remaining annotations, when used for training classifiers, produce the best results. To answer them, binary classifiers are trained on labeled annotations from both junior and senior reviewers, and issues related to training bias and sample variance arising from reviewer expertise are thoroughly discussed. Results show that annotations that were randomly drawn and labeled by senior reviewers are useful for evaluation, while the remaining annotations can be used for classifier training. A research prototype is built to perform a user study in which privilege judgments are gathered from multiple lawyers using two user interfaces, one of which includes automatically generated features to aid the review process. The goal is to help lawyers make faster and more accurate privilege judgments. A significant improvement in recall was observed when users reviewed with the automatically generated annotations, and classifier features related to the people involved in privileged communications were found to be particularly important for the privilege review task. There was, however, no measurable change in review time. Because review cost is proportional to review time, this work finally introduces a semi-automated framework that aims to optimize the cost of the manual review process. The framework calls for litigants to make rational choices about what to review manually: documents are first automatically classified for responsiveness and privilege, and then some of the automatically classified documents are reviewed by human reviewers for responsiveness and for privilege, with the overall goal of minimizing the expected cost of the entire process, including costs that arise from incorrect decisions. A risk-based ranking algorithm determines which documents need to be manually reviewed, and multiple baselines are used to characterize the cost savings achieved by this approach. Although the work in this dissertation is applied to e-discovery, similar approaches could be applied to any setting in which retrieval systems must withhold a set of confidential documents despite their relevance to the request.
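
    The risk-based selection step lends itself to a small illustrative sketch: rank documents by the expected cost of accepting the automatic decision, then spend the manual-review budget on the riskiest ones first. The cost values, the decision rule, and the stopping criterion below are assumptions made for illustration only, not the dissertation's actual cost model.

        # Illustrative sketch (Python) of risk-based selection for manual review.
        # All cost figures and the decision rule are hypothetical assumptions.
        from dataclasses import dataclass

        @dataclass
        class Doc:
            doc_id: str
            p_responsive: float   # classifier estimate: document is responsive
            p_privileged: float   # classifier estimate: document is privileged

        COST_REVIEW = 1.0            # cost of manually reviewing one document
        COST_MISS_RESPONSIVE = 5.0   # cost of withholding a responsive, non-privileged document
        COST_LEAK_PRIVILEGED = 50.0  # cost of producing a privileged document

        def expected_error_cost(doc: Doc) -> float:
            """Expected cost of accepting the automatic decision without review."""
            will_produce = doc.p_responsive >= 0.5 and doc.p_privileged < 0.5
            if will_produce:
                # Risk: the document is actually privileged but gets produced.
                return doc.p_privileged * COST_LEAK_PRIVILEGED
            # Risk: a responsive, non-privileged document is withheld.
            return doc.p_responsive * (1.0 - doc.p_privileged) * COST_MISS_RESPONSIVE

        def select_for_review(docs: list[Doc], budget: float) -> list[Doc]:
            """Review the riskiest documents while review is expected to pay for itself."""
            ranked = sorted(docs, key=expected_error_cost, reverse=True)
            selected, spent = [], 0.0
            for doc in ranked:
                if spent + COST_REVIEW > budget or expected_error_cost(doc) <= COST_REVIEW:
                    break
                selected.append(doc)
                spent += COST_REVIEW
            return selected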

    Penetration Testing Frameworks and methodologies: A comparison and evaluation

    Cyber security is fast becoming a strategic priority for both governments and private organisations. With technology abundantly available, and with the unbridled growth in the size and complexity of information systems, cyber criminals have a multitude of targets. Cyber security assessments are therefore becoming common practice as concerns about information security grow. Penetration testing is one strategy used to mitigate the risk of cyber-attack: penetration testers attempt to compromise systems using the same tools and techniques as malicious attackers, and thus aim to identify vulnerabilities before an attack occurs. Penetration testing can be complex depending on the scope and domain under investigation, so it is often managed like a project, necessitating the use of a framework or methodology. An array of penetration testing methodologies and frameworks is available to facilitate such projects; however, determining what is a framework and what is a methodology in this context can be uncertain, and little exists in the way of mature frameworks whose quality can be measured. This research defines the concepts of “methodology” and “framework” within a penetration testing context. In addition, it presents a gap analysis of the theoretical versus the practical classification of nine penetration testing frameworks and/or methodologies, and subsequently selects two frameworks to undergo quality evaluation using a real-world case study. Quality characteristics were derived from a review of four quality models, building the foundation for a proposed penetration testing quality model, a modified version of an ISO quality model against which the two chosen frameworks were evaluated. Suitable definitions of methodology and framework for penetration testing purposes were formed by analysing the properties of each category, and a Framework vs. Methodology Characteristics matrix is presented. Extending this nomenclature resolution, a gap analysis was performed to determine whether a framework is actually a framework, i.e., whether it has a sound underlying ontology; in contrast, many “frameworks” appear to be simply collections of tools or techniques. Finally, two frameworks, OWASP’s Testing Guide and the Information System Security Assessment Framework (ISSAF), were employed to perform penetration tests based on a real-world case study, facilitating quality evaluation against the proposed quality model. The research suggests there are various ways in which the quality of penetration testing frameworks can be measured, and therefore concludes that quality evaluation is possible.

    Technical alignment

    This essay discusses the importance of infrastructure and testing in helping digital preservation services demonstrate reliability, transparency, and accountability. It encourages practitioners to build a strong culture in which transparency and collaboration across technical frameworks are valued highly. It also argues for devising and applying agreed-upon metrics that will enable the systematic analysis of preservation infrastructure. The essay begins by defining technical infrastructure and testing in the digital preservation context, presents case studies that exemplify both progress and challenges for technical alignment in both areas, and concludes with suggestions for achieving greater degrees of technical alignment going forward.

    A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids


    Behind the Scenes: On the Relationship Between Developer Experience and Refactoring

    Refactoring is widely recognized as one of the most efficient techniques for managing technical debt and maintaining a healthy software project by enforcing best design practices or coping with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system’s design and hold leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results by analyzing 800 open-source projects. We mine their refactoring activities and identify the corresponding contributors. Then, we associate an experience score with each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations, 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity. We found that (1) although refactoring is not restricted to a subset of developers, those with higher contribution scores tend to perform more refactorings than others; (2) while there is no correlation between experience and motivation behind refactoring, top contributing developers are found to perform a wider variety of refactoring operations, regardless of their complexity; and (3) top contributing developers tend to document their refactoring activity less. Our qualitative analysis of three randomly sampled projects shows that the developers responsible for the majority of refactoring activities typically hold advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to.
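
    As a concrete illustration of the analysis described above, the sketch below relates a per-contributor experience score to the number of refactoring operations mined for that contributor using a rank correlation. The toy data and the use of commit count as the experience proxy are assumptions for the example; the paper's actual scoring scheme is not reproduced here.

        # Illustrative sketch (Python): rank correlation between contributor
        # experience and mined refactoring counts. Data and proxy are hypothetical.
        from scipy.stats import spearmanr

        # contributor -> (commits, refactoring operations); toy values
        contributors = {
            "alice": (1250, 310),
            "bob":   (430, 95),
            "carol": (88, 12),
            "dave":  (2010, 540),
        }

        experience = [commits for commits, _ in contributors.values()]
        refactorings = [refs for _, refs in contributors.values()]

        rho, p_value = spearmanr(experience, refactorings)
        print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")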

    Framework for Security Transparency in Cloud Computing

    The migration of sensitive data and applications from the on-premise data centre to a cloud environment increases cyber risks to users, mainly because the cloud environment is managed and maintained by a third party. In particular, the partial surrender of sensitive data and applications to a cloud environment creates numerous concerns related to a lack of security transparency. Security transparency involves the disclosure of information by cloud service providers about the security measures put in place to protect assets and meet the expectations of customers. It establishes trust in the service relationship between cloud service providers and customers, and without evidence of continuous transparency, trust and confidence are affected and are likely to hinder extensive usage of cloud services. Insufficient security transparency is also considered an added level of risk, increasing the difficulty of demonstrating conformance to customer requirements and of ensuring that cloud service providers adequately implement their security obligations. The research community has acknowledged the pressing need to address security transparency concerns, and although technical aspects of ensuring security and privacy have been researched widely, the focus on security transparency is still scarce. The relatively few existing studies mostly approach the issue of security transparency from the cloud providers’ perspective, while other works have contributed feasible techniques for comparing and selecting cloud service providers using metrics such as transparency and trustworthiness. However, there is still a shortage of research that focuses on improving security transparency from the cloud users’ point of view. In particular, there is still a gap in the literature that (i) dissects security transparency from the lens of conceptual knowledge up to implementation from organizational and technical perspectives, and (ii) supports continuous transparency by enabling the vetting and probing of cloud service providers’ conformity to specific customer requirements. The significant growth in moving business to the cloud, due to its scalability and perceived effectiveness, underlines the dire need for research in this area. This thesis presents a framework that comprises the core conceptual elements that constitute security transparency in cloud computing. It contributes to the knowledge domain of security transparency in cloud computing as follows. Firstly, the research analyses the basics of cloud security transparency by exploring the notion and foundational concepts that constitute security transparency. Secondly, it proposes a framework that integrates various concepts from the requirements engineering domain, together with an accompanying process that can be followed to implement the framework. The framework and its process provide an essential set of conceptual ideas, activities and steps that can be followed at an organizational level to attain security transparency, based on the principles of industry standards and best practices. Thirdly, to ensure continuous transparency, the thesis proposes a tool that supports the collection and assessment of evidence from cloud providers, including the establishment of remedial actions for redressing deficiencies in cloud provider practices. The tool serves as a supplementary component of the proposed framework that enables continuous inspection of how predefined customer requirements are being satisfied. The thesis also validates the proposed security transparency framework and tool in terms of validity, applicability, adaptability, and acceptability using two different case studies. Feedback is collected from stakeholders and analysed against essential criteria such as ease of use, relevance, and usability. The results of the analysis illustrate the validity and acceptability of both the framework and the tool in enhancing security transparency in a real-world environment.
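
    The continuous-inspection idea behind the proposed tool can be sketched as a simple check of provider-supplied evidence against predefined customer requirements, pairing each deficiency with a remedial action. The requirement keys, expected values, and data layout below are assumptions made for illustration; the thesis defines its own data model and assessment process.

        # Illustrative sketch (Python): assessing provider evidence against
        # predefined customer requirements. Field names and values are hypothetical.
        REQUIREMENTS = {
            "encryption_at_rest": {
                "expected": "AES-256",
                "remedial_action": "Request that the provider enable AES-256 storage encryption",
            },
            "audit_log_retention_days": {
                "expected": 365,
                "remedial_action": "Negotiate extended log retention in the SLA",
            },
        }

        def assess(evidence: dict) -> list[dict]:
            """Return one finding per requirement that the evidence fails to satisfy."""
            findings = []
            for key, requirement in REQUIREMENTS.items():
                observed = evidence.get(key)
                if observed != requirement["expected"]:
                    findings.append({
                        "requirement": key,
                        "expected": requirement["expected"],
                        "observed": observed,
                        "remedial_action": requirement["remedial_action"],
                    })
            return findings

        # Example: evidence collected from a provider's self-assessment
        provider_evidence = {"encryption_at_rest": "AES-128", "audit_log_retention_days": 365}
        for finding in assess(provider_evidence):
            print(finding)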

    Constructing Educational Criticism Of Online Courses: A Model For Implementation By Practitioners

    Online courses are complex, human-driven contexts for formal learning. Little has been said about the environment that emerges from the interaction of instructor(s), learners, and other resources in such courses. Theories that focus on instructional settings, together with methods designed to accommodate inquiry into complex phenomena, are essential to the systematic study of online courses. Such a line of research is necessary as the basis for a common language with which we can begin to speak holistically about online courses. In this dissertation, I attempt to generate better questions about the nature of online instructional environments. By combining prior work on educational criticism and qualitative case study research with original innovations, I develop a model for studying the instructional experiences of online courses. I then apply this approach in the study of one specific online course at the University of Central Florida (UCF).

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines for translating data-driven findings into medical practice. This is accomplished chiefly by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of disease. Continuous advances in high-throughput technologies, coupled with recently promoted data-sharing policies, have produced a massive wealth of data with remarkable potential to improve human health care. In step with this boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data. The aim of this thesis was to develop novel statistical and machine learning tools, and to incorporate existing state-of-the-art methods, for analysing bio-clinical data with medical applications. The findings demonstrate the impact of computational approaches on clinical decision making by improving patient risk stratification and the prediction of disease outcomes. The thesis comprises five studies describing method development for 1) genomic data, 2) conventional clinical data, and 3) the integration of genomic and clinical data. For genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To prove the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC); in addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the contribution of genomic data to the predictive ability of clinical models: again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of such markers would increase the chance of early detection and improve prognosis assessment and treatment choice.
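
    A minimal sketch of an ensemble-based clinical risk model of the kind described above is given below, using a random forest evaluated with cross-validation. The features, cohort size, and outcome are synthetic placeholders, not the thesis's actual mCRPC or obesity models.

        # Illustrative sketch (Python): ensemble risk model on synthetic clinical data.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_patients = 200
        X = np.column_stack([
            rng.normal(65, 8, n_patients),     # age (synthetic)
            rng.normal(1.2, 0.4, n_patients),  # hypothetical lab marker
            rng.integers(0, 2, n_patients),    # treatment indicator
        ])
        y = rng.integers(0, 2, n_patients)     # synthetic binary outcome

        model = RandomForestClassifier(n_estimators=500, random_state=0)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"Cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")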

    Linked Research on the Decentralised Web

    This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving. The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities, and participation requires the use of proprietary systems. From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts, and at how they can be stored, linked and shared in a way that is controlled by individual researchers or delegated to trusted parties. Further, I find that the ecosystem has lacked a technical Web standard able to fulfill the awareness function of research communication. I therefore contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination and collaboration scenarios, as well as in archival activities and scientific experiments. Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers. The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data. Technical solutions alone do not suffice, of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud). Future challenges include addressing the social implications of decentralised Web publishing and the design of ethically grounded interoperable mechanisms; cultivating privacy-aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.
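
    To make the awareness mechanism concrete, the sketch below shows the sender side of Linked Data Notifications: discover the target's advertised inbox, then POST a JSON-LD notification to it. The URLs and payload are hypothetical, and only HTTP Link header discovery is shown; the W3C Recommendation also allows discovery via an ldp:inbox triple in the target's RDF body.

        # Illustrative sketch (Python) of an LDN sender. URLs and payload are hypothetical.
        import requests

        LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

        def discover_inbox(target_url: str) -> str | None:
            """Look for an ldp:inbox relation in the target's HTTP Link header."""
            response = requests.head(target_url)
            link = response.links.get(LDP_INBOX)
            return link["url"] if link else None

        def send_notification(target_url: str, notification: dict) -> int:
            inbox = discover_inbox(target_url)
            if inbox is None:
                raise RuntimeError("No LDN inbox advertised by the target")
            response = requests.post(
                inbox,
                json=notification,
                headers={"Content-Type": "application/ld+json"},
            )
            return response.status_code  # 201 Created means the notification was accepted

        # Hypothetical example: announcing an annotation of an article
        note = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": "Announce",
            "object": "https://example.org/annotations/42",
            "target": "https://example.org/articles/article-1",
        }
        # send_notification("https://example.org/articles/article-1", note)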