
    Predictive Coding Techniques with Manual Review to Identify Privileged Documents in E-Discovery

    In twenty-first-century civil litigation, discovery focuses on the retrieval of electronically stored information. Lawsuits may be won or lost because of incorrect production of electronic evidence. As organizations generate fewer paper documents, the volume of electronic documents has grown many times over. Litigants face the task of searching millions of electronic records for responsive, non-privileged documents, making the e-discovery process burdensome and expensive. To ensure that material that must be withheld is not inadvertently revealed, electronic evidence found to be responsive to a production request is typically subjected to an exhaustive manual review for privilege. Although the budgetary constraints on review for responsiveness can be met to some degree using automation, attorneys have been hesitant to adopt similar technology to support the privilege review process. This dissertation draws attention to the potential for adopting predictive coding technology for the privilege review phase of the discovery process. Two main questions central to building a privilege classifier are addressed. The first asks which set of annotations can serve as a reliable basis for evaluation. The second asks which of the remaining annotations, when used for training classifiers, produce the best results. To answer them, binary classifiers are trained on labeled annotations from both junior and senior reviewers, and issues related to training bias and sample variance arising from reviewer expertise are thoroughly discussed. Results show that annotations that were randomly drawn and labeled by senior reviewers are useful for evaluation, while the remaining annotations can be used for classifier training. A research prototype is built to perform a user study in which privilege judgments are gathered from multiple lawyers using two user interfaces, one of which includes automatically generated features to aid the review process. The goal is to help lawyers make faster and more accurate privilege judgments. A significant improvement in recall was observed when users reviewed with the automatically generated annotations, and classifier features related to the people involved in privileged communications were found to be particularly important for the privilege review task. There was, however, no measurable change in review time. Because review cost is proportional to review time, this work finally introduces a semi-automated framework that aims to optimize the cost of the manual review process. The framework calls for litigants to make rational choices about what to review manually: documents are first automatically classified for responsiveness and privilege, and then some of the automatically classified documents are reviewed by human reviewers for responsiveness and for privilege, with the overall goal of minimizing the expected cost of the entire process, including costs that arise from incorrect decisions. A risk-based ranking algorithm determines which documents need to be manually reviewed, and multiple baselines are used to characterize the cost savings achieved by this approach. Although the work in this dissertation is applied to e-discovery, similar approaches could be applied to any setting in which retrieval systems must withhold a set of confidential documents despite their relevance to the request.
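
    The risk-based selection step lends itself to a small illustrative sketch: rank documents by the expected cost of accepting the automatic decision, then spend the manual-review budget on the riskiest ones first. The cost values, the decision rule, and the stopping criterion below are assumptions made for illustration only, not the dissertation's actual cost model.

        # Illustrative sketch (Python) of risk-based selection for manual review.
        # All cost figures and the decision rule are hypothetical assumptions.
        from dataclasses import dataclass

        @dataclass
        class Doc:
            doc_id: str
            p_responsive: float   # classifier estimate: document is responsive
            p_privileged: float   # classifier estimate: document is privileged

        COST_REVIEW = 1.0            # cost of manually reviewing one document
        COST_MISS_RESPONSIVE = 5.0   # cost of withholding a responsive, non-privileged document
        COST_LEAK_PRIVILEGED = 50.0  # cost of producing a privileged document

        def expected_error_cost(doc: Doc) -> float:
            """Expected cost of accepting the automatic decision without review."""
            will_produce = doc.p_responsive >= 0.5 and doc.p_privileged < 0.5
            if will_produce:
                # Risk: the document is actually privileged but gets produced.
                return doc.p_privileged * COST_LEAK_PRIVILEGED
            # Risk: a responsive, non-privileged document is withheld.
            return doc.p_responsive * (1.0 - doc.p_privileged) * COST_MISS_RESPONSIVE

        def select_for_review(docs: list[Doc], budget: float) -> list[Doc]:
            """Review the riskiest documents while review is expected to pay for itself."""
            ranked = sorted(docs, key=expected_error_cost, reverse=True)
            selected, spent = [], 0.0
            for doc in ranked:
                if spent + COST_REVIEW > budget or expected_error_cost(doc) <= COST_REVIEW:
                    break
                selected.append(doc)
                spent += COST_REVIEW
            return selected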

    Penetration Testing Frameworks and methodologies: A comparison and evaluation

    Cyber security is fast becoming a strategic priority for both governments and private organisations. With technology abundantly available, and with the unbridled growth in the size and complexity of information systems, cyber criminals have a multitude of targets. Cyber security assessments are therefore becoming common practice as concerns about information security grow. Penetration testing is one strategy used to mitigate the risk of cyber-attack: penetration testers attempt to compromise systems using the same tools and techniques as malicious attackers, and thus aim to identify vulnerabilities before an attack occurs. Penetration testing can be complex depending on the scope and domain under investigation, so it is often managed like a project, necessitating the use of a framework or methodology. An array of penetration testing methodologies and frameworks is available to facilitate such projects; however, determining what is a framework and what is a methodology in this context can be uncertain, and little exists in the way of mature frameworks whose quality can be measured. This research defines the concepts of “methodology” and “framework” within a penetration testing context. In addition, it presents a gap analysis of the theoretical versus the practical classification of nine penetration testing frameworks and/or methodologies, and subsequently selects two frameworks to undergo quality evaluation using a real-world case study. Quality characteristics were derived from a review of four quality models, building the foundation for a proposed penetration testing quality model, a modified version of an ISO quality model against which the two chosen frameworks were evaluated. Suitable definitions of methodology and framework for penetration testing purposes were formed by analysing the properties of each category, and a Framework vs. Methodology Characteristics matrix is presented. Extending this nomenclature resolution, a gap analysis was performed to determine whether a framework is actually a framework, i.e., whether it has a sound underlying ontology; in contrast, many “frameworks” appear to be simply collections of tools or techniques. Finally, two frameworks, OWASP’s Testing Guide and the Information System Security Assessment Framework (ISSAF), were employed to perform penetration tests based on a real-world case study, facilitating quality evaluation against the proposed quality model. The research suggests there are various ways in which the quality of penetration testing frameworks can be measured, and therefore concludes that quality evaluation is possible.

    Technical alignment

    This essay discusses the importance of infrastructure and testing in helping digital preservation services demonstrate reliability, transparency, and accountability. It encourages practitioners to build a strong culture in which transparency and collaboration across technical frameworks are valued highly. It also argues for devising and applying agreed-upon metrics that will enable the systematic analysis of preservation infrastructure. The essay begins by defining technical infrastructure and testing in the digital preservation context, presents case studies that exemplify both progress and challenges for technical alignment in both areas, and concludes with suggestions for achieving greater degrees of technical alignment going forward.

    A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids


    Behind the Scenes: On the Relationship Between Developer Experience and Refactoring

    Refactoring is widely recognized as one of the most efficient techniques for managing technical debt and maintaining a healthy software project by enforcing best design practices or coping with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system’s design and hold leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results by analyzing 800 open-source projects. We mine their refactoring activities and identify the corresponding contributors. Then, we associate an experience score with each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations, 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity. We found that (1) although refactoring is not restricted to a subset of developers, those with higher contribution scores tend to perform more refactorings than others; (2) while there is no correlation between experience and motivation behind refactoring, top contributing developers are found to perform a wider variety of refactoring operations, regardless of their complexity; and (3) top contributing developers tend to document their refactoring activity less. Our qualitative analysis of three randomly sampled projects shows that the developers responsible for the majority of refactoring activities typically hold advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to.
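
    As a concrete illustration of the analysis described above, the sketch below relates a per-contributor experience score to the number of refactoring operations mined for that contributor using a rank correlation. The toy data and the use of commit count as the experience proxy are assumptions for the example; the paper's actual scoring scheme is not reproduced here.

        # Illustrative sketch (Python): rank correlation between contributor
        # experience and mined refactoring counts. Data and proxy are hypothetical.
        from scipy.stats import spearmanr

        # contributor -> (commits, refactoring operations); toy values
        contributors = {
            "alice": (1250, 310),
            "bob":   (430, 95),
            "carol": (88, 12),
            "dave":  (2010, 540),
        }

        experience = [commits for commits, _ in contributors.values()]
        refactorings = [refs for _, refs in contributors.values()]

        rho, p_value = spearmanr(experience, refactorings)
        print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")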

    Framework for Security Transparency in Cloud Computing

    The migration of sensitive data and applications from the on-premise data centre to a cloud environment increases cyber risks to users, mainly because the cloud environment is managed and maintained by a third party. In particular, the partial surrender of sensitive data and applications to a cloud environment creates numerous concerns related to a lack of security transparency. Security transparency involves the disclosure of information by cloud service providers about the security measures put in place to protect assets and meet the expectations of customers. It establishes trust in the service relationship between cloud service providers and customers, and without evidence of continuous transparency, trust and confidence are affected and are likely to hinder extensive usage of cloud services. Insufficient security transparency is also considered an added level of risk, increasing the difficulty of demonstrating conformance to customer requirements and of ensuring that cloud service providers adequately implement their security obligations. The research community has acknowledged the pressing need to address security transparency concerns, and although technical aspects of ensuring security and privacy have been researched widely, the focus on security transparency is still scarce. The relatively few existing studies mostly approach the issue of security transparency from the cloud providers’ perspective, while other works have contributed feasible techniques for comparing and selecting cloud service providers using metrics such as transparency and trustworthiness. However, there is still a shortage of research that focuses on improving security transparency from the cloud users’ point of view. In particular, there is still a gap in the literature that (i) dissects security transparency from the lens of conceptual knowledge up to implementation from organizational and technical perspectives, and (ii) supports continuous transparency by enabling the vetting and probing of cloud service providers’ conformity to specific customer requirements. The significant growth in moving business to the cloud, due to its scalability and perceived effectiveness, underlines the dire need for research in this area. This thesis presents a framework that comprises the core conceptual elements that constitute security transparency in cloud computing. It contributes to the knowledge domain of security transparency in cloud computing as follows. Firstly, the research analyses the basics of cloud security transparency by exploring the notion and foundational concepts that constitute security transparency. Secondly, it proposes a framework that integrates various concepts from the requirements engineering domain, together with an accompanying process that can be followed to implement the framework. The framework and its process provide an essential set of conceptual ideas, activities and steps that can be followed at an organizational level to attain security transparency, based on the principles of industry standards and best practices. Thirdly, to ensure continuous transparency, the thesis proposes a tool that supports the collection and assessment of evidence from cloud providers, including the establishment of remedial actions for redressing deficiencies in cloud provider practices. The tool serves as a supplementary component of the proposed framework that enables continuous inspection of how predefined customer requirements are being satisfied. The thesis also validates the proposed security transparency framework and tool in terms of validity, applicability, adaptability, and acceptability using two different case studies. Feedback is collected from stakeholders and analysed against essential criteria such as ease of use, relevance, and usability. The results of the analysis illustrate the validity and acceptability of both the framework and the tool in enhancing security transparency in a real-world environment.
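
    The continuous-inspection idea behind the proposed tool can be sketched as a simple check of provider-supplied evidence against predefined customer requirements, pairing each deficiency with a remedial action. The requirement keys, expected values, and data layout below are assumptions made for illustration; the thesis defines its own data model and assessment process.

        # Illustrative sketch (Python): assessing provider evidence against
        # predefined customer requirements. Field names and values are hypothetical.
        REQUIREMENTS = {
            "encryption_at_rest": {
                "expected": "AES-256",
                "remedial_action": "Request that the provider enable AES-256 storage encryption",
            },
            "audit_log_retention_days": {
                "expected": 365,
                "remedial_action": "Negotiate extended log retention in the SLA",
            },
        }

        def assess(evidence: dict) -> list[dict]:
            """Return one finding per requirement that the evidence fails to satisfy."""
            findings = []
            for key, requirement in REQUIREMENTS.items():
                observed = evidence.get(key)
                if observed != requirement["expected"]:
                    findings.append({
                        "requirement": key,
                        "expected": requirement["expected"],
                        "observed": observed,
                        "remedial_action": requirement["remedial_action"],
                    })
            return findings

        # Example: evidence collected from a provider's self-assessment
        provider_evidence = {"encryption_at_rest": "AES-128", "audit_log_retention_days": 365}
        for finding in assess(provider_evidence):
            print(finding)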

    Constructing Educational Criticism Of Online Courses: A Model For Implementation By Practitioners

    Online courses are complex, human-driven contexts for formal learning. Little has been said about the environment that emerges from the interaction of instructor(s), learners, and other resources in such courses. Theories that focus on instructional settings, together with methods designed to accommodate inquiry into complex phenomena, are essential to the systematic study of online courses. Such a line of research is necessary as the basis for a common language with which we can begin to speak holistically about online courses. In this dissertation, I attempt to generate better questions about the nature of online instructional environments. By combining prior work on educational criticism and qualitative case study research with original innovations, I develop a model for studying the instructional experiences of online courses. I then apply this approach in the study of one specific online course at the University of Central Florida (UCF).

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines for translating data-driven findings into medical practice. This is accomplished chiefly by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of disease. Continuous advances in high-throughput technologies, coupled with recently promoted data-sharing policies, have produced a massive wealth of data with remarkable potential to improve human health care. In step with this boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data. The aim of this thesis was to develop novel statistical and machine learning tools, and to incorporate existing state-of-the-art methods, for analysing bio-clinical data with medical applications. The findings demonstrate the impact of computational approaches on clinical decision making by improving patient risk stratification and the prediction of disease outcomes. The thesis comprises five studies describing method development for 1) genomic data, 2) conventional clinical data, and 3) the integration of genomic and clinical data. For genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To prove the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC); in addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the contribution of genomic data to the predictive ability of clinical models: again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of such markers would increase the chance of early detection and improve prognosis assessment and treatment choice.
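
    A minimal sketch of an ensemble-based clinical risk model of the kind described above is given below, using a random forest evaluated with cross-validation. The features, cohort size, and outcome are synthetic placeholders, not the thesis's actual mCRPC or obesity models.

        # Illustrative sketch (Python): ensemble risk model on synthetic clinical data.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_patients = 200
        X = np.column_stack([
            rng.normal(65, 8, n_patients),     # age (synthetic)
            rng.normal(1.2, 0.4, n_patients),  # hypothetical lab marker
            rng.integers(0, 2, n_patients),    # treatment indicator
        ])
        y = rng.integers(0, 2, n_patients)     # synthetic binary outcome

        model = RandomForestClassifier(n_estimators=500, random_state=0)
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"Cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")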

    Linked Research on the Decentralised Web

    This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving. The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities, and participation requires the use of proprietary systems. From a technical standpoint, this thesis takes a deep look at the semantic structure of research artifacts, and at how they can be stored, linked and shared in a way that is controlled by individual researchers or delegated to trusted parties. Further, I find that the ecosystem has lacked a technical Web standard able to fulfill the awareness function of research communication. I therefore contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation), which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination and collaboration scenarios, as well as in archival activities and scientific experiments. Another core contribution of this work is a Web standards-based implementation of a client-side tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers. The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data. Technical solutions alone do not suffice, of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud). Future challenges include addressing the social implications of decentralised Web publishing and the design of ethically grounded interoperable mechanisms; cultivating privacy-aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.
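
    To make the awareness mechanism concrete, the sketch below shows the sender side of Linked Data Notifications: discover the target's advertised inbox, then POST a JSON-LD notification to it. The URLs and payload are hypothetical, and only HTTP Link header discovery is shown; the W3C Recommendation also allows discovery via an ldp:inbox triple in the target's RDF body.

        # Illustrative sketch (Python) of an LDN sender. URLs and payload are hypothetical.
        import requests

        LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

        def discover_inbox(target_url: str) -> str | None:
            """Look for an ldp:inbox relation in the target's HTTP Link header."""
            response = requests.head(target_url)
            link = response.links.get(LDP_INBOX)
            return link["url"] if link else None

        def send_notification(target_url: str, notification: dict) -> int:
            inbox = discover_inbox(target_url)
            if inbox is None:
                raise RuntimeError("No LDN inbox advertised by the target")
            response = requests.post(
                inbox,
                json=notification,
                headers={"Content-Type": "application/ld+json"},
            )
            return response.status_code  # 201 Created means the notification was accepted

        # Hypothetical example: announcing an annotation of an article
        note = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": "Announce",
            "object": "https://example.org/annotations/42",
            "target": "https://example.org/articles/article-1",
        }
        # send_notification("https://example.org/articles/article-1", note)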