1,689 research outputs found

    Towards information profiling: data lake content metadata management

    Get PDF
    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for such kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this.We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case-study from the OpenML DL, which showcases the value and feasibility of our approach.Peer ReviewedPostprint (author's final draft

    Frame-Based Editing: Easing the Transition from Blocks to Text-Based Programming

    Get PDF
    Block-based programming systems, such as Scratch or Alice, are the most popular environments for introducing young children to programming. However, mastery of text-based programming continues to be the educational goal for stu- dents who continue to program into their teenage years and beyond. Transitioning across the significant gap between the two editing styles presents a difficult challenge in school- level teaching of programming. We propose a new style of program manipulation to bridge the gap: frame-based edit- ing. Frame-based editing has the resistance to errors and approachability of block-based programming while retaining the flexibility and more conventional programming seman- tics of text-based programming languages. In this paper, we analyse the issues involved in the transition from blocks to text and argue that they can be overcome by using frame- based editing as an intermediate step. A design and imple- mentation of a frame-based editor is provided

    FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

    Get PDF
    Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.Comment: 31 pages, 11 figure

    Partial Evaluation of String Obfuscations for Java Malware Detection

    Get PDF
    The fact that Java is platform independent gives hackers the opportunity to write exploits that can target users on any platform, which has a JVM implementation. Metasploit is a well-known source of Javaexploits and to circumvent detection by Anti Virus (AV) software, obfuscation techniques are routinely applied to make an exploit more difficult to recognise. Popular obfuscation techniques for Java include stringobfuscation and applying reflection to hide method calls; two techniques that can either be used together or independently. This paper shows how to apply partial evaluation to remove these obfuscations and thereby improve AV matching. The paper presents a partial evaluator for Jimple, which is an intermediate language for JVM bytecode designed for optimisation and program analysis, and demonstrates how partially evaluated Jimple code, when transformed back into Java, improves the detection rates of a number of commercial AV products

    Security considerations around the usage of client-side storage APIs

    Get PDF
    Web Storage, Indexed Database API and Web SQL Database are primitives that allow web browsers to store information in the client in a much more advanced way compared to other techniques such as HTTP Cookies. They were originally introduced with the goal of enhancing the capabilities of websites, however, they are often exploited as a way of tracking users across multiple sessions and websites. This work is divided in two parts. First, it quantifies the usage of these three primitives in the context of user tracking. This is done by performing a large-scale analysis on the usage of these techniques in the wild. The results highlight that code snippets belonging to those primitives can be found in tracking scripts at a surprising high rate, suggesting that user tracking is a major use case of these technologies. The second part reviews of the effectiveness of the removal of client-side storage data in modern browsers. A web application, built for specifically for this study, is used to highlight that it is often extremely hard, if not impossible, for users to remove personal data stored using the three primitives considered. This finding has significant implications, because those techniques are often uses as vector for cookie resurrection

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    No full text
    Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced has encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information

    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

    Get PDF
    RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.Abstract Background The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards. Results This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry. Conclusions We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.Peer Reviewe

    Automatic marking of Shell programs for students coursework assessment

    Get PDF
    The number of students in any programming language course is usually large; more than 100 students is not uncommon in some universities. The member of staff teaching such a course has to mark, perhaps weekly, a very large number of program assignments. Manual marking and assessing is therefore an arduous task. The aim of this work is to describe a computer system for automatic marking and assessment of students' programs written in Unix Bourne Shell. In this study, a student's program will be assessed by testing its dynamic correctness and its maintainability. For dynamic correctness to be checked the program will be run against sets of input data supplied by the teacher, whereas for maintainability the student's program will be tested statically. The program text will be analysed, and its typographic style and its complexity measured. The typographic assessment in this system is adaptable to reflect the change of emphasis as a course progresses. This study presents the results generated from the assessment of a typical class of students in a Shell programming course. The experience with the development of the typographic assessment system has been generally positive. The results have shown that it is feasible to automate the assessment of this quality factor, as well as dynamic testing. Realistic grading can be achieved and useful information feedback can be obtained. The system is useful to both the students learning programming in Shell, (Arthur, L. J. and Burns, T., 1996) and the staff who are teaching the course. Although the work here is focused on the Bourne Shell, (Bourne, S. R., 1987) the study is still valid, with little or no change, to all other shells. The method used can also be applied, with some modification, to other programming languages. Furthermore this method is not limited to university and teaching, it can also be used in other fields for the purposes of software quality assessment
    corecore