XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present guidelines for an XML-based approach to the sociological study of Web data, such as the analysis of mailing lists or databases available online. An XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through content, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis.
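As a rough illustration of the kind of recursive content query the abstract alludes to, the following Python sketch runs a query over a tiny mailing-list warehouse; the element and attribute names (mailing-list, message, author, in-reply-to) are illustrative assumptions, not the paper's actual XML Schema.

from collections import Counter
from xml.etree import ElementTree as ET

# A minimal stand-in for a mailing-list warehouse document.
warehouse = ET.fromstring("""
<mailing-list name="w3c-example">
  <message id="1"><author>alice@example.org</author><subject>Charter draft</subject></message>
  <message id="2" in-reply-to="1"><author>bob@example.org</author><subject>Re: Charter draft</subject></message>
</mailing-list>
""")

# Query the content recursively, independently of how it was initially stored:
# count messages per author across the whole tree.
per_author = Counter(m.findtext("author") for m in warehouse.iter("message"))
print(per_author)  # Counter({'alice@example.org': 1, 'bob@example.org': 1})

A table of such counts could then be exported (e.g. as CSV) to the kind of sociological analysis software mentioned above.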
User-centred design of flexible hypermedia for a mobile guide: Reflections on the HyperAudio experience
A user-centred design approach involves end-users from the very beginning. Considering users at the early stages compels designers to think in terms of utility and usability and helps develop the system around what is actually needed. This paper discusses the case of HyperAudio, a context-sensitive, adaptive and mobile guide to museums developed in the late 90s. User requirements were collected via a survey to understand visitors' profiles and visit styles in natural science museums. The knowledge acquired supported the specification of system requirements, helping to define the user model, data structure and adaptive behaviour of the system. User requirements guided the design decisions on what could be implemented using simple adaptable triggers and what instead needed more sophisticated adaptive techniques, a fundamental choice when all the computation must be done on a PDA. Graphical and interactive environments for developing and testing complex adaptive systems are discussed as a further step towards an iterative design that places user interaction at its centre. The paper discusses how such an environment allows designers and developers to experiment with different system behaviours and to test them extensively under realistic conditions by simulating the actual context evolving over time. The understanding gained in HyperAudio is then considered in the perspective of the developments that followed that first experience: our findings still seem valid despite the time that has passed.
Evaluating and Improving Risk Analysis Methods for Critical Systems
At the same time as our dependence on IT systems increases, the number of reported problems caused by failures of critical IT systems has also increased. Today, almost every societal system or service, e.g., water supply, power supply, or transportation, depends on IT systems, and failures of these systems have serious and negative effects on society. In general, public organizations are responsible for delivering these services to society. Risk analysis is an important activity for the development and operation of critical IT systems, but the increased complexity and size of critical systems put additional requirements on the effectiveness of risk analysis methods. Even though a number of methods for risk analysis of technical systems exist, the failure behavior of information systems is typically very different from that of mechanical systems. Therefore, risk analysis of IT systems requires different risk analysis techniques, or at least adaptations of traditional approaches.

The research objective of this thesis is to improve the analysis process for risks pertaining to critical IT systems, which is addressed in three ways: first, by understanding current literature and practices related to risk analysis of IT systems; then by evaluating and comparing existing risk analysis methods; and finally by suggesting improvements in the risk analysis process and by developing new, effective and efficient risk analysis methods for IT systems.

To understand current risk analysis methods and practices we carried out a systematic mapping study. The study found only a few empirical research papers on the evaluation of existing risk analysis methods. The results of the study suggest empirically investigating risk analysis methods for IT systems in order to determine which methods are more effective than others. Then, we carried out a semi-structured interview study to investigate several factors regarding current practices and existing challenges of risk analysis and management, e.g., its importance, identification of critical resources, involvement of different stakeholders, methods used, and follow-up analysis.

To evaluate and compare the effectiveness of risk analysis methods we carried out a controlled experiment, in which we evaluated effectiveness by counting the number of relevant and non-relevant risks identified by the experiment participants. The difficulty level of the risk analysis methods and the participants' confidence in the identified risks were also investigated. Then, we carried out a case study to evaluate the effectiveness and efficiency of two existing risk analysis methods, Failure Mode and Effects Analysis (FMEA) and System Theoretic Process Analysis (STPA). The case study investigates the effectiveness of the methods by comparing how a hazard analysis is conducted for the same system with each of them, and it evaluates the analysis process of the methods using a set of qualitative criteria derived from the Technology Acceptance Model (TAM). After this, another case study was carried out to evaluate and assess the resilience of critical IT systems and networks by applying a simulation method. A hybrid modeling approach was used which considers the technical network, represented using graph theory, as well as the repair system, represented by a queuing model.
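The hybrid modelling idea mentioned above can be sketched in a few lines of Python: a graph stands in for the technical network and a single-server FIFO queue stands in for the repair system. The topology, rates and resilience measure below are illustrative assumptions, not the thesis's actual model.

import random
import networkx as nx

random.seed(1)
net = nx.erdos_renyi_graph(20, 0.2)      # stand-in for the technical network
failure_prob, horizon = 0.05, 100        # per-node failure probability per time step
failed, repair_queue = set(), []
largest_component = []

for t in range(horizon):
    # Newly failed nodes join the repair queue (one shared repair crew).
    for node in net.nodes:
        if node not in failed and random.random() < failure_prob:
            failed.add(node)
            repair_queue.append(node)
    # The repair crew restores at most one node per time step (FIFO discipline).
    if repair_queue:
        failed.discard(repair_queue.pop(0))
    # Resilience proxy: size of the largest connected component of working nodes.
    working = net.subgraph(n for n in net.nodes if n not in failed)
    largest_component.append(max((len(c) for c in nx.connected_components(working)), default=0))

print(min(largest_component), largest_component[-1])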
To improve the risk analysis process, this thesis also presents a new risk analysis method, Perspective Based Risk Analysis (PBRA), that uses different perspectives while analyzing IT systems. A perspective is a point of view or a specific role adopted by a risk analyst while doing risk analysis, e.g., system engineer, system tester, or system user. Based on the findings, we conclude that the use of different perspectives improves the effectiveness of the risk analysis process. Then, to further improve the risk analysis process, we carried out a data mining study on saving historical information about IT incidents for later use in risk analysis. Such a study can be an important aid in building a database of IT incidents that have occurred, which can later be used as input to improve the risk analysis process.

Finally, based on the findings of the studies included in this thesis, a list of suggestions to improve the risk analysis process is presented. This list of potential suggestions was evaluated in a focus group meeting. The suggestions include, for example, risk analysis awareness and education, defining clear roles and responsibilities, easy-to-use and adaptable risk analysis methods, dealing with subjectivity, carrying out risk analysis as early as possible, and using historical risk data to improve the risk analysis process. Based on the findings, it can be concluded that these suggestions are important and useful for risk practitioners who want to improve the risk analysis process. The research presented in this thesis thus provides methods to improve risk analysis and management practices; moreover, the presented work is based on solid empirical studies.
Knowledge-Based Techniques for Scholarly Data Access: Towards Automatic Curation
Accessing up-to-date and quality scientific literature is a critical preliminary step in any research activity.
Identifying the scholarly literature relevant to a given task or application is, however, a complex and time-consuming activity.
Despite the large number of tools developed over the years to support scholars in their literature surveying activity, such as Google Scholar, Microsoft Academic Search, and others, the best way to access quality papers remains asking a domain expert who is actively involved in the field and knows research trends and directions.
State-of-the-art systems, in fact, either do not allow exploratory search activities, such as identifying the active research directions within a given topic, or do not offer proactive features, such as content recommendation, both of which are critical to researchers.
To overcome these limitations, we strongly advocate a paradigm shift in the development of scholarly data access tools: moving from traditional information retrieval and filtering tools towards automated agents able to make sense of the textual content of published papers and therefore monitor the state of the art.
Building such a system is, however, a complex task that implies tackling non-trivial problems in the fields of Natural Language Processing, Big Data Analysis, User Modelling, and Information Filtering.
In this work, we introduce the concept of an Automatic Curator System and present its fundamental components.
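One concrete building block of such a curator is content-based filtering over paper text. The Python sketch below ranks a few toy titles against a researcher's interest profile using TF-IDF and cosine similarity; the data and the choice of TF-IDF are illustrative assumptions, not the components the thesis actually proposes.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    "Neural approaches to keyphrase extraction from scholarly articles",
    "A survey of queueing models for service reliability",
    "User modelling for personalised recommendation of scientific literature",
]
profile = ["keyphrase extraction and recommendation for scholarly literature"]

vectorizer = TfidfVectorizer(stop_words="english")
paper_vectors = vectorizer.fit_transform(papers)    # candidate papers
profile_vector = vectorizer.transform(profile)      # the user's interests as a query
scores = cosine_similarity(profile_vector, paper_vectors).ravel()

# Recommend papers in decreasing order of similarity to the profile.
for score, title in sorted(zip(scores, papers), reverse=True):
    print(f"{score:.2f}  {title}")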
U-net and its variants for medical image segmentation: A review of theory and applications
U-net is an encoder-decoder architecture developed primarily for image segmentation tasks. Its design gives U-net high utility within the medical imaging community and has resulted in its extensive adoption as the primary tool for segmentation tasks in medical imaging. The success of U-net is evident in its widespread use across nearly all major imaging modalities, from CT scans and MRI to X-rays and microscopy. Furthermore, while U-net is largely a segmentation tool, there have been instances of its use in other applications. Given that U-net's potential is still increasing, this narrative literature review examines the numerous developments and breakthroughs in the U-net architecture and provides observations on recent trends. We also discuss the many innovations that have advanced deep learning and how these tools facilitate U-net. In addition, we review the different image modalities and application areas that have been enhanced by U-net.
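To make the architecture under review concrete, here is a minimal U-net-style encoder-decoder in PyTorch. Depth, channel widths and layer choices are illustrative assumptions; the U-net variants discussed in the review differ in many details.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = double_conv(128, 64)   # 64 upsampled + 64 skip channels
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)    # 32 upsampled + 32 skip channels
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                   # encoder, full resolution
        e2 = self.enc2(self.pool(e1))       # encoder, 1/2 resolution
        b = self.bottleneck(self.pool(e2))  # bottleneck, 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # decoder + skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # decoder + skip connection
        return self.head(d1)                # per-pixel class logits

# Example: one single-channel 128x128 image (e.g. an MRI slice) -> per-pixel logits.
logits = TinyUNet()(torch.randn(1, 1, 128, 128))
print(logits.shape)  # torch.Size([1, 2, 128, 128])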
Data cleaning techniques for software engineering data sets
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Data quality is an important issue which has been addressed and recognised in research communities such as data warehousing, data mining and information systems. It is generally agreed that poor data quality will impact the quality of results of analyses and that it will therefore impact decisions made on the basis of those results. Empirical software engineering has neglected the issue of data quality to some extent. This fact poses the question of how researchers in empirical software engineering can trust their results without addressing the quality of the analysed data. One widely accepted definition of data quality describes it as 'fitness for purpose', and the issue of poor data quality can be addressed either by introducing preventative measures or by applying means to cope with data quality issues. The research presented in this thesis addresses the latter, with a special focus on noise handling.
Three noise handling techniques, which utilise decision trees, are proposed for application to software engineering data sets. Each technique represents a different noise handling approach: robust filtering, where the training and test sets are the same; predictive filtering, where the training and test sets are different; and filtering and polish, where noisy instances are corrected. The techniques were first evaluated in two investigations by applying them to a large real-world software engineering data set. The first investigation tested the techniques' ability to improve predictive accuracy at differing noise levels. All three techniques improved predictive accuracy in comparison to the do-nothing approach, with filtering and polish being the most successful. The second investigation, using the same large real-world data set, tested the techniques' ability to identify instances with implausible values; these instances were flagged for the purpose of evaluation before the three techniques were applied. Robust filtering and predictive filtering decreased the number of instances with implausible values, but also substantially decreased the size of the data set. The filtering and polish technique actually increased the number of implausible values, but it did not reduce the size of the data set.
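The robust-filtering idea described above can be sketched with a decision tree on synthetic data: train on the full data set, then flag as potentially noisy the instances the tree itself misclassifies. The data, noise level and tree settings are illustrative assumptions, not the thesis's actual experimental setup.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # stand-in for software project metrics
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # underlying "true" class
noisy = rng.random(200) < 0.1              # inject 10% label noise
y_noisy = np.where(noisy, 1 - y, y)

# Robust filtering: the training set and the "test" set are the same data set.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_noisy)
flagged = tree.predict(X) != y_noisy       # instances the tree disagrees with

print(f"flagged {flagged.sum()} instances, {(flagged & noisy).sum()} of them truly noisy")
# Predictive filtering would instead train and flag on different subsets (e.g. via
# cross-validation); filtering and polish would replace flagged labels with the tree's predictions.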
Since the data set contained historical software project data, it was not possible to know the real extent of the noise detected. This led to the production of simulated software engineering data sets, which were modelled on the real data set used in the previous evaluations to ensure domain-specific characteristics. These simulated versions of the data set were then injected with noise, such that the real extent of the noise was known. After the noise injection, the three noise handling techniques were applied to allow evaluation. This procedure of simulating software engineering data sets combined the incorporation of domain-specific characteristics of the real world with control over the simulated data, which is seen as a special strength of this evaluation approach.
The results of the evaluation on the simulated data sets showed that none of the techniques performed well. Robust filtering and filtering and polish performed very poorly, and based on the results of this evaluation they would not be recommended for the task of noise reduction. The predictive filtering technique was the best performing technique in this evaluation, but it did not perform particularly well either.
An exhaustive systematic literature review was carried out to investigate the extent to which the empirical software engineering community has considered data quality. The findings showed that the issue of data quality has been largely neglected by the empirical software engineering community.
The work in this thesis highlights an important gap in empirical software engineering. It provides a clarification of, and distinction between, the terms noise and outliers: the two overlap, but they are fundamentally different. Since noise and outliers are often treated the same by noise handling techniques, a clarification of the two terms was necessary.
To investigate the capabilities of noise handling techniques, a single investigation was deemed insufficient, because the distinction between noise and outliers is not trivial and because the investigated noise cleaning techniques are derived from traditional noise handling techniques in which noise and outliers are combined. Therefore three investigations were undertaken to assess the effectiveness of the three presented noise handling techniques, each of which should be seen as part of a multi-pronged approach.
This thesis also highlights possible shortcomings of current automated noise handling techniques. The poor performance of the three techniques led to the conclusion that noise handling should be integrated into a data cleaning process in which the input of domain knowledge and the replicability of the data cleaning process are ensured.
Code deobfuscation by program synthesis-aided simplification of mixed boolean-arithmetic expressions
Bachelor's thesis in Mathematics, Faculty of Mathematics, Universitat de Barcelona, Year: 2020, Advisors: Raúl Roca Cánovas, Antoni Benseny and Mario Reyes de los Mozos.
This project studies the theoretical background of Mixed Boolean-Arithmetic (MBA) expressions as well as their practical applicability within the field of code obfuscation, a technique used both by malware threats and by software protection in order to complicate the process of reverse engineering (parts of) a program.
An MBA expression is composed of integer arithmetic operators (e.g. +, -, *) and bitwise operators (e.g. AND, OR, XOR, NOT). MBA expressions can be leveraged to obfuscate the data flow of code by iteratively applying rewrite rules and function identities that complicate (obfuscate) the initial expression while preserving its semantic behavior. This possibility is motivated by the fact that operators from these two domains do not interact well together: we have no rules (distributivity, factorization, ...) or general theory to deal with this mixing of operators.
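A small example makes the rewrite-rule mechanism concrete. The identity below, x + y = (x ^ y) + 2*(x & y), is a textbook MBA rewrite of addition (chosen here purely for illustration; it is not necessarily one of the rules studied in the thesis), verified exhaustively on 8-bit values in Python.

BITS = 8
MASK = (1 << BITS) - 1

def obfuscated_add(x, y):
    # Mixed boolean-arithmetic rewrite of ordinary addition, modulo 2**BITS.
    return ((x ^ y) + 2 * (x & y)) & MASK

# Brute-force check of semantic equivalence over all 8-bit inputs.
assert all(obfuscated_add(x, y) == (x + y) & MASK
           for x in range(1 << BITS) for y in range(1 << BITS))
print("x + y == (x ^ y) + 2*(x & y) holds for all 8-bit inputs")

An obfuscator chains many such identities; a synthesis-based deobfuscator tries to recover the simple function from input/output behaviour rather than from the expanded syntax.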
Current deobfuscation techniques to address simplification of this type of data-flow obfuscation are limited by being strongly tied to syntactic complexity. We explore novel program synthesis approaches for addressing simplification of MBA expressions by reasoning on the semantics of the obfuscated expressions instead of syntax, discussing their applicability as well as their limits.
We present our own tool, r2syntia, which integrates Syntia, an open-source program synthesis tool, into the reverse engineering framework radare2 in order to retrieve the semantics of obfuscated code from its input/output behavior. Finally, we provide some improvement ideas and potential areas for future work.
Studying the lives of software bugs
For as long as people have made software, they have made mistakes in that software. Software bugs are widespread, and the maintenance required to fix them has a major impact on the cost of software and how developers' time is spent. Reducing this maintenance time would lower the cost of software and allow for developers to spend more time on new features, improving the software for end-users. Bugs are hugely diverse and have a complex life cycle. This makes them difficult to study, and research is often carried out on synthetic bugs or toy programs. However, a better understanding of the bug life cycle would greatly aid in developing tools to reduce the time spent on maintenance. This thesis will study the life cycle of bugs, and develop such an understanding. Overall, this thesis examines over 3000 real bugs, from real projects, concentrating on three of the most important points in the life cycle: origin, reporting and fix. Firstly, two existing techniques are compared for discovering the origin of a bug. A number of improvements are evaluated, and the most effective approach is found to be combining the techniques. Furthermore, the behaviour of developers is found to have a major impact on the accuracy of the techniques. Secondly, a large number of bugs are analysed to determine what information is provided when users report bugs. For most bugs, much important information is missing, or inaccurate. Most importantly, there appears to be a considerable gap between what users provide and what developers actually want. Finally, an evaluation is carried out on a number of novel alterations to techniques used to determine the location of bug fixes. Compared to existing techniques, these alterations successfully increase the number of bugs which can be usefully localised, aiding developers in removing the bugs.
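For the origin-finding step, a common family of techniques blames the lines removed by a bug-fixing commit on the version before the fix (the SZZ approach); the abstract does not name the two techniques compared in the thesis, so the Python sketch below is only a simplified illustration of that general idea and assumes a local git repository.

import subprocess

def blame_removed_lines(repo, fix_commit, path):
    # Lines removed by the fixing commit are candidates for where the bug was introduced.
    diff = subprocess.run(
        ["git", "-C", repo, "diff", f"{fix_commit}^", fix_commit, "--", path],
        capture_output=True, text=True, check=True).stdout
    origins, old_line = set(), 0
    for line in diff.splitlines():
        if line.startswith("@@"):
            # Hunk header such as "@@ -12,4 +12,6 @@": start counting old-file lines.
            old_line = int(line.split()[1].lstrip("-").split(",")[0]) - 1
        elif line.startswith("-") and not line.startswith("---"):
            old_line += 1
            # Blame the pre-fix version to find the commit that last touched this line.
            blame = subprocess.run(
                ["git", "-C", repo, "blame", "--porcelain",
                 "-L", f"{old_line},{old_line}", f"{fix_commit}^", "--", path],
                capture_output=True, text=True, check=True).stdout
            origins.add(blame.split()[0])   # first token is the blamed commit hash
        elif not line.startswith("+"):
            old_line += 1                   # context line advances the old-file counter
    return origins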
- …