A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software engineering activity that supports a broad range of applications, including but not limited to code recommendation, duplicate-code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques in five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages remain unsupported. A noteworthy finding was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.
Comment: 49 pages, 10 figures, 6 tables
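To make the reviewed techniques concrete: one of the simplest families of similarity measures surveyed in this literature is token-based comparison, where source code is lexed into tokens and compared via set overlap of token windows (shingles). The sketch below is an illustrative minimal example, not any specific tool from the review; the function names and the crude regex lexer are assumptions for demonstration only.

```python
import re

def tokenize(code: str) -> list[str]:
    # Crude lexer: identifiers, numbers, then any other non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    # Overlapping k-token windows capture local token structure.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    # Jaccard index over token shingles: |A ∩ B| / |A ∪ B|.
    sa, sb = shingles(tokenize(a), k), shingles(tokenize(b), k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Real clone detectors add normalization (e.g., renaming identifiers to a placeholder) so that Type-2 clones, which differ only in identifier names, also score highly.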
Towards A Practical High-Assurance Systems Programming Language
Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity, requiring considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of these systems and the nuances within them.
Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code.
To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
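The property-based testing idea mentioned above can be sketched generically: state an executable property (e.g., that a decoder inverts an encoder) and check it on many randomly generated inputs, as a lightweight precursor to full formal proof. This is an illustrative Python sketch, not Cogent's actual framework; the run-length-coding routine and all names are assumptions for demonstration.

```python
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    # Toy routine under test: run-length encoding.
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

def check_roundtrip_property(trials: int = 200, seed: int = 0) -> bool:
    # Property: decode(encode(s)) == s for random inputs -- the same
    # shape of statement one would later prove formally.
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randrange(0, 12)))
        if run_length_decode(run_length_encode(s)) != s:
            return False
    return True
```

A failing random case gives early, cheap feedback on the specification before any proof effort is invested.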
Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring
Artificially intelligent perception is increasingly present in the lives of
every one of us. Vehicles are no exception, (...) In the near future, pattern
recognition will have an even stronger role in vehicles, as self-driving cars
will require automated ways to understand what is happening around (and within)
them and act accordingly. (...) This doctoral work focused on advancing
in-vehicle sensing through the research of novel computer vision and pattern
recognition methodologies for both biometrics and wellbeing monitoring. The
main focus has been on electrocardiogram (ECG) biometrics, a trait well-known
for its potential for seamless driver monitoring. Major efforts were devoted to
achieving improved performance in identification and identity verification in
off-the-person scenarios, well-known for increased noise and variability. Here,
end-to-end deep learning ECG biometric solutions were proposed and important
topics were addressed such as cross-database and long-term performance,
waveform relevance through explainability, and interlead conversion. Face
biometrics, a natural complement to the ECG in seamless unconstrained
scenarios, was also studied in this work. The open challenges of masked face
recognition and interpretability in biometrics were tackled in an effort to
evolve towards algorithms that are more transparent, trustworthy, and robust to
significant occlusions. Within the topic of wellbeing monitoring, improved
solutions to multimodal emotion recognition in groups of people and
activity/violence recognition in in-vehicle scenarios were proposed. Finally, we also proposed a novel way to learn template security within end-to-end models, dispensing with separate encryption processes, and a self-supervised learning approach tailored to sequential data, in order to ensure data security and optimal performance. (...)
Comment: Doctoral thesis presented and approved on the 21st of December 2022 to the University of Port
Reframing museum epistemology for the information age: a discursive design approach to revealing complexity
This practice-based research inquiry examines the impact of an epistemic shift, brought about by the dawning of the information age and advances in networked communication technologies, on physical knowledge institutions - focusing on museums. The research charts the adaptation of knowledge schemas used in museum knowledge organisation and discusses the potential for a new knowledge schema, the network, to establish a new epistemology for museums that reflects contemporary hyperlinked and networked knowledge. The research investigates the potential for networked and shared virtual reality spaces to reveal new 'knowledge monuments' reflecting the epistemic values of the network society and the space of flows.
The central practice for this thesis focuses on two main elements. The first is applying networks and visual complexity to reveal multi-linearity and adapting perspectives in relational knowledge networks. This concept was explored through two discursive design projects. The first, the Museum Collection Engine, uses data visualisation, cloud data, and image recognition within an immersive projection dome to create a dynamic and searchable museum collection that returns new and interlinking constellations of museum objects and knowledge. The second, Shared Pasts: Decoding Complexity, is an AR app with a unique 'anti-personalisation' recommendation system designed to reveal complex narratives around historic objects and places. The second element is folksonomy and co-design in developing new community-focused archives using the community's language to build the dataset and socially tagged metadata. This was tested by developing two discursive prototypes, Women Reclaiming AI and Sanctuary Stories.
Reshaping Higher Education for a Post-COVID-19 World: Lessons Learned and Moving Forward
No abstract available
Transparent Forecasting Strategies in Database Management Systems
Whereas traditional data warehouse systems assume that data is complete or has been carefully preprocessed, increasingly more data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously in real-time from vast data sources. Nevertheless, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and, additionally, is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, data of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time - past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline various general strategies of integrating time series forecasting inside a database and discuss some individual techniques from the database community. We conclude this article by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.
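As a minimal illustration of the forecasting concepts the article surveys, simple exponential smoothing (SES) is one of the basic models such a system might fit to a time series to answer queries about future, not-yet-available values. The sketch below is a generic SES implementation under assumed names, not the article's specific architecture.

```python
def ses_forecast(series: list[float], alpha: float = 0.5, horizon: int = 1) -> list[float]:
    # Simple exponential smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1}.
    if not series:
        raise ValueError("series must be non-empty")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    # SES yields a flat forecast: the last smoothed level for every future step.
    return [level] * horizon
```

A forecast-enabled query processor would transparently substitute such model outputs for missing future tuples, so that a query over "next month" returns estimates rather than an empty result.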
Reasoning about quantities and concepts: studies in social learning
We live and learn in a ‘society of mind’. This means that we form beliefs not
just based on our own observations and prior expectations but also based on the
communications from other people, such as our social network peers. Across seven
experiments, I study how people combine their own private observations with other
people’s communications to form and update beliefs about the environment. I will
follow the tradition of rational analysis and benchmark human learning against optimal Bayesian inference at Marr’s computational level. To accommodate human
resource constraints and cognitive biases, I will further contrast human learning
with a variety of process level accounts. In Chapters 2–4, I examine how people
reason about simple environmental quantities. I will focus on the effect of dependent information sources on the success of group and individual learning across a
series of single-player and multi-player judgement tasks. Overall, the results from
Chapters 2–4 highlight the nuances of real social network dynamics and provide
insights into the conditions under which we can expect collective success versus
failures such as the formation of inaccurate worldviews. In Chapter 5, I develop a
more complex social learning task which goes beyond estimation of environmental
quantities and focuses on inductive inference with symbolic concepts. Here, I investigate how people search compositional theory spaces to form and adapt their
beliefs, and how symbolic belief adaptation interfaces with individual and social
learning in a challenging active learning task. Results from Chapter 5 suggest that
people might explore compositional theory spaces using local incremental search;
and that it is difficult for people to use another person’s learning data to improve
upon their hypothesis
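The optimal Bayesian benchmark described above can be made concrete for a simple environmental quantity: an ideal learner pools its own observations with a peer's reported counts in a single conjugate update. This is an illustrative Beta-Binomial sketch under assumed names, not the thesis's actual models; note that naively pooling counts from dependent peers would double-count evidence, which is exactly the hazard of correlated sources the thesis studies.

```python
def posterior_mean(private_heads: int, private_n: int,
                   peer_heads: int, peer_n: int,
                   prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    # Beta-Binomial update: private data and a peer's reported counts are
    # treated as exchangeable evidence about an unknown success rate.
    a = prior_a + private_heads + peer_heads
    b = prior_b + (private_n - private_heads) + (peer_n - peer_heads)
    return a / (a + b)  # posterior mean of Beta(a, b)
```

Process-level accounts can then be contrasted with this benchmark, e.g., by discounting peer counts to model under- or over-weighting of social information.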
International Academic Symposium of Social Science 2022
These conference proceedings gather work and research presented at the International Academic Symposium of Social Science 2022 (IASSC2022) held on July 3, 2022, in Kota Bharu, Kelantan, Malaysia. The conference was jointly organized by the Faculty of Information Management of Universiti Teknologi MARA Kelantan Branch, Malaysia; University of Malaya, Malaysia; Universitas Pembangunan Nasional Veteran Jakarta, Indonesia; Universitas Ngudi Waluyo, Indonesia; Camarines Sur Polytechnic Colleges, Philippines; and UCSI University, Malaysia. Featuring experienced keynote speakers from Malaysia, Australia, and England, these proceedings provide an opportunity for researchers, postgraduate students, and industry practitioners to gain knowledge and understanding of advanced topics concerning digital transformations in the perspective of the social sciences and information systems, focusing on issues, challenges, impacts, and theoretical foundations. These proceedings will assist in shaping the future of the academy and industry by compiling state-of-the-art works and future trends in the digital transformation of the social sciences and the field of information systems. They also serve as an interactive platform that enables academicians, practitioners, and students from various institutions and industries to collaborate.
Diversification and fairness in top-k ranking algorithms
Given a user query, typical user interfaces, such as search engines and recommender systems, allow only a small number of results to be returned to the user. Hence, determining the top-k results is an important task in information retrieval, as it helps to ensure that the most relevant results are presented to the user. There exists an extensive body of research that studies how to score records and return the top-k to the user. Moreover, researchers have identified an extensive set of criteria for presenting the user with top-k results, and result diversification is one of them. Diversifying the top-k result ensures that the returned result set is relevant as well as representative of the entire set of answers to the user query, and it is highly relevant in the context of search, recommendation, and data exploration. The goal of this dissertation is two-fold: the first goal is to focus on adapting existing popular diversification algorithms and studying how to expedite them without losing the accuracy of the answers. This work studies the scalability challenges of expediting the running time of existing diversification algorithms by designing a generic framework that produces the same results as the original algorithms, yet is significantly faster. The proposed approach handles scenarios where data change over a period of time and studies how to adapt the framework to accommodate data changes. The second aspect of the work studies how existing top-k algorithms can lead to inequitable exposure of records that are qualitatively equivalent. This scenario is highly important for long-tail data, where there exists a long tail of records with similar utility, but existing top-k algorithms surface only a few of them, and the rest are never returned to the user. Both of these problems are formalized, and their hardness is analysed.
The contributions of this dissertation lie in (a) formalizing the principal problems and studying them analytically, (b) designing scalable algorithms with theoretical guarantees, and (c) evaluating the efficacy and scalability of the designed solutions by comparing them with state-of-the-art solutions over large-scale datasets.
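One popular diversification algorithm of the kind the dissertation adapts is Maximal Marginal Relevance (MMR), which greedily trades off a candidate's relevance against its similarity to already-selected results. The sketch below is a generic MMR implementation for illustration; the function names and the lambda parameterization are assumptions, not the dissertation's framework.

```python
def mmr_top_k(candidates, relevance, similarity, k, lam=0.7):
    # Maximal Marginal Relevance: at each step pick the candidate maximizing
    #   lam * relevance(c) - (1 - lam) * max similarity(c, s) over selected s.
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: lam * relevance[c]
                   - (1 - lam) * max((similarity(c, s) for s in selected),
                                     default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected
```

With lam = 1.0 this degenerates to plain relevance-ranked top-k; lowering lam pushes near-duplicate records out of the result set, which is also the lever for the fairness concern above, since otherwise-equivalent long-tail records gain exposure.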
Proceedings of the 33rd Annual Workshop of the Psychology of Programming Interest Group
This is the Proceedings of the 33rd Annual Workshop of the Psychology of Programming Interest Group (PPIG). This was the first PPIG to be held physically since 2019, following the two online-only PPIGs in 2020 and 2021, both during the COVID-19 pandemic. It was also the first PPIG conference to be designed specifically for hybrid attendance. Reflecting the theme, it was hosted by the Music Computing Lab at the Open University in Milton Keynes.