6,710 research outputs found

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance. Comment: 49 pages, 10 figures, 6 tables

    Graph Neural Networks For Mapping Variables Between Programs -- Extended Version

    Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%. Comment: Extended version of "Graph Neural Networks For Mapping Variables Between Programs", paper accepted at ECAI 2023. GitHub: https://github.com/pmorvalho/ecai23-GNNs-for-mapping-variables-between-programs. 11 pages, 5 figures, 4 tables and 3 listings

    Machine Learning Approaches for the Prioritisation of Cardiovascular Disease Genes Following Genome-wide Association Study

    Genome-wide association studies (GWAS) have revealed thousands of genetic loci, establishing GWAS as a valuable method for unravelling the complex biology of many diseases. Although GWAS have grown in size and improved in study design to detect effects, identifying real causal signals and disentangling them from other highly correlated markers associated by linkage disequilibrium (LD) remains challenging. This has severely limited GWAS findings and brought the method’s value into question. Although thousands of disease susceptibility loci have been reported, causal variants and genes at these loci remain elusive. Post-GWAS analysis aims to dissect the heterogeneity of variant and gene signals. In recent years, machine learning (ML) models have been developed for post-GWAS prioritisation. ML models have ranged from logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (i.e., neural networks). When combined with functional validation, these methods have yielded important translational insights, providing a strong evidence-based approach to direct post-GWAS research. However, ML approaches are in their infancy across biological applications, and as they continue to evolve an evaluation of their robustness for GWAS prioritisation is needed. Here, I investigate the landscape of ML across selected models, input features, bias risk, and output model performance, with a focus on building a prioritisation framework that is applied to blood pressure GWAS results and tested on re-application to blood lipid traits.

    Boundary Spanner Corruption in Business Relationships

    Boundary spanner corruption, voluntary collaborative behaviour between individuals representing different organisations that violates their organisations’ norms, is a serious problem in business relationships. Drawing on insights from the literatures on general corruption perspectives, the dark side of business relationships and deviance in sales and service organisations, this dissertation identifies boundary spanner corruption as a potential dark side complication inherent in close business relationships. It builds research questions from these literature streams and proposes a research structure based upon commonly used methods in corruption research to address this new concept. In the first study, using an exploratory survey of boundary spanner practitioners, the dissertation finds that the nature of boundary spanner corruption is broad and encompasses severe and non-severe types. The survey also finds that these deviance types are prevalent across a wide spread of geographies and industries. This prevalence is particularly noticeable for less-severe corruption types, which may be an under-researched phenomenon in general corruption research. The consequences of boundary spanner corruption can be serious for both individuals and organisations. Indeed, even less-severe types can generate long-term negative consequences. A second interview-based study found that multi-level trust factors could also motivate the emergence of boundary spanner corruption. This was integrated into a theoretical model that illustrates how trust at the interpersonal, intraorganisational, and interorganisational levels enables corrupt behaviours by allowing deviance-inducing factors stemming from the task environment or from the individual boundary spanner to manifest in boundary spanner corruption.
Interpersonal trust between representatives of different organisations, interorganisational trust between these organisations, and intraorganisational agency trust of management in their representatives foster the development of a boundary-spanning social cocoon, a mechanism that can inculcate deviant norms leading to corrupt behaviour. This conceptualisation and model of boundary spanner corruption highlights intriguing directions for future research to support practitioners engaged in a difficult problem in business relationships.

    Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++

    In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how this dataset can significantly improve the translation capabilities of large-scale language models, with improvements of ×5.1 for models with no prior coding knowledge and ×9.9 for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-datase

    Big Data Analytics and Auditing: A Review and Synthesis of Literature

    The use of data analytics in auditing is growing. The application of common data analytics to audit engagements appears to be lagging behind other areas of practice, even though data analytics is thought to represent the future of audit, and there are still few publications that have examined this influence. This article reviews data analytics in audits and its potential for future audit engagements to describe the evolution of this research trend and picture its future growth directions. Future audit research potential and difficulties are also discussed. Data analytics application in auditing has enormous potential for refining audit quality, decreasing errors, increasing process transparency, and enhancing stakeholders’ confidence. We conducted a systematic literature review using the PRISMA approach. A total of 100 articles published in English from January 2011 to November 2021 were identified through a systematic search of reputed databases, including Web of Science, Scopus, and many others. Our analysis reveals that data analytics is a promising domain for the auditing practice as it improves audit efficiency and promotes the digital transformation of audit work. While reviewing the most pertinent literature in the context of data analytics in auditing, this study offers insights on potential new directions and waning views on big data analytics in auditing. DOI: 10.28991/ESJ-2023-07-02-023

    Monotheism and the Suffering of Animals in Nature

    This is the Submitted Manuscript Under Review. The final version is available from Cambridge University Press via the DOI in this record. This Element concerns itself with a particular aspect of the problem posed to monotheistic religious thought by suffering, namely the suffering of non-human creatures in nature. It makes some comparisons between Judaism, Christianity, and Islam, and then explores the problem in depth within Christian thought. After clarification of the nature of the problem, the Element considers a range of possible responses, including those based on a fall-event, those based on freedom of process, and those hypothesising a constraint on the possibilities for God as creator. Proposals based on the motif of self-emptying are evaluated. Two other aspects of the question concern God's providential relationship to the evolving creation, and the possibility of resurrection lives for animals. After consideration of the possibility of combining different explanations, the Element ends its discussion by looking at two innovative proposals at the cutting-edge of the debate.

    Study of the behavior of a thermoplastic injection mold and prediction of fatigue failure with numerical simulation

    Doctoral thesis in Mechanical Engineering. The objective of this work is to create a methodology to analyze the fatigue resistance of thermoplastic injection molds: a methodology capable of satisfying the current market, which demands shorter delivery times and lower costs for injection molds without compromising their reliability. To develop this methodology, digital models were used. With these models it is possible to execute several iterations without the costs of a physical model. Besides the lower cost of digital models, it is also possible to understand the behavior of each mold during the design phase. With the increasing complexity of injected components, the study of fatigue resistance tends to become more and more important. This work presents the precautions to take when preparing the digital models in order to obtain reliable results. In the development of this methodology, two numerical simulation software packages were used to generate the digital models. One of them is dedicated to the rheological study of thermoplastic parts and the other to the structural behavior of injection molds. The execution of numerical simulations requires a good characterization of the materials used. In the case of thermoplastics, manufacturers have a large database with the information needed for numerical simulations. However, for structural simulations, manufacturers tend to provide only monotonic curve data, which do not provide any information about fatigue behavior.
Therefore, in this work, empirical models that fit the steels used in injection molds were studied, from which it is possible to generate the S-N and e-N curves. In order to evaluate which empirical model would best fit this area, experimental tests were performed with specimens made of EN 1.2311. From these tests, the most conservative empirical model was chosen. Based on the chosen empirical model, an application capable of generating the S-N and e-N curves from the information provided by the steel mill was developed. Besides the characterization of the materials, it is also important that the loading conditions of the numerical structural model are as close as possible to what will occur in the physical model. Since the loads of this numerical model can be predicted from the rheological numerical model, the creation of a bridge between these two numerical models is essential. Therefore, an application was built in this work to convert the data generated by the commercial software Moldflow into files that can be read by commercial structural numerical simulation software. Using this application for data conversion, simulations were performed and compared with the respective physical models. It was found that it is possible to replicate the mold behavior in digital models. However, the digital models of the injection molds studied tended to present conservative results when compared to the physical models. Finally, an application capable of using data calculated from commercial numerical structural calculation software was developed for determining the fatigue resistance of molds. Here the validated model for generating the fatigue curves of the materials was taken into account. The fatigue calculation models in the application are based on the Palmgren-Miner rule for the determination of the cycles until crack nucleation.
The alternating stresses were calculated with two methods, the octahedral shear stress criterion and the Sines method. To test the application, five molds that presented fatigue failures were chosen. Then, the methodology proposed in this work was applied to determine their fatigue resistance. Applying this methodology and the tools developed for its use made it possible to verify that it is able to predict the areas where the failures occurred, as well as others with a probability of crack nucleation. Therefore, during this work it was possible to create a methodology and support tools for the fatigue calculation of molds. Thus, mold designers can have a good perspective of the fatigue resistance of injection molds still in the design phase, based on scientific methods.
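
As a rough illustration of the Palmgren-Miner rule named in the abstract above, the following Python sketch sums the damage fractions n_i / N_i over a set of load blocks and estimates how many repetitions of that load sequence fit before the damage sum reaches 1. The function names and the numbers in the usage note are hypothetical; they are not taken from the thesis, whose own implementation is not shown in this record.

```python
from typing import Sequence, Tuple

# Each load block is (applied_cycles, cycles_to_failure):
#   applied_cycles    n_i: cycles applied at one stress level
#   cycles_to_failure N_i: cycles the material survives at that level,
#                          e.g. read from an S-N curve (assumed given here)
LoadBlock = Tuple[float, float]


def miner_damage(load_blocks: Sequence[LoadBlock]) -> float:
    """Return the Palmgren-Miner damage sum D = sum(n_i / N_i)."""
    damage = 0.0
    for applied_cycles, cycles_to_failure in load_blocks:
        if cycles_to_failure <= 0:
            raise ValueError("cycles to failure must be positive")
        damage += applied_cycles / cycles_to_failure
    return damage


def repetitions_to_nucleation(load_blocks: Sequence[LoadBlock]) -> float:
    """Estimate how many times the load sequence can repeat before D reaches 1."""
    damage = miner_damage(load_blocks)
    if damage == 0:
        # No damage accumulated: the rule predicts unlimited life.
        return float("inf")
    return 1.0 / damage
```

For example, `miner_damage([(1000, 1e5), (500, 5e4)])` gives a damage sum of about 0.02, so under this rule roughly 50 repetitions of that sequence would be tolerated before crack nucleation is predicted.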

    The Impact of Artificial Intelligence on the Evolution of Digital Education: A Comparative Study of OpenAI Text Generation Tools including ChatGPT, Bing Chat, Bard, and Ernie

    In the digital era, the integration of artificial intelligence (AI) in education has ushered in transformative changes, redefining teaching methodologies, curriculum planning, and student engagement. This review paper delves deep into the rapidly evolving landscape of digital education by contrasting the capabilities and impact of OpenAI's pioneering text generation tools such as Bing Chat, Bard, and Ernie, with a keen focus on the novel ChatGPT. Grounded in a typology that views education through the lenses of system, process, and result, the paper navigates the multifaceted applications of AI. From decentralizing global education and personalizing curriculums to digitally documenting competence-based outcomes, AI stands at the forefront of educational modernization. Highlighting ChatGPT's meteoric rise to one million users in just five days, the study underscores its role in democratizing education, fostering autodidacticism, and magnifying student engagement. However, with such transformative power comes the potential for misuse, as text-generation tools can inadvertently challenge academic integrity. By juxtaposing the promise and pitfalls of AI in education, this paper advocates for a harmonized synergy between AI tools and the educational community, emphasizing the urgent need for ethical guidelines, pedagogical adaptations, and strategic collaborations.