Helicopter parenting through the lens of reddit: A text mining study
The study aimed to understand Reddit users' experiences with helicopter parenting through first-hand accounts. Text mining and natural language processing techniques were employed to extract data from the subreddit r/helicopterparents. A total of 713 original posts were processed from unstructured text into tidy formats. Latent Dirichlet Allocation (LDA), a popular topic modeling method, was used to discover hidden themes within the corpus. The data revealed common environmental contexts of helicopter parenting (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. These findings collectively suggest the importance of autonomy-supportive parenting and mindfulness interventions as viable solutions to the problems posed by helicopter parenting. In addition, the findings lend support to past research that has identified more maternal than paternal models of helicopter parenting. Further research on the implications of the COVID-19 pandemic for helicopter parenting is warranted.
Traceability for trustworthy AI: a review of models and tools
Traceability is considered a key requirement for trustworthy artificial intelligence (AI), related to the need to maintain a complete account of the provenance of the data, processes, and artifacts involved in the production of an AI model. Traceability in AI shares part of its scope with general-purpose recommendations for provenance such as W3C PROV, and it is also supported to different extents by specific tools used by practitioners as part of their efforts to make data analytic processes reproducible or repeatable. Here, we review relevant tools, practices, and data models for traceability in their connection to building AI models and systems. We also propose some minimal requirements for considering a model traceable according to the assessment list of the High-Level Expert Group on AI. Our review shows that, although a good number of reproducibility tools are available, a common approach is currently lacking, as are shared semantics. In addition, we have found that some tools have either not achieved full maturity or are already falling into obsolescence or a state of near abandonment by their developers, which might compromise the reproducibility of the research entrusted to them.
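To make the provenance idea concrete, here is a minimal sketch of a traceability record in the spirit of W3C PROV's entity/activity/agent model. The class and field names are assumptions for illustration, not the PROV-DM schema or any reviewed tool's API.

```python
# Illustrative provenance record linking a produced artifact (entity) to the
# process that produced it (activity), who ran it (agent), and its inputs.
# Field names are invented for this sketch, not the W3C PROV-DM vocabulary.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvRecord:
    entity: str                                # artifact produced, e.g. a model file
    activity: str                              # process that produced it
    agent: str                                 # person or tool that ran it
    used: list = field(default_factory=list)   # input entities
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()

# Trace a trained model back to its training data and code version.
record = ProvRecord(
    entity="model-v1.pkl",
    activity="train_classifier",
    agent="ci-pipeline",
    used=["dataset-2021-03.csv", "train.py@abc123"],
)
print(record.entity, "<-", record.used)
```

A chain of such records, one per pipeline step, is essentially what the reviewed tools capture with varying degrees of automation and shared semantics.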
Automated Machine Learning implementation framework in the banking sector
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics.
Automated Machine Learning (AutoML) is a subfield of machine learning designed to make machine learning usable by non-expert users. It arose from the shortage of subject matter experts and aims to reduce the need for human involvement in model implementation. Its main advantages lie in automating implementation work, thereby accelerating machine learning deployment. Organizations benefit from effective benchmarking and validation of candidate solutions. An AutoML implementation framework can deeply transform an organization and add business value by freeing subject matter experts from low-level machine learning projects, letting them focus on high-level ones. It can also help the organization reach new levels of competence, customization, and decision-making at a higher analytical maturity.
This work first investigates the impact and benefits of AutoML implementation in the banking sector, and then develops an implementation framework that banking institutions can use as a guideline for adopting AutoML across their departments. The advantages and benefits of AutoML are evaluated in terms of business value and competitive advantage, and an implementation is presented for a fictitious institution, considering all the necessary steps and the possible setbacks that could arise.
Banking institutions run many different business processes, and since most of them are long-established organizations, their main concerns relate to automating those processes, improving their analytical maturity, and making their workforce aware of the benefits of adopting new ways of working. A successful implementation plan requires knowing the institution's particularities and adapting to them, and it must ensure that the workforce and management understand the investments that need to be made and the changes, at all levels of the organization, that will follow, ultimately making everyone's daily work easier.
Predictive maintenance of electrical grid assets: internship at EDP Distribuição - Energia S.A
Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
This report describes the activities developed during an internship at EDP Distribuição, focusing on a Predictive Maintenance analytics project directed at high-voltage electrical grid assets, including Overhead Lines, Power Transformers, and Circuit Breakers. The project's main goal is to support EDP's asset management processes by improving maintenance and investment planning. Its main deliverables are the Probability of Failure metric, which forecasts asset failures 15 days ahead of time and is estimated through supervised machine learning models; the Health Index metric, which indicates an asset's current state and condition and is implemented through the Ofgem methodology; and two asset management dashboards. The project was implemented by an external service provider, a consulting company, and during the internship it was possible to join the team and participate in the development activities.
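The 15-day-ahead Probability of Failure framing can be sketched as a labeling step for supervised learning: each daily observation of an asset is labeled positive if a failure occurs within the following 15 days. The dates and failure events below are invented for illustration; the report's actual features and models are not reproduced here.

```python
# Sketch of framing 15-day-ahead failure prediction as binary classification:
# label each daily observation 1 if the asset fails within the next 15 days.
# All data here is invented for illustration.
from datetime import date, timedelta

HORIZON = timedelta(days=15)

observations = [date(2021, 1, d) for d in range(1, 31)]  # daily asset snapshots
failures = {date(2021, 1, 20)}                           # recorded failure events

def label(obs_day):
    """1 if a failure occurs within HORIZON after this observation, else 0."""
    return int(any(obs_day < f <= obs_day + HORIZON for f in failures))

labels = {d: label(d) for d in observations}
print(labels[date(2021, 1, 10)], labels[date(2021, 1, 25)])  # prints: 1 0
```

A classifier trained on sensor features with these labels would then output the probability of the positive class as the Probability of Failure.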
Automating data preparation with statistical analysis
Data preparation is the process of transforming raw data into a clean and consumable format. It is widely known as the bottleneck in extracting value and insights from data, due to the number of possible tasks in the pipeline and the factors that can largely affect the results, such as human expertise, application scenarios, and solution methodology. Researchers and practitioners have devised a great variety of techniques and tools over the decades, yet many of them still place a significant burden on humans to configure suitable input rules and parameters. In this thesis, with the goal of reducing manual human effort, we explore using the power of statistical analysis techniques to automate three subtasks in the data preparation pipeline: data enrichment, error detection, and entity matching. Statistical analysis is the process of discovering underlying patterns and trends in data and deducing properties of an underlying probability distribution from a sample, for example, by testing hypotheses and deriving estimates. We first discuss CrawlEnrich, which automatically figures out the queries for data enrichment via web API data by estimating the potential benefit of issuing a given query. Then we study how to derive reusable error detection configuration rules from a web table corpus, so that end users get results with no effort. Finally, we introduce AutoML-EM, which aims to automate the entity matching model development process. Entity matching is the task of finding records that refer to the same real-world entity. Our work provides powerful angles for automating various data preparation steps, and we conclude this thesis by discussing future directions.
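To illustrate the entity matching subtask in its simplest form, the sketch below decides whether two records refer to the same real-world entity using a string-similarity rule. The records and threshold are invented, and AutoML-EM itself learns a matching model rather than applying a fixed rule like this.

```python
# Toy entity matching: flag two records as the same real-world entity when
# their string similarity exceeds a threshold. Records and the 0.8 cutoff
# are illustrative only; learned matchers replace this fixed rule.
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1] between two record strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a, rec_b, threshold=0.8):
    return similarity(rec_a, rec_b) >= threshold

pairs = [
    ("Jon's Coffee House", "Jons Coffee House"),   # likely the same entity
    ("Jon's Coffee House", "Riverside Hardware"),  # clearly different
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

Real entity matchers combine many such similarity signals across attributes and learn the decision boundary from labeled pairs.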
Causality Management and Analysis in Requirement Manuscript for Software Designs
For software design tasks involving natural language, the results of a causal investigation provide valuable and robust semantic information, especially for identifying key variables during product (software) design and product optimization. As the interest in analytical data science shifts from correlations to a better understanding of causality, there is an equal task focused on the accuracy of extracting causality from textual artifacts to aid requirement engineering (RE) decisions. This thesis focuses on identifying, extracting, and classifying causal phrases using word and sentence labeling based on the Bidirectional Encoder Representations from Transformers (BERT) deep learning language model and five machine learning models. The aim is to understand the form and degree of causality based on its impact and prevalence in RE practice. Methodologically, our analysis is centered on RE practice, and we considered 12,438 sentences extracted from 50 requirement engineering manuscripts (REM) for training our machine learning models. Our research finds that causal expressions constitute about 32% of sentences in REM. We applied four evaluation metrics, namely recall, accuracy, precision, and F1, to assess our models' performance and ensure the results' conformity with our study goal. The highest model accuracy, 85%, was achieved by Naive Bayes. Finally, we note that our causal analytic framework is relevant to practitioners for different uses, such as generating test cases for requirement engineers and software developers, and product performance auditing for management stakeholders.
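The four evaluation metrics named above follow directly from confusion-matrix counts; this short sketch computes them for a hypothetical causal-sentence classifier. The counts are invented for illustration and are not the thesis's reported data.

```python
# The four metrics used above (accuracy, precision, recall, F1), computed
# from confusion-matrix counts. The counts below are invented examples.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)        # of predicted-causal, how many were causal
    recall = tp / (tp + fn)           # of truly causal, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g. a causal-sentence classifier evaluated on 100 sentences
acc, prec, rec, f1 = metrics(tp=28, fp=6, fn=4, tn=62)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# prints: accuracy=0.90 precision=0.82 recall=0.88 f1=0.85
```

Reporting all four together matters here because causal sentences are a minority class (about 32%), where accuracy alone can mask poor recall.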
Semantic code search using Code2Vec: A bag-of-paths model
The world is moving towards an age centered around digital artifacts created by individuals: not only are digital artifacts being created at an alarming rate, but the software needed to manage them is also growing faster than ever. Most software comprises a large number of source code files, so code search has become an intrinsic part of the software development process, and the universe of source code is only growing. Although general-purpose search engines such as Google and Bing are used for code search, they are not dedicated to software code search. Moreover, keyword-based search may not return relevant documents when the search keyword is not present in the candidate documents, and it does not take into account the semantic and syntactic properties of software artifacts such as source code. Semantic search (in the context of software engineering) is an emerging area of research that explores the efficiency of searching a code base using natural language queries. In this thesis, we aim to provide developers with the ability to locate source code blocks/snippets through semantic search built on neural models.
Neural models can represent natural language using vectors that have been shown to carry semantic meaning and are used in various NLP tasks. Specifically, we use Code2Vec, a model that learns distributed representations of source code called code embeddings, and evaluate its performance on the task of semantically searching code snippets. The main idea behind using Code2Vec is that source code is structurally different from natural language, and a model that exploits the syntactic nature of source code can help in learning its semantic properties. We pair Code2Vec with other neural models that represent natural language through vectors to create a hybrid model that outperforms previous benchmark baseline models developed in the CodeSearchNet challenge. We also study the impact of various metadata (such as repository popularity, code snippet token length, etc.) on the relevance of the retrieved code snippets.
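The retrieval step in embedding-based semantic code search can be sketched as follows: embed the query and each code snippet into the same vector space, then rank snippets by cosine similarity. The toy 3-dimensional vectors below stand in for real Code2Vec and natural-language embeddings, which have hundreds of dimensions.

```python
# Sketch of embedding-based semantic code search: rank snippets by cosine
# similarity between the query embedding and each snippet embedding.
# The 3-d vectors are invented stand-ins for real learned embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

snippets = {
    "def read_file(path): ...": [0.9, 0.1, 0.0],
    "def sort_list(xs): ...":   [0.1, 0.8, 0.2],
    "def http_get(url): ...":   [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # e.g. embedding of "load a file from disk"

ranked = sorted(snippets, key=lambda s: cosine(query_vec, snippets[s]),
                reverse=True)
print(ranked[0])  # prints: def read_file(path): ...
```

The hybrid model described above affects how the two embedding spaces are aligned; the ranking step itself stays this simple.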