Helicopter parenting through the lens of reddit: A text mining study
The study aimed to understand Reddit users' experiences with helicopter parenting through first-hand accounts. Text mining and natural language processing techniques were employed to extract data from the subreddit r/helicopterparents. A total of 713 original posts were processed from unstructured text into tidy formats. Latent Dirichlet Allocation (LDA), a popular topic modeling method, was used to discover hidden themes within the corpus. The data revealed common environmental contexts of helicopter parenting (i.e., school, college, work, and home) and its implications for college decisions, privacy, and social relationships. These findings collectively suggest the importance of autonomy-supportive parenting and mindfulness interventions as viable solutions to the problems posed by helicopter parenting. In addition, the findings lend support to past research that has identified more maternal than paternal models of helicopter parenting. Further research on the implications of the COVID-19 pandemic for helicopter parenting is warranted.
Traceability for trustworthy AI: a review of models and tools
Traceability is considered a key requirement for trustworthy artificial intelligence (AI), related to the need to maintain a complete account of the provenance of the data, processes, and artifacts involved in the production of an AI model. Traceability in AI shares part of its scope with general-purpose recommendations for provenance such as W3C PROV, and it is also supported to different extents by specific tools used by practitioners as part of their efforts to make data analytic processes reproducible or repeatable. Here, we review relevant tools, practices, and data models for traceability in their connection to building AI models and systems. We also propose some minimal requirements for considering a model traceable according to the assessment list of the High-Level Expert Group on AI. Our review shows that, although a good number of reproducibility tools are available, a common approach is currently lacking, as are shared semantics. In addition, we have found that some tools have either not achieved full maturity or are already falling into obsolescence or a state of near abandonment by their developers, which might compromise the reproducibility of the research entrusted to them.
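To make the provenance idea concrete, here is a minimal sketch of a traceability record in the spirit of W3C PROV's entity/activity/agent model. The class and field names are assumptions for illustration, not the PROV-DM schema or any reviewed tool's API.

```python
# Illustrative provenance record linking a produced artifact (entity) to the
# process that produced it (activity), who ran it (agent), and its inputs.
# Field names are invented for this sketch, not the W3C PROV-DM vocabulary.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvRecord:
    entity: str                                # artifact produced, e.g. a model file
    activity: str                              # process that produced it
    agent: str                                 # person or tool that ran it
    used: list = field(default_factory=list)   # input entities
    timestamp: str = ""

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()

# Trace a trained model back to its training data and code version.
record = ProvRecord(
    entity="model-v1.pkl",
    activity="train_classifier",
    agent="ci-pipeline",
    used=["dataset-2021-03.csv", "train.py@abc123"],
)
print(record.entity, "<-", record.used)
```

A chain of such records, one per pipeline step, is essentially what the reviewed tools capture with varying degrees of automation and shared semantics.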
Automated Machine Learning implementation framework in the banking sector
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics.
Automated Machine Learning (AutoML) is a subfield of machine learning designed to make machine learning usable by non-expert users. It arose from the shortage of subject matter experts and aims to reduce the need for human involvement in model implementation. Its main advantages lie in automating implementation work, thereby accelerating machine learning deployment. Organizations benefit from effective benchmarking and validation of candidate solutions. An AutoML implementation framework can deeply transform an organization and add business value by freeing subject matter experts from low-level machine learning projects, letting them focus on high-level ones. It can also help the organization reach new levels of competence, customization, and decision-making at a higher analytical maturity.
This work first investigates the impact and benefits of AutoML implementation in the banking sector, and then develops an implementation framework that banking institutions can use as a guideline for adopting AutoML across their departments. The advantages and benefits of AutoML are evaluated in terms of business value and competitive advantage, and an implementation is presented for a fictitious institution, considering all the necessary steps and the possible setbacks that could arise.
Banking institutions run many different business processes, and since most of them are long-established organizations, their main concerns relate to automating those processes, improving their analytical maturity, and making their workforce aware of the benefits of adopting new ways of working. A successful implementation plan requires knowing the institution's particularities and adapting to them, and it must ensure that the workforce and management understand the investments that need to be made and the changes, at all levels of the organization, that will follow, ultimately making everyone's daily work easier.
Predictive maintenance of electrical grid assets: internship at EDP Distribuição - Energia S.A
Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
This report describes the activities developed during an internship at EDP Distribuição, focusing on a Predictive Maintenance analytics project directed at high-voltage electrical grid assets, including Overhead Lines, Power Transformers, and Circuit Breakers. The project's main goal is to support EDP's asset management processes by improving maintenance and investment planning. Its main deliverables are the Probability of Failure metric, which forecasts asset failures 15 days ahead of time and is estimated through supervised machine learning models; the Health Index metric, which indicates an asset's current state and condition and is implemented through the Ofgem methodology; and two asset management dashboards. The project was implemented by an external service provider, a consulting company, and during the internship it was possible to join the team and participate in the development activities.
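The 15-day-ahead Probability of Failure framing can be sketched as a labeling step for supervised learning: each daily observation of an asset is labeled positive if a failure occurs within the following 15 days. The dates and failure events below are invented for illustration; the report's actual features and models are not reproduced here.

```python
# Sketch of framing 15-day-ahead failure prediction as binary classification:
# label each daily observation 1 if the asset fails within the next 15 days.
# All data here is invented for illustration.
from datetime import date, timedelta

HORIZON = timedelta(days=15)

observations = [date(2021, 1, d) for d in range(1, 31)]  # daily asset snapshots
failures = {date(2021, 1, 20)}                           # recorded failure events

def label(obs_day):
    """1 if a failure occurs within HORIZON after this observation, else 0."""
    return int(any(obs_day < f <= obs_day + HORIZON for f in failures))

labels = {d: label(d) for d in observations}
print(labels[date(2021, 1, 10)], labels[date(2021, 1, 25)])  # prints: 1 0
```

A classifier trained on sensor features with these labels would then output the probability of the positive class as the Probability of Failure.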
Automating data preparation with statistical analysis
Data preparation is the process of transforming raw data into a clean and consumable format. It is widely known as the bottleneck in extracting value and insights from data, due to the number of possible tasks in the pipeline and the factors that can largely affect the results, such as human expertise, application scenarios, and solution methodology. Researchers and practitioners have devised a great variety of techniques and tools over the decades, yet many of them still place a significant burden on humans to configure suitable input rules and parameters. In this thesis, with the goal of reducing manual human effort, we explore using the power of statistical analysis techniques to automate three subtasks in the data preparation pipeline: data enrichment, error detection, and entity matching. Statistical analysis is the process of discovering underlying patterns and trends in data and deducing properties of an underlying probability distribution from a sample, for example, by testing hypotheses and deriving estimates. We first discuss CrawlEnrich, which automatically figures out the queries for data enrichment via web API data by estimating the potential benefit of issuing a given query. Then we study how to derive reusable error detection configuration rules from a web table corpus, so that end users get results with no effort. Finally, we introduce AutoML-EM, which aims to automate the entity matching model development process. Entity matching is the task of finding records that refer to the same real-world entity. Our work provides powerful angles for automating various data preparation steps, and we conclude this thesis by discussing future directions.
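To illustrate the entity matching subtask in its simplest form, the sketch below decides whether two records refer to the same real-world entity using a string-similarity rule. The records and threshold are invented, and AutoML-EM itself learns a matching model rather than applying a fixed rule like this.

```python
# Toy entity matching: flag two records as the same real-world entity when
# their string similarity exceeds a threshold. Records and the 0.8 cutoff
# are illustrative only; learned matchers replace this fixed rule.
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1] between two record strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a, rec_b, threshold=0.8):
    return similarity(rec_a, rec_b) >= threshold

pairs = [
    ("Jon's Coffee House", "Jons Coffee House"),   # likely the same entity
    ("Jon's Coffee House", "Riverside Hardware"),  # clearly different
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

Real entity matchers combine many such similarity signals across attributes and learn the decision boundary from labeled pairs.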
Causality Management and Analysis in Requirement Manuscript for Software Designs
For software design tasks involving natural language, the results of a causal investigation provide valuable and robust semantic information, especially for identifying key variables during product (software) design and product optimization. As the interest in analytical data science shifts from correlations to a better understanding of causality, there is an equal task focused on the accuracy of extracting causality from textual artifacts to aid requirement engineering (RE) decisions. This thesis focuses on identifying, extracting, and classifying causal phrases using word and sentence labeling based on the Bidirectional Encoder Representations from Transformers (BERT) deep learning language model and five machine learning models. The aim is to understand the form and degree of causality based on its impact and prevalence in RE practice. Methodologically, our analysis is centered on RE practice, and we considered 12,438 sentences extracted from 50 requirement engineering manuscripts (REM) for training our machine learning models. Our research finds that causal expressions constitute about 32% of sentences in REM. We applied four evaluation metrics, namely recall, accuracy, precision, and F1, to assess our models' performance and ensure the results' conformity with our study goal. The highest model accuracy, 85%, was achieved by Naive Bayes. Finally, we note that our causal analytic framework is relevant to practitioners for different uses, such as generating test cases for requirement engineers and software developers, and product performance auditing for management stakeholders.
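The four evaluation metrics named above follow directly from confusion-matrix counts; this short sketch computes them for a hypothetical causal-sentence classifier. The counts are invented for illustration and are not the thesis's reported data.

```python
# The four metrics used above (accuracy, precision, recall, F1), computed
# from confusion-matrix counts. The counts below are invented examples.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)        # of predicted-causal, how many were causal
    recall = tp / (tp + fn)           # of truly causal, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g. a causal-sentence classifier evaluated on 100 sentences
acc, prec, rec, f1 = metrics(tp=28, fp=6, fn=4, tn=62)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# prints: accuracy=0.90 precision=0.82 recall=0.88 f1=0.85
```

Reporting all four together matters here because causal sentences are a minority class (about 32%), where accuracy alone can mask poor recall.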
Semantic code search using Code2Vec: A bag-of-paths model
The world is moving towards an age centered around digital artifacts created by individuals: not only are digital artifacts being created at an alarming rate, but the software needed to manage them is also growing faster than ever. Most software comprises a large number of source code files, so code search has become an intrinsic part of the software development process, and the universe of source code is only growing. Although general-purpose search engines such as Google and Bing are used for code search, they are not dedicated to software code search. Moreover, keyword-based search may not return relevant documents when the search keyword is not present in the candidate documents, and it does not take into account the semantic and syntactic properties of software artifacts such as source code. Semantic search (in the context of software engineering) is an emerging area of research that explores the efficiency of searching a code base using natural language queries. In this thesis, we aim to provide developers with the ability to locate source code blocks/snippets through semantic search built on neural models.
Neural models can represent natural language using vectors that have been shown to carry semantic meaning and are used in various NLP tasks. Specifically, we use Code2Vec, a model that learns distributed representations of source code called code embeddings, and evaluate its performance on the task of semantically searching code snippets. The main idea behind using Code2Vec is that source code is structurally different from natural language, and a model that exploits the syntactic nature of source code can help in learning its semantic properties. We pair Code2Vec with other neural models that represent natural language through vectors to create a hybrid model that outperforms previous benchmark baseline models developed in the CodeSearchNet challenge. We also study the impact of various metadata (such as repository popularity, code snippet token length, etc.) on the relevance of the retrieved code snippets.
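The retrieval step in embedding-based semantic code search can be sketched as follows: embed the query and each code snippet into the same vector space, then rank snippets by cosine similarity. The toy 3-dimensional vectors below stand in for real Code2Vec and natural-language embeddings, which have hundreds of dimensions.

```python
# Sketch of embedding-based semantic code search: rank snippets by cosine
# similarity between the query embedding and each snippet embedding.
# The 3-d vectors are invented stand-ins for real learned embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

snippets = {
    "def read_file(path): ...": [0.9, 0.1, 0.0],
    "def sort_list(xs): ...":   [0.1, 0.8, 0.2],
    "def http_get(url): ...":   [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # e.g. embedding of "load a file from disk"

ranked = sorted(snippets, key=lambda s: cosine(query_vec, snippets[s]),
                reverse=True)
print(ranked[0])  # prints: def read_file(path): ...
```

The hybrid model described above affects how the two embedding spaces are aligned; the ranking step itself stays this simple.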