Search CORE

3,969 research outputs found

Interactivity to improve visual analysis in groups with different literacy levels

Author: Cabral Pedro
Oliveira Mônica
Taibi Davide
Publication venue: AIS Electronic Library (AISeL)
Publication date: 16/10/2021

The study presented in this paper uses statistical tests to investigate how two groups with different literacy levels perceive interactive visualizations. A prototype with interactive visualizations created in Microsoft Power BI was used. Validation was performed with quantitative and qualitative metrics, tested with a single-factor ANOVA. Three of the variables showed statistically significant differences between the groups: accuracy, complexity, and comprehension. This highlights the importance of data literacy in comprehending visualizations, since differences in literacy lead to a gap between the two groups. The line, pie, and bar charts were considered the best visualizations by both groups, and the bubble chart the worst. Regarding the interactive components, the filter and the slider were rated well by both groups. Using this study, organizations will be able to create appropriate visualizations for different audiences.
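As a rough illustration of what a single-factor ANOVA computes, the sketch below derives the F statistic for two or more groups from the between-group and within-group sums of squares. The function name and the example scores are hypothetical, not the study's data or procedure; in practice `scipy.stats.f_oneway` returns both F and the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of each group mean around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical comprehension scores for two groups
f, dfb, dfw = one_way_anova_f([1, 2, 3], [4, 5, 6])
print(f, dfb, dfw)  # 13.5 1 4
```

A large F means the group means differ more than the within-group noise would suggest; the p-value then comes from the F distribution with (df_between, df_within) degrees of freedom.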

Repositório da Universidade Nova de Lisboa

AIS Electronic Library (AISeL)

Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

Author: Carneiro Davide Rua
Guimarães Miguel
Novais Paulo
Oliveira Filipe
Oliveira Óscar
Publication venue: Taylor & Francis
Publication date: 01/06/2023

As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimal block size and the best heuristic for creating distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021 and CPCA/IAC/AV/475278/202

Universidade do Minho: RepositoriUM

Assessing dental symmetry: introduction of the Symmetry Measure Score (SMS) in periodontal disease analysis

Author: Carvalho Davide
Mubayi Anuj
Oliveira Teresa
Pereira J. A.
Publication venue: IMS - Institute of Mathematical Statistics
Publication date: 17/12/2023

IMS and CEAUL

Repositório Aberto da Universidade Aberta

The impact of data selection strategies on distributed model performance

Author: Carneiro Davide Rua
Guimarães Miguel
Novais Paulo
Oliveira Filipe
Publication venue: Springer
Publication date: 01/09/2023

Distributed Machine Learning, in which data and learning tasks are scattered across a cluster of computers, is one of the field's answers to the challenges posed by Big Data. Still, in an era in which data abounds, decisions must be made regarding which specific data to use for training the model, either because the amount of available data is simply too large, or because the training time or complexity of the model must be kept low. Typical approaches include, for example, selection based on data freshness. However, old data are not necessarily outdated and might still contain relevant patterns. Likewise, relying only on recent data may significantly reduce data diversity and representativeness, and thus model quality. The goal of this paper is to compare different heuristics for selecting data in a distributed Machine Learning scenario. Specifically, we ascertain whether selecting data based on their characteristics (meta-features), and optimizing for maximum diversity, improves model quality while potentially allowing model complexity to be reduced. This will enable more informed data selection strategies in distributed settings, in which the criteria are not only the location of the data or the state of each node in the cluster, but also intrinsic and relevant characteristics of the data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI/COM/0706/2021 and CPCA-IAC/AV/475278/202
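The abstract does not specify how "maximum diversity" is optimized. One generic way to do it, shown here only as an assumption-laden sketch, is greedy farthest-point (max-min) selection over meta-feature vectors: each data block is summarized by a hypothetical tuple of numeric meta-features, and blocks are picked one at a time to be as far as possible from those already chosen. The function name and the block representation are illustrative, not the authors' method.

```python
import math


def select_diverse(meta_features, k):
    """Greedily pick up to k blocks whose meta-feature vectors are mutually far apart.

    meta_features: dict mapping a block id to a tuple of numeric meta-features
    (hypothetical representation, e.g. normalized mean, variance, class balance).
    """
    ids = list(meta_features)
    if k <= 0 or not ids:
        return []
    # Start from the block farthest from the centroid of all blocks
    dims = len(next(iter(meta_features.values())))
    centroid = [sum(v[d] for v in meta_features.values()) / len(ids) for d in range(dims)]
    first = max(ids, key=lambda i: math.dist(meta_features[i], centroid))
    chosen = [first]
    while len(chosen) < min(k, len(ids)):
        # Farthest-point step: maximize the distance to the nearest already-chosen block
        candidate = max(
            (i for i in ids if i not in chosen),
            key=lambda i: min(math.dist(meta_features[i], meta_features[c]) for c in chosen),
        )
        chosen.append(candidate)
    return chosen


blocks = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (10.0, 0.0), "d": (0.0, 10.0)}
print(select_diverse(blocks, 3))  # ['d', 'c', 'a']: the near-duplicate "b" is skipped
```

The max-min criterion is a cheap approximation of maximal diversity; it tends to skip near-duplicate blocks, which is the behaviour a diversity-driven selection strategy is after.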

Universidade do Minho: RepositoriUM

Extracting behavioural patterns from a negotiation game

Author: Davide Carneiro
José Neves
Marco Gomes
Paulo Novais
Tiago Oliveira
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

The work presented focuses not only on the behavioural patterns that influence the outcome of a negotiation, but also on ways to predict the type of conflict used in the process and the stress levels of the actors. After setting up an experimental intelligent environment equipped with sensors to capture behavioural and contextual information, a set of relevant data was collected and analysed, with the underlying objective of using the behavioural patterns (obtained by statistical/probabilistic methods) as a basis for designing and presenting plans and suggestions to the participants. Indeed, these proposals may positively influence many aspects of the course and outcome of a negotiation task. This work highlights the importance of knowledge in negotiation, as in other social forms of interaction, and also provides new insights for informed decision support in situations in which uncertainty and conflict may be present.

Universidade do Minho: RepositoriUM

Crossref

Population genetic structure of three species in the genus Astrocaryum G. Mey. (Arecaceae).

Author: DAVIDE L. C.
KALISZ S.
OLIVEIRA M. do S. P. de
OLIVEIRA N. P.
Publication venue: Genetics and Molecular Research
Publication date: 04/10/2017

We assessed the level and distribution of genetic diversity in three species of the economically important palm genus Astrocaryum located in Pará State, in northern Brazil. Samples of Astrocaryum aculeatum were collected in three municipalities (Belterra, Santarém, and Terra Santa); samples of A. murumuru (Belém and Santo Antônio do Tauá) and of A. paramaca (Belém and Ananindeua) were each collected in two municipalities. Eight microsatellite loci amplified well and were used for genetic analysis. The mean numbers of alleles per locus for A. aculeatum, A. murumuru, and A. paramaca were 2.33, 2.38, and 2.06, respectively. Genetic diversity was similar for the three species, ranging from HE = 0.222 in A. aculeatum to HE = 0.254 in A. murumuru. Both FST and AMOVA showed that most of the genetic variation was found within populations for all three species, but high genetic differentiation among populations was found for A. aculeatum. Three loci were not in Hardy-Weinberg equilibrium, with populations of A. paramaca showing a tendency toward an excess of heterozygotes (FIS = -0.144). Gene flow was high among populations of A. paramaca (Nm = 19.35). Our results suggest that the genetic diversity within populations followed the genetic differentiation among populations due to high gene flow among the populations. Greater geographic distances among the three collection sites for A. aculeatum likely hampered gene flow for this species.
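For readers less familiar with the statistics quoted above, the sketch below shows two textbook quantities: expected heterozygosity per locus, HE = 1 - Σ pᵢ², averaged over loci, and the indirect gene-flow estimate from Wright's island model, Nm ≈ (1/FST - 1)/4. The abstract does not state which estimators the authors used (for example, Nei's unbiased HE adds a 2n/(2n - 1) sample-size correction, and Nm can also be derived from other statistics), so treat this as a generic illustration with hypothetical function names and made-up inputs.

```python
def expected_heterozygosity(allele_freqs):
    """Gene diversity at one locus: H_E = 1 - sum(p_i^2), without small-sample correction."""
    total = sum(allele_freqs)
    if total <= 0:
        raise ValueError("allele frequencies must sum to a positive value")
    # Normalize, so raw allele counts can be passed instead of frequencies
    return 1.0 - sum((p / total) ** 2 for p in allele_freqs)


def mean_expected_heterozygosity(loci):
    """Average H_E over loci; loci is a list of per-locus allele-frequency lists."""
    return sum(expected_heterozygosity(locus) for locus in loci) / len(loci)


def gene_flow_from_fst(fst):
    """Wright's island-model approximation: Nm ~ (1 / F_ST - 1) / 4."""
    if not 0 < fst <= 1:
        raise ValueError("F_ST must be in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


print(expected_heterozygosity([30, 10]))  # 0.375 (allele frequencies 0.75 and 0.25)
print(gene_flow_from_fst(0.2))            # 1.0 migrant per generation
```

The island-model formula makes the inverse relationship explicit: the smaller FST is, the larger the implied Nm, which is why high gene flow goes together with low differentiation among populations.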

Repository Open Access to Scientific Information from Embrapa

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Predicting model training time to optimize distributed machine learning applications

Author: Alves Victor
Carneiro Davide
Guimarães Miguel
Novais Paulo
Oliveira Filipe
Oliveira Óscar
Palumbo Guilherme
Publication venue: Multidisciplinary Digital Publishing Institute
Publication date: 08/02/2023

Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. These stem mostly from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models made up of different base models, trained on different, distributed subsets of data. Specifically, we address the problem of predicting the training time of a given model from its characteristics and the characteristics of the data. Because creating an Ensemble may require training hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for efficient management of the cluster's computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach predicts the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021, and CPCA-IAC/AV/475278/2022
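To see why per-task duration predictions matter for makespan, consider a simple scheduler that consumes them. The sketch below uses the classic Longest-Processing-Time-first (LPT) greedy heuristic: sort tasks by predicted duration and always give the next one to the least-loaded node. This is a swapped-in textbook technique, not the CEDEs scheduler, and the function name and predicted times are hypothetical.

```python
import heapq


def lpt_schedule(predicted_times, n_nodes):
    """Assign training tasks to nodes with the Longest-Processing-Time-first heuristic.

    predicted_times: dict task_id -> predicted training time in seconds (hypothetical)
    Returns (assignment: node -> list of task ids, makespan in seconds).
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    # Min-heap of (current load, node index); all loads start at zero
    heap = [(0.0, node) for node in range(n_nodes)]
    assignment = {node: [] for node in range(n_nodes)}
    # Longest tasks first, each one to the currently least-loaded node
    for task, t in sorted(predicted_times.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + t, node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


plan, makespan = lpt_schedule({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, n_nodes=2)
print(plan, makespan)  # {0: ['a', 'd'], 1: ['b', 'c', 'e']} 10.0
```

LPT is only as good as its inputs: if the predicted durations are badly off (for instance, the larger errors reported for Neural Networks than for Decision Trees), the realized makespan can drift from the planned one.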

Universidade do Minho: RepositoriUM

Learning and Mining Player Motion Profiles in Physically Interactive Robogames

Author: Bonarini Andrea
Morreale Luca
Nascimento Tiago
Oliveira Ewerton
Orrù Davide
Publication venue: MDPI AG
Publication date: 01/01/2018

Physically Interactive RoboGames (PIRG) are an emerging application whose aim is to develop robotic agents able to interact with and engage humans in a game situation. In this framework, learning a model of players' activity is relevant both to understand their engagement and to understand the specific strategies they adopt, which in turn can foster game adaptation. Following these directions, and given the lack of quantitative methods for player modeling in PIRG, we propose a methodology for representing players as a mixture of existing player types uncovered from data. This is done while dealing both with the intrinsic uncertainty associated with the setting and with the agent's need to act in real time to support the game interaction. Our methodology first encodes time series data generated from player-robot interaction into images, in particular Gramian angular field images, to represent continuous data. To these, we apply latent Dirichlet allocation to summarize the player's motion style as a probabilistic mixture of different styles discovered from data. This approach has been tested on a dataset collected from a real, physical robot game, where activity patterns are extracted by using a custom three-axis accelerometer sensor module. The obtained results suggest that the proposed system is able to provide a robust description of the player interaction.
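The Gramian angular field encoding mentioned above can be sketched in a few lines. The version below is the summation variant (GASF): rescale the series to [-1, 1], map each value to an angle φ = arccos(x), and fill the image with cos(φᵢ + φⱼ). The abstract does not say whether the authors used the summation or difference variant, or whether they downsampled the series first, so this is a generic illustration with a hypothetical function name, not their pipeline.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a 1-D time series as a GASF image (list of lists of floats).

    Steps: min-max rescale to [-1, 1], map each value to an angle
    phi = arccos(x), then set G[i][j] = cos(phi_i + phi_j).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1]; clamp to guard against floating-point overshoot before acos
    scaled = [max(-1.0, min(1.0, (2 * x - hi - lo) / (hi - lo))) for x in series]
    phi = [math.acos(x) for x in scaled]
    # Pairwise angular sums give a symmetric n x n image
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical accelerometer magnitudes
image = gramian_angular_summation_field([0.0, 1.0, 2.0])
for row in image:
    print([round(v, 3) for v in row])
```

The main diagonal equals cos(2φᵢ) = 2xᵢ² - 1, so the rescaled series can be recovered from the image up to sign; this is one reason the encoding is considered a faithful way to turn continuous signals into images.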

Multidisciplinary Digital Publishing Institute

Archivio istituzionale della ricerca - Politecnico di Milano

Directory of Open Access Journals

Dynamic management of distributed machine learning projects

Author: Alves André
Carneiro Davide Rua
Monteiro José
Moço Hugo
Novais Paulo
Oliveira Filipe
Oliveira Óscar
Publication venue: Springer
Publication date: 01/04/2023

Given the new requirements of Machine Learning problems in recent years, especially with regard to the volume, diversity and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of its blocks, replication and balancing. CEDEs trains models in a distributed manner following the principle of data locality, and is able to change parts of the model through an optimization module, thus allowing a model to evolve over time as the data change. This paper describes its generic architecture, details the implementation of the first modules, and provides a first validation. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020 and EXPL/CCI-COM/0706/2021

Universidade do Minho: RepositoriUM

CMA chromosomal banding in species of Astrocaryum spp.

Author: DAVIDE L. C.
OLIVEIRA M. do S. P. de
OLIVEIRA N. P. de
SANTOS Y. D.
Publication date: 03/12/2014

Repository Open Access to Scientific Information from Embrapa