3,969 research outputs found
Interactivity to improve visual analysis in groups with different literacy levels
The study presented in this paper uses statistical tests to investigate how two groups with different levels of data literacy perceive interactive visualizations. A prototype with interactive visualizations created in Microsoft Power BI was used. Validation relied on quantitative and qualitative metrics, tested with single-factor ANOVA. Three of the variables showed statistically significant differences between groups: accuracy, complexity, and comprehension. This highlights the importance of data literacy in comprehending visualizations and points to a gap between the two groups. Both groups rated the line, pie, and bar charts as the best visualizations and the bubble chart as the worst. Regarding the interactive component, the filter and the slider were rated well by both groups. Using this study, organizations will be able to create appropriate visualizations for different audiences.
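As an illustration of the single-factor ANOVA mentioned above, the following minimal sketch computes the F statistic for two groups using only the Python standard library. The function name `one_way_anova_f` and the scores are hypothetical; they are not taken from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical comprehension scores (1-5 scale) for the two groups.
high_literacy = [4, 5, 4, 5, 4]
low_literacy = [2, 3, 3, 2, 3]

f_stat, df_b, df_w = one_way_anova_f([high_literacy, low_literacy])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the given degrees of freedom indicates a between-group difference; obtaining the p-value requires an F-distribution function, which is omitted here.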
Block size, parallelism and predictive performance: finding the sweet spot in distributed learning
As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021 and CPCA/IAC/AV/475278/202
Assessing dental symmetry: introduction of the Symmetry Measure Score (SMS) in periodontal disease analysis
IMS and CEAUL
The impact of data selection strategies on distributed model performance
Distributed Machine Learning, in which data and learning tasks are scattered across a cluster of computers, is one of the field's answers to the challenges posed by Big Data. Even in an era in which data abounds, decisions must still be made regarding which specific data to use for training the model, either because the amount of available data is simply too large, or because the training time or complexity of the model must be kept low. Typical approaches include, for example, selection based on data freshness. However, old data are not necessarily outdated and might still contain relevant patterns. Likewise, relying only on recent data may significantly decrease data diversity and representativity, and thereby decrease model quality. The goal of this paper is to compare different heuristics for selecting data in a distributed Machine Learning scenario. Specifically, we ascertain whether selecting data based on their characteristics (meta-features), and optimizing for maximum diversity, improves model quality while eventually allowing model complexity to be reduced. This will allow the development of more informed data selection strategies in distributed settings, in which the criteria are not only the location of the data or the state of each node in the cluster, but also intrinsic and relevant characteristics of the data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI/COM/0706/2021 and CPCA-IAC/AV/475278/202
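One way to picture selection by meta-features with a diversity objective is a greedy farthest-point heuristic, sketched below. This is an assumption made for illustration, not necessarily one of the heuristics compared in the paper; the function `select_diverse_blocks`, the block identifiers, and the two-dimensional meta-feature vectors are all hypothetical.

```python
import math


def select_diverse_blocks(meta_features, k):
    """Greedily pick up to k block ids whose meta-feature vectors are far apart.

    meta_features maps a block id to a tuple of numeric meta-features.
    Each step adds the block whose nearest already-selected block is farthest.
    """
    ids = list(meta_features)
    if k <= 0 or not ids:
        return []

    selected = [ids[0]]  # arbitrary seed: the first block
    while len(selected) < min(k, len(ids)):
        candidates = [i for i in ids if i not in selected]
        best = max(
            candidates,
            key=lambda i: min(
                math.dist(meta_features[i], meta_features[s]) for s in selected
            ),
        )
        selected.append(best)
    return selected


# Hypothetical meta-features per data block, e.g. (class balance, mean skewness).
blocks = {
    "b0": (0.0, 0.0),
    "b1": (0.1, 0.0),
    "b2": (1.0, 1.0),
    "b3": (0.9, 1.0),
    "b4": (0.0, 1.0),
}

chosen = select_diverse_blocks(blocks, 3)
print(chosen)
```

Near-duplicate blocks such as `b1` and `b3` are skipped because they add little diversity to the blocks already chosen.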
Extracting behavioural patterns from a negotiation game
The work presented focuses not only on the behavioural patterns that influence the outcome of a negotiation, but also on the discovery of ways to predict the type of conflict used in the process and the stress levels of the actors. After setting up an experimental intelligent environment equipped with sensors to capture behavioural and contextual information, a set of relevant data was collected and analysed, with the underlying objective of using the behavioural patterns (obtained by statistical/probabilistic methods) as a basis to design and present plans and suggestions to the associated participants. Indeed, these proposals may positively influence the course and outcome of a negotiation task in many respects. This work highlights the importance of knowledge in negotiation, as in other forms of social interaction, and also provides some new insights for informed decision support in situations in which uncertainty and conflict may be present.
Population genetic structure of three species in the genus Astrocaryum G. Mey. (Arecaceae).
We assessed the level and distribution of genetic diversity in three species of the economically important palm genus Astrocaryum located in Pará State, in northern Brazil. Samples were collected in three municipalities for Astrocaryum aculeatum (Belterra, Santarém, and Terra Santa) and in two municipalities each for A. murumuru (Belém and Santo Antônio do Tauá) and A. paramaca (Belém and Ananindeua). Eight microsatellite loci amplified well and were used for genetic analysis. The mean numbers of alleles per locus for A. aculeatum, A. murumuru, and A. paramaca were 2.33, 2.38, and 2.06, respectively. Genetic diversity was similar for the three species, ranging from HE = 0.222 in A. aculeatum to HE = 0.254 in A. murumuru. Both FST and AMOVA showed that most of the genetic variation was found within populations for all three species, but high genetic differentiation among populations was found for A. aculeatum. Three loci were not in Hardy-Weinberg equilibrium, with populations of A. paramaca showing a tendency toward an excess of heterozygotes (FIS = -0.144). Gene flow was high for populations of A. paramaca (Nm = 19.35). Our results suggest that the genetic diversity within populations followed the genetic differentiation among populations due to high gene flow among the populations. Greater geographic distances among the three collection sites for A. aculeatum likely hampered gene flow for this species.
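For readers unfamiliar with the statistics quoted above, the standard textbook definitions are sketched below. The abstract does not state which estimators the authors used, so the gene-flow relation (Wright's island-model approximation) is an assumption made here for illustration only.

```latex
% Expected heterozygosity at a locus with allele frequencies p_i
H_E = 1 - \sum_i p_i^2

% Inbreeding coefficient; F_{IS} < 0 means observed heterozygosity H_O
% exceeds H_E, i.e. an excess of heterozygotes
F_{IS} = 1 - \frac{H_O}{H_E}

% Wright's island-model approximation for the number of migrants per generation
Nm \approx \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right)
\quad\Longleftrightarrow\quad
F_{ST} \approx \frac{1}{1 + 4\,Nm}
```

Under this approximation, and only if the reported value was derived this way, Nm = 19.35 would correspond to FST ≈ 1 / (1 + 4 × 19.35) ≈ 0.013, which indicates very low differentiation among populations.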
Predicting model training time to optimize distributed machine learning applications
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for efficient management of the cluster’s computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021, and CPCA-IAC/AV/475278/2022.
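To show why predicted training times help reduce makespan, here is a minimal sketch of the classic longest-processing-time-first (LPT) heuristic, which assigns the longest predicted task to the currently least-loaded worker. This is an illustrative assumption and not the scheduler described in the paper; the task names, predicted durations, and the function `schedule_lpt` are hypothetical.

```python
import heapq


def schedule_lpt(predicted_times, n_workers):
    """Assign tasks to workers with the longest-processing-time-first heuristic.

    predicted_times maps a task name to its predicted training time in seconds.
    Returns (assignment, makespan), where assignment maps worker index to tasks.
    """
    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    loads = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(n_workers)}

    # Place the longest tasks first so short ones can fill the remaining gaps.
    ordered = sorted(predicted_times.items(), key=lambda kv: kv[1], reverse=True)
    for task, seconds in ordered:
        load, worker = heapq.heappop(loads)
        assignment[worker].append(task)
        heapq.heappush(loads, (load + seconds, worker))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


# Hypothetical predicted training times (seconds) for five base models.
predicted = {"nn_1": 20.0, "nn_2": 18.0, "dt_1": 3.0, "dt_2": 2.0, "dt_3": 1.0}

assignment, makespan = schedule_lpt(predicted, n_workers=2)
print(assignment, makespan)
```

LPT is a heuristic and does not guarantee the optimal makespan, and its quality depends directly on how accurate the predicted durations are.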
Learning and Mining Player Motion Profiles in Physically Interactive Robogames
Physically-Interactive RoboGames (PIRG) are an emerging application whose aim is to develop robotic agents able to interact with and engage humans in a game situation. In this framework, learning a model of players’ activity is relevant both to understand their engagement and to understand the specific strategies they adopt, which in turn can foster game adaptation. Following these directions, and given the lack of quantitative methods for player modeling in PIRG, we propose a methodology for representing players as a mixture of existing player types uncovered from data. This is done by dealing both with the intrinsic uncertainty associated with the setting and with the agent's need to act in real time to support the game interaction. Our methodology first focuses on encoding time series data generated from player-robot interaction into images, in particular Gramian angular field images, to represent continuous data. To these, we apply latent Dirichlet allocation to summarize the player’s motion style as a probabilistic mixture of different styles discovered from data. This approach has been tested on a dataset collected from a real, physical robot game, where activity patterns are extracted by using a custom three-axis accelerometer sensor module. The obtained results suggest that the proposed system is able to provide a robust description of the player interaction.
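The Gramian angular field encoding mentioned above can be sketched as follows. This minimal version computes the summation variant (GASF) in pure Python: the series is rescaled to [-1, 1], each value is mapped to an angle via arccos, and entry (i, j) of the image is the cosine of the summed angles. The function name and the sample readings are hypothetical, and the abstract does not say which variant or preprocessing the authors used.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a 1-D time series as a Gramian angular summation field (GASF)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")

    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point overshoot at the boundaries.
    scaled = [max(-1.0, min(1.0, v)) for v in scaled]

    # Polar encoding: each sample becomes an angle in [0, pi].
    phi = [math.acos(v) for v in scaled]

    # Entry (i, j) is cos(phi_i + phi_j), which yields a symmetric matrix.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical accelerometer readings along one axis.
readings = [0.0, 0.5, 1.0]
gasf = gramian_angular_summation_field(readings)
for row in gasf:
    print([round(v, 3) for v in row])
```

The resulting matrix can be treated as a single-channel image; a series of length n produces an n by n image, so long recordings are usually windowed or downsampled first.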
Dynamic management of distributed machine learning projects
Given the new requirements of Machine Learning problems in recent years, especially concerning the volume, diversity and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of blocks, replication and balancing. CEDEs trains models in a distributed manner following the principle of data locality, and is able to change parts of the model through an optimization module, thus allowing a model to evolve over time as the data changes. This paper describes its generic architecture, details the implementation of the first modules, and provides a first validation. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020 and EXPL/CCI-COM/0706/2021.
- …