3,969 research outputs found

    Interactivity to improve visual analysis in groups with different literacy levels

    The study presented in this paper uses statistical tests to investigate how two groups with different literacy levels perceive interactive visualizations. A prototype with interactive visualizations created in Microsoft Power BI was used. Validation relied on quantitative and qualitative metrics tested with single-factor ANOVA. Three of the variables showed statistically significant differences between groups: accuracy, complexity, and comprehension. This highlights the importance of data literacy in comprehending visualizations, as differing literacy leads to a gap between the two groups. The line, pie, and bar charts were considered the best visualizations by both groups, and the bubble chart the worst. Regarding the interactive component, the filter and the slider were rated well by both groups. Using this study, organizations will be able to create appropriate visualizations for different audiences.
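
The abstract above names single-factor ANOVA as its test. As a rough illustration only, the F statistic behind that test can be computed as in the sketch below. The function name `one_way_anova_f` and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    `groups` is a list of lists of numeric observations, one list per group.
    Assumes at least two groups and some variation inside the groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical accuracy scores for two literacy groups (illustrative only).
high_literacy = [8.0, 9.0, 7.0, 8.0]
low_literacy = [5.0, 6.0, 4.0, 5.0]
f_stat, df_b, df_w = one_way_anova_f([high_literacy, low_literacy])
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with `(df_b, df_w)` degrees of freedom indicates a difference between group means. Obtaining a p-value requires that distribution, which a statistics library would normally supply.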

    Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

    As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We evaluated three different heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021 and CPCA/IAC/AV/475278/202

    Assessing dental symmetry: introduction of the Symmetry Measure Score (SMS) in periodontal disease analysis

    IMS and CEAUL

    The impact of data selection strategies on distributed model performance

    Distributed Machine Learning, in which data and learning tasks are scattered across a cluster of computers, is one of the field's answers to the challenges posed by Big Data. Yet even in an era in which data abounds, decisions must still be made about which specific data to use to train the model, either because the amount of available data is simply too large, or because the training time or complexity of the model must be kept low. Typical approaches include, for example, selection based on data freshness. However, old data are not necessarily outdated and might still contain relevant patterns. Likewise, relying only on recent data may significantly decrease data diversity and representativity, and thus model quality. The goal of this paper is to compare different heuristics for selecting data in a distributed Machine Learning scenario. Specifically, we ascertain whether selecting data based on their characteristics (meta-features), and optimizing for maximum diversity, improves model quality while potentially allowing model complexity to be reduced. This will allow more informed data selection strategies to be developed in distributed settings, in which the criteria are not only the location of the data or the state of each node in the cluster, but also intrinsic and relevant characteristics of the data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI/COM/0706/2021 and CPCA-IAC/AV/475278/202

    Extracting behavioural patterns from a negotiation game

    The work presented focuses not only on the behavioural patterns that influence the outcome of a negotiation, but also on the discovery of ways to predict the type of conflict used in the process and the stress levels of the actors. After setting up an experimental intelligent environment equipped with sensors to capture behavioural and contextual information, a set of relevant data was collected and analysed, with the underlying objective of using the behavioural patterns (obtained by statistical/probabilistic methods) as a basis to design and present plans and suggestions to the associated participants. Indeed, these proposals may positively influence the course and outcome of a negotiation task in many respects. This work highlights the importance of knowledge in negotiation, as in other social forms of interaction, and also provides some new insights for informed decision support in situations in which uncertainty and conflict may be present.

    Population genetic structure of three species in the genus Astrocaryum G. Mey. (Arecaceae).

    We assessed the level and distribution of genetic diversity in three species of the economically important palm genus Astrocaryum located in Pará State, in northern Brazil. Samples were collected in three municipalities for Astrocaryum aculeatum (Belterra, Santarém, and Terra Santa) and in two municipalities each for A. murumuru (Belém and Santo Antônio do Tauá) and A. paramaca (Belém and Ananindeua). Eight microsatellite loci amplified well and were used for genetic analysis. The mean numbers of alleles per locus for A. aculeatum, A. murumuru, and A. paramaca were 2.33, 2.38, and 2.06, respectively. Genetic diversity was similar for the three species, ranging from HE = 0.222 in A. aculeatum to HE = 0.254 in A. murumuru. Both FST and AMOVA showed that most of the genetic variation was found within populations for all three species, but high genetic differentiation among populations was found for A. aculeatum. Three loci were not in Hardy-Weinberg equilibrium, with populations of A. paramaca showing a tendency toward an excess of heterozygotes (FIS = -0.144). Gene flow was high for populations of A. paramaca (Nm = 19.35). Our results suggest that the genetic diversity within populations followed the genetic differentiation among populations due to high gene flow among the populations. Greater geographic distances among the three collection sites for A. aculeatum likely hampered gene flow for this species.
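
For readers unfamiliar with the statistics quoted in the abstract above (HE, FIS, Nm), the sketch below shows their common textbook forms. It assumes the simple, uncorrected estimator of expected heterozygosity and Wright's island-model approximation for Nm. The abstract does not say which estimators the authors used, and all names and input values here are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """HE = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in allele_freqs)


def inbreeding_coefficient(h_obs, h_exp):
    """FIS = 1 - HO / HE; negative values indicate excess heterozygotes."""
    return 1.0 - h_obs / h_exp


def gene_flow_nm(fst):
    """Nm ~ (1 / FST - 1) / 4 under Wright's island model (an assumption)."""
    return (1.0 / fst - 1.0) / 4.0


# Illustrative values only, not data from the study.
h_exp = expected_heterozygosity([0.5, 0.5])   # 0.5
f_is = inbreeding_coefficient(0.6, h_exp)     # negative: excess heterozygotes
nm = gene_flow_nm(0.2)
print(h_exp, f_is, nm)
```

Under this approximation, a small FST corresponds to a large Nm, which is consistent with the abstract pairing high gene flow with low differentiation among populations.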

    Predicting model training time to optimize distributed machine learning applications

    Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for efficient management of the cluster’s computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data. This work has been supported by national funds through FCT – Fundação para a Ciência e Tecnologia through projects UIDB/04728/2020, EXPL/CCI-COM/0706/2021, and CPCA-IAC/AV/475278/2022
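
To illustrate why predicted training times matter for makespan, the sketch below assigns tasks to workers with the classic longest-processing-time-first (LPT) greedy heuristic. This is a generic scheduling heuristic chosen for illustration, not the scheduler used by CEDEs. The task names, durations, and the function `schedule_lpt` are hypothetical.

```python
import heapq


def schedule_lpt(predicted_durations, n_workers):
    """Greedy LPT scheduling: longest predicted task goes to the least-loaded worker.

    `predicted_durations` maps a task name to its predicted training time in seconds.
    Returns (assignment, makespan), where assignment maps worker index to task names.
    """
    # Min-heap of (current load, worker index).
    loads = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(n_workers)}

    tasks = sorted(predicted_durations.items(), key=lambda kv: kv[1], reverse=True)
    for task, duration in tasks:
        load, worker = heapq.heappop(loads)
        assignment[worker].append(task)
        heapq.heappush(loads, (load + duration, worker))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


# Hypothetical predicted training times (seconds) for six base models.
predicted = {
    "nn_1": 20.0, "nn_2": 15.0,
    "tree_1": 5.0, "tree_2": 4.0, "tree_3": 3.0, "tree_4": 1.0,
}
assignment, makespan = schedule_lpt(predicted, n_workers=2)
print(assignment, makespan)
```

The quality of such a schedule depends directly on the accuracy of the predicted durations: large prediction errors would lead the heuristic to balance loads that do not match the real ones.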

    Learning and Mining Player Motion Profiles in Physically Interactive Robogames

    Physically-Interactive RoboGames (PIRG) are an emerging application whose aim is to develop robotic agents able to interact with and engage humans in a game situation. In this framework, learning a model of players’ activity is relevant both to understand their engagement and to understand the specific strategies they adopt, which in turn can foster game adaptation. Following such directions and given the lack of quantitative methods for player modeling in PIRG, we propose a methodology for representing players as a mixture of existing player types uncovered from data. This is done by dealing both with the intrinsic uncertainty associated with the setting and with the agent’s need to act in real time to support the game interaction. Our methodology first focuses on encoding time series data generated from player-robot interaction into images, in particular Gramian angular field images, to represent continuous data. To these, we apply latent Dirichlet allocation to summarize the player’s motion style as a probabilistic mixture of different styles discovered from data. This approach has been tested on a dataset collected from a real, physical robot game, where activity patterns are extracted by using a custom three-axis accelerometer sensor module. The obtained results suggest that the proposed system is able to provide a robust description of the player interaction.
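
The encoding step mentioned in the abstract above can be illustrated with a minimal Gramian angular field sketch. It assumes the summation variant (GASF) with min-max rescaling to [-1, 1]; the abstract does not state which variant or preprocessing the authors used. The function name and the sample series are hypothetical.

```python
import math


def gramian_angular_summation_field(series):
    """Encode a 1-D time series as a GASF matrix.

    Steps: rescale to [-1, 1], map each value to an angle phi = arccos(x),
    then fill the matrix with cos(phi_i + phi_j).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")

    scaled = [(2.0 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point drift outside the arccos domain.
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    phi = [math.acos(s) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical accelerometer readings along one axis (illustrative only).
readings = [0.0, 1.0, 2.0]
gasf = gramian_angular_summation_field(readings)
for row in gasf:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and its diagonal holds 2x² - 1 for each rescaled value x, so the rescaled series can be recovered from the image up to sign.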

    Dynamic management of distributed machine learning projects

    Given the new requirements of Machine Learning problems in recent years, especially concerning the volume, diversity and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of blocks, replication and balancing. CEDEs trains models in a distributed manner following the principle of data locality, and is able to change parts of the model through an optimization module, thus allowing a model to evolve over time as the data changes. This paper describes its generic architecture, details the implementation of the first modules, and provides a first validation. This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020 and EXPL/CCI-COM/0706/2021