572 research outputs found

    Graph set data mining

    Graphs are among the most versatile abstract data types in computer science. With the variety comes great adoption in various application fields, such as chemistry, biology, social analysis, logistics, and computer science itself. With the growing capacities of digital storage, the collection of large amounts of data has become the norm in many application fields. Data mining, i.e., the automated extraction of non-trivial patterns from data, is a key step to extract knowledge from these datasets and generate value. This thesis is dedicated to concurrent scalable data mining algorithms beyond traditional notions of efficiency for large-scale datasets of small labeled graphs; more precisely, structural clustering and representative subgraph pattern mining. It is motivated by, but not limited to, the need to analyze molecular libraries of ever-increasing size in the drug discovery process. Structural clustering makes use of graph theoretical concepts, such as (common) subgraph isomorphisms and frequent subgraphs, to model cluster commonalities directly in the application domain. It is considered computationally demanding for non-restricted graph classes and with very few exceptions prior algorithms are only suitable for very small datasets. This thesis discusses the first truly scalable structural clustering algorithm StruClus with linear worst-case complexity. At the same time, StruClus embraces the inherent values of structural clustering algorithms, i.e., interpretable, consistent, and high-quality results. A novel two-fold sampling strategy with stochastic error bounds for frequent subgraph mining is presented. It enables fast extraction of cluster commonalities in the form of common subgraph representative sets. StruClus is the first structural clustering algorithm with a directed selection of structural cluster-representative patterns regarding homogeneity and separation aspects in the high-dimensional subgraph pattern space. 
Furthermore, a novel concept of cluster homogeneity balancing using dynamically sized representatives is discussed. The second part of this thesis discusses the representative subgraph pattern mining problem in more general terms. A novel objective function maximizes the number of represented graphs for a cardinality-constrained representative set. It is shown that the problem is a special case of the maximum coverage problem and is NP-hard. Based on the greedy approximation of Nemhauser, Wolsey, and Fisher for submodular set function maximization, a novel sampling approach is presented. It mines candidate sets that contain an optimal greedy solution with a probabilistic maximum error. This leads to a constant-time algorithm to generate the candidate sets given a fixed-size sample of the dataset. In combination with a cheap single-pass streaming evaluation of the candidate sets, this enables scalability to datasets with billions of molecules on a single machine. Ultimately, the sampling approach leads to the first distributed subgraph pattern mining algorithm that distributes the pattern space and the dataset graphs at the same time.
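
The greedy approximation for maximum coverage that the abstract above builds on can be sketched as follows. This is only the classical textbook greedy, not the thesis's sampling-based algorithm; the function name `greedy_max_coverage` and the toy pattern-to-graph mapping are hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(candidates: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Pick at most k candidates, always taking the largest marginal gain."""
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(candidates)  # shallow copy so the input is not mutated
    for _ in range(k):
        best, best_gain = None, 0
        for name, graphs in remaining.items():
            gain = len(graphs - covered)  # graphs newly represented by this pattern
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no candidate adds coverage any more
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical toy input: pattern name -> ids of the graphs it represents.
patterns = {
    "p1": {1, 2, 3, 4},
    "p2": {3, 4, 5},
    "p3": {5, 6},
    "p4": {1, 6},
}
selected = greedy_max_coverage(patterns, k=2)
covered = set().union(*(patterns[p] for p in selected))
print(selected, len(covered))
```

For monotone submodular objectives such as coverage, this greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the guarantee associated with Nemhauser, Wolsey, and Fisher.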

    SmartAQnet: remote and in-situ sensing of urban air quality

    our time. However, it is very difficult for many cities to take measures to accommodate today’s needs concerning, e.g., mobility, housing, and work, because consistent fine-grained data and information on causal chains are largely missing. This has the potential to change, as today both large-scale basic data and promising new measuring approaches are becoming available. The project “SmartAQnet”, funded by the German Federal Ministry of Transport and Digital Infrastructure (BMVI), is based on a pragmatic, data-driven approach, which for the first time combines existing data sets with a networked mobile measurement strategy in the urban space. By connecting open data, such as weather data or development plans, remote sensing of influencing factors, and new mobile measurement approaches, such as participatory sensing with low-cost sensor technology, “scientific scouts” (autonomous, mobile smart dust measurement devices that are auto-calibrated to a high-quality reference instrument within an intelligent monitoring network), and demand-oriented measurements by lightweight UAVs, a novel measuring and analysis concept is created within the model region of Augsburg, Germany. In addition to novel analytics, a prototypical technology stack is planned which, through modern analytics methods and Big Data and IoT technologies, enables application in a scalable way.

    Music Preferences and Personality in Brazilians

    This article analyzes the relationship between musical preference and personality type in a large group of Brazilian young and adult participants (N = 1050). The study included 25 of the 27 states of Brazil and individuals aged between 16 and 71 years (M = 30.87; SD = 10.50). Of these, 500 were male (47.6%) and 550 were female (52.4%). A correlational study was carried out applying two online questionnaires with quality parameters (content-construct validity and reliability), one on musical preference and the other on personality. The results indicate four main findings: (1) the musical listening of the participants is limited to a small number of styles, mainly Pop music and others typical of Brazilian culture; (2) the Brazilian context is a determining factor in the low preference for non-Brazilian music; (3) there is a positive correlation between most of the personality types analyzed and the Latin, Brazilian, Classical, and Ethnic musical styles, while a negative correlation between these personality types and the consumption of Rock music was also observed; (4) musical preferences are driven not only by personality but in some cases also by socio-demographic variables (i.e., age and gender). Likewise, this work shows how participants make use of music in personality aspects that may be of interest for the analysis of socioaffective behavior (personality) as well as according to different socio-demographic variables (e.g., age and gender). More cross-cultural research on musical preference and personality would need to be carried out from a global perspective, framed in the context of social psychology and studies of mass communication. This research was co-financed by the Vice-Rectorate for Research and Knowledge Transfer, University of Granada, Spain.

    SmartAQnet 2020: A New Open Urban Air Quality Dataset from Heterogeneous PM Sensors

    The increasing attention paid to urban air quality modeling places higher requirements on urban air quality datasets. This article introduces a new urban air quality dataset, the SmartAQnet2020 dataset, which has a large span and high resolution in both the time and space dimensions. The dataset contains 248,572,003 observations recorded by over 180 individual measurement devices, including ceilometers, a Radio Acoustic Sounding System (RASS), mid- and low-cost stationary measuring equipment equipped with meteorological sensors and particle counters, and lightweight portable measuring equipment mounted on different platforms such as trolleys, bikes, and UAVs.

    Nutrient stoichiometry and land use rather than species richness determine plant functional diversity

    Funding: Deutsche Forschungsgemeinschaft. Grant Numbers: FI 1246/6-1, HO 3830/2-1, KL 2265/5-1 - TRY initiative on plant traits DIVERSITAS/Future Earth and the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig - Open Access Publication Fund of University of Muenster. Plant functional traits reflect individual and community ecological strategies. They allow the detection of directional changes in community dynamics and ecosystem processes, providing a tool for assessing biodiversity in addition to species richness. Analysis of functional patterns in plant communities provides mechanistic insight into biodiversity alterations due to anthropogenic activity. Although studies have considered the effects of either anthropogenic management or nutrient availability on functional traits in temperate grasslands, studies combining the effects of both drivers are scarce. Here, we assessed the impacts of management intensity (fertilization, mowing, grazing), nutrient stoichiometry (C, N, P, K), and vegetation composition on community-weighted means (CWMs) and functional diversity (Rao's Q) from seven plant traits in 150 grasslands in three regions in Germany, using data from 6 years. Land use and nutrient stoichiometry accounted for larger proportions of model variance of CWM and Rao's Q than species richness and productivity. Grazing affected all analyzed trait groups; fertilization and mowing only impacted generative traits. Grazing was clearly associated with nutrient retention strategies, that is, investing in durable structures and production of fewer, less variable seeds. Phenological variability was increased. Fertilization and mowing decreased seed number/mass variability, indicating competition-related effects. Impacts of nutrient stoichiometry on trait syndromes varied. 
Nutrient limitation (large N:P, C:N ratios) promoted species with conservative strategies, that is, investment in durable plant structures rather than fast growth, fewer seeds, and delayed flowering onset. In contrast to seed mass, leaf-economics variability was reduced under P shortage. Species diversity was positively associated with the variability of generative traits. Synthesis. Here, land use, nutrient availability, species richness, and plant functional strategies have been shown to interact in complex ways, driving community composition and vegetation responses to management intensity. We suggest that a deeper understanding of the underlying mechanisms shaping community assembly and biodiversity will require analyzing all these parameters.
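
The two indices named in the abstract above, community-weighted mean (CWM) and Rao's Q, can be illustrated with a minimal sketch. It assumes the common definitions CWM = sum_i p_i * t_i and Rao's Q = sum_i sum_j d_ij * p_i * p_j over relative abundances p (some authors sum over i < j only, which halves the value). The function names, the absolute-difference dissimilarity, and the numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def relative_abundance(abundance: Sequence[float]) -> List[float]:
    """Scale raw abundances so that they sum to 1."""
    total = sum(abundance)
    return [a / total for a in abundance]


def community_weighted_mean(abundance: Sequence[float], trait: Sequence[float]) -> float:
    """CWM: abundance-weighted average of one trait across species."""
    p = relative_abundance(abundance)
    return sum(pi * ti for pi, ti in zip(p, trait))


def rao_q(abundance: Sequence[float], dist: Sequence[Sequence[float]]) -> float:
    """Rao's quadratic entropy as the full double sum over species pairs."""
    p = relative_abundance(abundance)
    n = len(p)
    return sum(p[i] * p[j] * dist[i][j] for i in range(n) for j in range(n))


# Made-up community of three species with a single trait.
abundance = [2.0, 1.0, 1.0]
trait = [10.0, 20.0, 40.0]
# Assumed dissimilarity: absolute trait difference between species.
dist = [[abs(a - b) for b in trait] for a in trait]

cwm = community_weighted_mean(abundance, trait)
q = rao_q(abundance, dist)
print(cwm, q)
```

With several traits, the dissimilarity matrix would typically come from a multivariate distance over standardized traits; the single-trait version is used here only to keep the arithmetic easy to follow.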

    Trends of Exposure to Acrylamide as Measured by Urinary Biomarkers Levels within the HBM4EU Biomonitoring Aligned Studies (2000–2021)

    This article belongs to the Special Issue Analysis of Human Biomonitoring Data and Risk Assessment of Human Exposure to Environmental Chemicals: What Do We Learn for Prevention? Acrylamide, a substance potentially carcinogenic in humans, represents a very prevalent contaminant in food and is also contained in tobacco smoke. Occupational exposure to higher concentrations of acrylamide was shown to induce neurotoxicity in humans. To minimize related risks for public health, it is vital to obtain data on the actual level of exposure in differently affected segments of the population. To achieve this aim, acrylamide has been added to the list of substances of concern to be investigated in the HBM4EU project, a European initiative to obtain biomonitoring data for a number of pollutants highly relevant for public health. This report summarizes the results obtained for acrylamide, with a focus on time-trends and recent exposure levels, obtained by HBM4EU as well as by associated studies in a total of seven European countries. Mean biomarker levels were compared by sampling year and time-trends were analyzed using linear regression models and an adequate statistical test. An increasing trend of acrylamide biomarker concentrations was found in children for the years 2014–2017, while in adults an overall increase in exposure was found to be not significant for the time period of observation (2000–2021). For smokers, represented by two studies and sampling over a total of three years, no clear tendency was observed. In conclusion, samples from European countries indicate that average acrylamide exposure still exceeds suggested benchmark levels and may be of specific concern in children. More research is required to confirm trends of declining values observed in most recent years. This work received external funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 733032 and received co-funding from the author’s organizations. The Norwegian Institute of Public Health (NIPH) contributed to the funding of the Norwegian Environmental Biobank (NEB). The laboratory measurements were partly funded by the Research Council of Norway through research projects (275903 and 268465).
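
A linear-regression time trend of the kind mentioned in the abstract above can be sketched with an ordinary least-squares fit of biomarker level against sampling year. The abstract does not specify the model, so this is a generic illustration; the function name `ols_trend` is hypothetical and the values are synthetic, not data from the study.

```python
from typing import Sequence, Tuple


def ols_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx  # change in level per year
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic yearly mean levels in arbitrary units (NOT study data).
years = [2014, 2015, 2016, 2017]
levels = [10.0, 12.0, 14.0, 16.0]

slope, intercept = ols_trend(years, levels)
print(slope, intercept)
```

A positive slope corresponds to an increasing trend; judging whether it is significant requires an additional statistical test on the slope, which this sketch leaves out.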

    Time Trends of Acrylamide Exposure in Europe: Combined Analysis of Published Reports and Current HBM4EU Studies

    This article belongs to the Special Issue Analysis of Human Biomonitoring Data and Risk Assessment of Human Exposure to Environmental Chemicals: What Do We Learn for Prevention? More than 20 years ago, acrylamide was added to the list of potential carcinogens found in many common dietary products and tobacco smoke. Consequently, human biomonitoring studies investigating exposure to acrylamide in the form of adducts in blood and metabolites in urine have been performed to obtain data on the actual burden in different populations of the world and in Europe. Recognizing the related health risk, the European Commission responded with measures to curb the acrylamide content in food products. In 2017, a trans-European human biomonitoring project (HBM4EU) was started with the aim of investigating exposure to several chemicals, including acrylamide. Here we set out to provide a combined analysis of previous and current European acrylamide biomonitoring study results by harmonizing and integrating different data sources, including HBM4EU aligned studies, with the aim of resolving overall and current time trends of acrylamide exposure in Europe. Data from 10 European countries were included in the analysis, comprising more than 5500 individual samples (3214 children and teenagers, 2293 adults). We utilized linear models as well as a non-linear fit and breakpoint analysis to investigate trends in temporal acrylamide exposure, as well as descriptive statistics and statistical tests to validate findings. Our results indicate an overall increase in acrylamide exposure between the years 2001 and 2017. Studies with samples collected after 2018 focusing on adults do not indicate increasing exposure but show declining values. Regional differences appear to affect absolute values, but not the overall time-trend of exposure. 
As benchmark levels for acrylamide content in food were adopted in Europe in 2018, our results may reflect the effects of these measures, but this is indicated only for adults, as corresponding data are still missing for children. This work has received external funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 733032 and received co-funding from the author’s organizations. The Norwegian Institute of Public Health (NIPH) has contributed to the funding of the Norwegian Environmental Biobank (NEB). The laboratory measurements have partly been funded by the Research Council of Norway through research projects (275903 and 268465).

    Above- and belowground biodiversity jointly tighten the P cycle in agricultural grasslands

    Experiments showed that biodiversity increases grassland productivity and nutrient exploitation, potentially reducing fertiliser needs. Enhancing biodiversity could improve the P-use efficiency of grasslands, which is beneficial given that rock-derived P fertilisers are expected to become scarce in the future. Here, we show in a biodiversity experiment that more diverse plant communities were able to exploit P resources more completely than less diverse ones. In the agricultural grasslands that we studied, management effects either overruled or modified the driving role of plant diversity observed in the biodiversity experiment. Nevertheless, we show that greater above- (plants) and belowground (mycorrhizal fungi) biodiversity contributed to tightening the P cycle in agricultural grasslands, as reduced management intensity and the associated increased biodiversity fostered the exploitation of P resources. Our results demonstrate that promoting high above- and belowground biodiversity has ecological (biodiversity protection) and economic (fertiliser savings) benefits. Such win-win situations for farmers and biodiversity are crucial to convince farmers of the benefits of biodiversity and thus counteract global biodiversity loss.