9,002 research outputs found

    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach for this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach. Peer reviewed. Postprint (author's final draft).

    Keeping the data lake in form: DS-kNN datasets categorization using proximity mining

    With the growth of the number of datasets stored in data repositories, there has been a trend of using Data Lakes (DLs) to store such data. DLs store datasets in their raw formats without any transformations or preprocessing, and the data are accessed using schema-on-read. This makes it difficult for analysts to find datasets that can be crossed and that belong to the same topic. To support them in this DL governance challenge, we propose in this paper an algorithm for categorizing datasets in the DL into pre-defined topic-wise categories of interest. We utilise a k-NN approach for this task which uses a proximity score for computing similarities of datasets based on metadata. We test our algorithm on a real-life DL with a known ground-truth categorization. Our approach is successful in detecting the correct categories for datasets and outliers with a precision of more than 90% and recall rates exceeding 75% in specific settings. Peer reviewed. Postprint (author's final draft).
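
    The k-NN categorization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `proximity` function, the assumption that metadata features are pre-scaled to [0, 1], and the parameters `k` and `min_proximity` are all hypothetical choices made for illustration.

```python
from collections import Counter


def proximity(a, b):
    """Hypothetical proximity score in [0, 1]: 1 minus the mean absolute
    difference of metadata features assumed to be scaled to [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def categorize(new_meta, labelled, k=3, min_proximity=0.8):
    """Return the majority category among the k most similar labelled
    datasets, or None (treated as an outlier) if none is close enough."""
    scored = sorted(
        ((proximity(new_meta, meta), cat) for meta, cat in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep only those of the k nearest neighbours that pass the threshold.
    neighbours = [cat for score, cat in scored[:k] if score >= min_proximity]
    if not neighbours:
        return None
    return Counter(neighbours).most_common(1)[0][0]
```

    Returning None for a dataset with no sufficiently close neighbour is one simple way to model the outliers mentioned in the abstract; the threshold value itself would need tuning against a ground-truth categorization.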

    A feasibility test of an online intervention to prevent dating violence in emerging adults

    Dating violence in emerging adults is a significant problem, and few prevention programs based on the developmental needs of this age group have been developed. Our research team developed an online dating violence prevention program called WISER (Writing to Improve Self-in-Relationships) for emerging adults. The program is based on narrative therapy principles and uses structured writing techniques. A single-group pre-post feasibility test of WISER was conducted with 14 college women. WISER was demonstrated to be feasible and acceptable and to show promise as an effective program to decrease dating violence in this population.

    Perturbating intramolecular hydrogen bonds through substituent effects or non-covalent interactions

    An analysis of the effects induced by F, Cl, and Br substituents at the α-position of either the hydroxyl or the amino group for a series of amino-alcohols, HOCH2(CH2)nCH2NH2 (n = 0–5), on the strength and characteristics of their OH···N or NH···O intramolecular hydrogen bonds (IMHBs) was carried out through the use of high-level G4 ab initio calculations. For the parent unsubstituted amino-alcohols, it is found that the strength of the OH···N IMHB goes through a maximum for n = 2, as revealed by the use of appropriate isodesmic reactions, natural bond orbital (NBO) analysis, and atoms in molecules (AIM) and non-covalent interaction (NCI) procedures. The corresponding infrared (IR) spectra also reflect the same trends. When the α-position to the hydroxyl group is substituted by halogen atoms, the OH···N IMHB significantly reinforces following the trend H 2(CH2)nCH2NH2 (n = 0–3) interact with BeF2. Although the presence of the beryllium derivative dramatically increases the strength of the IMHBs, the possibility for the beryllium atom to interact simultaneously with the O and the N atoms of the amino-alcohol leads to the global minimum of the potential energy surface, with the result that the IMHBs are replaced by two beryllium bonds.

    The effect of seasoning with herbs on the nutritional, safety and sensory properties of reduced-sodium fermented Cobrançosa cv. table olives

    This study aimed at evaluating the effectiveness of seasoning Cobrançosa table olives in a brine with aromatic ingredients, in order to mask the bitter taste given by KCl when added to reduced-sodium fermentation brines. Olives were fermented in two different salt combinations: Brine A, containing 8% NaCl, and Brine B, a reduced-sodium brine containing 4% NaCl + 4% KCl. After the fermentation, the olives were immersed in seasoning brines with NaCl (2%), aromatic herbs (thyme, oregano and calamintha), garlic and lemon. At the end of the fermentation and two weeks after seasoning, the physicochemical, nutritional, organoleptic, and microbiological parameters were determined. The olives fermented in the reduced-sodium brines had half the sodium concentration, higher potassium and calcium content, and a lower caloric level, but were considered by a sensory panel to be more bitter than olives fermented in NaCl brine. Seasoned table olives, previously fermented in Brine A and Brine B, had no significant differences in the amounts of protein (1.23% or 1.11%), carbohydrates (1.0% or 0.66%), fat (20.0% or 20.5%) and dietary fiber (3.4% or 3.6%). Regarding mineral contents, the sodium-reduced fermented olives presented one third of the sodium, seven times more potassium and three times more calcium than the traditional olives fermented in 8% NaCl. Additionally, according to the panelists' evaluation, seasoning the olives fermented in 4% NaCl + 4% KCl resulted in a decrease in bitterness and an improvement in the overall evaluation and flavor. Escherichia coli and Salmonella were not found in the olives produced.

    Cognitively-inspired Agent-based Service Composition for Mobile & Pervasive Computing

    Automatic service composition in mobile and pervasive computing faces many challenges due to the complex and highly dynamic nature of the environment. Common approaches consider service composition as a decision problem whose solution is usually addressed from optimization perspectives which are not feasible in practice due to the intractability of the problem, limited computational resources of smart devices, service hosts' mobility, and time constraints to tailor composition plans. Thus, our main contribution is the development of a cognitively-inspired agent-based service composition model focused on bounded rationality rather than optimality, which allows the system to compensate for limited resources by selectively filtering out continuous streams of data. Our approach exhibits features such as distributedness, modularity, emergent global functionality, and robustness, which endow it with capabilities to perform decentralized service composition by orchestrating manifold service providers and conflicting goals from multiple users. The evaluation of our approach shows promising results when compared against state-of-the-art service composition models. Comment: This paper will appear at AIMS'19 (International Conference on Artificial Intelligence and Mobile Services) on June 2

    Latent Space Model for Multi-Modal Social Data

    With the emergence of social networking services, researchers enjoy the increasing availability of large-scale heterogeneous datasets capturing online user interactions and behaviors. Traditional analysis of techno-social systems data has focused mainly on describing either the dynamics of social interactions, or the attributes and behaviors of the users. However, overwhelming empirical evidence suggests that the two dimensions affect one another, and therefore they should be jointly modeled and analyzed in a multi-modal framework. The benefits of such an approach include the ability to build better predictive models, leveraging social network information as well as user behavioral signals. To this purpose, here we propose the Constrained Latent Space Model (CLSM), a generalized framework that combines Mixed Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA), incorporating a constraint that forces the latent space to concurrently describe the multiple data modalities. We derive an efficient inference algorithm based on Variational Expectation Maximization that has a computational cost linear in the size of the network, thus making it feasible to analyze massive social datasets. We validate the proposed framework on two problems: prediction of social interactions from user attributes and behaviors, and behavior prediction exploiting network information. We perform experiments with a variety of multi-modal social systems, spanning location-based social networks (Gowalla), social media services (Instagram, Orkut), e-commerce and review sites (Amazon, Ciao), and finally citation networks (Cora). The results indicate significant improvement in prediction accuracy over state-of-the-art methods, and demonstrate the flexibility of the proposed approach for addressing a variety of different learning problems commonly occurring with multi-modal social data. Comment: 12 pages, 7 figures, 2 tables

    Thin films for advanced glazing applications

    © 2016 by the authors. Functional thin films provide many opportunities for advanced glazing systems. This can be achieved by adding additional functionalities such as self-cleaning or power generation, or alternatively by providing energy demand reduction through the management or modulation of solar heat gain or blackbody radiation using spectrally selective films or chromogenic materials. Self-cleaning materials have been generating increasing interest for the past two decades. They may be based on hydrophobic or hydrophilic systems and are often inspired by nature, for example hydrophobic systems based on mimicking the lotus leaf. These materials help to maintain the aesthetic properties of the building, help to maintain a comfortable working environment and, in the case of photocatalytic materials, may provide external pollutant remediation. Power generation through window coatings is a relatively new idea and is based around the use of semi-transparent solar cells as windows. In this fashion, energy can be generated whilst also absorbing some solar heat. There is also the possibility, in the case of dye-sensitized solar cells, to tune the coloration of the window, which provides unheralded external aesthetic possibilities. Materials and coatings for energy demand reduction are highly desirable in an increasingly energy-intensive world. We discuss new developments with low emissivity coatings as the need to replace scarce indium becomes more apparent. We go on to discuss thermochromic systems based on vanadium dioxide films. Such systems are dynamic in nature and present a more sophisticated and potentially more beneficial approach to reducing energy demand than static systems such as low emissivity and solar control coatings. The ability to tune some of the material parameters in order to optimize the film performance for a given climate provides exciting opportunities for future technologies. In this article, we review recent progress and challenges in these areas and provide a perspective for future trends and developments. Işıl Top thanks TUBITAK for the provision of funding for a studentship. Shuqun Chen thanks the China Scholarship Council for the provision of a studentship.

    DS-Prox: dataset proximity mining for governing the data lake

    With the arrival of Data Lakes (DL) there is an increasing need for efficient dataset classification to support data analysis and information retrieval. Our goal is to use meta-features describing datasets to detect whether they are similar. We utilise a novel proximity mining approach to assess the similarity of datasets. The proximity scores are used as an efficient first step, where pairs of datasets with high proximity are selected for further time-consuming schema matching and deduplication. The proposed approach helps to prune unnecessary computations early, thus improving the efficiency of similar-schema search. We evaluate our approach in experiments using the OpenML online DL, which show significant efficiency gains above 25% compared to matching without early pruning, and recall rates reaching higher than 90% under certain scenarios. Peer reviewed. Postprint (author's final draft).
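
    The early-pruning idea in this abstract, a cheap proximity score used as a filter before expensive schema matching, can be sketched as follows. This is an illustrative assumption rather than the DS-Prox method itself: `cheap_proximity`, the [0, 1] scaling of meta-features, the `threshold` value, and the caller-supplied `expensive_match` function are all hypothetical.

```python
from itertools import combinations


def cheap_proximity(meta_a, meta_b):
    """Hypothetical inexpensive score in [0, 1] computed from meta-features
    that are assumed to be scaled to [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(meta_a, meta_b)) / len(meta_a)


def candidate_pairs(meta_features, threshold=0.75):
    """Keep only dataset pairs whose proximity reaches the threshold;
    every other pair is pruned before the expensive matching step."""
    return [
        (a, b)
        for a, b in combinations(sorted(meta_features), 2)
        if cheap_proximity(meta_features[a], meta_features[b]) >= threshold
    ]


def match_with_pruning(meta_features, expensive_match, threshold=0.75):
    """Run the costly matcher (a placeholder supplied by the caller)
    only on the pairs that survive pruning."""
    return {
        pair: expensive_match(*pair)
        for pair in candidate_pairs(meta_features, threshold)
    }
```

    Raising the threshold prunes more pairs and saves more matching time, at the risk of discarding pairs that would have matched; this is the efficiency versus recall trade-off the abstract reports.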