    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach for this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach. Peer Reviewed. Postprint (author's final draft).

    Rough Set Based Approach for IMT Automatic Estimation

    Carotid artery (CA) intima-media thickness (IMT) is commonly regarded as one of the risk markers for cardiovascular diseases. The automatic estimation of the IMT on ultrasound images is based on the correct identification of the lumen-intima (LI) and media-adventitia (MA) interfaces. This task is complicated by noise, vessel morphology and pathology of the carotid artery. In a previous study we applied four non-linear methods for feature selection to a set of variables extracted from ultrasound carotid images. The main aim was to select those parameters containing the highest amount of information useful for classifying the image pixels into the carotid regions they belong to. In this study we present a pixel classifier based on the selected features. Once the pixel classification had been performed correctly, the IMT was evaluated and compared with two sets of manually traced profiles. The results showed that the automatic IMTs are not statistically different from the manual ones.

    Location attractiveness - is ITS becoming a high-ranked factor?

    The development of Intelligent Transport Systems (ITS) has taken a leap in the past decade. Under the strong influence of improved Information and Communication Technology (ICT), industries, automotive suppliers and scientific institutes have put much effort into developing a range of ICT-based applications that let vehicles drive more safely and comfortably, make more efficient use of current and future infrastructure, and allow fleets to be managed more accurately. These improvements in transport services might improve the attractiveness of nearby locations. These locations (office, residential, leisure zones, etc.) might attract more activity as they appear to benefit from increased accessibility. Therefore, the expectation that ITS concepts will, in the long term, have a significant spatial effect on the location pattern of, in particular, office-keeping organisations is plausible. This paper focuses on the impact of ITS concepts on the location preferences of office-keeping organisations. To measure this impact, a stated preference experiment has been conducted in the Netherlands involving office-keeping organisations in selected city regions. The paper describes the first results of a model describing the attractiveness of location profiles, which are based on location preference attributes, and the role of ITS in these profiles. Three ITS concepts, selected on the basis of previous research, are introduced as ‘new’ attributes within the location profiles. The estimated model was used to test two hypotheses. The first hypothesis is that the introduction of these ITS attributes will change the preferences of office-keeping organisations regarding locations. The second hypothesis is that, if preferences change, the ITS attributes make a significant contribution to the preference model, at least for some categories of organisations. Further, the paper describes in what cases we should accept or reject these hypotheses. Finally, some conclusions are drawn on the role of ITS in location attractiveness and on the tools available to validate the preference model.

    Machine Learning of User Profiles: Representational Issues

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem. Comment: 6 pages

    All liaisons are dangerous when all your friends are known to us

    Online Social Networks (OSNs) are used by millions of users worldwide. Academically speaking, there is little doubt about the usefulness of demographic studies conducted on OSNs and, hence, methods to label unknown users from small labeled samples are very useful. However, from the general public's point of view, this can be a serious privacy concern. Both topics are tackled in this paper. First, a new algorithm to perform user profiling in social networks is described, and its performance is reported and discussed. Second, the experiments (conducted on information usually considered sensitive) reveal that merely publicizing one's contacts puts privacy at risk; measures to minimize privacy leaks due to social graph data mining are therefore outlined. Comment: 10 pages, 5 tables

    Combining multiscale features for classification of hyperspectral images: a sequence based kernel approach

    Nowadays, hyperspectral image classification widely exploits spatial information to improve accuracy. One of the most popular ways to integrate such information is to extract hierarchical features from a multiscale segmentation. In the classification context, the extracted features are commonly concatenated into a long vector (also called a stacked vector), to which a conventional vector-based machine learning technique (e.g. SVM with Gaussian kernel) is applied. In this paper, we instead propose to use a sequence-structured kernel: the spectrum kernel. We show that the conventional stacked vector-based kernel is actually a special case of this kernel. Experiments conducted on various publicly available hyperspectral datasets illustrate the improvement of the proposed kernel w.r.t. conventional ones using the same hierarchical spatial features. Comment: 8th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS 2016), UCLA in Los Angeles, California, U.

    Form and function in hillslope hydrology : in situ imaging and characterization of flow-relevant structures

    Thanks to Elly Karle and the Engler-Bunte Institute, KIT, for the IC measurements of bromide. We are grateful to Selina Baldauf, Marcel Delock, Razije Fiden, Barbara Herbstritt, Lisei Köhn, Jonas Lanz, Francois Nyobeu, Marvin Reich and Begona Lorente Sistiaga for their support in the lab and during fieldwork, as well as Markus Morgner and Jean Francois Iffly for technical support and Britta Kattenstroth for hydrometeorological data acquisition. Laurent Pfister and Jean-Francois Iffly from the Luxembourg Institute of Science and Technology (LIST) are acknowledged for organizing the permissions for the experiments. Moreover, we thank Markus Weiler (University of Freiburg) for his strong support during the planning of the hillslope experiment and the preparation of the manuscript. This study is part of the DFG-funded CAOS project “From Catchments as Organised Systems to Models based on Dynamic Functional Units” (FOR 1598). The manuscript was substantially improved based on the critical and constructive comments of the anonymous reviewers, Christian Stamm and Alexander Zimmermann, and the editor Ross Woods during the open review process, which is highly appreciated. Peer reviewed. Publisher PDF

    Predicting Multi-class Customer Profiles Based on Transactions: a Case Study in Food Sales

    Predicting the class of a customer profile is a key task in marketing, which enables businesses to approach the right customer with the right product at the right time through the right channel to satisfy the customer's evolving needs. However, due to costs, privacy and/or data protection, only the business's own transactional data is typically available for constructing customer profiles. Predicting the class of customer profiles based on such data is challenging, as the data tends to be very large, heavily sparse and highly skewed. We present a new approach that is designed to efficiently and accurately handle the multi-class classification of customer profiles built using sparse and skewed transactional data. Our approach first bins the customer profiles on the basis of the number of items transacted. The discovered bins are then partitioned, and prototypes within each of the discovered bins are selected to build the multi-class classifier models. The results obtained from using four multi-class classifiers on real-world transactional data from the food sales domain consistently show the critical numbers of items at which the predictive performance of customer profiles can be substantially improved.