
    QueRIE: Collaborative Database Exploration

    Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework, where the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled into potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database, and propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation, discussing the advantages and disadvantages of each approach.
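
    The fragment-based idea in this abstract can be illustrated with a small sketch. It is not the QueRIE implementation: the Jaccard measure, the function names, and the example fragments are all assumptions chosen for illustration, and the sketch stops at ranking fragments rather than assembling them into full queries.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two fragment sets (an assumed, swapped-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_fragments(active, log_sessions, top_n=3):
    """Rank fragments the active session has not used yet.

    Each past session votes for its unseen fragments, weighted by how
    similar that session is to the active one. Hypothetical helper.
    """
    scores = defaultdict(float)
    for past in log_sessions:
        sim = jaccard(active, past)
        if sim == 0.0:
            continue  # unrelated sessions contribute nothing
        for frag in past - active:
            scores[frag] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [frag for frag, _ in ranked[:top_n]]


# Illustrative fragments only; not taken from real SkyServer traces.
active_session = {"FROM PhotoObj", "WHERE ra > ?"}
query_log = [
    {"FROM PhotoObj", "WHERE ra > ?", "SELECT objID"},
    {"FROM SpecObj", "SELECT z"},
    {"FROM PhotoObj", "SELECT objID", "WHERE dec < ?"},
]
recommended = recommend_fragments(active_session, query_log)
```

    Scoring by summed session similarity is one simple choice; an incremental design like the one the abstract mentions would presumably maintain these similarities as the log grows instead of recomputing them per request.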

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were held in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).

    A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

    Semantic segmentation is the pixel-wise labelling of an image. Because the problem is defined at the pixel level, determining image class labels alone is not sufficient; the labels must also be localised at the original image pixel resolution. Boosted by the extraordinary ability of convolutional neural networks (CNN) to create semantic, high-level and hierarchical image features, a large number of deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We start with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorise the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analyse the solutions put forward in terms of solving the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a brief summary of each approach that explains its contribution to the field. We conclude the survey by discussing the current challenges of the field and to what extent they have been solved.

    Design and analysis of algorithms for similarity search based on intrinsic dimension

    One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection is similarity search. It has been used in numerous fields of application such as multimedia, information retrieval, recommender systems and pattern recognition. Specifically, a similarity query aims to retrieve from the database the objects most similar to a query object, where the underlying similarity measure is usually expressed as a distance function. The cost of processing similarity queries has typically been assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. It is generally the case that a high representational dimension results in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena, collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models quantifying the complexity of data in terms of some measure of the intrinsic dimensionality. The generalized expansion dimension (GED) is one such model; it estimates the intrinsic dimension in the vicinity of a query point q through the observation of the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms.
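
    As a rough illustration of estimating dimension from the ranks and distances of a pair of neighbors, the sketch below uses the expansion-style ratio log(k2/k1) / log(r2/r1), where k1 < k2 are neighbor ranks and r1 < r2 their distances from q. The function names and the choice of Euclidean distance are assumptions for this example; the dissertation's own estimator and algorithms may differ in detail.

```python
import math


def ged_estimate(r1, k1, r2, k2):
    """Expansion-style dimension estimate from two neighbourhoods of q.

    k1 < k2 are neighbour ranks, r1 < r2 the corresponding distances.
    """
    if not (0 < r1 < r2) or not (0 < k1 < k2):
        raise ValueError("need 0 < r1 < r2 and 0 < k1 < k2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def ged_at_query(points, q, k1, k2):
    """Estimate around q using its k1-th and k2-th nearest neighbours.

    Assumes Euclidean distance and that q itself is not in `points`.
    """
    dists = sorted(math.dist(p, q) for p in points)
    return ged_estimate(dists[k1 - 1], k1, dists[k2 - 1], k2)


# Points on a line embedded in 3-D: the representational dimension is 3,
# but the neighbourhood of q grows like a 1-dimensional set.
line_points = [(float(i), 0.0, 0.0) for i in range(1, 101)]
line_dim = ged_at_query(line_points, (0.0, 0.0, 0.0), k1=10, k2=40)
```

    On this toy data the estimate reflects the line rather than the three coordinates, which is the distinction between intrinsic and representational dimension drawn in the abstract.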

    Serum proteomics to detect early changes in type 1 diabetes and carotid atherosclerosis

    The detection of early markers is the key issue in predicting the outcome of inflammatory diseases such as type 1 diabetes and atherosclerosis. Whilst biochemical testing approaches have improved the prediction of inflammatory diseases, validated biomarkers with better diagnostic specificities are still needed. Currently, the majority of disease-related proteomics studies have focused on disease endpoints. The work presented in this thesis includes the first comprehensive proteomics analyses of serum samples collected from two unique Finnish longitudinal cohorts, namely The Diabetes Prediction and Prevention Project (DIPP) and The Cardiovascular Risk in Young Finns Study (YFS), to identify early markers associated with type 1 diabetes and carotid atherosclerosis. Using mass spectrometry (MS)-based quantitative serum proteomics, profiling was carried out to study temporal variation in pre-diabetic samples and early markers of plaque formation with the T1D and YFS cohorts, respectively. The analyses revealed consistent differences in the abundance of a number of proteins in subjects with ongoing asymptomatic changes, several of which are functionally relevant to the disease process. Taken together, the discovered markers are candidates for further validation studies in independent cohorts and may be used to characterize an increased risk, progression and early onset of these diseases.

    Early changes in the serum proteome associated with the development of type 1 diabetes and atherosclerosis
    One key challenge in predicting inflammatory diseases such as type 1 diabetes and atherosclerosis is finding early disease markers. Although various biochemical tests have already improved the prediction of inflammatory diseases, new and more accurate biomarkers are still needed. Despite this, many proteomics studies in these fields have so far focused on the time of disease onset. In this doctoral thesis we carried out large-scale proteomics analyses of serum samples collected as part of two unique Finnish follow-up studies: the DIPP study (prediction and prevention of type 1 diabetes) and the YFS study (cardiovascular risk in young Finns). In these studies, serum proteomics was used for the first time to search for early markers of type 1 diabetes and atherosclerosis. We investigated changes in serum proteome profiles associated with the development of type 1 diabetes and the formation of atherosclerotic plaques using mass spectrometry-based quantitative proteomics. These analyses revealed consistent differences in numerous proteins between asymptomatic individuals who later developed disease and controls who remained healthy. Many of these proteins may also be substantially involved in disease development. The markers found in our studies offer a starting point for future validation studies, and in the future they could be used to assess an individual's elevated disease risk, disease progression, and early disease onset.