QueRIE: Collaborative Database Exploration
Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework in which the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled into potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database, and we propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation, discussing the advantages and disadvantages of each approach.
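The fragment-based idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or assembly step QueRIE uses, so everything below is an assumption made for illustration: sessions are compared with Jaccard similarity over their fragment sets, each fragment is scored by the summed similarity of the sessions containing it, and logged queries are ranked by the average score of their fragments. The names `jaccard` and `recommend` and the fragment strings are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two fragment sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(active, log, top_n=1):
    """Rank logged queries for the active session.

    active: set of fragments seen in the active user's session.
    log:    dict mapping a session id to a list of queries, where each
            query is a non-empty frozenset of fragments.
    """
    # Score each fragment by the similarity of the sessions that contain it.
    scores = defaultdict(float)
    for queries in log.values():
        session_frags = set().union(*queries)
        sim = jaccard(active, session_frags)
        for frag in session_frags:
            scores[frag] += sim

    # Candidate queries must contribute at least one fragment that is new
    # to the active session.
    candidates = {q for queries in log.values() for q in queries
                  if not q <= active}

    # Rank by average fragment score; break ties deterministically.
    ranked = sorted(
        candidates,
        key=lambda q: (-sum(scores[f] for f in q) / len(q), sorted(q)),
    )
    return ranked[:top_n]


# Hypothetical fragments, not taken from any real query log.
example_log = {
    "s1": [frozenset({"FROM T1", "WHERE a > 5"}),
           frozenset({"FROM T1", "SELECT b"})],
    "s2": [frozenset({"FROM T2", "SELECT c"})],
}
example_active = {"FROM T1", "WHERE a > 5"}
top = recommend(example_active, example_log)
```

In this toy log the query from session `s1` that adds a new fragment ranks first, because `s1` overlaps with the active session and `s2` does not.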
IDEAS-1997-2021-Final-Programs
This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium (IDEAS) from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).
Mining High Impact Combinations of Conditions from the Medical Expenditure Panel Survey
Multimorbidity, the presence of two or more medical conditions in an individual, is a growing phenomenon worldwide. In the United States, multimorbid patients represent more than a third of the population, and the trend is steadily increasing in an already aging population. There is thus a pressing need to understand the patterns in which multimorbidity occurs and the nature of the care that such patients require.
In this thesis, we use data from the Medical Expenditure Panel Survey (MEPS) from the years 2011 to 2015 to identify combinations of multiple chronic conditions (MCCs). We first quantify the significant heterogeneity observed in these combinations and how often they are observed across the five years. Next, using two criteria associated with each combination -- (a) the annual prevalence and (b) the annual median expenditure -- along with the concept of non-dominated Pareto fronts, we determine the degree of impact each combination has on the healthcare system. Our analysis reveals that combinations of four or more conditions are often mixtures of diseases that belong to different clinically meaningful groupings such as the metabolic disorders (diabetes, hypertension, hyperlipidemia); musculoskeletal conditions (osteoarthritis, spondylosis, back problems etc.); respiratory disorders (asthma, COPD etc.); heart conditions (atherosclerosis, myocardial infarction); and mental health conditions (anxiety disorders, depression etc.).
Next, we use unsupervised learning techniques such as association rule mining and hierarchical clustering to visually explore the strength of the associations between different conditions and condition groupings. This interactive framework gives epidemiologists and clinicians (in particular primary care physicians) a systematic approach to understanding the relationships between conditions and to building a longer-term strategy for screening, diagnosis and treatment, especially for individuals at risk of further complications. The findings from this study aim to create a foundation for future work in which a more holistic view of multimorbidity is possible.
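The non-dominated Pareto fronts mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: it assumes both criteria (annual prevalence and annual median expenditure) are to be maximized, and the `Combo` type, the combination labels and all numeric values are invented for the example.

```python
from typing import List, NamedTuple


class Combo(NamedTuple):
    name: str           # label of a condition combination (hypothetical)
    prevalence: float   # annual prevalence (made-up value)
    median_cost: float  # annual median expenditure (made-up value)


def dominates(a: Combo, b: Combo) -> bool:
    """a dominates b if it is at least as high on both criteria
    and strictly higher on at least one."""
    return (a.prevalence >= b.prevalence
            and a.median_cost >= b.median_cost
            and (a.prevalence > b.prevalence
                 or a.median_cost > b.median_cost))


def pareto_fronts(combos: List[Combo]) -> List[List[Combo]]:
    """Peel off successive non-dominated fronts; front 1 has the
    highest impact, front 2 the next highest, and so on."""
    remaining = list(combos)
    fronts = []
    while remaining:
        front = [c for c in remaining
                 if not any(dominates(o, c) for o in remaining)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts


# Invented numbers, for illustration only.
data = [
    Combo("A", 0.10, 1000.0),
    Combo("B", 0.05, 5000.0),
    Combo("C", 0.04, 900.0),
    Combo("D", 0.01, 100.0),
]
front_names = [[c.name for c in front] for front in pareto_fronts(data)]
```

Under these assumptions, A and B share the first front because neither is higher on both criteria, while C and D fall into later fronts.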
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels alone is
not sufficient; the labels must also be localised at the original image pixel
resolution. Boosted by the extraordinary ability of convolutional neural
networks (CNNs) to create semantic, high-level and hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We started with an
analysis of the public image sets and leaderboards for 2D semantic
segmentation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorised the approaches into three main periods, namely the pre- and early
deep learning era, the fully convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.
Design and analysis of algorithms for similarity search based on intrinsic dimension
One of the most fundamental operations employed in data mining tasks such as classification, cluster analysis, and anomaly detection, is that of similarity search. It has been used in numerous fields of application such as multimedia, information retrieval, recommender systems and pattern recognition. Specifically, a similarity query aims to retrieve from the database the most similar objects to a query object, where the underlying similarity measure is usually expressed as a distance function.
The cost of processing similarity queries has been typically assessed in terms of the representational dimension of the data involved, that is, the number of features used to represent individual data objects. It is generally the case that high representational dimension would result in a significant increase in the processing cost of similarity queries. This relation is often attributed to an amalgamation of phenomena, collectively referred to as the curse of dimensionality. However, the observed effects of dimensionality in practice may not be as severe as expected. This has led to the development of models quantifying the complexity of data in terms of some measure of the intrinsic dimensionality.
The generalized expansion dimension (GED) is one such model; it estimates the intrinsic dimension in the vicinity of a query point q by observing the ranks and distances of pairs of neighbors with respect to q. This dissertation is mainly concerned with the design and analysis of search algorithms based on the GED model. In particular, three variants of the similarity search problem are considered: adaptive similarity search, flexible aggregate similarity search, and subspace similarity search. The good practical performance of the proposed algorithms demonstrates the effectiveness of dimensionality-driven design of search algorithms.
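As a rough illustration of how an estimate can be formed from the ranks and distances of a pair of neighbors, the sketch below uses the commonly cited form log(k2/k1) / log(r2/r1), where k1 < k2 are neighbor ranks and r1 < r2 are the corresponding distances from q. The abstract does not state the formula, so this form is an assumption here, and the helper names `ged_estimate` and `ged_at_query` are hypothetical.

```python
import math
from typing import Sequence


def ged_estimate(k1: int, r1: float, k2: int, r2: float) -> float:
    """Dimension estimate from two neighbors of a query point:
    ranks k1 < k2 at distances r1 < r2 (assumed GED form)."""
    if not (0 < k1 < k2 and 0 < r1 < r2):
        raise ValueError("need 0 < k1 < k2 and 0 < r1 < r2")
    return math.log(k2 / k1) / math.log(r2 / r1)


def ged_at_query(points: Sequence[Sequence[float]],
                 q: Sequence[float], k1: int, k2: int) -> float:
    """Estimate the dimension around q from its k1-th and k2-th
    nearest neighbors under Euclidean distance."""
    dists = sorted(math.dist(p, q) for p in points)
    return ged_estimate(k1, dists[k1 - 1], k2, dists[k2 - 1])


# Points laid out along a line: the neighborhood of the origin grows
# linearly with the radius, so the estimate should be close to 1.
line = [(float(i), 0.0) for i in range(1, 101)]
d_line = ged_at_query(line, (0.0, 0.0), k1=10, k2=20)
```

The example points are embedded in two coordinates but lie on a line, which shows how an intrinsic estimate can be lower than the representational dimension.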
Serum proteomics to detect early changes in type 1 diabetes and carotid atherosclerosis
The detection of early markers is the key issue in predicting the outcome of inflammatory diseases such as type 1 diabetes and atherosclerosis. Whilst biochemical testing approaches have improved prediction of inflammatory diseases, validated biomarkers with better diagnostic specificities are still needed. Currently, the majority of disease-related proteomics studies have focused on disease endpoints. The work presented in this thesis includes the first comprehensive proteomics analyses on serum samples collected from two unique Finnish longitudinal cohorts, namely The Diabetes Prediction and Prevention Project (DIPP) and The Cardiovascular Risk in Young Finns Study (YFS), to identify early markers associated with type 1 diabetes and carotid atherosclerosis.
Using mass spectrometry (MS)-based quantitative serum proteomics, profiling was carried out to study temporal variation in pre-diabetic samples and early markers of plaque formation with the T1D and YFS cohorts, respectively. The analyses revealed consistent differences in the abundance of a number of proteins in subjects with ongoing asymptomatic changes, several of which are functionally relevant to the disease process. Taken together, the discovered markers are candidates for further validation studies in independent cohorts and may be used to characterize an increased risk, progression and early onset of these diseases.
Early changes in the serum proteome associated with the development of type 1 diabetes and atherosclerosis
One key challenge in predicting inflammatory diseases such as type 1 diabetes and atherosclerosis is finding early disease markers. Although various biochemical tests have already improved the prediction of inflammation-related diseases, new and more accurate biomarkers are still needed. Despite this, many proteomics studies in these fields have so far focused on the time of disease onset. In this doctoral work, we carried out large-scale proteomics analyses of serum samples collected as part of two unique Finnish follow-up studies: the DIPP study (Type 1 Diabetes Prediction and Prevention) and the YFS study (Cardiovascular Risk in Young Finns). In these studies, serum proteomics was used for the first time to search for early markers of type 1 diabetes and atherosclerosis.
We studied changes in serum proteome profiles associated with the development of type 1 diabetes and the formation of atherosclerotic plaques using quantitative proteomics based on mass spectrometry. These analyses revealed consistent differences in numerous proteins between asymptomatic individuals who later developed the disease and controls who remained healthy. Many of these proteins may also be substantially involved in disease development. The markers found in our studies provide a starting point for future validation studies, and in the future they could be used to assess an individual's elevated disease risk, disease progression, and early disease onset.