Methods for generating and evaluating synthetic longitudinal patient data: a systematic review
The proliferation of data in recent years has led to the advancement and
utilization of various statistical and deep learning techniques, thus
expediting research and development activities. However, not all industries
have benefited equally from the surge in data availability, partly due to legal
restrictions on data usage and privacy regulations, such as in medicine. To
address this issue, various statistical disclosure and privacy-preserving
methods have been proposed, including the use of synthetic data generation.
Synthetic data are generated based on some existing data, with the aim of
replicating them as closely as possible and acting as a proxy for real
sensitive data. This paper presents a systematic review of methods for
generating and evaluating synthetic longitudinal patient data, a prevalent data
type in medicine. The review adheres to the PRISMA guidelines and covers
literature from five databases until the end of 2022. The paper describes 17
methods, ranging from traditional simulation techniques to modern deep learning
methods. The collected information includes, but is not limited to, method
type, source code availability, and approaches used to assess resemblance,
utility, and privacy. Furthermore, the paper discusses practical guidelines and
key considerations for developing synthetic longitudinal data generation
methods.
Historical collaborative geocoding
Recent developments in digital technology have provided large data sets that
can be accessed and used with increasing ease. These data sets often contain
indirect localisation information, such as historical addresses. Historical
geocoding is the process of transforming the indirect localisation information
to direct localisation that can be placed on a map, which enables spatial
analysis and cross-referencing. Many efficient geocoders exist for current
addresses, but they do not deal with the temporal aspect and are based on a
strict hierarchy (..., city, street, house number) that is hard or impossible
to use with historical data. Indeed, historical data are full of uncertainties
(temporal aspect, semantic aspect, spatial precision, confidence in the
historical source, ...) that cannot be resolved, as there is no way to go back in time to
check. We propose an open source, open data, extensible solution for geocoding
that is based on the building of gazetteers composed of geohistorical objects
extracted from historical topographical maps. Once the gazetteers are
available, geocoding a historical address is a matter of finding the
geohistorical object in the gazetteers that is the best match to the historical
address. The matching criteria are customisable and include several dimensions
(fuzzy semantic, fuzzy temporal, scale, spatial precision ...). As the goal is
to facilitate historical work, we also propose web-based user interfaces that
help geocode addresses (one at a time or in batch mode) and display the results
over current or historical topographical maps, so that they can be checked and
collaboratively edited. The system is tested on the city of Paris for the 19th
and 20th centuries; it shows a high return rate and is fast enough to be used
interactively.
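The matching step can be illustrated with a minimal sketch: score each gazetteer entry along a few of the dimensions listed above (here only fuzzy semantic and fuzzy temporal) and keep the best-scoring geohistorical object. All names, fields, weights, and scoring functions below are hypothetical illustrations, not the project's actual implementation.

```python
# Hypothetical sketch of multi-criteria matching between a historical address
# and gazetteer entries; fields, weights, and scoring are illustrative only.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class GeohistoricalObject:
    name: str
    valid_from: int   # first year the object is attested
    valid_to: int     # last year the object is attested
    x: float
    y: float

def semantic_score(query: str, candidate: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

def temporal_score(year: int, obj: GeohistoricalObject, tolerance: int = 10) -> float:
    """1.0 inside the validity interval, decaying linearly within a tolerance."""
    if obj.valid_from <= year <= obj.valid_to:
        return 1.0
    gap = min(abs(year - obj.valid_from), abs(year - obj.valid_to))
    return max(0.0, 1.0 - gap / tolerance)

def best_match(query: str, year: int, gazetteer: list[GeohistoricalObject],
               w_sem: float = 0.7, w_time: float = 0.3) -> GeohistoricalObject:
    """Return the gazetteer object with the best combined score."""
    return max(gazetteer,
               key=lambda o: w_sem * semantic_score(query, o.name)
                             + w_time * temporal_score(year, o))
```

In practice the combined score would also include the spatial-precision and scale dimensions mentioned above, but the structure (weighted scores over a gazetteer, keep the maximum) stays the same.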
Characterizing Seismicity in Alberta for Induced-Seismicity Applications
This report documents the compilation of a high-quality catalog of earthquakes in Alberta and the surrounding region: the Composite Alberta Seismicity Catalog (CASC). It currently includes events through July 2015. The catalog and its documentation are available for download at www.inducedseismicity.ca. To determine the magnitude of completeness (Mc) of the catalog, we map Mc(xi, yi, t) across a grid of the region, where xi and yi represent the longitude and latitude of the center nodes of the grid and t indicates the time period. The empirical relation determined from the catalog and station data is of the form Mc(D4) = a·D4 + c, where D4 is the distance from (xi, yi) to the fourth-nearest station. Seven Mc maps are created to represent spatial variations of Mc from 1985 to 2015. Based on the derived Mc maps, we estimate the equivalent rate of occurrence of M ≥ 3 earthquakes in the various grid cells.
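The relation Mc(D4) = a·D4 + c can be evaluated over a grid of nodes as in the minimal sketch below; the coefficients and coordinates are placeholders, since the values fitted in the report are not given here.

```python
# Sketch of the distance-based completeness relation Mc(D4) = a*D4 + c,
# evaluated at grid nodes. Coefficients A and C are placeholders, not the
# values fitted in the report.
import numpy as np

A, C = 0.005, 1.0  # hypothetical coefficients of the empirical relation

def mc_at_node(node_xy: np.ndarray, station_xy: np.ndarray) -> float:
    """Magnitude of completeness at one grid node, from the distance (km)
    to the fourth-nearest seismograph station."""
    d = np.linalg.norm(station_xy - node_xy, axis=1)  # distances to all stations
    d4 = np.sort(d)[3]                                # fourth-nearest distance
    return A * d4 + C

# Example: map Mc over a small grid of node coordinates (projected, in km).
stations = np.array([[0.0, 0.0], [50.0, 10.0], [20.0, 80.0], [90.0, 60.0], [120.0, 30.0]])
grid = np.array([[10.0, 10.0], [60.0, 60.0]])
mc_map = np.array([mc_at_node(node, stations) for node in grid])
```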
Injecting equipment schemes for injecting drug users: qualitative evidence review
This review of the qualitative literature about needle and syringe programmes (NSPs) for injecting drug users (IDUs) complements the review of effectiveness and cost-effectiveness. It aims to provide a more situated narrative perspective on the overall guidance questions.
Econometrics meets sentiment: an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.
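As a toy illustration of the basic step described above, the sketch below turns raw text into a quantitative sentiment variable with a small lexicon and aggregates it by period; the lexicon and documents are made up, and this is not the sentometrics methodology or any particular software package.

```python
# Toy illustration: score each document with a small sentiment lexicon,
# then average the scores by period to obtain a quantitative sentiment
# variable. Lexicon, documents, and dates are invented for illustration.
from collections import defaultdict

LEXICON = {"gain": 1, "growth": 1, "strong": 1, "loss": -1, "weak": -1, "decline": -1}

def doc_sentiment(text: str) -> float:
    """Average lexicon score of the words in a document (0 if no hits)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def sentiment_index(docs: list[tuple[str, str]]) -> dict[str, float]:
    """Aggregate per-document scores into a per-period sentiment variable."""
    by_period = defaultdict(list)
    for period, text in docs:
        by_period[period].append(doc_sentiment(text))
    return {p: sum(v) / len(v) for p, v in by_period.items()}

docs = [("2020-Q1", "strong growth despite weak exports"),
        ("2020-Q2", "sharp decline and heavy loss")]
print(sentiment_index(docs))  # e.g. {'2020-Q1': 0.33..., '2020-Q2': -1.0}
```

The resulting per-period series is the kind of quantitative sentiment variable that can then enter a standard econometric regression alongside other variables.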
DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication
Metrics for set similarity are a core aspect of several data mining tasks. To
remove duplicate results in a Web search, for example, a common approach looks
at the Jaccard index between all pairs of pages. In social network analysis, a
much-celebrated metric is the Adamic-Adar index, widely used to compare node
neighborhood sets in the important problem of predicting links. However, with
the increasing amount of data to be processed, calculating the exact similarity
between all pairs can be intractable. The challenge of working at this scale
has motivated research into efficient estimators for set similarity metrics.
The two most popular estimators, MinHash and SimHash, are indeed used in
applications such as document deduplication and recommender systems where large
volumes of data need to be processed. Given the importance of these tasks, the
demand for advancing estimators is evident. We propose DotHash, an unbiased
estimator for the intersection size of two sets. DotHash can be used to
estimate the Jaccard index and, to the best of our knowledge, is the first
method that can also estimate the Adamic-Adar index and a family of related
metrics. We formally define this family of metrics, provide theoretical bounds
on the probability of estimate errors, and analyze its empirical performance.
Our experimental results indicate that DotHash is more accurate than the other
estimators at link prediction and duplicate-document detection, with the same
complexity and similar comparison time.
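For reference, the two metrics the abstract names can be computed exactly as in the sketch below; this shows the plain Jaccard and Adamic-Adar definitions, not the DotHash estimator itself, and the example sets and degrees are invented.

```python
# Exact (non-estimated) computation of the two set-similarity metrics the
# abstract refers to; illustrates the metrics, not the DotHash estimator.
import math

def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def adamic_adar(a: set, b: set, degree: dict) -> float:
    """Sum of 1 / log(deg(z)) over common neighbors z, as used in link prediction."""
    return sum(1.0 / math.log(degree[z]) for z in a & b if degree[z] > 1)

# Example: neighborhoods of two nodes and the degrees of their neighbors.
neigh_u = {"a", "b", "c"}
neigh_v = {"b", "c", "d"}
degrees = {"a": 3, "b": 4, "c": 2, "d": 5}
print(jaccard(neigh_u, neigh_v))               # 2/4 = 0.5
print(adamic_adar(neigh_u, neigh_v, degrees))  # 1/log(4) + 1/log(2)
```

Estimators such as MinHash approximate the Jaccard value without materializing the full sets, which is what makes these metrics usable at web scale.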