18 research outputs found

    Time-Sensitive Resource Re-allocation Strategy for Inter-Dependent Continuous Tasks

    An increase in data volumes and a shift towards live data have brought a stronger focus on resource-intensive tasks that run continuously over long periods. A Grid has the potential to offer the resources these tasks require, while maintaining a fair and balanced allocation of resources among multiple client agents. Because of this, a Grid might be unwilling to allocate its resources for a long time, leading to task interruptions. This problem becomes even more serious if the interruption of one task may lead to the interruption of dependent tasks. Here, we discuss a new resource re-allocation strategy, used by a client to prevent overly long interruptions by re-allocating resources between its own tasks. Re-allocations are suggested by a client agent, but only the Grid can carry them out, if it agrees. Our strategy was tested under different Grid settings, accounting for the adjusted coefficients, and demonstrated noticeable improvements in client utilities compared to when it is not used. Our experiments were also extended to tests with environmental modelling and realistic Grid resource simulation, grounded in real-life Grid studies. These tests also showed a useful application of our strategy.

    A Probabilistic Address Parser Using Conditional Random Fields and Stochastic Regular Grammar

    Automatic semantic annotation of data from databases or the web is an important pre-processing step for data cleansing and record linkage. It can be used to resolve imperfect field alignment in a database or to identify comparable fields for matching records from multiple sources. The annotation process is not trivial because data values may be noisy, containing abbreviations, variations or misspellings. In particular, overlapping features usually exist in a lexicon-based approach. In this work, we present a probabilistic address parser based on linear-chain conditional random fields (CRFs), which allow more expressive token-level features than hidden Markov models (HMMs). In addition, we propose two general enhancement techniques to improve performance. One takes the original semi-structure of the data into account. The other post-processes the output sequences of the parser by combining its conditional probability with a score function based on a learned stochastic regular grammar (SRG) that captures segment-level dependencies. Experiments were conducted by comparing the CRF parser to an HMM parser and a semi-Markov CRF parser on two real-world datasets. The CRF parser outperformed the HMM parser and the semi-Markov CRF parser on both datasets in terms of classification accuracy. Leveraging the structure of the data and combining the linear-chain CRF with the SRG further improved the parser, which achieved an accuracy of 97% on a postal dataset and 96% on a company dataset.

    EpiGraphDB: a database and data mining platform for health data science

    Motivation: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research. Results: We developed EpiGraphDB (https://epigraphdb.org/), a graph database containing an array of different biomedical and epidemiological relationships and an analytical platform to support their use in human population health data science. In addition, we present three case studies that illustrate the value of this platform. The first uses EpiGraphDB to evaluate potential pleiotropic relationships, addressing mis-inference in systematic causal analysis. In the second case study, we illustrate how protein-protein interaction data offer opportunities to identify new drug targets. The final case study integrates causal inference using Mendelian randomization with relationships mined from the biomedical literature to 'triangulate' evidence from different sources.

    Entity Search/Match in Relational Databases

    We study an entity search/match problem that requires retrieved tuples to match an input entity query. We assume the input queries are of the same type as the tuples in a materialised relational table. Existing keyword search over relational databases focuses on assembling tuples from a variety of relational tables in order to respond to a keyword query. The entity queries in this work differ from keyword queries in two ways: (i) an entity query roughly refers to an entity that contains a number of attribute values, e.g. a product entity or an address entity; (ii) there might be redundant or incorrect information in the entity queries that could lead to misinterpretations of the queries. In this paper, we propose a transformation that first converts an unstructured entity query into a multi-valued structured query, and two retrieval methods that generate a set of candidate tuples from the database. The retrieval methods essentially formulate SQL queries against the database given the multi-valued structured query. The results of a comprehensive evaluation on a large-scale database (more than 29 million tuples) and two real-world datasets showed that our methods offer a good trade-off between generating correct candidates and retrieval time compared to baseline approaches.
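
As a rough illustration of formulating SQL from a multi-valued structured query, the sketch below builds one parameterised `SELECT` in which each attribute may match any of several candidate values. It is not one of the two retrieval methods from the paper, which the abstract does not specify. The function names, the `address` table, its columns, and the choice to join attributes with `OR` are all assumptions made for this example.

```python
import sqlite3


def build_candidate_query(table, structured_query):
    """Build a parameterised SELECT from a multi-valued structured query.

    `structured_query` maps a column name to a list of candidate values.
    Table and column names are hypothetical and must come from trusted
    code, because SQL identifiers cannot be bound as parameters.
    """
    clauses, params = [], []
    for column, values in structured_query.items():
        if not values:
            continue  # skip attributes with no candidate values
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    if not clauses:
        raise ValueError("structured query has no candidate values")
    # Assumption: OR keeps recall high when generating candidate tuples.
    sql = f"SELECT * FROM {table} WHERE " + " OR ".join(clauses)
    return sql, params


def find_candidates(conn, table, structured_query):
    """Run the generated query and return the candidate tuples."""
    sql, params = build_candidate_query(table, structured_query)
    return conn.execute(sql, params).fetchall()
```

A call such as `find_candidates(conn, "address", {"street": ["Baker St", "Baker Street"], "city": ["London"]})` would return every row matching any listed value. A later ranking or matching step would then pick the best tuple from these candidates.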

    Improving Record Linkage Accuracy with Hierarchical Feature Level Information and Parsed Data

    Probabilistic record linkage is a well-established topic in the literature. Fellegi-Sunter probabilistic record linkage and its enhanced versions are commonly used methods, which calculate match and non-match weights for each pair of records. Bayesian network classifiers, namely the naive Bayes classifier and tree-augmented naive Bayes (TAN), have also been used successfully here. Recently, an extended version of TAN (called ETAN) has been developed and shown to be superior in classification accuracy to conventional TAN. However, no previous work has applied ETAN to record linkage or investigated the benefits of using naturally existing hierarchical feature-level information and parsed fields of the datasets. In this work, we extend the naive Bayes classifier with such hierarchical feature-level information. We then illustrate the benefits of our method over previously proposed methods on four datasets in terms of linkage performance (F1 score). We also show that the results can be further improved by additionally parsing the fields of these datasets.
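
For readers unfamiliar with the match and non-match weights mentioned above, the following is a minimal sketch of the standard Fellegi-Sunter weight calculation, not the hierarchical method proposed in the paper. Here `m` is the probability that a field agrees given a true match and `u` the probability that it agrees given a non-match; the function names and the example probabilities are assumptions for illustration.

```python
import math


def field_weight(agrees, m, u):
    """Log2 likelihood-ratio weight for one compared field.

    m = P(field agrees | records match)
    u = P(field agrees | records do not match)
    """
    if agrees:
        return math.log2(m / u)  # agreement (match) weight
    return math.log2((1 - m) / (1 - u))  # disagreement (non-match) weight


def record_pair_weight(agreements, m_probs, u_probs):
    """Sum the field weights, assuming fields are conditionally independent."""
    return sum(
        field_weight(a, m, u)
        for a, m, u in zip(agreements, m_probs, u_probs)
    )
```

With `m = 0.9` and `u = 0.1`, an agreeing field contributes `log2(9)` and a disagreeing field contributes `-log2(9)`. A record pair is then classified by comparing its total weight against upper and lower thresholds.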

    A compilation of student essays within condominium financing

    Financing in the Swedish real estate market has changed between the period before the financial crisis and today. This applies both to private individuals financing home purchases and to real estate companies financing projects and growth, and it has changed where capital for financing is found. As banks have revised their lending terms, for example by introducing a mortgage cap, private individuals who cannot raise the down payment have found other solutions, such as unsecured loans, to finance the remaining part. Real estate companies have also seen their financing evolve: they have begun to use new financing methods, although these often serve as a complement to the classic bank loan. This essay aims to examine what has been written in student essays at master's and bachelor's level in order to identify trends in the development of condominium financing. The material analysed is therefore student essays on the subject of condominium financing. A literature study was carried out to compile the essays and obtain a good picture of the trends that emerged during the period 2007-2019. The compilation examines both the financing methods mentioned and various sustainability aspects in the essays, to see whether there are trends to analyse. The results indicate that mortgage loans were the most frequently mentioned topic for consumers across the whole period, while for companies the most frequently mentioned method was bonds. The results also show that many new financing methods were introduced in the essays in later years; one example is green bonds, which were written about three times in the last two years. This may indicate that eco-financing is a growing trend in society.
Regarding the essays' focus on social, economic and ecological sustainability over the period analysed, the results showed a change over time: essays from the early part of the period dealt almost exclusively with economic sustainability, after which the focus shifted towards a mix of social and economic sustainability. In the final years, the ecological side of sustainability has also received attention in some of the essays.
The Swedish real estate market has seen a major change in financing over the past decades. Events such as the 2007 financial crisis, the recent ecology movement and the housing shortage have all, directly or indirectly, affected the real estate market. The way consumers and companies finance their real estate purchases has changed over time. When, for example, the banks introduced a mortgage cap, a major group of consumers sought another way of financing their mortgages, and this need was filled by the alternative financing methods seen today. Real estate companies have also had to change how they finance themselves, as the development of the environmental movement forced the introduction of alternative financing in the form of green bonds. This study aims to research how the topic of real estate financing has changed in academic literature within the period 2007-2019 and to identify any trends that have taken place. Furthermore, the study also examines which aspects of sustainability have been taken into account in the literature included in the study. From analysing 30 student essays related to the topic of real estate financing, one clear trend was that mortgage loans were the predominant way of financing throughout the period from a consumer perspective. From a corporate perspective, the most mentioned and studied way of financing was bonds.
Another clear pattern was that, during the earlier years, only a limited number of financing methods were mentioned in the student essays included in the study, whereas during the later years there was greater variation in the methods mentioned in the literature. In the analysis of sustainability aspects, a clear majority of essays considered economic sustainability, whereas the rest of the essays focused primarily on social sustainability.