Matching records in multiple databases using a hybridization of several technologies.
A major problem with integrating information from multiple databases is that the same data objects can exist in inconsistent data formats across databases, with a variety of attribute variations, making it difficult to identify matching objects using exact string matching. In this research, a variety of models and methods have been developed and tested to alleviate this problem. A major motivation for this research is that health care providers still lack efficient tools for patient record matching. This research is focused on the approximate matching of patient records with third-party payer databases. This is a major need for all medical treatment facilities and hospitals that try to match patient treatment records with the records of insurance companies, Medicare, Medicaid, and the Veterans Administration. Therefore, the main objectives of this research effort are to provide an approximate matching framework that can draw upon multiple input service databases, construct an identity, and match to third-party payers with the highest possible accuracy in object identification and minimal user interaction. This research describes the object identification system framework that has been developed from a hybridization of several technologies, which compares the objects' shared attributes in order to identify matching objects. Methodologies and techniques from other fields, such as information retrieval, text correction, and data mining, are integrated to develop a framework that addresses the patient record matching problem. This research defines the quality of a match in multiple databases by using quality metrics, such as precision, recall, and F-measure, which are commonly used in information retrieval. The performance of the resulting decision models is evaluated through extensive experiments, and the models are found to perform very well.
The matching quality performance metrics, such as precision, recall, F-measure, and accuracy, are over 99%; the ROC index is over 99.50%; and the mismatching rate is less than 0.18% for each model generated from the different data sets. This research also includes a discussion of the problems in patient record matching, an overview of the relevant literature on the record matching problem, and an extensive experimental evaluation of the methodologies utilized, such as string similarity functions and machine learning. Finally, potential improvements and extensions to this work are also presented.
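The attribute-wise approximate matching and the precision/recall evaluation the abstract describes can be illustrated with a minimal sketch. Everything below is hypothetical: the field names, the toy records, the 0.8 threshold, and the use of Python's difflib ratio as a stand-in for the thesis's similarity functions.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Normalized string similarity in [0, 1] (difflib's ratio, as a stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_score(rec_a, rec_b, fields=("name", "dob", "address")):
    """Average similarity over the records' shared attributes."""
    return sum(sim(rec_a[f], rec_b[f]) for f in fields) / len(fields)

# Hypothetical patient records and third-party payer records.
patients = [{"id": 1, "name": "John Smith", "dob": "1970-01-02", "address": "12 Oak St"}]
payers   = [{"id": "A", "name": "Jon Smith",  "dob": "1970-01-02", "address": "12 Oak Street"},
            {"id": "B", "name": "Mary Jones", "dob": "1985-06-30", "address": "9 Elm Ave"}]

THRESHOLD = 0.8  # illustrative cut-off, not the thesis's tuned value
predicted = {(p["id"], q["id"])
             for p in patients for q in payers
             if record_score(p, q) >= THRESHOLD}
truth = {(1, "A")}  # known matching pairs (ground truth)

# The information-retrieval quality metrics named in the abstract.
tp = len(predicted & truth)
precision = tp / len(predicted) if predicted else 0.0
recall = tp / len(truth) if truth else 0.0
f_measure = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
```

On this toy data the exact strings differ ("Jon" vs "John", "St" vs "Street"), yet the averaged attribute similarities still exceed the threshold, which is the point of approximate rather than exact matching.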
Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web
The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore integrated information will be a key factor to pursue successful research, and the need for information alignment is widely recognized.
ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but to also perform entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology that is used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery.
First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of entity comparisons needed. Finally, a thorough matching was performed within the different clusters. ALAP helped to identify challenges and highlighted the opportunities that arise in attempting to align cultural heritage information systems.
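The clustering step in this pipeline plays the same role as blocking in record linkage: a cheap key partitions the entities so that expensive pairwise comparisons run only within a partition. A minimal sketch, where the key function and the entity names are invented for illustration (not taken from Perseus or Arachne):

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(name):
    """Cheap key: first three letters, lowercased (a deliberately simple choice)."""
    return name.lower().replace(" ", "")[:3]

entities = ["Athena statue", "Athena Parthenos", "Zeus altar",
            "Zeus temple", "Apollo relief"]

blocks = defaultdict(list)
for e in entities:
    blocks[blocking_key(e)].append(e)

# Only pairs inside the same block go on to the thorough matching step.
candidate_pairs = [pair for group in blocks.values()
                   for pair in combinations(group, 2)]

total_pairs = len(entities) * (len(entities) - 1) // 2  # 10 without blocking
```

Here blocking cuts the 10 possible comparisons down to 2, at the risk (inherent to any blocking scheme) of missing true matches whose keys differ.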
Commodities and Linkages: Industrialisation in Sub-Saharan Africa
In a complementary Discussion Paper (MMCP DP 12 2011) we set out the reasons why we believe that there is extensive scope for linkage development into and out of SSA’s commodities sectors. In this Discussion Paper, we present the findings of our detailed empirical enquiry into the determinants of the breadth and depth of linkages in eight SSA countries (Angola, Botswana, Gabon, Ghana, Nigeria, South Africa, Tanzania, and Zambia) and six sectors (copper, diamonds, gold, oil and gas, mining services, and timber). We conclude from this detailed research that the extent of linkages varies as a consequence of three factors which intrinsically affect their progress: the passage of time, the complexity of the sector, and the level of capabilities in the domestic economy. However, beyond this we identify four sets of related factors which determined the nature and pace of linkage development. The first is the structure of ownership, both in lead commodity-producing firms and in their suppliers and domestic customers. The second is the nature and quality of both hard infrastructure (for example, roads and ports) and soft infrastructure (for example, the efficiency of customs clearance). The third is the availability of skills and the structure and orientation of the National System of Innovation in the domestic economy. The fourth, and overwhelmingly important, contextual factor is policy. This reflects policy towards the commodity sector itself, and policy which affects the three contextual drivers, namely ownership, infrastructure, and capabilities. As a result of this comparative analysis we provide an explanation of why linkage development was progressive in some economies (such as Botswana) and regressive in others (such as Tanzania). This cluster of factors also explains why the breadth and depth of linkages is relatively advanced in some countries (such as South Africa) and at a very nascent stage in others (such as Angola).
A FRAMEWORK FOR GUIDING THE BRIEFING PROCESS IN PUBLIC-PRIVATE PARTNERSHIPS IN THE UAE CONSTRUCTION INDUSTRY
Public-Private Partnership (PPP) is a procurement method that employs a long-term contractual arrangement between public and private sectors with the intention of developing a public facility. A PPP brief must supply information that not only particularizes the project requirements but also specifies its program, risk management, expected performance output, and payment mechanism. Many challenges currently face the briefing process of PPP projects in the UAE. A uniform briefing process has not been agreed, because there is no unified tender law or PPP procurement process in the country. The main aim of this research is to develop a framework for guiding the development of the PPP briefing stage in the UAE construction industry. To this end, a process framework for PPP briefing with special reference to UAE construction projects was developed first, on the basis of an intensive literature review and analysis of case studies. This framework was validated through interviews with PPP experts and professionals in the UAE. Following this, the Critical Success Factors (CSFs) in PPP briefing, with special reference to UAE construction projects, were investigated and identified through a literature review, expert interviews, and a questionnaire survey. This step led to developing another framework for CSFs in PPP briefing with special reference to UAE construction projects. With these in mind, the CSFs were modelled to develop a Decision Support System (DSS), the main aim of which was to guide the development of the briefing stage for PPP projects in the UAE. Its main objectives focused on assessing the readiness of public and private organizations for successful briefing development, highlighting areas for improvement, and helping to develop action plans to improve the briefing process. In order to validate the developed model and assess its performance as a decision-making tool, two mega construction projects (real case studies) were assessed by means of the proposed model.
The outputs of the implemented evaluation validated the major aspects of this model and its developed prototype, together with its performance for its stated purpose.
The 1992 4th NASA SERC Symposium on VLSI Design
Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design.
Metric Selection and Metric Learning for Matching Tasks
A quarter of a century after the world-wide web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is restricted if they are not properly integrated and maintained. A lot of this work boils down to matching: finding existing records about entities and enriching them with information from a new data source. In the realm of code, this means integrating new code snippets into a code base while avoiding duplication.
In this thesis, we address two such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semi-supervised learning approach to string matching. It is designed to query a user for a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm is due to addressing the cold-start problem that is inherent to Active Learning: queries are ranked by variance before enough supervision information has arrived, and a self-regulating mechanism counteracts initial biases.
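The variance-based query ranking can be sketched as a toy illustration (not the thesis's actual algorithm: the three similarity measures and the string pairs below are stand-ins). The idea is that pairs on which the measures disagree most are the most informative to ask the user about first, before any labels exist.

```python
from difflib import SequenceMatcher
from statistics import pvariance

def ratio_sim(a, b):
    """Edit-based similarity (difflib's ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def token_jaccard(a, b):
    """Set overlap of whitespace tokens."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def char_jaccard(a, b):
    """Set overlap of individual characters."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

MEASURES = (ratio_sim, token_jaccard, char_jaccard)

def disagreement(pair):
    """Variance of the measures' scores: high variance = informative query."""
    return pvariance([m(*pair) for m in MEASURES])

pairs = [("main street", "main st"), ("oak avenue", "oak avenue"),
         ("elm road", "birch lane")]
# Before any labels arrive, ask the user about the highest-variance pair first.
queue = sorted(pairs, key=disagreement, reverse=True)
```

The abbreviation pair ("main street", "main st") ranks first: the edit-based and character-based measures score it high while the token measure scores it low, whereas the identical pair and the clearly-different pair provoke little disagreement.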
Second, we address the matching of code fragments for deduplication. Programming code is not only a tool but also a resource that itself demands maintenance. Code duplication is a frequent problem, arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a clean and maintainable codebase. For such more complex data structures, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach that models code similarity with Neural Networks. We find that in such a model, representing the elementary tokens with a pretrained word embedding is the most important ingredient. Our results show both qualitatively (by visualization) that relatedness is modelled well by the embeddings and quantitatively (by ablation) that the encoded information is useful for the downstream matching task.
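The role of pretrained token embeddings can be illustrated with a toy sketch. The two-dimensional vectors below are invented for illustration only; real systems learn hundreds of dimensions from large code corpora, and a trained neural encoder, rather than the plain mean used here, aggregates them into a fragment representation.

```python
import math

# Toy "pretrained" token embeddings (hypothetical values chosen so that
# related tokens sit near each other, as a real embedding would learn).
EMBED = {
    "for":  [0.9, 0.1], "while": [0.85, 0.2],   # loop keywords cluster
    "i":    [0.4, 0.6], "j":     [0.45, 0.55],  # loop variables cluster
    "open": [0.1, 0.9], "read":  [0.15, 0.85],  # I/O tokens cluster
}

def encode(tokens):
    """Mean of the token embeddings: a crude stand-in for a neural encoder."""
    dims = len(next(iter(EMBED.values())))
    vec = [0.0] * dims
    for t in tokens:
        for d, x in enumerate(EMBED.get(t, [0.0] * dims)):
            vec[d] += x
    return [x / len(tokens) for x in vec]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

loop_a = encode(["for", "i"])
loop_b = encode(["while", "j"])
io_frag = encode(["open", "read"])
```

Even with this crude mean pooling, the two loop fragments land closer together in embedding space than either does to the I/O fragment, which is the property the thesis attributes to pretrained token embeddings.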
As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection, and generic Metric Learning tasks. We give a novel account of string similarity measures from a psychological standpoint, and we point out and document a longstanding naming conflict in string similarity measures. Finally, we point out the overlap of the latest research in Code Clone Detection with the field of Natural Language Processing.