    An Analysis of Machine Learning-Based Semantic Matchmaking

    Interoperability remains one of the main challenges in the Internet of Things (IoT). The growing number of IoT data sources from various vendors increases the complexity of integrating different sensors and actuators into existing platforms, a process that requires human involvement and is error-prone. To improve this situation, devices are usually coupled with a semantic description of their attributes. Such semantic descriptions, known as Thing Descriptions (TDs), are an abstraction of devices that helps achieve a smoother integration of devices into IoT platforms. However, TDs are usually vendor-specific, so in large-scale IoT infrastructures the integration complexity increases: similar sensors from different vendors come with different descriptions, all of which must be interconnected within IoT platforms. In this context, the paper assesses different ML-based semantic matchmaking approaches against a sentence-based statistical similarity approach. For the ML approaches, the paper focuses on clustering and Natural Language Processing (NLP). The three approaches have been implemented on a realistic testbed, and the experiments show that the NLP-based approach achieves the best performance in terms of accuracy, time to completion of a matchmaking request, and memory usage.
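
    As a rough illustration of what a sentence-based statistical similarity baseline could look like, the sketch below ranks device descriptions by cosine similarity over word counts. It is not the paper's implementation: the function names, the tokenization rule, and the sample descriptions are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def best_match(request: str, descriptions: dict[str, str]) -> str:
    """Return the key of the description most similar to the request."""
    return max(descriptions, key=lambda name: cosine_similarity(request, descriptions[name]))


# Hypothetical descriptions from two vendors.
descriptions = {
    "vendorA_temp": "temperature sensor reporting degrees celsius",
    "vendorB_light": "ambient light sensor reporting lux",
}
match = best_match("room temperature in celsius", descriptions)
```

    A purely lexical measure like this cannot relate synonyms such as "luminosity" and "light", which is one plausible reason to compare it against ML-based approaches.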


    A Survey of Applications and Researches on Schema Matching between GIS Spatial Data

    Toward the Inter-organizational Product Information Supply Chain – Evidence from the Retail and Consumer Goods Industries

    Since the 1980s, the retail and consumer goods industries have been making very extensive use of EDI-based data exchange and subsequently developed the vision of Efficient Consumer Response (ECR). In the meantime, a growing number of studies report that poor data quality, in particular outdated or wrong product information, negatively impacts demand and supply chain performance. Whereas prior literature intensively studied the positive effects of information sharing on the coordination of supply and demand, this research is aimed at establishing a basis for understanding the phenomena of the underlying inter-organizational product information supply chain. Using coordination theory as an overarching framework, the main research contribution is a set of dependencies, coordination problems, and coordination mechanisms that characterize the product information supply chain. From an analysis of two retailer-manufacturer relationships, we conclude that flow and sharing dependencies evolve into reciprocal dependencies as the intensity of demand and supply collaboration increases. We also find that industry standards, notably Global Data Synchronization (GDS), do not yet fully cover the inter-organizational coordination requirements that result from the identified set of sharing and flow dependencies.

    Managing Missing Data in Data Integration

    The amount of data in the world is growing at an enormous pace, especially with the expansion of the internet. Data is stored in different formats in various source systems. The goal of data integration is to provide users with unified access to heterogeneous and independent data without requiring them to understand the logic of the source systems. Users submit queries against the mediated schema, which translates them for the source systems. Integrated data is rarely complete: it may contain incorrect or entirely missing values. Missing data can be managed and enriched using various methods. The literature review of this thesis explores data integration and its challenges, as well as missing data mechanisms and strategies for dealing with missing data. The experimental section analyses these strategies in the context of online automotive dealerships. Cars are increasingly purchased directly over the internet, or at least with the internet as a strong support in the purchasing process. Incomplete car data can lead to issues such as the car not appearing in potential buyers' search results, or even to the car not being sold. The results of this work show that finding a similar car in a dataset is crucial for managing missing car data, and that doing so is not always straightforward. String matching is an essential part of finding a similar car, but it does not always give a perfectly accurate result. For this reason, the work presents a model for managing missing car data in which string matching is used only when necessary. According to the model, string matching can also be strengthened by comparing other values belonging to the same attribute group. External sources, such as pre-existing commercial databases or a company's self-built database, should also be used, if needed, to find the similar car.
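
    The idea of using string matching only when necessary, and strengthening it with other attributes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's model: the field names (`model`, `fuel`, `body`, `doors`), the 0.7/0.3 weighting, and the 0.6 threshold are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever string matching method the work uses.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar_car(target, catalog, group_fields=("fuel", "body"), threshold=0.6):
    """Find the catalog entry most similar to the target car, or None."""
    # Step 1: an exact model-name match needs no string matching at all.
    for car in catalog:
        if car["model"] == target["model"]:
            return car
    # Step 2: fall back to fuzzy matching, strengthened by agreement on the
    # attributes of the same group that are known for the target.
    known = [f for f in group_fields if target.get(f) is not None]
    best, best_score = None, 0.0
    for car in catalog:
        score = name_similarity(car["model"], target["model"])
        if known:
            agree = sum(car.get(f) == target[f] for f in known) / len(known)
            score = 0.7 * score + 0.3 * agree  # assumed weighting
        if score > best_score:
            best, best_score = car, score
    return best if best_score >= threshold else None


def fill_missing(target, catalog, fields):
    """Copy missing fields from the most similar car, if one is found."""
    filled = dict(target)
    donor = find_similar_car(target, catalog)
    if donor is not None:
        for f in fields:
            if filled.get(f) is None:
                filled[f] = donor.get(f)
    return filled


# Hypothetical catalog and an incomplete listing.
catalog = [
    {"model": "Golf 1.4 TSI", "fuel": "petrol", "body": "hatchback", "doors": 5},
    {"model": "Golf 2.0 TDI", "fuel": "diesel", "body": "hatchback", "doors": 5},
    {"model": "Passat 2.0 TDI Variant", "fuel": "diesel", "body": "estate", "doors": 5},
]
listing = {"model": "golf 2.0 tdi", "fuel": "diesel", "body": None, "doors": None}
completed = fill_missing(listing, catalog, ["body", "doors"])
```

    When no candidate reaches the threshold, the sketch leaves the record unchanged; in the model described above, that is the point at which an external source would be consulted.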

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. In the humanities and cultural heritage communities, too, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Integrated information will therefore be a key factor in pursuing successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but also by performing entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise during the attempt to align cultural heritage information systems.
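
    The normalize, cluster, then match sequence described above can be illustrated with a small sketch. It does not reproduce ALAP: simple key-based blocking (grouping records by their first normalized token) stands in for the clustering algorithm, `difflib.SequenceMatcher` stands in for the matching step, and the record fields (`id`, `label`) and the 0.8 threshold are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def block_key(record: dict) -> str:
    """Cheap grouping key (assumed): the first token of the normalized label."""
    parts = normalize(record["label"]).split()
    return parts[0] if parts else ""


def match(source_a, source_b, threshold=0.8):
    """Compare records only within shared blocks; return matches and the comparison count."""
    blocks = defaultdict(lambda: ([], []))
    for record in source_a:
        blocks[block_key(record)][0].append(record)
    for record in source_b:
        blocks[block_key(record)][1].append(record)

    matches, comparisons = [], 0
    for left, right in blocks.values():
        for a, b in product(left, right):
            comparisons += 1
            score = SequenceMatcher(
                None, normalize(a["label"]), normalize(b["label"])
            ).ratio()
            if score >= threshold:
                matches.append((a["id"], b["id"]))
    return matches, comparisons


# Hypothetical records from the two systems.
perseus = [
    {"id": "p1", "label": "Athena Parthenos statue"},
    {"id": "p2", "label": "Zeus  of Olympia"},
]
arachne = [
    {"id": "a1", "label": "athena parthenos statue"},
    {"id": "a2", "label": "Hermes of Praxiteles"},
]
matches, comparisons = match(perseus, arachne)
```

    Here only one pair is compared instead of all four, at the cost of missing any true match whose records fall into different blocks; that trade-off is inherent to reducing comparisons by grouping first.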

    Trust, commitment and the bullwhip effect in automotive supply chain management

    Dissertation presented to the Master's Program in Administration of the Universidade Municipal de São Caetano do Sul (USCS). The automotive industry supply chain (SC) is one of the most globalized and competitive in the world. For this reason, it has always been at the forefront in implementing innovations, in technology, in processes and, above all, in business management. Seeking to sharpen the focus on the core business and reduce costs, automakers have continually outsourced processes, which has definitively changed the architecture of the traditional SC from the typically vertical form of earlier decades to a modular format. The modular architecture, in turn, requires closer relationships among the actors in the SC: partnership, information sharing, trust, and commitment, and naturally the tools to make all of this feasible and efficient. However, despite the requirements of this new SC architecture, it is not known whether auto parts managers perceive the bullwhip effect (BE) in the automotive SC. The methodologies applied in the research were bibliographic research, bibliometric research, and field research (survey). As evidenced by the field and bibliographic research, trust in the automotive SC and the adoption of measures against the BE vary by automaker. In Brazil, the relationship between auto parts suppliers and automakers shows limitations with regard to trust, commitment, and information sharing, even in more intense relationships that involve a large number of traded items and a high frequency of material deliveries. In the ranking of the five automakers evaluated (Fiat, GM, Honda, Toyota, and VW), GM placed best, with 2103 points out of a possible 3240, a performance 35% short of the ideal. To evaluate the performance of suppliers and customers, their flows, logistics processes, technologies, perceptions of quality, and market needs in an SC is to evaluate the SC itself as a whole.
The Brazilian automotive segment needs to standardize the evaluation of logistics processes to make them fairer and more consistent, for suppliers and automakers alike, in order to improve the SC as a whole and thus establish a serious and genuine collaborative partnership process.

    Semantic matching across heterogeneous data sources

    As our ability to build information systems continues to grow, so does the need to integrate the systems we build. There is currently an urgent need for cooperation among massively distributed information systems for homeland security purposes. Information about a suspected individual needs to be retrieved from many systems maintained nationwide by various organizations, including intelligence agencies, police departments, motor vehicle departments, and airlines. There is also a need for a unified Master Patient Index (MPI) that integrates numerous healthcare information systems and allows authorized care providers to easily access the medical records of all patients [1]. The rapid growth of the Internet, especially the recent development of Web services, continuously amplifies the need for semantic interoperability across heterogeneous data sources. Related data sources accessible via different Web services create new requirements and opportunities for data integration. Such need for integration of information systems is becoming ubiquitous, both within and across organizations, in many domains. The information systems that need to be integrated are typically heterogeneous in several aspects; these include operating system, data model, database management system (DBMS)