1,326 research outputs found
Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web
The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author's and shouldn't be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties. In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very
instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that
they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original) Over forty years later, the seamless interplay that McLuhan demanded between our
technologies is still barely visible. McLuhan's predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet
the integration of electronic systems as open systems remains in its infancy. Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the
services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain. The WWW provided the original context that made the AKT approach to knowledge
management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The
combination of this expertise, and the time and space afforded the consortium by the
IRC structure, suggested the opportunity for a concerted effort to develop an approach
to advanced knowledge technologies, based on the WWW as a basic infrastructure. The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT at the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central to the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints on
the knowledge management services AKT tries to provide. As a medium for the
semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the
provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge. AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management. Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different
applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing
ontologies to create a third). Ontology mapping, and the elimination of conflicts of
reference, will be important tasks. All of these issues are discussed along with our
proposed technologies. Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services
that are envisaged will have to deal with this heterogeneity. The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which
semantic hygiene prevails interesting enough to reason in? These and many other
questions need to be addressed if we are to provide effective knowledge technologies
for our content on the web.
Reusability ontology in business processes with similarity matching
Working technologies provide information and knowledge, and they can be developed in various ways, including by reusing existing technologies. In this study, an ontology of SOPs was modelled using Protégé. Ontology A and ontology B are then matched to measure their similarity, so that ontology reuse can produce a more optimal ontology. Matching is the process of comparing the two ontologies to identify the elements they share. The Jaro-Winkler distance is used to measure the similarity between the ontologies; it yields a value between 0 and 1, with matched elements scoring close to 0 or 1. In the ontology matching, two tests were conducted using a 40% SPARQL query, and the Jaro-Winkler distance obtained was 0.67. This research yields a matching value indicating which parts of ontology A and ontology B are the same, so that the ontology can be reused to build a better ontology.
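The Jaro-Winkler measure used above can be sketched in plain Python. This is a minimal reference implementation of the generic string metric, not the study's actual Protégé/SPARQL matching pipeline:

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = no overlap)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # characters count as matching if equal and within this window of each other
    window = max(len1, len2) // 2 - 1
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions = half the matched characters that appear out of order
    transpositions, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (max 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

Applied pairwise to concept labels from two ontologies, scores near 1 flag candidate matches for reuse (the classic textbook pair "MARTHA"/"MARHTA" scores about 0.961).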
Ontology-based Semantic Harmonization of HIV-associated Common Data Elements for Integration of Diverse HIV Research Datasets
Analysis of integrated, diverse, Human Immunodeficiency Virus (HIV)-associated datasets can increase knowledge and guide the development of novel and effective interventions for disease prevention and treatment by increasing breadth of variables and statistical power, particularly for sub-group analyses. This topic has been identified as a National Institutes of Health research priority, but few efforts have been made to integrate data across HIV studies. Our aims were to: 1) Characterize the semantic heterogeneity (SH) in the HIV research domain; 2) Identify HIV-associated common data elements (CDEs) in empirically generated and knowledge-based resources; 3) Create a formal representation of HIV-associated CDEs in the form of an HIV-associated Entities in Research Ontology (HERO); 4) Assess the feasibility of using HERO to semantically harmonize HIV research data. Our approach was guided by information/knowledge theory and the DIKW (Data Information Knowledge Wisdom) hierarchical model.
Our systematized review of the literature revealed that the purposes of the synergistic use of ontologies and CDEs included integration, interoperability, data exchange, and data standardization. Moreover, methods and tools included the use of experts for CDE identification, the Unified Medical Language System, natural language processing, Extensible Markup Language, Health Level 7, and ontology development tools (e.g., Protégé). Additionally, evaluation methods included expert assessment, quantification of mapping tasks between raters, assessment of interrater reliability, and comparison to established standards. We used these findings to inform our process for achieving the study aims.
For Aim 1, we analyzed eight disparate HIV-associated data dictionaries and developed a String Metric-assisted Assessment of Semantic Heterogeneity (SMASH) method, which aided identification of 127 (13%) homogeneous data element (DE) pairs and 1,048 (87%) semantically heterogeneous DE pairs. Most heterogeneous pairs (97%) were semantically-equivalent/syntactically-different, allowing us to determine that SH in the HIV research domain was high.
To achieve Aim 2, we used Clinicaltrials.gov, Google Search, and text mining in R to identify HIV-associated CDEs in HIV journal articles, HIV-associated datasets, the AIDSinfo HIV/AIDS Glossary, the AIDSinfo Drug Database, Logical Observation Identifiers Names and Codes (LOINC), the Systematized Nomenclature of Medicine (SNOMED), and RxNorm (normalized names for clinical drugs). Two HIV experts then manually reviewed DEs from the journal articles and data dictionaries to confirm DE commonality and resolved semantic discrepancies through discussion. Ultimately, we identified 2,179 unique CDEs. Of all CDEs, data-driven approaches identified 2,055 (94%) (999 from the HIV/AIDS Glossary, 398 from the Drug Database, 91 from journal articles, and a total of 567 from LOINC, SNOMED, and RxNorm cumulatively). Expert-based approaches identified 124 (6%) unique CDEs from data dictionaries and confirmed the 91 CDEs from journal articles.
In Aim 3, we used the Protégé suite of ontology development tools and the 2,179 CDEs to develop the HERO. We modeled the ontology using the semantic structure of the Medical Entities Dictionary, available hierarchical information from the CDE knowledge resources, and expert knowledge. The ontology fulfilled most relevant criteria from Cimino's desiderata and the OntoClean ontology engineering principles, and it successfully answered eight competency questions.
Finally, for Aim 4, we assessed the feasibility of using HERO to semantically harmonize and integrate the data dictionaries from two diverse HIV-associated datasets. Two HIV experts involved in the development of HERO independently assessed each data dictionary. Of the 367 DEs in data dictionary 1 (D1), 181 (49.32%) were identified as CDEs and 186 (50.68%) were not CDEs, and of the 72 DEs in data dictionary 2 (D2), 37 (51.39%) were CDEs and 35 (48.61%) were not CDEs. The HIV experts then traversed HERO's hierarchy to map CDEs from D1 and D2 to CDEs in HERO. Of the 181 CDEs in D1, 156 (86.19%) were found in HERO, and 25 (13.81%) were not. Similarly, of the 37 CDEs in D2, 32 (86.48%) were found in HERO, and 5 (13.51%) were not. Interrater reliability for CDE identification as measured by Cohen's Kappa was 0.900 for D1 and 0.892 for D2. Cohen's Kappas for CDEs in D1 and D2 that were also identified in HERO were 0.885 and 0.688, respectively.
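Cohen's Kappa, the interrater-reliability statistic reported above, can be computed directly from two raters' label lists. A minimal sketch with standard library tools only; the example labels below are invented, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # observed proportion of items the raters labelled identically
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # expected agreement if the raters had labelled independently
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] / n * c2[label] / n
                   for label in set(rater1) | set(rater2))
    return (observed - expected) / (1 - expected)

# Hypothetical example: two raters classifying four DEs as CDE / not CDE
r1 = ["CDE", "CDE", "not CDE", "not CDE"]
r2 = ["CDE", "CDE", "not CDE", "CDE"]
kappa = cohens_kappa(r1, r2)  # 0.75 observed agreement, 0.5 expected -> 0.5
```

Values near 0.9, as reported for D1 and D2, indicate almost perfect agreement on the conventional Landis-Koch scale.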
Subsequently, to demonstrate the integration of the two HIV-associated datasets, a sample of semantically harmonized CDEs in both datasets was categorically selected (e.g. administrative, demographic, and behavioral), and D2 sample size increases were calculated for race (e.g., White, African American/Black, Asian/Pacific Islander, Native American/Indian, and Hispanic/Latino) and for "intravenous drug use" from the integrated datasets. The average increase of D2 CDEs for six selected CDEs was 1,928%.
Despite the limitation of HERO developers also serving as evaluators, the contributions of the study to the fields of informatics and HIV research were substantial. Confirmatory contributions include: identification of effective CDE/ontology tools, and use of data-driven and expert-based methods. Novel contributions include: development of SMASH and HERO; and new contributions include documenting that SH is high in HIV-associated datasets, identifying 2,179 HIV-associated CDEs, creating two additional classifications of SH, and showing that using HERO for semantic harmonization of HIV-associated data dictionaries is feasible. Our future work will build upon this research by expanding the numbers and types of datasets, refining our methods and tools, and conducting an external evaluation.
A methodology for designing layered ontology structures
Semantic ontologies represent the knowledge from different domains, which is used as a knowledge base by intelligent agents. The creation of ontologies by different developers leads to heterogeneous ontologies, which hampers the interoperability between knowledge-based applications. This interoperability is achieved
through global ontologies, which provide a common domain representation. Global ontologies must provide a balance of reusability-usability to minimise the ontology effort in different applications. To achieve this balance, ontology design methodologies focus on designing layered ontologies that classify into abstraction layers the domain knowledge relevant to many applications and the knowledge relevant to specific applications. During the design of the layered ontology structure, the domain knowledge classification is
performed from scratch by domain experts and ontology engineers in collaboration with application stakeholders. Hence, the design of reusable and usable ontologies in complex domains takes a significant effort. Software Product Line (SPL) design techniques can be applied to facilitate the domain knowledge classification by analysing the knowledge similarities/differences of existing ontologies. In this context, this thesis aims to define new methodological guidelines for designing layered ontology structures that enable the domain knowledge to be classified using existing ontologies as a reference, and to apply these guidelines to enable the development of reusable and usable ontologies in complex domains. The MODDALS methodology guides the design of layered ontology structures for reusable and usable ontologies. It brings together SPL engineering techniques and ontology design techniques to enable the classification of the domain knowledge
by exploiting the knowledge similarities/differences of existing ontologies. MODDALS eases the design of the layered ontology structure. The MODDALS methodology was evaluated by applying it to design the layered structure of a reusable and usable global ontology for the energy domain. The designed layered structure was taken as reference to develop the ontology. The resulting ontology simplifies the ontology reuse process in different applications. In particular, it reduced the average ontology reuse time by 0.5 and 1.2 person-hours in in two different applications in comparison with a global energy ontology which does not follow a layered structure.Ontologia semantikoak datu domeinu ezberdinen ezagutza irudikatzen dute, agente adimendunek jakintza oinarri bezala erabiltzen dutena. Ontologiak ingeniari desberdinek garatzen dituzte eta heterogeneoak dira, aplikazioen arteko komunikazioa oztopatuz. Komunikazio hau ontologia globalen bidez lortzen da, domeinuaren
errepresentazio komun bat ematen baitute. Ontologia globalek berrerabilgarritasunerabilgarritasun oreka eman behar dute aplikazio desberdinetan berrerabiltzeko ahalegina murrizteko. Horretarako, ontologia diseinu metodologiek aplikazio askok erabiltzen duten eta aplikazio zehatzetarako garrantzitsua den ezagutza abstrakzio geruzetan sailkatzea proposatzen dute. Geruza egituraren diseinuan zehar, domeinuko adituek eta ontologiako ingeniariek hutsetik sailkatzen dute jakintza, domeinu konplexuetan ontologia berrerabilgarriak eta erabilgarrien diseinu ahalegina areagotuz. Software produktu lerroak diseinatzeko erabiltzen diren teknikak jakintza sailkatzea erraztu ahal dute, ontologien ezagutza antzekotasunak edo desberdintasunak aztertuz. Testuinguru honetan, honakoa da tesiaren helburua: ezagutza garatutako ontologien arabera sailkatzen duen ontologia berrerabilgarri eta erabilgarrien geruza egitura diseinatzeko metodologia bat garatzea; baita metodologia aplikatu ere, ontologia berrerabilgarri eta erabilgarriak domeinu konplexuetan garatu ahal izateko. MODDALS metodologiak ontologia berrerabilgarri eta erabilgarrien abstrakzio geruzak nola diseinatu azaltzen du. MODDALS-ek software produktu lerro eta ontologia diseinu teknikak aplikatzen ditu ezagutza garatuta dauden ontologien antzekotasunen/desberdintasunen arabera sailkatzeko. Planteamendu honek geruza egitura diseinua errazten du. MODDALS ebaluatu da energia domeinurako ontologia berrerabilgarri eta erabilgarri baten egitura diseinatzeko aplikatuz. Diseinatutako geruza egitura erreferentzia gisa hartu da ontologia gartzeko. Egitura onekin, garatutako ontologia berrerabiltzea errazten du aplikazio desberdinetan. 
Konkretuki, garatutako ontologiak berrerabilpen denbora 0.5 eta 1.2 pertsona-orduetan murriztu du bi aplikazioetan; geruza egitura jarraitzen ez duen ontologia batekin alderatuz.Las ontologĂas semĂĄnticas representan el conocimiento de diferentes dominios, utilizado como base de conocimiento por agentes inteligentes. Las ontologĂas son desarrolladas por diferentes ingenieros y son heterogĂ©neas, afectando a la interoperabilidad entre aplicaciones. Esta interoperabilidad se logra mediante ontologĂas globales que proporcionan una representaciĂłn comĂșn del dominio, las cuales deben proporcionar un balance de reusabilidad-usabilidad para minimizar el esfuerzo de reutilizaciĂłn en diferentes aplicaciones. Para lograr este balance, las metodologĂas de diseño de ontologĂas proponen clasificar en capas de abstracciĂłn el conocimiento del dominio comĂșn a muchas aplicaciones y el que es relevante para aplicaciones especĂficas. Durante el diseño de la estructura de capas, el conocimiento se clasifica partiendo de cero por expertos del dominio e ingenieros de ontologĂas. Por lo tanto, el diseño de ontologĂas reusables y usables en dominios complejos requiere un gran esfuerzo. Las tĂ©cnicas de diseño de lĂneas de producto de software pueden facilitar la clasificaciĂłn del conocimiento analizando las similitudes/diferencias de conocimiento de ontologĂas existentes. En este contexto, el objetivo de la tesis es crear una metodologĂa de diseño de la estructura de capas para ontologĂas que permita clasificar el conocimiento tomando como referencia ontologĂas existentes, y aplicar esta metodologĂa para poder desarrollar ontologĂas reusables y usables en dominios complejos. La metodologĂa MODDALS explica cĂłmo diseñar estructuras de capas para ontologĂas reusables y usables. MODDALS adopta tĂ©cnicas de diseño de lĂneas de producto en combinaciĂłn con tĂ©cnicas de diseño de ontologĂas para clasificar el conocimiento basĂĄndose en las similitudes/diferencias de ontologĂas existentes. 
Este enfoque facilita el diseño de la estructura de capas de la ontologĂa. La metodologĂa MODDALS se ha evaluado aplicĂĄndola para diseñar la estructura de capas de una ontologĂa global reusable y usable para el dominio de la energĂa. La estructura de capas diseñada se ha tomado como referencia para desarrollar la ontologĂa. Con esta estructura, la ontologĂa resultante simplifica la reutilizaciĂłn de ontologĂas en diferentes aplicaciones. En concreto, la ontologĂa redujo el tiempo de reutilizaciĂłn en 0.5 y 1.2 personas-hora en dos aplicaciones respecto a una ontologĂa global que no sigue una estructura por capas
Tools for enterprises collaboration in virtual enterprises
A Virtual Enterprise (VE) is an organizational collaboration concept that provides a competitive edge in the globalized business environment. The life cycle of a VE consists of four stages: opportunity identification (Pre-Creation), partner selection (Creation), operation, and dissolution. The success of VEs depends upon the efficient execution of their lifecycles, along with knowledge enhancement for the partner enterprises to facilitate the future formation of efficient VEs. This research aims to study the different issues which occur in the VE lifecycle and provides a platform for the formation of high-performance enterprises and VEs.
In the pre-creation stage, enterprises look for suitable partners to create their VE and to exploit a market opportunity. This phase requires explicit and implicit information extraction from enterprise databases (ECOS-ontology) for the identification of suitable partners. A description logic (DL) based query system is developed to extract explicit and implicit information and to identify potential partners for the creation of the VE.
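As an illustration of how a DL-style subsumption query can surface partner candidates, here is a toy Python sketch. The capability names and enterprise entries are invented and merely stand in for the ECOS ontology and its DL reasoner:

```python
# Toy capability taxonomy: child concept -> parent concept
# (hypothetical names, standing in for the enterprise ontology)
TAXONOMY = {
    "CNCMilling": "Machining",
    "Machining": "Manufacturing",
    "InjectionMolding": "Manufacturing",
    "RoadFreight": "Logistics",
}

def subsumed_by(capability, ancestor):
    """True if `capability` equals `ancestor` or is a descendant of it."""
    while capability is not None:
        if capability == ancestor:
            return True
        capability = TAXONOMY.get(capability)
    return False

def candidate_partners(enterprises, required):
    """Enterprises whose declared capabilities fall under `required`.

    This surfaces *implicit* matches: an enterprise declaring only
    "CNCMilling" is still found for a "Manufacturing" requirement.
    """
    return [name for name, caps in enterprises.items()
            if any(subsumed_by(c, required) for c in caps)]

enterprises = {"E1": ["CNCMilling"], "E2": ["RoadFreight"]}
```

A real DL query system would answer this via classification over the ontology; the transitive walk above is the same inference in miniature.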
In the creation phase, the identified partners are analysed using different risk paradigms, and a cooperative game-theoretic approach is used to develop a revenue-sharing mechanism based on the enterprises' inputs and on risk minimization for optimal partner selection.
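One standard cooperative-game device for revenue sharing is the Shapley value, which splits coalition revenue according to each partner's average marginal contribution. The sketch below is a generic illustration with an invented characteristic function, not the specific mechanism developed in this research:

```python
from itertools import permutations

def shapley_values(players, v):
    """Shapley value of each player: its marginal contribution to the
    characteristic function v, averaged over all join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in phi.items()}

# Hypothetical example: coalition revenue grows superadditively with size,
# so enterprises earn more together than apart.
revenue = lambda S: {0: 0, 1: 2, 2: 6, 3: 12}[len(S)]
shares = shapley_values(["E1", "E2", "E3"], revenue)
```

By the efficiency property the shares always sum to the grand coalition's revenue; here the symmetric players each receive 4 of the 12.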
In the operation phase, interoperability remains a key issue for the seamless transfer of knowledge, information and data. DL-based ontology mapping is applied in this research to provide interoperability in the VE between enterprises with different domains of expertise.
In the dissolution stage, the knowledge acquired over the VE lifecycle needs to be disseminated among the enterprises to enhance their competitiveness. A DL-based ontology merging approach is provided to accommodate new knowledge into existing databases with logical consistency.
Finally, the proposed methodologies are validated using a case study. The results obtained illustrate the applicability and effectiveness of the proposed methodologies in each stage of the VE life cycle.
A Semantic-driven Approach for Maintenance Digitalization in the Pharmaceutical Industry
The digital transformation of the pharmaceutical industry is a challenging task
due to the high complexity of the elements involved and the strict regulatory
compliance requirements. Maintenance activities in the pharmaceutical industry play an
essential role in ensuring product quality and the integral functioning of
equipment and premises. This paper first identifies the key challenges of
digitalization in the pharmaceutical industry and creates the corresponding problem
space for the key elements involved. A literature review is conducted to
investigate the mainstream maintenance strategies, digitalization models, tools
and official guidance from authorities in the pharmaceutical industry. Based on the
review result, a semantic-driven digitalization framework is proposed aiming to
improve the digital continuity and cohesion of digital resources and
technologies for maintenance activities in the pharmaceutical industry. A case
study is conducted to verify the feasibility of the proposed framework based on
the water sampling activities at the Merck Serono facility in Switzerland. A
tool-chain is presented to enable the functional modules of the framework. Some
of the key functional modules within the framework are implemented and have
demonstrated satisfactory performance. As one of the outcomes, a digital
sampling assistant with web-based services is created to support the automated
workflow of water sampling activities. The implementation results demonstrate the
potential of the proposed framework to address the identified problems of
maintenance digitalization in the pharmaceutical industry.
Improving National and Homeland Security through a proposed Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
A recent National Research Council study found that: "Although there are many private and public databases that
contain information potentially relevant to counter terrorism programs, they lack the necessary context definitions
(i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and
timely information" [NRC02, p.304, emphasis added]. That sentence succinctly describes the objectives of this
project. Improved access and use of information are essential to better identify and anticipate threats, protect
against and respond to threats, and enhance national and homeland security (NHS), as well as other national
priority areas, such as Economic Prosperity and a Vibrant Civil Society (ECS) and Advances in Science and
Engineering (ASE). This project focuses on the creation and contributions of a Laboratory for Information
Globalization and Harmonization Technologies (LIGHT) with two interrelated goals:
(1) Theory and Technologies: To research, design, develop, test, and implement theory and technologies for
improving the reliability, quality, and responsiveness of automated mechanisms for reasoning and resolving semantic
differences that hinder the rapid and effective integration (int) of systems and data (dmc) across multiple
autonomous sources, and the use of that information by public and private agencies involved in national and
homeland security and the other national priority areas involving complex and interdependent social systems (soc).
This work builds on our research on the COntext INterchange (COIN) project, which focused on the integration
of diverse distributed heterogeneous information sources using ontologies, databases, context mediation algorithms,
and wrapper technologies to overcome information representational conflicts. The COIN approach makes it
substantially easier and more transparent for individual receivers (e.g., applications, users) to access and exploit
distributed sources. Receivers specify their desired context to reduce ambiguities in the interpretation of information
coming from heterogeneous sources. This approach significantly reduces the overhead involved in the integration of
multiple sources, improves data quality, increases the speed of integration, and simplifies maintenance in an
environment of changing source and receiver context - which will lead to an effective and novel distributed
information grid infrastructure. This research also builds on our Global System for Sustainable Development
(GSSD), an Internet platform for information generation, provision, and integration of multiple domains, regions,
languages, and epistemologies relevant to international relations and national security.
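The kind of context mediation COIN performs can be illustrated with a toy sketch: a mediator converts a figure from the source's declared context (currency, scale factor) into the receiver's, so neither side needs to know the other's conventions. All names, contexts, and rates below are invented for illustration and are not COIN's actual interfaces:

```python
# Hypothetical exchange-rate table used by the mediator
RATES = {("EUR", "USD"): 1.1}

def mediate(value, source_ctx, receiver_ctx):
    """Convert a reported figure from the source context to the receiver's."""
    # resolve scale-factor conflicts (e.g. figures in thousands vs. units)
    value = value * source_ctx["scale"] / receiver_ctx["scale"]
    # resolve currency conflicts via the rate table
    if source_ctx["currency"] != receiver_ctx["currency"]:
        value *= RATES[(source_ctx["currency"], receiver_ctx["currency"])]
    return value

# A source reporting "5" in thousands of EUR, read by a receiver
# expecting plain USD, is mediated to 5500.0
result = mediate(5, {"currency": "EUR", "scale": 1000},
                 {"currency": "USD", "scale": 1})
```

The point of the approach is that these conversions are derived from declared contexts rather than hand-coded into every source-receiver pair, which is what keeps maintenance cheap as contexts change.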
(2) National Priority Studies: To experiment with and test the developed theory and technologies on practical
problems of data integration in national priority areas. Particular focus will be on national and homeland security,
including data sources about conflict and war, modes of instability and threat, international and regional
demographic, economic, and military statistics, money flows, and contextualizing terrorism defense and response.
Although LIGHT will leverage the results of our successful prior research projects, this will be the first research
effort to simultaneously and effectively address ontological and temporal information conflicts as well as
dramatically enhance information quality. Addressing problems of national priorities in such rapidly changing
complex environments requires extraction of observations from disparate sources, using different interpretations, at
different points in time, for different purposes, with different biases, and for a wide range of different uses and
users. This research will focus on integrating information both over individual domains and across multiple domains.
Another innovation is the concept and implementation of Collaborative Domain Spaces (CDS), within which
applications in a common domain can share, analyze, modify, and develop information. Applications also can span
multiple domains via Linked CDSs. The PIs have considerable experience with these research areas and the
organization and management of such large scale international and diverse research projects.
The PIs come from three different Schools at MIT: Management, Engineering, and Humanities, Arts & Social
Sciences. The faculty and graduate students come from about a dozen nationalities and diverse ethnic, racial, and
religious backgrounds. The currently identified external collaborators come from over 20 different organizations
and many different countries, industrial as well as developing. Specific efforts are proposed to engage even more
women, underrepresented minorities, and persons with disabilities.
The anticipated results apply to any complex domain that relies on heterogeneous distributed data to address and
resolve compelling problems. This initiative is supported by international collaborators from (a) scientific and
research institutions, (b) business and industry, and (c) national and international agencies. Research products
include: a System for Harmonized Information Processing (SHIP), a software platform, and diverse applications in
research and education which are anticipated to significantly impact the way complex organizations, and society in
general, understand and manage critical challenges in NHS, ECS, and ASE.
- …