Extracting and re-using research data from chemistry e-theses: the SPECTRa-T project
Scientific e-theses are data-rich resources, but much of the information they contain is not readily accessible. For chemistry, the SPECTRa-T project has addressed this problem by developing data-mining techniques to extract experimental data, creating RDF (Resource Description Framework) triples for exposure to sophisticated Semantic Web searches.
We used OSCAR3, an open-source chemistry text-mining tool, to parse and extract data from theses in PDF and in Office Open XML (.docx) format.
Theses in PDF suffered data corruption and a loss of formatting that prevented the identification of chemical objects. Theses in .docx yielded semantically rich SciXML that enabled the additional extraction of associated data. Chemical objects were placed in a data repository, and RDF triples deposited in a triplestore.
Data-mining from chemistry e-theses is both desirable and feasible, but the use of PDF, the de facto format standard for deposit in most repositories, prevents the optimal extraction of data for semantic querying. To facilitate such extraction, we recommend that universities also require deposit of chemistry e-theses in an XML document format. Further work is required to clarify the complex IPR issues and ensure that they do not become an unwarranted barrier to data extraction and re-use.
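The extraction pipeline described above ends with chemical objects expressed as RDF triples in a triplestore. A minimal stdlib sketch of that final step, with invented namespaces and property names (the project's actual vocabulary and the OSCAR3 output format are not shown here):

```python
# Minimal illustration of turning one extracted "chemical object" into RDF-style
# (subject, predicate, object) triples. All URIs and property names below are
# invented for the sketch; the real project used its own vocabulary.

CHEM = "http://example.org/chem#"          # hypothetical chemistry vocabulary
THESIS = "http://example.org/thesis42/"    # hypothetical thesis namespace

def make_triples(compound_id, name, melting_point_c):
    """Emit (s, p, o) triples for one compound extracted from a thesis."""
    s = THESIS + compound_id
    return [
        (s, CHEM + "type", CHEM + "Compound"),
        (s, CHEM + "name", name),
        (s, CHEM + "meltingPointCelsius", melting_point_c),
    ]

def to_ntriples(triples):
    """Very rough N-Triples-style serialization, for inspection only."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if str(o).startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

triples = make_triples("compound1", "4-nitroaniline", 147.5)
print(to_ntriples(triples))
```

In practice a library such as rdflib and a real triplestore would replace the hand-rolled serialization; the point is only that each extracted datum becomes a queryable triple.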
XML and Semantics
Since its introduction, the eXtensible Markup Language (XML) has, owing to its expressive capabilities and flexibility, become the de facto standard for representing, storing, and interchanging data on the Web. These features have made XML one of the building blocks of the Semantic Web. Moreover, since XML documents can be considered from content, structural, and semantic aspects, leveraging their semantics is useful in many domains. However, XML does not by itself provide any built-in mechanism for governing semantics. For this reason, many studies have examined the representation of semantics within, and its extraction from, XML documents. This paper surveys different aspects of this topic, with an emphasis on the state of semantics in XML and methods for its representation.
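The paper's central point, that XML fixes structure but supplies no semantics, shows up in even a toy example: the meaning of each element must come from an external mapping. A minimal stdlib sketch, with an invented document and an assumed Dublin Core-style mapping:

```python
import xml.etree.ElementTree as ET

doc = """<paper id="p1">
  <title>XML and Semantics</title>
  <author>A. Researcher</author>
</paper>"""

# XML alone fixes the tree shape; the *meaning* of <title> or <author> is
# imposed by this hand-written mapping, not by XML itself.
MAPPING = {"title": "dc:title", "author": "dc:creator"}  # assumed Dublin Core-ish

root = ET.fromstring(doc)
subject = root.get("id")
triples = [(subject, MAPPING[child.tag], child.text)
           for child in root if child.tag in MAPPING]
print(triples)
# [('p1', 'dc:title', 'XML and Semantics'), ('p1', 'dc:creator', 'A. Researcher')]
```

Swapping the mapping changes the extracted semantics without touching the document, which is exactly the gap the surveyed approaches try to close.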
Improving National and Homeland Security through a proposed Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
A recent National Research Council study found that: "Although there are many private and public databases that
contain information potentially relevant to counter terrorism programs, they lack the necessary context definitions
(i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and
timely information" [NRC02, p.304, emphasis added]. That sentence succinctly describes the objectives of this
project. Improved access and use of information are essential to better identify and anticipate threats, protect
against and respond to threats, and enhance national and homeland security (NHS), as well as other national
priority areas, such as Economic Prosperity and a Vibrant Civil Society (ECS) and Advances in Science and
Engineering (ASE). This project focuses on the creation and contributions of a Laboratory for Information
Globalization and Harmonization Technologies (LIGHT) with two interrelated goals:
(1) Theory and Technologies: To research, design, develop, test, and implement theory and technologies for
improving the reliability, quality, and responsiveness of automated mechanisms for reasoning and resolving semantic
differences that hinder the rapid and effective integration (int) of systems and data (dmc) across multiple
autonomous sources, and the use of that information by public and private agencies involved in national and
homeland security and the other national priority areas involving complex and interdependent social systems (soc).
This work builds on our research on the COntext INterchange (COIN) project, which focused on the integration
of diverse distributed heterogeneous information sources using ontologies, databases, context mediation algorithms,
and wrapper technologies to overcome information representational conflicts. The COIN approach makes it
substantially easier and more transparent for individual receivers (e.g., applications, users) to access and exploit
distributed sources. Receivers specify their desired context to reduce ambiguities in the interpretation of information
coming from heterogeneous sources. This approach significantly reduces the overhead involved in the integration of
multiple sources, improves data quality, increases the speed of integration, and simplifies maintenance in an
environment of changing source and receiver context, which will lead to an effective and novel distributed
information grid infrastructure. This research also builds on our Global System for Sustainable Development
(GSSD), an Internet platform for information generation, provision, and integration of multiple domains, regions,
languages, and epistemologies relevant to international relations and national security.
(2) National Priority Studies: To experiment with and test the developed theory and technologies on practical
problems of data integration in national priority areas. Particular focus will be on national and homeland security,
including data sources about conflict and war, modes of instability and threat, international and regional
demographic, economic, and military statistics, money flows, and contextualizing terrorism defense and response.
Although LIGHT will leverage the results of our successful prior research projects, this will be the first research
effort to simultaneously and effectively address ontological and temporal information conflicts as well as
dramatically enhance information quality. Addressing problems of national priorities in such rapidly changing
complex environments requires extraction of observations from disparate sources, using different interpretations, at
different points in times, for different purposes, with different biases, and for a wide range of different uses and
users. This research will focus on integrating information both over individual domains and across multiple domains.
Another innovation is the concept and implementation of Collaborative Domain Spaces (CDS), within which
applications in a common domain can share, analyze, modify, and develop information. Applications also can span
multiple domains via Linked CDSs. The PIs have considerable experience with these research areas and the
organization and management of such large scale international and diverse research projects.
The PIs come from three different Schools at MIT: Management, Engineering, and Humanities, Arts & Social
Sciences. The faculty and graduate students come from about a dozen nationalities and diverse ethnic, racial, and
religious backgrounds. The currently identified external collaborators come from over 20 different organizations
and many different countries, industrial as well as developing. Specific efforts are proposed to engage even more
women, underrepresented minorities, and persons with disabilities.
The anticipated results apply to any complex domain that relies on heterogeneous distributed data to address and
resolve compelling problems. This initiative is supported by international collaborators from (a) scientific and
research institutions, (b) business and industry, and (c) national and international agencies. Research products
include: a System for Harmonized Information Processing (SHIP), a software platform, and diverse applications in
research and education which are anticipated to significantly impact the way complex organizations, and society in
general, understand and manage critical challenges in NHS, ECS, and ASE.
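The COIN-style context mediation described in goal (1) can be illustrated with a toy sketch: sources annotate values with their context (here currency and scale), receivers declare theirs, and a mediator converts values in between. The rates and context attributes below are invented for illustration and are not COIN's actual algorithm:

```python
# Toy context mediation in the spirit of COIN: each source tags numeric values
# with its context; a receiver states its own context; the mediator converts.

RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}  # assumed static rates for the sketch

def mediate(value, source_ctx, receiver_ctx):
    """Convert a numeric value from the source's context to the receiver's."""
    usd = value * source_ctx["scale"] * RATES_TO_USD[source_ctx["currency"]]
    return usd / (receiver_ctx["scale"] * RATES_TO_USD[receiver_ctx["currency"]])

source = {"currency": "EUR", "scale": 1_000_000}   # source reports millions of EUR
receiver = {"currency": "USD", "scale": 1_000}     # receiver wants thousands of USD

print(mediate(5, source, receiver))  # 5 M EUR expressed as thousands of USD (~5500)
```

The benefit claimed for COIN is that neither source nor receiver is rewritten when contexts change; only the declared context and conversion rules are updated.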
Text-to-picture tools, systems, and approaches: a survey
Text-to-picture systems attempt to facilitate high-level, user-friendly communication between humans and computers while promoting understanding of natural language. These systems interpret a natural language text and transform it into a visual format as pictures or images that are either static or dynamic. In this paper, we aim to identify current difficulties and the main problems faced by prior systems, and in particular, we seek to investigate the feasibility of automatic visualization of Arabic story text through multimedia. Hence, we analyzed a number of well-known text-to-picture systems, tools, and approaches. We showed their constituent steps, such as knowledge extraction, mapping, and image layout, as well as their performance and limitations. We also compared these systems based on a set of criteria, mainly natural language processing, natural language understanding, and input/output modalities. Our survey showed that currently emerging techniques in natural language processing tools and computer vision have made promising advances in analyzing general text and understanding images and videos. Furthermore, important remarks and findings have been deduced from these prior works, which would help in developing an effective text-to-picture system for learning and educational purposes. This work was made possible by NPRP grant #10-0205-170346 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses
Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation. Results: Over 40,000 annotated influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, each property required up to six different structural rules to be combined. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7,640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.
Conclusion: To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled intuitive approaches, based on the combination of simple rules, can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with a potential to bring immediate benefits to this field.
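The rule-combination idea above, where each metadata property is recovered by trying several structural rules in priority order, can be sketched as follows. The record layout and the patterns are invented for illustration and are not the study's actual GenBank parsing rules:

```python
import re

# Each property needed up to six structural rules; here one property ("subtype")
# is recovered by trying rules in priority order until one matches.
SUBTYPE_RULES = [
    ("serotype field", re.compile(r"/serotype=\"(H\d+N\d+)\"")),
    ("strain name",    re.compile(r"\(A/[^()]*\((H\d+N\d+)\)\)")),
    ("free text",      re.compile(r"\b(H\d+N\d+)\b")),
]

def extract_subtype(record_text):
    """Return (value, rule_used) for the first structural rule that matches."""
    for rule_name, pattern in SUBTYPE_RULES:
        m = pattern.search(record_text)
        if m:
            return m.group(1), rule_name
    return None, None

rec = 'DEFINITION  hemagglutinin (Influenza A virus (A/duck/Alberta/35/76(H1N1)))'
print(extract_subtype(rec))  # ('H1N1', 'strain name')
```

Recording which rule fired, as the tuple above does, is what makes it possible to quantify per-field reliability differences between databases.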
Lightning Fast Business Process Simulator
Business process management is a set of continuously repeated activities, beginning with business process analysis, followed by modelling, development, implementation, and monitoring. Correct business process management is the foundation of an effective and productive enterprise and allows the enterprise's business processes to be adapted in a rapidly changing environment. Hasty, poorly considered, or untested changes to an enterprise's workflow may, in the worst case, produce even less efficient results, causing harm instead of the expected benefit. It is therefore important to analyse planned changes thoroughly before they are actually applied, which can be done through virtual business process simulation. Process simulation is a widely used methodology for testing planned models and analysing the impact that the changes have on various enterprise performance indicators.
At present, various business process simulation applications exist, both academic and commercial, such as IBM Websphere Business Modeler, Savvion Process Modeler, and others. It turns out, however, that existing applications are often very slow, cannot model or simulate more complex business processes, or cannot cope with large-scale simulations.
The first part of this Master's thesis discusses business process management in general, process simulation, and existing software. It then presents an entirely new approach to building a business process simulator that also supports the more complex constructs of BPMN, the de facto standard for representing business process models, and is many times faster than existing commercial simulation software. The third part describes the simulator and its architecture in more detail, and the final chapter compares the achieved results with the existing business process simulation applications mentioned above and gives an overview of the simulator's performance in general.
Business process management is a discipline to make an organization's workflow more efficient and more capable of adapting to changes in an ever-changing global environment. Making changes in real-life business processes could lead to undesired results if the potential impact of a change is not completely analyzed before the changes are applied. Business process simulation is a widely used technique for analyzing business process models with respect to performance metrics such as cycle time, cost and resource utilization before putting them to production.
Many state-of-the-art commercial business process modeling tools incorporate a simulation component, e.g. IBM Websphere Business Modeler, Savvion Process Modeler and others. However, these process simulators are often slow, cannot simulate complex real-life business processes and sometimes cannot even deal with large-scale simulations. For example, it is not possible to simulate process models with sub-processes, intermediate events or inclusive merge gateways (Or-joins).
The objective is to build a lightning-fast business process simulator engine that can also handle the advanced constructs needed to represent real-life processes. The simulator is designed and implemented from scratch in the Java programming language and supports the simulation of business process models defined in the BPMN 2.0 standard.
This work presents a novel approach to the field of business process simulation, based on the architecture of a scalable and high-performance business process simulation engine. The contribution of this thesis is a set of design principles and an architecture supporting simulations of models containing advanced BPMN constructs such as loops, sub-processes, intermediate events and Or-joins.
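At the core of such an engine sits a discrete-event loop over a time-ordered event queue; BPMN semantics (gateways, Or-joins, sub-processes) are layered on top. A minimal sketch of just that core, not the thesis's actual Java engine:

```python
import heapq

# Bare-bones discrete-event simulation core: a clock plus a time-ordered event
# queue. Real BPMN execution semantics would sit on top of this loop.
class Simulator:
    def __init__(self):
        self.clock = 0.0
        self._queue = []      # heap of (time, seq, callback)
        self._seq = 0         # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.clock + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.clock, _, callback = heapq.heappop(self._queue)
            callback(self)

log = []
sim = Simulator()
sim.schedule(5.0, lambda s: log.append(("task B done", s.clock)))
sim.schedule(2.0, lambda s: s.schedule(1.5, lambda s2: log.append(("task A done", s2.clock))))
sim.run()
print(log)  # events fire in time order: task A at t=3.5, task B at t=5.0
```

Because simulated time jumps directly from event to event rather than ticking in fixed steps, a loop like this can replay years of process instances in seconds, which is the performance lever a "lightning fast" simulator exploits.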
Dagstuhl News January - December 2008
"Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News gives a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented in a short abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic.