1,092 research outputs found

    A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD

    In order to make results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, there are a number of technical and practical challenges that make this process difficult to achieve in practice. Here, the implementation of a protocol is presented for tagging crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Based on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
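    The abstract does not reproduce the deposition code itself. As a rough illustration of the kind of provenance AiiDA exposes programmatically, the sketch below (a minimal example, assuming an installed aiida-core and a configured profile with stored data, not the actual AiiDA-TCOD interface) uses AiiDA's QueryBuilder to pair stored crystal structures with the calculations that consumed them, the sort of linkage from which deposition metadata is harvested.

```python
# Minimal provenance-query sketch; illustrative only, not the AiiDA-TCOD
# deposition protocol. Assumes aiida-core is installed and a profile with
# stored StructureData/CalcJobNode nodes is configured.
from aiida import load_profile, orm

load_profile()

qb = orm.QueryBuilder()
qb.append(orm.StructureData, tag="structure", project="*")
qb.append(orm.CalcJobNode, with_incoming="structure", project="*")
qb.limit(10)

for structure, calc in qb.all():
    # Each row is one edge of the provenance graph: a structure node and a
    # calculation that took it as input. Deposition metadata (formula,
    # computed properties, workflow history) is gathered from nodes like these.
    print(f"{structure.get_formula():<12} -> calculation {calc.uuid}")
```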

    Natural Language Processing (NLP) – A Solution for Knowledge Extraction from Patent Unstructured Data

    Patents are a valuable source of knowledge and are extremely important for assisting engineers and decision makers through the inventive process. This paper describes a new approach to the automatic extraction of IDM (Inventive Design Method) related knowledge from patent documents. IDM derives from TRIZ, the Theory of Inventive Problem Solving, which is largely based on the observation of patents to theorize the act of inventing. Our method mainly consists in using natural language processing (NLP) techniques to match and extract knowledge relevant to the IDM ontology. The purpose of this paper is to investigate the contribution of NLP techniques to effective knowledge extraction from patent documents. We first report on progress made so far in data mining before describing our approach.
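    As a hedged illustration of the general approach (not the authors' system), the sketch below uses spaCy to flag patent sentences that look like problem statements, one kind of IDM-relevant knowledge, via simple lexical markers, and to pull out the noun phrases involved; the marker list and sample sentence are invented for the example.

```python
# Hedged sketch of marker-based problem-sentence extraction from patent text.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

# Illustrative marker words for "problem" statements (not the IDM ontology).
PROBLEM_MARKERS = {"drawback", "disadvantage", "problem", "difficulty", "however"}

nlp = spacy.load("en_core_web_sm")

def extract_problem_candidates(patent_text: str):
    """Yield (sentence, noun_phrases) pairs for sentences that look like problem statements."""
    doc = nlp(patent_text)
    for sent in doc.sents:
        lemmas = {tok.lemma_.lower() for tok in sent}
        if lemmas & PROBLEM_MARKERS:
            concepts = [chunk.text for chunk in sent.noun_chunks]
            yield sent.text.strip(), concepts

sample = ("The prior art device suffers from a major drawback: "
          "the cooling fan consumes excessive power at high loads.")
for sentence, concepts in extract_problem_candidates(sample):
    print(sentence, "->", concepts)
```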

    Bridging the gap between folksonomies and the semantic web: an experience report

    While folksonomies allow tagging of similar resources with a variety of tags, their content retrieval mechanisms are severely hampered by being agnostic to the relations that exist between these tags. To overcome this limitation, several methods have been proposed to find groups of implicitly inter-related tags. We believe that content retrieval can be further improved by making the relations between tags explicit. In this paper we propose the semantic enrichment of folksonomy tags with explicit relations by harvesting the Semantic Web, i.e., dynamically selecting and combining relevant bits of knowledge from online ontologies. Our experimental results show that, while semantic enrichment needs to be aware of the particular characteristics of folksonomies and the Semantic Web, it is beneficial for both.
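    The harvesting machinery is not spelled out in the abstract. As a rough sketch of the underlying idea, the example below asks an online ontology endpoint (DBpedia's public SPARQL service, used here as a stand-in for "harvesting the Semantic Web") whether an explicit relation links the resources that two folksonomy tags have already been mapped to; the tag-to-URI mapping step is assumed and not shown.

```python
# Minimal sketch: look up explicit relations between two tag resources online.
# Requires: pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

def direct_relations(tag_uri_1: str, tag_uri_2: str):
    """Return predicates that directly connect two resources on DBpedia, if any."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"SELECT DISTINCT ?p WHERE {{ <{tag_uri_1}> ?p <{tag_uri_2}> }} LIMIT 10")
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [b["p"]["value"] for b in bindings]

# Example: do the tags "lion" and "mammal" share an explicit relation?
# (May return an empty list if no direct triple exists.)
print(direct_relations("http://dbpedia.org/resource/Lion",
                       "http://dbpedia.org/resource/Mammal"))
```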

    Preparing Laboratory and Real-World EEG Data for Large-Scale Analysis: A Containerized Approach.

    Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain-computer interface models. However, the absence of standardized vocabularies for annotating events in a machine-understandable manner, the welter of collection-specific data organizations, the difficulty in moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a "containerized" approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining, and (meta-)analysis. The EEG Study Schema (ESS) comprises three data "Levels," each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis of EEG studies. However, ESS can also encapsulate meta-information for other modalities, such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. The ESS schema and tools are freely available at www.eegstudy.org, and a central catalog of over 850 GB of existing data in ESS format is available at studycatalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org).
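    The XML element names below are hypothetical, not the actual ESS schema (see www.eegstudy.org for the real specification); the sketch only illustrates the pattern ESS enables, namely a study shipped as a single unit whose machine-readable descriptor can drive a processing pipeline.

```python
# Illustrative only: hypothetical element names, not the real ESS XML schema.
import xml.etree.ElementTree as ET

def list_recordings(study_xml_path: str):
    """Yield (session_id, data_file) pairs from a study-level XML descriptor."""
    root = ET.parse(study_xml_path).getroot()
    for session in root.iter("session"):             # hypothetical element name
        session_id = session.get("id", "unknown")
        for recording in session.iter("recording"):  # hypothetical element name
            yield session_id, recording.get("file")

# Usage sketch: hand each raw (Data Level 1) file to a preprocessing step.
# for session_id, data_file in list_recordings("study_description.xml"):
#     run_prep_pipeline(data_file)   # hypothetical helper, stands in for PREP
```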

    Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes

    In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We then propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities. NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10).

    SecREP : A Framework for Automating the Extraction and Prioritization of Security Requirements Using Machine Learning and NLP Techniques

    Gathering and extracting security requirements adequately requires extensive effort, experience, and time, as large amounts of data need to be analyzed. While many manual and academic approaches have been developed to tackle the discipline of Security Requirements Engineering (SRE), a need still exists for automating the SRE process. This need stems mainly from the difficult, error-prone, and time-consuming nature of traditional and manual frameworks. Machine learning techniques have been widely used to facilitate and automate the extraction of useful information from software requirements documents and artifacts. Such approaches can be utilized to yield beneficial results in automating the process of extracting and eliciting security requirements. However, the extraction of security requirements alone leaves software engineers with yet another tedious task: prioritizing the most critical security requirements. The competitive and fast-paced nature of software development, in addition to resource constraints, makes the process of security requirements prioritization crucial for software engineers to make educated decisions in risk analysis and trade-off analysis. To that end, this thesis presents an automated framework/pipeline for extracting and prioritizing security requirements. The proposed framework, called the Security Requirements Extraction and Prioritization Framework (SecREP), consists of two parts. SecREP Part 1 proposes a machine learning approach for identifying and extracting security requirements from natural language software requirements artifacts (e.g., the Software Requirement Specification, or SRS, document). SecREP Part 2 proposes a scheme for prioritizing the security requirements identified in the previous step. For the first part of the SecREP framework, three machine learning models (SVM, Naive Bayes, and Random Forest) were trained using an enhanced dataset, the “SecREP Dataset”, created as a result of this work. Each model was validated using resampling (80% for training and 20% for validation) and 5-fold cross-validation. For the second part of the SecREP framework, a prioritization scheme was established with the aid of NLP techniques. The proposed prioritization scheme analyzes each security requirement using part-of-speech (POS) tagging and named entity recognition to extract assets, security attributes, and threats from the requirement. Additionally, using a text similarity method, each security requirement is compared to a super-sentence defined based on the STRIDE threat model. This prioritization scheme was applied to the list of security requirements extracted in part one, and the priority score for each requirement was calculated and showcased.
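    The SecREP models and dataset are not reproduced here; the sketch below only shows the standard shape of the Part 1 classification step, TF-IDF features feeding an SVM with a held-out validation split, on a tiny invented toy dataset purely for illustration.

```python
# Toy sketch of security-requirement classification; the data and labels are
# invented, and the real work trains on the SecREP Dataset.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

requirements = [
    "The system shall encrypt all stored passwords using a salted hash.",
    "The system shall allow users to export reports as PDF files.",
    "All administrative actions shall be written to a tamper-evident audit log.",
    "The user interface shall support English and French.",
]
labels = [1, 0, 1, 0]  # 1 = security requirement, 0 = other (toy labels)

# 80/20 resampling split, mirroring the validation strategy described above.
X_train, X_val, y_train, y_val = train_test_split(
    requirements, labels, test_size=0.2, random_state=0)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```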

    Sentiment Analysis Meets Semantic Analysis: Constructing Insight Knowledge Bases

    Numerous Web 2.0 applications collect user opinions and other user-generated content in the form of product reviews, discussion boards, and blogs, which are often captured as unstructured data. Text mining techniques are important for analyzing users’ opinions (sentiment analysis) and identifying topics of interest (semantic analysis). However, little work has been carried out that combines semantics with users’ sentiments. This research proposes a Sentiment-Semantic Framework that incorporates results from both semantic and sentiment analysis to construct a knowledge base of insights gained from integrating the information extracted from each type of analysis. To evaluate the framework, a prototype is developed and applied to two different domains (e-commerce and politics), and the resulting insight knowledge bases are constructed.
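    As a hedged sketch of the framework's core idea (not the authors' prototype), the example below pairs a semantic signal (an LDA topic per document) with a sentiment signal (VADER polarity) and stores the combination as a simple "insight" record; the review texts are invented toy data.

```python
# Combine topic assignment and sentiment score per document into insight records.
# Requires: pip install nltk scikit-learn, plus nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "The battery life of this phone is fantastic and charging is quick.",
    "Terrible battery, it drains within hours and charging is slow.",
    "The candidate's healthcare policy speech was inspiring and clear.",
    "The healthcare proposal is vague and the speech was disappointing.",
]

# Semantic side: assign each review to one of two LDA topics.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts).argmax(axis=1)

# Sentiment side: VADER compound polarity per review.
sia = SentimentIntensityAnalyzer()
insights = [
    {"topic": int(topic), "sentiment": sia.polarity_scores(review)["compound"], "text": review}
    for review, topic in zip(reviews, doc_topics)
]

for row in insights:
    print(row)
```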

    Automatic Transformation of Natural to Unified Modeling Language: A Systematic Review

    Context: Manually processing Software Requirement Specifications (SRS) takes requirement analysts in software engineering a long time. Researchers have been working on automatic approaches to ease this task. Most of the existing approaches require some intervention from an analyst or are challenging to use. Some automatic and semi-automatic approaches have been developed based on heuristic rules or machine learning algorithms. However, there are various constraints on the existing approaches to UML generation, such as restrictions on ambiguity, length, or structure, anaphora, incompleteness, atomicity of the input text, requirements for a domain ontology, etc. Objective: This study aims to better understand the effectiveness of existing systems and provide a conceptual framework with further improvement guidelines. Method: We performed a systematic literature review (SLR). We conducted our study selection in two phases and selected 70 papers. We conducted quantitative and qualitative analyses by manually extracting information, cross-checking, and validating our findings. Result: We described the existing approaches and revealed the issues observed in these works. We identified and clustered both the limitations and benefits of the selected articles. Conclusion: This research upholds the necessity of a common dataset and evaluation framework to extend the research consistently. It also describes the significance of the natural language processing obstacles researchers face. In addition, it charts a path forward for future research.
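    One family of approaches the review covers is heuristic-rule extraction of model elements from requirement text; the deliberately naive sketch below (not any surveyed tool in particular) treats noun chunks as candidate UML classes and verbs as candidate operations, omitting the anaphora resolution, ambiguity handling, and domain ontologies that real systems need.

```python
# Naive heuristic sketch of noun/verb extraction for candidate UML elements.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_model_elements(requirement: str):
    """Return (candidate_classes, candidate_operations) for one requirement sentence."""
    doc = nlp(requirement)
    classes = sorted({chunk.root.lemma_.title() for chunk in doc.noun_chunks
                      if chunk.root.pos_ in ("NOUN", "PROPN")})
    operations = sorted({tok.lemma_ for tok in doc if tok.pos_ == "VERB"})
    return classes, operations

print(candidate_model_elements(
    "The librarian registers a new member and issues a library card."))
```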

    Requirements and use cases

    In this report, we introduce our initial vision of the Corporate Semantic Web as the next step in the broad field of Semantic Web research. We identify requirements of the corporate environment and gaps in current approaches to tackling the problems facing ontology engineering, semantic collaboration, and semantic search. Each of these pillars will yield innovative methods and tools during the project runtime until 2013. Corporate ontology engineering will facilitate agile ontology engineering to lessen the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses on the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search sits at the highest application level of the three research areas and is representative of applications working on and with appropriately represented and delivered background knowledge. We propose an initial layout for an integrative architecture of a Corporate Semantic Web supported by these three core pillars.