6,441 research outputs found
Multiple Retrieval Models and Regression Models for Prior Art Search
This paper presents the system called PATATRAS (PATent and Article Tracking,
Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach
presents three main characteristics: 1. The usage of multiple retrieval models
(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
languages considered in the present track (English, French, German) producing
ten different sets of ranked results. 2. The merging of the different results
based on multiple regression models using an additional validation set created
from the patent collection. 3. The exploitation of patent metadata and of the
citation structures for creating restricted initial working sets of patents and
for producing a final re-ranking regression model. As we exploit specific
metadata of the patent documents and the citation relations only at the
creation of initial working sets and during the final post ranking step, our
architecture remains generic and easy to extend
Tacit knowledge elicitation process for industry 4.0
Manufacturers migrate their processes to Industry 4.0, which includes new technologies for improving productivity and efficiency of operations. One of the issues is capturing, recreating, and documenting the tacit knowledge of the aging workers. However, there are no systematic procedures to incorporate this knowledge into Enterprise Resource Planning systems and maintain a competitive advantage. This paper describes a solution proposal for a tacit knowledge elicitation process for capturing operational best practices of experienced workers in industrial domains based on a mix of algorithmic techniques and a cooperative game. We use domain ontologies for Industry 4.0 and reasoning techniques to discover and integrate new facts from textual sources into an Operational Knowledge Graph. We describe a concepts formation iterative process in a role game played by human and virtual agents through socialization and externalization for knowledge graph refinement. Ethical and societal concerns are discussed as well
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Technological taxonomies for hypernym and hyponym retrieval in patent texts
This paper presents an automatic approach to creating taxonomies of technical
terms based on the Cooperative Patent Classification (CPC). The resulting
taxonomy contains about 170k nodes in 9 separate technological branches and is
freely available. We also show that a Text-to-Text Transfer Transformer (T5)
model can be fine-tuned to generate hypernyms and hyponyms with relatively high
precision, confirming the manually assessed quality of the resource. The T5
model opens the taxonomy to any new technological terms for which a hypernym
can be generated, thus making the resource updateable with new terms, an
essential feature for the constantly evolving field of technological
terminology.Comment: ToTh 2022 - Terminology & Ontology: Theories and applications, Jun
2022, Chamb{\'e}ry, Franc
Applied Evaluative Informetrics: Part 1
This manuscript is a preprint version of Part 1 (General Introduction and
Synopsis) of the book Applied Evaluative Informetrics, to be published by
Springer in the summer of 2017. This book presents an introduction to the field
of applied evaluative informetrics, and is written for interested scholars and
students from all domains of science and scholarship. It sketches the field's
history, recent achievements, and its potential and limits. It explains the
notion of multi-dimensional research performance, and discusses the pros and
cons of 28 citation-, patent-, reputation- and altmetrics-based indicators. In
addition, it presents quantitative research assessment as an evaluation
science, and focuses on the role of extra-informetric factors in the
development of indicators, and on the policy context of their application. It
also discusses the way forward, both for users and for developers of
informetric tools.Comment: The posted version is a preprint (author copy) of Part 1 (General
Introduction and Synopsis) of a book entitled Applied Evaluative
Bibliometrics, to be published by Springer in the summer of 201
Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project
Understanding knowledge co-creation in key emerging areas of European research is critical for policy makers wishing to analyze impact and make strategic decisions. However, purely data-driven methods for characterising policy topics have limitations relating to the broad nature of such topics and the differences in language and topic structure between the political language and scientific and technological outputs. In this paper, we discuss the use of ontologies and semantic technologies as a means to bridge the linguistic and conceptual gap between policy questions and data sources for characterising European knowledge production. Our experience suggests that the integration between advanced techniques for language processing and expert assessment at critical junctures in the process is key for the success of this endeavour
LODE: Linking Digital Humanities Content to the Web of Data
Numerous digital humanities projects maintain their data collections in the
form of text, images, and metadata. While data may be stored in many formats,
from plain text to XML to relational databases, the use of the resource
description framework (RDF) as a standardized representation has gained
considerable traction during the last five years. Almost every digital
humanities meeting has at least one session concerned with the topic of digital
humanities, RDF, and linked data. While most existing work in linked data has
focused on improving algorithms for entity matching, the aim of the
LinkedHumanities project is to build digital humanities tools that work "out of
the box," enabling their use by humanities scholars, computer scientists,
librarians, and information scientists alike. With this paper, we report on the
Linked Open Data Enhancer (LODE) framework developed as part of the
LinkedHumanities project. With LODE we support non-technical users to enrich a
local RDF repository with high-quality data from the Linked Open Data cloud.
LODE links and enhances the local RDF repository without compromising the
quality of the data. In particular, LODE supports the user in the enhancement
and linking process by providing intuitive user-interfaces and by suggesting
high-quality linking candidates using tailored matching algorithms. We hope
that the LODE framework will be useful to digital humanities scholars
complementing other digital humanities tools
Recommended from our members
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant’s sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. The resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance
- …