Mining Measured Information from Text
We present an approach to extract measured information from text (e.g., a
1370 degrees C melting point, a BMI greater than 29.9 kg/m²). Such
extractions are critically important across a wide range of domains,
especially those involving search and exploration of scientific and technical
documents. We first propose a rule-based entity extractor to mine measured
quantities (i.e., a numeric value paired with a measurement unit), which
supports a vast and comprehensive set of both common and obscure measurement
units. Our method is highly robust and can correctly recover valid measured
quantities even when significant errors are introduced through the process of
converting document formats like PDF to plain text. Next, we describe an
approach to extracting the properties being measured (e.g., the property "pixel
pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we
present MQSearch: the realization of a search engine with full support for
measured information.
Comment: 4 pages; 38th International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR '15)
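The abstract does not spell out the extraction rules, so the following Python sketch only illustrates the general idea of a rule-based measured-quantity extractor: a regular expression pairs a numeric value with the longest matching unit from a lexicon. The tiny `UNITS` list and the `Quantity` and `extract_quantities` names are hypothetical; the paper's extractor covers a far larger unit set and recovers quantities from noisy PDF-to-text output, which this sketch does not attempt.

```python
import re
from typing import List, NamedTuple, Tuple

# Hypothetical, deliberately tiny unit lexicon (the real system supports far more units).
UNITS = ["degrees C", "kg/m^2", "kg/m²", "μm", "um", "mm", "nm", "kg", "K"]


class Quantity(NamedTuple):
    value: float
    unit: str
    span: Tuple[int, int]  # character offsets of the match in the input text


# Longest units first, so the alternation prefers "kg/m^2" over "kg".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
# A signed decimal number, optional whitespace, then a unit that is not
# immediately followed by another letter (so "5 Kelvin" does not match "K").
_PATTERN = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>" + _UNIT_ALT + r")(?![A-Za-z])"
)


def extract_quantities(text: str) -> List[Quantity]:
    """Return every (numeric value, unit) pair matched by the single rule above."""
    return [
        Quantity(float(m.group("value")), m.group("unit"), m.span())
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for q in extract_quantities("a pixel pitch as high as 352 μm"):
        print(q.value, q.unit)
```

A single pattern like this is brittle; a production extractor would layer many such rules and normalize unit spellings before matching.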
The Principles Of Developing A Management Decision Support System For Scientific Employees
Employees engaged in mental work have become the most valuable assets of any organization in the 21st century. Keeping such employees satisfied requires objective and transparent decision-making. This, in turn, calls for scientifically grounded decision-making mechanisms and methodological approaches, based on innovative technologies, for evaluating their performance. The main goal of this article is to develop a scientific and methodological framework for establishing a decision support system to manage employees who engage in mental work and operate under uncertainty. First, the problem of evaluating the activities of scientific workers is examined and its characteristic features are identified; a fuzzy relation model is then proposed, treating the evaluation as a multi-criteria problem under uncertainty. Taking into account the hierarchical structure of the criteria used to evaluate the activities of scientific workers, a phased solution method based on additive aggregation is proposed. Following this methodology, a functional scheme of the decision support system for managing scientific personnel is developed, and the working principle of each block and the interactions between blocks are described. The rules for management decisions about employees are expressed using a production model of knowledge representation. Based on the proposed methodological approach, the implementation phases of the decision support system for managing the scientific workers of the Institute of Information Technology of ANAS are described. To evaluate employees' performance, tools are provided for collecting initial information, evaluating the system of criteria, and defining the criteria's importance coefficients and mathematical descriptions. Some results produced by the system software are presented.
Finally, the paper outlines how a system built on the proposed methodology can help enterprise managers make scientifically justified decisions.
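The phased additive aggregation over a hierarchy of criteria can be illustrated with a small Python sketch. It assumes crisp scores in [0, 1] at the leaf criteria and an importance coefficient on every criterion, and it combines each level as a weighted sum normalized by the sibling weights. It does not reproduce the paper's fuzzy relation model, and the `Criterion` and `aggregate` names, the criterion names and the weights are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Criterion:
    name: str
    weight: float  # hypothetical importance coefficient relative to its siblings
    children: List["Criterion"] = field(default_factory=list)


def aggregate(node: Criterion, scores: Dict[str, float]) -> float:
    """Additive aggregation applied level by level, from the leaves up to the root."""
    if not node.children:
        # Leaf criterion: use the directly assessed score.
        return scores[node.name]
    total_weight = sum(c.weight for c in node.children)
    # Weighted sum of the children's aggregated scores, normalized so that
    # sibling weights do not have to add up to exactly 1.
    return sum(c.weight * aggregate(c, scores) for c in node.children) / total_weight


if __name__ == "__main__":
    tree = Criterion("overall", 1.0, [
        Criterion("publications", 0.6, [
            Criterion("journal_papers", 0.7),
            Criterion("conference_papers", 0.3),
        ]),
        Criterion("teaching", 0.4),
    ])
    scores = {"journal_papers": 0.8, "conference_papers": 0.5, "teaching": 0.9}
    print(round(aggregate(tree, scores), 3))
```

With these made-up numbers, "publications" aggregates to 0.71 and the root to 0.786, which shows how a phased method resolves each level before the next.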
A methodological approach to developing the model of correlation between economic development and environmental efficiency on the basis of a company's non-financial reports
A review of the most widely used international non-financial reporting standards identified GRI as the optimal standard for the Russian context. The environmental component of the GRI G4 guidelines and the contribution of each aspect to the overall sustainability picture were analysed. As the value of biological resources increases over time, a company's economic development cannot continue in isolation. New approaches and methods are therefore required to determine the degree of harmony between economic development and the ecological condition of the territories involved. Using statistical methods, a model of correlation between economic development and environmental efficiency was developed from non-financial reporting data. The model is intended for oil and gas companies, and its general principles can be applied in other industries. The results may interest stakeholders and serve as a platform for forecasting and for administrative decisions aimed at achieving harmony between economic development and environmental efficiency. The model was tested on data from the largest Russian oil and gas company, “Surgutneftegaz”. The results showed a positive correlation between the two systems of its sustainable development, economy and ecology, and demonstrate the company's strong commitment to conservation. Further research may yield more profound results, contributing to broader sustainable development.
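The abstract does not state which correlation statistic the model uses. As one common choice, the sketch below computes Pearson's correlation coefficient between two equal-length series, for example a yearly economic indicator and a yearly environmental-efficiency indicator taken from non-financial reports. The function name and this pairing of series are assumptions for illustration only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and variances, left unnormalized since the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near +1 indicates that the two indicators tend to rise together, which is the kind of positive relationship the abstract reports; correlation alone does not establish a causal link between the two systems.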
Mathematical Formula Recognition and Automatic Detection and Translation of Algorithmic Components into Stochastic Petri Nets in Scientific Documents
A large share of documents in scientific and engineering disciplines contain mathematical formulas and/or algorithms. Exploring the mathematical formulas in technical documents, we focus on the associations between mathematical operations, their syntactic correctness, and the mapping of these components into attributed graphs and Stochastic Petri Nets (SPN). We also introduce a formal language to generate mathematical formulas and evaluate their syntactic correctness. The main contribution of this work is the automatic segmentation of mathematical documents for parsing and analyzing the detected algorithmic components. To achieve this, we present a synergy of methods: string parsing according to mathematical rules, formal language modeling, optical analysis of technical documents in the form of images, structural analysis of text in images, and mapping to graphs and Stochastic Petri Nets. Finally, for the recognition of algorithms, we enriched our rule-based model with machine learning techniques to obtain better results.
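To illustrate what checking the syntactic correctness of a formula against a formal language can look like, here is a recursive-descent recognizer for a toy arithmetic grammar. The grammar and the `is_well_formed` function are hypothetical and far smaller than a language covering real document formulas; the sketch only accepts or rejects a string and builds no parse tree.

```python
import re
from typing import List

# Hypothetical toy grammar (EBNF), not the formal language defined in the paper:
#   expr   := term   (("+" | "-") term)*
#   term   := power  (("*" | "/") power)*
#   power  := factor ("^" power)?
#   factor := NUMBER | IDENT | "(" expr ")" | "-" factor
_TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")


def is_well_formed(formula: str) -> bool:
    """Return True iff `formula` can be derived from the toy grammar above."""
    tokens: List[str] = _TOKEN.findall(formula)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def advance() -> None:
        nonlocal pos
        pos += 1

    def expr() -> bool:
        if not term():
            return False
        while peek() in ("+", "-"):
            advance()
            if not term():
                return False
        return True

    def term() -> bool:
        if not power():
            return False
        while peek() in ("*", "/"):
            advance()
            if not power():
                return False
        return True

    def power() -> bool:
        if not factor():
            return False
        if peek() == "^":
            advance()
            return power()  # right-associative exponentiation
        return True

    def factor() -> bool:
        tok = peek()
        if tok == "-":
            advance()
            return factor()
        if tok == "(":
            advance()
            if not expr() or peek() != ")":
                return False
            advance()
            return True
        head = tok[:1]
        if head.isdigit() or head.isalpha() or head == "_":
            advance()  # number or identifier
            return True
        return False

    # Well-formed only if the whole token stream was consumed.
    return expr() and pos == len(tokens)
```

For example, `2*(x+3)^2` is accepted, while `2*(x+3` (unclosed parenthesis) and `x+*y` (missing operand) are rejected.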
HOLMES: A Hybrid Ontology-Learning Materials Engineering System
Designing and discovering novel materials is a challenging problem in many domains, such as fuel additives, composites, and pharmaceuticals. At the core of all this are models that capture how the different domain-specific data, information, and knowledge regarding the structures and properties of materials relate to one another. This dissertation explores the difficult task of developing an artificial intelligence-based knowledge modeling environment, called the Hybrid Ontology-Learning Materials Engineering System (HOLMES), that can assist humans in populating a materials science and engineering ontology through automatic information extraction from journal article abstracts. While our approach could be adapted to generic materials engineering applications, our focus in this thesis is on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), a modification of the Purdue Ontology for Pharmaceutical Engineering. COPE serves as the basis for HOLMES.
The HOLMES framework starts with journal articles in the Portable Document Format (PDF) and ends with the assignment of entries from the journal articles to ontologies. While this might seem a simple information extraction task, extracting the information so that a fully developed ontology is populated as completely and correctly as possible is not easy.
In developing the information extraction tasks, we note two new problems that have not arisen in previous information extraction work in the literature. The first is the need to extract auxiliary information in the form of concepts such as actions, ideas, problem specifications, and properties. The second is the existence of multiple labels for a single token, which arises because of these concepts. These two problems are the focus of this dissertation.
In this work, the HOLMES framework is presented as a whole, describing both our progress and the unsolved problems, which may help future research on this topic. The ontology is then presented to help identify the relevant information that needs to be retrieved. Next, annotations are developed to create the data sets needed to train the machine learning algorithms. Then, the current level of information extraction for these concepts is explored and expanded through entity feature sets based on entities previously extracted in the entity recognition task. Finally, the new task of assigning multiple labels to a single entity is explored using multiple-label algorithms employed primarily in image processing.
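The multiple-label problem differs from ordinary sequence tagging, where each token receives exactly one label, typically the highest-scoring class. A minimal sketch of binary-relevance decoding, one common multi-label strategy and not necessarily the algorithm used in HOLMES, treats each label as an independent yes/no decision per token. The label names, the 0.5 threshold and the per-token score dictionaries are hypothetical; in practice the scores would come from a trained classifier.

```python
from typing import Dict, List, Set

# Hypothetical label set; HOLMES's actual concept labels come from its ontology.
LABELS = ["MATERIAL", "PROPERTY", "ACTION"]


def decode_multilabel(
    token_scores: List[Dict[str, float]], threshold: float = 0.5
) -> List[Set[str]]:
    """Binary relevance: keep every label whose score clears the threshold.

    Unlike argmax decoding, a token may end up with zero, one, or several labels.
    """
    return [
        {label for label, score in scores.items() if score >= threshold}
        for scores in token_scores
    ]


def to_indicator(labels: Set[str]) -> List[int]:
    """Encode a token's label set as a 0/1 vector in LABELS order."""
    return [1 if label in labels else 0 for label in LABELS]
```

The indicator encoding is the usual target format for training multi-label classifiers, with one binary output per label instead of a single softmax over all labels.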
Making Presentation Math Computable
This open-access book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in Science, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de facto standard for typesetting mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are therefore insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes three contributions. First, it provides an approach to enrich LaTeX expressions with sufficient semantic information for translation into CAS syntaxes. Second, it presents LaCASt, the first context-aware LaTeX-to-CAS translation framework. Third, it provides a novel approach to evaluating the performance of LaTeX-to-CAS translations on large-scale datasets through automatic verification of equations in digital mathematical libraries.
Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine
Although African patients use conventional (modern) and traditional healthcare simultaneously, it has been shown that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions that were integral to distinctive African cultures. It relies mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ across regions and depend on the availability of medicinal plants. It is therefore necessary to compile tacit, dispersed and complex knowledge from various Tradi-Practitioners (TP) in order to identify useful patterns for treating a given disease. Knowledge engineering methods for traditional medicine help to model complex information needs, formalize the knowledge of domain experts and highlight effective practices for integration into conventional medicine. The work described in this paper addresses two issues. First, it proposes a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it provides a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach uses the Delphi method to capture knowledge from various experts, which requires reaching a consensus among them. The conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. Nested conceptual graphs are used to visually express the semantics of Computational Tree Logic (CTL) constructs, which are useful for the formal specification of temporal properties of ATM domain knowledge. Our approach has the advantage of mitigating knowledge loss, with conceptual development assistance that improves both the quality of ATM care (medical diagnosis and therapeutics) and patient safety (drug monitoring).