
    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenges have not been recognized, and its own methodology has not yet been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the scientific paradigm? What are the nature and the fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. (59 pages)

    Data-Driven Approach to Saltwater Disposal (SWD) Well Location Optimization in North Dakota

    The sharp increase in oil and gas production in the Williston Basin of North Dakota since 2006 has resulted in a significant increase in produced water volumes. The primary mechanism for disposing of produced water is injection into the underground Inyan Kara formation through Class II Saltwater Disposal (SWD) wells. With the number of SWD wells anticipated to increase from 900 to over 1,400 by 2035, and with localized pressurization and other potential issues that could affect the performance of future oil and SWD wells, a reliable model was needed for selecting the locations of future SWD wells for optimum performance. Since it is uncommon to develop traditional geological and simulation models for SWD wells, this research focused on developing data-driven proxy models based on the CRISP-DM (Cross-Industry Standard Process for Data Mining) pipeline for understanding SWD well performance and optimizing future well locations. NDIC's Oil and Gas Division was identified as the primary data source. Significant effort went toward identifying secondary data sources, extracting the required data from primary and secondary sources using web scraping, integrating different data types including spatial data, and creating the final data set. The Orange visual programming application and the Python programming language were used to carry out the required data mining activities. Exploratory data analysis and clustering analysis were used to gain a good understanding of the features in the data set and their relationships. Graph data science techniques such as knowledge graphs and graph-based clustering were used to gain further insights. Machine learning regression algorithms, including Multi-Linear Regression, k-Nearest Neighbors and Random Forest, were used to train models to predict the average monthly barrels of saltwater disposed in a well. Model performance was optimized using the RMSE metric, and the Random Forest model was selected as the final model for deployment to predict the performance of a planned SWD well. In addition, a multi-target regression model was trained using a deep neural network to predict water production in oil and gas wells drilled in McKenzie County, North Dakota.
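
    As an illustration of the model-comparison step described above, the sketch below trains the three regression algorithms and ranks them by RMSE using scikit-learn; the file name and feature columns are illustrative stand-ins, not the actual NDIC-derived data set.

```python
# Minimal sketch of comparing regression models by RMSE, as described above.
# The CSV file and feature names are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

df = pd.read_csv("swd_wells_features.csv")           # hypothetical integrated data set
features = ["latitude", "longitude", "injection_depth_ft",
            "nearby_well_count", "avg_injection_pressure_psi"]
target = "avg_monthly_bbl_disposed"

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=42)

models = {
    "multi-linear regression": LinearRegression(),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:,.0f} bbl/month")
```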

    Ontology-based data warehousing for mining of heterogeneous and multidimensional data sources

    Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts are unable to access and explore them quickly. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oilfield solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work is recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals.
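
    A purely illustrative sketch of the ontology-to-source mapping idea follows: a single domain concept is mapped onto columns in different back-end stores and queried at the concept level. The concept, table, and database names are invented here and are not taken from the paper's petroleum ecosystem.

```python
# Illustrative only: fan a concept-level query out to heterogeneous sources.
# Mappings and database files are hypothetical.
import sqlite3

ontology_mappings = {
    "WellProduction": [
        {"source": "field_ops.db", "table": "daily_prod", "column": "oil_bbl"},
        {"source": "finance.db",   "table": "revenue",    "column": "volume_sold_bbl"},
    ],
}

def query_concept(concept):
    """Collect all values mapped to a domain-ontology concept across sources."""
    rows = []
    for m in ontology_mappings[concept]:
        with sqlite3.connect(m["source"]) as conn:
            rows += conn.execute(
                f'SELECT "{m["column"]}" FROM "{m["table"]}"').fetchall()
    return rows

# Usage (assuming the SQLite files above exist):
# print(query_concept("WellProduction"))
```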

    Machine Translation vs. Multilingual Dictionaries: Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections

    The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method's potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages (English, Hebrew, Arabic), covering the ongoing local conflicts between Israeli authorities, settlers, and Palestinian Bedouins in the West Bank. We find that both methods produce a large share of equivalent topics, especially in the context of fairly homogeneous news discourse, yet show limited but systematic differences when applied to highly heterogeneous social media discourse. While the MD model delivers a more nuanced picture of conflict-related topics, it misses several more peripheral topics, especially those unrelated to the dictionary's focus, which are picked up by the MT model. Our study is a first step toward instrument validation, indicating that both methods yield valid, comparable results, while method-specific differences remain.
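
    The two preprocessing routes can be sketched as below, fitting the same topic model (scikit-learn's LDA here) on machine-translated versus dictionary-coded text; the documents, the translation function and the dictionary entries are toy placeholders, not the authors' actual tooling.

```python
# Sketch of the MT vs. MD preprocessing routes before topic modeling.
# Documents, translator and dictionary are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["settlement dispute in the West Bank",      # stand-in corpus
        "התנחלות בגדה המערבית",
        "مستوطنة في الضفة الغربية"]

def machine_translate(doc):
    # Route 1 (MT): translate every document into English before modeling.
    return doc  # placeholder for a real MT system

concept_dictionary = {                               # toy multilingual dictionary
    "settlement": "CONCEPT_SETTLEMENT",
    "התנחלות": "CONCEPT_SETTLEMENT",
    "مستوطنة": "CONCEPT_SETTLEMENT",
}

def dictionary_code(doc):
    # Route 2 (MD): replace dictionary terms with language-independent concept codes.
    return " ".join(concept_dictionary.get(tok, tok) for tok in doc.split())

for name, preprocess in [("MT", machine_translate), ("MD", dictionary_code)]:
    dtm = CountVectorizer(max_features=5000).fit_transform(
        [preprocess(d) for d in docs])
    lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(dtm)
    print(name, "approx. log-likelihood:", lda.score(dtm))
```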

    Some resonances between Eastern thought and Integral Biomathics in the framework of the WLIMES formalism for modelling living systems

    Forty-two years ago, Capra published "The Tao of Physics" (Capra, 1975). In this book (page 17) he writes: "The exploration of the atomic and subatomic world in the twentieth century has … necessitated a radical revision of many of our basic concepts" and that, unlike 'classical' physics, the sub-atomic and quantum "modern physics" shows resonances with Eastern thought and "leads us to a view of the world which is very similar to the views held by mystics of all ages and traditions." This article stresses an analogous situation in biology with respect to a new theoretical approach for studying living systems, Integral Biomathics (IB), which also exhibits some resonances with Eastern thought. Building on earlier research in cybernetics and theoretical biology, IB has been developed since 2011 by over 100 scientists from a number of disciplines who have been exploring a substantial set of theoretical frameworks. From that effort, the need was identified for a robust core model, using advanced mathematics and computation, adequate for understanding the behavior of organisms as dynamic wholes. To this end, the authors of this article have proposed WLIMES (Ehresmann and Simeonov, 2012), a formal theory for modelling living systems that integrates the Memory Evolutive Systems (Ehresmann and Vanbremeersch, 2007) and the Wandering Logic Intelligence (Simeonov, 2002b). Its principles will be recalled here with respect to their resonances with Eastern thought.

    Classifier System Learning of Good Database Schema

    This thesis presents an implementation of a learning classifier system that learns good database schemas. The system is implemented in Java using the NetBeans development environment, which provides good control over the GUI components. The system contains four components: a user interface, a rule and message system, an apportionment-of-credit system, and genetic algorithms. The input to the system is a set of simple database schemas, and the objective of the classifier system is to retain the good schemas, which are represented by classifiers. The learning classifier system is given some basic knowledge about database concepts and rules. The results showed that the system could weed out the bad schemas and keep the good ones.
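
    A minimal, generic learning-classifier-system loop in the spirit of the description above is sketched below; the bit encoding of schemas and the reward rule are invented purely for illustration and are not the thesis's actual representation.

```python
# Generic LCS sketch: rule matching, apportionment of credit, genetic algorithm.
# Schema encoding and reward are toy assumptions, not the thesis's design.
import random

COND_LEN = 8  # bits standing in for simple schema properties

def make_classifier():
    # Condition over {0,1,#} ('#' = wildcard); action 1 = "keep schema", 0 = "discard".
    return {"cond": "".join(random.choice("01#") for _ in range(COND_LEN)),
            "action": random.randint(0, 1),
            "strength": 10.0}

def matches(cond, schema_bits):
    return all(c in ("#", b) for c, b in zip(cond, schema_bits))

def reward(schema_bits, action):
    # Toy environment: a schema is "good" if its first bit is 1
    # (a stand-in for, say, "every table has a primary key").
    good = schema_bits[0] == "1"
    return 1.0 if (action == 1) == good else 0.0

population = [make_classifier() for _ in range(50)]

for step in range(1000):
    schema_bits = "".join(random.choice("01") for _ in range(COND_LEN))
    match_set = [c for c in population if matches(c["cond"], schema_bits)]
    if not match_set:
        continue
    chosen = max(match_set, key=lambda c: c["strength"])
    # Apportionment of credit: move the winner's strength toward its reward.
    chosen["strength"] += 0.2 * (reward(schema_bits, chosen["action"]) - chosen["strength"])
    # Genetic algorithm: occasionally replace the weakest rule with a
    # mutated crossover of the two strongest rules.
    if step % 50 == 0:
        p1, p2 = sorted(population, key=lambda c: c["strength"])[-2:]
        cut = random.randrange(1, COND_LEN)
        child_cond = "".join(ch if random.random() > 0.05 else random.choice("01#")
                             for ch in p1["cond"][:cut] + p2["cond"][cut:])
        child = {"cond": child_cond,
                 "action": random.choice([p1["action"], p2["action"]]),
                 "strength": (p1["strength"] + p2["strength"]) / 2}
        weakest = min(population, key=lambda c: c["strength"])
        population[population.index(weakest)] = child
```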

    G-Complexity, Quantum Computation and Anticipatory Processes

