Entity Identification Problem in Big and Open Data

Authors: Domínguez Mayo Francisco José; Escalona Cuaresma María José; González Enríquez José; Goto Masatomo; Lee Vivian
Publication venue: ScitePress Digital Library
Publication date: 01/01/2015

Big and Open Data provide great opportunities for businesses to enhance their competitive advantages if utilized properly. However, during the past few years of research on Big and Open Data processing, we have encountered a major challenge in entity identification reconciliation when trying to establish accurate relationships between entities from different data sources. In this paper, we present our innovative Intelligent Reconciliation Platform and Virtual Graphs solution that addresses this issue. With this solution, we are able to efficiently extract Big and Open Data from heterogeneous sources and integrate them into a common analysable format. Further enhanced with the Virtual Graphs technology, entity identification reconciliation is processed dynamically to produce more accurate results at system runtime. Moreover, we believe that our technology can be applied to a wide diversity of entity identification problems in several domains, e.g., e-Health, cultural heritage, and company identities in the financial world.

Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-

idUS. Depósito de Investigación Universidad de Sevilla

The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

Authors: McCollum Bruce; Pesenson Isaac Z.; Pesenson Meyer Z.
Publication venue: Hindawi Limited
Publication date: 01/01/2010

Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.

Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in Astronomy, special issue "Robotic Astronomy

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Caltech Authors

PABED: A Tool for Big Education Data Analysis

Authors: Alam Mansaf; Khan Samiya; Shakil Kashish Ara
Publication date: 31/07/2018

Cloud computing and big data have risen to become the most popular technologies of the modern world. Apparently, the reason behind their immense popularity is their wide range of applicability across areas of interest. Education and research remain among the most obvious and befitting application areas. This research paper introduces a big data analytics tool, PABED (Project Analyzing Big Education Data), for the education sector that makes use of cloud-based technologies. This tool is implemented using Google BigQuery and the R programming language and allows comparison of undergraduate enrollment data for different academic years. Although there are many proposed applications of big data in education, there is a lack of tools that can put the concept into practice. PABED is an effort in this direction. The implementation and testing details of the project are described in this paper. This tool validates the use of cloud computing and big data technologies in education and should give a head start to the development of more sophisticated educational intelligence tools.

arXiv.org e-Print Archive

Crossref

A Security Monitoring Framework For Virtualization Based HEP Infrastructures

Authors: Betev L.; Grigoras C.; Kebschull U.; Lara C.; Pedreira M. Martinez; Ramirez A. Gomez
Publication venue: IOP Publishing
Publication date: 16/04/2017

High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequences of system calls to detect anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that meets these requirements, with a proof-of-concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and from a large set of malware. This malware was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.

Comment: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016, 10-14 October 2016, San Francisco. Submitted to Journal of Physics: Conference Series (JPCS)

arXiv.org e-Print Archive

CERN Document Server

Large Graph Analysis in the GMine System

Authors: Faloutsos Christos; Pan Jia-Yu; Rodrigues Jr. Jose F.; Tong Hanghang; Traina Jr. Caetano; Traina Agma J. M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 28/05/2015

Current applications have produced graphs on the order of hundreds of thousands of nodes and millions of edges. To take advantage of such graphs, one must be able to find patterns, outliers and communities. These tasks are better performed in an interactive environment, where human expertise can guide the process. For large graphs, though, there are some challenges: the processing requirements are prohibitive, and drawing hundreds of thousands of nodes results in cluttered images that are hard to comprehend. To cope with these problems, we propose an innovative framework suited for any kind of tree-like graph visual design. GMine integrates (a) a representation for graphs organized as hierarchies of partitions, built on the concepts of SuperGraph and Graph-Tree; and (b) a graph summarization methodology, CEPS. Our graph representation deals with the problem of tracing the connection aspects of a graph hierarchy with sublinear complexity, allowing one to grasp the neighborhood of a single node or of a group of nodes with a single click. As a proof of concept, the visual environment of GMine is instantiated as a system in which large graphs can be investigated globally and locally.

arXiv.org e-Print Archive

CiteSeerX

Communication Theoretic Data Analytics

Authors: Chen Kwang-Cheng; Huang Shao-Lun; Poor H. Vincent; Zheng Lizhong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/01/2015

Widespread use of the Internet and social networks invokes the generation of big data, which is proving to be useful in a number of applications. To deal with explosively growing amounts of data, data analytics has emerged as a critical technology related to computing, signal processing, and information networking. In this paper, a formalism is considered in which data is modeled as a generalized social network and communication theory and information theory are thereby extended to data analytics. First, the creation of an equalizer to optimize information transfer between two data variables is considered, and financial data is used to demonstrate the advantages. Then, an information coupling approach based on information geometry is applied for dimensionality reduction, with a pattern recognition example to illustrate the effectiveness. These initial trials suggest the potential of communication theoretic data analytics for a wide range of applications.

Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Indonesian Innovations on Information Technology 2013: Between Syntactic and Semantic Textual Network

Author: Situngkir Hokky
Publication date: 15/09/2013

Network and graph models are a good alternative for analyzing huge collective textual data because they can reduce the dimensionality of the data. Texts can be seen as syntactic and semantic networks among words and phrases treated as concepts. The model is implemented to observe the proposals of Indonesian innovators for the implementation of information technology. From the analysis, some interesting insights are outlined.

Munich RePEc Personal Archive

CogPrints Cognitive Sciences Eprint Archive