Cloud-Based Big Data Management and Analytics for Scholarly Resources: Current Trends, Challenges and Scope for Future Research
With the shifting focus of organizations and governments towards the
digitization of academic and technical documents, there has been an increasing
need to use this reserve of scholarly documents to develop applications that
support better management of research. In addition, the
evolving nature of research problems has made them essentially
interdisciplinary. As a result, there is a growing need for scholarly
applications like collaborator discovery, expert finding and research
recommendation systems. This paper reviews current trends and identifies
existing challenges in the architecture, services, and applications of big
scholarly data platforms, with a specific focus on directions for future
research.
Proficiency Comparison of LADTree and REPTree Classifiers for Credit Risk Forecast
Predicting credit defaulters is a critical task for financial institutions
such as banks. Identifying non-payers before granting a loan is an important
and contentious task for the banker. Classification techniques are a natural
choice for such predictive analysis, e.g., determining whether an applicant is
an honest customer or a fraudster. Selecting the best-performing classifier is
a difficult decision for any practitioner, such as a banker. This motivates
computer science researchers to conduct focused studies that evaluate
different classifiers and identify the best classifier for such predictive
problems. This work investigates the performance of the LADTree and REPTree
classifiers for credit risk prediction and compares their fitness using
various measures. The German credit dataset is used to predict credit risk
with the help of an open-source machine learning tool. Comment: arXiv admin
note: text overlap with arXiv:1310.5963 by other authors
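LADTree and REPTree are Weka-specific learners (a LogitBoost-based alternating decision tree and a reduced-error-pruning tree, respectively), so the minimal sketch below substitutes scikit-learn stand-ins: a pruned decision tree in place of REPTree and boosted trees in place of LADTree, cross-validated on the OpenML copy of the German credit dataset. The stand-ins and every parameter value are assumptions for illustration, not the paper's setup.

```python
# Sketch only: sklearn stand-ins for the Weka LADTree/REPTree comparison.
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

data = fetch_openml("credit-g", version=1, as_frame=True)
X, y = data.data, data.target  # 1,000 applicants labeled good/bad credit

# One-hot encode categorical features; numeric features pass through.
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      make_column_selector(dtype_include="category"))],
    remainder="passthrough",
)

candidates = {
    "pruned tree (REPTree-like)": DecisionTreeClassifier(
        max_depth=5, ccp_alpha=1e-3, random_state=0),
    "boosted trees (LADTree-like)": GradientBoostingClassifier(random_state=0),
}
for name, clf in candidates.items():
    scores = cross_val_score(make_pipeline(encode, clf), X, y, cv=10)
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```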
Survey of state-of-the-art mixed data clustering algorithms
Mixed data comprises both numeric and categorical features, and mixed
datasets occur frequently in many domains, such as health, finance, and
marketing. Clustering is often applied to mixed datasets to find structures and
to group similar objects for further analysis. However, clustering mixed data
is challenging because it is difficult to directly apply mathematical
operations, such as summation or averaging, to the feature values of these
datasets. In this paper, we present a taxonomy for the study of mixed data
clustering algorithms by identifying five major research themes. We then
present a state-of-the-art review of the research works within each research
theme. We analyze the strengths and weaknesses of these methods with pointers
for future research directions. Lastly, we present an in-depth analysis of the
overall challenges in this field, highlight open research questions and discuss
guidelines to make progress in the field. Comment: 20 pages, 2 columns, 6 tables, 209 references
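As a minimal sketch of one recurring building block in this literature, the snippet below computes a Gower-style dissimilarity: range-normalized absolute distance on numeric features averaged with simple mismatch on categorical ones. The records, feature names, and ranges are invented for illustration and are not taken from any particular algorithm in the survey.

```python
# Sketch: Gower-style dissimilarity for records with mixed feature types.
import numpy as np

def gower_dissimilarity(a_num, b_num, a_cat, b_cat, num_ranges):
    """Mean of per-feature dissimilarities, each scaled to [0, 1]."""
    d_num = np.abs(a_num - b_num) / num_ranges  # numeric: normalized L1
    d_cat = (a_cat != b_cat).astype(float)      # categorical: 0/1 mismatch
    return np.concatenate([d_num, d_cat]).mean()

# Two toy records: (age, income) numeric; (occupation, region) categorical.
x_num, x_cat = np.array([35.0, 52_000.0]), np.array(["nurse", "east"])
y_num, y_cat = np.array([61.0, 48_000.0]), np.array(["clerk", "east"])
ranges = np.array([60.0, 100_000.0])  # observed range of each numeric feature

print(gower_dissimilarity(x_num, y_num, x_cat, y_cat, ranges))
```

Such a dissimilarity sidesteps the summation/averaging problem noted above: cluster structure is recovered from pairwise distances (e.g., with hierarchical or medoid-based methods) rather than from feature-space means.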
Empirical Big Data Research: A Systematic Literature Mapping
Background: Big Data is a relatively new field of research and technology,
and literature reports a wide variety of concepts labeled with Big Data. The
maturity of a research field can be measured by the number of publications
containing empirical results. In this paper we present the current status of
empirical research in Big Data. Method: We employed a systematic mapping method
with which we mapped the collected research according to the labels Variety,
Volume and Velocity. In addition, we addressed the application areas of Big
Data. Results: We found that 151 of the 1,778 assessed contributions contain
some form of empirical result and can be mapped to one or more of the 3 V's,
and 59 address an application area. Conclusions: The share of publications
containing empirical results is well below the average for computer science
research as a whole. To help research on Big Data mature, we recommend
applying empirical methods to strengthen confidence in the reported results.
Based on our trend analysis, we consider Volume and Variety to be the most
promising uncharted areas in Big Data. Comment: Submitted to the Springer
journal Data Science and Engineering
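As a toy illustration of the tallying behind such a mapping (the column names and rows below are invented, not the authors' extraction schema), the empirical share and per-V counts reduce to simple boolean aggregation:

```python
# Sketch: tallying a systematic mapping by empirical status and the 3 V's.
import pandas as pd

papers = pd.DataFrame({
    "empirical": [True, False, True, True, False],
    "volume":    [True, False, False, True, False],
    "velocity":  [False, False, True, False, False],
    "variety":   [True, False, False, False, False],
})

empirical = papers[papers["empirical"]]
print(f"empirical share: {len(empirical) / len(papers):.1%}")
print(empirical[["volume", "velocity", "variety"]].sum())  # papers per V
```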
Threshold-Based Portfolio: The Role of the Threshold and Its Applications
This paper aims to develop a new method for building a data-driven portfolio
featuring a target risk-return profile. We first present a comparative study
of recurrent neural network models (RNNs), including a simple RNN, long
short-term memory (LSTM), and gated recurrent unit (GRU) for selecting the best
predictor to use in portfolio construction. The models are applied to an
investment universe consisting of ten stocks in the S&P 500. The experimental
results show that the LSTM outperforms the others in terms of the hit ratio of
one-month-ahead forecasts. We then build predictive threshold-based portfolios
(TBPs) that are subsets of the universe satisfying given threshold criteria for
the predicted returns. The TBPs are rebalanced monthly to restore equal weights
to each security within the TBPs. We find that the risk and return profile of
the realized TBPs traces a monotonically increasing frontier on the
risk-return plane, with the equally weighted portfolio (EWP) of all ten stocks
serving as its lower bound. This shows that TBPs can target specific
risk-return levels, with an EWP over all the assets serving as the reference
portfolio for the TBPs. In the process, thresholds play
dominant roles in characterizing risk, return, and the prediction accuracy of
the subset. The TBP is more data-driven in targeting portfolio risk and return
than existing methods, in the sense that it requires no prior knowledge of
finance such as financial assumptions, financial mathematics, or expert
insights. As a practical application, we present a TBP management procedure
for a time horizon extending over multiple periods; we also discuss the
application of TBPs to mean-variance portfolios to reduce estimation
risk. Comment: 20 pages, 7 figures
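A minimal sketch of the TBP construction step for a single month follows; it assumes the one-month-ahead return predictions already exist (the paper obtains them from an LSTM), and all return figures are made up for illustration.

```python
# Sketch: one month of a threshold-based portfolio (TBP) over ten assets.
import numpy as np

predicted = np.array([0.021, -0.004, 0.013, 0.030, -0.011,
                      0.006, 0.017, -0.002, 0.009, 0.025])
realized  = np.array([0.018,  0.002, 0.010, 0.027, -0.020,
                      0.001, 0.022, -0.005, 0.004, 0.019])

def tbp_return(threshold):
    picks = predicted > threshold  # subset satisfying the threshold criterion
    if not picks.any():            # empty subset: treat as sitting in cash
        return 0.0
    return realized[picks].mean()  # equal weights within the TBP

# Raising the threshold shrinks the subset and moves the realized
# risk-return point; threshold = -inf recovers the EWP baseline.
for t in (-np.inf, 0.0, 0.01, 0.02):
    print(f"threshold {t:>6}: monthly return {tbp_return(t):+.4f}")
```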
Explainability in Human-Agent Systems
This paper presents a taxonomy of explainability in Human-Agent Systems. We
consider fundamental questions about the Why, Who, What, When and How of
explainability. First, we define explainability, and its relationship to the
related terms of interpretability, transparency, explicitness, and
faithfulness. These definitions allow us to answer why explainability is needed
in the system, to whom it is geared, and what explanations can be generated to
meet this need. We then consider when the user should be presented with this
information. Last, we consider how objective and subjective measures can be
used to evaluate the entire system. This last question is the most
encompassing, as answering it requires evaluating all the other aspects of
explainability.
Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives
Particle Swarm Optimization (PSO) is a metaheuristic global optimization
paradigm that has gained prominence in the last two decades due to its ease of
application in unsupervised, complex multidimensional problems which cannot be
solved using traditional deterministic algorithms. The canonical particle swarm
optimizer is based on the flocking behavior and social cooperation of bird
flocks and fish schools, and draws heavily on the evolutionary behavior of these
organisms. This paper serves to provide a thorough survey of the PSO algorithm
with special emphasis on the development, deployment and improvements of its
most basic as well as some of the state-of-the-art implementations. Concepts
and directions on choosing the inertia weight, constriction factor, cognition
and social weights and perspectives on convergence, parallelization, elitism,
niching and discrete optimization as well as neighborhood topologies are
outlined. Hybridization attempts with other evolutionary and swarm paradigms in
selected applications are covered and an up-to-date review is put forward for
the interested reader. Comment: 34 pages, 7 tables
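For concreteness, here is a minimal sketch of the canonical PSO loop, with inertia weight w and cognition/social weights c1 and c2, minimizing the sphere test function. The parameter values are common textbook defaults and not recommendations drawn from the survey.

```python
# Sketch: canonical global-best PSO on the sphere function.
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognition, and social weights

def sphere(x):
    return (x ** 2).sum(axis=-1)  # simple convex test objective

pos = rng.uniform(-5.0, 5.0, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), sphere(pos)   # personal bests
gbest = pbest[pbest_val.argmin()].copy()     # global best

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = sphere(pos)
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best objective value found:", sphere(gbest))
```

The inertia weight, constriction factor, and neighborhood topologies discussed above all act on this same velocity update; a ring topology, for instance, would replace the single gbest with each particle's best neighbor.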
Quda: Natural Language Queries for Visual Data Analytics
Visualization-oriented natural language interfaces (V-NLIs) have been
explored and developed in recent years. One challenge faced by V-NLIs is
making effective design decisions, which usually requires a deep
understanding of user queries. Learning-based approaches have shown potential
in V-NLIs and reached state-of-the-art performance in various NLP tasks.
However, because of the lack of sufficient training samples that cater to
visual data analytics, cutting-edge techniques have rarely been employed to
facilitate the development of V-NLIs. We present a new dataset, called Quda, to
help V-NLIs understand free-form natural language. Our dataset contains 14,035
diverse user queries annotated with 10 low-level analytic tasks that assist in
the deployment of state-of-the-art techniques for parsing complex human
language. We achieve this goal by first gathering seed queries with data
analysts, who are the target users of V-NLIs. We then employ a large crowd workforce
for paraphrase generation and validation. We demonstrate the usefulness of Quda
in building V-NLIs by creating a prototype that makes effective design
decisions for free-form user queries. We also show that Quda can be beneficial
for a wide range of applications in the visualization community by analyzing
the design tasks described in academic publications. Comment: This work isn't
sufficiently exhaustive. We need to do some new work on this.
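As a hedged illustration of how a dataset like Quda could be consumed downstream (the queries and task labels below are invented and do not reproduce Quda's actual schema or its ten task categories), a simple baseline maps free-form queries to analytic-task labels:

```python
# Sketch: baseline query-to-task classifier trained on (query, task) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "show the trend of sales over the last year",
    "which product has the highest revenue",
    "compare profits across the four regions",
    "how many orders were placed in March",
]
tasks = ["trend", "find_extremum", "compare", "compute_derived_value"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(queries, tasks)
print(model.predict(["what is the total number of shipments in June"]))
```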
Annotation Scaffolds for Object Modeling and Manipulation
We present and evaluate an approach for human-in-the-loop specification of
shape reconstruction with annotations for basic robot-object interactions. Our
method is based on the idea of model annotation: the addition of simple cues to
an underlying object model to specify shape and delineate a simple task. The
goal is to explore reducing the complexity of CAD-like interfaces so that
novice users can quickly recover an object's shape and describe a manipulation
task that is then carried out by a robot. The object modeling and interaction
annotation capabilities are tested with a user study and compared against
results obtained using existing approaches. The approach has been analyzed
using a variety of shape comparison, grasping, and manipulation metrics, and
tested with the PR2 robot platform, where it was shown to be successful. Comment: 31 pages, 46 figures
The neurocognitive gains of diagnostic reasoning training using simulated interactive veterinary cases
The present longitudinal study ascertained training-associated transformations in the neural underpinnings of diagnostic reasoning, using a simulation game named “Equine Virtual Farm” (EVF). Twenty participants underwent structural, EVF/task-based and resting-state MRI and diffusion tensor imaging (DTI) before and after completing their training on diagnosing simulated veterinary cases. Comparing playing veterinarian versus seeing a colorful image across training sessions revealed a transition of brain activity from scientific-creativity regions pre-training (left middle frontal and temporal gyrus) to insight problem-solving regions post-training (right cerebellum, middle cingulate and medial superior gyrus and left postcentral gyrus). Further, applying linear mixed-effects modelling to graph centrality metrics revealed the central roles of the creative semantic system (inferior frontal, middle frontal and angular gyrus and parahippocampus) and the reward system (orbital gyrus, nucleus accumbens and putamen) in driving pre-training diagnostic reasoning, whereas regions implicated in inductive reasoning (superior temporal and medial postcentral gyrus and parahippocampus) were the main post-training hubs. Lastly, resting-state and DTI analyses revealed post-training effects within the occipitotemporal semantic processing region. Altogether, these results suggest that simulation-based training shifts diagnostic reasoning in novices from regions implicated in creative semantic processing to regions implicated in improvised rule-based problem-solving.
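A minimal sketch of the mixed-effects step, assuming a long-format table with one centrality value per participant and session; the column names and numbers are invented, and the paper's actual model specification may differ.

```python
# Sketch: random intercept per subject, fixed effect of training session.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "subject":    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "session":    ["pre", "post"] * 6,
    "centrality": [0.42, 0.55, 0.38, 0.51, 0.47, 0.49,
                   0.40, 0.58, 0.44, 0.52, 0.36, 0.50],
})

model = smf.mixedlm("centrality ~ session", df, groups=df["subject"])
print(model.fit().summary())
```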