Cloud-Based Big Data Management and Analytics for Scholarly Resources: Current Trends, Challenges and Scope for Future Research
With the shifting focus of organizations and governments towards the
digitization of academic and technical documents, there has been an increasing
need to use this reserve of scholarly documents to develop applications that
support better management of research. In addition, the
evolving nature of research problems has made them essentially
interdisciplinary. As a result, there is a growing need for scholarly
applications like collaborator discovery, expert finding and research
recommendation systems. This paper reviews current trends and identifies
existing challenges in the architecture, services, and applications of big
scholarly data platforms, with a specific focus on directions for future
research.
Proficiency Comparison of LADTree and REPTree Classifiers for Credit Risk Forecast
Predicting credit defaulters is a critical task for financial institutions
such as banks. Identifying non-payers before granting a loan is an important
and contentious task for the banker. Classification techniques are a natural
choice for such predictive analysis, e.g., determining whether an applicant is
an honest customer or a fraudster. Selecting the best-performing classifier is
a difficult decision for any practitioner, such as a banker. This motivates
computer science researchers to conduct focused studies that evaluate
different classifiers and identify the best classifier for such predictive
problems. This work investigates the performance of the LADTree and REPTree
classifiers for credit risk prediction and compares their fitness using
various measures. The German credit dataset is used to predict credit risk
with the help of an open-source machine learning tool. Comment: arXiv admin
note: text overlap with arXiv:1310.5963 by other authors
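LADTree and REPTree are Weka-specific learners (a LogitBoost-based alternating decision tree and a reduced-error-pruning tree, respectively), so the minimal sketch below substitutes scikit-learn stand-ins: a pruned decision tree in place of REPTree and boosted trees in place of LADTree, cross-validated on the OpenML copy of the German credit dataset. The stand-ins and every parameter value are assumptions for illustration, not the paper's setup.

```python
# Sketch only: sklearn stand-ins for the Weka LADTree/REPTree comparison.
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.datasets import fetch_openml
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

data = fetch_openml("credit-g", version=1, as_frame=True)
X, y = data.data, data.target  # 1,000 applicants labeled good/bad credit

# One-hot encode categorical features; numeric features pass through.
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      make_column_selector(dtype_include="category"))],
    remainder="passthrough",
)

candidates = {
    "pruned tree (REPTree-like)": DecisionTreeClassifier(
        max_depth=5, ccp_alpha=1e-3, random_state=0),
    "boosted trees (LADTree-like)": GradientBoostingClassifier(random_state=0),
}
for name, clf in candidates.items():
    scores = cross_val_score(make_pipeline(encode, clf), X, y, cv=10)
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```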
Survey of state-of-the-art mixed data clustering algorithms
Mixed data comprises both numeric and categorical features, and mixed
datasets occur frequently in many domains, such as health, finance, and
marketing. Clustering is often applied to mixed datasets to find structures and
to group similar objects for further analysis. However, clustering mixed data
is challenging because it is difficult to directly apply mathematical
operations, such as summation or averaging, to the feature values of these
datasets. In this paper, we present a taxonomy for the study of mixed data
clustering algorithms by identifying five major research themes. We then
present a state-of-the-art review of the research works within each research
theme. We analyze the strengths and weaknesses of these methods with pointers
for future research directions. Lastly, we present an in-depth analysis of the
overall challenges in this field, highlight open research questions and discuss
guidelines to make progress in the field. Comment: 20 pages, 2 columns, 6 tables, 209 references
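As a minimal sketch of one recurring building block in this literature, the snippet below computes a Gower-style dissimilarity: range-normalized absolute distance on numeric features averaged with simple mismatch on categorical ones. The records, feature names, and ranges are invented for illustration and are not taken from any particular algorithm in the survey.

```python
# Sketch: Gower-style dissimilarity for records with mixed feature types.
import numpy as np

def gower_dissimilarity(a_num, b_num, a_cat, b_cat, num_ranges):
    """Mean of per-feature dissimilarities, each scaled to [0, 1]."""
    d_num = np.abs(a_num - b_num) / num_ranges  # numeric: normalized L1
    d_cat = (a_cat != b_cat).astype(float)      # categorical: 0/1 mismatch
    return np.concatenate([d_num, d_cat]).mean()

# Two toy records: (age, income) numeric; (occupation, region) categorical.
x_num, x_cat = np.array([35.0, 52_000.0]), np.array(["nurse", "east"])
y_num, y_cat = np.array([61.0, 48_000.0]), np.array(["clerk", "east"])
ranges = np.array([60.0, 100_000.0])  # observed range of each numeric feature

print(gower_dissimilarity(x_num, y_num, x_cat, y_cat, ranges))
```

Such a dissimilarity sidesteps the summation/averaging problem noted above: cluster structure is recovered from pairwise distances (e.g., with hierarchical or medoid-based methods) rather than from feature-space means.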
Empirical Big Data Research: A Systematic Literature Mapping
Background: Big Data is a relatively new field of research and technology,
and literature reports a wide variety of concepts labeled with Big Data. The
maturity of a research field can be measured by the number of publications
containing empirical results. In this paper we present the current status of
empirical research in Big Data. Method: We employed a systematic mapping method
with which we mapped the collected research according to the labels Variety,
Volume and Velocity. In addition, we addressed the application areas of Big
Data. Results: We found that 151 of the 1,778 assessed contributions contain
some form of empirical result and can be mapped to one or more of the 3 V's,
and 59 address an application area. Conclusions: The share of publications
containing empirical results is well below the average for computer science
research as a whole. To help research on Big Data mature, we recommend
applying empirical methods to strengthen confidence in the reported results.
Based on our trend analysis, we consider Volume and Variety to be the most
promising uncharted areas in Big Data. Comment: Submitted to the Springer
journal Data Science and Engineering
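As a toy illustration of the tallying behind such a mapping (the column names and rows below are invented, not the authors' extraction schema), the empirical share and per-V counts reduce to simple boolean aggregation:

```python
# Sketch: tallying a systematic mapping by empirical status and the 3 V's.
import pandas as pd

papers = pd.DataFrame({
    "empirical": [True, False, True, True, False],
    "volume":    [True, False, False, True, False],
    "velocity":  [False, False, True, False, False],
    "variety":   [True, False, False, False, False],
})

empirical = papers[papers["empirical"]]
print(f"empirical share: {len(empirical) / len(papers):.1%}")
print(empirical[["volume", "velocity", "variety"]].sum())  # papers per V
```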
Threshold-Based Portfolio: The Role of the Threshold and Its Applications
This paper aims to develop a new method for building a data-driven portfolio
featuring a target risk-return profile. We first present a comparative study
of recurrent neural network models (RNNs), including a simple RNN, long
short-term memory (LSTM), and gated recurrent unit (GRU) for selecting the best
predictor to use in portfolio construction. The models are applied to an
investment universe consisting of ten stocks in the S&P 500. The experimental
results show that the LSTM outperforms the others in terms of the hit ratio of
one-month-ahead forecasts. We then build predictive threshold-based portfolios
(TBPs) that are subsets of the universe satisfying given threshold criteria for
the predicted returns. The TBPs are rebalanced monthly to restore equal weights
to each security within the TBPs. We find that the risk and return profile of
the realized TBPs traces a monotonically increasing frontier on the
risk-return plane, with the equally weighted portfolio (EWP) of all ten stocks
serving as its lower bound. This shows that TBPs can target specific
risk-return levels, with an EWP over all the assets serving as the reference
portfolio for the TBPs. In the process, thresholds play
dominant roles in characterizing risk, return, and the prediction accuracy of
the subset. The TBP is more data-driven in targeting portfolio risk and return
than existing methods, in the sense that it requires no prior knowledge of
finance such as financial assumptions, financial mathematics, or expert
insights. As a practical application, we present a TBP management procedure
for a time horizon extending over multiple periods; we also discuss the
application of TBPs to mean-variance portfolios to reduce estimation
risk. Comment: 20 pages, 7 figures
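A minimal sketch of the TBP construction step for a single month follows; it assumes the one-month-ahead return predictions already exist (the paper obtains them from an LSTM), and all return figures are made up for illustration.

```python
# Sketch: one month of a threshold-based portfolio (TBP) over ten assets.
import numpy as np

predicted = np.array([0.021, -0.004, 0.013, 0.030, -0.011,
                      0.006, 0.017, -0.002, 0.009, 0.025])
realized  = np.array([0.018,  0.002, 0.010, 0.027, -0.020,
                      0.001, 0.022, -0.005, 0.004, 0.019])

def tbp_return(threshold):
    picks = predicted > threshold  # subset satisfying the threshold criterion
    if not picks.any():            # empty subset: treat as sitting in cash
        return 0.0
    return realized[picks].mean()  # equal weights within the TBP

# Raising the threshold shrinks the subset and moves the realized
# risk-return point; threshold = -inf recovers the EWP baseline.
for t in (-np.inf, 0.0, 0.01, 0.02):
    print(f"threshold {t:>6}: monthly return {tbp_return(t):+.4f}")
```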
Explainability in Human-Agent Systems
This paper presents a taxonomy of explainability in Human-Agent Systems. We
consider fundamental questions about the Why, Who, What, When and How of
explainability. First, we define explainability, and its relationship to the
related terms of interpretability, transparency, explicitness, and
faithfulness. These definitions allow us to answer why explainability is needed
in the system, to whom it is geared, and what explanations can be generated to
meet this need. We then consider when the user should be presented with this
information. Last, we consider how objective and subjective measures can be
used to evaluate the entire system. This last question is the most
encompassing, as answering it requires evaluating all the other aspects of
explainability.
Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives
Particle Swarm Optimization (PSO) is a metaheuristic global optimization
paradigm that has gained prominence in the last two decades due to its ease of
application in unsupervised, complex multidimensional problems which cannot be
solved using traditional deterministic algorithms. The canonical particle swarm
optimizer is based on the flocking behavior and social cooperation of bird
flocks and fish schools, and draws heavily on the evolutionary behavior of these
organisms. This paper serves to provide a thorough survey of the PSO algorithm
with special emphasis on the development, deployment and improvements of its
most basic as well as some of the state-of-the-art implementations. Concepts
and directions on choosing the inertia weight, constriction factor, cognition
and social weights and perspectives on convergence, parallelization, elitism,
niching and discrete optimization as well as neighborhood topologies are
outlined. Hybridization attempts with other evolutionary and swarm paradigms in
selected applications are covered and an up-to-date review is put forward for
the interested reader. Comment: 34 pages, 7 tables
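For concreteness, here is a minimal sketch of the canonical PSO loop, with inertia weight w and cognition/social weights c1 and c2, minimizing the sphere test function. The parameter values are common textbook defaults and not recommendations drawn from the survey.

```python
# Sketch: canonical global-best PSO on the sphere function.
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognition, and social weights

def sphere(x):
    return (x ** 2).sum(axis=-1)  # simple convex test objective

pos = rng.uniform(-5.0, 5.0, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), sphere(pos)   # personal bests
gbest = pbest[pbest_val.argmin()].copy()     # global best

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = sphere(pos)
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best objective value found:", sphere(gbest))
```

The inertia weight, constriction factor, and neighborhood topologies discussed above all act on this same velocity update; a ring topology, for instance, would replace the single gbest with each particle's best neighbor.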
Quda: Natural Language Queries for Visual Data Analytics
Visualization-oriented natural language interfaces (V-NLIs) have been
explored and developed in recent years. One challenge faced by V-NLIs is
making effective design decisions, which usually requires a deep
understanding of user queries. Learning-based approaches have shown potential
in V-NLIs and reached state-of-the-art performance in various NLP tasks.
However, because of the lack of sufficient training samples that cater to
visual data analytics, cutting-edge techniques have rarely been employed to
facilitate the development of V-NLIs. We present a new dataset, called Quda, to
help V-NLIs understand free-form natural language. Our dataset contains 14,035
diverse user queries annotated with 10 low-level analytic tasks that assist in
the deployment of state-of-the-art techniques for parsing complex human
language. We achieve this goal by first gathering seed queries with data
analysts, who are the target users of V-NLIs. We then employ a large crowd workforce
for paraphrase generation and validation. We demonstrate the usefulness of Quda
in building V-NLIs by creating a prototype that makes effective design
decisions for free-form user queries. We also show that Quda can be beneficial
for a wide range of applications in the visualization community by analyzing
the design tasks described in academic publications. Comment: This work isn't
sufficiently exhaustive. We need to do some new work on this.
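As a hedged illustration of how a dataset like Quda could be consumed downstream (the queries and task labels below are invented and do not reproduce Quda's actual schema or its ten task categories), a simple baseline maps free-form queries to analytic-task labels:

```python
# Sketch: baseline query-to-task classifier trained on (query, task) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

queries = [
    "show the trend of sales over the last year",
    "which product has the highest revenue",
    "compare profits across the four regions",
    "how many orders were placed in March",
]
tasks = ["trend", "find_extremum", "compare", "compute_derived_value"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(queries, tasks)
print(model.predict(["what is the total number of shipments in June"]))
```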
Annotation Scaffolds for Object Modeling and Manipulation
We present and evaluate an approach for human-in-the-loop specification of
shape reconstruction with annotations for basic robot-object interactions. Our
method is based on the idea of model annotation: the addition of simple cues to
an underlying object model to specify shape and delineate a simple task. The
goal is to explore reducing the complexity of CAD-like interfaces so that
novice users can quickly recover an object's shape and describe a manipulation
task that is then carried out by a robot. The object modeling and interaction
annotation capabilities are tested with a user study and compared against
results obtained using existing approaches. The approach has been analyzed
using a variety of shape comparison, grasping, and manipulation metrics, and
tested with the PR2 robot platform, where it was shown to be successful. Comment: 31 pages, 46 figures
The neurocognitive gains of diagnostic reasoning training using simulated interactive veterinary cases
The present longitudinal study ascertained training-associated transformations in the neural underpinnings of diagnostic reasoning, using a simulation game named “Equine Virtual Farm” (EVF). Twenty participants underwent structural, EVF/task-based and resting-state MRI and diffusion tensor imaging (DTI) before and after completing their training on diagnosing simulated veterinary cases. Comparing playing veterinarian versus seeing a colorful image across training sessions revealed a transition of brain activity from scientific-creativity regions pre-training (left middle frontal and temporal gyrus) to insight problem-solving regions post-training (right cerebellum, middle cingulate and medial superior gyrus and left postcentral gyrus). Further, applying linear mixed-effects modelling to graph centrality metrics revealed the central roles of the creative semantic system (inferior frontal, middle frontal and angular gyrus and parahippocampus) and the reward system (orbital gyrus, nucleus accumbens and putamen) in driving pre-training diagnostic reasoning, whereas regions implicated in inductive reasoning (superior temporal and medial postcentral gyrus and parahippocampus) were the main post-training hubs. Lastly, resting-state and DTI analyses revealed post-training effects within the occipitotemporal semantic processing region. Altogether, these results suggest that simulation-based training shifts diagnostic reasoning in novices from regions implicated in creative semantic processing to regions implicated in improvised rule-based problem-solving.
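A minimal sketch of the mixed-effects step, assuming a long-format table with one centrality value per participant and session; the column names and numbers are invented, and the paper's actual model specification may differ.

```python
# Sketch: random intercept per subject, fixed effect of training session.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "subject":    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "session":    ["pre", "post"] * 6,
    "centrality": [0.42, 0.55, 0.38, 0.51, 0.47, 0.49,
                   0.40, 0.58, 0.44, 0.52, 0.36, 0.50],
})

model = smf.mixedlm("centrality ~ session", df, groups=df["subject"])
print(model.fit().summary())
```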