76,959 research outputs found
Recommended from our members
On the challenges and opportunities in visualization for machine learning and knowledge extraction: A research agenda
We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety and the value of data has been significantly transforming the way that scientific research is carried out and businesses operate. Within data science, which has emerged as a practice to enable this data-intensive innovation by gathering together and advancing the knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization, visualization plays a unique and maybe the ultimate role as an approach to facilitate the human and computer cooperation, and to particularly enable the analysis of diverse and heterogeneous data using complex computational methods where algorithmic results are challenging to interpret and operationalize. Whilst algorithm development is surely at the center of the whole pipeline in disciplines such as Machine Learning and Knowledge Discovery, it is visualization which ultimately makes the results accessible to the end user. Visualization thus can be seen as a mapping from arbitrarily high-dimensional abstract spaces to the lower dimensions and plays a central and critical role in interacting with machine learning algorithms, and particularly in interactive machine learning (iML) with including the human-in-the-loop. The central goal of the CD-MAKE VIS workshop is to spark discussions at this intersection of visualization, machine learning and knowledge discovery and bring together experts from these disciplines. This paper discusses a perspective on the challenges and opportunities in this integration of these discipline and presents a number of directions and strategies for further research
Knowledge Cartography: Software tools and mapping techniques
Knowledge Cartography is the discipline of mapping intellectual landscapes.The focus of this book is on the process by which manually crafting interactive, hypertextual maps clarifies one’s own understanding, as well as communicating it.The authors see mapping software as a set of visual tools for reading and writing in a networked age. In an information ocean, the primary challenge is to find meaningful patterns around which we can weave plausible narratives. Maps of concepts, discussions and arguments make the connections between ideas tangible and disputable.
With 17 chapters from the leading researchers and practitioners, the reader will find the current state–of-the-art in the field. Part 1 focuses on educational applications in schools and universities, before Part 2 turns to applications in professional communitie
Attributes of Big Data Analytics for Data-Driven Decision Making in Cyber-Physical Power Systems
Big data analytics is a virtually new term in power system terminology. This concept delves into the way a massive volume of data is acquired, processed, analyzed to extract insight from available data. In particular, big data analytics alludes to applications of artificial intelligence, machine learning techniques, data mining techniques, time-series forecasting methods. Decision-makers in power systems have been long plagued by incapability and weakness of classical methods in dealing with large-scale real practical cases due to the existence of thousands or millions of variables, being time-consuming, the requirement of a high computation burden, divergence of results, unjustifiable errors, and poor accuracy of the model. Big data analytics is an ongoing topic, which pinpoints how to extract insights from these large data sets. The extant article has enumerated the applications of big data analytics in future power systems through several layers from grid-scale to local-scale. Big data analytics has many applications in the areas of smart grid implementation, electricity markets, execution of collaborative operation schemes, enhancement of microgrid operation autonomy, management of electric vehicle operations in smart grids, active distribution network control, district hub system management, multi-agent energy systems, electricity theft detection, stability and security assessment by PMUs, and better exploitation of renewable energy sources. The employment of big data analytics entails some prerequisites, such as the proliferation of IoT-enabled devices, easily-accessible cloud space, blockchain, etc. This paper has comprehensively conducted an extensive review of the applications of big data analytics along with the prevailing challenges and solutions
A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students with a variety of techniques to analyze small,
neat, and clean data sets. However, whether they pursue more formal training in
statistics or not, many of these students will end up working with data that is
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well communicating findings in written, graphical, and
oral forms.Comment: 21 pages total including supplementary material
Visual Analysis of Spatio-Temporal Event Predictions: Investigating the Spread Dynamics of Invasive Species
Invasive species are a major cause of ecological damage and commercial
losses. A current problem spreading in North America and Europe is the vinegar
fly Drosophila suzukii. Unlike other Drosophila, it infests non-rotting and
healthy fruits and is therefore of concern to fruit growers, such as vintners.
Consequently, large amounts of data about infestations have been collected in
recent years. However, there is a lack of interactive methods to investigate
this data. We employ ensemble-based classification to predict areas susceptible
to infestation by D. suzukii and bring them into a spatio-temporal context
using maps and glyph-based visualizations. Following the information-seeking
mantra, we provide a visual analysis system Drosophigator for spatio-temporal
event prediction, enabling the investigation of the spread dynamics of invasive
species. We demonstrate the usefulness of this approach in two use cases
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and the scale of computational tasks produced the demand for
new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach connecting the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly advanced the scope of data-intensive applications, spreading from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem with the MPI applications using the Process Management
Interface. The paper explores this integrated platform within the context of
online ptychographic and tomographic reconstruction pipelines.Comment: New York Scientific Data Summit, August 6-9, 201
- …