Data-driven design of intelligent wireless networks: an overview and tutorial
Data science, or "data-driven research", is a research approach that uses real-life data to gain insight into the behavior of systems. It enables the analysis of small, simple systems as well as large and more complex ones, in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas, such as large-scale social networks and advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware-specific influences. These interactions can lead to a difference between real-world functioning and design-time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; and (v) provides the reader with the necessary datasets and scripts to go through the tutorial steps themselves.
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis and the parameters of the
model are estimated using the classic Least Squares method. An appropriate
metric is introduced in order to measure the error between the observed and the
predicted distributions. In particular, the Wasserstein distance is proposed.
Some properties of this metric are exploited to predict the response variable
as a direct linear combination of other independent histogram variables. Measures
of goodness of fit are discussed. An application on real data corroborates the
proposed method.
Multi-Armed Bandits for Intelligent Tutoring Systems
We present an approach to Intelligent Tutoring Systems which adaptively
personalizes sequences of learning activities to maximize skills acquired by
students, taking into account the limited time and motivational resources. At a
given point in time, the system proposes to the students the activity which
makes them progress faster. We introduce two algorithms that rely on the
empirical estimation of the learning progress, RiARiT that uses information
about the difficulty of each exercise and ZPDES that uses much less knowledge
about the problem.
The system is based on the combination of three approaches. First, it
leverages recent models of intrinsically motivated learning by transposing them
to active teaching, relying on empirical estimation of learning progress
provided by specific activities to particular students. Second, it uses
state-of-the-art Multi-Arm Bandit (MAB) techniques to efficiently manage the
exploration/exploitation challenge of this optimization process. Third, it
leverages expert knowledge to constrain and bootstrap initial exploration of
the MAB, while requiring only coarse guidance information of the expert and
allowing the system to deal with didactic gaps in its knowledge. The system is
evaluated in a scenario where 7-8 year old schoolchildren learn how to
decompose numbers while manipulating money. Systematic experiments are
presented with simulated students, followed by results of a user study across a
population of 400 schoolchildren.
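The idea of proposing the activity with the highest empirically estimated learning progress can be sketched as a simple bandit. The sketch below is not RiARiT or ZPDES; it is a generic epsilon-greedy bandit, and the class name `ProgressBandit`, the sliding-window estimate of progress, and the parameters `window` and `epsilon` are all assumptions made for illustration.

```python
import random
from collections import deque


class ProgressBandit:
    """Hypothetical epsilon-greedy bandit over learning activities.

    The reward of an activity is an empirical estimate of learning progress:
    the success rate over the most recent `window` answers minus the success
    rate over the `window` answers before those. This is an illustrative
    assumption, not the estimator used by RiARiT or ZPDES.
    """

    def __init__(self, n_activities, window=4, epsilon=0.2, seed=0):
        self.window = window
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Keep only the last 2 * window outcomes (1.0 = success) per activity.
        self.history = [deque(maxlen=2 * window) for _ in range(n_activities)]

    def learning_progress(self, activity):
        outcomes = list(self.history[activity])
        if len(outcomes) < 2 * self.window:
            # Optimistic value so that little-tried activities get explored.
            return 1.0
        older = sum(outcomes[: self.window]) / self.window
        recent = sum(outcomes[self.window:]) / self.window
        return recent - older

    def choose(self):
        n = len(self.history)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(n)  # explore
        # Exploit: activity with the highest estimated progress
        # (ties go to the lowest index).
        return max(range(n), key=self.learning_progress)

    def update(self, activity, success):
        self.history[activity].append(1.0 if success else 0.0)
```

A tutoring loop would call `choose()`, present that activity, and feed the student's result back through `update()`. An activity the student has already mastered, or one that is still too hard, yields a progress estimate near zero and is proposed less often.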
Development of Computer Science Disciplines - A Social Network Analysis Approach
In contrast to many other scientific disciplines, computer science considers
conference publications. Conferences have the advantage of providing fast
publication of papers and of bringing researchers together to present and
discuss the paper with peers. Previous work on knowledge mapping focused on the
map of all sciences or a particular domain based on ISI published JCR (Journal
Citation Report). Although this data covers most of the important journals, it
lacks computer science conference and workshop proceedings. This results in an
imprecise and incomplete analysis of computer science knowledge. This paper
presents an analysis of the computer science knowledge network constructed from
all types of publications, aiming at providing a complete view of computer
science research. Based on the combination of two important digital libraries
(DBLP and CiteSeerX), we study the knowledge network created at
journal/conference level using citation linkage, to identify the development of
sub-disciplines. We investigate the collaborative and citation behavior of
journals/conferences by analyzing the properties of their co-authorship and
citation subgraphs. The paper draws several important conclusions. First,
conferences constitute social structures that shape the computer science
knowledge. Second, computer science is becoming more interdisciplinary. Third,
experts are the key success factor for the sustainability of journals/conferences.
Basic statistics for probabilistic symbolic variables: a novel metric-based approach
In data mining, it is usual to describe a set of individuals using some
summaries (means, standard deviations, histograms, confidence intervals) that
generalize individual descriptions into a typology description. In this case,
data can be described by several values. In this paper, we propose an approach
for computing basic statistics for such data, and, in particular, for data
described by numerical multi-valued variables (interval, histograms, discrete
multi-valued descriptions). We propose to treat all numerical multi-valued
variables as distributional data, i.e. as individuals described by
distributions. To obtain new basic statistics for measuring the variability and
the association between such variables, we extend the classic measure of
inertia, calculated with the Euclidean distance, using the squared Wasserstein
distance defined between probability measures. The distance is a generalization
of the Wasserstein distance, that is a distance between quantile functions of
two distributions. Some properties of such a distance are shown. Among them, we
prove the Huygens theorem of decomposition of the inertia. We show the use of
the Wasserstein distance and of the basic statistics by presenting a k-means-like
clustering algorithm, for the clustering of a set of data described by modal
numerical variables (distributional variables), on a real data set. Keywords:
Wasserstein distance, inertia, dependence, distributional data, modal
variables. Comment: 19 pages, 3 figures
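The distance between quantile functions and the Huygens decomposition of inertia described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes each distribution is given as a raw sample, approximates its quantile function on a grid of `m` levels, and uses the mean of the quantile vectors as the barycenter. The function names are hypothetical.

```python
import numpy as np


def quantile_vector(sample, m=200):
    """Approximate the quantile function of a sample at m midpoint levels."""
    levels = (np.arange(m) + 0.5) / m
    return np.quantile(np.asarray(sample, dtype=float), levels)


def wasserstein2_sq(qa, qb):
    """Squared L2 Wasserstein distance between two one-dimensional
    distributions, approximated as the mean squared gap between their
    quantile vectors."""
    return float(np.mean((np.asarray(qa) - np.asarray(qb)) ** 2))


def inertia_decomposition(Q, labels):
    """Return (total, within, between) inertia for the rows of Q.

    Q holds one quantile vector per individual; labels assigns each row
    to a group. Barycenters are means of quantile vectors.
    """
    Q = np.asarray(Q, dtype=float)
    labels = np.asarray(labels)
    center = Q.mean(axis=0)
    total = sum(wasserstein2_sq(q, center) for q in Q)
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        group = Q[labels == k]
        group_center = group.mean(axis=0)
        within += sum(wasserstein2_sq(q, group_center) for q in group)
        between += len(group) * wasserstein2_sq(group_center, center)
    return total, within, between
```

On this grid approximation the total inertia equals the within-group plus the between-group inertia up to floating-point error, which is the decomposition a k-means-like algorithm relies on when it minimizes the within-group part.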
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.