2,021 research outputs found
Voting-Based Consensus of Data Partitions
Over the past few years, there has been a renewed interest in the consensus
problem for ensembles of partitions. Recent work is primarily motivated by the
developments in the area of combining multiple supervised learners. Unlike the
consensus of supervised classifications, the consensus of data partitions is a
challenging problem due to the lack of globally defined cluster labels and to
the inherent difficulty of data clustering as an unsupervised learning problem.
Moreover, the true number of clusters may be unknown. A fundamental goal of
consensus methods for partitions is to obtain an optimal summary of an ensemble
and to discover a cluster structure with accuracy and robustness exceeding those
of the individual ensemble partitions.
The quality of the consensus partitions highly depends on the ensemble
generation mechanism and on the suitability of the consensus method for
combining the generated ensemble. Typically, consensus methods derive an
ensemble representation that is used as the basis for extracting the consensus
partition. Most ensemble representations circumvent the labeling problem. On
the other hand, voting-based methods establish direct parallels with consensus
methods for supervised classifications, by seeking an optimal relabeling of the
ensemble partitions and deriving an ensemble representation consisting of a
central aggregated partition. An important element of the voting-based
aggregation problem is the pairwise relabeling of an ensemble partition with
respect to a representative partition of the ensemble, which is refered to here
as the voting problem. The voting problem is commonly formulated as a weighted
bipartite matching problem.
In this dissertation, a general theoretical framework for the voting problem as
a multi-response regression problem is proposed. The problem is formulated as
seeking to estimate the uncertainties associated with the assignments of the
objects to the representative clusters, given their assignments to the clusters
of an ensemble partition. A new voting scheme, referred to as cumulative voting,
is derived as a special instance of the proposed regression formulation
corresponding to fitting a linear model by least squares estimation. The
proposed formulation reveals the close relationships between the underlying loss
functions of the cumulative voting and bipartite matching schemes. A useful
feature of the proposed framework is that it can be applied to model substantial
variability between partitions, such as a variable number of clusters.
A general aggregation algorithm with variants corresponding to
cumulative voting and bipartite matching is applied and a simulation-based
analysis is presented to compare the suitability of each scheme to different
ensemble generation mechanisms. The bipartite matching is found to be more
suitable than cumulative voting for a particular generation model, whereby each
ensemble partition is generated as a noisy permutation of an underlying
labeling, according to a probability of error. For ensembles with a variable
number of clusters, it is proposed that the aggregated partition be viewed as an
estimated distributional representation of the ensemble, on the basis of which,
a criterion may be defined to seek an optimally compressed consensus partition.
The properties and features of the proposed cumulative voting scheme are
studied. In particular, the relationship between cumulative voting and the
well-known co-association matrix is highlighted. Furthermore, an adaptive
aggregation algorithm that is suited for the cumulative voting scheme is
proposed. The algorithm aims at selecting the initial reference partition and
the aggregation sequence of the ensemble partitions the loss of mutual
information associated with the aggregated partition is minimized. In order to
subsequently extract the final consensus partition, an efficient agglomerative
algorithm is developed. The algorithm merges the aggregated clusters such that
the maximum amount of information is preserved. Furthermore, it allows the
optimal number of consensus clusters to be estimated.
An empirical study using several artificial and real-world datasets demonstrates
that the proposed cumulative voting scheme leads to discovering substantially
more accurate consensus partitions compared to bipartite matching, in the case
of ensembles with a relatively large or a variable number of clusters. Compared
to other recent consensus methods, the proposed method is found to be comparable
with or better than the best performing methods. Moreover, accurate estimates of
the true number of clusters are often achieved using cumulative voting, whereas
consistently poor estimates are achieved based on bipartite matching. The
empirical evidence demonstrates that the bipartite matching scheme is not
suitable for these types of ensembles
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
A data-assisted approach to supporting instructional interventions in technology enhanced learning environments
The design of intelligent learning environments requires significant up-front resources and expertise. These environments generally maintain complex and comprehensive knowledge bases describing pedagogical approaches, learner traits, and content models. This has limited the influence of these technologies in higher education, which instead largely uses learning content management systems in order to deliver non-classroom instruction to learners.
This dissertation puts forth a data-assisted approach to embedding intelligence within learning environments. In this approach, instructional experts are provided with summaries of the activities of learners who interact with technology enhanced learning tools. These experts, which may include instructors, instructional designers, educational technologists, and others, use this data to gain insight into the activities of their learners. These insights lead experts to form instructional interventions which can be used to enhance the learning experience. The novel aspect of this approach is that the actions of the intelligent learning environment are now not just those of the learners and software constructs, but also those of the educational experts who may be supporting the learning process.
The kinds of insights and interventions that come from application of the data-assisted approach vary with the domain being taught, the epistemology and pedagogical techniques being employed, and the particulars of the cohort being instructed. In this dissertation, three investigations using the data-assisted approach are described. The first of these demonstrates the effects of making available to instructors novel sociogram-based visualizations of online asynchronous discourse. By making instructors aware of the discussion habits of both themselves and learners, the instructors are better able to measure the effect of their teaching practice. This enables them to change their activities in response to the social networks that form between their learners, allowing them to react to deficiencies in the learning environment. Through these visualizations it is demonstrated that instructors can effectively change their pedagogy based on seeing data of their students’ interactions.
The second investigation described in this dissertation is the application of unsupervised machine learning to the viewing habits of learners using lecture capture facilities. By clustering learners into groups based on behaviour and correlating groups with academic outcome, a model of positive learning activity can be described. This is particularly useful for instructional designers who are evaluating the role of learning technologies in programs as it contextualizes how technologies enable success in learners. Through this investigation it is demonstrated that the viewership data of learners can be used to assist designers in building higher level models of learning that can be used for evaluating the use of specific tools in blended learning situations.
Finally, the results of applying supervised machine learning to the indexing of lecture video is described. Usage data collected from software is increasingly being used by software engineers to make technologies that are more customizable and adaptable. In this dissertation, it is demonstrated that supervised machine learning can provide human-like indexing of lecture videos that is more accurate than current techniques. Further, these indices can be customized for groups of learners, increasing the level of personalization in the learning environment. This investigation demonstrates that the data-assisted approach can also be used by application developers who are building software features for personalization into intelligent learning environments.
Through this work, it is shown that a data-assisted approach to supporting instructional interventions in technology enhanced learning environments is both possible and can positively impact the teaching and learning process. By making available to instructional experts the online activities of learners, experts can better understand and react to patterns of use that develop, making for a more effective and personalized learning environment. This approach differs from traditional methods of building intelligent learning environments, which apply learning theories a priori to instructional design, and do not leverage the in situ data collected about learners
Women in Artificial intelligence (AI)
This Special Issue, entitled "Women in Artificial Intelligence" includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning or planning, among others. The papers have applications to relevant fields, such as human health, finance, or education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, Czech Republic, United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba, Bangladesh and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue, combining interesting research works, this Special Issue intends to increase the invisibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on the on Ada Lovelace Day (11/10/2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times, in the XIX century. We also thank the publisher for making this possible, thus allowing for this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women that contributed over the years to the field of AI
Multi-person tracking using dynamic programming
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 75-77).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.by Rania Y. Khalaf.M.Eng
- …