2,021 research outputs found

    Voting-Based Consensus of Data Partitions

    Over the past few years, there has been a renewed interest in the consensus problem for ensembles of partitions. Recent work is primarily motivated by the developments in the area of combining multiple supervised learners. Unlike the consensus of supervised classifications, the consensus of data partitions is a challenging problem due to the lack of globally defined cluster labels and to the inherent difficulty of data clustering as an unsupervised learning problem. Moreover, the true number of clusters may be unknown. A fundamental goal of consensus methods for partitions is to obtain an optimal summary of an ensemble and to discover a cluster structure with accuracy and robustness exceeding those of the individual ensemble partitions. The quality of the consensus partitions depends heavily on the ensemble generation mechanism and on the suitability of the consensus method for combining the generated ensemble. Typically, consensus methods derive an ensemble representation that is used as the basis for extracting the consensus partition. Most ensemble representations circumvent the labeling problem. On the other hand, voting-based methods establish direct parallels with consensus methods for supervised classifications, by seeking an optimal relabeling of the ensemble partitions and deriving an ensemble representation consisting of a central aggregated partition. An important element of the voting-based aggregation problem is the pairwise relabeling of an ensemble partition with respect to a representative partition of the ensemble, which is referred to here as the voting problem. The voting problem is commonly formulated as a weighted bipartite matching problem. In this dissertation, a general theoretical framework for the voting problem as a multi-response regression problem is proposed. 
The problem is formulated as seeking to estimate the uncertainties associated with the assignments of the objects to the representative clusters, given their assignments to the clusters of an ensemble partition. A new voting scheme, referred to as cumulative voting, is derived as a special instance of the proposed regression formulation corresponding to fitting a linear model by least squares estimation. The proposed formulation reveals the close relationships between the underlying loss functions of the cumulative voting and bipartite matching schemes. A useful feature of the proposed framework is that it can be applied to model substantial variability between partitions, such as a variable number of clusters. A general aggregation algorithm with variants corresponding to cumulative voting and bipartite matching is applied and a simulation-based analysis is presented to compare the suitability of each scheme to different ensemble generation mechanisms. Bipartite matching is found to be more suitable than cumulative voting for a particular generation model, whereby each ensemble partition is generated as a noisy permutation of an underlying labeling, according to a probability of error. For ensembles with a variable number of clusters, it is proposed that the aggregated partition be viewed as an estimated distributional representation of the ensemble, on the basis of which a criterion may be defined to seek an optimally compressed consensus partition. The properties and features of the proposed cumulative voting scheme are studied. In particular, the relationship between cumulative voting and the well-known co-association matrix is highlighted. Furthermore, an adaptive aggregation algorithm that is suited for the cumulative voting scheme is proposed. The algorithm aims at selecting the initial reference partition and the aggregation sequence of the ensemble partitions such that the loss of mutual information associated with the aggregated partition is minimized. 
In order to subsequently extract the final consensus partition, an efficient agglomerative algorithm is developed. The algorithm merges the aggregated clusters such that the maximum amount of information is preserved. Furthermore, it allows the optimal number of consensus clusters to be estimated. An empirical study using several artificial and real-world datasets demonstrates that the proposed cumulative voting scheme leads to discovering substantially more accurate consensus partitions compared to bipartite matching, in the case of ensembles with a relatively large or a variable number of clusters. Compared to other recent consensus methods, the proposed method is found to be comparable with or better than the best performing methods. Moreover, accurate estimates of the true number of clusters are often achieved using cumulative voting, whereas consistently poor estimates are achieved based on bipartite matching. The empirical evidence demonstrates that the bipartite matching scheme is not suitable for these types of ensembles.
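
The voting problem described in this abstract, in its common bipartite matching form, can be illustrated with a minimal sketch. It is not the dissertation's cumulative voting scheme or its aggregation algorithm. The function names, the toy labelings, and the exhaustive search over label permutations are all assumptions made for illustration, and both partitions are assumed to have the same number of clusters `k`.

```python
from itertools import permutations

def contingency(reference, partition, k):
    """Count the objects shared by each (partition cluster, reference cluster) pair."""
    table = [[0] * k for _ in range(k)]
    for ref_label, part_label in zip(reference, partition):
        table[part_label][ref_label] += 1
    return table

def relabel_by_matching(reference, partition, k):
    """Relabel `partition` so that it agrees with `reference` as much as possible.

    Assumption: exhaustive search over all k! one-to-one label mappings,
    which is only practical for small k. A Hungarian-algorithm solver
    would find the same optimum in polynomial time.
    """
    table = contingency(reference, partition, k)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(table[c][perm[c]] for c in range(k)),
    )
    return [best[label] for label in partition]

# Toy data (hypothetical): the same grouping with permuted labels and one noisy object.
reference = [0, 0, 0, 1, 1, 2, 2, 2]
partition = [2, 2, 2, 0, 0, 1, 1, 0]
relabeled = relabel_by_matching(reference, partition, 3)
print(relabeled)
```

Once every ensemble partition has been relabeled against a common reference in this way, per-object votes over the aligned labels can be accumulated. The one-to-one mapping is what restricts this scheme when partitions have differing numbers of clusters, the case the abstract addresses with cumulative voting.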

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.
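
The bag structure described in this abstract can be sketched as follows. The sketch assumes the widely used standard MIL assumption (a bag is positive if at least one of its instances is positive), which the abstract itself does not state. The `Bag` class, the `instance_score` function, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bag:
    instances: List[List[float]]  # feature vectors; instance labels are not observed
    label: int                    # 1 = positive bag, 0 = negative bag

def instance_score(x: List[float]) -> float:
    """Hypothetical instance-level scorer: uses the first feature as the score."""
    return x[0]

def predict_bag(bag: Bag, threshold: float = 0.5) -> int:
    """Standard MIL assumption: the bag is positive if any instance scores high enough."""
    return int(max(instance_score(x) for x in bag.instances) >= threshold)

# Toy bags (hypothetical data): only the bag-level label is provided.
bags = [
    Bag(instances=[[0.1, 0.3], [0.9, 0.2]], label=1),
    Bag(instances=[[0.2, 0.8], [0.4, 0.1]], label=0),
]
predictions = [predict_bag(b) for b in bags]
print(predictions)
```

Other bag-level rules, such as averaging instance scores, correspond to different assumptions about how instance labels relate to the bag label.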

    A data-assisted approach to supporting instructional interventions in technology enhanced learning environments

    The design of intelligent learning environments requires significant up-front resources and expertise. These environments generally maintain complex and comprehensive knowledge bases describing pedagogical approaches, learner traits, and content models. This has limited the influence of these technologies in higher education, which instead largely uses learning content management systems in order to deliver non-classroom instruction to learners. This dissertation puts forth a data-assisted approach to embedding intelligence within learning environments. In this approach, instructional experts are provided with summaries of the activities of learners who interact with technology enhanced learning tools. These experts, which may include instructors, instructional designers, educational technologists, and others, use this data to gain insight into the activities of their learners. These insights lead experts to form instructional interventions which can be used to enhance the learning experience. The novel aspect of this approach is that the actions of the intelligent learning environment are now not just those of the learners and software constructs, but also those of the educational experts who may be supporting the learning process. The kinds of insights and interventions that come from application of the data-assisted approach vary with the domain being taught, the epistemology and pedagogical techniques being employed, and the particulars of the cohort being instructed. In this dissertation, three investigations using the data-assisted approach are described. The first of these demonstrates the effects of making available to instructors novel sociogram-based visualizations of online asynchronous discourse. By making instructors aware of the discussion habits of both themselves and learners, the instructors are better able to measure the effect of their teaching practice. 
This enables them to change their activities in response to the social networks that form between their learners, allowing them to react to deficiencies in the learning environment. Through these visualizations it is demonstrated that instructors can effectively change their pedagogy based on seeing data of their students’ interactions. The second investigation described in this dissertation is the application of unsupervised machine learning to the viewing habits of learners using lecture capture facilities. By clustering learners into groups based on behaviour and correlating groups with academic outcome, a model of positive learning activity can be described. This is particularly useful for instructional designers who are evaluating the role of learning technologies in programs as it contextualizes how technologies enable success in learners. Through this investigation it is demonstrated that the viewership data of learners can be used to assist designers in building higher level models of learning that can be used for evaluating the use of specific tools in blended learning situations. Finally, the results of applying supervised machine learning to the indexing of lecture video are described. Usage data collected from software is increasingly being used by software engineers to make technologies that are more customizable and adaptable. In this dissertation, it is demonstrated that supervised machine learning can provide human-like indexing of lecture videos that is more accurate than current techniques. Further, these indices can be customized for groups of learners, increasing the level of personalization in the learning environment. This investigation demonstrates that the data-assisted approach can also be used by application developers who are building software features for personalization into intelligent learning environments. 
Through this work, it is shown that a data-assisted approach to supporting instructional interventions in technology enhanced learning environments is possible and can positively impact the teaching and learning process. By making available to instructional experts the online activities of learners, experts can better understand and react to patterns of use that develop, making for a more effective and personalized learning environment. This approach differs from traditional methods of building intelligent learning environments, which apply learning theories a priori to instructional design, and do not leverage the in situ data collected about learners.

    Women in Artificial intelligence (AI)

    This Special Issue, entitled "Women in Artificial Intelligence", includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning, and planning, among others. The papers have applications to relevant fields, such as human health, finance, or education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, Czech Republic, United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue, combining interesting research works, this Special Issue intends to increase the visibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on Ada Lovelace Day (11/10/2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times, in the 19th century. We also thank the publisher for making this possible, thus allowing for this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women who have contributed over the years to the field of AI.

    Multi-person tracking using dynamic programming

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. Includes bibliographical references (p. 75-77). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. By Rania Y. Khalaf. M.Eng.