2,180 research outputs found

    Convex and non-convex optimization using centroid-encoding for visualization, classification, and feature selection

    Get PDF
    Includes bibliographical references.2022 Fall.Classification, visualization, and feature selection are the three essential tasks of machine learning. This Ph.D. dissertation presents convex and non-convex models suitable for these three tasks. We propose Centroid-Encoder (CE), an autoencoder-based supervised tool for visualizing complex and potentially large, e.g., SUSY with 5 million samples and high-dimensional datasets, e.g., GSE73072 clinical challenge data. Unlike an autoencoder, which maps a point to itself, a centroid-encoder has a modified target, i.e., the class centroid in the ambient space. We present a detailed comparative analysis of the method using various data sets and state-of-the-art techniques. We have proposed a variation of the centroid-encoder, Bottleneck Centroid-Encoder (BCE), where additional constraints are imposed at the bottleneck layer to improve generalization performance in the reduced space. We further developed a sparse optimization problem for the non-linear mapping of the centroid-encoder called Sparse Centroid-Encoder (SCE) to determine the set of discriminate features between two or more classes. The sparse model selects variables using the 1-norm applied to the input feature space. SCE extracts discriminative features from multi-modal data sets, i.e., data whose classes appear to have multiple clusters, by using several centers per class. This approach seems to have advantages over models which use a one-hot-encoding vector. We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set. CE and SCE are models based on neural network architectures and require the solution of non-convex optimization problems. Motivated by the CE algorithm, we have developed a convex optimization for the supervised dimensionality reduction technique called Centroid Component Retrieval (CCR). The CCR model optimizes a multi-objective cost by balancing two complementary terms. The first term pulls the samples of a class towards its centroid by minimizing a sample's distance from its class centroid in low dimensional space. The second term pushes the classes by maximizing the scattering volume of the ellipsoid formed by the class-centroids in embedded space. Although the design principle of CCR is similar to LDA, our experimental results show that CCR exhibits performance advantages over LDA, especially on high-dimensional data sets, e.g., Yale Faces, ORL, and COIL20. Finally, we present a linear formulation of Centroid-Encoder with orthogonality constraints, called Principal Centroid Component Analysis (PCCA). This formulation is similar to PCA, except the class labels are used to formulate the objective, resulting in the form of supervised PCA. We show the classification and visualization experiments results with this new linear tool

    Use Case Oriented Medical Visual Information Retrieval & System Evaluation

    Get PDF
    Large amounts of medical visual data are produced daily in hospitals, while new imaging techniques continue to emerge. In addition, many images are made available continuously via publications in the scientific literature and can also be valuable for clinical routine, research and education. Information retrieval systems are useful tools to provide access to the biomedical literature and fulfil the information needs of medical professionals. The tools developed in this thesis can potentially help clinicians make decisions about difficult diagnoses via a case-based retrieval system based on a use case associated with a specific evaluation task. This system retrieves articles from the biomedical literature when querying with a case description and attached images. This thesis proposes a multimodal approach for medical case-based retrieval with focus on the integration of visual information connected to text. Furthermore, the ImageCLEFmed evaluation campaign was organised during this thesis promoting medical retrieval system evaluation

    A Survey of Evaluation in Music Genre Recognition

    Get PDF

    Novel Datasets, User Interfaces and Learner Models to Improve Learner Engagement Prediction on Educational Videos

    Get PDF
    With the emergence of Open Education Resources (OERs), educational content creation has rapidly scaled up, making a large collection of new materials made available. Among these, we find educational videos, the most popular modality for transferring knowledge in the technology-enhanced learning paradigm. Rapid creation of learning resources opens up opportunities in facilitating sustainable education, as the potential to personalise and recommend specific materials that align with individual users’ interests, goals, knowledge level, language and stylistic preferences increases. However, the quality and topical coverage of these materials could vary significantly, posing significant challenges in managing this large collection, including the risk of negative user experience and engagement with these materials. The scarcity of support resources such as public datasets is another challenge that slows down the development of tools in this research area. This thesis develops a set of novel tools that improve the recommendation of educational videos. Two novel datasets and an e-learning platform with a novel user interface are developed to support the offline and online testing of recommendation models for educational videos. Furthermore, a set of learner models that accounts for the learner interests, knowledge, novelty and popularity of content is developed through this thesis. The different models are integrated together to propose a novel learner model that accounts for the different factors simultaneously. The user studies conducted on the novel user interface show that the new interface encourages users to explore the topical content more rigorously before making relevance judgements about educational videos. Offline experiments on the newly constructed datasets show that the newly proposed learner models outperform their relevant baselines significantly

    Application of Machine Learning within Visual Content Production

    Get PDF
    We are living in an era where digital content is being produced at a dazzling pace. The heterogeneity of contents and contexts is so varied that a numerous amount of applications have been created to respond to people and market demands. The visual content production pipeline is the generalisation of the process that allows a content editor to create and evaluate their product, such as a video, an image, a 3D model, etc. Such data is then displayed on one or more devices such as TVs, PC monitors, virtual reality head-mounted displays, tablets, mobiles, or even smartwatches. Content creation can be simple as clicking a button to film a video and then share it into a social network, or complex as managing a dense user interface full of parameters by using keyboard and mouse to generate a realistic 3D model for a VR game. In this second example, such sophistication results in a steep learning curve for beginner-level users. In contrast, expert users regularly need to refine their skills via expensive lessons, time-consuming tutorials, or experience. Thus, user interaction plays an essential role in the diffusion of content creation software, primarily when it is targeted to untrained people. In particular, with the fast spread of virtual reality devices into the consumer market, new opportunities for designing reliable and intuitive interfaces have been created. Such new interactions need to take a step beyond the point and click interaction typical of the 2D desktop environment. The interactions need to be smart, intuitive and reliable, to interpret 3D gestures and therefore, more accurate algorithms are needed to recognise patterns. In recent years, machine learning and in particular deep learning have achieved outstanding results in many branches of computer science, such as computer graphics and human-computer interface, outperforming algorithms that were considered state of the art, however, there are only fleeting efforts to translate this into virtual reality. In this thesis, we seek to apply and take advantage of deep learning models to two different content production pipeline areas embracing the following subjects of interest: advanced methods for user interaction and visual quality assessment. First, we focus on 3D sketching to retrieve models from an extensive database of complex geometries and textures, while the user is immersed in a virtual environment. We explore both 2D and 3D strokes as tools for model retrieval in VR. Therefore, we implement a novel system for improving accuracy in searching for a 3D model. We contribute an efficient method to describe models through 3D sketch via an iterative descriptor generation, focusing both on accuracy and user experience. To evaluate it, we design a user study to compare different interactions for sketch generation. Second, we explore the combination of sketch input and vocal description to correct and fine-tune the search for 3D models in a database containing fine-grained variation. We analyse sketch and speech queries, identifying a way to incorporate both of them into our system's interaction loop. Third, in the context of the visual content production pipeline, we present a detailed study of visual metrics. We propose a novel method for detecting rendering-based artefacts in images. It exploits analogous deep learning algorithms used when extracting features from sketches

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    A visual analytics approach for visualisation and knowledge discovery from time-varying personal life data

    Get PDF
    A thesis submitted to the University of Bedfordshire, in ful filment of the requirements for the degree of Doctor of PhilosophyToday, the importance of big data from lifestyles and work activities has been the focus of much research. At the same time, advances in modern sensor technologies have enabled self-logging of a signi cant number of daily activities and movements. Lifestyle logging produces a wide variety of personal data along the lifespan of individuals, including locations, movements, travel distance, step counts and the like, and can be useful in many areas such as healthcare, personal life management, memory recall, and socialisation. However, the amount of obtainable personal life logging data has enormously increased and stands in need of effective processing, analysis, and visualisation to provide hidden insights owing to the lack of semantic information (particularly in spatiotemporal data), complexity, large volume of trivial records, and absence of effective information visualisation on a large scale. Meanwhile, new technologies such as visual analytics have emerged with great potential in data mining and visualisation to overcome the challenges in handling such data and to support individuals in many aspects of their life. Thus, this thesis contemplates the importance of scalability and conducts a comprehensive investigation into visual analytics and its impact on the process of knowledge discovery from the European Commission project MyHealthAvatar at the Centre for Visualisation and Data Analytics by actively involving individuals in order to establish a credible reasoning and effectual interactive visualisation of such multivariate data with particular focus on lifestyle and personal events. To this end, this work widely reviews the foremost existing work on data mining (with the particular focus on semantic enrichment and ranking), data visualisation (of time-oriented, personal, and spatiotemporal data), and methodical evaluations of such approaches. Subsequently, a novel automated place annotation is introduced with multilevel probabilistic latent semantic analysis to automatically attach relevant information to the collected personal spatiotemporal data with low or no semantic information in order to address the inadequate information, which is essential for the process of knowledge discovery. Correspondingly, a multi-signi ficance event ranking model is introduced by involving a number of factors as well as individuals' preferences, which can influence the result within the process of analysis towards credible and high-quality knowledge discovery. The data mining models are assessed in terms of accurateness and performance. The results showed that both models are highly capable of enriching the raw data and providing significant events based on user preferences. An interactive visualisation is also designed and implemented including a set of novel visual components signifi cantly based upon human perception and attentiveness to visualise the extracted knowledge. Each visual component is evaluated iteratively based on usability and perceptibility in order to enhance the visualisation towards reaching the goal of this thesis. Lastly, three integrated visual analytics tools (platforms) are designed and implemented in order to demonstrate how the data mining models and interactive visualisation can be exploited to support different aspects of personal life, such as lifestyle, life pattern, and memory recall (reminiscence). The result of the evaluation for the three integrated visual analytics tools showed that this visual analytics approach can deliver a remarkable experience in gaining knowledge and supporting the users' life in certain aspects

    Leveraging Formulae and Text for Improved Math Retrieval

    Get PDF
    Large collections containing millions of math formulas are available online. Retrieving math expressions from these collections is challenging. Users can use formula, formula+text, or math questions to express their math information needs. The structural complexity of formulas requires specialized processing. Despite the existence of math search systems and online community question-answering websites for math, little is known about mathematical information needs. This research first explores the characteristics of math searches using a general search engine. The findings show how math searches are different from general searches. Then, test collections for math-aware search are introduced. The ARQMath test collections have two main tasks: 1) finding answers for math questions and 2) contextual formula search. In each test collection (ARQMath-1 to -3) the same collection is used, Math Stack Exchange posts from 2010 to 2018, introducing different topics for each task. Compared to the previous test collections, ARQMath has a much larger number of diverse topics, and improved evaluation protocol. Another key role of this research is to leverage text and math information for improved math information retrieval. Three formula search models that only use the formula, with no context are introduced. The first model is an n-gram embedding model using both symbol layout tree and operator tree representations. The second model uses tree-edit distance to re-rank the results from the first model. Finally, a learning-to-rank model that leverages full-tree, sub-tree, and vector similarity scores is introduced. To use context, Math Abstract Meaning Representation (MathAMR) is introduced, which generalizes AMR trees to include math formula operations and arguments. This MathAMR is then used for contextualized formula search using a fine-tuned Sentence-BERT model. The experiments show tree-edit distance ranking achieves the current state-of-the-art results on contextual formula search task, and the MathAMR model can be beneficial for re-ranking. This research also addresses the answer retrieval task, introducing a two-step retrieval model in which similar questions are first found and then answers previously given to those similar questions are ranked. The proposed model, fine-tunes two Sentence-BERT models, one for finding similar questions and another one for ranking the answers. For Sentence-BERT model, raw text as well as MathAMR are used
    • 

    corecore