16,914 research outputs found

    Data Mining and Machine Learning in Astronomy

    Full text link
    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    Get PDF
    The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing

    The fabrication of the mathematics teacher as neoliberal subject

    Get PDF

    Enhanced clustering analysis pipeline for performance analysis of parallel applications

    Get PDF
    Clustering analysis is widely used to stratify data in the same cluster when they are similar according to the specific metrics. We can use the cluster analysis to group the CPU burst of a parallel application, and the regions on each process in-between communication calls or calls to the parallel runtime. The resulting clusters obtained are the different computational trends or phases that appear in the application. These clusters are useful to understand the behavior of the computation part of the application and focus the analyses on those that present performance issues. Although density-based clustering algorithms are a powerful and efficient tool to summarize this type of information, their traditional user-guided clustering methodology has many shortcomings and deficiencies in dealing with the complexity of data, the diversity of data structures, high-dimensionality of data, and the dramatic increase in the amount of data. Consequently, the majority of DBSCAN-like algorithms have weaknesses to handle high-dimensionality and/or Multi-density data, and they are sensitive to their hyper-parameter configuration. Furthermore, extracting insight from the obtained clusters is an intuitive and manual task. To mitigate these weaknesses, we have proposed a new unified approach to replace the user-guided clustering with an automated clustering analysis pipeline, called Enhanced Cluster Identification and Interpretation (ECII) pipeline. To build the pipeline, we propose novel techniques including Robust Independent Feature Selection, Feature Space Curvature Map, Organization Component Analysis, and hyper-parameters tuning to feature selection, density homogenization, cluster interpretation, and model selection which are the main components of our machine learning pipeline. This thesis contributes four new techniques to the Machine Learning field with a particular use case in Performance Analytics field. The first contribution is a novel unsupervised approach for feature selection on noisy data, called Robust Independent Feature Selection (RIFS). Specifically, we choose a feature subset that contains most of the underlying information, using the same criteria as the Independent component analysis. Simultaneously, the noise is separated as an independent component. The second contribution of the thesis is a parametric multilinear transformation method to homogenize cluster densities while preserving the topological structure of the dataset, called Feature Space Curvature Map (FSCM). We present a new Gravitational Self-organizing Map to model the feature space curvature by plugging the concepts of gravity and fabric of space into the Self-organizing Map algorithm to mathematically describe the density structure of the data. To homogenize the cluster density, we introduce a novel mapping mechanism to project the data from the non-Euclidean curved space to a new Euclidean flat space. The third contribution is a novel topological-based method to study potentially complex high-dimensional categorized data by quantifying their shapes and extracting fine-grain insights from them to interpret the clustering result. We introduce our Organization Component Analysis (OCA) method for the automatic arbitrary cluster-shape study without an assumption about the data distribution. Finally, to tune the DBSCAN hyper-parameters, we propose a new tuning mechanism by combining techniques from machine learning and optimization domains, and we embed it in the ECII pipeline. Using this cluster analysis pipeline with the CPU burst data of a parallel application, we provide the developer/analyst with a high-quality SPMD computation structure detection with the added value that reflects the fine grain of the computation regions.El análisis de conglomerados se usa ampliamente para estratificar datos en el mismo conglomerado cuando son similares según las métricas específicas. Nosotros puede usar el análisis de clúster para agrupar la ráfaga de CPU de una aplicación paralela y las regiones en cada proceso intermedio llamadas de comunicación o llamadas al tiempo de ejecución paralelo. Los clusters resultantes obtenidos son las diferentes tendencias computacionales o fases que aparecen en la solicitud. Estos clusters son útiles para entender el comportamiento de la parte de computación del aplicación y centrar los análisis en aquellos que presenten problemas de rendimiento. Aunque los algoritmos de agrupamiento basados en la densidad son una herramienta poderosa y eficiente para resumir este tipo de información, su La metodología tradicional de agrupación en clústeres guiada por el usuario tiene muchas deficiencias y deficiencias al tratar con la complejidad de los datos, la diversidad de estructuras de datos, la alta dimensionalidad de los datos y el aumento dramático en la cantidad de datos. En consecuencia, el La mayoría de los algoritmos similares a DBSCAN tienen debilidades para manejar datos de alta dimensionalidad y/o densidad múltiple, y son sensibles a su configuración de hiperparámetros. Además, extraer información de los clústeres obtenidos es una forma intuitiva y tarea manual Para mitigar estas debilidades, hemos propuesto un nuevo enfoque unificado para reemplazar el agrupamiento guiado por el usuario con un canalización de análisis de agrupamiento automatizado, llamada canalización de identificación e interpretación de clúster mejorada (ECII). para construir el tubería, proponemos técnicas novedosas que incluyen la selección robusta de características independientes, el mapa de curvatura del espacio de características, Análisis de componentes de la organización y ajuste de hiperparámetros para la selección de características, homogeneización de densidad, agrupación interpretación y selección de modelos, que son los componentes principales de nuestra canalización de aprendizaje automático. Esta tesis aporta cuatro nuevas técnicas al campo de Machine Learning con un caso de uso particular en el campo de Performance Analytics. La primera contribución es un enfoque novedoso no supervisado para la selección de características en datos ruidosos, llamado Robust Independent Feature. Selección (RIFS).Específicamente, elegimos un subconjunto de funciones que contiene la mayor parte de la información subyacente, utilizando el mismo criterios como el análisis de componentes independientes. Simultáneamente, el ruido se separa como un componente independiente. La segunda contribución de la tesis es un método de transformación multilineal paramétrica para homogeneizar densidades de clústeres mientras preservando la estructura topológica del conjunto de datos, llamado Mapa de Curvatura del Espacio de Características (FSCM). Presentamos un nuevo Gravitacional Mapa autoorganizado para modelar la curvatura del espacio característico conectando los conceptos de gravedad y estructura del espacio en el Algoritmo de mapa autoorganizado para describir matemáticamente la estructura de densidad de los datos. Para homogeneizar la densidad del racimo, introducimos un mecanismo de mapeo novedoso para proyectar los datos del espacio curvo no euclidiano a un nuevo plano euclidiano espacio. La tercera contribución es un nuevo método basado en topología para estudiar datos categorizados de alta dimensión potencialmente complejos mediante cuantificando sus formas y extrayendo información detallada de ellas para interpretar el resultado de la agrupación. presentamos nuestro Método de análisis de componentes de organización (OCA) para el estudio automático de forma arbitraria de conglomerados sin una suposición sobre el distribución de datos.Postprint (published version

    Token, carrot, or just in the way? The challenge of visualizing acquired knowledge in the era of digital learning and gamification.

    Get PDF
    Metacognition is necessary for learning. Without know-ing what you know, what you don’t know and what you are about to learn, seeking new knowledge becomes both hard and inefficient. At the same time, keeping old facts (as well as old skills) in mind at a meta-level when striv-ing for new insights is not always an easy task. Conse-quently, external visible cues and representations are essential, reminding us of not only of what we have learnt, but also of where we are in the process and where are heading. At present, improving metacognition is on top of the educational agenda for many schools and universities. However, in the current new era of digital learning, the impact from digital tools on metacognitive processes - especially from educational games and apps - is rarely discussed. For instance, in contrast to traditional learn-ing material (books etc.), most applications let the stu-dents solve tasks without saving any traces of the solu-tions, resulting in a minimum of durable external repre-sentations and memories. To address the need for metacognitive support in instructional software, the present study examines the effect of a visualization tool on knowledge monitoring and self-regulation. The tool, that consists of a diary where tokens with knowledge related content are received as proofs of achievement for solved tasks, was designed specifically for the study at hand and implemented in an educational game (Guardians of History). It was tested with a between-subjects design, where 117 Swedish students in grade 5 and 6 played two varieties of the game – one with and one without the diary. Although no significant positive impact of the tool was found, the study reveals several important and interesting findings regarding the chal-lenges of visualizing acquired knowledge in instructional software

    Education about organic food and sustainability in Denmark, Finland, Italy and Norway: Findings from the iPOPY project

    Get PDF
    This report presents results from “Work Package 4: Consumer perceptions, practices and learning” of the “innovative Public Organic food Procurement for Youth – iPOPY” project. The report gives an overview of organic food and sustainable development within national core curriculum and examples of education about organic food and sustainability in the participating European countries: Denmark, Finland, Italy and Norway

    Development of a virtual reality milling machine for knowledge learning and skill training

    Get PDF
    Current methods of training personnel on high cost machine tools involve the use of both classroom and hands on practical training. The practical training required the operation of costly equipment and the trainee has to be under close personnel supervision. The main aim of this project is to reduce the amount of practical training and its inherent cost, time, danger, personal injury risk and material requirements by utilising a virtual reality technology. In this study, an investigation into the use of Virtual reality for training operators and students to use the Milling Machine was carried out. The investigation has been divided into two sections: first the development of Milling Machine in the 3D virtual environment, where the real machine was re-constructed in the virtual space. This has been carried out by creating objects and assembling them together. The complete Milling machine was then properly modelled and rendered so it could be viewed from all viewpoints. The second section was to add motion to the virtual world. The machine was made of functions as for the real machine. This was achieved by attaching Superscape Control Language (SCL) to the objects. The developed Milling machine allows the users to choose the material, speed and feed rate. Upon activation, the virtual machine will be simulated to carry out the machining process and instantaneous data on the machined part can be generated. The results were satisfactory, the Milling Machine was modelled successfully and the machine was able to perform according to task set. Using the developed Virtual Model, the ability for training students and operators to use the Milling Machine has been achieved
    corecore