258 research outputs found

    Implementation of robust image artifact removal in SWarp through clipped mean stacking

    We implement an algorithm for detecting and removing artifacts from astronomical images by means of outlier rejection during stacking. Our method is capable of addressing both small, highly significant artifacts such as cosmic rays and, by applying a filtering technique to generate single-frame masks, larger-area but lower-surface-brightness features such as secondary (ghost) images of bright stars. In contrast to the common method of building a median stack, the clipped or outlier-filtered mean stacked point-spread function (PSF) is a linear combination of the single-frame PSFs as long as the latter are moderately homogeneous, a property of great importance for weak lensing shape measurement or model-fitting photometry. In addition, it has superior noise properties, allowing a significant reduction in exposure time compared to median stacking. We make publicly available a modified version of SWarp that implements clipped mean stacking and software to generate single-frame masks from the list of outlier pixels. Comment: PASP accepted; software for download at http://www.usm.uni-muenchen.de/~dgruen
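
    The idea of outlier rejection during stacking can be illustrated with a minimal sketch. This is not the SWarp implementation: the function name `clipped_mean_stack`, the `kappa` threshold, and the assumption of a known, uniform per-pixel noise `sigma` are all hypothetical choices made for illustration. Each pixel is compared with the median across frames, values that deviate too far are rejected and recorded, and the remaining values are averaged.

```python
from statistics import median


def clipped_mean_stack(frames, kappa=3.0, sigma=1.0):
    """Combine co-registered frames pixel by pixel with outlier rejection.

    frames: list of equally sized 2-D lists, one per exposure.
    kappa:  clipping threshold in units of sigma (hypothetical default).
    sigma:  per-pixel noise, assumed known and uniform in this sketch.
    Returns (stacked image, list of rejected (frame, row, col) pixels).
    """
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    stack = [[0.0] * n_cols for _ in range(n_rows)]
    outliers = []
    for r in range(n_rows):
        for c in range(n_cols):
            values = [frame[r][c] for frame in frames]
            centre = median(values)  # robust reference level for this pixel
            kept = []
            for i, v in enumerate(values):
                if abs(v - centre) > kappa * sigma:
                    outliers.append((i, r, c))  # input for a single-frame mask
                else:
                    kept.append(v)
            # Fall back to the median if every value was rejected.
            stack[r][c] = sum(kept) / len(kept) if kept else centre
    return stack, outliers


# Five flat 2x2 frames; one of them has a simulated cosmic-ray hit.
frames = [[[10.0, 10.0], [10.0, 10.0]] for _ in range(5)]
frames[2][0][1] = 500.0
stack, outliers = clipped_mean_stack(frames)
```

    Because the surviving values are averaged rather than ranked, the result stays a linear combination of the unrejected inputs, which is the property the abstract contrasts with median stacking. The returned outlier list plays the role of the list of outlier pixels from which single-frame masks would be built.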

    Anomaly detection as a tool for strengthening data analysis and prediction in education (La détection d'anomalies comme outil de renforcement d'analyse des données et de prédiction dans l'éducation)

    Educational institutions seek to design effective mechanisms that improve academic results, enhance the learning process, and prevent dropout. Analysing and predicting students' performance during their studies can reveal shortcomings in a programme of study and identify students with learning problems. This motivates the development of data-driven techniques and models that aim to enhance teaching and learning. 
Classical models usually ignore outlier students with uncommon and inconsistent characteristics, although these students may provide significant information to domain experts and affect the prediction models. Outliers in education are barely explored, and their impact on prediction models has not yet been studied in the literature. Thus, the thesis aims to investigate outliers in educational data and extend the existing knowledge about them. The thesis presents three case studies of outlier detection for different educational contexts and data representations (a numerical dataset for a German university, a numerical dataset for a Russian university, and a sequential dataset for French nursing schools). For each case, a data preprocessing approach is proposed that takes the peculiarities of the dataset into account. The prepared data were used to detect outliers under conditions of unknown ground truth. The characteristics of the detected outliers were explored and analysed, which extended the understanding of students' behaviour in the learning process. One of the main tasks in the educational domain is to develop essential tools that help improve academic results and reduce attrition. Thus, many studies aim to build performance prediction models that can detect students with learning problems who need special help. The second goal of the thesis is to study the impact of outliers on prediction models. The two most common prediction tasks in the educational field were considered: (i) dropout prediction and (ii) final score prediction. The prediction models were compared across different prediction algorithms and with and without outliers in the training data. This thesis opens new avenues for investigating students' performance in educational environments. Understanding outliers and the reasons for their appearance can help domain experts extract valuable information from the data. 
Outlier detection might become part of the pipeline in early warning systems for detecting students at high risk of dropping out. Furthermore, the behavioural tendencies of outliers can serve as a basis for providing recommendations to students in their studies or for making decisions about improving the educational process.

    D-AREdevil: a novel approach for discovering disease-associated rare cell populations in mass cytometry data

    Background: Advances in single-cell technologies such as mass cytometry provide increasing resolution of the complexity of cellular samples, allowing researchers to investigate and understand cellular heterogeneity more deeply and possibly to detect and discover previously undetectable rare cell populations. The identification of rare cell populations is of paramount importance for understanding the onset, progression and pathogenesis of many diseases. However, their identification remains challenging due to the ever-increasing dimensionality and throughput of the data generated. Aim: This study aimed to implement a straightforward approach that efficiently supports a data analyst in identifying disease-associated rare cell populations in large and complex biological samples within reasonable limits of time and computational infrastructure. Methods: We proposed a novel computational framework called D-AREdevil (disease-associated rare cells detection) for cytometry datasets. The main characteristic of our computational framework is the combination of an anomaly detection algorithm (i.e. LOF or FiRE) that provides a continuous score for individual cells with one of the best-performing and fastest unsupervised clustering methods (i.e. FlowSOM). In our approach, the LOF score serves to select a set of candidate cells belonging to one or more subgroups of similar rare cell populations. Then, we tested these subgroups of rare cells for association with a patient group, disease type, clinical outcome or other characteristic of interest. Results: We reported in this study the properties and implementation of D-AREdevil and presented an evaluation of its performance and applications on three different testing datasets based on mass cytometry data. 
We generated data mixed with one or more known rare cell populations at varying frequencies (below 1%) and tested the ability of our approach to identify those cells in order to bring them to the attention of the data analyst. This is a key step in the process of finding cell subgroups that are associated with a disease or outcome of interest when their existence is not previously known and has yet to be discovered. Conclusions: We proposed a novel computational framework with demonstrated good sensitivity and precision in detecting target rare cell populations present at very low frequencies in the total datasets (<1%).
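
    The candidate-selection step described in the Methods can be sketched with a small, self-contained local outlier factor (LOF) computation. This is an illustrative sketch, not the D-AREdevil code: the function names, the neighbourhood size `k`, and the score `threshold` are hypothetical, ties in neighbour distances are broken by index, and duplicate points are assumed absent. The clustering of candidates (FlowSOM in the abstract) and the association testing are not reproduced here.

```python
import math


def lof_scores(points, k=3):
    """Return a local outlier factor for every point (larger = more isolated)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])             # k nearest neighbours of i
        k_dist.append(dist[i][order[k - 1]])     # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def select_candidates(scores, threshold=1.5):
    """Indices of cells whose score exceeds a (hypothetical) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# A dense 4x4 grid of "cells" plus one isolated cell far from the rest.
cells = [(float(x), float(y)) for x in range(4) for y in range(4)] + [(10.0, 10.0)]
scores = lof_scores(cells, k=3)
candidates = select_candidates(scores)
```

    In this toy example only the isolated cell is selected. In a real analysis the selected cells would then be clustered into subgroups and each subgroup tested for association with the characteristic of interest.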

    Deep Weakly-supervised Anomaly Detection

    Anomaly detection is typically posited as an unsupervised learning task in the literature because of the prohibitive cost and difficulty of obtaining large-scale labeled anomaly data, but this ignores the fact that a very small number (e.g., a few dozen) of labeled anomalies can often be made available at small or trivial cost in many real-world anomaly detection applications. To leverage such labeled anomaly data, we study an important anomaly detection problem termed weakly-supervised anomaly detection, in which, in addition to a large amount of unlabeled data, a limited number of labeled anomalies are available during modeling. Learning with the small labeled anomaly data enables anomaly-informed modeling, which helps identify anomalies of interest and address the notoriously high false-positive rate of unsupervised anomaly detection. However, the problem is especially challenging, since (i) the limited amount of labeled anomaly data often, if not always, cannot cover all types of anomalies and (ii) the unlabeled data is often dominated by normal instances but has anomaly contamination. We address the problem by formulating it as a pairwise relation prediction task. In particular, our approach defines a two-stream ordinal regression neural network to learn the relation of randomly sampled instance pairs, i.e., whether the instance pair contains two labeled anomalies, one labeled anomaly, or just unlabeled data instances. The resulting model effectively leverages both the labeled and unlabeled data to substantially augment the training data and learn well-generalized representations of normality and abnormality. Comprehensive empirical results on 40 real-world datasets show that our approach (i) significantly outperforms four state-of-the-art methods in detecting both known and previously unseen anomalies and (ii) is substantially more data-efficient. Comment: Theoretical results are refined and extended. 
Significantly more empirical results are added, including results on detecting previously unknown anomalies
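
    The pairwise formulation can be made concrete with a short sketch of how training pairs might be drawn. This covers only the pair-sampling step and is an assumption-laden illustration, not the authors' code: the function name, the uniform choice among the three pair types, and the ordinal target values 2, 1 and 0 are hypothetical. The two-stream network that would be trained on these pairs is not shown.

```python
import random

# Hypothetical ordinal targets: number of labeled anomalies in the pair.
ORDINAL_TARGET = {"anomaly-anomaly": 2, "anomaly-unlabeled": 1, "unlabeled-unlabeled": 0}


def sample_pairs(anomalies, unlabeled, n_pairs, seed=0):
    """Draw random instance pairs and attach an ordinal relation target."""
    rng = random.Random(seed)
    kinds = tuple(ORDINAL_TARGET)
    pairs = []
    for _ in range(n_pairs):
        kind = rng.choice(kinds)  # assumption: the three pair types are equally likely
        if kind == "anomaly-anomaly":
            first, second = rng.sample(anomalies, 2)
        elif kind == "anomaly-unlabeled":
            first, second = rng.choice(anomalies), rng.choice(unlabeled)
        else:
            first, second = rng.sample(unlabeled, 2)
        pairs.append((first, second, ORDINAL_TARGET[kind]))
    return pairs


# Three labeled anomalies and twenty unlabeled instances (toy 2-D features).
anomalies = [(9.0, 9.0), (8.5, 9.5), (9.5, 8.0)]
unlabeled = [(float(i), 0.0) for i in range(20)]
pairs = sample_pairs(anomalies, unlabeled, n_pairs=50)
```

    Even with three labeled anomalies and twenty unlabeled instances, many distinct pairs are available, which illustrates how pairing can augment a small labeled set.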

    More planetary candidates from K2 Campaign 5 by tran_k2

    CONTEXT. The exquisite precision of the space-based photometric surveys and the unavoidable presence of instrumental systematics and intrinsic stellar variability call for the development of sophisticated methods that separate these signal components from those caused by planetary transits. AIMS. Here we introduce tran_k2, a stand-alone Fortran code to search for planetary transits under the colored noise of stellar variability and instrumental effects. With this code we perform a survey for new candidates. METHODS. Stellar variability is represented by a Fourier series and, if needed, by an autoregressive model to avoid excessive Gibbs overshoots at the edges. For the treatment of systematics, cotrending and external parameter decorrelation are employed, using cotrending stars with low stellar variability, the chip position and the background flux level at the target. The filtering is performed within the framework of standard weighted least squares, where the weights are determined iteratively to allow a robust fit and to separate the transit signal from stellar variability and systematics. Once the periods of the transit components are determined from the filtered data by the box-fitting least squares method, we reconstruct the full signal and determine the transit parameters with higher accuracy. This step greatly reduces the excessive attenuation of the transit depths and minimizes shape deformation. RESULTS. The code was tested on the field of Campaign 5 of the K2 mission. We detected 98% of the systems with all their candidate planets reported earlier by other authors, surveyed the whole field and discovered 15 new systems. An additional 3 planets were found in 3 multiplanetary systems, and 2 more planets were found in a previously known single-planet system. Comment: 1.15 Mb, 18 pages, after the 1st referee report from A&A
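
    The notion of weighted least squares with iteratively determined weights can be illustrated with a small sketch. It is written in Python rather than Fortran and is not the tran_k2 algorithm: a straight line stands in for the Fourier and cotrending terms, and the Cauchy-type weight function, the tuning constant `c`, and the function names are hypothetical choices. Points that deviate strongly from the current fit, such as in-transit points, are down-weighted so that they do not bias the trend.

```python
from statistics import median


def weighted_line_fit(t, y, w):
    """Weighted least-squares fit of y = intercept + slope * t."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    var = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    slope = cov / var
    return y_mean - slope * t_mean, slope


def robust_line_fit(t, y, n_iter=20, c=2.0):
    """Refit repeatedly, recomputing the weights from the latest residuals."""
    w = [1.0] * len(t)  # first pass is ordinary least squares
    for _ in range(n_iter):
        intercept, slope = weighted_line_fit(t, y, w)
        resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
        # Robust scale from the median absolute residual; guard against zero.
        scale = (1.4826 * median(abs(r) for r in resid)) or 1e-12
        w = [1.0 / (1.0 + (r / (c * scale)) ** 2) for r in resid]
    return intercept, slope, w


# Linear trend with two transit-like dips that should not bias the fit.
t = [float(i) for i in range(20)]
y = [1.0 + 0.5 * ti for ti in t]
y[5] -= 3.0
y[12] -= 3.0
intercept, slope, weights = robust_line_fit(t, y)
```

    After a few iterations the two dipped points carry almost no weight, so the fitted trend follows the remaining data and the dips survive in the residuals, where a transit search could then be run.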

    Ten years of active learning techniques and object detection: a systematic review

    Object detection (OD) coupled with active learning (AL) has emerged as a powerful synergy in the field of computer vision, harnessing the capabilities of machine learning (ML) to automatically identify and perform image-based objects localisation while actively engaging human expertise to iteratively enhance model performance and foster machine-based knowledge expansion. Their prior success, demonstrated in a wide range of fields (e.g., industry and medicine), motivated this work, in which a comprehensive and systematic review of OD and AL techniques was carried out, considering reputed technical/scientific publication databases—such as ScienceDirect, IEEE, PubMed, and arXiv—and a temporal range between 2010 and December 2022. The primary inclusion criterion for papers in this review was the application of AL techniques for OD tasks, regardless of the field of application. A total of 852 articles were analysed, and 60 articles were included after full screening. Among the remaining ones, relevant topics such as AL sampling strategies used for OD tasks and groups categorisation can be found, along with details regarding the deep neural network architectures employed, application domains, and approaches used to blend learning techniques with those sampling strategies. Furthermore, an analysis of the geographical distribution of OD researchers across the globe and their affiliated organisations was conducted, providing a comprehensive overview of the research landscape in this field. 
Finally, promising research opportunities to enhance the AL process were identified, including the development of novel sampling strategies and their integration with different learning techniques. This research was funded by the RRP - Recovery and Resilience Plan and the European NextGeneration EU Funds, within the scope of the Mobilizing Agendas for Business Innovation, under reference C644937233-00000047, and by the Vine&Wine Portugal Project, co-financed by the RRP - Recovery and Resilience Plan and the European NextGeneration EU Funds, within the scope of the Mobilizing Agendas for Reindustrialization, under reference C644866286-00000011

    Unshifted Metastable He I* Mini-Broad Absorption Line System in the Narrow Line Type 1 Quasar SDSS J080248.18+551328.9

    We report the identification of an unusual absorption line system in the quasar SDSS J080248.18+551328.9 and present a detailed study of the system, incorporating follow-up optical and NIR spectroscopy. A few tens of absorption lines are detected, including He I*, Fe II* and Ni II* that arise from metastable or excited levels, as well as resonant lines in Mg I, Mg II, Fe II, Mn II, and Ca II. All of the isolated absorption lines show the same profile of width $\Delta v \sim 1{,}500$ km s$^{-1}$ centered at the same redshift as the quasar emission lines, such as [O II], [S II], and the hydrogen Paschen and Balmer series. With narrow Balmer lines, strong optical Fe II multiplets, and weak [O III] doublets, its emission line spectrum is typical of a narrow-line Seyfert 1 galaxy (NLS1). We have derived reliable measurements of the gas-phase column densities of the absorbing ions/levels. Photoionization modeling indicates that the absorber has a density of $n_{\rm H} \sim (1.0-2.5)\times 10^{5}~{\rm cm}^{-3}$ and a column density of $N_{\rm H} \sim (1.0-3.2)\times 10^{21}~{\rm cm}^{-2}$, and is located at $R \sim 100-250$ pc from the central super-massive black hole. The location of the absorber, the symmetric profile of the absorption lines, and the coincidence of the absorption and emission line centroids jointly suggest that the absorbing gas originates from the host galaxy and is plausibly accelerated by stellar processes, such as stellar winds and/or supernova explosions. The implications of the detection of such a peculiar absorption line system in an NLS1 are discussed in the context of co-evolution between super-massive black hole growth and host galaxy build-up. Comment: 28 pages, 16 figures; accepted for publication in Astrophysical Journal