
    Evaluation of clustering results and novel cluster algorithms

    Cluster analysis is frequently performed in many application fields to find groups in data. For example, in medicine, researchers have used gene expression data to cluster patients suffering from a particular disease (e.g., breast cancer), in order to detect new disease subtypes. Many cluster algorithms and methods for cluster validation, i.e., methods for evaluating the quality of cluster analysis results, have been proposed in the literature. However, open questions about the evaluation of both clustering results and novel cluster algorithms remain. It has rarely been discussed whether a) interesting clustering results or b) promising performance evaluations of newly presented cluster algorithms might be over-optimistic, in the sense that these good results cannot be replicated on new data or in other settings. Such questions are relevant in light of the so-called "replication crisis"; in various research disciplines such as medicine, biology, psychology, and economics, many results have turned out to be non-replicable, casting doubt on the trustworthiness and reliability of scientific findings. This crisis has led to increasing popularity of "metascience". Metascientific studies analyze problems that have contributed to the replication crisis (e.g., questionable research practices), and propose and evaluate possible solutions. So far, metascientific studies have mainly focused on issues related to significance testing. In contrast, this dissertation addresses the reliability of a) clustering results in applied research and b) results concerning newly presented cluster algorithms in the methodological literature. Different aspects of this topic are discussed in three Contributions. The first Contribution presents a framework for validating clustering results on validation data. Using validation data is vital to examine the replicability and generalizability of results. 
While applied researchers sometimes use validation data to check their clustering results, our article is the first to review the different approaches in the literature and to structure them in a systematic manner. We demonstrate that many classical cluster validation techniques, such as internal and external validation, can be combined with validation data. Our framework provides guidance to applied researchers who wish to evaluate their own clustering results or the results of other teams on new data. The second Contribution applies the framework from Contribution 1 to quantify over-optimistic bias in the context of a specific application field, namely unsupervised microbiome research. We analyze over-optimism effects which result from the multiplicity of analysis strategies for cluster analysis and network learning. The plethora of possible analysis strategies poses a challenge for researchers who are often uncertain about which method to use. Researchers might be tempted to try different methods on their dataset and look for the method yielding the "best" result. If only the "best" result is selectively reported, this may cause "overfitting" of the method to the dataset and the result might not be replicable on validation data. We quantify such over-optimism effects for four illustrative types of unsupervised research tasks (clustering of bacterial genera, hub detection in microbial association networks, differential network analysis, and clustering of samples). Contributions 1 and 2 consider the evaluation of clustering results and thus adopt a metascientific perspective on applied research. In contrast, the third Contribution is a metascientific study about methodological research on the development of new cluster algorithms. This Contribution analyzes the over-optimistic evaluation and reporting of novel cluster algorithms. 
As an illustrative example, we consider the recently proposed cluster algorithm "Rock"; initially deemed promising, it later turned out not to be generally better than its competitors. We demonstrate how Rock can nevertheless appear to outperform competitors via optimization of the evaluation design, namely the data types used, the data characteristics, the algorithm’s parameters, and the choice of competing algorithms. The study is a cautionary tale that illustrates how easy it can be for researchers to claim apparent "superiority" of a new cluster algorithm. This, in turn, stresses the importance of strategies for avoiding the problems of over-optimism, such as neutral benchmark studies.
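
One simple way to check a clustering result on validation data is sketched below. It is an illustrative assumption, not the framework from Contribution 1: the discovery data are clustered with k-means, the discovery partition is transferred to the validation data by nearest-centroid assignment, the validation data are re-clustered independently, and the two partitions of the validation data are compared with the adjusted Rand index (ARI). A value near 1 suggests the structure replicates; a low value suggests it may not. The synthetic Gaussian blobs, the helper names (`make_blobs`, `kmeans`, `assign`, `adjusted_rand_index`), and the choice of k-means and ARI are all hypothetical.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_blob, centers, sd, seed):
    """Hypothetical synthetic data: 2-D Gaussian blobs around the given centers."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(n_per_blob)]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Cluster mean; keep the old centroid if the cluster became empty.
            updated.append(tuple(sum(coord) / len(members) for coord in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    def pairs(m):
        return m * (m - 1) / 2
    sum_ij = sum(pairs(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(pairs(v) for v in Counter(a).values())
    sum_b = sum(pairs(v) for v in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


centers = [(0.0, 0.0), (6.0, 6.0)]
discovery = make_blobs(50, centers, sd=1.0, seed=1)
validation = make_blobs(50, centers, sd=1.0, seed=2)

disc_centroids = kmeans(discovery, k=2)
transferred = assign(validation, disc_centroids)           # discovery result applied to new data
reclustered = assign(validation, kmeans(validation, k=2))  # fresh clustering of new data
score = adjusted_rand_index(transferred, reclustered)
print(f"ARI(transferred, re-clustered) = {score:.2f}")
```

ARI is used because it is invariant to how cluster labels are numbered. If several analysis strategies were tried on the discovery data and only the best-looking one was kept, repeating this check for that strategy is one way the over-optimism discussed in Contribution 2 could become visible.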

    Visual and Camera Sensors

    This book includes 13 papers published in the Special Issue "Visual and Camera Sensors" of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors.

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    By combining metal nodes with organic linkers, we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from both an experimental and a computational point of view, we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is devoted to the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different machine learning (ML) techniques, but as most applications in porous materials use supervised ML, our review focuses on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and to quantify the performance of the different methods. In the second part, we review how these ML approaches have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. This range illustrates the wide variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to expand rapidly in the coming years.
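
The review discusses how to quantify the performance of supervised ML methods. As a generic illustration that is not taken from the review, the sketch below estimates a model's mean absolute error (MAE) with k-fold cross-validation and compares it with a trivial baseline that always predicts the training mean. The two-dimensional "descriptors", the synthetic target property, and the k-nearest-neighbour regressor are hypothetical placeholders for real MOF feature vectors and models.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_regressor(train_X, train_y, x, n_neighbors=3):
    """Predict the mean target of the n_neighbors nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_baseline(train_X, train_y, x):
    """Ignore the features and predict the mean training target."""
    return sum(train_y) / len(train_y)


def cross_validated_mae(X, y, predict, k=5):
    """MAE of `predict`, scoring each point only with a model trained without it."""
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        errors.extend(abs(predict(train_X, train_y, X[i]) - y[i]) for i in test_idx)
    return sum(errors) / len(errors)


# Hypothetical data: 200 "materials" with two descriptors and a noisy target property.
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [3 * a + b + rng.gauss(0, 0.1) for a, b in X]

knn_mae = cross_validated_mae(X, y, knn_regressor)
baseline_mae = cross_validated_mae(X, y, mean_baseline)
print(f"5-fold MAE: kNN {knn_mae:.3f} vs. mean baseline {baseline_mae:.3f}")
```

Reporting the cross-validated error next to a naive baseline makes it harder to mistake a weak model for a good one; established libraries such as scikit-learn provide equivalent cross-validation utilities.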

    Advances in Image Processing, Analysis and Recognition Technology

    For many decades, researchers have been trying to make computers' analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools that are sometimes purely for entertainment but quite often significantly increase our safety. In fact, image processing algorithms have a particularly wide range of practical applications. Moreover, the rapid growth in computational power and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, creating a need for novel approaches.

    Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces

    The Social Web is undergoing continued evolution, changing the paradigm of information production, processing and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms have enabled people to find relevant information efficiently. However, noisy, false and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered along with information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery and can also improve user experience by communicating aspects of information reliability. This thesis first presents the results of a cross-disciplinary study of perceived reliability by reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes, such as source credibility, competence, influence and timeliness, is examined through different case studies. Results show that the perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces, are discussed with respect to their impact on the perception of information reliability in a range of application domains.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis (SFA). This methodology makes it possible to distinguish measurement error from systematic inefficiency in the estimation process, which in turn allows the main causes of inefficiency to be investigated. Several suggestions for improving efficiency are made for each hotel studied.
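
The separation of measurement error from systematic inefficiency can be made concrete with the textbook normal/half-normal production frontier. The abstract does not state which specification the paper uses, so the block below is an assumed illustration: $i$ indexes hotels, $t$ time periods, $y_{it}$ is output, and $\mathbf{x}_{it}$ holds (log) inputs.

```latex
\begin{aligned}
\ln y_{it} &= \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \varepsilon_{it},
  \qquad \varepsilon_{it} = v_{it} - u_{it},\\
v_{it} &\sim \mathcal{N}\!\left(0, \sigma_v^{2}\right)
  && \text{(symmetric noise, e.g.\ measurement error)},\\
u_{it} &\sim \mathcal{N}^{+}\!\left(0, \sigma_u^{2}\right), \quad u_{it} \ge 0
  && \text{(one-sided systematic inefficiency)},\\
\mathrm{TE}_{it} &= \exp\!\left(-u_{it}\right) \in (0, 1].
\end{aligned}
```

Because $v_{it}$ is symmetric and $u_{it}$ is one-sided, the distribution of the composed residual $\varepsilon_{it}$ indicates how much of a hotel's shortfall from the frontier is inefficiency rather than noise; hotels can then be ranked by their estimated $\mathrm{TE}_{it}$. To study causes of inefficiency, one common extension lets the mean of $u_{it}$ depend on hotel characteristics; whether the paper does this is not stated in the abstract.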