
    The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

    Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. Given this diversity, we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images, released as a training set for academic machine learning purposes and publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the remaining cases was established by follow-up examination, expert consensus, or in-vivo confocal microscopy.
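
    A hedged sketch of how the released training set is typically inspected before use, assuming the images and the accompanying HAM10000_metadata.csv file have been downloaded from the ISIC archive; the file name and the dx/dx_type column names follow the public release and are not stated in the abstract itself.

```python
# Minimal sketch: inspect the HAM10000 metadata before training.
# Assumes HAM10000_metadata.csv from the ISIC archive, with columns
# such as image_id, dx (diagnosis) and dx_type (ground-truth source).
import pandas as pd

meta = pd.read_csv("HAM10000_metadata.csv")

# Diagnostic categories of the pigmented-lesion cases.
print(meta["dx"].value_counts())

# How the ground truth was established: histopathology, follow-up,
# expert consensus, or in-vivo confocal microscopy.
print(meta["dx_type"].value_counts(normalize=True))
```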

    Comprehensible and Robust Knowledge Discovery from Small Datasets

    Knowledge Discovery in Databases (KDD) aims to extract useful knowledge from data. The data may represent a set of measurements from a real-world process or a set of input-output values of a simulation model. Two frequently conflicting requirements on the acquired knowledge are that it (1) summarizes the data as accurately as possible and (2) comes in a well-understandable form. Decision trees and subgroup discovery methods deliver knowledge summaries in the form of hyperrectangles, which are considered easy to understand. To demonstrate the importance of a comprehensible data summary, we study Decentralized Smart Grid Control (Dezentrale intelligente Netzsteuerung), a new system that implements demand response in power grids without substantial changes to the infrastructure. The conventional analysis of this system performed so far was limited to identical participants and therefore did not reflect reality sufficiently well. We run many simulations with different input values and apply decision trees to the resulting data. The resulting comprehensible data summaries yielded new insights into the behaviour of Decentralized Smart Grid Control. Decision trees make it possible to describe the system behaviour for all input combinations. Sometimes, however, one is not interested in partitioning the entire input space but in finding regions that lead to certain outputs (so-called subgroups). Existing subgroup discovery algorithms usually require large amounts of data to produce stable and accurate output, yet the data collection process is often costly. Our main contribution is improving subgroup discovery from datasets with few observations. Subgroup discovery in simulated data is known as scenario discovery. A frequently used algorithm for scenario discovery is PRIM (Patient Rule Induction Method). We propose REDS (Rule Extraction for Discovering Scenarios), a new procedure for scenario discovery: we first train an intermediate statistical model and use it to generate a large amount of new data for PRIM. We also describe the underlying statistical intuition. Experiments show that REDS performs much better than PRIM alone: it reduces the number of required simulation runs by 75% on average. With simulated data one has perfect knowledge of the input distribution, which is a prerequisite of REDS. To make REDS applicable to real-world measurement data, we combined it with sampling from an estimated multivariate distribution of the data. We evaluated the resulting method experimentally in combination with different data-generation methods, both for PRIM and for BestInterval, another representative subgroup discovery method. In most cases, our methodology increased the quality of the discovered subgroups.
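
    A minimal sketch of the REDS idea in Python, not the authors' implementation: fit an intermediate statistical model on the few available simulation runs, draw a large sample from the known input distribution, label it with the model, and run scenario discovery on the enlarged data set. The toy simulator and the greatly simplified PRIM-style peeling loop are illustrative assumptions.

```python
# REDS sketch: metamodel + synthetic data + PRIM-style peeling.
# The "simulator", input distribution, and peeling parameters are toy
# assumptions; real PRIM also includes pasting and a peeling trajectory.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Step 1: the few expensive simulation runs we actually have.
X_sim = rng.uniform(0, 1, size=(100, 3))
y_sim = ((X_sim[:, 0] > 0.6) & (X_sim[:, 1] < 0.4)).astype(float)

# Step 2: train the intermediate statistical model (metamodel).
metamodel = RandomForestRegressor(n_estimators=200, random_state=0)
metamodel.fit(X_sim, y_sim)

# Step 3: generate many new inputs from the known input distribution
# and label them with the metamodel instead of the simulator.
X_big = rng.uniform(0, 1, size=(50_000, 3))
y_big = metamodel.predict(X_big)

# Step 4: simplified PRIM-style peeling; shrink a hyperrectangle one
# 5% quantile slice at a time to maximise the mean output inside it.
box = np.array([[0.0, 1.0]] * X_big.shape[1])
inside = np.ones(len(X_big), dtype=bool)
while inside.mean() > 0.05:            # stop at 5% support
    best_gain, best_update = -np.inf, None
    for d in range(X_big.shape[1]):
        for side, q in ((0, 0.05), (1, 0.95)):
            cut = np.quantile(X_big[inside, d], q)
            keep = X_big[:, d] >= cut if side == 0 else X_big[:, d] <= cut
            trial = inside & keep
            if trial.any() and y_big[trial].mean() > best_gain:
                best_gain = y_big[trial].mean()
                best_update = (d, side, cut, trial)
    if best_update is None:
        break
    d, side, cut, inside = best_update
    box[d, side] = cut

print("discovered subgroup box:\n", box)
print("mean output inside box:", y_big[inside].mean())
```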

    An Examination of E-Banking Fraud Prevention and Detection in Nigerian Banks

    E-banking offers a number of advantages to financial institutions, including convenience in terms of time and money. However, criminal activity in the information age has changed the way banking operations are performed, making e-banking an area of interest. The growth of cybercrime, particularly hacking, identity theft, phishing, Trojans, denial-of-service attacks, and account takeover, has created several challenges for financial institutions, especially regarding how they protect their assets and prevent their customers from becoming victims of cyber fraud. These criminal activities have remained prevalent due to certain features of cyberspace, such as the borderless nature of the internet and the continuous growth of computer networks. Given these challenges, this study examines e-banking fraud prevention and detection in the Nigerian banking sector, in particular the current nature, impacts, contributing factors, and prevention and detection mechanisms of e-banking fraud in Nigerian banking institutions. The study adopts mixed research methods, with descriptive and inferential analysis comprising exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) for the quantitative data, and thematic analysis for the qualitative data. The theoretical framework was informed by Routine Activity Theory (RAT) and Fraud Management Lifecycle Theory (FMLT). The findings show that the factors contributing to the increase in e-banking fraud in Nigeria include ineffective banking operations, internal control issues, lack of customer awareness and of bank staff training and education, inadequate infrastructure, sophisticated technological tools in the hands of fraudsters, negligence by banks' customers concerning their e-banking account devices, lack of compliance with banking rules and regulations, and ineffective legal procedure and law enforcement. In addition, the enforcement of rules and regulations concerning the prosecution of financial fraudsters has been passive in Nigeria. The findings also show that the activities at each stage of the fraud management lifecycle are interdependent and have a collective and considerable influence on combating e-banking fraud, and they confirm that Routine Activity Theory is a viable theoretical framework when applied to e-banking fraud. From this analysis, the research offers a new model for e-banking fraud prevention and detection within the Nigerian banking sector. The model indicates that effective prevention and detection of e-banking fraud requires technological mechanisms, fraud monitoring, effective internal controls, customer complaint channels, whistle-blowing, surveillance mechanisms, staff and customer awareness and education, legal and judicial controls, and institutional synergy mechanisms within the banking system. Finally, the findings have significant implications not only for academic researchers, scholars, and accounting practitioners, but also for policymakers in financial institutions and anti-fraud agencies in both the private and public sectors.
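
    As a brief illustration of the quantitative side of the method (exploratory factor analysis), the sketch below runs EFA on synthetic Likert-style survey responses with scikit-learn; the data, item count, and factor count are assumptions for demonstration, not the study's instrument.

```python
# EFA sketch on synthetic questionnaire data: recover latent factors
# underlying correlated survey items. Purely illustrative data.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))          # two latent factors
loadings = rng.normal(size=(2, 10))         # ten questionnaire items
responses = latent @ loadings + rng.normal(scale=0.5, size=(300, 10))

fa = FactorAnalysis(n_components=2).fit(responses)
print(fa.components_.round(2))              # estimated item loadings
```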

    Computer vision and machine learning for medical image analysis: recent advances, challenges, and way forward.

    Recent developments in deep learning and deep convolutional neural networks have significantly advanced the field of computer vision (CV) and image analysis and understanding. Complex tasks such as classifying and segmenting medical images and localising and recognising objects of interest have become much less challenging. This progress has the potential to accelerate research on, and deployment of, the many medical applications that utilise CV. In reality, however, few practical examples are physically deployed in front-line health facilities. In this paper, we examine the current state of the art in CV as applied to the medical domain. We discuss the main challenges in CV and intelligent data-driven medical applications and suggest future directions to accelerate research, development, and deployment of CV applications in health practices. First, we critically review existing literature in the CV domain that addresses complex vision tasks, including medical image classification, shape and object recognition from images, and medical segmentation. Second, we present an in-depth discussion of the various challenges that are considered barriers to accelerating research, development, and deployment of intelligent CV methods in real-life medical applications and hospitals. Finally, we conclude by discussing future directions.
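
    As a hedged illustration of one vision task discussed above (semantic segmentation), the sketch below runs an off-the-shelf pretrained torchvision model; the random input stands in for a medical scan, and the weights identifier assumes a recent torchvision release.

```python
# Semantic segmentation sketch with a pretrained torchvision model.
# Random input stands in for a real medical image.
import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="COCO_WITH_VOC_LABELS_V1").eval()
image = torch.rand(1, 3, 256, 256)     # batch of one 256x256 RGB "scan"
with torch.no_grad():
    out = model(image)["out"]          # per-pixel class scores
mask = out.argmax(dim=1)               # predicted segmentation mask
print(mask.shape)                      # torch.Size([1, 256, 256])
```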

    Kekulescope: Prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images

    The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekule structure representations alone, by extending existing architectures that were pretrained on unrelated image data sets. We show that the predictive power of the generated models is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions shows that simply averaging the outputs of the RF models and ConvNets yields significantly lower prediction errors on multiple data sets than either model alone, although the effect size is small, indicating that the features extracted by the convolutional layers of the ConvNets provide predictive signal complementary to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images make it possible to model COX isoform selectivity on a continuous scale with prediction errors comparable to the uncertainty of the data. Overall, we present a set of ConvNet architectures for predicting compound activity from Kekule structure representations with state-of-the-art performance, requiring no generation of compound descriptors and no sophisticated image processing techniques.
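
    A minimal sketch of this kind of pipeline, assuming RDKit for rendering Kekule structures and torchvision for the ImageNet-pretrained backbone; the layer sizes, weights identifier, and example molecule are illustrative choices, not the paper's exact configuration.

```python
# Kekulescope-style sketch: image of a Kekule structure -> pretrained
# ConvNet with extra fully-connected layers -> continuous activity.
import torch
import torch.nn as nn
from torchvision import models, transforms
from rdkit import Chem
from rdkit.Chem import Draw

def smiles_to_tensor(smiles: str) -> torch.Tensor:
    """Render a compound's Kekule structure as a 224x224 RGB tensor."""
    mol = Chem.MolFromSmiles(smiles)
    Chem.Kekulize(mol, clearAromaticFlags=True)
    img = Draw.MolToImage(mol, size=(224, 224)).convert("RGB")
    return transforms.ToTensor()(img)

# Backbone pretrained on an unrelated image data set (ImageNet).
net = models.resnet18(weights="IMAGENET1K_V1")

# Replace the classifier head with additional fully-connected layers
# and a single regression output (continuous activity, e.g. pIC50).
net.fc = nn.Sequential(
    nn.Linear(net.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

x = smiles_to_tensor("CC(=O)Oc1ccccc1C(=O)O").unsqueeze(0)  # aspirin
print(net(x).shape)  # torch.Size([1, 1]); one predicted activity

# The ensembling step reported above is then just an average, e.g.
# (rf_prediction + convnet_prediction) / 2.
```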

    Interpretable machine learning for genomics

    High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
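
    A hedged sketch of one widely used model-agnostic iML technique (permutation importance) on synthetic gene-expression-style data; both the data and the choice of method are illustrative and not drawn from the article.

```python
# Permutation importance sketch: shuffle each feature and measure the
# drop in held-out accuracy; features whose shuffling hurts most are
# the ones the model relies on. Synthetic "gene expression" data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))                  # 500 samples x 50 genes
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # phenotype from genes 0, 1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:5]
print("most informative 'genes':", top)         # genes 0 and 1 should lead
```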