19 research outputs found

    Fast Cross-Validation via Sequential Testing

    With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which uses nonparametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. By eliminating underperforming candidates quickly and keeping promising candidates as long as possible, the method speeds up the computation while preserving the capability of the full cross-validation. Theoretical considerations underline the statistical power of our procedure. The experimental evaluation shows that our method reduces the computation time by a factor of up to 120 compared to a full cross-validation, with a negligible impact on accuracy.
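The racing idea described above — evaluating all candidates on linearly growing data subsets and dropping clear losers early — can be sketched in a few lines. This is a minimal illustration, not the paper's actual procedure: the names `sequential_cv` and `drop_margin` are invented here, and a fixed score margin stands in for the paper's nonparametric sequential tests.

```python
def sequential_cv(candidates, evaluate, data, steps=10, drop_margin=0.05):
    """Race candidate configurations on linearly increasing subsets of the
    data, eliminating candidates that fall behind the current best by more
    than drop_margin. Illustrative sketch only: the paper replaces the fixed
    margin with statistically controlled nonparametric tests."""
    alive = list(candidates)
    n = len(data)
    for step in range(1, steps + 1):
        subset = data[: max(1, n * step // steps)]  # linearly growing subset
        scores = {c: evaluate(c, subset) for c in alive}
        best = max(scores.values())
        # drop clearly underperforming candidates early
        alive = [c for c in alive if scores[c] >= best - drop_margin]
        if len(alive) == 1:
            break  # a significant winner remains
    # final decision among the survivors on the full data
    return max(alive, key=lambda c: evaluate(c, data))
```

With a cheap, deterministic `evaluate`, the weak candidates are eliminated on the very first (smallest) subset, which is where the speed-up over full cross-validation comes from.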

    ASAP: Automatic semantics-aware analysis of network payloads

    Automatic inspection of network payloads is a prerequisite for effective analysis of network communication. Security research has largely focused on network analysis using protocol specifications, for example for intrusion detection, fuzz testing and forensic analysis. The specification of a protocol alone, however, is often not sufficient for accurate analysis of communication, as it fails to reflect individual semantics of network applications. We propose a framework for semantics-aware analysis of network payloads which automatically extracts semantic components from recorded network traffic. Our method proceeds by mapping network payloads to a vector space and identifying semantic templates corresponding to base directions in the vector space. We demonstrate the efficacy of semantics-aware analysis in different security applications: automatic discovery of patterns in honeypot data, analysis of malware communication and network intrusion detection.
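The first step of the pipeline above — mapping raw payloads into a vector space — can be illustrated with a bag-of-tokens embedding. The functions `embed` and `dominant_templates` below are invented for this sketch: the paper derives semantic templates as base directions via matrix factorization of the payload matrix, which we crudely stand in for with global token frequencies.

```python
from collections import Counter

def embed(payloads, delimiters=" /?&=\r\n"):
    """Map each payload to a sparse token-count vector (bag of tokens)."""
    table = str.maketrans({d: " " for d in delimiters})
    return [Counter(p.translate(table).split()) for p in payloads]

def dominant_templates(vectors, k=1):
    """Crude stand-in for the factorization step: return the k globally most
    frequent tokens as 'template' components. The actual method recovers
    base directions of the embedded payload matrix instead."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return [tok for tok, _ in total.most_common(k)]
```

For two HTTP-like payloads, the shared method token dominates, hinting at why frequent co-occurring tokens end up defining a semantic template.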

    Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts

    Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is a high-throughput assay for DNA-protein binding or post-translational chromatin/histone modifications. However, the raw microarray intensity readings themselves are not immediately useful to researchers, but require a number of bioinformatic analysis steps. Identified enriched regions need to be bioinformatically annotated and compared to related datasets by statistical methods. Results: We present Ringo, a free, open-source R package that facilitates the analysis of ChIP-chip experiments by providing functionality for data import, quality assessment, normalization and visualization of the data, and the detection of ChIP-enriched genomic regions. Conclusion: Ringo integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It facilitates the construction of programmed analysis workflows, offers benefits in scalability, reproducibility and methodical scope of the analyses, and opens up a broad selection of follow-up statistical and bioinformatic methods.

    Signal and response properties indicate an optoacoustic effect underlying the intra-cochlear laser-optical stimulation

    Optical cochlea stimulation is under investigation as a potential alternative to conventional electric cochlea implants in the treatment of sensorineural hearing loss. If direct optical stimulation of spiral ganglion neurons (SGNs) were feasible, a smaller stimulation volume and, therefore, an improved frequency resolution could be achieved. However, it is unclear whether the mechanism of optical stimulation is based on direct neuronal stimulation or on optoacoustics. Animal studies on hearing vs. deafened guinea pigs already identified the optoacoustic effect as a potential mechanism for intra-cochlear optical stimulation. In order to characterize the optoacoustic stimulus more thoroughly, the acoustic signal along the beam path of a pulsed laser in water was quantified and compared to the neuronal response properties of hearing guinea pigs stimulated with the same laser parameters. Two pulsed laser systems were used for analyzing the influence of variable pulse duration, pulse energy, pulse peak power and absorption coefficient. Preliminary results of the experiments in water and in vivo suggest a similar dependency of response signals on the applied laser parameters: both datasets show an onset and offset signal at the beginning and the end of the laser pulse. Further, the resulting signal amplitude depends on the pulse peak power as well as the temporal development of the applied laser pulse. The data indicates the maximum of the first derivative of power as the decisive factor. In conclusion, our findings strengthen the hypothesis of optoacoustics as the underlying mechanism for optical stimulation of the cochlea. © SPIE 201
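The claim that the maximum of the first derivative of power is the decisive factor can be made concrete with a toy finite-difference computation. The function `max_power_slope` and the sample pulse shapes are illustrative assumptions, not the measured laser traces from the study.

```python
def max_power_slope(power, dt):
    """Return the maximum first derivative of a sampled laser power trace,
    approximated by forward differences. Toy sketch: real pulse shapes,
    sampling rates and units come from the measured laser systems."""
    return max((p2 - p1) / dt for p1, p2 in zip(power, power[1:]))
```

Comparing a fast-rising and a slow-rising pulse of the same peak power, the faster rise yields the larger maximum slope, and hence — under the optoacoustic hypothesis — the larger predicted acoustic amplitude.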

    Learning stateful models for network honeypots

    Attacks like call fraud and identity theft often involve sophisticated stateful attack patterns which, on top of normal communication, try to harm systems on a higher semantic level than usual attack scenarios. To detect these kinds of threats via specially deployed honeypots, at least a minimal understanding of the inherent state machine of a specific service is needed to lure potential attackers and to keep a communication going for a sufficiently large number of steps. To this end we propose PRISMA, a method for protocol inspection and state machine analysis, which infers a functional state machine and message format of a protocol from network traffic alone. We apply our method to three real-life network traces ranging from 10,000 up to 2 million messages of both binary and textual protocols. We show that PRISMA is capable of simulating complete and correct sessions based on the learned models. A case study on malware traffic reveals the different states of the execution, rendering PRISMA a valuable tool for malware analysis.
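The state-machine inference at the heart of the approach above can be sketched as estimating a first-order Markov model over message templates from observed sessions. The function `infer_state_machine` and the threshold `min_prob` are invented for this sketch; PRISMA additionally learns the message templates themselves from raw traffic and works with richer models.

```python
from collections import defaultdict

def infer_state_machine(sessions, min_prob=0.1):
    """Estimate a first-order Markov model over message templates from
    observed sessions, keeping only transitions whose estimated probability
    is at least min_prob. Toy sketch of the inference step only."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # pad with START/END so session boundaries become transitions too
        for a, b in zip(["START"] + session, session + ["END"]):
            counts[a][b] += 1
    machine = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        machine[state] = {b: c / total
                          for b, c in nxt.items() if c / total >= min_prob}
    return machine
```

Sampling walks from `START` through such a model is what allows complete, correct-looking sessions to be simulated — the property used to lure attackers in a honeypot.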

    Identification of Y-Box Binding Protein 1 As a Core Regulator of MEK/ERK Pathway-Dependent Gene Signatures in Colorectal Cancer Cells

    Transcriptional signatures are an indispensable source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease, as well as factors of predictive and prognostic value, have been deduced from such molecular portraits, e.g. in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes are lagging behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase pathway, computational prediction of regulatory elements in promoters of co-regulated genes, and chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties.

    The Cardiac Transcription Network Modulated by Gata4, Mef2a, Nkx2.5, Srf, Histone Modifications, and MicroRNAs

    The transcriptome, as the pool of all transcribed elements in a given cell, is regulated by the interaction between different molecular levels, involving epigenetic, transcriptional, and post-transcriptional mechanisms. However, many previous studies investigated each of these levels individually, and little is known about their interdependency. We present a systems biology study integrating mRNA profiles with DNA-binding events of key cardiac transcription factors (Gata4, Mef2a, Nkx2.5, and Srf), activating histone modifications (H3ac, H4ac, H3K4me2, and H3K4me3), and microRNA profiles obtained in wild-type and RNAi-mediated knockdown. Finally, we confirmed conclusions primarily obtained in cardiomyocyte cell culture in a time-course of cardiac maturation in mouse around birth. We provide insights into the combinatorial regulation by cardiac transcription factors and show that they can partially compensate each other's function. Genes regulated by multiple transcription factors are less likely to be differentially expressed in RNAi knockdown of one respective factor. In addition to the analysis of the individual transcription factors, we found that histone 3 acetylation correlates with Srf- and Gata4-dependent gene expression and is complementarily reduced in cardiac Srf knockdown. Further, we found that altered microRNA expression in Srf knockdown potentially explains up to 45% of indirect mRNA targets. Considering all three levels of regulation, we present an Srf-centered transcription network providing, on a single-gene level, insights into the regulatory circuits establishing respective mRNA profiles. In summary, we show the combinatorial contribution of four DNA-binding transcription factors in regulating the cardiac transcriptome and provide evidence that histone modifications and microRNAs modulate their functional consequence. This opens a new perspective to understand heart development and the complexity of cardiovascular disorders.

    Probabilistic Methods for Network Security: From Analysis to Response

    Computer networks face a growing threat, and new markets such as mobile computing and tablet PCs keep opening up new points of attack for cybercrime. Formal methods promise provable protection but are often costly to implement. To close the security gaps, more flexible and effective methods are therefore needed that can keep pace with the evolving threat potential. Starting from the individual steps of the security process, this thesis shows how techniques from probability theory and machine learning can be effectively combined and applied to support that process: in the area of analysis, statistical tests are used to find meaningful representations of network traffic that allow machine learning methods to be applied for pattern extraction. In combination with Markov models capturing the abstract state machine of a network service, both the dynamic and the static aspects of network communication can thus be modeled probabilistically. For attack detection, a fast model selection procedure is presented that finds the correct parameters for a learning algorithm far more quickly than conventional cross-validation. By learning sequentially on subsets of the training data, a probabilistic model of the behavior of the individual parameter configurations is built and continuously pruned by robust statistical tests until one configuration emerges as the significant winner. In the area of attack response, an intelligent web application firewall is introduced that uses specialized embeddings of the individual parts of an HTTP request to extract its malicious parts and replace them with benign content.
Anomaly detection methods together with statistical tests lead to a fully automated configuration from a given training set. In all of these solutions, a general principle similar to design patterns in software engineering could be applied profitably: by carefully decomposing the problem into independent layers, local problems can be solved efficiently. Starting with physical preprocessing, the raw data are prepared so that probabilistic preprocessing can embed them into suitable, specialized spaces. Building on this, the subsequent probabilistic modeling step represents the data with techniques from machine learning and statistics so that probabilistic inference can be performed. Open-source implementations of the most important parts of this work underline the need for freely available solutions for efficient further development and research in network security.

Today's computer networks are constantly under attack: apart from a continuous stream of network security threats like denial-of-service attacks or cross-site request forgeries, new markets like mobile computing and tablets open up new venues for fraudulent monetary exploitation. Thus, cybercrime is a steady driving force for constant malicious innovations which keep on increasing the security gap. While formal methods promise perfect protection, they are often hard to implement and very time-consuming. Hence, they cannot keep up with the pace of the modern threat landscape, and different, more flexible solutions have to be found. This thesis shows how methods from statistics and machine learning can improve the security cycle of analysis, detection of, and response to threats.
By carefully layering probabilistic methods and machine learning techniques, we find a design pattern similar to best practice in software engineering: dividing the overall problem modeling process into physical preprocessing, probabilistic preprocessing and probabilistic modeling, we arrive at solid solutions for pressing security problems. For the analysis of network security problems we devise a fully automated procedure for protocol inference and state machine analysis of arbitrary network traces. By first transforming the raw network traffic into session information and embedding the network messages into problem-dependent vector spaces via statistical feature selection tests, we are able to extract complete and correct message templates and abstract state machines of the underlying services, leveraging specialized, replicate-aware matrix factorization methods in combination with Markov models. To support the detection of network security threats we construct a fast model selection procedure which is capable of choosing the correct learning parameters while saving up to two orders of magnitude of computation time. Deconstructing the overall process into substeps in combination with robust testing procedures leads to a statistically controlled model selection process. We show the applicability of our concepts in the domain of intrusion response with an intelligent web application firewall which is able to "heal" malicious HTTP requests by cutting out suspicious tokens and replacing them with harmless counterparts, thereby actively protecting the web server. Open-source implementations of major parts of this thesis underline the necessity for freely available solutions to foster future development in the implementation-heavy domain of network computer security.
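The "healing" idea for the web application firewall — cut out suspicious tokens and replace them with harmless counterparts — can be sketched in its simplest form. The function `heal_request` and the whitelist-style check are assumptions made for this illustration: the thesis scores tokens with specialized embeddings and anomaly detectors rather than a plain vocabulary lookup.

```python
def heal_request(tokens, benign_vocab, replacement=""):
    """Replace every token of an HTTP request part that was never seen in
    benign training traffic with a harmless placeholder. Toy sketch: the
    actual system uses learned embeddings and anomaly scores per token
    instead of a simple set-membership test."""
    return [t if t in benign_vocab else replacement for t in tokens]
```

The healed request still reaches the web server, but with the suspicious token neutralized — the "active protection" aspect, as opposed to simply dropping the request.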