Search CORE

555 research outputs found

Machine Learning Models for High-dimensional Biomedical Data

Author
Publication venue
Publication date: 01/01/2018
Field of study

abstract: The recent technological advances enable the collection of various complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of the high-dimensional biomedical data creates the needs of new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover the patterns and improve the decision making. All the proposed methods can generalize to other industrial fields. The first topic of this dissertation focuses on the data clustering. Data clustering is often the first step for analyzing a dataset without the label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging, yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner. The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability. The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability. The last topic centers around the dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization for time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.Dissertation/ThesisDoctoral Dissertation Industrial Engineering 201

ASU Digital Repository

Extending the Finite Domain Solver of GNU Prolog

Author: Abreu Salvador
Diaz Daniel
Vincent Bloemen
Publication venue: HAL CCSD
Publication date: 01/01/2014
Field of study

International audienceThis paper describes three significant extensions for the Finite Domain solver of GNU Prolog. First, the solver now supports negative integers. Second, the solver detects and prevents integer overflows from occurring. Third, the internal representation of sparse domains has been redesigned to overcome its current limitations. The preliminary performance evaluation shows a limited slowdown factor with respect to the initial solver. This factor is widely counterbalanced by the new possibilities and the robustness of the solver. Furthermore these results are preliminary and we propose some directions to limit this overhead

HAL-Paris1

Scalable Techniques for Anomaly Detection

Author: Yadav Sandeep 1985-
Publication venue
Publication date: 14/03/2013
Field of study

Computer networks are constantly being attacked by malicious entities for various reasons. Network based attacks include but are not limited to, Distributed Denial of Service (DDoS), DNS based attacks, Cross-site Scripting (XSS) etc. Such attacks have exploited either the network protocol or the end-host software vulnerabilities for perpetration. Current network traffic analysis techniques employed for detection and/or prevention of these anomalies suffer from significant delay or have only limited scalability because of their huge resource requirements. This dissertation proposes more scalable techniques for network anomaly detection. We propose using DNS analysis for detecting a wide variety of network anomalies. The use of DNS is motivated by the fact that DNS traffic comprises only 2-3% of total network traffic reducing the burden on anomaly detection resources. Our motivation additionally follows from the observation that almost any Internet activity (legitimate or otherwise) is marked by the use of DNS. We propose several techniques for DNS traffic analysis to distinguish anomalous DNS traffic patterns which in turn identify different categories of network attacks. First, we present MiND, a system to detect misdirected DNS packets arising due to poisoned name server records or due to local infections such as caused by worms like DNSChanger. MiND validates misdirected DNS packets using an externally collected database of authoritative name servers for second or third-level domains. We deploy this tool at the edge of a university campus network for evaluation. Secondly, we focus on domain-fluxing botnet detection by exploiting the high entropy inherent in the set of domains used for locating the Command and Control (C&C) server. We apply three metrics namely the Kullback-Leibler divergence, the Jaccard Index, and the Edit distance, to different groups of domain names present in Tier-1 ISP DNS traces obtained from South Asia and South America. Our evaluation successfully detects existing domain-fluxing botnets such as Conficker and also recognizes new botnets. We extend this approach by utilizing DNS failures to improve the latency of detection. Alternatively, we propose a system which uses temporal and entropy-based correlation between successful and failed DNS queries, for fluxing botnet detection. We also present an approach which computes the reputation of domains in a bipartite graph of hosts within a network, and the domains accessed by them. The inference technique utilizes belief propagation, an approximation algorithm for marginal probability estimation. The computation of reputation scores is seeded through a small fraction of domains found in black and white lists. An application of this technique, on an HTTP-proxy dataset from a large enterprise, shows a high detection rate with low false positive rates

Texas A&M Repository

Recommended from our members

Adaptive intra refresh for robust wireless multi-view video

Author: Lawan Sagir
Publication venue: Brunel University London
Publication date: 01/01/2016
Field of study

This thesis was submitted for the award of PhD and was awarded by Brunel University LondonMobile wireless communication technology is a fast developing field and every day new mobile communication techniques and means are becoming available. In this thesis multi-view video (MVV) is also refers to as 3D video. Thus, the 3D video signals through wireless communication are shaping telecommunication industry and academia. However, wireless channels are prone to high level of bit and burst errors that largely deteriorate the quality of service (QoS). Noise along the wireless transmission path can introduce distortion or make a compressed bitstream lose vital information. The error caused by noise progressively spread to subsequent frames and among multiple views due to prediction. This error may compel the receiver to pause momentarily and wait for the subsequent INTRA picture to continue decoding. The pausing of video stream affects the user's Quality of Experience (QoE). Thus, an error resilience strategy is needed to protect the compressed bitstream against transmission errors. This thesis focuses on error resilience Adaptive Intra Refresh (AIR) technique. The AIR method is developed to make the compressed 3D video more robust to channel errors. The process involves periodic injection of Intra-coded macroblocks in a cyclic pattern using H.264/AVC standard. The algorithm takes into account individual features in each macroblock and the feedback information sent by the decoder about the channel condition in order to generate an MVV-AIR map. MVV-AIR map generation regulates the order of packets arrival and identifies the motion activities in each macroblock. Based on the level of motion activity contained in each macroblock, the MVV-AIR map classifies frames as high or low motion macroblocks. A proxy MVV-AIR transcoder is used to validate the efficiency of the generated MVV-AIR map. The MVV-AIR transcoding algorithm uses spatial and views downscaling scheme to convert from MVV to single view. Various experimental results indicate that the proposed error resilient MVV-AIR transcoder technique effectively improves the quality of reconstructed 3D video in wireless networks. A comparison of MVV-AIR transcoder algorithm with some traditional error resilience techniques demonstrates that MVV-AIR algorithm performs better in an error prone channel. Results of simulation revealed significant improvements in both objective and subjective qualities. No additional computational complexity emanates from the scheme while the QoS and QoE requirements are still fully met.Tertiary Institution Trust Fund (TETFund) of Nigeri

Brunel University Research Archive

Advanced Threat Intelligence: Interpretation of Anomalous Behavior in Ubiquitous Kernel Processes

Author: Luh Robert
Publication venue: Faculty of Computing, Engineering and Media
Publication date: 01/07/2019
Field of study

Targeted attacks on digital infrastructures are a rising threat against the confidentiality, integrity, and availability of both IT systems and sensitive data. With the emergence of advanced persistent threats (APTs), identifying and understanding such attacks has become an increasingly difficult task. Current signature-based systems are heavily reliant on fixed patterns that struggle with unknown or evasive applications, while behavior-based solutions usually leave most of the interpretative work to a human analyst. This thesis presents a multi-stage system able to detect and classify anomalous behavior within a user session by observing and analyzing ubiquitous kernel processes. Application candidates suitable for monitoring are initially selected through an adapted sentiment mining process using a score based on the log likelihood ratio (LLR). For transparent anomaly detection within a corpus of associated events, the author utilizes star structures, a bipartite representation designed to approximate the edit distance between graphs. Templates describing nominal behavior are generated automatically and are used for the computation of both an anomaly score and a report containing all deviating events. The extracted anomalies are classified using the Random Forest (RF) and Support Vector Machine (SVM) algorithms. Ultimately, the newly labeled patterns are mapped to a dedicated APT attacker–defender model that considers objectives, actions, actors, as well as assets, thereby bridging the gap between attack indicators and detailed threat semantics. This enables both risk assessment and decision support for mitigating targeted attacks. Results show that the prototype system is capable of identifying 99.8% of all star structure anomalies as benign or malicious. In multi-class scenarios that seek to associate each anomaly with a distinct attack pattern belonging to a particular APT stage we achieve a solid accuracy of 95.7%. Furthermore, we demonstrate that 88.3% of observed attacks could be identified by analyzing and classifying a single ubiquitous Windows process for a mere 10 seconds, thereby eliminating the necessity to monitor each and every (unknown) application running on a system. With its semantic take on threat detection and classification, the proposed system offers a formal as well as technical solution to an information security challenge of great significance.The financial support by the Christian Doppler Research Association, the Austrian Federal Ministry for Digital and Economic Affairs, and the National Foundation for Research, Technology and Development is gratefully acknowledged

De Montfort University Open Research Archive

Recent Developments in Smart Healthcare

Author: Tie Qiu (Ed.)
Wenbing Zhao (Ed.)
Xiong Luo (Ed.)
Publication venue: MDPI - Multidisciplinary Digital Publishing Institute
Publication date: 01/01/2018
Field of study

Medicine is undergoing a sector-wide transformation thanks to the advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, from disease focused to well-being centered. In essence, the healthcare systems, as well as fundamental medicine research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics to decision support for healthcare professionals through big data analytics, to support behavior changes through technology-enabled self-management, and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. In this special issue, we received a total 45 submissions and accepted 19 outstanding papers that roughly span across several interesting topics on smart healthcare, including public health, health information technology (Health IT), and smart medicine

Directory of Open Access Books (DOAB)

Sublinear Computation Paradigm

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2021
Field of study

This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms

Directory of Open Access Books (DOAB)

Intelligent Biosignal Processing in Wearable and Implantable Sensors

Author
Publication venue: 'MDPI AG'
Publication date: 06/07/2022
Field of study

This reprint provides a collection of papers illustrating the state-of-the-art of smart processing of data coming from wearable, implantable or portable sensors. Each paper presents the design, databases used, methodological background, obtained results, and their interpretation for biomedical applications. Revealing examples are brain–machine interfaces for medical rehabilitation, the evaluation of sympathetic nerve activity, a novel automated diagnostic tool based on ECG data to diagnose COVID-19, machine learning-based hypertension risk assessment by means of photoplethysmography and electrocardiography signals, Parkinsonian gait assessment using machine learning tools, thorough analysis of compressive sensing of ECG signals, development of a nanotechnology application for decoding vagus-nerve activity, detection of liver dysfunction using a wearable electronic nose system, prosthetic hand control using surface electromyography, epileptic seizure detection using a CNN, and premature ventricular contraction detection using deep metric learning. Thus, this reprint presents significant clinical applications as well as valuable new research issues, providing current illustrations of this new field of research by addressing the promises, challenges, and hurdles associated with the synergy of biosignal processing and AI through 16 different pertinent studies. Covering a wide range of research and application areas, this book is an excellent resource for researchers, physicians, academics, and PhD or master students working on (bio)signal and image processing, AI, biomaterials, biomechanics, and biotechnology with applications in medicine

Directory of Open Access Books (DOAB)

Virtual Machine Lifecycle Management in Grid and Cloud Computing

Author: Schwarzkopf Roland
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2015
Field of study

Virtualisierungstechnologie ist die Grundlage für zwei wichtige Konzepte: Virtualized Grid Computing und Cloud Computing. Ersteres ist eine Erweiterung des klassischen Grid Computing. Es hat zum Ziel, die Anforderungen kommerzieller Nutzer des Grid hinsichtlich der Isolation von gleichzeitig ausgeführten Batch-Jobs und der Sicherheit der zugehörigen Daten zu erfüllen. Dabei werden Anwendungen in virtuellen Maschinen ausgeführt, um sie voneinander zu isolieren und die von ihnen verarbeiteten Daten vor anderen Nutzern zu schützen. Darüber hinaus löst Virtualized Grid Computing das Problem der Softwarebereitstellung, eines der bestehenden Probleme des klassischen Grid Computing. Cloud Computing ist ein weiteres Konzept zur Verwendung von entfernten Ressourcen. Der Fokus dieser Dissertation bezüglich Cloud Computing liegt auf dem “Infrastructure as a Service Modell”, das Ideen des (Virtualized) Grid Computing mit einem neuartigen Geschäftsmodell kombiniert. Dieses besteht aus der Bereitstellung von virtuellen Maschinen auf Abruf und aus einem Tarifmodell, bei dem lediglich die tatsächliche Nutzung berechnet wird. Der Einsatz von Virtualisierungstechnologie erhöht die Auslastung der verwendeten (physischen) Rechnersysteme und vereinfacht deren Administration. So ist es beispielsweise möglich, eine virtuelle Maschine zu klonen oder einen Snapshot einer virtuellen Maschine zu erstellen, um zu einem definierten Zustand zurückkehren zu können. Jedoch sind noch nicht alle Probleme im Zusammenhang mit der Virtualisierungstechnologie gelöst. Insbesondere entstehen durch den Einsatz in den sehr dynamischen Umgebungen des Virtualized Grid Computing und des Cloud Computing neue Herausforderungen für die Virtualisierungstechnologie. Diese Dissertation befasst sich mit verschiedenen Aspekten des Einsatzes von Virtualisierungstechnologie in Virtualized Grid und Cloud Computing Umgebungen. Zunächst wird der Lebenszyklus von virtuellen Maschinen in diesen Umgebungen untersucht, und es werden Modelle dieses Lebenszyklus entwickelt. Anhand der entwickelten Modelle werden Probleme identifiziert und Lösungen für diese Probleme entwickelt. Der Fokus liegt dabei auf den Bereichen Speicherung, Bereitstellung und Ausführung von virtuellen Maschinen. Virtuelle Maschinen werden üblicherweise in so genannten Disk Images, also Abbildern von virtuellen Festplatten, gespeichert. Dieses Format hat nicht nur Einfluss auf die Speicherung von größeren Mengen virtueller Maschinen, sondern auch auf deren Bereitstellung. In den untersuchten Umgebungen hat es zwei konkrete Nachteile: es verschwendet Speicherplatz und es verhindert eine effiziente Bereitstellung von virtuellen Maschinen. Maßnahmen zur Steigerung der Sicherheit von virtuellen Maschinen haben auf alle drei genannten Bereiche Einfluss. Beispielsweise sollte vor der Bereitstellung einer virtuellen Maschine geprüft werden, ob die darin installierte Software noch aktuell ist. Weiterhin sollte die Ausführungsumgebung Möglichkeiten bereitstellen, um die virtuelle Infrastruktur wirksam zu überwachen. Die erste in dieser Dissertation vorgestellte Lösung ist das Konzept der Image Composition. Es beschreibt die Komposition eines kombinierten Disk Images aus mehreren Schichten. Dadurch können Teile der einzelnen Schichten, die von mehreren virtuellen Maschinen verwendet werden, zwischen diesen geteilt und somit der Speicherbedarf für die Gesamtheit der virtuellen Maschinen reduziert werden. Der Marvin Image Compositor ist die Umsetzung dieses Konzepts. Die zweite Lösung ist der Marvin Image Store, ein Speichersystem für virtuelle Maschinen, das nicht auf den traditionell genutzten Disk Images basiert, sondern die darin enthaltenen Daten und Metadaten auf eine effiziente Weise getrennt voneinander speichert. Weiterhin werden vier Lösungen vorgestellt, die die Sicherheit von virtuellen Maschine verbessern können: Der Update Checker ist eine Lösung, die es ermöglicht, veraltete Software in virtuellen Maschinen zu identifizieren. Dabei spielt es keine Rolle, ob die jeweilige virtuelle Maschine gerade ausgeführt wird oder nicht. Die zweite Sicherheitslösung ermöglicht es, mehrere virtuelle Maschinen, die auf dem Konzept der Image Composition basieren, zentral zu aktualisieren. Das bedeutet, dass die einmalige Installation einer neuen Softwareversion ausreichend ist, um mehrere virtuelle Maschinen auf den neuesten Stand zu bringen. Die dritte Sicherheitslösung namens Online Penetration Suite ermöglicht es, virtuelle Maschinen automatisiert nach Schwachstellen zu durchsuchen. Die Überwachung der virtuellen Infrastruktur auf allen Ebenen ist der Zweck der vierten Sicherheitslösung. Zusätzlich zur Überwachung ermöglicht diese Lösung auch eine automatische Reaktion auf sicherheitsrelevante Ereignisse. Schließlich wird ein Verfahren zur Migration von virtuellen Maschinen vorgestellt, welches auch ohne ein zentrales Speichersystem eine effiziente Migration ermöglicht

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg