Deep Multi-view Models for Glitch Classification

Author: Bahaadini Sara
Coughlin Scott
Kalogera Vicky
Katsaggelos Aggelos K
Rohani Neda
Zevin Michael
Publication date: 28/04/2017

Non-cosmic, non-Gaussian disturbances known as "glitches" show up in gravitational-wave data of the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO). In this paper, we propose a deep multi-view convolutional neural network to classify glitches automatically. The primary purpose of classifying glitches is to understand their characteristics and origin, which facilitates their removal from the data or from the detector entirely. We visualize glitches as spectrograms and leverage state-of-the-art image classification techniques in our model. The suggested classifier is a multi-view deep neural network that exploits four different views for classification. The experimental results demonstrate that the proposed model improves the overall classification accuracy compared to traditional single-view algorithms. Comment: Accepted to the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17)
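
The abstract does not say how the four views are merged inside the network. As a rough illustration only, the sketch below uses late fusion: each view produces its own class scores and the per-view probabilities are averaged. This is a swapped-in simplification, not the authors' architecture, and the logits, the three classes, and the function names are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(view_logits):
    """Late fusion: average the class probabilities produced by each view."""
    probs = [softmax(logits) for logits in view_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def predict(view_logits):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_views(view_logits)
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical logits for 3 glitch classes, one row per spectrogram view.
views = [
    [2.0, 0.5, 0.1],
    [1.5, 1.4, 0.2],
    [0.3, 2.2, 0.1],
    [1.8, 0.9, 0.4],
]
fused = fuse_views(views)
label = predict(views)
```

Here three of the four views favour class 0, so the fused prediction is class 0 even though the third view strongly prefers class 1.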

arXiv.org e-Print Archive

Crossref

Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

Author: Kuanar Sanjay Kumar
Kumar Lov
Misra Sanjay
Panigrahi Rasmita
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

Background: Refactoring is changing a software system without affecting its functionality. Current research aims to identify the appropriate method(s) or class(es) that need to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by combining different classifiers and their respective performances over the original feature data. This paper adds further considerations regarding ensemble learners, errors, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics, using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model selects the best classifier based on which achieves fewer errors during refactoring prediction at the class level. Methodology: First, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems, which are processed by a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed. Its base classifiers are ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, and ANN-Radial Basis Function; support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF); the Decision Tree algorithm; the Logistic Regression algorithm; and the extreme learning machine (ELM) model. We calculate four different errors: Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) than the base trained ensemble (BTE), and it experiences fewer errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) when used to develop the refactoring model. Conclusions: Our experimental results suggest that MVE with upsampling can be used to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques is shown in boxplot diagrams of accuracy, F-measure, precision, recall, and area under the curve (AUC).
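
To make the voting ensemble and the error measures concrete, here is a minimal sketch assuming that "maximum voting" means each sample receives the label chosen by most base classifiers. The three prediction lists are made-up toy data, and computing SEM over the absolute errors is an assumption, since the abstract does not define it.

```python
import math
import statistics
from collections import Counter

def majority_vote(predictions_per_model):
    """For each sample, return the label predicted by most base classifiers."""
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Toy data: 1 = class needs refactoring, 0 = it does not (hypothetical labels).
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

# An odd number of binary voters avoids ties.
ensemble = majority_vote([model_a, model_b, model_c])
abs_errors = [abs(t - p) for t, p in zip(y_true, ensemble)]
ensemble_mae = mae(y_true, ensemble)
ensemble_rmse = rmse(y_true, ensemble)
ensemble_sem = sem(abs_errors)
```

In this toy run every base classifier makes two mistakes, while the vote makes only one (the last sample), which is the kind of error reduction the abstract attributes to ensembling.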

Directory of Open Access Journals

HIØ Brage

Methods and Algorithms for Inference Problems in Population Genetics

Author: Pei Jingwen
Publication venue: OpenCommons@UConn
Publication date: 09/07/2018

Inference of population history is a central problem of population genetics. The advent of large genetic data sets brings not only opportunities to develop more accurate methods for inference problems, but also computational challenges. Thus, we aim to develop accurate methods and fast algorithms for problems in population genetics. Inference of admixture proportions is a classical statistical problem. We particularly focus on the problem of ancestry inference for ancestors. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e. a type of appropriation of an individual's admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome from a single individual, as a function of the admixture proportions of the ancestors of this individual. We show that the distribution and lengths of admixture tracts in a genome contain information about the admixture proportions of the ancestors of an individual. This allows us to perform probabilistic inference of admixture proportions of ancestors using only the genome of an extant individual. To better understand populations, we further study the species delimitation problem, which is the problem of determining the boundary between population and species. We propose a classification-based method to assign a set of populations to a number of species. Our new method uses summary statistics generated from genetic data to classify pairs of populations as either 'same species' or 'different species'. We show that machine learning can be used for species delimitation and scaled to large genomic data. It can also outperform Bayesian approaches, especially when gene flow is involved in the evolutionary process.
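
The sampling probability of a genome under an HMM can be computed with the standard forward algorithm. The sketch below is a generic two-state illustration, not the thesis's genealogical model: the hidden states stand for two ancestral populations, the initial distribution plays the role of admixture proportions, and every number is a made-up assumption.

```python
import itertools

def forward_likelihood(obs, init, trans, emit):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical parameters: two ancestral populations, biallelic sites (0 or 1).
init = [0.7, 0.3]                     # stand-in for admixture proportions
trans = [[0.95, 0.05], [0.10, 0.90]]  # ancestry switches between adjacent sites
emit = [[0.8, 0.2], [0.3, 0.7]]       # allele frequencies in each population

genome = [0, 0, 1, 1, 0]
likelihood = forward_likelihood(genome, init, trans, emit)

# Sanity check: the likelihoods of all possible genomes of this length sum to 1.
total = sum(
    forward_likelihood(list(seq), init, trans, emit)
    for seq in itertools.product([0, 1], repeat=len(genome))
)
```

Because the likelihood is a function of `init`, it can be evaluated for different candidate proportions and compared. For long sequences a practical implementation would rescale `alpha` or work in log space to avoid underflow.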

DigitalCommons@UConn

OpenCommons at University of Connecticut

Analysis of CT Brain Images using Radial Basis Function Neural Network

Author: Devadas T. Joshva
R. Ganesan
Publication venue: 'Defence Scientific Information and Documentation Centre'
Publication date: 01/07/2012

Medical image processing and analysis assists radiologists in the diagnosis process, yielding a more accurate and faster diagnosis. In this work, we have developed a neural network to classify computed tomography (CT) brain tumor images for automatic diagnosis. The system is divided into four steps: enhancement, segmentation, feature extraction, and classification. In the first phase, an edge-based selective median filter is used to improve the visibility of the loss of the gray-white matter interface in CT brain tumor images. The second phase uses a modified version of the shift genetic algorithm for segmentation. The next phase extracts textural features using a statistical texture analysis method. These features are fed into classifiers such as BPN, Fuzzy k-NN, and a radial basis function network. The performance of these classifiers is analyzed in the final phase with receiver operating characteristic and precision-recall curves. The CAD system is intended only as a supporting tool for brain tumor diagnosis, and the results show that the proposed method is very accurate, computationally more efficient, and less time consuming. Defence Science Journal, 2012, 62(4), pp.212-218, DOI:http://dx.doi.org/10.14429/dsj.62.183
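
The evaluation step can be illustrated with a small sketch that derives receiver operating characteristic and precision-recall points from classifier scores. The scores and labels are invented for illustration and do not come from the paper. The sketch assumes both classes are present in the labels.

```python
def confusion_at(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_pr_points(scores, labels):
    """Return (threshold, FPR, TPR/recall, precision) for each distinct score."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_at(scores, labels, threshold)
        tpr = tp / (tp + fn)        # recall, the y-axis of the ROC curve
        fpr = fp / (fp + tn)        # the x-axis of the ROC curve
        precision = tp / (tp + fp)  # at least one sample is predicted positive
        points.append((threshold, fpr, tpr, precision))
    return points

# Hypothetical tumor scores (higher = more likely tumor) and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_pr_points(scores, labels)
```

Plotting TPR against FPR gives the ROC curve, and plotting precision against TPR (recall) gives the precision-recall curve for the same classifier.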

Directory of Open Access Journals

Defence Science Journal

Genetic algorithms for hyperparameter optimization in predictive business process monitoring

Author: Di Francescomarino Chiara
Dumas Marlon
Federici Marco
Ghidini Chiara
Maggi Fabrizio Maria
Rizzi Williams
Simonetto Luca
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Predictive business process monitoring exploits event logs to predict how ongoing (uncompleted) traces will unfold up to their completion. A predictive process monitoring framework collects a range of techniques that allow users to get accurate predictions about the achievement of a goal for a given ongoing trace. These techniques can be combined and their parameters configured in different framework instances. Unfortunately, a unique framework instance that is general enough to outperform others for every dataset, goal or type of prediction is elusive. Thus, the selection and configuration of a framework instance needs to be done for a given dataset. This paper presents a predictive process monitoring framework armed with a hyperparameter optimization method to select a suitable framework instance for a given dataset
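
As a rough illustration of how a genetic algorithm can select a framework instance, the sketch below evolves configurations over a small discrete search space. The hyperparameter names, their values, and the toy `score` function (a stand-in for validation accuracy on a dataset) are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical search space for a framework instance.
SEARCH_SPACE = {
    "encoding": ["frequency", "simple_index", "complex_index"],
    "n_clusters": [1, 5, 10, 20],
    "max_depth": [2, 4, 8, 16],
}

def score(config):
    """Toy fitness; a real run would train and validate a predictive model."""
    penalty = abs(config["n_clusters"] - 10) / 20 + abs(config["max_depth"] - 8) / 16
    if config["encoding"] != "complex_index":
        penalty += 0.2
    return 1.0 - penalty

def genetic_search(space, fitness, pop_size=8, generations=15,
                   mutation_rate=0.2, seed=0):
    """Evolve configurations with elitism, uniform crossover, and mutation."""
    rng = random.Random(seed)
    keys = list(space)
    population = [{k: rng.choice(space[k]) for k in keys} for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: the best config always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice((a[k], b[k])) for k in keys}  # crossover
            for k in keys:
                if rng.random() < mutation_rate:                  # mutation
                    child[k] = rng.choice(space[k])
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best_config, best_history = genetic_search(SEARCH_SPACE, score)
```

Because the best configuration is carried over unchanged, `best_history` never decreases. The search is stochastic, so it is not guaranteed to reach the global optimum within a fixed number of generations.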

Archivio della ricerca - Fondazione Bruno Kessler

Advances in Evolutionary Algorithms

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

With the recent trends towards massive data sets and significant computational power, combined with evolutionary algorithmic advances, evolutionary computation is becoming much more relevant to practice. The aim of the book is to present recent improvements, innovative ideas, and concepts in a part of the vast EA field.

Directory of Open Access Books (DOAB)

Reconstrução e classificação de sequências de ADN desconhecidas

Author: Lourenço Alexandre Emanuel Monteiro
Publication date: 15/12/2021

The continuous advances in DNA sequencing technologies and techniques in metagenomics require reliable reconstruction and accurate classification methodologies to increase the diversity of the natural repository while contributing to the description and organization of organisms. However, after sequencing and de-novo assembly, one of the most complex challenges comes from the DNA sequences that do not match or resemble any biological sequence from the literature. Three main reasons contribute to this exception: the organism sequence is highly divergent from the known organisms in the literature, an irregularity has been created in the reconstruction process, or a new organism has been sequenced. The inability to efficiently classify these unknown sequences increases the uncertainty of the sample's constitution and becomes a wasted opportunity to discover new species, since they are often discarded. In this context, the main objective of this thesis is the development and validation of a tool that provides an efficient computational solution to these three challenges based on an ensemble of experts, namely compression-based predictors, the distribution of sequence content, and normalized sequence lengths. The method uses both DNA and amino acid sequences and provides efficient classification beyond standard referential comparisons. Unusually, it classifies DNA sequences without resorting directly to the reference genomes but rather to features that the species' biological sequences share. Specifically, it only makes use of features extracted individually from each genome without using sequence comparisons. Furthermore, the pipeline is fully automatic and enables reference-free reconstruction of genomes from FASTQ reads, with the additional guarantee of secure storage of sensitive information. RFSC was then created as a machine learning classification pipeline that relies on an ensemble of experts to provide efficient classification in metagenomic contexts. This pipeline was tested on synthetic and real data, in both cases achieving precise and accurate results that, at the time of the development of this thesis, had not been reported in the state of the art. Specifically, it has achieved an accuracy of approximately 97% in the domain/type classification. Master's in Computer and Telematics Engineering
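
To illustrate what reference-free, per-sequence features can look like, the sketch below computes three simple descriptors for a single DNA sequence: GC content, a compression ratio, and a normalized length. Using `zlib` as the compressor is a stand-in assumption for the specialized compression-based predictors mentioned in the abstract, and the function names and test sequences are hypothetical.

```python
import random
import zlib

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def compression_ratio(seq):
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = seq.upper().encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

def sequence_features(seq, max_length):
    """Features computed from one sequence alone, with no reference genome."""
    return {
        "gc_content": gc_content(seq),
        "compression_ratio": compression_ratio(seq),
        "normalized_length": len(seq) / max_length,
    }

# Hypothetical inputs: a highly repetitive sequence and a pseudo-random one.
rng = random.Random(0)
repetitive = "ACGT" * 250
shuffled = "".join(rng.choice("ACGT") for _ in range(1000))

features_repetitive = sequence_features(repetitive, max_length=1000)
features_shuffled = sequence_features(shuffled, max_length=1000)
```

Feature vectors like these could then be passed to any standard classifier. The repetitive sequence compresses far better than the pseudo-random one, which shows how compressibility can help separate sequence types without any alignment.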

Repositório Institucional da Universidade de Aveiro

Medical Informatics and Data Analysis

Publication venue: 'MDPI AG'
Publication date: 01/05/2021

During recent years, the use of advanced data analysis methods has increased in clinical and epidemiological research. This book emphasizes the practical aspects of new data analysis methods, and provides insight into new challenges in biostatistics, epidemiology, health sciences, dentistry, and clinical medicine. This book provides a readable text, giving advice on the reporting of new data analytical methods and data presentation. The book consists of 13 articles. Each article is self-contained and may be read independently according to the needs of the reader. The book is essential reading for postgraduate students as well as researchers from medicine and other sciences where statistical data analysis plays a central role

Directory of Open Access Books (DOAB)

Topics in Signal Processing: applications in genomics and genetics

Author: Elmas Abdulkadir
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

The information in genomic or genetic data is influenced by various complex processes and appropriate mathematical modeling is required for studying the underlying processes and the data. This dissertation focuses on the formulation of mathematical models for certain problems in genomics and genetics studies and the development of algorithms for proposing efficient solutions. A Bayesian approach for the transcription factor (TF) motif discovery is examined and the extensions are proposed to deal with many interdependent parameters of the TF-DNA binding. The problem is described by statistical terms and a sequential Monte Carlo sampling method is employed for the estimation of unknown parameters. In particular, a class-based resampling approach is applied for the accurate estimation of a set of intrinsic properties of the DNA binding sites. Through statistical analysis of the gene expressions, a motif-based computational approach is developed for the inference of novel regulatory networks in a given bacterial genome. To deal with high false-discovery rates in the genome-wide TF binding predictions, the discriminative learning approaches are examined in the context of sequence classification, and a novel mathematical model is introduced to the family of kernel-based Support Vector Machines classifiers. Furthermore, the problem of haplotype phasing is examined based on the genetic data obtained from cost-effective genotyping technologies. Based on the identification and augmentation of a small and relatively more informative genotype set, a sparse dictionary selection algorithm is developed to infer the haplotype pairs for the sampled population. In a relevant context, to detect redundant information in the single nucleotide polymorphism (SNP) sites, the problem of representative (tag) SNP selection is introduced. 
An information-theoretic heuristic is designed for the accurate selection of tag SNPs that capture the genetic diversity in a large sample set from multiple populations. The method is based on a multi-locus mutual information measure that reflects linkage disequilibrium, a biological principle in population genetics.
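
The dissertation's measure is multi-locus. As a simplified illustration, the sketch below computes ordinary pairwise mutual information between the alleles at two SNP sites across a set of haplotypes. The haplotype data are invented, and the pairwise form is an assumption made to keep the example short.

```python
import math
from collections import Counter

def mutual_information(locus_a, locus_b):
    """Mutual information in bits between the alleles observed at two loci."""
    n = len(locus_a)
    count_a = Counter(locus_a)
    count_b = Counter(locus_b)
    count_ab = Counter(zip(locus_a, locus_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        p_independent = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi

# Hypothetical alleles at three SNP sites across four haplotypes.
snp_1 = [0, 0, 1, 1]
snp_2 = [0, 0, 1, 1]  # perfectly linked with snp_1
snp_3 = [0, 1, 0, 1]  # statistically independent of snp_1

mi_linked = mutual_information(snp_1, snp_2)
mi_independent = mutual_information(snp_1, snp_3)
```

A site that shares high mutual information with its neighbours carries largely redundant information, so one of them can serve as the tag SNP for the group. Here `snp_1` fully predicts `snp_2` (1 bit) and says nothing about `snp_3` (0 bits).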

Columbia University Academic Commons