
    Molecular Signature as Optima of Multi-Objective Function with Applications to Prediction in Oncogenomics

    This work presents a theoretical introduction to, and the practical treatment of, the topic Molecular signature as optima of a multi-objective function with applications to prediction in oncogenomics. The opening chapters cover cancer, in particular breast cancer and its subtype Triple Negative Breast Cancer. A literature review of optimization methods follows, focused on meta-heuristic methods for multi-objective optimization and on machine learning. A further part addresses oncogenomics and the principles of microarrays, as well as statistical methods, with emphasis on the calculation of the p-value and the Bimodality Index. The practical part describes the research carried out and the conclusions reached, which point to the next steps of the research. The selected methods were implemented in Matlab and R, with additional use of the programming languages Java and Python.
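
    As a rough illustration of the statistic this abstract highlights, the sketch below estimates the Bimodality Index of a single gene's expression values by fitting a two-component Gaussian mixture. It assumes the common definition BI = sqrt(p(1 - p)) * |mu1 - mu2| / sigma with a shared component standard deviation; the toy data and the scikit-learn fit are illustrative, not the thesis's Matlab/R implementation.

```python
# Illustrative sketch (not the thesis code): Bimodality Index of one gene's
# expression vector, assuming the common definition
#   BI = sqrt(p * (1 - p)) * |mu1 - mu2| / sigma
# with a two-component Gaussian mixture sharing one standard deviation.
import numpy as np
from sklearn.mixture import GaussianMixture

def bimodality_index(expr):
    """Estimate the Bimodality Index for a 1-D array of expression values."""
    x = np.asarray(expr, dtype=float).reshape(-1, 1)
    # Equal-variance ('tied') two-component mixture, as the BI definition assumes.
    gmm = GaussianMixture(n_components=2, covariance_type="tied",
                          random_state=0).fit(x)
    mu1, mu2 = gmm.means_.ravel()
    sigma = np.sqrt(gmm.covariances_.ravel()[0])   # shared standard deviation
    p = gmm.weights_[0]                            # proportion in component 1
    delta = abs(mu1 - mu2) / sigma                 # standardized mean separation
    return np.sqrt(p * (1.0 - p)) * delta

# Toy example: a clearly bimodal gene versus a unimodal one.
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(4, 1, 60), rng.normal(9, 1, 40)])
unimodal = rng.normal(6, 1, 100)
print(bimodality_index(bimodal), bimodality_index(unimodal))
```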

    A Perspective on Future Research Directions in Information Theory

    Information theory is rapidly approaching its 70th birthday. What are promising future directions for research in information theory? Where will information theory be having the most impact in 10-20 years? What new and emerging areas are ripe for the most impact, of the sort that information theory has had on the telecommunications industry over the last 60 years? How should the IEEE Information Theory Society promote high-risk new research directions and broaden the reach of information theory, while continuing to be true to its ideals and insisting on the intellectual rigor that makes its breakthroughs so powerful? These are some of the questions that an ad hoc committee (composed of the present authors) explored over the past two years. We have discussed and debated these questions, and solicited detailed inputs from experts in fields including genomics, biology, economics, and neuroscience. This report is the result of these discussions.

    Modern Computing Techniques for Solving Genomic Problems

    With the advent of high-throughput genomics, biological big data brings challenges to scientists in handling, analyzing, processing and mining this massive data. In this new interdisciplinary field, diverse theories, methods, tools and knowledge are utilized to solve a wide variety of problems. As an exploration, this dissertation project is designed to combine concepts and principles from multiple areas, including signal processing, information-coding theory, artificial intelligence and cloud computing, in order to solve the following problems in computational biology: (1) comparative gene structure detection, (2) DNA sequence annotation, and (3) investigation of CpG islands (CGIs) for epigenetic studies. Briefly, in problem #1, sequences are transformed into signal series or binary codes. As in speech/voice recognition, similarity is calculated between two signal series, and the signals are subsequently stitched/matched into a temporal sequence. Because the operations are binary, all calculations can be performed efficiently and accurately; improving accuracy and specificity is the key for a comparative method. In problem #2, DNA sequences are encoded and transformed into numeric representations for deep learning methods. Encoding schemes greatly influence the performance of deep learning algorithms, so finding the best encoding scheme for a particular application is important. Three applications (detection of protein-coding splicing sites, detection of lincRNA splicing sites, and improvement of comparative gene structure identification) are used to show the computing power of deep neural networks. In problem #3, CpG sites are assigned a certain energy and a Gaussian filter is applied to detect CpG islands. Using the CpG box and a Markov model, we investigate the properties of CGIs and redefine them using emerging epigenetic data. In summary, these three problems and their solutions are not isolated; they are linked to modern techniques in such diverse areas as signal processing, information-coding theory, artificial intelligence and cloud computing. These novel methods are expected to improve the efficiency and accuracy of computational tools and bridge the gap between biology and scientific computing.
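
    A minimal sketch of the flavor of problem #3, under assumed parameters rather than the dissertation's: mark CpG dinucleotide positions as a binary signal and smooth it with a Gaussian filter so that CpG-dense stretches stand out. The window width and threshold below are arbitrary choices for illustration.

```python
# Minimal sketch (assumed parameters, not the dissertation's): turn CpG sites
# into a binary signal and Gaussian-smooth it to expose CpG-dense regions.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def cpg_density(seq, sigma=50):
    """Return a per-base CpG density profile for a DNA string."""
    seq = seq.upper()
    signal = np.zeros(len(seq))
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            signal[i] = 1.0                      # mark the start of each CpG
    return gaussian_filter1d(signal, sigma=sigma)

seq = "AT" * 500 + "CG" * 200 + "TA" * 500        # toy sequence with a CpG-rich block
density = cpg_density(seq)
island = density > 0.2                            # arbitrary threshold for illustration
print("putative CGI span:", island.argmax(), len(island) - island[::-1].argmax())
```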

    High-throughput DNA sequence data compression


    Feature Selection and Classifier Development for Radio Frequency Device Identification

    The proliferation of simple and low-cost devices, such as IEEE 802.15.4 ZigBee and Z-Wave, in Critical Infrastructure (CI) increases security concerns. Radio Frequency Distinct Native Attribute (RF-DNA) Fingerprinting facilitates biometric-like identification of electronic devices from hardware-induced variances in their emissions. Developing reliable classifier models using RF-DNA fingerprints is thus important for device discrimination, enabling reliable Device Classification (a one-to-many "looks most like" assessment) and Device ID Verification (a one-to-one "looks how much like" assessment). AFIT's prior RF-DNA work focused on Multiple Discriminant Analysis/Maximum Likelihood (MDA/ML) and Generalized Relevance Learning Vector Quantized Improved (GRLVQI) classifiers. This work 1) introduces a new GRLVQI-Distance (GRLVQI-D) classifier that extends prior GRLVQI work by supporting alternative distance measures, 2) formalizes a framework for selecting competing distance measures for GRLVQI-D, 3) introduces response surface methods for optimizing GRLVQI and GRLVQI-D algorithm settings, 4) develops an MDA-based Loadings Fusion (MLF) Dimensional Reduction Analysis (DRA) method for improved classifier-based feature selection, 5) introduces the F-test as a DRA method for RF-DNA fingerprints, 6) provides a phenomenological understanding of test statistics and p-values, with KS-test and F-test statistic values proving superior to p-values for DRA, and 7) introduces quantitative dimensionality assessment methods for DRA subset selection.
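
    As a hedged illustration of the DRA idea in items 5 and 6 (not AFIT's pipeline or data), the sketch below scores each fingerprint feature with a one-way ANOVA F statistic across device classes and keeps a top-ranked subset, ranking by the statistic itself rather than its p-value, as the abstract recommends.

```python
# Illustrative sketch only: rank fingerprint features by the one-way ANOVA
# F statistic across device classes and keep a top-k subset (DRA-style).
import numpy as np
from scipy.stats import f_oneway

def rank_features_by_f(X, y, k=10):
    """X: (n_samples, n_features) fingerprints; y: device labels. Return top-k feature indices."""
    classes = np.unique(y)
    f_stats = []
    for j in range(X.shape[1]):
        groups = [X[y == c, j] for c in classes]
        f_stat, _p = f_oneway(*groups)       # rank by the statistic, not the p-value
        f_stats.append(f_stat)
    return np.argsort(f_stats)[::-1][:k]

# Toy data: 3 "devices", 60 samples, 50 features, a few of them discriminative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 50))
X[:, :3] += y[:, None] * 1.5                 # make features 0-2 class-dependent
print(rank_features_by_f(X, y, k=5))
```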

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have become very widely used in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    Compression-based pattern recognition: a biometrics example using ECG

    The amount of data collected by the sensors and smart devices that people use in their daily lives has been increasing at higher rates than ever before. This makes it possible to use biomedical signals in several practical applications, with the aid of pattern recognition algorithms. In this thesis we investigate the use of compression-based methods to perform classification of one-dimensional signals. To test these methods, we use electrocardiographic (ECG) signals and the task of biometric identification as a testbed. First, we introduce the notion of Kolmogorov complexity and how it relates to compression methods. We then explain how these methods can be useful for pattern recognition by exploring different compression-based measures, namely the Normalized Relative Compression (NRC), a measure based on the relative similarity between strings. For this purpose, we present finite-context models and the theory behind a generalized version of those models, the extended-alphabet finite-context models (xaFCM), a novel contribution. Since the testbed application is based on ECG signals, we also explain what constitutes such a signal and the processing that should be applied before data compression, such as filtering and quantization. Finally, we explore the application of biometric identification using the ECG signal in more depth, running tests on signal acquisition and benchmarking different proposals based on compression methods, namely non-fiducial ones. We also highlight the advantages of this alternative to machine learning methods, namely its low computational cost and the fact that it requires no feature extraction, which makes the approach easily transferable to different applications and signals.
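
    As a rough illustration of the Normalized Relative Compression idea, the sketch below approximates C(x||y), the number of bits a model built from y needs to describe x, by how much appending x to y increases the compressed size, and normalizes by |x| * log2(alphabet size). zlib is used as a stand-in compressor; it is not the thesis's extended-alphabet finite-context models, and the toy strings are made up.

```python
# Rough illustration only: approximate the Normalized Relative Compression
# NRC(x||y) = C(x||y) / (|x| * log2(|alphabet|)) with zlib as a stand-in
# compressor instead of the thesis's extended-alphabet finite-context models.
import math
import zlib

def nrc(x: bytes, y: bytes, alphabet_size: int = 256) -> float:
    """Smaller values mean x is better explained by (more similar to) y."""
    c_y = len(zlib.compress(y, 9))
    c_yx = len(zlib.compress(y + x, 9))
    c_x_given_y = max(c_yx - c_y, 0) * 8          # extra bits needed for x after y
    return c_x_given_y / (len(x) * math.log2(alphabet_size))

# Toy example: a "query" segment compared against two reference streams.
query = b"0101101101010110" * 50
same_source = b"0101101101010110" * 80
other_source = b"9823749823749877" * 80
print(nrc(query, same_source), nrc(query, other_source))
```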

    Principal Graph and Structure Learning Based on Reversed Graph Embedding

    © 2017 IEEE. Many scientific datasets are of high dimension, and their analysis usually requires retaining the most important structures in the data. The principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that can be mathematically formulated as curves, which is quite restrictive for real applications. A few methods can overcome this problem, but they either require complicated hand-crafted rules for a specific task, with little flexibility to adapt to different tasks, or cannot obtain explicit structures from the data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that simultaneously learns a set of principal points and a graph structure from data. The new algorithm is simple, with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real-world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
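
    A crude sketch of the kind of output such a framework produces, not the paper's reversed-graph-embedding algorithm: pick a small set of principal points with k-means and connect them with a minimum spanning tree over their pairwise distances. The number of points and the toy data are arbitrary.

```python
# Crude illustration only, not the paper's reversed-graph-embedding algorithm:
# choose principal points with k-means and join them by a minimum spanning tree.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def principal_tree(X, n_points=10, random_state=0):
    """Return (principal points, list of tree edges) for a data cloud X."""
    km = KMeans(n_clusters=n_points, n_init=10, random_state=random_state).fit(X)
    centers = km.cluster_centers_
    dist = cdist(centers, centers)                # pairwise distances between principal points
    mst = minimum_spanning_tree(dist).toarray()   # dense (n_points, n_points) edge weights
    edges = list(zip(*np.nonzero(mst)))
    return centers, edges

# Toy data: noisy points along a curve.
t = np.linspace(0, 2 * np.pi, 500)
X = np.column_stack([t, np.sin(t)]) + np.random.default_rng(0).normal(0, 0.05, (500, 2))
centers, edges = principal_tree(X, n_points=8)
print(edges)
```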

    Advanced Imaging Analysis for Predicting Tumor Response and Improving Contour Delineation Uncertainty

    A dissertation by Rebecca Nichole Mahon, MS, submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University, 2018. Major Director: Dr. Elisabeth Weiss, Professor, Department of Radiation Oncology. Radiomics, an advanced form of imaging analysis, is a growing field of interest in medicine. Radiomics seeks to extract quantitative information from images through computer vision techniques in order to improve treatment. Early prediction of treatment response is one way to improve overall patient care. This work explores the feasibility of building predictive models from radiomic texture features extracted from magnetic resonance (MR) and computed tomography (CT) images of lung cancer patients. First, repeatable primary tumor texture features from each imaging modality were identified to ensure that a sufficient number of repeatable features existed for model development. A workflow was then developed to build models that predict overall survival and local control from single-modality and multi-modality radiomic features. The workflow was also applied to normal tissue contours as a control study. Multiple significant models were identified for the single-modality MR- and CT-based models, while the multi-modality models were promising, indicating that exploration with a larger cohort is warranted. Another way advances in imaging analysis can be leveraged is in improving the accuracy of contours. Unfortunately, the tumor can appear similar to normal tissue on medical images, creating high uncertainty in the tumor boundary. Because the entire defined target is treated, providing physicians with additional information when delineating the target volume can improve the accuracy of the contour and potentially reduce the amount of normal tissue included in it. Convolutional neural networks were developed and trained to identify the tumor interface with normal tissue, with one network also identifying the tumor location. A mock tool was presented that uses the network output to provide the physician with the uncertainty in the prediction of the interface type and the probability of the contour delineation uncertainty exceeding 5 mm for the top three predictions.
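
    A minimal, hedged sketch of the flavor of the first part of such a workflow (illustrative only: the feature set, toy images and outcome labels below are assumptions, not the dissertation's radiomics features or patient cohort): compute a few first-order texture features inside a tumor mask and fit a simple classifier for a binary outcome.

```python
# Illustrative sketch only (not the dissertation's features, models or data):
# first-order "texture" features from a masked tumor region, fed to a simple
# classifier predicting a binary outcome such as local control.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.linear_model import LogisticRegression

def first_order_features(image, mask):
    """Return mean, std, skewness, kurtosis and histogram entropy inside the mask."""
    voxels = image[mask > 0].astype(float)
    hist, _ = np.histogram(voxels, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return np.array([voxels.mean(), voxels.std(), skew(voxels), kurtosis(voxels), entropy])

# Toy cohort: 40 random "images" with square masks and made-up outcome labels.
rng = np.random.default_rng(2)
features, labels = [], []
for i in range(40):
    img = rng.normal(100 + 5 * (i % 2), 10 + 3 * (i % 2), size=(64, 64))
    mask = np.zeros((64, 64))
    mask[16:48, 16:48] = 1
    features.append(first_order_features(img, mask))
    labels.append(i % 2)

model = LogisticRegression(max_iter=1000).fit(np.array(features), np.array(labels))
print(model.score(np.array(features), np.array(labels)))
```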