730 research outputs found
Molecular Signature as Optima of Multi-Objective Function with Applications to Prediction in Oncogenomics
This thesis provides a theoretical introduction to, and a practical treatment of, the topic "Molecular signature as optima of a multi-objective function with applications to prediction in oncogenomics". The opening chapters cover cancer, in particular breast cancer and its subtype, triple-negative breast cancer. A literature review of optimization methods follows, focusing on meta-heuristic methods for multi-objective optimization and on machine learning. Further sections address oncogenomics and the principles of microarrays, as well as statistical methods, with emphasis on the calculation of the p-value and the Bimodality Index. The practical part describes the research carried out and its conclusions, which point to the next steps of the work. The selected methods were implemented in Matlab and R, with additional components written in Java and Python.
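The Bimodality Index mentioned above has a simple closed form for a two-component, equal-variance Gaussian mixture: BI = sqrt(pi*(1-pi)) * delta, with delta = |mu1 - mu2| / sigma. A minimal sketch (a generic illustration, not the thesis's Matlab/R code):

```python
import numpy as np

def bimodality_index(mu1, mu2, sigma, pi):
    """Bimodality Index for a two-component, equal-variance Gaussian
    mixture: BI = sqrt(pi * (1 - pi)) * delta, where
    delta = |mu1 - mu2| / sigma is the standardized separation
    between the two component means."""
    delta = abs(mu1 - mu2) / sigma
    return np.sqrt(pi * (1 - pi)) * delta
```

For example, two equally weighted components two standard deviations apart give BI = 1.0; genes with a high BI are candidates for bimodally expressed markers.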
A Perspective on Future Research Directions in Information Theory
Information theory is rapidly approaching its 70th birthday. What are
promising future directions for research in information theory? Where will
information theory be having the most impact in 10-20 years? What new and
emerging areas are ripe for the most impact, of the sort that information
theory has had on the telecommunications industry over the last 60 years? How
should the IEEE Information Theory Society promote high-risk new research
directions and broaden the reach of information theory, while continuing to be
true to its ideals and insisting on the intellectual rigor that makes its
breakthroughs so powerful? These are some of the questions that an ad hoc
committee (composed of the present authors) explored over the past two years.
We have discussed and debated these questions, and solicited detailed inputs
from experts in fields including genomics, biology, economics, and
neuroscience. This report is the result of these discussions.
Modern Computing Techniques for Solving Genomic Problems
With the advent of high-throughput genomics, biological big data brings challenges to scientists in handling, analyzing, processing and mining this massive data. In this new interdisciplinary field, diverse theories, methods, tools and knowledge are utilized to solve a wide variety of problems. As an exploration, this dissertation project is designed to combine concepts and principles from multiple areas, including signal processing, information-coding theory, artificial intelligence and cloud computing, in order to solve the following problems in computational biology: (1) comparative gene structure detection, (2) DNA sequence annotation, and (3) investigation of CpG islands (CGIs) for epigenetic studies. Briefly, in problem #1, sequences are transformed into signal series or binary codes. Similar to speech/voice recognition, similarity is calculated between two signal series, and the signals are subsequently stitched/matched into a temporal sequence. Owing to the binary nature of the operations, all calculations can be performed efficiently and accurately. Improving performance in terms of accuracy and specificity is the key for a comparative method. In problem #2, DNA sequences are encoded and transformed into numeric representations for deep learning methods. Encoding schemes greatly influence the performance of deep learning algorithms, so finding the best encoding scheme for a particular application of deep learning is significant. Three applications (detection of protein-coding splicing sites, detection of lincRNA splicing sites, and improvement of comparative gene structure identification) are used to show the computing power of deep neural networks. In problem #3, CpG sites are assigned a certain energy and a Gaussian filter is applied to detect CpG islands. Using the CpG box and a Markov model, we investigate the properties of CGIs and redefine them using the emerging epigenetic data.
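As an illustration of the encoding step in problem #2, one common numeric representation of DNA for deep learning is one-hot encoding; this is a generic example, not necessarily the dissertation's chosen scheme:

```python
import numpy as np

def one_hot_dna(seq):
    """One-hot encode a DNA string: each base A/C/G/T maps to a
    4-dimensional unit vector, giving a (len(seq), 4) matrix that
    can be fed to a neural network."""
    order = "ACGT"
    return np.array([[1.0 if base == b else 0.0 for b in order]
                     for base in seq])
```

As the abstract notes, the choice among such encodings (one-hot, integer, signal-based) can materially change deep-learning performance.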
In summary, these three problems and their solutions are not isolated; they are linked to modern techniques in such diverse areas as signal processing, information-coding theory, artificial intelligence and cloud computing. These novel methods are expected to improve the efficiency and accuracy of computational tools and bridge the gap between biology and scientific computing.
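The CpG-island idea in problem #3 above, assigning energy to CpG sites and smoothing with a Gaussian filter, can be sketched roughly as follows; the unit energies and the window width `sigma` are illustrative assumptions, not the dissertation's parameters:

```python
import numpy as np

def cpg_energy_profile(seq, sigma=20.0):
    """Assign unit 'energy' to each CpG dinucleotide start and smooth
    with a Gaussian kernel; high values in the resulting profile
    indicate candidate CpG-island regions."""
    energy = np.zeros(len(seq))
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            energy[i] = 1.0
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()              # normalize to unit mass
    return np.convolve(energy, kernel, mode="same")
```

Island calls would then follow by thresholding the profile, with the threshold tuned against the epigenetic data mentioned in the abstract.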
Feature Selection and Classifier Development for Radio Frequency Device Identification
The proliferation of simple, low-cost devices, such as IEEE 802.15.4 ZigBee and Z-Wave, in Critical Infrastructure (CI) increases security concerns. Radio Frequency Distinct Native Attribute (RF-DNA) Fingerprinting facilitates biometric-like identification of electronic devices from variances in their hardware emissions. Developing reliable classifier models using RF-DNA fingerprints is thus important for device discrimination, enabling reliable Device Classification (a one-to-many "looks most like" assessment) and Device ID Verification (a one-to-one "looks how much like" assessment). AFIT's prior RF-DNA work focused on Multiple Discriminant Analysis/Maximum Likelihood (MDA/ML) and Generalized Relevance Learning Vector Quantized Improved (GRLVQI) classifiers. This work 1) introduces a new GRLVQI-Distance (GRLVQI-D) classifier that extends prior GRLVQI work by supporting alternative distance measures, 2) formalizes a framework for selecting competing distance measures for GRLVQI-D, 3) introduces response surface methods for optimizing GRLVQI and GRLVQI-D algorithm settings, 4) develops an MDA-based Loadings Fusion (MLF) Dimensional Reduction Analysis (DRA) method for improved classifier-based feature selection, 5) introduces the F-test as a DRA method for RF-DNA fingerprints, 6) provides a phenomenological understanding of test statistics and p-values, with KS-test and F-test statistic values being superior to p-values for DRA, and 7) introduces quantitative dimensionality assessment methods for DRA subset selection.
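The F-test DRA method in point 5) ranks fingerprint features by how well they separate device classes. A generic one-way ANOVA F-statistic, sketched here as an illustration rather than AFIT's actual implementation, captures the idea:

```python
import numpy as np

def f_statistic(x, labels):
    """One-way ANOVA F-statistic for a single feature across classes:
    between-class variance divided by within-class variance. Larger
    values mean the feature separates the device classes better."""
    groups = [x[labels == c] for c in np.unique(labels)]
    grand = x.mean()
    k, n = len(groups), len(x)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For DRA, one would compute this statistic per fingerprint feature and keep the top-ranked subset, consistent with the abstract's finding that statistic values outperform p-values for this purpose.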
Hidden Markov Models
Hidden Markov Models (HMMs), although known for decades, have gained great popularity in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.
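A core computation underlying many of the HMM applications listed is Viterbi decoding, which finds the most likely hidden-state path for an observation sequence. A minimal NumPy sketch (the toy model in the usage example is illustrative only):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path of a discrete HMM.
    obs: observation indices; pi: initial state probabilities (n,);
    A: state transition matrix (n, n); B: emission matrix (n, m)."""
    T, n = len(obs), len(pi)
    logp = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob per state
    back = np.zeros((T, n), dtype=int)         # backpointers
    for t in range(1, T):
        scores = logp[:, None] + np.log(A)     # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        logp = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With two "sticky" states that each favor one output symbol, decoding [0, 0, 1, 1, 1] recovers the matching state path, the same mechanism used at larger scale in speech recognition or gene finding.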
Compression-based pattern recognition: an example of biometrics using the ECG
The amount of data collected by the sensors and smart devices that
people use in their daily lives has been increasing at higher rates than
ever before. This enables the use of biomedical signals in several
applications, with the aid of pattern recognition algorithms. In this
thesis we investigate the use of compression-based methods to perform
classification of one-dimensional signals. To test those methods, we use
electrocardiographic (ECG) signals and the task of biometric
identification as a testbed example.
First and foremost, we introduce the notion of Kolmogorov complexity
and how it relates to compression methods. Then, we explain how
these methods can be useful for pattern recognition, by exploring different
compression-based measures, namely the Normalized Relative Compression (NRC),
a measure based on the relative similarity between strings. For this purpose,
we present finite-context models and explain the theory behind a generalized
version of those models, called the extended-alphabet finite-context models (xaFCM),
a novel contribution.
Since the testbed application for the methods presented in the thesis is
based on ECG signals, we explain what constitutes such a signal and the
methods that should be applied to the data before compression, such as
filtering and quantization.
Finally, we explore the application of biometric identification using the ECG
signal in more depth, running tests regarding the acquisition of
signals and benchmarking different proposals based on compression methods,
namely non-fiducial ones. We also highlight the advantages of this
approach as an alternative to machine learning methods, namely its low
computational cost and the fact that it requires no feature extraction,
making it easily transferable to different applications and signals.
Principal Graph and Structure Learning Based on Reversed Graph Embedding
© 2017 IEEE. Many scientific datasets are of high dimension, and their analysis usually requires retaining the most important structures of the data. The principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are mathematically formulated as curves, which is quite restrictive for real applications. A few methods can overcome this problem, but they either require complicated human-made rules for a specific task, lacking the flexibility to adapt to different tasks, or cannot obtain explicit structures from the data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data simultaneously. The new algorithm is simple, with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real-world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
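The spanning-tree showcase rests on extracting a tree structure over the data points. A crude stand-in for that step, a minimum spanning tree over Euclidean distances via Prim's algorithm, can be sketched as follows (this is a generic MST, not the paper's reversed-graph-embedding objective):

```python
import numpy as np

def mst_edges(points):
    """Minimum spanning tree (Prim's algorithm) over pairwise Euclidean
    distances. Returns a list of (i, j) edges connecting all points
    with minimum total edge length."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    n = len(points)
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:                 # grow the tree greedily
            for j in range(n):
                if j not in in_tree and (
                    best is None or d[i, j] < d[best[0], best[1]]
                ):
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges
```

In the paper's framework, the tree topology and the principal points are optimized jointly rather than computed once as here.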
Advanced Imaging Analysis for Predicting Tumor Response and Improving Contour Delineation Uncertainty
By Rebecca Nichole Mahon, MS
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University.
Virginia Commonwealth University, 2018
Major Director: Dr. Elisabeth Weiss,
Professor,
Department of Radiation Oncology
Radiomics, an advanced form of imaging analysis, is a growing field of interest in medicine. Radiomics seeks to extract quantitative information from images through the use of computer vision techniques to assist in improving treatment. Early prediction of treatment response is one way of improving overall patient care. This work seeks to explore the feasibility of building predictive models from radiomic texture features extracted from magnetic resonance (MR) and computed tomography (CT) images of lung cancer patients. First, repeatable primary tumor texture features from each imaging modality were identified to ensure a sufficient number of repeatable features existed for model development. Then a workflow was developed to build models to predict overall survival and local control using single-modality and multi-modality radiomics features. The workflow was also applied to normal tissue contours as a control study. Multiple significant models were identified for the single-modality MR- and CT-based models, while the multi-modality models were promising, indicating that exploration with a larger cohort is warranted.
Another way advances in imaging analysis can be leveraged is in improving the accuracy of contours. Unfortunately, the tumor can be close in appearance to normal tissue on medical images, creating high uncertainty in the tumor boundary. As the entire defined target is treated, providing physicians with additional information when delineating the target volume can improve the accuracy of the contour and potentially reduce the amount of normal tissue incorporated into it. Convolutional neural networks were developed and trained to identify the tumor interface with normal tissue and, in one network, to identify the tumor location. A mock tool was presented that uses the output of the networks to provide the physician with the uncertainty in the prediction of the interface type and the probability of the contour delineation uncertainty exceeding 5 mm for the top three predictions.
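Radiomic texture features of the kind used above are commonly derived from a gray-level co-occurrence matrix (GLCM). A minimal sketch, with the pixel offset, number of gray levels, and the contrast feature chosen purely for illustration:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix: entry (i, j) is the normalized
    count of pixel pairs at offset (dx, dy) whose gray levels are i
    and j. A basic source of radiomic texture features."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1
    m /= m.sum()
    return m

def contrast(m):
    """GLCM contrast feature: sum of p(i, j) * (i - j)^2, large for
    textures with abrupt local gray-level changes."""
    i, j = np.indices(m.shape)
    return float((m * (i - j) ** 2).sum())
```

A uniform region yields zero contrast while a checkerboard yields a high value; texture features like this, computed within the tumor contour, feed the predictive models described above.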