29 research outputs found

    Approximation by finite mixtures of continuous density functions that vanish at infinity

    Given sufficiently many components, it is often cited that finite mixture models can approximate any other probability density function (pdf) to an arbitrary degree of accuracy. Unfortunately, the nature of this approximation result is often left unclear. We prove that finite mixture models constructed from pdfs in $\mathcal{C}_{0}$ can be used to conduct approximation of various classes of approximands in a number of different modes. That is, we prove approximands in $\mathcal{C}_{0}$ can be uniformly approximated, approximands in $\mathcal{C}_{b}$ can be uniformly approximated on compact sets, and approximands in $\mathcal{L}_{p}$ can be approximated with respect to the $\mathcal{L}_{p}$ norm, for $p\in\left[1,\infty\right)$. Furthermore, we also prove that measurable functions can be approximated, almost everywhere.
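As a rough numerical illustration of the uniform-approximation claim (not the construction used in the paper's proofs), the sketch below approximates a triangular density on [-1, 1], which lies in $\mathcal{C}_{0}$, by a finite mixture of Gaussian pdfs. The target density, the grid of component centers, the weights proportional to the target, and the two bandwidths are all assumptions chosen for this example.

```python
import math

def target(x):
    """Triangular density on [-1, 1]: continuous and vanishing at infinity."""
    return max(0.0, 1.0 - abs(x))

def gaussian(x, mean, sd):
    """Gaussian pdf, used here as the mixture component (itself in C_0)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, centers, weights, sd):
    """Finite mixture: a convex combination of shifted, scaled components."""
    return sum(w * gaussian(x, c, sd) for c, w in zip(centers, weights))

def sup_error(sd, n_components=201):
    """Largest absolute error between mixture and target on a test grid."""
    # Component centers on an even grid over the support of the target.
    centers = [-1.0 + 2.0 * i / (n_components - 1) for i in range(n_components)]
    # Weights proportional to the target at each center, normalized to sum to 1.
    raw = [target(c) for c in centers]
    total = sum(raw)
    weights = [r / total for r in raw]
    # Evaluate the error on [-2, 2], which covers the support and both tails.
    grid = [-2.0 + 4.0 * i / 400 for i in range(401)]
    return max(abs(mixture_density(x, centers, weights, sd) - target(x))
               for x in grid)

err_wide = sup_error(sd=0.5)     # broad components: coarse approximation
err_narrow = sup_error(sd=0.1)   # narrow components: much smaller sup error
print(err_wide, err_narrow)
```

With the component spacing kept well below the bandwidth, shrinking the bandwidth reduces the sup-norm error, which is the behaviour the uniform-approximation statement describes.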

    Computing and counting longest paths on circular-arc graphs in polynomial time.

    The longest path problem asks for a path with the largest number of vertices in a given graph. The first polynomial time algorithm (with running time O(n^4)) has been recently developed for interval graphs. Even though interval and circular-arc graphs look superficially similar, they differ substantially, as circular-arc graphs are not perfect. In this paper, we prove that for every path P of a circular-arc graph G, we can appropriately “cut” the circle, such that the obtained (not induced) interval subgraph G′ of G admits a path P′ on the same vertices as P. This non-trivial result is of independent interest, as it suggests a generic reduction of a number of path problems on circular-arc graphs to the case of interval graphs with a multiplicative linear time overhead of O(n). As an application of this reduction, we present the first polynomial algorithm for the longest path problem on circular-arc graphs, which turns out to have the same running time O(n^4) as the one on interval graphs, as we manage to get rid of the linear overhead of the reduction. Within the same running time, this algorithm also computes an n-approximation of the number of different vertex sets that provide a longest path; in the case where G is an interval graph, we compute the exact number. Moreover, our algorithm can be directly extended with the same running time to the case where every vertex has an arbitrary positive weight.

    Pathway and network analysis in proteomics

    Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address this challenge in proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.

    Agreement among human and annotated transcriptions of global songs

    Cross-cultural musical analysis requires standardized symbolic representation of sounds such as score notation. However, transcription into notation is usually conducted manually by ear, which is time-consuming and subjective. Our aim is to evaluate the reliability of existing methods for transcribing songs from diverse societies. We had 3 experts independently transcribe a sample of 32 excerpts of traditional monophonic songs from around the world (half a cappella, half with instrumental accompaniment). 16 songs also had pre-existing transcriptions created by 3 different experts. We compared these human transcriptions against one another and against 10 automatic music transcription algorithms. We found that human transcriptions can be sufficiently reliable (~90% agreement, κ ~.7), but current automated methods are not (<60% agreement, κ <.4). No automated method clearly outperformed others, in contrast to our predictions. These results suggest that improving automated methods for cross-cultural music transcription is critical for diversifying MIR
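The abstract reports agreement as a percentage together with κ but does not say which κ statistic was used. Assuming it is Cohen's kappa between two transcribers, a minimal sketch of the computation could look like this; the function name `cohen_kappa` and the four-note label sequences are hypothetical and serve only as an illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (assumed statistic)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical pitch labels from two transcribers for the same four notes.
transcriber_1 = ["C", "C", "D", "D"]
transcriber_2 = ["C", "D", "D", "D"]
kappa = cohen_kappa(transcriber_1, transcriber_2)
print(kappa)  # 0.5: raw agreement is 0.75, chance agreement is 0.5
```

Kappa discounts the agreement expected by chance, which is why it can be well below the raw percentage agreement.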

    Initial characterization of the human central proteome

    Background: On the basis of large proteomics datasets measured from seven human cell lines we consider their intersection as an approximation of the human central proteome, which is the set of proteins ubiquitously expressed in all human cells. Composition and properties of the central proteome are investigated through bioinformatics analyses. Results: We experimentally identify a central proteome comprising 1,124 proteins that are ubiquitously and abundantly expressed in human cells using state-of-the-art mass spectrometry and protein identification bioinformatics. The main represented functions are proteostasis, primary metabolism and proliferation. We further characterize the central proteome considering gene structures, conservation, interaction networks, pathways, drug targets, and coordination of biological processes. Among other new findings, we show that the central proteome is encoded by exon-rich genes, indicating an increased regulatory flexibility through alternative splicing to adapt to multiple environments, and that the protein interaction network linking the central proteome is very efficient for synchronizing translation with other biological processes. Surprisingly, at least 10% of the central proteome has no or very limited functional annotation. Conclusions: Our data and analysis provide a new and deeper description of the human central proteome compared to previous results, thereby extending and complementing our knowledge of commonly expressed human proteins. All the data are made publicly available to help other researchers who, for instance, need to compare or link focused datasets to a common background.
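The intersection idea described in the abstract can be sketched in a few lines. The cell-line names and protein identifiers below are placeholders invented for this example, not data from the study, and the helper `central_set` is a hypothetical name.

```python
def central_set(detected_by_line):
    """Return the proteins detected in every cell line (set intersection)."""
    protein_sets = [set(proteins) for proteins in detected_by_line.values()]
    if not protein_sets:
        return set()  # no cell lines given, so nothing can be shared
    return set.intersection(*protein_sets)

# Placeholder identifiers: three hypothetical cell lines and their detected proteins.
detected = {
    "line_A": ["P1", "P2", "P3", "P4"],
    "line_B": ["P1", "P3", "P5"],
    "line_C": ["P1", "P2", "P3"],
}
central = central_set(detected)
print(sorted(central))  # ['P1', 'P3']
```

A protein missing from even one line drops out of the result, so the intersection gives a conservative approximation of what is ubiquitously expressed.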

    Computational and validation approaches in proteomics discovery of disease biomarkers

    Master's thesis, Human Biology and Environment, Universidade de Lisboa, Faculdade de Ciências, 2019. The large-scale study of proteins, proteomics, is broadly changing our understanding of gene function in the post-genomic era. Following the revolution in genomics brought about by DNA (deoxyribonucleic acid) sequencing methods, proteomics has been increasing our knowledge of the variability, localization, function and metabolic pathways of proteins in the cell, tissue or organism. Biomarker development, as a relevant component of decision-making both in clinical processes and in the development of new drugs, is an emerging area in which proteomics has been gaining prominence. Proteomic technologies allow comparative, qualitative and quantitative analysis of thousands of proteins from cells/tissues of patients versus control individuals, as well as of patients before and after a given treatment. Proteins identified as differentially abundant and/or post-translationally modified in the disease condition or in response to therapy are possible candidate biomarkers of these conditions. However, interpreting the enormous amount of proteomic data, generated mainly by mass spectrometry (MS)-based experiments, requires computational support to process and analyze the data effectively and robustly. Computational proteomics studies the computational methods, algorithms, databases and methodologies used to process, manage, analyze and interpret the data produced in proteomic experiments for the identification of potential biomarkers.
The Proteomics Laboratory of INSA (Instituto Nacional de Saúde Doutor Ricardo Jorge), in its search for proteomic biomarkers to understand diseases such as sickle-cell disease (SCD), has produced large MS datasets that require computational analysis, which is the main purpose of this project (see the study objective below). SCD, also called drepanocytosis or drepanocytic anemia, is a clinically heterogeneous, autosomal recessive monogenic disorder characterized by recurrent episodes of severe hemolysis, vaso-occlusion and infection. Several genetic and environmental modifiers have been suggested to modulate the onset and course of SCD. Specifically, the vascular components of the pathology (for example, stroke) have been the subject of intensive research, and the use of proteomic methodologies promises to offer new molecular insights into the pathophysiology of SCD. The shift from steady state to crisis is still largely unpredictable. To discover putative biomarkers for this exacerbation, the INSA laboratory analyzed, by shotgun MS proteomics, plasma and red blood cell (RBC) samples from a group of SCD patients in steady state and in crisis (vaso-occlusion episode). Study objective: This study aimed to analyze the shotgun MS data generated for SCD using open-source computational proteomics platforms, namely PatternLab and MaxQuant, in order to identify proteins as possible candidate biomarkers of SCD and, in particular, of SCD associated with vaso-occlusion.
PatternLab for Proteomics is an integrated computational environment for shotgun proteomics analysis: it formats sequence databases, performs peptide-spectrum matching, statistically filters and organizes data for differential proteomics, displays results graphically, performs similarity-driven studies with de novo sequencing data, and helps in understanding the biological meaning of the data in light of the Gene Ontology. MaxQuant is a set of algorithms that includes peak detection and peptide scoring, performs mass calibration and database searches for protein identification, quantifies identified proteins and provides summary statistics. To validate the proteomic findings, some of the proteins identified by these computational platforms as differentially expressed in the pathology were selected for validation (verification) by approaches such as Western blot. Western blot is an immunological biochemical technique in which a protein mixture is separated on a 1D SDS-PAGE (Sodium Dodecyl Sulfate - PolyAcrylamide Gel Electrophoresis) gel, transferred to a membrane and then incubated with a specific antibody against the protein of interest. The reaction, usually visualized by chemiluminescence, can be quantified by densitometry. Of the 111 differentially expressed proteins identified (74 from the cytoplasmic fraction and 37 from the membrane fraction) associated with the crisis event in sickle-cell disease (SCD), peroxiredoxin-2, catalase, Hsp70 and Hsp90 were validated by Western blot. Of these 4 proteins, only peroxiredoxin-2 showed statistical significance. Among the differentially expressed proteins that may hypothetically be associated with SCD, one promising crisis biomarker, Voltage-dependent anion-selective channel protein 1 (VDAC1), was found to be decreased.
The results suggest that vaso-occlusion episodes in patients with sickle-cell disease (SCD) may be associated with decreased VDAC1 in red blood cells. The results of this project, after presentation and discussion, may contribute to a better understanding of the molecular pathways associated with SCD, as well as to the identification of specific protein modulations as possible candidate biomarkers for these pathologies. The Laboratory of Proteomics at INSA (Instituto Nacional de Saúde Doutor Ricardo Jorge), in searching for biomarkers for SCD, has produced considerable mass spectrometry (MS) data, which need computational analysis to process, manage, analyze and interpret them and to reveal relevant biomarkers. SCD is a clinically heterogeneous autosomal recessive monogenic disorder characterized by recurrent episodes of severe haemolysis, vaso-occlusion and infection. Several genetic and environmental modifiers have been suggested to modulate the onset and course of SCD. The vascular components of the pathology have thus been subjected to intensive research, and the use of proteomics methodologies promises to offer novel unbiased molecular insights into the pathophysiology of SCD. The objective of this project is to analyze, with different bioinformatics tools, the MS raw data generated by INSA’s laboratory in order to investigate the biological/molecular mechanisms responsible for protein changes that might be related to the development of SCD. The most pertinent proteins identified by these computational approaches as associated with those pathologies will be selected for further validation as candidate biomarkers using Western blot methods. Of the 111 differentially expressed proteins identified (74 from the cytoplasmic fraction and 37 from the membrane fraction) associated with the SCD crisis event, Peroxiredoxin-2, Catalase, 70-kDa Heat shock protein and Heat shock protein 90 were validated by Western blot.
Of these 4 proteins, only Peroxiredoxin-2 showed statistical significance. Among the differentially expressed proteins that may hypothetically be associated with SCD, a promising candidate biomarker of crisis, namely the Voltage-dependent anion-selective channel protein 1 (VDAC1), was found to be decreased. Our results suggest that vaso-occlusion episodes in SCD patients may be associated with decreased VDAC1 in their RBCs. This study indicated that SCD patients in crisis state are under oxidative stress and that proteins such as PRDX2 and VDAC1 are promising candidate biomarkers for the SCD crisis state. In summary, the main objective of this project is to contribute to a better understanding of the molecular mechanisms associated with these pathologies, as well as to discover new diagnostic, prognostic or monitoring biomarkers for these diseases, leading to the development of new methods that would increase the quality of life of these patients.