1,139 research outputs found
Deep Learning -- A first Meta-Survey of selected Reviews across Scientific Disciplines, their Commonalities, Challenges and Research Impact
Deep learning belongs to the field of artificial intelligence, where machines
perform tasks that typically require some kind of human intelligence. Similar
to the basic structure of a brain, a deep learning algorithm consists of an
artificial neural network, which resembles the biological brain structure.
Mimicking the learning process of humans with their senses, deep learning
networks are fed with (sensory) data, like texts, images, videos or sounds.
These networks outperform previous state-of-the-art methods in various tasks
and, because of this, the whole field has grown exponentially in recent years.
This growth has resulted in well over 10,000 publications per year. For
example, the search engine PubMed alone, which covers only a subset of all
publications in the medical field, already returned over 11,000 results in Q3
2020 for the search term 'deep learning', and around 90% of these results are
from the last three years. Consequently, a complete overview of the field of
deep learning is already impossible to obtain and, in the near future, it may
become difficult to obtain an overview of even a subfield. However, there are
several review articles about deep learning, which
are focused on specific scientific fields or applications, for example deep
learning advances in computer vision or in specific tasks like object
detection. With these surveys as a foundation, the aim of this contribution is
to provide a first high-level, categorized meta-survey of selected reviews on
deep learning across different scientific disciplines. The categories (computer
vision, language processing, medical informatics and additional works) have
been chosen according to the underlying data sources (image, language, medical,
mixed). In addition, we review the common architectures, methods, pros, cons,
evaluations, challenges and future directions for every sub-category.
Comment: 83 pages, 22 figures, 9 tables, 100 references
Using ODIN for a PharmGKB revalidation experiment
The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usability of the system's interface, as perceived by users, and (ii) the correlation of the ranking of interactions, as provided by the text-mining system, with the choices of the curators
Using Neural Networks for Relation Extraction from Biomedical Literature
Using different sources of information to support the automated extraction of
relations between biomedical concepts contributes to the development of our
understanding of biological systems. The primary comprehensive source of these
relations is the biomedical literature. Several relation extraction approaches
have been proposed to identify relations between concepts in biomedical
literature, notably those using neural network algorithms. The use of
multichannel architectures composed of multiple data representations, as in
deep neural networks, is leading to state-of-the-art results. The right
combination of data representations can eventually lead to even higher
evaluation scores in relation extraction tasks. Here, biomedical ontologies
play a fundamental role by providing semantic and ancestry information about
an entity. The incorporation of biomedical ontologies has already been shown
to improve on previous state-of-the-art results.
Comment: Artificial Neural Networks book (Springer) - Chapter 1
Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, less attention has been dedicated to
genomic data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward
Computational approaches for finding high-dimensional multi-omics relationships using biological prior knowledge
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Sun Kim.
Understanding how cells function or respond to external stimuli is one of the most important questions in biology and medicine. Thanks to advances in instrumental technologies, scientists can routinely measure events within cells in single biological experiments. Notable examples are multi-omics data: sequencing of genomes, quantification of gene expression, and identification of epigenetic events that regulate the expression of genes. In order to better understand cellular mechanisms, it is essential to identify regulatory relationships between multi-omics regulators and genes. However, regulatory relationships are very complex and it is infeasible to validate all condition-specific relationships experimentally. Thus, there is an urgent need for efficient computational methods to extract relationships from different types of high-dimensional omics data. One way to handle such high-dimensional data is to incorporate external biological knowledge, such as relationships between omics and functions of genes curated in various databases.
In my doctoral study, I developed three computational approaches to identify the regulatory relationships from multi-omics data utilizing biological prior knowledge.
The first study proposes a method to predict one-to-m relationships between miRNAs and genes. The computational challenge of miRNA target prediction is that there are many miRNA target candidates, and the ratio of false positives to false negatives needs to be adjusted. This challenge is addressed by utilizing literature knowledge to determine the association between a miRNA-gene pair and a given context. In this study, I developed ContextMMIA to predict miRNA-gene relationships from miRNA and gene expression data. ContextMMIA computes scores for miRNA-gene relationships based on statistical significance and literature relevance and prioritizes the relationships by these scores. In experiments on breast cancer data with different prognoses, ContextMMIA predicted differentially activated miRNA-gene relationships in invasive breast cancer. The experimentally verified miRNA-gene relationships were predicted with high priority, and the genes involved are known to participate in breast cancer-related pathways.
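As a rough illustration of the prioritization idea in the first study, the sketch below ranks miRNA-gene pairs by a combined score. The product rule, the [0, 1] score ranges, the `Pair` class, and the example names are assumptions made for illustration; the abstract does not specify how ContextMMIA combines statistical significance with literature relevance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One candidate miRNA-gene relationship (hypothetical container)."""
    mirna: str
    gene: str
    omics_score: float    # assumed in [0, 1]; stands in for statistical significance
    context_score: float  # assumed in [0, 1]; stands in for literature relevance


def confidence(pair: Pair) -> float:
    # Assumption: combine the two scores by multiplication, so a pair
    # needs support from both the data and the literature to rank high.
    return pair.omics_score * pair.context_score


def prioritize(pairs):
    """Return the pairs sorted from highest to lowest combined score."""
    return sorted(pairs, key=confidence, reverse=True)


# Placeholder identifiers, not real miRNAs or genes.
candidates = [
    Pair("miR-A", "GENE1", omics_score=0.9, context_score=0.2),
    Pair("miR-B", "GENE2", omics_score=0.6, context_score=0.8),
    Pair("miR-C", "GENE3", omics_score=0.5, context_score=0.5),
]
ranked = prioritize(candidates)
```

With a product rule, a pair that is strongly supported by expression data but barely mentioned in the context-specific literature ranks below a pair with moderate support from both sources.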
The second study proposes a method to predict n-to-one relationships between regulators and a gene in drug response. The computational challenge of drug response prediction is how to integrate multi-omics data of 20,000 genes to determine drug response mediator genes. This challenge is addressed by utilizing low-dimensional embedding methods, literature knowledge of drug-gene associations, and gene-gene interaction knowledge. For this problem, I developed DRIM to predict drug response relationships from multi-omics data and drug-induced time-series gene expression data. DRIM uses an autoencoder, tensor decomposition, and drug-gene associations to determine n-to-one relationships from multi-omics data. Then, regulatory relationships of mediator genes are determined from gene-gene interaction knowledge and the cross-correlation of drug-induced time-series gene expression data. In experiments on breast cancer cell line data, DRIM extracted mediator genes relevant to drug response and regulatory relationships of genes involved in the PI3K-Akt pathway targeted by lapatinib. In addition, DRIM revealed distinct patterns of relationships in breast cancer cell lines with different lapatinib resistance.
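The cross-correlation of time-series expression mentioned above can be illustrated with a generic lagged Pearson correlation. This is a minimal sketch of the general technique, not DRIM's implementation; the function names, the `max_lag` parameter, and the toy series are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(regulator, target, max_lag):
    """Find the delay (in time points) at which the target best follows the regulator.

    Returns (lag, correlation) with the largest absolute correlation.
    Assumes max_lag is small enough to leave at least two overlapping points.
    """
    best = (0, pearson(regulator, target))
    for lag in range(1, max_lag + 1):
        # Compare regulator at time t with target at time t + lag.
        r = pearson(regulator[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Toy series: the target repeats the regulator's profile one time point later.
regulator_series = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0]
target_series = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0]
lag, corr = best_lag(regulator_series, target_series, max_lag=2)
```

A high correlation at a positive lag is consistent with the regulator's change preceding the target's change, which is the kind of evidence a time-bounded regulatory network can use.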
The third study proposes a method to predict n-to-m relationships between regulators and genes. To predict n-to-m relationships, this study formulated an objective function that measures the deviation between observed gene expression values and estimated gene expression values derived from gene regulatory networks. The computational challenge of minimizing the objective function is to navigate a search space of relationships that grows exponentially with the number of regulators and genes. This challenge is addressed by iterative local optimization with regulator-gene interaction knowledge. In this study, I developed a two-step iterative RL-based method to predict n-to-m relationships from regulator and gene expression data. The first step explores n-to-one gene-oriented relationships, selecting regulators with a reinforcement-learning-based heuristic to add edges to the network. The second step explores one-to-m regulator-oriented relationships, stochastically selecting genes to remove edges from the network. In experiments on breast cancer cell line data, the proposed method constructed breast cancer subtype-specific networks from the regulator and gene expression profiles with more accurate gene expression estimation than previous combinatorial optimization methods. Moreover, regulatory relationships involved in the networks were associated with breast cancer subtypes.
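The two-step search described above can be sketched on a toy problem. Several simplifications are assumptions, not the thesis method: a gene's expression is estimated as the mean of its regulators, the objective is a sum of squared deviations, and a greedy choice stands in for the reinforcement-learning-based heuristic. All function names and data are hypothetical.

```python
import random


def estimate(gene, edges, reg_expr, n_samples):
    """Assumed model: a gene's expression is the mean of its regulators' expression."""
    regs = [r for (r, g) in edges if g == gene]
    if not regs:
        return [0.0] * n_samples
    return [sum(reg_expr[r][s] for r in regs) / len(regs) for s in range(n_samples)]


def objective(edges, reg_expr, gene_expr):
    """Sum of squared deviations between observed and estimated expression."""
    total = 0.0
    for gene, observed in gene_expr.items():
        est = estimate(gene, edges, reg_expr, len(observed))
        total += sum((o - e) ** 2 for o, e in zip(observed, est))
    return total


def g_step(edges, candidates, reg_expr, gene_expr):
    """Gene-oriented step: per gene, add the candidate edge that lowers the objective most.

    A greedy choice replaces the RL-based heuristic of the abstract.
    """
    for gene in gene_expr:
        best, best_val = None, objective(edges, reg_expr, gene_expr)
        for (r, g) in candidates:
            if g != gene or (r, g) in edges:
                continue
            val = objective(edges | {(r, g)}, reg_expr, gene_expr)
            if val < best_val:
                best, best_val = (r, g), val
        if best is not None:
            edges = edges | {best}
    return edges


def r_step(edges, reg_expr, gene_expr, rng):
    """Regulator-oriented step: per regulator, try removing one randomly chosen edge."""
    for reg in sorted({r for (r, _) in edges}):
        targets = sorted(g for (r, g) in edges if r == reg)
        if not targets:
            continue
        trial = edges - {(reg, rng.choice(targets))}
        # Keep the removal only if the fit does not get worse.
        if objective(trial, reg_expr, gene_expr) <= objective(edges, reg_expr, gene_expr):
            edges = trial
    return edges


def search(candidates, reg_expr, gene_expr, n_iter=5, seed=0):
    """Alternate the two steps, starting from an empty network."""
    rng = random.Random(seed)
    edges = frozenset()
    for _ in range(n_iter):
        edges = g_step(edges, candidates, reg_expr, gene_expr)
        edges = r_step(edges, reg_expr, gene_expr, rng)
    return edges


# Toy data: each gene exactly follows one regulator.
reg_expr = {"R1": [1.0, 2.0, 3.0], "R2": [3.0, 1.0, 2.0]}
gene_expr = {"G1": [1.0, 2.0, 3.0], "G2": [3.0, 1.0, 2.0]}
candidates = {("R1", "G1"), ("R2", "G1"), ("R1", "G2"), ("R2", "G2")}
network = search(candidates, reg_expr, gene_expr)
```

Restricting additions to a candidate edge set plays the role of the regulator-gene interaction knowledge: the search only visits relationships that prior knowledge allows, which keeps the otherwise exponential search space manageable.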
In summary, in this thesis, I proposed computational methods for predicting one-to-m, n-to-one, and n-to-m relationships between multi-omics regulators and genes utilizing external domain knowledge. By analyzing ever-increasing molecular biology data, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, to understand gene regulatory interactions, the proposed methods are expected to deepen our knowledge of cellular mechanisms.
Chapter 1 Introduction
1.1 Biological background
1.1.1 Multi-omics analysis
1.1.2 Multi-omics relationships indicating cell state
1.1.3 Biological prior knowledge
1.2 Research problems for the multi-omics relationship
1.3 Computational challenges and approaches in the exploring multiomics relationship
1.4 Outline of the thesis
Chapter 2 Literature-based condition-specific miRNA-mRNA target prediction
2.1 Computational Problem & Evaluation criterion
2.2 Related works
2.3 Motivation
2.4 Methods
2.4.1 Identifying genes and miRNAs based on the user-provided context
2.4.2 Omics Score
2.4.3 Context Score
2.4.4 Confidence Score
2.5 Results
2.5.1 Pathway analysis
2.5.2 Reproducibility of validated targets in humans
2.5.3 Sensitivity tests when different keywords are used
2.6 Summary
Chapter 3 DRIM: A web-based system for investigating drug response at the molecular level by condition-specific multi-omics data integration
3.1 Computational Problem & Evaluation criterion
3.2 Related works
3.3 Motivation
3.4 Methods
3.4.1 Step 1: Input
3.4.2 Step 2: Identifying perturbed sub-pathway with time-series
3.4.3 Step 3: Embedding multi-omics for selecting potential mediator genes
3.4.4 Step 4: Construct TF-regulatory time-bounded network and identify regulatory path
3.4.5 Step 5: Analysis result on the web
3.5 Case study: Comparative analysis of breast cancer cell lines that have different sensitivity with lapatinib
3.5.1 Multi-omics analysis result before drug treatment
3.5.2 Time-series gene expression analysis after drug treatment
3.6 Summary
Chapter 4 Combinatorial modeling and optimization using iterative RL search for inferring sample-specific regulatory network
4.1 Computational Problem & Evaluation criterion
4.2 Related works
4.3 Motivation
4.4 Methods
4.4.1 Formulating an objective function
4.4.2 Overview of an iterative search method
4.4.3 G-step for exploring n-to-one gene-oriented relationship
4.4.4 R-step for exploring one-to-m regulator-oriented relationship
4.5 Results
4.5.1 Cancer cell line data
4.5.2 Hyperparameters
4.5.3 Quantitative evaluation
4.5.4 Qualitative evaluation
4.6 Summary
Chapter 5 Conclusions
Abstract in Korean
Warfarin Dose Estimation on High-dimensional and Incomplete Data
Warfarin is an oral anticoagulant that is widely used worldwide. However, due to the complex relationships between individual factors, it is challenging to estimate the optimal warfarin dose that achieves its full efficacy. Currently, many studies use machine learning or deep learning techniques to help select the optimal warfarin dose, but few of them can naturally handle missing values and high-dimensional data, which are two main concerns when analyzing real-world clinical data. In this work, we propose to regard each patient's record as a set of observed individual factors and to represent them in an embedding space, which enables our method to learn directly from incomplete data and to avoid the negative impact of the high-dimensional feature set. Then, a novel neural network is proposed to combine the set of embedded vectors non-linearly; it is capable of capturing their correlations and locating the informative ones for prediction. In a comparison with baseline models on the open-source data from the International Warfarin Pharmacogenetics Consortium, the experimental results demonstrate that our proposed method outperforms the others by a significant margin. After further analyzing the model performance in different dosing subgroups, we conclude that the proposed method has high value for clinical application, especially for patients in the high-dose and medium-dose subgroups
- …