
    Influence of Dictionary Size on the Lossless Compression of Microarray Images

    A key challenge in the management of microarray data is the large size of the images that constitute the output of microarray experiments. Therefore, only the expression values extracted from these experiments are generally made available. However, the extraction of expression data is affected by a variety of factors, such as the thresholds used for background intensity correction, the method used for grid determination, and the parameters used in foreground (spot)-background delineation. This information is not always available or consistent across experiments and impacts downstream data analysis. Furthermore, the lack of access to the image-based primary data often leads to costly replication of experiments. Both lossy and lossless compression techniques have been developed for microarray images. While lossy algorithms deliver better compression, a significant advantage of the lossless techniques is that they guarantee against loss of information that is putatively of biological importance. A key challenge therefore is the development of more efficacious lossless compression techniques. Dictionary-based compression is one of the critical methods used in lossless microarray compression. However, image-based microarray data has potentially unbounded variability, so the choice of dictionary size and its effect on the compression rate are crucial. Our paper examines this problem and shows that increasing the dictionary size beyond a certain point does not lead to better compression. Our investigations also point to strategies for determining the optimal dictionary size.
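
    The dictionary-size trade-off can be illustrated with a toy coder. The sketch below (Python) is a minimal illustration rather than the authors' method: an LZW-style dictionary coder whose dictionary is frozen once it reaches a chosen size, run on repetitive input to show that the gain from enlarging the dictionary eventually flattens out. The function name, the fixed-width code approximation and the synthetic payload are assumptions.

        # Illustrative sketch only (not the paper's method): an LZW-style dictionary
        # coder with a capped dictionary size, showing how the compression gain can
        # plateau once the dictionary stops growing.

        import math

        def lzw_compress(data: bytes, max_dict_size: int) -> list:
            """Return LZW codes for `data`, freezing the dictionary at max_dict_size."""
            dictionary = {bytes([i]): i for i in range(256)}
            next_code = 256
            w = b""
            codes = []
            for byte in data:
                wc = w + bytes([byte])
                if wc in dictionary:
                    w = wc
                else:
                    codes.append(dictionary[w])
                    if next_code < max_dict_size:      # stop adding entries once full
                        dictionary[wc] = next_code
                        next_code += 1
                    w = bytes([byte])
            if w:
                codes.append(dictionary[w])
            return codes

        if __name__ == "__main__":
            sample = (b"ACGT" * 1000) + bytes(range(256)) * 4   # synthetic repetitive payload
            for size in (512, 4096, 65536, 1 << 20):
                codes = lzw_compress(sample, size)
                bits = len(codes) * math.ceil(math.log2(size))  # fixed-width code approximation
                print(f"dict={size:>8}  codes={len(codes):>5}  approx bits={bits}")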

    Standard and specific compression techniques for DNA microarray images

    We review the state of the art in DNA microarray image compression and provide original comparisons between standard and microarray-specific compression techniques that validate and expand previous work. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution, and then we summarize the compression results reported for these microarray-specific image compression schemes. In a set of experiments conducted for this paper, we obtain new results for several popular image coding techniques that include the most recent coding standards. Prediction-based schemes CALIC and JPEG-LS are the best-performing standard compressors, but are improved upon by the best microarray-specific technique, Battiato's CNN-based scheme
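
    To illustrate why prediction-based standards such as JPEG-LS do well on these images, the sketch below applies the JPEG-LS median edge detector (MED) predictor to a synthetic 16-bit image and compares the zeroth-order entropy of the prediction residuals with the raw bit depth. This is a simplified illustration, not either standard's actual coder; NumPy and the toy image are assumptions.

        # Simplified illustration (not CALIC or JPEG-LS themselves): the median edge
        # detector (MED) predictor used by JPEG-LS, plus a zeroth-order entropy
        # estimate of the prediction residuals on a toy 16-bit image.

        import numpy as np

        def med_predict(img: np.ndarray) -> np.ndarray:
            """Predict each pixel from its left (a), top (b) and top-left (c) neighbours."""
            a = np.roll(img, 1, axis=1).astype(np.int64)                      # left
            b = np.roll(img, 1, axis=0).astype(np.int64)                      # top
            c = np.roll(np.roll(img, 1, axis=0), 1, axis=1).astype(np.int64)  # top-left
            pred = np.where(c >= np.maximum(a, b), np.minimum(a, b),
                   np.where(c <= np.minimum(a, b), np.maximum(a, b), a + b - c))
            pred[0, :] = 0   # no causal neighbours on the first row/column
            pred[:, 0] = 0
            return pred

        def entropy_bits_per_pixel(values: np.ndarray) -> float:
            _, counts = np.unique(values, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            img = rng.normal(1000, 30, (64, 64)).astype(np.uint16)  # toy smooth 16-bit image
            residuals = img.astype(np.int64) - med_predict(img)
            print("storage bit depth: 16 bits/pixel")
            print(f"residual entropy:  {entropy_bits_per_pixel(residuals):.2f} bits/pixel")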

    Lossless compression algorithms for microarray images and whole genome alignment (Algoritmos de compressão sem perdas para imagens de microarrays e alinhamento de genomas completos)

    Doctoral thesis in Informatics. Nowadays, in the 21st century, the never-ending expansion of information is a major global concern. The pace at which storage and communication resources are evolving is not fast enough to compensate for this tendency. In order to overcome this issue, sophisticated and efficient compression tools are required. The goal of compression is to represent information with as few bits as possible. There are two kinds of compression, lossy and lossless. In lossless compression, information loss is not tolerated, so the decoded information is exactly the same as the encoded one. On the other hand, in lossy compression some loss is acceptable. In this work we focused on lossless methods. The goal of this thesis was to create lossless compression tools that can be used on two types of data. The first type is known in the literature as microarray images. These images have 16 bits per pixel and a high spatial resolution. The other data type is commonly called Whole Genome Alignments (WGA), in particular applied to MAF files. Regarding the microarray images, we improved existing microarray-specific methods by using some pre-processing techniques (segmentation and bitplane reduction). Moreover, we also developed a compression method based on pixel value estimates and a mixture of finite-context models. Furthermore, an approach based on binary-tree decomposition was also considered. Two compression tools were developed to compress MAF files. The first one is based on a mixture of finite-context models and arithmetic coding, where only the DNA bases and alignment gaps were considered. The second tool, designated MAFCO, is a complete compression tool that can handle all the information that can be found in MAF files. MAFCO relies on several finite-context models and allows parallel compression/decompression of MAF files.
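
    For readers unfamiliar with the building block used in both tools, the sketch below shows a single order-k finite-context model over the DNA alphabet and the ideal arithmetic-coding cost it implies. It is a minimal illustration written for this summary, not the thesis's software: the actual tools mix several such models and also handle alignment gaps and other MAF fields.

        # Minimal illustration (not the thesis's implementation): an order-k
        # finite-context model over the DNA alphabet. Probabilities come from counts
        # with additive smoothing; the ideal arithmetic-coding cost of a sequence is
        # the sum of -log2 p(symbol | context), with counts updated adaptively.

        import math
        from collections import defaultdict

        ALPHABET = "ACGT"

        class FiniteContextModel:
            def __init__(self, order: int, alpha: float = 1.0):
                self.order = order
                self.alpha = alpha                      # smoothing parameter
                self.counts = defaultdict(lambda: defaultdict(int))

            def prob(self, context: str, symbol: str) -> float:
                ctx = self.counts[context]
                total = sum(ctx.values()) + self.alpha * len(ALPHABET)
                return (ctx[symbol] + self.alpha) / total

            def encode_cost(self, sequence: str) -> float:
                """Return the ideal code length in bits, updating counts adaptively."""
                bits = 0.0
                padded = "A" * self.order + sequence    # simple fixed padding at the start
                for i in range(self.order, len(padded)):
                    context, symbol = padded[i - self.order:i], padded[i]
                    bits += -math.log2(self.prob(context, symbol))
                    self.counts[context][symbol] += 1
                return bits

        if __name__ == "__main__":
            seq = "ACGT" * 200 + "AAAA" * 50
            for k in (0, 2, 4):
                model = FiniteContextModel(order=k)
                print(f"order {k}: {model.encode_cost(seq) / len(seq):.3f} bits/base")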

    Image Compression Techniques: A Survey in Lossless and Lossy algorithms

    The bandwidth of communication networks has increased continuously as a result of technological advances. However, the introduction of new services and the expansion of existing ones have resulted in even higher demand for bandwidth. This explains the many efforts currently being invested in the area of data compression. The primary goal of these works is to develop techniques for coding information sources such as speech, image and video so as to reduce the number of bits required to represent a source without significantly degrading its quality. With the large increase in the generation of digital image data, there has been a correspondingly large increase in research activity in the field of image compression. The goal is to represent an image in the fewest number of bits without losing the essential information content within it. Images carry three main types of information: redundant, irrelevant, and useful. Redundant information is the deterministic part of the information, which can be reproduced without loss from other information contained in the image. Irrelevant information is the part of the information whose details lie beyond the limit of perceptual significance (i.e., psychovisual redundancy). Useful information, on the other hand, is the part of the information that is neither redundant nor irrelevant. Decompressed images are usually observed by humans, so their fidelity is subject to the capabilities and limitations of the human visual system. This paper provides a survey of various image compression techniques, their limitations and compression rates, and highlights current research in medical image compression.
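
    The three kinds of information can be given a rough quantitative reading. The sketch below is an illustration added for this summary, not material from the survey: it estimates the zeroth-order entropy of a toy image (related to how much statistical redundancy symbol-wise lossless coding can remove) and the PSNR after coarse quantisation (a crude stand-in for discarding perceptually irrelevant detail). NumPy and the synthetic image are assumptions.

        # Illustration only: two quick measurements matching the redundancy notions
        # above. Zeroth-order entropy bounds symbol-wise lossless coding, while PSNR
        # after coarse quantisation shows the cost of dropping fine detail.

        import numpy as np

        def entropy_bits(img: np.ndarray) -> float:
            _, counts = np.unique(img, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        def psnr(original: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
            mse = np.mean((original.astype(np.float64) - distorted.astype(np.float64)) ** 2)
            return float(10 * np.log10(peak ** 2 / mse))

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            img = np.clip(rng.normal(128, 20, (128, 128)), 0, 255).astype(np.uint8)
            quantised = (img // 16) * 16                 # keep only 4 of 8 bits per pixel
            print(f"entropy: {entropy_bits(img):.2f} bits/pixel (vs 8-bit storage)")
            print(f"PSNR after quantisation: {psnr(img, quantised):.1f} dB")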

    Prediction by Partial Matching for Identification of Biological Entities

    As biomedical research and advances in biotechnology generate expansive datasets, the need to process these data into information has grown simultaneously. Specifically, recognizing and extracting the "key" phrases comprising the named entities in this information databank promises a plethora of applications for scientists. The ability to construct interaction maps and to identify proteins as drug targets are two important applications. Since we have the choice of defining what is "useful", we can potentially utilize text mining for our purpose. In a novel attempt to meet this challenge, we have put information theory and text compression to the task. Prediction by partial matching is an adaptive text encoding scheme that blends together a set of finite-context Markov models to predict the probability of the next token in a given symbol stream. We observe that named entities such as gene names, protein names, gene functions and protein-protein interactions all follow symbol statistics uniquely different from those of normal scientific text. By using well-defined training sets that allow us to selectively differentiate between named entities and the rest of the symbols, we were able to extract them with good accuracy. We have implemented our tests, using the Text Mining Toolkit, on the identification of gene functions and protein-protein interactions, with f-scores (based on precision and recall) of 0.9737 and 0.6865 respectively. With our results, we foresee the application of such an approach in automated information retrieval in the realm of biology.
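
    To make the compression-based idea concrete, here is a minimal sketch of classification by code length. A simple order-2 character context model stands in for full PPM, and the class names, training strings and decision rule are illustrative assumptions rather than the paper's actual setup, which used the Text Mining Toolkit.

        # Hypothetical sketch (not the Text Mining Toolkit): compression-based
        # classification. A phrase is tagged as a named entity if the model trained
        # on entity text assigns it a shorter code length (compresses it better)
        # than the model trained on background scientific text.

        import math
        from collections import defaultdict

        class CharContextModel:
            def __init__(self, order: int = 2, alpha: float = 0.5):
                self.order, self.alpha = order, alpha
                self.counts = defaultdict(lambda: defaultdict(int))
                self.vocab = set()

            def train(self, text: str) -> None:
                pad = " " * self.order
                for line in text.splitlines():
                    s = pad + line
                    for i in range(self.order, len(s)):
                        self.counts[s[i - self.order:i]][s[i]] += 1
                        self.vocab.add(s[i])

            def code_length(self, phrase: str) -> float:
                """Ideal code length of the phrase in bits under this model."""
                s = " " * self.order + phrase
                bits, v = 0.0, max(len(self.vocab), 1)
                for i in range(self.order, len(s)):
                    ctx = self.counts[s[i - self.order:i]]
                    total = sum(ctx.values()) + self.alpha * v
                    bits += -math.log2((ctx[s[i]] + self.alpha) / total)
                return bits

        if __name__ == "__main__":
            entity_model, background_model = CharContextModel(), CharContextModel()
            entity_model.train("BRCA1\nTP53\nkinase activity\nDNA repair\n")            # toy data
            background_model.train("we observed that the results were significant\n")   # toy data
            phrase = "TP53"
            is_entity = entity_model.code_length(phrase) < background_model.code_length(phrase)
            print(phrase, "-> named entity" if is_entity else "-> background text")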

    Quantitative analysis of mass spectrometry proteomics data: Software for improved life science

    The rapid advances in life science, including the sequencing of the human genome and numerous other techniques, have given an extraordinary ability to acquire data on biological systems and human disease. Even so, drug development costs are higher than ever, while the rate of newly approved treatments is historically low. A potential explanation for this discrepancy might be the difficulty of understanding the biology underlying the acquired data; the difficulty of refining the data into useful knowledge through interpretation. In this thesis the refinement of the complex data from mass spectrometry proteomics is studied. A number of new algorithms and programs are presented and demonstrated to provide increased analytical ability over previously suggested alternatives. With the higher goal of increasing the scientific output of the mass spectrometry laboratory, pragmatic studies were also performed to create a new set of compression algorithms for reduced storage requirements of mass spectrometry data, and also to characterize instrument stability. The final components of this thesis are a discussion of the technical and instrumental weaknesses associated with the currently employed mass spectrometry proteomics methodology, and a discussion of the currently lacking quality of academic software and the reasons thereof. As a whole, the primary algorithms, the enabling technology, and the weakness discussions all aim to improve the current capability to perform mass spectrometry proteomics. As this technology is crucial for understanding the main functional components of biology, proteins, this quest should allow better and higher-quality life science data, and ultimately increase the chances of developing new treatments or diagnostics.

    Towards the Construction of a Transcriptional Landscape of the Human Genome: Data Analysis and Data Compression

    In this thesis, we built a genome-wide polyadenylation map with sequencing datasets from various human tissues and cell lines. With this map, we analyzed the pattern and distribution of polyadenylation sites in the human genome and explored the differential polyadenylation patterns of non-coding and novel genes. Meanwhile, we created the Expression and Polyadenylation Database (xPAD) as a web portal for the polyadenylation map. Moreover, we revealed regulatory marks that might correlate with the polyadenylation sites we found. In addition, we unveiled a novel group of small YB-1-associated RNAs and investigated their possible regulatory mechanism, finding that multiple transcription factors and histone modifications may mark the locations of YB-1-associated RNAs. We also implemented an Assembly-based Sequencing data Encoding Tool, AbSEnT. With this tool, we demonstrated the feasibility and efficiency of the novel assembly-based compression algorithm by achieving a higher compression ratio than general-purpose compression tools. Meanwhile, we investigated the distribution of word frequency in sequencing data and found that it shares similarities with natural languages. If this connection can be proved, we may be able to borrow knowledge and experience from natural language research for the analysis of sequencing data.
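
    The rank-frequency comparison with natural language can be sketched quickly. The snippet below is an illustration written for this summary, not part of AbSEnT: it counts k-mer "words" in a set of reads and prints rank times count, which stays roughly constant if the distribution is Zipf-like. The reads are synthetic and the function name is hypothetical.

        # Illustration only (not AbSEnT): do k-mer "word" frequencies in sequencing
        # reads follow a Zipf-like rank-frequency pattern, as natural-language word
        # frequencies do? Under Zipf's law, rank * count is roughly constant.

        import random
        from collections import Counter

        def kmer_rank_frequencies(reads, k=6):
            counts = Counter()
            for read in reads:
                for i in range(len(read) - k + 1):
                    counts[read[i:i + k]] += 1
            return [count for _, count in counts.most_common()]

        if __name__ == "__main__":
            random.seed(0)
            # Synthetic reads with skewed base composition, standing in for real data.
            reads = ["".join(random.choices("ACGT", weights=[4, 1, 1, 2], k=100))
                     for _ in range(500)]
            freqs = kmer_rank_frequencies(reads)
            for rank in (1, 10, 100, 1000):
                if rank <= len(freqs):
                    print(f"rank {rank:>4}: count {freqs[rank - 1]:>5}, "
                          f"rank*count = {rank * freqs[rank - 1]}")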

    New approaches for unsupervised transcriptomic data analysis based on Dictionary learning

    The era of high-throughput data generation enables new access to biomolecular profiles and their exploitation. However, the analysis of such biomolecular data, for example transcriptomic data, suffers from the so-called "curse of dimensionality". This occurs in the analysis of datasets with a significantly larger number of variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can appear, which can lead to results that are not meaningful in application. A common way of counteracting this problem is the application of dimension reduction methods and subsequent analysis of the resulting low-dimensional representation, which has a smaller number of variables. In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concept of Dictionary learning, an unsupervised dimension reduction approach. Unlike many dimension reduction approaches that are widely applied for transcriptomic data analysis, Dictionary learning does not impose constraints on the components that are to be derived. This allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. The result of sparse methods is a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured. Indeed, a particular characteristic of transcriptomic data is its structuredness, which arises, for example, from the connections between genes and pathways. Nonetheless, the application of Dictionary learning in medical data analysis has so far been mainly restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach; interpretability is a necessity in biomolecular data analysis for gaining a holistic understanding of the investigated processes. Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups in samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods that are widely applied in transcriptomic data analysis. Our methods achieve high performance and overall outperform the comparison methods.
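
    A minimal sketch of the general workflow for task (1), under our own assumptions rather than as the thesis's specific method: learn a sparse dictionary of "metagenes" from a samples-by-genes matrix with scikit-learn's DictionaryLearning, then cluster the resulting sparse codes to look for sample subgroups. scikit-learn, NumPy, the synthetic data and all parameter choices are assumptions.

        # Sketch of sparse Dictionary learning as unsupervised dimension reduction
        # for transcriptomic data, followed by clustering of the sparse codes.
        # Illustrative only; not the thesis's methods.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.decomposition import DictionaryLearning

        rng = np.random.default_rng(0)

        # Synthetic expression matrix: 60 samples x 500 genes with two hidden subgroups.
        X = rng.normal(0.0, 1.0, (60, 500))
        X[:30, :50] += 3.0                         # subgroup-specific gene module

        # Learn a small dictionary of "metagenes" and sparse per-sample codes.
        dl = DictionaryLearning(n_components=8, alpha=1.0, max_iter=200, random_state=0)
        codes = dl.fit_transform(X)                # (n_samples, n_components), sparse
        metagenes = dl.components_                 # (n_components, n_genes)

        # Cluster samples in the low-dimensional sparse representation.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codes)
        print("mean non-zero coefficients per sample:", np.count_nonzero(codes, axis=1).mean())
        print("cluster sizes:", np.bincount(labels))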

    Preface

    DAMSS-2018 is the jubilee 10th international workshop on data analysis methods for software systems, organized in Druskininkai, Lithuania, at the end of the year, in the same place and at the same time every year. Ten years have passed since the first workshop. The history of the workshop starts in 2009 with 16 presentations. The idea for such a workshop came up at the Institute of Mathematics and Informatics; the Lithuanian Academy of Sciences and the Lithuanian Computer Society supported it, and it was welcomed both by the Lithuanian research community and abroad. The number of presentations this year is 81, and the number of registered participants is 113 from 13 countries. In 2010, the Institute of Mathematics and Informatics became a member of Vilnius University, the largest university of Lithuania. In 2017, the institute changed its name to the Institute of Data Science and Digital Technologies, a name that reflects the institute's recent activities. The renewed institute has eight research groups: Cognitive Computing, Image and Signal Analysis, Cyber-Social Systems Engineering, Statistics and Probability, Global Optimization, Intelligent Technologies, Education Systems, and Blockchain Technologies. The main goal of the workshop is to introduce the research undertaken at Lithuanian and foreign universities in the fields of data science and software engineering. The annual organization of the workshop allows a fast interchange of new ideas within the research community. As many as 11 companies supported the workshop this year, which shows that the topics of the workshop are also relevant to business. Topics of the workshop cover big data, bioinformatics, data science, blockchain technologies, deep learning, digital technologies, high-performance computing, visualization methods for multidimensional data, machine learning, medical informatics, ontological engineering, optimization in data science, business rules, and software engineering. Seeking to facilitate relations between science and business, a special session and panel discussion is organized this year about topical business problems that may be solved together with the research community. This book gives an overview of all presentations of DAMSS-2018.

    Digital Histopathology of Cancer

    Cancer is a significant and growing public health concern. According to the World Health Organisation's estimates, it is, after cardiovascular diseases, the second leading cause of death worldwide. Excluding non-melanoma skin cancers, the most common types of cancer are breast cancer for women and, for men, lung cancer followed by prostate cancer. While the biological understanding of cancer has expanded, so too has the selection of available treatments. More than one fourth of all new medicines entering the market during 2010-2018 were for treating cancer. In order for a patient to benefit from the wide variety of cancer treatments, and avoid adverse effects, their unique cancer has to be matched with the appropriate treatment. For this the cancer needs to be both diagnosed accurately and classified in detail. Although non-invasive imaging methods, such as magnetic resonance imaging, have evolved substantially in recent years, the basis of cancer diagnosis is still in histopathology, that is, the pathological evaluation of tissue removed through surgery or needle biopsy. The light microscope has remained the pathologist's main diagnostic tool for a century and a half, allowing for the examination of tissue down to the cellular, and even subcellular, level. Important adjuncts to the routine histopathological staining of tissue, needed for light microscopy, are techniques allowing for the visualization of protein antigens and nucleic acids in the tissue. These techniques, among them immunohistochemistry and in situ hybridization, can be used, for instance, in the molecular characterization of cancer.
There are challenges in meeting the need for accurate diagnosis and characterization of cancer. One such challenge is posed by the shortage of pathologists observed in Finland and elsewhere. Another challenge is the variability in the interpretation of tumor growth patterns (grading, such as Gleason grading in prostate cancer) and in the interpretation of certain tissue staining patterns (such as the immunohistochemical staining of the HER2 molecule in breast cancer). This variability manifests itself both between pathologists (interobserver variation) and in the same pathologist's work over time (intraobserver variation). A third challenge is presented by the fact that the light microscope, although a reliable, cheap, and easy-to-use diagnostic tool, has shortcomings in the modern-day pathology service. Digital histopathology presents a new way of carrying out the central task of a pathologist in managing cancer patients, namely making the diagnosis and characterising the tumor in detail. Making the shift from a light microscope to a computer environment offers many benefits, some of which have been examined in this dissertation. The present study was carried out with the purpose of developing and testing applications of digital pathology in order to improve the histopathological diagnosis of cancer. The individual studies looked at advancing the teaching and standardization of Gleason grading of prostate cancer, aiding in the interpretation of immunohistochemical staining of prostate and breast cancer, and facilitating the implementation of digital pathology by way of a novel image compression algorithm optimised for whole slide images and a mapping of the determinants of an optimal imaging resolution for a whole slide scanner. We demonstrated that whole slide images can be used to assess the Gleason grade of a prostate biopsy and that the use of an internet-based platform can be beneficial in assessing interobserver variation in grading and in teaching and standardising the grading. Besides Gleason grading, another important aspect of prostate histopathology is the interpretation of immunohistochemistry. We created a method of viewing two whole slide images simultaneously and synchronously and tested this method in visualising the AMACR-p63 double stain along with normal hematoxylin and eosin staining of prostate biopsies. We showed that this technique can be used for histopathology education as well as in clinical diagnostics in selected cases. A key issue in breast cancer diagnostics is defining the HER2 status of a tumor, that is, whether the tumor overexpresses the molecule and can then be treated with HER2-antibody-based drugs. We studied the use of digital image analysis, using both photomicrographs and whole slide images, in aiding the pathologist in defining the HER2 status of a breast cancer surgical resection specimen. We showed that using a free and publicly available image analysis software can help to resolve cases otherwise deemed equivocal by conventional light microscopy. The introduction of digital histopathology into routine diagnostic work is underway. One technical challenge is managing the large amounts of image data generated by whole slide imaging. When there is a need to store large numbers of whole slide images, it is essential to strike a balance between image fidelity and file size.
To deal with this issue, we studied the optimal imaging resolution of a whole slide scanner using a methodology that can be utilised, for instance, in comparing whole slide scanners before acquiring one. In addition, we introduced a novel image compression method suited to whole slide images in order to reduce their storage footprint and cost. The first two studies in this dissertation represent the very beginnings of whole slide imaging in pathology, and the field has advanced since then, perhaps in small part due to the findings in these studies. Taken together, the findings in this dissertation can hopefully advance the use of digital pathology in cancer diagnostics and thereby improve the care of cancer patients.
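
    The balance between image fidelity and file size mentioned above can be explored with a simple experiment. The sketch below is an illustration only, not the compression method developed in the thesis: it encodes one synthetic image tile at several JPEG quality settings and reports file size and PSNR. Pillow and NumPy are assumed dependencies.

        # Illustration only: trading fidelity against file size for a single tile,
        # in the spirit of the whole-slide-image storage problem.

        import io
        import numpy as np
        from PIL import Image

        def psnr(a: np.ndarray, b: np.ndarray) -> float:
            mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
            return float(10 * np.log10(255.0 ** 2 / mse))

        # Synthetic tile: a smooth gradient with mild noise, standing in for tissue.
        rng = np.random.default_rng(0)
        gradient = np.tile(np.linspace(120, 230, 512), (512, 1))
        tile = np.stack([gradient, gradient * 0.8, gradient * 0.9], axis=-1)
        tile = np.clip(tile + rng.normal(0, 5, tile.shape), 0, 255).astype(np.uint8)

        for quality in (95, 80, 60, 40):
            buf = io.BytesIO()
            Image.fromarray(tile).save(buf, format="JPEG", quality=quality)
            decoded = np.asarray(Image.open(io.BytesIO(buf.getvalue())).convert("RGB"))
            print(f"quality {quality:>2}: {buf.tell() / 1024:6.1f} KiB, "
                  f"PSNR {psnr(tile, decoded):5.1f} dB")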