    Preparation and characterization of magnetite (Fe3O4) nanoparticles By Sol-Gel method

    Magnetite (Fe3O4) nanoparticles were successfully synthesized and annealed under vacuum at different temperatures. The Fe3O4 nanoparticles, prepared via a sol-gel assisted method and annealed at 200-400ºC, were characterized by Fourier Transform Infrared Spectroscopy (FTIR), X-ray Diffraction (XRD), Field Emission Scanning Electron Microscopy (FESEM) and Atomic Force Microscopy (AFM). The XRD results indicate the presence of Fe3O4 nanoparticles, and the mean particle size calculated with the Scherrer formula is in the range of 2-25 nm. The FESEM results show that the particles annealed at 400ºC are more spherical and partially agglomerated, while the EDS results indicate the presence of Fe3O4 by showing the Fe-O group of elements. AFM was used to analyze the 3D profile and roughness of the sample; the Fe3O4 nanoparticles have a minimum diameter of 79.04 nm, which is in agreement with the FESEM results. According to some literature, the synthesis of Fe3O4 nanoparticles using FeCl3 and FeCl2 has in many cases not been achieved, but this research obtained Fe3O4 nanoparticles, based on the characterization results.
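
The crystallite-size estimate from XRD mentioned in this abstract is commonly obtained from the Scherrer relation, D = K·λ / (β·cos θ). The following is a minimal sketch of that calculation, not the authors' procedure. The function name, the shape factor K = 0.9, the Cu K-alpha wavelength, and the peak position and width in the example call are all illustrative assumptions; none of these values is given in the abstract, and no correction for instrumental broadening is applied.

```python
import math


def scherrer_size_nm(wavelength_nm, fwhm_deg, two_theta_deg, shape_factor=0.9):
    """Estimate crystallite size D = K * lambda / (beta * cos(theta)).

    wavelength_nm : X-ray wavelength in nm
    fwhm_deg      : peak full width at half maximum, in degrees of 2-theta
    two_theta_deg : peak position, in degrees of 2-theta
    shape_factor  : dimensionless K (0.9 is a common assumed value)
    """
    beta = math.radians(fwhm_deg)            # peak width must be in radians
    theta = math.radians(two_theta_deg / 2)  # Bragg angle is half of 2-theta
    return shape_factor * wavelength_nm / (beta * math.cos(theta))


# Hypothetical input values, chosen only to exercise the function:
# Cu K-alpha radiation and a 0.5 degree wide peak near 35.5 degrees.
example_size = scherrer_size_nm(0.15406, 0.5, 35.5)
print(f"Estimated crystallite size: {example_size:.1f} nm")
```

Because β appears in the denominator, broader peaks give smaller size estimates, which is why the formula is sensitive to how the peak width is measured.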

    Novel Dynamic Partial Reconfiguration Implementation of K-Means Clustering on FPGAs: Comparative Results with GPPs and GPUs

    K-means clustering has been widely used to process large datasets in many fields of study. Advances in data collection techniques have been generating enormous amounts of data, leaving scientists with the challenging task of processing them. Using General Purpose Processors (GPPs) to process large datasets may take a long time; therefore, many acceleration methods have been proposed in the literature to speed up the processing of such large datasets. In this work, a parameterized implementation of the K-means clustering algorithm on a Field Programmable Gate Array (FPGA) is presented and compared with a previous FPGA implementation as well as recent implementations on Graphics Processing Units (GPUs) and GPPs. The proposed FPGA implementation achieves a higher speedup than previous GPP and GPU implementations (two orders and one order of magnitude, respectively). In addition, the FPGA implementation is more energy efficient than the GPP and GPU implementations (615x and 31x, respectively). Furthermore, three novel implementations of K-means clustering based on dynamic partial reconfiguration (DPR) are presented, offering a high degree of flexibility to dynamically reconfigure the FPGA. The DPR implementations achieved speedups in reconfiguration time of between 4x and 15x.
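
For readers unfamiliar with the algorithm being accelerated, the following is a minimal software sketch of the standard K-means iteration (Lloyd's algorithm) in plain Python. It is not the FPGA design described in the abstract. The function names, the squared Euclidean distance, the random initialisation and the sample points are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumed initialisation: pick k distinct input points as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_centroids, sample_labels = kmeans(sample_points, k=2)
print(sample_labels)
```

The assignment step computes one distance per point and centroid, and those distances are independent of each other, which is the part of the algorithm that lends itself to parallel hardware.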


    This book is divided into different research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.

    Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis

    The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years, especially after the completion of the human genome project. The advent of Microarray technology in the 1990s introduced the concept of the high throughput experiment: a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray analysis requires high computational power to extract biological relevance from high dimensional data. Current general purpose processors (GPPs) have been unable to keep up with the increasing computational demands of Microarrays and have reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a viable low power solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods for the analysis of Microarray data, in an effort to investigate the viability of the technology as an efficient, low power, and economic solution. Three widely used methods have been selected for the FPGA implementations: one is the unsupervised K-means clustering algorithm, while the other two are supervised classification methods, namely K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA, captured in Verilog HDL, whose performance is compared with that of equivalent implementations running on GPPs.
In addition to acceleration, the benefits of the dynamic partial reconfiguration (DPR) capability of modern Xilinx FPGAs are investigated with reference to the aforementioned data mining methods. The implementation of K-means clustering on FPGA using a non-DPR design flow outperformed equivalent implementations on GPP and GPU in terms of speed-up by two orders and one order of magnitude, respectively, while being eight times more power efficient than the GPP implementation and four times more than the GPU implementation. As for energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Moreover, the FPGA implementation outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increased. Additionally, the DPR implementations of K-means clustering have shown speed-ups in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier, A1 and A2, have been implemented on FPGA. The K-NN implementation based on the A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation, whereas the A2 architecture achieved a speed-up of ~68x. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of the data was increased. In addition, the DPR implementations of the K-NN classifier have achieved speed-ups in reconfiguration time of between ~4x and 10x over full chip reconfiguration when reconfiguring a portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA, whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x.
As for the DPR implementation of the SVM classifier, it has shown a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core to form a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarray data analysis.
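
As a point of reference for the K-NN classifier discussed in this abstract, the following is a minimal software sketch of K-nearest-neighbour classification in plain Python. It does not reflect the A1 or A2 hardware architectures, whose details are not given here. The function name, the squared Euclidean distance, the majority vote, and the toy samples and labels are assumptions made for illustration.

```python
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # One distance per training sample; each is independent of the others.
    scored = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0])  # nearest first
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples standing in for expression profiles.
toy_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_labels = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]

print(knn_classify(toy_points, toy_labels, (0.2, 0.1)))
print(knn_classify(toy_points, toy_labels, (5.5, 5.5)))
```

The cost of the distance loop grows with both the number of training samples and the number of features per sample, which is consistent with the abstract's interest in how performance scales with data dimensionality.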

    Homology sequence analysis using GPU acceleration

    A number of problems in the fields of bioinformatics, systems biology and computational biology require abstracting physical entities into mathematical or computational models. In such studies, the computational paradigms often involve algorithms that can be solved by the Central Processing Unit (CPU). Historically, those algorithms have benefited from advances in the serial processing capabilities of individual CPU cores. However, that growth has slowed down in recent years, as scaling out CPUs has been shown to be both cost-prohibitive and insecure. To overcome this problem, parallel computing approaches that employ the Graphics Processing Unit (GPU) have gained attention as a complement to or replacement for traditional CPU approaches. The premise of this research is to investigate the applicability of various parallel computing platforms to several problems in the detection and analysis of homology in biological sequences. I hypothesize that by exploiting the sheer amount of computational power and sequencing data available, it is possible to deduce information from raw sequences without supplying the underlying prior knowledge needed to come up with an answer. I have developed tools to perform such analysis at scales that are traditionally unattainable with general-purpose CPU platforms. I have developed a method to accelerate sequence alignment on the GPU, and I used the method to investigate whether the Operational Taxonomic Unit (OTU) classification problem can be improved with such a sheer amount of computational power. I have also developed a method to accelerate pairwise k-mer comparison on the GPU, and I used the method to further develop PolyHomology, a framework to scaffold shared sequence motifs across large numbers of genomes to illuminate the structure of the regulatory network in yeasts.
The results suggest that such an approach to heterogeneous computing could help to answer questions in biology and is a viable path to new discoveries in the present and the future.
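
To make the idea of pairwise k-mer comparison concrete, the following is a minimal CPU-side sketch in plain Python. The abstract does not say which similarity measure the GPU method uses, so the Jaccard index over k-mer sets is an assumption chosen for illustration, as are the function names and the example sequences.

```python
def kmer_set(sequence, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k=3):
    """Shared k-mers divided by total distinct k-mers (assumed measure)."""
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    if not kmers_a and not kmers_b:
        return 0.0  # both sequences are shorter than k
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def pairwise_similarities(sequences, k=3):
    """Compare every unordered pair; each comparison is independent."""
    return {
        (i, j): jaccard_similarity(sequences[i], sequences[j], k)
        for i in range(len(sequences))
        for j in range(i + 1, len(sequences))
    }


# Hypothetical short sequences, for illustration only.
example_pairs = pairwise_similarities(["ACGTAC", "ACGTAC", "TTTTTT"])
print(example_pairs)
```

The number of pairs grows quadratically with the number of sequences and no pair depends on another, which is the kind of workload that maps naturally onto many parallel GPU threads.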


    The "data explosion" since the era of the Internet has increased data size tremendously, from several hundred Megabytes to millions of Terabytes. Large amounts of data may not fit into memory, and a proper way of handling and processing the data is necessary. Besides, analyses of such large-scale data require complex and time-consuming algorithms. On the other hand, humans play an important role in steering and driving the data analysis, yet people often have a hard time getting an overview of the data or knowing which analysis to run. Sometimes they may not even know where to start. There is a huge gap between the data and understanding. An intuitive way to facilitate data analysis is to visualize it. Visualization is understandable and illustrative, but using it to support rapid exploration of large-scale datasets has long been a challenge. In this dissertation, we aim to facilitate efficient visual data exploration of large-scale datasets from two perspectives: efficiency and interaction. The former concerns how efficiently users can understand the data, which depends on various factors such as how fast data is processed and how it is presented, while the latter focuses more on the users: how they deal with the data and why they interact with the system in a particular way. In order to improve the efficiency of data exploration, we have looked into two steps in the visualization pipeline: rendering and processing (computations). We first address visualization rendering of large datasets through a thorough evaluation of web-based visualization performance. We evaluate and understand the page loading effects of Scalable Vector Graphics (SVG), a popular image format for interactive visualization in web browsers. To understand the scalability of individual elements in SVG-based visualization, we conduct performance tests on different types of charts, in different phases of the rendering process.
From the results, we derive optimization techniques and guidelines to achieve better performance when rendering SVG visualizations. Secondly, we present a pure browser-based distributed computing framework (VisHive) that exploits computational power from co-located idle devices for visualization. The VisHive framework speeds up web-based visualization, which was originally designed for a single computer and cannot make use of additional computational resources on the client side. It takes advantage of the multiple devices that today's users often have access to. VisHive constructs visualization applications that can transparently connect multiple devices into an ad-hoc cluster for local computation. It requires no specific software to be downloaded for setup. To achieve a more interactive data analysis process, we first propose a proactive visual analytics system (DataSite) that enables users to analyze the data smoothly with a list of pre-defined algorithms. DataSite provides results by selecting and executing computations automatically on the server side. It utilizes computational resources exhaustively during data analysis to reduce the burden of human thinking. Analysis results identified by these background processes are surfaced as status updates in a feed on the front-end, akin to posts in a social media feed. DataSite effectively turns data analysis into a conversation between the user and the computer, thereby reducing the cognitive load and domain knowledge requirements on users. Next, we apply the concept of proactive data analysis to genomic data, and explore how to improve data analysis through adaptive computations in the bioinformatics domain. We build Epiviz Feed, a web application that supports proactive visual and statistical analysis of genomic data. It addresses common and popular biological questions that may be asked by the analyst, and shortens the time needed to process and analyze the data through automatic computations.
We further present a computational steering mechanism for visual analytics that prioritizes the computations performed on the dataset by leveraging the analyst's navigational behavior in the data. The web-based system, called Sherpa, provides computational modules for genomic data analysis, where independent algorithms calculate test statistics relevant to biological inferences about gene regulation in various tumor types and their corresponding normal tissues.
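
One way to picture prioritizing computations by navigational behavior is a priority queue keyed on how close each pending computation's data region is to the region the analyst is currently viewing. The sketch below illustrates that general idea only; it is not Sherpa's actual mechanism, which the abstract does not describe. The function names, the (start, end) interval representation and the distance-based priority are all hypothetical.

```python
import heapq


def interval_gap(a, b):
    """Distance between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def schedule(tasks, viewed_region):
    """Order (name, region) tasks so those nearest the viewed region run first."""
    # The index breaks ties so that equally close tasks keep their input order.
    heap = [
        (interval_gap(region, viewed_region), index, name)
        for index, (name, region) in enumerate(tasks)
    ]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical pending computations, each tied to a coordinate range.
pending = [
    ("far", (1000, 2000)),
    ("near", (150, 250)),
    ("inside", (90, 120)),
]
run_order = schedule(pending, viewed_region=(100, 200))
print(run_order)
```

Re-running `schedule` whenever the viewed region changes would reorder the remaining work, which is one simple way a system could follow the analyst's navigation.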

    Using machine learning to support better and intelligent visualisation for genomic data

    Massive amounts of genomic data are being created with the advent of Next Generation Sequencing technologies. Great technological advances in methods of characterising human diseases, including genetic and environmental factors, offer a great opportunity to understand these diseases and to find new diagnoses and treatments. Translating medical data is becoming an increasingly rich and challenging task. Visualisation can greatly aid the processing and integration of complex data. Genomic data visual analytics is rapidly evolving alongside advances in high-throughput technologies such as Artificial Intelligence (AI) and Virtual Reality (VR). Personalised medicine requires new genomic visualisation tools that can efficiently extract knowledge from genomic data and speed up expert decisions about the best treatment for an individual patient's needs. However, meaningful visual analysis of such large genomic data remains a serious challenge. Visualising these complex genomic data requires more than simply plotting the data; it should also lead to better decisions. Machine learning has the ability to make predictions and aid decision-making. Machine learning and visualisation are both effective ways to deal with big data, but they serve different purposes. Machine learning applies statistical learning techniques to automatically identify patterns in data and make highly accurate predictions, while visualisation leverages the human perceptual system to interpret and uncover hidden patterns in big data. Clinicians, experts and researchers intend to use both visualisation and machine learning to analyse their complex genomic data, but it is a serious challenge for them to understand and trust machine learning models in the high-stakes medical industry. The main goal of this thesis is to study the feasibility of intelligent and interactive visualisation combined with machine learning algorithms for medical data analysis.
A prototype has also been developed to illustrate the concept that visualising genomic data from childhood cancers in meaningful and dynamic ways could lead to better decisions. Machine learning algorithms are used and illustrated while visualising the cancer genomic data in order to provide highly accurate predictions. This research could open a new and exciting path to discovery for disease diagnostics and therapies.

    Proteomic analysis of virus-like particles carrying melanoma antigens

    Most cell types release extracellular vesicles: membrane-enclosed carriers of proteins, nucleic acids and other biomolecules that take part in transporting these molecules between cells and tissues and are involved in intercellular communication, coagulation, tumorigenesis and the immune response. The term extracellular vesicle is used for any secreted vesicle, including exosomes, microvesicles and apoptotic bodies. Microvesicles, sometimes also called microparticles, are particles 100-1000 nm in diameter that are derived from the plasma membrane and form by outward budding of that membrane. Retroviruses bud in a manner similar to extracellular microvesicles: the viral Gag protein molecule induces the spontaneous formation of microvesicles or virus-like particles in mammalian cells. Virus-like particles can be used to present foreign epitopes to the immune system. Gag-based virus-like particles are used as vaccines because they stimulate humoral and cellular immune responses (Kurg et al., 2016). Melanoma-associated antigens (MAGE), which induce the formation of cytotoxic T lymphocytes, belong to the cancer-testis antigen family and were discovered in studies of melanoma cells. The proteins MAGEA10, MAGEA4, MART1, TRP1 and MCAM are expressed in many cancer types and are therefore considered potential targets for the development of cancer vaccines and immunotherapy. In the study "Biochemical and proteomic characterization of retrovirus Gag based microparticles carrying melanoma antigens" (Kurg et al., 2016), virus-like particles were generated using the MLV Gag protein. The melanoma antigens MAGEA4, MAGEA10, MART1, TRP1 and MCAM were co-expressed with the MLV Gag protein in mouse fibroblasts, and the relative abundance of the proteins within the resulting extracellular particles was measured.
The aim of this bachelor's thesis is to find the relationships between the aforementioned proteins and the relative abundance of other proteins in microvesicles, using LFQ intensity analysis with the packages R-studio and SPSS. The analysis uses data obtained from the study "Biochemical and proteomic characterization of retrovirus Gag based microparticles carrying melanoma antigens" (Kurg et al., 2016). The thesis was prepared at the Institute of Technology of the University of Tartu.