    Achieving Functional Correctness in Large Interconnect Systems.

    In today's semiconductor industry, large chip multiprocessors and systems-on-chip are being developed, integrating a large number of components on a single chip. The sheer size of these designs and the intricacy of the communication patterns they exhibit have propelled the development of network-on-chip (NoC) interconnects as the basis for the communication infrastructure in these systems. Faced with the interconnect's growing size and complexity, several challenges hinder its effective validation. During the interconnect's development, the functional verification process relies heavily on the use of emulation and post-silicon validation platforms. However, detecting and debugging errors on these platforms is a difficult endeavour due to the limited observability, and in turn the low verification capability, they provide. Additionally, given the inherent incompleteness of design-time validation efforts, the potential for design bugs escaping into the interconnect of a released product is also a concern, as these bugs can threaten the viability of the entire system. This dissertation provides solutions to enable the development of functionally correct interconnect designs. We first address the challenges encountered during design-time verification by providing two complementary mechanisms that allow emulation and post-silicon verification frameworks to capture a detailed overview of the functional behaviour of the interconnect. Our first solution re-purposes the contents of in-flight traffic to log debug data from the interconnect's execution. This approach enables the validation of the interconnect using synthetic traffic workloads, while attaining over 80% observability of the routes followed by packets and capturing valuable debugging information. We also develop an alternative mechanism that boosts observability by taking periodic snapshots of execution, thus extending the verification capabilities to run both synthetic traffic and real-application workloads. The collected snapshots enhance detection and debugging support; they provide observability of over 50% of packets and reconstruct at least half of each of their routes. Moreover, we also develop error detection and recovery solutions to address the threat of design bugs escaping into the interconnect's runtime operation. Our runtime techniques can overcome communication errors without needing to store replicated copies of all in-flight packets, thereby achieving correctness at minimal area costs.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116741/1/rawanak_1.pd
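
A minimal Python sketch of the first mechanism's idea, using hypothetical packet and router classes: under a synthetic workload the payload carries no real data, so each router that forwards a packet can overwrite one payload slot with its own ID, and the destination (or an off-chip checker) reads the route back out of the payload.

# Sketch: re-purpose a synthetic packet's payload as a hop log (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class Packet:
    src: int
    dst: int
    payload: list = field(default_factory=lambda: [None] * 8)  # repurposed as hop log
    hops: int = 0

class Router:
    def __init__(self, router_id: int):
        self.router_id = router_id

    def forward(self, pkt: Packet) -> None:
        # Log this hop into the repurposed payload while slots remain.
        if pkt.hops < len(pkt.payload):
            pkt.payload[pkt.hops] = self.router_id
        pkt.hops += 1

route = [1, 5, 6, 10]          # routers a packet traverses in a toy mesh
pkt = Packet(src=1, dst=10)
for rid in route:
    Router(rid).forward(pkt)
print(pkt.payload[:pkt.hops])  # -> [1, 5, 6, 10], the observed route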

    Platform-based design, test and fast verification flow for mixed-signal systems on chip

    This research provides methodologies to enhance the design phase, from architectural space exploration and system study to verification of the whole mixed-signal system. At the beginning of the work, innovative digital IPs were designed to provide efficient signal conditioning for sensor systems-on-chip; these have been included in commercial products. The main focus then shifted to creating a re-usable and versatile test of the device after tape-out, which is close to becoming one of the major cost factors for IC companies, strongly linking it to the model's test benches to avoid re-design phases and multi-environment scenarios, and producing a very effective approach to a single, fast, and reliable multi-level verification environment. These works generated several publications in the scientific literature. The compound scenario concerning the development of sensor systems is presented in Chapter 1, together with an overview of the related market, with a particular focus on the latest MEMS and MOEMS technology devices and their applications in various segments. Chapter 2 introduces the state of the art for sensor interfaces: the generic sensor interface concept (based on sharing the same electronics among similar applications, achieving cost savings at the expense of area and performance loss) versus the Platform Based Design methodology, which overcomes the drawbacks of the classic solution by keeping generality at the highest design layers and customizing the platform for a target sensor, achieving optimized performance. An evolution of Platform Based Design, achieved by implementing the ISIF (Intelligent Sensor InterFace) platform in silicon, is then presented. ISIF is a highly configurable mixed-signal chip which allows designers to perform an effective design space exploration and to evaluate system performance directly on silicon, avoiding the critical and time-consuming analysis required by a standard platform-based approach. Chapter 3 describes the design of a smart sensor interface for conditioning next-generation MOEMS. The adoption of a new, high-performance and highly integrated technology allows us to integrate not only a versatile platform but also a powerful ARM processor and various IPs, making the platform usable not only for conditioning but also as a processing unit for the application. The chapter describes the various blocks, with particular emphasis on the IP developed to grant the highest degree of flexibility with the minimum area occupation. The architectural space evaluation and application prototyping with ISIF have enabled an effective, rapid, and low-risk development of a new high-performance platform: a flexible sensor system for MEMS and MOEMS monitoring and conditioning. The platform has been designed to cover very challenging test benches, such as a laser-based projector device. In this way the platform can effectively handle not only the sensor but also the whole system built around it, reducing the need for further electronics and providing an efficient test bench for the algorithms developed to drive the system. The high costs in ASIC development are mainly related to re-design phases caused by missing complete top-level tests, since the analog and digital design flows are verified separately.
Starting from these considerations, the last chapter presents a complete test environment for complex mixed-signal chips. A semi-automatic VHDL-AMS flow to provide a fully matching top level is described, and an evolution toward fast self-checking test development for both model and real-chip verification is then proposed. Through the introduction of a Python interface, the designer can easily perform interactive tests covering the verification of all features (e.g., calibration and trimming) during the design phase and check them all with the same environment on the real chip after tape-out. This strategy has been tested on a 3D gyroscope for consumer applications, in collaboration with SensorDynamics AG.
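
The single multi-level verification environment can be sketched compactly. Below is a minimal Python sketch, with hypothetical driver classes and register names: the same self-checking trimming test runs unchanged against the simulated model or the taped-out chip, behind a common driver interface.

# Sketch: one self-checking test reused on the VHDL-AMS model and on silicon.
# All class, register, and criterion names are hypothetical.
from abc import ABC, abstractmethod

class DeviceDriver(ABC):
    @abstractmethod
    def write(self, reg: str, value: int) -> None: ...
    @abstractmethod
    def read(self, reg: str) -> int: ...

class ModelDriver(DeviceDriver):
    """Backs register accesses with the simulated model."""
    def __init__(self):
        self.regs = {"TRIM_OSC": 0}
    def write(self, reg, value): self.regs[reg] = value
    def read(self, reg): return self.regs[reg]

class ChipDriver(DeviceDriver):
    """Would talk to the real device over a bench interface."""
    def write(self, reg, value): raise NotImplementedError("bench-specific")
    def read(self, reg): raise NotImplementedError("bench-specific")

def trim_error(dut: DeviceDriver, code: int) -> int:
    dut.write("TRIM_OSC", code)
    return abs(dut.read("TRIM_OSC") - 7)   # placeholder error measure

def trimming_test(dut: DeviceDriver) -> bool:
    """Self-checking trimming test: sweep the trim code, keep the best one."""
    best = min(range(16), key=lambda code: trim_error(dut, code))
    dut.write("TRIM_OSC", best)
    return trim_error(dut, best) <= 1      # pass/fail criterion

print(trimming_test(ModelDriver()))        # the same test later runs on ChipDriver

Swapping ModelDriver for ChipDriver is then the only change needed to replay the same checks on the real chip after tape-out.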

    Dataflow actor network partitioning for multiple FPGAs

    A dataflow actor network represents the relations between different actors as a directed graph. It is suitable for modelling signal and video processing in software applications. In this paper, the use of the dataflow actor network is extended to the hardware implementation of streaming applications via dataflow actor network partitioning for multiple FPGAs, based on the number of cuts, connection workload, resource utilization ratio, and latency. Partitioning across multiple FPGAs is often required for implementing designs with large logic counts, for cost reduction, and for designs with multiple clock and power domains. The motivation for using the dataflow actor network is that it closely resembles the structural view and the interconnections of a design at the architecture level, a representation well suited to graph partitioning algorithms. The KL algorithm, GA, PSO, SA, and WOA are used for single-objective partitioning, while MOPSO, MOSA, and MOWOA are used for multi-objective partitioning. The objective of this study is to develop a partitioning algorithm suitable for use on dataflow actor networks and to determine the appropriate partitioning criteria. Results showed that SA performs better than the other algorithms for single-objective partitioning; for multi-objective partitioning, MOPSO performs better for small designs, while MOSA performs better for larger designs.
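
As a rough illustration of the single-objective case, the following Python sketch applies simulated annealing to a toy actor graph, minimising only the number of cut edges between two FPGAs; a real flow would also score connection workload, resource utilization ratio, and latency, and the graph here is hypothetical.

# Sketch: simulated annealing for two-FPGA partitioning of a toy actor network,
# minimising cut edges only (real criteria also include workload, area, latency).
import math, random

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4), (0, 5)]   # actor connections
n_actors = 6

def cut_size(part):
    return sum(1 for a, b in edges if part[a] != part[b])

def anneal(t=10.0, cooling=0.95, steps=2000):
    part = [random.randint(0, 1) for _ in range(n_actors)]
    best, best_cost = part[:], cut_size(part)
    for _ in range(steps):
        cand = part[:]
        cand[random.randrange(n_actors)] ^= 1              # move one actor
        delta = cut_size(cand) - cut_size(part)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            part = cand
            if cut_size(part) < best_cost:
                best, best_cost = part[:], cut_size(part)
        t *= cooling                                       # cool down
    return best, best_cost

random.seed(1)
print(anneal())   # a 0/1 FPGA assignment per actor and its cut size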

    Behaviour analysis in binary SoC data


    Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

    Integrated circuits and systems can be found almost everywhere in today's world. As their use increases, they need to be made safer and more performant to meet current demands in processing power. FPGA-integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today's vital challenges lies in updating existing fault-tolerance techniques for these new systems while utilizing all available processing capabilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard-level classifications (e.g., ASIL D) described in the industry safety standard ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control-flow checking method implemented in FPGA for multi-core embedded systems, called the control-flow trace checker (CFTC). CFTC uses the existing trace and debug subsystems of modern processors to rebuild their execution states. It can identify any errors in real time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target multiple independent tasks in multi-core embedded applications, as well as single-core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform's resources to protect an execution unit in its entirety. It therefore avoids undesired overheads and maintains deterministic error-detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor's resources and speed and on the software application's control-flow structure and binary characteristics.
Resumo (translated from Portuguese): Integrated circuits are present in almost all complex systems of the modern world. As their use grows, they need to become safer and more performant to meet the new demands in processing power. FPGA-integrated systems-on-chip can provide the ideal balance between performance, adaptability, and energy usage. One of the greatest challenges now is the need to update fault-tolerance techniques for these new systems while taking advantage of the latest advances in processing capability. Control-flow monitoring is one of the main mechanisms for software-level error detection in systems classified as high risk (e.g., ASIL D), as described in safety standards such as ISO-26262. These errors are known to make up the majority of the errors detected in integrated systems [5]. Although software-based monitoring methods remain the most popular [6–8], recent studies show that their additional costs, in terms of performance and area, considerably diminish their real reliability gains [9, 10]. We propose here a new control-flow monitoring method implemented in FPGA for multi-core embedded systems. This method uses trace and code-execution subsystems to reconstruct the current state of the processor, identifying errors through comparisons between different CPU execution states. We propose an implementation that weighs trade-offs in system resource usage to monitor multiple independent tasks; our approach supports the monitoring of single-core systems as well as multitasking multi-core systems. Finally, our technique is implemented entirely in hardware, avoiding software processing units that could impose undesirable costs on the application through loss of reliability. We thus propose a scalable and extensible control-flow checking mechanism for the protection of critical multi-core embedded systems.
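
The checking principle (though not the CFTC hardware itself) can be illustrated in a few lines of Python: a stream of executed basic-block addresses is compared against the set of permitted transitions determined statically from the control-flow graph. Addresses and edges below are toy values.

# Sketch: compare executed transitions against statically permitted CFG edges.
ALLOWED = {                     # basic block address -> allowed successors
    0x100: {0x110, 0x140},      # conditional branch
    0x110: {0x120},
    0x120: {0x100, 0x140},      # loop back-edge or exit
    0x140: set(),               # program exit
}

def check_trace(trace):
    """Flag the first transition not present in the static CFG."""
    for src, dst in zip(trace, trace[1:]):
        if dst not in ALLOWED.get(src, set()):
            return f"control-flow error: {hex(src)} -> {hex(dst)}"
    return "trace OK"

print(check_trace([0x100, 0x110, 0x120, 0x140]))   # trace OK
print(check_trace([0x100, 0x110, 0x140]))          # error: 0x110 -> 0x140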

    Optimization of Energy- and Power-Driven Architecture Exploration for Multicore and Heterogeneous Systems-on-Chip

    The contribution of this work builds on top of established virtual prototype platforms to improve both SoC design quality and productivity. Initially, an automatic system-level power estimation framework was developed to address the critical issue of early power estimation in SoC design. The estimation framework models the static and dynamic power consumption of the hardware components. These models are created from the normalized values of the basic design components of the SoC, obtained through one-time power simulation of RTL hardware models. The framework allows dynamic technology-node reconfiguration for the power estimation models, and its instantaneous power reporting aids the detection of possible hotspots early in the design process. With this additional data, in conjunction with the steadily growing design space of complex heterogeneous SoCs, finding the right parameter configuration is a challenging and laborious task for a system-level designer. This work addresses this bottleneck by optimizing the design space exploration (DSE) process for MPSoC design. An automatic DSE framework for virtual platforms (VPs) was developed which is flexible and allows the selection of an optimal parameter configuration without pre-existing knowledge. To reduce exploration time, the framework is equipped with several multi-objective optimization techniques based on simulated annealing and a genetic algorithm. Lastly, to aid HW/SW partitioning at system level, a flexible and automated workflow (SW2TLM) is presented. It allows the designer to explore various possible partitioning scenarios without going into the depths of hardware architecture complexity and software integration. The framework generates system-level hardware accelerators from the corresponding functionality encoded in the software code and integrates them into the VP. The power consumption and speedup of the acceleration are reported to the designer, which further increases the quality and productivity of the development process toward the final architecture. The presented tools are evaluated using a state-of-the-art VP for a range of single- and multi-core applications. Considering the energy-delay product, a reduction in exploration time of approximately 62% (worst case) was recorded while maintaining optimal-parameter accuracy of 90% compared to previous techniques, and SW2TLM further increases exploration versatility by combining modern high-level synthesis with system-level architectural exploration.
German abstract (translated): The contribution of this work builds on the established concept of virtual prototype (VP) platforms to improve the quality and productivity of the design process. First, an automatic system-level framework was developed to enable power estimation for SoC designs at a much earlier stage of development. To this end, the static and dynamic power-consumption shares of individual hardware elements are expressed by an abstract model. The framework supports dynamic adaptation of the technology node as well as the integration of new power models for third-party components. The continuous capture of power-consumption characteristics and their graphical display in the user interface additionally support the early identification of possible hotspots. Given this additional data, combined with the steadily growing design space of complex SoCs, identifying the right parameter configuration is a time-intensive task. The presented concepts allow a higher degree of automation of the exploration process. Multi-dimensional optimization techniques based on simulated annealing and genetic algorithms enable suitable configurations to be identified without prior knowledge or experience. Finally, a flexible and automated workflow was developed to support HW/SW partitioning at the system level. It allows the designer to examine various possible partitioning scenarios without delving into the complexity of the hardware architecture and the software integration. The framework generates abstract accelerator models from the corresponding software functions and integrates them seamlessly into the executable VP. Detailed data on the power consumption, speedup factor, and communication overhead of the partitioning are collected and made available to the designer, which further increases the quality and productivity of the development process. The presented tools are evaluated with a modern VP for various software applications. Considering the energy-delay product, a reduction in exploration time of more than 62% at 90% parameter accuracy was observed. Building on this, the automated examination of different HW/SW partitionings eases the development of heterogeneous architectures by combining modern HLS with system-level architecture exploration.
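
The component-wise power model lends itself to a short sketch. Below is a minimal Python version of the idea: per-component static and dynamic shares, normalized as if from one-time RTL power simulations and scaled by a reconfigurable technology node. All component names and figures are hypothetical.

# Sketch: per-component static + dynamic power, normalised from one-time RTL
# simulations and scaled by a reconfigurable technology node (all values toy).
TECH_SCALE = {45: 1.00, 28: 0.55, 16: 0.30}   # relative scaling per node (nm)

COMPONENTS = {
    # name: (static power in mW, dynamic mW per MHz at full activity)
    "cpu_core": (12.0, 0.080),
    "l2_cache": (8.0, 0.030),
    "noc":      (3.0, 0.015),
}

def power_mw(activity, freq_mhz=1000, node=28):
    """Instantaneous total power from per-component activity factors in [0, 1]."""
    scale = TECH_SCALE[node]
    return sum(scale * (p_static + activity.get(name, 0.0) * p_dyn * freq_mhz)
               for name, (p_static, p_dyn) in COMPONENTS.items())

# Periodic reporting during simulation can flag hotspots early:
print(power_mw({"cpu_core": 0.9, "l2_cache": 0.4, "noc": 0.2}))   # mW at 28 nm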

    Analysis and Mitigation of Remote Side-Channel and Fault Attacks on the Electrical Level

    The ongoing miniaturization of integrated circuits is approaching physical limits, with single-atom transistors marking one possible lower bound on feature sizes. Moreover, manufacturing the latest generations of microchips is nowadays financially feasible only for large multinational companies. Owing to this development, miniaturization is no longer the driving force behind further increases in the performance of electronic components. Instead, classical computer architectures with generic processors are evolving into heterogeneous systems with high parallelism and specialized accelerators. In these heterogeneous systems, however, protecting private data against attackers becomes increasingly difficult. New kinds of hardware components, new kinds of applications, and generally increased complexity are some of the factors that make security in such systems a challenge. Cryptographic algorithms are often truly secure only under certain assumptions about the attacker. For example, it is often assumed that the attacker can access only the inputs and outputs of a module, while internal signals and intermediate values remain hidden. In real implementations, however, side-channel and fault attacks expose the limits of this so-called black-box model. While in side-channel attacks the attacker exploits data-dependent measurable quantities such as power consumption or electromagnetic emanation, fault attacks actively interfere with the computation and use the faulty output values to recover the secret data. This class of implementation attacks was originally considered only in the context of a local attacker with access to the target device. However, attacks based on timing certain memory accesses have already shown that the threat also extends to attackers with remote access. This work addresses the threat of remote side-channel and fault attacks, which is closely tied to the development toward more heterogeneous systems. One example of novel hardware in heterogeneous computing is field-programmable gate arrays (FPGAs), with which almost arbitrary circuits can be realized in programmable logic. These logic chips are already deployed as accelerators both in the cloud and in end-user devices. However, it has been shown how the flexibility of these accelerators can be exploited to implement sensors that estimate the supply voltage, and how a particular way of activating large amounts of logic can disturb computations in other circuits to mount fault attacks. This threat is analyzed further here, for example by extending existing attacks, and strategies to protect against it are developed.
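
The side-channel principle can be demonstrated on synthetic data. The Python sketch below runs a correlation attack with a toy S-box and a Hamming-weight leakage model standing in for the data-dependent supply-voltage measurements an FPGA sensor would collect; it illustrates the attack class, not the specific attacks developed in the thesis.

# Sketch: correlation attack on synthetic traces. A Hamming-weight model of a
# toy S-box output is correlated against noisy "measurements" to rank key guesses.
import random

random.seed(0)
SBOX = list(range(256))
random.shuffle(SBOX)                            # toy substitution box
HW = [bin(x).count("1") for x in range(256)]    # Hamming weights 0..8

SECRET = 0x3C
plaintexts = [random.randrange(256) for _ in range(500)]
# Simulated leakage: HW of the S-box output plus Gaussian noise, standing in
# for the data-dependent voltage fluctuations a remote sensor would observe.
traces = [HW[SBOX[p ^ SECRET]] + random.gauss(0, 1) for p in plaintexts]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# The key guess whose predicted leakage best matches the traces wins.
best = max(range(256),
           key=lambda k: pearson([HW[SBOX[p ^ k]] for p in plaintexts], traces))
print(hex(best))   # recovers 0x3c with high probability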

    Semantic Biclustering

    Tato disertačnĂ­ prĂĄce se zaměƙuje na problĂ©m hledĂĄnĂ­ interpretovatelnĂœch a prediktivnĂ­ch vzorĆŻ, kterĂ© jsou vyjĂĄdƙeny formou dvojshlukĆŻ, se specializacĂ­ na biologickĂĄ data. PrezentovanĂ© metody jsou souhrnně označovĂĄny jako sĂ©mantickĂ© dvojshlukovĂĄnĂ­, jednĂĄ se o podobor dolovĂĄnĂ­ dat. TermĂ­n sĂ©mantickĂ© dvojshlukovĂĄnĂ­ je pouĆŸit z toho dĆŻvodu, ĆŸe zohledƈuje proces hledĂĄnĂ­ koherentnĂ­ch podmnoĆŸin ƙádkĆŻ a sloupcĆŻ, tedy dvojshlukĆŻ, v 2-dimensionĂĄlnĂ­ binĂĄrnĂ­ matici a zĂĄrove ƈ bere takĂ© v potaz sĂ©mantickĂœ vĂœznam prvkĆŻ v těchto dvojshlucĂ­ch. Ačkoliv byla prĂĄce motivovĂĄna biologicky orientovanĂœmi daty, vyvinutĂ© algoritmy jsou obecně aplikovatelnĂ© v jakĂ©mkoli jinĂ©m vĂœzkumnĂ©m oboru. Je nutnĂ© pouze dodrĆŸet poĆŸadavek na formĂĄt vstupnĂ­ch dat. DisertačnĂ­ prĂĄce pƙedstavuje dva originĂĄlnĂ­ a v tomto ohledu i zĂĄkladnĂ­ pƙístupy pro hledĂĄnĂ­ sĂ©mantickĂœch dvojshlukĆŻ, jako je Bicluster enrichment analysis a Rule a tree learning. JelikoĆŸ tyto metody nevyuĆŸĂ­vajĂ­ vlastnĂ­ hierarchickĂ© uspoƙádĂĄnĂ­ termĆŻ v danĂœch ontologiĂ­ch, obecně je běh těchto algoritmĆŻ dlouhĂœ čin mĆŻĆŸe dochĂĄzet k indukci hypotĂ©z s redundantnĂ­mi termy. Z toho dĆŻvodu byl vytvoƙen novĂœ operĂĄtor zjemněnĂ­. Tento operĂĄtor byl včleněn do dobƙe znĂĄmĂ©ho algoritmu CN2, kde zavĂĄdĂ­ dvě redukčnĂ­ procedury: Redundant Generalization a Redundant Non-potential. Obě procedury pomĂĄhajĂ­ dramaticky proƙezat prohledĂĄvanĂœ prostor pravidel a tĂ­m umoĆŸĆˆujĂ­ urychlit proces indukce pravidel v porovnĂĄnĂ­ s tradičnĂ­m operĂĄtorem zjemněnĂ­ tak, jak je pĆŻvodně prezentovĂĄn v CN2. CelĂœ algoritmus spolu s redukčnĂ­mi metodami je publikovĂĄn ve formě R balííčku, kterĂœ jsme nazvali sem1R. Abychom ukĂĄzali i moĆŸnost praktickĂ©ho uĆŸitĂ­ metody sĂ©mantickĂ©ho dvojshlukovĂĄnĂ­ na reĂĄlnĂœch biologickĂœch problĂ©mech, v disertačnĂ­ prĂĄci dĂĄle popisujeme a specificky upravujeme algoritmus sem1R pro dv+ Ășlohy. ZaprvĂ©, studujeme praktickou aplikaci algoritmu sem1R v analĂœze E-3 ubikvitin ligĂĄzy v trĂĄvicĂ­ soustavě s ohledem na potenciĂĄl regenerace tkĂĄně. ZadruhĂ©, kromě objevovĂĄnĂ­ dvojshlukĆŻ v dat ech genovĂ© exprese, adaptujeme algoritmus sem1R pro hledĂĄnĂ­ potenciĂĄlne patogennĂ­ch genetickĂœch variant v kohortě pacientĆŻ.This thesis focuses on the problem of finding interpretable and predic tive patterns, which are expressed in the form of biclusters, with an orientation to biological data. The presented methods are collectively called semantic biclustering, as a subfield of data mining. The term semantic biclustering is used here because it reflects both a process of finding coherent subsets of rows and columns in a 2-dimensional binary matrix and simultaneously takes into account a mutual semantic meaning of elements in such biclusters. In spite of focusing on applications of algorithms in biological data, the developed algorithms are generally applicable to any other research field, there are only limitations on the format of the input data. The thesis introduces two novel, and in that context basic, approaches for finding semantic biclusters, as Bicluster enrichment analysis and Rule and tree learning. Since these methods do not exploit the native hierarchical order of terms of input ontologies, the run-time of algorithms is relatively long in general or an induced hypothesis might have terms that are redundant. For this reason, a new refinement operator has been invented. 
The refinement operator was incorporated into the well-known CN2 algorithm and uses two reduction procedures: Redundant Generalization and Redundant Non-potential, both of which help to dramatically prune the rule space and consequently, speed-up the entire process of rule induction in comparison with the traditional refinement operator as is presented in CN2. The reduction procedures were published as an R package that we called sem1R. To show a possible practical usage of semantic biclustering in real biological problems, the thesis also describes and specifically adapts the algorithm for two real biological problems. Firstly, we studied a practical application of sem1R algorithm in an analysis of E-3 ubiquitin ligase in the gastrointestinal tract with respect to tissue regeneration potential. Secondly, besides discovering biclusters in gene expression data, we adapted the sem1R algorithm for a different task, concretely for finding potentially pathogenic genetic variants in a cohort of patients
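
The structural core of the task, finding a coherent submatrix in a 2-dimensional binary matrix, can be sketched briefly. The Python sketch below greedily prunes rows and columns of a toy matrix until an all-ones bicluster remains; sem1R's semantic scoring over ontology terms is deliberately omitted.

# Sketch: greedily shrink a toy binary matrix to a dense (here: all-ones)
# bicluster. sem1R additionally scores candidates by ontology-term coherence.
M = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

def ones_fraction(rows, cols):
    cells = [(r, c) for r in rows for c in cols]
    return sum(M[r][c] for r, c in cells) / len(cells)

def greedy_bicluster(min_density=1.0):
    rows, cols = set(range(len(M))), set(range(len(M[0])))
    # Drop the sparsest row or column until the block is dense enough.
    while rows and cols and ones_fraction(rows, cols) < min_density:
        worst_r = min(rows, key=lambda r: ones_fraction({r}, cols))
        worst_c = min(cols, key=lambda c: ones_fraction(rows, {c}))
        if ones_fraction({worst_r}, cols) <= ones_fraction(rows, {worst_c}):
            rows.discard(worst_r)
        else:
            cols.discard(worst_c)
    return sorted(rows), sorted(cols)

print(greedy_bicluster())   # -> ([0, 3], [0, 1, 3]), an all-ones bicluster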