Associative Pattern Recognition for Biological Regulation Data
In the last decade, bioinformatics data has accumulated at an unprecedented rate, thanks to advances in sequencing technologies. Such rapid development brings both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms for biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric data, sequences, time series and textual labels).
In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition. This substantially improves the efficiency of next-generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, which improves the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition that associates histone codes with functions, and achieved improvement by up to . We also propose a novel shape-based modification pattern analysis approach and use it to successfully predict sub-classes of genes in the flowering-time category. We further propose a combination-to-combination associative pattern recognition method, which performs better than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently and provide a useful toolbox for biological regulation analysis. This dissertation presents a road map to associative pattern recognition at the genome-wide level.
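The abstract mentions searching for over-represented DNA sequences with a hash function and modulo arithmetic but gives no details. The following is only a minimal sketch of one common way to do this, a rolling base-4 hash over k-mers. It is not the dissertation's algorithm, and the names `count_kmers`, `decode` and `top_kmers` are hypothetical.

```python
from collections import Counter

# Assumed encoding: each base maps to one base-4 digit.
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k):
    """Count every k-mer in seq, keyed by its base-4 hash value."""
    counts = Counter()
    if k <= 0 or len(seq) < k:
        return counts
    modulus = 4 ** k  # taking the hash modulo 4^k drops the oldest base
    h = 0
    valid = 0  # consecutive unambiguous bases seen so far
    for ch in seq.upper():
        code = BASE.get(ch)
        if code is None:  # ambiguous base such as N: restart the window
            h, valid = 0, 0
            continue
        h = (h * 4 + code) % modulus  # rolling update in O(1) per base
        valid += 1
        if valid >= k:
            counts[h] += 1
    return counts


def decode(h, k):
    """Turn a base-4 hash value back into its k-mer string."""
    letters = "ACGT"
    out = []
    for _ in range(k):
        out.append(letters[h % 4])
        h //= 4
    return "".join(reversed(out))


def top_kmers(seq, k, n):
    """Return the n most frequent k-mers as (k-mer, count) pairs."""
    return [(decode(h, k), c) for h, c in count_kmers(seq, k).most_common(n)]
```

For example, `top_kmers("ACGTACGT", 4, 1)` reports `ACGT` with a count of 2. A real over-representation test would compare such counts against a background model, which this sketch leaves out.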
Fundamental Approaches to Software Engineering
This open access book constitutes the proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, FASE 2022, which was held during April 4-5, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 17 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. The proceedings also contain 3 contributions from the Test-Comp Competition. The papers deal with the foundations on which software engineering is built, including topics such as software engineering as an engineering discipline, requirements engineering, software architectures, software quality, model-driven development, software processes, software evolution, AI-based software engineering, and the specification, design, and implementation of particular classes of systems, such as (self-)adaptive, collaborative, AI, embedded, distributed, mobile, pervasive, cyber-physical, or service-oriented applications.
Paths and walks, forests and planes: arcadian algorithms and complexity
This dissertation is concerned with new results in the area of parameterized algorithms and complexity. We develop a new technique for hard graph problems that generalizes and unifies established methods such as Color-Coding, representative families, labelled walks and algebraic fingerprinting. At the heart of the approach lies an algebraic formulation of the problems in terms of a suitable exterior algebra. This allows us to estimate the number of simple paths of given length in directed graphs faster than before. Additionally, we give fast deterministic algorithms for finding paths of given length if the input graph contains only a few such paths. Moreover, we develop faster deterministic algorithms to find spanning trees with few leaves. We also consider the algebraic foundations of our new method. Furthermore, we investigate the fine-grained complexity of determining the precise number of forests with a given number of edges in a given undirected graph. We do so in two ways. Firstly, we complete the complexity classification of the Tutte plane, assuming the exponential time hypothesis. Secondly, we prove that counting forests with a given number of edges is at least as hard as counting cliques of a given size. Cluster of Excellence (Multimodal Computing and Interaction)
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 274, ESA 2023, Complete Volume
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and has been the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis.
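The step of reducing the data-to-data similarity matrix to a data-to-landmark matrix can be illustrated with a Nyström-style approximation. The sketch below rests on stated assumptions and is not GRAIL itself: it uses a Gaussian kernel on z-normalized series as a stand-in similarity function and takes landmark indices from the caller, whereas the abstract says GRAIL obtains landmarks through time-series clustering. All function names are hypothetical.

```python
import numpy as np


def znorm(X):
    """Z-normalize each row (one time series per row)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant series unscaled
    return (X - mu) / sd


def rbf_similarity(A, B, gamma=0.1):
    """Stand-in similarity: Gaussian kernel between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def landmark_representation(X, landmark_idx, gamma=0.1):
    """Embed n series using only an n x m data-to-landmark matrix."""
    Xn = znorm(X)
    L = Xn[landmark_idx]
    E = rbf_similarity(Xn, L, gamma)  # n x m, data to landmarks
    W = rbf_similarity(L, L, gamma)   # m x m, landmarks to landmarks
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10               # drop numerically null directions
    # Z = E V diag(vals)^(-1/2), so Z @ Z.T equals E @ pinv(W) @ E.T
    return E @ (vecs[:, keep] / np.sqrt(vals[keep]))
```

With m landmarks, only n x m similarities are computed instead of n x n, and the inner products of the returned rows approximate the full similarity matrix. On the landmark rows themselves the approximation is exact up to rounding.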
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features yielded greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volume