    Hypercomplex cross-correlation of DNA sequences

    A hypercomplex representation of DNA is proposed to facilitate the comparison of DNA sequences with fuzzy composition. Using this hypercomplex number representation, conventional sequence analysis methods such as dot matrix analysis, dynamic programming, and cross-correlation have been extended and improved to align DNA sequences with fuzzy composition. Hypercomplex dot matrix analysis provides more control over the desired degree of alignment. A new scoring system is proposed to accommodate the hypercomplex number representation of DNA and is integrated with the dynamic programming alignment method. With hypercomplex cross-correlation, the match and mismatch alignment information for two aligned DNA sequences is stored separately in the real part and the imaginary parts of the result, respectively. The mismatch alignment information is very useful for refining consensus-sequence-based motif scanning.
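The separation of match and mismatch information can be illustrated with a minimal sketch. It assumes, purely for illustration, that each base maps to one of the four quaternion units; this mapping and the names `ENCODING`, `qmul`, `qconj`, and `cross_correlation` are hypothetical and are not taken from the paper, whose encoding for fuzzy composition is not given in the abstract. Under this assumption, the product of a base with the conjugate of an identical base is purely real, while the product for two different bases is purely imaginary.

```python
from typing import Dict, Tuple

Quaternion = Tuple[float, float, float, float]  # (real, i, j, k)

# Assumed encoding for illustration only: one quaternion unit per base.
ENCODING: Dict[str, Quaternion] = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}


def qmul(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product of two quaternions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(q: Quaternion) -> Quaternion:
    """Quaternion conjugate: negate the three imaginary parts."""
    return (q[0], -q[1], -q[2], -q[3])


def cross_correlation(x: str, y: str, lag: int = 0) -> Quaternion:
    """Sum x[n] * conj(y[n + lag]) over the positions where both exist."""
    total = [0.0, 0.0, 0.0, 0.0]
    for n, base in enumerate(x):
        m = n + lag
        if 0 <= m < len(y):
            term = qmul(ENCODING[base], qconj(ENCODING[y[m]]))
            total = [t + v for t, v in zip(total, term)]
    return (total[0], total[1], total[2], total[3])
```

With this encoding, the real part of the result equals the number of matching positions at the given lag. The imaginary parts carry signed mismatch contributions, so opposite mismatches can cancel; they are mismatch information rather than a mismatch count.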

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well the biological properties of a DNA sequence are reflected in the numerical domain, and therefore how well the characteristics of special regions of interest within the sequence can be detected and identified. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. It also discusses the relative merits and demerits of the various methods, experimental results, and possible future developments. A second research focus is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system that uses DNA numerical representation methods. Despite the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of feature-based approaches decreases prediction reliability and leads to erroneous results in gene annotation. To improve prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system predicts promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. A comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
To reduce the system architecture and computational complexity relative to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system predicts promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99%, for human, Drosophila, and Arabidopsis sequences respectively, and offers reconfigurable capability compared with the existing system.
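For readers unfamiliar with the two reported metrics, the following sketch shows their standard definitions. The function names and the example counts are hypothetical and are not data from the dissertation.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real promoters that the classifier labels as promoters."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of non-promoters that the classifier labels as non-promoters."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical confusion-matrix counts, chosen only to show the arithmetic.
example_sensitivity = sensitivity(true_positives=90, false_negatives=10)
example_specificity = specificity(true_negatives=80, false_positives=20)
```

A high false positive rate lowers specificity, which is why the abstract reports both figures together.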

    Synchronization in the quaternionic Kuramoto model

    In this paper, we propose an $N$-oscillator Kuramoto model with quaternions $\mathbb{H}$. In case the coupling strength is strong, a sufficient condition of synchronization is established for general $N\geqslant 2$. On the other hand, we analyze the case when the coupling strength is weak. For $N=2$, when the coupling strength is weak (below the critical coupling strength $\lambda_c$), we show that new periodic orbits emerge near each equilibrium point, and hence a phase-locking state exists. This phenomenon is different from the real Kuramoto system, since there it is impossible to arrive at any synchronization when $\lambda<\lambda_c$. A theorem is proved which states that the closed contours form a set of "Baumkuchen" that is dense near each equilibrium point. In other words, the trajectory of the phase difference lies on a $4D$-torus surface. Therefore, this implies that the phase-locking state is Lyapunov stable but not asymptotically stable. The proof uses a new infinite buffer method ("$\delta/n$ criterion") and a Lyapunov function argument. This has been studied both analytically and numerically. For $N=3$, we consider Lion Dance flow, the analog of Cherry flow, to demonstrate that the quaternionic synchronization exists even when the coupling strength is "super weak" (when $\lambda/\omega$ [...] $3$, the stable manifold of Lion Dance flow exists, and the number of these equilibria is $\lfloor \frac{N-1}{2}\rfloor$. Therefore, we conjecture that quaternionic synchronization always exists. Comment: 35 pages, 6 figures
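The contrast with the real Kuramoto system can be checked numerically for two oscillators. The sketch below does not implement the quaternionic model. It assumes the common normalization in which each real oscillator is coupled with strength $\lambda/N$, so that for $N=2$ the phase difference $\varphi$ obeys $\dot\varphi = \Delta\omega - \lambda\sin\varphi$ and the critical coupling is $\lambda_c = |\Delta\omega|$; the paper's own normalization may differ. The function name and parameters are hypothetical.

```python
import math


def simulate_phase_difference(
    delta_omega: float,
    coupling: float,
    phi0: float = 0.0,
    dt: float = 0.01,
    steps: int = 10_000,
) -> float:
    """Integrate d(phi)/dt = delta_omega - coupling * sin(phi) with forward Euler.

    Returns the phase difference after `steps` time steps of size `dt`.
    """
    phi = phi0
    for _ in range(steps):
        phi += dt * (delta_omega - coupling * math.sin(phi))
    return phi


# Above the critical coupling the phase difference settles to a constant
# (phase locking); below it the phase difference keeps growing (drift).
locked = simulate_phase_difference(delta_omega=1.0, coupling=2.0)
drifting = simulate_phase_difference(delta_omega=1.0, coupling=0.5)
```

In the locked case the phase difference converges to $\arcsin(\Delta\omega/\lambda)$, whereas in the drifting case no equilibrium exists, which is the real-valued behaviour the abstract contrasts with the quaternionic result.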

    Study of biological sequence representations in classification problems

    The rapid progress in genome sequencing and the vast amount of easily accessible clinical data have enabled researchers to approach fundamental biological problems within the framework of machine learning. Computational biology draws on signal processing methods to analyze, interpret, and draw conclusions from genomic data (genomic signal processing). These methods require transforming the genomic information (a string of characters) into a numerical representation in the form of a single- or multidimensional array of numeric values. Among other uses, the transformation allows genomic sequences to be represented as feature vectors, thus enabling further analysis by classification and clustering algorithms.
Previous research shows that numerical transformations of DNA sequences are used mostly for similarity analyses that compute distances among the resulting vectors. However, there seems to be little evidence concerning their performance in distinguishing conserved elements along the genome. The main objective of this master's thesis is to reimplement the most widely used numerical representations and to evaluate their performance in classifying the DNA sequences known as Conserved Non-coding Elements (CNEs). CNEs constitute a class of non-protein-coding sequences that exhibit an extraordinary degree of conservation among organisms and whose disruption has been linked to developmental problems and cancer. The chosen approach represents the dataset with different sets of feature vectors, where each set originates from a different representation pipeline. The represented sequences are then used to evaluate the performance of different classifiers. The representations were categorized by the number of numerical values assigned to each character of the DNA sequence. To identify a consistent pattern in the usefulness of the different representations, we worked on a number of tasks, where each task dataset consisted of a pair of classes of genome sequences. The total number of two-class classification tasks is 26, covering a variety of settings. The classifiers' performance was found to be affected by the variation in the length of the DNA sequences, a result that led to the exploration of alternative solutions that maintain the integrity of the chosen representations. To improve the classifiers' performance, we evaluated transformations from variable-width encodings to fixed-width encodings, which are more appropriate for a variety of machine learning approaches. Such methods have been used in natural language processing and signal processing (e.g. likelihood of a sequence with respect to a set of Hidden Markov Models; similarity spaces to n-gram graph models; frequency-based transformations) but may need to be adapted to the biomedical setting of the work.
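The distinction between per-character representations and fixed-width encodings can be sketched as follows. The three functions are hypothetical illustrations and are not the representations implemented in the thesis: the integer values are arbitrary, and the k-mer frequency vector stands in for the frequency-based transformations mentioned in the abstract. The sketch assumes sequences contain only the characters A, C, G, and T.

```python
from itertools import product
from typing import List

ALPHABET = "ACGT"


def integer_encoding(seq: str) -> List[int]:
    """One numeric value per base; output length varies with the sequence."""
    return [ALPHABET.index(base) for base in seq]


def one_hot_encoding(seq: str) -> List[List[int]]:
    """Four numeric values per base; output length still varies."""
    return [[int(base == letter) for letter in ALPHABET] for base in seq]


def kmer_frequencies(seq: str, k: int = 2) -> List[float]:
    """Fixed-width vector of 4**k relative k-mer frequencies."""
    kmers = ["".join(letters) for letters in product(ALPHABET, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    windows = max(len(seq) - k + 1, 0)
    for start in range(windows):
        counts[seq[start:start + k]] += 1
    return [counts[kmer] / windows if windows else 0.0 for kmer in kmers]
```

The first two functions return outputs whose size depends on the sequence length, while `kmer_frequencies` always returns `4**k` values, which is the property that makes fixed-width encodings convenient for classifiers that expect inputs of equal dimension.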

    Artech 2008: proceedings of the 4th International Conference on Digital Arts

    ARTECH 2008 is the fourth international conference held in Portugal and Galicia on the topic of Digital Arts. It aims to promote contacts between Iberian and international contributors concerned with the conception, production, and dissemination of Digital and Electronic Art. ARTECH brings the scientific, technological, and artistic community together, promoting interest in digital culture and its intersection with art and technology as an important research field, a common space for discussion, an exchange of experiences, a forum for emerging digital artists, and a way of understanding and appreciating new forms of cultural expression. Hosted by the Portuguese Catholic University's School of Arts (UCP-EA) in the city of Porto, ARTECH 2008 aligns with the main commitment of the Research Center for Science and Technology of the Arts (CITAR): to promote knowledge in the field of the Arts through research and development within UCP-EA and together with the local and international community. The main areas proposed for the conference were related to sound, image, video, music, multimedia, and other new media topics, in the context of the emerging practice of artistic creation. Although not exclusive, the main topics of the conference are usually: Art and Science; Audio-Visual and Multimedia Design; Creativity Theory; Electronic Music; Generative and Algorithmic Art; Interactive Systems for Artistic Applications; Media Art History; Mobile Multimedia; Net Art and Digital Culture; New Experiences with New Media and New Applications; Tangible and Gesture Interfaces; Technology in Art Education; Virtual Reality and Augmented Reality. The contribution from the international community was extremely gratifying, resulting in the submission of 79 original works (Long Papers, Short Papers, and installation proposals) from 22 countries.
Our Scientific Committee reviewed these submissions thoroughly, resulting in a 73% acceptance ratio for the diverse and promising body of work presented in this book of proceedings. This compilation of articles provides an overview of the state of the art as well as a glimpse of new tendencies in the field of Digital Arts, with special emphasis on the topics: Sound and Music Computing; Technology Mediated Dance; Collaborative Art Performance; Digital Narratives; Media Art and Creativity Theory; Interactive Art; Audiovisual and Multimedia Design.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume