    Front Matter - Soft Computing for Data Mining Applications

    Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data quickly and effectively. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio, and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or, say, human-like, methods for searching and learning require tolerance towards imprecision, uncertainty, and exceptions. Such methods have approximate reasoning capabilities and can handle partial truth. Properties of this kind are typical of soft computing. Soft computing techniques like Genetic
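The notion of "partial truth" mentioned above can be made concrete with fuzzy set membership, a standard soft computing tool. The following minimal Python sketch is an illustration added here, not material from the book; the function name and the "warm temperature" breakpoints are hypothetical.

```python
def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Return the degree (between 0 and 1) to which x belongs to a triangular fuzzy set.

    Membership is 0 at or outside [a, c], rises linearly to 1 at the peak b,
    and falls linearly back to 0 at c. Requires a < b < c.
    """
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)  # falling edge


# Hypothetical fuzzy set "warm" in degrees Celsius: fully warm at 22, not warm at or below 15 or at or above 30.
for temperature in (10.0, 18.5, 22.0, 26.0):
    print(temperature, triangular_membership(temperature, 15.0, 22.0, 30.0))
```

Unlike a crisp threshold, 18.5 °C is "warm" here only to degree 0.5 rather than simply true or false.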

    Unsupervised quantification of entity consistency between photos and text in real-world news

    In today’s information age, the World Wide Web and social media are important sources for news and information. Different modalities (in the sense of information encoding) such as photos and text are typically used to communicate news more effectively or to attract attention. 
Communication scientists, linguists, and semioticians have studied the complex interplay between modalities for decades and investigated, e.g., how their combination can carry additional information or add a new level of meaning. The number of shared concepts or entities (e.g., persons, locations, and events) between photos and text is an important aspect to evaluate the overall message and meaning of an article. Computational models for the quantification of image-text relations can enable many applications. For example, they allow for more efficient exploration of news, facilitate semantic search and multimedia retrieval in large (web) archives, or assist human assessors in evaluating news for credibility. To date, only a few approaches have been suggested that quantify relations between photos and text. However, they either do not explicitly consider the cross-modal relations of entities – which are important in the news – or rely on supervised deep learning approaches that can only detect the cross-modal presence of entities covered in the labeled training data. To address this research gap, this thesis proposes an unsupervised approach that can quantify entity consistency between photos and text in multimodal real-world news articles. The first part of this thesis presents novel approaches based on deep learning for information extraction from photos to recognize events, locations, dates, and persons. These approaches are an important prerequisite to measure the cross-modal presence of entities in text and photos. First, an ontology-driven event classification approach that leverages new loss functions and weighting schemes is presented. It is trained on a novel dataset of 570,540 photos and an ontology with 148 event types. The proposed system outperforms approaches that do not use structured ontology information. 
Second, a novel deep learning approach for geolocation estimation is proposed that uses additional contextual information on the environmental setting (indoor, urban, natural) and from earth partitions of different granularity. The proposed solution outperforms state-of-the-art approaches, even though these are trained with significantly more photos. Third, we introduce the first large-scale dataset for date estimation, with more than one million photos taken between 1930 and 1999, along with two deep learning approaches that treat date estimation as a classification and a regression problem, respectively. Both approaches achieve very good results and outperform human annotators. Finally, a novel approach is presented that identifies public persons and their co-occurrences in news photos extracted from the Internet Archive, which collects time-versioned snapshots of web pages that are rarely enriched with metadata relevant to multimedia retrieval. Experimental results confirm the effectiveness of the deep learning approach for person identification. The second part of this thesis introduces an unsupervised approach capable of quantifying image-text relations in real-world news. Unlike related work, the proposed solution automatically provides novel measures of cross-modal consistency for different entity types (persons, locations, and events) as well as for the overall context. The approach does not rely on any predefined datasets and can therefore cope with the large number and diversity of entities and topics covered in the news. State-of-the-art tools for natural language processing are applied to extract named entities from the text. Example photos for these entities are automatically crawled from the Web. The proposed methods for information extraction from photos are applied to both the news images and the example photos to quantify the cross-modal consistency of entities. Two tasks are introduced to assess the quality of the proposed approach in real-world applications. 
Experimental results for document verification and for the retrieval of news with either low (potential misinformation) or high cross-modal similarity demonstrate the feasibility of the approach and its potential to support human assessors in studying news.
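As a rough illustration of how a cross-modal entity consistency score of this kind could be computed, the Python sketch below compares the feature vector (embedding) of a news photo with embeddings of example photos crawled for each named entity and aggregates the cosine similarities into one score per entity. The embedding inputs, the function names, and the quantile-based aggregation are assumptions for illustration only; the abstract does not specify them, and in practice the embeddings would come from entity-type-specific models (persons, locations, events).

```python
import numpy as np


def cosine_similarities(query: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `references`."""
    q = query / np.linalg.norm(query)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return r @ q


def entity_consistency(news_embedding: np.ndarray,
                       example_embeddings: np.ndarray,
                       quantile: float = 1.0) -> float:
    """Hypothetical consistency score of one text entity with a news photo.

    Aggregates the similarities to all example photos with a quantile
    (1.0 = maximum, i.e. the best-matching example photo decides).
    """
    sims = cosine_similarities(news_embedding, example_embeddings)
    return float(np.quantile(sims, quantile))


def document_consistency(news_embedding: np.ndarray,
                         entities: dict,
                         quantile: float = 1.0) -> dict:
    """Score every text entity; `entities` maps entity names to example-photo embeddings."""
    return {name: entity_consistency(news_embedding, examples, quantile)
            for name, examples in entities.items()}


# Toy 3-dimensional embeddings (real ones would have hundreds of dimensions).
news_photo = np.array([1.0, 0.0, 0.0])
entities = {
    "Person A": np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]),
    "Person B": np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
}
print(document_consistency(news_photo, entities))
```

In this sketch, a low score for an entity mentioned in the text (here "Person B") would flag it for a human assessor; a quantile below 1.0 makes the score less sensitive to a single mis-crawled example photo that happens to match.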

    Structural Cheminformatics for Kinase-Centric Drug Design

    Drug development is a long, expensive, and iterative process with a high failure rate, while patients wait impatiently for treatment. Kinases have been among the main drug targets studied over the last decades to combat cancer, the second leading cause of death worldwide. These efforts have resulted in a plethora of structural, chemical, and pharmacological kinase data, which are collected in the KLIFS database. In this thesis, we apply ideas from structural cheminformatics to the rich KLIFS dataset, aiming to provide computational tools that speed up the complex drug discovery process. We focus on methods for target prediction and fragment-based drug design that study characteristics of kinase binding sites (also called pockets). First, we introduce the concept of computational target prediction, which is vital in the early stages of drug discovery. This approach identifies biological entities such as proteins that may (i) modulate a disease of interest (targets or on-targets) or (ii) cause unwanted side effects due to their similarity to on-targets (off-targets). We focus on the research field of binding site comparison, which lacked a freely available and efficient tool to determine similarities between the highly conserved kinase pockets. We fill this gap with the novel method KiSSim, which encodes and compares spatial and physicochemical pocket properties for all structurally resolved kinases in the kinome. We study kinase similarities in the form of kinome-wide phylogenetic trees and detect expected and unexpected off-targets. To allow multiple perspectives on kinase similarity, we propose an automated, production-ready pipeline in which user-defined kinases can be inspected from complementary angles based on their pocket sequence and structure (KiSSim), pocket-ligand interactions, and ligand profiles. Second, we introduce the concept of fragment-based drug design, which is useful for identifying and optimizing active and promising molecules (hits and leads). 
This approach identifies low-molecular-weight molecules (fragments) that bind weakly to a target and are then grown into larger, high-affinity, drug-like molecules. With the novel method KinFragLib, we provide a fragment dataset for kinases (fragment library) by viewing kinase inhibitors as combinations of fragments. Kinases have a highly conserved pocket with well-defined regions (subpockets); based on the subpockets that they occupy, we fragment kinase inhibitors in experimentally resolved protein-ligand complexes. The resulting dataset is used to generate novel kinase-focused molecules that are recombinations of the previously fragmented kinase inhibitors while considering their subpockets. The KinFragLib and KiSSim methods are published as freely available Python tools. Third, we advocate for open and reproducible research that applies the FAIR principles (data and software shall be findable, accessible, interoperable, and reusable) as well as software best practices. In this context, we present the TeachOpenCADD platform, which contains pipelines for computer-aided drug design. We use open-source software and data to demonstrate ligand-based applications from cheminformatics and structure-based applications from structural bioinformatics. To emphasize the importance of FAIR data, we dedicate several topics to accessing life science databases such as ChEMBL, PubChem, PDB, and KLIFS. These pipelines not only help novices in the field gain domain-specific skills but can also serve as a starting point for studying research questions. Furthermore, we show an example of how to build a stand-alone tool that formalizes recurring, project-overarching tasks: OpenCADD-KLIFS offers a clean and user-friendly Python API to interact with the KLIFS database and fetch different kinase data types. This tool has been used in this thesis and beyond to support kinase-focused projects. 
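The subpocket-based recombination idea behind KinFragLib can be illustrated with a small combinatorial sketch. This is not the KinFragLib code: fragments are plain string labels, no chemical bonds are formed (a real implementation would join molecules at their attachment points with a cheminformatics toolkit), and the subpocket labels (loosely modeled on kinase pocket regions such as the adenine pocket "AP", front pocket "FP", solvent-exposed region "SE", gate area "GA", and back pockets "B1"/"B2") and their adjacency table are assumptions made for illustration. The sketch picks one fragment per chosen subpocket, requires a fragment in an anchor subpocket, and keeps only subpocket sets that are connected.

```python
from itertools import combinations, product

# Hypothetical adjacency between subpockets: fragments may only be linked across adjacent subpockets.
ADJACENT = {
    frozenset(pair)
    for pair in [("AP", "FP"), ("AP", "SE"), ("AP", "GA"), ("FP", "SE"), ("GA", "B1"), ("GA", "B2")]
}


def is_connected(subpockets) -> bool:
    """True if the chosen subpockets form a single connected group under ADJACENT."""
    remaining = set(subpockets)
    if not remaining:
        return False
    frontier = [remaining.pop()]
    while frontier:
        current = frontier.pop()
        for other in list(remaining):  # iterate over a copy while removing reached subpockets
            if frozenset((current, other)) in ADJACENT:
                remaining.remove(other)
                frontier.append(other)
    return not remaining


def recombine(library: dict, size: int, anchor: str = "AP"):
    """Yield candidates as {subpocket: fragment}, one fragment per chosen subpocket.

    Only subpocket sets that contain the anchor subpocket and are connected are used.
    """
    for chosen in combinations(sorted(library), size):
        if anchor in chosen and is_connected(chosen):
            for fragments in product(*(library[p] for p in chosen)):
                yield dict(zip(chosen, fragments))


# Toy fragment library: subpocket -> fragment labels (stand-ins for real molecules).
library = {"AP": ["ap_1", "ap_2"], "FP": ["fp_1"], "B1": ["b1_1"]}
for candidate in recombine(library, size=2):
    print(candidate)
```

Each yielded dictionary only records which fragment would sit in which subpocket; "B1" never appears here because, under the assumed adjacency, it cannot be reached from "AP" or "FP" without a gate-area fragment.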
We believe that the FAIR-based methods, tools, and pipelines presented in this thesis (i) are valuable additions to the toolbox for kinase research, (ii) provide relevant material for scientists who seek to learn, teach, or answer questions in the realm of computer-aided drug design, and (iii) contribute to making drug discovery more efficient, reproducible, and reusable.
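To make the idea of encoding and comparing pocket properties more concrete, here is a minimal Python sketch of a fingerprint-based pocket comparison. It does not reproduce KiSSim: the fingerprint layout (one row per pocket residue, for example the 85 residues of a KLIFS pocket, one column per feature, values scaled to [0, 1], NaN for residues missing from a structure), the mean-absolute-difference distance, and the uniform default feature weights are assumptions. A similarity matrix like the one returned here is the kind of input from which kinome-wide trees can be built by hierarchical clustering.

```python
import numpy as np


def fingerprint_distance(fp_a, fp_b, weights=None) -> float:
    """Weighted mean absolute difference between two pocket fingerprints.

    Fingerprints have shape (n_residues, n_features) with features scaled to [0, 1].
    NaN marks a residue missing from a structure; such residue pairs are skipped.
    """
    a = np.asarray(fp_a, dtype=float)
    b = np.asarray(fp_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("fingerprints must have the same shape")
    n_features = a.shape[1]
    w = np.ones(n_features) if weights is None else np.asarray(weights, dtype=float)

    diff = np.abs(a - b)  # NaN wherever either residue is missing
    valid = ~np.isnan(diff)
    counts = valid.sum(axis=0)
    sums = np.where(valid, diff, 0.0).sum(axis=0)
    # Mean difference per feature; NaN for features without any comparable residue pair.
    per_feature = np.divide(sums, counts, out=np.full(n_features, np.nan), where=counts > 0)

    usable = ~np.isnan(per_feature)
    if not usable.any():
        raise ValueError("no residues in common")
    return float(np.sum(w[usable] * per_feature[usable]) / np.sum(w[usable]))


def similarity_matrix(fingerprints, weights=None) -> np.ndarray:
    """Symmetric matrix of pairwise similarities (1 - distance) for a list of fingerprints."""
    n = len(fingerprints)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            s = 1.0 - fingerprint_distance(fingerprints[i], fingerprints[j], weights)
            sim[i, j] = sim[j, i] = s
    return sim


# Toy fingerprints with 2 residues x 2 features; kinase_c lacks its first residue's first feature.
kinase_a = [[0.0, 0.0], [1.0, 1.0]]
kinase_b = [[0.0, 1.0], [1.0, 1.0]]
kinase_c = [[np.nan, 0.0], [1.0, 1.0]]
print(similarity_matrix([kinase_a, kinase_b, kinase_c]))
```

Skipping missing residues pairwise, rather than imputing them, keeps incompletely resolved structures comparable at the cost of basing some distances on fewer residues.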

    New Methodology for Measuring Semantic Functional Similarity Based on Bidirectional Integration

    With 1.2 billion users on Facebook, 17 million articles in Wikipedia, and 190 million tweets per day, the demand for information processing over the Internet has increased significantly in recent years. Similarly, the life sciences and bioinformatics have faced issues in processing big data due to the explosion of publicly available genomic information resulting from the Human Genome Project (HGP) and the increasing use of high-throughput technology. The HGP was completed in 2003; it identified 20,000-25,000 genes in human DNA and determined the sequences of three billion human base pairs. This information requires a huge amount of data storage and is difficult to process with on-hand database management tools or traditional data processing applications. This thesis introduces a new method, the Biological and Statistical Mean (BSM) score, to calculate the functional similarity between gene products (GPs); it can help to extract biologically relevant and statistically robust information from large-scale biomedical, genomic, and proteomic data sources. The BSM score is defined by 16 different scoring matrices derived from principles of multi-view learning in machine learning and from five different databases: Gene Ontology, UniProt, SCOP, CATH, and KUPS. The proposed method also shows how diverse databases and principles of machine learning theory can be integrated into a simple scoring function, and how this simple concept can have a significant impact on studies in the biomedical and human life sciences. Comprehensive evaluations and performance comparisons with other conventional methods show that the BSM score clearly outperforms other methods in terms of sensitivity in clustering functionally similar groups and coverage in identifying related genes. As potential applications that handle large amounts of data from diverse sources in the medical domain, this thesis introduces similarity-based drug target identification and disease networks using BSM scores. 
An application of the BSM score is freely available at http://www.ittc.ku.edu/chenlab/goal
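The general idea of integrating several data sources into one functional similarity score can be sketched as follows. This is not the BSM score, whose 16 scoring matrices are not spelled out in the abstract; the sketch assumes that each source ("view") yields a gene-by-gene similarity matrix (here, the Jaccard overlap of annotation sets), rescales each matrix to [0, 1], and takes a weighted mean. Gene names and annotation identifiers are placeholders, and all function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two annotation sets: |a & b| / |a | b| (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def view_matrix(genes: list[str], annotations: dict[str, set[str]]) -> np.ndarray:
    """Gene-by-gene similarity matrix derived from one data source (one view)."""
    n = len(genes)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = jaccard(annotations.get(genes[i], set()), annotations.get(genes[j], set()))
    return sim


def min_max(matrix: np.ndarray) -> np.ndarray:
    """Rescale a matrix to [0, 1]; a constant matrix becomes all zeros."""
    lo, hi = matrix.min(), matrix.max()
    return np.zeros_like(matrix) if hi == lo else (matrix - lo) / (hi - lo)


def integrate_views(views: list[np.ndarray], weights: list[float] | None = None) -> np.ndarray:
    """Weighted mean of rescaled per-source similarity matrices."""
    w = np.ones(len(views)) if weights is None else np.asarray(weights, dtype=float)
    stacked = np.stack([min_max(v) for v in views])  # shape (n_views, n_genes, n_genes)
    return np.tensordot(w / w.sum(), stacked, axes=1)


# Placeholder annotations from two hypothetical sources (e.g. ontology terms and structural domains).
genes = ["g1", "g2", "g3"]
term_view = {"g1": {"T:A", "T:B"}, "g2": {"T:B", "T:C"}, "g3": {"T:D"}}
domain_view = {"g1": {"D1"}, "g2": {"D1"}, "g3": {"D2"}}
combined = integrate_views([view_matrix(genes, term_view), view_matrix(genes, domain_view)])
print(np.round(combined, 3))
```

Averaging across views lets a gene pair that is only weakly similar in one source (g1 and g2 share one of three terms) still score highly when another source agrees strongly.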

    Bayini, Macassans, Balanda, and Bininj: defining the Indigenous past of Arnhem Land through culture contact

    This study set out to investigate unresolved issues regarding the chronology, nature, and subsequent impacts of culture contact between South East Asian maritime communities, Europeans, and northern Australian Indigenous populations. These issues include the question of whether there is archaeological evidence for pre-Macassan visitation in north-western Arnhem Land. An important aim was therefore to assess whether the level of interaction with, and the impact of, the trepang industry and later European economies on local Indigenous communities can be measured through an investigation of the archaeological record from the Wellington Ranges, the coastal region of Anuru Bay, and South Goulburn Island. Within the scope of this aim, it was important to re-assess and radiocarbon date the well-known Malara (Anuru Bay A) trepang processing site in order to gain a greater understanding of the intensity and frequency of Macassan (and possibly pre-Macassan) occupation, trepang processing, and contact with Aboriginal people. The results of this study support a longer timeframe of culture contact, beginning in the early to mid-17th century, with a proliferation of the Macassan trepang processing industry from the mid-1700s. The study also aimed to investigate the complexity of change in Indigenous society during the culture contact period through documentation and analysis of the Indigenous archaeological record (material culture, rock art assemblages) at the Malarrak, Djulirri, and Maliwawa rockshelter complexes in the Wellington Range. This involved an examination of the spatial distribution of Indigenous rock art and archaeological sites to assess changes in residential mobility (both local and regional), resource utilisation, and impacts on Indigenous customary trade and exchange. A particular focus of this study was the analysis of changes in Indigenous rock art production in western Arnhem Land during the culture contact period. 
This archaeological evidence has also been evaluated in conjunction with historical, ethnographic, linguistic, and anthropological records. The changes in Indigenous society that accompanied culture contact have been assessed using the Indigenous hybrid economy model developed by Altman (2006). This thesis argues that the archaeological evidence (i.e., the occurrence of beads and of rock art paintings of firearms and ships) establishes the presence of an operating hybrid economy between Indigenous people, Europeans, and Macassans. The operation of the hybrid economy allowed Indigenous people to negotiate and interact with others on the basis of customary law and tradition, and to influence the outcomes of these exchanges, such as allowing others to be on their country and to utilise their resources (e.g., trepang, buffalo). Building on Mitchell's (1994) and Clarke's (1994) models of culture contact, this study proposes that culture contact in western Arnhem Land proceeds and then transforms through five significant temporal phases: (a) pre-Macassan, (b) Macassan, (c) Colonial, (d) Mission, and (e) Welfare economic periods.

    Creative Practice: How Communities were ‘made’ at Çatalhöyük

    What role did creative practice play in social life at the Neolithic tell Çatalhöyük, and what evidence is there to suggest that making informed the maintenance of the ‘social bond’? Socio-creativity is an underdeveloped but important area of research for archaeological approaches to the Neolithic, and it offers a unique opportunity to consider individual and community dynamics, tensions, and changing social values from the residues of material interactions. Utilising the work of Bennett (2010a), Barad (2003, 2007, 2012), and Gell (1998), I formulate a critically informed but practically embedded methodology that finds material “phenomena” (Barad 2003) at the settlement. Çatalhöyük offers a particularly distinctive example of social organisation, as it is believed to have been an egalitarian settlement (Hodder 2014a,c). Furthermore, its material culture provides us with a rich dataset that contains the traces of highly creative and materially engaged individuals who routinely made and re-made things such as sun-baked clay figurines, basketry, and beads. I focus on Neolithic interactions with colourful or brilliant materials, substances, and spaces, and explore how these material interactions, as phenomena, reveal certain sensorial dynamics in action at the Neolithic town. I outline how creative practices can create certain sensory dispositions (ways of seeing, feeling, and doing), and I argue that the senses can be profiled during making events (cf. Howes and Classen 1991). The sensorial implications of making have wider connotations for the changing dynamics and tensions between ‘communities of practice’, and can yield important information about macro-scale changes in lifeways (Lave and Wenger 1991; Wenger 1998, 2012; Wendrich 2012; Bartlett and McAnany 2000). I contend that creative practice was an important element of egalitarian community maintenance and argue that socio-creativity played an integral role in social organisation at Çatalhöyük.