A bioinformatics approach for conceptual genome mining
Recent advances in sequencing technology have set the stage for a steadily growing number of microbial whole-genome sequences. At the same time, bioinformatic analysis increasingly sheds light on the genome-encoded capacity of certain microorganisms for the production of secondary metabolites. This work describes the development of a bioinformatic toolkit to underpin discovery and dereplication efforts in a genomics-based workflow aimed at the characterization of multimodular biosynthetic gene clusters from bacterial genomes. Key to the “conceptual genome mining” approach implemented here is the comparison of pathway architectures, represented by the arrangement and properties of domains in complex PKS, NRPS, and hybrid pathways, rather than resorting to DNA- or protein-level sequence similarity. The new analysis framework, named the BiosynML toolkit, was interfaced with antiSMASH, the de facto standard for automatic annotation of biosynthetic pathways, and integrated with an existing in-house research database system (Mxbase). BiosynML methods were tested using 42 characterized pathways from 71 myxobacterial genomes and also applied to publicly accessible genomes from relevant microbial taxa. The BiosynML tools were ultimately used to create an overview of 1347 pathways, among which 783 distinct models were identified. This analysis revealed minimal overlap between suborders and enabled a tentative estimation of myxobacterial secondary metabolite gene cluster richness.
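The core idea above is that two biosynthetic pathways can be compared by the ordered arrangement of their domains instead of by sequence alignment. As a minimal sketch of that idea (not BiosynML's actual algorithm), each pathway can be reduced to an ordered list of domain labels and compared with a generic sequence-matching ratio; the domain lists below are hypothetical examples:

```python
from difflib import SequenceMatcher

def architecture_similarity(domains_a, domains_b):
    """Similarity of two pathway architectures, each given as an
    ordered list of domain labels, ignoring the underlying DNA or
    protein sequences entirely."""
    return SequenceMatcher(None, domains_a, domains_b).ratio()

# Two hypothetical PKS pathways described only by domain arrangement;
# they differ in where the dehydratase (DH) domain sits.
pathway_a = ["KS", "AT", "DH", "KR", "ACP", "KS", "AT", "KR", "ACP"]
pathway_b = ["KS", "AT", "KR", "ACP", "KS", "AT", "DH", "KR", "ACP"]

score = architecture_similarity(pathway_a, pathway_b)
```

A real system would additionally weight domain properties (substrate specificity, stereochemistry) rather than treating labels as opaque tokens.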
Automatic population of knowledge bases with multimodal data about named entities
Knowledge bases are of great importance for Web search, recommendations, and many Information Retrieval tasks. However, maintaining them for not so popular entities is often a bottleneck. Typically, such entities have limited textual coverage and only a few ontological facts. Moreover, these entities are not well populated with multimodal data, such as images, videos, or audio recordings.
The goals in this thesis are (1) to populate a given knowledge base with multimodal data about entities, such as images or audio recordings, and (2) to ease the task of maintaining and expanding the textual knowledge about a given entity, by recommending valuable text excerpts to the contributors of knowledge bases.
The thesis makes three main contributions. The first two contributions concentrate on finding images of named entities with high precision, high recall, and high visual diversity. Our main focus is on less popular entities, for which image search engines fail to retrieve good results. Our methods utilize background knowledge about the entity, such as ontological facts or a short description, together with a visual-based image similarity, to rank and diversify a set of candidate images.
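Ranking while diversifying, as described above, is commonly done with a greedy maximal-marginal-relevance (MMR) trade-off between relevance and redundancy. The sketch below is a hedged illustration of that general technique, not the thesis's actual method; the image names, scores, and similarity function are invented:

```python
def mmr_rank(candidates, relevance, sim, k, lam=0.5):
    """Greedy MMR selection: trade off an image's relevance to the
    entity against its visual similarity to images already picked,
    yielding a ranked, visually diverse subset of size k."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy data: img1 and img2 are near-duplicates, img3 is distinct.
relevance = {"img1": 0.9, "img2": 0.85, "img3": 0.4}
sim = lambda a, b: 0.99 if {a, b} == {"img1", "img2"} else 0.1
ranking = mmr_rank(["img1", "img2", "img3"], relevance, sim, k=2)
```

With these toy scores the second pick skips the near-duplicate `img2` in favour of the dissimilar `img3`, which is exactly the diversification effect described above.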
Our third contribution is an approach for extracting text content related to a given entity. It leverages a language-model-based similarity between a short description of the entity and the text sources, and solves a budget-constrained optimization problem without any assumptions about the text structure. Moreover, our approach is also able to reliably extract entity-related audio excerpts from news podcasts, deriving the time boundaries from the usually very noisy audio transcriptions.
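The budget-constrained selection described above can be illustrated with a much simpler stand-in: score each candidate excerpt against the entity description with a bag-of-words cosine (a crude proxy for the language-model similarity), then greedily pick the best similarity-per-word excerpts until a word budget is spent. This is a hedged sketch of the problem shape, not the thesis's optimization program; all texts below are invented:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_excerpts(description, excerpts, budget):
    """Greedily pick excerpts with the best similarity-per-word
    ratio until the word budget is exhausted."""
    desc = Counter(description.lower().split())
    scored = []
    for text in excerpts:
        words = text.lower().split()
        scored.append((cosine(desc, Counter(words)) / len(words), text))
    chosen, used = [], 0
    for _, text in sorted(scored, reverse=True):
        n = len(text.split())
        if used + n <= budget:
            chosen.append(text)
            used += n
    return chosen

chosen = select_excerpts(
    "total solar eclipse",
    ["the solar eclipse was total", "stock markets fell today", "a partial eclipse"],
    budget=8,
)
```

A proper solution would treat this as a knapsack-style optimization rather than a greedy pass, but the budget constraint and similarity objective play the same roles.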
Evolutionary computing driven search based software testing and correction
For a given program, testing, locating the identified errors, and correcting those errors is a critical yet expensive process. The field of Search Based Software Engineering (SBSE) addresses these phases by formulating them as search problems. This dissertation addresses these challenging problems through the use of two complementary evolutionary-computing-based systems. The first is the Fitness Guided Fault Localization (FGFL) system, which is novel in using a specification-based fitness function to perform fault localization. The second is the Coevolutionary Automated Software Correction (CASC) system, which employs a variety of evolutionary computing techniques to perform testing, correction, and verification of software. In support of the real-world application of these systems, a practitioner's guide to fitness function design is provided. For the FGFL system, experimental results are presented that demonstrate the applicability of fitness-guided fault localization to automate this important phase of software correction in general, and the potential of the FGFL system in particular. For the fitness function design guide, the performance of a guide-generated fitness function is compared to that of an expert-designed fitness function, demonstrating the competitiveness of the guide-generated fitness function. For the CASC system, results are presented that demonstrate the system's abilities on a series of problems of both increasing size and increasing number of bugs present. The system presented solutions more than 90% of the time for versions of the programs containing one or two bugs. Additionally, scalability results are presented for the CASC system indicating that the success rate decreases linearly with problem size and that the estimated convergence rate scales at worst linearly with problem size --Abstract, page ii
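A specification-based fitness function, as used by FGFL and CASC, grades a candidate program by how well its behaviour matches the specification. The toy below is only an analogue of that idea (the specification, the buggy variant, and the fix are all invented for illustration):

```python
def fitness(program, spec_cases):
    """Specification-based fitness: the fraction of (input, expected)
    pairs derived from the specification that the candidate program
    satisfies. Higher fitness guides the search toward a correct fix."""
    passed = sum(1 for x, want in spec_cases if program(x) == want)
    return passed / len(spec_cases)

# Specification: the program should compute the absolute value.
spec = [(n, abs(n)) for n in (-2, -1, 0, 1, 2)]

buggy = lambda n: n                    # drops the negation branch
fixed = lambda n: -n if n < 0 else n   # candidate repair
```

In an evolutionary setting, `fitness` would rank a population of program variants each generation; a variant passing all specification cases (fitness 1.0) terminates the search.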
Diagnostic reasoning in medical students using a simulated environment
No description supplied
Testing in the Professions
Testing in the Professions focuses on current practices in credentialing testing as a guide for practitioners. With a broad focus on the key components, issues, and concerns surrounding the test development and validation process, this book brings together a wide range of research and theory, from design and analysis of tests to security, scoring, and reporting. Written by leading experts in the field of measurement and assessment, the chapters each include authentic examples of how various practices are implemented or current issues observed in credentialing programs. The volume begins with an exploration of the various types of credentialing programs as well as key differences in the interpretation and evaluation of test scores. The next set of chapters discusses key test development steps, including test design, content development, analysis, and evaluation. The final set of chapters addresses specific topics that span the testing process, including communication with stakeholders, security, program evaluation, and legal principles. As a response to the growing number of professions and professional designations that are tied to testing requirements, Testing in the Professions is a comprehensive source for up-to-date measurement and credentialing practices.
The Automatic Assessment of Multiple Artefacts: An Investigation into Design Diagrams and Their Implementations
As the Higher Education sector has moved towards student-centred learning, so too has electronic support for learning grown. E-assessment has been part of this growth, as assessment and its feedback are increasingly seen as an integral part of the students' learning process. Mature e-assessment systems exist, particularly where answers to questions are restricted to a prescribed list of alternatives. However, for free response artefacts, where limited restriction is placed on answers to questions, automated assessment systems are embryonic.
This dissertation presents an investigation into the automated assessment of free response artefacts. Design diagrams and their accompanying source code implementations are examples of free response artefacts. A case study is developed that investigates how to automatically generate formative feedback for a design diagram by utilizing its accompanying implementation. The dissertation presents a two-stage solution, initially analysing the design diagram in isolation before comparing it with the implementation. A framework for this approach has been developed and tested using a tool applied to coursework submitted by undergraduate computer science students.
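The second stage described above, comparing a design diagram against its implementation, can be sketched in miniature: take the class names a diagram declares and check them against the classes actually defined in the submitted code. This is a hypothetical analogue of the tool's comparison, not its real implementation; the function name and student code are invented:

```python
import ast

def compare_diagram_to_code(diagram_classes, source):
    """Toy diagram-vs-implementation check: report diagram classes
    missing from the code, and implemented classes the diagram
    never mentioned, as raw material for formative feedback."""
    code_classes = {node.name for node in ast.walk(ast.parse(source))
                    if isinstance(node, ast.ClassDef)}
    return {"missing_from_code": sorted(set(diagram_classes) - code_classes),
            "not_in_diagram": sorted(code_classes - set(diagram_classes))}

student_code = """
class Account: pass
class AuditLog: pass
"""
feedback = compare_diagram_to_code(["Account", "Customer"], student_code)
```

A real tool would also compare associations, attributes, and operations, but even this name-level diff yields the kind of discrepancy a formative comment can point at.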
The tool was evaluated by comparing the formative feedback comments generated by the tool with those produced by a team of computer science educators. Evaluation was undertaken via two Likert questionnaires, one completed by students and one completed by a team of computer scientists. The results presented are favourable, with the majority of comments produced by the tool being seen to be at least as good as those generated by the computer science educators.
Evaluating human-centered approaches for geovisualization
Working with two small groups of domain experts, I evaluate human-centered approaches to application development that are applicable to geovisualization, following an ISO 13407 taxonomy covering context of use, requirements elicitation, and design. These approaches include field studies and contextual analysis of subjects' context; establishing requirements using a template, via a lecture to communicate geovisualization to subjects, and by communicating subjects' context to geovisualization experts with a scenario; autoethnography to understand the geovisualization design process; wireframe, paper, and digital interactive prototyping with alternative protocols; and a decision-making process for prioritising application improvements. I find that the acquisition and use of real user data is key, and that a template approach and teaching subjects about visualization tools and interactions both fail to elicit useful requirements for a visualization application. Consulting geovisualization experts with a scenario of user context and samples of user data does yield suggestions for tools and interactions of use to a visualization designer. The complex and composite natures of both the visualization and human-centered domains, incorporating learning from both domains along with user context, make design challenging. Wireframe, paper, and digital interactive prototypes successfully mediate between the user and visualization domains, eliciting exploratory behaviour and suggestions to improve the prototypes. Paper prototypes are particularly successful at eliciting suggestions, especially novel visualization improvements. Decision-making techniques prove useful for prioritising different possible improvements, although domain subjects select data-related features over more novel alternatives and rank these more inconsistently.
The research concludes that understanding subject context of use and data is important and occurs throughout the process of engagement with domain experts, and that standard requirements elicitation techniques are unsuccessful for geovisualization. Engaging with subjects at an early stage with simple prototypes incorporating real subject data, and moving to successively more complex prototypes, holds the best promise for creating successful geovisualization applications.