    Motif discovery in sequential data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (v. 2, leaves [435]-467). In this thesis, I discuss the application and development of methods for the automated discovery of motifs in sequential data. These data include DNA sequences, protein sequences, and real-valued sequential data such as protein structures and time series of arbitrary dimension. As more genomes are sequenced and annotated, the need for automated computational methods for analyzing biological data is increasing rapidly. In broad terms, the goal of this thesis is to treat sequential data sets as unknown languages and to develop tools for interpreting and understanding these languages. The first chapter is an introduction to the fundamentals of motif discovery, which establishes a common mode of thought and vocabulary for the subsequent chapters. One of the central themes of this work is the use of grammatical models, which are more commonly associated with the field of computational linguistics. In the second chapter, I use grammatical models to design novel antimicrobial peptides (AmPs). AmPs are small proteins used by the innate immune system to combat bacterial infection in multicellular eukaryotes. There is mounting evidence that these peptides are less susceptible to bacterial resistance than traditional antibiotics and may form the basis for a novel class of therapeutics. In this thesis, I describe the rational design of novel AmPs that show limited homology to naturally occurring proteins but have strong bacteriostatic activity against several species of bacteria, including Staphylococcus aureus and Bacillus anthracis.
These peptides were designed using a linguistic model of natural AmPs: the amino acid sequences of natural AmPs were treated as a formal language, and a set of regular grammars was built to describe this language. This set of grammars was used to create novel, unnatural AmP sequences that conform to the formal syntax of natural antimicrobial peptides but populate a previously unexplored region of protein sequence space. The third chapter describes a novel GEneric MOtif DIscovery Algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As I show, Gemoda deterministically discovers motifs that are maximal in composition and length. Moreover, the algorithm allows any choice of similarity metric for finding motifs. These motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices, or any other model for sequential data. I demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid and DNA sequences and the discovery of conserved protein substructures. The final chapter is devoted to a series of smaller projects employing tools and methods indirectly related to motif discovery in sequential data. I describe the construction of a software tool, Biogrep, which is designed to match large pattern sets against large biosequence databases in parallel. This makes Biogrep well suited to annotating sets of sequences using biologically significant patterns. In addition, I show that the BLOSUM series of amino acid substitution matrices, which are commonly used in motif discovery and sequence alignment problems, have changed drastically over time. The fidelity of amino acid sequence alignment and motif discovery tools depends strongly on the target frequencies implied by these underlying matrices. Thus, these results suggest that further optimization of these matrices is possible.
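The grammar-based design idea can be illustrated with a toy Python sketch. The "grammar" here is a hypothetical slot list (`TOY_GRAMMAR`), not one of the thesis's actual AmP grammars: each slot lists the residues allowed at one position, which defines a simple regular language. The sketch compiles the slots into an equivalent regular expression and samples distinct sequences that the expression accepts, mirroring the notion of generating new sequences that obey the syntax of a formal language.

```python
import math
import random
import re

# Hypothetical toy grammar (NOT one of the thesis's AmP grammars): each slot
# lists the residues allowed at one position of an 8-residue peptide.
TOY_GRAMMAR = [
    "KR",     # cationic
    "LIV",    # hydrophobic
    "LIVF",   # hydrophobic or aromatic
    "KR",     # cationic
    "KR",     # cationic
    "LIVAF",  # hydrophobic
    "G",      # fixed glycine
    "KR",     # cationic
]


def grammar_to_regex(slots):
    """Compile the slot list into an equivalent regular expression."""
    return "".join(f"[{allowed}]" for allowed in slots)


def generate(slots, n, seed=0):
    """Return n distinct sequences (sorted) accepted by the slot grammar."""
    language_size = math.prod(len(allowed) for allowed in slots)
    if n > language_size:
        raise ValueError(f"grammar only defines {language_size} sequences")
    rng = random.Random(seed)  # seeded so the output is reproducible
    pattern = re.compile(grammar_to_regex(slots))
    found = set()
    while len(found) < n:
        # Pick one allowed residue per slot, i.e. derive one word of the language.
        seq = "".join(rng.choice(allowed) for allowed in slots)
        # Sanity check: the compiled regex accepts every derived word.
        assert pattern.fullmatch(seq), seq
        found.add(seq)
    return sorted(found)


if __name__ == "__main__":
    for peptide in generate(TOY_GRAMMAR, 5):
        print(peptide)
```

Because every slot is a finite character class, this toy language is finite (960 sequences for `TOY_GRAMMAR`). The sketch does not check homology to natural proteins, which the design described above also requires.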
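The link between a substitution matrix and its implied target frequencies can be made concrete. Under the standard log-odds interpretation (Karlin and Altschul), a score s_ij corresponds to target frequencies q_ij = p_i p_j exp(λ s_ij), where p is the background residue distribution and λ > 0 is the unique positive value for which the q_ij sum to 1. The Python sketch below is an illustration of that relationship, not the analysis performed in the thesis: it recovers λ by bisection and then computes the implied q_ij. The two-letter alphabet and its frequencies are hypothetical, and the scores are left unrounded in half-bit units, s_ij = 2 log2(q_ij / (p_i p_j)), so λ should come out as ln(2)/2.

```python
import math


def implied_lambda(scores, p, iters=200):
    """Find lambda > 0 with sum_ij p_i p_j exp(lambda * s_ij) == 1 by bisection.

    Requires a negative expected score and at least one positive score, which
    guarantees exactly one positive root.
    """
    n = len(p)

    def excess(lam):
        # Zero at lam = 0 and at the root; negative between them, positive beyond.
        return sum(p[i] * p[j] * math.exp(lam * scores[i][j])
                   for i in range(n) for j in range(n)) - 1.0

    expected = sum(p[i] * p[j] * scores[i][j] for i in range(n) for j in range(n))
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    lo, hi = 0.0, 1.0
    while excess(hi) <= 0:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def implied_target_frequencies(scores, p):
    """Return q_ij = p_i p_j exp(lambda * s_ij) for the implied lambda."""
    lam = implied_lambda(scores, p)
    n = len(p)
    return [[p[i] * p[j] * math.exp(lam * scores[i][j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    p = [0.5, 0.5]                    # hypothetical background frequencies
    q = [[0.4, 0.1], [0.1, 0.4]]      # hypothetical symmetric target frequencies
    s = [[2 * math.log2(q[i][j] / (p[i] * p[j])) for j in range(2)] for i in range(2)]
    print(implied_lambda(s, p))       # approximately ln(2)/2 = 0.3466
    print(implied_target_frequencies(s, p))  # approximately q
```

With real, integer-rounded BLOSUM scores the recovered λ is generally not exactly ln(2)/2, and the resulting q_ij are the frequencies that the matrix implicitly assumes.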
The final chapter also contains two projects in which I apply statistical motif discovery tools instead of grammatical tools. In the first, I develop three different physicochemical representations for a set of roughly 700 HIV-1 protease substrates and use these representations for sequence classification and annotation. In the second, I develop a simple statistical method for parsing out the phenotypic contribution of a single mutation from libraries of functional diversity that contain a multitude of mutations and varied phenotypes. I show that this new method successfully elucidates the effects of single nucleotide polymorphisms on the strength of a promoter placed upstream of a reporter gene. The central theme throughout this work is the development and application of novel approaches to finding motifs in sequential data. The work on the design of AmPs is highly applied and relies heavily on existing literature. In contrast, the work on Gemoda is the greatest contribution of this thesis and contains many new ideas. by Kyle L. Jensen. Ph.D.
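The abstract does not say which three physicochemical representations were used. As a purely illustrative sketch, one simple representation maps each residue of a substrate window to its Kyte-Doolittle hydropathy value, giving a fixed-length numeric vector; a nearest-centroid rule (a generic classifier swapped in here, not necessarily the method of the thesis) can then label new sequences. The eight-residue window (P4 to P4', the span usually used for HIV-1 protease cleavage sites) is an assumption, and the labeled peptides in the usage example are toy strings, not real substrates.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

WINDOW = 8  # assumed P4..P4' substrate window length


def encode_hydropathy(peptide):
    """Map a peptide of length WINDOW to a vector of hydropathy values."""
    peptide = peptide.upper()
    if len(peptide) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(peptide)}")
    try:
        return [KYTE_DOOLITTLE[aa] for aa in peptide]
    except KeyError as err:
        raise ValueError(f"non-standard residue {err.args[0]!r}") from None


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(query, labeled):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    q = encode_hydropathy(query)
    best_label, best_dist = None, math.inf
    for label, peptides in labeled.items():
        c = centroid([encode_hydropathy(pep) for pep in peptides])
        dist = sum((a - b) ** 2 for a, b in zip(q, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Toy labeled strings for illustration only.
    labeled = {
        "hydrophobic": ["LLLLIIII", "VVVVLLLL"],
        "charged": ["KKKKDDDD", "EEEERRRR"],
    }
    print(classify("IIIILLLL", labeled))  # hydrophobic
```

A per-residue encoding keeps positional information, which a single averaged property would lose; other scales (charge, volume) could be substituted in the same structure.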

    Cooking with plants in ancient Europe and beyond

    Plants have constituted the basis of human subsistence. This volume focuses on the plant food ingredients consumed by members of past societies and on the ways these ingredients were transformed into food. The thirty chapters of this book unfold the story of the culinary transformation of cereals and pulses, as well as of a wide range of wild and cultivated edible plants. Regional syntheses provide insights into choices of plant species and their changes over time, as well as into fragments of recipes locked inside amorphous charred masses. Grinding equipment, cooking installations, and cooking pots are used to reveal ancient cooking steps in order to piece together a culinary puzzle of the past. From the big picture of spatiotemporal patterns and changes to the micro-imaging of use-wear on grinding tool surfaces, the book attempts, for the first time, a comprehensive and systematic approach to the culinary transformation of ancient plant foods. Focusing mainly on Europe and the Mediterranean world in prehistory, the book extends to other regions such as South Asia and Latin America and covers a time span from the Palaeolithic to the historic periods. Several of the contributions stem from original research conducted in the context of the ERC project PlantCult: Investigating the Plant Food Cultures of Ancient Europe. The book’s exploration of ancient cuisines culminates in an investigation of the significance of ethnoarchaeology for a better understanding of past foodways, as well as of the impact of archaeology in shaping modern culinary and consumer trends. The book will be of interest to archaeologists, food historians, agronomists, and botanists, as well as to the wider public with an interest in ancient cooking.

    Proceedings of the International Symposium on Sorghum Grain Quality

    There has long been a need to review current knowledge of the quality of sorghum grain, especially since it is one of the major food grains for 700 million people living under impoverished conditions in the semi-arid tropics. To meet this need, ICRISAT hosted an International Symposium on Sorghum Grain Quality in October 1981 at ICRISAT Center near Hyderabad, India. It was sponsored by the USAID Title XII Collaborative Research Support Program on Sorghum and Pearl Millet (INTSORMIL), the Indian Council of Agricultural Research (ICAR), and the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT). Participants interested in sorghum as a food who attended the Symposium represented diverse disciplines: food technology, home economics, nutrition, breeding, biochemistry, food processing, engineering, pathology, and economics. The topics included existing knowledge on preparing sorghum as a food, its grain structure and deterioration, milling and laboratory methods for evaluating and improving food quality, nutrition, consumer acceptance, marketing, and quality standards. A wide range of sorghum grain types is used to prepare different solid and liquid foods such as porridges, leavened and unleavened breads, snacks, beverages, and beer. However, sorghum has two major disadvantages as a food: problems of nutrient uptake, and the constant drudgery of hand pounding and hand grinding to make sorghum flour. Sorghum grain quality is a complex subject. Only in recent years have nutritionists and millers studied the problems associated with sorghum. To replace hand processing, several pilot projects using machines for pearling and grinding are under way in some locations in Africa. Increasingly, plant breeders are developing new varieties and hybrids. For successful adoption of new cultivars by farmers, consumer acceptance is an essential requirement.
We need more information on why sorghum is accepted or rejected as a food, and work still needs to be done to develop laboratory tests to screen sorghum for food quality.