206 research outputs found

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows becomes more and more crucial. The unprecedented amount of data available in the digital era, combined with the recent advancements in Machine Learning and High-Performance Computing (HPC), let computers surpass human performances in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy becomes crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of such methodology. Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed cloud-HPC infrastructures.

    Detection and Evaluation of Clusters within Sequential Data

    Motivated by theoretical advancements in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking. We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data. Comment: 37 pages, 12 figures

    Prehistoric Plant Procurement, Food Production, and Land Use in Southwestern Tamaulipas, Mexico

    In this dissertation, I examine plant use, food production, and land use in the Ocampo region of southwestern Tamaulipas, northeastern Mexico. In the early 1950s Richard S. MacNeish excavated in a series of dry cave sites within the study area and discovered evidence for the local adoption of domesticated plants and the subsequent development of a mixed foraging-farming economy that persisted for millennia, before culminating in the establishment of settled farming villages. This research remains central to discussions of early Mesoamerican agriculture. However, the spectrum of land use and wild plant utilization over the prehistoric sequence remains poorly understood, as MacNeish's Ocampo investigations focused on one aspect of a larger settlement pattern (cave occupations), and his results are incompletely published. This dissertation expands on earlier work through an examination of curated plant collections from MacNeish's excavations and an archaeological survey near the Ocampo caves. Although most published sources acknowledge that wild plants comprised the majority of the local diet (especially during the early cultural phases), these sources often do not describe the species in question. Inspection of plant materials curated in several facilities in the United States and Mexico revealed a range of wild plants that are not mentioned previously in publication. Some curated specimens indicate that even when local populations lived in permanent habitations in villages, foraging activities drew them as far as 30 km away. Observations of present-day casual cultivation behaviors provided insights into how the earliest domesticated plants in the region (squashes and gourds) may have been incorporated into a primarily hunter-gatherer economy with minimal disruptions. Archaeological survey of the study area revealed that during the peak of population density (ca. 2400-1000 B.P.), large agricultural villages were established not only in narrow river valleys but also on moderate mountain slopes and high summits, likely due to a general lack of level land. Traditional farmers here today practice slash-and-burn agriculture on steep hillsides as flat alluvial terraces and gentle slopes become less available, and it is probable that prehistoric villagers did the same. Even as large permanent settlements became abundant, caves continued to be used for a variety of pursuits, including base camps for wild plant harvesting, winter-season hunting camps, and burial of the dead. Major contributions of this work include: 1) insights into the non-agricultural plant component of early low-level food producing economies in the study area; 2) availability of an important archaeobotanical data set previously not accessible to the general archaeological community; 3) refined classification of previously identified remains in the curated archaeobotanical collections; 4) increased awareness of the range of site types and land use practices utilized by early food producers in the Ocampo region (through preliminary archaeological survey and artifact assemblages on discovered sites); 5) documentation and registration of discovered sites with the Instituto Nacional de Antropologia e Historia (INAH) Registro Publico de Monumentos y Zonas Arqueológicos (Public Register of Archaeological Monuments and Zones); and 6) historical contextualization of MacNeish's groundbreaking investigations in the Ocampo caves.

    Seeds as Artifacts of Communities of Practice: The Domestication of Erect Knotweed in Eastern North America

    Humans are the ultimate ecosystem engineers, and in transforming ecosystems we also change the selective environment for the plants and animals that live among us. The bodies and behaviors of domesticated plants and animals are thus rich artifacts of traditional ecological knowledge and practice. I study the morphology and behavior of domesticated plants as a proxy for ancient agricultural communities of practice. The transition from food procurement to food production is one of the most significant shifts in human history. I consider this process as the evolution and spread of a knowledge system. Domestication studies are usually focused on differentiating wild from domestic types, but I wanted to investigate variation under cultivation. Normally discussed in the context of contemporary or historical small-scale farming, landraces are plant varieties that have been developed to grow particularly well under local conditions or to suit local preferences. Because landraces need to be maintained across generations of both plants and people, they are reflections of communities of practice, social learning, and Traditional Ecological Knowledge systems. By undertaking a detailed case study of variation within a single crop, I hoped to be able to use seeds in the same way that pottery, lithic tools, or iconography are used: to reveal shared traditions and connections between communities. This dissertation is focused on the lost crops of Eastern North America: a suite of annual seed crops that were cultivated for thousands of years before the introduction of maize and other tropical crops through trade. These crops are referred to as the Eastern Agricultural Complex (EAC). I chose to investigate one of these, erect knotweed (Polygonum erectum L.), which was cultivated for its edible seeds by Indigenous people in Eastern North America for ~2,000 years. 
My goals were 1) to establish whether or not erect knotweed had been domesticated by ancient farmers; and 2) to document variation under cultivation that might reveal different communities of practice in Eastern North America. This dissertation consists of five chapters: 1) A formal description of the domesticated sub-species of erect knotweed (Polygonum erectum ssp. watsoniae N.G. Muell.) including taxonomic background and a comparative analysis of other species of Polygonum native to the study area. 2) An overview of domestication syndrome in a desiccated assemblage of erect knotweed from the Whitney Bluff site, Arkansas, and a discussion of its implications for ancient agricultural practice in Eastern North America. 3) The results of field studies and experimental cultivation of erect knotweed over two growing seasons, with a discussion of the hypothesized roles of plasticity and heredity in the domestication of this species. 4) An experimental study of the processes that affect preservation of erect knotweed seeds and fruits, namely: carbonization (burning in anoxic conditions) and taphonomy (physical weathering after deposition). These processes systematically bias the archaeobotanical record and need to be accounted for in domestication studies. 5) A review of the archaeological background, and a comparison of ancient erect knotweed assemblages from 14 archaeological sites spanning 2,000 years. My concluding thoughts place this research in the context of global studies of domestication and food production. I suggest that optimal foraging models used in human behavioral ecology may consistently under-rank the seeds of small-seeded annuals, and that plasticity under cultivation may have been one factor that made disturbance-adapted plants attractive to ancient foragers. 
I argue that niche construction, food production, and delayed return strategies are all roughly synonymous terms, and that domestication is a likely, but not predetermined, outcome of such systems and behaviors. The spread of food producing economies was dependent on the spread of complex systems of knowledge through interacting communities of practice, and without these systems of traditional ecological knowledge domesticated varieties could not be maintained.

    The inclusion of engineering design into the high school biology curriculum

    The purpose of this project is to develop engineering-based lessons for a life science course, i.e. biology, in order to meet the NGSS standards as well as increase student interest in engineering by incorporating the principles of engineering design into the traditional science classroom.

    Automatic Population of Structured Reports from Narrative Pathology Reports

    There are a number of advantages to the use of structured pathology reports: they can ensure the accuracy and completeness of pathology reporting, and it is easier for the referring doctors to glean pertinent information from them. The goal of this thesis is to extract pertinent information from free-text pathology reports and automatically populate structured reports for cancer diseases and identify the commonalities and differences in processing principles to obtain maximum accuracy. Three pathology corpora were annotated with entities and relationships between the entities in this study, namely the melanoma corpus, the colorectal cancer corpus and the lymphoma corpus. A supervised machine-learning-based approach, utilising conditional random fields learners, was developed to recognise medical entities from the corpora. By feature engineering, the best feature configurations were attained, which boosted the F-scores significantly from 4.2% to 6.8% on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports will be diminished. The negation and uncertainty detection modules were built to handle this problem. The modules obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was presented to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with 100% F-score obtained by the rule-based module and 97.2% F-score attained by the support vector machines classifier. Rule-based approaches were used to generate the structured outputs and populate them to predefined templates. The rule-based system attained over 97% F-scores on the training sets. A pipeline system was implemented with an assembly of all the components described above. It achieved promising results in the end-to-end evaluations, with 86.5%, 84.2% and 78.9% F-scores on the melanoma, colorectal cancer and lymphoma test sets respectively.

    Chimpanzee material culture: implications for human evolution

    The chimpanzee (Pan troglodytes, Pongidae) among all other living species, is our closest relation, with whom we last shared a common ancestor less than five million years ago. These African apes make and use a rich and varied kit of tools. Of the primates, and even of the other Great Apes, they are the only consistent and habitual tool-users. Chimpanzees meet the criteria of working definitions of culture as originally devised for human beings in socio-cultural anthropology. They show sex differences in using tools to obtain and to process a variety of plant and animal foods. The technological gap between chimpanzees and human societies living by foraging (hunter-gatherers) is surprisingly narrow, at least for food-getting. Different communities of chimpanzees have different tool-kits, and not all of this regional and local variation can be explained by the varied physical and biotic environments in which they live. Some differences are likely customs based on non-functionally derived and symbolically encoded traditions. Chimpanzees serve as heuristic, referential models for the reconstruction of cultural evolution in apes and humans from an ancestral hominoid. However, chimpanzees are not humans, and key differences exist between them, though many of these apparent contrasts remain to be explored empirically and theoretically

    Food for Thought: An Analysis of the Robenhausen Botanicals at the Milwaukee Public Museum

    Museum collections excavated from archaeological sites represent an intersection of disciplines and provoke innovative approaches to the study of these material aspects of culture. Botanical collections of food remains, in particular, provide an opportunity to interrogate the way in which culinary practices in the past are understood. The circum-Alpine lake dwelling complex of central Europe includes hundreds of archaeological sites dating to the Neolithic, Bronze, and Iron Age; many of these sites are known for exceptional preservation of organic material due to a waterlogged, anaerobic environment. Robenhausen, located in eastern Switzerland, was one of many lake dwellings discovered in the 19th century when these sites first became known to the archaeological community and the general public. Because of this early discovery date combined with a variety of other circumstances, material culture from this site and many others was part of an artifact diaspora which scattered objects from Robenhausen throughout museums in the U.S. and Europe. Artifacts from this site were rediscovered in the Milwaukee Public Museum’s permanent collection in the early 2000s and include over 8000 plant and food remains, most of which are carbonized and have remained intact for over a century since their removal from the site in Switzerland. This thesis uses a combination of approaches including scientific reporting, macrobotanical identification, experimental archaeology, and theoretical interpretation based in foodways research to interpret this collection of botanical remains. In addition, this project digitally reunites the food and botanicals from Robenhausen with those scattered throughout other museum collections and contributes to our understanding of the complex nature of foodways at the Robenhausen site during the Late Neolithic and Bronze Age.