
    Natural Language Processing (NLP) of Liberal Arts College Newspapers in Ohio over 30 years

    Computers have been extremely useful in humanity’s quest for knowledge, performing calculations and other strenuous tasks in seconds. To perform a task, a computer requires a specific set of instructions, or code, that tells it what to do. This series of commands and instructions is strict: any syntactic error results in faulty behavior or none at all. Human language is very much unlike that of a computer. It can be grammatically incorrect, irregular, or even incomplete, yet another human may still get the point and understand the information being exchanged. A significant part of what makes us human is the ability to use and develop our dynamic language. When a computer is able to completely understand and mimic human language, we will likely have something closer to artificial intelligence than anything we’ve seen yet. Development in this area has led to Natural Language Processing, or NLP. On March 31, 2017, students, professors, and hobbyists gathered at the HackOH5 Student Newspaper Hackathon to analyze more than 170,000 pages of student newspapers from five colleges: Kenyon, Denison, Oberlin, Ohio Wesleyan, and the College of Wooster. Spanning more than 160 years, the digitized archives hold years of student coverage, organized in a large dataset of text and images. What can be done with all of this newly digitized information? NLP allows us to analyze, visualize, and contextualize this textual data. This project analyzes textual information recorded between 1970 and 2000 by three of these colleges: Kenyon, Denison, and Oberlin. Using NLP, any word used by any school in any issue can be mapped to a point in an n-dimensional semantic space, where the distances between words represent their semantic closeness. For example, "student" will lie close to words such as "professor", "college", "people", and "alumni".
By investigating specific words that are historically relevant, we can try to understand how each college might have perceived certain events and compare the colleges with each other.
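    The word-mapping idea in this abstract can be illustrated with a small sketch. The abstract does not say which NLP model or distance measure the project used; this example assumes cosine similarity, a common choice for comparing word vectors, and uses a few made-up three-dimensional vectors in place of embeddings trained on the newspaper text. The names `EMBEDDINGS`, `cosine_similarity`, and `nearest_words` are hypothetical.

```python
import math

# Hypothetical toy embeddings. Real vectors would be learned from the newspaper
# corpus (for example with a word2vec-style model); these numbers are made up.
EMBEDDINGS = {
    "student":   [0.9, 0.8, 0.1],
    "professor": [0.8, 0.9, 0.2],
    "college":   [0.7, 0.7, 0.3],
    "tractor":   [0.1, 0.0, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, embeddings, k=2):
    """Return the k other words whose vectors are most similar to `word`."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Higher cosine similarity means semantically closer in this toy space.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


print(nearest_words("student", EMBEDDINGS))  # ['professor', 'college']
```

    With real embeddings, one could train a separate model per college and compare the neighbor lists of the same historically relevant word across Kenyon, Denison, and Oberlin; that comparison step is likewise an assumption, not a method stated in the abstract.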

    Small Satellite Payload Calibration

    This project focused on developing an efficient, cost-effective method for calibrating optical payloads that reduces setup, measurement, and analysis time while staying within a SmallSat budget. To develop and test the concept, the team identified key calibration parameters and performed a demonstration on a surrogate payload using spatial, spectral, and radiometric calibration methods. Calibration results were derived from the demonstration and are detailed below.

    An expansive human regulatory lexicon encoded in transcription factor footprints.

    Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

    The accessible chromatin landscape of the human genome

    DNaseI hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers, and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ~2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation, and regulatory factor occupancy patterns. We connect ~580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is choreographed with dozens to hundreds of co-activated elements, and the trans-cellular DNaseI sensitivity pattern at a given region can predict cell type-specific functional behaviors. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.

    Sustained increases in atmospheric oxygen and marine productivity in the Neoproterozoic and Palaeozoic eras

    A geologically rapid Neoproterozoic oxygenation event is commonly linked to the appearance of marine animal groups in the fossil record. However, there is still debate about what evidence from the sedimentary geochemical record, if any, provides strong support for a persistent shift in surface oxygen immediately preceding the rise of animals. We present statistical learning analyses of a large dataset of geochemical data and associated geological context from the Neoproterozoic and Palaeozoic sedimentary record and then use Earth system modelling to link trends in redox-sensitive trace metal and organic carbon concentrations to the oxygenation of Earth’s oceans and atmosphere. We do not find evidence for the wholesale oxygenation of Earth’s oceans in the late Neoproterozoic era. We do, however, reconstruct a moderate long-term increase in atmospheric oxygen and marine productivity. These changes to the Earth system would have increased dissolved oxygen and food supply in shallow-water habitats during the broad interval of geologic time in which the major animal groups first radiated. This approach provides some of the most direct evidence for potential physiological drivers of the Cambrian radiation, while highlighting the importance of later Palaeozoic oxygenation in the evolution of the modern Earth system.