37 research outputs found

    Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

    Full text link
    Large language models have introduced exciting new opportunities and challenges in designing and developing AI-assisted writing support tools. Recent work has shown that leveraging this new technology can transform writing in many scenarios, such as ideation during creative writing, editing support, and summarization. However, AI-supported expository writing--including real-world tasks like scholars writing literature reviews or doctors writing progress notes--is relatively understudied. In this position paper, we argue that developing AI support for expository writing poses unique and exciting research challenges and can lead to high real-world impact. We characterize expository writing as evidence-based and knowledge-generating: it contains summaries of external documents as well as new information or knowledge. It can be seen as the product of the authors' sensemaking process over a set of source documents, and the interplay between reading, reflection, and writing opens up new opportunities for designing AI support. We sketch three components for AI support design and discuss considerations for future research.
    Comment: 3 pages, 1 figure, accepted by The Second Workshop on Intelligent and Interactive Writing Assistants

    Analyzing petabytes of data with Hadoop

    No full text
    Abstract: The open source Apache Hadoop project provides a powerful suite of tools for storing and analyzing petabytes of data using commodity hardware. After several years of production use inside web companies like Yahoo! and Facebook and nearly a year of commercial support and development by Cloudera, the technology is spreading rapidly through other disciplines, from financial services and government to life sciences and high energy physics. The talk will motivate the design of Hadoop and discuss some key implementation details in depth. It will also cover the major subprojects in the Hadoop ecosystem, go over some example applications, highlight best practices for deploying Hadoop in your environment, discuss plans for the future of the technology, and provide pointers to the many resources available for learning more. In addition to providing more information about the Hadoop platform, a major goal of this talk is to begin a dialogue with the ATLAS research team on how the tools commonly used in their environment compare to Hadoop, and how Hadoop could be improved to better serve the high energy physics community.
    Short Biography: Jeff Hammerbacher is Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to founding Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced two open source projects: Hive, a system for offline analysis built on top of Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University and recently served as contributing editor to the book "Beautiful Data", published by O'Reilly in July 2009.
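
    As a taste of the programming model the talk introduces, here is a minimal word-count job written for Hadoop Streaming, which lets the mapper and reducer be plain executables that read stdin and write tab-separated records to stdout. The script name and invocation below are illustrative, not taken from the talk.

```python
#!/usr/bin/env python3
# wordcount_streaming.py -- minimal word count for Hadoop Streaming
# (hypothetical script, not from the talk). Hadoop runs the mapper over
# input splits, sorts its output by key, and feeds it to the reducer.
import sys

def mapper():
    # Emit one "word<TAB>1" record per token on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, _, n = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

    It could be submitted with the hadoop-streaming jar that ships with the distribution (exact path varies), passing the script as both the -mapper and -reducer commands and distributing it to the cluster nodes with -files.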

    Computing in High Energy and Nuclear Physics (CHEP) 2012

    No full text
    Commodity hardware is going many-core. In High Energy Physics, we may soon be unable to satisfy per-core job memory needs under the current single-core processing model. In addition, an ever-increasing number of independent, mutually incoherent jobs running on the same physical hardware without sharing resources may significantly degrade processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry, and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to control a larger quantum of resources, since multi-core-aware jobs require multiple cores to be scheduled simultaneously. CMS is exploring the approach of using whole nodes as the unit of allocation in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We will present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues, compared with standard single-core processing workflows.
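
    The memory-sharing claim is easy to illustrate outside the experiment framework: on POSIX systems, worker processes forked from a parent share the parent's pages copy-on-write, so a large read-only structure loaded once before the fork is not duplicated per core. The sketch below is a toy illustration of that mechanism in Python under those assumptions, not CMS software; note that CPython's reference counting will still dirty some shared pages over time, an overhead a C++ framework avoids.

```python
# Toy illustration of the multi-core memory-sharing argument: load a large
# read-only structure once in the parent process, then fork workers that
# read it. With the "fork" start method (POSIX only), pages are shared
# copy-on-write, so workers do not each pay for a private copy the way
# independent single-core jobs would.
import multiprocessing as mp

# Stand-in for shared read-only data such as code libraries, detector
# geometry, or conditions data (hypothetical; a real framework loads far more).
CONDITIONS = {i: float(i) for i in range(1_000_000)}

def process_events(worker_id: int, n_events: int) -> float:
    # Each worker only reads the shared table, so the work itself
    # triggers no page copies.
    start = worker_id * n_events
    return sum(CONDITIONS[e % len(CONDITIONS)] for e in range(start, start + n_events))

if __name__ == "__main__":
    mp.set_start_method("fork")          # required for copy-on-write sharing
    with mp.Pool(processes=8) as pool:   # e.g. one worker per core of a node
        totals = pool.starmap(process_events, [(w, 10_000) for w in range(8)])
    print(f"processed {8 * 10_000} events, checksum {sum(totals):.1f}")
```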

    Information Platforms and the Rise of the Data Scientist

    No full text
    Book chapter from "Beautiful Data".

    The State-Space Paradigm

    No full text
    A paper I wrote for a computational neuroscience course. It's just a summary of the work of other people; there's nothing new from me in it. I just need to be able to link to it.

    Designing the Data Science Curriculum

    No full text
    Slides from my Interface 2013 presentation.

    Beautiful Data: The Stories Behind Elegant Data Solutions

    No full text
    In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. With Beautiful Data, you will:
    - Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
    - Learn how to visualize trends in urban crime, using maps and data mashups
    - Discover the challenges of designing a data processing system that works w

    Heterogeneity of mutated tumor antigens in a single high grade ovarian serous carcinoma

    No full text
    Abstract: Patient PT189 presented with stage IIIC high grade papillary serous ovarian cancer in 2012. She was originally treated with carboplatin/paclitaxel. Having failed this original doublet chemotherapy, she has received a total of five additional chemotherapy regimens, all with recurrence of her disease. We performed exome and RNA sequencing on normal PBMC as well as nine tumor samples collected over a two year period during the course of her treatment. Choice of variant calling algorithm yields significantly different results in each sample. There is low overlap between epitope predictions for primary vs. recurrent sample sets using “confident” variants, and greater concordance between samples from the primary time point compared with later recurrences.
    Presented at: 13th Cancer Immunotherapy (CIMT) annual meeting 2015
    Authors: Alex Rubinsteyn, John Martignetti, Elena Pereira, Tim O'Donnell, Arun Ahuja, Leo Garnar-Wortzel, Robert Sebra, Peter Dottino, Jeff Hammerbacher, Eric Schadt
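
    One generic way to make the concordance statements concrete is to treat each caller's output as a set of (chromosome, position, ref, alt) tuples and score the agreement between two callers with the Jaccard index. The sketch below illustrates that comparison on hypothetical calls; it is not the pipeline used for the poster.

```python
# Generic sketch of variant-caller concordance: represent each caller's
# output as a set of (chrom, pos, ref, alt) tuples and score agreement
# with the Jaccard index. All calls below are hypothetical.

def jaccard(a: set, b: set) -> float:
    # Intersection over union; 1.0 means identical call sets.
    return len(a & b) / len(a | b) if (a or b) else 1.0

caller_a = {("chr1", 12345, "A", "T"), ("chr2", 67890, "G", "C"), ("chr3", 1111, "C", "A")}
caller_b = {("chr1", 12345, "A", "T"), ("chr2", 67890, "G", "T")}

shared = caller_a & caller_b
print(f"shared calls: {len(shared)}; Jaccard: {jaccard(caller_a, caller_b):.2f}")
```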

    hammerlab/ketrew: Ketrew 3.1.0: Performance and UI improvements

    No full text
    Improve display of programs and logs in the WebUI. Add more user-level notifications for async errors. Add indexes to the DB. Improve node-list display in the TextUI. Remove and clean out some code. Add option ~safe_ids to job submission (on by default). Fix build with Lwt ≥ 3.0.0. Improve "Getting Started" documentation.