212 research outputs found

    Assessing and Remedying Coverage for a Given Dataset

    Data analysis impacts virtually every aspect of our society today. Often, this analysis is performed on an existing dataset, possibly collected through a process that the data scientists had limited control over. The existing data analyzed may not include the complete universe, but it is expected to cover the diversity of items in the universe. Lack of adequate coverage in the dataset can result in undesirable outcomes, such as biased decisions and algorithmic racism, and can create vulnerabilities, such as opening up room for adversarial attacks. In this paper, we assess the coverage of a given dataset over multiple categorical attributes. We first provide efficient techniques for traversing the combinatorial explosion of value combinations to identify any regions of attribute space not adequately covered by the data. Then, we determine the least amount of additional data that must be obtained to resolve this lack of adequate coverage. We confirm the value of our proposal through both theoretical analyses and comprehensive experiments on real data.
    Comment: in ICDE 2019
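    The combinatorial traversal this abstract describes can be illustrated with a brute-force baseline: enumerate value combinations over subsets of categorical attributes and flag any combination matched by fewer rows than a threshold. This is a hypothetical sketch (the `uncovered_patterns` helper, its parameters, and the toy data are illustrative), not the paper's pruned algorithm:

    ```python
    from itertools import combinations, product
    from collections import Counter

    def uncovered_patterns(rows, domains, level=2, threshold=1):
        """Report value combinations over `level` attributes matched by fewer
        than `threshold` rows. Brute-force baseline; the paper's techniques
        prune this combinatorial space rather than enumerating it fully."""
        attrs = list(domains)
        uncovered = []
        for subset in combinations(attrs, level):
            # Count how often each value combination occurs in the data.
            counts = Counter(tuple(r[a] for a in subset) for r in rows)
            for values in product(*(domains[a] for a in subset)):
                if counts[values] < threshold:
                    uncovered.append(dict(zip(subset, values)))
        return uncovered

    rows = [
        {"gender": "F", "race": "white"},
        {"gender": "M", "race": "white"},
        {"gender": "M", "race": "black"},
    ]
    domains = {"gender": ["F", "M"], "race": ["white", "black"]}
    print(uncovered_patterns(rows, domains))
    # → [{'gender': 'F', 'race': 'black'}]
    ```

    The reported pattern is exactly a region of attribute space with no coverage; the paper's second contribution would then determine the minimum additional data needed to cover such regions.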

    Democratizing Self-Service Data Preparation through Example Guided Program Synthesis

    The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues like non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners have to burden themselves with "data janitor" work, writing ad hoc Python, Perl, or SQL scripts, which is tedious and inefficient. It is estimated that data scientists and analysts typically spend 80% of their time preparing data, a significant amount of human effort that could be redirected to better goals. In this dissertation, we ease this data preparation burden by harnessing knowledge, such as examples and other useful hints, from the end user. We develop program synthesis techniques guided by heuristics and machine learning, which make data preparation less painful and more efficient for data users, particularly those with little to no programming experience. Data transformation, also called data wrangling or data munging, is an important task in data preparation, seeking to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation by providing small input-output transformation examples can significantly reduce the overall user effort. The underlying program synthesizer can often find meaningful data transformation programs within a reasonably short amount of time. Our second system, CLX, demonstrates that sometimes the user does not even need to provide complete input-output examples, but only to label desirable examples if they exist in the original dataset. The system is still capable of suggesting reasonable and explainable transformation operations to fix non-standard data format issues in a dataset full of heterogeneous data with varied formats.
    PRISM, our third system, targets the data preparation task of data integration, i.e., combining multiple relations to formulate a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints of complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete data record examples with missing values, value ranges, or multiple possible values in each element (cell), so as to require less familiarity with the database contents from the end user.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
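    The core idea of example-guided synthesis described above can be sketched as a search over a small operator set for the shortest program consistent with a user's input-output example. This is a toy illustration with a hypothetical `OPERATORS` table and `synthesize` function, not Foofah's actual heuristic search over table operators:

    ```python
    from itertools import product

    # Toy operator vocabulary; a real system would use table-level operators.
    OPERATORS = {
        "strip": str.strip,
        "lower": str.lower,
        "split": lambda s: s.split(","),
    }

    def synthesize(example_in, example_out, max_len=2):
        """Return the shortest operator sequence mapping the input example
        to the output example, or None if none exists within max_len."""
        for length in range(1, max_len + 1):
            for prog in product(OPERATORS, repeat=length):
                value = example_in
                try:
                    for op in prog:
                        value = OPERATORS[op](value)
                except (AttributeError, TypeError):
                    continue  # operator not applicable to current value
                if value == example_out:
                    return prog
        return None

    print(synthesize("  A,B ", ["a", "b"], max_len=3))
    # → ('strip', 'lower', 'split')
    ```

    Even this naive enumeration conveys why small examples suffice: each input-output pair sharply constrains the space of consistent programs, which is what lets the real synthesizer succeed quickly with heuristic pruning.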

    Components of the Hematopoietic Compartments in Tumor Stroma and Tumor-Bearing Mice

    Solid tumors are composed of cancerous cells and non-cancerous stroma. A better understanding of the tumor stroma could lead to new therapeutic applications. However, the exact compositions and functions of the tumor stroma are still largely unknown. Here, using a Lewis lung carcinoma implantation mouse model, we examined the hematopoietic compartments in tumor stroma and tumor-bearing mice. Different lineages of differentiated hematopoietic cells existed in tumor stroma, with the percentage of myeloid cells increasing and the percentages of lymphoid and erythroid cells decreasing over time. Using bone marrow reconstitution analysis, we showed that the tumor stroma also contained functional hematopoietic stem cells. All hematopoietic cells in the tumor stroma originated from bone marrow. In the bone marrow and peripheral blood of tumor-bearing mice, myeloid populations increased, lymphoid and erythroid populations decreased, and the number of hematopoietic stem cells markedly increased with time. To investigate the function of hematopoietic cells in tumor stroma, we co-implanted various types of hematopoietic cells with cancer cells. We found that total hematopoietic cells in the tumor stroma promoted tumor development. Furthermore, the growth of the primary implanted Lewis lung carcinomas and their metastasis were significantly decreased in mice reconstituted with IGF type I receptor-deficient hematopoietic stem cells, indicating that IGF signaling in the hematopoietic tumor stroma supports tumor outgrowth. These results reveal that hematopoietic cells in the tumor stroma regulate tumor development and that tumor progression significantly alters the host hematopoietic compartment.

    Constrained Load Transportation by a Team of Quadrotors


    Software for Foofah


    A Self-Cloning Agents Based Model for High-Performance Mobile-Cloud Computing

    The rise of the mobile-cloud computing paradigm in recent years has enabled mobile devices with processing power and battery life limitations to achieve complex tasks in real time. While mobile-cloud computing is promising for overcoming the limitations of mobile devices in real-time computing, the lack of frameworks compatible with standard technologies and techniques for dynamic performance estimation and program component relocation makes it harder to adopt mobile-cloud computing at large. Most of the available frameworks rely on strong assumptions, such as the availability of a full clone of the application code and negligible execution time in the cloud. In this paper, we present a dynamic computation offloading model for mobile-cloud computing, based on autonomous agents. Our approach does not impose any requirements on the cloud platform other than providing isolated execution containers, and it alleviates the mobile platform's burden of managing offloaded code by using stateful, autonomous application partitions. We also investigate the effects of different cloud runtime environment conditions on the performance of mobile-cloud computing, and present a simple and low-overhead dynamic makespan estimation model integrated into autonomous agents, enhancing them with self-performance evaluation in addition to self-cloning capabilities. The proposed performance profiling model is used in conjunction with a cloud resource optimization scheme to ensure optimal performance. Experiments with two mobile applications demonstrate the effectiveness of the proposed approach for high-performance mobile-cloud computing.
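    The kind of makespan-based offloading decision this abstract describes can be illustrated with a minimal cost model: compare the estimated local execution time against remote time, which adds network round-trip and data-transfer costs. The `offload_decision` function and all its parameters are illustrative assumptions, not the paper's profiling model:

    ```python
    def offload_decision(work_cycles, data_bytes,
                         local_speed, cloud_speed, bandwidth, rtt=0.05):
        """Estimate makespans and pick an execution site.
        Illustrative units: speeds in cycles/s, bandwidth in bytes/s,
        rtt (network round trip) in seconds."""
        local_time = work_cycles / local_speed
        remote_time = rtt + data_bytes / bandwidth + work_cycles / cloud_speed
        if remote_time < local_time:
            return ("cloud", remote_time)
        return ("local", local_time)

    # Heavy computation with a small payload: offloading wins.
    print(offload_decision(work_cycles=1e10, data_bytes=1e5,
                           local_speed=1e9, cloud_speed=1e10,
                           bandwidth=1e7))
    ```

    A dynamic scheme, as in the paper, would update the speed and bandwidth estimates at runtime (self-performance evaluation) rather than taking them as fixed inputs, so the decision tracks changing cloud and network conditions.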