3 research outputs found

    De-identifier for electronic medical records

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-74). In this thesis, I describe our effort to build an extended and specialized Named Entity Recognizer (NER) to detect instances of Protected Health Information (PHI) in electronic medical records (a de-identifier). The de-identifier was built by creating a comprehensive set of features, formed by combining features from the most successful named entity recognizers and de-identifiers, and using them in an SVM classifier. We show that the benefit from having an inclusive set of features outweighs the harm from the very large dimensionality of the resulting classification problem. We also show that our classifier does not over-fit the training data. We test whether this approach is more effective than using the NERs separately and combining the results using a committee voting procedure. Finally, we show that our system achieves a precision of up to 1.00, a recall of up to 0.97, and an f-measure of up to 0.98 on a variety of corpora. By Arya Tafvizi. M.Eng.
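    As a rough illustration of how the three reported metrics relate, the sketch below computes the f-measure as the harmonic mean of precision and recall. The function name `f_measure` and the `beta` parameter are illustrative and do not come from the thesis. The abstract reports each figure as a best case ("up to"), so pairing 1.00 with 0.97 assumes both come from the same corpus, which the abstract does not state.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    # Both zero means no correct predictions at all; define the score as 0.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Assumption: the best-case precision and recall are paired on one corpus.
best_case = f_measure(1.00, 0.97)
print(round(best_case, 2))
```

    Under that assumption the result rounds to 0.98, which is consistent with the f-measure quoted in the abstract.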

    Dynamics of the urban lightscape

    The manifest importance of cities and the advent of novel data about them are stimulating interest in both basic and applied “urban science” (Bettencourt et al., 2007 [4]; Bettencourt, 2013 [3]). A central task in this emerging field is to document and understand the “pulse of the city” in its diverse manifestations (e.g., in mobility, energy use, communications, economics), both to define the normal state against which anomalies can be judged and to understand how macroscopic city observables emerge from the aggregate behavior of many individuals (Louail, 2013 [9]; Ferreira et al., 2013 [6]). Here we quantify the dynamics of an urban lightscape through the novel modality of persistent synoptic observations from an urban vantage point. Established astronomical techniques are applied to visible light images captured at 0.1 Hz to extract and analyze the light curves of 4147 sources in an urban scene over a period of 3 weeks. We find that both residential and commercial sources in our scene exhibit recurring aggregate patterns, while the individual sources decorrelate by an average of one hour after only one night. These highly granular, stand-off observations of aggregate human behavior – which do not require surveys, in situ monitors, or other intrusive methodologies – have a direct relationship to average and dynamic energy usage, lighting technology, and the impacts of light pollution. They may also be used indirectly to address questions in urban operations as well as behavioral and health science. Our methodology can be extended to other remote sensing modalities and, when combined with correlative data, can yield new insights into cities and their inhabitants.