
    Lower Bounds on the Communication Complexity of Shifting

    We study the communication complexity of the SHIFT (equivalently, SUM-INDEX) function in a 3-party simultaneous message model. Alice and Bob share an n-bit string x; in addition, Alice holds an index i and Bob holds an index j. They must send messages to a referee who knows only n, i and j, enabling him to determine x[(i+j) mod n]. Surprisingly, nontrivial savings are possible even under such a strong restriction: Bob can now make do with only ceil(n/2) bits. Here we show that this bound is completely tight, for all n. This is an exact lower bound, with no asymptotics involved.
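    To make the function concrete, here is a minimal Python sketch of SHIFT together with the trivial protocol in which Alice sends nothing and Bob sends all n bits of x, cyclically shifted by j. This is only the naive baseline, not the ceil(n/2)-bit protocol referred to above (which the abstract does not describe); the function names are illustrative.

```python
def shift(x: list, i: int, j: int) -> int:
    """SHIFT / SUM-INDEX: the bit of x at position (i + j) mod n."""
    return x[(i + j) % len(x)]


def alice_message(x: list, i: int) -> list:
    """In this trivial protocol Alice sends nothing."""
    return []


def bob_message(x: list, j: int) -> list:
    """Bob sends all n bits of x, cyclically shifted left by j positions."""
    n = len(x)
    return [x[(k + j) % n] for k in range(n)]


def referee(n: int, i: int, j: int, alice_msg: list, bob_msg: list) -> int:
    """The referee sees only n, i, j and the two messages."""
    # Position i of Bob's shifted string holds x[(i + j) mod n].
    return bob_msg[i % n]
```

    For example, with x = [1, 0, 1, 1, 0], i = 3 and j = 4, Bob sends [0, 1, 0, 1, 1] and the referee outputs bit 3 of that message, which is 1 = x[2] = x[(3+4) mod 5]. Bob's message here costs n bits; the point of the results above is how far below this he can go.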

    Schema extraction for tabular data on the web

    Tabular data is an abundant source of information on the Web, but it remains largely isolated from the Web's interconnections, since tables lack links and computer-accessible descriptions of their structure. In other words, the schemas of these tables (attribute names, values, datatypes, etc.) are not explicitly stored as table metadata. Consequently, the structure these tables contain is not accessible to the crawlers that power search engines, and thus not accessible to user search queries. We address this lack of structure with a new method that leverages the principles of table construction to extract table schemas. Discovering the schema by which a table is constructed is achieved by harnessing the similarities and differences of nearby table rows through a novel set of features and a feature processing scheme. The schemas of these data tables are determined using a classification technique based on conditional random fields, combined with a novel feature encoding method called logarithmic binning, which is specifically designed for the data table extraction task. Our method provides a considerable improvement over the well-known WebTables schema extraction method. In contrast with previous work that focuses on extracting individual relations, our method excels at correctly interpreting full tables, and is therefore capable of handling general tables such as those found in spreadsheets, instead of being restricted to HTML tables as is the case with the WebTables method. We also extract additional schema characteristics, such as row groupings, which are important for supporting information retrieval tasks on tabular data.
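    The abstract names logarithmic binning and row-neighbour features but does not define them. As a rough illustration only, the Python sketch below assumes that "logarithmic binning" means mapping a non-negative count to a base-2 bin index (0, 1, 2-3, 4-7, ...), so that small differences matter more than large ones, and it computes a few hypothetical per-row features (non-empty cells, numeric cells, and cell-type changes relative to the previous row) as string-valued features of the kind many CRF toolkits accept. The feature names and cell-typing rule are assumptions, not the authors' feature set.

```python
from __future__ import annotations


def log_bin(count: int) -> int:
    """Base-2 logarithmic bin of a non-negative integer.

    0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ...
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return count.bit_length()


def cell_type(cell: str) -> str:
    """Coarse (assumed) cell type: 'empty', 'num' or 'text'."""
    text = cell.strip()
    if not text:
        return "empty"
    try:
        float(text.replace(",", ""))  # treat "1,200" as numeric
        return "num"
    except ValueError:
        return "text"


def row_features(row: list[str], prev_row: list[str] | None) -> dict[str, str]:
    """Hypothetical per-row CRF features, each encoded as a logarithmic bin."""
    types = [cell_type(c) for c in row]
    non_empty = sum(t != "empty" for t in types)
    numeric = sum(t == "num" for t in types)
    if prev_row is None:
        # First row: no neighbour to compare with, so every cell counts as a change.
        type_changes = len(row)
    else:
        prev_types = [cell_type(c) for c in prev_row]
        width = max(len(types), len(prev_types))
        cur = types + ["empty"] * (width - len(types))
        prev = prev_types + ["empty"] * (width - len(prev_types))
        # Number of columns whose cell type differs from the row above.
        type_changes = sum(a != b for a, b in zip(cur, prev))
    return {
        "non_empty_bin": str(log_bin(non_empty)),
        "numeric_bin": str(log_bin(numeric)),
        "type_change_bin": str(log_bin(type_changes)),
    }
```

    On a toy table with header ["Name", "Count", "Size"] followed by rows ["alpha", "1,200", "35"] and ["beta", "980", "41"], the first data row differs in type from the header in two columns (bin 2), while the second data row matches the row above it (bin 0). A sequence model can pick up this kind of header/data boundary signal from the binned values.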

    Spatio-temporal disease tracking using news articles

    Geographical Information Systems have been increasingly used to aid the prompt detection, tracking, and analysis of disease outbreaks. Web content, which is full of health-related data, also serves as a useful resource for disease outbreak analysis. News posts often report the initial outbreak of a disease and contain valuable information that helps ascertain the time and location of the outbreak. The locations mentioned in news posts are specified textually rather than geometrically, so geotagging methods are needed to detect them and to map each textual specification to the corresponding geometric specification. The NewsStand system, which aggregates news posts by topic and location while providing a map query interface to them, is enhanced to enable disease tracking and analysis by geotagging disease-related web news posts. Besides NewsStand's powerful functionality for news exploration, enhancements for the analysis of temporal information are described, including a well-designed time slider, a heatmap-based visualization tool for displaying disease distribution, and intuitive spatio-temporal querying methods. Future improvements to NewsStand are also discussed.
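    The abstract does not describe how NewsStand implements its temporal tools, so the following is only a toy Python model of the idea, assuming posts have already been geotagged to a single (lat, lon) point per disease mention. A time slider then amounts to moving a [start, end] date window, a spatio-temporal query filters by that window and a map bounding box, and a heatmap can be approximated by counting matching posts per fixed-size grid cell. The class, function names, and sample data are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GeotaggedPost:
    """A news post after geotagging: one disease mention resolved to a point."""
    disease: str
    published: date
    lat: float
    lon: float


def spatio_temporal_query(posts: list[GeotaggedPost], disease: str, start: date,
                          end: date, bbox: tuple[float, float, float, float]) -> list[GeotaggedPost]:
    """Posts about `disease` published in [start, end] inside a lat/lon box.

    bbox = (min_lat, min_lon, max_lat, max_lon); boxes that cross the
    antimeridian are not handled in this sketch.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        p for p in posts
        if p.disease == disease
        and start <= p.published <= end
        and min_lat <= p.lat <= max_lat
        and min_lon <= p.lon <= max_lon
    ]


def heatmap_counts(posts: list[GeotaggedPost], cell_deg: float = 1.0) -> Counter:
    """Count posts per cell_deg x cell_deg grid cell, keyed by (row, col) indices."""
    return Counter(
        (math.floor(p.lat / cell_deg), math.floor(p.lon / cell_deg)) for p in posts
    )
```

    Dragging the time slider corresponds to re-running the query with a new (start, end) pair; the grid counts can then be rendered as heatmap intensities. A real system would also need to resolve ambiguous place names during geotagging, which this sketch sidesteps by assuming coordinates are given.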