3 research outputs found

    Benchmarking real-time distributed object management systems for evolvable and adaptable command and control applications

    Abstract

    This paper describes benchmarking for evolvable and adaptable real-time command and control systems.

    Introduction

    MITRE's Evolvable Real-Time C3 initiative developed an approach that would enable current real-time systems to evolve into the systems of the future. We designed and implemented an infrastructure and data manager so that various applications could be hosted on the infrastructure. We then completed a follow-on effort to design flexible, adaptable distributed object management systems for command and control (C2) systems. Such an adaptable system would switch scheduling algorithms, policies, and protocols depending on the need and the environment. Both initiatives were carried out for the United States Air Force. One of the key contributions of our work is the investigation of real-time features for distributed object management systems; partly as a result of this work, we are now seeing various real-time distributed object management products being developed. In selecting a real-time distributed object management system, we need to analyze various criteria; therefore, we need benchmarking studies for real-time distributed object management systems. Although benchmarking systems such as Hartstone and Distributed Hartstone have been developed for middleware systems, they were not developed specifically for distributed object-based middleware. Since much of our work is heavily based on distributed objects, we developed benchmarking systems by adapting the Hartstone system. This paper describes our effort to develop these benchmarks. Section 2 discusses Distributed Hartstone. Section 3 provides background on the original Hartstone and DHartstone designs from the SEI (Software Engineering Institute) and CMU (Carnegie Mellon University). Section 4 describes our design and modification of DHartstone to incorporate the capability to benchmark real-time middleware. Sections 5 and 6 describe the design of the benchmarking systems. For more details of our work on benchmarking and experimental results we refer to [MAUR98] and [MAUR99]. For background information on our work we refer t
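As a rough illustration of the Hartstone-style measurement this benchmarking adapts, the sketch below releases a task at a fixed period and counts missed deadlines. It is a minimal Python sketch under assumed names (`periodic_benchmark`, `invoke`) and parameters, not the DHartstone code; in the real benchmark, `invoke` would be a call on the distributed object middleware under test.

```python
import time

def periodic_benchmark(invoke, period_s=0.01, iterations=1000):
    """Release `invoke` once per period and count missed deadlines.

    `invoke` stands in for a call on a remote (distributed) object; a
    deadline is missed when the call fails to finish within its period.
    """
    missed, latencies = 0, []
    next_release = time.monotonic()
    for _ in range(iterations):
        start = time.monotonic()
        invoke()  # the middleware invocation under test
        latency = time.monotonic() - start
        latencies.append(latency)
        if latency > period_s:  # Hartstone-style deadline check
            missed += 1
        next_release += period_s
        # Sleep until the next release point; skip sleeping on overrun.
        time.sleep(max(0.0, next_release - time.monotonic()))
    return missed, sum(latencies) / len(latencies)

# Baseline run against a no-op "remote" call:
misses, mean = periodic_benchmark(lambda: None, iterations=100)
print(f"missed deadlines: {misses}, mean latency: {mean * 1e6:.1f} us")
```

In a Hartstone-style series, the period would be tightened (or the workload increased) step by step until deadlines start to miss, giving a comparable breakdown point for each middleware configuration.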

    MiTAP: A case study of integrated knowledge discovery tools

    The MiTAP system was developed as an experimental prototype that uses human language technologies to monitor infectious disease outbreaks. The system provides timely, multi-lingual, global information access to analysts, medical experts, and individuals involved in humanitarian assistance and relief work. Each day, thousands of articles from electronic information sources spanning multiple languages are automatically captured, translated, tagged, summarized, and presented to users in a variety of ways. Over the course of the past year and a half, MiTAP has become a useful tool for real users solving real problems. The success of MiTAP is largely attributable to its user-focused design, which accommodates the imperfect component technologies and allows users to interact with the system in familiar ways. We discuss the problem, the design process, and the implementation from the perspective of the services provided and how these services support system capabilities that satisfy user requirements.
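The capture-translate-tag-summarize-present flow described above is, at heart, a staged pipeline. The sketch below shows that composition with toy stand-ins for each component; every function name, field name, and heuristic here is an assumption for illustration, not MiTAP's actual interface.

```python
def run_pipeline(article, stages):
    """Pass one captured article through each processing stage in order."""
    for stage in stages:
        article = stage(article)
    return article

# Toy stand-ins for the real components (MT, extraction, summarization,
# publication); the real system plugs in full NLP modules here.
def translate(a):
    a["text_en"] = a["text"]  # a real MT step would translate here
    return a

def tag(a):
    # Naive stand-in for named-entity tagging: capitalized tokens.
    a["entities"] = [w for w in a["text_en"].split() if w.istitle()]
    return a

def summarize(a):
    a["summary"] = a["text_en"][:60]  # placeholder one-line summary
    return a

def publish(a):
    print(a["summary"], "|", a["entities"])  # stand-in for posting/display
    return a

article = {"text": "Ebola outbreak reported in Gulu district", "lang": "en"}
run_pipeline(article, [translate, tag, summarize, publish])
```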

    MiTAP for Bio-Security: A Case Study

    Abstract

    MiTAP (MITRE Text and Audio Processing) is a prototype system available for monitoring infectious disease outbreaks and other global events. MiTAP focuses on providing timely, multi-lingual, global information access to medical experts and individuals involved in humanitarian assistance and relief work. Multiple information sources in multiple languages are automatically captured, filtered, translated, summarized, and categorized by disease, region, information source, person, and organization. Critical information is automatically extracted and tagged to facilitate browsing, searching, and sorting. The system supports shared situational awareness through collaboration, allowing users to submit other articles for processing, annotate existing documents, post directly to the system, and flag messages for others to see. MiTAP currently stores over one million articles and processes an additional 2,000 to 10,000 daily, delivering up-to-date information to dozens of regular users.

    Global Tracking of Infectious Disease Outbreaks and Emerging Biological Threats

    Over the years, greatly expanded trade and travel have increased the potential economic and political impacts of major disease outbreaks, given their ability to move rapidly across national borders. These diseases can affect people (West Nile virus, HIV, Ebola, Bovine Spongiform Encephalopathy), animals (foot-and-mouth disease), and plants (citrus canker in Florida). More recently, the potential of biological terrorism has become a very real threat. On September 11th, 2001, the Centers for Disease Control alerted states and local public health agencies to monitor for any unusual disease patterns, including the effects of chemical and biological agents. In addition to possible disruption and loss of life, bioterrorism could foment political instability, given the panic that fast-moving plagues have historically engendered. Appropriate response to disease outbreaks and emerging threats depends on obtaining reliable and up-to-date information, which often means monitoring many news sources, particularly local news sources, in many languages worldwide. Analysts cannot feasibly acquire, manage, and digest the vast amount of information available 24 hours a day, seven days a week. In addition, access to foreign-language documents and the local news of other countries is generally limited. Even when foreign-language news is available, it is usually no longer current by the time it is translated and reaches the hands of an analyst. This very real problem raises an urgent need for automated support for global tracking of infectious disease outbreaks and emerging biological threats.

    The MiTAP (MITRE Text and Audio Processing) system was created to explore the integration of synergistic TIDES language processing technologies: translation, information detection, extraction, and summarization. TIDES aims to revolutionize the way that information is obtained from human language by enabling people to find and interpret needed information quickly and effectively, regardless of language or medium. MiTAP is designed to provide the end user with timely, accurate, novel information and to present it in a way that allows the analyst to spend more time on analysis and less time on finding, translating, distilling, and presenting information.
    On September 11th, 2001, the research prototype system became available to real users for real problems.

    Text and Audio Processing for Bio-Security

    MiTAP focuses on providing timely, multi-lingual, global information access to analysts, medical experts, and individuals involved in humanitarian assistance and relief work. Multiple information sources (epidemiological reports, newswire feeds, email, online news) in multiple languages (English, Chinese, French, German, Italian, Portuguese, Russian, and Spanish) are automatically captured, filtered, translated, summarized, and categorized into searchable newsgroups based on disease, region, information source, person, organization, and language. Critical information is automatically extracted and tagged to facilitate browsing, searching, and sorting. The system supports shared situational awareness through collaboration, allowing users to submit other articles for processing, annotate existing documents, and post directly to the system. A web-based search engine supports source-specific, full-text information retrieval. Additional "views" into the data facilitate analysis and can serve as alerts to events such as disease outbreaks.

    Information Processing

    Each normalized message is passed through a zoner that uses human-generated rules to identify the source, date, and other fields such as the headline or title and the article body. The zoned messages are preprocessed to identify paragraph, sentence, and word boundaries as well as part-of-speech tags. This preprocessing is carried out by the Alembic natural language analyzer.

    User Interface

    The final phase consists of the user interface and related processing. The processed messages are converted to HTML, with color-coded named entities, and routed to newsgroups hosted by a Network News Transport Protocol (NNTP) server, InterNetNews (INN 2001). One major advantage of using an NNTP server is that users can access the information with a standard mail/news browser such as Netscape Messenger or Outlook Express. There is no need to install custom software, and the instant sense of familiarity with the interface is crucial in gaining user acceptance: little to no training is required. Mail readers also provide additional functionality such as alerting on new messages on specified topics, flagging messages of significance, and access to local directories that can be used as a private workspace. Other newsgroups can be created as collaborative repositories for users to share collective information. To supplement access to the data, messages are indexed using the Lucene information retrieval system (The Jakarta Project 2001), allowing users to run full-text, source-specific Boolean queries over the entire set of messages. As the relevance of messages tends to be time dependent, we have implemented an optimized query mechanism to do faster searches over time intervals.
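The time-interval optimization can be pictured as bucketing the index by day, so that a date-restricted query touches only the buckets inside the requested range. The sketch below is an assumed toy illustration of that idea, not Lucene's actual mechanism; the class and its data are invented for illustration.

```python
from datetime import date, timedelta

class TimeBucketedIndex:
    """Toy index that groups postings by day, so a date-restricted query
    only scans buckets inside the requested interval rather than the
    whole archive."""

    def __init__(self):
        self.buckets = {}  # day -> {term -> set of message ids}

    def add(self, msg_id, day, text):
        postings = self.buckets.setdefault(day, {})
        for term in set(text.lower().split()):
            postings.setdefault(term, set()).add(msg_id)

    def search(self, term, start, end):
        hits, day = set(), start
        while day <= end:  # cost scales with interval length, not archive size
            hits |= self.buckets.get(day, {}).get(term.lower(), set())
            day += timedelta(days=1)
        return hits

idx = TimeBucketedIndex()
idx.add(1, date(2001, 9, 12), "West Nile virus reported in New York")
idx.add(2, date(2000, 7, 3), "foot-and-mouth outbreak confirmed")
print(idx.search("virus", date(2001, 9, 1), date(2001, 9, 30)))  # {1}
```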
    MiTAP Development and Deployment

    The initial MiTAP system was put together over a 9-month period. Our goal was to build a prototype quickly to demonstrate the results of integrating multiple natural language processing (NLP) technologies. The longer-term strategy is to upgrade the components progressively as better-performing modules become available and to migrate towards our developing architecture. For the initial implementation, we chose components based on availability as well as ease of integration and modification. This meant that we used components developed at MITRE (extraction, summarization), components developed with MITRE involvement (translation support), or commercial off-the-shelf (COTS) components (translation engines, information retrieval, news server, news browser interface). In cases where no component was readily available, we developed a minimal capability for MiTAP, e.g., scripts for capturing news sources, or the use of named-entity extraction for headline generation and the binning of messages into appropriate newsgroups. Since July 2000, we have been working to incorporate modules from other groups (e.g., Columbia's Newsblaster). As part of the long-term effort, we have been concurrently developing a framework known as Catalyst (Mardis and Burger 2001). Catalyst provides a common data model based on standoff annotation, efficient compressed data formats, distributed processing, and annotation indexing.

    Uses of AI Technology

    Artificial Intelligence (AI) technology and techniques pervade MiTAP to support its multi-faceted, multi-lingual, and multi-functional requirements. From automated natural language processing to information retrieval, the NLP modules utilize AI extensively; the techniques employed fall predominantly into the data-driven camp of methods. Below we describe the components, roughly in their order of processing flow.

    The CyberTrans machine translation server employs a combination of AI techniques to optimize the performance of COTS machine translation (MT) systems. Since system developers have only the most basic insight into these MT systems, we do not describe the related AI techniques in depth here. COTS MT systems are designed primarily for interactive use in situations where users have control over the language, formatting, and well-formedness of the input text. In adapting CyberTrans for real users and real-world data, the need for supporting technologies quickly became apparent. Three of these are of particular interest: automated language identification, automated code set conversion, and automated spelling correction, particularly for the incorporation of diacritics. The resulting tools can be used individually and eventually as standalone modules, but are currently integrated into the CyberTrans processing flow.

    The first, most essential part of automated processing of language data is determining both the language and the code set representation of the input text. While it would seem obvious that users know at least what the language of a given document is, this has proven not to be the case, particularly for non-Romanized languages such as Arabic or Chinese; in these situations, documents appear as unintelligible byte streams. In addition, some of the data sources contain documents in a mix of languages, so knowledge of the source does not necessarily determine the language. This is a classical categorization problem with a search space of N*M, where N is the number of languages to be recognized and M the number of code set representations. The categories are determined by a combination of n-graph measurements using the Acquaintance algorithm (Huffman 1996), with simple heuristics whittling down the search space.
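The language/code-set identification step can be illustrated with character n-graph frequency profiles: build a profile per category, then pick the category whose profile best matches the document. The sketch below is a toy version in that spirit; it is not the Acquaintance algorithm itself, and the two training profiles are invented for illustration (a real deployment would keep one profile per language/code-set pair).

```python
from collections import Counter

def ngraphs(text, n=3):
    """Character n-graph frequency profile of a text."""
    text = f"  {text.lower()}  "  # pad so word edges produce n-graphs
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def overlap(doc_profile, lang_profile):
    # Shared n-graph mass; higher means the profiles are more similar.
    return sum(min(doc_profile[g], lang_profile[g]) for g in doc_profile)

# Toy per-category profiles standing in for trained ones.
profiles = {
    "english": ngraphs("the quick brown fox jumps over the lazy dog the and of"),
    "spanish": ngraphs("el rápido zorro marrón salta sobre el perro perezoso el y de"),
}

def identify(text):
    doc = ngraphs(text)
    return max(profiles, key=lambda cat: overlap(doc, profiles[cat]))

print(identify("the dog and the fox"))  # english
print(identify("el perro y el zorro"))  # spanish
```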
    Once the code set has been determined, it is converted into a standard representation. This process is not without information loss, so spelling corrections are applied. The most straightforward spelling correction involves the reinsertion of diacritical markers where they are missing. This is treated as a word-sense disambiguation problem (Yarowsky 1994) and relies on both language spelling rules and trained probabilities of word occurrences. Here, the solution is a hybrid system in which hand-coded rules are combined with statistical measures of likely word occurrences.

    In addition, a specialized tagging operation occurs: temporal resolution. While dates such as 09 September 2000 are relatively unambiguous, many time references found in natural language are not, for instance "last Tuesday". To get the time sequencing of events across multiple stories correct, it is necessary to resolve this potentially wide range of time references accurately. In this case, the resolution algorithm also combines basic linguistic knowledge with rules learned from corpora. Similarly, place names are often only partially specified; for example, there are a great many places in South America named La Esperanza. We are currently developing a module that applies a mix of hand-written rules and machine learning to metadata and contextual clues drawn from a large corpus to disambiguate place names.

    This range of tagging procedures reflects a strong shift in natural language processing research over the past fifteen years towards "corpus-based" methods. Such work begins with the manual annotation of a corpus, a set of naturally occurring linguistic artifacts, in which some level of linguistic analysis (word segmentation, part-of-speech, semantic referent, syntactic phrase, etc.) is associated with the relevant portion of text. The resulting data provides a rich basis for empirically driven research and development, as well as for formal evaluations of systems attempting to re-create this analysis automatically. The availability of such corpora has spurred significant interest in machine learning and statistical methods in natural language processing research, of which the techniques mentioned above are just a few. One of the benefits of the rule-sequence model adopted in MiTAP's Alembic component is its support for easily and effectively combining automatically derived heuristics with manually developed ones. This was a key element in successfully adapting the Alembic NLP system for MiTAP in the absence of any significant annotated corpus.
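The diacritic-reinsertion step described in this entry can be pictured as choosing, for each stripped word, the most probable diacritized form. The sketch below is a crude stand-in: the counts and the candidate table are invented for illustration, taking the place of trained word-occurrence probabilities and hand-coded spelling rules.

```python
from collections import Counter

# Invented corpus counts standing in for trained word-occurrence
# probabilities; the real system combines these with spelling rules.
corpus_counts = Counter({"año": 50, "ano": 2, "papa": 10, "papá": 30})

# Hand-built candidate table: stripped form -> plausible diacritized forms.
candidates = {"ano": ["ano", "año"], "papa": ["papa", "papá"]}

def restore(word):
    """Pick the most frequent candidate form, a crude stand-in for the
    word-sense-disambiguation treatment described above."""
    forms = candidates.get(word, [word])
    return max(forms, key=lambda f: corpus_counts[f])

print(restore("ano"))   # año
print(restore("papa"))  # papá
```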