945 research outputs found

    Learning facet-specific entity embeddings

    An entity embedding is a vector space representation of entities in which similar entities have similar representations. However, similarity is a multi-faceted notion; for example, a person may be similar to one group of people because they graduated from the same university and similar to another group through having the same nationality or playing the same sport. Our hypothesis in this thesis is that learning a single entity embedding is a sub-optimal way to faithfully capture these different facets of similarity. Therefore, this thesis aims to learn facet-specific entity embeddings that capture different facets of similarity, taking inspiration from the conceptual spaces framework, which is widely known in cognitive science. Conceptual spaces [48] are vector space models designed to represent entities of a given kind (e.g. movies), together with their associated properties (e.g. scary) and concepts (e.g. thrillers). As such, they are similar in spirit to the vector space models that have been proposed in natural language processing, but there are also notable differences. First, the dimensions of conceptual spaces, referred to as quality dimensions, are interpretable, as they correspond to semantically meaningful features. Second, conceptual spaces are organized into sets of semantic domains or facets (e.g. genre, language), which are formed by grouping the quality dimensions. Each facet is associated with its own low-dimensional vector space, which intuitively captures similarity with respect to the corresponding facet. For instance, the vector space for the budget facet would only capture whether two movies had similar budgets. From an application point of view, the fact that conceptual spaces are structured into facets is appealing because this allows us to model the different facets of similarity in a more flexible and cognitively more plausible way.
Based on this, we hypothesize that learning facet-specific entity embeddings that are similar in spirit to conceptual spaces will allow us to predict the properties and categories of entities more reliably than standard single-space representations. Learning data-driven conceptual spaces, especially in an unsupervised way, has received very limited attention to date. Therefore, in this thesis, we learn facet-specific entity embeddings that are similar in spirit to conceptual spaces. This includes learning quality dimensions and then grouping them into facets. In particular, we propose three unsupervised models to learn this type of vector space representation for a set of entities using their textual descriptions. In two of these models, we convert traditional vector space embeddings into facet-specific entity embeddings, using features that resemble quality dimensions. In these cases, we rely on an existing method to learn these features. In our first model, we structure the vector space representations implicitly into meaningful facets by identifying the quality dimensions in a two-level hierarchy: the first level corresponds to the facets, and the second level corresponds to the facet-specific features. In our second model, using the quality dimensions and pre-trained word embeddings, we decompose the vector space representations into low-dimensional facets in an incremental way. In both of these models, we depend on clustering algorithms to find facet-specific features. In contrast, our third model uses a mixture-of-experts formulation to find the features that describe each facet, and it simultaneously learns the facet-specific embeddings directly from the bag-of-words representation. We evaluate our models on several datasets, each of which contains a set of entities with their textual descriptions and a number of classification tasks, using a range of different classifiers.
The experimental results support our hypothesis that, by capturing different facets of similarity, facet-specific vector space representations improve a model’s ability to predict the categories and properties of entities.
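
The facet idea in the abstract above can be illustrated with a minimal Python sketch. It is not one of the thesis's three models: it assumes the grouping of dimensions into facets is already given by hand, whereas the thesis learns it from text. The facet names, the dimension indices, and the two toy movie vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_view(vector, dims):
    """Restrict a vector to the dimensions that belong to one facet."""
    return [vector[i] for i in dims]


def facet_similarities(u, v, facets):
    """Compute one similarity score per facet instead of a single global score."""
    return {
        name: cosine(facet_view(u, dims), facet_view(v, dims))
        for name, dims in facets.items()
    }


# Hypothetical toy setup: dimensions 0-1 describe genre, 2-3 describe budget.
facets = {"genre": [0, 1], "budget": [2, 3]}
movie_a = [0.9, 0.1, 0.8, 0.2]
movie_b = [0.1, 0.9, 0.8, 0.2]

sims = facet_similarities(movie_a, movie_b, facets)
overall = cosine(movie_a, movie_b)

print(sims)     # per-facet similarity scores
print(overall)  # single-space similarity for comparison
```

In this toy case the two movies are nearly identical on the budget facet (similarity close to 1.0) and dissimilar on the genre facet (roughly 0.22), while the single-space score (roughly 0.57) blurs the two together.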

    Fourteenth Biennial Status Report: März 2017 - February 2019


    Semantic Document Clustering for Crime Investigation

    Computers are increasingly used as tools to commit crimes such as unauthorized access (hacking), drug trafficking, and child pornography. The proliferation of crimes involving computers has created a demand for special forensic tools that allow investigators to look for evidence on a suspect’s computer by analyzing communications and data on the computer’s storage devices. Motivated by the forensic process at Sûreté du Québec (SQ), the Québec provincial police, we propose a new subject-based semantic document clustering model that allows an investigator to cluster documents stored on a suspect’s computer by grouping them into a set of overlapping clusters, each corresponding to a subject of interest initially defined by the investigator.

    Ethical Control of Unmanned Systems: lifesaving/lethal scenarios for naval operations

    Prepared for: Raytheon Missiles & Defense under NCRADA-NPS-19-0227. This research in Ethical Control of Unmanned Systems applies precepts of Network Optional Warfare (NOW) to develop a three-step Mission Execution Ontology (MEO) methodology for validating, simulating, and implementing mission orders for unmanned systems. First, mission orders are represented in ontologies that are understandable by humans and readable by machines. Next, the MEO is validated and tested for logical coherence using Semantic Web standards. The validated MEO is refined for implementation in simulation and visualization. This process is iterated until the MEO is ready for implementation. This methodology is applied to four Naval scenarios in order of increasing challenges that the operational environment and the adversary impose on the Human-Machine Team. The extent of challenge to Ethical Control in the scenarios is used to refine the MEO for the unmanned system. The research also considers Data-Centric Security and blockchain distributed ledger as enabling technologies for Ethical Control. Data-Centric Security is a combination of structured messaging, efficient compression, digital signature, and document encryption, in the correct order, for round-trip messaging. Blockchain distributed ledger has the potential to further add integrity measures for aggregated message sets, confirming receipt/response/sequencing without undetected message loss. When implemented, these technologies together form the end-to-end data security that ensures mutual trust and command authority in real-world operational environments, despite the potential presence of interfering network conditions, intermittent gaps, or potential opponent intercept. A coherent Ethical Control approach to command and control of unmanned systems is thus feasible.
Therefore, this research concludes that maintaining human control of unmanned systems at long ranges of time-duration and distance, in denied, degraded, and deceptive environments, is possible through well-defined mission orders and data security technologies. Finally, as the human role remains essential in Ethical Control of unmanned systems, this research recommends the development of an unmanned system qualification process for Naval operations, as well as additional research prioritized based on urgency and impact. Raytheon Missiles & Defense (RMD). Approved for public release; distribution is unlimited.