A hierarchical taxonomy for classifying hardness of inference tasks
Exhibiting inferential capabilities is one of the major goals of many modern Natural Language Processing systems. However, while attempts have been made to define what textual inferences are, few seek to classify inference phenomena by difficulty. In this paper we propose a hierarchical taxonomy for inferences, relative to their hardness, with corpus annotation and system design and evaluation in mind. Indeed, a fine-grained assessment of the difficulty of a task allows us to design more appropriate systems and to evaluate them only on what they are designed to handle. Each of seven classes is described and provided with examples from different tasks such as question answering, textual entailment and coreference resolution. We then test the classes of our hierarchy on the specific task of question answering. Our annotation of the test data from the QA4MRE 2013 evaluation campaign reveals that it is possible to quantify the contrasts in types of difficulty on datasets of the same task.
Automatic document classification of biological literature
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.
Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
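The Results paragraph above compares clustering quality by F-value (0.55 vs. 0.49). As a reminder of what such a score measures, here is a minimal sketch of the F-measure computed from precision and recall. It assumes the reported F-value is the balanced F1 score, which the abstract does not state; the function name `f_measure` and the input values are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract only says "F-value"). beta > 1 weights recall more heavily.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Illustrative inputs only; these are not figures from the paper.
    print(round(f_measure(0.6, 0.5), 4))
```

Because the score is a harmonic mean, it stays low unless precision and recall are both reasonably high, which is why it is a common single-number summary for clustering and classification quality.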
An approach to knowledge assessment in an Intelligent Tutoring System
In this paper, we present an approach to student evaluation in a well-defined domain based on a semantic network. A similarity matrix based on the semantic memory structure of humans is used to build a semantic distance model in order to describe an assessment technique to evaluate the student's state of knowledge. Our aim is to facilitate a deeper conceptual understanding of domain principles. We are developing a new student model including an assessment module with the DistSem model. Workshop de Tecnología Informática Aplicada en Educación (WTIAE). Red de Universidades con Carreras en Informática (RedUNCI).
Reasoning about river basins: WaWO+ revisited
This paper characterizes part of an interdisciplinary research effort on Artificial Intelligence (AI) techniques and tools applied to Environmental Decision-Support Systems (EDSS). WaWO+, the ontology we present here, provides a set of concepts that are queried, advertised and used to support reasoning about and the management of urban water resources in complex scenarios such as a river basin. The goal of this research is to increase efficiency in data and knowledge interoperability and in data integration among heterogeneous environmental data sources (e.g., software agents), using an explicit, machine-understandable ontology to facilitate urban water resources management within a river basin.
Peer reviewed. Postprint (author's final draft). This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/
Automated ontology framework for service robots
This paper presents an automated ontology framework for service robots. The framework is designed to automatically create an ontology and an instance of a concept in a dynamic environment. Ontology learning from text is applied to build a concept hierarchy using WordNet, which provides rich semantic processing for physical objects. The Automated Ontology is composed of four modules: Concept Creation, Property Creation, Relationship Creation and Instance of Concept Creation. The automated ontology algorithm was implemented in order to create the concept hierarchy in the Robot Ontology. The Semantic Knowledge Acquisition represents knowledge of physical objects in dynamic environments. In simulation experiments, the list of object names and property names was identified. The result shows the concept hierarchy, which represents explicit terms and the semantic knowledge of physical objects for performing everyday manipulation tasks.
Part grouping for efficient process planning
A framework to provide automated part grouping has been investigated in order
to overcome the limitations found in existing part grouping techniques. The work is
targeted at: exploration of criteria for feature-based part grouping to make the process
planning activity efficient; determination of the optimal number of part families in the part grouping
process; development of an experimental hybrid process planning system (HYCAPP); investigation of the effects of improved part grouping on manufacturing cell
design.
The research work has explored the creation of a feature-based component data
model and manufacturing system capability data model, and checked the limitations
inherent in existing part grouping techniques i.e. part grouping: around methods; based
on part geometry; based on machining processes; and based on machines. [Continues.]
Meta-Generalization for Multiparty Privacy Learning to Identify Anomaly Multimedia Traffic in Graynet
Identifying anomaly multimedia traffic in cyberspace is a big challenge in
distributed service systems, multiple generation networks and future internet
of everything. This letter explores meta-generalization for a multiparty
privacy learning model in graynet to improve the performance of anomaly
multimedia traffic identification. The multiparty privacy learning model in
graynet is a globally shared model that is partitioned, distributed and trained
by exchanging multiparty parameter updates while preserving private data. The
meta-generalization refers to discovering the inherent attributes of a learning
model to reduce its generalization error. In experiments, three
meta-generalization principles are tested as follows. The generalization error
of the multiparty privacy learning model in graynet is reduced by changing the
dimension of byte-level embedding. Following that, the error is reduced by
adapting the depth for extracting packet-level features. Finally, the error is
reduced by adjusting the size of support set for preprocessing traffic-level
data. Experimental results demonstrate that the proposal outperforms the
state-of-the-art learning models for identifying anomaly multimedia traffic.
A Survey of Imbalanced Learning on Graphs: Problems, Techniques, and Future Directions
Graphs represent interconnected structures prevalent in a myriad of
real-world scenarios. Effective graph analytics, such as graph learning
methods, enables users to gain profound insights from graph data, underpinning
various tasks including node classification and link prediction. However, these
methods often suffer from data imbalance, a common issue in graph data where
certain segments possess abundant data while others are scarce, thereby leading
to biased learning outcomes. This necessitates the emerging field of imbalanced
learning on graphs, which aims to correct these data distribution skews for
more accurate and representative learning outcomes. In this survey, we embark
on a comprehensive review of the literature on imbalanced learning on graphs.
We begin by providing a definitive understanding of the concept and related
terminologies, establishing a strong foundational understanding for readers.
Following this, we propose two comprehensive taxonomies: (1) the problem
taxonomy, which describes the forms of imbalance we consider, the associated
tasks, and potential solutions; (2) the technique taxonomy, which details key
strategies for addressing these imbalances, and aids readers in their method
selection process. Finally, we suggest prospective future directions for both
problems and techniques within the sphere of imbalanced learning on graphs,
fostering further innovation in this critical area.
Comment: A collection of literature on imbalanced learning on graphs is available at https://github.com/Xtra-Computing/Awesome-Literature-ILoG
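To make the data imbalance described in the abstract above concrete, the following is a minimal sketch of one widely used remedy for class-imbalanced node classification: weighting each class inversely to its frequency so that rare classes contribute as much to the loss as common ones. This is an illustration under stated assumptions, not a method attributed to the survey; the function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return one weight per class: n_samples / (n_classes * class_count).

    With this scheme a perfectly balanced label set gets weight 1.0 for
    every class, and rarer classes get proportionally larger weights.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Toy node labels (hypothetical): 8 nodes of a majority class, 2 of a minority class.
    node_labels = ["majority"] * 8 + ["minority"] * 2
    print(inverse_frequency_weights(node_labels))
    # → {'majority': 0.625, 'minority': 2.5}
```

Reweighting only addresses imbalance in the number of labeled nodes per class; it does not account for imbalance in graph structure, such as skewed node degrees.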