
    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
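As a concrete illustration of the fuzzy-matching stage, the sketch below scores a source sentence against translation-memory entries with a normalized token-level edit distance; the metric and the toy TM entries are assumptions for illustration, not the matching metrics developed in SCATE.

```python
# A toy fuzzy-match scorer for a translation memory (TM): similarity is
# 1 minus the normalized token-level edit distance. The metric and the
# TM entries are illustrative only.

def edit_distance(a, b):
    """Token-level Levenshtein distance (dynamic programming, one row)."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ta
                                     dp[j - 1] + 1,      # insert tb
                                     prev + (ta != tb))  # substitute
    return dp[-1]

def fuzzy_score(src, tm_src):
    return 1.0 - edit_distance(src, tm_src) / max(len(src), len(tm_src), 1)

tm = [("the contract ends in March", "het contract eindigt in maart"),
      ("the invoice is due in May", "de factuur vervalt in mei")]
query = "the contract ends in May".split()
best = max(tm, key=lambda entry: fuzzy_score(query, entry[0].split()))
print(best)  # the closest TM segment and its stored translation
```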

    A Survey of Multimodal Information Fusion for Smart Healthcare: Mapping the Journey from Data to Wisdom

    Multimodal medical data fusion has emerged as a transformative approach in smart healthcare, enabling a comprehensive understanding of patient health and personalized treatment plans. In this paper, a journey from data to information to knowledge to wisdom (DIKW) is explored through multimodal fusion for smart healthcare. We present a comprehensive review of multimodal medical data fusion focused on the integration of various data modalities. The review explores different approaches, such as feature selection, rule-based systems, machine learning, deep learning, and natural language processing, for fusing and analyzing multimodal data. This paper also highlights the challenges associated with multimodal fusion in healthcare. By synthesizing the reviewed frameworks and theories, it proposes a generic framework for multimodal medical data fusion that aligns with the DIKW model. Moreover, it discusses future directions related to the four pillars of healthcare: predictive, preventive, personalized, and participatory approaches. The components of the comprehensive survey presented in this paper form the foundation for more successful implementation of multimodal fusion in smart healthcare. Our findings can guide researchers and practitioners in leveraging the power of multimodal fusion with state-of-the-art approaches to revolutionize healthcare and improve patient outcomes.
    Comment: This work has been submitted to Elsevier for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
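To make the feature-level branch of these fusion strategies concrete, here is a minimal sketch of early fusion, in which per-modality feature vectors are concatenated before a single classifier is trained; the synthetic data, dimensions, and model choice are illustrative assumptions, not taken from the survey.

```python
# A minimal sketch of feature-level (early) fusion: per-modality feature
# vectors are concatenated and a single classifier is trained on top.
# Data here is synthetic; dimensions and model are arbitrary choices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
imaging = rng.normal(size=(n, 32))   # e.g. image-derived features
signals = rng.normal(size=(n, 8))    # e.g. vital-sign summaries
notes = rng.normal(size=(n, 64))     # e.g. clinical-text embeddings
labels = rng.integers(0, 2, size=n)  # synthetic outcome labels

fused = np.concatenate([imaging, signals, notes], axis=1)  # early fusion
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```

Late fusion, by contrast, would train one model per modality and combine their predictions; the concatenation step above is what distinguishes the early variant.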

    Identity Management Framework for Internet of Things


    Intrusion Detection from Heterogeneous Sensors

    Nowadays, protecting computer systems and networks against various distributed, multi-step attacks is a vital challenge for their owners. One of the essential threats to the security of such computer infrastructures is attacks by malicious individuals, from inside or outside the system environment, who aim to abuse available services or reveal confidential information. Consequently, managing and supervising computer systems is a considerable challenge, as new threats and attacks are discovered on a daily basis. Intrusion Detection Systems (IDSs) play a key role in the surveillance and monitoring of computer network infrastructures. These systems inspect events occurring in computer systems and networks and, in case of malicious behavior, generate alerts describing the attacks' details. However, a number of shortcomings need to be addressed to make them reliable enough for real-world situations. One of the fundamental challenges of real-world IDSs is the large number of redundant, non-relevant, and false-positive alerts they generate, making it difficult for security administrators to determine and identify the alerts that are really important. Part of the problem is that most IDSs do not take contextual information (type of systems, applications, users, networks, etc.) into account; as a result, a large portion of the alerts are non-relevant in the sense that, even though an intrusion is correctly recognized, the intrusion fails to reach its objectives. Additionally, relying on only one type of detection sensor is not adequate for detecting newer, more complicated attacks, and as a result many current IDSs are unable to detect them. This is especially important with respect to targeted attacks that try to avoid detection by conventional IDSs and other security products.
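To illustrate why contextual information matters here, the toy sketch below keeps an alert only if the target host is actually vulnerable to the exploited weakness; the inventory layout, alert fields, and CVE are invented for illustration, whereas the thesis expresses such checks as ontology rules rather than Python code.

```python
# A toy context-aware filter: an alert is relevant only if the target
# host is actually vulnerable to the exploited weakness. Inventory and
# alert fields are hypothetical.
inventory = {
    "10.0.0.5": {"os": "linux", "cves": {"CVE-2021-44228"}},
    "10.0.0.7": {"os": "windows", "cves": set()},
}
alerts = [
    {"dst": "10.0.0.5", "cve": "CVE-2021-44228", "sig": "log4shell attempt"},
    {"dst": "10.0.0.7", "cve": "CVE-2021-44228", "sig": "log4shell attempt"},
]

def is_relevant(alert):
    host = inventory.get(alert["dst"])
    return host is not None and alert["cve"] in host["cves"]

for a in alerts:
    verdict = "keep" if is_relevant(a) else "filter out (host not vulnerable)"
    print(a["sig"], a["dst"], "->", verdict)
```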
While many system administrators successfully incorporate context information and many different types of sensors and logs into their analysis, an important problem with this approach is the lack of automation in both storage and analysis. To address these problems in IDS applicability, various IDS types have been proposed in recent years, and commercial off-the-shelf (COTS) IDS products have found their way into the Security Operations Centers (SOCs) of many large organizations. From a general perspective, these works can be categorized into: machine-learning-based approaches (Bayesian networks, data mining methods, decision trees, neural networks, etc.), alert correlation and alert fusion approaches, context-aware intrusion detection systems, distributed intrusion detection systems, and ontology-based intrusion detection systems. To the best of our knowledge, since these works each focus on only one or a few of the IDS challenges, the problem as a whole has not been resolved; there is no comprehensive work addressing all the mentioned challenges of modern intrusion detection systems. For example, works that use machine learning only classify events based on features of the behavior observed for one type of event, and they do not take into account contextual information and event interrelationships. Most of the proposed alert correlation techniques consider correlation only across multiple sensors of the same type that share a common event and alert semantics (homogeneous correlation), leaving it to security administrators to perform correlation across heterogeneous types of sensors. Context-aware approaches employ only limited aspects of the underlying context. The lack of accurate evaluation based on data sets that encompass modern, complex attack scenarios is another major shortcoming of most of the proposed approaches. The goal of this thesis is to design an event correlation system that can correlate across several heterogeneous types of sensors and logs (e.g. IDS/IPS, firewall, database, operating system, anti-virus, web proxy, routers, etc.), in the hope of detecting complex attacks that leave traces in various systems, and to incorporate context information into the analysis in order to reduce false positives. To this end, our contributions can be split into four main parts: 1) We propose Pasargadae, a comprehensive context-aware, ontology-based event correlation framework that automatically performs event correlation by reasoning on information collected from various information resources. Pasargadae uses ontologies to represent and store information on events, context, vulnerabilities, and attack scenarios, and uses simple ontology logic rules written in the Semantic Query-Enhanced Web Rule Language (SQWRL) to correlate information and filter out non-relevant, duplicate, and false-positive alerts. 2) We propose a meta-event-based, topological-sort-based, semantic event correlation approach that employs Pasargadae to correlate events collected from several sensors distributed in a computer network. 3) We propose a semantic, context-aware alert fusion approach that relies on some of the subcomponents of Pasargadae to fuse heterogeneous alerts collected from heterogeneous IDSs. 4) To show the level of flexibility of Pasargadae, we use it to implement some other proposed alert and event correlation approaches.
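As a minimal illustration of the topological-sort ingredient of contribution 2, the sketch below orders hypothetical attack steps by their causal prerequisites using Python's standard library; the step graph is invented and is not taken from the thesis.

```python
# Ordering correlated meta-events by causal prerequisites with a
# topological sort (graphlib is in the Python 3.9+ standard library).
# The attack-step graph is invented for illustration.
from graphlib import TopologicalSorter

# each step maps to the set of steps that must precede it
steps = {
    "port_scan": set(),
    "exploit": {"port_scan"},
    "privilege_escalation": {"exploit"},
    "data_exfiltration": {"privilege_escalation"},
}
order = list(TopologicalSorter(steps).static_order())
print(order)
# ['port_scan', 'exploit', 'privilege_escalation', 'data_exfiltration']
```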
The sum of these contributions represents a significant improvement in the applicability and reliability of IDSs in real-world situations. To test the performance and flexibility of the proposed event correlation approach, we need to address the lack of experimental infrastructure suitable for network security. A study of the literature shows that current experimental approaches are not appropriate for generating high-fidelity network data. Consequently, to accomplish a comprehensive evaluation, we first conduct our experiments on two separate case-study scenarios, inspired by the DARPA 2000 and UNB ISCX IDS evaluation data sets. Next, as a complete field study, we deploy Pasargadae in a real computer network for a two-week period to inspect its detection capabilities on ground-truth network traffic. The results obtained show that, compared to other existing IDS improvements, the proposed contributions significantly improve IDS performance (detection rate) while reducing false-positive, non-relevant, and duplicate alerts.

    A soft computing decision support framework for e-learning

    Thesis by compendium of publications. Supported by technological development and its impact on everyday activities, e-Learning and b-Learning (Blended Learning) have experienced rapid growth, mainly in higher education and training. Their inherent ability to break both physical and cultural distances, to disseminate knowledge, and to decrease the costs of the teaching-learning process allows them to reach anywhere and anyone. The educational community is divided as to their role in the future. It is believed that by 2019 half of the world's higher education courses will be delivered through e-Learning. While supporters say that this will be the educational mode of the future, its detractors point out that it is a fashion, that there are huge dropout rates, and that its massification and potentially low quality will cause its fall, leaving it the role of accompanying traditional education. There are, however, two interrelated features where there seems to be consensus. On the one hand, there is the enormous amount of information and evidence that Learning Management Systems (LMS) generate during the e-Learning process, which is the basis of the part of the process that can be automated. On the other hand, there is the fundamental role of e-tutors and e-trainers, who are the guarantors of educational quality. They are continually overwhelmed by the need to provide timely and effective feedback to students, to manage the endless particular situations that require decision making, and to process stored information. In this sense, the tools that e-Learning platforms currently provide for obtaining reports and a certain level of follow-up are neither sufficient nor adequate. It is at this point of convergence between information and trainer that current LMS development is centered, and it is here that this thesis aims to innovate. This research proposes and develops a platform focused on decision support in e-Learning environments. Using soft computing and data mining techniques, it extracts knowledge from the data produced and stored by e-Learning systems, allowing the classification, analysis, and generalization of the extracted knowledge. It includes tools to identify models of students' learning behavior and, from them, to predict their future performance and enable trainers to provide adequate feedback. Likewise, students can self-assess, avoid ineffective behavior patterns, and obtain real clues about how to improve their performance in the course, through appropriate routes and strategies based on the behavioral model of successful students. The methodological basis of these functionalities is Fuzzy Inductive Reasoning (FIR), which is particularly useful for modeling dynamic systems. During the research, the FIR methodology has been improved and empowered by the inclusion of several algorithms. First, an algorithm called CR-FIR, which determines the Causal Relevance of the variables involved in modeling students' learning and assessment. In this thesis, CR-FIR has been tested on a comprehensive set of classical test data, as well as on real data sets belonging to different areas of knowledge. Second, the detection of atypical behaviors in virtual campuses was approached using Generative Topographic Mapping (GTM), a probabilistic alternative to the well-known Self-Organizing Maps. GTM was used simultaneously for clustering, visualization, and detection of atypical data.
The core of the platform has been the development of an algorithm for extracting linguistic rules, in a language understandable to educational experts, that helps them obtain patterns of students' learning behavior. To achieve this functionality, the LR-FIR algorithm (Extraction of Linguistic Rules in FIR) was designed and developed as an extension of FIR that both characterizes general behavior and identifies interesting patterns. The results obtained from applying the platform to several real e-Learning courses demonstrate its feasibility and originality. The teachers' perception of the usability of the tool is very good, and they consider that it could be a valuable resource for mitigating the trainer time requirements that e-Learning courses impose. The identification of student behavior models and the prediction processes have been validated as useful by expert trainers. LR-FIR has been applied and evaluated on a wide set of real problems, not all of them in the educational field, obtaining good results. The structure of the platform suggests that its use is potentially valuable in those domains where knowledge management plays a preponderant role, or where decision-making processes are a key element, e.g. e-business, e-marketing, and customer management, to mention just a few. The soft computing tools used and developed in this research (FIR, CR-FIR, LR-FIR, and GTM) have been applied successfully in other real domains, such as music, medicine, and weather behavior.
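To convey the flavor of FIR-style modeling, the sketch below fuzzifies numeric activity features into linguistic classes and predicts a new student's outcome from the closest stored class pattern; the bin edges, features, and 1-NN simplification are assumptions, not the FIR, CR-FIR, or LR-FIR algorithms themselves.

```python
# A toy, FIR-flavored predictor: numeric features are discretized into
# linguistic classes, and a new pattern's outcome is taken from the
# closest stored class pattern (a 1-NN simplification of FIR's
# k-nearest-neighbour forecast). Bin edges and features are invented.
import numpy as np

CLASS_EDGES = [0.33, 0.66]  # three classes: low / medium / high

def fuzzify(value, edges=CLASS_EDGES):
    """Map a numeric value in [0, 1] to a discrete class index."""
    return int(np.digitize(value, edges))

# (normalized activity, normalized quiz score) -> course outcome
history = [([0.10, 0.20], "fail"), ([0.80, 0.70], "pass"),
           ([0.90, 0.80], "pass"), ([0.20, 0.30], "fail")]

def predict(features):
    query = [fuzzify(v) for v in features]
    best = min(history, key=lambda rec: sum(
        abs(fuzzify(v) - q) for v, q in zip(rec[0], query)))
    return best[1]

print(predict([0.85, 0.75]))  # -> 'pass'
```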

    A Data-Driven Approach to Identify Journalistic 5Ws from Text Documents

    Textual understanding is the process of automatically extracting accurate, high-quality information from text. The amount of textual data available from different sources such as news, blogs, and social media is growing exponentially. These data encode significant latent information which, if extracted accurately, can be valuable in a variety of applications such as medical report analysis, news understanding, and societal studies. Natural language processing techniques are often employed to develop customized algorithms that extract such latent information from text. The journalistic 5Ws refer to the basic information in news articles that describes an event: where, when, who, what, and why. Extracting them accurately may facilitate a better understanding of many social processes, including social unrest, human rights violations, propaganda spread, and population migration. Furthermore, the 5W information can be combined with socio-economic and demographic data to analyze the state and trajectory of these processes. In this thesis, a data-driven pipeline has been developed to extract the 5Ws from text using syntactic and semantic cues. First, a classifier is developed to identify articles specifically related to social unrest; it has been trained on a dataset of over 80K news articles. We then use NLP algorithms to generate a set of candidates for the 5Ws. Next, a series of algorithms to extract the 5Ws is developed. These heuristic algorithms leverage specific words and parts of speech, customized for the individual Ws, to compute their scores. The heuristics are based on the syntactic structure of the document as well as syntactic and semantic representations of individual words and sentences. The scores are then combined and ranked to obtain the best answers to the journalistic 5Ws. The classification accuracy of the algorithms is validated using a manually annotated dataset of news articles.
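As a rough illustration of the candidate-generation step, the sketch below derives who/what/where/when candidates from off-the-shelf named-entity recognition and dependency parses; spaCy, the model name, the slot mapping, and the example sentence are assumptions, and the thesis' heuristic scoring is not reproduced.

```python
# A rough sketch of 5W candidate generation with off-the-shelf NLP.
# Assumes spaCy and the en_core_web_sm model are installed; the slot
# mapping and example sentence are illustrative, not the thesis method.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Protesters gathered in Khartoum on Thursday to oppose fuel subsidy cuts.")

NER_SLOTS = {"PERSON": "who", "ORG": "who", "NORP": "who",
             "GPE": "where", "LOC": "where", "FAC": "where",
             "DATE": "when", "TIME": "when"}

candidates = {"who": [], "what": [], "where": [], "when": [], "why": []}
for ent in doc.ents:                       # named entities -> who/where/when
    slot = NER_SLOTS.get(ent.label_)
    if slot:
        candidates[slot].append(ent.text)
for tok in doc:                            # dependency cues -> who/what
    if tok.dep_ == "nsubj":
        candidates["who"].append(tok.text)
    elif tok.dep_ == "ROOT":
        candidates["what"].append(tok.lemma_)
# 'why' typically needs causal/purpose clause detection, omitted here.

print(candidates)  # each W's candidates would then be scored and ranked
```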