9 research outputs found

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, current search engines, built around the standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for specific data "entities" (e.g., a phone number, a paper's PDF, a date), today's engines take us to them only indirectly, through pages. My Ph.D. study focuses on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval, and examines the essential aspects of supporting it. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking; we also report a prototype system built to show the initial promise of the proposal. We then distill and abstract the essential computation requirements of entity search: from the dual views of reasoning, entity as input and entity as output, we propose a dual-inversion framework with two indexing and partitioning schemes for efficient and scalable query processing. Further, to recognize more entity instances, we study entity synonym discovery by mining query log data. The results obtained so far show the clear promise of entity-aware search in its usefulness, effectiveness, efficiency and scalability.
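
    The abstract names the Impression Model and the EntityRank framework but gives no formulas; the Python sketch below is only a hypothetical illustration of the general idea of combining local evidence (keyword-entity proximity within a page) with global evidence (support across pages). The function names and toy data are invented for illustration, not taken from the dissertation.

```python
import math
from collections import defaultdict

def proximity_score(keyword_pos, entity_pos, width=20):
    """Local evidence: an entity mentioned close to the query keywords inside
    a page is more likely to be the answer (a toy proxy for the impression a
    reader would form while scanning the page)."""
    return max(0.0, 1.0 - abs(keyword_pos - entity_pos) / width)

def rank_entities(pages, query_keyword):
    """Aggregate local proximity scores per entity and damp them by the log of
    the number of supporting pages (a crude stand-in for the global signal)."""
    local = defaultdict(float)
    support = defaultdict(int)
    for page in pages:
        kw_positions = [i for i, tok in enumerate(page["tokens"]) if tok == query_keyword]
        for entity, pos in page["entities"]:          # (entity string, token offset)
            best = max((proximity_score(k, pos) for k in kw_positions), default=0.0)
            if best > 0:
                local[entity] += best
                support[entity] += 1
    return sorted(((local[e] * math.log(1 + support[e]), e) for e in local), reverse=True)

# Toy collection: two pages mentioning the same phone number near the keyword.
pages = [
    {"tokens": ["acme", "customer", "service", "phone"], "entities": [("800-555-0100", 4)]},
    {"tokens": ["call", "acme", "phone", "number"], "entities": [("800-555-0100", 4)]},
]
print(rank_entities(pages, "acme"))
```

    A real entity ranker would also model extraction confidence and discount near-duplicate pages; the log-damped page count here merely stands in for cross-page evidence.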

    From Search Engines to Augmented Search Services: An End-User Development Approach

    The World Wide Web is a vast and continuously changing source of information where searching is a frequent, and sometimes critical, user task. Searching is not always the user's primary goal; it is often an ancillary task performed to find complementary information that allows another task to be completed. In this paper, we explore primary and ancillary search tasks and propose an approach for simplifying user interaction during both. Rather than focusing on dedicated search engines, our approach lets users abstract the search engines already provided by Web applications into pervasive search services that become available for performing searches from any other Web site. We also let users manage the way search results are presented and some of the possible interactions with them. To illustrate the feasibility of this approach, we have built a support tool based on a plug-in architecture that allows users to integrate new search services (created by themselves by means of visual tools) and execute them in the context of both kinds of searches. A case study illustrates the use of the tool, and we present the results of two evaluations that demonstrate the feasibility of the approach and the benefits of its use.
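
    The paper's visual tools are not reproduced here; purely as a hypothetical sketch of the underlying idea, namely capturing a site's existing search form as a reusable, pervasive search service described by a URL template, one might model a service as follows (the class, fields and example selector are assumptions, not the paper's API).

```python
from dataclasses import dataclass
from urllib.parse import quote_plus

@dataclass
class SearchService:
    """A site-specific search engine abstracted into a reusable service.
    The URL template is what an end user would capture from the site's own
    search form; "{query}" marks where the search terms are substituted."""
    name: str
    url_template: str
    result_selector: str      # CSS selector used when presenting results

    def build_url(self, query: str) -> str:
        return self.url_template.replace("{query}", quote_plus(query))

# Registering a service once and invoking it from the context of any other page:
wiki = SearchService(
    name="Wikipedia",
    url_template="https://en.wikipedia.org/w/index.php?search={query}",
    result_selector="li.mw-search-result",
)
print(wiki.build_url("end-user development"))
```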

    Modelos y algoritmos de búsqueda + redes sociales para aplicaciones verticales de recuperación de información [Search models and algorithms + social networks for vertical information-retrieval applications]

    The Web is not only an enormous repository of information of every kind; it is also a platform that supports global services of a diverse nature. The exponential growth of content and users (for example, in social networks), together with the constant appearance of new applications, far exceeds the view of the Web as a mere content repository. In every case, the common denominator is the need to perform "searches" of different kinds and with equally diverse goals. Today, social networks are among the most popular applications and have even changed the way users connect, relate, interact and exchange information. Implicitly, they generate social structures with emergent properties arising from global behavior, and these are expected to help improve search processes. This document presents a new research project that proposes to address some of the problems related to searching on the Internet. To that end, information retrieval and search engine construction techniques will be integrated with information drawn from social networks to make the search task more efficient, covering multiple scenarios such as specific portions of the Web, scientific and/or geographic information, and searches on mobile devices, among others. Track: Databases and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI).
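
    The project does not specify how social signals would be combined with retrieval scores; as a loose, hypothetical sketch under that assumption, a re-ranking step could blend a content relevance score with a social affinity signal. Both inputs and the blending weight below are illustrative, not the project's model.

```python
def rerank(results, social_affinity, alpha=0.7):
    """Blend a standard retrieval score with a social signal.
    `results` maps document id -> IR score (e.g., BM25);
    `social_affinity` maps document id -> strength of the ties between the
    searching user and the users who authored or shared the document."""
    def blended(doc):
        return alpha * results[doc] + (1 - alpha) * social_affinity.get(doc, 0.0)
    return sorted(results, key=blended, reverse=True)

# A socially endorsed document can overtake a slightly better-matching one.
print(rerank({"d1": 2.0, "d2": 1.9}, {"d2": 0.9}))
```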

    Mejoras algorítmicas y estructuras de datos para búsquedas altamente eficientes [Algorithmic improvements and data structures for highly efficient search]

    Searching the Internet poses constant challenges. Data is increasingly rich and complex, is used and changes in real time, and adds new value, but only if it is available in a timely fashion. Users rely more and more on search engines to satisfy their informational, navigational and transactional needs, which requires the engines to answer thousands of queries per second. To handle the size of a document collection crawled from the Web, search engines use distributed data structures to make search efficient and caching techniques to optimize response times. This project proposes to design and evaluate advanced data structures, together with new algorithmic techniques, that improve search performance over Web-scale data collections. Track: Distributed and Parallel Processing. Red de Universidades con Carreras en Informática (RedUNCI).
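
    As a minimal illustration of the caching technique mentioned above (not the project's own design), an LRU cache of query results placed in front of the index lookup might look like this; the capacity and the backing `search_index` callable are assumptions.

```python
from collections import OrderedDict

class ResultCache:
    """Toy LRU cache of query results, the kind of structure a search engine
    places in front of its distributed inverted index to cut response times."""
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, query, search_index):
        if query in self.entries:
            self.entries.move_to_end(query)       # cache hit: mark as recently used
            return self.entries[query]
        results = search_index(query)             # miss: query the index partitions
        self.entries[query] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict the least recently used entry
        return results

cache = ResultCache(capacity=2)
print(cache.get("web search", lambda q: [f"doc matching {q!r}"]))
```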

    The Moderator Role of Brand Awareness and Brand Loyalty on Consumers’ Online Impulse Buying Behavior

    In recent years, with consumers' widespread preference for shopping in Private Shopping Clubs (PSCs) on the internet, there has been a remarkable increase in impulse purchases, as the attractive opportunities and smart strategies of PSCs stimulate consumers' impulse buying behavior. Within the PSC framework, this article investigates the moderating effect of brand awareness and brand loyalty on the relationship between online impulse buying behavior and perceived low price, browsing behavior and time pressure. The study created and tested five hypotheses using data collected in Turkey. Results indicate that browsing behavior, time pressure and perceived low price do influence online impulse buying behavior. A hierarchical regression analysis was also used to examine the moderating role of brand awareness and brand loyalty on impulse buying behavior, and both variables were found to have a moderating role. The results provide substantial input for strategy development by internet retailers.
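
    The article's measurement model and data are not available here; the snippet below is only a generic illustration, on synthetic data with invented variable names, of how a moderating role is commonly tested with hierarchical regression, by adding an interaction term in a second step.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; names do not reflect the study's items.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "browsing": rng.normal(size=n),
    "time_pressure": rng.normal(size=n),
    "low_price": rng.normal(size=n),
    "brand_awareness": rng.normal(size=n),
})
df["impulse"] = (0.4 * df["browsing"] + 0.3 * df["time_pressure"]
                 + 0.2 * df["low_price"]
                 + 0.3 * df["browsing"] * df["brand_awareness"]
                 + rng.normal(size=n))

# Step 1: main effects only; Step 2: add the interaction (moderation) term.
step1 = smf.ols("impulse ~ browsing + time_pressure + low_price + brand_awareness", df).fit()
step2 = smf.ols("impulse ~ browsing * brand_awareness + time_pressure + low_price", df).fit()

# A significant interaction coefficient plus a rise in R^2 between the steps
# is the usual evidence for a moderating role.
print(step1.rsquared, step2.rsquared)
print(step2.params["browsing:brand_awareness"])
```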

    Was Suchmaschinen nicht können. Holistische Entitätssuche auf Web Daten [What search engines cannot do: holistic entity search on Web data]

    More than 50% of all Web queries are entity related: users search either for entities or for information about entities. Still, search engines do not accommodate entity-centric search very well. Building on the concept of the semiotic triangle from cognitive psychology, which models entity types in terms of intensions and extensions, we identify three types of queries for retrieving entities: type-based queries (searching for entities of a given type), prototype-based queries (searching for entities having certain properties), and instance-based queries (searching for entities similar to a given entity). For type-based queries we present a method that combines query expansion with a self-supervised vocabulary learning technique built on both structured and unstructured data; our approach achieves a good trade-off between precision and recall. For prototype-based queries we propose ProSWIP, a property-based system for retrieving entities from the Web. Since the number of properties given by users can be quite small, ProSWIP relies on direct questions and user feedback to expand that set into one that captures the user's intentions correctly; our experiments show that within a maximum of four questions the system achieves perfect precision on the selected entities. For instance-based queries, the first challenge is to establish a query form that disambiguates user intentions without putting too much cognitive pressure on the user. We propose a minimalistic instance-based query comprising an example entity and the intended entity type and, building on the concept of family resemblance, present a practical way to retrieve related entities directly from the Web. Our approach copes even with queries that have proven problematic for benchmark tasks such as related entity finding. Entity summarization, which provides information about a given entity, is another kind of entity-centric query. Google's Knowledge Graph is the state of the art for this task, but because it relies entirely on manually curated knowledge bases, it leaves out new and less well-known entities. We therefore propose a data-driven approach, and our experiments on real-world entities show the superiority of our method. We are confident that mastering these four query types enables holistic entity search on Web data for the next generation of search engines.
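
    The dissertation publishes no code in this abstract; as a loose, hypothetical illustration of scoring candidates by family resemblance (relatedness through overlapping features rather than any single defining property), consider the following toy, whose feature sets and weights are invented.

```python
def family_resemblance(candidate_features, example_features, weights=None):
    """Score a candidate entity by the weighted overlap of its features with
    those of the example entity; no single feature is required, many shared
    ones add up, which is the intuition behind family resemblance."""
    weights = weights or {}
    shared = candidate_features & example_features
    return sum(weights.get(f, 1.0) for f in shared)

example = {"type:camera", "brand:Canon", "sensor:full-frame", "mount:RF"}
candidates = {
    "Canon EOS R6": {"type:camera", "brand:Canon", "sensor:full-frame", "mount:RF"},
    "Canon PIXMA":  {"type:printer", "brand:Canon"},
}
ranked = sorted(candidates,
                key=lambda c: family_resemblance(candidates[c], example),
                reverse=True)
print(ranked)
```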

    Data quality issues in electronic health records for large-scale databases

    Data Quality (DQ) in Electronic Health Records (EHRs) plays a decisive role in improving the quality of healthcare services, and DQ issues in EHRs motivate the introduction of an adaptive framework for interoperability and standards in Large-Scale Database (LSDB) management systems. Traditional approaches struggle to satisfy consumers' needs for large-scale data communication, because data is often not captured directly into the Database Management System (DBMS) quickly enough to enable its subsequent uses, even though such data holds great value for every field the DBMS serves. EHR technology provides portfolio management systems that allow HealthCare Organisations (HCOs) to deliver a higher quality of care to their patients than is possible with paper-based records, and EHRs are in high demand as HCOs run their daily services over ever-growing datasets. Efficient EHR systems reduce data redundancy and application failures and make it easier to produce all necessary reports. However, one of the main challenges in developing efficient EHR systems is the inherent difficulty of coherently managing data from diverse heterogeneous sources: integrating such data into a global schema that satisfies users' needs is hard in practice, and managing EHR systems with an existing DBMS is complicated by incompatible and sometimes inconsistent data structures. As a result, no common methodological approach currently exists that effectively solves every data integration problem. These DQ challenges raise the need for an efficient way to integrate large EHRs from diverse heterogeneous sources. To handle and align a large dataset efficiently, a hybrid method that logically combines a Fuzzy-Ontology approach with a large-scale EHR analysis platform has shown improved accuracy. This study investigated the raised DQ issues and the interventions needed to overcome these barriers and challenges, including the provision of EHRs as they pertain to DQ, and combined features to search, extract, filter, clean and integrate data so that users can coherently create new, consistent data sets. The study designed a hybrid method based on Fuzzy-Ontology, performed mathematical simulations based on a Markov Chain probability model, and applied a similarity measurement based on the dynamic Hungarian algorithm, following the Design Science Research (DSR) methodology, with the aim of increasing the quality of service that HCOs deliver within adaptive frameworks.
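
    As a small, hypothetical illustration of the similarity-based alignment step built on the Hungarian algorithm, the sketch below matches field names from two invented EHR schemas with SciPy's linear_sum_assignment; the string-ratio similarity is only a stand-in for the thesis's Fuzzy-Ontology measure.

```python
from difflib import SequenceMatcher

import numpy as np
from scipy.optimize import linear_sum_assignment

def similarity(a: str, b: str) -> float:
    """Stand-in similarity between field names; a Fuzzy-Ontology measure
    would replace this simple string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

source_fields = ["PatientName", "DOB", "BloodPressure"]
target_fields = ["name_of_patient", "date_of_birth", "bp_systolic"]

# The Hungarian algorithm finds the one-to-one field alignment that maximises
# total similarity (equivalently, minimises total cost = -similarity).
cost = np.array([[-similarity(s, t) for t in target_fields] for s in source_fields])
rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    print(f"{source_fields[r]:>14} -> {target_fields[c]}  (similarity {-cost[r, c]:.2f})")
```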