4,869 research outputs found

    Deriving query suggestions for site search

    Get PDF
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    Query SessionDetectionas aCascade

    Get PDF
    Abstract We propose a cascading method for query session detection in search engine logs (i.e., for finding consecutive queries a user submitted for the same information need). Our approach involves different detection steps that form a cascade in the sense that computationally costly features are applied only after cheapfeatures“failed.”Thiscascadeisdifferenttoprevioussessiondetectionapproaches most of which involve many features simultaneously. Our experiments show thecascadingmethodtosaveruntimecomparedtothestateoftheartwhile the detected sessions ’ accuracy is improved.

    Recognizing Topic Change in Search Sessions of Digital Libraries based on Thesaurus and Classification System

    Full text link
    Log analysis in Web search showed that user sessions often contain several different topics. This means sessions need to be segmented into parts which handle the same topic in order to give appropriate user support based on the topic, and not on a mixture of topics. Different methods have been proposed to segment a user session to different topics based on timeouts, lexical analysis, query similarity or external knowledge sources. In this paper, we study the problem in a digital library for the social sciences. We present a method based on a thesaurus and a classification system which are typical knowledge organization systems in digital libraries. Five experts evaluated our approach and rated it as good for the segmentation of search sessions into parts that treat the same topic

    Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

    Full text link
    Sponsored search represents a major source of revenue for web search engines. This popular advertising model brings a unique possibility for advertisers to target users' immediate intent communicated through a search query, usually by displaying their ads alongside organic search results for queries deemed relevant to their products or services. However, due to a large number of unique queries it is challenging for advertisers to identify all such relevant queries. For this reason search engines often provide a service of advanced matching, which automatically finds additional relevant queries for advertisers to bid on. We present a novel advanced matching approach based on the idea of semantic embeddings of queries and ads. The embeddings were learned using a large data set of user search sessions, consisting of search queries, clicked ads and search links, while utilizing contextual information such as dwell time and skipped ads. To address the large-scale nature of our problem, both in terms of data and vocabulary size, we propose a novel distributed algorithm for training of the embeddings. Finally, we present an approach for overcoming a cold-start problem associated with new ads and queries. We report results of editorial evaluation and online tests on actual search traffic. The results show that our approach significantly outperforms baselines in terms of relevance, coverage, and incremental revenue. Lastly, we open-source learned query embeddings to be used by researchers in computational advertising and related fields.Comment: 10 pages, 4 figures, 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Ital

    Analyzing audit trails in a distributed and hybrid intrusion detection platform

    Get PDF
    Efforts have been made over the last decades in order to design and perfect Intrusion Detection Systems (IDS). In addition to the widespread use of Intrusion Prevention Systems (IPS) as perimeter defense devices in systems and networks, various IDS solutions are used together as elements of holistic approaches to cyber security incident detection and prevention, including Network-Intrusion Detection Systems (NIDS) and Host-Intrusion Detection Systems (HIDS). Nevertheless, specific IDS and IPS technology face several effectiveness challenges to respond to the increasing scale and complexity of information systems and sophistication of attacks. The use of isolated IDS components, focused on one-dimensional approaches, strongly limits a common analysis based on evidence correlation. Today, most organizations’ cyber-security operations centers still rely on conventional SIEM (Security Information and Event Management) technology. However, SIEM platforms also have significant drawbacks in dealing with heterogeneous and specialized security event-sources, lacking the support for flexible and uniform multi-level analysis of security audit-trails involving distributed and heterogeneous systems. In this thesis, we propose an auditing solution that leverages on different intrusion detection components and synergistically combines them in a Distributed and Hybrid IDS (DHIDS) platform, taking advantage of their benefits while overcoming the effectiveness drawbacks of each one. In this approach, security events are detected by multiple probes forming a pervasive, heterogeneous and distributed monitoring environment spread over the network, integrating NIDS, HIDS and specialized Honeypot probing systems. Events from those heterogeneous sources are converted to a canonical representation format, and then conveyed through a Publish-Subscribe middleware to a dedicated logging and auditing system, built on top of an elastic and scalable document-oriented storage system. The aggregated events can then be queried and matched against suspicious attack signature patterns, by means of a proposed declarative query-language that provides event-correlation semantics

    Grand Challenges of Traceability: The Next Ten Years

    Full text link
    In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers, representing current work in the community organized across four process axes of traceability practice. The sessions covered topics from Trace Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world applications of Traceability, and Traceability Datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community, and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which seeks for traceability to be always present, built into the engineering process, and for it to have "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of Software and Systems traceability researchers as we move forward into the next decade of research

    Contributions to Lifelogging Protection In Streaming Environments

    Get PDF
    Tots els dies, més de cinc mil milions de persones generen algun tipus de dada a través d'Internet. Per accedir a aquesta informació, necessitem utilitzar serveis de recerca, ja siguin motors de cerca web o assistents personals. A cada interacció amb ells, el nostre registre d'accions, logs, s'utilitza per oferir una millor experiència. Per a les empreses, també són molt valuosos, ja que ofereixen una forma de monetitzar el servei. La monetització s'aconsegueix venent dades a tercers, però, els logs de consultes podrien exposar informació confidencial de l'usuari (identificadors, malalties, tendències sexuals, creences religioses) o usar-se per al que es diu "life-logging ": Un registre continu de les activitats diàries. La normativa obliga a protegir aquesta informació. S'han proposat prèviament sistemes de protecció per a conjunts de dades tancats, la majoria d'ells treballant amb arxius atòmics o dades estructurades. Desafortunadament, aquests sistemes no s'adapten quan es fan servir en el creixent entorn de dades no estructurades en temps real que representen els serveis d'Internet. Aquesta tesi té com objectiu dissenyar tècniques per protegir la informació confidencial de l'usuari en un entorn no estructurat d’streaming en temps real, garantint un equilibri entre la utilitat i la protecció de dades. S'han fet tres propostes per a una protecció eficaç dels logs. La primera és un nou mètode per anonimitzar logs de consultes, basat en k-anonimat probabilística i algunes eines de desanonimització per determinar fuites de dades. El segon mètode, s'ha millorat afegint un equilibri configurable entre privacitat i usabilitat, aconseguint una gran millora en termes d'utilitat de dades. La contribució final es refereix als assistents personals basats en Internet. La informació generada per aquests dispositius es pot considerar "life-logging" i pot augmentar els riscos de privacitat de l'usuari. Es proposa un esquema de protecció que combina anonimat de logs i signatures sanitizables.Todos los días, más de cinco mil millones de personas generan algún tipo de dato a través de Internet. Para acceder a esa información, necesitamos servicios de búsqueda, ya sean motores de búsqueda web o asistentes personales. En cada interacción con ellos, nuestro registro de acciones, logs, se utiliza para ofrecer una experiencia más útil. Para las empresas, también son muy valiosos, ya que ofrecen una forma de monetizar el servicio, vendiendo datos a terceros. Sin embargo, los logs podrían exponer información confidencial del usuario (identificadores, enfermedades, tendencias sexuales, creencias religiosas) o usarse para lo que se llama "life-logging": Un registro continuo de las actividades diarias. La normativa obliga a proteger esta información. Se han propuesto previamente sistemas de protección para conjuntos de datos cerrados, la mayoría de ellos trabajando con archivos atómicos o datos estructurados. Desafortunadamente, esos sistemas no se adaptan cuando se usan en el entorno de datos no estructurados en tiempo real que representan los servicios de Internet. Esta tesis tiene como objetivo diseñar técnicas para proteger la información confidencial del usuario en un entorno no estructurado de streaming en tiempo real, garantizando un equilibrio entre utilidad y protección de datos. Se han hecho tres propuestas para una protección eficaz de los logs. La primera es un nuevo método para anonimizar logs de consultas, basado en k-anonimato probabilístico y algunas herramientas de desanonimización para determinar fugas de datos. El segundo método, se ha mejorado añadiendo un equilibrio configurable entre privacidad y usabilidad, logrando una gran mejora en términos de utilidad de datos. La contribución final se refiere a los asistentes personales basados en Internet. La información generada por estos dispositivos se puede considerar “life-logging” y puede aumentar los riesgos de privacidad del usuario. Se propone un esquema de protección que combina anonimato de logs y firmas sanitizables.Every day, more than five billion people generate some kind of data over the Internet. As a tool for accessing that information, we need to use search services, either in the form of Web Search Engines or through Personal Assistants. On each interaction with them, our record of actions via logs, is used to offer a more useful experience. For companies, logs are also very valuable since they offer a way to monetize the service. Monetization is achieved by selling data to third parties, however query logs could potentially expose sensitive user information: identifiers, sensitive data from users (such as diseases, sexual tendencies, religious beliefs) or be used for what is called ”life-logging”: a continuous record of one’s daily activities. Current regulations oblige companies to protect this personal information. Protection systems for closed data sets have previously been proposed, most of them working with atomic files or structured data. Unfortunately, those systems do not fit when used in the growing real-time unstructured data environment posed by Internet services. This thesis aims to design techniques to protect the user’s sensitive information in a non-structured real-time streaming environment, guaranteeing a trade-off between data utility and protection. In this regard, three proposals have been made in efficient log protection. The first is a new method to anonymize query logs, based on probabilistic k-anonymity and some de-anonymization tools to determine possible data leaks. A second method has been improved in terms of a configurable trade-off between privacy and usability, achieving a great improvement in terms of data utility. Our final contribution concerns Internet-based Personal Assistants. The information generated by these devices is likely to be considered life-logging, and it can increase the user’s privacy risks. The proposal is a protection scheme that combines log anonymization and sanitizable signatures
    corecore