243 research outputs found

    A Coherent Unsupervised Model for Toponym Resolution

    Full text link
    Toponym Resolution, the task of assigning a location mention in a document to a geographic referent (i.e., latitude/longitude), plays a pivotal role in analyzing location-aware content. However, the ambiguities of natural language and the huge number of possible interpretations for toponyms constitute formidable hurdles for this task. In this paper, we study the problem of toponym resolution with no additional information other than a gazetteer and no training data. We demonstrate that the dearth of sufficiently large annotated data sets makes supervised methods less capable of generalizing. Our proposed method estimates the geographic scope of documents and leverages the connections between nearby place names as evidence to resolve toponyms. We explore the interactions between multiple interpretations of mentions and the relationships between different toponyms in a document to build a model that finds the most coherent resolution. Our model is evaluated on three news corpora, two from the literature and one collected and annotated by us; we then compare our method to state-of-the-art unsupervised and supervised techniques. We also examine three commercial products: Reuters OpenCalais, Yahoo! YQL Placemaker, and Google Cloud Natural Language API. The evaluation shows that our method outperforms the unsupervised technique as well as Reuters OpenCalais and Google Cloud Natural Language API on all three corpora; our method also performs close to the state-of-the-art supervised method and outperforms it when 40% or more of the toponyms in the test data are not seen in the training data. Comment: 9 pages (+1 page of references), WWW '18: Proceedings of the 2018 World Wide Web Conference.
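
    The coherence idea, resolving all toponyms in a document jointly rather than one by one, can be illustrated with a minimal sketch. This is not the authors' model (which additionally estimates the document's geographic scope); it simply picks, over a hypothetical gazetteer of candidate coordinates, the combination of interpretations that minimizes total pairwise distance.

```python
from itertools import product
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def resolve_coherently(mentions, gazetteer):
    """Pick one candidate per mention so that the total pairwise distance is minimal.

    mentions:  list of toponym strings found in one document
    gazetteer: dict mapping a toponym string to a list of (lat, lon) candidates (hypothetical input)
    """
    candidate_lists = [gazetteer[m] for m in mentions]
    best, best_cost = None, float("inf")
    for combo in product(*candidate_lists):  # exhaustive search; fine for documents with few toponyms
        cost = sum(haversine_km(p, q) for i, p in enumerate(combo) for q in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(mentions, best))

# Example: a "Paris" mentioned near "Dallas" resolves to Paris, Texas rather than Paris, France.
gazetteer = {
    "Paris": [(48.8566, 2.3522), (33.6609, -95.5555)],
    "Dallas": [(32.7767, -96.7970)],
}
print(resolve_coherently(["Paris", "Dallas"], gazetteer))
```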

    Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

    Full text link
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained via Ethnea, and examined scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between the groups in syntactic and lexical complexity. Comment: 6 figures.
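
    The syntactic and lexical measures named above can be approximated with standard proxies. The sketch below is illustrative only, not the paper's exact measurement pipeline; it computes mean sentence length (a syntactic-complexity proxy) and a type-token ratio (a lexical-diversity proxy).

```python
import re

def mean_sentence_length(text):
    """Average number of word tokens per sentence (simple syntactic-complexity proxy)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"\b\w+\b", s)) for s in sentences]
    return sum(lengths) / len(lengths)

def type_token_ratio(text):
    """Distinct words divided by total words (simple lexical-diversity proxy)."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    return len(set(tokens)) / len(tokens)

sample = "We collected full-text articles. We measured their complexity. The measures vary."
print(mean_sentence_length(sample), round(type_token_ratio(sample), 3))
```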

    A Google trends spatial clustering approach for a worldwide Twitter user geolocation

    Get PDF
    User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating worldwide city-level Twitter user locations considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically on a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground-truth locations of 15%, 23%, 39%, and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km, and 2,000 km, respectively. The work of P. Cortez was supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We would also like to thank the anonymous reviewers for their helpful suggestions.
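
    A rough sketch of the geographic-sampling-plus-clustering idea follows. It is not the exact GTN-DB procedure; the sampling scheme, jitter, and DBSCAN parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def estimate_location(city_freqs, city_coords, eps_deg=2.0, samples_per_unit=10, seed=0):
    """Sample city coordinates in proportion to Google Trends frequencies, cluster the
    samples with DBSCAN, and return the centroid of the densest cluster as the estimate."""
    rng = np.random.default_rng(seed)
    points = []
    for city, freq in city_freqs.items():
        lat, lon = city_coords[city]
        n = max(1, int(freq * samples_per_unit))
        # Small jitter so repeated samples of one city form a dense blob rather than one point.
        points.append(np.column_stack([rng.normal(lat, 0.1, n), rng.normal(lon, 0.1, n)]))
    points = np.vstack(points)
    labels = DBSCAN(eps=eps_deg, min_samples=5).fit_predict(points)
    clusters = [c for c in set(labels) if c != -1]
    if not clusters:                      # no dense cluster found: fall back to the overall mean
        return tuple(points.mean(axis=0))
    best = max(clusters, key=lambda c: (labels == c).sum())
    return tuple(points[labels == best].mean(axis=0))

freqs = {"Lisbon": 0.7, "Porto": 0.5, "Tokyo": 0.1}
coords = {"Lisbon": (38.72, -9.14), "Porto": (41.15, -8.61), "Tokyo": (35.68, 139.69)}
print(estimate_location(freqs, coords))
```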

    A Survey on Cross-domain Recommendation: Taxonomies, Methods, and Future Directions

    Full text link
    Traditional recommendation systems face two long-standing obstacles, namely data sparsity and cold-start problems, which have promoted the emergence and development of Cross-Domain Recommendation (CDR). The core idea of CDR is to leverage information collected from other domains to alleviate these two problems in one domain. Over the last decade, much effort has been devoted to cross-domain recommendation, and with the recent development of deep learning and neural networks a large number of methods have emerged. However, there are few systematic surveys on CDR, especially ones covering the latest proposed methods and the recommendation scenarios and tasks they address. In this survey paper, we first propose a two-level taxonomy of cross-domain recommendation that classifies the different recommendation scenarios and recommendation tasks. We then introduce and summarize existing cross-domain recommendation approaches under the different recommendation scenarios in a structured manner. We also catalog the commonly used datasets. We conclude the survey by outlining several potential research directions in this field.

    Luck of the Draw III: Using AI to Examine Decision‐Making in Federal Court Stays of Removal

    Get PDF
    This article examines decision-making in Federal Court of Canada immigration law applications for stays of removal, focusing on how the rates at which stays are granted depend on which judge decides the case. The article deploys computational natural language processing, using a large language model (GPT-3) to extract data from online Federal Court dockets. The article reviews patterns in outcomes in thousands of stay of removal applications identified through this process and reveals a wide range in stay grant rates across judges. The article argues that the Federal Court should take measures to encourage more consistency in stay decision-making and cautions against relying heavily on stays of removal to ensure that deportation complies with constitutional procedural justice protections. The article is also a demonstration of how machine learning can be used to pursue empirical legal research projects that would have been cost-prohibitive or technically challenging only a few years ago, and shows how technology that is increasingly used to enhance the power of the state at the expense of marginalized migrants can instead be used to scrutinize legal decision-making in the immigration law field, hopefully in ways that enhance the rights of migrants. The article also contributes to the broader field of computational legal research in Canada by making available to other non-commercial researchers the code used for the project, as well as a large dataset of Federal Court dockets.
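
    As an illustration of this kind of extraction, the sketch below queries GPT-3 for structured fields from a single docket text. It assumes the legacy OpenAI Completions endpoint (openai<1.0); the prompt, field names, and model choice are hypothetical and not the authors' actual pipeline.

```python
import json
import openai  # assumes the legacy completions-style client (openai<1.0)

openai.api_key = "YOUR_API_KEY"

PROMPT_TEMPLATE = """Extract the following fields from this Federal Court docket text
and answer with JSON only: judge_name, hearing_date, stay_granted (true/false).

Docket text:
{docket}
"""

def extract_docket_fields(docket_text):
    """Ask the model for a structured JSON summary of one docket entry."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=PROMPT_TEMPLATE.format(docket=docket_text),
        max_tokens=200,
        temperature=0,  # deterministic output is preferable for data extraction
    )
    return json.loads(response["choices"][0]["text"].strip())

# Example (hypothetical file name):
# fields = extract_docket_fields(open("docket_001.txt").read())
```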

    Designing semantic Application Programming Interfaces for open government data

    Get PDF
    Many countries currently maintain a national data catalog, which provides access to the available datasets, sometimes via an Application Programming Interface (API). These APIs play a crucial role in realizing the benefits of open data, as they are the means by which data is discovered and accessed by the applications that make use of it. This article proposes semantic APIs as a way of improving access to open data. A semantic API helps to retrieve datasets according to their type (e.g., sensor, climate, finance) and facilitates reasoning about and learning from data. The article examines categories of open datasets from 40 European open data catalogs to gather insights into the types of datasets that should be considered while building semantic APIs for open government data. The results show that the probability of inter-country agreement between open data catalogs is less than 30 percent, and that few categories stand out as candidates for a transnational semantic API. They stress the need for coordination, at the local, regional, and national level, between the data providers of Germany, France, Spain, and the United Kingdom. The authors gratefully acknowledge funding from the European Union through the GEO-C project (H2020-MSCA-ITN-2014, Grant Agreement Number 642332, http://www.geoc.eu/). Carlos Granell has been funded by the Ramón y Cajal Programme (grant number RYC-2014-16913). Sergio Trilles has been funded by the postdoctoral programme Vali+d (GVA) (grant number APOSTD/2016/058).
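
    A toy sketch of what "retrieve datasets according to their type" could look like as an HTTP endpoint is shown below; the catalog entries, field names, and route are hypothetical and far simpler than the semantic APIs proposed in the article.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical catalog entries annotated with a semantic type; real catalogs would
# expose much richer metadata (e.g., linked-data vocabularies).
CATALOG = [
    {"id": "air-quality-madrid", "type": "sensor"},
    {"id": "rainfall-1950-2020", "type": "climate"},
    {"id": "municipal-budget-2023", "type": "finance"},
]

@app.route("/datasets")
def datasets_by_type():
    """Return catalog entries whose semantic type matches ?type=<category>."""
    wanted = request.args.get("type")
    hits = [d for d in CATALOG if wanted is None or d["type"] == wanted]
    return jsonify(hits)

# Example: GET /datasets?type=sensor -> [{"id": "air-quality-madrid", "type": "sensor"}]
```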

    Educational Technology and Related Education Conferences for June to December 2011

    Get PDF
    This potpourri of educational technology conferences includes gems such as “Saving Your Organisation from Boring eLearning” and “Lessons and Insights from Ten eLearning Masters”. And, if you wish, you can “Be an Open Learning Hero”. You will also find that the number of mobile learning conferences (and conferences that have a mobile learning component) has increased significantly. Countries such as China, Indonesia, Japan, and Thailand have shown a keen interest in mobile learning. It would be impossible for you to be present at all the conferences that you would like to attend. But you can go to the conference website/URL during and after the conference. Many conference organizers post abstracts, full papers, and/or videos of conference presentations. Thus, you can visit the conference virtually and may encounter information and contacts that would be useful in your work. The list below covers selected events focused primarily on the use of technology in educational settings and on teaching, learning, and educational administration. Only listings until December 2011 are complete, as dates, locations, or URLs are not available for a number of events held after December 2011. But take a look at the conference organizers who planned ahead into 2012. A Word 2003 format is used to enable people who do not have access to Word 2007 or a higher version, and those with limited or high-cost Internet access, to find a conference that is congruent with their interests or to obtain conference proceedings. (If you are seeking a more interactive listing, refer to online conference sites.) Consider using the “Find” tool under Microsoft Word’s “Edit” tab, or the similar tab in OpenOffice, to locate the name of a particular conference, association, city, or country. If you enter the country “Australia” or “Singapore” in the “Find” tool, all conferences that occur in Australia or Singapore will be highlighted. Or, enter the word “research”. Then “cut and paste” a list of suitable events for yourself and your colleagues. Please note that events, dates, titles, and locations may change; thus, CHECK the specific conference website. Note also that some events may be cancelled at a later date. All Internet addresses were verified at the time of publication. No liability is assumed for any errors that may have been introduced inadvertently during the assembly of this conference list. If possible, do not remove the contact information when you re-distribute the list, as that is how I receive updates and corrections. If you mount the list on the web, please note its source.

    Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation

    Full text link
    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defence date: 08-07-2021. Since the emergence of the Internet and the spread of digital communications throughout the world, the amount of data stored on the Web has been growing exponentially. In this new digital era, a large number of companies have emerged with the purpose of filtering the information available on the web and providing users with interesting items. The algorithms and models used to recommend these items are called Recommender Systems. These systems are applied to a large number of domains, from music, books, or movies to dating or Point-of-Interest (POI), an increasingly popular domain in which users receive recommendations of different places when they arrive in a city. In this thesis, we focus on exploiting contextual information, especially temporal and sequential data, and applying it in novel ways to both traditional and Point-of-Interest recommendation. We believe that this type of information can be used not only for creating new recommendation models but also for developing new metrics for analyzing the quality of these recommendations. In one of our first contributions we propose different metrics, some of them derived from previously existing frameworks, using this contextual information. We also propose an intuitive algorithm that is able to provide recommendations to a target user by exploiting the last common interactions with other similar users of the system. At the same time, we conduct a comprehensive review of the algorithms that have been proposed in the area of POI recommendation between 2011 and 2019, identifying the common characteristics and methodologies used. Once this classification of the algorithms proposed to date is completed, we design a mechanism to recommend complete routes (not only independent POIs) to users, making use of reranking techniques. In addition, due to the great difficulty of making recommendations in the POI domain, we propose the use of data aggregation techniques that use information from different cities to generate POI recommendations in a given target city. In the experimental work we present our approaches on different datasets belonging to both classical and POI recommendation. The results obtained in these experiments confirm the usefulness of our recommendation proposals, in terms of ranking accuracy and other dimensions like novelty, diversity, and coverage, and the appropriateness of our metrics for analyzing temporal information and biases in the recommendations produced.
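
    As a rough illustration of recommending from the most recent interactions shared with similar users (a toy sketch, not the thesis's actual algorithm or metrics), consider:

```python
from collections import Counter

def recommend_from_recent_neighbours(target, histories, k_neighbours=2, last_n=3, top_n=3):
    """Toy sequence-aware recommender: rank users by the overlap of their most recent
    interactions with the target user's, then suggest unseen items from those neighbours.

    histories: dict mapping user -> list of item ids in chronological order
    """
    target_recent = set(histories[target][-last_n:])
    neighbours = sorted(
        (u for u in histories if u != target),
        key=lambda u: len(target_recent & set(histories[u][-last_n:])),
        reverse=True,
    )[:k_neighbours]
    seen = set(histories[target])
    scores = Counter(item for u in neighbours for item in histories[u] if item not in seen)
    return [item for item, _ in scores.most_common(top_n)]

histories = {
    "alice": ["museum", "park", "cafe"],
    "bob":   ["park", "cafe", "gallery", "bridge"],
    "carol": ["stadium", "mall"],
}
print(recommend_from_recent_neighbours("alice", histories))
```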

    Making Sense of Document Collections with Map-Based Visualizations

    Get PDF
    As map-based visualizations of documents become more ubiquitous, there is a greater need for them to support intellectual and creative high-level cognitive activities with collections of non-cartographic materials -- documents. This dissertation concerns the conceptualization of map-based visualizations as tools for sensemaking and collection understanding. As such, map-based visualizations would help people use georeferenced documents to develop understanding, gain insight, discover knowledge, and construct meaning. This dissertation explores the role of graphical representations (such as maps, Kohonen maps, pie charts, and others) and interactions with them in developing map-based visualizations capable of facilitating sensemaking activities such as collection understanding. While graphical representations make document collections more perceptually and cognitively accessible, interactions allow users to adapt representations to their contextual needs. By interacting with representations of documents or collections and being able to construct representations of their own, people are better able to make sense of information, comprehend complex structures, and integrate new information into their existing mental models. In sum, representations and interactions may reduce cognitive load and, consequently, the overall time necessary to complete sensemaking activities, which typically take much time to accomplish. The dissertation proceeds in three phases. The first phase develops a conceptual framework for translating ontological properties of collections into representations and for supporting visual tasks by means of graphical representations. The second phase concerns the cognitive benefits of interaction, conceptualizing how interactions can help people during complex sensemaking activities. Although the interactions are explained using the example of a prototype built with Google Maps, they are independent of Google Maps and applicable to various other technologies. The third phase evaluates the utility, analytical capabilities, and usability of the additional representations when users interact with a visualization prototype, VIsual COLlection EXplorer. The findings suggest that additional representations can enhance understanding of map-based visualizations of library collections: specifically, they can allow users to see trends, gaps, and patterns in the ontological properties of collections.

    Languages of games and play: A systematic mapping study

    Get PDF
    Digital games are a powerful means for creating enticing, beautiful, educational, and often highly addictive interactive experiences that impact the lives of billions of players worldwide. We explore what informs the design and construction of good games in order to learn how to speed up game development. In particular, we study to what extent languages, notations, patterns, and tools can offer experts the theoretical foundations, systematic techniques, and practical solutions they need to raise their productivity and improve the quality of games and play. Despite the growing number of publications on this topic, there is currently no overview describing the state of the art that relates research areas, goals, and applications. As a result, efforts and successes are often one-off, lessons learned go overlooked, language reuse remains minimal, and opportunities for collaboration and synergy are lost. We present a systematic map that identifies relevant publications and gives an overview of research areas and publication venues. In addition, we categorize research perspectives along common objectives, techniques, and approaches, illustrated by summaries of selected languages. Finally, we distill challenges and opportunities for future research and development.