460 research outputs found
Evaluation of taxonomic and neural embedding methods for calculating semantic similarity
Modelling semantic similarity plays a fundamental role in lexical semantic
applications. A natural way of calculating semantic similarity is to access
handcrafted semantic networks, but similarity prediction can also be
anticipated in a distributional vector space. Similarity calculation continues
to be a challenging task, even with the latest breakthroughs in deep neural
language models. We first examined popular methodologies in measuring taxonomic
similarity, including edge-counting that solely employs semantic relations in a
taxonomy, as well as the complex methods that estimate concept specificity. We
further extrapolated three weighting factors in modelling taxonomic similarity.
To study the distinct mechanisms between taxonomic and distributional
similarity measures, we ran head-to-head comparisons of each measure with human
similarity judgements from the perspectives of word frequency, polysemy degree
and similarity intensity. Our findings suggest that without fine-tuning the
uniform distance, taxonomic similarity measures can depend on the shortest path
length as a prime factor to predict semantic similarity; in contrast to
distributional semantics, edge-counting is free from sense distribution bias in
use and can measure word similarity both literally and metaphorically; the
synergy of retrofitting neural embeddings with concept relations in similarity
prediction may indicate a new trend to leverage knowledge bases on transfer
learning. It appears that a large gap still exists on computing semantic
similarity among different ranges of word frequency, polysemous degree and
similarity intensity
Site Selection Using Geo-Social Media: A Study For Eateries In Lisbon
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial TechnologiesThe rise in the influx of multicultural societies, studentification, and overall population growth has positively impacted the local economy of eateries in Lisbon, Portugal. However, this has also increased retail competition, especially in tourism. The overall increase in multicultural societies has also led to an increase in multiple smaller hotspots of human-urban attraction, making the concept of just one downtown in the city a little vague. These transformations of urban cities pose a big challenge for upcoming retail and eateries store owners in finding the most optimal location to set up their shops. An optimal site selection strategy should recommend new locations that can maximize the revenues of a business. Unfortunately, with dynamically changing human-urban interactions, traditional methods like relying on census data or surveys to understand neighborhoods and their impact on businesses are no more reliable or scalable. This study aims to address this gap by using geo-social data extracted from social media platforms like Twitter, Flickr, Instagram, and Google Maps, which then acts as a proxy to the real population. Seven variables are engineered at a neighborhood level using this data: business interest, age, gender, spatial competition, spatial proximity to stores, homogeneous neighborhoods, and percentage of the native population. A Random Forest based binary classification method is then used to predict whether a Point of Interest (POI) can be a part of any neighborhood n. The results show that using only these 7 variables, an F1-Score of 83% can be achieved in classifying whether a neighborhood is good for an “eateries” POI. The methodology used in this research is made to work with open data and be generic and reproducible to any city worldwide
Towards a temporospatial framework for measurements of disorganization in speech using semantic vectors
Incoherent speech in schizophrenia has long been described as the mind making “leaps” of large distances between thoughts and ideas. Such a view seems intuitive, and for almost two decades, attempts to operationalize these conceptual “leaps” in spoken word meanings have used language-based embedding spaces. An embedding space represents meaning of words as numerical vectors where a greater proximity between word vectors represents more shared meaning. However, there are limitations with word vector-based operationalizations of coherence which can limit their appeal and utility in clinical practice. First, the use of esoteric word embeddings can be conceptually hard to grasp, and this is complicated by several different operationalizations of incoherent speech. This problem can be overcome by a better visualization of methods. Second, temporal information from the act of speaking has been largely neglected since models have been built using written text, yet speech is spoken in real time. This issue can be resolved by leveraging time stamped transcripts of speech. Third, contextual information - namely the situation of where something is spoken - has often only been inferred and never explicitly modeled. Addressing this situational issue opens up new possibilities for models with increased temporal resolution and contextual relevance. In this paper, direct visualizations of semantic distances are used to enable the inspection of examples of incoherent speech. Some common operationalizations of incoherence are illustrated, and suggestions are made for how temporal and spatial contextual information can be integrated in future implementations of measures of incoherence
Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/
Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/
Mixing Methods: Practical Insights from the Humanities in the Digital Age
The digital transformation is accompanied by two simultaneous processes: digital humanities challenging the humanities, their theories, methodologies and disciplinary identities, and pushing computer science to get involved in new fields. But how can qualitative and quantitative methods be usefully combined in one research project? What are the theoretical and methodological principles across all disciplinary digital approaches? This volume focusses on driving innovation and conceptualising the humanities in the 21st century. Building on the results of 10 research projects, it serves as a useful tool for designing cutting-edge research that goes beyond conventional strategies
Distributed Representations for Compositional Semantics
The mathematical representation of semantics is a key issue for Natural
Language Processing (NLP). A lot of research has been devoted to finding ways
of representing the semantics of individual words in vector spaces.
Distributional approaches --- meaning distributed representations that exploit
co-occurrence statistics of large corpora --- have proved popular and
successful across a number of tasks. However, natural language usually comes in
structures beyond the word level, with meaning arising not only from the
individual words but also the structure they are contained in at the phrasal or
sentential level. Modelling the compositional process by which the meaning of
an utterance arises from the meaning of its parts is an equally fundamental
task of NLP.
This dissertation explores methods for learning distributed semantic
representations and models for composing these into representations for larger
linguistic units. Our underlying hypothesis is that neural models are a
suitable vehicle for learning semantically rich representations and that such
representations in turn are suitable vehicles for solving important tasks in
natural language processing. The contribution of this thesis is a thorough
evaluation of our hypothesis, as part of which we introduce several new
approaches to representation learning and compositional semantics, as well as
multiple state-of-the-art models which apply distributed semantic
representations to various tasks in NLP.Comment: DPhil Thesis, University of Oxford, Submitted and accepted in 201
- …