Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting
Peer reviewed
Keyphrases analysis of BIM standards through occurrence of most common BIM uses
The 8th PSU-UNS International Conference on Engineering and
Technology (ICET-2017), Novi Sad, Serbia, June 8-10, 2017
University of Novi Sad, Faculty of Technical Sciences
Abstract: Building Information Modeling (BIM) represents
not only the virtual model of a facility but also a
comprehensive approach consisting of technology,
processes, stakeholders' behavior and accompanying
standards. Given the fast evolution of BIM, this paper
analyses trends in the development of BIM standards
over the years by applying the keyphrases analysis
method to some of the most common BIM uses and
recognizable phrases in the BIM industry.
Improving a Fundamental Measure of Lexical Association
Pointwise mutual information (PMI), a simple measure of lexical association, is part of several algorithms used as models of lexical semantic memory. Typically, it is used as a component of more complex distributional models rather than in isolation. We show that when two simple techniques are applied, namely (1) down-weighting co-occurrences involving low-frequency words in order to address PMI's so-called "frequency bias," and (2) defining co-occurrences as counts of "events in which instances of word1 and word2 co-occur in a context" rather than "contexts in which word1 and word2 co-occur", PMI outperforms default parameterizations of word embedding models in terms of how closely it matches human relatedness judgments. We also identify which down-weighting techniques are most helpful. The results suggest that simple measures may be capable of modeling certain phenomena in semantic memory, and that complex models which incorporate PMI might be improved with these modifications.
Cambridge Centre for Digital Knowledg
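For readers unfamiliar with the measure, here is a minimal sketch of plain PMI, log2(p(x, y) / (p(x) p(y))), estimated from context-level counts (a pair is counted once per context in which both words occur). The function names and the toy corpus are assumptions made for illustration; the down-weighting and event-based counting described in the abstract are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count words and unordered word pairs, once per context they occur in."""
    word_counts = Counter()
    pair_counts = Counter()
    for context in contexts:
        types = sorted(set(context))  # each word type counted once per context
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))
    return word_counts, pair_counts


def pmi(w1, w2, word_counts, pair_counts, n_contexts):
    """Plain PMI in bits; returns -inf when the pair never co-occurs."""
    joint = pair_counts[tuple(sorted((w1, w2)))]
    if joint == 0:
        return float("-inf")
    p_xy = joint / n_contexts
    p_x = word_counts[w1] / n_contexts
    p_y = word_counts[w2] / n_contexts
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical toy corpus: each inner list is one context (e.g. a sentence).
contexts = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
word_counts, pair_counts = cooccurrence_counts(contexts)
score = pmi("a", "b", word_counts, pair_counts, len(contexts))
```

Because rare words have small marginal probabilities, a single chance co-occurrence can produce a very high score under this plain formulation, which is the frequency bias the abstract refers to.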
Word Knowledge and Word Usage
Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies, requiring the synergic integration of a wide range of methods, techniques and empirical and experimental findings. The present book approaches a few central issues concerning the organization, structure and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and to discuss the advantages of developing converging perspectives. The book will explore the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information theoretical measures of word families, statistical correlations across psycho-linguistic and cognitive evidence, principles of machine learning and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.
A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics
PhD
Representation of sentences that captures semantics is an essential part of natural language
processing systems, such as information retrieval or machine translation. The representation
of a sentence is commonly built by combining the representations of the words that the sentence
consists of. Similarity between words is widely used as a proxy to evaluate semantic
representations. Word similarity models are well-studied and are shown to positively correlate
with human similarity judgements.
Current evaluation of models of sentential similarity builds on the results obtained in lexical
experiments. The main focus is how the lexical representations are used, rather than what
they should be. It is often assumed that the optimal representations for word similarity are
also optimal for sentence similarity. This work discards this assumption and systematically
looks for lexical representations that are optimal for similarity measurement between sentences.
We find that the best representation for word similarity is not always the best for sentence
similarity and vice versa. The best models in word similarity tasks perform best with additive
composition. However, the best result on compositional tasks is achieved with Kronecker-based
composition. There are representations that are equally good in both tasks when used
with multiplicative composition.
The systematic study of the parameters of similarity models reveals that the more information
lexical representations contain, the more attention should be paid to noise. In particular,
the word vectors in models with the feature size at the magnitude of the vocabulary size
should be sparse, but if a small number of context features is used then the vectors should be
dense.
Given the right lexical representations, compositional operators achieve state-of-the-art performance,
improving over models that use neural word embeddings. To avoid overfitting, either
several test datasets should be used or parameter selection should be based on parameters'
average behaviours.
EPSRC grant EP/J002607/1
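As a rough illustration of the three families of composition operators named in the abstract, the sketch below combines two word vectors by addition, element-wise multiplication and a Kronecker product, and compares the results with cosine similarity. The function names and toy vectors are assumptions; the thesis's exact formulations (in particular how the Kronecker product is applied to sentence structure) may differ.

```python
import math


def additive(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def kronecker(u, v):
    """Kronecker product of two vectors, flattened to length len(u) * len(v)."""
    return [a * b for a in u for b in v]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical two-dimensional word vectors, for illustration only.
u, v = [1.0, 2.0], [3.0, 4.0]
composed = {
    "additive": additive(u, v),
    "multiplicative": multiplicative(u, v),
    "kronecker": kronecker(u, v),
}
```

Two sentences can then be compared by composing each one with the same operator and taking the cosine of the resulting vectors.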
Computational Analysis of Metabolic Reprogramming in Tumors
BACKGROUND
Cancer is a direct consequence of genomic aberrations, such as somatic copy number alterations that frequently occur in the cancer genome affecting not only oncogenic genes, but also multiple passenger and potential co-driver genes. An intrinsic feature resulting from such a disruption of the genome is deregulation of the tumor metabolic landscape, as a result of which, multiple metabolic genes have been identified as oncogenes, tumor suppressor genes or targets of oncogenic signaling.
RESULTS
Here we show that linear proximity of metabolic and cancer-causing genes in the genome can lead to metabolic remodeling through copy number co-alterations. We observed that cancer-metabolic gene pairs are unexpectedly often proximally positioned in the chromosomes and share loci with altered copy number, thus being either co-deleted or co-amplified across all cancers analyzed (19 cancer types from The Cancer Genome Atlas). We have developed an analysis pipeline, Identification of Metabolic Cancer Genes (iMetCG), to infer the functional impact on oncogenic metabolism from such co-alteration events and to distinguish genes truly driving cancer metabolism from those that are neutral. Using this approach, we have identified novel and well-known metabolic genes that target crucial pathways relevant for tumors. Moreover, using these identified metabolic genes we were able to classify tumors based on their tissue and developmental origins. We further observed that these putative metabolic cancer genes (identified across cancers) had higher network connectivity, were indicators for patient survival, had significant overlap with known cancer metabolic genes and shared similar features with known cancer genes in terms of their isoform diversity, evolutionary rate and selection pressure.
CONCLUSIONS
This thesis provides novel insights into the functional mechanism of metabolic regulation and rewiring of the metabolic landscape in cancer cells. Our pan-cancer, genomic data-driven approach revealed a hitherto unknown generic mechanism for large-scale metabolic reprogramming in cancer cells based on linear gene proximities and identified 119 new metabolic cancer genes likely to be involved in remodeling tumor cell metabolism. Furthermore, our newly identified metabolic cancer genes will serve as a vital resource for the experimental community engaged in tumor metabolism and genomics research, helping to further expand the scope of this field.