A Rule-based Part-of-speech Tagger for Classical Tibetan
This paper reports on the development of a rule-based part-of-speech tagger for Classical Tibetan. Far from being an obscure tool of minor utility to scholars, the rule-based tagger is a key component of a larger initiative aimed at radically transforming the practice of Tibetan linguistics through the application of corpus and computational methods.
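The abstract above can be illustrated with a minimal two-pass rule-based tagger: a lexicon lookup followed by ordered contextual rules. The Wylie forms, tag names, and the single rule below are illustrative placeholders, not the rule set or tagset from the paper.

```python
# Sketch of a rule-based POS tagger: pass 1 assigns tags from a small
# lexicon (with a default backoff tag); pass 2 applies ordered
# contextual rules. All tags and rules here are hypothetical examples.

LEXICON = {
    "gi": "PART.GEN",    # genitive particle (Wylie transliteration)
    "la": "PART.LA",     # la-don particle
    "dang": "PART.DANG", # coordinating particle
}

# Each rule is a (predicate over (previous tag, current token), new tag) pair.
RULES = [
    (lambda prev, tok: tok.endswith("s") and prev == "NOUN", "VERB"),
]

def tag(tokens):
    tags = [LEXICON.get(t, "NOUN") for t in tokens]  # backoff: NOUN
    for i, tok in enumerate(tokens):
        prev = tags[i - 1] if i > 0 else "BOS"
        for rule, newtag in RULES:
            if rule(prev, tok):
                tags[i] = newtag
    return list(zip(tokens, tags))
```

A real system would encode many more rules over morphology and context, but the lookup-then-rules structure is the core of the rule-based approach.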
The contribution of corpus linguistics to lexicography and the future of Tibetan dictionaries
The first alphabetized dictionary of Tibetan appeared in 1829 (cf. Bray 2008) and the intervening 184 years have witnessed the publication of scores of other Tibetan dictionaries (cf. Simon 1964). Hundreds of Tibetan dictionaries are now available; these include bilingual dictionaries, both to and from such languages as English, French, German, Latin, Japanese, etc., and specialized dictionaries focusing on medicine, plants, dialects, archaic terms, neologisms, etc. (cf. Walter 2006, McGrath 2008). However, if one classifies Tibetan dictionaries by the methods of their compilation, the accomplishments of Tibetan lexicography are less impressive.
Methodologies of dictionary compilation divide heuristically into three types. First, some dictionaries lack explicit methodology; these works assemble words in an ad hoc manner and illustrate them with invented examples. Second, there are dictionaries that are compiled over very long periods of time on the basis of collections of slips recording attestations of words as used in context. Third, more recent dictionaries are compiled on the basis of electronic text corpora, which are processed computationally to aid in the precision, consistency and speed of dictionary compilation. These methods may be called respectively the 'informal method', the 'traditional method', and the 'modern method'. The overwhelming majority of Tibetan dictionaries were compiled with the informal method. Only five Tibetan dictionaries use the traditional methodology. No Tibetan dictionary yet compiled makes use of the modern method.
Compartmentalized PDE4A5 signaling impairs hippocampal synaptic plasticity and long-term memory
Alterations in cAMP signaling are thought to contribute to neurocognitive and neuropsychiatric disorders. Members of the cAMP-specific phosphodiesterase 4 (PDE4) family, which contains >25 different isoforms, play a key role in determining spatial cAMP degradation so as to orchestrate compartmentalized cAMP signaling in cells. Each isoform binds to a different set of protein complexes through its unique N-terminal domain, thereby leading to targeted degradation of cAMP in specific intracellular compartments. However, the functional role of specific compartmentalized PDE4 isoforms has not been examined in vivo. Here, we show that increasing protein levels of the PDE4A5 isoform in mouse hippocampal excitatory neurons impairs a long-lasting form of hippocampal synaptic plasticity and attenuates hippocampus-dependent long-term memories without affecting anxiety. In contrast, viral expression of a truncated version of PDE4A5, which lacks the unique N-terminal targeting domain, does not affect long-term memory. Further, overexpression of the PDE4A1 isoform, which targets a different subset of signalosomes, leaves memory undisturbed. Fluorescence resonance energy transfer sensor-based cAMP measurements reveal that the full-length PDE4A5, in contrast to the truncated form, hampers forskolin-mediated increases in neuronal cAMP levels. Our study indicates that the unique N-terminal localization domain of PDE4A5 is essential for the targeting of specific cAMP-dependent signaling underlying synaptic plasticity and memory. The development of compounds to disrupt the compartmentalization of individual PDE4 isoforms by targeting their unique N-terminal domains may provide a fruitful approach to prevent cognitive deficits in neuropsychiatric and neurocognitive disorders that are associated with alterations in cAMP signaling.
Data context informed data wrangling
The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using Extract-Transform-Load platforms, with significant manual involvement in specifying, configuring or tuning many of them. Cost-effective data wrangling processes need to ensure that data wrangling steps benefit from automation wherever possible. In this paper, we define a methodology to fully automate an end-to-end data wrangling process incorporating data context, which associates portions of a target schema with potentially spurious extensional data of types that are commonly available. Instance-based evidence together with data profiling paves the way to inform automation in several steps within the wrangling process, specifically, matching, mapping validation, value format transformation, and data repair. The approach is evaluated with real estate data showing substantial improvements in the results of automated wrangling.
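One of the steps the abstract names, instance-based matching, can be sketched as scoring each source column against reference ("data context") values by set overlap. The column names, example values, and threshold below are illustrative assumptions, not the method or data from the paper.

```python
# Sketch of instance-based schema matching: map each target attribute
# to the source column whose values best overlap the attribute's
# data-context values (Jaccard similarity). Threshold is hypothetical.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_columns(source_cols, context_values, threshold=0.3):
    """Return {target attribute: (best source column, score)}."""
    matches = {}
    for attr, ref_vals in context_values.items():
        best = max(source_cols, key=lambda c: jaccard(source_cols[c], ref_vals))
        score = jaccard(source_cols[best], ref_vals)
        if score >= threshold:
            matches[attr] = (best, round(score, 2))
    return matches
```

Real pipelines would combine such instance evidence with data profiling and schema-level signals, but value overlap against context data is the basic idea behind automating the matching step.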
The VADA Architecture for Cost-Effective Data Wrangling
Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.