66,038 research outputs found
Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections
Financial firms commonly process and store billions of time-series data,
generated continuously and at a high frequency. To support efficient data
storage and retrieval, specialized time-series databases and systems have
emerged. These databases support indexing and querying of time-series by a
constrained Structured Query Language(SQL)-like format to enable queries like
"Stocks with monthly price returns greater than 5%", and expressed in rigid
formats. However, such queries do not capture the intrinsic complexity of high
dimensional time-series data, which can often be better described by images or
language (e.g., "A stock in low volatility regime"). Moreover, the required
storage, computational time, and retrieval complexity to search in the
time-series space are often non-trivial. In this paper, we propose and
demonstrate a framework to store multi-modal data for financial time-series in
a lower-dimensional latent space using deep encoders, such that the latent
space projections capture not only the time series trends but also other
desirable information or properties of the financial time-series data (such as
price volatility). Moreover, our approach allows user-friendly query
interfaces, enabling natural language text or sketches of time-series, for
which we have developed intuitive interfaces. We demonstrate the advantages of
our method in terms of computational efficiency and accuracy on real historical
data as well as synthetic data, and highlight the utility of latent-space
projections in the storage and retrieval of financial time-series data with
intuitive query modalities.Comment: Accepted to ICAIF 202
An evaluation of the role of sentiment in second screen microblog search tasks
The recent prominence of the real-time web is proving both challenging and disruptive for information retrieval and web data mining research. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user's query at a point in time, automated methods are required to sift through this information. Sentiment analysis offers a promising direction for modelling microblog content. We build and evaluate a sentiment-based filtering system using real-time user studies. We find a significant role played by sentiment in the search scenarios, observing detrimental effects in filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes, and users' prior topic sentiment
An Efficient Content-based Time Series Retrieval System
A Content-based Time Series Retrieval (CTSR) system is an information
retrieval system for users to interact with time series emerged from multiple
domains, such as finance, healthcare, and manufacturing. For example, users
seeking to learn more about the source of a time series can submit the time
series as a query to the CTSR system and retrieve a list of relevant time
series with associated metadata. By analyzing the retrieved metadata, users can
gather more information about the source of the time series. Because the CTSR
system is required to work with time series data from diverse domains, it needs
a high-capacity model to effectively measure the similarity between different
time series. On top of that, the model within the CTSR system has to compute
the similarity scores in an efficient manner as the users interact with the
system in real-time. In this paper, we propose an effective and efficient CTSR
model that outperforms alternative models, while still providing reasonable
inference runtimes. To demonstrate the capability of the proposed method in
solving business problems, we compare it against alternative models using our
in-house transaction data. Our findings reveal that the proposed model is the
most suitable solution compared to others for our transaction data problem
Interactive Pattern Search in Time Series (2004)
The need for pattern discovery in long time series data led researchers to develop algorithms for similarity search. Most of the literature about time series focuses on algorithms that index time series and bring the data into the main storage, thus providing fast information retrieval on large time series. This paper reviews the state of the art in visualizing time series, and focuses on techniques that enable users to interactively query time series. Then it presents TimeSearcher 2, a tool that enables users to explore multidimensional data using coordinated tables and graphs with overview+detail, filter the time series data to reduce the scope of the search, select an existing pattern to find similar occurrences, and interactively adjust similarity parameters to narrow the result set. This tool is an extension of previous work, TimeSearcher 1, which uses graphical timeboxes to interactively query time series data
Using Search Term Positions for Determining Document Relevance
The technological advancements in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information.
In particular, textual information is becoming increasingly available in electronic form.
Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only pieces of information relevant to their needs.
The traditional methods of information retrieval (IR) based on search term frequency have somehow reached their limitations, and novel ranking methods based on hyperlink information are not applicable to unlinked documents.
The retrieval of documents based on the positions of search terms in a document has the potential of yielding improvements, because other terms in the environment where a search term appears (i.e. the neighborhood) are considered. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term.
However, the required additional analysis task makes position-based methods slower than methods based on term frequency and requires more storage to save the positions of terms. These drawbacks directly affect the performance of the most user critical phase of the retrieval process, namely query evaluation time, which explains the scarce use of positional information in contemporary retrieval systems.
This thesis explores the possibility of extending traditional information retrieval systems with positional information in an efficient manner that permits us to optimize the retrieval performance by handling term positions at query evaluation time.
To achieve this task, several abstract representation of term positions to efficiently store and operate on term positional data are investigated. In the Gauss model, descriptive statistics methods are used to estimate term positional information, because they minimize outliers and irregularities in the data. The Fourier model is based on Fourier series to represent positional information. In the Hilbert model, functional analysis methods are used to provide reliable term position estimations and simple mathematical operators to handle positional data.
The proposed models are experimentally evaluated using standard resources of the IR research community (Text Retrieval Conference). All experiments demonstrate that the use of positional information can enhance the quality of search results. The suggested models outperform state-of-the-art retrieval utilities.
The term position models open new possibilities to analyze and handle textual data. For instance, document clustering and compression of positional data based on these models could be interesting topics to be considered in future research
Boolean versus ranked querying for biomedical systematic reviews
Background: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) the requirement of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. Methods: We explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. Results: Our results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. Conclusions: Outcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time-savings over the current search process in the systematic reviewing
A quasi-current representation for information needs inspired by Two-State Vector Formalism
Recently, a number of quantum theory (QT)-based information retrieval (IR) models have been proposed for modeling session search task that users issue queries continuously in order to describe their evolving information needs (IN). However, the standard formalism of QT cannot provide a complete description for usersā current IN in a sense that it does not take the āfutureā information into consideration. Therefore, to seek a more proper and complete representation for usersā IN, we construct a representation of quasi-current IN inspired by an emerging Two-State Vector Formalism (TSVF). With the enlightenment of the completeness of TSVF, a ātwo-state vectorā derived from the āfutureā (the current query) and the āhistoryā (the previous query) is employed to describe usersā quasi-current IN in a more complete way. Extensive experiments are conducted on the session tracks of TREC 2013 & 2014, and show that our model outperforms a series of compared IR models
- ā¦