Harvesting Training Images for Fine-Grained Object Categories using Visual Descriptions
We harvest training images for visual object recognition by casting the problem as an information retrieval (IR) task. In contrast to previous work, we concentrate on fine-grained object categories, such as the large number of particular animal subspecies, for which manual annotation is expensive. We use 'visual descriptions' from nature guides as a novel augmentation to the well-known use of category names. We use these descriptions both in the query process, to find potential category images, and in image reranking, where an image is ranked more highly if the web page text surrounding it is similar to the visual descriptions. We show the potential of this method when harvesting images for 10 butterfly categories: compared to a method that relies on the category name only, using visual descriptions improves precision for many categories.
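The reranking step described above can be illustrated with a minimal sketch. It assumes a plain bag-of-words cosine similarity between the visual description and the text surrounding each candidate image; the abstract does not state which similarity measure the authors use, and the function names and sample data below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(description, candidates):
    """Sort (image_id, surrounding_text) pairs by similarity to the description."""
    desc_vec = Counter(tokenize(description))
    scored = [
        (cosine_similarity(desc_vec, Counter(tokenize(text))), image_id)
        for image_id, text in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical description and candidate pages, for illustration only.
description = "orange wings with black veins and white spots on the margins"
candidates = [
    ("img_a", "buy cheap garden furniture online"),
    ("img_b", "a butterfly with orange wings, black veins and white spots"),
    ("img_c", "black and white photo of a city street"),
]
ranking = rerank(description, candidates)
```

In this toy example the image whose surrounding text shares the most words with the description is ranked first. A real system would likely add stop-word removal and term weighting.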
Discovering conversational topics and emotions associated with Demonetization tweets in India
Social media platforms contain a great wealth of information, which gives us
opportunities to explore hidden patterns or unknown correlations, and understand
people's satisfaction with what they are discussing. As a showcase, in this
paper, we summarize the data set of Twitter messages related to the recent
demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights
from Twitter's data. Our proposed system automatically extracts the popular
latent topics in conversations regarding demonetization discussed on Twitter
via the Latent Dirichlet Allocation (LDA) based topic model and also identifies
the correlated topics across different categories. Additionally, it also
discovers people's opinions expressed through their tweets related to the event
under consideration via the emotion analyzer. The system also employs an
intuitive and informative visualization to show the uncovered insight.
Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI),
to select the best LDA models. The obtained LDA results show that the tool can
be effectively used to extract discussion topics and summarize them for further
manual analysis.
Comment: 6 pages, 11 figures. arXiv admin note: substantial text overlap with
arXiv:1608.02519 by other authors; text overlap with arXiv:1705.08094 by
other author
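As an illustration of the evaluation measure named in this abstract, the sketch below computes Normalized Mutual Information between two label assignments over the same items, for example the dominant topic of each tweet under two different models. It assumes the arithmetic-mean normalization, NMI = 2 I(U;V) / (H(U) + H(V)); the abstract does not say which normalization is used or exactly how NMI is applied to select the LDA models.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information between two label sequences of equal length."""
    n = len(u)
    joint = Counter(zip(u, v))
    count_u = Counter(u)
    count_v = Counter(v)
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_u[a] * count_v[b]))
    return mi


def nmi(u, v):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(u) != len(v):
        raise ValueError("label sequences must have the same length")
    h_u, h_v = entropy(u), entropy(v)
    if h_u + h_v == 0:
        # Both assignments put every item in a single group.
        return 1.0
    return 2.0 * mutual_information(u, v) / (h_u + h_v)


# Identical partitions (up to renaming) versus independent partitions.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Values close to 1 indicate that the two assignments group the items in nearly the same way, and values close to 0 indicate no shared structure.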
Locating bugs without looking back
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
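The idea of scoring each current file directly against a bug report, without project history, can be sketched as follows. This is not the authors' scoring method, whose heuristics are not given in the abstract. The term-overlap score and the weight of 3 for file-name matches are hypothetical choices made purely for illustration.

```python
import re


def tokens(text):
    """Split text into a set of lowercase words, breaking camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z]+", spaced.lower()))


def score_file(report, path, source):
    """Score one file against a bug report using term overlap only."""
    report_terms = tokens(report)
    file_name = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    name_hits = len(report_terms & tokens(file_name))
    body_hits = len(report_terms & tokens(source))
    # Hypothetical weighting: a match on the file name counts three times
    # as much as a match in the file body.
    return 3 * name_hits + body_hits


def rank_files(report, files):
    """Return file paths ordered from most to least likely to be affected."""
    return sorted(files, key=lambda p: score_file(report, p, files[p]), reverse=True)


# Hypothetical bug report and project files.
report = "NullPointerException in ReportParser when the date field is empty"
files = {
    "src/ReportParser.java": "class ReportParser { Date parseDate(String field) { return null; } }",
    "src/Logger.java": "class Logger { void log(String message) { } }",
    "src/DateUtil.java": "class DateUtil { boolean isEmpty(String date) { return date.isEmpty(); } }",
}
ranked = rank_files(report, files)
```

Only the current source files and the report text are needed, which is the property the abstract emphasises.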
Value-based analysis of routine pathologic septal and inferior turbinate specimens.
This article was presented at the 2012 AAO-HNSF Annual Meeting & OTO EXPO; September 9-12, 2012; Washington, DC.
Objective: To determine the frequency and clinical relevance of unanticipated histopathologic results in routine sinonasal surgery and evaluate the necessity for histologic processing of nasal septal cartilage, bone, and inferior turbinate specimens. Study Design: Case series with chart review. Setting: Tertiary care academic medical center. Subjects and Methods: A retrospective review of surgical pathology reports on adult patients undergoing sinonasal surgery during a 5-year period from 2005 to 2010 was performed. All cases with the preoperative diagnosis of sinonasal neoplasia, autoimmune disease, or directed septal biopsies were excluded from review. Results: A total of 1194 pathology reports were reviewed from 1172 individual patients. This included histopathologic evaluation of 1194 septal cartilage and bone specimens and 714 inferior turbinate specimens. None of the patients had unanticipated histopathologic findings that were clinically significant. Conclusion: Many surgeons obtain histopathologic diagnoses on all tissue removed from a patient. Based on our institutional case series, histopathology of the septum and inferior turbinates in routine sinonasal cases may not be necessary. A value-based approach to processing grossly unremarkable septal and turbinate tissue by waiving histologic processing and subsequent microscopic evaluation could provide significant cost savings.
Analyzing Network Level Information
This chapter provides a brief description of the methods employed for collecting initial information about a given suspicious online communication message, including header and network information, and of how to forensically analyze the dataset to obtain the information necessary to trace back to the source of the crime. The header content and network information are usually the immediate sources for collecting preliminary information about a given collection of suspicious online messages. The header analysis of an e-mail corpus, identifying all the senders, the recipients associated with each sender, and the frequency of messages exchanged between users, helps an investigator to understand the overall nature of e-mail communication. Electronic messages like e-mails or virtual network data present a potential dataset or a source of evidence containing personal communications, critical business communications, or agreements. When a crime is committed, it is always possible for the perpetrator to manipulate e-mails or any electronic evidence, forging the details to remove relevant evidence or tampering with the data to mislead the investigator. Possible manipulation of such evidence may include backdating, executing time-stamp changes, altering the message sender, recipient, or message content, etc. However, such attempts at manipulation and deception can be detected by examining the message header. By examining the e-mail header and analyzing network information through forensic analysis, investigators can gain valuable insight into the source of a message that is otherwise not traceable through the message body. Investigators can utilize a range of existing algorithms and models and build on typical forensic planning. Such models focus on what type of information should be collected, ensuring the forensically sound collection and preservation of identified Electronically Stored Information (ESI).
By applying these models, it is possible to achieve a full analysis and collect all the relevant information pertaining to the crime. The collected findings are then compiled to reconstruct the whole crime scene and deduce more accurate and logical conclusions [1].
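The header analysis described in this abstract (senders, the recipients associated with each sender, and message frequency) can be sketched with Python's standard `email` package. The sample messages and addresses are invented, and the sketch does not attempt tamper detection or network-level tracing.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def sender_recipient_counts(raw_messages):
    """Count how often each sender writes to each recipient (To and Cc)."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr.lower() for _, addr in getaddresses(recipient_headers)]
        for sender in senders:
            for recipient in recipients:
                counts[(sender, recipient)] += 1
    return counts


# Invented sample corpus: three raw RFC 5322 style messages.
raw_messages = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\nSubject: hi\n\nbody",
    "From: alice@example.org\nTo: Bob <bob@example.org>, carol@example.org\nSubject: re\n\nbody",
    "From: bob@example.org\nTo: alice@example.org\nSubject: re\n\nbody",
]
counts = sender_recipient_counts(raw_messages)
```

The resulting counter maps each (sender, recipient) pair to the number of messages exchanged, which gives an investigator a first overview of who communicates with whom.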
Cluster Based Term Weighting Model for Web Document Clustering
The weight of a term is based on the frequency with which the term appears in a document. The term weighting scheme measures the importance of a term with respect to a document and a collection. A term with higher weight is more important than a term with lower weight. A document ranking model uses these term weights to find the rank of a document in a collection. We propose a cluster-based term weighting model based on the TF-IDF model. This term weighting model updates the inter-cluster and intra-cluster frequency components, using the generated clusters as a reference to improve the retrieval of relevant documents. These inter-cluster and intra-cluster frequency components are used for weighting the importance of a term in addition to the term and document frequency components.
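For reference, the baseline TF-IDF weighting that the proposed model builds on can be sketched as below. It uses one common variant (term frequency normalized by document length, and the idf term log(N / df)). The inter-cluster and intra-cluster frequency components are not defined in the abstract, so they are deliberately left out rather than guessed.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document it appears in.
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Invented toy collection of three tokenized documents.
documents = [
    ["web", "document", "clustering"],
    ["web", "search", "ranking"],
    ["document", "ranking", "model"],
]
weights = tf_idf(documents)
```

A term that occurs in fewer documents, such as "clustering" here, receives a higher weight than one that occurs in several, such as "web".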
Criminal Information Mining
In the previous chapters, the different aspects of the authorship analysis problem were discussed. This chapter will propose a framework for extracting criminal information from the textual content of suspicious online messages. Archives of online messages, including chat logs, e-mails, web forums, and blogs, often contain an enormous amount of forensically relevant information about potential suspects and their illegitimate activities. Such information is usually found in either the header or body of an online document. The IP addresses, hostnames, sender and recipient addresses contained in the e-mail header, the user ID used in chats, and the screen names used in web-based communication help reveal information at the user or application level. For instance, information extracted from a suspicious e-mail corpus helps us to learn who the senders and recipients are, how often they communicate, and how many types of communities/cliques there are in a dataset. Such information also gives us an insight into the inter- and intra-community patterns of communication. A clique or a community is a group of users who have an online communication link between them. Header content or user-level information is easy to extract and straightforward to use for the purposes of investigation.
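One simple reading of the definition above is that a community is a connected component of the graph whose edges are communication links between users. The sketch below follows that assumption; the chapter may use a stricter notion of clique, and the user names are invented.

```python
from collections import defaultdict


def communities(links):
    """Group users into connected components of an undirected link graph."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    groups = []
    for user in sorted(neighbours):
        if user in seen:
            continue
        # Depth-first traversal collecting everyone reachable from this user.
        stack, group = [user], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            group.add(current)
            stack.extend(neighbours[current] - seen)
        groups.append(sorted(group))
    return groups


# Invented communication links, e.g. derived from sender and recipient pairs.
links = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
groups = communities(links)
```

Each returned group lists users who are linked to one another, directly or through intermediaries.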
Non-invasive MRI biomarkers for the early assessment of iron overload in a humanized mouse model of β-thalassemia
β-thalassemia (βT) is a genetic blood disorder causing profound and life-threatening anemia. Current clinical management of βT is a lifelong dependence on regular blood transfusions, a consequence of which is systemic iron overload leading to acute heart failure. Recent developments in gene and chelation therapy give hope of better prognosis for patients, but successful translation to clinical practice is hindered by the lack of thorough preclinical testing using representative animal models and clinically relevant quantitative biomarkers. Here we demonstrate a quantitative and non-invasive preclinical Magnetic Resonance Imaging (MRI) platform for the assessment of βT in the γβ(0)/γβ(A) humanized mouse model of βT. Changes in the quantitative MRI relaxation times as well as severe splenomegaly were observed in the heart, liver and spleen in βT. These data showed high sensitivity to iron overload and a strong relationship between quantitative MRI relaxation times and hepatic iron content. Importantly, these changes preceded the onset of iron overload cardiomyopathy, providing an early biomarker of disease progression. This work demonstrates that multiparametric MRI is a powerful tool for the assessment of preclinical βT, providing sensitive and quantitative monitoring of tissue iron sequestration and cardiac dysfunction, parameters essential for the preclinical development of new therapeutics.
Chemical Evolution of CoCrMo Wear Particles: An in Situ Characterization Study
The unexpectedly high failure rates of CoCrMo hip implants are associated with the release of a large number of inflammatory wear particles. CoCrMo is nominally a stable material; however, previous chemical speciation studies on CoCrMo wear particles obtained from periprosthetic tissue revealed only trace amounts of Co remaining despite Co being the major component of the alloy. The unexpectedly high levels of Co dissolution in vivo raised significant clinical concerns, particularly related to the Cr speciation in the dissolution process. At high electrochemical potentials, the alloy's Cr-rich passive film breaks down (transpassive polarization), facilitating alloy dissolution. The potential release of the carcinogenic Cr(VI) species in vivo has been a subject of debate. While the large-scale Co dissolution observed on particles produced in vivo could indicate a highly oxidizing in vivo environment, Cr(VI) species were not previously detected in periprosthetic tissue samples (except in the specific case of post-mortem tissue of diabetic patients). However, Cr(VI) is likely to be an unstable (transient) species in biological environments, and studies on periprosthetic tissue do not provide information about intermediate reaction products or the exposure history of the wear particles. Here, an in situ spectromicroscopy approach was developed, utilizing the high chemical resolution of synchrotron radiation, to study CoCrMo reactivity as a function of time and oxidizing conditions. The results reveal limited Co dissolution from CoCrMo particles, which increases dramatically at a critical electrochemical potential. Furthermore, in situ XAS detected only Cr(III) dissolution, even at potentials where Cr(VI) is known to be produced, suggesting that Cr(VI) species are extremely transient in simulated biological environments where the oxidation zone is small.