279 research outputs found
Is Stack Overflow Overflowing With Questions and Tags
Programming question and answer (Q & A) websites, such as Quora, Stack
Overflow, and Yahoo! Answer etc. helps us to understand the programming
concepts easily and quickly in a way that has been tested and applied by many
software developers. Stack Overflow is one of the most frequently used
programming Q\&A website where the questions and answers posted are presently
analyzed manually, which requires a huge amount of time and resource. To save
the effort, we present a topic modeling based technique to analyze the words of
the original texts to discover the themes that run through them. We also
propose a method to automate the process of reviewing the quality of questions
on Stack Overflow dataset in order to avoid ballooning the stack overflow with
insignificant questions. The proposed method also recommends the appropriate
tags for the new post, which averts the creation of unnecessary tags on Stack
Overflow.Comment: 11 pages, 7 figures, 3 tables Presented at Third International
Symposium on Women in Computing and Informatics (WCI-2015
IPhone Securtity Analysis
The release of Apple’s iPhone was one of the most intensively publicized product releases in the history of mobile devices. While the iPhone wowed users with its exciting design and features, it also outraged many for not allowing installation of third party applications and for working exclusively with AT&T wireless services for the first two years. Software attacks have been developed to get around both limitations. The development of those attacks and further evaluation revealed several vulnerabilities in iPhone security. In this paper, we examine several of the attacks developed for the iPhone as a way of investigating the iPhone’s security structure. We also analyze the security holes that have been discovered and make suggestions for improving iPhone security
Asking Questions is Easy, Asking Great Questions is Hard: Constructing Effective Stack Overflow Questions
This paper explores and seeks to improve the ways in which Stack Overflow question posts can elicit answers. Using statistical data analysis approaches and reviews of existing literature, we pin- point three key factors that are found in many previously success- ful/answerable questions. We then present a prototypical sidebar for the ask page that leverages these factors to dynamically (1) evaluate the quality of questions in construction (2) display answer previews of relevant questions and (3) scaffold the identified factors to subsequent askers during their question development processes
Modeling Tag Prediction based on Question Tagging Behavior Analysis of CommunityQA Platform Users
In community question-answering platforms, tags play essential roles in
effective information organization and retrieval, better question routing,
faster response to questions, and assessment of topic popularity. Hence,
automatic assistance for predicting and suggesting tags for posts is of high
utility to users of such platforms. To develop better tag prediction across
diverse communities and domains, we performed a thorough analysis of users'
tagging behavior in 17 StackExchange communities. We found various common
inherent properties of this behavior in those diverse domains. We used the
findings to develop a flexible neural tag prediction architecture, which
predicts both popular tags and more granular tags for each question. Our
extensive experiments and obtained performance show the effectiveness of our
modelComment: 20 page
Fine-grained reasoning about the security and usability trade-off in modern security tools
Defense techniques detect or prevent attacks based on their ability to model the attacks. A balance between security and usability should always be established in any kind of defense technique. Attacks that exploit the weak points in security tools are very powerful and thus can go undetected. One source of those weak points in security tools comes when security is compromised for usability reasons, where if a security tool completely secures a system against attacks the whole system will not be usable because of the large false alarms or the very restricted policies it will create, or if the security tool decides not to secure a system against certain attacks, those attacks will simply and easily succeed. The key contribution of this dissertation is that it digs deeply into modern security tools and reasons about the inherent security and usability trade-offs based on identifying the low-level, contributing factors to known issues. This is accomplished by implementing full systems and then testing those systems in realistic scenarios. The thesis that this dissertation tests is that we can reason about security and usability trade-offs in fine-grained ways by building and testing full systems. Furthermore, this dissertation provides practical solutions and suggestions to reach a good balance between security and usability. We study two modern security tools, Dynamic Information Flow Tracking (DIFT) and Antivirus (AV) software, for their importance and wide usage. DIFT is a powerful technique that is used in various aspects of security systems. It works by tagging certain inputs and propagating the tags along with the inputs in the target system. However, current DIFT systems do not track implicit information flow because if all DIFT propagation rules are directly applied in a conservative way, the target system will be full of tagged data (a problem called overtagging) and thus useless because the tags tell us very little about the actual information flow of the system. So, current DIFT systems drop some security for usability. In this dissertation, we reason about the sources of the overtagging problem and provide practical ways to deal with it, while previous approaches have focused on abstract descriptions of the main causes of the problem based on limited experiments. The second security tool we consider in this dissertation is antivirus (AV) software. AV is a very important tool that protects systems against worms and viruses by scanning data against a database of signatures. Despite its importance and wide usage, AV has received little attention from the security research community. In this dissertation, we examine the AV internals and reason about the possibility of creating timing channel attacks against AV software. The attacker could infer information about the AV based only on the scanning time the AV spends to scan benign inputs. The other aspect of AV this dissertation explores is the low-level AV performance impact on systems. Even though the performance overhead of AV is a well known issue, the exact reasons behind this overhead are not well-studied. In this dissertation, we design a methodology that utilizes Event Tracing for Windows technology (ETW), a technology that accounts for all OS events, to reason about AV performance impact from the OS point of view. We show that the main performance impact of the AV on a task is the longer waiting time the task spends waiting on events
The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
We consider training models on private data that is distributed across user
devices. To ensure privacy, we add on-device noise and use secure aggregation
so that only the noisy sum is revealed to the server. We present a
comprehensive end-to-end system, which appropriately discretizes the data and
adds discrete Gaussian noise before performing secure aggregation. We provide a
novel privacy analysis for sums of discrete Gaussians. We also analyze the
effect of rounding the input data and the modular summation arithmetic. Our
theoretical guarantees highlight the complex tension between communication,
privacy, and accuracy. Our extensive experimental results demonstrate that our
solution is essentially able to achieve a comparable accuracy to central
differential privacy with 16 bits of precision per value
Simplifying Embedded System Development Through Whole-Program Compilers
As embedded systems embrace ever more complicated microcontrollers, they present both new capability and new complexity. To simplify their development, some lessons of computer application development will translate with additional work. This thesis offers one such translation. It shows how whole-program compilers - those that broadly analyze a program\u27s entire source code - can achieve performance gains and remove faults in embedded system applications. In so doing, this yields a novel stackless threading system named UnStacked C. UnStacked C enables cooperative multithreading without the risk of stack overflows in embedded system applications. We also propose a novel preemption system called Lazy Preemption. Unstacked C with Lazy Preemption enables stackless preemptive multithreading in embedded systems. These remove the possibility of thread stack overflows, but also significantly reduces the memory required for multithreading in embedded system
Recommended from our members
Scalable hardware memory disambiguation
This dissertation deals with one of the long-standing problems in Computer Architecture
– the problem of memory disambiguation. Microprocessors typically reorder
memory instructions during execution to improve concurrency. Such microprocessors
use hardware memory structures for memory disambiguation, known as LoadStore
Queues (LSQs), to ensure that memory instruction dependences are satisfied
even when the memory instructions execute out-of-order. A typical LSQ implementation
(circa 2006) holds all in-flight memory instructions in a physically centralized
LSQ and performs a fully associative search on all buffered instructions to ensure
that memory dependences are satisfied. These LSQ implementations do not scale
because they use large, fully associative structures, which are known to be slow and
power hungry. The increasing trend towards distributed microarchitectures further
exacerbates these problems. As on-chip wire delays increase and high-performance
processors become necessarily distributed, centralized structures such as the LSQ
can limit scalability.
This dissertation describes techniques to create scalable LSQs in both centralized
and distributed microarchitectures. The problems and solutions described
in this thesis are motivated and validated by real system designs. The dissertation
starts with a description of the partitioned primary memory system of the TRIPS
processor, of which the LSQ is an important component, and then through a series
of optimizations describes how the power, area, and centralization problems
of the LSQ can be solved with minor performance losses (if at all) even for large
number of in flight memory instructions. The four solutions described in this dissertation
— partitioning, filtering, late binding and efficient overflow management —
enable power-, area-efficient, distributed and scalable LSQs, which in turn enable
aggressive large-window processors capable of simultaneously executing thousands
of instructions.
To mitigate the power problem, we replaced the power-hungry, fully associative
search with a power-efficient hash table lookup using a simple address-based
Bloom filter. Bloom filters are probabilistic data structures used for testing set
membership and can be used to quickly check if an instruction with the same data
address is likely to be found in the LSQ without performing the associative search.
Bloom filters typically eliminate more than 80% of the associative searches and they
are highly effective because in most programs, it is uncommon for loads and stores
to have the same data address and be in execution simultaneously.
To rectify the area problem, we observe the fact that only a small fraction
of all memory instructions are dependent, that only such dependent instructions
need to be buffered in the LSQ, and that these instructions need to be in the LSQ
only for certain parts of the pipelined execution. We propose two mechanisms to
exploit these observations. The first mechanism, area filtering, is a hardware mechanism
that couples Bloom filters and dependence predictors to dynamically identify
and buffer only those instructions which are likely to be dependent. The second
mechanism, late binding, reduces the occupancy and hence size of the LSQ. Both of
these optimizations allows the number of LSQ slots to be reduced by up to one-half
compared to a traditional organization without any performance degradation.
Finally, we describe a new decentralized LSQ design for handling LSQ structural
hazards in distributed microarchitectures. Decentralization of LSQs, and to
a large extent distributed microarchitectures with memory speculation, has proved
to be impractical because of the high performance penalties associated with the
mechanisms for dealing with hazards. To solve this problem, we applied classic
flow-control techniques from interconnection networks for handling resource con-
flicts. The first method, memory-side buffering, buffers the overflowing instructions
in a separate buffer near the LSQs. The second scheme, execution-side NACKing,
sends the overflowing instruction back to the issue window from which it is later
re-issued. The third scheme, network buffering, uses the buffers in the interconnection
network between the execution units and memory to hold instructions when the
LSQ is full, and uses virtual channel flow control to avoid deadlocks. The network
buffering scheme is the most robust of all the overflow schemes and shows less than
1% performance degradation due to overflows for a subset of SPEC CPU 2000 and
EEMBC benchmarks on a cycle-accurate simulator that closely models the TRIPS
processor.
The techniques proposed in this dissertation are independent, architectureneutral
and their cumulative benefits result in LSQs that can be partitioned at a
fine granularity and have low design complexity. Each of these partitions selectively
buffers only memory instructions with true dependences and can be closely coupled
with the execution units thus minimizing power, area, and latency. Such LSQ
designs with near-ideal characteristics are well suited for microarchitectures with
thousands of instructions in-flight and may enable even more aggressive microarchitectures
in the future.Computer Science
Predictions to Ease Users' Effort in Scalable Sharing
Significant user effort is required to choose recipients of shared information, which grows as the scale of the number of potential or target recipients increases. It is our thesis that it is possible to develop new approaches to predict persistent named groups, ephemeral groups, and response times that will reduce user effort. We predict persistent named groups using the insight that implicit social graphs inferred from messages can be composed with existing prediction techniques designed for explicit social graphs, thereby demonstrating similar grouping patterns in email and communities. However, this approach still requires that users know when to generate such predictions. We predict group creation times based on the intuition that bursts of change in the social graph likely signal named group creation. While these recommendations can help create new groups, they do not update existing ones. We predict how existing named groups should evolve based on the insight that the growth rates of named groups and the underlying social graph will match. When appropriate named groups do not exist, it is useful to predict ephemeral groups of information recipients. We have developed an approach to make hierarchical recipient recommendations that groups the elements in a flat list of recommended recipients, and thus is composable with existing flat recipient-recommendation techniques. It is based on the insight that groups of recipients in past messages can be organized in a tree. To help users select among alternative sets of recipients, we have made predictions about the scale of response time of shared information, based on the insights that messages addressed to similar recipients or containing similar titles will yield similar response times. Our prediction approaches have been applied to three specific systems - email, Usenet and Stack Overflow - based on the insight that email recipients correspond to Stack Overflow tags and Usenet newsgroups. We evaluated these approaches with actual user data using new metrics for measuring the differences in scale between predicted and actual response times and measuring the costs of eliminating spurious named-group predictions, editing named-group recommendations for use in future messages, scanning and selecting hierarchical ephemeral group-recommendations, and manually entering recipients.Doctor of Philosoph
- …