3,412 research outputs found
Analysis of the temporal and structural features of threads in a mailing-list
A link stream is a collection of triplets (t, u, v) indicating that an
interaction occurred between u and v at time t. Link streams model many
real-world situations like email exchanges between individuals, connections
between devices, and others. Much work is currently devoted to the
generalization of classical graph and network concepts to link streams. In this
paper, we generalize the existing notions of intra-community density and
inter-community density. We focus on email exchanges in the Debian
mailing-list, and show that threads of emails, like communities in graphs, are
dense subsets loosely connected from a link stream perspective.
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
altogether exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
A quantitative study of the evolution of open source software communities
Typically, virtual communities exhibit the well-known
phenomenon of participation inequality, which means that only a
small percentage of users is responsible for the majority of
contributions. However, the sustainability of the community requires
that the group of active users must be continuously nurtured with new
users that gain expertise through a participation process. This paper
analyzes the time evolution of Open Source Software (OSS)
communities, considering users that join/abandon the community
over time and several topological properties of the network when
modeled as a social network. More specifically, the paper analyzes
the role of those users rejoining the community and their influence in
the global characteristics of the network
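Participation inequality can be made concrete with a small calculation. The sketch below is only illustrative: the function name `top_share`, the 10% cut-off, and the toy author list are assumptions, not taken from the paper. It computes the share of all contributions that comes from the most active fraction of users.

```python
from collections import Counter


def top_share(contributions, top_fraction=0.1):
    """Return the share of contributions made by the most active users.

    `contributions` is an iterable with one author name per contribution.
    `top_fraction` is the fraction of users treated as the active core
    (an assumed threshold, chosen here only for illustration).
    """
    counts = sorted(Counter(contributions).values(), reverse=True)
    if not counts:
        return 0.0
    # Always keep at least one user in the "top" group.
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


# Hypothetical toy data: ten users, one of whom writes most messages.
posts = (
    ["ana"] * 80 + ["bo"] * 5 + ["cy"] * 4 + ["di"] * 3 + ["ed"] * 2
    + ["fay"] * 2 + ["gus"] + ["hal"] + ["ian"] + ["jo"]
)
print(top_share(posts, top_fraction=0.1))
```

In this toy data set, the single most active user out of ten accounts for 80 of the 100 messages.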
Synchronous development in open-source projects: A higher-level perspective
Mailing lists are a major communication channel for supporting developer coordination in open-source software projects. In a recent study, researchers explored temporal relationships (e.g., synchronization) between developer activities on source code
and on the mailing list, relying on simple heuristics of developer collaboration (e.g.,
co-editing files) and developer communication (e.g., sending e-mails to the mailing
list). We propose two methods for studying synchronization between collaboration
and communication activities from a higher-level perspective, which captures the
complex activities and views of developers more precisely than the rather technical
perspective of previous work. On the one hand, we explore developer collaboration
at the level of features (not files), which are higher-level concepts of the domain and
not mere technical artifacts. On the other hand, we lift the view of developer communication from a message-based model, which treats each e-mail individually, to
a conversation-based model, which is semantically richer due to grouping e-mails
that represent conceptually related discussions. By means of an empirical study, we
investigate whether the different abstraction levels affect the observed relationship
between commit activity and e-mail communication using state-of-the-art time series analysis. For this purpose, we analyze a combined history of 40 years of data
for three highly active and widely deployed open-source projects: QEMU, BusyBox,
and OpenSSL. Overall, we found evidence that a higher-level view on the coordination of developers leads to identifying a stronger statistical dependence between the
technical activities of developers than a less abstract and rather technical view.
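The step from a message-based to a conversation-based model can be illustrated with a small sketch. It assumes that a conversation is simply the set of e-mails sharing the same root message when reply links are followed; the abstract does not say how the authors group e-mails, so this is a stand-in technique, and the function name and message ids are hypothetical.

```python
def group_conversations(messages):
    """Group e-mails into conversations by following reply links to a root.

    `messages` maps a message id to the id of the message it replies to,
    or to None when the message starts a new conversation.
    """
    def root(mid):
        seen = set()
        # Walk up the reply chain; `seen` guards against malformed cycles.
        while messages.get(mid) is not None and mid not in seen:
            seen.add(mid)
            mid = messages[mid]
        return mid

    conversations = {}
    for mid in messages:
        conversations.setdefault(root(mid), []).append(mid)
    return conversations


# Hypothetical toy data: two conversations of three and two e-mails.
toy = {"m1": None, "m2": "m1", "m3": "m2", "m4": None, "m5": "m4"}
print(group_conversations(toy))
```

A message-based analysis would treat the five toy e-mails as five separate events, whereas the conversation-based grouping yields two units.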
Virtual collaborative spaces: a case study on the antecedents of collaboration in an open source software community
Collaboration enables the sharing amongst individuals of resources and knowledge required to innovate. In recent years, this phenomenon has increasingly manifested in virtual collaborative spaces such as open-source software communities because of the advancement in the use of online technologies and the heightened need for distance work. However, it is still unclear which underlying mechanisms foster collaboration in these spaces. By using the Linux kernel open-source software community as a case study, we analyze data from the [email protected] mailing list to model the influence of proximity on the likelihood of collaboration between individuals. Our dataset is composed of 10,513 message replies to the PCI mailing list posted by its 654 active members in the years 2013 to 2015. Our results show that geographical proximity does not have a direct impact on collaboration, while organizational features defined by institutional and organizational proximity do significantly affect collaboration. Cognitive and social proximity also significantly and positively affect collaboration, but these relationships show an inverted U-shaped form. Our results confirm the need to develop specific theorizing about virtual spaces, as they present unique features when compared to traditional physical environments.
Peer reviewed
Evolution of Conversations in the Age of Email Overload
Email is a ubiquitous communications tool in the workplace and plays an
important role in social interactions. Previous studies of email were largely
based on surveys and limited to relatively small populations of email users
within organizations. In this paper, we report results of a large-scale study
of more than 2 million users exchanging 16 billion emails over several months.
We quantitatively characterize the replying behavior in conversations within
pairs of users. In particular, we study the time it takes the user to reply to
a received message and the length of the reply sent. We consider a variety of
factors that affect the reply time and length, such as the stage of the
conversation, user demographics, and use of portable devices. In addition, we
study how increasing load affects emailing behavior. We find that as users
receive more email messages in a day, they reply to a smaller fraction of them,
using shorter replies. However, their responsiveness remains intact, and they
may even reply to emails faster. Finally, we predict the time to reply, length
of reply, and whether the reply ends a conversation. We demonstrate
considerable improvement over the baseline in all three prediction tasks,
showing the significant role that the factors that we uncover play, in
determining replying behavior. We rank these factors based on their predictive
power. Our findings have important implications for understanding human
behavior and designing better email management applications for tasks like
ranking unread emails.
Comment: 11 pages, 24th International World Wide Web Conference
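The two quantities studied in the abstract above, time to reply and reply length, can be sketched for a single two-person conversation. This is a minimal sketch under stated assumptions: a message counts as a reply only when the sender differs from the previous sender, and length is measured in whitespace-separated words. The abstract fixes neither definition, and the function name and toy thread are hypothetical.

```python
from datetime import datetime


def reply_stats(thread):
    """Return (replier, delay in seconds, length in words) for each reply.

    `thread` is a chronologically ordered list of (sender, timestamp, body).
    """
    stats = []
    for prev, cur in zip(thread, thread[1:]):
        if cur[0] != prev[0]:  # sender changed, so treat it as a reply
            delay = (cur[1] - prev[1]).total_seconds()
            stats.append((cur[0], delay, len(cur[2].split())))
    return stats


# Hypothetical toy conversation between users "a" and "b".
thread = [
    ("a", datetime(2015, 1, 1, 9, 0), "Can you review this?"),
    ("b", datetime(2015, 1, 1, 9, 30), "Sure, will do"),
    ("b", datetime(2015, 1, 1, 9, 31), "Done"),
    ("a", datetime(2015, 1, 1, 11, 31), "Thanks"),
]
print(reply_stats(thread))
```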
“Computing” Requirements for Open Source Software: A Distributed Cognitive Approach
Most requirements engineering (RE) research has been conducted in the context of structured and agile software development. Software, however, is increasingly developed in open source software (OSS) forms which have several unique characteristics. In this study, we approach OSS RE as a sociotechnical, distributed cognitive process where distributed actors “compute” requirements—i.e., transform requirements-related knowledge into forms that foster a shared understanding of what the software is going to do and how it can be implemented. Such computation takes place through social sharing of knowledge and the use of heterogeneous artifacts. To illustrate the value of this approach, we conduct a case study of a popular OSS project, Rubinius—a runtime environment for the Ruby programming language—and identify ways in which cognitive workload associated with RE becomes distributed socially, structurally, and temporally across actors and artifacts. We generalize our observations into an analytic framework of OSS RE, which delineates three stages of requirements computation: excavation, instantiation, and testing-in-the-wild. We show how the distributed, dynamic, and heterogeneous computational structure underlying OSS development builds an effective mechanism for managing requirements. Our study contributes to sorely needed theorizing of appropriate RE processes within highly distributed environments as it identifies and articulates several novel mechanisms that undergird cognitive processes associated with distributed forms of RE
A biography of open source software: community participation and individuation of open source code in the context of microfinance NGOs in North Africa and the Middle East
For many, microfinance is about building inclusive financial systems to help the poor
gain direct access to financial services. Hundreds of grassroots have specialised in
the provision of microfinance services worldwide. Most of them are ad hoc
organisations, which suffer severe organisational and informational deficiencies.
Over the past decades, policy makers and consortia of microfinance experts have
attempted to improve their capacity building through ICTs. In particular, there is
strong emphasis on open source software (OSS) initiatives, as it is commonly
believed that MFIs are uniquely positioned to benefit from the advantages of
openness and free access. Furthermore, OSS approaches have recently become
extremely popular. The OSS gurus are convinced there is a business case for a purely
open source approach, especially across international development spheres.
Nonetheless, getting people to agree on what is meant by OSS remains hard to
achieve. On the one hand, scholarly software research shows a lack of consensus and
documents stories in which the OSS meaning is negotiated locally. On the other, the
growing literature on ICT-for-international development does not provide answers as
research, especially in the microfinance context, presents little empirical scrutiny.
This thesis therefore critically explores the OSS in the microfinance context in order
to understand its long-term development and what might be some of the implications
for MFIs.
Theoretically, I draw on the third wave of research within the field of Science and
Technology Studies: Studies of Expertise and Experience (SEE). I couple the
software ‘biography’ approach (Pollock and Williams 2009) with concepts from
Simondon’s thesis on the individuation of technical beings (1958) as an integrated
framework. I also design a single case study, which is supported by an extensive and
longitudinal collection of data and a three-stage approach, including the analysis of
sociograms, and email content. This case provides a rich empirical setting that
challenges the current understanding of the ontology of software and goes beyond
the instrumental views of design, building a comprehensive framework for
community participation and software sustainability in the context of the
microfinance global industry
Robust behavioral malware detection
Computer security attacks evolve to evade deployed defenses. Recent attacks have ranged from exploiting generic software vulnerabilities in memory-unsafe languages such as buffer overflows and format string vulnerabilities to exploiting logic errors in web applications, through means such as SQL injection and cross-site scripting. Furthermore, recent attacks have focused on escalating privileges
and stealing sensitive information by exploiting new hardware or operating system (OS) interfaces. Computer security attacks are also now relying on social engineering techniques to run malicious programs on victims' machines; instances of such abuse include phishing and watering hole attacks, both of which trick people into running malicious code or divulging confidential information. Thus, traditional computer security methods, such as OS confinement and program analysis, will not prevent new attacks that do not violate OS confinement or present illegal program behaviors.
Another challenge is that traditional security approaches have large trusted code bases (TCBs), which include hardware, OSs, and other software components that implement authentication and authorization logic across a distributed system. This is a vulnerable area because these components are complex and often contain vulnerabilities that undermine the overall system's integrity or confidentiality.
Evasive attacks on vulnerable systems -- especially in instances where trusted components turn malicious -- inspire the creation of defenses that can augment formally specified mechanisms against known threats. Specifically, this thesis advances the state of the art in behavioral malware detection -- detecting previously unknown malware in the very early stages of infection within an enterprise network.
Here we assess three fundamental insights of modern-day attacks and then describe a cross-layer defense against such attacks. First, we make a low-level machine state visible to behavioral analysis, significantly minimizing the TCB and its associated vulnerabilities. Specifically, our behavioral detector utilizes an executable code's dynamic properties, with architectural and micro-architectural states as input. Second, we evaluate behavioral detectors against adaptive adversaries. For this purpose, we introduce a new metric to determine a detector's robustness against malware modifications, which serves as a step toward explainability of machine learning-based malware detectors. Finally, we exploit the fact that attacks spread through only a limited number of vectors and propose new techniques to analyze the resulting dynamic correlations created among machines. These insights show that behavioral detectors can efficiently protect both individual devices and end hosts within enterprise networks. We present three types of such behavioral detectors.
Sherlock protects resource-constrained devices, such as mobile phones and Internet-of-things (IoT) devices, without modifying the software/hardware stack. Sherlock's supervised and unsupervised versions outperform prior work by 24.7% and 12.5% (area under the curve (AUC) metric), respectively, and detect stealthy malware that often evades static analysis tools.
The second behavioral detector, Shape-GD, protects devices within an enterprise network. It monitors devices on the network, aggregates data from weak local detectors, overlays that with network-level information, and then makes early, robust predictions regarding malicious activity. Shape-GD achieves its goals by exploiting latent attack semantics. Specifically, it analyzes communication patterns across multiple devices, partitioning them into neighborhoods. Devices within the same neighborhood are likely to be exposed to the same attack vector. Furthermore, we hypothesize that the conditional distribution of false positives is different from that of true positives; i.e., given a neighborhood of nodes, we can compute the aggregate distributional shape of alert feature vectors from the neighborhood itself and provide robust labels.
We evaluate Shape-GD by emulating a large community of Windows systems using the system call traces from a few thousand malicious and benign applications; we simulate both a phishing attack in a corporate email network as well as a watering hole attack through a popular website. In both scenarios, Shape-GD identifies malware early on (~100 infected nodes in a ~100K-node system for watering hole attacks, and ~10 of ~1,000 for phishing attacks) and robustly (with ~100% global true-positive and ~1% global false-positive rates).
The third behavioral detector, Centurion, detects malware across machines monitored by an anti-virus company. It is able to analyze behavior from 5 million Symantec client machines in real time and discovers malware by correlating file downloads across multiple machines. Compared with a recent local detector that analyzes metadata from file downloads, Centurion reduced the number of false positives from ~1M to ~110K and increased the true-positive rate by a factor of ~2.5. In addition, on average, Centurion detects malware 345 days earlier than commercial anti-virus products.
Electrical and Computer Engineering