Welcome to the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) submissions site. Machine learning (ML) models trained on personal data have been shown to leak information about users. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. Zeph executes privacy-adhering data transformations in real-time and scales to thousands of data sources, allowing it to support large-scale low-latency data stream analytics. The copyback-aware block allocation considers different copy costs at different copy paths within the SSD. In experiments with real DL jobs and with trace-driven simulations, Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers, even when they are provided with ideal resource and training configurations for every job. Novel system designs, thorough empirical work, well-motivated theoretical results, and new application areas are all . Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced. OSDI brings together professionals from academic and industrial backgrounds in what has become a premier forum for discussing the design, implementation, and implications of systems software. These are hard deadlines, and no extensions will be given. Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments. They collectively make the backup fresh, columnar, and fault-tolerant, even facing millions of concurrent transactions per second. We develop a prototype of Zeph on Apache Kafka to demonstrate that Zeph can perform large-scale privacy transformations with low overhead. We implemented the ZNS+ SSD at an SSD emulator and a real SSD. Although the number of submissions is lower than the past, it's likely only due to the late announcement; being in my first OSDI PC, I think the quality of the submitted and accepted papers remains as high as ever. Collaboration: You have a collaboration on a project, publication, grant proposal, program co-chairship, or editorship within the past two years (December 2018 through March 2021). The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. However, a plethora of recent data breaches show that even widely trusted service providers can be compromised. Thanks to selective profiling, DMons profiling overhead is 1.36% on average, making it feasible for production use. Papers must be in PDF format and must be submitted via the submission form. Main conference program: 5-8 April 2022. We present the results of a 1% experiment at fleet scale as well as the longitudinal rollout in Googles warehouse scale computers. Our evaluation shows that, compared to existing participant selection mechanisms, Oort improves time-to-accuracy performance by 1.2X-14.1X and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer-specified model testing criteria at the scale of millions of clients. Poor data locality hurts an application's performance. We present DistAI, a data-driven automated system for learning inductive invariants for distributed protocols. We develop MAGE, an execution engine for SC that efficiently runs SC computations that do not fit in memory. DMons targeted optimizations provide 16.83% speedup on average (up to 53.14%), compared to a baseline that uses the highest level of compiler optimization. A graph neural network (GNN) enables deep learning on structured graph data. 64 papers accepted out of 341 submitted. However, existing enclave designs fail to meet the requirements of scalability demanded by new scenarios like serverless computing, mainly due to the limitations in their secure memory protection mechanisms, including static allocation, restricted capacity and high-cost initialization. Editor in charge: Daniel Petrolia . Upon these two primitives, our system can scale to thousands of concurrent enclaves with high resource utilization and eliminate the high-cost initialization of secure memory using fork-style enclave creation without weakening the security guarantees. We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into NUMA-aware counterparts. We implement DeSearch for two existing decentralized services that handle over 80 million records and 240 GBs of data, and show that DeSearch can scale horizontally with the number of workers and can process 128 million search queries per day. We prove that DistAI is guaranteed to find the -free inductive invariant that proves the desired safety properties in finite time, if one exists. Session Chairs: Sebastian Angel, University of Pennsylvania, and Malte Schwarzkopf, Brown University, Ishtiyaque Ahmad, Yuntian Yang, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta, University of California Santa Barbara. While compiler-based techniques have been proposed to improve data locality, they depend on heuristics, which can sometimes hurt performance. Simultaneous submission of the same work to multiple venues, submission of previously published work, or plagiarism constitutes dishonesty or fraud. The hybrid segment recycling chooses a proper block reclaiming policy between segment compaction and threaded logging based on their costs. Under different configurations of TPC-C and TPC-E, Polyjuice can achieve throughput numbers higher than the best of existing algorithms by 15% to 56%. We also show that Marius can scale training to datasets an order of magnitude beyond a single machine's GPU and CPU memory capacity, enabling training of configurations with more than a billion edges and 550 GB of total parameters on a single machine with 16 GB of GPU memory and 64 GB of CPU memory. Just using Lambdas on top of CPU servers offers up to 2.75 more performance-per-dollar than training only with CPU servers. Additionally, there is no assurance that data processing and handling comply with the claimed privacy policies. Owing to the sequential write-only zone scheme of the ZNS, the log-structured file system (LFS) is required to access ZNS solid-state drives (SSDs). USENIX NSDI, 2021 Acceptance Rate: 15.99% Fluid: Resource-Aware Hyperparameter Tuning Engine P. Yu*, J. Liu*, M. Chowdhury (*Equal contribution) MLSys, 2021 Acceptance Rate: 23.53% NetLock: Fast, Centralized Lock Management Using Programmable Switches Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, X. Jin ACM SIGCOMM, 2020 Acceptance Rate: 21.6% Here, we focus on hugepage coverage. See the Preview Session page for an overview of the topics covered in the program. This paper presents Dorylus: a distributed system for training GNNs. Qing Wang, Youyou Lu, Junru Li, and Jiwu Shu, Tsinghua University. In particular, responses must not include new experiments or data, describe additional work completed since submission, or promise additional work to follow. In contrast, CLP achieves significantly higher compression ratio than all commonly used compressors, yet delivers fast search performance that is comparable or even better than Elasticsearch and Splunk Enterprise. Starting with small invariant formulas and strongest possible invariants avoids large SMT queries, improving SMT solver performance. GoJournal is implemented in Go, and Perennial is implemented in the Coq proof assistant. Authors must make a good faith effort to anonymize their submissions, and they should not identify themselves or their institutions either explicitly or by implication (e.g., through the references or acknowledgments). These limitations require state-of-the-art systems to distribute training across multiple machines. If the conference registration fee will pose a hardship for the presenter of the accepted paper, please contact conference@usenix.org. As has been standard practice in OSDI and SOSP in recent years, we will allow authors to submit quick responses to PC reviews: they will be made available to the PC before the final online discussion and PC meeting. In this paper, we present Vegito, a distributed in-memory HTAP system that embraces freshness and performance with the following three techniques: (1) a lightweight gossip-style scheme to apply logs on backups consistently; (2) a block-based design for multi-version columnar backups; (3) a two-phase concurrent updating mechanism for the tree-based index of backups. We present application studies for 8 applications, improving requests-per-second (RPS) by 7.7% and reducing RAM usage 2.4%. All the times listed below are in Pacific Daylight Time (PDT). Professor Veloso earned a Bachelor and Master of Science degrees in Electrical and Computer Engineering from Instituto Superior Tecnico in Lisbon, Portugal, a Master of Arts in Computer Science from Boston University, and Master of Science and PhD in Computer Science from Carnegie Mellon University. Graph Neural Networks (GNNs) have gained significant attention in the recent past, and become one of the fastest growing subareas in deep learning. Our evaluation shows that PET outperforms existing systems by up to 2.5, by unlocking previously missed opportunities from partially equivalent transformations. As a member of ACCT, I have served two years on the bylaws and governance committee and two years on the finance and audit committee. Fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. OSDI '21 Technical Sessions All the times listed below are in Pacific Daylight Time (PDT). This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. Sponsored by USENIX in cooperation with ACM SIGOPS. DMon speeds up PostgreSQL, one of the most popular database systems, by 6.64% on average (up to 17.48%). Finding the inductive invariant of the distributed protocol is a critical step in verifying the correctness of distributed systems, but takes a long time to do even for simple protocols. Jason Mohoney and Roger Waleffe, University of WisconsinMadison; Henry Xu, University of Maryland, College Park; Theodoros Rekatsinas and Shivaram Venkataraman, University of WisconsinMadison. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Hence, CLP enables efficient search and analytics on archived logs, something that was impossible without it. Shaghayegh Mardani, UCLA; Ayush Goel, University of Michigan; Ronny Ko, Harvard University; Harsha V. Madhyastha, University of Michigan; Ravi Netravali, Princeton University. With an aim to improve time-to-accuracy performance in model training, Oort prioritizes the use of those clients who have both data that offers the greatest utility in improving model accuracy and the capability to run training quickly. Sanitizers detect unsafe actions such as invalid memory accesses by inserting checks that are validated during a programs execution. Horcrux-compliant web servers perform offline analysis of all the JavaScript code on any frame they serve to conservatively identify, for every JavaScript function, the union of the page state that the function could access across all loads of that page. Session Chairs: Deniz Altinbken, Google, and Rashmi Vinayak, Carnegie Mellon University, Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan. Submitted papers must be no longer than 12 single-spaced 8.5 x 11 pages, including figures and tables, plus as many pages as needed for references, using 10-point type on 12-point (single-spaced) leading, two-column format, Times Roman or a similar font, within a text block 7 wide x 9 deep. Welcome to the 2021 USENIX Annual Technical Conference (ATC '21) submissions site! Professor Veloso has been recognized with a multiple honors, including being a Fellow of the ACM, IEEE, AAAS, and AAAI. Session Chairs: Ryan Huang, Johns Hopkins University, and Manos Kapritsos, University of Michigan, Jianan Yao, Runzhou Tao, Ronghui Gu, Jason Nieh, Suman Jana, and Gabriel Ryan, Columbia University. This post is for recording some notes from a few OSDI'21 papers that I got fun. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. All deadline times are 23:59 hrs UTC. If you are uncertain about how to anonymize your submission, please contact the program co-chairs, osdi21chairs@usenix.org, well in advance of the submission deadline. However, Addra improves message latency in this architecture, which is a key performance metric for voice calls. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. In this paper, we propose a software-hardware co-design to support dynamic, fine-grained, large-scale secure memory as well as fast-initialization. Last year, 70% of accepted OSDI papers participated in the . This year, there were only 2 accepted papers from UK institutes. USENIX Security '21 has three submission deadlines. Commonly used log archival and compression tools like Gzip provide high compression ratio, yet searching archived logs is a slow and painful process as it first requires decompressing the logs. Marius is open-sourced at www.marius-project.org. While several new GNN architectures have been proposed, the scale of real-world graphsin many cases billions of nodes and edgesposes challenges during model training. Camera-ready submission (all accepted papers): 2 April 2021; Main conference program: 27-28 April 2021; All deadline times are . Mothy's current research centers on Enzian, a powerful hybrid CPU/FPGA machine designed for research into systems software. The 20th ACM Workshop on Hot Topics in Networks (HotNets 2021) will bring together researchers in computer networks and systems to engage in a lively debate on the theory and practice of computer networking. Session Chairs: Moshe Gabel, University of Toronto, and Joseph Gonzalez, University of California, Berkeley, John Thorpe, Yifan Qiao, Jonathan Eyolfson, and Shen Teng, UCLA; Guanzhou Hu, UCLA and University of Wisconsin, Madison; Zhihao Jia, CMU; Jinliang Wei, Google Brain; Keval Vora, Simon Fraser; Ravi Netravali, Princeton University; Miryung Kim and Guoqing Harry Xu, UCLA. Instead of choosing among a small number of known algorithms, our approach searches in a "policy space" of fine-grained actions, resulting in novel algorithms that can outperform existing algorithms by specializing to a given workload. Prior or concurrent workshop publication does not preclude publishing a related paper in OSDI. However, with the increasingly speedy transactions and queries thanks to large memory and fast interconnect, commodity HTAP systems have to make a tradeoff between data freshness and performance degradation. Unfortunately, because devices lack the semantic information about which I/O requests are latency-sensitive, these heuristics can sometimes lead to disastrous results. Manuela will present examples and discuss the scope of AI in her research in the finance domain. DeSearch uses trusted hardware to build a network of workers that execute a pipeline of small search engine tasks (crawl, index, aggregate, rank, query). J.P. Morgan AI Research partners with applied data analytics teams across the firm as well as with leading academic institutions globally. There is no explicit limit to the response, but authors are strongly encouraged to keep it under 500 words; reviewers are neither required nor expected to read excessively long responses. When uploading your OSDI 2021 reviews for your submission to SOSP, you can optionally append a note about how you addressed the reviews and comments. The symposium emphasizes innovative research as well as quantified or insightful experiences in systems design and implementation. Distributed Trust: Is Blockchain the answer? The ZNS+ also allows each zone to be overwritten with sparse sequential write requests, which enables the LFS to use threaded logging-based block reclamation instead of segment compaction. Session Chairs: Gennady Pekhimenko, University of Toronto / Vector Institute, and Shivaram Venkataraman, University of WisconsinMadison, Aurick Qiao, Petuum, Inc. and Carnegie Mellon University; Sang Keun Choe and Suhas Jayaram Subramanya, Carnegie Mellon University; Willie Neiswanger, Petuum, Inc. and Carnegie Mellon University; Qirong Ho, Petuum, Inc.; Hao Zhang, Petuum, Inc. and UC Berkeley; Gregory R. Ganger, Carnegie Mellon University; Eric P. Xing, MBZUAI, Petuum, Inc., and Carnegie Mellon University. This yielded 6% fewer TLB miss stalls, and 26% reduction in memory wasted due to fragmentation. Submissions violating the detailed formatting and anonymization rules will not be considered for review. Kernel code requires manual memory management and type-unsafe code and must efficiently handle complex, asynchronous events. Consensus bugs are bugs that make Ethereum clients transition to incorrect blockchain states and fail to reach consensus with other clients. Today, privacy controls are enforced by data curators with full access to data in the clear. For conference information, . Some recent schedulers choose job resources for users, but do so without awareness of how DL training can be re-optimized to better utilize the provided resources. These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. blk-switch uses this insight to adapt techniques from the computer networking literature (e.g., multiple egress queues, prioritized processing of individual requests, load balancing, and switch scheduling) to the Linux kernel storage stack. The key insight guiding our design is computation separation. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. Distributed systems are notoriously hard to implement correctly due to non-determinism. We also propose two file system techniques for ZNS+-aware LFS. Zeph enforces privacy policies cryptographically and ensures that data available to third-party applications complies with users' privacy policies. Existing decentralized systems like Steemit, OpenBazaar, and the growing number of blockchain apps provide alternatives to existing services. Conference site 49 papers accepted out of 251 submitted. With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. The blockchain community considers this hard fork the greatest challenge since the infamous 2016 DAO hack. Session Chairs: Nadav Amit, VMware Research Group, and Ada Gavrilovska, Georgia Institute of Technology, Stephen Ibanez, Alex Mallery, Serhat Arslan, and Theo Jepsen, Stanford University; Muhammad Shahbaz, Purdue University; Changhoon Kim and Nick McKeown, Stanford University. The NVMe zoned namespace (ZNS) is emerging as a new storage interface, where the logical address space is divided into fixed-sized zones, and each zone must be written sequentially for flash-memory-friendly access. For instance, the following are not sufficient grounds to specify a conflict with a PC member: they have reviewed the work before, they are employed by your competitor, they are your personal friend, they were your post-doc advisor or advisee, or they had the same advisor as you. To adapt to different workloads, prior works mix or switch between a few known algorithms using manual insights or simple heuristics. The co-chairs may then share that paper with the workshops organizers and discuss it with them. You must not improperly identify a PC member as a conflict if none of these three circumstances applies, even if for some other reason you want to avoid them reviewing your paper. How can we design systems that will be reliable despite misbehaving participants? PC members are not required to read supplementary material when reviewing the paper, so each paper should stand alone without it. Authors should email the program co-chairs, osdi21chairs@usenix.org, a copy of the related workshop paper and a short explanation of the new material in the conference paper beyond that published in the workshop version. To this end, we propose GNNAdvisor, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. Our evaluation shows that DistAI successfully verifies 13 common distributed protocols automatically and outperforms alternative methods both in the number of protocols it verifies and the speed at which it does so, in some cases by more than two orders of magnitude. By monitoring the status of each job during training, Pollux models how their goodput (a novel metric we introduce that combines system throughput with statistical efficiency) would change by adding or removing resources. In particular, I'll argue for re-engaging with what computer hardware really is today and give two suggestions (among many) about how the OS research community can usefully do this, and exploit what is actually a tremendous opportunity. Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Academic and industrial participants present research and experience papers that cover the full range of theory and practice of computer . OSDI will provide an opportunity for authors to respond to reviews prior to final consideration of the papers at the program committee meeting. For example, optimistic concurrency control (OCC) is better than two-phase-locking (2PL) under low contention, while the converse is true under high contention. Using this property, MAGE calculates the memory access pattern ahead of time and uses it to produce a memory management plan. In this talk, I'll speculate on how we came to this unfortunate state of affairs, and what might be done to fix it. Evaluation on a four-node machine with Optane DC Persistent Memory shows that Nap can improve the throughput by up to 2.3 and 1.56 under write-intensive and read-intensive workloads, respectively. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. The wire-to-wire RPC response time through the nanoPU is just 69ns, an order of magnitude quicker than the best-of-breed, low latency, commercial NICs. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. Authors are also encouraged to contact the program co-chairs, osdi21chairs@usenix.org, if needed to relate their OSDI submissions to relevant submissions of their own that are simultaneously under review or awaiting publication at other venues. Please identify yourself as a presenter and include your mailing address in your email. The chairs may reject abstracts or papers on the basis of egregious missing or extraneous conflicts. This formulation of memory management, which we call memory programming, is a generalization of paging that allows MAGE to provide a highly efficient virtual memory abstraction for SC. The 15th USENIX Symposium on Operating Systems Design and Implementation seeks to present innovative, exciting research in computer systems. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem. When further combined with a simple caching strategy, our evaluation shows that P3 is able to outperform existing state-of-the-art distributed GNN frameworks by up to 7. After three years working on web-based collaboration systems at a startup in North Carolina, he joined Sprint's Advanced Technology Lab in Burlingame, California, in 1998, working on cloud computing and network monitoring. Existing frameworks optimize tensor programs by applying fully equivalent transformations, which maintain equivalence on every element of output tensors. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users. Researchers from the Software Systems Laboratory bagged Best Paper Awards at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2021) and the 2021 USENIX Annual Technical Conference (USENIX ATC 2021).. Jay Lepreau Best Paper Award, OSDI'21. While verifying GoJournal, we found one serious concurrency bug, even though GoJournal has many unit tests. Mothy joined the Computer Science Department ETH Zurich in January 2007 and was named Fellow of the ACM in 2013 for contributions to operating systems and networking research. Sam Kumar, David E. Culler, and Raluca Ada Popa, University of California, Berkeley. This paper presents the design and implementation of CLP, a tool capable of losslessly compressing unstructured text logs while enabling fast searches directly on the compressed data. Notification of conditional accept/reject for revisions: 3 March 2022. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. This paper demonstrates that it is possible to achieve s-scale latency using Linux kernel storage stack, even when tens of latency-sensitive applications compete for host resources with throughput-bound applications that perform read/write operations at throughput close to hardware capacity. Submitted November 12, 2021 Accepted January 20, 2022. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. The chairs will review paper conflicts to ensure the integrity of the reviewing process, adding or removing conflicts if necessary. Existing algorithms are designed to work well for certain workloads. Reviews will be available for response on Wednesday, March 3, 2021. Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. (Oct 2018) Awarded an Intel Faculty Grant for Research on automated performance optimization (Sep. 2018) Our paper on Foreshadow is accepted to appear at USENIX Security. Performance experiments show that GoNFS provides similar performance (e.g., at least 90% throughput across several benchmarks on an NVMe disk) to Linuxs NFS server exporting an ext4 file system, suggesting that GoJournal is a competitive journaling system. If your paper is accepted and you need an invitation letter to apply for a visa to attend the conference, please contact conference@usenix.org as soon as possible. As a result, the design of a file system with respect to space management and crash consistency is simplified, requiring only 10.8K LOC for full functionality. Mingyu Li, Jinhao Zhu, and Tianxu Zhang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Cheng Tan, Northeastern University; Yubin Xia, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Sebastian Angel, University of Pennsylvania; Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. Paper Submission Information All submissions must be received by 11:59 PM AoE (UTC-12) on the day of the corresponding deadline. She has a PhD in computer science from MIT. Web pages today commonly include large amounts of JavaScript code in order to offer users a dynamic experience. Fortunately, we observe that the backups for high availability in modern distributed OLTP systems can be retrofitted to bridge the analytical queries and transactions in HTAP workloads.