in-toto: Providing farm-to-table guarantees for bits and bytes
Santiago Torres-Arias, New York University; Hammad Afzali, New Jersey Institute of Technology; Trishank Karthik Kuppusamy, Datadog; Reza Curtmola, New Jersey Institute of Technology; Justin Cappos, New York University
security19/presentation/torres-arias
This paper is included in the Proceedings of the 28th USENIX Security Symposium.
August 14–16, 2019, Santa Clara, CA, USA
ISBN 978-1-939133-06-9
Open access to the Proceedings of the 28th USENIX Security Symposium is sponsored by USENIX.

in-toto: Providing farm-to-table guarantees for bits and bytes

Santiago Torres-Arias†, Hammad Afzali‡, Trishank Karthik Kuppusamy (Datadog), Reza Curtmola‡, Justin Cappos†
† New York University, Tandon School of Engineering
‡ Department of Computer Science, New Jersey Institute of Technology

Abstract

The software development process is quite complex and involves a number of independent actors. Developers check source code into a version control system, the code is compiled into software at a build farm, and CI/CD systems run multiple tests to ensure the software’s quality among a myriad of other operations. Finally, the software is packaged for distribution into a delivered product, to be consumed by end users. An attacker that is able to compromise any single step in the process can maliciously modify the software and harm any of the software’s users.

To address these issues, we designed in-toto, a framework that cryptographically ensures the integrity of the software supply chain. in-toto grants the end user the ability to verify the software’s supply chain from the project’s inception to its deployment. We demonstrate in-toto’s effectiveness on 30 software supply chain compromises that affected hundreds of millions of users and showcase in-toto’s usage over cloud-native, hybrid-cloud and cloud-agnostic applications. in-toto is integrated into products and open source projects that are used by millions of people daily.

The project website is available at:

1 Introduction

Software is built through a complex series of steps called a software supply chain. These steps are performed as the software is written, tested, built, packaged, localized, obfuscated, optimized, and distributed.
In a typical software supply chain, these steps are “chained” together to transform (e.g., compilation) or verify the state (e.g., the code quality) of the project in order to drive it into a delivered product, i.e., the finished software that will be installed on a device. Usually, the software supply chain starts with the inclusion of code and other assets (icons, documentation, etc.) in a version control system. The software supply chain ends with the creation, testing and distribution of a delivered product.

Securing the supply chain is crucial to the overall security of a software product. An attacker who is able to control any step in this chain may be able to modify its output for malicious reasons that can range from introducing backdoors in the source code to including vulnerable libraries in the delivered product. Hence, attacks on the software supply chain are an impactful mechanism for an attacker to affect many users at once. Moreover, attacks against steps of the software supply chain are difficult to identify, as they misuse processes that are normally trusted.

Unfortunately, such attacks are common occurrences, have high impact, and have experienced a spike in recent years [60, 129]. Attackers have been able to infiltrate version control systems, including getting commit access to the Linux kernel [58] and Gentoo Linux [76], stealing Google’s search engine code [22], and putting a backdoor in Juniper routers [48, 96]. Popular build systems, such as Fedora, have been breached when attackers were able to sign backdoored versions of security packages on two different occasions [75, 123]. In another prominent example, attackers infiltrated the build environment of the free computer-cleanup tool CCleaner, and inserted a backdoor into a build that was downloaded over 2 million times [126].
Furthermore, attackers have used software updaters to launch attacks, with Microsoft [108], Adobe [95], Google [50, 74, 140], and Linux distributions [46, 143] all showing significant vulnerabilities. Perhaps most troubling are several attacks in which nation states have used software supply chain compromises to target their own citizens and political enemies [35, 55, 82, 92, 93, 108, 127, 128, 138]. There are dozens of other publicly disclosed instances of such attacks.

Currently, supply chain security strategies are limited to securing each individual step within the chain. For example, Git commit signing controls which developers can modify a repository [78], reproducible builds enable multiple parties to build software from source and verify they received the same result [25], and there are a myriad of security systems that protect software delivery [2, 20, 28, 100, 102]. These building blocks help to secure an individual step in the process.

Although the security of each individual step is critical, such efforts can be undone if attackers can modify the output of a step before it is fed to the next one in the chain [22, 47]. These piecemeal measures by themselves cannot stop malicious actors because there is no mechanism to verify that: 1) the correct steps were followed and 2) tampering did not occur in between steps. For example, a web server compromise was enough to allow hackers to redirect user downloads to a modified Linux Mint disk image, even though every single package in the image was signed and the image checksums on the site did not match. Though this was a trivial compromise, it allowed attackers to build a hundred-host botnet in a couple of hours [146] due to the lack of verification on the tampered image.

In this paper we introduce in-toto, Latin for “as a whole,” the first framework that holistically enforces the integrity of a software supply chain by gathering cryptographically verifiable information about the chain itself.
To achieve this, in-toto requires a project owner to declare and sign a layout of how the supply chain’s steps need to be carried out, and by whom. When these steps are performed, the involved parties will record their actions and create a cryptographically signed statement, called link metadata, for the step they performed. The link metadata recorded from each step can be verified to ensure that all steps were carried out appropriately and by the correct party in the manner specified by the layout.

The layout and collection of link metadata tightly connect the inputs and outputs of the steps in such a chain, which ensures that tampering cannot occur between steps. The layout file also defines requirements (e.g., Twistlock [30] must not indicate that any included libraries have high-severity CVEs) that will be enforced to ensure the quality of the end product. These additions can take the form of either distinct commands that must be executed, or limitations on which files can be altered during that step (e.g., a step that localizes the software’s documentation for Mexican Spanish must not alter the source code). Collectively, these requirements can minimize the impact of a malicious actor, drastically limiting the scope and range of actions such an attacker can perform, even if steps in the chain are compromised.

We have built a series of production-ready implementations of in-toto that have now been integrated across several vendors. This includes integration into cloud vendors such as Datadog and Control Plane, to protect more than 8,000 cloud deployments. Outside of the cloud, in-toto is used in Debian to verify packages were not tampered with as part of the reproducible builds project [25]. These deployments have helped us to refine and validate the flexibility and effectiveness of in-toto.

Finally, as shown by our security analysis of three in-toto deployments, in-toto is not a “lose-one, lose-all” solution, in that its security properties only partially degrade with a key compromise.
Depending on which key the attacker has accessed, in-toto’s security properties will vary. Our in-toto deployments could be used to address most (between 83% and 100%) historical supply chain attacks.

2 Definitions and Threat Model

This section defines the terms we use to discuss the software supply chain and details the specific threat model in-toto was designed to defend against.

2.1 Definitions

The software supply chain refers to the series of steps performed in order to create and distribute a delivered product. A step is an operation within this chain that takes in materials (e.g., source code, icons, documentation, binaries, etc.) and creates one or more products (e.g., libraries, software packages, file system images, installers, etc.). We refer to both materials and products generically as artifacts.

It is common to have the products of one step be used as materials in another step, but this does not mean that a supply chain is a sequential series of operations in practice. Depending on the specifics of a supply chain’s workflow, steps may be executed in sequence, in parallel, or as a combination of both. Furthermore, steps may be carried out by any number of hosts, and many hosts can perform the same step (e.g., to test a step’s reproducibility).

In addition to the materials and products, a step in the supply chain produces another key piece of information, byproducts. The step’s byproducts are things like the STDOUT, STDERR, and return value that indicate whether a step was successful or had any problems. For example, a step that runs unit tests may return a non-zero code if one of the unit tests fails. Validating byproducts is key to ensuring that steps of the supply chain indicate that the software is ready to use.

As each step executes, link metadata is generated that describes what occurred. This contains the materials, products, and byproducts for the step.
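
To make the definition concrete, the following is a minimal Python sketch of how the record for a single step could be assembled. It is an illustration only, not in-toto's implementation or metadata format: the function names (`hash_artifacts`, `record_step`), the dictionary keys, and the choice of SHA-256 are hypothetical, and the record is left unsigned.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def hash_artifacts(root: Path) -> dict:
    """Map every file under `root` to the SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def record_step(name: str, command: list, workdir: Path) -> dict:
    """Run `command` in `workdir` and describe what happened (unsigned)."""
    materials = hash_artifacts(workdir)  # artifacts present before the step runs
    proc = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    products = hash_artifacts(workdir)   # artifacts present after the step runs
    return {
        "name": name,
        "materials": materials,
        "products": products,
        "byproducts": {
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "return_value": proc.returncode,
        },
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "foo.c").write_text("int main(void) { return 0; }\n")
        # Hypothetical stand-in for a real build command: it only writes a file.
        fake_build = [sys.executable, "-c", "open('foo.bin', 'w').write('binary')"]
        link = record_step("build", fake_build, workdir)
        print(sorted(link["materials"]))
        print(sorted(link["products"]))
        print(link["byproducts"]["return_value"])
```

In this sketch, materials and products are simply snapshots of the working directory before and after the command, so a file that the step leaves untouched appears in both with the same digest. A verifier holding such records for two steps could compare the digest of a product of one step with the digest of the corresponding material of the next.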
The link metadata is signed with a key belonging to the party that performs the action, which we call a functionary. Regardless of whether the functionary commits code, builds software, performs QA, localizes documentation, etc., the same link metadata structure is followed. Sometimes a functionary’s participation involves repeated human action, such as a developer making a signed git commit for their latest code changes. In other cases, a functionary may participate in the supply chain in a nearly autonomous manner after setup, such as a CI/CD system. Further, many functionaries can be tasked to perform the same step for the sake of redundancy, and a minimum threshold of them may be required to agree on the result of a step they all carried out.

To tie all of the pieces together, the project owner sets up the rules for the steps that should be performed in a software supply chain. In essence, the project owner serves as the foundation of trust, stating which steps should be performed by which functionaries, along with specifying rules for products, byproducts, and materials in a file called the layout. The layout enables a client that retrieves the software to cryptographically validate that all actions were performed correctly. In order to make this validation possible, a client is given the delivered product, which contains the software, layout, and link metadata. The layout also contains any additional actions, besides the standard verification of the artifact rules, to be performed by the client. These actions, called inspections, are used to validate software by further performing operations on the artifacts inside the delivered product (e.g., verifying no extraneous files are inside a zip file). This way, through standard verification and inspections, a client can ensure that the software went through the appropriate software supply chain processes.

2.2 Threat Model

The goal of in-toto is to minimize the impact of a party that attempts to tamper with the software supply chain.
More specifically, the goal is to retain the maximum amount of security that is practical, in any of the following scenarios:

• Interpose between two existing elements of the supply chain to change the input of a step. For example, an attacker may ask a hardware security module to sign a malicious copy of a package before it is added to the repository and signed repository metadata is created to index it [27, 44, 51, 76, 107, 120, 147].

• Act as a step (e.g., compilation), perhaps by compromising or coercing the party that usually performs that step [27, 57, 62, 64, 76, 81, 99, 112, 125]. For example, a hacked compiler could insert malicious code into binaries it produces [126, 136].
• Provide a delivered product for which not all steps have been performed. Note that this can also be a result of an honest mistake [37, 49, 56, 68, 73, 97, 142].
• Include outdated or vulnerable elements in the supply chain [59, 61, 91, 94, 117]. For example, an attacker could bundle an outdated compression library that has many known exploits.
• Provide a counterfeit version of the delivered product to users [8, 35, 66, 70, 71, 95, 118, 134, 135, 146]. This software product can come from any source and be signed by any keys. While in-toto will not mandate how trust is bootstrapped, Section 6 will show how other protocols such as TUF [28], as well as popular package managers [2], can be used to bootstrap project owner keys.

Key Compromise. We assume that the public keys of project owners are known to the verifiers and that the attacker is not able to compromise the corresponding secret keys. In addition, the public keys of developers, CI systems, and other infrastructure are known to the project owner, and their corresponding secret keys are not known to the attacker. In Section 5.2, we explore additional threat models that result from different degrees of attacker access to the supply chain, including access to infrastructure and keys (both online and offline).

2.3 Security Goals

To build a secure software supply chain that can combat the aforementioned threats, we envision that the following security goals would need to be achieved:

• supply chain layout integrity: All of the steps defined in a supply chain are performed in the specified order. This means that no steps can be added or removed, and no steps can be reordered.
• artifact flow integrity: All of the artifacts created, transformed, and used by steps must not be altered in between steps.
This means that if step A creates a file foo.txt and step B uses it as a material, step B must use the exact file foo.txt created by step A. It must not use, for example, an earlier version of the file created in a prior run.
• step authentication: Steps can only be performed by the intended parties. No party can perform a step unless it is given explicit permission to do so. Further, no delivered products can be released unless all steps have been performed by the right party (e.g., no releases can be made without a signoff by a release engineer, which would stop accidental development releases [68]).
• implementation transparency: in-toto should not require existing supply chains to change their practices in order to secure them. However, in-toto can be used to represent the existing supply chain configuration and reason about its security practices.
• graceful degradation of security properties: in-toto should not lose all security properties in the event of key compromise. That is, even if certain supply chain steps are compromised, the security of the system is not completely undermined.

In addition to these security goals, in-toto is also geared towards practicality and, as such, it should maintain minimal operational, storage and network overheads.

3 System overview

The current landscape of software supply chain security is focused on point-solutions that ensure that an individual step’s actions have not been tampered with. This limitation usually leads to attackers compromising a weaker step in the chain (e.g., breaking into a buildfarm [115]), removing steps from the chain [68] or tampering with artifacts while in transit (i.e., adding steps to the chain [66]). As such, we identify two fundamental limitations of current approaches to secure the software supply chain:

1. Point solutions designed to secure individual supply chain steps cannot guarantee the security of the entire chain as a whole.
2. Despite the widespread use of unit testing tools and analysis tools, like fuzzers and static analyzers, software rarely (if ever) includes information about what tools were run or their results. So point solutions, even if used, provide limited protection because information about these tools is not appropriately utilized or even shown to clients who can make decisions about the state of the product they are about to utilize.

We designed in-toto to address these limitations by ensuring that all individual measures are applied, and by the right party, in a cryptographically verifiable fashion.

In concrete terms, in-toto is a framework to gather and verify metadata about different stages of the supply chain, from the first step (e.g., checking-in code on a version control system) to delivered product (e.g., a .deb installable package). If used within a software supply chain, in-toto ensures that the aforementioned security goals are achieved.

3.1 in-toto parties and their roles

Similar to other modern security systems [101, 102, 121], in-toto uses security concepts like delegations and roles to limit the scope of key compromise and provide a graceful degradation of its security properties.

In the context of in-toto, a role is a set of duties and actions that an actor must perform. The use of delegations and roles not only provides an important security function (limiting the impact of compromise and providing separation of privilege), but it also helps the system remain flexible and usable so that behaviors like key sharing are not needed. Given that every project uses a very specific set of tools and practices, flexibility is a necessary requirement for in-toto. There are three roles in the framework:

• Project Owner: The project owner is the party in charge of defining the software supply chain layout (i.e., defining which steps must be performed and by whom). In practice, this would be the maintainer of an open-source project or the dev-ops engineers of a project.
• Functionaries: Functionaries are the parties that perform the steps within the supply chain, and provide an authenticated record of the artifacts used as materials and the resulting products. Functionaries can be humans carrying out a step (e.g., signing off a security audit) or an automated system (e.g., a build farm).
• Client (e.g., end user): The client is the party that will inspect and afterwards utilize a delivered product.

Figure 1: Graphical depiction of the software supply chain with in-toto elements added. The project owner creates a layout with three steps, each of which will be performed by a functionary. Notice how the tag step creates foo.c and a localization file foo.po, which are fed to different steps down the chain.

We will now elaborate on how these three parties interact with the components of in-toto.

3.2 in-toto components

in-toto secures the software supply chain by using three different types of information: the software supply chain layout (or layout, for short), link metadata, and the delivered product. Each of these has a unique function within in-toto.

3.2.1 The supply chain layout

Laying out the structure of the supply chain allows the developers and maintainers of a project to define requirements for steps involved in source code writing, testing, and distribution within a software product’s lifecycle. In the abstract sense, this supply chain layout is a recipe that identifies which steps will be performed, by whom, and in what order.

The supply chain layou