A New Way to Build and Collaborate

You may already be skeptical: “Why do we need yet another competing solution for building our code?” The concern that this would further fragment our community is valid. I have spent the majority of my career as a software engineer advocating against reinventing what can instead be borrowed and extended from others, which has made presenting my ideas for this project a challenge. At first glance, proposing a new build system seems to directly contradict that advice. However, I believe we are at a pivotal point in the history of C++ that makes this the perfect time to transition to a new system. I think you will find that the ideas presented here have the potential to resolve many of the issues plaguing our community today, with minimal impact on existing code.

With the release of C++20 this year, we will finally be getting our hands on the long-awaited (and controversial) Modules support. This feature will allow C++ builds to finally have a clean binary separation between individual projects. This, in turn, will open the door to fixing many of the problems present in building and sharing C++ code today. At the same time, migrating our code to support Modules will require a substantial amount of work that will break backward compatibility with legacy code bases. This is the ideal time to consider a major shift in the tooling we use as a community.

In the remaining sections of this document, I outline the key issues present in building and sharing code today. Then I present a design for a new build system that leverages Modules at its core to create a new way of collaborating within the open source community.

Beyond the normal complexities of most modern programming languages, C++ has three primary aspects that make it especially hard to build and share libraries with others. 1) It has a single specification with multiple compiler implementations. 2) It is a compiled language. 3) It inherited the C preprocessor.

Specification

Unlike many other languages available today, C++ does not have a first-party compiler and instead exists only as a specification. This affords us the opportunity to have multiple compilers from different vendors and allows for targeting a large variety of architectures (which is a major strength!). It also means that in order to share code with the C++ community as a whole, one has to navigate around platform-specific logic and maintain a unique setup for each compiler to ensure the build works correctly on all systems. Although this isn’t too difficult for a good build system to handle, it does require some integration work to support new compiler vendors. This is the area that has seen the largest improvements to C++ in the last decade. The continued evolution of the Standard Library specification as an abstraction over common platform functionality has greatly reduced, but not entirely eliminated, the complexity of developing cross-platform solutions.

The build system proposed here will incorporate an abstraction layer over the individual compiler implementations. By using an extensibility framework, it will be easy to integrate with any conceivable platform. It is also my hope that, by creating a system that works well as a fully featured package manager, the community will create sharable platform abstraction layers of their own to augment and go beyond what is possible with the Standard Library.

Compiled

The complexity of having such a wide array of compiler implementations is compounded by the fact that C++ is compiled directly to machine code for the target machine that will execute it. C++ puts no constraints on how a compiler does this mapping. As a result, the Application Binary Interfaces (ABIs) of two compilers (and sometimes of two versions of the same compiler) are often not compatible with each other. We must therefore ensure that all generated objects are produced using the same compiler, or take special care to work around these incompatibilities using strict design practices.
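As one illustration of the strict design practices mentioned above, a common approach is to expose only a C-compatible surface across the binary boundary: functions with C linkage and plain data types whose layout does not depend on a particular C++ standard library or name-mangling scheme. The following is a minimal sketch of that idea. The names ImageSize and image_area are hypothetical and are not part of any real library.

```cpp
#include <cstdint>
#include <type_traits>

extern "C" {

// A plain struct built only from fixed-width integers. It has no virtual
// functions, no std:: members, and no constructors, so its layout follows
// the platform's C conventions rather than one C++ compiler's choices.
struct ImageSize {
    std::uint32_t width;
    std::uint32_t height;
};

// C linkage disables C++ name mangling for this symbol, so a caller built
// with a different C++ compiler can still find it as "image_area".
std::uint64_t image_area(const ImageSize* size) {
    if (size == nullptr) {
        return 0;
    }
    return static_cast<std::uint64_t>(size->width) * size->height;
}

}  // extern "C"

// Compile-time checks that the type stays safe to pass across the boundary.
static_assert(std::is_standard_layout<ImageSize>::value,
              "ImageSize must be standard layout");
static_assert(std::is_trivially_copyable<ImageSize>::value,
              "ImageSize must be trivially copyable");
```

The cost of this style is that templates, exceptions, and standard containers cannot cross the boundary, which is part of why building everything from source is attractive.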

The build system that I am proposing will circumvent this issue entirely by only using raw source when building sharable components. This way, library authors can assume that the same compiler will be used when their code is consumed by downstream projects, and ABI compatibility will not be a concern. I also hope to incorporate crowdsourced metrics to track incompatibilities across platforms and compilers, to help detect issues early and notify library authors of possible bugs.

Preprocessor

The C preprocessor was, until now, a point of failure that no build system could protect against when integrating with external projects. Until C++20, the only way to share a symbol was to place a declaration in a shared header file that would be included by both the implementation and all of the translation units that wish to consume it. However, when a header file is included with different sets of preprocessor definitions at the point of use and at the point of implementation, problems often occur. These header files can “leak” their internal definitions into consumer code, or the consumer could inadvertently change the declarations within by defining an erroneous macro. At best, this will result in a compiler or linker error. At worst, it will result in a fun One Definition Rule (ODR) violation or another subtle runtime error.
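To make this failure mode concrete, here is a minimal single-file sketch. The macro WIDGET_HEADER is a hypothetical stand-in for a header file, and its argument stands in for a preprocessor definition that differs between the library build and the consumer build. In a real project the two expansions would live in separate translation units, where the mismatch would be an ODR violation rather than something the compiler can observe.

```cpp
// Stand-in for the contents of a shared header. EXTRA_FIELDS plays the role
// of a section guarded by "#ifdef SOME_FLAG" inside that header.
#define WIDGET_HEADER(EXTRA_FIELDS) \
    struct Widget {                 \
        int id;                     \
        EXTRA_FIELDS                \
    };

// How the library saw the header when it was compiled (flag not defined).
namespace as_built {
WIDGET_HEADER()
}

// How a consumer sees the same header after defining the flag.
namespace as_consumed {
WIDGET_HEADER(long long trace_id;)
}

// Reports whether both sides agree on the size of Widget.
constexpr bool layouts_match() {
    return sizeof(as_built::Widget) == sizeof(as_consumed::Widget);
}

// The two views disagree, so passing a Widget between them would be unsafe.
static_assert(!layouts_match(), "expected the two views of Widget to differ");
```

Neither side did anything obviously wrong here; the only difference is one definition that was visible when the header was read.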

By utilizing Modules as a clean binary separation between individual projects, we can eliminate the possibility of accidentally introducing preprocessor-related issues. The proposed build system will explicitly limit the interactions between individual projects to a single module interface layer to enforce this clean separation.

BONUS - Language Version

A major concern with sharing code between different projects is incompatible language standards. It is relatively straightforward to pull most code that targets earlier versions of the language into a project that uses a newer version; however, the inverse is not true. For projects that are large enough, a lot of work is put into using conditional compilation to ensure compatibility with all supported language versions. This is another instance where C++ Modules have the potential to be a vehicle for solving the problem in a generic way. By using the binary interface layer as the boundary between modules, we could conceptually allow for different language versions internal to the individual projects that still share a compatible interface layer. The best we can do for now is create a build system and ecosystem that utilizes Modules as an inter-project communication channel, and hope that the standards committee introduces breaking changes in subsequent versions of the language in a way that maintains compatibility of the binary interface layer.
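The conditional compilation mentioned above typically keys off the standard __cplusplus macro. The sketch below shows the pattern with a hypothetical helper, starts_with_prefix, which uses a C++20 library facility when it is available and falls back to older code otherwise. Note that MSVC reports an older value for __cplusplus unless /Zc:__cplusplus is set, so real projects often check feature-test macros as well.

```cpp
#include <string>

// Returns true when text begins with prefix.
inline bool starts_with_prefix(const std::string& text, const std::string& prefix) {
#if __cplusplus >= 202002L
    // C++20 and later: std::string has a starts_with member function.
    return text.starts_with(prefix);
#else
    // Earlier standards: compare the leading characters manually.
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
#endif
}
```

Every such branch has to be written, tested, and maintained for each supported standard, which is the cost this section describes.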

Proposal

Modules will not solve all of our problems on their own. It is necessary for us to also define and create a build system with a clear set of priorities to take full advantage of the new functionality. In the remainder of this document, I outline the Requirements and Goals for this new proposed build system and give a brief overview of its core design.

Requirements

The following set of requirements cannot be compromised. The order does not indicate priority, but the final system would be deemed a failure if it failed to fulfill any one of them.

1) Reproducible - Core to any build system is the requirement that builds be deterministic and reproducible. No matter how well a system is designed and implemented, teams will not be able to utilize it unless they can trust that it will always produce the same result independent of who builds it, where they build it and when.

2) Extensible - A build system should be able to support the requirements of all projects. It should strive to work “out of the box” for a majority of scenarios, but must have an extensibility framework that allows build architects to write their own custom build logic when the built-in functionality does not meet their needs.

3) Isolation - This is a uniquely important requirement for C++ and is a direct result of the language issues outlined above. An isolated build requires that one project cannot influence or be influenced by another build, intentionally or by accident, except through explicit, structured channels.

Goals

While the goals are not hard requirements, they are always kept at the forefront when making any design or implementation decision. These items are in priority order:

1) Collaborative - Writing code is very rarely done in isolation. The primary goal for this build system is to work seamlessly within a team and with external dependencies.

2) Simple - When fulfilling the above requirements, the secondary priority is always simplicity and usability. This means that the standard user will get the best experience possible for both setup and usage. Some extra complexity is allowed in exchange for performance gains in the internal implementation and the extensibility framework.

3) Fast - The inner developer loop is very important to the productivity of an engineer. To this end, the build system should focus heavily on the performance of an incremental build and, to a lesser extent, ensure the full build is as fast as possible.

4) Secure - By its nature, an extensible framework opens itself up to security concerns when executing arbitrary code written by external teams. This is the same concern present when consuming any open source project, and the community should take care to use only trusted sources for the projects they use. Even so, the build system will limit the functionality available to the build runtime to prevent access to the developer’s machine when it is not required.

5) Customizable - How a project is built is often a matter of personal preference (or legacy requirement). Where allowable, the build system should let default settings be overridden, provided this does not conflict with the ability to easily build individual projects as part of the greater ecosystem.

Design

This build system, called Soup, will utilize a declarative Recipe file as an easy-to-understand definition for an individual Package. This file will be the primary way to tell Soup about your project. The core command line application will be used to invoke the build and provide extra configuration parameters. Internally, Soup uses a Task execution engine to run build Tasks in their requested order and exposes a registration mechanism to allow C++ “Extension” Dynamic Libraries to run arbitrary code during the build. The Tasks are expected to generate a Directed Acyclic Graph (DAG) of build Operations that make up the actual build. These Operations will be executed to produce the final build result. The primary design consists of four key components: the command line application, the build definition, the build engine, and the package manager.

Application

The Command Line Interface (CLI) is the first thing a user will see when they interact with the Soup build system. The CLI is primarily there to take user input through a set of parameters and flags to pass temporary configuration values into the build execution. While important, it is fairly straightforward to design and will be left open to evolve through use.

Definition

The build definition, which will be implemented through a declarative Recipe configuration file, is how the user will configure their project. The Recipe file will use the TOML language as a clean, human-readable configuration definition that supports a core set of data types. The file can be thought of as a simple property bag for getting shared parameters passed into the build system for an individual package. There are a few “known” property values that will be used within the build engine itself; however, the entire contents will be provided as initial input to the build engine.
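One way to picture the property bag is as a table from names to values drawn from a small set of core types. The sketch below is an assumption about how such a container could look in C++17, not the actual Soup implementation. The type names, the helper functions, and the “Name” and “Version” keys are hypothetical; only the “Dependencies” and “DevDependencies” lists are named in this document.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// A value in the bag: one of a small set of core data types.
using PropertyValue =
    std::variant<bool, long long, std::string, std::vector<std::string>>;

// The whole Recipe, viewed as a flat name -> value table.
using PropertyBag = std::map<std::string, PropertyValue>;

// Builds a bag by hand; a real implementation would fill it from the
// parsed Recipe file. All values here are made up for illustration.
inline PropertyBag make_example_recipe() {
    PropertyBag recipe;
    recipe["Name"] = std::string("MyLibrary");
    recipe["Version"] = std::string("1.2.3");
    recipe["Dependencies"] =
        std::vector<std::string>{"json@2.0.1", "../shared/logging"};
    return recipe;
}

// Looks up a list-valued property, returning an empty list when the key is
// missing or holds a value of a different type.
inline std::vector<std::string> get_list(const PropertyBag& bag,
                                         const std::string& key) {
    const auto found = bag.find(key);
    if (found == bag.end()) {
        return {};
    }
    if (const auto* list =
            std::get_if<std::vector<std::string>>(&found->second)) {
        return *list;
    }
    return {};
}
```

Because unknown keys are simply carried along, an Extension can read its own settings from the same bag without the engine knowing about them.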

Engine

The build Engine is responsible for recursively building all transitive dependencies, facilitating the registration and execution of build Tasks, and executing all required build Operations. All build logic will be contained in Tasks and all build execution will be performed in Operations. Having this extra layer of separation between the build evaluation and the build execution allows build Extensions to get fast incremental build support for “free” and will allow for future performance improvements without introducing breaking changes into the Extension Framework.
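The separation between evaluation and execution can be sketched as follows: Task logic only describes the commands it wants, and something else decides when and whether to run them. The Operation struct and the plan_compile function are hypothetical names, and the compiler name and flags are placeholders rather than anything Soup defines.

```cpp
#include <string>
#include <vector>

// A self-contained command that can be run later, with declared inputs and
// outputs so the executor can decide whether it needs to run at all.
struct Operation {
    std::string executable;
    std::vector<std::string> arguments;
    std::vector<std::string> inputFiles;
    std::vector<std::string> outputFiles;
};

// Hypothetical Task logic: decide what to do for each source file without
// invoking the compiler. Nothing is executed here.
inline std::vector<Operation> plan_compile(
    const std::vector<std::string>& sources) {
    std::vector<Operation> operations;
    for (const std::string& source : sources) {
        const std::string object = source + ".o";
        operations.push_back(Operation{
            "compiler",
            {"-c", source, "-o", object},
            {source},
            {object},
        });
    }
    return operations;
}
```

Since every Operation lists its inputs and outputs, incremental behavior can live entirely in the executor, and Task authors do not have to implement it themselves.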

This work can be broken down into five phases:

1) Parse Recipe - The Recipe TOML file is read from disk and parsed into a property bag.

2) Build Dependencies - The Engine will use the known property lists “Dependencies” and “DevDependencies” to recursively build all transitive runtime and development dependencies, starting again at phase one for each. The Engine will maintain a communication channel between parent and child project builds, using a special property bag container to pass configuration parameters down and output state back up.

3) Build Extensions - The Engine will then discover and invoke the predefined C function that is exported from all registered Extension DLLs. A single predefined Extension DLL, distributed with the CLI executable, contains the Tasks that execute the default build logic, which will allow for building projects in a majority of scenarios.

4) Run Tasks - A build Task will consist of a unique name, lists of other Tasks that must be run before and after it, and a single execute entry point. The build Tasks will communicate with the build Engine through a strict interface layer to maintain a compatible ABI, which will allow the CLI executable to work with development dependencies that were compiled from source with a different compiler. The build Engine will invoke all registered build Tasks in their requested order, as defined by the run Before/After lists. The Tasks can influence each other by reading and writing properties to and from the active state (a shared property bag). A build Task should not perform any build commands itself (compile/link/copy/etc.); instead, it will generate build Operations, which are self-contained executable definitions with input/output files.

5) Run Operations - The final stage of the build is to execute the build Operations that were generated by the build Tasks. These Operations contain the executable and the parameters to pass in, as well as the input and output files that will be used to perform incremental builds. There will initially be a very simple timestamp-based incremental build that relies on the compiler-generated include list. There is an open question of which project will be used to replace this temporary solution. The current best choices are either BuildXL or possibly Ninja.
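The Run Tasks phase orders Tasks by their run Before/After lists. A minimal sketch of one way to do that is a topological sort using Kahn's algorithm, shown below. The TaskInfo struct, the order_tasks function, and the exact meaning given to the two lists are assumptions made for illustration, not the actual engine code.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical Task description. Assumed meaning of the lists:
//   runBefore: this Task must finish before each named Task starts.
//   runAfter:  this Task must start after each named Task finishes.
struct TaskInfo {
    std::string name;
    std::vector<std::string> runBefore;
    std::vector<std::string> runAfter;
};

// Orders Tasks with Kahn's algorithm. Ties are broken alphabetically so the
// result is deterministic. References to unregistered Tasks are ignored.
inline std::vector<std::string> order_tasks(const std::vector<TaskInfo>& tasks) {
    std::map<std::string, std::set<std::string>> successors;
    std::map<std::string, int> pendingPredecessors;
    for (const TaskInfo& task : tasks) {
        successors[task.name];
        pendingPredecessors[task.name] = 0;
    }

    // Records "first runs before second", skipping unknown names and duplicates.
    auto addEdge = [&](const std::string& first, const std::string& second) {
        if (successors.count(first) == 0 || successors.count(second) == 0) return;
        if (successors[first].insert(second).second) ++pendingPredecessors[second];
    };
    for (const TaskInfo& task : tasks) {
        for (const std::string& later : task.runBefore) addEdge(task.name, later);
        for (const std::string& earlier : task.runAfter) addEdge(earlier, task.name);
    }

    // Start with every Task that has nothing left to wait for.
    std::set<std::string> ready;
    for (const auto& entry : pendingPredecessors) {
        if (entry.second == 0) ready.insert(entry.first);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string current = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(current);
        for (const std::string& next : successors[current]) {
            if (--pendingPredecessors[next] == 0) ready.insert(next);
        }
    }

    // Any Task left unscheduled is part of a cycle.
    if (order.size() != successors.size()) {
        throw std::runtime_error("Task ordering contains a cycle");
    }
    return order;
}
```

Breaking ties alphabetically is a deliberate choice in this sketch: it keeps the order independent of registration order, which fits the Reproducible requirement.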

Package Manager

You may have noticed that nothing about the build explicitly deals with the integration of a public feed of packages. Because each individual project’s build is isolated and self-contained, a dependency reference can easily be migrated from a direct directory reference, for local projects, to a name@version pair that will be resolved to a published snapshot of a public project. The CLI application will consume a REST API from a hosted web service that allows users to install other projects and publish the code they would like to share with ease. The build Engine will then have a small amount of integration logic that knows where to look when resolving dependencies that reference a public package, which will be installed to a known location. It should be noted that these public dependency references can be for both runtime and development dependencies. This will allow shared packages to contain custom build logic and will allow for the creation of shared build Extensions that augment the built-in build Tasks.
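The two kinds of dependency reference described above can be told apart with very little logic. The sketch below assumes a simple rule that is not specified in this document: a reference with no path separators and a non-empty name and version around a single @ is treated as a public package, and anything else is treated as a local directory. The DependencyReference struct and parse_reference function are hypothetical names.

```cpp
#include <string>

// Either a local directory or a published name@version pair.
struct DependencyReference {
    bool isPublic;
    std::string path;     // set when isPublic is false
    std::string name;     // set when isPublic is true
    std::string version;  // set when isPublic is true
};

// Classifies a reference string using the assumed rule described above.
inline DependencyReference parse_reference(const std::string& text) {
    const std::string::size_type at = text.find('@');
    const bool hasSeparator = text.find_first_of("/\\") != std::string::npos;
    const bool looksPublic = at != std::string::npos && !hasSeparator &&
                             at > 0 && at + 1 < text.size();
    if (!looksPublic) {
        return DependencyReference{false, text, "", ""};
    }
    return DependencyReference{true, "", text.substr(0, at),
                               text.substr(at + 1)};
}
```

With this shape, moving a dependency from a local checkout to a published package changes only the reference string, not the way the build consumes it.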

Summary

Transitioning the entire C++ community to a new ecosystem of build tooling will require a great deal of effort. However, C++20 presents a unique opportunity to do so. Migrating to take advantage of Modules is a non-trivial breaking change. By aligning this transition with the emergence of a new build system that was designed explicitly for use in a post-C++20 era, we can finally get to a place where C++ is an exceptional language for collaborating with others.

Sharing Code