CIQ is an enterprise company that specializes in Linux distributions, computing infrastructure, and cluster management and provisioning systems. They have a long-running commitment to supporting open source projects and providing enterprise-level support for them: they’re the Founding Sponsor of Rocky Linux, an open source operating system rebuilt directly from Red Hat Enterprise Linux (RHEL) sources; they support Warewulf, an open source cluster management and provisioning system that simplifies deploying and managing compute clusters; and they supported Singularity, a project designed to bring containers to high performance computing, and donated it to the Linux Foundation, where it was renamed Apptainer.
All of CIQ’s enterprise solutions, which grow out of their support for and contributions to open source projects, live in a single repository, known as a monolithic repository (monorepo). A monorepo lets CIQ keep all projects (solutions) consistent and well integrated, simplify dependency management by applying updates uniformly across the codebase, streamline CI/CD pipelines, and facilitate cross-team collaboration. Since thousands of customers rely on their solutions for mission-critical use cases, it’s imperative for them to maintain the monorepo so they can:
- ensure consistency across environments, architectures, libraries, services, and tools
- efficiently update and maintain features with no downtime
- rapidly debug issues and apply patches across the entire codebase
- perform comprehensive testing and validation across all projects
- improve scalability as the codebase grows
CIQ needed a remote build execution (RBE) service to dynamically scale their compute resources, optimize build times, and handle increasing demand.
Storage Scalability Challenges
Physical disk resizing
CIQ previously relied on a solution whose storage management was based on a block-level architecture. Data was stored and accessed in fixed-size blocks tied directly to physical storage devices, so the only way to increase storage capacity was to physically resize the disks. As data volume grew, they had to resize the disk again and again, a time-consuming process that did not scale. When their previous solution couldn’t keep up with their demand and scale, CIQ turned to NativeLink for a more robust and maintainable alternative.
Cloud-based storage
The other challenge CIQ faced was integrating with cloud-based distributed storage systems like AWS S3 and Google Cloud Storage (GCS). This obstacle stemmed primarily from their legacy storage architecture, which relied heavily on physical, disk-based block storage. That block storage architecture couldn’t interface seamlessly with cloud-based distributed storage systems, which are designed to be elastic and scale on demand. The incompatibility made it difficult for CIQ to take advantage of the cloud’s ability to adjust storage capacity to demand. As a result, they wasted provisioned resources and couldn’t fully capture the cost-performance benefits that cloud storage provides.
Implementing and maintaining NativeLink
One of the standout features of NativeLink for the CIQ team was its intuitive codebase, housed in a single repository. NativeLink’s logical separation of components, each with well-defined responsibilities, let CIQ’s engineers quickly pinpoint the source of any error and resolve problems swiftly. The simplicity of NativeLink’s architecture meant that most issues were addressed during the setup phase, minimizing the risk of future errors.
For CIQ, NativeLink is also remarkably low-touch when it comes to maintenance, and it significantly improved how they manage physical disk storage compared to Buildbarn. Changing physical disk sizes only requires adjusting the Persistent Volume Claim (PVC) on Kubernetes (k8s) to the desired storage capacity and then updating the storage configuration to match. This lets CIQ scale physical disks and adjust eviction policies quickly and with minimal effort.
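To make “adjust eviction policies” more concrete: a cache that lives on a fixed-size disk must decide what to drop once it fills up, and a common approach is least-recently-used (LRU) eviction against a byte budget. The Python sketch below is a hypothetical, simplified illustration of that idea. It is not NativeLink’s implementation or configuration format, and the class name SizeBoundedLRU and its resize method are invented here for illustration. In this sketch, growing the PVC and updating the storage configuration corresponds roughly to raising the byte budget, while shrinking it forces the oldest entries out.

```python
from collections import OrderedDict
from typing import Optional


class SizeBoundedLRU:
    """Hypothetical byte-budgeted LRU store (illustration only, not NativeLink code)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Insertion order doubles as recency order: least recently used entry first.
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def __contains__(self, key: str) -> bool:
        return key in self._items

    def get(self, key: str) -> Optional[bytes]:
        blob = self._items.get(key)
        if blob is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return blob

    def put(self, key: str, blob: bytes) -> None:
        if len(blob) > self.max_bytes:
            raise ValueError("blob is larger than the whole byte budget")
        if key in self._items:
            # Replacing an entry: release the bytes of the old value first.
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = blob
        self.used_bytes += len(blob)
        self._evict()

    def resize(self, max_bytes: int) -> None:
        """Change the budget, e.g. after the underlying volume was resized."""
        self.max_bytes = max_bytes
        self._evict()

    def _evict(self) -> None:
        # Drop least recently used entries until usage fits within the budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
```

OrderedDict is used because move_to_end and popitem(last=False) give constant-time recency updates and removal of the oldest entry, which is why it is a common building block for LRU caches in Python.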
Furthermore, NativeLink integrates seamlessly with cloud-based storage systems such as S3. This lets CIQ scale their infrastructure up or down flexibly and elastically as their needs evolve, and they can now run multiple Content Addressable Storage (CAS) nodes efficiently. This flexibility and reliability allowed CIQ to focus on more critical aspects of their operations, knowing that their RBE system was stable and dependable. For the CIQ team, setting up NativeLink was almost a “fire and forget” operation.
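A Content Addressable Storage (CAS) keys every blob by a digest of its own bytes, so identical content is stored only once and any node can verify the data it serves. The Python sketch below illustrates that idea, plus one hypothetical way to spread blobs across several CAS nodes by digest. The digest shape (a hash plus a size) is loosely modeled on the one used by Bazel’s Remote Execution API, but the Digest and ShardedCAS classes, the modulo-based shard selection, and the in-memory dicts standing in for nodes are assumptions for illustration, not NativeLink’s actual design.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Digest:
    """Content key: SHA-256 hex digest plus size (loosely modeled on REAPI's Digest)."""
    hash_hex: str
    size_bytes: int


def digest_of(blob: bytes) -> Digest:
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


class ShardedCAS:
    """Hypothetical CAS spread over in-memory 'nodes' (illustration only)."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[Digest, bytes]] = [{} for _ in range(num_nodes)]

    def _node_for(self, digest: Digest) -> Dict[Digest, bytes]:
        # SHA-256 output is effectively uniform, so it doubles as a shard key.
        return self.nodes[int(digest.hash_hex, 16) % len(self.nodes)]

    def put(self, blob: bytes) -> Digest:
        digest = digest_of(blob)
        # Identical content maps to the same key, so it is stored only once.
        self._node_for(digest).setdefault(digest, blob)
        return digest

    def get(self, digest: Digest) -> bytes:
        blob = self._node_for(digest)[digest]  # raises KeyError if the blob is missing
        if digest_of(blob) != digest:
            raise ValueError("stored blob does not match its digest")
        return blob
```

Plain modulo sharding is the simplest choice for a sketch, but it remaps most keys whenever the node count changes; schemes such as consistent hashing are a common alternative when nodes are added or removed frequently.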
The results
Operational savings - CIQ avoids unnecessary provisioning of resources and can scale their node pool with confidence. In addition, because they no longer have to deal with the complexity of client-oriented scheduling, heavier Bazel workloads that once overwhelmed the system and caused timeouts and other issues are now handled with ease.
Efficient resource management - NativeLink has allowed CIQ to maintain a consistent spot node pool of workers since deployment. Even with increased activity, the system has remained stable, eliminating the need for constant scaling. Additionally, because NativeLink handles all actions remotely without downloading CAS objects to GitHub Actions, CIQ has saved substantial time and resources.
Reduced CI and deployment times - Since implementing NativeLink, CIQ has cut their CI times drastically: average PR CI time dropped from 15 minutes to 3 minutes, and average full-service deployment time, measured from code merge, dropped from over 1 hour to 15 minutes.
Conclusion
Because of NativeLink, CIQ has been able to overcome significant technical challenges, reduce operational costs, and increase efficiency. With NativeLink’s intuitive architecture, ease of use, and minimal maintenance requirements, CIQ can scale with confidence, and its engineers are free to focus on what they do best: building innovative solutions without the distraction of an unreliable RBE system.
To learn more about NativeLink, read our documentation, check out our GitHub Repository, or contact our team directly at hello@NativeLink.com.