Nightly Builds
Bugs affecting more than one service commonly surface only when the whole system is running. To detect these bugs, our continuous integration (CI) system builds Spinnaker every night and runs a suite of integration tests against it on real cloud provider infrastructure.
Access the CI system at https://builds.spinnaker.io. Viewers must be members of the build-cops GitHub team.
Build Cop
The build cop responsibilities include:
- Triage integration test failures on master and the 3 most recent release branches
- Clean up orphaned resources across target cloud providers
- Route new GitHub issues to the appropriate SIG, applying GitHub labels as needed. You can find the full list of SIGs in the governance repo
- Watch the #general and #dev Slack channels for reports of systemic problems
- Log observations and corrective actions taken in the rotation log
Process Structure
The CI system comprises both jobs, which do a specific task, and flows, which invoke a series of jobs.
Flow_BuildAndValidate is the primary entry point for the master branch flow.
Flow_BuildAndValidate_<version> is the entry point for the corresponding top-level release. It is a copy of the primary Flow_BuildAndValidate as it existed when that release was cut. Each top-level release flow works off its own release-1.ABC.x branch.
As its name implies, Flow_BuildAndValidate builds and tests the whole Spinnaker system. It follows this general process:
1. Build_PrimaryArtifacts
- Runs git checkout for all services
- Constructs a BOM from the most recent commit on the target branch
- Builds a Docker container and a Debian package of each Spinnaker microservice
- Builds additional supporting artifacts:
  - halyard
  - spin-cli
  - changelog
- Publishes the BOM under the following names:
  - With the floating tag <branchName>-latest-unvalidated (e.g. master-latest-unvalidated)
  - With a fixed tag <branchName>-<timestamp> (e.g. master-20191213154039)
- Publishes the changelog
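The two BOM names follow a simple pattern, reproduced below as a small bash sketch. The function name bom_tags is hypothetical, and the timestamp format (UTC, YYYYMMDDHHMMSS) is an assumption inferred from the example master-20191213154039 rather than taken from the real build scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the two names a BOM is published under.
# Assumption: the timestamp is UTC in YYYYMMDDHHMMSS form, inferred from the
# example master-20191213154039; the real build scripts may differ.
bom_tags() {
  local branch="$1"
  # Use the timestamp passed as $2, or generate one for the current time.
  local ts="${2:-$(date -u +%Y%m%d%H%M%S)}"
  printf '%s\n' "${branch}-latest-unvalidated"   # floating tag
  printf '%s\n' "${branch}-${ts}"                 # fixed tag
}

bom_tags master 20191213154039
# prints:
# master-latest-unvalidated
# master-20191213154039
```

The floating tag always points at the newest unvalidated build, while the fixed tag lets you refer to one specific build later.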
2. Validate_BomAndReportMultiPlatform
For uninteresting reasons, this job must wrap the ValidateBomMultiPlatform job described next in order to aggregate its results.
3. ValidateBomMultiPlatform
This “Multi-configuration project” specifies the same test(s) to run across different environments. This confirms Spinnaker works whether deployed as a single VM or in a Kubernetes cluster, for instance.
- Starts Halyard in a new VM
- Connects to this instance and executes a series of hal config steps, including account setup for the managed cloud provider(s)
- Deploys the configuration with hal deploy apply
- Invokes citest integration tests against the new Spinnaker instance
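For orientation, here is a rough sketch of what the hal config and hal deploy apply steps can look like for a Kubernetes target. The real job's commands, flags, and account names are not documented on this page, so treat everything here as an assumption: my-k8s-account and my-context are placeholders, run and configure_and_deploy are hypothetical helpers, and exact flags can vary between Halyard versions. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the validation job's actual script.
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# configure_and_deploy ACCOUNT CONTEXT (both placeholders)
configure_and_deploy() {
  local account="$1" context="$2"
  # Enable the cloud provider and register an account for it.
  run hal config provider kubernetes enable
  run hal config provider kubernetes account add "$account" --context "$context"
  # Install Spinnaker into that account as a distributed deployment.
  run hal config deploy edit --type distributed --account-name "$account"
  # Apply the configuration.
  run hal deploy apply
}

DRY_RUN=1 configure_and_deploy my-k8s-account my-context
```

A single-VM environment would use a different deployment type, which is the kind of difference the multi-configuration project exercises.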
citest sends a command to Spinnaker and then uses the underlying cloud provider’s CLI to confirm that the expected changes were made. For example, it uses gcloud to confirm that a GCE server group was created or deleted.
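The observe-and-verify idea can be illustrated with a small bash retry loop. This is not citest's API (citest is a separate framework); verify_eventually is a hypothetical helper, and the gcloud example in the comments uses placeholder names.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the observe-and-verify pattern.
# Usage: verify_eventually ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs the observer command until it succeeds or the attempts run out.
verify_eventually() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # observed the expected state
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"    # no sleep after the final attempt
    fi
  done
  return 1              # expected state never observed
}

# Example (requires an authenticated gcloud; names are placeholders):
# verify_eventually 10 15 gcloud compute instance-groups managed describe \
#   my-server-group-v000 --zone us-central1-f --project my-project
```

Retrying matters because cloud operations are asynchronous: the resource may not be visible to the CLI immediately after Spinnaker reports success.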
Cleaning Orphaned Resources
Occasionally, integration tests fail in a way that leaves behind resources that are undesirable to keep or difficult to clean up automatically. Build cops should periodically make sure these orphaned resources are deleted from the following locations:
- spinnaker-community GCP project
  - Instance Groups named gcp<testName>-*
  - VMs named jenkins-validate-bom-*
  - Load balancers named gcp<testName>-*
  - Managed certificates that are not builds.spinnaker.io (!)
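To find candidates quickly, you can pipe resource names through a filter like the one below. It is a sketch under stated assumptions: orphan_candidates is a hypothetical helper, the glob gcp?*-* stands in for gcp<testName>-*, and managed certificates are deliberately left out because their rule is an exclusion (everything except builds.spinnaker.io) that is safer to check by hand. Review every match before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read resource names (one per line) on stdin and print
# those matching the orphan naming patterns. Prints only; deletes nothing.
orphan_candidates() {
  local name
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      gcp?*-*|jenkins-validate-bom-*) printf '%s\n' "$name" ;;
    esac
  done
}

# Example (requires gcloud with access to the spinnaker-community project):
# gcloud compute instances list --project spinnaker-community \
#   --format="value(name)" | orphan_candidates
# gcloud compute instance-groups list --project spinnaker-community \
#   --format="value(name)" | orphan_candidates
```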
Deleting Obsolete Artifacts
The following jobs assist in removing old artifacts created during the build process:
Troubleshooting Playbook
Check whether the failure happened during the build or the test phase:
Click the failing Flow.
![](troubleshooting - base - 10 - flow.png)
Click the most recent failing build.
![](troubleshooting - base - 20 - mostRecent.png)
Click through to the failing phase.
![](troubleshooting - base - 30 - phase.png)
Build Failures
The build phase uses many subshells to perform its work in parallel. Use the Console Output to narrow down which step of the build failed, then use the collected logs to see what specifically went wrong.
![](troubleshooting - build - 10 - consoleOutput.png)
Each time a step completes, the Console Output prints how much work remains.
![](troubleshooting - build - 20 - buildSteps.png)
Often the build error is printed directly in the Console Output, but this output can be hard to read. To view the raw log file, use the Build Artifacts link from step 1.
![](troubleshooting - build - 30 - failedOutput.png)
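When the Console Output is long, a quick keyword scan can help locate the failing step. The helper below is hypothetical and its keyword list is an assumption; adjust it to whatever the failing step actually prints. Jenkins serves a build's raw console log at the build URL plus /consoleText; the job name in the example is a placeholder, and because builds.spinnaker.io is access-restricted you may need to authenticate (for example with a Jenkins API token).

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan console output on stdin and print lines that look
# like failures, prefixed with their line numbers.
find_failures() {
  # -w matches whole words only, so "failed" matches but "unfailed" does not.
  grep -n -i -w -E 'errors?|failed|failure'
}

# Example (<jobName> is a placeholder; add -u <user>:<API token> if required):
# curl -s "https://builds.spinnaker.io/job/<jobName>/lastFailedBuild/consoleText" | find_failures
```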
Common Build Failures
Bintray Conflicts
If an artifact is uploaded to the Bintray repository but never published (either because of a transient Bintray error or an interrupted build), you’ll get an error like this:
Bintray API Request ‘create version 0.20.0-20200512192702’ failed with HTTP response 409 Conflict
Follow these steps to delete the artifact and resolve the issue:
Navigate to the specific version in the Bintray repository:
Click the Spinnaker repository that had the failure. (If you don’t see it, go to the next page; only 10 items are shown per page.)
Click the specific version that had the issue.
Click “Actions” in the upper right and select “Edit”.
On the next page, click the “Delete” link in the upper right. Nothing appears to happen at first, but after about 10 seconds the page refreshes and the version is gone.
Now that the conflict has been removed, you can restart the build.
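To be sure you delete the right version, you can pull it straight out of the error line. This is a small sketch; conflicting_version is a hypothetical helper, and it assumes the log line has the shape shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read build log lines on stdin and print the version
# named in a Bintray "409 Conflict" error, e.g. 0.20.0-20200512192702.
conflicting_version() {
  # Only lines containing "409 Conflict" are considered; the version is the
  # run of version characters after "create version ".
  sed -n '/409 Conflict/s/.*create version \([0-9][0-9A-Za-z.+-]*\).*/\1/p'
}
```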
Test Failures
View the Test Results Overview.
![](troubleshooting - test - 10 - testResultsOverview.png)
Identify the failing test.
![](troubleshooting - test - 20 - failingTest.png)
Identify which step in the test is failing.
![](troubleshooting - test - 30 - failingStep.png)
It can help to view the last call made before that step failed.
![](troubleshooting - test - 40 - failingDetails.png)
Connecting to the Jenkins VM
Members of the jenkins-debuggers@spinnaker.io group can SSH directly to the Jenkins VM. Connect to the instance with this command:
$ gcloud compute ssh --project spinnaker-community jenkins-transfer --zone us-central1-f --ssh-flag "-L 4040:test-jenkins:8080"
The extra --ssh-flag establishes a tunnel to the test-jenkins instance, which is used to trigger some integration tests. Once the connection is established, you can view this instance at http://localhost:4040.
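If you script against the tunnel, it can help to wait until the local port actually accepts connections. The helper below is a hypothetical bash sketch; it relies on bash's built-in /dev/tcp redirection and assumes the default local port 4040 from the gcloud compute ssh command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (bash-only: uses bash's /dev/tcp redirection).
# Usage: wait_for_tunnel [PORT] [ATTEMPTS]; defaults are 4040 and 30.
# Returns 0 once 127.0.0.1:PORT accepts a TCP connection, else 1.
wait_for_tunnel() {
  local port="${1:-4040}" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # The connection opens in a subshell and closes when the subshell exits.
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage, in a second terminal while the gcloud compute ssh session is open:
# wait_for_tunnel && echo "Open http://localhost:4040"
```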
Change to the jenkins user
All processes run as the jenkins user, and most of the useful links are in /home/jenkins. Switch to that user with:
$ sudo su - jenkins