Bugs affecting more than one service commonly surface only when the whole system is running. Our continuous integration system builds and runs a suite of integration tests against Spinnaker nightly on real cloud provider infrastructure to detect these bugs.
Access the CI system here: https://builds.spinnaker.io
Viewers must be a member of the
The build cop responsibilities include:
- Triage integration test failures on master and the 3 most recent release branches
- Clean up orphaned resources across target cloud providers
- Route new GitHub issues to the appropriate SIG (applying GitHub labels as appropriate). You can find the full list of SIGs in the governance repo
- Observe any systemic problems raised in the #general and #dev Slack channels
- Log observations and corrective actions taken in the rotation log
The CI system comprises both jobs, which do a specific task, and flows, which invoke a series of jobs.
Flow_BuildAndValidate is the primary entry point for the master branch flow.
Flow_BuildAndValidate_<version> is the entry point for the respective top-level release. It is a copy of the primary Flow_BuildAndValidate as it existed when that release was cut. Top-level release flows work off their respective
As its name implies, Flow_BuildAndValidate builds and tests the whole Spinnaker system. It follows this general process:
- Runs git checkout for all services
- Constructs a BOM from the most recent commit on the target branch
- Builds a Docker container and a Debian package of each Spinnaker microservice.
- Builds additional supporting artifacts:
- Publishes the BOM under the following names:
- With the floating tag:
- With a fixed tag:
- Publishes the changelog
For uninteresting reasons, this job must wrap the following ValidateBomMultiPlatform in order to aggregate its results.
This “Multi-configuration project” specifies the same test(s) to run across different environments. This confirms Spinnaker works whether deployed as a single VM or in a Kubernetes cluster, for instance.
- Starts Halyard in a new VM
- Connects to this instance and executes a series of hal config steps, including account setup for the managed cloud provider(s).
- Deploys the configuration with hal deploy apply.
- Runs citest integration tests against the new Spinnaker instance.
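The configure-and-deploy part of that list can be sketched as a short shell script. This is an illustration only, not the CI job's actual script: the run wrapper, the DRY_RUN variable, the configure_and_deploy function, and the example-version value are all hypothetical, and the real job performs more account setup than is shown here. By default the sketch only prints the commands it would run.

```shell
# Sketch only: print (or run) a minimal Halyard configure-and-deploy sequence.
# DRY_RUN, run, configure_and_deploy, and every value below are placeholders,
# not the settings used by the CI job.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

configure_and_deploy() {
  version="$1"   # a published BOM version or tag (placeholder)
  run hal config version edit --version "$version"
  run hal config provider google enable
  run hal deploy apply
}

configure_and_deploy "example-version"
```

Setting DRY_RUN=0 would execute the commands for real, which requires a working Halyard installation.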
citest sends a command to Spinnaker, and then uses the underlying cloud provider’s CLI to confirm the expected changes were made. For example, it uses gcloud to confirm a GCE server group was created or deleted.
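The "act, then verify through the provider CLI" pattern can be approximated by hand when reproducing a failure. The following is a minimal sketch, not citest's implementation: poll_until, its arguments, and the retry values are hypothetical, and the resource names in the commented example are placeholders.

```shell
# Sketch only: retry a verification command until it succeeds or attempts run out.
# poll_until and its arguments are hypothetical helpers, not part of citest.
poll_until() {
  attempts="$1"   # maximum number of tries
  delay="$2"      # seconds to sleep between tries
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The remaining arguments are the verification command itself.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (requires gcloud and real resource names; these are placeholders):
# poll_until 30 10 gcloud compute instance-groups managed describe \
#   "my-server-group" --zone "us-central1-f" --project "my-project"
```

Polling matters because cloud provider changes are usually asynchronous, so a single check immediately after the Spinnaker call may fail even when the operation eventually succeeds.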
Cleaning Orphaned Resources
Occasionally, integration tests fail in a way that is either undesirable or difficult to automatically clean up. Build cops should periodically ensure these orphaned resources are deleted from the following locations:
- spinnaker-community GCP project
Deleting Obsolete Artifacts
The following jobs assist in removing old artifacts created during the build process:
Check whether the failure happened during the build or the test phase:
Click the failing Flow.
!(troubleshooting - base - 10 - flow.png)
Click the most recent failing build.
!(troubleshooting - base - 20 - mostRecent.png)
Click through to the failing phase.
!(troubleshooting - base - 30 - phase.png)
The build phase uses many subshells to perform its work in parallel. Use the Console Output to help narrow down which step of the build has failed, and use the collected logs to view more information on what specifically went wrong.
!(troubleshooting - build - 10 - consoleOutput.png)
After each step completes, the Console Output prints how much work remains.
!(troubleshooting - build - 20 - buildSteps.png)
Frequently, the build error will be printed out directly to the Console Output, but sometimes this output can be hard to read. View the raw file directly using the Build Artifacts link from Step 1.
!(troubleshooting - build - 30 - failedOutput.png)
Common Build Failures
If an artifact is uploaded to the Bintray repository but never published (either because of a transient Bintray error or an interrupted build), you’ll get an error like this:
Bintray API Request ‘create version 0.20.0-20200512192702’ failed with HTTP response 409 Conflict
Follow these steps to delete the artifact and resolve the issue:
Navigate to the specific version in the Bintray repository
Click on the Spinnaker repository that had the failure. (If you don’t see it, click to the next page; there are only 10 items per page for some reason.)
Click on the specific version that had the issue.
Click “Actions” in the upper right and select “Edit”.
On the next page, click the “Delete” link in the upper right. It will look like nothing happened, but after 10 seconds or so, the page will refresh and the version will be gone.
Now that the conflict has been removed, you can restart the build.
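When the Console Output is long, it can help to extract the conflicting version from a saved copy of the log before looking for it in Bintray. The sketch below assumes the error line has the same shape as the single message quoted above; conflict_version and the file name are hypothetical, and other error variants may not match.

```shell
# Sketch only: pull the conflicting version out of a Bintray 409 error line.
# conflict_version is a hypothetical helper. The pattern is inferred from the
# one error message quoted in this section.
conflict_version() {
  sed -n 's/.*create version \([0-9][0-9A-Za-z.-]*\).*409 Conflict.*/\1/p'
}

# Example: feed it a saved copy of the Console Output (file name is a placeholder).
# conflict_version < console-output.txt
```

Lines that do not contain a 409 conflict produce no output, so an empty result means this particular failure was not found in the log.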
View the Test Results Overview.
!(troubleshooting - test - 10 - testResultsOverview.png)
Identify the failing test.
!(troubleshooting - test - 20 - failingTest.png)
Identify which step in the test is failing.
!(troubleshooting - test - 30 - failingStep.png)
It can sometimes help to view the last call that was made prior to that stage failing.
!(troubleshooting - test - 40 - failingDetails.png)
Connecting to the Jenkins VM
Members of the email@example.com group can SSH directly to the Jenkins VM. You can connect to the instance with this command:
$ gcloud compute ssh --project spinnaker-community jenkins-transfer --zone us-central1-f --ssh-flag "-L 4040:test-jenkins:8080"
--ssh-flag establishes a tunnel to the test-jenkins instance, which is used to trigger some integration tests. You can view this instance at
after the connection is established.
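The -L value passed through --ssh-flag follows the standard SSH local forwarding form LOCAL_PORT:HOST:REMOTE_PORT. As a small illustration of how to read it, the sketch below splits the spec from the command above into its parts; describe_forward is a hypothetical helper written for this example.

```shell
# Sketch only: split an ssh -L forwarding spec of the form
# LOCAL_PORT:HOST:REMOTE_PORT into its parts. describe_forward is hypothetical.
describe_forward() {
  spec="$1"
  local_port="${spec%%:*}"    # text before the first colon
  rest="${spec#*:}"           # everything after the first colon
  remote_host="${rest%%:*}"   # text before the next colon
  remote_port="${rest##*:}"   # text after the last colon
  echo "localhost:${local_port} -> ${remote_host}:${remote_port}"
}

describe_forward "4040:test-jenkins:8080"
# prints "localhost:4040 -> test-jenkins:8080"
```

In other words, while the SSH session is open, connections to port 4040 on your machine are forwarded to port 8080 on test-jenkins.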
All processes run as the jenkins user, and most of the useful links are in /home/jenkins. Switch to that user with:
$ sudo su - jenkins