When building cloud platforms for our development teams, our top priority is automating the application operation process and empowering teams to handle deployments independently. This approach is a step in the right direction for achieving team autonomy and bridging the gap between development and operations. A typical setup for our platform would involve:

Typical Cloud Runtime platform

While this setup is an improvement over the manual handover process of software from development to operations, it still has its own challenges as described below.

Application Focus

Kubernetes is for people building platforms. If you are a developer building your own platform (AppEngine, Cloud Foundry, or Heroku clone), then Kubernetes is for you.

Kelsey Hightower, Developer Advocate @ Google

Kubernetes is an excellent tool for running applications, as it enables us to define our operational needs and constraints declaratively, in the form of resources. This approach is a game-changer, as it makes constraints explicit and abstracts low-level interactions with the hardware behind an API. However, as Kelsey Hightower has noted, it may not be the best API for development teams to use directly. Understanding Kubernetes internals such as routing, placement, and resource capacities, and learning how to manipulate them with Kubernetes resources, can distract a team from its core objective of rolling out new functionality and running applications efficiently.

Secrets Management

Many platforms do not provide a way to manage secrets in the same way as other resources required to run applications. Whether the secrets are managed manually or automatically generated from a pipeline, they are often treated differently than normal deployments, requiring extra tooling or processes. Furthermore, teams may need access to the secrets in order to generate the secret resource themselves, which can lead to storing them in less secure environments such as the build environment.

Access Management

The example setup necessitates granting the build pipeline access to the cluster, with far-reaching permissions that allow it to create, update, or delete resources within the cluster. The access token for the pipeline must be stored in the build environment and carefully safeguarded to prevent misuse. Furthermore, team members require access to the cluster in order to monitor the health of their own application or perform failure analysis. Managing the appropriate permissions and access can be a complicated and error-prone process, with the potential to create security vulnerabilities if not done correctly.
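To make this concrete, the following is a minimal sketch of the RBAC resources that would have to be maintained per team just for read-only troubleshooting access. The namespace `team-a`, the resource names, and the group `team-a-developers` are hypothetical, and the list of readable resources is an assumption about what a team needs to inspect.

```yaml
# Read-only access for one team in its own namespace (all names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-viewer
  namespace: team-a
rules:
  # Core API group: pods, their logs, services, and events
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "events"]
    verbs: ["get", "list", "watch"]
  # Workload controllers
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-viewer-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers   # assumed group name provided by the identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-viewer
  apiGroup: rbac.authorization.k8s.io
```

The pipeline's service account would need a comparable but far more permissive role that includes the create, update, and delete verbs, which is what makes its token so sensitive.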

Request for infrastructure

While infrastructure creation can be automated, there is still a trust boundary between development teams and the platform team. As a result, organizational processes are required to manage requests that cross this boundary, for example a ticketing system or another means of communication. Ultimately, it is the platform team’s responsibility to set up the infrastructure and to hide the underlying complexity. Setting up the infrastructure may require additional tooling such as Terraform, which development teams should not be burdened with learning and mastering. Governance is also necessary to ensure that the infrastructure is set up correctly, with the appropriate permissions and default values, and in the right region or network. In addition, the platform team is responsible for managing the budget, which may affect infrastructure decisions in ways the development team is not aware of. As a result, team autonomy is limited when it comes to infrastructure decisions, and the challenge is to extend it without sacrificing the platform team’s governance.

Discoverability and Best practices

Breaking down our systems into tens or hundreds of smaller services, even with automation and a common runtime platform, can make it challenging to maintain an overview. It becomes difficult to answer questions such as: what services exist? Who is responsible for each service? Where can I find the source code and description of each service? Is the service still active? This information is often scattered across multiple wikis, systems, and individuals, making it challenging to obtain a comprehensive understanding of the system.

On the other hand, it is crucial to have a system in place to provide best practices when creating a new service. Although Service Template repositories can be used as a baseline for the code, there is more to creating a new service than just the code. Other steps, such as setting up a pipeline, granting access to the cluster, and other necessary configuration, need to be completed before a new service can be deployed. Therefore, it is essential to have a comprehensive system in place to guide the creation of new services, beyond just providing code templates.

What solutions can we introduce to address the challenges discussed above?

KubeVela

KubeVela provides a simpler way to define applications, among other benefits. Although it is agnostic to the runtime infrastructure, it has excellent integration with Kubernetes. Rather than defining all operational constraints separately, we can use KubeVela to define a simple application like this:

apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: my-app
spec:
  components:
    - name: my-app
      type: webservice
      properties:
        image: my-app:1.0.0
        port: 8080
      traits:
      - type: gateway
        properties:
          domain: my-app.127.0.0.1.nip.io
          http:
            "/": 8080
app.yaml

KubeVela is based on the Open Application Model and offers many features, but its main advantage is providing a better interface for development teams. With KubeVela, teams can focus on the various aspects of the application and mix in traits as needed. Once the application is defined, KubeVela creates and manages the corresponding Kubernetes resources, saving teams time and effort.
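For comparison, here is a rough sketch of the plain Kubernetes resources a team would otherwise write by hand for the same application. It is an approximation for illustration only, not the exact output KubeVela renders; the `app: my-app` label and the resource names are assumptions.

```yaml
# Workload: runs the container image
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0
          ports:
            - containerPort: 8080
---
# Stable in-cluster address for the pods
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 8080
      targetPort: 8080
---
# External routing, roughly what the gateway trait expresses in five lines
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: my-app.127.0.0.1.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 8080
```

The three resources have to agree on labels, names, and ports. The application definition states each of these only once.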

KubeVela managing app specific resources

ExternalSecrets

ExternalSecrets simplifies the integration of external secret management solutions such as AWS Secrets Manager, Azure Key Vault, or Google Secret Manager. It maps externally stored secrets to Secret resources in the Kubernetes cluster using a declaratively defined resource. This provides an easy way to manage secrets across multiple environments, without having to store the secret values in Git or in the build environment.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example
spec:
  refreshInterval: 1m
  secretStoreRef:
    name: aws-secretsmanager
    kind: SecretStore
  target:
    name: friends
    creationPolicy: Owner
  data:
  - secretKey: my_name
    remoteRef:
      key: friendslist
      property: name.first # Tom
  - secretKey: first_friend
    remoteRef:
      key: friendslist
      property: friends.1.first # Roger

  # metadataPolicy to fetch all the labels in JSON format
  - secretKey: tags
    remoteRef:
      metadataPolicy: Fetch
      key: database-credentials

  # metadataPolicy to fetch a specific label (dev) from the source secret
  - secretKey: developer
    remoteRef:
      metadataPolicy: Fetch
      key: database-credentials
      property: dev
externalsecret.yaml

Since an ExternalSecret resource does not contain any credentials or sensitive information, it can be safely stored in Git along with other resources. Developers do not need to know the secrets and do not have to store them separately in the build environment. If access is required for debugging purposes, it can be granted directly through the external secrets management platform, which eliminates the need to share secrets in a less secure manner, such as via email.
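The ExternalSecret refers to a store named `aws-secretsmanager` through `secretStoreRef`. A minimal sketch of what such a SecretStore might look like is shown below. The region and the service account name are assumptions, and the example assumes that the service account is linked to an IAM role that is allowed to read the secrets.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secretsmanager          # must match secretStoreRef.name in the ExternalSecret
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-central-1          # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # hypothetical service account with an IAM role attached
```

Like the ExternalSecret itself, this resource contains only references and no credentials, so it can live in Git as well.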

ExternalSecrets managing secrets

ExternalSecrets is just one of many available solutions that enable secure storage of secrets in Git. Another possible option is Sealed Secrets, which offers a similar feature but does not require an external source for secrets.

ArgoCD

ArgoCD is a powerful GitOps tool that simplifies deployment by reversing the flow of deployment. Instead of actively pushing changes into the cluster, ArgoCD runs within the cluster and pulls changes available in Git. This allows teams to focus on their code and use their existing Git workflows to manage deployments.

With ArgoCD, the pipeline renders the final resource files, also known as manifests, and pushes them into a Git repository. ArgoCD then pulls the latest changes from this repository and applies them to the cluster. This makes Git the single source of truth, ensuring that what is defined in Git is what is deployed in the cluster.
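As a minimal sketch, the link between a manifest repository and the cluster could be declared with an Argo CD `Application` resource like the one below. The repository URL, the path, and the target namespace are hypothetical. Note that this `Application` kind belongs to the `argoproj.io` API group and is unrelated to the KubeVela resource of the same name.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd                 # namespace where Argo CD is assumed to be installed
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/my-app-manifests.git   # hypothetical repository
    targetRevision: main
    path: manifests                 # assumed directory containing the rendered manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: my-app
  syncPolicy:
    automated:
      prune: true                   # delete resources that were removed from Git
      selfHeal: true                # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `prune` and `selfHeal` enabled, the cluster converges on whatever is committed to the repository, which is what makes Git the single source of truth in practice.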

ArgoCD for GitOps

By implementing ArgoCD, we can significantly enhance the security and efficiency of our cluster. Because ArgoCD pulls changes from a Git repository instead of having a pipeline push them into the cluster, the build environment no longer needs to store a cluster access token with far-reaching permissions, and every change to the cluster is recorded in the Git history.

ArgoCD is just one of many GitOps tools available. Flux is another popular option. Both ArgoCD and Flux simplify the access management to the cluster by eliminating the need for developers to have direct access. This approach improves security and accountability.
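For comparison, a pull-based setup with Flux could look roughly like the sketch below. It assumes Flux is installed in the `flux-system` namespace; the repository URL, path, and target namespace are hypothetical.

```yaml
# Where to pull from
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app-manifests
  namespace: flux-system
spec:
  interval: 1m                      # how often to check the repository for new commits
  url: https://git.example.com/platform/my-app-manifests.git   # hypothetical repository
  ref:
    branch: main
---
# What to apply from that source
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: my-app-manifests
  path: ./manifests                 # assumed directory containing the manifests
  prune: true                       # delete resources that were removed from Git
  targetNamespace: my-app
```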

Crossplane

Crossplane provides a universal control plane for managing not just internal resources, but also external resources from different cloud providers in a consistent manner. It uses the Kubernetes resource model to model external resource claims and requirements, which can be applied in the same way as all other resources like deployments or PVCs.

In addition to raw mappings to external resources, Crossplane also enables the platform team to define their own abstracted resources, providing a simpler interface for developers. For example, a platform team could define a custom resource for a database with specific configurations, and developers could use this abstracted resource in their applications without needing to worry about the underlying cloud provider or the specifics of the database.

apiVersion: db.innoq.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-db
  namespace: default
spec:
  parameters:
    storageGB: 10
    passwordSecretRef:
      namespace: default
      name: psqlsecret
      key: password
  writeConnectionSecretToRef:
    name: my-db-conn
db.yaml

This makes it easier for teams to manage their cloud infrastructure in a consistent way, and reduces the complexity and overhead of managing multiple cloud provider APIs separately. Since these are regular Kubernetes resources, Crossplane can also be combined with a policy engine to ensure that resources are provisioned in compliance with company policies and guidelines.
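Before developers can create a `PostgreSQLInstance` claim, the platform team has to define that resource type. In Crossplane this is done with a CompositeResourceDefinition. The following is a minimal sketch that matches the fields used in the claim; the composite kind name `XPostgreSQLInstance` and the schema details are assumptions. A Composition that maps the parameters to an actual cloud provider resource is also required and is not shown here.

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresqlinstances.db.innoq.com   # must be <plural>.<group>
spec:
  group: db.innoq.com
  names:
    kind: XPostgreSQLInstance               # cluster-scoped composite resource (assumed name)
    plural: xpostgresqlinstances
  claimNames:
    kind: PostgreSQLInstance                # namespaced claim used by development teams
    plural: postgresqlinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    storageGB:
                      type: integer
                    passwordSecretRef:
                      type: object
                      properties:
                        namespace:
                          type: string
                        name:
                          type: string
                        key:
                          type: string
                  required:
                    - storageGB
              required:
                - parameters
```

Only the fields listed in the schema are exposed to developers. Everything else, such as region, instance size, or network settings, stays under the control of the platform team in the Composition.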

By moving the external resource state to the cluster and managing it like any other resource, Crossplane simplifies the management of external resources and makes it consistent with the Kubernetes resource model. It also allows GitOps to be applied to external resources such as databases or messaging middleware, with Git as the single source of truth, which can improve the security and auditability of these resources.

Depending on the level of trust in the development teams, these resources can be directly managed in the application’s repository, or a separate Git repository with restricted permissions can be used to keep control in the hands of the platform team.

Crossplane managing external resources

If Crossplane is already in place, it can make KubeVela unnecessary, because Crossplane’s resource abstraction feature can also be used to define a custom application resource.

Backstage

Backstage is an open-source platform for building developer portals. It provides a centralised platform for developers to discover and reuse shared components, services, and knowledge within an organisation.

With Backstage, developers can view and manage the tools and services that they need to develop, test, and deploy their applications. It provides a modular architecture that allows teams to create plugins for their specific needs, such as integrating with CI/CD pipelines or cloud providers.

Overall, Backstage aims to improve developer productivity by simplifying the management of shared services and components, reducing duplication of effort, and providing a consistent user experience across teams.
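Services typically appear in the Backstage catalog through a small descriptor file that is kept in each repository. The sketch below shows what such a `catalog-info.yaml` might look like for the example application. The owner, the description, and the GitHub project slug are hypothetical, and the annotation assumes that the source code is hosted on GitHub.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-app
  description: Example web service           # short description shown in the catalog
  annotations:
    github.com/project-slug: example-org/my-app   # hypothetical source repository
spec:
  type: service
  lifecycle: production                      # indicates whether the service is still active
  owner: team-a                              # hypothetical owning team
```

These few fields cover the discoverability questions raised earlier: which services exist, who is responsible for them, where the source code lives, and whether the service is still in use.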

Service Catalog in Backstage

One of the core features of Backstage is its workflow templates, which enable teams to quickly set up new repositories with everything needed to get started. These templates include a base setup for a service, including the Git repository and code, as well as the setup for a pipeline and the necessary Kubernetes resources. Additionally, they help to ensure that the correct permissions are in place for teams.
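A minimal sketch of such a template is shown below. It uses the built-in scaffolder actions `fetch:template`, `publish:github`, and `catalog:register`. The template name, the owner, the `./skeleton` directory, and the choice of GitHub as the Git host are assumptions. Steps for setting up a pipeline or cluster permissions would need additional actions or plugins and are left out.

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: new-service                  # hypothetical template name
  title: New Service
  description: Creates a repository with a service skeleton and registers it in the catalog
spec:
  owner: platform-team               # hypothetical owner of the template
  type: service
  # Form shown to the developer
  parameters:
    - title: Service details
      required:
        - name
        - repoUrl
      properties:
        name:
          title: Name
          type: string
        repoUrl:
          title: Repository location
          type: string
          ui:field: RepoUrlPicker
          ui:options:
            allowedHosts:
              - github.com
  # Actions executed after the form is submitted
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton              # assumed directory next to this template file
        values:
          name: ${{ parameters.name }}
    - id: publish
      name: Publish repository
      action: publish:github
      input:
        repoUrl: ${{ parameters.repoUrl }}
        description: Service ${{ parameters.name }}
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
```

The skeleton directory is the place to put the base code, the pipeline definition, and the Kubernetes or KubeVela resources, so that every new service starts from the same baseline.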

While setting up a new service is one example of a workflow that can be automated with Backstage, the workflow templates can be extended with plugins to automate all kinds of workflows and integrate with external systems.

Example templates

Overall, Backstage helps to streamline the development process and promote collaboration between teams.

Backstage as developer portal

Conclusion

Where to go from here?

The setup described here is just a starting point and one of many possible approaches. Many challenges remain to be addressed, such as implementing an observability stack or policy management. However, any new tool added to the mix should adhere to the same principles: simplifying the work of developers, providing easy access to the underlying platform’s features, and ultimately facilitating collaboration between development and platform teams. By continually iterating and improving on this foundation, teams can achieve greater efficiency and reliability in their software delivery processes. The platform is a product and should be managed as such.

This article focuses primarily on the technical aspects of an internal development platform (IDP). Comprehensive documentation, the right company culture, and coaching and support are equally integral to success with an IDP. To fully realize its benefits, consider not only the technical implementation but also the broader organizational and cultural aspects.