Terraforming AKS at the speed of Cloud Native

Karl Cooke | Jan 15, 2025

IMPORTANT

The code, configuration, and commands discussed in these articles are intended solely to demonstrate the capabilities of a specific product or project.

Unless otherwise stated, these examples are not to be considered “Production Ready”. All configuration samples provided are for demonstration purposes only.

The author takes no responsibility for any consequences arising from the use of this information.

Welcome back!

It’s been a minute since I’ve written anything; life has a way of getting very busy both in and out of the day job. We’ve had a lot going on here, including relocating to a whole new country!

I’m determined to get back to writing regular tech-focused articles. Writing helps me to de-stress and provides plenty of opportunity to learn new things! I also LOVE giving back to the tech community, so let’s get into it!

The Problem

I’m a huge fan of the cloud and of cloud native technologies in particular. Cloud Native technologies are helping organizations of all sizes to innovate faster, scale more efficiently, and ultimately deliver better products to their customers. I could go on, and on, and on about the benefits of Cloud Native technologies, but let’s save that for another time.

My happy place, in technology terms, is working with Cloud Native technologies in Azure, specifically Azure Kubernetes Service or AKS for short.

Things change fast in the Public Cloud, and Azure is no exception. Regardless of how long you’ve been working in the cloud, keeping up with the rate of change can be a challenge. If the cloud moves fast, then cloud native technologies move faster. If I can let my Star Trek nerd out for a moment, I’d say cloud native technologies move at warp speed! This is one of the things I love about the cloud and cloud native technologies: no two days are the same and there’s always plenty to learn.

With Kubernetes releasing about 3 times per year, and Azure releasing new features and updates for services like AKS at a similar pace, it’s a lot to keep up with. As an Azure MVP and a Cloud Architect/Consultant/Engineer (sometimes all 3 in the same day!), I need to make sure that I keep up with the pace of change. This often means experimenting with new capabilities and building out labs and proofs of concept to keep my skills sharp and to really get under the hood of the technology.

I often build out AKS lab environments for these experiments and learning opportunities. Terraform is my tool of choice when it comes to Infrastructure as Code. Infrastructure as Code is critical to creating consistent, repeatable environments quickly, whether that’s for learning and experimentation or in the wild for client workloads. Terraform uses providers to interact with cloud provider APIs to build your resources in the cloud. For Azure, the primary provider that most engineers and organizations use is the AzureRM provider. I build most of my labs and customer solutions using this provider. It’s great, but it can lag behind the very latest updates from Azure. In particular, preview features often take a while to be supported.

For example, one of the new preview features in AKS that I’ve been testing out is API Server VNet Integration. API Server VNet Integration projects the API server endpoint into a delegated subnet without requiring any Private Endpoints. Talking about how great this new feature is, is another topic for another day. Ordinarily, I’d have to wait for this preview feature to be supported in AzureRM, or I’d have to change my tooling to use Azure CLI, PowerShell, or Bicep, and I don’t want to do that all the time.

The Solution

Don’t worry, there is a solution! I promise I wasn’t just writing an article to present a problem and then leave you to figure it out for yourself! The solution to our problem is another Provider from Azure called AzAPI. This Provider has been around for 2-3 years and was specifically created to solve the problem I mentioned above, not just with cloud native resources but with all resources in Azure.

What is AzAPI?

According to the Microsoft documentation linked above, the AzAPI provider is a thin layer on top of the Azure ARM APIs. It allows you to build resources using any of the ARM APIs directly from Terraform. AzureRM talks to the same ARM APIs, so the resources you end up with are the same. The critical difference with AzAPI is that it allows you to choose which API version you are using. With the latest API versions, you can access the latest day zero features and releases, whether they are preview or not. It’s just a case of selecting the appropriate API version. If you’re familiar with Bicep at all, then this won’t be a foreign concept to you.
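To make the API version idea concrete, here is a minimal sketch of how an AzAPI resource is addressed. The type string pairs an ARM resource type with an API version, separated by @. The resource group, its name, its region, and the 2021-04-01 API version are illustrative assumptions and are not taken from the lab in this article.

```hcl
terraform {
  required_providers {
    azapi = {
      source  = "Azure/azapi"
      version = "2.2.0"
    }
  }
}

provider "azapi" {}

# Looks up the subscription the provider is currently authenticated against.
data "azapi_client_config" "current" {}

resource "azapi_resource" "example_rg" {
  # Format: <ARM resource type>@<API version>
  type      = "Microsoft.Resources/resourceGroups@2021-04-01"
  name      = "rg-azapi-example" # hypothetical name
  location  = "eastus"           # hypothetical region
  parent_id = "/subscriptions/${data.azapi_client_config.current.subscription_id}"
}
```

Moving to a newer API version is then a one-line change to the type string, which is the part AzureRM does not let you control.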

Why Now?

I can already hear you say “Karl, if this tool has been out for years and has had this exact same functionality for years, WHY are you telling me about how awesome it is now?!”

Well, that’s a valid question. When AzAPI was originally released, and right up until version 2.0, which was released in October 2024, it relied on A LOT of JSON! Sure, the scaffolding and structure were the Terraform we are familiar with in AzureRM but, at its core, it was passing JSON to the ARM APIs. JSON that you had to write yourself, making sure that you had everything in the right place. If you’ve heard me talk about Infrastructure as Code before, you’ll know that I wasn’t the biggest fan of ARM templates given the amount of JSON you had to write. Frankly, they weren’t particularly human readable and they were also intimidating to write and maintain for an engineer who was just getting to grips with the cloud way back when. That’s why I rejoiced when Microsoft released Bicep. AzAPI pre version 2.0 felt an awful lot like ARM templates and had earned itself a place alongside them in my “only if I absolutely have to” list.

With the release of version 2.0 (we’re now on version 2.2.0), the copious amounts of JSON have been abstracted away! The resource body is now written as a native HCL object instead of a JSON string, so an AzAPI resource reads much like an AzureRM one. This means that you can use the same Terraform syntax that you are already familiar with. While this change might seem small to some, it was a game changer for me and made me really sit up and take notice of AzAPI.
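As a rough illustration of that change, here is the same resource body in both styles. The pre-2.0 form is commented out and reflects the common 1.x pattern of passing the body as a JSON string, often through jsonencode, so treat it as a sketch and not a verbatim listing. The storage account, its name, the API version, and the resource_group_id variable are hypothetical stand-ins chosen only to keep the example short.

```hcl
variable "resource_group_id" {
  type        = string
  description = "Resource ID of an existing resource group (hypothetical input)."
}

# AzAPI 1.x style, for comparison only: the body was a JSON string.
# resource "azapi_resource" "storage" {
#   type      = "Microsoft.Storage/storageAccounts@2023-01-01"
#   name      = "stkclabazapi"
#   location  = "eastus"
#   parent_id = var.resource_group_id
#   body = jsonencode({
#     sku        = { name = "Standard_LRS" }
#     kind       = "StorageV2"
#     properties = { minimumTlsVersion = "TLS1_2" }
#   })
# }

# AzAPI 2.x style: the body is a plain HCL object.
resource "azapi_resource" "storage" {
  type      = "Microsoft.Storage/storageAccounts@2023-01-01"
  name      = "stkclabazapi" # hypothetical, must be globally unique
  location  = "eastus"
  parent_id = var.resource_group_id

  body = {
    sku  = { name = "Standard_LRS" }
    kind = "StorageV2"
    properties = {
      minimumTlsVersion = "TLS1_2"
    }
  }
}
```

Because the body is an ordinary object, individual properties can reference variables and other resources directly, with no string encoding in between.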

AzAPI is intended to be a first class Terraform Provider for working with Azure. It can be used exclusively or in conjunction with AzureRM. As you will see below, I use both to build out my AKS lab environments. I also use both in my day job to build Production environments for clients as there are some things that AzureRM does better than AzAPI and vice versa.

Why does AzAPI matter?

In short, AzAPI matters because it gives day zero access to new features in Azure without changing the language you’re familiar with. It looks like Terraform, it feels like Terraform, and it acts like Terraform, right down to the state file.

Working Example

Let’s take a look at a working example where I use AzAPI and AzureRM to build an AKS Cluster with the new API Server VNet Integration feature enabled. This is a very tightly scoped example, but it should give you an idea of how AzAPI and AzureRM can be used together to build out your environments.

First up, we need to update our Terraform providers block to include both the AzureRM and AzAPI providers. I like to separate out the various Terraform configurations into their own files to keep things neat and tidy. This is a personal preference and not a requirement. See below for how I tend to set up my directories and files. Ignore the tf.conf file for the moment; we’ll come back to that in the next article.

My Starting Terraform Files

Let’s look at the providers.tf file as it contains information that Terraform uses to decide which providers to install and how to configure them.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.15.0"
    }
    azapi = {
      source  = "Azure/azapi"
      version = "2.2.0"
    }
    http = {
      source  = "hashicorp/http"
      version = "3.4.5"
    }
  }

  backend "azurerm" {
  }
}

provider "azurerm" {
  resource_provider_registrations = "none"
  subscription_id                 = var.init.subscription_id
  tenant_id                       = var.init.tenant_id
  features {}
}

provider "azapi" {
  enable_preflight           = true
  skip_provider_registration = true
}

provider "http" {
}

As you can see, I am using the AzureRM, AzAPI, and HTTP providers. Focusing on the AzAPI provider, you can see that I have enabled preflight checks and instructed the provider to skip registering Azure resource providers. Preflight validation helps to catch errors in the code and configuration before they cause a failed deployment. Check the AzAPI Terraform Docs for more information on these and additional options.

TIP

In a production implementation, I recommend creating opinionated modules for each type of resource. This allows for easy reuse and ensures that resources are created in a consistent, standardized manner that adheres to best practices and security recommendations.

When discussing Infrastructure as Code with my clients, I always recommend exposing only the variables that you want module consumers to be able to change. If there are any configuration non-negotiables, make sure they are set in the module and not exposed as variables. By authoring your own modules, you can tailor them to suit the needs of your organization.
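As a rough sketch of that idea, an opinionated resource group module might expose only a name and a location, validate the location against an approved list, and hard-code the tags the organization insists on. The module path, variable names, allowed regions, and tag values below are all hypothetical.

```hcl
# modules/resource_group/main.tf (hypothetical module layout)

variable "name" {
  type        = string
  description = "Name of the resource group."
}

variable "location" {
  type        = string
  description = "Azure region, restricted to an approved list."

  validation {
    condition     = contains(["eastus", "westeurope"], var.location)
    error_message = "Location must be one of: eastus, westeurope."
  }
}

resource "azurerm_resource_group" "this" {
  name     = var.name
  location = var.location

  # Non-negotiable: set inside the module and not exposed as a variable.
  tags = {
    managed_by = "terraform"
  }
}

output "id" {
  description = "Resource ID, for use as a parent_id by other resources."
  value       = azurerm_resource_group.this.id
}
```

A consumer of this module can pick the name and one of the approved regions, and nothing else. An unapproved region is rejected by the validation block before any resource is created.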

This is another deeper discussion for another article, so stay tuned for that!

Next up, let’s look at the main.tf where we define the Resource Group and the AKS Cluster.

This is a very basic example meant to show you how to use AzAPI and AzureRM together. As you can see below, I create the Resource Group using the AzureRM provider and then place the AKS Cluster in that Resource Group using the AzAPI provider.

resource "azurerm_resource_group" "resource_group" {
  name     = "rg-kc-lab-azapi"
  location = "East US"
}

resource "azapi_resource" "aks_cluster" {
  type      = "Microsoft.ContainerService/managedClusters@2024-09-02-preview" # (1) Specifying the API version for the AKS resource.
  parent_id = azurerm_resource_group.resource_group.id # (2) Obtaining Resource ID from the AzureRM-created Resource Group above.
  identity {
    type = "SystemAssigned"
  }

  name     = "aks-kc-lab-azapi"
  location = azurerm_resource_group.resource_group.location # (2) Obtaining Location from the AzureRM-created Resource Group above.

  body = {
    sku = {
      name = "Base"
      tier = "Standard"
    }
    properties = {
      apiServerAccessProfile = {
        enableVnetIntegration = true # (3) Enabling API VNet Integration. Option not available in the current version of the AzureRM provider.
      }
      agentPoolProfiles = [
        {
          name   = "systempool"
          vmSize = "Standard_D2ds_v5"
          mode   = "System"
          count  = 1
        }
      ]
      dnsPrefix = "kclabazapi"
    }
  }
}

I’ve added a few comments above at key points that I want to call out. Let’s cover them in a little more detail below:

  1. Ensuring we are using the correct API version for the resource being created with AzAPI is critical. The latest day zero features are gated behind newer API versions. Check out the Azure REST API Reference for the latest API versions and the features they support.
  2. It’s common practice in Terraform to reference other modules or resources to obtain resource-specific information. This avoids hard-coding and helps in situations where the information may not be known until runtime. In this use case, I am obtaining the Resource ID and Location from the Resource Group.
  3. This is the key configuration item that enables the API Server VNet Integration. Changing to an older API version or using AzureRM to create it would not expose this feature for use.

The end result of running the above is an AKS Cluster with the API Server VNet Integration feature enabled with default configuration. You can confirm this by looking in the node resource group for the AKS Cluster and checking for a subnet named aks-apiserver-subnet, which will be delegated to Microsoft.ContainerService/managedClusters. It is possible to bring your own subnet for this feature by specifying the subnet Resource ID in the apiServerAccessProfile block.
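One way to check the result without leaving Terraform is to read the cluster back and export the relevant part of the response. This is a sketch that assumes the azapi_resource.aks_cluster resource from the main.tf above; the data source name and output name are placeholders.

```hcl
# Reads the deployed cluster back through the same preview API version.
data "azapi_resource" "aks_cluster_readback" {
  type        = "Microsoft.ContainerService/managedClusters@2024-09-02-preview"
  resource_id = azapi_resource.aks_cluster.id

  # Export only the API server access profile from the response body.
  response_export_values = ["properties.apiServerAccessProfile"]
}

output "api_server_access_profile" {
  description = "API server access profile as reported by Azure."
  value       = data.azapi_resource.aks_cluster_readback.output.properties.apiServerAccessProfile
}
```

After an apply, running terraform output api_server_access_profile should show the profile Azure reports for the cluster, including the VNet integration setting.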

TIP

Make your IaC authoring experience top class by installing the VS Code extensions for your chosen language. Once you do, CTRL+SPACE will become your best friend. With these extensions installed, this key combination will bring up a context-sensitive list of options. This is great when you can’t remember what the correct property name is or when you aren’t sure what the required and optional properties are for a resource.

Example of the AzAPI Extension in operation

Thank you!

Thanks for reading and learning how to Terraform AKS at the speed of Cloud Native! Please feel free to reach out if you have any questions or comments. Check back soon for a deeper dive into how I use Terraform and specifically the AzAPI provider to quickly deploy AKS lab environments for testing, experimenting, and learning.

Information

Most commands shown in these articles will be Azure CLI running within a Linux Shell. As Azure CLI can also run in PowerShell, these commands are transferable.

Remember to swap the “line continuation character” when switching between Shells.

Use \ (backslash) for Linux-based Shells and ` (backtick) for PowerShell