Since the day Kubernetes 1.0 was released in 2015, the container orchestration tool has been rapidly adopted. The major cloud providers in particular have embraced it, each offering their own specific technologies for usage of Kubernetes on their platforms.
For Azure, this technology is the Azure Kubernetes Service (AKS), which has been designed to bring all the strengths of Kubernetes while leveraging the speed, capacity, and flexibility of the Azure cloud platform.
To break down just what makes AKS so powerful for enterprises, we held a webinar featuring Redapt experts that served as a deep dive into the technology itself. We also explained how using AKS can greatly accelerate how quickly your organization brings new products to market.
You can watch the webinar in the player below. There’s also a full transcription. They’re worth checking out if you’re interested in Kubernetes, Azure, and how to adopt modern development tools in order to carve out a competitive edge.
Video transcription:
Welcome, everyone, to Reach the Market at Light Speed with Azure Kubernetes Service. Today we're going to be hearing from Jerry Meisner, who is our practice area lead for DevOps. Quickly, we'll just discuss a little bit about Redapt. Redapt is an end-to-end technology solutions provider that brings clarity to dynamic technical environments. We have more than 20 years of technology experience helping organizations navigate through challenges and obstacles to accelerate business growth. We have deep cloud expertise, and we are excited to present to you all today about Azure Kubernetes Service. So go ahead, Jerry.
Perfect. Thanks for the introduction, Sarah. I will leave this slide up for just a moment. We're talking about, again, as Sarah mentioned, Kubernetes service. We'll talk a little bit about containers and the benefits of containers. But I'm Jerry Meisner. I'm the DevOps practice lead. I've been working with Kubernetes since version 1.4, and I've helped many customers make the leap to containers and then eventually Kubernetes administration. So, today, we're going to cover the benefits of containers. We're going to talk a little bit about utilizing the Kubernetes service, and then I'm going to actually do a little bit of a technology demo and kind of show you a sample application being deployed, and kind of some ... what do the portal options give you, and we'll talk through some of that.
So the first thing we'll talk about is some of the benefits of containerization. So there are really three main benefits of containerization. One of them is the fact that containers are going to be lightweight. There is also a standard that has been developed. And then they are relatively secure compared to running multiple processes on the same virtual machine. So a container image is a lightweight, standalone, executable package of a piece of software that includes everything needed to run it. So this includes the code, runtime, system tools, system libraries, settings, et cetera, and they're available for both Linux and Windows based applications, and your container will generally run the same regardless of the environment. And that's going to be one of the key benefits that the standardization kind of brings to the table.
Now there are a couple of things. So what makes them lightweight? The fact that the containers are all running on the same host and can share a common kernel means there's less overhead. You're not running a kernel and an operating system for every single application that you need to have isolated on a particular virtual machine. So all the applications that are running as containers on a single hardware or virtual machine are leveraging the same kernel, same operating system.
From the secure aspect, these are going to be isolated processes. So if you're running three or four containers in the same virtual machine, and for example, one of your containers on that virtual machine gets compromised, if something does go wrong with that particular container or it is compromised, the issues will be limited to that particular container. So you don't have the ability to go from container to container within the virtual machine, simply through HTTP requests. You'd have to have secrets and keys and identities and things like that in order to get fully in.
When we're talking about Docker or containers ... and generally you hear Docker come up a lot ... Docker is basically one of the runtimes ... it was kind of one of the early runtimes. There are also some command line tools and some things that they kind of make easier for the development of containers. There are additional runtimes such as runC or containerd that can be the runtime environment for your container. They're going to be Docker compliant, they're just going to be a little bit lighter weight than the traditional Docker daemon.
Now there are three main components to Docker. So you have the software which is going to be the Docker daemon running on the host that you would tell to run your container. Then there are going to be the objects such as the containers and the images themselves. And then you're going to have a concept of a registry, and the registry basically acts like Artifactory, if you're familiar with that concept. Essentially, you're pushing these images or these artifacts up to the registry, and then when we tell Kubernetes or when we tell Docker to run them, we are going to basically tell it where to look, so "Hey, go grab this image from this registry at this location." So this is a general review of containers.
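To make the "go grab this image from this registry at this location" idea concrete, here is a minimal Python sketch of how an image reference breaks down into a registry, a repository, and a tag. The function name and the example registry host are hypothetical, and the parsing is deliberately simplified: it assumes the common rule that the first path component counts as a registry only if it looks like a host name, and it does not handle digests.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag' into its parts (digests are not handled)."""
    first, sep, rest = ref.partition("/")
    # The first path component is treated as a registry host only if it looks
    # like one: it contains a dot or a port, or it is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        # Assumption for this sketch: fall back to Docker Hub when no host is given.
        registry, remainder = "docker.io", ref
    repository, colon, tag = remainder.rpartition(":")
    if not colon:
        # No tag was given, so the conventional default tag applies.
        repository, tag = remainder, "latest"
    return ImageRef(registry, repository, tag)


if __name__ == "__main__":
    # "example.azurecr.io" is a made-up registry host used only for illustration.
    print(parse_image_ref("example.azurecr.io/team/web:1.4.2"))
    print(parse_image_ref("nginx"))
```

The same three parts are what a container runtime needs in order to know where to pull from and which version of the image to run.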
The analyst firm Gartner predicts that by 2023, 70% of organizations will be running three or more containerized applications in production. So we're seeing a very quick adoption of containers. When you think about the overhead of managing virtual machines, especially for stateless services, for example, there are a lot of benefits that can be had just by containerizing, moving up into a container. Another benefit would be for cost utilization. So you can have three processes running on the same virtual machine that are isolated from each other and make better use of the CPU and memory that that virtual machine is hosting rather than having one of the processes sit idle and not be leveraging the CPU and memory of the node. So there are a lot of benefits and organizations are realizing those.
From the developer and engineering standpoint, moving applications to a more agile development environment leads to a few things. One of them would be an application portfolio assessment. So having the ability to go to a single place and kind of check logs, for example, or to check a GitHub repository and see what dependencies are built into the application, as well as tracking history and source control, that type of thing. All those things are going to be possible.
The next benefit that we have here is selection and implementation ... proof of concept modernization work. So when you have the ability to quickly bring up a new Flask application, you don't have to worry about bringing up the virtual machine, making sure that the virtual machine is patched, making sure that the correct Python is installed on the virtual machine. Those types of things are sort of removed from this process. So once you have a good base candidate for a Flask container, for example, you can simply swap out the Python code and, for the most part, the Flask application that you prototyped will work relatively quickly. This allows you to again, proof of concept, try some things out, rapidly iterate, that type of thing. And developers really like that aspect of it.
The third aspect is full assistance during application migration and Day 2 operations. So when you think about managing virtual machines, there are several cloud providers, there are several on-premises solutions for containers, because it's sort of a standard. It's going to work the same way whether it's in the cloud or whether it's on prem. So the benefit of this is you have the community which has gone ahead of us and developed some software that runs in containers that we can easily deploy on our virtual machines. There are also tools and things that have been developed by third parties, which make life a lot easier for folks that have adopted containers.
Another benefit is if you're going to potentially leverage a consulting firm such as Redapt, having a standard on containers is really easy for us to kind of ramp up on and help you guys advance operations and things like that, because we don't necessarily have to get involved with having to set up virtual machines. We have the instructions for the container, we have the artifacts that need to be deployed, and then we have the instructions for configuration, ideally. So the environment variables, configuration files, that type of thing, which is really easy to hand off compared to tribal knowledge of having to install software onto a virtual machine or a bootstrapped script that has to be modified as things advance.
So technology and business decision-makers are also seeing some benefits, so they’re investing in platforms that are more agile for application development. So they're finding it to be more cost-effective. Again, the overhead for deploying an application to a containerized environment is relatively low compared to that of a virtual machine, especially when you think about secure environments where maybe you have teams that are provisioning the virtual machines, you have a team that installs the software, you have a team that does the security hardening aspects, that type of thing. When you can boil those things down into a container image, and then, you know, you have the processes removed so you don't have to waste the time [inaudible] the virtual machine. The virtual machine should be there, or a pool of VMs should be there to run our container on it. So again, eliminating overhead is a big benefit for business leaders.
Easier to scale as needed. So again, we're kind of going back to that virtual machine process. If you needed three or four or five instances of your application, you'd have to bring up those virtual machines individually. You could leverage something like autoscaling groups, but then you have to kind of ensure that everything is immutable within the autoscaling group, and then the autoscaling group logic might be different between AWS and Azure and Google, for example. Certainly, on prem there are some challenges for autoscaling workloads unless you're running something like OpenStack or an on-premises cloud platform. The benefit of being able to scale at the container level is that we don't necessarily have to scale the hardware. We can scale the processes that are running across the pool of hardware, and so we can achieve replicas, high availability, that type of thing, without the additional need for tons of overhead with the operating system, additional Linux kernels, things like that.
The third big bullet that business decision-makers see is this creates a culture of increased innovation. So you know, when we talk about the developers being able to rapidly prototype something, for example, this will give ... maybe a developer within your organization has a really good idea for a new feature or something along those lines. And rather than having to go through all the traditional pipelines, they can immediately start working on a container and run it through the container pipeline process, which may deploy the workload. Maybe there is a request for opening up a firewall rule or exposing a public IP. But at the end of the day there's a lot less overhead when you're working with a container deployment than there is with a traditional VM deployment, which usually spans multiple teams.
So with containerized applications you can avoid the slow pace of waterfall development. So if you're familiar with the waterfall development process, it sort of starts with a heavy design phase in the beginning where you ... you know, you have to have all the requirements, all the functionalities sort of really well-defined. And then the waterfall approach basically starts at the beginning, constantly working until all of the design considerations have been met. And then you have an application software at the end of the product ... or at the end of the development cycle, rather.
Now traditionally, this has kind of been the way it's worked in the past. However, we are kind of moving more towards an agile world where instead of designing everything up front, you may have some initial designs up front, but you're going to start with generally a proof of concept and minimum viable product, if you will. And then you're going to do an iterative process. So you know, the waterfall approach lends itself really well to binaries that you had to deliver to clients, for example, on a regular interval. With the migration to SaaS solutions, those things become less important and what becomes more important is being able to roll out new features relatively quickly. And so, that's where reducing the lengthy times between development and deployments is going to come in handy.
The other thing that I mentioned is the risk of downtime is reduced quite a bit. So certainly when you think about running virtual machines, you're going to have those three VMs, all have their software installed. If something happens on one of those virtual machines, there may not be any sort of operating procedures to restart the processes and things like that. And certainly, if there isn't one, you'd want to have to kind of go through that process of developing them.
In the Kubernetes world, it's a desired state configuration. So essentially, when you're running a set of three containers across your node pool, if for any reason one of your three containers dies, maybe the node crashes itself, the entire host crashes, or you know, some other effect happens. Maybe the process within the container crashes because it ran into a stack trace error or something along those lines. But the idea is that with Kubernetes you have the ability to restart workloads. So if the nodes are healthy we can restart those workloads. If the node stops reporting, then we can move that workload to another node. And so you never really face a scenario where you have less than two or three instances at a time running, and so that kind of increases your uptime.
So this is kind of a little bit of a recap. Again, we talked a little bit about how accelerated development scales your applications in a cost-effective way, rapidly makes improvements to your software and products, and then minimizes downtime for your applications as well.
Now we'll talk a little bit about the benefits of Kubernetes for container orchestration. So containers being a standard, open platform that can kind of run in all clouds and on prem, Kubernetes is going to be an orchestration platform for those containers that run in all the major clouds and on prem as well. There are certainly some supported versions of Kubernetes. AKS is a managed Kubernetes solution, and we're going to talk a little bit about that one today. But the idea is that if you can deploy into Kubernetes in one cloud, you should be able to deploy into Kubernetes on another cloud, assuming you're not leveraging dependencies on that particular cloud.
For example, take a custom database solution that doesn't exist anywhere else. If you're using that in one cloud, you won't be able to leverage that in another cloud. You're still going to have to do a little bit of application modernization in that regard. But you do have multi-cloud capability in that if your application isn't leveraging cloud-specific technologies, then you can deploy the exact same container to multiple environments and just configure it to have different connection strings, that type of thing.
And then with Kubernetes, we're going to get increased developer productivity. So having the ability to build and run containers locally in the development environment is great, but Kubernetes allows you to create potentially dev namespaces or dev clusters that will allow developers to deploy directly to those environments. So you may have some gates or some checks for the QA or the production environment. But for the dev environment you don't necessarily need to have those types of checks. Instead, you can allow the developers to push code directly to, for example, their own namespace, or maybe their team's namespace, so they can run some integration tests before they approve the code for the next level in the life cycle.
So now we'll talk a little bit about utilizing the Kubernetes service. So the Kubernetes service, AKS in this particular case, the Azure Kubernetes Service, is looking to really provide you with four main things. One is a full enterprise-ready Kubernetes engine. So I'm going to talk a little bit about the architecture of Kubernetes and why this is such a good benefit. The second is elastic provisioning of capacity without managing infrastructure, so this sort of comes out of the box with Kubernetes, and certainly if you have a cloud provider connected we can run additional virtual machines and things like that. So you don't have to worry about managing the infrastructure, you sort of set your minimums and maximums and allow the workloads to scale accordingly.
The third bullet we have here is an accelerated end-to-end development experience. So I've mentioned this a little bit, but there are some integrations with various tools. For example, Azure Monitor will actually show ... in the portal we can actually dive into individual containers running across our cluster. You take a look at their resource consumption and various things like that. Azure DevOps is an example I'm going to show as well as part of the tech demo. Basically, Azure DevOps is a pipeline tool that can deploy workloads to Kubernetes, so you can have Azure DevOps build the container and deploy it to a specific cluster within a specific namespace. Now the first one here is the Visual Studio Code Kubernetes Tools. Certainly if you're using VS Code as an IDE, there are some Kubernetes integrations there as well that may help you craft your YAML objects or help you deploy your Helm charts to a remote cluster. Those types of things are all available as part of this Visual Studio Code plugin.
The fourth bit is identity and access management with Azure Active Directory and Azure Policy. So typically in Kubernetes environments you'll have some static users. In fact, if you want to have a user management system at all, you have to really kind of roll your own and then essentially have those identities mapped to users within the cluster. And then you assign roles and permissions to those individual identities. With AKS, they have native integration with Azure Active Directory, so you don't have to set this up. This will already be sort of set up automatically. We can set up Azure Active Directory and then configure some Azure policies that prevent users from taking certain actions on the clusters based on their role or based on their identity.
So with AKS, we can easily define, deploy, and debug our applications. You know, basically, when you talk about redeploying applications traditionally, your container is immutable. So, essentially, if we need to make an update, we can make an update in the code, and then build our container, and then redeploy our container with a similar configuration, maybe with some slight configuration changes as part of an upgrade or part of a feature enhancement with our software. So automatically containerize your applications ... So this is probably a little bit of a loaded statement, but really what we mean here by automatically containerize your applications is that ... with Azure DevOps and with tools that kind of integrate, you can actually build your containers using Azure AKS. Certainly, you have to build the Dockerfile and still have to understand the basic concepts. The tools are not going to be able to create a container from nothing, at least right now.
The third bullet is to apply automated routine tasks to your AKS cluster for CI/CD pipelines. So again, this could be building the container, it could be deploying to a dev environment first, and then later deploying to a test and then running some integration tests against that environment, and having a manual approval step to deploy things to production once all those tests have been passed. You can also gain visibility into your environment with the Kubernetes resource view, so I'll show this in the portal as well, kind of taking a look at some of the resources that are available, and then utilize the fully integrated Azure Cost Management tools to control cost. So we talk about managing virtual machine costs and essentially your containers are going to be running on virtual machines. The Azure Cost Management tools will give you the ability to kind of understand what this cluster is costing, and you can also take a look at node utilization and see if you're over provisioned. So if you need to lower the hardware size, that type of thing, all of those things are going to be exposed to you through the Azure portal. There are also some security recommendations, so if you're familiar with Azure Security Center, there are some integrations there that will pop out some helpful information, and I'll show you an example of that as well.
So that being said, we are going to pop out of this presentation, and we are going to take a look at our demonstration. So the first thing I want to talk about is what is Kubernetes, and why is AKS such a good solution, or why do I need to run AKS? Why couldn't I just run Kubernetes myself? What are the benefits that the managed offering provides? So what you're looking at here is an architecture overview. On the left-hand side in the blue box, we have the control plane which includes a lot of the system components of Kubernetes, and I'll talk a little bit about those. And then on the right-hand side, we have some green boxes that include the nodes. So if you want to consider, for example, these two nodes as being part of a node pool, maybe they have a similar VM size, they have a similar operating system, that type of thing, this might be our single node pool.
Now within Kubernetes, we can have multiple node pools of various VM sizes, of various hardware capabilities such as GPUs and things like that. But there are some common components, so if you start kind of at the bottom and look at the kubelet, the kubelet is sort of the agent of the system. So on every node in the cluster, you're going to be running the kubelet, and the kubelet is essentially the go-between for the API and the Docker daemon or the container runtime on the individual worker machine.
Going up, we see the container runtime. Again, the container runtime is going to be that Docker daemon or the runC or containerd runtime that's installed with Kubernetes. Over time, this is going to become containerd. Again, it's not something that you need to focus too much on. Going one layer up, we see the kube-proxies. The kube-proxies are going to be doing a lot of the service discovery work, so there's internal DNS, there's potential for container network interfaces that do various things, such as Calico and Flannel. Essentially the kube-proxy acts as the router service between the individual nodes. So if we have pod one here that needs to talk to this pod, it would essentially send a request out, the kube-proxy would catch that request and forward it to the appropriate kube-proxy which would then forward it to the appropriate pod on the destination node. There are a couple of ways of handling which pods are related to which service, and we'll talk a little bit about that in this session as well.
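One common way Kubernetes relates pods to a service is through labels and selectors: a service selects every pod whose labels include all of the selector's key/value pairs. The Python sketch below illustrates only that matching rule. The pod list, IP addresses, and function names are made up for the example, and the real bookkeeping is done by the Kubernetes control plane rather than by code like this.

```python
from __future__ import annotations


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True when every key/value pair in the selector appears in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: dict[str, str], pods: list[dict]) -> list[str]:
    """Return the IPs of the pods that a service with this selector would target."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical pods spread across two nodes.
PODS = [
    {"ip": "10.244.0.5", "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.244.1.7", "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.244.1.9", "labels": {"app": "api", "tier": "backend"}},
]

if __name__ == "__main__":
    print(endpoints_for({"app": "web"}, PODS))
```

A request sent to the service can then be forwarded to any of the returned addresses, regardless of which node each pod is running on.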
So on the left we'll go through some of the control plane components. Now again, everything in this blue box is managed by Azure. So you don't have to worry about maintaining etcd or maintaining the API or maintaining the cloud provider. Those things are all sort of handled by AKS. Now, the cool thing about this is that etcd is not an easy service to maintain. It uses the Raft consensus algorithm, so it does have to maintain a quorum. For example, if you have three instances of etcd, if you lost any two of them, you'd have data corruption or data loss. Similarly, if you had five and you lost three you would have data loss. If you had six and lost three you'd have data loss. You need to have more than 50% of the etcd nodes available at any given time in order to kind of keep it alive. So Azure is going to be handling that for you in the back end.
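The quorum rule described above is simple arithmetic, and a few lines of Python make the three-, five-, and six-member cases easy to check. This is only an illustration of the majority rule, not etcd code, and the function names are invented for the sketch. Note that what a cluster strictly loses when it drops below quorum is the ability to commit new writes; whether data is actually lost depends on whether the remaining members or a backup can be recovered.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


def has_quorum(members: int, failed: int) -> bool:
    """True if the surviving members still form a strict majority."""
    return members - failed >= quorum(members)


if __name__ == "__main__":
    for size in (3, 5, 6):
        print(size, "members: quorum", quorum(size), "tolerates", failures_tolerated(size))
```

The six-member case shows why even-sized clusters add little: six members tolerate the same two failures as five, because a majority of six is four.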
On top of maintaining etcd, they are going to be doing automatic backups for you. For example, taking a backup of your cluster so that if anything needs to be restored within the control plane, they will be able to automatically restore that for you. You also have the API, so the API is directly storing and reading data from the etcd service, and then you have the individual controller manager. So as you interact with the API, you will say, "Schedule my pod," and that will get created in etcd. The controller manager that's looking for the pods will see that entry in etcd through an API server call, and then it will tell the API to tell this kubelet, for example, to run my single pod.
Now, looking at these services over here on the left, the cloud controller manager, this is going to be the one that basically the Azure service principal will be integrated with. So essentially, whenever you create a service of type LoadBalancer, or you create a service that requires a disk for example, the cloud controller manager essentially makes those Azure API calls on your behalf, attaches the disk to the appropriate nodes, or attaches the load balancer to the appropriate nodes, that type of thing. So you can think about this cloud provider network edge, the load balancer gets created, attached to all the nodes. There's some Kubernetes magic happening here with regard to how this load balancer knows which node it can go to and which service it lands on, and I’m not going to go into that as part of this session.
There are some additional components, the controller managers ... There are several controllers within the ecosystem of Kubernetes, and they are kind of related to the objects that you can push to the API. So we'll talk a little bit about deployments, StatefulSets, jobs, cron jobs very shortly, but essentially each of those has a controller that's sort of looking for a desired state. So if you put something into etcd, the controller manager is going to look in etcd, it's going to look at the real state of what's running, and if there are any discrepancies it's going to try and make those changes. So if it needs to delete something or if it needs to add something, it will know by looking at etcd and comparing that to the current state.
The scheduler is the next piece here. The scheduler is basically going to be looking at your pod definitions, your pod templates, and it's going to be deciding which node to run the workload on. Now, typically it's going to be looking at the CPU and memory requested and then looking at this node to see if it has enough CPU and memory. But there are other things it can check for, such as ... you know, we can have node pools again that have GPUs. If our workload requires GPUs, we would have to define that as part of our pod template, and then when the scheduler sees that in the definition, it will know that if node pool one doesn't work, we need to put it on node pool two which has GPUs.
And this is done through a taints and tolerations mechanism. I'm not going to go into that as part of this webinar, but just kind of understand that it is possible to have multiple node pools with different configurations and have the workloads scheduled in the way that you'd prefer them to. Additionally, there are things like pod anti-affinity, where we can have workloads prefer to schedule away from each other. So if we had three instances of a web server and we had three infrastructure nodes, we can define an anti-affinity rule that says, "Hey, you guys prefer to be away from each other," so that way Kubernetes will automatically schedule one here, one here, and one here. That way if we have a hardware failure, we still have two instances of our application running for high availability purposes.
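The compare-and-correct behavior described above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: the desired and actual states are plain dictionaries mapping a workload name to a replica count, and the function only returns a list of actions instead of calling any real API. The names are hypothetical and do not correspond to actual Kubernetes controller code.

```python
from __future__ import annotations


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compare desired and actual replica counts and list the corrective actions."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        want = desired.get(name, 0)  # workloads absent from desired state should not run
        have = actual.get(name, 0)   # workloads absent from actual state are not running
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    return actions


if __name__ == "__main__":
    # One web replica has died, and a leftover job is no longer wanted.
    print(reconcile({"web": 3}, {"web": 2, "old-job": 1}))
```

A real controller runs this kind of comparison continuously, so a crashed container or a lost node simply shows up as a new discrepancy to correct.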
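As a rough illustration of the scheduling decision described above, the following Python sketch filters nodes by free CPU, memory, and GPUs, then picks one of the nodes that fit. The data classes, node names, and the "most free CPU wins" scoring are all assumptions made for this example. The real Kubernetes scheduler applies many more filters and scoring rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_mib: int
    free_gpus: int = 0


@dataclass
class PodRequest:
    cpu_millicores: int
    memory_mib: int
    gpus: int = 0


def fits(pod: PodRequest, node: Node) -> bool:
    """Filter step: the node must have room for everything the pod requests."""
    return (
        node.free_cpu_millicores >= pod.cpu_millicores
        and node.free_memory_mib >= pod.memory_mib
        and node.free_gpus >= pod.gpus
    )


def pick_node(pod: PodRequest, nodes: list[Node]) -> str | None:
    """Score step (toy rule): among the nodes that fit, prefer the most free CPU."""
    candidates = [node for node in nodes if fits(pod, node)]
    if not candidates:
        return None  # the pod would stay pending until capacity appears
    return max(candidates, key=lambda node: node.free_cpu_millicores).name


# Hypothetical nodes: one from a general-purpose pool, one from a GPU pool.
NODES = [
    Node("pool1-node-a", free_cpu_millicores=3000, free_memory_mib=8192),
    Node("pool2-node-a", free_cpu_millicores=2000, free_memory_mib=16384, free_gpus=1),
]

if __name__ == "__main__":
    print(pick_node(PodRequest(500, 512), NODES))
    print(pick_node(PodRequest(500, 512, gpus=1), NODES))
```

A pod that asks for a GPU can only land on the second pool, while an ordinary pod goes wherever the scoring rule prefers.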
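For readers who want to see what these settings look like in a pod template, here is a hedged Python sketch that builds the relevant fragment as a dictionary and prints it as JSON. The field names follow the Kubernetes pod spec, but the taint key and value (`sku=gpu`), the app label, and the image name are hypothetical values chosen for illustration.

```python
from __future__ import annotations

import json


def pod_template(app: str, image: str, tolerate_gpu_taint: bool = False) -> dict:
    """Build a pod template whose replicas prefer to land on different nodes."""
    spec: dict = {
        "containers": [{"name": app, "image": image}],
        "affinity": {
            "podAntiAffinity": {
                # "preferred" is a soft rule: the scheduler tries to honor it
                # but will still place the pod if it cannot.
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app}},
                            # Spread across nodes, identified by host name.
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }
                ]
            }
        },
    }
    if tolerate_gpu_taint:
        # Assumes the GPU node pool was tainted with sku=gpu:NoSchedule.
        spec["tolerations"] = [
            {"key": "sku", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
        ]
    return {"metadata": {"labels": {"app": app}}, "spec": spec}


if __name__ == "__main__":
    # The image name below is a placeholder, not a real registry.
    print(json.dumps(pod_template("web", "example.azurecr.io/web:1.0"), indent=2))
```

With three replicas and three nodes, the soft anti-affinity rule is what nudges the scheduler toward one replica per node. The toleration only permits a pod to run on a tainted pool, and by itself does not force it there.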
So, kind of having that overview of the Kubernetes cluster is beneficial. What we're going to talk about now is how to create a Kubernetes cluster and kind of what are the options that AKS exposes to you through the portal. So certainly, the options that are available to you here through this portal are less than what's going to be available to you through the Azure API. I traditionally recommend that you create clusters through something like Terraform or ARM Templates which will kind of give you the more fine grained control of creating these resources.
Now on the basics tab, you'll have ... typical Azure resource form here ... we've got a subscription that we're going to be targeting, potentially targeting a particular resource group, and we also want to have a name for our cluster. In my case, my cluster is jmeisner-test-aks. You can kind of see that over there on the left. But I can do this, you'll notice that it will save ... So it will allow me to create that. I can do it too here, you also have the ability to select the region. Some of the regions have availability zones. For example, if I switch to US East, you can see I'm putting in zone 1, 2, and 3. In the West region, West US, we do not have any availability zones recorded yet, so you don't have the ability to choose there. Now the main reason for choosing across the availability zones is certainly for high availability, fault tolerance, that type of thing, and Kubernetes doesn't really pool things around that.
You also have the ability to select which Kubernetes version you want to run. So understand that Kubernetes itself is a software system developed by an open source community, and as new features are developed they are going to be upgrading Kubernetes itself. And Azure, while they contribute to the CNCF, they are not the Kubernetes developers. They are just the developers of AKS, which is a managed solution. So as new features become available in Kubernetes, they will be appearing in later versions of Kubernetes, and you'll be able to upgrade later, but when you create your cluster you'll have to pick a version. So if I wanted, say 1.19.7, I could start there. Certainly for production, we would recommend that you avoid preview builds if at all possible. Typically in dev you might test a preview build before it needs to be promoted to production.
This, again, down here at the bottom is the primary node pool. So if you think about it in our diagram, we just had two nodes, for example. But we could have three nodes. In fact, I'd recommend the minimum of three nodes, and they're going to recommend the same thing as well, especially if they're production. Then you have the ability to change the node size, so again, this is a two-to-eight CPU-to-memory ratio. If I have a bunch of workloads I'm going to be putting on these nodes, then I know that maybe I want to keep the two-to-eight ratio but I want to double the size and I can use the S3B2s, for example, to select that and have that work.
So that was the default node pool. We can add additional node pools and kind of here is where we have the ability to do that. So if I had ... let's say I wanted to put a GPU node pool in here, then I could go in here and choose my size, and we could do ... oh gosh, I could go in here and find the GPU disk. I'm not going to waste time doing that, but I could essentially go in here and find a virtual machine hardware type that matches up. Maybe I want high memory or high compute, for example, for this particular pool, and we could get that set up and squared away.
Now taints and tolerations aren't available in the UI. Again, it's one of those things I'd recommend creating through a Terraform process or through an ARM template. So you can taint the nodes, for example, so that workloads that don't require a GPU don't get scheduled there. Again, that's a more advanced topic than this webinar is going to discuss.
The third tab here is the authentication tab. So we've got a couple of authentication methods. You can give a service principal. So when we talked about this overview and we have this cloud controller manager that's making calls to create load balancers and assign disks, essentially that's what this ... this is the identity that's going to be taking care of that. We can just choose a system-assigned managed identity, and then Azure will kind of automatically handle the role assignment, and automatically handle the credential rotation, that type of thing. I typically recommend this for development environments. A service principal might come in handy if you really need to make sure that the cloud provider can't do specific things like create public IPs. Then you might create the service principal beforehand and pass it in here. I do recommend, if this is a first-time installation, that you either use a system-assigned identity or allow it to create a new service principal by checking this box. Certainly, you can go through and modify the permissions of the service principal that gets created if you need to.
Going down a little bit further, we can see the role-based access control, the authentication and authorization. So when we talk about RBAC, there are some roles within the Kubernetes API. For example, if you want specific users to not be able to look at secrets, then you will need to turn on role-based access control and define the role-based access control resources such as cluster roles, roles, and bindings. The next kind of radio option here is whether you want to use the managed Azure Active Directory. Typically, for most folks that are using Azure, they are probably already leveraging AD, so we typically do enable this. And then you have the ability to choose an administrator group for the particular cluster. So if you assign admin to a particular AD group, then anybody in that group will have admin access to the cluster. They'll be able to create additional roles, additional cluster role bindings, that type of thing, using their Active Directory credentials. And I'll talk a little bit about how to connect to a cluster very shortly.
So by default, you're going to get encryption at rest. They're going to use a platform-managed key, so you don't even have the ability to turn this off. So you could be maintaining some compliance there, especially if you have some regulatory concerns; you're already going to get encryption at rest. Now the option that they have available to you is to provide your own key. So if you want to provide your own key, if you don't want Azure to have the key to your data, then certainly you could check this option. Again, this is going to be more geared towards regulatory concerns, making sure that you aren't sharing anything with Azure or Microsoft.
From a networking standpoint, there are a couple of different CNI configurations. I typically recommend kubenet if you're going to be doing many, many containers on an individual node or across a few nodes, a small node pool. Azure CNI has some benefits, including the ability to have pod IP addresses that are reachable from outside of the cluster, so if we wanted to hit a pod directly from outside the cluster, we could. Essentially what the CNI does is it starts using the address space of the subnet that your node pool is deployed into, and it assigns the node that many IP addresses. So for every node that comes up, there might be 30 IP addresses associated with that if the max pods is set to 30. And so every time a new container is scheduled on that node, that network interface is mapped to that container. And so if you ever hit that IP address, that will go into that container ... into that virtual machine rather, and then down into that container. Excuse me.
The next options here are basically really more networking-related. So certainly you can deploy your worker pool and your node pool onto an existing network, or you could create a new one. Then there is a service address range, a DNS service address, and then a Docker bridge address range. So we typically recommend that you leave the Docker bridge just the same. It's going to be the same on every single host in the cluster, so as long as this doesn't conflict with anything on prem, I typically recommend leaving that the same.
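To make the address arithmetic above concrete, here is a minimal Python sketch of how quickly a subnet fills up when every node pre-allocates one address per potential pod. The function names, the example subnet, and the figure of one extra address per node and five reserved addresses per subnet are assumptions for illustration, not values taken from the webinar.

```python
import ipaddress


def azure_cni_ips_needed(node_count: int, max_pods: int) -> int:
    """Addresses consumed when each node reserves one IP per potential pod.

    Assumption: each node also takes one address for itself.
    """
    return node_count * (max_pods + 1)


def usable_addresses(subnet_cidr: str, reserved: int = 5) -> int:
    """Addresses left in a subnet after the platform's reserved ones.

    Assumption: 5 reserved addresses per subnet; adjust if yours differs.
    """
    return ipaddress.ip_network(subnet_cidr).num_addresses - reserved


# Hypothetical three-node pool with max pods set to 30, in a /24 subnet.
needed = azure_cni_ips_needed(node_count=3, max_pods=30)
available = usable_addresses("10.240.0.0/24")
print(f"needed={needed} available={available} fits={needed <= available}")
```

Running the same numbers with a larger node count shows why a small subnet can cap how far a node pool is able to scale.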
Now these two here, certainly going to recommend that you choose address ranges that aren't consumed outside of the cluster, if possible. Theoretically, these are going to be mostly internal IP addresses, but later on down the road, you may want to mesh these clusters together, where you have the cluster's ability to kind of cross-communicate with each other. And in those scenarios you want to make sure that this address range is unique across each cluster. And so that's the consideration here.
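One way to sanity-check that consideration before creating clusters is to test the planned ranges for overlap. The sketch below uses Python's standard `ipaddress` module; the range names and CIDRs are made up for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges: dict) -> list:
    """Return every pair of named CIDR ranges that overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical plan: two cluster service ranges plus an on-prem network.
planned = {
    "cluster-a-services": "10.0.0.0/16",
    "cluster-b-services": "10.1.0.0/16",
    "on-prem": "10.0.128.0/17",
}
print(find_overlaps(planned))
```

An empty list means the ranges are unique with respect to each other; any pair that is returned would need to be re-planned before the clusters are meshed.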
So you don't want to have this routable to anything on prem or to the other cluster if you can help it. Now if you know that your cluster is going to be standalone, it's never going to talk to ... mesh with any other clusters, then these things are relatively arbitrary and will remain mostly internal. There are a couple things down here with the application routing, so this one allowed the Azure ALB for Ingress management. Certainly there are some different ways of doing Ingress management. They are not going to recommend that you use this for production, so you know, they'll have it right there. This is not recommended for production clusters. So in a production environment I wouldn't even check this.
In a development environment, if you have [inaudible] or legacy system tools that you want to expose to the HTTP application load balancer, then certainly enabling that and creating those things there would be perfectly fine. Now we have a couple additional things to kind of make the environment more secure. So we can enable private clusters by doing this. Essentially it creates a private IP address for our API. So you won't be able to talk to the Kubernetes API unless you're actually on the network. So that's going to require a VPN, or you know, some kind of peering connection between the network that you're running on and the network that Kubernetes is installed on.
Additionally, if you want to leave it exposed to the internet so that it is publicly accessible, then we can set some authorized IP address ranges, so instead of having it exposed to the internet I could come in here and give it my address, and basically with this, if my requests were to come in with this IP address, the API would allow that. Otherwise it would deny that. So you've got a couple of different options for locking down the cluster. I recommend at the very least that you give this option, maybe this is the corporate NAT address or maybe it's individual developer addresses here. You can do, again, a range here. So this could be a bottom and a top.
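The allow-or-deny behavior described here can be sketched in a few lines of Python. This only illustrates the semantics of an authorized range list; it is not how AKS implements the check, and the addresses come from documentation-only ranges.

```python
import ipaddress


def is_authorized(client_ip: str, authorized_ranges: list) -> bool:
    """True if the client address falls inside any authorized CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in authorized_ranges)


# Hypothetical list: a corporate NAT range plus one developer address.
authorized = ["203.0.113.0/24", "198.51.100.7/32"]

print(is_authorized("203.0.113.42", authorized))  # inside the corporate range
print(is_authorized("198.51.100.8", authorized))  # not listed
```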
And then we have the ability to kind of take a look at the network policies. So when we talk about network policies, again these are sort of ... the options, you have a None network policy, or you can have a Calico or an Azure network policy. I tend to recommend either None or Azure, at least for [inaudible] iteration. The Calico CNI does support network policies, meaning you can restrict traffic from pod to pod based on some labels in the Kubernetes cluster. However, this sort of breaks down when you have multiple subnets and multiple worker pools. So typically I recommend either not leveraging workload isolation, so that you don't have to have firewall rules within the cluster, or going with the Azure CNI, which has a different mechanism. It basically is going to be trying to create firewall rules for your applications.
Now we can take a look at some integration. So we talked a little bit about the registry component of Docker. Basically, Azure has their own Azure container registry service. Essentially you will select the container registry and the region that your cluster is running on, and you can connect this cluster directly to that. Now this just means that you won't have to do any authentication when you're pushing and pulling images from this particular registry on this particular cluster. If you have multiple registries, this isn't going to work too well, so you would have to kind of look at the Kubernetes way of adding additional private registries. But by default, you can assign one registry which will bypass a lot of the credential fetching.
When we look at Azure Monitor, this is basically ... Do we want to see CPU, memory, metrics, that type of thing in the Azure portal? Certainly there are some costs associated with storing that data long term, but it does kind of flow right into a Log Analytics workspace, if you're familiar with that. So I should be able to show an example of that here as well.
The last piece on here is the Azure policy. So again if you want to use Azure policies for creating and managing resources within the AKS environment, then we can enable this. Now I would recommend enabling this even if you don't use it. There's a little bit of overhead associated with it, but ideally I'd recommend enabling it and then trending towards applying some common sense safeguards for the environment.
The last two tabs again are going to be very simple. Tags: this is similar to tagging other resources within Azure. You may have some things that you need to put in here, such as cost center or various things like that. And then from the Review + Create perspective you can actually validate that all of your settings are good and that you can create the cluster. In my case, I didn't supply everything. I skipped over some stuff, so it's not going to validate for me, which is fine. I've already got a Kubernetes service available to me.
So let's take a look at that. So here I drilled into my jmeisner-test-aks service, and I've made ... this is not a private cluster so it is exposed to the internet, and there are a couple of things that I can do. So from the overview page, you'll just see this here, but you will see this Connect option. And so when you click the Connect option, we will have some commands that we can run over here. So if I pull these into my command line, I can basically run those and get logged in, and then I can run this command here. Essentially, this command will download and configure my kubeconfig file. So the kubeconfig file is what you are going to be using to authenticate against the API, and the kubectl command line tool will be looking for the kubeconfig file at this location. There are other ways, such as setting the KUBECONFIG environment variable, but essentially the kubectl tool, whenever you leverage it, is going to be looking for this environment variable or this file to exist.
Now what this file has, there are a few things. It's got a cluster, it's got a user, and then it's going to have a context section. Really the context just matches the cluster and the individual user. So as you add additional clusters to your configuration or as you run this command multiple times for different clusters, each of those clusters will essentially get added to your kubeconfig file. And then if you were to rerun this command, it's really going to be changing your current context. So if you have multiple clusters that are all in the same kubeconfig file, when you re-run this command, it's essentially just going to set your context. There are ways of doing this through the kubectl tool directly, if you have a kubeconfig file.
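The relationship between clusters, users, and contexts can be pictured with a small Python sketch. The dictionary mirrors the top-level sections of a kubeconfig file, but every name and server URL in it is a placeholder, and the helper function is hypothetical rather than part of kubectl.

```python
# A kubeconfig reduced to the three sections described above.
# All names and URLs are placeholders for illustration.
kubeconfig = {
    "clusters": [
        {"name": "test-aks", "cluster": {"server": "https://test-aks.example.invalid:443"}},
        {"name": "prod-aks", "cluster": {"server": "https://prod-aks.example.invalid:443"}},
    ],
    "users": [
        {"name": "test-user", "user": {}},
        {"name": "prod-user", "user": {}},
    ],
    "contexts": [
        {"name": "test-aks", "context": {"cluster": "test-aks", "user": "test-user"}},
        {"name": "prod-aks", "context": {"cluster": "prod-aks", "user": "prod-user"}},
    ],
    "current-context": "test-aks",
}


def resolve_current_context(config: dict) -> tuple:
    """Follow current-context to the server URL and user name it points at."""
    wanted = config["current-context"]
    context = next(c["context"] for c in config["contexts"] if c["name"] == wanted)
    cluster = next(c["cluster"] for c in config["clusters"] if c["name"] == context["cluster"])
    user = next(u["name"] for u in config["users"] if u["name"] == context["user"])
    return cluster["server"], user


print(resolve_current_context(kubeconfig))

# Switching clusters only changes which context is current.
kubeconfig["current-context"] = "prod-aks"
print(resolve_current_context(kubeconfig))
```

The point of the sketch is that adding a cluster appends entries to the lists, while switching between clusters touches only the `current-context` value.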
But this is the main command line tool that you use to interact with the cluster, so we can do things like get pods, or you can look at logs, or you can explain resources and various things like that. Just kind of as an example, if I want to get all the pods in my cluster across all the namespaces, I can do get pods -A, and you'll see all the pods in my cluster. So I've got some services running in the default namespace, I've got some services running in the test namespace, and I've also got some additional components such as Istio, Prometheus, and Grafana. Basically some things that allow me to do service meshing and mTLS, a lot of additional integrations deployed onto this cluster.
Now what I want to show you guys before we ... in this section here is an example application. And in order to kind of do that, I need to talk about the controllers. So we talked a little bit about pods. Pods are sort of the smallest definition of a runtime unit within Kubernetes. Essentially the pods contain one or more containers that are all going to be scheduled onto the same host. And so you can define pods directly. So we can define a pod as a YAML definition and then directly apply that to the cluster, or we can define a ReplicaSet, and the ReplicaSet would have a set number. Maybe it's three or four replicas of this pod that need to be running, and then the deployment manages the ReplicaSets.
Basically, the way it works is that things kind of collapse up the chain. So if we do a duplicate here and we go to deployment, we can take a look at an example definition. So in here you can kind of see this template section. This is going to be your pod template. When you're looking at a ReplicaSet, the template that's part of the pod, the template in the ReplicaSet, and the template on the deployment are basically always the same ... what's called the pod spec. So, essentially, everything below this template line is a part of the pod spec and is very similar whether you're working with ReplicaSets, Deployments, StatefulSets, DaemonSets, CronJobs, or Jobs. Essentially the template is just telling ... What is the pod template? What are the things that are needed to run the container?
So again, the benefit of using a deployment, for example, is you might push out a code change, and then the code change can actually bring up a new ReplicaSet which contains three pods with your new code. And it can bring the old ReplicaSet down to two, and then bring it down to one, and then bring it down to zero. So you effectively get a rolling update strategy just by leveraging the deployment instead of managing pods directly.
Your ReplicaSet is going to be looking and say, you know, I've got three replicas here, so this will create a ReplicaSet, and it will maintain those three pods. So again, if one of these pods gets destroyed, the ReplicaSet controller will see that the desired state versus the actual state doesn't match and will create a new pod. So that's kind of the main premise that I need you guys to understand going forward here.
So kind of circling back to our pool example, I have a repo here called favorite-beer. This is a public repo, so if you're interested in kind of taking a look through this you can. It's github.com/redapt/favorite-beer. And here, under the spa-netcore-react-netis folder and under Voting, they have a Dockerfile. And so this is an example Dockerfile that includes the recipe for building my .NET Core application. There are some complex build patterns here. I'm not going to go into what those are. But essentially, we're building our DLL, and then we're pushing our DLL into a final container, and then at the very end we are running our service using our 8080 port, using this environment variable and just calling dotnet on that voting DLL. So when that happens, our application will start running, assuming all the configurations and things are there correctly.
So what we want to do is talk about how we take ... this Dockerfile ... we can build it with a docker build, we get the container created, plus you have [inaudible]. So how do we get this deployed to Kubernetes? So one of the things I didn't talk about was Docker Compose. Not going to talk about it, but essentially this is a tool used for developing local Docker images. It has its own kind of format, and in this README I talk about how to convert the compose over to the actual StatefulSet that you're going to need to run something comparable. Certainly redis is something I would recommend running, maybe using the ... redis cache service instead of running redis within the Kubernetes cluster. But I wanted to kind of show an example of a StatefulSet as well as a deployment as part of this demo.
So coming down, there is a concept of persistent volumes and persistent volume claims, and those things are mapped potentially back to the individual workloads. This is the Minikube example. Just below that I have an AKS example that is much simpler. So here you can kind of see we have a volumeClaimTemplate down here. Essentially, whenever a new pod is created, we're going to get a new volume created that has one gigabyte of storage, and it's going to attach that to whatever node my redis container runs on, and then we're using the /data path. So you know, making sure that my data persists across ... maybe redis restarts, for example.
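To illustrate the shared template, here is a minimal Python sketch that builds a Deployment and a ReplicaSet around the same pod template and prints the Deployment as JSON, a format kubectl also accepts for manifests. The image reference, the `app` label, and the container port are assumptions for illustration; the manifests in the favorite-beer repo may differ.

```python
import json


def pod_template(image: str) -> dict:
    """The part under `template`: pod labels plus the pod spec."""
    return {
        "metadata": {"labels": {"app": "favorite-beer"}},
        "spec": {
            "containers": [
                {
                    "name": "favorite-beer",
                    "image": image,
                    "ports": [{"containerPort": 80}],  # assumed port
                }
            ]
        },
    }


def workload(kind: str, image: str, replicas: int = 3) -> dict:
    """Wrap the same pod template in a Deployment or a ReplicaSet."""
    return {
        "apiVersion": "apps/v1",
        "kind": kind,
        "metadata": {"name": "favorite-beer", "namespace": "test"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "favorite-beer"}},
            "template": pod_template(image),
        },
    }


IMAGE = "registry.example.invalid/favorite-beer:1.0.0"  # hypothetical image
deployment = workload("Deployment", IMAGE)
replica_set = workload("ReplicaSet", IMAGE)
print(json.dumps(deployment, indent=2))
```

Only the `kind` differs between the two objects here; everything under `template` is identical, which is the sense in which the template "collapses up the chain".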
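The hand-off between the old and new ReplicaSets can be sketched as a simple sequence of replica counts. This is a simplified model that adds one new pod and then removes one old pod at a time; a real Deployment controller paces the rollout according to its configured surge and unavailability settings and waits for pods to become ready.

```python
def rolling_update_steps(replicas: int) -> list:
    """Return (old_count, new_count) pairs for a one-at-a-time rollout."""
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        if new < replicas:
            new += 1  # bring up one pod with the new code
            steps.append((old, new))
        if old > 0:
            old -= 1  # then retire one pod running the old code
            steps.append((old, new))
    return steps


for old_count, new_count in rolling_update_steps(3):
    print(f"old ReplicaSet: {old_count}  new ReplicaSet: {new_count}")
```

In this model the total number of pods never drops below the desired three, which is what keeps the application serving traffic during the update.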
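The desired-versus-actual comparison is the core of every Kubernetes controller. The following Python sketch is a toy version of that loop, with pod names standing in for real pods; it is an illustration of the idea rather than the ReplicaSet controller's actual code.

```python
from itertools import count


def reconcile(desired: int, actual: list, ids) -> list:
    """Return a pod list adjusted to match the desired replica count."""
    pods = list(actual)
    while len(pods) < desired:
        pods.append(f"pod-{next(ids)}")  # create a replacement pod
    del pods[desired:]  # remove any surplus pods
    return pods


ids = count(3)  # the next pod created will be pod-3
running = ["pod-0", "pod-1", "pod-2"]

running.remove("pod-1")  # simulate a pod being destroyed
running = reconcile(3, running, ids)
print(running)
```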
So this is basically what the StatefulSet would look like. I could again just create this YAML file knowing that my image is a public repository, there are no credentials necessary here, and then I could just apply that to my cluster. And so what you're seeing down here is my favorite-beer-redis service running as a pod. So if I do kubectl get StatefulSets ... maybe -n test because we're working with the test namespace. And so you can see my favorite beer, right, the StatefulSet exists, and it's actually managing the single pod. So it's just one-on-one, just a simple pod.
Now what I want to do is actually expose this service to the internet. So I don't want to expose it to the internet per se. I want to actually expose it to my other web service. So my web API server uses redis as a backend so this is actually going to be a private endpoint. So there are a couple of different service options. In this case I'm using ClusterIP, essentially just defining a service YAML and applying that to the cluster. This will get one of those service address ranges that will be internal for the cluster, and we'll also establish a DNS name, what's in the test namespace, called favorite-beer-redis.
So again, applying those we get our service, and I can kind of show you what that looks like. If I do get service -n test, you can see that the favorite-beer-redis service is created here with a cluster IP, we're exposing 6379, all the typical redis jargon. Then in here we have our actual favorite-beer ... favorite-beer application deployment example. So this is a very simple example that doesn't use any dynamic configuration. This basically just uses an alternate JSON file that I have defined within the container to show beer options. Now if we take a quick look at the source code, we can see there are a couple of service settings. There's a servicesettings_alt and a servicesettings.json file, so essentially this is just setting this environment variable, telling it where to find its settings.
Now, what I want to do is actually use another concept called a config map, so I can actually inject configs through volumes with Kubernetes. And so I can create a config map that has this sort of settings JSON as the value of one of my keys. And then I can redefine my deployment with a volume and a volume mount, essentially mounting this JSON file to /etc/config, and then telling my application where to find that configuration file. So again, in the public repo, this is my DNS name within the test namespace for my favorite-beer-redis service, so you can ... just using this name here, the name of the service. And when I deploy this application, essentially it will be able to communicate back in using those settings and deployments.
Now this is a little bit different exposure, so I want to expose this one to the internet. So this time I'm creating a service and the type is going to be of type LoadBalancer. So this will tell the cloud provider, again, that I want to reach out and create a load balancer, and it's going to attach it to a specific node port, in this case 31650, and then any time traffic comes from that load balancer into 31650, we're going to hit port 80 on our container, which is the port we have listening on our workload.
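The wiring between a config map, a volume, and a volume mount can be sketched as plain data. In the Python below, the object names, the settings keys, the image, and the `SETTINGS_PATH` variable are hypothetical stand-ins; only the /etc/config mount path and the favorite-beer-redis service name come from the talk.

```python
import json

# Hypothetical application settings; the real file in the repo may differ.
settings = {
    "redisHost": "favorite-beer-redis",  # the service's DNS name in the namespace
    "beers": ["Example IPA", "Example Stout"],
}

config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "favorite-beer-settings", "namespace": "test"},
    # Each key becomes a file name once the config map is mounted as a volume.
    "data": {"servicesettings.json": json.dumps(settings)},
}

# Fragment of a pod spec: the volume points at the config map by name,
# and the container mounts that volume at /etc/config.
pod_spec = {
    "volumes": [
        {"name": "settings", "configMap": {"name": config_map["metadata"]["name"]}}
    ],
    "containers": [
        {
            "name": "favorite-beer",
            "image": "registry.example.invalid/favorite-beer:1.0.0",
            "env": [
                {"name": "SETTINGS_PATH", "value": "/etc/config/servicesettings.json"}
            ],
            "volumeMounts": [{"name": "settings", "mountPath": "/etc/config"}],
        }
    ],
}

print(json.dumps(config_map, indent=2))
```

The two places that must agree are the volume name shared by `volumes` and `volumeMounts`, and the config map name referenced by the volume.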
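The two kinds of exposure differ only in a few fields of the Service object. This Python sketch builds both as dictionaries using the ports mentioned in the talk; the `app` selector label is an assumption about how the pods are labeled.

```python
import json


def service(name: str, service_type: str, port: int, target_port: int, node_port=None) -> dict:
    """Build a Service manifest; node_port is only set when one is requested."""
    port_entry = {"port": port, "targetPort": target_port}
    if node_port is not None:
        port_entry["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": "test"},
        "spec": {
            "type": service_type,
            "selector": {"app": name},  # assumed pod label
            "ports": [port_entry],
        },
    }


# Internal-only endpoint for redis, reachable by its DNS name inside the cluster.
redis_service = service("favorite-beer-redis", "ClusterIP", 6379, 6379)

# Internet-facing endpoint: load balancer -> node port 31650 -> container port 80.
web_service = service("favorite-beer", "LoadBalancer", 80, 80, node_port=31650)

print(json.dumps(web_service, indent=2))
```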
So after you create that service, after we apply that service definition, you'll see that service pops up in our list of available services here, and you'll also see that after a few minutes we'll get an external IP address that we can actually hit and try to play with our service. So if I get that IP address, you can see here's my favorite beer service, again, kind of pointing back to the configuration that we injected using ... excuse me, using our JSON file ... config map, and we now have our application running.
So that's, pretty quickly, how to get an application running. Obviously I didn't show the Docker build step, but essentially building a container, pushing it up to that registry, and then applying these YAML files to the cluster is all you need to kind of get a workload going. Now I have taken this a few steps further, so there are some things in here ... some examples such as a Helm chart, and the Helm chart essentially ... it's mostly the same thing. So you have a deployment, you'll have a StatefulSet for redis for example, and basically this is a freeform directory where I created all those resources that we just talked about, but instead of populating the individual values, I'm using some interpolation functions.
So I can actually upgrade the image repo or upgrade the image tag just by changing the values at deployment time. You know if you're ... kind of one of the cool use cases for something like this is, let's say I wanted to run this application for different markets, maybe Atlanta and Denver. So in Atlanta values ... I have my service settings for my config map including just some of these beers, and then in my Denver values we're including different beers. Now the key takeaway for the Helm is that this is a freeform values file, and the templates file basically references those. So when you say Values.servicesettings, for example, that's where we're looking at. That can be injected right into this config map through some interesting functionality.
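The idea of one set of templates driven by different values files can be mimicked in Python. To be clear, this is not Helm and does not use Helm's template syntax; it is a sketch of the override-then-render pattern, and the beer names, image repository, and helper functions are all made up.

```python
import copy
import json


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay market-specific values on top of the defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_config_map(values: dict) -> dict:
    """Stand-in for a template that references the servicesettings value."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "favorite-beer-settings"},
        "data": {"servicesettings.json": json.dumps(values["servicesettings"])},
    }


defaults = {
    "image": {"repository": "registry.example.invalid/favorite-beer", "tag": "latest"},
    "servicesettings": {"beers": []},
}
atlanta = merge_values(defaults, {"servicesettings": {"beers": ["Hypothetical Peach Ale"]}})
denver = merge_values(
    defaults,
    {"image": {"tag": "1.2.3"}, "servicesettings": {"beers": ["Hypothetical Mile High Lager"]}},
)

print(render_config_map(atlanta)["data"]["servicesettings.json"])
print(render_config_map(denver)["data"]["servicesettings.json"])
```

The same rendering function produces a different config map per market, and overriding only the image tag leaves the repository value from the defaults intact.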
Going to pick up the pace a little bit. What I want to show next is an example of some logs of a particular workload running in Kubernetes. So let's do ... I'm just going to kind of go through the overview with what ... so from here, provided this cluster is up, you can go down here in Settings and take a look at some of the various settings. When you want to look at the workloads, for example, we can go into our Workloads tab. So the things that you just saw me use kubectl to find ... you can actually find right through the Azure portal. So here's my StatefulSet for my beer, here's all my pods that are running. I've got several replica sets for different things running on my cluster. There are some daemon sets and some jobs, and there are no cron jobs on this particular cluster.
So having the ability to kind of see what's in here through these dashboards is really convenient, especially if you have a lot of teams or you have a lot of workloads deployed on a new cluster. So again this is that resource view that we mentioned earlier. Again, we'll be coming down here to Cluster configuration, we can do an upgrade ... so upgrade the control plane for example if I bump this up to 1.9 and save, Azure will basically handle the upgrade for me, and then I actually need to go in here to the node pools and tell my node pool to upgrade as well. So you would upgrade the control plane first to the desired version, and then you would upgrade the node pools. Certainly recommend having multiple clusters so you can have your non-prod and the production environment, so you can test these upgrades in that type of environment.
Excuse me. So again if you want to take a look, we can do ... we can set up our minimums and maximums for our node pool. So again, we can turn this on to auto scale and set this to 10, and basically as long as we tell Kubernetes how much CPU and memory our workload needs as part of a pod template, Kubernetes can schedule things appropriately.
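The reason the CPU and memory figures in the pod template matter can be shown with some back-of-the-envelope arithmetic. The Python sketch below estimates how many pods fit on a node and how many nodes a workload would need within the autoscaler's bounds. It ignores capacity reserved for the system and uses made-up node and pod sizes, so treat it as a rough model rather than the scheduler's actual logic.

```python
import math


def pods_per_node(node_cpu_m: int, node_mem_mi: int, pod_cpu_m: int, pod_mem_mi: int) -> int:
    """Pods that fit on one node, limited by whichever resource runs out first."""
    return min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)


def nodes_needed(pod_count: int, per_node: int, min_nodes: int, max_nodes: int) -> int:
    """Node count for the workload, clamped to the autoscaler's min and max."""
    wanted = math.ceil(pod_count / per_node)
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical node: 2 CPUs (2000 millicores) and 8 GiB (8192 MiB) of memory.
# Hypothetical pod request: 250 millicores and 512 MiB.
per_node = pods_per_node(2000, 8192, 250, 512)
print(per_node)
print(nodes_needed(30, per_node, min_nodes=3, max_nodes=10))
```

If the pod template carried no requests at all, there would be nothing to divide by, which is why the scheduler and autoscaler depend on those values being set.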
So I'm not going to go through every single tab here, but certainly there are some interesting things. Just going to show you the options that you have available. In my case, I didn't turn on the application routing, I don't have a private cluster, I don't have any authorized ranges. I'm just sort of using this as my demo example. Dev Spaces is not available in my region; this is something that's probably going to be retired, and they actually call that out here. So there is a Bridge to Kubernetes service that they're working on. I'm not going to go into that today, but it's certainly worth looking into in your free time.
What I do want to go over really quickly, some of the insights and monitoring that are available to you. So out of the box, because there are some workloads deployed to the Kubernetes cluster by Azure that basically allow them to have some visibility to ... you know, how much CPU are you utilizing ... and not only at the node level, but we can actually drill into the container level and see some of this here.
So you can see 2% CPU utilization, and if we drill into specific things that ... There's an auto-scaler service running here, for example, and there are some requests and some limits associated with this workload. But, essentially, if I come in here I can take a look at various things. I can also view container logs which would open up a log stream of just that container. Alternatively, I can come down here at the logs tab and take a look at the Container Logs section and then maybe run a query to pull all the logs in all my different namespaces. So here you can see I've got a log entry here that came directly from stdout of the container and it was picked up by the [inaudible] forwarding device and then forwarded directly here to the actual logging system.
So, again, you don't have to worry about building out that Elasticsearch cluster, building out that log collection and log [inaudible]. It's sort of built in automatically. All you have to do is start deploying workloads that are logging appropriately to stdout.
So the last thing I wanted to share before we kind of wrap everything up is the Azure DevOps kind of integration pieces. So the main kind of understanding is that if you come into your project, your actual DevOps project, you'll have this Project Settings button down here at the bottom. And when you click that Project Settings button, you can then go into Service connections. And so here I've defined some service connections to my cluster in different namespaces, so there's the test namespace, prod, dev, default, and if I click on this, I can get a resource ID essentially up here. So I can grab this ID and say this is a connection to my Kubernetes cluster, in this case, the test namespace.
Now the benefit of this is I can now define my azure-pipelines.yaml file and I can actually put these connections in here. So you'll see my devAksServiceConnection, my testAksServiceConnection, and I can basically define this. So instead of putting my credentials or anything in here, basically ADO will use the identity I've referenced in the YAML file when it's making those authentications. So later on, I can do the build step, the Docker build, and this is kind of an interesting example, again, all part of the favorite-beer repo, and then I can deploy my application using my Helm chart. So again I'm just overriding the image tag and the repository to select the image that I just built and deploy that one to my individual environment.
There's quite a complex example of a pipeline here, certainly there are some areas for improvement, but this is a pretty solid example from a base pipeline standpoint. And then once you have this all kind of configured, you'll have the individual steps, that's probably the pipeline, and then you can see those individual steps get executed. So in this case I did an action where I merged. So it deployed that in the test, built the container first, and then deployed that ultimately to ... my test environment gives me kind of ... has a status message on the back end.
That being said, I'm going to pop back over to our slideshow and close out our session here. So to sum things up, we kind of called out that containerization is key to accelerating your dev and deployment of applications, ensuring your applications are always updated, reducing infrastructure and hardware costs, creating a culture of increased innovation. And AKS is going to provide you with enterprise-grade Kubernetes in Azure and the critical cost monitoring visibility tools, that type of thing.
And generally, you're also going to receive all the benefits of Kubernetes, such as the community available to Kubernetes, deployments or Helm charts that exist that have been developed by the community. So for example, if you want to install ... Prometheus, there's a Helm chart out there for that, and you don't necessarily have to figure out how to install all those things manually yourself. There are other integrations such as Datadog, for example, if they have their own daemon set [inaudible] cluster and automatically integrate with their service, so you kind of get a lot of those integrations with a lot less overhead in terms of making sure that ... in a traditional landscape, you have to make sure that your VMs all have the Datadog agent installed, or that they were all forwarding to an external Datadog agent. In the Kubernetes world, things like that become a little bit simpler.
Now we're going to talk a little bit about working with Redapt. Some of the content that I showed today is part of our DevOps workshop. So we have a full eight-hour workshop where I walk through containers all the way through Helm deployments and the point of automation. But, really, the main benefits are going to be our expertise in containerization and DevOps in general. The fact is that we've been using Kubernetes for a really long time, before Azure AKS was even a thing, and we've been an Azure partner for quite a while, so we have a team of engineers familiar with all of the ins and outs of kind of establishing an Azure cloud environment.
As far as what we can do to help, we can certainly do the workshop and we can also help implement, so we can actually help your team build containers or right-size environments. If you're noticing that your costs are getting a little bit out of control, we can kind of help with some of that stuff as well. And then if you have a full migration, for example, your team wants to leverage Kubernetes, but you need a third party to kind of come in and help get you set up, get AKS running, get all the workloads migrated over to AKS, we can handle that as well.
That being said, I hope the webinar brought some value to you as a watcher, and whether you're just getting started down the road to cloud adoption or have already got some resources in Azure, we can certainly help utilize AKS to accelerate development and deployment cycles. So please reach out to one of our experts at your leisure.
Thanks so much, Jerry. This has been great. We did have one question come in, and the question is, can I have different VM sizes in a single cluster?
So the answer to that question is absolutely yes. So when we're kind of talking about our node pool example, if I bring up the Cluster creation page, or if I go back into my node pools ... so this is my existing cluster, I've created it, I can come back in here and add a node pool. And from the node pools screen, I can choose a different size. So if I created a secondary node pool to add a different CPU to memory ratio, for example, or maybe it has a GPU, for example, I could create that as a separate node pool and then manage that through the node pools section here.
Awesome. Wonderful. Well thanks so much, Jerry for walking us through this today, and thanks to everyone for joining us and for watching. Please reach out with any questions or concerns. Just go to redapt.com and contact us and we will happily field your questions and concerns. Thanks so much, Jerry.
Note: This post was originally published in 2021 and has recently been updated.