Introducing kk (kubectl++)

Background:

First, with the rush of releasing Binge, endless Zoom meetings, and editing all those k8s YAML files, I had been itching to write a for loop (i.e. do some coding).

Second, I am tired of parsing/grepping/awking kubectl output, yet I still had to go to the New Relic UI to see which pods were running on my EKS spot vs on-demand nodegroups. I am a command-line kinda guy, and that question is just too hard to answer from the command line.

So I decided to write a small tool just for myself, but I didn’t feel like doing it in Groovy, which is my default scripting language (hey, I come from Java!). A few months ago I picked up a Go book, but I realized it would take me more than a weekend to be productive in it, and frankly I didn’t like the syntax. I like my language to be more abstract (again, I come from Java!), so I picked up a Python cheatsheet and started hacking.

I must admit it’s a great stress reliever when you are trying to deploy ~1000 pods running ~100 microservices in a fresh EKS cluster for the first time in your organization, and the whole internet is there to curse you if the platform doesn’t deliver 🙂 (that’s a different story for another day).

Result:

I hacked together a few lines of Python (again, my Python skills come from a cheatsheet and random googling over a weekend, so no judging). But the result, kk (kubectl++), turned out to be quite handy, and I find myself using it so often that I thought I’d put it on my GitHub:

https://github.com/sajid-moinuddin/kk

The idea behind it is simple: fetch all node info and all pod info, programmatically join them, and create a virtual PodNode resource (just like any k8s resource). Now you can do things like:

# get me pod, namespace, node name, spot/on-demand, pod resources, and restart count from namespaces x/y/z; by the way, exclude the DaemonSets and format it nicely as I say

kk get podnode \
  -f 'metadata.namespace=streamtech/content/commerce' \
  -e 'metadata.owner_references[0].kind=DaemonSet' \
  -o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:35,pod.spec.containers[0].resources.limits:35,pod.status.container_statuses[0].restart_count'

offers-api            Running             spot                commerce            ip-10-100-58-11    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             spot                commerce            ip-10-100-62-11    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             spot                commerce            ip-10-100-63-3.    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             ondemand            commerce            ip-10-100-60-74    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
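
Under the hood this is essentially a hash join of pods onto nodes by node name. Here is a minimal sketch of the idea using the official kubernetes Python client (illustrative only, not kk’s actual code):

# Minimal sketch of the PodNode join: index nodes by name,
# then attach each pod to its node. Illustrative only.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Index nodes by name so each pod joins to its node in O(1).
nodes = {n.metadata.name: n for n in v1.list_node().items}

for pod in v1.list_pod_for_all_namespaces().items:
    node = nodes.get(pod.spec.node_name)
    if node is None:
        continue  # pending pods have no node yet
    lifecycle = (node.metadata.labels or {}).get("lifecycle", "unknown")
    print(f"{pod.metadata.name:<62}{pod.status.phase:<10}{lifecycle:<10}"
          f"{pod.metadata.namespace:<20}{node.metadata.name}")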




(venv) ➜  kk git:(master) kk -h
k(kubectl)++ ... only better
Usage:
kk get (podnode|pn) [options]

Options:
    -h --help    show this
    -l STRING    label selector
    -f STRING    field selector
    -e STRING    exclude
    -o output    pydash format plus padding info, ie. node.metadata.name:62
    --offline    do not fetch new data, work on the last fetched data
    --json       print the -o elements in json format
    -w           watch mode

Example:
#kk get podnode -f 'metadata.namespace=streamtech/content/commerce' -e 'metadata.owner_references[0].kind=DaemonSet' -o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:35,pod.spec.containers[0].resources.limits:35,pod.status.container_statuses[0].restart_count' | sort -k3

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce' -e 'metadata.owner_references[0].kind=DaemonSet' -o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:30,pod.spec.containers[0].resources.limits:30'

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce'  -e 'metadata.owner_references[0].kind=DaemonSet'   -o 'pod_name:62,node.metadata.labels.lifecycle,namespace,node.metadata.name:62' -w

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce'  -e 'metadata.owner_references[0].kind=DaemonSet'   -o 'pod_name,node.metadata.labels.lifecycle,namespace,node.metadata' --json
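
Those -o selectors are plain pydash deep paths evaluated against the joined PodNode document; each comma-separated entry becomes one column, with an optional :width pad. A tiny illustration, with a made-up fragment rather than kk’s internal structure:

# How a pydash deep path resolves an -o selector (pod_node is made up).
import pydash

pod_node = {
    "pod": {"spec": {"containers": [{"resources": {"requests": {"cpu": "2", "memory": "4Gi"}}}]}},
    "node": {"metadata": {"labels": {"lifecycle": "spot"}}},
}

print(pydash.get(pod_node, "pod.spec.containers[0].resources.requests"))
# -> {'cpu': '2', 'memory': '4Gi'}
print(pydash.get(pod_node, "node.metadata.labels.lifecycle"))
# -> spot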

I used the python-builder library, which feels a lot like Maven. I am yet to spend any time understanding how to publish this as a pip module (I haven’t spent much time on Python modules and packaging yet…).

NOTE:
>> This can be a very expensive operation against your Kubernetes API, so use it with care in a prod environment.
>> There is also a --offline mode (which works on the last fetched data); you can take a dump of the data first and then look at it from different angles.
>> You can also use the --json flag to dump the raw JSON.

Conductor + IoC: Crossing Boundaries in Hybrid Cloud

IoC (Inversion of Control) is an age-old concept we know well from the Spring framework. It lets us invert the flow of control and avoid tight coupling between the modules of a system.

Conductor is a microservice orchestration framework from Netflix. It helps us avoid choreography between microservices. If you are not familiar with the orchestration vs choreography concepts, I highly encourage you to pick up a copy of Building Microservices, or have a quick read:

https://stackoverflow.com/questions/4127241/orchestration-vs-choreography 

Essentially, Conductor is an IoC framework for your microservices. You take the flow of control out of the microservices themselves and push it into a Conductor “workflow”.
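
To make the inversion concrete, here is a rough sketch of what a Conductor workflow definition looks like (the workflow and task names are made up); the sequencing lives in this JSON, owned by the Conductor server, not inside any of the services:

{
  "name": "media_pipeline",
  "version": 1,
  "tasks": [
    { "name": "prepare_asset",    "taskReferenceName": "prepare",   "type": "SIMPLE" },
    { "name": "transcode_onprem", "taskReferenceName": "transcode", "type": "SIMPLE" },
    { "name": "publish_to_cloud", "taskReferenceName": "publish",   "type": "SIMPLE" }
  ]
}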

NOTE: there are other frameworks implementing similar concepts:
https://github.com/ing-bank/baker
https://camunda.com/products/bpmn-engine/

Now imagine you have multiple microservices deployed in the cloud and in an on-prem datacentre, with a flow as below:

[Diagram: a workflow spanning cloud and on-prem microservices]

The only problem is, you don’t have any direct connection from the cloud to your on-prem-hosted Service B. Why is Service B on-prem, you might ask? Maybe it’s talking to an on-prem legacy database, or to a million-dollar video transcoder that you can’t ship to the cloud. (On-prem datacentres are here to stay, stop dreaming!!!)

This is where IoC comes into play, the famous Hollywood principle: “Don’t call me, I’ll call you” at its finest! Instead of the FLOW calling Service B, Service B calls the FLOW and asks for its designated task. Having something like Conductor that lets you invert this dependency is priceless (otherwise you’d be filling in forms and chasing people for months trying to open up the corporate firewall to the internet):

[Diagram: Service B polling the Conductor task queue from on-prem]

NOTE: The task queue is a logical component here; there is no Kafka/ActiveMQ. Conductor uses a simple Redis/MySQL database to maintain the queue and provides a nice REST API to GET/PUT tasks. You can compare this to an event-driven architecture / pub-sub model with IoC in place (think polling!). You get a similar benefit without the complexity of running a message bus.
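
To see how Service B crosses the boundary without any inbound connectivity, here is a minimal polling-worker sketch in Python (the URL and the work itself are hypothetical; the endpoints follow Conductor’s task API of poll-then-update):

# Sketch of an on-prem worker (Service B) polling Conductor for its
# designated task. Outbound-only HTTP: Service B dials out, nothing dials in.
import time
import requests

CONDUCTOR = "https://conductor.example.com/api"  # hypothetical address

def work(task_type="transcode_onprem", worker_id="service-b-1"):
    while True:
        resp = requests.get(f"{CONDUCTOR}/tasks/poll/{task_type}",
                            params={"workerid": worker_id}, timeout=30)
        if resp.status_code != 200 or not resp.content:
            time.sleep(5)            # queue empty: back off, poll again
            continue
        task = resp.json()
        output = {"status": "done"}  # stand-in for the real business logic
        requests.post(f"{CONDUCTOR}/tasks", timeout=30, json={
            "taskId": task["taskId"],
            "workflowInstanceId": task["workflowInstanceId"],
            "status": "COMPLETED",
            "outputData": output,
        })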


Jenkins-X + ArgoCD: A Match Made in Heaven

My story with Jenkins-X is a love/hate one. The first time I saw a demo of preview environments on Kubernetes with Jenkins-X last year, I was hooked:

https://youtu.be/BF3MhFjvBTU

I had never seen anything like it, and I still don’t know of anything in the market that comes close even now. At the same time I was reading the book Accelerate, and the book world found its match in the real world. It was easy to get Jenkins-X up and running quickly, do a few demos, and get some stakeholder buy-in for GitOps. We even went live in production with Jenkins-X for some of the batch jobs in our data lake (lower risk). However, when the honeymoon period was over and we wanted to roll out across 10+ teams, reality hit: we found that productionizing Jenkins-X on a non-GCP platform is not only HARD but almost impossible. I have had the good fortune of talking to some of the Jenkins-X core product people, and until recently they were focused purely on GCP.

The CI part of Jenkins-X is too powerful to pass on, so we marched on! After many weekends of debugging Go source code and countless hours in the Jenkins-X Slack channel, we got Jenkins-X working on an EKS cluster (this has changed since last year when all this happened; the installation options are much better now with jenkins-x boot). We were probably a year too early into the product. But boy, am I happy with that investment! Once you get one up and running, there is nothing like it. However, the Jenkins-X installation I had was very much a PET: I got it installed and kept a lot of NOTES about it, but it was not GitOps itself. So I didn’t want to install it 20 times across all the EKS clusters we have per business unit plus dev/staging etc.; we’d need a few people just to maintain those 20 Jenkins-X installations. Anything that doesn’t support native GitOps-style deployment is a massive maintenance overhead.

So we installed Jenkins-X in one cluster only (we call it the GitOps cluster) and run all our CI pipelines there; it produces Helm charts + Docker images following GitOps best practices. Jenkins-X has a Helm chart deployer called “Environment Controller”, but when we tested it about 5 months ago it wasn’t working with Bitbucket on EKS. So the search continued. There are quite a few tools in the market for GitOps (Weave Flux is a good one), but we stumbled on Argo by chance (I was playing with Kubeflow, and kustomize+Argo is their main CD tool). As a nice surprise, Argo could install the Helm environment repo Jenkins-X creates out of the box with zero modification (you need Argo 1.3.x with Helm hook support).
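
For reference, pointing Argo at such an environment repo is a single Application manifest. A minimal sketch (the repo URL, path and namespaces here are assumptions, not our actual setup):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: environment-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://bitbucket.example.com/scm/ops/environment-staging.git   # hypothetical
    targetRevision: master
    path: env                        # jenkins-x keeps the umbrella chart under env/
  destination:
    server: https://kubernetes.default.svc
    namespace: jx-staging
  syncPolicy:
    automated:
      prune: true                    # drop resources deleted from git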

If you want to see Argo in action, check out their own CD projects:
https://cd.apps.argoproj.io/ (use your GitHub account to log in)

https://github.com/argoproj/argoproj-deployments/tree/master/argocd

A really awesome feature is the auto-generated Helm resource graph; it is great for devs who are not that familiar with kubectl, since they can still get started by visualizing the deployed Helm charts. Here is what our Nginx stack looks like (you can read more about it here: https://medium.com/@sajid2045/aws-eks-ingress-option-alb-nginx-fc64a4a5ea9f):

[Screenshot: Argo CD resource graph of the Nginx stack]

And here is what the Git repo vs cluster state comparison looks like:

[Screenshot: Argo CD app diff between the Git repo and cluster state]

Finally! This is what it looks like end to end (https://sajidmoinuddin.files.wordpress.com/2019/12/gitopse2e-1.png):

[Diagram: the end-to-end GitOps pipeline]

The Phoenix Project

It’s so hilarious and painfully true that it took me a while to realize I was reading a technical book!

“Improving daily work is even more important than doing daily work.”
― Gene Kim

One hundred years from now, historians will look back at this decade and conclude that something transformative happened: how we worked inside Development and IT Operations changed completely.… I predict that historians will call this decade the “Cambrian explosion for IT,” a time of incredible innovation and disruption when, 50 years after the birth of computing, we finally learned what technology was good for.
—John Willis, Cohost of “DevOps Cafe”

Tips & Tricks for Certified Kubernetes Administrator (CKA) Exam

Ever since I read my first Kubernetes article about two years ago, I have been in love with it. It was the same when I first read about Git or Docker or Spring dependency injection: you immediately recognise that this is the future and that it will fundamentally change how we think about logical and deployment architecture. For me, it was the missing piece of the puzzle in microservices land. By then I had a fair understanding of, and experience with, migrating big monoliths into microservices, packaging with Docker, 12-factor and so on, but I didn’t have a good answer for running them as a fleet. Amazon ECS was something I was toying with about three years ago, but it was far from the abstractions I was after.

Paying homage to my obsession with k8s, I recently got my CKA cert on the 2nd attempt. I thought I was ready on my first try but got only 72 (you need 74… I know, right!).

While the memory is fresh, I thought I’d write a few lines on what I learned from failing the 1st attempt and what I did differently on the 2nd.

Did I know more Kubernetes when I attempted it the 2nd time? Absolutely not! But I changed my strategy.

I won’t repeat what’s already out there; there is a lot of material online covering this.

Time for Preparation:

I saw people spending 2-3 months on prep, but I didn’t have that much time; I have been super busy with my day job (we just released KAYO (yay) in 5 months, pulling very long hours and working weekends too), so I only had partial weekends to study. On average I spent about 4 hours every weekend over 8 weekends, plus about 3-4 hours a day for 5 days over the Christmas break before the exam. Having said that, I’m no stranger to k8s; I have been running it in prod for more than a year. But that is of little use for the exam.

TIP#1: The exam is all about imperative commands.

If you are used to the declarative style (I am a big Helm fan and write Helm charts for everything), you need to get that out of your head and learn your one-liners. Practice everything on the K8s cheatsheet; make sure you can create pods, deployments, services etc. using kubectl alone. I tried writing YAML files and using them to create resources on my 1st attempt; it didn’t work. If you take one command out of this post, take this:

kubectl run

  • kubectl run -it --rm test --restart=Never --image=busybox --command -- 'echo' 'echo no need' 'echo bye'
  • kubectl run testpod --restart=Never --image=nginx (--restart=Never creates a Pod; note that resource names must be lowercase)
  • kubectl run mydeployment --image=nginx (at the time of the exam, the default --restart=Always created a Deployment)
  • kubectl run testdeployment --image=nginx --dry-run -o yaml > mydeployment.yaml
  • kubectl apply -f mydeployment.yaml
  • kubectl delete -f mydeployment.yaml


TIP#2: Get used to your environment

This is probably the most important tip. The exam uses the GateOne web-based terminal, and it behaves very weirdly on my Mac. Do all your practice in it. You can run it locally with the following:

docker run -d --name=gateone-2 -p 443:8000 arush/gateone

open https://localhost:443/


You should be familiar with how copy/paste etc. work in the environment where you will be doing the exam (I was using a trackpad on my iMac, but on the 2nd attempt a physical mouse with click buttons worked much better). Otherwise you will be distracted by it and lose time.

You should know your editor (vi / nano) and MOST IMPORTANTLY tmux; you will need to look at multiple files at the same time and you only get one terminal. I can’t imagine answering some of the hard questions without tmux. tmux is cool! I was a screen user before, but now I am a tmux convert.
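
Here is the handful of tmux commands and default bindings that go a long way (stock defaults, no custom config):

tmux new -s exam      # start a named session
# Ctrl-b %            split the window vertically
# Ctrl-b "            split the window horizontally
# Ctrl-b o            jump to the next pane
# Ctrl-b z            zoom/unzoom the current pane
# Ctrl-b [            enter scroll/copy mode (press q to quit)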

ALSO A MUST: know your systemd, systemctl, etcd, journalctl, ssh, and SSL (all covered in Kubernetes the Hard Way).

TIP#3: Know your copy/paste (or don’t), just use the notepad

You have to copy/paste across tmux windows, vi, etc. On my 1st attempt I tried to do it all in the terminal (my confidence with vi/tmux bit me). Instead, you should do it all with the exam NOTEPAD: type everything there and paste into the terminal. That way you can also reuse snippets/commands. Trust me, when you have 30 minutes left to answer 40 points’ worth of questions, your super-smart vi/tmux tricks fly out the window. Just stick to the basics.

TIP#4: TIME is of the essence

The exam is not about your knowledge; it’s all about skill and efficiency. I am sure that, of all the people who fail the CKA, 50% more would pass if they had 4 hours instead of 3 for the exam.

  • You get 180 mins to finish 25 questions.
  • Out of those 25 questions, 30% of the points sit in 5 questions, and you need at least 1 hour to answer those.
  • That means you need to finish the other 20 questions in 120 mins.
  • You will also f*** something up along the way; allocate 15 mins (if you are lucky) for troubleshooting. That leaves (120 − 15) = 105 mins for 20 questions.
  • So, roughly 5 mins per question.
    • Imagine that! Create a resource, edit it, save the result, run a basic test on it. All in 5 mins. Go get yourself a stopwatch and try to roll out a deployment in 5 mins.

TIP#5: If you get stuck, move on.

On my first attempt I spent 5-10 mins trying to make sense of an early 2-point question and had to leave a few questions at the end unanswered. Remember, your goal is to answer about 85% of the questions correctly, not 100%. On my 2nd attempt I answered 92% in 135 mins. I could probably have finished the last 8% in 30 mins (it was extremely tough), but I decided to play it safe and use the time to revisit some of the answers I had doubts about. I think I would have lost 6-10 points if I hadn’t revised my answers and re-read the questions. I was quite confident I’d get at least 90%; I am not sure where I lost the 4 points, and I still think about it 🙂

TIP#6: Strategise your learning activity

Again, the goal here is to answer 85% properly and get about 80 points. I spent most of my time going over each and every line of the K8s docs. I wish I had spent some of that time on practice.

I was overconfident and didn’t attempt any practice exam, and I didn’t time my activities; that was probably the biggest mistake I made on the 1st attempt. Remember, about 60% of the questions are quite easy if you did basic prep. Drill those really hard and practice, practice, practice; you should knock them off within 60-70 mins.

DO NOT fall into the trap of relying on having access to the k8s docs for copy/paste. For simple things you should not even have to think; just type the commands from memory.

The remaining 20-25% are medium difficulty, but if you have gone over the k8s docs, know where to look for reference, and have done Kubernetes the Hard Way, you should be able to answer them too. I think they reserve 10% for the top percentile; you would need a lot of practice and hands-on k8s setup to get those, and to be honest, in this era of EKS/kops etc., I think my time is better used elsewhere.

End Notes

  • Was the prep painful and time-consuming but LOTS of fun? YES!
  • Is there any easy shortcut to pass? NO!
  • Do I think I wasted a bit of time on a cert when I could have invested it in learning other stuff, given I already have prod k8s running? YES.
  • Do I wish the exam were more a test of knowledge (say, writing a custom scheduler or CRDs) rather than a test of speed? YES!
  • Is most of what I learnt in the process useful for my day-to-day k8s activity (writing Helm charts, investigating/customising other people’s Helm charts, debugging logs)? NO!
  • Would it have been much more useful when I started k8s prod deployments about a year ago? YES!
  • Would I do it again? YES!!! … Just for the CKA Walk 🙂


CKA IS WALKING