Input/Output

15 years ago, just one year into my 1st job, I submitted my resignation to work at a new startup. One of my seniors was very surprised and asked me why. I was super excited to work on VoIP/SIP and explained that to him (little did I know I'd spend the next 7 years of my career debugging UDP packets in Ethereal; yes, Wireshark was called Ethereal back then). My senior smiled and replied, "you are still young, after a while you will realize everything is ultimately just input/output".

15 years into my career, while I don't fully agree with that, I do appreciate where he was coming from. The other day I had 4 sessions in 2 days with 4 different teams, each with a very different domain from the others:

  • 1st one was about how to collect, process, and analyze usage data sent by millions of set-top boxes
  • 2nd session was about the next step design of a “metadata-hub” for efficiently storing and distributing video metadata for all the video platforms we have.
  • 3rd was about collecting video click data and feeding it into the recommendation system and using the recommendations on the web.
  • 4th was about a Jenkins pipeline and building a Jenkins Helm chart itself (egg vs chicken)

The whole time (actually starting from the 2nd meeting), I couldn't stop thinking about that quote: "everything is ultimately just input/output".

Lol, I am not sure whether to take this mental state of mine positively or negatively. Have I reached some new architect high where I am seeing the superior abstraction in everything, or have I gotten so old and boring that every problem looks the same to me 🙂

Trying to look a bit deeper, I realized that with all the OSS frameworks/libraries and AWS building blocks (or any public cloud offering), our problem space has shifted a lot. All we do these days is glue different services/tools together to deliver business value. And who does the gluing? Your pipelines!!! It's all about the pipelines!!! Just looking at the small team we have, we have so many pipeline execution frameworks running in production at this moment: Conductor, Airflow, AWS Step Functions, Jenkins-X, Argo (Kubeflow Pipelines), Activiti (I know, too many!!! But it's about the right tool for the right job 🙂 )

So yes, if I had to rephrase my senior 15 years later, I'd say: it's all input/output, and the pipelines in between.

Or maybe I am missing my SIP/RFC3261 days too much!!!

Watchman

Again, this is me trying to learn some Python (because I want to talk to my Anki Vector) and make something useful that I can use day to day. I must say, this code has been very instrumental in keeping an eye on every k8s event and finding some gotchas from them when we went live on a large-scale EKS production platform.

https://github.com/sajid-moinuddin/watchman
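Here is a minimal sketch of the idea (not the actual watchman code): stream cluster events with the official kubernetes Python client and flag the non-Normal ones:

from kubernetes import client, config, watch

config.load_kube_config()  # or load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_event_for_all_namespaces):
    obj = event["object"]  # a CoreV1Event
    if obj.type != "Normal":  # Warning events are the gotchas worth eyeballing
        print(f"{obj.last_timestamp} {obj.involved_object.kind}/"
              f"{obj.involved_object.name}: {obj.reason} - {obj.message}")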

PS: as a Java developer, once you get past the odd syntax of passing around self, it's probably the easiest language to pick up.

Introducing KK (kubectl ++)

Background:

First, with the rush of releasing Binge, endless Zoom meetings, and editing all those k8s YAML files, I have been itching to write a for loop (i.e., do some coding).

Second, I am tired of parsing/grepping/awking kubectl output, but I still had to go to my New Relic UI to see which pod is running on my EKS spot vs on-demand nodegroup. I am a command-line kind of guy, and that's too difficult to do on the command line.

So I decided to write a small tool just for myself, but I didn't feel like doing it in Groovy, which is my default scripting language (hey, I come from Java!). A few months ago, I picked up a Go book but realized it would take me more than a weekend to be productive in it, and frankly I didn't like the syntax. I like my language to be more abstract (again, I come from Java!), so I picked up a Python cheatsheet and started hacking.

I must admit it's a great stress reliever when you are trying to deploy ~1000 pods running ~100 microservices in a fresh new EKS cluster for the first time in your organization, and the whole internet is there to curse you if the platform is not delivering 🙂 (that's a different story for another day)

Result:

I hacked together a few lines of Python code (again, my Python skills come from a cheatsheet and random googling over a weekend, so no judging). But the result, kk (kubectl++), came out to be quite handy, and I find myself using it so often that I thought I'd put it on my GitHub.

https://github.com/sajid-moinuddin/kk

The idea behind it is simple: you get all node info and all pod info, programmatically join them, and create a virtual PodNode resource (just like any k8s resource). Now you can do things like:

#get me pod,namespace,nodename,spot/ondemand,pod-resources,restartcount from namespace x/y/z and by the way, exclude the daemonsets and format nicely as I say 

kk get podnode \
-f 'metadata.namespace=streamtech/content/commerce' \
-e 'metadata.owner_references[0].kind=DaemonSet' \
-o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:35,pod.spec.containers[0].resources.limits:35,pod.status.container_statuses[0].restart_count'

offers-api            Running             spot                commerce            ip-10-100-58-11    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             spot                commerce            ip-10-100-62-11    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             spot                commerce            ip-10-100-63-3.    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   
offers-api            Running             ondemand            commerce            ip-10-100-60-74    {'cpu': '2', 'memory': '4Gi'}          {'cpu': '6', 'memory': '4Gi'}          0                   

(venv) ➜  kk git:(master) kk -h
k(kubectl)++ ... only better
Usage:
kk get (podnode|pn) [options]

Options:
    -h --help    show this
    -l STRING    label selector
    -f STRING    field selector
    -e STRING    exclude
    -o output    pydash format plus padding info, ie. node.metadata.name:62
    --offline    do not fetch new data, work on the last fetched data
    --json       print the -o elements in json format
    -w           watch mode

Example:
#kk get podnode -f 'metadata.namespace=streamtech/content/commerce' -e 'metadata.owner_references[0].kind=DaemonSet' -o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:35,pod.spec.containers[0].resources.limits:35,pod.status.container_statuses[0].restart_count' | sort -k3

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce' -e 'metadata.owner_references[0].kind=DaemonSet' -o 'pod_name:62,pod.status.phase,node.metadata.labels.lifecycle,namespace,node.metadata.name:15,pod.spec.containers[0].resources.requests:30,pod.spec.containers[0].resources.limits:30'

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce'  -e 'metadata.owner_references[0].kind=DaemonSet'   -o 'pod_name:62,node.metadata.labels.lifecycle,namespace,node.metadata.name:62' -w

#kk get podnode -f 'metadata.namespace=streamtech/content/commerce'  -e 'metadata.owner_references[0].kind=DaemonSet'   -o 'pod_name,node.metadata.labels.lifecycle,namespace,node.metadata' --json

I have used the python-builder library, which looks a lot like Maven. I am yet to spend any time understanding how to publish this as a pip module (I haven't spent much time understanding Python modules and packages yet…).

NOTE:
>> This can be a very expensive operation against your Kubernetes API server, so use it with care in a prod environment.
>> It also has an --offline mode (which works on the last fetched data) that can be used to first get a dump of the data and then look at it from different angles.
>> You can also use the --json format, which will dump the raw JSON.
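For the curious, the core of that join can be sketched in a few lines with the official kubernetes Python client (a hypothetical simplification; the real kk code differs):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# fetch everything once, then join in memory on the node name
nodes = {n.metadata.name: n for n in v1.list_node().items}

for pod in v1.list_pod_for_all_namespaces().items:
    node = nodes.get(pod.spec.node_name)  # the programmatic "join"
    if node is None:  # pending pods have no node yet
        continue
    lifecycle = node.metadata.labels.get("lifecycle", "unknown")  # spot vs ondemand
    print(pod.metadata.namespace, pod.metadata.name, node.metadata.name, lifecycle)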

Conductor + IoC: Crossing Boundaries in Hybrid Cloud

IoC (Inversion of Control) is an age-old concept we are all familiar with via the Spring framework. It enables us to invert the flow of control and avoid tight coupling between the modules of a system.

Conductor is a microservice orchestration framework from Netflix. It helps us avoid choreography in microservices. If you are not familiar with the orchestration vs choreography concepts, I highly encourage you to pick up a copy of Building Microservices, or have a quick read:

https://stackoverflow.com/questions/4127241/orchestration-vs-choreography 

Essentially, Conductor is an IoC framework for your microservices. You take the flow of control out of your microservices and push it into the "Conductor Workflow".

NOTE: there are other frameworks implementing similar concepts: https://github.com/ing-bank/baker
https://camunda.com/products/bpmn-engine/ 

Now imagine you have multiple microservices deployed in the cloud and in an on-prem datacentre, with a flow as below:

[Diagram: a workflow spanning cloud-hosted and on-prem microservices]

The only problem is, you don't have any direct internet connection from the CLOUD to your on-prem-hosted Service B. Why do you have that Service B on-prem, you might ask? Maybe it's talking to an on-prem legacy database, or a million-dollar video transcoder that you can't export to the cloud. (On-prem datacentres are here to stay, stop dreaming!!!)

This is where IoC comes into play. The famous Hollywood principle, "Don't Call Me, I'll Call You", at its finest! Instead of the FLOW calling Service B, Service B can call the FLOW for its designated task. Having something like Conductor, which enables you to invert this dependency, is priceless (otherwise you'd be filling in forms and chasing people for months trying to open up the corporate firewall to the internet):

[Diagram: inverted flow, with Service B polling the Conductor task queue]

NOTE: The task queue is a logical component here; there is no Kafka/ActiveMQ. Conductor uses a simple Redis/MySQL database to maintain the queue and provides a nice REST API to GET/PUT tasks. You can compare this to an event-driven architecture / pub-sub model with IoC in place (think polling!). You achieve a similar benefit but without the complexity of having a message bus.
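To make the inversion concrete, here is a rough sketch of what a Service B worker loop could look like, polling Conductor's task REST API over outbound HTTPS (the endpoint URL, task type, and do_transcode are made up for illustration; endpoint paths follow Conductor's task API):

import time
import requests

CONDUCTOR = "https://conductor.example.com/api"  # hypothetical endpoint
TASK_TYPE = "transcode_video"                    # hypothetical task type

def do_transcode(input_data):
    ...  # the actual on-prem work happens here

while True:
    # outbound poll: no inbound connection from the cloud is ever needed
    resp = requests.get(f"{CONDUCTOR}/tasks/poll/{TASK_TYPE}",
                        params={"workerid": "service-b-1"})
    if resp.status_code == 200 and resp.text:    # empty body means no work queued
        task = resp.json()
        output = do_transcode(task["inputData"])
        requests.post(f"{CONDUCTOR}/tasks", json={
            "taskId": task["taskId"],
            "workflowInstanceId": task["workflowInstanceId"],
            "status": "COMPLETED",
            "outputData": output,
        })
    else:
        time.sleep(5)                            # back off when the queue is empty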


Jenkins-X + ArgoCD: Match made in heaven

My story with Jenkins-X is a love/hate one. The first time I saw a demo of preview environments on Kubernetes with Jenkins-X last year, I was hooked:

https://youtu.be/BF3MhFjvBTU

I have never seen anything like it, and I still don't know if there is anything in the market that comes close to it even now. At the same time I was reading the book Accelerate, and the book world found its match in the real world. It was easy to quickly get it up and running, do a few demos, and get some stakeholder buy-in into GitOps. We even went live in production with Jenkins-X for some of the batch jobs in the datalake (lower risk). However, when the honeymoon period was over and we wanted to roll it out across 10+ teams, reality hit, and we found that productionizing Jenkins-X on a non-GCP platform is not only HARD but almost impossible. I have had the good fortune of talking to some of the Jenkins-X core product people, and until recently they had been focused purely on GCP.

The CI part of Jenkins-X is too powerful to pass on, so we marched on! Many weekends of debugging Go source code and countless hours in the Jenkins-X Slack channel later, we got Jenkins-X working on an EKS cluster (this has changed since last year when all this happened, and the installation experience is so much better with jenkins-x boot). We were probably a year too early into the product. But boy, am I happy with that investment! Once you get one up and running, there is nothing like it. However, the Jenkins-X installation I had was very much a PET. I got it installed and I had a lot of NOTES for it, but it's not GitOps itself. So I didn't want to have it installed 20 times across all the EKS clusters we have per business unit, plus dev/staging etc. We'd need a few people to maintain those 20 Jenkins-X installations. Anything that doesn't support native GitOps-style deployment is a massive maintenance overhead.

So we installed Jenkins-X in one cluster only (we call it the GitOps cluster) and run all our CI pipelines there; it produces Helm charts + Docker images following GitOps best practices. Jenkins-X has a Helm chart deployer called "Environment Controller", but at the time we tested it, about 5 months ago, it wasn't working with Bitbucket on EKS. So the search continued. There are quite a few tools in the market to do GitOps (Weave Flux is a good one), but we stumbled on Argo by chance (I was playing with Kubeflow, and kustomize+argo is their main CD tool). As a nice surprise, Argo could install the Helm environment repo Jenkins-X creates out of the box with zero modification (you need to install Argo 1.3.x with Helm hook support).
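For illustration only (app name, repo URL, and namespaces are made up; in practice the Application manifest itself lives in git): registering a jenkins-x environment repo with Argo boils down to creating an Application resource, which you could sketch with the kubernetes Python client like so:

from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "staging-env", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://bitbucket.example.com/org/environment-staging.git",
            "targetRevision": "master",
            "path": "env",  # jenkins-x environment repos keep their chart here
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": "jx-staging",
        },
        "syncPolicy": {"automated": {"prune": True}},  # keep cluster == git
    },
}

api.create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace="argocd", plural="applications", body=app)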

If you want to see Argo in action, check out their own CD projects:
https://cd.apps.argoproj.io/ (use your github to login)

https://github.com/argoproj/argoproj-deployments/tree/master/argocd

A really, really awesome feature is the auto-generated Helm graph; it's great if you have devs who are not that familiar with kubectl, as they can still get started by visualizing the deployed Helm charts. Here is what our Nginx stack looks like (you can read more about this here: https://medium.com/@sajid2045/aws-eks-ingress-option-alb-nginx-fc64a4a5ea9f):

[Image: Argo's auto-generated graph of the Nginx stack]

And here is what the git repo vs cluster state diff looks like:

[Image: Argo app diff, git repo vs cluster state]

Finally! This is what it looks like end to end (https://sajidmoinuddin.files.wordpress.com/2019/12/gitopse2e-1.png):

[Image: the GitOps pipeline end to end]

The Phoenix Project

It's so hilarious and painfully true that it took me a while to realize I was reading a technical book!

“Improving daily work is even more important than doing daily work.”
― Gene Kim

One hundred years from now, historians will look back at this decade and conclude that something transformative happened: how we worked inside Development and IT Operations changed completely.… I predict that historians will call this decade the “Cambrian explosion for IT,” a time of incredible innovation and disruption when, 50 years after the birth of computing, we finally learned what technology was good for.
—John Willis, Cohost of “DevOps Cafe”