The Phoenix Project

It's so hilarious and painfully true that it took me a while to realize I was reading a technical book!

“Improving daily work is even more important than doing daily work.”
― Gene Kim

One hundred years from now, historians will look back at this decade and conclude that something transformative happened: how we worked inside Development and IT Operations changed completely.… I predict that historians will call this decade the “Cambrian explosion for IT,” a time of incredible innovation and disruption when, 50 years after the birth of computing, we finally learned what technology was good for.
—John Willis, Cohost of “DevOps Cafe”

Tips & Tricks for Certified Kubernetes Administrator (CKA) Exam

Ever since I first read a Kubernetes article about 2 years ago, I have been in love with it. It was the same when I first read about git or docker or Spring dependency injection: you immediately recognise that this is the future and it will fundamentally change how we think about logical and deployment architecture. For me, it was the missing piece of the puzzle in microservices land. I already had a fair understanding of, and experience with, migrating a big monolith into microservices and packaging with docker, 12-factor etc., but I didn't have a good answer for running them as a fleet. Amazon ECS was something I was toying with about 3 years ago, but it was far from the abstractions I was after.

Paying homage to my obsession with k8s, I recently got my CKA cert on the 2nd attempt. I thought I was ready on my first try but got only 72 (you need 74... I know, right!).

While the memory is fresh, I thought I'd write a few lines on what I learned from failing the 1st attempt and what I did differently on the 2nd.

Did I know more Kubernetes when I attempted it the 2nd time? Absolutely not! But I changed my strategy.

I won't repeat what's already out there; there is plenty of material online on this.

Time for Preparation:

I saw people spending 2-3 months on prep, but I didn't have that much time. I've been super busy with my day job (we just released KAYO (yay) in 5 months, pulling very long hours and working weekends too), so I only had partial weekends to study: on average about 4 hours every weekend over 8 weekends, plus about 3-4 hours a day for 5 days over the Christmas break before the exam. Having said that, I'm no stranger to k8s; I have been running it in prod for more than a year. But that is of little use for the exam.

TIP#1:  The exam is all about imperative commands.

If you are used to the declarative style (I am a big Helm fan and write helm charts for everything), you need to get that out of your head and learn your one-liners. Practice everything on the K8s Cheatsheet; make sure you can create pods, deployments, services etc. using kubectl alone. I tried writing YAML files and using them to create resources on my 1st attempt; it didn't work. If you take one command away from this post, take this:

kubectl run

  • kubectl run -it --rm test --restart=Never --image=busybox --command -- 'echo' 'echo no need' 'echo bye'
  • kubectl run test-pod --restart=Never --image=nginx (--restart=Never creates a Pod; note resource names must be lowercase)
  • kubectl run my-deployment --image=nginx (without --restart, this creates a Deployment)
  • kubectl run test-deployment --image=nginx --dry-run -o yaml > my-deployment.yaml
  • kubectl apply -f my-deployment.yaml
  • kubectl delete -f my-deployment.yaml
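A few more one-liners in the same imperative spirit are worth drilling. This is a sketch from memory; names like `web` and `web-svc` are placeholders, and flags can differ slightly between kubectl versions:

```shell
# Expose an existing deployment as a service, no YAML needed:
kubectl expose deployment web --port=80 --target-port=80 --name=web-svc

# Scale without editing a manifest:
kubectl scale deployment web --replicas=3

# Create a configmap / secret straight from literals:
kubectl create configmap app-config --from-literal=ENV=prod
kubectl create secret generic app-secret --from-literal=password=s3cret

# Throwaway pod to test DNS / connectivity from inside the cluster:
kubectl run tmp --rm -it --restart=Never --image=busybox -- wget -qO- web-svc
```

The `--rm -it --restart=Never` pattern in the last line is the one I reach for most: the pod cleans itself up when the command exits.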

 

TIP#2: Get used to your environment

This is probably the most important tip. The exam uses the GateOne web-based terminal. It behaved very weirdly on my Mac. Do all your practice in it. You can run it locally with the following:

docker run -d --name=gateone-2 -p 443:8000 arush/gateone

open https://localhost:443/


You should be familiar with how copy/paste etc. work in the environment you will be doing the exam in. (I was using a touchpad mouse on my iMac, but on the 2nd attempt a physical mouse with click buttons worked much better.) Otherwise you will be distracted by it and lose time.

You should know your editor (vi / nano) and, MOST IMPORTANTLY, tmux. You will need to look into multiple files at the same time and you only get one terminal. I can't imagine being able to answer some of the hard questions without tmux. tmux is cool! I was a screen user before, but now I am a tmux convert.
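For reference, the handful of tmux commands that cover most exam needs (a minimal sketch; the default Ctrl-b prefix is assumed):

```shell
# Start a named session for the exam:
tmux new -s cka

# Inside tmux (prefix is Ctrl-b by default):
#   Ctrl-b %   split the window vertically
#   Ctrl-b "   split the window horizontally
#   Ctrl-b o   jump to the next pane
#   Ctrl-b z   zoom / un-zoom the current pane
#   Ctrl-b [   enter copy/scroll mode (q to quit)

# Re-attach if the terminal drops:
tmux attach -t cka
```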

ALSO A MUST: know your systemd, systemctl, etcd, journalctl, ssh, and ssl (all covered in Kubernetes the Hard Way).
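A minimal sketch of the systemctl / journalctl / etcdctl invocations worth having in muscle memory when a node misbehaves (the etcdctl flags in particular vary by cluster setup, e.g. cert paths):

```shell
# Is the kubelet running on this node?
systemctl status kubelet

# Restart it after fixing its config:
sudo systemctl daemon-reload && sudo systemctl restart kubelet

# Follow its logs to see why it is crash-looping:
journalctl -u kubelet -f

# Check etcd health from the master (API v3; TLS flags omitted here):
ETCDCTL_API=3 etcdctl endpoint health
```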

TIP#3: Know your copy/paste (Don't!), just use the Notepad

You have to copy/paste across tmux windows, vi, etc. On my 1st attempt, I tried to do it all in the terminal (my confidence with vi/tmux bit me). But you should be able to do it all with the exam NOTEPAD. Type everything there and paste it into the terminal. That way you can also reuse the snippets / commands. Trust me, when you have 30 mins left to answer 40 points' worth of questions, your super-smart vi/tmux tricks fly out the window. Just stick to the basics.

TIP#4: TIME is of the essence

The exam is not about your knowledge; it's all about skill and efficiency. I am sure that, of all the people who fail the CKA, 50% more would pass if they had 4 hours instead of 3 for the exam.

  • You get 180 mins to finish 25 questions.
  • Out of those 25 questions, 30% of the points are in 5 questions, and you need at least 1 hour to answer them.
  • That means you need to finish the remaining 20 questions in 120 mins.
  • You will mess up some questions; allocate 15 mins (if you are lucky) for troubleshooting. That leaves you (120 - 15) = 105 mins to answer 20 questions.
  • So roughly 5 mins per question.
    • Imagine that! Create a resource, edit it, save the result, do a basic test on it. All in 5 mins. Go get yourself a stopwatch and try to run a deployment in 5 mins.
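That stopwatch drill can be run as a single timed command (a sketch using the same old-style `kubectl run` as earlier; `drill` is just a placeholder name):

```shell
# Create, scale, verify, inspect, and clean up a deployment end to end,
# and see how long the whole round trip takes you:
time (
  kubectl run drill --image=nginx --replicas=2 &&
  kubectl rollout status deployment drill &&
  kubectl get pods -l run=drill &&
  kubectl delete deployment drill
)
```

If the wall-clock time printed at the end is regularly over a couple of minutes, that is the skill to practice, not more reading.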

TIP#5: If you get stuck, move on.

On my first attempt, I tried to make sense of an early 2-point question for 5-10 mins and had to leave a few questions at the end unanswered. Remember, your goal is to answer about 85% of the questions right, not 100%. On my 2nd attempt, I answered 92% in 135 mins. I could probably have finished the last 8% in the remaining 30 mins (it was extremely tough), but I decided to play it safe and use the time to revisit some of the answers I had doubts about. I think I would have lost 6-10 points if I hadn't revised the answers / re-read the questions. I was quite confident I'd get at least 90%; I'm not sure where I lost the 4 points, and I still think about it 🙂

TIP#6: Strategise your learning activity

Again, the goal here is to answer 85% properly and get about 80 points. I spent most of my time going over each and every line of the K8s Docs. I wish I had spent some of that time on practice.

I was overconfident and didn't attempt any practice exam. I didn't time my activities; that was probably the biggest mistake I made on the 1st attempt. Remember, about 60% of the answers are quite easy if you did basic prep. You should prep for them really hard and practice, practice, practice; you should knock them off within 60-70 mins.

DO NOT fall into the trap of thinking that because you have access to the k8s docs, you can just copy/paste. For simple things, you should not even have to think: just type the commands from memory.

The remaining 20-25% are medium difficulty, but if you have gone over the k8s docs and know where to look for reference, plus you have done Kubernetes the Hard Way, you should be able to answer them too. I think they reserve 10% for the top 1 percentile; you will need a lot of practice and hands-on k8s setup etc. to get those, and to be honest, in this era of EKS / kops etc., I think my time is better used elsewhere.

End Notes

  • Was the prep painful and time-consuming but LOTS of fun? YES!
  • Is there any easy shortcut to pass? NO!
  • Do I think I wasted a bit of time on a cert that I could have invested in learning other stuff, given I already have prod k8s running? Yes.
  • Do I wish the exam were more of a test of knowledge (like writing a custom scheduler, writing CRDs etc.) rather than a test of speed? YES!
  • Are most of the things I learnt in the process useful for my day-to-day k8s activity (writing helm charts, investigating / customising other people's helm charts, debugging logs)? NO!
  • Would I have found it much more useful when I started my k8s prod deployment about 1 year ago? YES!
  • Would I DO it again? YES!!! ... Just for the CKA Walk 🙂


CKA IS WALKING

 

From Monolith to Microservice: Lessons learned from building Foxsports automation platform on Mule

I recently left Foxsports after five long, eventful years. Over this period, I was very fortunate to be part of the core backend team that built the Foxsports core integration / microservice platform from scratch. It automates the workflow from controlling the video router switch to delivering realtime sports data to the Amazon cloud, handling 70 million API calls and serving all live & statistical sports data to news.com.au mastheads.

The heart of the framework was ScheduAll: a resource and personnel scheduling desktop app that is widely used among broadcast and network management companies to schedule WorkOrders. This is a tale of how we hooked into the thin integration layer of the system and integrated the other network components to give them life; how a WorkOrder booking evolved from a manual spreadsheet job order into a self-managing, self-healing business intelligence framework that needs minimal to no human interaction, thus transforming the traditional broadcast infrastructure into a modern, stream-based, on-demand system.

You can see our CTO talking more about it in this Mule Presentation.

This series of blog posts will be my humble attempt to document some of the lessons learnt in the process of building a monolith, breaking it up into microservices v.1, and re-writing it again as v.2 over a period of 5 years.

I will try to focus on the following topics:

  1. The self-emerging pattern of microservices from trying to do it right
  2. API is King – API management, Mule support for RAML, API gateway, AWS
  3. Splitting the monolith – shared API pattern, IPC
  4. Microservice framework – Spring Integration vs Mule
  5. Event-driven architecture, customized persistent queue pattern on Elasticsearch as a CQRS implementation
  6. Publish–subscribe alternative to EventBroker
  7. DataStore – to share or not
  8. The Mule Flows vs Groovy vs Java argument
  9. Distributed executors over Hazelcast
  10. Troubleshooting – logging, event monitoring, spike detection, the right amount of logging, logging policy
  11. Microservice deployment, Mule MMC, containers (Docker) & service discovery, Elastic Container Service, CloudFormation
  12. Testing – integration vs functional, mocking proxy vs real service, how much unit testing is good, TDD – to do or not to do

 

Configure Docker-client on OSX

So I am a recent Mac convert. I am a long-time Ubuntu fan and user, but I got tired of switching OS between Windows and Linux and hacking around VirtualBox. Also, I am working more and more on cloud platforms nowadays and using my machine as a thin client.

The first thing I wanted on the Mac was to pull my docker images and use them locally. However, OS X doesn't run docker natively. So the docker team came up with an excellent tool called docker-machine. More about docker-machine here:

https://docs.docker.com/engine/installation/mac/

When you install Docker Toolbox, it also installs a 'Docker Quickstart Terminal' which you can use to get started with docker. But if you are like me, you want access to your docker engine from any terminal.


# find your local docker-machine ip (it should be 192.168.99.100)
$ docker-machine ip
192.168.99.100

# verify your certs exist
$ ls ~/.docker/machine/
cache certs machines no-error-report

# who are you?
$ whoami
sajid

# update ~/.bash_profile
$ vi ~/.bash_profile

export DOCKER_HOST=tcp://192.168.99.100:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/Users/sajid/.docker/machine/machines/default

# reload
$ source ~/.bash_profile
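As an aside, docker-machine can generate these exports for you, which avoids hard-coding the IP and cert path in ~/.bash_profile (this assumes the machine is named `default`, which is what Docker Toolbox creates):

```shell
# Print the export statements for the "default" machine:
docker-machine env default

# Apply them to the current shell in one go:
eval "$(docker-machine env default)"

# Sanity check that the variables are set:
env | grep '^DOCKER_'
```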


What this does is point your local docker client at your default docker engine / docker daemon.

Now you can run docker commands just like you can do on linux natively.

~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
williamyeh/ansible ubuntu14.04-onbuild ffb3155960ee 3 weeks ago 244.8 MB
ubuntu latest b549a9959a66 3 weeks ago 188 MB
jenkins 2.0-beta-1 a08ca387230c 3 weeks ago 711.9 MB
sameersbn/redis latest 100e6eb3355d 5 weeks ago 196.5 MB
hello-world latest 690ed74de00f 6 months ago 960 B
sajidmoinuddin.duckdns.org:5000/hello-world latest 690ed74de00f 6 months ago 960 B
sameersbn/gitlab 7.14.3 ffbbcd99823b 7 months ago 631.6 MB
sameersbn/postgresql 9.4-3 40e7e3862c0c 8 months ago 231.6 MB

 

Once you configure your docker client, the next thing you'll want is to push a local image to your insecure registry (note: my environment runs behind a strict firewall, so it's OK to run in insecure mode):


$ docker push mydockerrepo.org:5000/gcloud
The push refers to a repository [mydockerrepo.org:5000/gcloud]
unable to ping registry endpoint https://mydockerrepo.org:5000/v0/
v2 ping attempt failed with error: Get https://mydockerrepo.org:5000/v2/: tls: oversized record received with length 20527
v1 ping attempt failed with error: Get https://mydockerrepo.org:5000/v1/_ping: tls: oversized record received with length 20527

 

NOTE that the docker client is just a proxy between you and the docker engine. So any real config (DOCKER_OPTS) needs to be done on the docker daemon. Where is the docker daemon? It's in VirtualBox (you should see a VM running with the name 'default').

So you need to connect to it and update the daemon options:


~$ docker-machine ssh
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~~ ~~~~ ~~ ~~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__| <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.10.3, build master : 625117e - Thu Mar 10 22:09:02 UTC 2016
Docker version 1.10.3, build 20f81dd

## EDIT DOCKER CONFIG (the registry host must match the one you push to)
docker@default:~$ sudo vi /var/lib/boot2docker/profile
EXTRA_ARGS='
--label provider=virtualbox
--insecure-registry mydockerrepo.org:5000
'

CTRL-D to exit the docker machine, then restart it:

$ docker-machine stop
$ docker-machine start
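After the restart, you can sanity-check that the daemon picked up the flag before retrying the push (a sketch; newer docker clients list the registries under "Insecure Registries" in `docker info`):

```shell
# Depending on client version, the registry should show up here:
docker info | grep -iA3 'insecure'

# The earlier tls "oversized record" errors should now be gone:
docker push mydockerrepo.org:5000/gcloud
```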

 

All set. Now you can use your docker environment on OS X just like you would on native Ubuntu. No more docker-quickstart-terminal!!!

Dockerizing Elasticsearch to Run in Amazon Container Service

Recently I spent almost two days trying to run Elasticsearch in a docker container hosted under Elastic Container Service. A word of caution: running an IO-heavy app like ES in a container might not be a good idea, especially if you can't use native networking and have to deal with an abstracted bridge mode (currently you can only run in bridged mode in Elastic Container Service). However, our purpose is to provide search capability over a small set of data (less than 500K documents) and we were not concerned about the IO overhead; the maintenance benefit outweighs the performance penalty, so it's a good fit for our requirement. Also, if you want to use these scripts for a large volume of data, you will have to map larger disk space on the host instances (which is not done here).

While I am not going to describe step by step what ECS and Docker are (there is plenty of material from Amazon on the topic), I'll focus on sharing the code and project that has a working Elasticsearch image for Docker. I also had to build my own docker image for Elasticsearch, because the default image from Elasticsearch runs on OpenJDK and I wanted the Oracle JDK.

Without further ado, here is the GitHub project for elasticsearch:

https://github.com/sajid2045/elasticsearch

And the elasticsearch-ecs container (with some basic plugins and the elastic-ec2 plugin installed):

https://github.com/sajid2045/elasticsearch-ecs

I have set up a build on Docker Hub, so feel free to use it directly from there:

https://hub.docker.com/r/sajid2045/elasticsearch/

And

https://hub.docker.com/r/sajid2045/elasticsearch-ecs/

Now, a note of credit: most of the work is copied from this blog post:

http://blog.dmcquay.com/devops/2015/09/12/running-elasticsearch-on-aws-ecs.html

However, I faced some issues, mostly because I was trying to run Elasticsearch 2.2. You should still read through that post to understand the internal working principles; because it is so nicely detailed there, I am not going to repeat them here.

Cloud Formation:
I used a custom CloudFormation script to make sure my instances end up in the special VPC structure specific to my organization; you can use the default Elastic ECS cloud template if you just want to try out the feature.

Feel free to look at it if you have a similar requirement: https://raw.githubusercontent.com/sajid2045/elasticsearch-ecs/master/cloud-formation-template.json

With the ready docker images and the recipe for how to use them, I think you should have a much smoother time setting it up. Best of luck 🙂

Using GraphDB to manage dynamic subscription

Recently we added functionality for users to subscribe to their sports of interest so we can send them more personalized content. Given the dynamic nature of sports / matches / fixtures, I wanted the subscription topics to be flexible but still hierarchical. The idea was to enable someone to get each and every notification that happens in cricket, or to choose a particular team / player etc. and get more granular notifications.

After trying to model the hierarchy in an RDBMS and a document DB, we quickly realized a traditional DB is just a bad fit for such a recursive model, and the data retrieval queries would be too expensive.

This was finally a problem big enough to justify introducing a completely new tool, and my chance to use a graph DB in production. I had been introduced to the graph DB concept in a previous project of mine and was really impressed by how easily a natural domain model maps into the DB via the graph concept.

After some basic R&D, we decided to go with OrientDB. Having used elasticsearch cluster products, OrientDB's clustering model seemed more natural to us than neo4j's. Having said that, running a graph on a distributed cluster is not an easy problem, which we realized later the hard way.

We could easily map our model in a graph schema like the following:

[Diagram: the subscription graph schema]

Notice how easily the concepts of an unregistered device / user, and a user with multiple devices, fit into the domain. A user can subscribe to any Topic node on the tree, and events automatically flow through to the subscriber.

This is how it looks like on Database:

[Screenshot: sample data in the database]

Performance:

With the default setup and no special tuning, the system looks up 10,000 device-ids to push a notification in < 500 ms, which is quite good for us. However, OrientDB support tells us that it can be made even faster.

Lessons Learned:

  • We went with the latest major production release, 2.1 (which had been released only 1 week earlier), and got burnt by many bugs / corner cases. It took until 2.1.7 for it to finally be stable.
  • Use the TinkerPop APIs (think JDBC for graph databases) so you can switch between OrientDB / neo4j.
  • If coding in Java, use the Frames API to maintain the domain mapping.
  • The Frames API is not properly tuned; I had to re-write most of my queries in Gremlin. I am not sure about neo4j, but OrientDB recommends using Gremlin / SQL to run the search queries.
  • Partition your clusters appropriately; it will result in proper utilization of all nodes in your cluster.