Dev Notes
Saturday, 22 December 2018
Christmas Holidays Coding
Christmas is for coding, starting with Vue.js https://www.pluralsight.com/courses/vuejs-getting-started
Monday, 17 December 2018
Network Infrastructure
Today I had one of those frustrating conversations with one of the network engineers on the infrastructure team. They had managed to destroy one of the Octopus boxes, and the replacement couldn't see some internal NuGet servers.
First, to open the firewall I had to submit a spreadsheet based on a template I couldn't access. Then it got rejected because I entered only the DNS entries and not the IPs, so I went to ask him why that was needed, hoping the reason was that he would feed the spreadsheet into some application to automate the rule creation. But the actual reason was "it will take too long to ping those boxes" (there were only three of them).
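For the record, resolving a handful of DNS entries to their IPs takes seconds. A minimal sketch in Python, with the hostnames invented for illustration:

import socket

# Hypothetical internal hostnames; the real ones are not shown here.
hosts = ['octopus01.internal', 'nuget01.internal', 'nuget02.internal']
for host in hosts:
    try:
        print(host + ' ' + socket.gethostbyname(host))
    except socket.gaierror as error:
        print(host + ' could not be resolved: ' + str(error))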
Hopefully he understands that this kind of behaviour makes every single developer want to move to the cloud and make him obsolete...
Thursday, 6 September 2018
TeamCity - GitHub Commit Status Publisher
This morning one of the new guys in the team asked me if he could get notified when a merged pull request is built. Sometimes you just need someone to ask the right question for you to notice that you could be doing things better.
After a few clicks in TeamCity I was able to configure a Build Feature that feeds the result of the build back to GitHub. The only drawback is that there is no easy way to configure it at the root of all the projects, so I'll have to add it one by one to the close to a hundred build configurations we have.
Plug-in Configuration:
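To avoid clicking through a hundred build configurations by hand, the same feature should be addable in bulk through the TeamCity REST API. A rough sketch in Python, assuming basic-auth credentials and that the publisher properties are copied from a build configuration where the plug-in was already set up manually (all values below are placeholders):

import requests

teamcityUrl = 'http://***teamcityserver***'
auth = ('***user***', '***password***')
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}

# Properties copied from a manually configured Commit Status Publisher;
# verify the exact names by GETting the features of that build configuration.
feature = {
    'type': 'commit-status-publisher',
    'properties': {'property': [
        {'name': 'publisherId', 'value': 'githubStatusPublisher'},
        {'name': 'github_host', 'value': 'https://api.github.com'},
        {'name': 'github_authentication_type', 'value': 'token'},
        {'name': 'secure:github_access_token', 'value': '***token***'}
    ]}
}

# Attach the feature to every build configuration on the server.
buildTypes = requests.get(teamcityUrl + '/httpAuth/app/rest/buildTypes',
                          auth=auth, headers=headers).json()
for buildType in buildTypes['buildType']:
    result = requests.post(teamcityUrl + '/httpAuth/app/rest/buildTypes/id:'
                           + buildType['id'] + '/features',
                           json=feature, auth=auth, headers=headers)
    print(buildType['id'] + ' ' + str(result.status_code))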
Saturday, 28 July 2018
Unexpected interview output
During the interview of a candidate for a developer position in the team, I started to worry about how little experience he had on the operations side. After some questions, I realized that with all these new cloud development tools it is perfectly possible to focus on just coding in some scenarios.
In his case, he had been working on a simple web application built on ASP.NET Core, for which he didn't need to set up continuous build/deployment manually, nor did he need to spend much time on monitoring, instrumentation or provisioning, as in Azure it is possible to deal with these concerns from the dashboard.
Other than that, he proved he could solve problems and produce clean code, so hopefully he accepts the offer and joins the team.
Saturday, 7 April 2018
One year of Akka.net in production
One year ago we went live in production with a rewrite of the core of the platform in Akka.net. Although the initial release date had to be delayed because of a bug we found in the pre-production environment with large Akka.net clusters, I have to say that, after fine-tuning the remoting and clustering parameters, the system has been stable, the performance improvement in soft real-time processing is noticeable, and the system scales better.
The core of the system is formed of two services plus the lighthouse. Initially we went live with a single cluster of more than 50 instances; in the latest release this week the count has gone up to 101 service instances distributed across three clusters!
After this experience we have continued using Akka.net in other parts of the platform, like dynamic clusters of pub-sub actors used to distribute the load and control the concurrency when processing messages from RabbitMQ.
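That part is C# on Akka.net, but the basic idea of capping the amount of in-flight work from RabbitMQ can be sketched in plain Python with pika, using the channel prefetch as the concurrency limit (queue name and numbers invented, and nothing like the actual actor code):

import pika

# Hypothetical local broker and queue, for illustration only.
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='work-queue')
channel.basic_qos(prefetch_count=10)  # at most 10 unacknowledged messages in flight

def handle(ch, method, properties, body):
    # Process the message, then acknowledge it so the broker sends more work.
    print(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='work-queue', on_message_callback=handle)
channel.start_consuming()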
We have also started using Akka.net in conjunction with .NET Core, and had the opportunity to contribute to the Akka.DI.Ninject repository to convert it to .NET Standard 2.0 so it could be used in .NET Core services.
Monday, 26 March 2018
Using python and OctopusApi to handle rollbacks
This week we migrated a large number of servers and services to new clusters of MongoDB, Aerospike and RabbitMQ. Because all this configuration is in Octopus variables (connection strings), we just needed to change those variables to point to the new infrastructure. All good in principle: change the variables and create new releases for each service; but that leaves us exposed to three potential problems:
- if an emergency release is needed, it will automatically point to the new infrastructure, which might not be ready for the switch-over.
- if an unexpected problem is found after the release and we want to roll back, we would need to modify all the Octopus variables and create a new release to roll back to the old infrastructure (or roll back to the old version of the code).
- additionally, in both cases, creating a new release after the deployment could pick up variable changes made for future releases, like feature flags.
To address those possible problems, I decided to add a rollback role to all the servers that were planned to be deployed (30+), then add the same rollback role to the existing connection string variables, and create new variables with the same roles as the originals. This way, if a new release needs to be created, it will use the rollback variables. Right before the release we would remove the rollback role so that the deployment would use the new connection strings; and if a rollback is needed, we would just add the rollback role back and deploy the same release.
The only problem was adding and removing that role on a large number of servers; this is where I put together my very first Python program, using the Octopus API to add or remove a role. The code is a bit raw, but other members of the team have already started using it to set up new deployment targets. I have to say that I'm impressed by how natural and quick it was to write the script in Python.
import requests

# Octopus connection details (redacted).
OctopusUrl = 'http://***octopusserver***'
headers = {'X-Octopus-ApiKey': 'API-***********'}
newRole = 'Rollback'
environmentName = 'Production'

# Find the machines link of the target environment.
machinesUrl = None
environments = requests.get(OctopusUrl + '/api/environments', headers=headers).json()
for environment in environments['Items']:
    if environment['Name'] == environmentName:
        machinesUrl = environment['Links']['Machines']

# Collect the machines that have the main role, following the pagination links.
machines = requests.get(OctopusUrl + machinesUrl, headers=headers).json()
machinesList = []
machineEndPage = False
while not machineEndPage:
    for machine in machines['Items']:
        if 'MainRole' in machine['Roles']:
            machinesList.append(machine)
    nextMachinesUrl = machines['Links'].get('Page.Next')
    if nextMachinesUrl is not None:
        machines = requests.get(OctopusUrl + nextMachinesUrl, headers=headers).json()
    else:
        machineEndPage = True

# Update the roles and save each machine back through the API.
for machine in machinesList:
    #if newRole not in machine['Roles']:  # uncomment to add the role instead
    #    machine['Roles'].append(newRole)
    if newRole in machine['Roles']:  # remove the role
        machine['Roles'].remove(newRole)
    machineUrl = OctopusUrl + machine['Links']['Self']
    result = requests.put(machineUrl, json=machine, headers=headers)
    print(machine['Name'] + ' ' + str(result.status_code))
Thursday, 8 March 2018
Video: What I Wish I Had Known Before Scaling Uber to 1000 Services • Matt Ranney
This morning during breakfast I came across this great video https://youtu.be/kb-m2fasdDY about the problems Uber had to overcome when it moved to a microservices architecture. Although at a much smaller scale, it is surprising how the organisation I'm currently part of had/has the same problems:
- rest/json contracts need integration tests
- too much logging
- logging not uniform across different technologies
- tracing agreement
- language silos (zealots)
- too many repositories
- hard to coordinate multi team deployments
- incident ownership
- load testing is hard