Imposing deployment environments on SaaS apps that really aren’t designed for it…and the side benefits

Written by Dave Evans | Posted in News on 30 October 2020

At Banking Works we’re obsessed with finding the most effective (and ideally, simplest) solution to every challenge we face. In this post, Banking Works Senior Software Engineer Dave Evans gives us an inside scoop on a recent project. It offers an insight into our product team’s approach of always finding the best solution, even if it means redesigning the square peg… and the round hole.

Most software engineers will agree that having several copies of your live environment for development and testing is a pretty neat idea. These environments are usually designated as development (for sandbox experimentation with new code), integration (for testing), staging (a mirror of the live environment for final testing), and the live production environment. Changes begin in development and are promoted through each environment to finally be made live. Nowadays, with cloud computing and infrastructure-as-code, this has become more straightforward.

Here at Banking Works we put a lot of effort into maintaining this separation, so that any changes or new features are thoroughly tested and proven before making the big time. Developers working on changes or new features can be safely isolated from QA tests and, of course, from the live system being used by clients or customers. It also makes it easy to roll back a change to production in case of unforeseen problems. This is a key factor in ensuring that our business maintains operational resilience, and that all solutions rolled out to clients are sound.

But what about third party SaaS apps that may be an integral part of operations, but don’t easily support multiple environments?

We could spin up several instances of the app, but then we encounter problems such as:

  • How is configuration synchronised between instances?
  • What licensing/cost considerations are there?
  • What about things that must not be enabled in non-production environments?

Zendesk – the ubiquitous customer service platform – is one such SaaS app that we have this issue with. We use it as a client-facing app for various services, and it works pretty well.

Zenfig

The goals for our Zendesk Config tool (Zenfig, because I am obsessed with portmanteaus) were as follows:

  • A command line tool, but one that may be turned into an API with a web-based UI in future
  • Synchronise configuration entities between multiple instances of Zendesk, allowing updates or additions in one to be pushed to another, and deleted entities to be removed
  • Backup of tickets and related data
  • Extensible for future use with other applications.

The tool is a Node app written in TypeScript. The basic concept (sketched in code below) is:

  • Pull configuration entities from the Zendesk API to local JSON files
  • Store each entity’s ID and instance reference in an ID map file
  • Push the entities to a different Zendesk instance, and store the newly created entity’s ID alongside the ID from the original instance
  • Use this ID map to update the same entities when they are pushed back again
  • Give the user a good amount of information about what the tool is doing, including optionally showing a diff of what has changed.
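
To make the ID-mapping idea concrete, here is a minimal sketch of how such a map might be structured and used. The type and function names are illustrative assumptions, not Zenfig’s actual code.

```typescript
// Illustrative sketch only: the shapes and names below are assumptions.
// The ID map records, for each entity, the ID it has in every instance it has
// been pushed to, so a later push becomes an update rather than a create.
interface IdMapEntry {
  entityType: string;                     // e.g. "triggers"
  ids: Record<string, number>;            // instance name -> ID in that instance
}

type IdMap = Record<string, IdMapEntry>;  // keyed by "<entityType>:<source ID>"

// Decide whether a push to `target` should create or update the entity.
function resolveTargetId(map: IdMap, entityType: string, sourceId: number, target: string): number | undefined {
  return map[`${entityType}:${sourceId}`]?.ids[target]; // undefined -> create
}

// After a successful create, remember the new ID for future pushes.
function recordTargetId(map: IdMap, entityType: string, sourceId: number, target: string, newId: number): void {
  const key = `${entityType}:${sourceId}`;
  const entry = map[key] ?? (map[key] = { entityType, ids: {} });
  entry.ids[target] = newId;
}
```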

 

Files

Using a database for configuration storage seemed like overkill, so flat files were chosen. This had the added bonus of allowing us to put the JSON files under source control. Having our Zendesk configuration in a Git repository is very useful, as it is fairly extensive and subject to change by multiple people. This allows us to have an audit trail of changes, and visibility of exactly what has changed in the JSON.

To S3 or not to S3

Where do these JSON files get stored? The local filesystem seemed an obvious initial choice, as this is easy to implement, and great for development and testing. But in production, the tool will most likely be running in AWS somewhere, so Amazon S3 is the obvious choice there. I ended up having an abstraction of filesystem operations, with the concrete implementation being decided at runtime (Strategy pattern), via dynamic import. This lets us change the filesystem easily if our requirements change.
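
As an illustration of that Strategy-pattern arrangement, here is a minimal sketch; the interface, class and module names are assumptions rather than the real Zenfig code.

```typescript
import { promises as fs } from 'fs';

export interface FileStore {
  read(path: string): Promise<string>;
  write(path: string, content: string): Promise<void>;
}

// Local filesystem implementation: the default for development and testing.
class LocalStore implements FileStore {
  async read(path: string): Promise<string> {
    return fs.readFile(path, 'utf-8');
  }
  async write(path: string, content: string): Promise<void> {
    await fs.writeFile(path, content, 'utf-8');
  }
}

// The concrete strategy is chosen at runtime; the S3 implementation is only
// loaded (via dynamic import) when it is actually needed.
export async function createStore(kind: 'local' | 's3'): Promise<FileStore> {
  if (kind === 's3') {
    const { S3Store } = await import('./s3-store'); // hypothetical module wrapping the AWS SDK
    return new S3Store();
  }
  return new LocalStore();
}
```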

Backup

It made sense to also implement a ticket backup feature: Zendesk has an incremental backup API, and we are essentially already pulling data from an API and writing it to files. The backup API lets us ask for tickets that are new or changed since a given start time, and returns the changes along with the time of the last change. We do an initial backup with a start time of 0, store the returned timestamp, and provide it in the request the next time the backup runs. With the tool running as a fairly frequent scheduled task in AWS, we effectively have an almost-live copy of our ticket data.
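
A rough sketch of that loop is below, assuming axios for HTTP, a local JSON file for the cursor, and placeholder credentials; the real tool writes through its storage abstraction instead.

```typescript
import axios from 'axios';
import { promises as fs } from 'fs';

const SUBDOMAIN = 'example';                 // hypothetical Zendesk subdomain
const AUTH = { username: 'user@example.com/token', password: 'API_TOKEN' };
const CURSOR_FILE = 'backup-cursor.json';

async function backupTickets(): Promise<void> {
  // 0 on the first run; afterwards the end_time returned by the previous run.
  const startTime = await fs.readFile(CURSOR_FILE, 'utf-8')
    .then((s) => JSON.parse(s).end_time)
    .catch(() => 0);

  // Zendesk's time-based incremental export: tickets new or changed since start_time.
  const { data } = await axios.get(
    `https://${SUBDOMAIN}.zendesk.com/api/v2/incremental/tickets.json?start_time=${startTime}`,
    { auth: AUTH },
  );

  await fs.writeFile(`tickets-${startTime}.json`, JSON.stringify(data.tickets, null, 2));
  await fs.writeFile(CURSOR_FILE, JSON.stringify({ end_time: data.end_time }));
}
```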

It’s straightforward to change the code to write to an AWS database solution such as RDS or DynamoDB, instead of (or as well as) S3. This allows us to connect that database to a Business Intelligence platform such as Amazon QuickSight for reporting and live dashboards. We avoid having to use the Zendesk API and can perform arbitrary queries on the ticket data to our heart’s content.

Problems

While in the planning phase, I realised there would be several hurdles to jump.

Firstly, one cannot simply copy all entities to another instance. Some examples:

  • Test/development instances may have fewer license seats to save on cost, so users that are agents or admins in production may need to be downgraded, and then upgraded again if pushed back to production
  • Brand subdomains must be globally unique, so they must be changed
  • Notifications should not be sent from non-production instances, so any notification triggers may need to be disabled
  • Targets that reference the Zendesk API will refer to a different subdomain, so this would need to be altered.

I solved this with an override system. If certain entities met a predefined set of criteria, their values would be overwritten, and the original values saved. The original values can then be restored when the entity is pushed back to its original instance.
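
A simplified sketch of how such override rules might look is below; the rule shape and the example rule are illustrative assumptions, not the actual Zenfig configuration.

```typescript
interface OverrideRule {
  entityType: string;                 // which entity type the rule applies to, e.g. "triggers"
  match: Record<string, unknown>;     // criteria the entity must meet
  set: Record<string, unknown>;       // values to overwrite when pushing to a non-production instance
}

const rules: OverrideRule[] = [
  // Illustrative rule: deactivate active triggers so no notifications fire outside production.
  { entityType: 'triggers', match: { active: true }, set: { active: false } },
];

// Apply any matching rules, returning the changed entity plus the original values
// so they can be restored when the entity is pushed back to its original instance.
function applyOverrides(entityType: string, entity: Record<string, unknown>) {
  const saved: Record<string, unknown> = {};
  for (const rule of rules.filter((r) => r.entityType === entityType)) {
    const matches = Object.entries(rule.match).every(([key, value]) => entity[key] === value);
    if (!matches) continue;
    for (const [key, value] of Object.entries(rule.set)) {
      saved[key] = entity[key];       // remember the original value
      entity[key] = value;            // apply the override
    }
  }
  return { entity, saved };
}
```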

Mapping

Another issue was mapping. It wouldn’t just be a case of mapping the primary key (ID) of each entity; we would also need to map any foreign keys that might exist (basically, linked IDs of other entities, such as group memberships for a particular user). Some of these were quite complex, for example Zendesk triggers and automations, which have dynamic rules that can reference all sorts of other entity types. Some of the foreign keys could be in arrays. And also… certain fields could be foreign keys, or not, depending on the value of another field. Just to make things even more fun, foreign keys could be integers, but sometimes strings. In practice the Zendesk API accepts either, but I wanted to keep any requests consistent with the responses.

This could have all been hard-coded, but one of the goals was to make a more generalised tool that can be used with other APIs, so this was out of the question.

So I put together a flexible mapping system. It is provided with information about each entity type, such as its name (e.g. users), the corresponding API endpoint, and a list of relationship objects. These describe the various foreign keys that might be present; so for users, you might have the user’s organisation and group memberships. Each relationship needs a path to say where the foreign key / ID is located in the JSON object, and also what the entity type is (e.g. organisations or groups). The relationship also contains an optional condition – a key/value pair which is checked before trying to map the ID. This is needed because some entity field values are only IDs based on the value of another field – such as in Zendesk’s trigger and automation parameters.
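
To make that a bit more concrete, here is a sketch of what such an entity-type description might look like; the type names and example values are assumptions made for illustration.

```typescript
interface Relationship {
  path: string;          // where the foreign key sits in the entity's JSON, e.g. "organization_id"
  entityType: string;    // what it points at, e.g. "organizations"
  condition?: {          // only treat the field as an ID when this key/value pair matches
    path: string;
    value: unknown;
  };
}

interface EntityTypeDefinition {
  name: string;          // e.g. "users"
  endpoint: string;      // the corresponding API endpoint
  relationships: Relationship[];
}

// Example: users reference their organisation and their default group.
const users: EntityTypeDefinition = {
  name: 'users',
  endpoint: '/api/v2/users.json',
  relationships: [
    { path: 'organization_id', entityType: 'organizations' },
    { path: 'default_group_id', entityType: 'groups' },
  ],
};
```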

The path (JSON element addressing) was implemented using a few recursive functions that traverse the field hierarchy, iterating over arrays where required. I realised that this would be a great use case for JSONPath – the query language for JSON objects – but decided to press on with my simple implementation for now. This may get replaced with JSONPath at some point, and there will be some very satisfying refactoring involved!
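
A minimal sketch of that kind of traversal might look like this (a hand-rolled subset of what JSONPath would provide); the function name and signature are assumptions.

```typescript
// Walk a path expressed as a list of keys, fanning out over any arrays encountered,
// and collect every value found at the end of the path.
function resolvePath(value: unknown, path: string[]): unknown[] {
  if (path.length === 0) return [value];
  if (Array.isArray(value)) {
    // Iterate over arrays, resolving the remaining path against each element.
    return value.flatMap((item) => resolvePath(item, path));
  }
  if (value !== null && typeof value === 'object') {
    const [head, ...rest] = path;
    return resolvePath((value as Record<string, unknown>)[head], rest);
  }
  return []; // the path doesn't exist on this branch
}

// e.g. resolvePath(trigger, ['conditions', 'all', 'value']) collects every
// conditions.all[*].value in the trigger object.
```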

The final stage was putting together a file containing all of the entity types for Zendesk, and their relationships. It is incomplete as we don’t use all features of Zendesk, but the flexibility of the system means that new entity types can be added very easily, and in future the same system can be used to support entirely different apps.

Shout Outs

OCLIF, the Open CLI Framework, is a great Node module for building command line tools. It allows us to very quickly put together a CLI, with lovely features such as auto-documentation, which means avoiding having to write documentation more than once! I also used Winston for logging and a little spinner library called Ora. These didn’t all play nicely together out of the box, so some wrangling had to be done. But eventually, logs can be written to the terminal (or AWS Cloudwatch) and/or a file, with a spinner showing current status when running on the command line.
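
For illustration, here is a rough sketch of one way to wire the logger and spinner together, under the assumption that log output goes to a file so that the spinner can own the terminal; this is not the exact Zenfig setup.

```typescript
import winston from 'winston';
import ora from 'ora';

// Structured logs go to a file (or another transport such as CloudWatch),
// leaving the terminal free for the spinner.
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [new winston.transports.File({ filename: 'zenfig.log' })],
});

// Run a unit of work with a spinner showing progress and the outcome logged.
async function withSpinner<T>(text: string, work: () => Promise<T>): Promise<T> {
  const spinner = ora(text).start();
  try {
    const result = await work();
    spinner.succeed(text);
    logger.info(`${text}: done`);
    return result;
  } catch (err) {
    spinner.fail(text);
    logger.error(`${text}: failed`, { err });
    throw err;
  }
}
```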

Epilogue

Something that was basically a workaround has brought forth several additional benefits! We have source control and backup for our Zendesk configuration, we have an audit trail of changes, and our raw ticket data is much more accessible. We can now build additional tooling using our own copy of the ticket data; one such application is live dashboards for complex queries – something Zendesk doesn’t currently offer natively.