We’ve been churning out loads of projects at Toaster, and most of them run on the Google AppEngine ecosystem. We use AppEngine, Google Cloud, the Datastore, Search, and a few of the other related auto-scaling resources. We also do a lot of work for Google clients, and they love it when we use their stuff -- those projects tend to come with special security constraints which AppEngine helps us meet.
Using AppEngine gives us less control over infrastructure -- you end up having to do things a certain way -- but the positives far outweigh the negatives; we end up having to do minimal DevOps work, running multiple versions of the same application comes for free, and updates and rollbacks are a walk in the park.
The complexity of the projects we’ve tackled here keeps on increasing, though, so we’ve ended up with a few different approaches to administering content. Let me give you a rundown.
1. Sync to Google Sheets
One of the first websites I worked on was the Taiwan Elections 2016 website. It required some static content (text blobs, translations, descriptions of people) and some “live” statistics such as questions/answers and popularity; there was no complex related data or validation required.
It used some NDB models that synchronised hourly with a set of Google Sheets; this was a two-way synchronisation which meant you could administer data within Google Sheets and the results would be seen on the live site, and new data from the live site would periodically be pulled (or pushed) to Google Sheets.
Using NDB models that synchronised with Google Sheets allowed us to update data on the live site by making edits in a sheet; it was quick and painless to do.
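To make the mechanism concrete, here’s a minimal sketch of the sheet-to-datastore half of the sync. The function names and the row layout are illustrative, not our actual code: in production the fetch would call the Google Sheets API and the write would upsert NDB entities from an hourly cron, so both are passed in as callables here to keep the mapping logic standalone.

```python
# Hypothetical sketch of the Sheets -> datastore direction of the sync.
# The real app reads rows via the Google Sheets API and writes NDB
# entities; here fetch and put are injected so the logic is testable.

def rows_to_records(rows):
    """Turn raw sheet values ([header, row, row, ...]) into dicts keyed by header."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

def sync_from_sheet(fetch_rows, put_record):
    """fetch_rows() returns the sheet's values; put_record(d) upserts one entity."""
    for record in rows_to_records(fetch_rows()):
        # In the real thing this would be something like
        # Candidate.get_or_insert(record["slug"], **record).put()
        put_record(record)
```

The two-way variant simply runs the inverse mapping on a schedule, pushing fresh datastore rows back into the sheet.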
Pros:
- Doesn’t require running a server (for data administration)
- More fault tolerant - if the sheets go out of whack then new data isn’t pulled in, but the site keeps on working

Cons:
- Little/no control over when data goes live
- Related data/entities are hard to administer
- Only basic data validation - it’s hard to validate data properly, and any validation measures can easily be broken or circumvented
- Doesn’t allow easy pre- or post-processing of content
2. “Offline” CMS with static site generation
Go Global (https://goglobal.withgoogle.com/) used a different approach; for it we went full-on static site generation. It’s once again a relatively simple site, but with a few models that are closely interrelated.
Locally, a developer runs the site using Django and administers content using the CMS; changes to the database get committed to the repository and pushed up. When it comes to release, we crawl the local site and save each page statically (in different languages) and then can just serve these up directly.
We trade extra complexity in development for less in production, and it is a reasonable trade-off -- having a website run almost entirely statically in production is a massive win; it ends up less complex, more secure and faster than a dynamically run counterpart.
Pros:
- Quick to set up and use the CMS (offline)
- Not a lot of moving parts in production
- Quick to serve

Cons:
- Sharing the database and “uploaded” files is a bit of a pain (committing a binary SQLite database to git isn’t great)
- If someone forgets to commit the DB, data that was previously released might be missing from the next version
- “Offline” editing only - there is no live CMS that anyone can just access to edit data
3. Dynamic CMS with static API generation
YouTube FanFest was a bit more complex than the last project. The data models are quite interconnected: we’ve got YouTube Creators performing at Events, each event (and any related creators) needs to be translated, and we’ve got content that needs to be periodically updated.
For this project we were going to have authors and publishers with different ACLs (Access Control Lists) updating the CMS on a regular basis, so it was important that the data be revisioned, that we be able to publish specific revisions, and, as always, that all content be served up fast.
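The revisioning idea is simple enough to sketch in a few lines. This is an illustrative model, not our actual NDB entities: every save creates an immutable snapshot, and publishing just pins one revision id as the live one, so authors can keep editing without touching what the site serves.

```python
# Hypothetical sketch of revisioned content: saves never mutate,
# publishing pins a revision. Real entities would live in NDB.
import itertools

class Revisioned:
    _ids = itertools.count(1)

    def __init__(self):
        self.revisions = {}       # revision id -> content snapshot
        self.published_id = None  # which revision the live site serves

    def save(self, content):
        rev_id = next(self._ids)
        self.revisions[rev_id] = dict(content)  # immutable snapshot
        return rev_id

    def publish(self, rev_id):
        if rev_id not in self.revisions:
            raise KeyError(rev_id)
        self.published_id = rev_id

    @property
    def live(self):
        return self.revisions.get(self.published_id)
```

Rolling back is then just publishing an older revision id -- no data migration needed.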
The backend CMS reads and writes to a graph of NDB models; at publish time we generate a “dynamic” API request to a series of endpoints and then dump the response straight to JSON that is put on Google Cloud Storage. Each resource usually contains a variety of nested data, so it is computationally expensive to generate, but once serialised it is very fast to read (just read some JSON and serve it straight up) -- this is where we pick our trade-off: expensive writes, cheap reads.
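As a rough illustration of that publish step (the function names and the dict-as-bucket are stand-ins, not our real endpoint code or Cloud Storage client): nested relations are embedded in full at write time, so reads never need to join anything.

```python
# Hedged sketch of publish-time serialisation: embed nested resources
# once, expensively, so serving is just reading a JSON blob.
import json

def serialize_event(event, creators_by_id):
    # Embedding full creator records is what makes the write expensive
    # and the read cheap.
    return {
        "slug": event["slug"],
        "city": event["city"],
        "creators": [creators_by_id[cid] for cid in event["creator_ids"]],
    }

def publish_event(event, creators_by_id, store):
    """store is a dict standing in for a bucket of JSON files."""
    payload = json.dumps(serialize_event(event, creators_by_id), sort_keys=True)
    store["events/%s.json" % event["slug"]] = payload
```

At request time the frontend just streams `events/<slug>.json` back, with no datastore queries on the read path.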
Dependencies between models are tracked, and when a connected model is published we re-generate its dependants and update the store; this way published data stays in sync and we don’t see different versions of the same resource when it is nested versus when it is standalone.
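That dependency tracking boils down to a reverse graph walk. A minimal sketch (names illustrative; our real implementation lives inside the CMS): given a map of which resources embed which, publishing one resource means republishing everything that transitively nests it.

```python
# Illustrative sketch of the "republish dependants" walk.
from collections import deque

def dependants(deps, changed):
    """deps: {resource: set of resources it embeds}.
    Returns everything that (transitively) embeds `changed` and so
    must be re-generated when it is published."""
    reverse = {}
    for res, uses in deps.items():
        for used in uses:
            reverse.setdefault(used, set()).add(res)
    out, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for parent in reverse.get(node, ()):
            if parent not in out:
                out.add(parent)
                queue.append(parent)
    return out
```

Each resource in the returned set gets re-serialised and re-written to the store alongside the one that actually changed.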
Pros:
- Quick to serve
- Easy to edit (live)

Cons:
- Expensive save and publish - especially for inter-connected resources
- A separate CMS server to maintain
Which approach do we use?
A dynamic CMS approach works best for more complex projects with interconnected data models, frequent data changes and multiple user roles and permissions to manage content.
When no complex related data or validation is required, the Google Sheets approach simplifies development while keeping data updates quick and easy.
We’re continuously optimising both approaches using our previous experience and experimentation to drive improvements we can use in future projects.
Do these approaches resonate with your own experience?