Picking up where we left off with Jack’s series on how he put his cloud engineering learnings into practice with interesting results. This is part two, so if you missed part one, why not check it out first here.
We didn’t call them ‘items’ because DynamoDB calls items items…in fact we came to regret that.
These items have a unique identifier which is their name. These items have a price which is an integer. I hope you didn’t come here for data science because I didn’t design a schema. I just said the items would have a Name, and a price, and I got bored, and stopped thinking. For some reason I then imagined myself entering JSON into a computer, and freaked out and wrote some items out in YAML. They went like this:
Item: Jack eats a can of worms
Description: For 10 pounds I will eat some worms
Some other values:
That was that.
I then wrote my first program (from here on, know that this is pseudocode, for brevity):
item = yaml.load(item, Loader=yaml.FullLoader)
The idea was you have a folder full of yaml files called stuff like item.yaml, a stack of stupid things I would do for money. Python eats the file into its snake belly and makes it into an object called ‘item’. The snake then puts a collection of item’s items into a DynamoDB table called items and you items items items.
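A minimal sketch of that flow might look like the following. It assumes each YAML file parses into a plain dictionary like the one above. The helper names (`to_record`, `store_items`), the `Price` key and the `FakeTable` stand-in are made up for illustration and are not the original code. In practice `table` could be any object with a `put_item(Item=...)` method, such as a boto3 DynamoDB `Table`.

```python
from typing import Any, Dict, Iterable, List


class FakeTable:
    """Stand-in for a boto3 DynamoDB Table so the sketch runs offline."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def put_item(self, Item: Dict[str, Any]) -> None:
        self.rows.append(Item)


def to_record(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one parsed YAML document into a record keyed by Name."""
    return {
        "Name": str(parsed["Item"]),  # the unique identifier
        "Description": str(parsed.get("Description", "")),
        "Price": int(parsed.get("Price", 0)),  # assumed key; price is an integer
    }


def store_items(parsed_items: Iterable[Dict[str, Any]], table: Any) -> int:
    """Put every item into the table and return how many were written."""
    count = 0
    for parsed in parsed_items:
        table.put_item(Item=to_record(parsed))
        count += 1
    return count


# Roughly what yaml.load would hand back for the file above, plus an assumed Price.
parsed = {
    "Item": "Jack eats a can of worms",
    "Description": "For 10 pounds I will eat some worms",
    "Price": 10,
}
table = FakeTable()
written = store_items([parsed], table)
print(written, table.rows[0]["Name"])
```

Keeping the table as a parameter means the same function can be pointed at a fake, a local container, or the real thing without changing the loading code.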
Of course the code worked but I was already embarrassed - we’re going to stop this thread here - you can imagine some refactoring went on in the final product but eventually there were even /item + /items API endpoints and October might as well have been the month of them (items).
What I did to protect myself from scrutiny, however, was immediately start using the amazon/dynamodb-local:latest Docker image on my local machine, because it was a bit painful changing stuff in the cloud. We will get to Terraform soon, but in my first weekend spike I vowed not to even authenticate with AWS. That lasted for a while.
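For anyone wanting to try the same trick, here is a hedged sketch of pointing boto3 at the container instead of AWS. It assumes the container is published on port 8000 (the image's default) and uses a hypothetical `DYNAMODB_ENDPOINT` environment variable. boto3 still wants a region and some credentials to sign requests, so the sketch passes dummy values.

```python
import os
from typing import Dict


def dynamodb_kwargs(env: Dict[str, str]) -> Dict[str, str]:
    """Build keyword arguments for boto3.resource("dynamodb", **kwargs).

    DYNAMODB_ENDPOINT is a hypothetical variable name used only in this sketch.
    """
    endpoint = env.get("DYNAMODB_ENDPOINT")
    if not endpoint:
        return {}  # no override: boto3 would resolve the real AWS endpoint
    return {
        "endpoint_url": endpoint,
        "region_name": "eu-west-1",  # placeholder region, required by boto3
        "aws_access_key_id": "local",  # dummy credentials for the local container
        "aws_secret_access_key": "local",
    }


def get_table(name: str = "items"):
    """Return a Table handle; boto3 is imported lazily so this file loads without it."""
    import boto3  # third-party: pip install boto3

    return boto3.resource("dynamodb", **dynamodb_kwargs(dict(os.environ))).Table(name)


local = dynamodb_kwargs({"DYNAMODB_ENDPOINT": "http://localhost:8000"})
print(local["endpoint_url"])
```

With the variable unset, the same code would fall through to whatever AWS configuration is on the machine, which is the point at which authentication can no longer be avoided.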
After the first few iterations I shouted ‘why do we even need the Cloud if we’re just messing about?’ At exactly no one.
That’s what I often tell people, anyway - ‘why do you need the cloud?’ - and I still ask this most days. It was four to six weeks until I needed to host this anywhere, so this container served me well.
I then invoked sam local start-api (from the AWS SAM CLI, so also from Amazon) to build out my backend Lambda functions. It uses Docker, and some trickery, so that you can develop and run your Lambda functions locally. No internets were harmed. Rigged up with a local DynamoDB, API Gateway, and Lambdas, I was iterating code and creating API endpoints at a speed that worried me. I almost thought I was mostly done on the beta version. And it was fairly useful.
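To give a flavour of what one of those local endpoints could look like, here is a small sketch of a Lambda handler for a GET /item route, written against the API Gateway proxy event shape. The `name` query parameter, the in-memory `ITEMS` dictionary and the response body are assumptions for illustration, not the real backend.

```python
import json
from typing import Any, Dict, Optional

# In-memory stand-in for the DynamoDB table (illustrative data only).
ITEMS = {"Jack eats a can of worms": {"Price": 10}}


def get_item(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Handle GET /item?name=... using the API Gateway proxy event shape."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name")
    if name not in ITEMS:
        return {"statusCode": 404, "body": json.dumps({"error": "no such item"})}
    body = {"Name": name, **ITEMS[name]}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


response = get_item({"queryStringParameters": {"name": "Jack eats a can of worms"}})
print(response["statusCode"], response["body"])
```

Because the handler is just a function that takes a dictionary and returns one, it can be called directly like this long before anything is deployed.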
This enthusiasm was before I’d run create-react-app, of course.
What happened before that was much more tricky. Looking at the donation website of choice (GivenGain) - with its no-API-for-you rules - I was going to be a bit stuck for non-mocked (i.e. IRL, real) data.
You see, I wanted to give people some kind of feedback loop - you can buy this silly thing and it’ll go into this money pot which will get me closer to reaching the goal! That kind of thing. Dopamine for clicks. I kind of needed to get people to my portal, to get them to put money into the other product, and then come back to my website to see the results. This was why I wanted a fancy Reactive front end - but with no real data I was doing the equivalent of trading fake Bitcoins on a demo market.
I needed a screenscraper. I’ve never needed a screenscraper before, well at least not personally. Writing automated browser tests for years, you scrape a lot of screens but you’re not doing it (really) for the content but more for the bugs - this was a weird stance. I’m coming from Cloud Engineering, needing to deploy QA-tooling, to non-elegantly get very simple data from a very complicated place. Then used as input for a Serverless, Cloud Native website. Basically it was idiocy. But I needed to start somewhere.
Testers, we can talk for ages about browser drivers - WebdriverIO, Puppeteer, Cypress, Capybara. I’ve used some of them very angrily and - do you know what? - they’re all great and massively annoying. And they’re clunky, and vast, and the whole problem is not simple really.
It seems really counterintuitive to render something just to see if it’s what it was supposed to be. It’s also beautiful and fun and totally interesting, especially as things are very rarely what they were supposed to be. Or where. I’m evidently a begrudgingly passionate fan.
But I didn’t need to select a browser driving framework when all I needed was a tiny bit of content from the page and to parse it.
I started writing a Golang binary to grab the figures and print them somewhere. It would compile onto anything and I could have it running in Lambdas, on K8s, on my Raspberry Pi, maybe even on a dumbphone. I was going to be so cool.
I never got that far. GivenGain had really submerged the values in acres of dynamic HTML. I was not going to get it out. I was going to need to render the DOM in something. I was going to get bored. So I pulled buildkite/puppeteer.
I always reach for this container as you can be testing a website without really installing a dependency. Given: you have Docker. When: you want to control Chrome. Then: within minutes I had a test that pulled my figures out. I spun the container constantly on my localhost, putting Items into the local DynamoDB and processing the inputs and outputs of my API with (local) Lambdas. It was really whirring my fan.
This caused a momentary lapse of judgement that lasted a week. How was this going to work in Cloud? I’d said K8s and I meant it so I git cloned cloudposse/terraform-aws-eks-cluster and got going.
Around seven story points later I gave up, after spending multiple nights attempting to find my way around AWS’ first (second?) Kubernetes implementation. I didn’t even know where my container was, let alone if it was spinning, let alone if it was able to access the other services (that didn’t exist yet but I could imagine what would happen if they did).
I’m glad I learned this the hard way, giving up early on in the project and not later.
I eventually just gave the container to AWS Fargate - which just ran it - which just worked. I could tell by viewing the logs - having viewed some logs before in my time.
I could run the scraping container every few minutes, and I could also run it if someone hit an endpoint, I figured. It only weighed 600 MB. I put the Cloud down and went back to my local.
Once I was feeling like I had a viable data source I started picking holes in it.
Say I stored the total donations in a table and this updated - I would know that someone had donated, or at least I could assume so. But then I’d have to do (newTotal - oldTotal) to work out the amount. And what if people donated at the same time? And what if the page I was scraping changed? Argh.
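The hole is easy to see with a few numbers. This is only a sketch of the (newTotal - oldTotal) idea, with amounts in pence and function names invented for the example:

```python
from typing import List, Optional


def inferred_donation(old_total: int, new_total: int) -> Optional[int]:
    """Return the donation implied by two scrapes of the running total, in pence."""
    delta = new_total - old_total
    return delta if delta > 0 else None


def infer_all(totals: List[int]) -> List[int]:
    """Infer donations from a list of consecutive scraped totals."""
    found = []
    for old, new in zip(totals, totals[1:]):
        amount = inferred_donation(old, new)
        if amount is not None:
            found.append(amount)
    return found


# One donation of 500 between two scrapes is recovered correctly...
print(infer_all([1000, 1500]))
# ...but donations of 500 and 1000 landing between the same two scrapes
# are indistinguishable from a single donation of 1500.
print(infer_all([1000, 2500]))
```

The subtraction can only ever report one amount per pair of scrapes, and it carries no donor name at all, which is why a per-donation data source is so much more attractive.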
I realized my implementation sucked. And I’d gone in too quickly. I went back to the drawing board and looked at what I’d got. I donated some money and monitored the situation. Ping.
You have mail.
Again, ex-QA here - I even have a commit in the README.md of putsbox.com - I have received a lot of emails by automation in my time, and they are ripe for checking programmatically. Just try to get a massively dynamic web page into an email if you want to go mad. Emails are inherently simple, usually.
This was the birth of the /mailchecka endpoint. I clumsily pulled an SMTP library into Python and I set up a new Gmail account. A few security checks turned off (not a work account - stop panicking, infosec!) and I had an array with new donations, amount, and donor name in it. And it was stable. I had meaningful data.
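Something in the spirit of that endpoint can be sketched with the standard library alone. Reading an inbox is normally done over IMAP or POP3 rather than SMTP (which is for sending), so this sketch assumes `imaplib`. The notification wording, the regular expression, and the host and credentials passed to `fetch_unseen` are all hypothetical, since the real email text is not shown here.

```python
import imaplib
import re
from decimal import Decimal
from email import message_from_bytes, message_from_string
from email.message import Message
from typing import Dict, Iterator, Optional, Union

# Hypothetical wording of a donation notification.
DONATION_RE = re.compile(r"(?P<donor>.+?) donated GBP (?P<amount>\d+(?:\.\d{2})?)")


def parse_donation(msg: Message) -> Optional[Dict[str, Union[str, int]]]:
    """Pull the donor name and amount (in pence) out of a plain-text email."""
    payload = msg.get_payload()
    if not isinstance(payload, str):
        return None  # multipart messages are not handled in this sketch
    match = DONATION_RE.search(payload)
    if match is None:
        return None
    pence = int(Decimal(match.group("amount")) * 100)
    return {"donor": match.group("donor").strip(), "amount_pence": pence}


def fetch_unseen(host: str, user: str, password: str) -> Iterator[Message]:
    """Yield unread messages from an IMAP inbox (needs a real account to run)."""
    with imaplib.IMAP4_SSL(host) as box:
        box.login(user, password)
        box.select("INBOX")
        _, data = box.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = box.fetch(num, "(RFC822)")
            yield message_from_bytes(parts[0][1])


raw = (
    "From: noreply@example.org\n"
    "Subject: New donation\n"
    "\n"
    "Ada Lovelace donated GBP 10.00 to your campaign.\n"
)
donation = parse_donation(message_from_string(raw))
print(donation)
```

Each email describes exactly one donation, so there is no subtraction to do and no ambiguity when two people give at the same moment.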
I left a curl running to localhost:3000/mailchecka all night and it rarely 500’d.
It was amazing. It was awful and basic, but it worked!
I felt a sudden rush of blood to the head when I looked at this thing. It took mere seconds, it was almost live, it was pretty much a websocket - and it was probably AWS Free Tier.
I downed backend tools and went towards the Front End.
Read the next and final installment: ‘The Front End Diaries’ to see what happened next.