Serverless budget tracking

Posted on
AWS Terraform Python

Solving first-world problems with cloud technologies.

My fiancé and I budget fairly aggressively, as we have a goal of investing several thousand dollars per month.

Most of our fixed monthly costs (rent, gym dues, phone bill, etc.) are enumerated in a static spreadsheet. For our variable expenses - groceries, restaurants, movies, etc. - we impose a weekly limit. To track that progress, we save all our receipts for the week, and I dutifully enter the sums into a spreadsheet (when I remember to). At the end of every week, I clear the totals and start anew.

This process keeps us fairly within budget, but more than once, I’d lose a receipt, or forget to enter it. At best, they were a source of clutter in my pocket, on my desk, and in my mind.

Ultimately, the process was manual, repetitive, and error-prone.

In other words, this process could and should be automated. So, I came up with the following design:

Serverless budget tracking design

Serverless budget tracking design

Yet, one might ask…

But isn’t this a solved problem?

Yes and no.

Yes, there are various financial services that offer similar and more complete functionality.

No, we didn’t want to allow access to our bank account, and didn’t want or need all the proffered features of these services. Furthermore, we were skeptical that they could provide us with the same short-term accountability that comes with our notification system.

We also found other services to occasionally lag in their financial updates, and more than once had to question their data integrity.

What we really wanted the simplest possible service that could work reliably and with a reasonably extensible design, and one that we owned and was custom-tailored to our needs.

Requirements

Based on the process described above, we needed to model

  • the concept of a budget period (weekly, in our case) with a maximum allotment
  • the concept of a team of people who collectively own the budget

We also needed to

  • be able to easily log an expense with a description and a cost
  • notify the team on progress and compliance
  • reset the budget and notify on results when the period has ended

Design considerations

SES

I’m a big believer in choosing the simplest possible tool for the job, as long as it doesn’t make the design inflexible. I debated on writing a mobile-friendly web application exposing a simple form that allowed users to log an expense, but decided on email instead, as it was simpler and already had native support on mobile devices.

I could send an email to budget@yangmillstheory.com to log an entry; the email subject would be the description, and the body would be the cost. With this approach, the sender identity, date and time would also come for free.

I wired up a previously registered domain up to Amazon’s Simple Email Service.

resource "aws_ses_domain_identity" "domain" {
  domain = "${data.terraform_remote_state.route53.primary_zone_name}"
}

resource "aws_route53_record" "text_verify" {
  zone_id = "${data.terraform_remote_state.route53.primary_zone_id}"
  name    = "_amazonses.yangmillstheory.com"
  type    = "TXT"
  ttl     = "1800"

  records = [
    "${aws_ses_domain_identity.domain.verification_token}",
  ]
}

resource "aws_route53_record" "receive_email" {
  zone_id = "${data.terraform_remote_state.route53.primary_zone_id}"
  name    = "${data.terraform_remote_state.route53.primary_zone_name}"
  type    = "MX"
  ttl     = "1800"

  records = [
    "10 inbound-smtp.${var.region}.amazonaws.com",
  ]
}

Now I was ready to send and receive email on the yangmillstheory.com domain.

S3

At this point I had to choose where to store the spreadsheet.

I’d previously used Google Sheets, but using their API to read and update the data involved more work, especially since it required OAuth2.0.

I could have chosen to log entries in database, but the use case wasn’t there:

  • traffic is infrequent and not at all concurrent
  • the data is transient
  • the data doesn’t have to be queried

So, I chose S3 to store the spreadsheet. I’d probably never have to deal with concurrent writes and eventually consistent reads in this this application, so I consider this a win for simplicity of design.

Implementation notes

The Lambdas which write to the log and reset the log are straightforward.

Here’s the main logic of the Lambda that parses an expense log email

def handler(event, *args):
    """Parses all sent emails, aggregates updates, downloads budget file from S3,
    writes updates to downloaded file, commits back to S3, and sends a notification.
    :param event: see http://docs.aws.amazon.com/ses/latest/DeveloperGuide/receiving-email-notifications-contents.html
    This isn't concurrency-safe, but we're expecting extremely little activity.

    """
    logger.info('Received event: {}'.format(event))
    records = [
        record for record in event['Records']
        if _is_clean(record)
    ]
    csv_rows = thread_pool.map(_get_csv_rows, records)
    logger.info('Got csv rows {}'.format(csv_rows))
    _download_csv()
    _update_csv(*csv_rows)
    if not os.getenv('dry_run'):
        _commit_csv()
    period_spend = _get_period_spend()
    _notify_update(csv_rows, period_spend)
    logger.info('Processed event.')

and the main logic of the Lambda that resets the budget and notifies on the summarized results

def handler(*args):
    """Sends a summary for the current budget, then resets the template."""
    logger.info('Resetting budget.')
    _download_csv()
    _notify_period_spend()
    if not os.getenv('dry_run'):
        _reset_csv()
    logger.info('Budget reset.')

The rest of the infrastructure is managed through Terraform. The Terraform code consumes core infrastructure (managed in another git repository) via remote state to hook into the email receiving pipeline:

resource "aws_ses_receipt_rule" "update_budget" {
  name          = "update_budget"
  rule_set_name = "${data.terraform_remote_state.ses.main_receipt_rule_set_name}"
  after         = "${data.terraform_remote_state.ses.last_receipt_rule_name}"

  recipients = [
    "${var.budget_email}",
  ]

  enabled      = true
  scan_enabled = true
  tls_policy   = "Require"

  s3_action {
    bucket_name       = "${data.terraform_remote_state.ses.email_bucket}"
    object_key_prefix = "${var.s3_email_prefix}"
    position          = 1
  }

  lambda_action {
    function_arn    = "${module.email_receiver.lambda_arn}"
    invocation_type = "Event"
    position        = 2
  }

  depends_on = [
    "module.email_receiver",
    "aws_s3_bucket_policy.app_bucket",
  ]
}

There’s a bit more infrastructure than this, particularly around configuring and attaching the various IAM policies and roles, but there’s nothing that’s truly noteworthy in their details.

Conclusion

So far we’ve been using it for a week. Submitting a receipt is quick, notifications, summaries, and resets are automatic, and a lot of manual process and labor was removed from our budget tracking workflow.

Logging an expense Getting an update

Logging an expense & getting notified

It’s empowering to solve your own problems with your own design and implementation!


Update - 12/8/2017

I extended the period reset code to send an email with the budget log for the period to myself and my fiancé when the period ends; the relevant commits are 22cda18 and d532f64.

In a private repository, I created and attached an IAM policy to the Lambda execution role:

data "aws_iam_policy_document" "ses_send_raw_email" {
  statement {
    sid = "1"

    actions = [
      "ses:SendRawEmail",
    ]

    resources = ["arn:aws:ses:us-west-2:<ACCOUNT_ID>:*"]
  }
}

resource "aws_iam_policy" "ses_send_raw_email" {
  name   = "ses-send-raw-email"
  policy = "${data.aws_iam_policy_document.ses_send_raw_email.json}"
}

resource "aws_iam_policy_attachment" "lambda_to_ses" {
  roles = [
    "${aws_iam_role.lambda_basic_execution.name}",
  ]

  name       = "lambda-to-ses"
  policy_arn = "${aws_iam_policy.ses_send_raw_email.arn}"
}

Here’s what the final result looks like in my email client:

Logging an expense

Getting a copy of the budget log