How I built this

Tags: AWS, Terraform, Hugo, Infrastructure as Code

There are several companies that come to mind when it comes to online publishing [1]. All of them share an important commonality: the platform owns and manages the infrastructure (DNS, content delivery, web servers, etc.) and exposes a user interface through which users publish content.

Thus, though some of these may have a free tier, eventually I’d end up paying for the cost of infrastructure management. Depending on one’s needs, skill set, and goals, that may be the best option, but I decided that I wanted to manage the infrastructure and deployment myself, primarily because I

  • wanted to own as much of my own blog as possible
  • saw a straightforward path to doing so
  • had an opportunity to use some of my favorite technologies

This approach is also cheaper than some of the alternatives I might have used.

The main components are:

Component         Role
AWS S3            Host static files
AWS CloudFront    Content delivery
Terraform         Infrastructure management
Hugo              Blogging platform

What is Terraform?

From the product page,

Terraform enables you to safely and predictably create, change, and improve production infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

For example, I previously registered my domain yangmillstheory.com through Route53, Amazon’s DNS service. This is what some of that infrastructure looks like:

# ...

variable "primary_zone_id" {
  default = "Z1XVQJ2173I5AH"
}

variable "primary_zone_name" {
  default = "yangmillstheory.com"
}

# these zone attributes were created when registering yangmillstheory.com, and zones aren't importable
#
# note that registration creates an SOA and an NS record, both of which should not be modified!
output "primary_zone_id" {
  value = "${var.primary_zone_id}"
}

output "primary_zone_name" {
  value = "yangmillstheory.com"
}

resource "aws_route53_record" "soa" {
  zone_id = "${var.primary_zone_id}"
  name    = "${var.primary_zone_name}"
  type    = "SOA"
  ttl     = "900"

  records = [
    "ns-1048.awsdns-03.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400",
  ]
}

resource "aws_route53_record" "nameservers" {
  zone_id = "${var.primary_zone_id}"
  name    = "${var.primary_zone_name}"
  type    = "NS"
  ttl     = "172800"

  records = [
    "ns-1048.awsdns-03.org.",
    "ns-557.awsdns-05.net.",
    "ns-212.awsdns-26.com.",
    "ns-1738.awsdns-25.co.uk.",
  ]
}

So here, I can manage core DNS records purely through code instead of clicking around in the AWS console. This approach has the added benefit of being self-documenting, which is key to maintainability.
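
In practice, making a change to these records is just the usual plan/apply loop. A rough sketch (the directory name is hypothetical):

# from the (hypothetical) directory holding the DNS configuration
cd route53

terraform init     # fetch the AWS provider and initialize state storage
terraform plan     # preview the changes before touching any infrastructure
terraform apply    # make the changes once the plan looks right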

Notice the two output directives. These instruct Terraform to expose two values - primary_zone_id and primary_zone_name - to external consumers; by default, everything else in this configuration stays hidden. This encourages loose coupling between the core DNS infrastructure and the applications that consume it.
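
Here’s a minimal sketch of how a consuming configuration can read those outputs through a terraform_remote_state data source, assuming the DNS state is kept in an S3 backend (the bucket, key, and region below are placeholders):

data "terraform_remote_state" "route53" {
  backend = "s3"

  config {
    # placeholder bucket/key; wherever the core DNS state actually lives
    bucket = "my-terraform-state"
    key    = "route53/terraform.tfstate"
    region = "us-west-2"
  }
}

# the exposed outputs are then available as attributes, e.g.
#   ${data.terraform_remote_state.route53.primary_zone_id}
#   ${data.terraform_remote_state.route53.primary_zone_name}

This is how the CloudFront and DNS records further down refer to the zone without hard-coding its ID.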

S3

S3 (Simple Storage Service) is Amazon’s highly scalable and available cloud storage. I use it to store files that don’t have a lot of concurrent writes, and for which an eventually consistent read is tolerable.

I deploy the static files comprising my website to a publicly readable S3 bucket configured for website hosting. Note that the bucket policy is necessary because CloudFront needs to read objects from the bucket; a public-read ACL isn’t enough, since the ACL applies to the bucket itself rather than to the objects in it.

data "aws_iam_policy_document" "blog" {
  statement {
    sid       = "1"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.bucket_name}/*"]
    effect    = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }
  }
}

resource "aws_s3_bucket" "website" {
  bucket = "${var.bucket_name}"
  policy = "${data.aws_iam_policy_document.blog.json}"
  acl    = "public-read"

  website {
    index_document = "index.html"
  }
}

At this point I could have created a Route53 A record pointing to the S3 bucket. However, traffic to the blog would have been unencrypted HTTP, and I wanted HTTPS from the outset. That’s where CloudFront comes in.

CloudFront

CloudFront is Amazon’s global CDN: essentially a geo-distributed cache of the files served by an origin server. I use CloudFront because it offers

  • HTTPS support
  • low-latency geo-distributed content delivery (read: fast delivery)

There was one manual step here that was done outside of Terraform: requesting an SSL certificate in the us-east-1 region via AWS Certificate Manager (ACM), the region CloudFront requires certificates to come from. Roughly, the steps were (a CLI sketch of the request follows the list):

  • request an SSL certificate for *.yangmillstheory.com in us-east-1
  • set up email at admin@yangmillstheory.com and administrator@yangmillstheory.com to receive the approval request [2]
  • receive the email and approve the request, after which ACM issued the certificate
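
For reference, a rough AWS CLI equivalent of that request would look like this (a sketch, not necessarily the exact invocation I used):

# request an email-validated wildcard certificate in us-east-1,
# the region CloudFront reads ACM certificates from
aws acm request-certificate \
  --domain-name '*.yangmillstheory.com' \
  --validation-method EMAIL \
  --region us-east-1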

Once I had the certificate, I was able to use it in Terraform to create a CloudFront web distribution:

data "aws_acm_certificate" "primary" {
  domain   = "*.yangmillstheory.com"
  provider = "aws.us-east-1"
  statuses = ["ISSUED"]
}

resource "aws_cloudfront_distribution" "blog" {
  origin {
    domain_name = "${aws_s3_bucket.website.website_endpoint}"
    origin_id   = "${var.origin_id}"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }

  enabled             = true
  is_ipv6_enabled     = false
  default_root_object = "index.html"

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "${var.origin_id}"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "allow-all"
    min_ttl                = "${var.min_ttl}"
    max_ttl                = "${var.max_ttl}"
    default_ttl            = "${var.default_ttl}"
  }

  aliases = ["${var.blog_dns_name}.${data.terraform_remote_state.route53.primary_zone_name}"]

  price_class = "PriceClass_100"

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = "${data.aws_acm_certificate.primary.arn}"
    minimum_protocol_version = "TLSv1"
    ssl_support_method       = "sni-only"
  }
}

Note that the CloudFront origin is configured with a custom_origin_config and not an s3_origin_config, and that the origin domain_name uses the bucket’s website_endpoint rather than its bucket_domain_name. This is necessary because of the way Hugo structures its output: posts live at my-bucket/my-post/index.html rather than my-bucket/my-post.html, and only the S3 website endpoint resolves a request for /my-post/ to the index.html beneath it [3].

Finally, I create a DNS entry blog.yangmillstheory.com which points to the CloudFront distribution.

resource "aws_route53_record" "blog" {
  zone_id = "${data.terraform_remote_state.route53.primary_zone_id}"
  name    = "${var.blog_dns_name}.${data.terraform_remote_state.route53.primary_zone_name}"
  type    = "A"

  alias {
    name                   = "${aws_cloudfront_distribution.blog.domain_name}"
    zone_id                = "${aws_cloudfront_distribution.blog.hosted_zone_id}"
    evaluate_target_health = false
  }
}

Note that this is an example of a separately managed application remotely consuming state (core DNS) managed elsewhere. I’ll write about another example in a separate post.

Hugo

Hugo is a static site generator written in Go. It builds sites quickly (as you’d expect from Go), has an active open source community, and live-reloads your development server as you edit your files.

Getting started here was simply a matter of choosing a theme and reading some documentation.
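
Concretely, the bootstrap looks roughly like this (the site and theme names are placeholders, and the theme URL is hypothetical):

# scaffold a new site and pull in a theme
hugo new site blog
git clone https://github.com/example/my-theme blog/themes/my-theme

# write a first post and preview it with live reload
hugo new post/hello-world.md -s blog
hugo server -t my-theme -s blog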

Once I wrote my first post and built the site, I needed a repeatable way to deploy the generated files to S3. Fortunately, this was trivial:

#!/bin/bash

# regenerate the site from scratch
rm -rf blog/public
hugo -t my-theme -s blog

S3_BUCKET=s3://my-bucket
SITE_DIR=blog/public
# export so the aws CLI actually picks up the profile
export AWS_PROFILE=my-profile

# show what would be uploaded before asking for confirmation
echo 'This will deploy the following:'
aws s3 cp --dryrun --recursive "$SITE_DIR" "$S3_BUCKET"

read -p "Are you sure? [Y/y] " -n 1 -r
echo    # move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]; then
  aws s3 cp --recursive "$SITE_DIR" "$S3_BUCKET"
else
  echo 'Not deploying.'
fi

Yes, it’s scrappy and brutal. But it’s a conscious decision; I don’t see the need to do more with the current amount of content I’m uploading [4].
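
As footnote [4] notes, once the payload grows, swapping the copy for a sync uploads only the files that changed since the last deploy; using the same variables as the script above:

# transfer only new or changed files instead of re-copying everything
aws s3 sync "$SITE_DIR" "$S3_BUCKET"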

Conclusion

So that’s roughly how I built this blog using AWS, Terraform, and Hugo.

This approach is a good fit for someone who wants to manage their own infrastructure, get up and running quickly with best-in-class tooling, and doesn’t need a lot of custom features at the outset.


  1. I’m no expert in any of these; they’re just “household names” in the blogging world.
  2. I did this via SES, Lambda, S3, and yes, Terraform.
  3. https://forums.aws.amazon.com/message.jspa?messageID=314454
  4. Once you start transferring payloads larger than 1 MB on a slow access link (~100 KiB/s, which is my current sad state of affairs), it pays to use aws s3 sync instead of aws s3 cp --recursive. This decreases the deploy time from minutes to seconds.