Don’t Copy Data! Instead, Share it at Web-Scale

Mark Korver, Amazon
Thursday 10:00 - 11:00
Session 1, Track 0, Slot 1

Since its launch in 2006, Amazon Web Services has grown to more than 40 services. S3, our object store and one of our first services, now holds trillions of objects and regularly peaks at 1.5 million requests per second. It is used to store many types of data, including map tiles, genome data, video, and database backups. The primary goal of this presentation is to illustrate best practices for open data sets on AWS. To do so, it showcases a simple map tiling architecture built from just a few of those services, CloudFront (CDN), S3 (object store), and Elastic Beanstalk (application management), in combination with the FOSS tools Leaflet, MapServer/GDAL, and Yas3fs. My demo will use USDA’s NAIP dataset (48 TB), plus other higher-resolution city-level data, and show how you can deliver images derived from over 219,000 GeoTIFFs to both TMS and OGC WMS clients for the 48 states without pre-caching tiles, while keeping your server environment appropriately sized through auto-scaling. Because the NAIP data sits in a requester-pays bucket that allows authenticated read access, anyone with an AWS account has immediate access to the source GeoTIFFs and can copy the data in bulk to wherever they like. However, I will show that the cloud’s pay-for-use model allows for open-data architectures that are not possible in on-premises environments, and that for certain kinds of data, especially big data, it makes more sense to use the data in situ, in an environment that can support demanding SLAs, than to move it.
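
Serving TMS clients without pre-cached tiles means each tile address has to be turned into a map extent that the server renders on demand. The sketch below is one plausible way to do that translation, not the code used in the demo: it assumes 256-pixel tiles in Web Mercator (EPSG:3857) and WMS 1.1.1, and the endpoint URL and layer name in the usage note are hypothetical placeholders.

```python
import math
from urllib.parse import urlencode

# Half the width of the Web Mercator (EPSG:3857) world, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def tms_tile_bounds(z, x, y):
    """Return (minx, miny, maxx, maxy) in EPSG:3857 for a TMS tile.

    TMS counts rows from the south, so y = 0 is the bottom row.
    """
    tile_size = 2 * ORIGIN_SHIFT / (2 ** z)
    minx = -ORIGIN_SHIFT + x * tile_size
    miny = -ORIGIN_SHIFT + y * tile_size
    return (minx, miny, minx + tile_size, miny + tile_size)


def wms_getmap_url(base_url, layer, z, x, y, size=256):
    """Build a WMS 1.1.1 GetMap URL covering one TMS tile.

    base_url and layer are supplied by the caller; nothing here is
    specific to any particular server.
    """
    bbox = tms_tile_bounds(z, x, y)
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:3857",
        "BBOX": ",".join(f"{v:.3f}" for v in bbox),
        "WIDTH": size,
        "HEIGHT": size,
        "FORMAT": "image/jpeg",
    }
    return base_url + "?" + urlencode(params)
```

For example, `wms_getmap_url("https://example.com/wms", "naip", 12, 655, 2500)` yields a GetMap request for a single tile at zoom 12; both arguments are made up for illustration. Because the extent is computed per request, a CDN in front of the server can cache the rendered result while the source imagery stays where it is.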