AWS S3 Cheatsheet


Are you planning to get an AWS certification? Hopefully the summary of S3 below is useful. S3 is object storage; it is not a file system, and you cannot run an OS or application directly from S3.

  • You can store any kind of data in S3.
  • S3 namespace is unique (just like the domain name). For example, https://s3.amazonaws.com/test-bitarray.io/aws-s3.png, where “test-bitarray.io” is a bucket name and it has to be unique globally.
  • The size of an object can range from 0 bytes to 5 TB. The maximum size that can be uploaded in a single PUT is 5 GB. Objects larger than 100 MB should be uploaded using the multipart upload API.
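A minimal sketch of planning a multipart upload: the part size and limits below reflect S3's documented constraints (parts of at least 5 MiB except the last, at most 10,000 parts per upload); the 100 MiB default part size is an illustrative choice.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB     # every part except the last must be >= 5 MiB
MAX_PARTS = 10_000          # a single multipart upload allows at most 10,000 parts

def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list[int]:
    """Return the size of each part for a multipart upload of object_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    full, last = divmod(object_size, part_size)
    sizes = [part_size] * full + ([last] if last else [])
    if len(sizes) > MAX_PARTS:
        raise ValueError("object needs a larger part size")
    return sizes

# A 250 MiB object split into 100 MiB parts -> two full parts plus a 50 MiB tail.
print(plan_parts(250 * MIB))
```

Each planned part would then be sent with `UploadPart` and stitched together with `CompleteMultipartUpload`.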
  • You will receive an HTTP 200 code on successful upload of an object.
  • Once a bucket is created its name cannot be changed.
  • Once a bucket is deleted its name is available again.
  • S3 object tags are key-value pairs applied to S3 objects, and they can be changed at any time during the lifetime of the object. You can apply Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, etc., based on these tags. Tags can be used to transition objects between storage classes and expire them in the background, so storage classes and lifecycle policies can be applied at the object level.
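A sketch of applying tags to an existing object. The tag keys and values here are illustrative; note that `put_object_tagging` replaces the entire tag set, so include every tag you want to keep.

```python
def build_tag_set(tags: dict) -> dict:
    """Convert a plain dict into the TagSet structure the S3 tagging API expects."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

tagging = build_tag_set({"project": "demo", "retention": "90d"})  # illustrative tags
print(tagging)

# With boto3 (assumed installed), the tag set is applied like this:
# import boto3
# s3 = boto3.client("s3")
# s3.put_object_tagging(Bucket="test-bitarray.io", Key="aws-s3.png", Tagging=tagging)
```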
  • By default, there is NO encryption. You can encrypt on the client side (upload already-encrypted objects) or use server-side encryption on the AWS side, via one of the following three methods:
      • SSE-S3 provides an integrated solution where Amazon handles key management.
      • SSE-C enables you to leverage Amazon S3 to perform the encryption and decryption of your objects while retaining control of the keys used to encrypt objects.
      • SSE-KMS enables you to use AWS Key Management Service (AWS KMS) to manage your encryption keys.
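The three modes map to different request headers on a PUT. A sketch using the raw S3 REST header names (the header names are S3's; the KMS key ID and the SSE-C key are placeholders you would supply yourself):

```python
import base64
import hashlib
import os

def sse_s3_headers() -> dict:
    # SSE-S3: Amazon manages the keys entirely.
    return {"x-amz-server-side-encryption": "AES256"}

def sse_kms_headers(kms_key_id: str) -> dict:
    # SSE-KMS: keys live in AWS KMS; the key ID is a placeholder here.
    return {"x-amz-server-side-encryption": "aws:kms",
            "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id}

def sse_c_headers(key: bytes) -> dict:
    # SSE-C: you supply a 256-bit key with every request; S3 encrypts or
    # decrypts with it and then discards it.
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }

print(sorted(sse_c_headers(os.urandom(32))))
```

SDKs such as boto3 expose the same choices as request parameters (e.g. `ServerSideEncryption="AES256"`) rather than raw headers.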
  • S3 storage classes (all provide 99.999999999% durability, i.e. eleven 9s):
      • S3 Standard for general-purpose storage of frequently accessed data;  99.99% availability
      • S3 Intelligent-Tiering for data with unknown or changing access patterns; 99.9% availability
      • S3 Standard-Infrequent Access (S3 Standard-IA) for long-lived, but less frequently accessed data; 99.9% availability
      • S3 One Zone-Infrequent Access (S3 One Zone-IA) for long-lived, but less frequently accessed data; 99.5% availability
      • Amazon S3 Glacier (S3 Glacier) for archives. Retrieval might take 1 minute to 12 hours.
      • Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive) for long-term archive and digital preservation. Retrieval might take 12-48 hours or more. [S3 Glacier Deep Archive is designed for long-lived but rarely accessed data that is retained for 7-10 years or more. Objects archived to S3 Glacier Deep Archive have a minimum storage duration of 180 days; objects deleted before 180 days incur a pro-rated charge equal to the storage charge for the remaining days.]
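The storage classes above are typically combined through a lifecycle policy. A minimal sketch of a lifecycle configuration (the shape matches what boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, rule ID, and day counts are illustrative assumptions):

```python
lifecycle = {
    "Rules": [{
        "ID": "archive-logs",                # hypothetical rule name
        "Filter": {"Prefix": "logs/"},       # applies only to this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30,  "StorageClass": "STANDARD_IA"},   # infrequent access
            {"Days": 90,  "StorageClass": "GLACIER"},       # archive
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # deep archive
        ],
        "Expiration": {"Days": 3650},        # delete after roughly 10 years
    }]
}

# With boto3 (assumed installed):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
print([t["StorageClass"] for t in lifecycle["Rules"][0]["Transitions"]])
```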
  • Amazon S3 Standard, S3 Standard-Infrequent Access, and S3 Glacier storage classes replicate data across a minimum of three AZs to protect against the loss of one entire AZ. This remains true in Regions where fewer than three AZs are publicly available. Objects stored in these storage classes are available for access from all of the AZs in an AWS Region. The Amazon S3 One Zone-IA storage class replicates data within a single AZ. Data stored in this storage class is susceptible to loss in an AZ destruction event.
  • S3 Standard-IA is designed for larger objects and has a minimum billable object size of 128 KB. Objects smaller than 128 KB incur storage charges as if they were 128 KB.
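The 128 KB minimum amounts to a simple floor on the billed size, which can be sketched as:

```python
KIB = 1024

def billed_size_standard_ia(object_size: int) -> int:
    """Bytes of storage S3 Standard-IA charges for a single object."""
    return max(object_size, 128 * KIB)

print(billed_size_standard_ia(10 * KIB))    # a 10 KB object is charged as 128 KB
print(billed_size_standard_ia(1024 * KIB))  # a 1 MB object is charged as-is
```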
  • Cost of S3 Bucket depends on the region (where you create it).
  • S3 can be configured to send an event notification to an SQS queue, an SNS topic, or a Lambda function.
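A sketch of the notification configuration shape used by boto3's `put_bucket_notification_configuration` (boto3 assumed installed; the ARNs are placeholders): one Lambda function fires on uploads and one SQS queue on deletes.

```python
notification = {
    "LambdaFunctionConfigurations": [{
        # Placeholder ARN; any object creation (PUT, POST, COPY, multipart) triggers it.
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-upload",
        "Events": ["s3:ObjectCreated:*"],
    }],
    "QueueConfigurations": [{
        # Placeholder ARN; fires whenever an object is removed.
        "QueueArn": "arn:aws:sqs:us-east-1:111122223333:deleted-objects",
        "Events": ["s3:ObjectRemoved:*"],
    }],
}

# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket", NotificationConfiguration=notification)
print(sorted(notification))
```

The target Lambda, queue, or topic must grant S3 permission to invoke or publish to it before the configuration is accepted.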
  • To protect against accidental deletion, you can enable versioning and MFA Delete (multi-factor authentication; MFA Delete requires versioning to be enabled).
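A minimal sketch of the parameters boto3's `put_bucket_versioning` takes (boto3 assumed installed; the bucket name and MFA serial/token are placeholders). Note that MFA Delete can only be changed using the bucket owner's root credentials.

```python
versioning_params = {
    "Bucket": "my-bucket",                    # placeholder bucket name
    "VersioningConfiguration": {
        "Status": "Enabled",                  # turn versioning on
        "MFADelete": "Enabled",               # also require MFA for deletes
    },
    # Required when toggling MFADelete; format is "<device-serial> <token-code>":
    # "MFA": "arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
}

# import boto3
# boto3.client("s3").put_bucket_versioning(**versioning_params)
print(sorted(versioning_params["VersioningConfiguration"]))
```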
  • Billing scenarios
      • Yes, you will be charged if you access the bucket from the management console, from another AWS account, etc.
      • There is no Data Transfer charge for data transferred within an Amazon S3 Region via a COPY request (there is a cost to copy between Regions).
      • There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same region.
      • There is a cost for the amount of space you use (including all versions, etc.), for data transfer in, for data transfer out, and for data requests (PUT, GET, DELETE).
      • You pay the Amazon S3 charges for storage (in the S3 storage class you select), COPY or PUT requests, and inter-region Data Transfer OUT From Amazon S3 for the replicated copy of data. COPY requests and inter-Region data transfer are charged based on the source Region. Storage for replicated data is charged based on the target Region.
  • Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket. S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, data is routed to your Amazon S3 bucket over an optimized network path.
  • You can enable Transfer Acceleration using the console, CLI, API, etc.; to use it, point your Amazon S3 PUT and GET requests at the bucket's s3-accelerate endpoint domain name.
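The accelerate endpoint is just a different domain for the same bucket. A sketch (the bucket and key are placeholders; note that Transfer Acceleration requires a DNS-compliant bucket name without periods):

```python
def accelerate_url(bucket: str, key: str) -> str:
    """Build the Transfer Acceleration URL for an object."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{key}"

print(accelerate_url("my-bucket", "aws-s3.png"))

# With boto3 (assumed installed), the SDK can route all requests through
# the accelerate endpoint via a client config flag:
# import boto3
# from botocore.config import Config
# s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```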
  • S3 Select is an Amazon S3 feature that makes it easy to retrieve specific data from the contents of an object using simple SQL expressions without having to retrieve the entire object.
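A sketch of an S3 Select request (the parameter shape matches boto3's `select_object_content`; the bucket, key, and column names are illustrative assumptions):

```python
select_params = {
    "Bucket": "my-bucket",                    # placeholder bucket
    "Key": "data/users.csv",                  # placeholder CSV object
    "ExpressionType": "SQL",
    # Pull two columns, filtered, without downloading the whole object:
    "Expression": "SELECT s.name, s.email FROM s3object s WHERE s.city = 'Oslo'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # first row = headers
    "OutputSerialization": {"CSV": {}},
}

# With boto3 (assumed installed); the response is an event stream of Records payloads:
# import boto3
# resp = boto3.client("s3").select_object_content(**select_params)
print(select_params["Expression"].split()[0])
```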
  • With S3 Batch Operations, you can make changes to object metadata and properties, or perform other management tasks, such as copying objects to other buckets, replacing object tag sets, modifying access controls, and restoring archived objects from S3 Glacier.
  • Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL queries.
  • Amazon Redshift Spectrum is a feature of Amazon Redshift that enables you to run queries against exabytes of unstructured data in Amazon S3 with no loading or ETL required.
  • Amazon Macie is an AI-powered security service that helps you prevent data loss by automatically discovering, classifying, and protecting sensitive data stored in Amazon S3.
  • The S3 Inventory report provides a scheduled alternative to Amazon S3’s synchronous List API. You can configure S3 Inventory to provide a CSV, ORC, or Parquet file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or prefix.

Reference: https://aws.amazon.com/s3/faqs/
