AWS CloudFront encodes an S3 URL multiple times and returns 403

Still trying to cut costs, I planned to use AWS CloudFront. Here is the painful story.

The original file name on S3 is: Battle Of The Saints I.apk

The downloadable URL on S3:
https://s3-ap-southeast-1.amazonaws.com/sudops.com/Battle+Of+The+Saints+I.apk
S3 automatically converted each space " " in the URL into a plus sign "+". Nothing wrong so far; the URL above downloads fine.
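The two ways of writing a space in a URL ("+" and "%20") can be reproduced with Python's standard library. This is only an illustrative sketch; the variable names are my own and it makes no claim about how S3 generates its links internally.

```python
from urllib.parse import quote, quote_plus

# The original object name, containing literal spaces.
name = "Battle Of The Saints I.apk"

# Form-style encoding: spaces become "+".
plus_form = quote_plus(name)

# Percent-encoding: spaces become "%20".
percent_form = quote(name)

print(plus_form)     # Battle+Of+The+Saints+I.apk
print(percent_form)  # Battle%20Of%20The%20Saints%20I.apk
```

Both forms are meant to refer to the same name, but anything that encodes either string a second time will turn "%" into "%25" and produce a different key.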

With CloudFront in front, the expected CloudFront download URL is:
http://sudops.com/Battle%20Of%20The%20Saints%20I.apk

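This works as well, and the file downloads.
However, the S3 and CloudFront logs show many 403 errors. The URL actually being requested had changed and picked up a lot of extra characters. Could different browsers be causing this?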

S3 log:
[10/Jun/2014:10:09:22 +0000] 54.239.196.63  14E072F930D3F2CD REST.GET.OBJECT Battle%25252520Of%25252520The%25252520Saints%25252520I.apk "GET Battle%252520Of%252520The%252520Saints%252520I.apk HTTP/1.1" 403 AccessDenied 231 - 15 - "-" "Amazon CloudFront" -

CloudFront log:
[10/Jun/2014:08:03:14 +0000] 216.137.54.149  A1709466D371E8D3 REST.GET.OBJECT Battle%252520Of%252520The%252520Saints%252520I.apk "GET Battle%2520Of%2520The%2520Saints%2520I.apk HTTP/1.1" 403 AccessDenied 231 - 16 - "-" "Amazon CloudFront" -
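To compare the two log entries, it helps to count how many decode passes each string needs before it stops changing. The helper below is a hypothetical sketch (the name encoding_depth is mine), applied to the key and request fields copied from the logs above.

```python
from urllib.parse import unquote

def encoding_depth(text):
    """Count how many times text can be URL-decoded before it stops changing."""
    depth = 0
    while True:
        decoded = unquote(text)
        if decoded == text:
            return depth
        text = decoded
        depth += 1

# Key and request-URI fields taken from the two log lines above.
s3_key = "Battle%25252520Of%25252520The%25252520Saints%25252520I.apk"
s3_request = "Battle%252520Of%252520The%252520Saints%252520I.apk"
cf_key = "Battle%252520Of%252520The%252520Saints%252520I.apk"
cf_request = "Battle%2520Of%2520The%2520Saints%2520I.apk"

for label, value in [("S3 key", s3_key), ("S3 request", s3_request),
                     ("CF key", cf_key), ("CF request", cf_request)]:
    print(label, encoding_depth(value))
```

Within each entry, the key field carries one more layer of encoding than the request line.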

Good grief: a perfectly fine URL had been encoded four times, turning a single space " " into "%25252520". No wonder the request fails with 403.
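The way a space grows into "%25252520" can be reproduced by percent-encoding it repeatedly: each pass rewrites every "%" as "%25". A minimal sketch, with encode_times as a name I made up for illustration:

```python
from urllib.parse import quote

def encode_times(text, times):
    """Percent-encode text the given number of times in a row."""
    for _ in range(times):
        # safe="" so that nothing except unreserved characters is left alone.
        text = quote(text, safe="")
    return text

for n in range(1, 5):
    print(n, encode_times(" ", n))
# 1 %20
# 2 %2520
# 3 %252520
# 4 %25252520
```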

http://sudops.com/Battle%25252520Of%252520The%25252520Saints%25252520I.apk

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>86905FA0B9C543E9</RequestId>
<HostId>
qAfOvgYqKNl+33vzVykSSmWoBkRBOjpe06YssRMrw3h+9be4U+0lYOvMRseg4+XT
</HostId>
</Error>

According to a post on the AWS forum, the URL has to be encoded twice beforehand, after which it can be accessed normally:

If your ecommerce platform is deliberately breaking the URL encoding required for transmission of prohibited characters in a URL, you will need to double-URL encode the filenames first, so the ecommerce "solution" decodes them to read 41%2BPYwYkt1L.jpg after it's done its single decoding. 

See: https://forums.aws.amazon.com/message.jspa?messageID=277276

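But how do I explain the URL being encoded four times once AWS CloudFront is involved? Does CloudFront have multiple cache tiers, with data passing between regions picking up another round of encoding each time? For example, with S3 in Singapore, does the Singapore CloudFront process the URL once and a US-region CloudFront process it again? That makes no sense!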

So the real fix is to keep the object names on S3 URL-safe and avoid unnecessary encoding and decoding.
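One way to do that is to rename files before uploading so that the key contains only characters that never need percent-encoding. The sketch below is an assumption on my part, not an AWS recommendation: normalize_key is a hypothetical helper, and replacing unsafe characters with underscores is just one possible convention.

```python
import re
from urllib.parse import quote

def normalize_key(name):
    """Replace every run of characters outside A-Z, a-z, 0-9, '.', '_', '-', '/' with '_'."""
    return re.sub(r"[^A-Za-z0-9._/-]+", "_", name)

key = normalize_key("Battle Of The Saints I.apk")
print(key)  # Battle_Of_The_Saints_I.apk

# Encoding a normalized key is a no-op, so repeated encoding cannot change it.
print(quote(key) == key)  # True
```

With a key like this, it no longer matters how many times something along the path encodes the URL.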

Bookmarking this handy URL encoder/decoder: http://meyerweb.com/eric/tools/dencoder/