How should spaces and plus signs in URLs be encoded?


This article is reprinted from the WeChat public account "Gopher Guide"; the author is New World Grocery Store. Please contact the Gopher Guide public account before reprinting this article.

It is widely agreed that a URL cannot contain a literal space. However, the standards do not agree on how a space should be represented, and this has led to different implementations in different languages.

RFC 2396 clearly states that spaces should be encoded as %20.

However, the W3C standard states that spaces can be replaced with + or %20.

Lao Xu was confused on the spot. If a space is replaced by +, then + itself has to be encoded. In that case, why not simply encode the space directly? Of course, this is just Lao Xu's doubt. We can no longer trace the historical background, and we cannot change the facts. What we do have to deal with now is whether a space is replaced by + or by %20, and whether + itself needs to be encoded.

Three common URL encoding methods in Go

As a Gopher, the first thing we focus on is the implementation of the Go language itself, so let's first understand the similarities and differences between the three commonly used URL encoding methods in Go.

url.QueryEscape

  1. fmt.Println(url.QueryEscape( " +Gopher points to north" ))
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

When using url.QueryEscape encoding, spaces are encoded as +, and + itself is encoded as %2B.

url.PathEscape

  1. fmt.Println(url.PathEscape( " +Gopher points to north" ))
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

When using url.PathEscape encoding, spaces are encoded as 20%, while + is not encoded.

url.Values

  1. var query = url.Values ​​{ }
  2. query.Set ( "hygz" , " +Gopher points to north" )
  3. fmt.Println(query.Encode())
  4. // Output: hygz=+%2BGopher%E6%8C%87%E5%8C%97

When using the (Values).Encode method to encode, the space is encoded as +, and + itself is encoded as %2B. Further checking the source code of the (Values).Encode method shows that it still calls the url.QueryEscape function internally. The difference between the (Values).Encode method and url.QueryEscape is that the former only encodes the key and value in the query, while the latter encodes both = and &.
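The difference in how the separators are treated can be seen in a minimal sketch. The helper names escapeWhole and encodePairs, and the keys a and b, are made up for illustration and are not part of net/url.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeWhole (hypothetical helper) escapes an already assembled query
// string in a single call, so the separators "=" and "&" are
// percent-encoded as well.
func escapeWhole(rawQuery string) string {
	return url.QueryEscape(rawQuery)
}

// encodePairs (hypothetical helper) builds the query from key/value pairs.
// Only the keys and values are escaped; the separators stay literal.
func encodePairs(a, b string) string {
	query := url.Values{}
	query.Set("a", a)
	query.Set("b", b)
	return query.Encode() // keys are emitted in sorted order
}

func main() {
	fmt.Println(escapeWhole("a=1&b=2"))    // a%3D1%26b%3D2
	fmt.Println(encodePairs("1", "2"))     // a=1&b=2
	fmt.Println(encodePairs("x&y", "1=1")) // a=x%26y&b=1%3D1
}
```

In other words, url.QueryEscape is meant for a single key or value, while (Values).Encode assembles a whole query string from several of them.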

For us developers, which of these three encoding methods should we use? Please continue reading; I believe you will find the answer in the sections below.

Implementation in different languages

Since Go encodes spaces and + differently depending on the function used, do other languages show the same inconsistency? Let's take PHP and JS as examples.

URL encoding in PHP

urlencode

  1. echo urlencode( ' +Gopher points to north' );
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

rawurlencode

  1. echo rawurlencode( " +Gopher points to north" );
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

PHP's urlencode and Go's url.QueryEscape functions have the same effect, but rawurlencode encodes both spaces and +.

URL encoding in JS

encodeURI

  1. encodeURI( '+Gopher pointer' )
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

encodeURIComponent

  1. encodeURIComponent( ' +Gopher points to north' )
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

JS's encodeURI and Go's url.PathEscape functions have the same effect, but encodeURIComponent encodes both spaces and +.

What should we do?

It is more recommended to use the url.PathEscape function encoding

In the previous article, we have summarized the encoding operations of +Gopher pointer in Go, PHP and JS. The following is a two-dimensional table to summarize whether the corresponding decoding operations are feasible.

Encoding \ Decoding   url.QueryUnescape   url.PathUnescape   urldecode   rawurldecode   decodeURI   decodeURIComponent
url.QueryEscape       Y                   N                  Y           N              N           N
url.PathEscape        N                   Y                  N           YY             Y           YY
urlencode             Y                   N                  Y           N              N           N
rawurlencode          Y                   YY                 Y           Y              N           Y
encodeURI             N                   Y                  N           Y              Y           Y
encodeURIComponent    Y                   YY                 Y           Y              N           Y

In the above table, YY and Y have the same meaning. Lao Xu only uses YY to indicate that url.PathEscape is recommended for encoding in Go, while rawurldecode and decodeURIComponent are recommended for decoding in PHP and JS, respectively.

In actual development, Gophers will inevitably need to decode URLs as well. When that happens, it is necessary to check with whoever encoded the URL to find out which decoding method is appropriate.
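Why the decoder has to match the encoder can be shown with a short sketch that runs both Go decoders over both Go encodings. The helper name decodeBoth is made up for illustration, and the shortened sample string " +Gopher" is used only to keep the output easy to read.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeBoth (hypothetical helper) decodes s with both Go decoders so
// the two results can be compared side by side.
func decodeBoth(s string) (asQuery, asPath string, err error) {
	if asQuery, err = url.QueryUnescape(s); err != nil {
		return "", "", err
	}
	if asPath, err = url.PathUnescape(s); err != nil {
		return "", "", err
	}
	return asQuery, asPath, nil
}

func main() {
	const original = " +Gopher"
	encodings := []string{url.QueryEscape(original), url.PathEscape(original)}
	for _, encoded := range encodings {
		q, p, err := decodeBoth(encoded)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%q: QueryUnescape=%q PathUnescape=%q\n", encoded, q, p)
	}
	// "+%2BGopher": QueryUnescape=" +Gopher" PathUnescape="++Gopher"
	// "%20+Gopher": QueryUnescape="  Gopher" PathUnescape=" +Gopher"
}
```

Neither mismatch returns an error; the decoder simply hands back a different string, which is why the problem is easy to miss.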

Encode the value

Is there a universal approach that avoids URL encoding and decoding altogether? There certainly is! Take base32 encoding as an example. Its character set consists of the letters A-Z and the digits 2-7, so once a value has been base32 encoded (with the optional = padding omitted), it can be placed in a URL without any further URL encoding.

Finally, I sincerely hope that this article can be of some help to all readers.

The examples in this article were run with PHP 7.3.29, Go 1.16.6, and the JS console of Chrome 94.0.4606.71.

References

https://www.rfc-editor.org/rfc/rfc2396.txt

https://www.w3schools.com/tags/ref_urlencode.ASP
