How should spaces and plus signs in URLs be encoded?

This article is reprinted from the WeChat public account "Gopher Guide" and was written by New World Grocery Store. To reprint it, please contact the Gopher Guide public account.

It is widely agreed that a URL cannot contain a literal space. However, different standards disagree on how a space should be represented, and as a result different languages implement it differently.

RFC 2396 clearly states that a space must be escaped, that is, encoded as %20.

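However, W3Schools' URL encoding reference says that a space is normally replaced with either + or %20. The + form comes from the application/x-www-form-urlencoded format used by HTML forms.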

This left Lao Xu puzzled on the spot: if a space can be replaced by +, then a literal + has to be encoded itself. So why not simply encode the space directly? Of course, that is only Lao Xu's own doubt. We can no longer trace the historical background, and we cannot change the facts. What we do have to deal with now is whether a space becomes + or %20, and whether + needs to be encoded.

Three common URL encoding methods in Go

As a Gopher, the first thing we focus on is the implementation of the Go language itself, so let's first understand the similarities and differences between the three commonly used URL encoding methods in Go.

url.QueryEscape

  1. fmt.Println(url.QueryEscape( " +Gopher points to north" ))
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

When using url.QueryEscape encoding, spaces are encoded as +, and + itself is encoded as %2B.

url.PathEscape

  1. fmt.Println(url.PathEscape( " +Gopher points to north" ))
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

When using url.PathEscape, spaces are encoded as %20, while + is left unencoded.

url.Values

  1. var query = url.Values ​​{ }
  2. query.Set ( "hygz" , " +Gopher points to north" )
  3. fmt.Println(query.Encode())
  4. // Output: hygz=+%2BGopher%E6%8C%87%E5%8C%97

When encoding with the (Values).Encode method, a space is encoded as + and + itself as %2B. The source code of (Values).Encode shows why: internally it still calls url.QueryEscape. The difference is one of scope. (Values).Encode escapes each key and value separately and joins the pairs with literal = and &, whereas passing an entire query string to url.QueryEscape would escape the = and & separators as well.

So which of these three encoding methods should we developers use? Read on; I believe you will find the answer later in this article.

Implementation in different languages

Since Go encodes spaces and + differently depending on which function you call, do other languages have the same split? Let's take PHP and JS as examples.

URL encoding in PHP

urlencode

  1. echo urlencode( ' +Gopher points to north' );
  2. // Output: +%2BGopher%E6%8C%87%E5%8C%97

rawurlencode

  1. echo rawurlencode( " +Gopher points to north" );
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

For this string, PHP's urlencode produces the same output as Go's url.QueryEscape, while rawurlencode encodes the space as %20 and + as %2B.

URL encoding in JS

encodeURI

  1. encodeURI( '+Gopher pointer' )
  2. // Output: %20+Gopher%E6%8C%87%E5%8C%97

encodeURIComponent

  1. encodeURIComponent( ' +Gopher points to north' )
  2. // Output: %20%2BGopher%E6%8C%87%E5%8C%97

For this string, JS's encodeURI produces the same output as Go's url.PathEscape, while encodeURIComponent encodes the space as %20 and + as %2B.

What should we do?

Prefer url.PathEscape for encoding

In the previous article, we have summarized the encoding operations of +Gopher pointer in Go, PHP and JS. The following is a two-dimensional table to summarize whether the corresponding decoding operations are feasible.

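| Encoder \ Decoder  | url.QueryUnescape | url.PathUnescape | urldecode | rawurldecode | decodeURI | decodeURIComponent |
|--------------------|-------------------|------------------|-----------|--------------|-----------|--------------------|
| url.QueryEscape    | Y                 | N                | Y         | N            | N         | N                  |
| url.PathEscape     | N                 | Y                | N         | YY           | Y         | YY                 |
| urlencode          | Y                 | N                | Y         | N            | N         | N                  |
| rawurlencode       | Y                 | YY               | Y         | Y            | N         | Y                  |
| encodeURI          | N                 | Y                | N         | Y            | Y         | Y                  |
| encodeURIComponent | Y                 | YY               | Y         | Y            | N         | Y                  |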

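In the table above, Y means the decoder restores the original string and N means it does not. YY has the same meaning as Y; Lao Xu uses it only to mark the recommended combinations: encode with url.PathEscape in Go, and decode with rawurldecode in PHP or decodeURIComponent in JS. The YY entries in the url.PathUnescape column likewise mark it as the recommended Go decoder for output from rawurlencode and encodeURIComponent.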

In real projects, Gophers will also need to decode URLs that someone else encoded. In that case, agree with the party doing the encoding on which scheme they use, and choose the matching decoder.

Encode the value

Is there a universal approach that avoids URL encoding and decoding altogether? There certainly is! Take base32 as an example: its standard alphabet consists only of the letters A-Z and the digits 2-7, all of which are safe in a URL. If a value is base32-encoded first (omitting the = padding, which is not part of that alphabet), it no longer needs any URL encoding.

Finally, I sincerely hope that this article can be of some help to all readers.

The examples in this article were run with PHP 7.3.29, Go 1.16.6, and the JS console of Chrome 94.0.4606.71.

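References

RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax: https://www.rfc-editor.org/rfc/rfc2396.txt

W3Schools, HTML URL Encoding Reference: https://www.w3schools.com/tags/ref_urlencode.ASP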
