Quick Tools Online

URL Encoding Explained

2025-11-05

URL encoding, formally called percent-encoding, is the mechanism that makes it safe to include arbitrary text inside a URL. You encounter it constantly: spaces become %20, an ampersand inside a query parameter value becomes %26, and non-ASCII characters turn into runs of %XX sequences. Understanding how and why it works prevents a class of subtle bugs in URL construction.

Why URLs Have Restrictions

A URL is a structured string. Certain characters play structural roles: / separates path segments, ? separates the path from the query, & separates query parameters, # introduces the fragment, and : separates the scheme from the rest of the URL (and the host from the port). If any of these characters appears literally in a path segment or parameter value, the parser cannot tell whether it is structure or content.
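To see the ambiguity in practice, the TypeScript sketch below splices a raw value containing & into a query string and lets the WHATWG URL parser (the global URL class in browsers and Node.js) split it, then repeats the experiment with the value encoded by the built-in encodeURIComponent. The host example.com and the parameter names q and page are placeholders chosen for illustration.

```typescript
// A value that happens to contain a structural character (&).
const rawValue = "salt & pepper";

// Naive concatenation: the & inside the value looks exactly like a separator.
const naive = new URL("https://example.com/search?q=" + rawValue + "&page=2");

// Encoding the value first keeps the & as data (it becomes %26).
const safe = new URL(
  "https://example.com/search?q=" + encodeURIComponent(rawValue) + "&page=2",
);

console.log(naive.searchParams.get("q")); // "salt " (cut off at the &)
console.log(safe.searchParams.get("q"));  // "salt & pepper"
```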

Additionally, URLs were originally defined to carry only ASCII characters. Non-ASCII characters — accented letters, Chinese, Arabic, emoji — have no direct representation and must be encoded to be transmitted safely across all the systems (browsers, servers, proxies, load balancers) that handle HTTP traffic.

How Percent-Encoding Works

Percent-encoding replaces each byte that needs encoding with a percent sign (%) followed by two hexadecimal digits representing the byte value. RFC 3986 treats uppercase and lowercase digits as equivalent but recommends uppercase for consistency. A space (byte value 32, or 0x20 in hex) becomes %20. A left angle bracket (byte value 60, 0x3C) becomes %3C. The percent sign itself becomes %25.
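The byte-level rule is small enough to write out directly. This is a minimal sketch; percentEncodeByte is a hypothetical helper name, not a standard API.

```typescript
// Percent-encode a single byte: "%" followed by two uppercase hex digits.
function percentEncodeByte(byte: number): string {
  if (byte < 0 || byte > 0xff || Math.floor(byte) !== byte) {
    throw new RangeError(`not a byte: ${byte}`);
  }
  // Left-pad to two digits so that, e.g., 0x0A becomes "0A" rather than "A".
  return "%" + ("0" + byte.toString(16)).slice(-2).toUpperCase();
}

console.log(percentEncodeByte(0x20)); // %20 (space)
console.log(percentEncodeByte(0x3c)); // %3C (<)
console.log(percentEncodeByte(0x25)); // %25 (%)
```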

For non-ASCII characters, the process has an extra step. The character is first encoded to UTF-8, which may produce multiple bytes, and then each byte is percent-encoded individually. The emoji 😀 encodes to four UTF-8 bytes (0xF0, 0x9F, 0x98, 0x80), producing %F0%9F%98%80 in a URL.
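The two steps for non-ASCII text can be sketched with the standard TextEncoder class, which always produces UTF-8. The helper percentEncodeUtf8 is hypothetical and, for clarity, encodes every byte (even plain letters), which a real encoder would not do; the built-in encodeURIComponent is shown alongside for comparison.

```typescript
// Step 1: string -> UTF-8 bytes. Step 2: each byte -> %XX.
// Illustrative only: this encodes ALL bytes, including unreserved ones.
function percentEncodeUtf8(text: string): string {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += "%" + ("0" + bytes[i].toString(16)).slice(-2).toUpperCase();
  }
  return out;
}

console.log(percentEncodeUtf8("😀"));   // %F0%9F%98%80
console.log(encodeURIComponent("😀")); // %F0%9F%98%80 (built-in, same bytes)
```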

Reserved vs. Unreserved Characters

RFC 3986 divides characters into three categories. Unreserved characters — letters (A–Z, a–z), digits (0–9), and the four symbols - . _ ~ — can appear in any part of a URL without encoding. Reserved characters — : / ? # [ ] @ ! $ & ' ( ) * + , ; = — have specific structural meanings in URLs and must be encoded when used as data. Everything else must be percent-encoded.
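The three categories translate directly into two character classes. A minimal sketch, assuming single-character input; classify and CharClass are illustrative names, not part of any library.

```typescript
type CharClass = "unreserved" | "reserved" | "other";

// RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
// RFC 3986 section 2.2: gen-delims (: / ? # [ ] @) and sub-delims (! $ & ' ( ) * + , ; =)
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

function classify(ch: string): CharClass {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.test(ch)) return "reserved";
  return "other"; // space, %, quotes, non-ASCII, ...: always percent-encoded
}

console.log(["~", "&", " ", "%", "é"].map(classify));
// ["unreserved", "reserved", "other", "other", "other"]
```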

encodeURI vs. encodeURIComponent

JavaScript provides two encoding functions that behave quite differently. encodeURI(url) assumes its argument is a complete URL and leaves structural characters alone: it does not encode : / ? # & = because the URL needs them to function. encodeURIComponent(value) assumes its argument is a single value that will go inside a URL and encodes the delimiters that would otherwise break it, including & = / ? #. It leaves only letters, digits, and - _ . ! ~ * ' ( ) unencoded.
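The difference is easiest to see by feeding both functions the same inputs. The URL and value below are made-up examples. Note the last line: encodeURIComponent leaves the five reserved characters ! ' ( ) * unencoded, so values containing them may need extra handling if a receiver treats them specially.

```typescript
const fullUrl = "https://example.com/a path/?q=x&y=1#top";
const paramValue = "a&b=c/d?e";

console.log(encodeURI(fullUrl));
// https://example.com/a%20path/?q=x&y=1#top  (structure kept, only the space encoded)
console.log(encodeURIComponent(paramValue));
// a%26b%3Dc%2Fd%3Fe  (every delimiter encoded)
console.log(encodeURI(paramValue));
// a&b=c/d?e  (nothing encoded: the wrong tool for a single value)
console.log(encodeURIComponent("(it's)*!"));
// (it's)*!  (these reserved characters pass through unchanged)
```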

  • Use encodeURI when encoding a full URL you have constructed yourself.
  • Use encodeURIComponent when encoding a single query parameter value or path segment.
  • Never use encodeURI to encode a user-supplied value that will go inside a URL: it leaves & and = unencoded, so the value can inject extra parameters.
  • Many frameworks and HTTP client libraries, as well as the standard URLSearchParams API, encode parameter values automatically; check the documentation before encoding manually.
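The injection risk described in the list looks like this in practice. buildQuery is a hypothetical helper that applies encodeURIComponent to each key and value exactly once; URLSearchParams is the standard built-in that performs equivalent encoding for you. The id and admin parameter names are invented for the example.

```typescript
// Hypothetical helper: builds a query string, encoding each key and value once.
function buildQuery(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

// A user-supplied value that tries to smuggle in a second parameter.
const userInput = "1&admin=true";

console.log("?id=" + encodeURI(userInput));      // ?id=1&admin=true (injection succeeds)
console.log("?" + buildQuery({ id: userInput })); // ?id=1%26admin%3Dtrue

// The standard URLSearchParams API does the same job without manual encoding.
console.log(new URLSearchParams({ id: userInput }).toString()); // id=1%26admin%3Dtrue
```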

Common URL Encoding Mistakes

Double encoding happens when a value that is already percent-encoded gets encoded again. A space becomes %20, then %20 becomes %2520 (because % is encoded as %25). The receiver decodes once and sees %20 as a literal string instead of a space. The fix is to encode exactly once, at the boundary where you construct the URL.
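Double encoding is easy to reproduce with the built-in functions. A minimal sketch: the string is encoded twice, then decoded once, as a receiver would.

```typescript
const original = "a b";

const once = encodeURIComponent(original); // "a%20b"
const twice = encodeURIComponent(once);    // "a%2520b": the % itself became %25

// A receiver that decodes once recovers the single-encoded text, not the space.
console.log(decodeURIComponent(twice)); // a%20b
console.log(decodeURIComponent(once));  // a b
```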

The + vs. %20 confusion comes from HTML form encoding (application/x-www-form-urlencoded), which uses + for spaces in query strings. Servers that parse query strings often handle both. URL paths, however, should always use %20 for spaces — + in a path is a literal plus sign, not a space.
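The two conventions can be compared directly: URLSearchParams follows form encoding, while encodeURIComponent and decodeURIComponent follow percent-encoding and treat + as an ordinary character. The parameter name q is a placeholder.

```typescript
const phrase = "red shoes";

// Form encoding (URLSearchParams) writes spaces as "+".
console.log(new URLSearchParams({ q: phrase }).toString()); // q=red+shoes
// encodeURIComponent writes spaces as "%20", which is also safe in paths.
console.log(encodeURIComponent(phrase)); // red%20shoes

// Decoding: only the form parser turns "+" back into a space.
console.log(new URLSearchParams("q=red+shoes").get("q")); // red shoes
console.log(decodeURIComponent("red+shoes"));             // red+shoes
```

Because decodeURIComponent leaves + alone, code that decodes a form-encoded query string with plain percent-decoding will show a literal + where the user typed a space.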