Public API

The only public yarl class is URL:

>>> from yarl import URL
class yarl.URL(arg, *, encoded=False)

Represents URL as

[scheme:]//[user[:password]@]host[:port][/path][?query][#fragment]

for absolute URLs and

[/path][?query][#fragment]

for relative ones (Absolute and relative URLs).

The URL structure is:

 http://user:pass@example.com:8042/over/there?name=ferret#nose
 \__/   \__/ \__/ \_________/ \__/\_________/ \_________/ \__/
  |      |    |        |       |      |           |        |
scheme  user password host    port   path       query   fragment
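
The same decomposition can be reproduced with the standard library's urllib.parse.urlsplit(), the low-level parser mentioned under References. The sketch below only illustrates the structure. The attribute names belong to urllib.parse, not to yarl.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:pass@example.com:8042/over/there?name=ferret#nose")

# urlsplit() keeps user, password, host and port together in netloc ...
print(parts.scheme)    # http
print(parts.netloc)    # user:pass@example.com:8042
# ... and exposes them individually through helper attributes.
print(parts.username, parts.password, parts.hostname, parts.port)  # user pass example.com 8042
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```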

Internally all data are stored as percent-encoded strings for user, path, query and fragment URL parts and IDNA-encoded (RFC 5891) for host.

Constructor and modification operators perform encoding for all parts automatically. The library assumes all data uses UTF-8 for percent-encoded tokens.

>>> URL('http://example.com/path/to/?arg1=a&arg2=b#fragment')
URL('http://example.com/path/to/?arg1=a&arg2=b#fragment')

If a URL contains only ASCII characters, there is no difference.

For non-ASCII URLs, however, encoding is applied:

>>> str(URL('http://εμπορικόσήμα.eu/шлях/這裡'))
'http://xn--jxagkqfkduily1i.eu/%D1%88%D0%BB%D1%8F%D1%85/%E9%80%99%E8%A3%A1'

The same is true for user, password, query and fragment parts of URL.
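
The following is a rough approximation of this automatic encoding. It uses urllib.parse.quote() (UTF-8 percent-encoding) for the path and Python's built-in "idna" codec for the host. The helpers encode_host() and encode_path() are hypothetical. The built-in codec implements the older IDNA 2003 rules rather than RFC 5891, so some host names may encode differently than in yarl.

```python
from urllib.parse import quote


def encode_host(host: str) -> str:
    """Hypothetical helper: IDNA-encode a host name (stdlib codec, IDNA 2003)."""
    return host.encode("idna").decode("ascii")


def encode_path(path: str) -> str:
    """Hypothetical helper: percent-encode a path as UTF-8, keeping '/' separators."""
    return quote(path, safe="/")


print("http://" + encode_host("хост.домен") + encode_path("/шлях/這裡"))
# http://xn--n1agdj.xn--d1acufc/%D1%88%D0%BB%D1%8F%D1%85/%E9%80%99%E8%A3%A1
```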

An already encoded URL is not changed:

>>> URL('http://xn--jxagkqfkduily1i.eu')
URL('http://xn--jxagkqfkduily1i.eu')
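
Leaving existing escapes alone is what separates re-quoting from naive quoting. urllib.parse.quote() escapes the '%' sign itself, so applying it to an already encoded string double-encodes it. The hypothetical requote() helper below sketches the idea of keeping valid %XX escapes while encoding everything else. It is an assumption for illustration, not yarl's actual quoter, which handles more cases.

```python
import re
from urllib.parse import quote

# A valid percent-escape: '%' followed by exactly two hex digits.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def requote(text: str, safe: str = "/") -> str:
    """Hypothetical helper: quote text but keep existing %XX escapes."""
    out = []
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Quote the unescaped run before this escape ...
        out.append(quote(text[pos:match.start()], safe=safe))
        # ... and keep the escape itself, normalized to upper-case hex.
        out.append(match.group(0).upper())
        pos = match.end()
    out.append(quote(text[pos:], safe=safe))
    return "".join(out)


print(quote("/%D1%88"))    # /%25D1%2588  (naive quoting double-encodes)
print(requote("/%D1%88"))  # /%D1%88      (existing escape preserved)
print(requote("/шлях"))    # /%D1%88%D0%BB%D1%8F%D1%85
```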

Use human_repr() to get a human-readable representation:

>>> url = URL('http://εμπορικόσήμα.eu/шлях/這裡')
>>> str(url)
'http://xn--jxagkqfkduily1i.eu/%D1%88%D0%BB%D1%8F%D1%85/%E9%80%99%E8%A3%A1'
>>> url.human_repr()
'http://εμπορικόσήμα.eu/шлях/這裡'

Note

Sometimes the encoding performed by yarl is not acceptable to a certain web server.

Passing the encoded=True parameter prevents URL auto-encoding; the user is then responsible for URL correctness.

Don’t use this option unless there is no other way to keep URL attributes untouched.

URL manipulations do not guarantee that the original encoding is preserved: URL parts could be re-quoted even if the encoded parameter was explicitly set.

URL properties

There are two kinds of properties: decoded and encoded (with raw_ prefix):

URL.scheme

Scheme for absolute URLs, empty string for relative URLs or URLs starting with '//' (Absolute and relative URLs).

>>> URL('http://example.com').scheme
'http'
>>> URL('//example.com').scheme
''
>>> URL('page.html').scheme
''

URL.user

Decoded user part of URL, None if user is missing.

>>> URL('http://john@example.com').user
'john'
>>> URL('http://бажан@example.com').user
'бажан'
>>> URL('http://example.com').user is None
True

URL.raw_user

Encoded user part of URL, None if user is missing.

>>> URL('http://довбуш@example.com').raw_user
'%D0%B4%D0%BE%D0%B2%D0%B1%D1%83%D1%88'
>>> URL('http://example.com').raw_user is None
True

URL.password

Decoded password part of URL, None if password is missing.

>>> URL('http://john:pass@example.com').password
'pass'
>>> URL('http://степан:пароль@example.com').password
'пароль'
>>> URL('http://example.com').password is None
True

URL.raw_password

Encoded password part of URL, None if password is missing.

>>> URL('http://user:пароль@example.com').raw_password
'%D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C'
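
The decoded and raw_ properties differ only by percent-decoding. This is a minimal sketch, assuming the userinfo is read with urllib.parse.urlsplit() (whose username and password attributes are returned undecoded) and decoded with urllib.parse.unquote(). The userinfo() helper is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def userinfo(url: str) -> dict:
    """Hypothetical helper: raw and decoded user/password of a URL string."""
    parts = urlsplit(url)
    raw_user, raw_password = parts.username, parts.password
    return {
        "raw_user": raw_user,
        "user": None if raw_user is None else unquote(raw_user),
        "raw_password": raw_password,
        "password": None if raw_password is None else unquote(raw_password),
    }


info = userinfo("http://user:%D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C@example.com")
print(info["raw_password"])  # %D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C
print(info["password"])      # пароль
```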

URL.host

Decoded host part of URL, None for relative URLs (Absolute and relative URLs).

Brackets are stripped for IPv6 addresses. The host is converted to lowercase, and IP addresses are validated and converted to compressed form.

>>> URL('http://example.com').host
'example.com'
>>> URL('http://хост.домен').host
'хост.домен'
>>> URL('page.html').host is None
True
>>> URL('http://[::1]').host
'::1'
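
The normalization rules above can be approximated with the standard ipaddress module. The normalize_host() helper is hypothetical. It covers only bracket stripping, lowercasing and the compressed IP form, and performs no IDNA conversion.

```python
import ipaddress


def normalize_host(host: str) -> str:
    """Hypothetical helper: normalize a host as the text above describes."""
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # strip IPv6 brackets
    try:
        # Validates IPv4/IPv6 literals; str() gives the compressed, lower-case form.
        return str(ipaddress.ip_address(host))
    except ValueError:
        return host.lower()  # a regular host name


print(normalize_host("[0:0:0:0:0:0:0:1]"))       # ::1
print(normalize_host("[2001:DB8:0:0:0:0:0:1]"))  # 2001:db8::1
print(normalize_host("Example.COM"))             # example.com
```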

URL.raw_host

IDNA-encoded host part of URL, None for relative URLs (Absolute and relative URLs).

>>> URL('http://хост.домен').raw_host
'xn--n1agdj.xn--d1acufc'

URL.port

Port part of URL, with scheme-based fallback.

None for relative URLs (Absolute and relative URLs), or for URLs without an explicit port whose URL.scheme has no default port (Default port substitution).

>>> URL('http://example.com:8080').port
8080
>>> URL('http://example.com').port
80
>>> URL('page.html').port is None
True

URL.explicit_port

Explicit port part of URL, without scheme-based fallback.

None for relative URLs (Absolute and relative URLs) or for URLs without an explicit port.

>>> URL('http://example.com:8080').explicit_port
8080
>>> URL('http://example.com').explicit_port is None
True
>>> URL('page.html').explicit_port is None
True

New in version 1.3.
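
The difference between port and explicit_port comes down to a lookup keyed by scheme. In this sketch, DEFAULT_PORTS copies the table from Default port substitution, and urllib.parse.urlsplit() stands in for yarl's parser. The port() and explicit_port() functions are hypothetical, not the yarl properties.

```python
from typing import Optional
from urllib.parse import urlsplit

# scheme -> default port, as listed under "Default port substitution".
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def explicit_port(url: str) -> Optional[int]:
    """Hypothetical helper: the port written in the URL itself, or None."""
    return urlsplit(url).port


def port(url: str) -> Optional[int]:
    """Hypothetical helper: explicit port, else the scheme's default (if known)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(port("http://example.com:8080"), explicit_port("http://example.com:8080"))  # 8080 8080
print(port("http://example.com"), explicit_port("http://example.com"))            # 80 None
print(port("page.html"), explicit_port("page.html"))                              # None None
```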

URL.authority

Decoded authority part of URL, a combination of user, password, host, and port.

authority = [ user [ ":" password ] "@" ] host [ ":" port ].

authority is an empty string if all parts are missing.

>>> URL('http://john:pass@example.com:8000').authority
'john:pass@example.com:8000'

New in version 1.5.

URL.raw_authority

Encoded authority part of URL, a combination of user, password, host, and port. Empty string if all parts are missing.

>>> URL('http://john:pass@хост.домен:8000').raw_authority
'john:pass@xn--n1agdj.xn--d1acufc:8000'

New in version 1.5.
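
Assembling an authority string follows the grammar above directly. The build_authority() helper is hypothetical. It expects the parts already in the desired form (encoded or decoded) and does no encoding of its own.

```python
from typing import Optional


def build_authority(user: Optional[str] = None, password: Optional[str] = None,
                    host: Optional[str] = None, port: Optional[int] = None) -> str:
    """Hypothetical helper: authority = [user[":" password] "@"] host [":" port]."""
    authority = ""
    if user is not None:
        authority += user
        if password is not None:
            authority += ":" + password
        authority += "@"
    if host is not None:
        authority += host
    if port is not None:
        authority += f":{port}"
    return authority


print(build_authority("john", "pass", "example.com", 8000))  # john:pass@example.com:8000
print(repr(build_authority()))                                # ''
```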

URL.path

Decoded path part of URL, '/' for absolute URLs without path part.

>>> URL('http://example.com/path/to').path
'/path/to'
>>> URL('http://example.com/шлях/сюди').path
'/шлях/сюди'
>>> URL('http://example.com').path
'/'

URL.path_qs

Decoded path part of URL and query string, '/' for absolute URLs without path part.

>>> URL('http://example.com/path/to?a1=a&a2=b').path_qs
'/path/to?a1=a&a2=b'

URL.raw_path_qs

Encoded path part of URL and query string, '/' for absolute URLs without path part.

>>> URL('http://example.com/шлях/сюди?ключ=знач').raw_path_qs
'/%D1%88%D0%BB%D1%8F%D1%85/%D1%81%D1%8E%D0%B4%D0%B8?%D0%BA%D0%BB%D1%8E%D1%87=%D0%B7%D0%BD%D0%B0%D1%87'

New in version 0.15.
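
path_qs and raw_path_qs join the path and the query string with '?'. The following is a stdlib sketch over an already encoded URL string. The raw_path_qs() function here is hypothetical, not the yarl property.

```python
from urllib.parse import urlsplit


def raw_path_qs(url: str) -> str:
    """Hypothetical helper: encoded path plus '?query' (if any)."""
    parts = urlsplit(url)
    # Absolute URLs without a path part get '/', as described above.
    path = parts.path or ("/" if parts.netloc else "")
    return f"{path}?{parts.query}" if parts.query else path


print(raw_path_qs("http://example.com/path/to?a1=a&a2=b"))  # /path/to?a1=a&a2=b
print(raw_path_qs("http://example.com"))                    # /
```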

URL.raw_path

Encoded path part of URL, '/' for absolute URLs without path part.

>>> URL('http://example.com/шлях/сюди').raw_path
'/%D1%88%D0%BB%D1%8F%D1%85/%D1%81%D1%8E%D0%B4%D0%B8'

URL.query_string

Decoded query part of URL, empty string if query is missing.

>>> URL('http://example.com/path?a1=a&a2=b').query_string
'a1=a&a2=b'
>>> URL('http://example.com/path?ключ=знач').query_string
'ключ=знач'
>>> URL('http://example.com/path').query_string
''

URL.raw_query_string

Encoded query part of URL, empty string if query is missing.

>>> URL('http://example.com/path?ключ=знач').raw_query_string
'%D0%BA%D0%BB%D1%8E%D1%87=%D0%B7%D0%BD%D0%B0%D1%87'

URL.fragment

Decoded fragment part of URL, empty string if fragment is missing.

>>> URL('http://example.com/path#fragment').fragment
'fragment'
>>> URL('http://example.com/path#якір').fragment
'якір'
>>> URL('http://example.com/path').fragment
''

URL.raw_fragment

Encoded fragment part of URL, empty string if fragment is missing.

>>> URL('http://example.com/path#якір').raw_fragment
'%D1%8F%D0%BA%D1%96%D1%80'

For path and query, yarl supports additional helpers:

URL.parts

A tuple containing decoded path parts, ('/',) for absolute URLs if path is missing.

>>> URL('http://example.com/path/to').parts
('/', 'path', 'to')
>>> URL('http://example.com/шлях/сюди').parts
('/', 'шлях', 'сюди')
>>> URL('http://example.com').parts
('/',)

URL.raw_parts

A tuple containing encoded path parts, ('/',) for absolute URLs if path is missing.

>>> URL('http://example.com/шлях/сюди').raw_parts
('/', '%D1%88%D0%BB%D1%8F%D1%85', '%D1%81%D1%8E%D0%B4%D0%B8')

URL.name

The last part of parts.

>>> URL('http://example.com/path/to').name
'to'
>>> URL('http://example.com/шлях/сюди').name
'сюди'
>>> URL('http://example.com/path/').name
''

URL.raw_name

The last part of raw_parts.

>>> URL('http://example.com/шлях/сюди').raw_name
'%D1%81%D1%8E%D0%B4%D0%B8'

URL.suffix

The file extension of name.

>>> URL('http://example.com/path/to.txt').suffix
'.txt'
>>> URL('http://example.com/шлях.сюди').suffix
'.сюди'
>>> URL('http://example.com/path').suffix
''

URL.raw_suffix

The file extension of raw_name.

>>> URL('http://example.com/шлях.сюди').raw_suffix
'.%D1%81%D1%8E%D0%B4%D0%B8'

URL.suffixes

A tuple of name’s file extensions.

>>> URL('http://example.com/path/to.tar.gz').suffixes
('.tar', '.gz')
>>> URL('http://example.com/шлях.тут.ось').suffixes
('.тут', '.ось')
>>> URL('http://example.com/path').suffixes
()

URL.raw_suffixes

A tuple of raw_name’s file extensions.

>>> URL('http://example.com/шлях.тут.ось').raw_suffixes
('.%D1%82%D1%83%D1%82', '.%D0%BE%D1%81%D1%8C')
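
Since yarl borrows its design from pathlib where possible (see References), the path helpers behave much like pathlib.PurePosixPath. The sketch applies PurePosixPath to an encoded path and decodes each part with unquote(). It is an analogy only. The two differ in details such as trailing slashes and the list/tuple type of suffixes.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote

path = PurePosixPath("/%D1%88%D0%BB%D1%8F%D1%85/to.tar.gz")

print(path.parts)     # ('/', '%D1%88%D0%BB%D1%8F%D1%85', 'to.tar.gz')
print(path.name)      # to.tar.gz
print(path.suffix)    # .gz
print(path.suffixes)  # ['.tar', '.gz']  (a list here, a tuple in yarl)

# Decoded parts, analogous to URL.parts vs URL.raw_parts.
print(tuple(unquote(part) for part in path.parts))  # ('/', 'шлях', 'to.tar.gz')

# One visible difference: pathlib drops a trailing slash, yarl keeps an empty name.
print(repr(PurePosixPath("/path/").name))  # 'path'
```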

URL.query

A multidict.MultiDictProxy representing parsed query parameters in decoded representation. Empty value if URL has no query part.

>>> URL('http://example.com/path?a1=a&a2=b').query
<MultiDictProxy('a1': 'a', 'a2': 'b')>
>>> URL('http://example.com/path?ключ=знач').query
<MultiDictProxy('ключ': 'знач')>
>>> URL('http://example.com/path').query
<MultiDictProxy()>
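
urllib.parse.parse_qsl() produces the same kind of ordered key/value pairs that the query MultiDictProxy exposes, including repeated keys. This makes it a convenient way to see how a query string is split. This is an analogy. yarl's own parser may treat edge cases such as '+' or blank values differently.

```python
from urllib.parse import parse_qsl

# Each key/value pair is kept, including repeated keys, as in a MultiDict.
print(parse_qsl("a1=a&a2=b"))  # [('a1', 'a'), ('a2', 'b')]
print(parse_qsl("c=1&c=2"))    # [('c', '1'), ('c', '2')]

# Percent-escapes are decoded as UTF-8.
print(parse_qsl("%D0%BA%D0%BB%D1%8E%D1%87=%D0%B7%D0%BD%D0%B0%D1%87"))  # [('ключ', 'знач')]

# Collecting every value of one key, similar to MultiDictProxy.getall('c').
print([v for k, v in parse_qsl("c=1&x=0&c=2") if k == "c"])  # ['1', '2']
```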

Absolute and relative URLs

The module supports both absolute and relative URLs.

An absolute URL should start with either a scheme or '//'.

URL.is_absolute()

A check for absolute URLs.

Return True for absolute ones (having scheme or starting with '//'), False otherwise.

>>> URL('http://example.com').is_absolute()
True
>>> URL('//example.com').is_absolute()
True
>>> URL('/path/to').is_absolute()
False
>>> URL('path').is_absolute()
False
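
The rule above translates into a one-line check. The is_absolute() function below is a hypothetical stdlib helper that follows the documented definition (a scheme, or a leading '//').

```python
from urllib.parse import urlsplit


def is_absolute(url: str) -> bool:
    """Hypothetical helper: absolute if the URL has a scheme or starts with '//'."""
    return urlsplit(url).scheme != "" or url.startswith("//")


for candidate in ("http://example.com", "//example.com", "/path/to", "path"):
    print(candidate, is_absolute(candidate))
```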

New URL generation

URL is an immutable object; every operation described in this section generates a new URL instance.
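
The stdlib's urllib.parse.SplitResult follows the same copy-on-modify style, which makes it a handy mental model. Its _replace() method (inherited from namedtuple) returns a new object and leaves the original untouched. This is an analogy, not a description of how yarl stores URLs.

```python
from urllib.parse import urlsplit

original = urlsplit("http://example.com/path?a=b")

# SplitResult is a named tuple: _replace() builds a new object,
# the original stays exactly as it was.
changed = original._replace(scheme="https", query="c=d")

print(original.geturl())  # http://example.com/path?a=b
print(changed.geturl())   # https://example.com/path?c=d
```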

classmethod URL.build(*, scheme=..., authority=..., user=..., password=..., host=..., port=..., path=..., query=..., query_string=..., fragment=..., encoded=False)

Creates and returns a new URL:

>>> URL.build(scheme="http", host="example.com")
URL('http://example.com')

>>> URL.build(scheme="http", host="example.com", query={"a": "b"})
URL('http://example.com/?a=b')

>>> URL.build(scheme="http", host="example.com", query_string="a=b")
URL('http://example.com/?a=b')

>>> URL.build()
URL('')

Calling build() without arguments is equivalent to calling the URL constructor without arguments.

Note

Only one of query or query_string may be passed; passing both raises ValueError.
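
A simplified, string-based analogue of build() can be written with urlencode() and urlunsplit(). The build_url() function is hypothetical. It accepts only a subset of the real parameters and performs minimal path encoding. It adds a '/' path only when a host and a query are present, as in the examples above.

```python
from typing import Mapping, Optional
from urllib.parse import quote, urlencode, urlunsplit


def build_url(*, scheme: str = "", host: str = "", path: str = "",
              query: Optional[Mapping[str, str]] = None,
              query_string: str = "", fragment: str = "") -> str:
    """Hypothetical, simplified analogue of URL.build() returning a string."""
    if query is not None and query_string:
        raise ValueError("Only one of 'query' or 'query_string' should be passed")
    qs = urlencode(query) if query is not None else query_string
    # Mirror the examples above: a URL with a host and a query gets a '/' path.
    if host and not path and qs:
        path = "/"
    return urlunsplit((scheme, host, quote(path, safe="/"), qs, fragment))


print(build_url(scheme="http", host="example.com"))                    # http://example.com
print(build_url(scheme="http", host="example.com", query={"a": "b"}))  # http://example.com/?a=b
```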

URL.with_scheme(scheme)

Return a new URL with scheme replaced:

>>> URL('http://example.com').with_scheme('https')
URL('https://example.com')

The returned URL may have a different port (Default port substitution).

URL.with_user(user)

Return a new URL with user replaced, auto-encode user if needed.

Clear user/password if user is None.

>>> URL('http://user:pass@example.com').with_user('new_user')
URL('http://new_user:pass@example.com')
>>> URL('http://user:pass@example.com').with_user('олекса')
URL('http://%D0%BE%D0%BB%D0%B5%D0%BA%D1%81%D0%B0:pass@example.com')
>>> URL('http://user:pass@example.com').with_user(None)
URL('http://example.com')

URL.with_password(password)

Return a new URL with password replaced, auto-encode password if needed.

Clear password if None is passed.

>>> URL('http://user:pass@example.com').with_password('пароль')
URL('http://user:%D0%BF%D0%B0%D1%80%D0%BE%D0%BB%D1%8C@example.com')
>>> URL('http://user:pass@example.com').with_password(None)
URL('http://user@example.com')

URL.with_host(host)

Return a new URL with host replaced, auto-encode host if needed.

Changing the host of a relative URL is not allowed; use URL.join() instead.

>>> URL('http://example.com/path/to').with_host('python.org')
URL('http://python.org/path/to')
>>> URL('http://example.com/path').with_host('хост.домен')
URL('http://xn--n1agdj.xn--d1acufc/path')

URL.with_port(port)

Return a new URL with port replaced.

Clear port to default if None is passed.

>>> URL('http://example.com:8888').with_port(9999)
URL('http://example.com:9999')
>>> URL('http://example.com:8888').with_port(None)
URL('http://example.com')

URL.with_path(path)

Return a new URL with path replaced, encode path if needed.

>>> URL('http://example.com/').with_path('/path/to')
URL('http://example.com/path/to')

URL.with_query(query)
URL.with_query(**kwargs)

Return a new URL with query part replaced.

Unlike update_query(), the method replaces all query parameters.

Accepts any Mapping (e.g. dict, MultiDict instances) or str, auto-encode the argument if needed.

A sequence of (key, value) pairs is supported as well.

It can also take an arbitrary number of keyword arguments.

Clear query if None is passed.

Note

The library accepts str, float, int and their subclasses except bool as query argument values.

If a mapping such as dict is used, the values may also be list or tuple to represent a key that has many values.

Please see Why isn’t boolean supported by the URL query API? for the reason why bool is not supported out-of-the-box.

>>> URL('http://example.com/path?a=b').with_query('c=d')
URL('http://example.com/path?c=d')
>>> URL('http://example.com/path?a=b').with_query({'c': 'd'})
URL('http://example.com/path?c=d')
>>> URL('http://example.com/path?a=b').with_query({'c': [1, 2]})
URL('http://example.com/path?c=1&c=2')
>>> URL('http://example.com/path?a=b').with_query({'кл': 'зн'})
URL('http://example.com/path?%D0%BA%D0%BB=%D0%B7%D0%BD')
>>> URL('http://example.com/path?a=b').with_query(None)
URL('http://example.com/path')
>>> URL('http://example.com/path?a=b&b=1').with_query(b='2')
URL('http://example.com/path?b=2')
>>> URL('http://example.com/path?a=b&b=1').with_query([('b', '2')])
URL('http://example.com/path?b=2')

Changed in version 1.5: Support list and tuple as a query parameter value.

Changed in version 1.6: Support subclasses of int (except bool) and float as a query parameter value.
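
The value rules in the note can be sketched as a small normalization step. The query_pairs() helper is hypothetical. It flattens a mapping or a pair sequence, expands list/tuple values into repeated keys, and rejects bool (a subclass of int). It converts the remaining values with str(), which may not match yarl's formatting of floats.

```python
from collections.abc import Mapping
from typing import Iterable, List, Tuple, Union
from urllib.parse import urlencode


def query_pairs(query: Union[Mapping, Iterable[Tuple[str, object]]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: flatten a mapping or pair sequence into (key, str) pairs."""
    items = query.items() if isinstance(query, Mapping) else query
    pairs = []
    for key, value in items:
        # A list or tuple means "this key has several values".
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            # bool is a subclass of int, so it must be rejected explicitly.
            if isinstance(v, bool) or not isinstance(v, (str, int, float)):
                raise TypeError(f"Invalid query value type: {type(v).__name__}")
            pairs.append((key, str(v)))
    return pairs


print(urlencode(query_pairs({"c": [1, 2]})))  # c=1&c=2
print(urlencode(query_pairs([("b", "2")])))   # b=2
print(urlencode(query_pairs({"кл": "зн"})))   # %D0%BA%D0%BB=%D0%B7%D0%BD
```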

URL.update_query(query)
URL.update_query(**kwargs)

Returns a new URL with query part updated.

Unlike with_query() the method does not replace query completely.

The returned URL will contain a query string in which the keys from the passed query (or from the parsed query string) replace all existing values for those keys; other parameters are kept as they are.

Accepts any Mapping (e.g. dict, MultiDict instances) or str, auto-encode the argument if needed.

A sequence of (key, value) pairs is supported as well.

It can also take an arbitrary number of keyword arguments.

Clear query if None is passed.

The mod operator (%) can be used as an alternative to calling URL.update_query() directly.

Note

The library accepts str, float, int and their subclasses except bool as query argument values.

If a mapping such as dict is used, the values may also be list or tuple to represent a key that has many values.

Please see Why isn’t boolean supported by the URL query API? for the reason why bool is not supported out-of-the-box.

>>> URL('http://example.com/path?a=b').update_query('c=d')
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query({'c': 'd'})
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query({'c': [1, 2]})
URL('http://example.com/path?a=b&c=1&c=2')
>>> URL('http://example.com/path?a=b').update_query({'кл': 'зн'})
URL('http://example.com/path?a=b&%D0%BA%D0%BB=%D0%B7%D0%BD')
>>> URL('http://example.com/path?a=b&b=1').update_query(b='2')
URL('http://example.com/path?a=b&b=2')
>>> URL('http://example.com/path?a=b&b=1').update_query([('b', '2')])
URL('http://example.com/path?a=b&b=2')
>>> URL('http://example.com/path?a=b&c=e&c=f').update_query(c='d')
URL('http://example.com/path?a=b&c=d')
>>> URL('http://example.com/path?a=b').update_query('c=d&c=f')
URL('http://example.com/path?a=b&c=d&c=f')
>>> URL('http://example.com/path?a=b') % {'c': 'd'}
URL('http://example.com/path?a=b&c=d')

Changed in version 1.0: All multiple key/value pairs are applied to the multi-dictionary.

New in version 1.5: Support for mod operator (%) to update the URL’s query part.

Changed in version 1.5: Support list and tuple as a query parameter value.

Changed in version 1.6: Support subclasses of int (except bool) and float as a query parameter value.
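
The merge behaviour shown in the examples can be approximated on plain query strings. Every existing value of an updated key is dropped and the new pairs are appended. The update_query_string() helper is hypothetical. Because updated keys always move to the end, parameter order may differ from yarl's.

```python
from urllib.parse import parse_qsl, urlencode


def update_query_string(current: str, new: str) -> str:
    """Hypothetical helper: replace keys present in `new`, keep all other pairs."""
    old_pairs = parse_qsl(current, keep_blank_values=True)
    new_pairs = parse_qsl(new, keep_blank_values=True)
    new_keys = {key for key, _ in new_pairs}
    # Drop every old value of an updated key, then append the new pairs.
    kept = [(key, value) for key, value in old_pairs if key not in new_keys]
    return urlencode(kept + new_pairs)


print(update_query_string("a=b", "c=d"))          # a=b&c=d
print(update_query_string("a=b&b=1", "b=2"))      # a=b&b=2
print(update_query_string("a=b&c=e&c=f", "c=d"))  # a=b&c=d
print(update_query_string("a=b", "c=d&c=f"))      # a=b&c=d&c=f
```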

URL.with_fragment(fragment)

Return a new URL with fragment replaced, auto-encode fragment if needed.

Clear fragment if None is passed.

>>> URL('http://example.com/path#frag').with_fragment('anchor')
URL('http://example.com/path#anchor')
>>> URL('http://example.com/path#frag').with_fragment('якір')
URL('http://example.com/path#%D1%8F%D0%BA%D1%96%D1%80')
>>> URL('http://example.com/path#frag').with_fragment(None)
URL('http://example.com/path')

URL.with_name(name)

Return a new URL with name (last part of path) replaced and with query and fragment parts cleared.

Name is encoded if needed.

>>> URL('http://example.com/path/to?arg#frag').with_name('new')
URL('http://example.com/path/new')
>>> URL('http://example.com/path/to').with_name("ім'я")
URL('http://example.com/path/%D1%96%D0%BC%27%D1%8F')

URL.with_suffix(suffix)

Return a new URL with suffix (file extension of name) replaced and with query and fragment parts cleared.

Suffix is encoded if needed.

>>> URL('http://example.com/path/to?arg#frag').with_suffix('.doc')
URL('http://example.com/path/to.doc')
>>> URL('http://example.com/path/to').with_suffix('.cуфікс')
URL('http://example.com/path/to.c%D1%83%D1%84%D1%96%D0%BA%D1%81')

URL.parent

A new URL with the last part of path removed and with query and fragment parts cleared.

>>> URL('http://example.com/path/to?arg#frag').parent
URL('http://example.com/path')
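
For the path itself, with_name(), with_suffix() and parent map onto pathlib.PurePosixPath, followed by quote() for encoding. Query and fragment are simply not carried over. This is an analogy under that assumption, not yarl's implementation.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

path = PurePosixPath("/path/to")

# with_name(): replace the last segment, then percent-encode the result.
print(quote(str(path.with_name("new")), safe="/"))     # /path/new
print(quote(str(path.with_name("ім'я")), safe="/"))    # /path/%D1%96%D0%BC%27%D1%8F
# with_suffix(): replace (or add) the file extension of the last segment.
print(quote(str(path.with_suffix(".doc")), safe="/"))  # /path/to.doc
# parent: drop the last segment.
print(path.parent)                                     # /path
```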

URL.origin()

A new URL with scheme, host and port parts only. user, password, path, query and fragment are removed.

>>> URL('http://example.com/path/to?arg#frag').origin()
URL('http://example.com')
>>> URL('http://user:pass@example.com/path').origin()
URL('http://example.com')

URL.relative()

A new relative URL with path, query and fragment parts only. scheme, user, password, host and port are removed.

>>> URL('http://example.com/path/to?arg#frag').relative()
URL('/path/to?arg#frag')
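
Both operations amount to keeping a subset of the components returned by urlsplit(). The origin() and relative() functions below are hypothetical helpers. origin() rebuilds the network location from the host and explicit port only, restoring brackets for IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def origin(url: str) -> str:
    """Hypothetical helper: keep scheme, host and explicit port only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = f"[{host}]" if ":" in host else host  # IPv6 needs brackets back
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, "", "", ""))


def relative(url: str) -> str:
    """Hypothetical helper: keep path, query and fragment only."""
    parts = urlsplit(url)
    return urlunsplit(("", "", parts.path, parts.query, parts.fragment))


print(origin("http://user:pass@example.com/path"))     # http://example.com
print(relative("http://example.com/path/to?arg#frag"))  # /path/to?arg#frag
```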

The division (/) operator creates a new URL with path parts appended and with query and fragment parts cleared.

The path is encoded if needed.

>>> url = URL('http://example.com/path?arg#frag') / 'to/subpath'
>>> url
URL('http://example.com/path/to/subpath')
>>> url.parts
('/', 'path', 'to', 'subpath')
>>> url = URL('http://example.com/path?arg#frag') / 'сюди'
>>> url
URL('http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B8')

URL.joinpath(*other, encoded=False)

Construct a new URL with all other elements appended to the path, and with query and fragment parts cleared.

Passing encoded=True parameter prevents path element auto-encoding, the caller is responsible for taking care of URL correctness.

>>> url = URL('http://example.com/path?arg#frag').joinpath('to', 'subpath')
>>> url
URL('http://example.com/path/to/subpath')
>>> url.parts
('/', 'path', 'to', 'subpath')
>>> url = URL('http://example.com/path?arg#frag').joinpath('сюди')
>>> url
URL('http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B8')
>>> url = URL('http://example.com/path').joinpath('%D1%81%D1%8E%D0%B4%D0%B8', encoded=True)
>>> url
URL('http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B8')

New in version 1.9.
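
The following is a stdlib sketch of joinpath(). Each segment is appended to the existing path and percent-encoded unless encoded=True; the query and fragment are dropped. The joinpath() function here is hypothetical. It uses plain quote(), so without encoded=True an existing %XX escape in a segment gets double-encoded, which is the situation encoded=True is meant to avoid.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def joinpath(url: str, *segments: str, encoded: bool = False) -> str:
    """Hypothetical helper: append path segments, dropping query and fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    for segment in segments:
        # Without encoded=True every segment is percent-encoded;
        # '/' inside a segment is kept, so 'to/subpath' adds two parts.
        path += "/" + (segment if encoded else quote(segment, safe="/"))
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(joinpath("http://example.com/path?arg#frag", "to", "subpath"))
# http://example.com/path/to/subpath
print(joinpath("http://example.com/path", "сюди"))
# http://example.com/path/%D1%81%D1%8E%D0%B4%D0%B8
```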

URL.join(url)

Construct a full (“absolute”) URL by combining a “base URL” (self) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL, e.g.:

>>> base = URL('http://example.com/path/index.html')
>>> base.join(URL('page.html'))
URL('http://example.com/path/page.html')

Note

If url is an absolute URL (that is, starting with // or scheme://), the host name and/or scheme of url will be present in the result, e.g.:

>>> base = URL('http://example.com/path/index.html')
>>> base.join(URL('//python.org/page.html'))
URL('http://python.org/page.html')
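
The standard library's urllib.parse.urljoin() performs the same kind of resolution and can be handy for checking expectations (see RFC 3986 and RFC 1808 under References). It is used here as a comparison only. The two are not guaranteed to agree on every edge case.

```python
from urllib.parse import urljoin

base = "http://example.com/path/index.html"

print(urljoin(base, "page.html"))               # http://example.com/path/page.html
print(urljoin(base, "../other.html"))           # http://example.com/other.html
print(urljoin(base, "/root.html"))              # http://example.com/root.html
print(urljoin(base, "//python.org/page.html"))  # http://python.org/page.html
print(urljoin(base, "?q=1"))                    # http://example.com/path/index.html?q=1
```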

Human readable representation

All URL data is stored internally in encoded form. This is convenient for passing str(url) wherever a URL string is accepted, but hard for humans to read.

URL.human_repr()

Return decoded human readable string for URL representation.

>>> url = URL('http://εμπορικόσήμα.eu/這裡')
>>> str(url)
'http://xn--jxagkqfkduily1i.eu/%E9%80%99%E8%A3%A1'
>>> url.human_repr()
'http://εμπορικόσήμα.eu/這裡'
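
Going the other way, a display-only approximation of human_repr() decodes the IDNA host with the built-in "idna" codec and the remaining parts with unquote(). The human_repr_sketch() function is hypothetical. Its output is meant for reading, not for parsing back, since decoding can make characters such as '&' or '#' ambiguous.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def human_repr_sketch(url: str) -> str:
    """Hypothetical helper: decode an encoded URL string for display."""
    parts = urlsplit(url)
    netloc = ""
    if parts.username is not None:
        netloc += unquote(parts.username)
        if parts.password is not None:
            netloc += ":" + unquote(parts.password)
        netloc += "@"
    if parts.hostname:
        # The built-in "idna" codec turns xn-- labels back into Unicode.
        host = parts.hostname.encode("ascii").decode("idna")
        netloc += f"[{host}]" if ":" in host else host
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, unquote(parts.path),
                       unquote(parts.query), unquote(parts.fragment)))


print(human_repr_sketch("http://xn--n1agdj.xn--d1acufc/%D1%88%D0%BB%D1%8F%D1%85"))
# http://хост.домен/шлях
```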

Default port substitution

yarl is aware of the following scheme-to-port translations:

scheme    port
'http'    80
'https'   443
'ws'      80
'wss'     443

URL.is_default_port()

A check for default port.

Return True if URL’s port is default for used scheme, False otherwise.

Relative URLs have no default port.

>>> URL('http://example.com').is_default_port()
True
>>> URL('http://example.com:80').is_default_port()
True
>>> URL('http://example.com:8080').is_default_port()
False
>>> URL('/path/to').is_default_port()
False
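
The check can be sketched as a lookup in the table above. The is_default_port() function below is a hypothetical stdlib helper. Treating unknown schemes as having no default port is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# The scheme -> port table above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def is_default_port(url: str) -> bool:
    """Hypothetical helper mirroring the rules described for is_default_port()."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if default is None:
        return False  # relative URL or scheme not in the table
    # A missing explicit port means the default one is used.
    return parts.port is None or parts.port == default


for candidate in ("http://example.com", "http://example.com:80",
                  "http://example.com:8080", "/path/to"):
    print(candidate, is_default_port(candidate))
```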

Cache control

IDNA conversion used for host encoding is a relatively expensive operation, so yarl caches IDNA encoding/decoding calls, storing the last 256 encodes and the last 256 decodes in global LRU caches.

yarl.cache_clear()

Clear IDNA caches.

yarl.cache_info()

Return a dictionary with "idna_encode" and "idna_decode" keys, each value points to corresponding CacheInfo structure (see functools.lru_cache() for details):

>>> yarl.cache_info()
{'idna_encode': CacheInfo(hits=5, misses=5, maxsize=256, currsize=5),
 'idna_decode': CacheInfo(hits=24, misses=15, maxsize=256, currsize=15)}

yarl.cache_configure(*, idna_encode_size=256, idna_decode_size=256)

Set IDNA encode and decode cache sizes (256 for each by default).

Pass None to make the corresponding cache unbounded (this may speed up IDNA encoding/decoding a little, but the memory footprint can be very high; use with caution).

References

yarl stands on the shoulders of giants: several RFC documents and the low-level urllib.parse, which performs almost all the gory work.

The module borrows its design from pathlib wherever possible.

See also

RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol

Document describing non-ASCII domain name encoding.

RFC 3987 - Internationalized Resource Identifiers

This specifies conversion rules for non-ASCII characters in URLs.

RFC 3986 - Uniform Resource Identifiers

This is the current standard (STD 66). Any changes to the yarl module should conform to it. Certain deviations can be observed, mostly for backward compatibility and for de facto parsing requirements commonly observed in major browsers.

RFC 2732 - Format for Literal IPv6 Addresses in URL’s.

This specifies the parsing requirements of IPv6 URLs.

RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax

Document describing the generic syntactic requirements for both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).

RFC 2368 - The mailto URL scheme.

Parsing requirements for mailto URL schemes.

RFC 1808 - Relative Uniform Resource Locators

This Request For Comments includes the rules for joining an absolute and a relative URL, including a fair number of “Abnormal Examples” which govern the treatment of border cases.

RFC 1738 - Uniform Resource Locators (URL)

This specifies the formal syntax and semantics of absolute URLs.