Parse URLs
ClickHouse was originally created by a web company, so you shouldn't be surprised to find that it contains many functions to help you work with URLs.
Get the protocol
You can easily extract the protocol using the built-in function protocol()
.
For example:
select
protocol(url) as protocol
from
analytics_events
┌─protocol─┐
│ https │
└──────────┘
Get the domain
You can also extract the domain using the domain()
function.
select
domain(url) as protocol
from
analytics_events
┌─referrer_domain──┐
│ www.tinybird.co │
└──────────────────┘
If you want to excude the www
from the domain, use the `domainWithoutWWW() function.
select
domainWithoutWWW(url) as protocol
from
analytics_events
┌─referrer_domain──┐
│ tinybird.co │
└──────────────────┘
There's also a handful of functions for working with subdomains, such as:
firstSignificantSubdomain
cutToFirstSignificantSubdomain
cutToFirstSignificantSubdomainWithWWW
cutToFirstSignificantSubdomainCustom
cutToFirstSignificantSubdomainCustomWithWWW
firstSignificantSubdomainCustom
Get URL query parameters
You might have experienced just how painful it can be to work with URL Query Parameters in other databases, but ClickHouse makes it easy with the extractURLParameter()
function.
Imagine you have a URL like the following:
https://tinybird.co/?utm_source=youtube&utm_medium=referral&utm_campaign=livessessions&utm_content=description
You want to extract the UTM parameters to analyse your marketing campaigns.
Here's how you can achieve that with ClickHouse:
select
extractURLParameter(url, 'utm_source') as utm_source,
extractURLParameter(url, 'utm_medium') as utm_medium,
extractURLParameter(url, 'utm_campaign') as utm_campaign,
extractURLParameter(url, 'utm_content') as utm_content
from
analytics_events
┌─utm_source─┬─utm_medium─┬─utm_campaign──┬─utm_content──┐
│ youtube │ referral │ livessessions │ description │
└────────────────────────────────────────────────────────┘