Collect Google Analytics data in real-time with Microsoft Azure – Part 2

September 8, 2021

Previously, in Part 1, we introduced the basic architecture you need to implement a relatively low-cost, near-real-time pipeline to capture your web traffic using Google Tag Manager and Azure.

In this article, we will explore how to start implementing this on a technical level, which includes configuring Google Tag Manager and provisioning and configuring the necessary Azure resources.

With this in place, you will be able to:
1) immediately start storing your web traffic data to build a history for further analysis, and
2) extend this solution for real-time reports and dashboards.

In summary, there are four main steps:

  1. Provisioning the Azure Event Hub and configuring it as the endpoint
  2. Creating the Shared Access Policy
  3. Creating the endpoint URL with a Shared Access Signature to allow GTM to send data to the Event Hub
  4. Configuring the GTM custom JavaScript variable

In this scenario, we assume that you already have an Azure subscription, Google Analytics tracking on your website, and a Google Tag Manager account with a container in which you have already configured various tags and variables.

1 – Provision and configure the Event Hub

Log in to your Azure subscription, navigate to the Event Hub blade, and click Create.

Enter the necessary details. In this case, we are choosing the Standard pricing tier, which provides the Event Hub Capture feature, and setting the throughput units. Then click Review + create (or click Next to enter any required custom tags):
Picture1.png

After the new resource has passed validation, click Create to provision the resource (this can take a few minutes in some cases).

Navigate to your new resource and, under the heading Entities, choose Event Hubs and create a new Event Hub. Enter the name for your event hub and configure the necessary properties. Turn on Capture and then configure your Capture Provider, which will be your Azure Storage Account and container. For more information on the Event Hub Capture feature, follow this link.

Picture2.png

Note: Make sure to configure your partitions, retention, time window, and partition strategy according to your organization's policies. In this example, we’re choosing the default settings. In addition, the Event Hub Capture files can only be output in AVRO format.

2 – Creating the Shared Access Policy

For this example, we will only configure the access policy for sending data to the Event Hub. For downstream applications or processes, you will need to add a policy that allows those applications to Listen to the Event Hub.

Navigate to your newly created Event Hub entity and click Shared access policies, choose +Add, check Send, and add an appropriate policy name:

Picture3.png

Once created, click on the policy, copy the Primary key value, and store it in a safe location such as Azure Key Vault.

3 – Creating the endpoint URL with a Shared Access Signature to allow GTM to send data to the Event Hub

In this step, we will generate an endpoint URL with a Shared Access Signature. The most important part of this is using the SAS key to compute an HMAC (hash-based message authentication code), which becomes the signature.

The endpoint URL will be in the following format:

https://<event hub namespace>.servicebus.windows.net/<event hub entity name>/<policy name>

For example:

https://evhns-bizonedemo.servicebus.windows.net/evh-bizone-web-traffic/GTM

The Shared Access Signature consists of several parts:

  • Signed Resource (sr) – the URL-encoded endpoint URL
  • Signature (sig) – the HMAC computed with the SAS key
  • Signature Expiry (se) – the expiry in Unix epoch time (the number of seconds that have elapsed since the Unix epoch)
  • Signature Key Name (skn) – the name of the Shared Access Policy created in step 2

To do this, you will need some basic programming skills. For more information on how to generate the Shared Access Signature, follow this link, choose your preferred programming language, and input the necessary values. Be sure to choose a reasonable expiry, as this will be hardcoded into your GTM custom JavaScript variable; once the signature expires, requests will be rejected and you risk data loss.
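As a rough illustration, the Node.js sketch below follows the commonly documented pattern for Event Hubs SAS tokens: the string to sign is the URL-encoded resource URI, a newline, and the expiry, signed with HMAC-SHA256 using the policy key. The function name createSasToken and the one-year lifetime are assumptions made for this example, and the key shown is a placeholder.

```javascript
// Node.js sketch: build a Shared Access Signature token for an Event Hub.
// createSasToken is a hypothetical helper name, not part of any Azure SDK.
const crypto = require('crypto');

function createSasToken(resourceUri, policyName, policyKey, expiry) {
  // sr: the URL-encoded endpoint URL
  const encodedUri = encodeURIComponent(resourceUri);

  // The string to sign is the encoded URI, a newline, then the expiry
  const stringToSign = encodedUri + '\n' + expiry;

  // sig: HMAC-SHA256 of the string to sign, keyed with the policy key
  const signature = crypto
    .createHmac('sha256', policyKey)
    .update(stringToSign, 'utf8')
    .digest('base64');

  return 'SharedAccessSignature sr=' + encodedUri +
    '&sig=' + encodeURIComponent(signature) +
    '&se=' + expiry +
    '&skn=' + policyName;
}

// Example usage with placeholder values (assumed one-year lifetime)
const oneYearInSeconds = 60 * 60 * 24 * 365;
const expiry = Math.floor(Date.now() / 1000) + oneYearInSeconds;
const token = createSasToken(
  'https://evhns-bizonedemo.servicebus.windows.net/evh-bizone-web-traffic/GTM',
  'GTM',
  '<primary key from step 2>',
  expiry
);
console.log(token);
```

Note that encodeURIComponent emits uppercase percent-escapes (%3A%2F), so the output may differ in letter case from tokens produced in other languages.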

In the end, the Shared Access Signature should look something like this:

SharedAccessSignature sr=https%3a%2f%2fevhns-bizonedemo.servicebus.windows.net%2fevh-bizone-web-traffic%2fGTM&sig=<redacted>&se=1662204112&skn=GTM
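Because the expiry is hardcoded into the token, it can be useful to check when an existing token lapses. The sketch below reads the se field and converts it to a date; sasExpiry and daysUntilExpiry are hypothetical helper names introduced for this example.

```javascript
// Sketch: read the expiry (se) out of a Shared Access Signature string.
function sasExpiry(token) {
  // Drop the scheme prefix so only the key=value pairs remain
  var query = token.replace(/^SharedAccessSignature\s+/, '');
  var seconds = parseInt(new URLSearchParams(query).get('se'), 10);
  if (isNaN(seconds)) {
    throw new Error('Token has no valid se field');
  }
  // se is in seconds since the Unix epoch; Date expects milliseconds
  return new Date(seconds * 1000);
}

// Whole days left before the token expires (negative once it has expired)
function daysUntilExpiry(token, now) {
  var msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((sasExpiry(token).getTime() - now.getTime()) / msPerDay);
}

var exampleToken = 'SharedAccessSignature sr=https%3a%2f%2fevhns-bizonedemo.servicebus.windows.net%2fevh-bizone-web-traffic%2fGTM&sig=<redacted>&se=1662204112&skn=GTM';
console.log(sasExpiry(exampleToken).toISOString());
console.log(daysUntilExpiry(exampleToken, new Date()));
```

A check like this could run as part of a scheduled job so the token is regenerated and republished in GTM before it expires.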

4 – Configuring the GTM custom JavaScript Variable

The final step in the process is to create your custom JavaScript variable in Google Tag Manager to enable sending data to the Event Hub.

To do this, log in to your GTM account and select your container. Click Variables > New, input a name for the variable, and then edit the Variable Configuration. Choose the variable type Custom JavaScript.

Copy and paste the example code below, being sure to include the endpoint URL and the Shared Access Signature detailed in the previous steps:

function() {

  // Endpoint info: the endpoint URL and Shared Access Signature from step 3
  var endpoint = '<end point URL>';
  var accessKey = 'SharedAccessSignature <your SAS information>';

  // GTM passes the returned function to analytics.js as the customTask
  return function(model) {

    // Cache the original sendHitTask under a name unique to this tracking ID
    var globalSendTaskName = '_' + model.get('trackingId') + '_sendHitTask';

    var originalSendHitTask = window[globalSendTaskName] = window[globalSendTaskName] || model.get('sendHitTask');

    // Replace sendHitTask so each hit goes to Google Analytics and to the Event Hub
    model.set('sendHitTask', function(sendModel) {
      var payload = sendModel.get('hitPayload');
      var body = {};
      body['payload'] = payload;

      // Send the hit to Google Analytics as usual
      originalSendHitTask(sendModel);

      // Send a copy of the hit to the Event Hub
      var request = new XMLHttpRequest();
      var path = endpoint;
      request.open('POST', path, true);
      request.setRequestHeader('Content-type', 'text/plain; charset=UTF-8');
      request.setRequestHeader('Authorization', accessKey);
      request.send(JSON.stringify(body));
    });
  };
}

Note: This is only a very simple implementation of the Custom JavaScript, specific to sending to Azure and with no error handling. For more information on this and for more complex implementations, we recommend visiting Simo Ahava’s blog, where he provides a wealth of knowledge and code samples to help you fit this solution to your specific requirements. There may also be different implementations depending on your endpoint, such as if you are using Amazon Web Services.
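One possible way to add basic error handling is to move the request into a small helper that never throws, so a failure on the Event Hub side cannot interfere with the rest of the page. This is only a sketch: sendToEventHub is a hypothetical name, and the optional XhrCtor parameter exists purely so the helper can be exercised outside a browser.

```javascript
// Sketch: post one hit payload to the Event Hub without ever throwing.
// Returns true if the request was dispatched, false if it could not be.
function sendToEventHub(endpoint, accessKey, payload, XhrCtor) {
  try {
    // Browsers use the built-in XMLHttpRequest; tests can pass a stand-in
    var Xhr = XhrCtor || XMLHttpRequest;
    var request = new Xhr();
    request.open('POST', endpoint, true);
    request.setRequestHeader('Content-type', 'text/plain; charset=UTF-8');
    request.setRequestHeader('Authorization', accessKey);
    // Ignore network failures; the hit to Google Analytics is unaffected
    request.onerror = function() {};
    request.send(JSON.stringify({ payload: payload }));
    return true;
  } catch (e) {
    // Anything that goes wrong while building the request is swallowed
    return false;
  }
}
```

Inside the sendHitTask override, the XMLHttpRequest lines could then be replaced with a single call such as sendToEventHub(endpoint, accessKey, sendModel.get('hitPayload')), placed after originalSendHitTask(sendModel).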

Next, if you don’t already have one, create another new variable, this time with the variable type Google Analytics Settings. Under Fields to Set, add any built-in variables along with any custom dimensions you want to send in the payload, and then add the customTask field, with the Custom JavaScript variable you created above as its value. When you’re finished, it should look something like this:

Picture5.png

The last piece of the puzzle is to configure a Tag with the type Google Analytics: Universal Analytics and add the Google Analytics Settings variable you created in the previous step.

Finally, save the changes, then Submit and Publish changes to your container. Then make sure to check your Azure Storage container (using either the portal or Azure Storage Explorer) to ensure you have data flowing through.

With this in place, you should now have data stored on your Azure Storage account which you can further analyze or process downstream using other tools such as Azure Databricks or Azure Synapse Analytics Spark Pools.
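To give a feel for what downstream processing might involve: each event body sent by the custom task is a JSON object whose payload field holds the hit as a URL-encoded query string. The sketch below turns one such body into a plain object. It assumes the body has already been extracted from the Avro capture file as a string; parseHit and the sample values are illustrative only.

```javascript
// Sketch: decode one captured event body into a key/value object.
function parseHit(eventBody) {
  // The custom task wraps the hit as {"payload": "<query string>"}
  var record = JSON.parse(eventBody);
  var params = new URLSearchParams(record.payload);

  // Copy each decoded parameter into a plain object
  var hit = {};
  params.forEach(function(value, key) {
    hit[key] = value;
  });
  return hit;
}

// Made-up sample hit: v = protocol version, t = hit type,
// dl = document location, dt = document title
var sample = JSON.stringify({
  payload: 'v=1&t=pageview&dl=https%3A%2F%2Fexample.com%2Fpricing&dt=Pricing'
});
console.log(parseHit(sample));
```

The same decoding could be expressed in Spark or another engine; the point is simply that the payload needs one JSON parse followed by one query-string parse.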

If you are not seeing any data in the storage account after 5-10 minutes, revisit all the previous steps to ensure you have configured all resources correctly. You can also view the metrics for the Event Hub in the Overview tab to confirm that messages are being sent.

In subsequent articles, we will walk you through how to further extend this solution by using Stream Analytics and lastly using Power BI to create near real-time reports and dashboards.