Monitoring customer experience at Douglas' CDN

While working with the E-Commerce group at Parfümerie Douglas, Europe's premier beauty retail brand and a top-25 German online business, we built a real-time CDN log streaming service to quickly detect error pages that are delivered to customers.

Edgesense is a serverless log-to-metrics solution that allows DevOps teams to see what their customers are seeing in real time, processing CDN request logs at $0.50 per million lines. It is available as open-source software on GitHub.

Why Observability Matters

Studies show that DevOps teams with comprehensive monitoring and observability solutions are 30% more likely to be in the highest-performing group. However, over 50% of developers and operators rely on customers to tell them about errors. This case study shows how to close an important gap in observability: monitoring logs from the CDN (Content Delivery Network) and using them to pinpoint error responses seen by customers.

Client Situation

Douglas provides a highly customized shopping experience to consumers in 14 European countries through online stores such as www.douglas.de and www.douglas.nl. Agile development teams frequently change various parts of the shop software. Occasionally, new software versions have introduced issues that caused customers to see error pages ("page not found" or "sorry, an error occurred").

Those problems were not always discovered quickly. Though infrastructure as well as user experience are constantly monitored, Douglas realized that there was not enough visibility into customer-facing errors at the web server level. The goal was to get real-time insights into errors that shop servers were delivering to customers, detect anomalies and pinpoint areas where functionality may have broken.

Requirements:

  • Use logs from the Content Delivery Network (CDN) to see what customers are seeing, including API calls from mobile apps which are not covered by browser-based analytics
  • Simple and low-cost. General-purpose log analytics systems such as Splunk were considered too complex and too expensive given the amount of data to be processed
  • Scalable - need to deal with large peaks in usage during marketing promotions
  • Drill down on types of pages - allow development teams to see metrics for the pages they own
  • Drill down on types of errors - "not found" (HTTP 404) and various HTTP 5XX errors - as well as validation of correct redirection behavior (HTTP 301, 302)
  • Build custom dashboards and create alerts on the metrics

Solution

Turning logs into real-time metrics. To get actionable insights from the large amount of data, the decision was made to process log data as soon as it is received and to generate a stream of metrics that shows:

  1. which types of pages were being delivered, and
  2. what status code they were delivered with - successful or erroneous.
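
The two dimensions above can be illustrated with a minimal Python sketch. This is not the Edgesense implementation: the log line layout (timestamp, method, URL, status), the path prefixes and the page type names are all hypothetical, chosen only to show how raw log lines might be reduced to counters per page type and status class.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical mapping of URL path prefixes to page types.
# A real shop would define its own rules.
PAGE_TYPE_PREFIXES = [
    ("/api/", "api"),
    ("/p/", "product"),
    ("/c/", "category"),
    ("/cart", "cart"),
]

def page_type(url: str) -> str:
    """Classify a request URL into a coarse page type."""
    path = urlsplit(url).path
    if path in ("", "/"):
        return "home"
    for prefix, name in PAGE_TYPE_PREFIXES:
        if path.startswith(prefix):
            return name
    return "other"

def status_class(status: int) -> str:
    """Group HTTP status codes into the classes named in the requirements."""
    if status in (301, 302):
        return "redirect"
    if status == 404:
        return "not_found"
    if 500 <= status <= 599:
        return "server_error"
    if 200 <= status <= 299:
        return "ok"
    return "other"

def aggregate(lines):
    """Count requests per (page type, status class).

    Assumes a simplified, whitespace-separated line layout:
    timestamp, method, URL, status. Malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        url, status = fields[2], int(fields[3])
        counts[(page_type(url), status_class(status))] += 1
    return counts

# Made-up sample lines for illustration only.
sample = [
    "2021-03-01T10:00:00Z GET /p/12345 200",
    "2021-03-01T10:00:01Z GET /p/99999 404",
    "2021-03-01T10:00:02Z GET /api/v1/cart 500",
    "2021-03-01T10:00:03Z GET / 200",
    "this line is malformed",
]
metrics = aggregate(sample)
```

Keying the counters by page type is what would let each development team filter a dashboard down to the pages it owns.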

Those metrics are available to development teams on dashboards in near real-time. The metrics are also used to send alerts when error rates rise.
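
As a rough sketch of how an alert on rising error rates might be evaluated, the Python snippet below compares the share of error responses in a time window against a threshold. The threshold, the minimum request count and the status class names are assumptions made for this example; the source does not state the alerting rules Douglas uses.

```python
# Status classes counted as errors in this sketch (an assumption).
ERROR_CLASSES = {"not_found", "server_error"}

def error_rate(counts: dict) -> float:
    """Return the fraction of requests in a window that ended in an error."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for cls, n in counts.items() if cls in ERROR_CLASSES)
    return errors / total

def should_alert(counts: dict, threshold: float = 0.05, min_requests: int = 100) -> bool:
    """Alert only when the window has enough traffic and the error rate
    exceeds the threshold, so that a handful of requests cannot trigger it."""
    if sum(counts.values()) < min_requests:
        return False
    return error_rate(counts) > threshold

# Hypothetical one-minute windows, keyed by status class.
quiet_window = {"ok": 5, "server_error": 5}
bad_release_window = {"ok": 900, "server_error": 80, "not_found": 20}

alert_quiet = should_alert(quiet_window)
alert_bad = should_alert(bad_release_window)
```

The minimum request count is a design choice to keep low-traffic windows from producing noisy alerts.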

Built on a serverless architecture, the solution scales rapidly when the volume of incoming logs increases. Since all system components are billed based on usage, overall cost rises and falls with seasonal demand.

Ad-hoc analytics. In addition to providing real-time metrics, the solution also allows querying the raw log data using SQL, again billed based on usage. This capability has proved useful for learning about behavior of crawlers and web scrapers, drilling down into specific page requests, as well as generating reports.
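
The kind of question that ad-hoc SQL can answer is sketched below in Python, using an in-memory SQLite database purely as a stand-in. The source does not name the query engine, and the table name `cdn_logs`, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for whatever SQL engine queries the raw logs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cdn_logs (ts TEXT, url TEXT, status INTEGER, user_agent TEXT)"
)

# Made-up sample rows for illustration only.
rows = [
    ("2021-03-01T10:00:00Z", "/p/1", 200, "ExampleBot/1.0"),
    ("2021-03-01T10:00:01Z", "/p/2", 404, "ExampleBot/1.0"),
    ("2021-03-01T10:00:02Z", "/p/3", 404, "ExampleBot/1.0"),
    ("2021-03-01T10:00:03Z", "/p/1", 200, "Mozilla/5.0"),
    ("2021-03-01T10:00:04Z", "/cart", 500, "Mozilla/5.0"),
]
conn.executemany("INSERT INTO cdn_logs VALUES (?, ?, ?, ?)", rows)

# Requests and error responses per user agent, busiest first.
query = """
SELECT user_agent,
       COUNT(*) AS requests,
       SUM(CASE WHEN status >= 400 THEN 1 ELSE 0 END) AS errors
FROM cdn_logs
GROUP BY user_agent
ORDER BY requests DESC
"""
result = conn.execute(query).fetchall()
conn.close()
```

A query of this shape would surface a crawler that requests many pages that no longer exist.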

Results

Using Edgesense, Douglas teams have been able to quickly detect problems after software releases in a number of instances, leading to a significant decrease in mean time to resolve (MTTR) for this kind of issue.

CDN log data is also being used in reporting DevOps KPIs such as error rates.

The solution has been very cost-efficient, at $0.50 per million customer web requests processed.
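
To put the per-request price in perspective, the arithmetic can be written out as a short Python sketch. Only the rate of $0.50 per million requests comes from this case study; the daily request volume below is a hypothetical figure, not Douglas' actual traffic.

```python
# Rate stated in this case study.
COST_PER_MILLION_USD = 0.50

def processing_cost_usd(requests: int) -> float:
    """Estimate the processing cost for a given number of logged requests."""
    return requests / 1_000_000 * COST_PER_MILLION_USD

# Hypothetical volume: 200 million requests in one day.
daily_cost = processing_cost_usd(200_000_000)
```

Because the cost is linear in the number of requests, a promotion that doubles traffic would double the processing cost for that period and no more.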

As a serverless, cloud-based solution, the system has required very little maintenance.

Douglas decided to make a simplified version of this system available as open-source software on GitHub: Edgesense