The Sales department asked for the cause of falling sales via online channels to be investigated. After a thorough analysis, the long loading times of the web platform were identified as the cause. The poor user experience that resulted from them led to many abandoned purchases.
The highly frequented website, part of the Alexa Top 1,000, needed to be delivered faster.
Because of the historically grown design and the high technical complexity of the web platform, conventional approaches such as putting a CDN service in front of the existing hosting were not sufficient.
In the first step, existing components were adapted so that they could run independently of the legacy environment. Subsequently, the now “cloud-ready” components were deployed in five Microsoft Azure regions (West US, East US, West Europe, Japan East, and Australia East). In the same step, tasks such as SSL/TLS termination, caching, and filtering were offloaded to reverse proxies. The proxies are deployed from NixOS images generated by the CI/CD platform and can be scaled up and down automatically using Azure VMSS (Virtual Machine Scale Sets). After all critical components had been checked, the GeoDNS functionality of Azure Traffic Manager was used to route website traffic to the closest deployment.
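The idea behind routing each visitor to the closest deployment can be illustrated with a short Python sketch. It is not how Azure Traffic Manager is implemented; it only picks the geographically nearest of the five regions named above and skips regions marked as unhealthy. The coordinates are rough assumptions, and the names `REGIONS`, `great_circle_km`, and `closest_region` are hypothetical.

```python
import math

# Approximate (latitude, longitude) for the five regions named above.
# These values are illustrative assumptions, not official Azure data.
REGIONS = {
    "West US": (37.8, -122.4),
    "East US": (37.4, -79.8),
    "West Europe": (52.4, 4.9),
    "Japan East": (35.7, 139.8),
    "Australia East": (-33.9, 151.2),
}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_region(visitor, healthy=None):
    """Return the nearest region, considering only healthy ones if given."""
    candidates = REGIONS if healthy is None else {
        name: pos for name, pos in REGIONS.items() if name in healthy
    }
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates,
               key=lambda name: great_circle_km(visitor, candidates[name]))
```

Calling `closest_region((52.5, 13.4))` for a visitor near Berlin selects West Europe; passing a `healthy` set that omits West Europe makes the same call fall back to the next-nearest region, which mirrors the failover behaviour of a DNS-based traffic manager.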
– Microsoft Azure
– Azure Resource Manager (templates that describe the Azure deployment)
– Azure Virtual Machine Scale Sets (automatic scaling of capacities depending on the number of visitors)
– Azure App Service (deployment of legacy .NET applications)
– Azure Traffic Manager (GeoDNS)
– NGINX (SSL/TLS termination, caching, filtering)
– NixOS (Linux distribution for the operation of all non-.NET components)
For website visitors outside Europe, loading times were massively reduced.
The availability of the website has improved as visitors can be seamlessly redirected to other regions in the event of a malfunction.
The costs for operating the website have been reduced (SSL/TLS termination and caching through the proxies have significantly reduced the number of required instances).
Following this measurable success, three more regions (Brazil South, East Asia, and Southeast Asia) went live within a few hours.
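The link between proxy caching and the number of required instances noted in the results can be made concrete with a little arithmetic: only cache misses reach the application instances. The Python sketch below is a rough estimate under that assumption. The function name, its parameters, and any figures passed to it are hypothetical and are not measurements from this project.

```python
import math


def required_instances(peak_rps, cache_hit_ratio, rps_per_instance):
    """Estimate the backend instances needed behind a caching proxy."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Only cache misses are forwarded to the application instances.
    backend_rps = peak_rps * (1.0 - cache_hit_ratio)
    # Keep at least one instance so the origin stays reachable.
    return max(1, math.ceil(backend_rps / rps_per_instance))
```

With purely illustrative inputs of 1,000 requests per second at peak and 50 requests per second per instance, `required_instances(1000, 0.0, 50)` gives 20 instances without caching, while `required_instances(1000, 0.75, 50)` gives 5.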
Owing to an increasingly complex live platform in the client's own data centers, both development processes and test environments drifted ever further away from production reality. It was very difficult to estimate the exact effect of software and infrastructure changes, let alone to test them under realistic simulated conditions so that uninterrupted operation could be guaranteed under all circumstances.
The goal was to provide the development departments with an internal, highly flexible cloud platform that would support a reliable CI/CD pipeline and the on-demand testing of future infrastructure changes. Developers and DevOps engineers should also be able to provision the resources they need through a self-service portal.
In addition, the underlying storage system had to be S3-compatible, block-storage-ready, and freely scalable in order to provide resources for future projects and, if necessary, to serve as a company-wide backup system.
We chose a four-node ESXi cluster as the hypervisor, which obtains its virtual machine block storage from a Ceph cluster connected via iSCSI multipath. In order to remain operational in the event of any problems with the storage system, the local disks of the hypervisor nodes were merged with EMC ScaleIO to form another horizontally scalable storage tier that hosts the system-critical cluster and storage management machines.
In order to provide space-efficient, fast, S3-compatible storage, erasure-coded pools were configured in the Ceph storage.
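Why erasure-coded pools save space compared with replicated pools can be shown with a short calculation. A replicated pool stores every object as several full copies, whereas an erasure-coded pool splits each object into k data chunks plus m coding chunks. The Python sketch below compares the raw capacity consumed per usable byte. The values of k and m used in the example are assumptions for illustration, since the profile actually configured is not stated here.

```python
def replicated_overhead(copies):
    """Raw bytes stored per usable byte in a replicated pool."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coded_overhead(k, m):
    """Raw bytes stored per usable byte with k data and m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must not be negative")
    return (k + m) / k


def raw_capacity_needed(usable_tb, overhead):
    """Raw capacity in TB required to hold usable_tb of data."""
    return usable_tb * overhead
```

As an example, three-way replication has an overhead of 3.0, while a hypothetical k=4, m=2 profile has an overhead of 1.5 and still tolerates the loss of two chunks. Storing 100 TB would then need 300 TB of raw capacity when replicated but 150 TB when erasure-coded.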
For development, a Jenkins-based portal was provided that lets users provision individually preconfigured machines, CoreOS hosts, and Kubernetes clusters on demand.
ESXi, Ceph, ScaleIO, SLES, CoreOS, Kubernetes, Docker, Jenkins, Ansible
The internal cloud platform has established itself as a reliable and workable development platform that simplifies day-to-day processes, supports the evaluation of new technologies, and serves for prototyping. It can also be used to test more complex migration scenarios.