The Guide to Practical and Pragmatic IT Architecture Design

Why IT Architecture matters

IT Architecture is important

Many people can design and build a tent, a number of us can build a house, but only a few can design a large building or a skyscraper. Systems are no different: many can write a program, but only a few can design a robust, flexible and agile system. It matters.

IT architecture failure

If IT architecture were that easy, we would live in a world of perfectly working software. Unfortunately, as we all know, that is still not the case, and in many cases a weak, badly designed system architecture is the culprit.

A Number of IT Failures and Their Impacts
  • In 2020, London's Heathrow Airport was hit by technical issues affecting departure boards and check-in systems. It had to cancel over 100 flights, leaving passengers with little information about their flights and limiting the use of electronic tickets.
  • Amazon Web Services faced network problems in 2013 due to issues with its datacenters in the US and Canada, and users were unable to connect to the site using its public IP address for nearly 20 minutes. The outage cost an estimated $2.5 million, cited as $163,622 per minute.
  • In 2018, 30 million users of O2, a telecommunications company in the UK, lost access to data services after a software problem left them unable to use 3G and 4G services. The nationwide outage also affected Transport for London's live electronic timetables at bus stops.
  • In 2013, Adobe suffered a cyber-attack and lost sensitive data, including credit card numbers, of at least 38 million users. It was one of the worst cases of hacking in recent times and made Adobe initiate a password reset for all its users.
  • Millions of TSB customers were locked out of their accounts after an IT upgrade led to an online banking outage, causing months of disruption. The outage was caused by the move to a new banking platform following TSB's split from Lloyds Banking Group. Many customers were unable to log in, while others were shown details from other people's accounts or inaccurate credits and debits on their own. It was later concluded that the software and architecture had not been extensively tested.
  • In 2018, doctors and staff of the National Health Service (NHS) in the UK experienced a widespread computer failure that left them unable to access patient files and patients' results, causing wide disruption.
  • The Coke website crashed during the Super Bowl in 2013 due to heavy ad-driven traffic. An interactive commercial that invited visitors to vote online was found to be the cause. Reportedly, the page had an average load time of 62 seconds and the website was down for most users.
  • Facebook's platforms suffered severe downtime in 2019 due to a technical issue: users worldwide could not load photos in the Facebook News Feed, view stories on Instagram, or send messages in WhatsApp. Facebook later said the issue was accidentally triggered during routine IT maintenance.
  • Gmail faced delays in delivering emails to its users in 2013, caused by a dual network failure. This is a very rare event in which two separate, redundant network paths stop working at the same time. The outage lasted about 11 hours and affected 29% of its users. Approximately 1.5% of messages were delayed by more than two hours.

Although not all of these failures can be attributed to a badly designed IT architecture, they show that IT architecture has become a critical component of system design in providing the required robustness and scalability.

Common Issues and Problems

The most common issues and problems that result from a mediocre technology architecture design are the following:
  • Technical problems:
    • Performance problems, due to poor end-to-end design, causing expensive rewrites afterwards,
    • System outages, due to a lack of redundant instances and proper error-handling mechanisms,
    • Scalability problems, where systems cannot handle anticipated growth or higher peak volumes.
  • Inconsistent quality in design:
    • Lack of consistency and coherence in design,
    • Unclear accountability and ownership, and
    • Proliferation of technology choices, making the solution more costly and complex.
  • Low productivity and/or high complexity, resulting in a costly build:
    • Lack of guidance and prescription, leaving the team to overbuild the solution components,
    • Limited re-use, losing efficiency where common services or platforms (e.g. integration, data) could be re-used instead of duplicating the same functionality multiple times,
    • Costly custom development where frameworks, libraries and technologies could be leveraged to accelerate development,
    • etc.
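
To make the "duplicated instances and proper error-handling" point from the list above concrete, here is a minimal Python sketch of failover across redundant instances. All names (call_with_failover, primary, secondary, AllInstancesFailed) are hypothetical and chosen for illustration only; real systems would typically rely on a load balancer or service mesh rather than hand-written loops.

```python
from typing import Callable, Sequence


class AllInstancesFailed(Exception):
    """Raised when no redundant instance could serve the request."""


def call_with_failover(instances: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant instance in order and return the first successful reply."""
    errors = []
    for instance in instances:
        try:
            return instance(request)
        except ConnectionError as exc:
            # Explicit error handling: record the failure and try the next instance.
            errors.append(exc)
    raise AllInstancesFailed(f"{len(errors)} instance(s) failed for request {request!r}")


# Two hypothetical instances of the same service (assumptions for this sketch).
def primary(request: str) -> str:
    raise ConnectionError("primary is down")


def secondary(request: str) -> str:
    return f"handled {request} on secondary"


if __name__ == "__main__":
    print(call_with_failover([primary, secondary], "ping"))
```

With only one instance, the failure of that instance becomes an outage; with a second instance and explicit error handling, the same failure is absorbed.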

IT Failure

IT architecture is about designing and deciding on the fundamentals of the platform. If the platform foundation has not been well thought out or is not right, the system may not be strong enough to carry the total weight of all the people accessing it at a moment of peak load.

On the other hand, the architecture also needs to guard against over-investment. If the foundation is made too strong, the project may become too expensive and impractical, like a skyscraper foundation laid for a residential house.
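
The trade-off between carrying the peak load and avoiding over-investment can be sketched as a simple sizing calculation. This is an illustrative Python sketch, not a sizing method from this guide: the function name instances_needed, the headroom factor and the minimum of two instances are all assumptions.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     headroom: float = 0.25, min_instances: int = 2) -> int:
    """Estimate how many instances carry the peak load plus a safety margin.

    peak_rps:          expected requests per second at peak (assumed input)
    rps_per_instance:  measured capacity of a single instance (assumed input)
    headroom:          extra fraction on top of the peak, e.g. 0.25 for 25%
    min_instances:     lower bound, so one failure does not cause an outage
    """
    if peak_rps < 0 or rps_per_instance <= 0 or headroom < 0:
        raise ValueError("inputs must be non-negative and capacity must be positive")
    required = math.ceil(peak_rps * (1 + headroom) / rps_per_instance)
    return max(required, min_instances)


if __name__ == "__main__":
    # 1000 requests/s at peak, 100 requests/s per instance, 25% headroom.
    print(instances_needed(1000, 100))
```

Too little headroom risks failure at peak; a very large headroom is the over-investment the text warns against, so the factor is best treated as an explicit, reviewed design decision.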

From the above, one can see that a good IT architecture definition is key to any software delivery. In the following chapters we show how to get to a well-defined architecture blueprint.
