Scalable AI Infrastructure
12 Jun
By Dave Luttrell (AI Capability Lead | Principal Consultant)
Current Landscape of AI Infrastructure
Surging demand for AI-enabling infrastructure has driven a semiconductor boom over the last five years, with the major tech vendors and hyperscalers piling large-scale investment into specialised GPUs and TPUs.
This is also evidenced by the general availability of LLMs in local regions from many of the major cloud players. Other regions are being serviced better than before; however, this raises a trade-off: data sovereignty and security versus access to the latest technology.
At a macro level this can be a whirlwind to consume, and a hard question to answer. Bring it in house, though, and these questions strike closer to home:
- What do I need to do to scale AI initiatives in house?
- How do I flex to meet the changing needs of models?
- Am I getting value for money for my current spend?
What Do We Mean by AI Infrastructure & Scalability in This Context?
This depends on our definition of AI. By and large, we define AI as the use of machine learning and related techniques to mimic human-like behaviour: automating tasks, solving problems, and conducting logic or reasoning activities.
Within the AI landscape, infrastructure means different things to different enterprises. Are we the producers, enablers, builders, or consumers of AI-based services?
Producer
Builds AI products for consumption, such as language models, helper functions, applied AI, deep learning models, or bespoke algorithms.
Think: OpenAI, Anthropic
Enabler
Provides the enabling infrastructure, compute, and resources to producers of AI-based services, setting the foundation that allows producers to create and deploy.
Think: NVIDIA, AMD
Builder
Builds products or propositions which are enhanced with AI-enabled functionality, to automate existing processes or complement existing propositions.
Think: Snowflake Document AI
Consumer
The ultimate end user of AI-enabled products, using them to automate business processes and functions in service of core value propositions, and leveraging them to drive value and efficiency.
Think: Banks, Insurance, Retail
Some organisations sit across multiple categories, such as hyperscalers, who provide a variety of the above. We would imagine the large majority of organisations reading this article reside downstream, as builders or consumers.
Scalable AI Infrastructure
Builder
The ability to provide robust and resilient infrastructure for integrating AI into existing value propositions, in service of the consumers of those propositions.
Key priorities: Failover, low latency, positive customer experience, transparency, resilience, enhanced propositions, positive ROI
Consumer
Supporting technical infrastructure, integrated into their existing technology ecosystem, supporting the ability to enhance existing operations, by utilising AI enabled services and workflows to automate tasks.
Key priorities: Cost, efficiency, reliability, quality, integrability, explainability, positive ROI, metrics, low risk tolerance
At this moment in the market, the further right we go in the initial visual, the lower the level of maturity and experience you will generally find in adopting and integrating AI into existing solution infrastructure. This is often because AI resides outside of core competencies, or because organisations lack the runway to enable the non-deterministic compute that is inherent in deploying AI solutions.
Scalability, in this sense, is the technical ability to adapt to meet key priorities and to integrate technical solutions into the ecosystem.
What Makes This Challenging?
To follow on from the point above, AI-based architecture may sit outside an organisation's existing core competency base. Implementing AI into an existing ecosystem can span several paradigm shifts, and these hold true whether you're implementing a bespoke solution or procuring an integrated one.
Service Based Architectures
The first thing generally observable when implementing an AI solution is that it veers away from the traditional monolithic or hub-and-spoke architectures which are de facto within Australian enterprise compute. Migrating to loosely coupled architectures is a fundamental paradigm shift and can cause headaches from DevSecOps, consumption, and observability perspectives.
Language Models can indeed be Large … But AI might not be in the Future
Large language models are currently in vogue, but in reality they are only one piece of the puzzle when it comes to generative deep neural networks. Many other types exist within the patterns we're adopting today.
Yann LeCun, Meta's Chief AI Scientist, also thinks this may change in the future. Indeed, a key limitation of large language models is that they struggle to concurrently perform various actions in synchronicity, and they are grossly inefficient. To address this, LeCun has spruiked alternative architectures as a way forward, applying techniques such as Joint-Embedding Predictive Architectures (JEPA) and Energy-Based Models.
The key point here is that the optimal approach to automating a task may change over time, and architectures should account for the fluidity of these developments.
Budgeting for AI is a Headache
Deploying a language model into production really is unleashing the great unknown: usage is hard to estimate before you deploy, which makes budgeting an interesting estimation and measurement problem tied to how the AI is configured and how it's used.
Token consumption, the key input and output metric of large language models, is a critical observability metric, alongside the context window of the model making predictions.
Endpoint usage and consumption, together with the compute cost of containers and runtime, are other items on which a tight handle should be kept.
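As a starting point, the metrics above can be folded into a simple cost model. The sketch below is a rough estimate only; the function name and the per-token prices are illustrative assumptions, not any vendor's actual rate card:

```python
# Rough monthly cost model for an LLM endpoint. All prices are
# illustrative assumptions; check your provider's current rate card.

def estimate_monthly_cost(
    requests_per_day: float,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_1k: float = 0.003,   # assumed $ per 1k input tokens
    output_price_per_1k: float = 0.015,  # assumed $ per 1k output tokens
    days_per_month: int = 30,
) -> float:
    """Return the estimated monthly spend in dollars."""
    per_request = (
        avg_input_tokens / 1000 * input_price_per_1k
        + avg_output_tokens / 1000 * output_price_per_1k
    )
    return per_request * requests_per_day * days_per_month

# Example: 10k requests/day, 1.5k tokens in, 500 tokens out.
cost = estimate_monthly_cost(10_000, 1_500, 500)
print(f"Estimated monthly spend: ${cost:,.2f}")
```

Even a crude model like this makes the sensitivity visible: halving average output tokens, or trimming the context window on the input side, moves the monthly figure materially.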
Existing Workflow Tooling wasn’t designed with Enterprise AI in mind
Drawing on anecdotal experience with clients, a key question which crops up is: “How do I deploy AI into my workflow?”
It is a misconception that picking up the latest bit of kit off the shelf will paper over any cracks. When deploying AI, you inherit key observability and transparency mandates aligned to the predictive accuracy of the models being deployed.
Cohesiveness, or how it all hangs together, is another consideration in this regard. AI as a service needs to work as a coordinated symphony across the key domains and tasks being automated across an organisation. Newer architectural approaches such as service meshes and deterministic graphs can bring a semblance of order to the chaos.
Navigating Complexity – Engineering for Scale
A common trap for organisations adopting AI is trying to control something which is, in a sense, wildly experimental. This is the science aspect of AI, and it doesn't necessarily align with traditional determinism. Even so, we have tangible evidence that AI initiatives aligned to organisational value have a high degree of efficacy when deployed.
Instead of worrying about the experiment, focus on deploying infrastructure which rewards experimentation and institutes approaches with a high degree of efficacy. So how should you configure your infrastructure to enable AI initiatives scalably?
- Reusable Preapproved Componentry
Teams should assemble value, not reinvent the plumbing. When adopting AI, we need to ensure a clear and consistent basis for enabling AI is deployed.
This can be challenging given the variety of componentry needed to enable AI solutions across the organisation. Take a principles-based approach to key components first, then cascade across components and frameworks as familiarity increases and lessons are learned.
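To make this concrete, a preapproved component catalogue can start as simply as a registry that refuses to resolve anything governance hasn't signed off. A minimal sketch, with hypothetical component names and fields:

```python
# Minimal sketch of a preapproved component registry: teams look up
# vetted building blocks instead of wiring their own. The names,
# fields, and governance flow here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    approved: bool  # set by a central governance process
    owner: str


REGISTRY: dict[str, Component] = {}


def register(component: Component) -> None:
    REGISTRY[component.name] = component


def resolve(name: str) -> Component:
    """Return a component only if it has passed governance approval."""
    component = REGISTRY[name]
    if not component.approved:
        raise PermissionError(f"{name} is not approved for production use")
    return component


register(Component("text-embedder", "1.2.0", approved=True, owner="platform"))
register(Component("experimental-agent", "0.1.0", approved=False, owner="lab"))

print(resolve("text-embedder").version)  # approved, so this resolves
```

The design choice worth noting is that approval lives in the registry, not in each consuming team's codebase, so tightening governance is a single change rather than a hunt across repositories.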

Figure 1 – Complex Infrastructure of AI Agents
- Embrace AI as a Service
As alluded to earlier, thinking about how we coordinate AI-based service architectures helps us coordinate and distribute effective production workflows. Patterns such as deterministic graphs and service meshes help us integrate AI into our ecosystem in a controlled and sustainable way, giving on-demand access to heavyweight capability where needed, without queuing for a data scientist.
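As a sketch of the deterministic-graph pattern, Python's standard-library `graphlib` can fix the execution order of AI steps up front. The step names below are hypothetical stand-ins for real services:

```python
# A minimal deterministic-graph sketch: each AI step is a node, edges
# fix the execution order, and results flow along the graph. Step
# names and payloads are hypothetical stand-ins for real services.
from graphlib import TopologicalSorter


def extract(ctx):   ctx["text"] = "raw document"
def classify(ctx):  ctx["label"] = f"classified({ctx['text']})"
def summarise(ctx): ctx["summary"] = f"summary({ctx['text']})"
def report(ctx):    ctx["report"] = f"{ctx['label']} + {ctx['summary']}"


STEPS = {"extract": extract, "classify": classify,
         "summarise": summarise, "report": report}

# node -> the set of nodes it depends on
GRAPH = {"classify": {"extract"}, "summarise": {"extract"},
         "report": {"classify", "summarise"}}


def run(graph, steps):
    ctx = {}
    for node in TopologicalSorter(graph).static_order():
        steps[node](ctx)  # dependencies always execute first
    return ctx


result = run(GRAPH, STEPS)
print(result["report"])
```

Because the graph is declared rather than implied by call order, the same structure can later be handed to a workflow engine, and new AI steps slot in by adding a node and its edges.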
- Centralise Validation
Measurement, accuracy, and monitoring are not a checkbox exercise when it comes to AI; they're a living thing. This means we need a centralised point of view when enabling AI across the organisation. Data quality, bias, drift, and security metrics all need to be collated in an observable way, and acted upon if the models pose a risk from a reliability or security perspective.
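One way to make drift observable centrally is the Population Stability Index (PSI), which compares a production feature distribution against its training baseline. A minimal sketch; the 0.1/0.2 alert thresholds are common rules of thumb, not formal standards:

```python
# Sketch of a simple drift check using the Population Stability Index
# (PSI): compare a production feature distribution against the training
# baseline and flag the model when drift exceeds a threshold.
import math


def psi(baseline: list[float], current: list[float]) -> float:
    """PSI over pre-binned distributions (each a list of proportions)."""
    total = 0.0
    for b, c in zip(baseline, current):
        b, c = max(b, 1e-6), max(c, 1e-6)  # guard against log(0)
        total += (c - b) * math.log(c / b)
    return total


baseline_bins = [0.25, 0.25, 0.25, 0.25]  # distribution at training time
current_bins = [0.10, 0.20, 0.30, 0.40]   # distribution in production

score = psi(baseline_bins, current_bins)
# Rule-of-thumb bands: < 0.1 stable, 0.1-0.2 watch, > 0.2 investigate.
status = "alert" if score > 0.2 else ("watch" if score > 0.1 else "ok")
print(f"PSI={score:.3f} -> {status}")
```

Run centrally across every deployed model's key features, a check like this turns "is the model drifting?" from a per-team judgement call into a collated, comparable metric.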
- Clear Pathways for Experimentation
There needs to be a clear pathway to production for experiments. Not that we should be productionising experiments as such; rather, we should be applying production rigour to AI initiatives. This needs to be thought about early, with the previous points in mind: infrastructure as code, build and deployment pipelines, and model enhancement and validation all need to be ingrained, ideally from the start, and at the very minimum once there is a semblance of efficacy.
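The production gate itself can start small: a candidate experiment is promoted only once it clears centrally defined validation thresholds. A minimal sketch, with illustrative metric names and thresholds:

```python
# Sketch of a promotion gate for experiments: a candidate model is
# promoted only when it clears centrally defined validation thresholds.
# The metric names and threshold values are illustrative assumptions.

THRESHOLDS = {"accuracy": 0.85, "data_quality": 0.95}


def promote(candidate_metrics: dict[str, float]) -> bool:
    """Promote only if every gated metric meets its minimum."""
    failures = [
        name for name, minimum in THRESHOLDS.items()
        if candidate_metrics.get(name, 0.0) < minimum
    ]
    if failures:
        print(f"Blocked: {', '.join(failures)} below threshold")
        return False
    print("Promoted to production")
    return True


promote({"accuracy": 0.91, "data_quality": 0.97})  # clears the gate
promote({"accuracy": 0.78, "data_quality": 0.97})  # blocked on accuracy
```

Wired into a deployment pipeline, a gate like this is what "production rigour without productionising the experiment" looks like in practice: the experiment stays free-form, but promotion is not.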