System requirements: Rethinking data and architecture in the world of generative AI

One of our Data Scientists, Aleksandar Ivanov, explores what businesses leading the way in this AI-powered economy are doing differently from the rest.

2025-03-24 · AI Infrastructure
Aleksandar Ivanov
Data Scientist

Right now, in the world of AI, a familiar pattern is playing out. Companies, eager to stay ahead, are pouring resources into the latest and most advanced models. The belief is simple – better algorithms guarantee better results. But time and time again, I’ve watched organisations get swept up focusing solely on the models, only to find themselves with little to show for it.

Meanwhile, the true frontrunners in AI are taking a different path. They recognise a fundamental shift – one that moves the focus from models to something even more powerful: data. But what does this shift really mean? And how does it help you deliver real value with GenAI? Let’s break it down, starting with the foundations.

The core components of GenAI architecture

GenAI systems have four key layers: data processing, model training, feedback, and deployment. Each layer has an important job: cleaning and organising data, teaching the AI how to learn, improving it based on user input, and applying it in the real world. Combined, these layers shape how well the AI performs.

At the core of these systems are large language models (LLMs). They are the brains behind the operation that turn the raw data into something useful.

GenAI models can be powerful straight out of the box. And with the right setup, they become even smarter. Once you have the safety measures in place, you’re left with a powerful, trustworthy tool ready for all kinds of practical use cases.

But as organisations gain confidence and look to do more with GenAI, they switch gears. Many realise that the competitive advantage comes from going beyond off-the-shelf models. Fine-tuning or even training open-source frontier models helps tailor AI to your specific goals. And this is often what drives successful change. 

Fine-tuning LLMs: Customising AI for better performance

Fine-tuning is the process of taking a pre-trained LLM and further training it on a specific dataset to adapt its capabilities for particular tasks or domains. It’s like teaching an experienced professional new specialised skills for a specific role. 

This helps you customise the model's knowledge and behaviour to better match your use cases. You’re left with a more personalised model that requires much less data and computing resources than if you were to train a model from scratch.
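To make the idea concrete, here is a toy sketch of the pre-train-then-fine-tune pattern: a one-weight linear model is first "pre-trained" on a large generic dataset, then adapted to a much smaller domain dataset with a lower learning rate and far fewer steps. The data and hyperparameters are invented for illustration; fine-tuning a real LLM would use a framework such as Hugging Face Transformers, but the shape of the process is the same.

```python
def train(weight, data, lr, steps):
    """Plain gradient descent on mean squared error for y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

# "Pre-training": a large generic dataset following y = 2x
generic = [(x, 2.0 * x) for x in range(1, 101)]
w = train(0.0, generic, lr=1e-4, steps=200)

# "Fine-tuning": a tiny domain dataset following y = 2.5x, adapted
# from the pre-trained weight with fewer steps and a small tweak.
domain = [(x, 2.5 * x) for x in range(1, 11)]
w_ft = train(w, domain, lr=1e-3, steps=50)
```

The fine-tuning pass converges quickly precisely because it starts from the pre-trained weight – the same reason adapting an LLM needs far less data and compute than training from scratch.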

For instance, a financial institution might refine an LLM with regulatory documents and market data to improve report generation and trend analysis. A retailer could train its AI on customer interactions and sales data to optimise personalised recommendations and automate support services.

However, as AI generates more data in the process, you must be mindful of data quality and misinformation risks. If your data is poor quality or misleading, your AI outputs won’t be reliable. That’s why it’s so important to have solid processes in place for curation, validation, and governance. This will ensure your systems produce accurate and trustworthy results.

1. What does the shift from ‘model-centric’ to ‘data-centric’ mean?

With a model-centric approach, the spotlight is on refining AI models: tweaking algorithms and fine-tuning parameters.

A data-centric strategy prioritises the quality, governance, and thoughtful curation of the data that feeds these systems. In simple terms, the data comes before the model.

This shift recognises that even the most advanced AI models can only be as good as the data they learn from. That’s why more and more forward-thinking companies are treating data quality as the foundation of their AI strategy. 

2. How do you protect and manage your data?

Prioritising data quality (not quantity) and governance makes curation, cleaning, and bias mitigation essential.

Companies are now implementing rigorous deduplication and validation processes before allowing AI to ingest information. Without these guardrails, organisations risk reinforcing societal biases or generating unreliable outputs. 
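A minimal sketch of what such pre-ingestion guardrails can look like: exact deduplication on normalised text plus a simple validation rule. The record fields and rules here are hypothetical; production pipelines typically add near-duplicate detection and richer schema checks.

```python
import hashlib

def normalise(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants match.
    return " ".join(text.lower().split())

def dedupe_and_validate(records):
    seen, clean = set(), []
    for rec in records:
        text = rec.get("text", "")
        # Validation rule (hypothetical): require non-empty text and a source.
        if not text or rec.get("source") is None:
            continue
        key = hashlib.sha256(normalise(text).encode()).hexdigest()
        if key in seen:  # exact duplicate after normalisation
            continue
        seen.add(key)
        clean.append(rec)
    return clean

docs = [
    {"text": "Quarterly results improved.", "source": "report-a"},
    {"text": "quarterly  results Improved.", "source": "report-b"},  # duplicate
    {"text": "", "source": "report-c"},  # fails validation
]
clean = dedupe_and_validate(docs)
```

Only the first record survives: the second normalises to the same text, and the third fails validation before the model ever sees it.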

As AI-generated data feeds back into training loops, ensuring data integrity and security is no longer optional – it’s fundamental.

Moreover, you must prioritise data privacy and compliance. Regulations such as GDPR and the EU AI Act impose strict rules on how AI systems handle personal and proprietary data. AI models often rely on third-party cloud platforms, making it crucial to implement encryption, access controls, and anonymisation techniques to protect sensitive information.
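One common anonymisation technique is pseudonymisation: replacing personal identifiers with keyed hashes so records stay joinable without exposing raw values. The sketch below uses Python's standard `hmac` module; the key, fields, and truncation length are illustrative assumptions, and real deployments would keep the key in a secrets vault and pair this with encryption and access controls.

```python
import hashlib
import hmac

SECRET_KEY = b"placeholder-key-store-in-a-vault"  # hypothetical secret

def pseudonymise(value: str) -> str:
    # Keyed hash: deterministic (so joins still work) but not reversible
    # without the key, unlike a plain unsalted hash.
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane@example.com", "query": "mortgage rates"}
safe = {**record, "email": pseudonymise(record["email"])}
```

Because the hash is deterministic, the same user maps to the same token across datasets – useful for analytics – while the raw identifier never leaves your boundary.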

The next part of the equation is storage. Managing and structuring vast datasets is a challenge that requires thoughtful architectures. The rise of hybrid cloud storage has helped enterprises balance scalability with security, ensuring that sensitive data remains protected while leveraging the cloud’s computational power.

3. How can you scale GenAI the right way?

The AI landscape is evolving at a remarkable pace. DeepSeek R1 is demonstrating how reinforcement learning can push reasoning capabilities. OpenAI’s o3 has introduced program synthesis, allowing AI to generate novel solutions rather than repeating past knowledge. Meanwhile, tools like DeepResearch hint at a future where AI plays a pivotal role in automating knowledge work.

However, scaling these innovations requires the right computing infrastructure. Many businesses start by integrating cloud-based AI services and APIs from GCP, Azure, or AWS to accelerate deployment without worrying about GPU clusters. Yet, as costs rise, organisations are adopting hybrid and multi-cloud approaches that keep sensitive workloads on-premise while leveraging cloud elasticity. The key is finding a balance between performance, cost, and security, ensuring that AI investments remain sustainable as demand grows.

To further optimise performance, organisations are using containerisation tools like Docker and Kubernetes, which enable flexible deployment across different environments. Additionally, vector databases and RAG-based retrieval strategies are helping organisations access relevant knowledge dynamically, improving AI accuracy without requiring full-scale model retraining.
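The retrieval step at the heart of RAG can be sketched in a few lines: documents and the query are embedded as vectors, and the closest documents by cosine similarity are pulled into the prompt. The bag-of-words "embedding" below is a deliberately crude stand-in for a real embedding model and vector database, but the ranking logic is the same.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts stand in for a learned vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "refund policy for retail customers",
    "quarterly market trend analysis",
    "how to reset your account password",
]
query = "customer refund policy"
ranked = sorted(docs, key=lambda d: cosine(embed(d), embed(query)), reverse=True)
context = ranked[0]  # would be prepended to the LLM prompt as grounding
```

The model then answers from `context` rather than from parametric memory alone, which is how RAG improves accuracy without retraining.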

4. What are the best practices for implementation?

Implementing GenAI isn’t about plugging in a model; it requires seamlessly integrating AI into your business. Ethical considerations, transparency, and regulatory compliance must be top of mind. Businesses are establishing AI bias audits and fairness testing to make sure their systems don’t inadvertently discriminate.
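A bias audit can start very simply, for example by comparing a model's approval rates across groups (a demographic parity check). The outcomes and tolerance below are hypothetical; real audits combine several fairness metrics with statistical significance testing.

```python
def approval_rate(outcomes, group):
    """Share of positive decisions for one group."""
    decisions = [decision for g, decision in outcomes if g == group]
    return sum(decisions) / len(decisions)

# (group, model_decision) pairs — 1 means approved (hypothetical audit log)
outcomes = [("a", 1), ("a", 1), ("a", 0),
            ("b", 1), ("b", 0), ("b", 0)]

gap = abs(approval_rate(outcomes, "a") - approval_rate(outcomes, "b"))
flagged = gap > 0.2  # hypothetical tolerance for demographic parity
```

Here group "a" is approved twice as often as group "b", so the audit flags the system for review before it reaches production.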

Meanwhile, the need for AI safety and reliability has never been greater. Organisations are using content filtering, retrieval augmentation, and transparency mechanisms to reduce the risk of hallucinations or misinformation. Companies must document and govern their AI systems responsibly to maintain trust and compliance.

The road ahead

GenAI’s rapid evolution is both an opportunity and a challenge. While AI models continue to improve, it’s not a race to adopt the latest tech.

Real competitive advantage comes from how companies manage their data, optimise AI infrastructure, and embed responsible AI practices.