Scaling AI Teams and Workload

  1. Feature-Based Scaling: Many software systems scale by organizing teams around specific features. For instance, a plugin framework enables multiple teams to independently develop and maintain different plugins. Similarly, applications like Photoshop are scaled by assigning ownership of distinct editing features to dedicated teams. This approach allows teams to focus on specific functionalities, fostering specialization and innovation. Another example is Microsoft Word, where different teams might manage various features, such as text formatting, collaboration tools, or design elements.
  2. Platform-Based Scaling: Teams can also be divided based on the operating systems or platforms they develop for, such as iOS, Windows, or Android. Within each platform, further division can occur at the feature level, enabling specialization within a specific environment. SaaS companies often adopt this model by assigning ownership of distinct features to different teams, allowing them to deeply explore and refine their areas of focus while ensuring scalability and efficiency.

Scaling AI Systems

AI features, such as search, recommendations, and more recently, applications powered by large language models (LLMs), scale in distinct ways. Below is a progression of how companies can scale their teams and workload as they grow:

  1. UI/Product Scaling: At the initial stage, teams are organized around complete products, such as search, recommendations, chatbots, or agents. Each product is managed by a dedicated team, with the possibility of further division into sub-products. For example, within a search system, a separate team might own query suggestions. Similarly, in a recommendation system, personalized tag generation for content or the display of relevant ads could each be managed by a separate team.
    [Image: sub-products within a main page, each managed by a different team]
  2. Component-Level Scaling: As the system grows, teams can be divided based on core components that make up the product, such as retrieval, ranking, or other foundational elements. In setups like Retrieval-Augmented Generation (RAG) or agent-based systems, this division might include retrieval mechanisms, tool integrations, and model-level operations.
  3. Feature-Level Scaling: Teams can then focus on specific features to ensure their quality and long-term reliability. For instance, Google has dedicated teams for features like knowledge graphs, trust signals, and intent or query classification. This specialization allows for continuous improvement and innovation in individual features.
  4. Querysets/User Cohorts: As the product matures, the Pareto principle often applies, with 80% of traffic coming from 20% of users. To address the needs of long-tail queries or niche user groups, teams can focus on specific querysets or user cohorts. This approach helps capture pain points that may not be evident in general traffic evaluations, building trust and converting long-tail users into more active ones. For example, Google has vertical teams dedicated to improving experiences for specific domains like sports, news, or movies. Similarly, social media companies may have teams optimizing recommendations for users based on location or demographics. This method also allows for testing new ideas on smaller cohorts before scaling them to larger audiences.
  5. Dataset-Level Scaling: In the era of LLMs, scaling can also occur at the dataset level. Dedicated teams can focus on improving the quality of datasets used for training foundational models, ensuring better performance and adaptability.
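To make component-level scaling (item 2) concrete, the sketch below shows one way a RAG-style product can be split along component boundaries: each team owns an implementation behind a stable interface, so retrieval and ranking can evolve independently. The class and method names here are illustrative assumptions, not a prescribed API.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Document:
    doc_id: str
    text: str
    score: float = field(default=0.0)


# Stable interfaces form the contract between teams.
class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> List[Document]: ...


class Ranker(Protocol):
    def rank(self, query: str, docs: List[Document]) -> List[Document]: ...


# Toy implementations; in practice each would be a separately owned service.
class KeywordRetriever:
    def __init__(self, corpus: List[Document]):
        self.corpus = corpus

    def retrieve(self, query: str, k: int) -> List[Document]:
        terms = set(query.lower().split())
        # Keep any document sharing at least one term with the query.
        hits = [d for d in self.corpus if terms & set(d.text.lower().split())]
        return hits[:k]


class OverlapRanker:
    def rank(self, query: str, docs: List[Document]) -> List[Document]:
        terms = set(query.lower().split())
        for d in docs:
            d.score = len(terms & set(d.text.lower().split()))
        return sorted(docs, key=lambda d: d.score, reverse=True)


def answer(query: str, retriever: Retriever, ranker: Ranker) -> List[Document]:
    # The orchestration layer only depends on the interfaces,
    # not on any team's concrete implementation.
    return ranker.rank(query, retriever.retrieve(query, k=10))
```

Because the orchestration code depends only on `Retriever` and `Ranker`, a retrieval team can swap keyword matching for dense embeddings without touching the ranking team's code.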
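The queryset/cohort approach in item 4 hinges on evaluating metrics per cohort rather than in aggregate, since head traffic can mask long-tail regressions. The following is a minimal sketch of that idea, assuming a hypothetical `cohort_of` function that maps each query to a cohort label such as "sports" or "news".

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def sliced_accuracy(
    results: List[Tuple[str, bool]],
    cohort_of: Callable[[str], str],
) -> Dict[str, float]:
    """Compute accuracy per cohort so that a regression on a small
    cohort is visible even when overall accuracy looks healthy."""
    totals = defaultdict(lambda: [0, 0])  # cohort -> [correct, total]
    for query, correct in results:
        cohort = cohort_of(query)
        totals[cohort][0] += int(correct)
        totals[cohort][1] += 1
    return {c: right / total for c, (right, total) in totals.items()}
```

A team could then gate launches on every cohort's score, not just the overall average, which is how long-tail pain points surface before they erode user trust.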

Flexibility in Scaling Approaches

These scaling strategies are not rigid and depend heavily on the specific goals and direction of the company. As AI systems evolve, especially in the LLM era, new methods for scaling teams and workloads are likely to emerge, enabling further innovation and efficiency.
