
Conquering the N+1 Problem in Spring GraphQL with DataLoader

GraphQL Guy · 6 min read

The N+1 problem can turn your elegant GraphQL API into a database nightmare. Learn how Spring GraphQL's batch loading solves this efficiently.

The N+1 Problem Revisited

Before diving into solutions, let's understand the problem clearly. Consider this query:

query {
  books {        # 1 query to get 100 books
    title
    author {     # 100 queries - one per book!
      name
    }
  }
}

With a naive implementation, fetching 100 books triggers 101 database queries:

SELECT * FROM books;                 -- 1 query
SELECT * FROM authors WHERE id = 1;  -- +100 queries
SELECT * FROM authors WHERE id = 2;
SELECT * FROM authors WHERE id = 3;
-- ... 97 more queries

That's the N+1 problem: 1 query for the list, plus N queries for related items.
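The arithmetic is easy to reproduce without a database. A minimal plain-Java simulation (hypothetical Book record, in-memory counts only) that tallies the round-trips each strategy would issue:

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java simulation of the N+1 pattern: count simulated database
// round-trips for per-book author lookups vs. one batched lookup.
public class NPlusOneDemo {

    record Book(String title, String authorId) {}

    static List<Book> sampleBooks(int n) {
        List<Book> books = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            books.add(new Book("Book " + i, "a" + i));
        }
        return books;
    }

    // Naive resolver: one "query" for the list, then one per book
    static int naiveQueryCount(List<Book> books) {
        int queries = 1; // SELECT * FROM books
        for (Book book : books) {
            queries++;   // SELECT * FROM authors WHERE id = ?
        }
        return queries;
    }

    // Batched resolver: one "query" for the list, one IN (...) query for authors
    static int batchedQueryCount(List<Book> books) {
        Set<String> authorIds = books.stream()
            .map(Book::authorId)
            .collect(Collectors.toSet());
        return 1 + (authorIds.isEmpty() ? 0 : 1); // books + single IN query
    }

    public static void main(String[] args) {
        List<Book> books = sampleBooks(100);
        System.out.println("Naive:   " + naiveQueryCount(books) + " queries");   // 101
        System.out.println("Batched: " + batchedQueryCount(books) + " queries"); // 2
    }
}
```

The batched count stays at 2 no matter how many books come back, which is the whole point of the pattern.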

How Spring GraphQL Solves This

Spring GraphQL provides built-in support for batch loading through the @BatchMapping annotation and the BatchLoaderRegistry.
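The controller examples below assume a schema along these lines; the post never shows it, so treat this SDL as a sketch whose field names simply match the code:

```graphql
type Query {
  books: [Book]
  authors: [Author]
  bookById(id: ID!): Book
}

type Book {
  title: String
  author: Author
}

type Author {
  name: String
  books: [Book]
  publisher: Publisher
}

type Publisher {
  name: String
}
```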

The simplest approach uses @BatchMapping:

@Controller
public class BookController {

    private final BookRepository bookRepository;
    private final AuthorRepository authorRepository;

    public BookController(BookRepository bookRepository,
                          AuthorRepository authorRepository) {
        this.bookRepository = bookRepository;
        this.authorRepository = authorRepository;
    }

    @QueryMapping
    public List<Book> books() {
        return bookRepository.findAll();
    }

    @BatchMapping
    public Map<Book, Author> author(List<Book> books) {
        // Collect all author IDs
        Set<String> authorIds = books.stream()
            .map(Book::authorId)
            .collect(Collectors.toSet());

        // Single query for all authors
        Map<String, Author> authorsById = authorRepository
            .findAllById(authorIds)
            .stream()
            .collect(Collectors.toMap(Author::id, Function.identity()));

        // Map each book to its author
        return books.stream()
            .collect(Collectors.toMap(
                Function.identity(),
                book -> authorsById.get(book.authorId())
            ));
    }
}

Now the same GraphQL query triggers just two database queries:

SELECT * FROM books;
SELECT * FROM authors WHERE id IN (1, 2, 3, ...);

Understanding @BatchMapping

The @BatchMapping annotation:

  1. Collects all instances that need the field resolved
  2. Calls your method once with the entire batch
  3. Maps results back to individual instances
┌─────────────────────────────────────────────────────────────────┐
│                         Query Execution                         │
└─────────────────────────────────────────────────────────────────┘

books query returns: Book-1, Book-2, Book-3, ..., Book-100
                        │       │       │            │
                        └───────┴───────┴────────────┘
                                    │
                        ┌───────────▼───────────┐
                        │     @BatchMapping     │
                        │  author(List<Book>)   │
                        │                       │
                        │   Single batch call   │
                        │   with all 100 books  │
                        └───────────┬───────────┘
                                    │
                        ┌───────────▼───────────┐
                        │   Map<Book, Author>   │
                        │       returned        │
                        └───────────────────────┘

Return Types for @BatchMapping

@BatchMapping supports several return types:

// Map - each source maps to one result
@BatchMapping
public Map<Book, Author> author(List<Book> books) { ... }

// Mono<Map> - async loading
@BatchMapping
public Mono<Map<Book, Author>> author(List<Book> books) { ... }

// For collections (one-to-many relationships)
@BatchMapping
public Map<Author, List<Book>> books(List<Author> authors) { ... }

Approach 2: BatchLoaderRegistry

For more control, register batch loaders manually:

@Configuration
public class GraphQLConfig {

    // Spring Boot auto-configures a BatchLoaderRegistry bean;
    // register the named loader against it at startup.
    public GraphQLConfig(BatchLoaderRegistry registry,
                         AuthorRepository authorRepository) {
        registry.<String, Author>forName("authorLoader")
            .registerMappedBatchLoader((authorIds, env) ->
                // A mapped loader returns a Map keyed by ID,
                // so result order doesn't matter
                Mono.fromCallable(() -> authorRepository
                    .findAllById(authorIds)
                    .stream()
                    .collect(Collectors.toMap(Author::id, Function.identity()))));
    }

    // Wire a data fetcher for Book.author that defers to the named loader
    @Bean
    public RuntimeWiringConfigurer runtimeWiringConfigurer() {
        return wiringBuilder -> wiringBuilder
            .type("Book", builder -> builder
                .dataFetcher("author", environment -> {
                    Book book = environment.getSource();
                    DataLoader<String, Author> loader =
                        environment.getDataLoader("authorLoader");
                    return loader.load(book.authorId());
                }));
    }
}

Simpler Registration with Spring Boot

Spring Boot's GraphQL starter auto-configures the BatchLoaderRegistry, which allows a cleaner registration style:

@Component
public class AuthorBatchLoader {

    private final AuthorRepository authorRepository;

    public AuthorBatchLoader(AuthorRepository authorRepository) {
        this.authorRepository = authorRepository;
    }

    public List<Author> load(List<String> authorIds) {
        Map<String, Author> authorsById = authorRepository
            .findAllById(authorIds)
            .stream()
            .collect(Collectors.toMap(Author::id, Function.identity()));

        // Must return results in the same order as the input keys
        return authorIds.stream()
            .map(authorsById::get)
            .toList();
    }
}

@Configuration
public class DataLoaderConfig {

    public DataLoaderConfig(BatchLoaderRegistry registry,
                            AuthorBatchLoader authorBatchLoader) {
        // Registered by type pair, so a DataLoader<String, Author>
        // controller method argument is resolved automatically
        registry.forTypePair(String.class, Author.class)
            .registerBatchLoader((ids, env) ->
                Flux.fromIterable(authorBatchLoader.load(ids)));
    }
}

Nested Batch Loading

What about nested relationships?

query {
  books {
    title
    author {
      name
      publisher {   # Another level!
        name
      }
    }
  }
}

Just add another @BatchMapping:

@Controller
public class AuthorController {

    private final PublisherRepository publisherRepository;

    public AuthorController(PublisherRepository publisherRepository) {
        this.publisherRepository = publisherRepository;
    }

    @BatchMapping
    public Map<Author, Publisher> publisher(List<Author> authors) {
        Set<String> publisherIds = authors.stream()
            .map(Author::publisherId)
            .filter(Objects::nonNull)
            .collect(Collectors.toSet());

        Map<String, Publisher> publishersById = publisherRepository
            .findAllById(publisherIds)
            .stream()
            .collect(Collectors.toMap(Publisher::id, Function.identity()));

        return authors.stream()
            .filter(a -> a.publisherId() != null)
            .collect(Collectors.toMap(
                Function.identity(),
                a -> publishersById.get(a.publisherId())
            ));
    }
}

Spring GraphQL handles the execution order automatically.

One-to-Many Relationships

For collections, the return type changes slightly:

@Controller
public class AuthorController {

    private final AuthorRepository authorRepository;
    private final BookRepository bookRepository;

    public AuthorController(AuthorRepository authorRepository,
                            BookRepository bookRepository) {
        this.authorRepository = authorRepository;
        this.bookRepository = bookRepository;
    }

    @QueryMapping
    public List<Author> authors() {
        return authorRepository.findAll();
    }

    @BatchMapping
    public Map<Author, List<Book>> books(List<Author> authors) {
        Set<String> authorIds = authors.stream()
            .map(Author::id)
            .collect(Collectors.toSet());

        // Single query for all books by these authors
        List<Book> allBooks = bookRepository.findByAuthorIdIn(authorIds);

        // Group by author ID
        Map<String, List<Book>> booksByAuthorId = allBooks.stream()
            .collect(Collectors.groupingBy(Book::authorId));

        // Map each author to their books (empty list if none)
        return authors.stream()
            .collect(Collectors.toMap(
                Function.identity(),
                author -> booksByAuthorId.getOrDefault(author.id(), List.of())
            ));
    }
}
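The grouping step is the part that trips people up, so here it is in isolation as a plain-Java sketch (hypothetical Book record), showing why getOrDefault matters for authors with no books:

```java
import java.util.*;
import java.util.stream.*;

// One-to-many batching in miniature: group fetched books by author id,
// then use getOrDefault so authors with no books get an empty list
// instead of a null entry.
public class GroupingDemo {

    record Book(String title, String authorId) {}

    static Map<String, List<Book>> booksPerAuthor(List<String> authorIds,
                                                  List<Book> fetched) {
        Map<String, List<Book>> byAuthor = fetched.stream()
            .collect(Collectors.groupingBy(Book::authorId));

        Map<String, List<Book>> result = new HashMap<>();
        for (String id : authorIds) {
            result.put(id, byAuthor.getOrDefault(id, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> authorIds = List.of("a1", "a2", "a3"); // a3 has no books
        List<Book> fetched = List.of(
            new Book("Book One", "a1"),
            new Book("Book Two", "a1"),
            new Book("Book Three", "a2"));

        Map<String, List<Book>> result = booksPerAuthor(authorIds, fetched);
        System.out.println(result.get("a1").size()); // 2
        System.out.println(result.get("a3"));        // []
    }
}
```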

Performance Comparison

Let's measure the difference:

@SpringBootTest
class BatchLoadingPerformanceTest {

    @Autowired
    private GraphQlTester graphQlTester;

    @Test
    void measureQueryPerformance() {
        // Setup: 100 books, 20 authors
        long start = System.currentTimeMillis();

        graphQlTester.document("""
            query {
              books {
                title
                author {
                  name
                }
              }
            }
            """)
            .execute()
            .path("books").entityList(Book.class).hasSize(100);

        long duration = System.currentTimeMillis() - start;
        System.out.println("Query executed in: " + duration + "ms");
    }
}

Approach        Queries   Time
Naive (N+1)     101       ~500ms
@BatchMapping   2         ~50ms

10x improvement just by changing how we fetch data!

Caching Within a Request

DataLoader automatically caches within a request:

query {
  book1: bookById(id: "1") {
    author { name }   # Loads author "A"
  }
  book2: bookById(id: "2") {
    author { name }   # Also loads author "A" - cached!
  }
}

If both books have the same author, the author is fetched only once per request.

Important: This cache is per-request. New requests start with an empty cache.
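Under the hood, the per-request cache is just memoization keyed on the loader's key type. A toy plain-Java loader (not the real DataLoader API) makes the behavior concrete:

```java
import java.util.*;
import java.util.function.Function;

// Toy per-request memoization, mimicking DataLoader's request-scoped cache:
// repeated load() calls for the same key hit the backend only once.
public class RequestCacheDemo {

    static class RequestScopedLoader<K, V> {
        private final Map<K, V> cache = new HashMap<>();
        private final Function<K, V> backend;
        int backendCalls = 0;

        RequestScopedLoader(Function<K, V> backend) {
            this.backend = backend;
        }

        V load(K key) {
            return cache.computeIfAbsent(key, k -> {
                backendCalls++;           // simulated database hit
                return backend.apply(k);
            });
        }
    }

    public static void main(String[] args) {
        RequestScopedLoader<String, String> loader =
            new RequestScopedLoader<>(id -> "Author-" + id);

        loader.load("A"); // backend hit
        loader.load("A"); // cached - no second hit
        loader.load("B"); // backend hit

        System.out.println("Backend calls: " + loader.backendCalls); // 2
    }
}
```

A fresh loader per request is what keeps this cache from ever serving stale data across requests.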

Common Pitfalls

1. Returning Results in Wrong Order

DataLoader requires results in the same order as the input keys:

// WRONG - iterating the map's values loses the requested key order
Map<String, Author> authorsById = ...;
return new ArrayList<>(authorsById.values());

// RIGHT - stream the input keys so position i corresponds to ids.get(i)
return ids.stream()
    .map(authorsById::get)
    .toList();

(Mapped batch loaders that return a Map<K, V> sidestep this entirely, since results are matched by key rather than by position.)

2. Forgetting to Handle Nulls

Some IDs might not have corresponding data:

@BatchMapping
public Map<Book, Author> author(List<Book> books) {
    // Some books might reference a missing author
    Map<String, Author> authorsById = ...;

    // Collectors.toMap throws NullPointerException on null values,
    // so build the map by hand when an author can be absent
    Map<Book, Author> result = new HashMap<>();
    for (Book book : books) {
        result.put(book, authorsById.get(book.authorId())); // null is fine here
    }
    return result;
}
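It's worth knowing that Collectors.toMap rejects null values outright, even with a HashMap supplier, because it accumulates via Map.merge; a plain put() loop is the safe path. A self-contained demonstration:

```java
import java.util.*;
import java.util.stream.*;

// Collectors.toMap rejects null values with a NullPointerException,
// even when the map supplier is HashMap::new; plain put() accepts them.
public class NullValueDemo {

    static Map<String, String> nullSafeMap(List<String> ids, Map<String, String> lookup) {
        Map<String, String> result = new HashMap<>();
        for (String id : ids) {
            result.put(id, lookup.get(id)); // null values are fine here
        }
        return result;
    }

    static boolean toMapThrowsOnNull(List<String> ids, Map<String, String> lookup) {
        try {
            ids.stream().collect(Collectors.toMap(
                id -> id, lookup::get, (a, b) -> a, HashMap::new));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> authors = Map.of("a1", "Author One");
        List<String> ids = List.of("a1", "missing");

        System.out.println(nullSafeMap(ids, authors).containsKey("missing")); // true
        System.out.println(toMapThrowsOnNull(ids, authors));                  // true
    }
}
```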

3. Not Using Indexes

Batch loading is only fast if the database query is fast:

-- Primary keys are indexed automatically, but foreign keys usually are not
CREATE INDEX idx_books_author_id ON books(author_id);

Advanced: Custom DataLoader Configuration

For fine-tuned control:

@Configuration
public class DataLoaderOptionsConfig {

    public DataLoaderOptionsConfig(BatchLoaderRegistry registry,
                                   AuthorRepository authorRepository) {
        DataLoaderOptions options = DataLoaderOptions.newOptions()
            .setBatchingEnabled(true)
            .setCachingEnabled(true)
            .setMaxBatchSize(100); // Split large key sets into batches of 100

        registry.forTypePair(String.class, Author.class)
            .withOptions(options)
            .registerMappedBatchLoader((ids, env) ->
                Mono.fromCallable(() -> authorRepository
                    .findAllById(ids)
                    .stream()
                    .collect(Collectors.toMap(Author::id, Function.identity()))));
    }
}

For even lower-level tuning, graphql-java's DataLoaderOptions also exposes a BatchLoaderScheduler hook for delaying or re-threading batch calls.

Debugging Batch Loading

Add logging to see what's happening:

@BatchMapping
public Map<Book, Author> author(List<Book> books) {
    log.debug("Batch loading authors for {} books", books.size());
    log.debug("Author IDs: {}", books.stream()
        .map(Book::authorId)
        .distinct()
        .toList());

    // ... rest of implementation
}

Or enable SQL logging:

logging:
  level:
    org.hibernate.SQL: DEBUG
    org.hibernate.type.descriptor.sql: TRACE

Summary

Pattern                   Use Case        Mechanism
Single related object     Book.author     @BatchMapping returning Map<Book, Author>
List of related objects   Author.books    @BatchMapping returning Map<Author, List<Book>>
Manual control            Complex logic   BatchLoaderRegistry

The N+1 problem is GraphQL's most common performance issue, but Spring GraphQL makes it easy to solve. Use @BatchMapping by default, and you'll have efficient queries without complex configuration.

Next: Testing Spring GraphQL - comprehensive strategies for unit and integration testing.