Complete MongoDB interview Q&A covering documents, collections, BSON, indexing, aggregation pipeline, sharding, replica sets, and ACID transactions - with code examples.
MongoDB is one of the most widely used NoSQL databases and comes up in many backend and full-stack interviews. This guide covers the MongoDB interview questions you're likely to face - from basic document structure to advanced sharding and transactions - each with a clear answer and working code.
MongoDB is a document-oriented NoSQL database. It stores data as JSON-like documents instead of rows and tables, which makes it a natural fit for applications that deal with nested or variable-structure data. If you've built anything with Node.js, Express, or a modern REST API, there's a good chance MongoDB was somewhere in that stack.
It shows up in backend and full-stack interviews because it covers a lot of important database concepts in one tool - indexing, aggregation, replication, sharding, and transactions. Interviewers use it to check whether you understand how databases actually work, not just how to write a query.
This guide covers everything from basic document structure to advanced performance and infrastructure topics, organized by difficulty. Each question has a direct answer and code where it helps.
Key Takeaways
MongoDB stores data as BSON documents inside collections - no fixed schema required
The aggregation pipeline is the most tested intermediate topic - know $match, $group, $project, and $lookup
Sharding and replica sets handle scale and availability respectively - interviewers often ask you to explain both and the difference between them
Multi-document ACID transactions are supported from MongoDB 4.0 onward
These open almost every MongoDB interview. They test whether you understand the core model - documents, collections, BSON - before the interviewer moves to harder topics.
MongoDB is a NoSQL, document-oriented database. Instead of storing data in rows and tables, it stores records as documents - flexible JSON-like structures that can hold strings, numbers, arrays, and nested objects. Documents are grouped into collections.
It's open-source, built for horizontal scalability, and designed to handle large volumes of semi-structured or variable-shape data without a rigid schema.
A document is a single record. MongoDB stores it internally in BSON (Binary JSON) format, but you read and write it as JSON. A document can hold any combination of field types - strings, numbers, booleans, arrays, nested objects, dates.
{
"_id": "ObjectId(\"64b1f1234abc567890123456\")",
"name": "Alice",
"role": "Engineer",
"skills": ["MongoDB", "Node.js", "React"],
"address": {
"city": "Mumbai",
"pin": "400001"
}
}
Each document in a collection can have different fields - that's the flexible schema.
A collection is a group of documents - the MongoDB equivalent of a table in a relational database. Collections don't enforce a fixed schema, so documents inside the same collection can have completely different structures.
You don't need to create a collection beforehand. MongoDB creates it automatically when you insert the first document into it.
BSON stands for Binary JSON. MongoDB uses it internally to store documents because it's faster to encode and decode than plain text JSON and supports additional data types that JSON doesn't - like Date, ObjectId, BinData, and Decimal128.
When you write or read documents, you work with JSON. The BSON conversion happens under the hood.
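To see why the extra types matter, here is a small plain-JavaScript sketch (no MongoDB involved) showing that a Date does not survive a round trip through text JSON. The `event` object is a hypothetical example, not part of the guide's data.

```javascript
// Plain JSON has no date type: a Date is serialized as an ISO string
// and comes back as a string, not a Date.
const event = { name: "signup", createdAt: new Date(0) };

const roundTripped = JSON.parse(JSON.stringify(event));

console.log(event.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);   // "string"
console.log(roundTripped.createdAt);          // "1970-01-01T00:00:00.000Z"
```

BSON avoids this loss by carrying a type tag with every value, so a date stays a date.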
Every document must have an _id field. It's the unique identifier for that document within its collection. If you don't provide one when inserting, MongoDB auto-generates an ObjectId - a 12-byte value made up of a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter:
ObjectId("64b1f1234abc567890123456");
// First 8 hex chars = Unix timestamp → embeds creation time
Because the creation time is baked in, sorting by _id ascending gives you documents in roughly insertion order. You can also set _id yourself - any unique string, number, or UUID works.
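The embedded timestamp can be read back with a few lines of plain JavaScript. This is a minimal sketch: `objectIdToDate` is a hypothetical helper, and the id below is an illustrative 24-character value rather than one taken from a real database.

```javascript
// Decode the creation time embedded in an ObjectId's hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical id used only for illustration
const created = objectIdToDate("64b1f1234abc567890123456");
console.log(created.toISOString()); // 2023-07-15T01:06:43.000Z
```

In the shell and the official drivers you would normally call the ObjectId's own getTimestamp() method instead of parsing the string by hand.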
MongoDB isn't a replacement for relational databases - it's a different tool for data that doesn't fit neatly into tables.
String, Integer (32-bit and 64-bit), Double, Decimal128, Boolean, Date, ObjectId, Array, Object (nested document), Null, Binary Data, Regular Expression, and Timestamp.
The most commonly used in practice: String, Number, Boolean, Date, Array, and Object.
You don't need to create them explicitly:
// Switch to (or implicitly create) a database
use myApp
// Collection is created automatically on first insert
db.users.insertOne({ name: "Bob", age: 28 })
To create a collection with specific options, use createCollection:
db.createCollection("logs", { capped: true, size: 1048576, max: 1000 });
// Single document
db.users.insertOne({ name: "Carol", age: 31, role: "Designer" });
// Multiple documents
db.users.insertMany([
{ name: "Dan", age: 25, role: "Developer" },
{ name: "Eve", age: 29, role: "DevOps" },
]);
insertOne() returns a result containing the insertedId of the new document. insertMany() returns the insertedIds of all inserted documents.
// All documents
db.users.find();
// Filter by field
db.users.find({ role: "Developer" });
// Comparison operators
db.users.find({ age: { $gt: 25 } }); // age > 25
db.users.find({ age: { $gte: 25, $lte: 35 } }); // 25 ≤ age ≤ 35
// Logical operators
db.users.find({ $and: [{ role: "Developer" }, { age: { $lt: 30 } }] });
// Return one match
db.users.findOne({ name: "Carol" });
// Projection - return specific fields only
db.users.find({ role: "Developer" }, { name: 1, age: 1, _id: 0 });
Common operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or, $not.
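To make the filter semantics concrete, here is a toy matcher in plain JavaScript for a handful of these operators. It is a sketch under simplifying assumptions (flat documents, no $and/$or/$not, no type coercion) and is not how MongoDB evaluates queries; `matches` and the sample `users` array are hypothetical.

```javascript
// Toy matcher for a few comparison operators on flat documents.
// Illustrative only: this is not how MongoDB evaluates queries.
const operators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const isOperatorObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorObject) return doc[field] === condition; // implicit $eq
    // Every operator listed for the field must hold (implicit AND)
    return Object.entries(condition).every(([op, arg]) => operators[op](doc[field], arg));
  });
}

const users = [
  { name: "Dan", age: 25, role: "Developer" },
  { name: "Eve", age: 29, role: "DevOps" },
  { name: "Carol", age: 31, role: "Designer" },
];

const result = users.filter((u) => matches(u, { age: { $gte: 25, $lte: 30 } }));
console.log(result.map((u) => u.name)); // [ 'Dan', 'Eve' ]
```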
// Update one document - only modifies specified fields
db.users.updateOne({ name: "Carol" }, { $set: { role: "Lead Designer" } });
// Update multiple documents
db.users.updateMany({ role: "Developer" }, { $inc: { salary: 5000 } });
// Replace entire document (except _id)
db.users.replaceOne(
{ name: "Dan" },
{ name: "Dan", age: 26, role: "Senior Developer" },
);
// Delete one
db.users.deleteOne({ name: "Eve" });
// Delete multiple
db.users.deleteMany({ role: "Intern" });
// Delete all documents in a collection
db.users.deleteMany({});
These come up once the interviewer is satisfied with your basics. They test whether you understand indexing, the aggregation pipeline, schema design choices, and update operators.
An index is a data structure that lets MongoDB find documents without scanning the entire collection. Without an index, MongoDB does a collection scan (COLLSCAN) - it reads every document to find matches, which degrades linearly with collection size.
// Single-field index
db.users.createIndex({ email: 1 }); // 1 = ascending, -1 = descending
// Compound index
db.orders.createIndex({ userId: 1, createdAt: -1 });
// Unique index
db.users.createIndex({ email: 1 }, { unique: true });
// Check if a query hits an index
db.users.find({ email: "alice@example.com" }).explain("executionStats");
// winningPlan contains an IXSCAN stage → index used
// winningPlan stage is COLLSCAN → no usable index
Indexes speed up reads but add overhead to writes, because each write must also update the index. Don't create indexes speculatively - add them where queries are actually slow.
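The read/write trade-off can be modeled in plain JavaScript. This sketch uses a Map as a stand-in for an index (MongoDB's real indexes are B-tree structures) and counts how many documents each approach examines; the data and function names are hypothetical.

```javascript
// Compare a full scan with a lookup through a Map acting as an "index".
// Illustrative model only; MongoDB indexes are B-tree structures.
const users = [];
for (let i = 0; i < 1000; i++) {
  users.push({ _id: i, email: `user${i}@example.com` });
}

// Collection scan: examine documents one by one until a match is found
function findByScan(email) {
  let examined = 0;
  for (const doc of users) {
    examined++;
    if (doc.email === email) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": built once, and it would need updating on every write
const emailIndex = new Map(users.map((doc) => [doc.email, doc]));

function findByIndex(email) {
  const doc = emailIndex.get(email) ?? null;
  return { doc, examined: doc ? 1 : 0 };
}

console.log(findByScan("user999@example.com").examined);  // 1000
console.log(findByIndex("user999@example.com").examined); // 1
```

The extra Map is the write overhead in miniature: every insert, update, or delete on `users` would also have to touch `emailIndex`.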
The aggregation pipeline processes documents through a sequence of stages. Each stage transforms the data - filtering, grouping, sorting, reshaping, or calculating. The output of one stage becomes the input for the next.
db.orders.aggregate([
{ $match: { status: "completed" } }, // 1. Filter
{
$group: {
_id: "$userId",
totalSpent: { $sum: "$amount" },
orderCount: { $count: {} }, // $count accumulator needs MongoDB 5.0+; use { $sum: 1 } on older versions
},
}, // 2. Group and aggregate
{ $sort: { totalSpent: -1 } }, // 3. Sort
{ $limit: 10 }, // 4. Top 10 only
]);
Common stages:
| Stage | What it does |
|---|---|
| $match | Filters documents (like WHERE) |
| $group | Groups and aggregates (like GROUP BY) |
| $project | Selects, renames, or computes fields |
| $sort | Sorts results |
| $limit / $skip | Pagination |
| $lookup | Left outer join with another collection |
| $unwind | Deconstructs array fields into individual documents |
One thing interviewers check: Put $match and $sort as early as possible in the pipeline. If they run first, MongoDB can use indexes on those fields. Placing them after heavy stages means MongoDB processes far more documents than necessary.
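The stage-by-stage flow can be mirrored with ordinary array operations. The following is a plain-JavaScript model of a $match → $group → $sort → $limit pipeline over hypothetical sample orders; it illustrates the data flow only and says nothing about how the server executes a pipeline.

```javascript
// Plain-JavaScript model of $match -> $group -> $sort -> $limit.
// The sample orders are hypothetical.
const orders = [
  { userId: "u1", status: "completed", amount: 50 },
  { userId: "u2", status: "completed", amount: 30 },
  { userId: "u1", status: "completed", amount: 20 },
  { userId: "u3", status: "pending", amount: 99 },
];

// $match: keep completed orders only
const matched = orders.filter((o) => o.status === "completed");

// $group: one accumulator per userId
const groups = new Map();
for (const o of matched) {
  const g = groups.get(o.userId) ?? { _id: o.userId, totalSpent: 0, orderCount: 0 };
  g.totalSpent += o.amount;
  g.orderCount += 1;
  groups.set(o.userId, g);
}

// $sort (descending by totalSpent), then $limit
const top = [...groups.values()]
  .sort((a, b) => b.totalSpent - a.totalSpent)
  .slice(0, 10);

console.log(top);
// [ { _id: 'u1', totalSpent: 70, orderCount: 2 },
//   { _id: 'u2', totalSpent: 30, orderCount: 1 } ]
```

Filtering first means the grouping loop only sees three of the four orders, which is the same reason an early $match pays off on a real collection.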
find() handles simple reads - filter, project, sort, limit. It's fast and straightforward.
aggregate() handles complex processing - grouping, transformations, calculations, joins, multi-stage logic.
// find() - get completed orders for user 123
db.orders.find({ userId: "123", status: "completed" });
// aggregate() - total revenue per user for last month
db.orders.aggregate([
{ $match: { createdAt: { $gte: new Date("2026-05-01") } } },
{ $group: { _id: "$userId", revenue: { $sum: "$amount" } } },
]);
If find() can do the job, use it. It's simpler and typically faster for straightforward lookups.
Embedding stores related data inside the same document:
{
"_id": "...",
"name": "Alice",
"address": { "city": "Mumbai", "pin": "400001" }
}
Referencing stores related data in a separate document and links by _id:
// users collection
{ "_id": "abc", "name": "Alice", "addressId": "xyz" }
// addresses collection
{ "_id": "xyz", "city": "Mumbai", "pin": "400001" }
Embed when: data is always accessed together, nested data won't grow large, one-to-one or one-to-few relationship.
Reference when: data is accessed independently, shared across documents, or the nested array could grow without bound (reviews, comments, order history at scale).
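The difference in read paths can be sketched with plain objects. In this hypothetical model a Map stands in for the addresses collection, and `withAddress` plays the role that a second query or a $lookup stage would play against a real database.

```javascript
// Reading embedded vs referenced data, modeled with plain objects.
// Names and ids are hypothetical.
const embeddedUser = {
  _id: "abc",
  name: "Alice",
  address: { city: "Mumbai", pin: "400001" },
};

const users = [{ _id: "abc", name: "Alice", addressId: "xyz" }];
const addresses = new Map([["xyz", { _id: "xyz", city: "Mumbai", pin: "400001" }]]);

// Embedded: one read returns everything
const cityFromEmbedded = embeddedUser.address.city;

// Referenced: a second lookup resolves the link
function withAddress(user) {
  return { ...user, address: addresses.get(user.addressId) ?? null };
}
const cityFromReference = withAddress(users[0]).address.city;

console.log(cityFromEmbedded, cityFromReference); // Mumbai Mumbai
```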
// $set - add or update fields
db.users.updateOne(
{ _id: id },
{ $set: { role: "admin", updatedAt: new Date() } },
);
// $unset - remove a field
db.users.updateOne({ _id: id }, { $unset: { tempToken: "" } });
// $inc - increment or decrement a number
db.users.updateOne({ _id: id }, { $inc: { loginCount: 1, credits: -10 } });
// $push - append to an array
db.posts.updateOne({ _id: id }, { $push: { tags: "mongodb" } });
// $pull - remove a value from an array
db.posts.updateOne({ _id: id }, { $pull: { tags: "deprecated" } });
// $addToSet - push only if not already in the array
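db.users.updateOne({ _id: id }, { $addToSet: { skills: "Redis" } });
ObjectId is a 12-byte unique identifier MongoDB generates for _id by default. The bytes encode a 4-byte creation timestamp (seconds since the Unix epoch), a 5-byte random value, and a 3-byte incrementing counter.
Because it encodes creation time, you can sort by _id ascending to get roughly insertion order - no separate timestamp field needed for ordering purposes alone. The order is approximate: the timestamp has one-second resolution, and ids generated by different clients are not strictly ordered.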
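The same property allows time-range filtering on _id. The sketch below builds the smallest possible ObjectId hex string for a given moment; `objectIdHexFloor` is a hypothetical helper, and the commented query shows one way it might be used in the shell.

```javascript
// Build the smallest possible ObjectId hex string for a given time:
// 8 hex characters of timestamp followed by 16 zeros.
function objectIdHexFloor(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const boundary = objectIdHexFloor(new Date("2023-07-15T01:06:43Z"));
console.log(boundary); // 64b1f1230000000000000000

// In the shell this could be used as, for example:
// db.users.find({ _id: { $gte: ObjectId(boundary) } })
```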
A capped collection has a fixed maximum size. When it's full, new inserts overwrite the oldest documents - like a circular buffer.
db.createCollection("appLogs", {
capped: true,
size: 5242880, // 5 MB max
max: 10000, // max 10,000 documents
});
Documents are always returned in insertion order. Good for application logs, audit trails, activity feeds - anything where you only care about recent data and don't want to run manual cleanup jobs.
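The circular-buffer behaviour can be modeled in a few lines. This sketch limits by document count only (real capped collections are also bounded by size in bytes), and `CappedBuffer` is a hypothetical class, not a MongoDB API.

```javascript
// Minimal model of a capped collection limited by document count.
// Real capped collections are also bounded by size in bytes.
class CappedBuffer {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document is dropped to make room
    if (this.docs.length > this.max) this.docs.shift();
  }

  find() {
    return [...this.docs]; // insertion order
  }
}

const logs = new CappedBuffer(3);
for (const msg of ["a", "b", "c", "d", "e"]) logs.insert({ msg });

console.log(logs.find().map((d) => d.msg)); // [ 'c', 'd', 'e' ]
```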
// Step 1: explain() to see the execution plan
db.orders.find({ userId: "abc", status: "pending" }).explain("executionStats");
// Large gap between totalDocsExamined and nReturned = missing index
// Step 2: Add an index on the filter fields
db.orders.createIndex({ userId: 1, status: 1 });
// Step 3: Project only what you need
db.orders.find({ userId: "abc" }, { _id: 1, amount: 1, status: 1 });
// Step 4: Avoid $where - it runs JavaScript and can't use indexes
// Bad
db.orders.find({ $where: "this.amount > 100" });
// Good
db.orders.find({ amount: { $gt: 100 } });
For aggregation pipelines, always put $match before $group to filter early.
| Feature | MongoDB | Relational DB |
|---|---|---|
| Data model | Documents | Tables / Rows |
| Schema | Flexible | Fixed |
| Joins | Embedding or $lookup | Foreign keys + JOINs |
| Scaling | Horizontal (sharding) | Typically vertical |
| Transactions | Multi-document (v4.0+) | Full ACID standard |
MongoDB works best when your data is naturally document-shaped and relationships are either embedded or looked up by ID. Relational databases are better for highly normalized data with complex multi-table relationships.
updateOne() applies update operators to specific fields - everything else in the document stays unchanged.
replaceOne() replaces the entire document with the new one you provide (the _id stays the same).
// updateOne - only 'role' changes
db.users.updateOne({ name: "Alice" }, { $set: { role: "Manager" } });
// replaceOne - entire document is replaced
db.users.replaceOne(
{ name: "Alice" },
{ name: "Alice", role: "Manager", updatedAt: new Date() },
);
These come up in senior roles and system design rounds. They test whether you understand how MongoDB handles scale, reliability, and data integrity at the infrastructure level. If you want broader architecture context alongside these, the 50 System Design Patterns guide covers the distributed patterns that underpin sharding, replication, and event-driven data flows.
Yes. Single-document operations have always been atomic. Multi-document ACID transactions were added in version 4.0 for replica sets and extended to sharded clusters in version 4.2.
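// Assumes `client` is a connected MongoClient, `db` comes from client.db(...),
// and this code runs inside an async function
const session = client.startSession();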
session.startTransaction();
try {
await db
.collection("accounts")
.updateOne({ _id: senderId }, { $inc: { balance: -500 } }, { session });
await db
.collection("accounts")
.updateOne({ _id: receiverId }, { $inc: { balance: 500 } }, { session });
await session.commitTransaction();
} catch (err) {
await session.abortTransaction();
throw err;
} finally {
await session.endSession();
}
Use transactions when multiple writes must all succeed or all fail together - payments, inventory deductions, order creation. For single-document writes, skip the transaction overhead.
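The all-or-nothing guarantee can be illustrated without a database. In this plain-JavaScript sketch, changes are staged on a copy and only published if every step succeeds; `transfer` and the account names are hypothetical, and the model ignores isolation between concurrent writers.

```javascript
// All-or-nothing transfer on in-memory balances.
// Changes are staged on a copy and only published if every step succeeds.
function transfer(accounts, fromId, toId, amount) {
  const staged = { ...accounts }; // work on a copy

  if (!(fromId in staged) || !(toId in staged)) {
    throw new Error("unknown account");
  }
  staged[fromId] -= amount;
  if (staged[fromId] < 0) {
    throw new Error("insufficient funds"); // "abort": original is untouched
  }
  staged[toId] += amount;

  return staged; // "commit": caller swaps in the new state
}

let accounts = { alice: 700, bob: 100 };
accounts = transfer(accounts, "alice", "bob", 500);
console.log(accounts); // { alice: 200, bob: 600 }

try {
  accounts = transfer(accounts, "alice", "bob", 500); // would overdraw
} catch (err) {
  console.log(err.message, accounts); // insufficient funds { alice: 200, bob: 600 }
}
```

The failed second transfer leaves both balances exactly as they were, which is the property the session-based code relies on abortTransaction() to provide.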
A replica set is a group of MongoDB nodes that hold the same data. One is the primary (handles all writes). The rest are secondaries (replicate from the primary asynchronously).
If the primary goes down, the secondaries elect a new one automatically. Applications reconnect without manual intervention.
Client → Primary ──replicates──> Secondary 1
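                 └──replicates──> Secondary 2
What replica sets give you: high availability through automatic failover, and redundancy, because every node holds a full copy of the data.
Use at least three nodes in production (1 primary + 2 secondaries) so that a majority can still elect a new primary if one node fails.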
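The three-node recommendation follows from simple majority arithmetic, sketched here in plain JavaScript. The function names are hypothetical, and the model assumes every member has one vote.

```javascript
// Votes needed to elect a primary, and how many members can fail
// while an election is still possible.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
// 2 2 0
// 3 2 1
// 4 3 1
// 5 3 2
```

Two nodes tolerate no failures at all, and a fourth node adds no tolerance over three, which is why odd member counts are the usual choice.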
Sharding is MongoDB's horizontal scaling strategy. It splits a collection across multiple servers (shards) based on a shard key. A mongos router sits between the application and the shards, directing each query to the right one.
Client → mongos → Shard 1 (userId: 0–999)
                → Shard 2 (userId: 1000–1999)
                → Shard 3 (userId: 2000+)
// Enable sharding on a database
sh.enableSharding("myApp");
// Shard a collection by userId
sh.shardCollection("myApp.orders", { userId: 1 });
Choosing a shard key is critical. For example, if most queries filter by userId, shard on userId.
| | Replication | Sharding |
|---|---|---|
| Purpose | High availability | Horizontal scalability |
| Data per node | Full copy | Partial (one shard's slice) |
| Primary concern | Failover, redundancy | Handling data volume at scale |
| Use when | You need uptime guarantees | Your data outgrows one server |
In production, you use both: each shard is itself a replica set.
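To show why the shard key matters for routing, here is a plain-JavaScript model of the range-based layout from the three-shard diagram. The ranges, shard names, and functions are hypothetical; a real mongos works from chunk metadata held by the config servers.

```javascript
// Range-based routing modeled on a three-shard layout.
// The ranges and shard names are hypothetical.
const chunks = [
  { shard: "shard1", min: 0, max: 1000 },        // userId 0-999
  { shard: "shard2", min: 1000, max: 2000 },     // userId 1000-1999
  { shard: "shard3", min: 2000, max: Infinity }, // userId 2000+
];

// Query includes the shard key: route to exactly one shard
function routeByUserId(userId) {
  const chunk = chunks.find((c) => userId >= c.min && userId < c.max);
  if (!chunk) throw new Error(`no chunk covers userId ${userId}`);
  return [chunk.shard];
}

// Query without the shard key: every shard must be asked
function routeWithoutShardKey() {
  return chunks.map((c) => c.shard);
}

console.log(routeByUserId(1500));    // [ 'shard2' ]
console.log(routeWithoutShardKey()); // [ 'shard1', 'shard2', 'shard3' ]
```

A query that filters on the shard key touches one shard; one that does not fans out to all of them, which is the cost a well-chosen key avoids.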
TTL (Time To Live) indexes tell MongoDB to automatically delete documents after a set number of seconds.
// Expire sessions 1 hour after creation
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
// The indexed field must be a Date
db.sessions.insertOne({
userId: "abc",
token: "xyz123",
createdAt: new Date(),
});
// → document auto-deleted ~60 minutes after createdAt
The TTL cleanup runs as a background job every 60 seconds, so deletion can lag by a minute or more after expiry, depending on server load. Don't use this for sub-minute precision.
Common uses: session stores, OTP codes, password reset tokens, temporary upload metadata, rate limiting records.
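The expiry rule itself is a simple comparison, sketched below in plain JavaScript. `isExpired` is a hypothetical helper that models the condition only, not the background job that performs the deletes.

```javascript
// A document becomes eligible for deletion once
// createdAt + expireAfterSeconds is in the past.
function isExpired(doc, expireAfterSeconds, now = new Date()) {
  if (!(doc.createdAt instanceof Date)) return false; // non-Date values never expire
  return doc.createdAt.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const session = { userId: "abc", createdAt: new Date("2026-01-01T00:00:00Z") };

console.log(isExpired(session, 3600, new Date("2026-01-01T00:59:59Z"))); // false
console.log(isExpired(session, 3600, new Date("2026-01-01T01:00:00Z"))); // true
```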
Schema validation lets you define rules that documents must satisfy before being inserted or updated. It uses JSON Schema syntax.
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "email", "role"],
properties: {
name: { bsonType: "string", minLength: 2 },
email: { bsonType: "string", pattern: "^.+@.+\\..+$" },
role: { enum: ["admin", "editor", "viewer"] },
},
},
},
validationAction: "error", // Reject invalid docs (vs "warn" which logs but allows)
});
Practical note: Schema validation is a middle ground between "anything goes" and a fully rigid relational schema. Use it to enforce critical invariants - required fields, type correctness, enum values - while still letting other fields vary freely between documents.
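To make the rules tangible, here is a hand-written plain-JavaScript check that mirrors the three constraints in the validator (required fields, a minimum-length string, an email pattern, and a role enum). It is an illustration only: MongoDB evaluates $jsonSchema on the server, and `validateUser` is a hypothetical function.

```javascript
// Hand-written check mirroring the validator rules for name, email, and role.
// Illustrative only; MongoDB evaluates $jsonSchema on the server.
const allowedRoles = ["admin", "editor", "viewer"];
const emailPattern = /^.+@.+\..+$/;

function validateUser(doc) {
  const errors = [];
  for (const field of ["name", "email", "role"]) {
    if (!(field in doc)) errors.push(`${field} is required`);
  }
  if ("name" in doc && (typeof doc.name !== "string" || doc.name.length < 2)) {
    errors.push("name must be a string of at least 2 characters");
  }
  if ("email" in doc && (typeof doc.email !== "string" || !emailPattern.test(doc.email))) {
    errors.push("email must match the pattern");
  }
  if ("role" in doc && !allowedRoles.includes(doc.role)) {
    errors.push("role must be admin, editor, or viewer");
  }
  return errors;
}

// A valid document produces no errors; extra fields are allowed
console.log(validateUser({ name: "Al", email: "al@example.com", role: "admin", team: "x" }));

// Missing email, too-short name, and an unknown role produce three errors
console.log(validateUser({ name: "A", role: "owner" }));
```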
Change streams let your application listen to real-time changes in a collection, database, or entire cluster. They're built on MongoDB's internal replication log (oplog) and work on both replica sets and sharded clusters.
// Watch for inserts and updates on the 'orders' collection
const changeStream = db.collection("orders").watch();
changeStream.on("change", (event) => {
if (event.operationType === "insert") {
sendPushNotification(event.fullDocument.userId);
}
if (event.operationType === "update") {
invalidateCache(event.documentKey._id);
}
});
Common uses: live dashboards, push notifications, syncing to Elasticsearch, streaming events to Kafka, cache invalidation.
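Keeping the decision logic separate from the stream makes handlers easy to test without a running cluster. The sketch below is a pure function over event objects shaped like the ones used in the listener; the action names are hypothetical stand-ins for real side effects.

```javascript
// Decide what to do with a change event, as a pure function.
// The action names are hypothetical stand-ins for real side effects.
function planAction(event) {
  switch (event.operationType) {
    case "insert":
      return { action: "notify", userId: event.fullDocument.userId };
    case "update":
      return { action: "invalidateCache", id: event.documentKey._id };
    default:
      return { action: "ignore" };
  }
}

console.log(planAction({ operationType: "insert", fullDocument: { userId: "u1" } }));
// { action: 'notify', userId: 'u1' }
console.log(planAction({ operationType: "delete", documentKey: { _id: 7 } }));
// { action: 'ignore' }
```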
MongoDB Atlas is MongoDB's fully managed cloud database. Instead of running your own cluster, Atlas handles provisioning, automated backups, monitoring, auto-scaling, and security patching.
It runs on AWS, GCP, and Azure, and includes features such as Global Clusters and Atlas Vector Search.
For most teams, Atlas removes a significant ops burden. Self-hosting makes sense only if you have specific compliance requirements or want full infrastructure control.
Not every question in this guide applies to every role. Here's how interviewers actually calibrate by experience.
Interviewers at this level want to know you understand the basics before they teach you anything on the job. Expect questions from Section 1: what a document is, how collections differ from tables, what BSON is, and how to run basic CRUD operations (insertOne, find, updateOne, deleteOne).
Know the _id field, understand that MongoDB doesn't enforce a schema by default, and be able to explain at a high level why someone would choose MongoDB over a relational database. You don't need to know sharding or replica sets - but knowing what indexing is and why it matters will set you apart from other freshers.
Topics to focus on: Q1–Q12 in this guide.
At three years, interviewers expect you to have used MongoDB in a real project and made real decisions - not just written queries. They'll ask about indexing strategy, the aggregation pipeline with multiple stages, embedding vs. referencing trade-offs, and update operators.
Expect to be given a slow query scenario and asked how you'd diagnose it (explain()) and fix it (indexes, projections). Know the difference between find() and aggregate(), and when you'd pick one over the other. Capped collections and schema validation are fair game here too.
Topics to focus on: Q13–Q22 in this guide.
Five-year interviews go into architecture and infrastructure. You'll be asked about replica sets and how failover works, the difference between sharding and replication, ACID transactions and when to use them, and TTL indexes for automatic data expiry.
Interviewers also probe your decision-making: how you'd pick a shard key, what you'd do if a query is slow on a 50M-document collection, or how you'd design a schema for an e-commerce platform. Know change streams if you've worked on real-time or event-driven systems.
Topics to focus on: Q23–Q30 in this guide.
At 10 years, the technical questions are table stakes. What interviewers actually probe is your system design judgment - why you chose MongoDB for a specific part of an architecture and not something else, how you'd handle multi-region data with Atlas Global Clusters, how you'd architect an event pipeline using change streams and Kafka, or how you'd approach a Vector Search setup for an AI-powered feature.
Expect to defend your past decisions: "You used MongoDB here - what would have broken if you'd used Postgres instead?" These aren't trick questions; they're checking whether you understand the trade-offs deeply enough to mentor others.
Topics to focus on: Q23–Q30 plus system design context from 50 System Design Patterns.
A document is a single record - a JSON-like object holding your data. A collection is a container that groups related documents, analogous to a table. One collection holds many documents; one database holds many collections.
Not exactly. Documents don't need to share the same structure, but you can enforce rules using schema validation. Most teams also define an implicit schema in their application layer via ODMs like Mongoose. "Schema-less" means MongoDB doesn't enforce structure by default - not that you shouldn't define one.
Use MongoDB when your data is document-shaped, your schema evolves frequently, or you need horizontal scaling without complex setup. Use PostgreSQL when you need strong relational constraints, complex multi-table joins across many entities, or full ACID semantics across the board. Many production systems use both.
Freshers are typically asked about documents, collections, BSON, the _id field, and basic CRUD - insertOne, find, updateOne, deleteOne. You should also be able to explain what MongoDB is and why someone would use it over a relational database. That covers the ground most entry-level interviewers actually test.
For 3–5 years of experience, expect indexing strategy, the aggregation pipeline ($match, $group, $lookup), embedding vs. referencing, and query optimization with explain(). For senior roles (5+ years), add ACID transactions, replica sets, sharding, shard key selection, TTL indexes, and change streams.
GitHub has community-maintained lists - search "mongodb interview questions" on GitHub directly. InterviewBit covers MongoDB in their database section with MCQ and written-answer formats. LeetCode doesn't have MongoDB-specific problems, but their database section covers relational query logic that maps well to MongoDB aggregation concepts. For real interview experiences and what companies actually ask, Reddit's r/mongodb and r/cscareerquestions are more current than any static list.
Most PDF versions floating around are outdated - MongoDB releases major versions frequently and older PDFs miss recent features like multi-shard transactions (4.2), change streams, schema validation improvements, and Atlas Vector Search. Use this guide and bookmark it; it's kept current as the database evolves.
Map-Reduce is an older approach to large-scale data processing - a map function emits key-value pairs, and a reduce function aggregates them. The aggregation pipeline has replaced it for all practical purposes: it's faster, easier to read, and can use indexes. Map-Reduce has been deprecated since MongoDB 5.0 and is rarely used in new code.
mongodump creates a binary export of your MongoDB data. mongorestore loads it back into a MongoDB instance. These are the standard tools for manual backups and database migrations. If you're on Atlas, automated backups are handled for you.