In this post we build on our previously created shard and start to segment our data by location.
This is quite a common scenario: data is split by one of its properties, which will give us some insight into how sharding can be configured. Our goal is two locations (EU and US) backed by shards that together hold all the data. To keep this example simple, we will use one shard per location, two shards in total.
Foundations
Let's tackle this in detail. Last time we finished by checking the shard status with this command on the router (mongos):
sh.status();
I highly recommend reading more about data partitioning before you start to configure it.
The first thing we need to create is a Shard Tag, which is basically a name used internally to differentiate between chunks - the parts of data moved between shards by the balancer.
Shard Tags
In our example, we will have simple location-based chunks: one for the US location and one for the EU location, mapped 1:1 to the existing replica sets rs1 and rs2.
sh.addShardTag("rs1", "US");
sh.addShardTag("rs2", "EU");
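Note that since MongoDB 3.4 shard tags are exposed as zones, and the tag helpers above are aliases for the zone API. A minimal equivalent sketch, using the same shard and zone names:

```javascript
// Associate each replica set with a zone (equivalent to sh.addShardTag).
sh.addShardToZone("rs1", "US");
sh.addShardToZone("rs2", "EU");
```

Both styles end up in the same cluster metadata, so pick whichever matches your MongoDB version's documentation.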
Indexing setup
Now let's switch to the test database and create a collection called sample with an index on the location and factoryId properties.
db.createCollection("sample");
db.sample.createIndex( { location: 1, factoryId: 1 } );
The next step is to match our Shard Tags with the content of the location property.
sh.addTagRange(
"test.sample",
{ "location" : "US", "factoryId" : MinKey },
{ "location" : "US", "factoryId" : MaxKey },
"US"
);
sh.addTagRange(
"test.sample",
{ "location" : "EU", "factoryId" : MinKey },
{ "location" : "EU", "factoryId" : MaxKey },
"EU"
);
Config sharding
The next step is to enable sharding on the whole database and to shard the test.sample collection on this index.
sh.enableSharding("test");
sh.shardCollection("test.sample", { location: 1, factoryId: 1 });
Balancing setup and sample data
The last thing is to enable balancing for the collection.
sh.enableBalancing("test.sample");
To test our sharding setup, let's add some data to the test.sample collection.
db.sample.insert({
"_id" : ObjectId("5787936b94afebe02398521a"),
"location": "US",
"factoryId": NumberInt(0),
"__v" : 0
});
db.sample.insert({
"_id" : ObjectId("5787a08c94afebe023985224"),
"location": "EU",
"factoryId": NumberInt(1),
"__v" : 0
});
Now we need to manually start the balancer to move chunks into the right places.
sh.startBalancer();
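If you want to verify that balancing is actually enabled and whether a round is currently in progress, the shell has helpers for that. A quick check, run on mongos:

```javascript
// Returns true/false depending on whether the balancer is enabled.
sh.getBalancerState();
// Returns true while a balancing round is in progress.
sh.isBalancerRunning();
```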
Tests and afterthoughts
From the mongos perspective, all data in a collection is seen as a single set. This is especially important from the Write Concern perspective. More on that topic will appear in future posts.
To verify the setup, let's check the shard status again.
sh.status()
The output should look like this.
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5b7ea79486ef612a6d0ba831")
}
shards:
{ "_id" : "rs1", "host" : "rs1/172.18.0.10:27017,172.18.0.2:27017,172.18.0.5:27017", "state" : 1, "tags" : [ "US" ] }
{ "_id" : "rs2", "host" : "rs2/172.18.0.3:27017,172.18.0.7:27017,172.18.0.9:27017", "state" : 1, "tags" : [ "EU" ] }
active mongoses:
{ "_id" : "b507bfe5de13:27017", "advisoryHostFQDNs" : [ ], "mongoVersion" : "4.0.1", "ping" : ISODate("2018-08-23T12:52:37.017Z"), "up" : NumberLong(1662), "waiting" : true }
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 4
Last reported error: field names of bound { location: "EU", factoryId: MaxKey } do not match those of keyPattern { _id: 1.0, location: 1.0 }
Time of Reported error: Thu Aug 23 2018 14:51:36 GMT+0200 (Central European Standard Time)
Migration Results for the last 24 hours:
7 : Success
databases:
{ "_id" : "test", "primary" : "rs1", "partitioned" : true, "version" : { "uuid" : BinData(4,"pRy7rylRRH6+MsrI2tH84w=="), "lastMod" : 1 } }
test.sample
shard key: { "location" : 1, "factoryId" : 1 }
unique: false
balancing: true
chunks:
rs1	4
rs2	1
{ "location" : { "$minKey" : 1 }, "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$minKey" : 1 } } on : rs1 Timestamp(2, 1)
{ "location" : "EU", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$maxKey" : 1 } } on : rs2 Timestamp(2, 0)
{ "location" : "EU", "factoryId" : { "$maxKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$minKey" : 1 } } on : rs1 Timestamp(1, 3)
{ "location" : "US", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$maxKey" : 1 } } on : rs1 Timestamp(1, 4)
{ "location" : "US", "factoryId" : { "$maxKey" : 1 } } -->> { "location" : { "$maxKey" : 1 }, "factoryId" : { "$maxKey" : 1 } } on : rs1 Timestamp(1, 5)
tag: EU { "location" : "EU", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$maxKey" : 1 } }
tag: US { "location" : "US", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$maxKey" : 1 } }
We can also check the data on rs1 (at mongo-1-1:27017) and rs2 (at mongo-2-1:27017) to confirm that, in the end, documents landed in the right places.
db.sample.find();
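Besides connecting to each replica set directly, you can also ask mongos itself how documents are spread and which shard a query hits. A sketch, assuming the two documents inserted above:

```javascript
// Per-shard document and chunk statistics for the sharded collection.
db.sample.getShardDistribution();

// Explain a location-scoped query; since the shard key prefix is in the
// filter, the plan should target a single shard rather than broadcasting.
db.sample.find({ location: "EU" }).explain();
```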
In the updated repository at https://github.com/senssei/mongo-cluster-docker, the queries directory contains a script init.js that configures all of this for you automatically.
I hope that my post has helped you with MongoDB sharding. Stay tuned for more content from this site. Feedback is always welcome.
23.08.2018 Updated