In this post we use our previously created shard and start to segment our data by location.

This is quite a common scenario. The idea is to split data by one of the properties. This will give us some insight into how sharding can be configured.

Our goal is two locations (EU and US) served by multiple shards that together hold all the data. To keep this example simple, we will use only two shards in total, one per location.

Sharding by location

Foundations

Let's tackle this in detail. Last time we finished by checking the shard status on the router (mongos) with this command:

 sh.status();

I highly recommend reading more about data partitioning before you start to configure it.

The first thing we need to create is a shard tag, which is basically a name used internally to differentiate between chunks - the parts of data moved between shards by the balancer.

Shard Tags

In our example we will have simple location-based chunks: one for the US location and one for the EU location, mapped 1:1 to the existing replica sets rs1 and rs2.

sh.addShardTag("rs1", "US");
sh.addShardTag("rs2", "EU");

Indexing setup

Now let's switch to the test database and create a collection called sample with an index on the location and factoryId properties.

use test
db.createCollection("sample");
db.sample.createIndex( { location: 1, factoryId: 1 } );
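
To confirm the index is in place before we shard the collection, we can list the indexes:

// optional: the compound index on location and factoryId should be listed here
db.sample.getIndexes();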

The next step is to match our shard tags with the content of the location property.

sh.addTagRange(
  "test.sample",
  { "location" : "US", "factoryId" : MinKey },
  { "location" : "US", "factoryId" : MaxKey },
  "US"
);

sh.addTagRange(
  "test.sample",
  { "location" : "EU", "factoryId" : MinKey },
  { "location" : "EU", "factoryId" : MaxKey },
  "EU"
);
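
The tag ranges also end up in the config database, so if a range ever looks wrong, this is the place to inspect (again an optional check):

// optional: inspect the tag ranges stored in the config metadata
db.getSiblingDB("config").tags.find().pretty();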

Sharding setup

The next step is to enable sharding for the test database and then shard the test.sample collection using this index as the shard key.

sh.enableSharding("test");
sh.shardCollection("test.sample", { location: 1, factoryId: 1 });
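
To confirm that the collection is now registered as sharded with the expected shard key, we can look it up in the config metadata (optional check):

// optional: the document should show key: { location: 1, factoryId: 1 }
db.getSiblingDB("config").collections.findOne({ _id: "test.sample" });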

Balancing setup and sample data

The last configuration step is to enable balancing for the test.sample collection.

sh.enableBalancing("test.sample");
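
Note that this only allows the balancer to move chunks of this particular collection; the cluster-wide balancer is started separately further down. The global balancer state can be checked at any point with:

// optional: returns true if the cluster-wide balancer is enabled
sh.getBalancerState();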

To test our sharding setup, let's add some data to the test.sample collection.

db.sample.insert({
    "_id" : ObjectId("5787936b94afebe02398521a"),
    "location": "US",
    "factoryId": NumberInt(0),
    "__v" : 0
});

db.sample.insert({
    "_id" : ObjectId("5787a08c94afebe023985224"),
    "location": "EU",
    "factoryId": NumberInt(1),
    "__v" : 0
});
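
Before starting the balancer it is also worth confirming that queries on the shard key prefix are routed correctly. A minimal check (the exact explain output differs between MongoDB versions) could be:

// optional: the plan should show the query being routed only to the shard(s) owning the US chunks
db.sample.find({ "location": "US" }).explain();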

Now we need to start the balancer manually so that it moves the chunks into the right places.

sh.startBalancer();
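
To see whether a balancing round is currently in progress, there is a small helper:

// optional: returns whether a balancing round is running right now
sh.isBalancerRunning();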

Tests and afterthoughts

From the mongos perspective, all data in a collection is seen as one set. This is especially important from the write concern perspective. More on that topic will appear in future posts.
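
As a small illustration, a write concern can be passed per operation from mongos; the document and write concern values below are only examples, not a recommendation:

// illustrative document and write concern values
db.sample.insert(
    { "location": "US", "factoryId": NumberInt(2), "__v" : 0 },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
);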

To verify the setup, let's check the shard status again.

sh.status()

The output should look like this.

--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"minCompatibleVersion" : 5,
	"currentVersion" : 6,
	"clusterId" : ObjectId("5b7ea79486ef612a6d0ba831")
}
  shards:
	{  "_id" : "rs1",  "host" : "rs1/172.18.0.10:27017,172.18.0.2:27017,172.18.0.5:27017",  "state" : 1,  "tags" : [ "US" ] }
	{  "_id" : "rs2",  "host" : "rs2/172.18.0.3:27017,172.18.0.7:27017,172.18.0.9:27017",  "state" : 1,  "tags" : [ "EU" ] }
  active mongoses:
	{  "_id" : "b507bfe5de13:27017",  "advisoryHostFQDNs" : [ ],  "mongoVersion" : "4.0.1",  "ping" : ISODate("2018-08-23T12:52:37.017Z"),  "up" : NumberLong(1662),  "waiting" : true }
 autosplit:
	Currently enabled: yes
  balancer:
	Currently enabled:  yes
	Currently running:  no
	Failed balancer rounds in last 5 attempts:  4
	Last reported error:  field names of bound { location: "EU", factoryId: MaxKey } do not match those of keyPattern { _id: 1.0, location: 1.0 }
	Time of Reported error:  Thu Aug 23 2018 14:51:36 GMT+0200 (Central European Standard Time)
	Migration Results for the last 24 hours: 
		7 : Success
  databases:
	{  "_id" : "test",  "primary" : "rs1",  "partitioned" : true,  "version" : {  "uuid" : BinData(4,"pRy7rylRRH6+MsrI2tH84w=="),  "lastMod" : 1 } }
		test.sample
			shard key: { "location" : 1, "factoryId" : 1 }
			unique: false
			balancing: true
			chunks:
				rs1	4
				rs2	1
			{ "location" : { "$minKey" : 1 }, "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$minKey" : 1 } } on : rs1 Timestamp(2, 1) 
			{ "location" : "EU", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$maxKey" : 1 } } on : rs2 Timestamp(2, 0) 
			{ "location" : "EU", "factoryId" : { "$maxKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$minKey" : 1 } } on : rs1 Timestamp(1, 3) 
			{ "location" : "US", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$maxKey" : 1 } } on : rs1 Timestamp(1, 4) 
			{ "location" : "US", "factoryId" : { "$maxKey" : 1 } } -->> { "location" : { "$maxKey" : 1 }, "factoryId" : { "$maxKey" : 1 } } on : rs1 Timestamp(1, 5) 
			 tag: EU  { "location" : "EU", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "EU", "factoryId" : { "$maxKey" : 1 } }
			 tag: US  { "location" : "US", "factoryId" : { "$minKey" : 1 } } -->> { "location" : "US", "factoryId" : { "$maxKey" : 1 } }


We can also check the data on rs1 at mongo-1-1:27017 and on rs2 at mongo-2-1:27017 to confirm that, in the end, the documents landed in the right places.

db.sample.find();
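
Alternatively, the distribution can be inspected directly from mongos, without connecting to the individual replica sets:

// optional: shows per-shard document and chunk statistics for the collection
db.sample.getShardDistribution();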

In the updated repository at https://github.com/senssei/mongo-cluster-docker, in the queries directory, there is an init.js script that configures all of this automatically.

I hope this post has helped you with MongoDB sharding. Stay tuned for more content from this site. Feedback is always welcome.

23.08.2018 Updated