Categories: .NET, Architecture | Posted by oleksii on 4/20/2012 4:02 PM

This post is a brief introduction to database sharding, with a bit of theory and a sample project. I chose RavenDB as the database server, primarily because it is an impressive project in itself and I wanted to play with it. The concept, however, applies to any other server, relational or non-relational.

The problem of high load

Some projects attract a lot of attention and, as a result, come under high load: a large number of users, huge network traffic, billions of bank transactions and so on. Every software operation has a cost, and hardware resources are finite. Sooner or later there will be a point where the hardware can no longer cope with the number of requests.

Solution

The easiest way to deal with high load is vertical scaling, i.e. adding more hardware: memory, processing power, disk space. This may make things run faster for a while. Unfortunately, if the load continues to increase, vertical scaling stops being a cure and a more scalable solution is needed.

Vertical scaling assumes that there is one very powerful server, the beast. In contrast, horizontal scaling promotes usage of many inexpensive machines. The load distribution is the key concept of horizontal scaling, and database sharding is one of the techniques of horizontal load balancing.

Sharding can be defined as the procedure of breaking a large database into smaller independent ones. The idea is to denormalize the data and split a highly loaded database into several independent chunks. Denormalization makes the chunks independent of each other, so a single query needs to hit only one database shard. On the other hand, denormalization introduces data duplication, because the same information may need to be inserted into several shards to keep the data consistent. As you can see, there is always a trade-off between consistency and scalability.

For more information on database sharding concepts see an article on code futures.

Sample overview

To demonstrate the sharding technique with a NoSQL database, I created a very simplified model of Twitter. In my model there are users and tweets. Each user has a dynamic array of tweets, so he/she can add or remove text messages. Assume that, in order to provide a consistently fast user experience, one database can handle 10 users at most. This means I should store 10 users per database. Every user has an id taken from a global counter. Based on this counter I store users with id 1 - 10 on shard 1, users with id 11 - 20 on shard 2 and so on. In this sample I have 3 shards and support 30 users.

This is a step-by-step guide to get the sample working:

  1. Running servers. To start the RavenDB servers, you can run
    \tools\RavenDB-Build-800\Just-Servers\Start.cmd 
    This should open three console applications, one for each RavenDB server
  2. Running client
    1. Either open the solution in Visual Studio 2010 and run the project
      or
    2. Run the client binary directly from
      \bin\RavenDbSharding.exe
  3. Take a look at the RavenDB servers to see which request goes to which server.

   Users are stored in the following shards:
      Users with id 1 - 10 are stored in shard "user_1_10"
      Users with id 11 - 20 are stored in shard "user_10_20"
      Users with id 21 - 30 are stored in shard "user_20_30"
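
The id-to-shard rule described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the code from the sample project and not a RavenDB API: the class ShardMap, its constants and the method ShardNumberFor are hypothetical names, and the values 10 and 3 are the assumptions stated in the sample overview.

```csharp
using System;

// Hypothetical helper, not part of RavenDB or of the sample project.
public static class ShardMap
{
    // Assumptions taken from the sample: 10 users per shard, 3 shards, ids start at 1.
    public const int UsersPerShard = 10;
    public const int ShardCount = 3;

    // Returns the 1-based shard number that holds the given user id.
    public static int ShardNumberFor(int userId)
    {
        int maxId = UsersPerShard * ShardCount;
        if (userId < 1 || userId > maxId)
        {
            throw new ArgumentOutOfRangeException(
                "userId", "Expected an id between 1 and " + maxId);
        }

        // Integer division groups ids 1-10, 11-20 and 21-30 into shards 1, 2 and 3.
        return (userId - 1) / UsersPerShard + 1;
    }
}
```

For example, ShardMap.ShardNumberFor(10) gives 1 and ShardMap.ShardNumberFor(11) gives 2. A range-based rule like this is easy to reason about, but supporting more users means adding shards and extending the mapping.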

RavenDB API notes 

To load all users from all shards, RavenDB provides the same API as if there were only one database, thanks to its interface-oriented design. The developer doesn't need to know the actual location of each user (be it shard 1 or shard N). Internally RavenDB understands that the IDocumentStore is a ShardedDocumentStore and picks up the implementation of IShardSelectionStrategy. In this sample, to resolve the location of a user, RavenDB checks the user's id, which is uniquely mapped to a shard.

If I want to retrieve data for a given user, I use exactly the same code as if there were no shards. The whole idea is to keep the client-side code unaware of where the item is stored. Simplicity is beauty.

Sample output (trimmed)

The client first generates sample data using FizzWare.NBuilder, then saves it into the 3 RavenDB shards. Lastly, the client queries the shards and displays all users, followed by all tweets of one randomly selected user.

FirstName1 LastName1
 Subscribes count: 1
 Tweets count: 24
---------
FirstName2 LastName2
 Subscribes count: 28
 Tweets count: 25
---------
...
---------
FirstName30 LastName30
 Subscribes count: 4
 Tweets count: 9
---------
===================================================
Tweets by FirstName2 LastName2
dolore magna et et dolor amet elit lorem lorem amet dolor magna consectetur et a
met consectetur tempor et
...
consectetur dolore
===================================================

RavenDB servers

Expected output of the three shards follows below

Note of advice

  1. "Get all records" is a killer. Client-side code as well as server code must be safe by default. For example, the RavenDB server by design returns 1024 objects at most, and the client side only allows 128 objects by default. If the user needs more data, a paged query should be used (Take and Skip). After all, no search engine returns all the results for a query at once, so why should the user be flooded with billions of objects?
  2. IO operations are expensive. Where possible, avoid touching the hard drive. To achieve this one can
    •  use caching
    •  hold the whole database in memory
  3. RavenDB is an actively developed project and breaking changes are introduced every so often, so this sample project may not work with versions later than 800.
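
The paging advice in point 1 can be illustrated with the standard LINQ Skip and Take operators. The sketch below pages over an in-memory sequence only, so it makes no claim about how RavenDB executes such a query. Paging.GetPage is a hypothetical helper name, and the zero-based page index is my assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that demonstrates Skip/Take paging on any sequence.
public static class Paging
{
    // Returns a single page of items; pageIndex is zero-based.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        // Skip the items of all previous pages, then take at most one page.
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

With 30 user ids and a page size of 10, Paging.GetPage(Enumerable.Range(1, 30), 1, 10) returns ids 11 to 20, and asking for a page past the end returns an empty list rather than failing.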

Additional resources

Download this project

This project makes use of several open source projects. The corresponding acknowledgements and licences are provided with the download.

From this web site:
RavenDbSharding.zip (15.22 mb) [Downloads: 22]
MD5: cf43b8afe1148ca32d88660bfca9b85d
SHA-1: 9d76bf443ea0bc87b3a6ed692dfdd51e7b080c51
SHA-256: 633c71ec9a9e15dc64139ae969d05da775f195ca9b557da4d443dd93ea336165


Or get it from github:

If you enjoyed this post, make sure you subscribe to my RSS feed!

Categories: .NET, Events | Posted by oleksii on 3/1/2012 5:28 PM

Visual Studio 2011 Beta was released yesterday. This seems like a good moment to start looking into its new features. VS comes with the new .NET Framework 4.5 and C# 5. These are just the main points of interest to me right now; there are lots of other exciting tech bits and bobs, see the official release blog entry by Jason Zander.

I guess some folks have already seen the async/await syntax in the community technology preview. If not, here is a simple hello-kinda-world async sample (many more samples).

I start with a basic worker class that takes an Action delegate (no input, no output, it just does the work provided through a lambda expression) and executes it asynchronously with the help of the async/await keywords. If you wonder what actually happens behind the scenes, the answer lies in the code that the compiler generates.

The compiler recognises the async and await keywords and rewrites the method into a state machine. At run time, when execution reaches an await on a task that has not finished yet, the rest of the method is registered as a continuation of that task and control returns to the caller. If the awaited task has already completed, the method simply carries on synchronously. NB await itself does not create threads. In this sample it is Task.Factory.StartNew that queues the work to the thread pool; a task created with the TaskCreationOptions.LongRunning hint may get a dedicated thread instead. Clever enough and pretty neat syntax, hm?

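using System;
using System.Threading;       // SpinWait, used by the usage sample
using System.Threading.Tasks; // Task

public class Worker
{
    // Returns Task rather than void so that callers can await it and observe exceptions.
    public async Task DoWorkAsync(Action action)
    {
        Console.WriteLine("  Before DoWorkAsync");
        // Queue the action to the thread pool and resume here once it completes.
        await Task.Factory.StartNew(action);
        Console.WriteLine("  After DoWorkAsync");
    }
}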

Here is a sample usage; take a quick look at step 2, where the work is actually done. SpinWait is a struct for busy-waiting: SpinOnce burns a few CPU cycles instead of blocking, and after a number of iterations it starts yielding the thread so that other threads can run. It is widely used in the non-blocking concurrent collections.

class Program
{
    static void Main(string[] args)
    {
        //Step 1
        Console.WriteLine("Init");
        var sl = new SpinWait();
        int max = 100000;
        Worker w = new Worker();

        //Step 2
        Console.WriteLine("Start worker");
        w.DoWorkAsync(() =>
            {
                Console.WriteLine("  Before actual work");
                for (int i = 0; i < max; i++)
                {
                    sl.SpinOnce();
                }
                Console.WriteLine("  After actual work");
            });

        //Step 3
        Console.WriteLine("Returned to main");
        Console.WriteLine("Block");
        Console.Read();
    }
}

Check out the output and see that the execution flow returns to the main thread right after it hits the await.

Init
Start worker
  Before DoWorkAsync
Returned to main
Block
  Before actual work
  After actual work
  After DoWorkAsync
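
The difference between awaiting a finished task and awaiting a pending one can be made visible with a small experiment. This is a minimal sketch written for illustration, not part of the original sample: AwaitDemo, StepAsync and the log messages are made-up names, and a TaskCompletionSource stands in for real work so that the order of events does not depend on timing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo class; records the order in which things happen.
public static class AwaitDemo
{
    private static readonly List<string> Log = new List<string>();

    private static async Task StepAsync(Task awaited, string name)
    {
        Log.Add(name + ": before await");
        await awaited;
        Log.Add(name + ": after await");
    }

    public static List<string> Run()
    {
        Log.Clear();

        // Case 1: the awaited task is already complete, so the method
        // runs to its end before the caller gets control back.
        Task done = StepAsync(Task.FromResult(0), "completed");
        Log.Add("caller resumed after 'completed'");

        // Case 2: the awaited task is still pending, so the method
        // returns to the caller at the await.
        var pending = new TaskCompletionSource<bool>();
        Task waiting = StepAsync(pending.Task, "pending");
        Log.Add("caller resumed after 'pending'");

        // Completing the task lets the continuation run.
        pending.SetResult(true);
        Task.WaitAll(done, waiting);

        return new List<string>(Log);
    }
}
```

In the first case "completed: after await" is logged before the caller resumes; in the second case "pending: after await" is logged only once the task has been completed, after the caller has already resumed.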

A small exercise for the reader: what will happen if I remove the blocking call Console.Read?


Categories: .NET, Architecture | Posted by oleksii on 1/27/2012 1:39 PM

Consider this very simple code:

1	class Program
2	{
3		static void Main(string[] args)
4		{
5			AppDomain newDomain = AppDomain
6				.CreateDomain("MyDomain_1");
7		}
8	}

Question: how many domains are there at line 7, given that this code runs as a managed console application?

Think about it before continuing reading.

Without searching the Internet or digging through clever books, it is easy enough to check what is happening. The CLR comes with debugging extensions that let developers inspect code on the edge between managed and unmanaged. One of them is Son of Strike (SOS).

Let's get our hands dirty. I put a breakpoint at line 7 and start debugging in Visual Studio. Once the breakpoint is hit, I load SOS from the Immediate Window in Visual Studio. Loading the debugger extension often causes some trouble, so consult this post to troubleshoot SOS.

.load sos

This will hopefully load SOS and now let's run the command to get all AppDomains:

!DumpDomain

The output on my machine was

--------------------------------------
System Domain:      59da1478
LowFrequencyHeap:   59da1784
HighFrequencyHeap:  59da17d0
StubHeap:           59da181c
Stage:              OPEN
Name:               None
--------------------------------------
Shared Domain:      59da1140
LowFrequencyHeap:   59da1784
HighFrequencyHeap:  59da17d0
StubHeap:           59da181c
Stage:              OPEN
Name:               None
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...\mscorlib.dll]
ClassLoader:        00240ac0
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...\mscorlib.dll

--------------------------------------
Domain 1:           001f19f0
LowFrequencyHeap:   001f1d6c
HighFrequencyHeap:  001f1db8
StubHeap:           001f1e04
Stage:              OPEN
SecurityDescriptor: 001f3168
Name:               Test.exe
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll]
ClassLoader:        00240ac0
SecurityDescriptor: 0023b5a8
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll

Assembly:           0024b048 [C:\Projects\Private\Test\bin\Debug\Test.exe]
ClassLoader:        0024b0e8
SecurityDescriptor: 0024c0e0
  Module Name
003c2e9c            C:\Projects\Private\Test\bin\Debug\Test.exe

--------------------------------------
Domain 2:           0024f730
LowFrequencyHeap:   0024faac
HighFrequencyHeap:  0024faf8
StubHeap:           0024fb44
Stage:              OPEN
SecurityDescriptor: 00252c28
Name:               MyDomain_1
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll]
ClassLoader:        00240ac0
SecurityDescriptor: 002542c8
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll

Answer: as you can see, 4 domains are created, namely the system domain, the shared domain, one default user domain and the one that the user code creates additionally (lines 5-6). I found a really nice article describing what happens when the CLR starts and how the CLR bootstrapper loads domains.


Categories: .NET, Interview questions | Posted by oleksii on 1/1/2012 4:51 PM

This is one of the typical interview questions: when do I need to use a StringBuilder, and what would a simple implementation look like?

The problem with the System.String type is that it is actually an immutable reference type, although it seems to behave like a mutable value type. Immutability means that any write operation on a string necessarily creates a new object. For example, s1 = s1 + s2 does not modify the original s1; it creates a new object holding the concatenation of s1 and s2, and the old s1 object is left to the garbage collector once nothing references it. Immutability lets strings, which can be long (< 2GB in .NET 4), be shared safely without copying, which saves time and space, and maybe (this is a pure guess) it was easier to follow the successful example of Java strings.

The StringBuilder class has internal access to the string object and is useful for any string manipulation, especially when there are many operations. Before .NET 4 StringBuilder used a string object internally; since .NET 4 it uses char arrays. This class does not create copies of string objects but rather acts as if it works with one mutable string.

Because StringBuilder follows the Builder design pattern, any state-changing operation should return the object itself (this). The ToString() method acts as the Build() method, and since it is defined in System.Object there is no need to include ToString() in the contract.

public interface ISimpleStringBuilder
{
	ISimpleStringBuilder Append(string value);
	ISimpleStringBuilder Clear();
	int Length { get; }
	int Capacity { get; }
}

A very simple implementation of the builder class may look like this

public class SimpleStringBuilder : ISimpleStringBuilder
{
    private char[] _internalBuffer;

    public ISimpleStringBuilder Append(string value)
    {
        char[] data = value.ToCharArray();

        // make sure there is room for the additional data
        InternalEnsureCapacity(data.Length);

        // copy the new characters to the end of the used part of the buffer
        foreach (char t in data)
        {
            _internalBuffer[Length] = t;
            Length++;
        }

        // return the builder itself so that calls can be chained
        return this;
    }

    public override string ToString()
    {
        // use only the filled part of the buffer, not the trailing '\0' characters
        return new string(_internalBuffer, 0, Length);
    }
    ...    
}

This code is of course very basic, inefficient, not thread-safe, does no input validation and so on. It does, however, demonstrate the idea behind the StringBuilder class.
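
Since the listing above elides the buffer management, here is a self-contained sketch of how the missing pieces could fit together. It is written independently of the downloadable project: the class name GrowableStringBuilder, the default capacity of 16 and the doubling growth policy are my assumptions, not details taken from the original code or from the real StringBuilder.

```csharp
using System;

// Independent sketch; names and growth policy are assumptions for illustration.
public class GrowableStringBuilder
{
    private char[] _buffer;

    public int Length { get; private set; }

    public int Capacity
    {
        get { return _buffer.Length; }
    }

    public GrowableStringBuilder(int capacity = 16)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        _buffer = new char[capacity];
    }

    public GrowableStringBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value)) return this;

        EnsureCapacity(Length + value.Length);
        // Copy the characters straight into the free part of the buffer.
        value.CopyTo(0, _buffer, Length, value.Length);
        Length += value.Length;
        return this;
    }

    public GrowableStringBuilder Clear()
    {
        // Old characters stay in the buffer but are ignored from now on.
        Length = 0;
        return this;
    }

    public override string ToString()
    {
        return new string(_buffer, 0, Length);
    }

    private void EnsureCapacity(int required)
    {
        if (required <= _buffer.Length) return;

        // Grow to at least double the current size to keep appends cheap on average.
        int newCapacity = Math.Max(required, _buffer.Length * 2);
        Array.Resize(ref _buffer, newCapacity);
    }
}
```

Because every state-changing method returns the builder, calls can be chained: new GrowableStringBuilder(4).Append("Hello").Append(", ").Append("world").ToString() produces "Hello, world".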

SimpleStringBuilder.zip (75.77 kb) [Downloads: 52]


Categories: .NET | Posted by oleksii on 9/24/2011 2:41 PM

I was trying to create a simple WCF-based project where a service can notify clients about something (as opposed to the standard scenario where a client asks a service to perform some operation). The simplest way to do so is to use callbacks. I put together a very simple solution to demonstrate this concept. The sample solution has two interfaces: one for the usual service contract and another one for the client callback.

[ServiceContract(CallbackContract = typeof(IContractCallback))]
public interface IContract
{
    [OperationContract]
    void Foo();
}

[ServiceContract]
public interface IContractCallback
{
    [OperationContract]
    void OnFooCallback();
}

A very basic implementation of these contracts can look like this

internal class WcfService : IContract
{
    public void Foo()
    {
        //Do work...            
        var callback = OperationContext.Current
                        .GetCallbackChannel<IContractCallback>();
        callback.OnFooCallback();
    }
}

internal class ContractCallback : IContractCallback
{
    public void OnFooCallback()
    {
        Console.WriteLine("...");
    }
}

This is how this small app looks

Using a callback contract can be straightforward, but there are some limitations:

  • Only the NetTcpBinding, NetNamedPipeBinding and WSDualHttpBinding bindings are supported
  • Only one callback contract per service contract is allowed
  • The client must keep the connection open the whole time
  • The service must use the reentrant or multiple concurrency mode

This project can be downloaded using the link below or from GitHub.

CallbackService.zip (17.49 kb) [Downloads: 216]
