Categories: .NET, Architecture | Posted by oleksii on 4/20/2012 4:02 PM

This post gives a brief introduction to database sharding, with a bit of theory and a sample project. As the database server I chose RavenDB, primarily because it is an impressive project in its own right and I wanted to play around with it. The concept, however, applies to any other database server, relational or non-relational.

The problem of high load

Some projects attract a lot of attention and, as a result, receive a high load: a large number of users, huge network traffic, billions of bank transactions, etc. Every software operation has a cost, and hardware resources have well-defined limits. Therefore, at some point the hardware can no longer cope with the number of requests.

Solution

The easiest way to deal with high load is vertical scaling, i.e. adding more hardware: memory, processing power, disk space. This may make things run faster for a while. Unfortunately, if the load keeps increasing, vertical scaling is no cure, and a more scalable solution is needed.

Vertical scaling assumes one very powerful server, the beast. In contrast, horizontal scaling relies on many inexpensive machines. Load distribution is the key concept of horizontal scaling, and database sharding is one technique for distributing load horizontally.

Sharding can be defined as breaking a large database into smaller, independent ones. The idea is to denormalize the data and split a highly loaded database into several independent chunks. Denormalization makes the chunks independent, so that a single query needs to hit only one shard. On the other hand, denormalization introduces data duplication, as the same information may need to be inserted into several shards to keep the data consistent. There is always a trade-off between consistency and scalability.

For more information on database sharding concepts, see the article on CodeFutures.

Sample overview

To demonstrate the sharding technique with a NoSQL database, I created a very simplified model of Twitter. In my model there are users and tweets. Each user has a dynamic array of tweets, so he or she can add or remove text messages. Assume that, to provide a consistently fast user experience, one database can handle at most 10 users. This means I should store 10 users per database. Every user has an id that comes from a global counter. Based on this id, users 1-10 are stored on shard 1, users 11-20 on shard 2, and so on. In this sample there are 3 shards, which support 30 users.
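The range-based mapping described above boils down to a single integer division. Below is a minimal C# sketch, assuming 1-based ids, 10 users per shard and 3 shards; the class name UserShardMap is hypothetical and is not part of the sample project or of RavenDB.

```csharp
using System;

// Hypothetical helper illustrating range-based sharding (not RavenDB API).
public static class UserShardMap
{
    public const int UsersPerShard = 10;
    public const int ShardCount = 3;

    // Maps a 1-based user id to a 1-based shard number:
    // ids 1-10 -> shard 1, 11-20 -> shard 2, 21-30 -> shard 3.
    public static int ShardFor(int userId)
    {
        if (userId < 1 || userId > UsersPerShard * ShardCount)
            throw new ArgumentOutOfRangeException(nameof(userId), "No shard holds this id.");

        return (userId - 1) / UsersPerShard + 1;
    }
}
```

With this mapping, ids 1, 10, 11 and 30 land on shards 1, 1, 2 and 3. Because ids grow monotonically, all new users go to the newest shard, which is a known drawback of range-based sharding.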

This is a step-by-step guide to getting the sample working:

  1. Running servers. To start the RavenDB servers, run
    \tools\RavenDB-Build-800\Just-Servers\Start.cmd
    This should open three console applications, one for each RavenDB server.
  2. Running the client
    1. Either open the solution in Visual Studio 2010 and run the project
      or
    2. Run the client binary directly from
      \bin\RavenDbSharding.exe
  3. Watch the RavenDB server consoles to see which request goes to which server.

   Users are stored in the following shards:
      Users with id 1 - 10 are stored in shard "user_1_10"
      Users with id 11 - 20 are stored in shard "user_10_20"
      Users with id 21 - 30 are stored in shard "user_20_30"

RavenDB API notes 

To load all users from all shards, RavenDB provides the same API as if there were only one database, thanks to its interface-oriented design. The developer does not need to know the actual location of each user (be it shard 1 or shard N). Internally, RavenDB recognizes that the IDocumentStore is a ShardedDocumentStore and uses the supplied implementation of IShardSelectionStrategy. In this sample, RavenDB resolves the location of a user by checking the user's id, which maps to exactly one shard.

To retrieve the data of a particular user, I use exactly the same code as if there were no shards. The whole idea is to keep the client code unaware of where an item is stored. Simplicity is beauty.
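The location transparency described above can be illustrated without RavenDB at all. The following sketch uses hypothetical types (IUserStore, InMemoryUserStore, ShardedUserStore). They are not the RavenDB API and only show the idea: client code is written against an interface, and a sharded implementation routes each call through a shard selection function, much as ShardedDocumentStore delegates to an IShardSelectionStrategy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; NOT the RavenDB client API.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface IUserStore
{
    void Store(User user);
    User Load(int id);
    IEnumerable<User> All();
}

// A single "database": what the client would use without sharding.
public class InMemoryUserStore : IUserStore
{
    private readonly Dictionary<int, User> _users = new Dictionary<int, User>();

    public void Store(User user) => _users[user.Id] = user;
    public User Load(int id) => _users.TryGetValue(id, out var user) ? user : null;
    public IEnumerable<User> All() => _users.Values;
}

// Same interface, but every call is routed to one of several shards.
public class ShardedUserStore : IUserStore
{
    private readonly IUserStore[] _shards;
    private readonly Func<int, int> _selectShard; // user id -> zero-based shard index

    public ShardedUserStore(IUserStore[] shards, Func<int, int> selectShard)
    {
        _shards = shards;
        _selectShard = selectShard;
    }

    public void Store(User user) => _shards[_selectShard(user.Id)].Store(user);
    public User Load(int id) => _shards[_selectShard(id)].Load(id);

    // "Load all" has to fan out to every shard and merge the results.
    public IEnumerable<User> All() => _shards.SelectMany(s => s.All()).OrderBy(u => u.Id);
}
```

A call such as store.Load(15) looks identical whether store is an InMemoryUserStore or a ShardedUserStore. Only All() has to touch every shard, which is why unbounded "get all" queries get more expensive as shards are added.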

Sample output (trimmed)

The client first generates sample data using FizzWare.NBuilder and saves it into the 3 RavenDB shards. Finally, it queries the shards and displays all users, plus all tweets of one randomly selected user.

FirstName1 LastName1
 Subscribes count: 1
 Tweets count: 24
---------
FirstName2 LastName2
 Subscribes count: 28
 Tweets count: 25
---------
...
---------
FirstName30 LastName30
 Subscribes count: 4
 Tweets count: 9
---------
===================================================
Tweets by FirstName2 LastName2
dolore magna et et dolor amet elit lorem lorem amet dolor magna consectetur et a
met consectetur tempor et
...
consectetur dolore
===================================================

RavenDB servers

[Figure: expected console output of the three RavenDB shard servers]

Note of advice

  1. "Get all records" is a killer. Client code as well as server code must be safe by default. For example, the RavenDB server by design returns at most 1024 objects, and the client side, by default, returns only 128 objects. If the user needs more data, use paging (Skip and Take). After all, no search engine returns all the results for a query at once, so why should a user be flooded with billions of objects?
  2. IO operations are expensive. Avoid touching the hard drive whenever possible. To achieve this, one can
    •  use caching
    •  hold the whole database in memory
  3. RavenDB is an actively developed project, and breaking changes are introduced every so often, so this sample project may not work with versions later than build 800.
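Paging with Skip and Take can be wrapped in a small helper that also enforces an upper bound on the page size. This is a sketch over a generic IQueryable<T>. The helper name Paging.GetPage and the MaxPageSize cap are assumptions for illustration. With RavenDB, the source would typically be a LINQ query obtained from session.Query<T>().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical paging helper; the cap mirrors the server limit quoted in the advice above.
public static class Paging
{
    public const int MaxPageSize = 1024;

    // Returns one page of results; pageNumber is zero-based.
    public static List<T> GetPage<T>(IQueryable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 0) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Never let a caller ask for "everything" in one go.
        pageSize = Math.Min(pageSize, MaxPageSize);

        return source.Skip(pageNumber * pageSize).Take(pageSize).ToList();
    }
}
```

For stable pages, the source should have a deterministic order (for example, OrderBy on the id) before it is paged.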

Additional resources

Download this project

This project makes use of several open source projects. The corresponding acknowledgements and licences are provided with the download.

From this web site:
RavenDbSharding.zip (15.22 MB)
MD5: cf43b8afe1148ca32d88660bfca9b85d
SHA-1: 9d76bf443ea0bc87b3a6ed692dfdd51e7b080c51
SHA-256: 633c71ec9a9e15dc64139ae969d05da775f195ca9b557da4d443dd93ea336165


Or get it from GitHub:

If you enjoyed this post, make sure you subscribe to my RSS feed!

Categories: .NET, Architecture | Posted by oleksii on 1/27/2012 1:39 PM

Consider this very simple code:

1	class Program
2	{
3		static void Main(string[] args)
4		{
5			System.AppDomain newDomain = System.AppDomain
6				.CreateDomain("MyDomain_1");
7		}
8	}

Question: how many application domains exist at line 7, given that this code runs as a managed console application?

Think about it before reading on.

Without searching the Internet or digging through clever books, it is easy enough to check what is happening. The CLR has long come with debugging tools that let developers debug code on the boundary between managed and unmanaged code. One of them is the Son of Strike (SOS) debugger extension.

Let's get our hands dirty. I put a breakpoint at line 7 and start debugging in Visual Studio. Once the breakpoint is hit, I load SOS from the Immediate Window (this requires unmanaged code debugging to be enabled in the project's Debug settings). Loading the debugger extension alone often causes trouble, so see this post for troubleshooting SOS.

.load sos

This should load SOS. Now let's run the command that lists all AppDomains:

!DumpDomain

The output on my machine was

--------------------------------------
System Domain:      59da1478
LowFrequencyHeap:   59da1784
HighFrequencyHeap:  59da17d0
StubHeap:           59da181c
Stage:              OPEN
Name:               None
--------------------------------------
Shared Domain:      59da1140
LowFrequencyHeap:   59da1784
HighFrequencyHeap:  59da17d0
StubHeap:           59da181c
Stage:              OPEN
Name:               None
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...\mscorlib.dll]
ClassLoader:        00240ac0
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...\mscorlib.dll

--------------------------------------
Domain 1:           001f19f0
LowFrequencyHeap:   001f1d6c
HighFrequencyHeap:  001f1db8
StubHeap:           001f1e04
Stage:              OPEN
SecurityDescriptor: 001f3168
Name:               Test.exe
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll]
ClassLoader:        00240ac0
SecurityDescriptor: 0023b5a8
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll

Assembly:           0024b048 [C:\Projects\Private\Test\bin\Debug\Test.exe]
ClassLoader:        0024b0e8
SecurityDescriptor: 0024c0e0
  Module Name
003c2e9c            C:\Projects\Private\Test\bin\Debug\Test.exe

--------------------------------------
Domain 2:           0024f730
LowFrequencyHeap:   0024faac
HighFrequencyHeap:  0024faf8
StubHeap:           0024fb44
Stage:              OPEN
SecurityDescriptor: 00252c28
Name:               MyDomain_1
Assembly:           00240a20 [C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll]
ClassLoader:        00240ac0
SecurityDescriptor: 002542c8
  Module Name
589e1000            C:\Windows\Microsoft.Net\assembly\GAC_32\...mscorlib.dll

Answer: as you can see, there are 4 domains: the System domain, the Shared domain, the default domain (named after the executable, Test.exe) and the domain that the user code additionally creates (lines 5-6). I found a really nice article describing what happens when the CLR starts and how the CLR bootstrapper loads domains.


Categories: .NET, Architecture | Posted by oleksii on 9/1/2011 11:36 AM

In any service-oriented architecture, including WCF, the client conventionally implements the proxy pattern. The proxy lets the client abstract away the concrete implementation and location of the service. The client works against stubs at compile time, whereas at run time the stubs are resolved and the actual calls are made to the service.

Closing WCF client

The client has an inherent responsibility to close the connection gracefully, so it is always recommended to close the proxy. If the binding between the client and the service uses a transport-layer session, closing the proxy is essential to tear down the connection between both parties. A service has a throttling threshold for concurrent sessions (for example, maxConcurrentSessions in the serviceThrottling behavior). Once the number of concurrent connections exceeds this threshold, further clients have to wait, and overall service performance degrades quickly. This is why it is crucial to dispose of the connection as soon as possible. Closing the proxy also tells the service that its instance is no longer in use and may be released and collected by the GC. If the client does not close the connection, WCF still tears it down automatically once the timeouts set in the configuration files expire.

Aborting WCF client

When a fault occurs in the service-client interaction, the objects on both ends are potentially broken, so using the proxy after the exception is not advised. If the binding uses a transport session, the client cannot even close the proxy after a fault: the channel is in the Faulted state and Close throws. (Without a transport session the client could still use or close the proxy, but relying on this is not recommended, because the session configuration may change.) So after a fault, the only safe operation is to abort the proxy.

Coding

Close (and Abort on a fault) has to be called for every client call, so these methods can be moved into the proxy itself:

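using System.ServiceModel;

// Minimal contract assumed for this example
[ServiceContract]
public interface IContract1
{
    [OperationContract]
    void DoWork();
}

// Proxy that closes itself after a successful call and aborts on any failure
class Client1 : ClientBase<IContract1>, IContract1
{
    public void DoWork()
    {
        try
        {
            Channel.DoWork();
            Close();    // graceful shutdown of the session/connection
        }
        catch
        {
            Abort();    // after a fault, Abort is the only safe operation
            throw;      // rethrow, preserving the original stack trace
        }
    }
}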
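When a proxy is used for more than one call, the Close/Abort logic can live in a reusable helper instead. The sketch below is an extension method on ICommunicationObject, which ClientBase<T> and channel factories implement. The method name CloseOrAbort is hypothetical. It closes gracefully, aborts when Close throws a CommunicationException or TimeoutException, and aborts straight away when the channel is already faulted.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: graceful Close, falling back to Abort whenever closing is unsafe or fails.
public static class CommunicationObjectExtensions
{
    public static void CloseOrAbort(this ICommunicationObject client)
    {
        if (client == null)
            return;

        // A faulted channel cannot be closed gracefully; Abort is the only safe operation.
        if (client.State == CommunicationState.Faulted)
        {
            client.Abort();
            return;
        }

        try
        {
            client.Close(); // graceful shutdown; tears down a transport session if there is one
        }
        catch (CommunicationException)
        {
            client.Abort(); // the channel broke while closing
        }
        catch (TimeoutException)
        {
            client.Abort(); // the other side did not acknowledge the close in time
        }
        catch
        {
            client.Abort(); // any other failure: release local resources, then rethrow
            throw;
        }
    }
}
```

A typical call site makes its service calls inside a try block and calls client.CloseOrAbort() in the finally block.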


Categories: .NET, Architecture | Posted by oleksii on 6/7/2011 2:28 PM

I want a thread to sleep for 1 ms on Windows, but Thread.Sleep(1) usually takes longer, for example 10-20 ms. This happens because the desktop and server editions of Windows are not real-time operating systems (Windows Compact Edition is). My laptop has one CPU with two cores, so it has very few independent execution units.

On the other hand, there are many processes and even more threads. Each thread needs processor time, which the Windows scheduler assigns. Windows lets only a few selected threads run at a time, giving each a slice of CPU time, and then switches context to other threads. The OS tries to switch context as rarely as possible and to stay in one context as long as possible, because switching does no useful work but still consumes resources.

When I call Thread.Sleep, no matter how short the interval, the thread gives up the rest of the time slice that Windows has assigned to it. There is no reason to wait for a sleeping thread, so the context can safely be switched straight away. It can then take several milliseconds before Windows gives control back to the thread (a sleeping thread is only woken on a system timer tick, which by default occurs roughly every 15.6 ms), and this is why I usually cannot sleep for a very short time span.
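The effect is easy to measure. This sketch times a number of Thread.Sleep calls with Stopwatch and returns the average. The class and method names are hypothetical, and the actual figure depends on the machine, the system timer resolution and the current load, so no particular value should be expected.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical measurement helper: how long does Thread.Sleep(n) really take here?
public static class SleepProbe
{
    // Average wall-clock duration of `iterations` calls to Thread.Sleep(milliseconds).
    public static TimeSpan AverageSleep(int milliseconds, int iterations)
    {
        if (iterations < 1)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Thread.Sleep(milliseconds); // gives up the rest of the current time slice
        }
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / iterations);
    }
}
```

Printing SleepProbe.AverageSleep(1, 100).TotalMilliseconds shows how far the real delay is from the requested 1 ms on a given machine.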

If I still need to wait for a very short time, I can spin the processor (busy-wait) instead. For example, the System.Collections.Concurrent namespace uses spinning for lock-free updates of its collections.
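A minimal busy-wait sketch, assuming the goal is to hold the thread for an interval shorter than a timer tick. It uses Thread.SpinWait, which spins without yielding, and checks a Stopwatch. The SpinWait struct is deliberately not used here: after a number of iterations its SpinOnce starts yielding the thread, which would bring back the scheduling delay. The class name SpinDelay is hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: busy-waits for a short interval without giving up the time slice.
public static class SpinDelay
{
    // Spins until at least `duration` has elapsed and returns the measured time.
    public static TimeSpan SpinFor(TimeSpan duration)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (watch.Elapsed < duration)
        {
            // Short busy loop on the CPU; unlike SpinWait.SpinOnce it never yields.
            Thread.SpinWait(20);
        }
        return watch.Elapsed;
    }
}
```

Spinning burns a whole core for the duration of the wait, and the thread can still be preempted by the scheduler, so this is only reasonable for very short waits.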


Categories: .NET, Architecture, Security | Posted by oleksii on 5/10/2011 10:15 AM

Part 1

Security in itself is a huge topic, and every developer must know how to deal with the most widespread attacks. Here are a few tips on writing and deploying web applications so that attacks on them are less likely to succeed.

  1. Never leave debug enabled on a production web site. It hurts performance and exposes debug information that helps attackers find penetration points.
    <configuration>
        <system.web>
            <compilation debug="false"/>                    
        </system.web>
    </configuration>
  2. Enable custom error pages. Include 404 (resource not found), 500 (internal error) and a general error page. Never display the actual error to the user: this information directly exposes the weaknesses of the code. Log the exception first, and use static HTML pages to display errors.
    <configuration>
      <system.web>
        <customErrors defaultRedirect="Error.html" mode="RemoteOnly">
          <error statusCode="500" redirect="InternalError.html"/>
          <error statusCode="404" redirect="NotFound.html"/>
        </customErrors>
      </system.web>
    </configuration>
    
  3. Do not leave backups on the web server. They can easily be downloaded.
  4. As mentioned in item 3, files can be downloaded (as opposed to served). Therefore, be prepared for web.config to be downloaded exactly as it is on the site. Always encrypt sensitive information, such as connection strings and application settings, and do not store user names and passwords in plain text in web.config. This is what my connectionStrings section looked like before encryption (with real values in place of ***):
    <connectionStrings>
    	<clear />	
    	<add name="connName" connectionString="server=***;database=***;
            uid=***;pwd=***" providerName="***" />		
    </connectionStrings>
    
    And here is what it looks like now (the user name and password are in there; go ahead and try extracting them):
    <connectionStrings configProtectionProvider="RsaProtectedConfigurationProvider">
    <EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Element"
      xmlns="http://www.w3.org/2001/04/xmlenc#">
      <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc" />
      <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
    	<EncryptedKey xmlns="http://www.w3.org/2001/04/xmlenc#">
    	  <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-1_5" />
    	  <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
    		<KeyName>Rsa Key</KeyName>
    	  </KeyInfo>
    	  <CipherData>
    		<CipherValue>c+7C9X2b2mCtOTk/kUc4+oWnJnQDQiGmTUUV7l/w0YZox/g1mFXadzmR
    		WrO6M0rl5v18P5rZlG5Im0JpGKv4jB2TSXc7hedH89TFAWDkNm0t2s1kzY2JcPgJ1d1Gd
    		bEYGKgsR04NyONKO/mboIF2YdzkPNdEQuQq3+I=</CipherValue>
    	  </CipherData>
    	</EncryptedKey>
      </KeyInfo>
      <CipherData>
    	<CipherValue>MOOSknVjny5JIa+yEoSNCS44OVWovoU5RQb7hD0MXCQfBAL2AJp8R8JCuGAEFZSO
    	Ms4/4Law0Pmjjkl+TsCBfy4P5UdZMVITtmpAQK5PcQx+sArCm5EcZ1TuIsrbhh8Y1jZKJjyYYsoZ1
    	lx+Y2Hhw8nNsuX9t/k8bPXmdfkFAKjP7o9qkC2inASsOGyiKfOFpKpqD9A7c6Kxf/9vV0y5LbuqKF
    	VcIrEGQrqXsih9T81yKLnfNxyNTyz5i3mUyvuIgw6LUkUkGY4/NhEhb8bpdPsflzSLdNlmnnwBJZf
    	SAiYsL9XpjA==</CipherValue>
      </CipherData>
    </EncryptedData>
    </connectionStrings>
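The encrypted section does not have to be produced by hand. The sketch below protects a configuration section through the System.Configuration API (SectionInformation.ProtectSection). The helper name ConfigProtector.EnsureProtected is hypothetical. Inside an ASP.NET application, the Configuration object would typically come from WebConfigurationManager.OpenWebConfiguration("~"), with "RsaProtectedConfigurationProvider" as the provider. The aspnet_regiis tool (-pe or -pef) does the same from the command line.

```csharp
using System.Configuration;

// Hypothetical helper: encrypts a configuration section in place using a protected-configuration provider.
public static class ConfigProtector
{
    // Returns true if the section was encrypted by this call,
    // false if it does not exist or is already protected.
    public static bool EnsureProtected(Configuration config, string sectionName, string providerName)
    {
        ConfigurationSection section = config.GetSection(sectionName);
        if (section == null || section.SectionInformation.IsProtected)
            return false;

        // e.g. "RsaProtectedConfigurationProvider", which produces <EncryptedData> markup like the above
        section.SectionInformation.ProtectSection(providerName);
        section.SectionInformation.ForceSave = true;
        config.Save(ConfigurationSaveMode.Modified);
        return true;
    }
}
```

The application keeps reading ConfigurationManager.ConnectionStrings as before, because decryption is transparent. The process needs write access to web.config and access to the RSA key container. In a web farm, the same key has to be available on every server.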
    
