This is one of the typical interview questions: when do I need to use a StringBuilder, and what would a simple implementation look like?
The problem with the System.String type is that it is actually an immutable reference type, although it seems to behave like a mutable value type. Immutability means that any write operation performed on a string necessarily creates a new object. For example, s1 = s1 + s2 does not modify s1; it creates a new object holding the concatenation of s1 and s2 and leaves the old s1 object, once nothing else references it, to the garbage collector. Immutability allows string instances to be shared safely, which saves time and space even for long strings (< 2 GB in .NET 4), and maybe (which would be a pure guess) it was easier to follow the successful example of Java strings.
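The effect of s1 = s1 + s2 can be checked directly. The following is a minimal sketch; the class and method names (ImmutabilityDemo, OriginalIsUnchanged) are made up for illustration.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true if concatenation left the original string object untouched.
    public static bool OriginalIsUnchanged()
    {
        string s1 = "Hello";
        string original = s1;      // second reference to the same object
        string s2 = ", world";

        s1 = s1 + s2;              // allocates a new string; s1 now refers to it

        // 'original' still refers to the old, unmodified object.
        return original == "Hello"
            && s1 == "Hello, world"
            && !object.ReferenceEquals(original, s1);
    }
}
```

Calling ImmutabilityDemo.OriginalIsUnchanged() should return true: the variable was reassigned, but the object it used to point to never changed.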
The StringBuilder class is useful for any string manipulation, especially when there are many operations. Before .NET 4, StringBuilder had internal access to a string object and mutated it in place; since .NET 4 it uses char arrays. This class does not create a copy of the string on every operation but rather acts as if it works with one mutable string.
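As an illustration of the "numerous manipulations" case, here is a small sketch that builds a comma-separated list in a loop with the framework's System.Text.StringBuilder. The names ConcatDemo and JoinNumbers are hypothetical.

```csharp
using System.Text;

public static class ConcatDemo
{
    // Builds "0,1,2,...,count-1" without creating a new string on every step.
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                sb.Append(',');
            }
            sb.Append(i);   // appends the decimal text of i to the same buffer
        }
        return sb.ToString();
    }
}
```

The same loop written with result += i would allocate a new, longer string on each iteration.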
Because StringBuilder implements the Builder design pattern, any state-changing operation should return the object itself (this). The ToString() method acts as the Build() method, and since it is defined in System.Object, there is no need to include ToString() in the contract.
public interface ISimpleStringBuilder
{
    ISimpleStringBuilder Append(string value);
    ISimpleStringBuilder Clear();
    int Length { get; }
    int Capacity { get; }
}
A very simple implementation of the builder class may look like this:
public class SimpleStringBuilder : ISimpleStringBuilder
{
    private char[] _internalBuffer;
    public ISimpleStringBuilder Append(string value)
    {
        char[] data = value.ToCharArray();
        // check if space is available for additional data
        InternalEnsureCapacity(data.Length);
        foreach (char t in data)
        {
            _internalBuffer[Length] = t;
            Length++;
        }
        return this;
    }
    public override string ToString()
    {
        // copy only the characters in use, skipping the unused ('\0') tail of the buffer
        var tmp = new char[Length];
        for (int i = 0; i < Length; i++)
        {
            tmp[i] = _internalBuffer[i];
        }
        return new string(tmp);
    }
    ...
}
This code is of course very basic, inefficient, not thread-safe, doesn't do any input validation, etc. It does, however, demonstrate the idea behind the StringBuilder class.
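The capacity check is not shown in the listing above. One possible way to grow a buffer, given purely as an assumption and not as the code from the attached project, is to double the array when it runs out of room. BufferGrowth, EnsureCapacity, the starting size of 16 and the doubling policy are all illustrative choices.

```csharp
using System;

public static class BufferGrowth
{
    // Hypothetical helper: returns a buffer with room for 'extra' more chars
    // beyond the 'used' chars already stored, copying the existing content.
    public static char[] EnsureCapacity(char[] buffer, int used, int extra)
    {
        int required = used + extra;
        if (buffer != null && required <= buffer.Length)
        {
            return buffer;   // enough room already, nothing to copy
        }

        // Assumed policy: double the capacity, starting from 16.
        int newCapacity = (buffer == null || buffer.Length == 0) ? 16 : buffer.Length * 2;
        if (newCapacity < required)
        {
            newCapacity = required;
        }

        var bigger = new char[newCapacity];
        if (buffer != null)
        {
            Array.Copy(buffer, bigger, used);   // keep the chars written so far
        }
        return bigger;
    }
}
```

Doubling keeps the number of reallocations logarithmic in the final length, at the cost of some unused space at the end of the buffer.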
SimpleStringBuilder.zip (75.77 kb)

If you enjoyed this post, make sure you subscribe to my RSS feed!
I have been looking for interesting and thought-provoking questions on Stack Overflow and found a nice post: Senior Interview LINQ questions. The best questions to ask (as voted by the community) at the time were:
- Why is the var keyword used, and when is it the only way to get a query result?
- What is Deferred Execution?
- Explain Query Expression syntax, Fluent syntax, Mixed Queries.
- What are Interpreted Queries?
- Use of IQueryable and IEnumerable interfaces.
- Use of the let and into keywords, and how they help in making Progressive queries but still keep Deferred Execution.
- What are Expression Trees?
This is how I would try to answer these questions (perhaps not 100% correctly) if they were asked at an interview.
1. Why is the var keyword used, and when is it the only way to get a query result?
var is a keyword introduced in C# 3.0. It is used in place of the explicit type in a local variable declaration. The compiler infers the actual type from the static type of the expression used to initialise the variable. One must use var when the query returns an anonymous type, because such a type has no name that could be written out. E.g.
string[] words = { "roll", "removal", "fuse", "accusation",
                   "capture", "poisoning", "accusation" };
// anonymous type returned
var query = from w in words
            select new
            {
                LetterCount = w.Length,
                UpperString = w.ToUpper()
            };
foreach (var item in query)
{
    Console.WriteLine(item.LetterCount + " " + item.UpperString);
}
2. What is Deferred Execution?
Deferred execution means that the actual work is not performed immediately, but rather when the result is requested at a later stage. In LINQ to Objects this is implemented with lazily evaluated iterators (the mechanism behind yield return). The benefit of deferred execution is that a potentially heavy load on the CPU, memory or database is delayed until the moment it is actually required, which saves time during, say, initialisation.
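A quick way to observe deferred execution is to change the source collection after the query has been defined but before it is enumerated. This is a minimal sketch; DeferredDemo and CountLongWords are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int CountLongWords()
    {
        var words = new List<string> { "roll", "removal", "fuse" };

        // Nothing is filtered here; the query only describes the work.
        IEnumerable<string> longWords = words.Where(w => w.Length > 4);

        // Added after the query was defined, but before it runs.
        words.Add("accusation");

        // Enumeration happens now, so the newly added word is seen as well.
        return longWords.Count();
    }
}
```

The method returns 2 ("removal" and "accusation"), not 1, because the filter only runs when Count() enumerates the query.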
3. Explain Query Expression syntax, Fluent syntax, Mixed Queries.
Query expression syntax is based on the new keywords such as from, where, select, join, group ... by, orderby etc.
string[] words = { "roll", "removal", "fuse", "accusation",
                   "capture", "poisoning", "accusation" };
var query = from w in words
            where w.Length > 4
            orderby w ascending
            select w;
This query selects words with more than 4 letters and presents the result in ascending order. Fluent syntax is based on regular C# method calls that are linked in a chain, like this sample from MSDN.
List<Customer> customers = GetCustomerList();
var customerOrders = customers
    .SelectMany(
        (cust, custIndex) => cust
            .Orders
            .Select(o =>
                "Customer #" + (custIndex + 1) +
                " has an order with OrderID " + o.OrderID));
Mixed syntax means that query expression syntax is combined with fluent method calls. For example, it can be used to get the distinct values, to get the first result, or to get the items as an array (which, by the way, triggers immediate query execution).
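A mixed query over the same words array might look like the sketch below, where a query expression is wrapped in parentheses and continued with fluent calls. MixedSyntaxDemo and DistinctLongWords are illustrative names.

```csharp
using System.Linq;

public static class MixedSyntaxDemo
{
    public static string[] DistinctLongWords()
    {
        string[] words = { "roll", "removal", "fuse", "accusation",
                           "capture", "poisoning", "accusation" };

        // Query expression part in parentheses, fluent part chained after it.
        return (from w in words
                where w.Length > 4
                orderby w ascending
                select w)
               .Distinct()    // there is no query keyword for this operator
               .ToArray();    // forces immediate execution
    }
}
```

The duplicate "accusation" is removed, so the array holds four words.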
4. What are Interpreted Queries?
LINQ combines two architectural models: in-memory local and remote.
The first one is basically LINQ-to-Objects and LINQ-to-XML. The local model works closely with IEnumerable<T> and with decorator sequences built from C# methods, lambdas and delegates. The query is compiled into standard imperative IL code.
The second one is LINQ-to-SQL and LINQ-to-Entities. The remote model, in contrast, is rather declarative to the runtime. Sequences in the query implement IQueryable<T> (which in turn derives from IEnumerable<T>), and at compile time the query operators resolve to those of the Queryable class, which build expression trees. Depending on the query provider, the expression trees are later interpreted at run time and translated into a form that is “friendly” to the remote data source.
5. Use of IQueryable and IEnumerable interfaces.
IEnumerable<T> is applicable to in-memory data querying; in contrast, IQueryable<T> allows remote execution, such as querying a web service or a database. Misuse of these interfaces can result in performance and memory problems, e.g. if IEnumerable<T> is used instead of IQueryable<T> to perform paging, all the rows from the data source will be loaded, instead of only the rows of the current page.
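The paging case can be sketched as follows. An in-memory sequence wrapped with AsQueryable() stands in for a real remote provider here, which is an assumption made only to keep the example self-contained; PagingDemo, Page and IsRecordedAsMethodCall are hypothetical names.

```csharp
using System.Linq;
using System.Linq.Expressions;

public static class PagingDemo
{
    // Builds a "page" query against an IQueryable source. The OrderBy/Skip/Take
    // calls are recorded as an expression tree that a provider can translate.
    public static IQueryable<int> Page(IQueryable<int> source, int pageIndex, int pageSize)
    {
        return source.OrderBy(x => x).Skip(pageIndex * pageSize).Take(pageSize);
    }

    public static bool IsRecordedAsMethodCall()
    {
        // AsQueryable over an in-memory range stands in for a real provider.
        IQueryable<int> rows = Enumerable.Range(1, 100).AsQueryable();
        IQueryable<int> page = Page(rows, 2, 10);

        // The outermost node of the tree is the call to Take.
        return page.Expression.NodeType == ExpressionType.Call;
    }
}
```

With a database provider, keeping the parameter typed as IQueryable<int> lets the paging be translated and executed on the server; typing it as IEnumerable<int> would bind Skip and Take to the in-memory Enumerable versions instead.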
6. Use of the let and into keywords, and how they help in making Progressive queries but still keep Deferred Execution.
into and let create a temporary reference that stores the result of a subquery or subexpression, which can later be queried itself. These keywords keep execution deferred and allow the creation of subroutines that can be used incrementally or progressively. Personally, I would rather try to simplify the query, so I would not use either of these keywords. A few separate queries are much more debugger- (and developer-) friendly and less bug-prone.
As a sample, let can be used to store the count of elements in a group:
var categories = from p in products
                 group p by p.Category into g
                 let elCount = g.Count()
                 select new
                 {
                     Category = g.Key,
                     ElementCount = elCount
                 };
7. What are Expression Trees?
An expression tree is the construct that was developed for the remote LINQ model. In a nutshell, expression trees provide a separation layer between the data source and the runtime. With the help of the visitor pattern, the expression tree is interpreted at run time and the query object is translated into a representation that the data source understands. In the case of LINQ-to-SQL, expression trees are translated into SQL, with column names taken from the mapping (attributes or an XML mapping file). The SQL query is sent to the database, and the result undergoes a similar translation procedure in the opposite direction.
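To see that an expression tree is data rather than compiled code, a lambda can be assigned to Expression<Func<...>> and then inspected. This is a small sketch; ExpressionTreeDemo, IsLong, Describe and Evaluate are names invented for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Assigned to Expression<...>, the lambda becomes a tree of nodes, not IL.
    public static readonly Expression<Func<int, bool>> IsLong = length => length > 4;

    // Walks the (very small) tree by hand and renders it as text.
    public static string Describe()
    {
        var body = (BinaryExpression)IsLong.Body;
        var left = (ParameterExpression)body.Left;
        var right = (ConstantExpression)body.Right;
        return left.Name + " " + body.NodeType + " " + right.Value;
    }

    // The same tree can still be turned into a delegate and run locally.
    public static bool Evaluate(int value)
    {
        return IsLong.Compile()(value);
    }
}
```

Describe() returns "length GreaterThan 4". A query provider does essentially the same walk, only over larger trees and producing SQL (or another query language) instead of this string.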