Archive for the ‘Code’ Category

Monday, September 10th, 2012

Before .NET 4 the recommended way to create a new worker thread was to use the thread pool to queue up a work item.

ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork));

private void DoWork(object state)
{
    Console.WriteLine("Thread {0} doing work", Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep((new Random()).Next(5, 5000));
    Console.WriteLine("Thread {0} DONE work", Thread.CurrentThread.ManagedThreadId);
}

There is nothing wrong with continuing to use the ThreadPool pattern to queue up worker threads. In fact, code using this pattern gets a performance boost for free from the enhancements Microsoft added to the ThreadPool in .NET 4. However, .NET 4 also contains a whole set of features to facilitate parallel programming via the Tasks namespace. I would not recommend switching existing code over just for the sake of using the new features. For new code, however, it is worth taking a look to see if any of them would be worth using. Using tasks we can queue up a worker thread in a similar manner.

new Task(DoWork, null).Start();

private void DoWork(object state)
{
    Console.WriteLine("Thread {0} doing work", Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep((new Random()).Next(5, 5000));
    Console.WriteLine("Thread {0} DONE work", Thread.CurrentThread.ManagedThreadId);
}

One line of code changed, functionally it seems to work the same, and it is using the newer paradigm. Done, right? Just because it’s newer doesn’t mean it’s better. Let’s take a look ‘under the hood’ to see what the differences really are between the two code blocks above.

When talking about ‘under the hood’ with .NET, that is basically a code phrase for ‘opening up Reflector’. Taking a look at what actually happens when we call ThreadPool.QueueUserWorkItem, we can boil it down to a fairly straightforward sequence of events.

  1. Ensures that the ThreadPool VM is initialized (calls into the CLR for this via a QCall)
  2. Wraps the WaitCallback inside an ExecutionContext callback. This preserves the calling state, among some other minor details
  3. Queues the item onto the global ThreadPool queue
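The wrapping in step 2 can be sketched roughly like this (a conceptual sketch of mine, not the actual framework source; the method and state value here are illustrative):

```csharp
using System;
using System.Threading;

class QueueSketch
{
    static void DoWork(object state)
    {
        Console.WriteLine("doing work with state: {0}", state);
    }

    static void Main()
    {
        WaitCallback callback = DoWork;

        // Step 2: capture the caller's ExecutionContext so the callback
        // later runs with the same context it was queued from.
        ExecutionContext context = ExecutionContext.Capture();

        // Step 3 queues the (callback, context) pair; when a pool thread
        // eventually dequeues it, it runs the callback inside the captured
        // context, conceptually like this:
        ExecutionContext.Run(context, s => callback(s), "hello");
    }
}
```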

The ‘meat and potatoes’ of what is being done can be seen by just viewing the ThreadPool.QueueUserWorkItemHelper method.


At some point the work is then dequeued by a thread. Dequeuing in .NET 4 actually has some complexity behind it, with WorkStealingQueues and other magic. For now we can be content with the fact that all ThreadPool.QueueUserWorkItem does is put a callback on a queue for the framework to pick up later.

Now what about using Tasks and Task.Start? The sequence of events is slightly more complex.

  1. Calls Task.Start with the default TaskScheduler (which is the ThreadPoolScheduler)
  2. Checks the current state of the task to ensure the task can be scheduled
  3. QueueTask is called on the current TaskScheduler (again, by default this is the ThreadPoolScheduler)

That is pretty much all the Task does; the responsibility of scheduling the task is now passed to the TaskScheduler (imagine that). Taking a look at the default ThreadPoolScheduler:

  1. Looks at the options passed in
    1. If TaskCreationOptions.LongRunning was passed in, the scheduler treats this as a hint to create a regular .NET Thread marked as a background thread. Because user options are tied in with the state of the ThreadPoolScheduler there is no guarantee that this code path will be taken.
    2. Otherwise ThreadPool.UnsafeQueueCustomWorkItem is called to queue up the item.
      1. If the Task was started from within another Task, the work item is queued up on a local ThreadPoolQueue
      2. Otherwise the Task is queued up in the global ThreadPool – this is what happens in the code sample above
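For example, the LongRunning hint mentioned above can be passed when constructing the task (a minimal sketch; the lambda body is just a placeholder):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // TaskCreationOptions.LongRunning hints to the default scheduler
        // that it may use a dedicated background thread instead of
        // queueing the work onto the ThreadPool.
        var task = new Task(() => Console.WriteLine("long-running work"),
                            TaskCreationOptions.LongRunning);
        task.Start();
        task.Wait();
    }
}
```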

There is more logic going on up front with the Task, but for the typical usage pattern Task.Start will eventually call ThreadPool.UnsafeQueueCustomWorkItem.


This queues up the item to be dequeued by the same ThreadPool as if we had used ThreadPool.QueueUserWorkItem.

So for typical usage there will be little difference between using ThreadPool.QueueUserWorkItem and using Task.Start. However there are some subtle differences between the two. Ultimately which one you use is up to you. Me? I prefer Task.Start, mostly for the flexibility it provides (plus ‘OOOH! SHINY!’).
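One of those subtle differences worth calling out (an aside of mine, not from the decompiled code above): an unhandled exception in a ThreadPool callback tears down the process, while an exception thrown inside a Task is captured and re-surfaced when the task is waited on:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // The thrown exception is captured by the Task machinery instead
        // of crashing the process, and is rethrown (wrapped in an
        // AggregateException) when we call Wait().
        var task = new Task(() => { throw new InvalidOperationException("boom"); });
        task.Start();
        try
        {
            task.Wait();
        }
        catch (AggregateException ex)
        {
            Console.WriteLine(ex.InnerException.Message);
        }
    }
}
```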

Note: This entry also posted on

Saturday, August 4th, 2012

Smaller entry today. I wrote some cool SQL that I just had to share, but it didn’t feel like it needed to be wrapped in a larger entry.

TSQL makes it easy to ORDER BY and SELECT TOP when there is a set of data well defined by your schema. Given a table:

ID | Manufacturer | Model      | UnitsSold
1  |  Ferrari     | 599 GTO    |   392
2  |  Ferrari     | Enzo       |   340
3  |  Ferrari     | Testarossa |   218
4  |  Ferrari     | Dino       |   87
5  |  Porsche     | 911        |   1302
6  |  Porsche     | Boxster    |   1511
7  |  Porsche     | 928        |   819

DECLARE @CarTable TABLE (
ID INT,
Manufacturer VARCHAR(128),
Model VARCHAR(128),
UnitsSold INT
)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (1,'Ferrari','599 GTO',392)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (2,'Ferrari','Enzo',340)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (3,'Ferrari','Testarossa',218)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (4,'Ferrari','Dino',87)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (5,'Porsche','911',1302)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (6,'Porsche','Boxster',1511)
INSERT INTO @CarTable (ID, Manufacturer, Model, UnitsSold) VALUES (7,'Porsche','928',819)

Getting the top 2 selling models of cars is a fairly straightforward SQL query.

SELECT TOP 2 Manufacturer, Model, UnitsSold
FROM @CarTable
ORDER BY UnitsSold DESC

Things get more complicated if we want the top 2 selling models of cars per manufacturer. The trick is to use the RANK function.

SELECT cars.Manufacturer, cars.Model, cars.UnitsSold
FROM (
SELECT Manufacturer, Model, UnitsSold,
Rank = RANK() OVER (PARTITION BY Manufacturer ORDER BY UnitsSold DESC)
FROM @CarTable
) cars
WHERE cars.[Rank] <= 2 -- Get top 2 from each Manufacturer
ORDER BY Manufacturer
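One caveat worth noting (my aside, not part of the original query): RANK assigns tied rows the same rank, so a tie at rank 2 can return more than two rows per manufacturer. If you need exactly two, ROW_NUMBER is the safer choice:

```sql
SELECT cars.Manufacturer, cars.Model, cars.UnitsSold
FROM (
SELECT Manufacturer, Model, UnitsSold,
RowNum = ROW_NUMBER() OVER (PARTITION BY Manufacturer ORDER BY UnitsSold DESC)
FROM @CarTable
) cars
WHERE cars.RowNum <= 2 -- Exactly 2 from each Manufacturer
ORDER BY Manufacturer
```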

Thursday, July 12th, 2012

This is a deep dive into the yield keyword in C#. At the surface, yield is a keyword that allows a method to return an iterator (IEnumerable, IEnumerator, IEnumerable<T>, or IEnumerator<T>). When the caller uses the iterator (say, in a foreach loop) the iterator will only execute as much code as needed to produce the next item.

This is all well and good, and the above with a few code samples is ample information for many programmers to go off and use iterator constructs without needing to know much more. However, the simple introduction still leaves some questions unanswered.

  • There appear to be two forms, ‘yield return’ and ‘yield break’ – which is which, and when would I use one over the other?
  • How exactly does the runtime know the state of the iteration?

There may be other lingering questions, but hopefully exploring the big ones provides an ‘aha’ moment that makes the answers to the rest easier to come by.

Let’s start with the easy one. The two forms of using yield are ‘yield return’ and ‘yield break’. You use yield return to return the next value in the iteration, and yield break to stop the iteration. Take a simple example written without an iterator construct.

public IEnumerable Range(int min, int max)
{
    if (min >= max)
        return new List<int>();

    List<int> _items = new List<int>();
    for (int i = min; i < max; i++)
        _items.Add(i);
    return _items;
}

To change that into an iterator construct with the yield keywords, the statement that returns an empty list is replaced by ‘yield break’, and ‘yield return’ is used in the body of the loop:

public IEnumerable Range(int min, int max)
{
    if (min >= max)
        yield break;

    for (int i = min; i < max; i++)
        yield return i;
}

If we had not used ‘yield break’ and left the ‘return’ statement in place, the compiler would give a friendly error message: “error CS1622: Cannot return a value from an iterator. Use the yield return statement to return a value, or yield break to end the iteration.”. Hopefully by now how to use yield is clear, even if why you would use it and how it works are not yet.

Let’s explore the two examples from above a little longer. First of all, both can pretty much be used interchangeably in your code. Both take a range and return an IEnumerable that can be used in a foreach. The difference is that the first example (the one without the yield) constructs the whole list in memory before returning it. We can see this by using ildasm.exe and inspecting the method. Don’t worry if you are not too familiar with IL; I’ll call out some of the important bits.

// Code size 53 (0x35)
.maxstack 2
.locals init (class [mscorlib]System.Collections.Generic.List`1<int32> V_0,
int32 V_1,
class [mscorlib]System.Collections.IEnumerable V_2,
bool V_3)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: stloc.3
IL_0006: ldloc.3
IL_0007: brtrue.s IL_0011
IL_0009: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_000e: stloc.2
IL_000f: br.s IL_0033
IL_0011: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_0016: stloc.0
IL_0017: ldarg.1
IL_0018: stloc.1
IL_0019: br.s IL_0027
IL_001b: ldloc.0
IL_001c: ldloc.1
IL_001d: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
IL_0022: nop
IL_0023: ldloc.1
IL_0024: ldc.i4.1
IL_0025: add
IL_0026: stloc.1
IL_0027: ldloc.1
IL_0028: ldarg.2
IL_0029: clt
IL_002b: stloc.3
IL_002c: ldloc.3
IL_002d: brtrue.s IL_001b
IL_002f: ldloc.0
IL_0030: stloc.2
IL_0031: br.s IL_0033
IL_0033: ldloc.2
IL_0034: ret

First thing to notice is the code size: 53 bytes of IL. Instructions IL_0000 to IL_000f are the precheck of whether min is greater than or equal to max, returning a new empty List if so. The rest, starting from IL_0011, creates a list and adds an integer to it (IL_001d); this repeats until all the integers from min to max have been added. Then the list is returned as an IEnumerable.

Calling a method this way for a min and a max close to each other would work pretty well and not take up too much memory. However if min was Int32.MinValue and max was Int32.MaxValue you could easily run out of memory.

Using an iterator construct prevents the memory problem by only producing values as needed. This keeps a small memory footprint, but how does the CLR know what to return next? Let’s take another look at the disassembled source using ildasm.exe.

// Code size 35 (0x23)
.maxstack 2
.locals init (class YieldClass/'<Range>d__0' V_0,
class [mscorlib]System.Collections.IEnumerable V_1)
IL_0000: ldc.i4.s -2
IL_0002: newobj instance void YieldClass/'<Range>d__0'::.ctor(int32)
IL_0007: stloc.0
IL_0008: ldloc.0
IL_0009: ldarg.0
IL_000a: stfld class YieldClass YieldClass/'<Range>d__0'::'<>4__this'
IL_000f: ldloc.0
IL_0010: ldarg.1
IL_0011: stfld int32 YieldClass/'<Range>d__0'::'<>3__min'
IL_0016: ldloc.0
IL_0017: ldarg.2
IL_0018: stfld int32 YieldClass/'<Range>d__0'::'<>3__max'
IL_001d: ldloc.0
IL_001e: stloc.1
IL_001f: br.s IL_0021
IL_0021: ldloc.1
IL_0022: ret

Only 35 bytes! Great! But taking a deeper look, what exactly is this doing? There is no range check. There are a bunch of references to YieldClass/'<Range>d__0'. YieldClass is just the name of the class I used to generate the exe, but <Range>d__0 is an automatically generated class. Take a look at the assembly organization through ildasm:


Notice how <Range>d__0 is an embedded class under YieldClass. I'll repeat that: the C# compiler just generated a full-blown class for us under the covers when we used the yield statement.

Back to the IL. On line IL_0002 we create a new instance of <Range>d__0; then on lines IL_000a, IL_0011, and IL_0018 we set the <>4__this, <>3__min, and <>3__max fields on the <Range>d__0 instance. Then the instance we created is returned.

This starts to shed some light on how the state is being managed, but it doesn’t really answer the question of how exactly the runtime knows the state of the iteration. For that we will need to dig into the <Range>d__0 class.

Expanding the class in ildasm we can see there is quite a bit inside.


It implements a bunch of interfaces; it has some fields, some of which we saw being set when we looked at the IL for our function; and it has methods and properties (all of which are actually part of one interface or another).

To actually dump out all the IL that gets generated for the class would be unwieldy.  I recommend making a simple program and start poking around the IL as I point out some of what is going on.

The first thing to know is how exactly an IEnumerable gets consumed by a foreach loop. The C# compiler does a bit of magic here: the foreach statement calls GetEnumerator to get an IEnumerator, then calls MoveNext (reading Current after each successful call) until there is nothing left to enumerate.
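Roughly, a foreach over the iterator from earlier desugars to something like this (a sketch using the Range method from above):

```csharp
using System;
using System.Collections;

class ForeachDesugar
{
    static IEnumerable Range(int min, int max)
    {
        if (min >= max)
            yield break;
        for (int i = min; i < max; i++)
            yield return i;
    }

    static void Main()
    {
        // What 'foreach (var i in Range(0, 3))' roughly expands to:
        IEnumerator e = Range(0, 3).GetEnumerator();
        while (e.MoveNext())
        {
            Console.WriteLine(e.Current);
        }
    }
}
```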

Looking at the IL for the MoveNext function in the <Range>d__0 class, one of the first things we see is a jump table, implemented via the switch opcode. The input to the jump table is the <>1__state field (if you dig into the constructor for <Range>d__0 you can see the state field being initialized when the class is created). The state that is stored is which part of the code we are currently executing. That is, the more ‘yield’ statements we have, the more states we will have and the more entries in our jump table.

Now we are getting somewhere! We know how the runtime knows which part of the code it needs to execute, but what about the current value? That is taken care of by the <>2__current field. This field stores whatever value the enumerator is currently pointing to. Each target in the jump table can then use the current value (if needed) to compute the next one.
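Putting the pieces together, here is a simplified, hand-written sketch of the kind of state machine the compiler generates for Range (the field names mirror the <>1__state / <>2__current convention; the real generated class implements more interfaces and handles more edge cases):

```csharp
using System;
using System.Collections;

// Hand-written approximation of the compiler-generated iterator class.
class RangeIterator : IEnumerable, IEnumerator
{
    int state;           // plays the role of <>1__state
    int current;         // plays the role of <>2__current
    readonly int min;
    readonly int max;
    int i;

    public RangeIterator(int min, int max)
    {
        this.min = min;
        this.max = max;
    }

    public IEnumerator GetEnumerator() { return this; }

    public object Current { get { return current; } }

    public bool MoveNext()
    {
        // The generated MoveNext starts with a switch on the state field;
        // each 'yield' statement adds another resume point.
        switch (state)
        {
            case 0:                      // first call
                if (min >= max)
                    return false;        // 'yield break'
                i = min;
                goto case 1;
            case 1:                      // resume after a 'yield return'
                if (i < max)
                {
                    current = i;         // 'yield return i'
                    i++;
                    state = 1;
                    return true;
                }
                return false;
        }
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
}

class Demo
{
    static void Main()
    {
        foreach (object i in new RangeIterator(5, 8))
            Console.WriteLine(i);
    }
}
```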

Hopefully this has demystified some of the magic behind one of the neater C# features. To understand more, I highly encourage breaking out ildasm and actually following along with what was generated to allow your C# program to do the cool things it can do.