Archive for the ‘C#’ Category

Monday, August 20th, 2012

Note: The entry originally appeared on the tech blog at collectedit.com on 08/20/2012

One of the first large-ish projects I worked on at CollectedIt was creating a notification system. It notifies users of actions that other users take on their collections (comments, agreeing or disagreeing with item stats, etc). While I hope to blog more about all the technical challenges that came up while developing the system, today I am going to concentrate on some of the text templating we do with the notifications.


CollectedIt is architected in such a way that database calls are few and far between whenever the current action is on a code path that could be called from a user-facing interface (website, iPhone app, etc). This architecture minimizes lag time and leaves us in a good position to scale out. However, as with most forms of architecture, there are trade-offs. The biggest trade-off for the notification system was that the front-end knew which action was being performed and that it should generate a notification, but knew virtually no details about the objects that were part of the notification.

A more concrete example:
Arthur is browsing medieval collections and stumbles upon an item in Tim’s “Enchanted Weapons” collection called “Staff that shoots flames”. This item is really neat to Arthur so he wishes to give Tim a kudos. Once Arthur clicks the kudos button this triggers a notification. At notification generation time all that is known is:

  1. Logged in user id
  2. Current collection id
  3. Current item id
  4. A kudos was given

Nothing is known about Arthur, or Tim, or “Enchanted Weapons” or “Staff that shoots flames”. It would be fairly trivial to perform a DB query joining together 3 or 4 tables to get all the information needed, but we are in code that is executed as a result of the kudos button being clicked, so we want to get back to the user as soon as possible. Doing an extra DB query (particularly one that involves 4 tables) is not the quickest way to get that information.

We decided that the notification could be generated with tokens that would be replaced later down the line. The first thought was to just use Razor, which would be cool. However, the Razor parser is marked for internal use and I have been burnt by using internal methods before (not to say it’s never appropriate to use undocumented methods…but that’s another blog entry). So it was back to DuckDuckGo to see if anything was out there for doing some sort of text templating with a CollectedIt object and some text.

I ran into T4, which at first look seemed like it would work. Looking deeper, though, a class gets generated at compile time and the runtime just uses that generated class to do the processing. This won’t really work for us since the template itself is also generated at runtime.

After a little more time searching I had come up with nothing that would really do what I wanted, so I decided to experiment a little with writing my own. Since I wanted to write this quickly and there really is no reason to write a full-blown text processor (although that would be fun), I needed to boil down what exactly it was I was trying to accomplish.

  • Flexible text replacement
  • Not much logic necessarily needed inside the template itself
  • Both template and replacement objects would be generated at runtime

First thing I decided to do was take a look at what I could get with C#’s dynamic type. I have used dynamic objects in the past to do things like Object.RuntimeProperty but that’s not exactly what I have here. I have Object and “RuntimeProperty” where “RuntimeProperty” is just a string. There may well be a way to use “RuntimeProperty” directly on a dynamic object, but I could not find one (if anybody knows of a way let me know in the comments). Instead I went down the reflection route since at runtime there is really nothing different between a dynamic object and a compiled object when inspecting objects with reflection.

Type dynamicType = o.GetType(); 
PropertyInfo p = dynamicType.GetProperty(property); 
object dynamicPropValue = p.GetValue(o, null); 
FieldInfo f = dynamicType.GetField(property); 
object dynamicFieldValue = f.GetValue(o);
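The snippet above assumes both the property and the field exist; in practice only one of them (or neither) will. A minimal sketch of the same lookup with null checks follows. The MemberReader class and GetMemberValue method are hypothetical names used only for illustration, not part of the CollectedIt code.

```csharp
using System;
using System.Reflection;

public static class MemberReader
{
    // Hypothetical helper (not from the original post): returns the value of a
    // public instance property or field called `name`, or null if neither exists.
    public static object GetMemberValue(object target, string name)
    {
        if (target == null) throw new ArgumentNullException("target");

        Type type = target.GetType();

        // Try a public property first.
        PropertyInfo p = type.GetProperty(name);
        if (p != null)
            return p.GetValue(target, null);

        // Fall back to a public field with the same name.
        FieldInfo f = type.GetField(name);
        if (f != null)
            return f.GetValue(target);

        return null;
    }
}
```

Because anonymous types expose their members as public properties, calling MemberReader.GetMemberValue(new { Author = "Arthur" }, "Author") goes through the property branch.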

Great! That takes care of runtime objects and their properties. What about the text template itself, though? Well…I know regular expressions.

In order to not completely reinvent the wheel I picked the T4 syntax (and specifically only the subset of T4 that replaces the text template with a string: <#= Property #>). This is pretty easy to detect with a regex:

<#=\s*(?<prop>[a-zA-Z]\w*)\s*#>
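To see what the pattern captures, here is a small sketch that lists the token names found in a template. It uses the same pattern, with the named group prop, as the extension method shown later in this post; the TokenFinder class and FindTokenNames method are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenFinder
{
    // Same pattern as in the post: "<#=", optional whitespace, an identifier
    // captured in the named group "prop", optional whitespace, then "#>".
    private static readonly Regex TokenRegex =
        new Regex(@"<#=\s*(?<prop>[a-zA-Z]\w*)\s*#>");

    // Hypothetical helper: returns the property names referenced by the
    // template, in the order they appear.
    public static List<string> FindTokenNames(string template)
    {
        var names = new List<string>();
        foreach (Match m in TokenRegex.Matches(template))
            names.Add(m.Groups["prop"].Value);
        return names;
    }
}
```

The whitespace around the name is optional, so both "<#= Author #>" and "<#=Item#>" are recognized as tokens.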

The reflection and the regex together give us all the tools that are needed to satisfy the requirements we came up with. All that’s left is to wrap them up in a nice usable package. To figure out how to package it, I looked at how exactly the text templating would be called.

Continuing with the Arthur/Tim example from above the code creating the kudos notification would like to generate the notification with an interface like

string notificationText = 
	"<#= Author #> really likes your <#= Item #> in <#= Collection #>";
string notification = notificationText.ProcessTokens(new {
	Author = "Arthur", 
	Item = "Staff that shoots flames",
	Collection = "Enchanted Weapons" 
});

This points to using an extension method. In fact that is exactly what we went with. The whole extension method is

public static string ProcessTokens(this string s, dynamic o)
{
	Type dynamicType = o.GetType();

	string composedString = s;
	MatchCollection tokens = _tokenRegex.Matches(s);
	foreach (Match token in tokens)
	{
		string property = token.Groups["prop"].Value;
				
		PropertyInfo p = dynamicType.GetProperty(property);
		if (p != null)
			composedString = composedString.Replace(token.ToString(), String.Format("{0}", p.GetValue(o, null)));
		else
		{
			FieldInfo f = dynamicType.GetField(property);
			if (f != null)
				composedString = composedString.Replace(token.ToString(), String.Format("{0}", f.GetValue(o)));
		}	
	}

	return composedString;
}
private static readonly Regex _tokenRegex = new Regex(@"<#=\s*(?<prop>[a-zA-Z]\w*)\s*#>");

That’s how we solved the problem of having a disjointed read/write object system. Feel free to use the code snippets above in your own projects to solve any sort of problem where you need runtime text and runtime objects to generate a string. Also make sure to drop by collectedit.com with questions, suggestions, or just some kudos.

Thursday, July 12th, 2012

This is a deep dive into the yield keyword in C#. On the surface, yield is a keyword that allows a method to return an iterator (IEnumerable, IEnumerator, IEnumerable<T>, or IEnumerator<T>). When the caller uses the iterator (say in a foreach loop), the iterator will only run as much code as needed to get to the next item.
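The "only as much code as needed" part is easy to observe. The sketch below is an illustration only (LazyDemo and its members are hypothetical names): the iterator records each step in a log, and the caller stops iterating early.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records which parts of the iterator body have actually run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("producing 1");
        yield return 1;
        Log.Add("producing 2");
        yield return 2;
        Log.Add("producing 3");
        yield return 3;
    }

    // Stops iterating as soon as a value greater than one is seen.
    public static int FirstGreaterThanOne()
    {
        Log.Clear();
        foreach (int n in Numbers())
        {
            if (n > 1) return n;
        }
        return -1;
    }
}
```

After calling LazyDemo.FirstGreaterThanOne(), the log contains only the first two entries; the code that would produce 3 never runs because the caller stopped asking for values.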

This is all well and good, and the above information with a few code samples is ample information for many programmers to go off and use iterator constructs without needing to know much more. However there are still questions that are left unanswered with the simple introduction.

  • There is both a ‘yield return’ and a ‘yield break’. Which is which, and when would I use one over the other?
  • How exactly does the runtime know the state of the iteration?

There are other questions that may be lingering, but hopefully by just exploring the large questions there is an ‘ah ha’ moment that makes the answers to other questions easier to come by.

Let’s start with the easy one. The two forms of using yield are ‘yield return’ and ‘yield break’. You use yield return to return the next value in the iteration, and yield break to stop the iteration. Take a simple example of a non-iterator.

public IEnumerable Range(int min, int max)
{
	if (min >= max)
		return new List<int>();

	List<int> _items = new List<int>();
	for (int i = min; i < max; i++)
		_items.Add(i);

	return _items;
}

In order to change that into an iterator construct with yield keywords the statement that returns just an empty list would be replaced by the ‘yield break’ and ‘yield return’ would be used in the body of the loop like:

public IEnumerable Range(int min, int max)
{
	if (min >= max)
		yield break;

	for (int i = min; i < max; i++)
		yield return i;
}

If we did not use ‘yield break’ and left the ‘return’ statement in place, the compiler would give us a friendly error message: “error CS1622: Cannot return a value from an iterator. Use the yield return statement to return a value, or yield break to end the iteration.” Hopefully by now how to use yield is clear, even if why you would use it and how it works are not yet.

Let’s explore the two examples from above for a little longer. First of all, both examples can pretty much be used interchangeably inside your code. Both take in a range and return an IEnumerable that can be used in a foreach. The difference is that the first example (the one without the yield) will construct a whole list in memory before returning it. We can see this by using ildasm.exe and inspecting the method. Don’t worry if you are not too familiar with IL. I’ll call out some of the important bits.

// Code size 53 (0x35)
.maxstack 2
.locals init (class [mscorlib]System.Collections.Generic.List`1<int32> V_0,
int32 V_1,
class [mscorlib]System.Collections.IEnumerable V_2,
bool V_3)
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: clt
IL_0005: stloc.3
IL_0006: ldloc.3
IL_0007: brtrue.s IL_0011
IL_0009: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_000e: stloc.2
IL_000f: br.s IL_0033
IL_0011: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_0016: stloc.0
IL_0017: ldarg.1
IL_0018: stloc.1
IL_0019: br.s IL_0027
IL_001b: ldloc.0
IL_001c: ldloc.1
IL_001d: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
IL_0022: nop
IL_0023: ldloc.1
IL_0024: ldc.i4.1
IL_0025: add
IL_0026: stloc.1
IL_0027: ldloc.1
IL_0028: ldarg.2
IL_0029: clt
IL_002b: stloc.3
IL_002c: ldloc.3
IL_002d: brtrue.s IL_001b
IL_002f: ldloc.0
IL_0030: stloc.2
IL_0031: br.s IL_0033
IL_0033: ldloc.2
IL_0034: ret

First thing to notice is that the method body is 53 bytes of IL. The instructions from IL_0000 to IL_000f are the precheck: if min is greater than or equal to max, a new empty List is returned. The rest of the method, starting from IL_0011, creates a list and adds an integer to it (IL_001d); this repeats until all the integers from min up to (but not including) max have been added. Then the list is returned as an IEnumerable.

Calling a method this way for a min and a max close to each other would work pretty well and not take up too much memory. However if min was Int32.MinValue and max was Int32.MaxValue you could easily run out of memory.

Using an iterator construct would prevent the memory use problem by only returning values as needed. This keeps a small memory footprint, but how does the CLR know what to return next? Let’s take a look again at the disassembled source using ildasm.exe.

// Code size 35 (0x23)
.maxstack 2
.locals init (class YieldClass/'<Range>d__0' V_0,
class [mscorlib]System.Collections.IEnumerable V_1)
IL_0000: ldc.i4.s -2
IL_0002: newobj instance void YieldClass/'<Range>d__0'::.ctor(int32)
IL_0007: stloc.0
IL_0008: ldloc.0
IL_0009: ldarg.0
IL_000a: stfld class YieldClass YieldClass/'<Range>d__0'::'<>4__this'
IL_000f: ldloc.0
IL_0010: ldarg.1
IL_0011: stfld int32 YieldClass/'<Range>d__0'::'<>3__min'
IL_0016: ldloc.0
IL_0017: ldarg.2
IL_0018: stfld int32 YieldClass/'<Range>d__0'::'<>3__max'
IL_001d: ldloc.0
IL_001e: stloc.1
IL_001f: br.s IL_0021
IL_0021: ldloc.1
IL_0022: ret

Only 35 bytes of IL! Great! But taking a deeper look, what exactly is this doing? There is no range check. There are a bunch of references to YieldClass/'<Range>d__0'. YieldClass is just the name of the class I used to generate the exe, but <Range>d__0 is an automatically generated class. Take a look at the assembly organization through ildasm:

[Image: ildasm tree view of the assembly, with <Range>d__0 nested under YieldClass]

Notice how <Range>d__0 is an embedded class under YieldClass. I'll repeat that. The C# compiler just generated a full blown class for us under the covers when we used the yield statement.

Back to the IL, we can see that at IL_0002 we create a new instance of <Range>d__0, then at IL_000a, IL_0011, and IL_0018 we just set the <>4__this, <>3__min, and <>3__max fields on the <Range>d__0 instance. Then the method returns the instance we created.

This starts to shed some light onto how the state is being managed, but it doesn’t really answer the question on how exactly the runtime knows the state of the iteration. For that we will need to dig into the <Range>d__0 class. 

Expanding the class in ildasm we can see there is quite a bit inside.

[Image: ildasm view of the expanded <Range>d__0 class]

It implements a bunch of interfaces, and it has some fields (some of which we saw being set when we looked at the IL for our function) as well as functions and properties, all of which are actually part of one interface or another.

Dumping out all the IL that gets generated for the class would be unwieldy. I recommend making a simple program and poking around the IL yourself as I point out some of what is going on.

The first thing to know is how exactly an IEnumerable gets used in a foreach loop. The C# compiler does a bit of magic. Basically the foreach statement calls GetEnumerator once and then calls MoveNext until it returns false, meaning there is nothing left to enumerate over.
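Written out by hand, a foreach over a non-generic IEnumerable looks roughly like the sketch below. This is an approximation for illustration, not the exact code the compiler emits; ForeachExpansion and Drain are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Roughly what "foreach (object item in source) seen.Add(item);" turns into.
    public static List<object> Drain(IEnumerable source)
    {
        var seen = new List<object>();
        IEnumerator e = source.GetEnumerator();
        try
        {
            // MoveNext advances the iterator; false means nothing is left.
            while (e.MoveNext())
                seen.Add(e.Current);
        }
        finally
        {
            // foreach also disposes the enumerator when it is disposable.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
        return seen;
    }
}
```

Each pass through the while loop corresponds to one iteration of the foreach body, which is why MoveNext is the interesting method to look at in the generated class.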

Looking at the IL for the MoveNext function in the <Range>d__0 class, one of the first things we see is a jump table implemented via the switch opcode. The input to the jump table is the <>1__state field (if you dig into the constructor for <Range>d__0 you will see that the state field is initialized when the class is created). The state that is stored is actually which part of the code we are currently executing. That is, the more ‘yield’ statements we have, the more states we will have and the more entries we will have in our jump table.

Now we are getting somewhere! We know how the runtime knows which part of the code it needs to execute, but what about the current value? That is taken care of by the <>2__current field. This field stores whatever value the enumerator is currently pointing to. Each target in the jump table can then use the current value (if needed) to compute the next value.
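To make the idea concrete, here is a hand-written enumerator for the Range example that follows the same pattern: a state field drives a switch, and a current field holds the last value produced. It is a simplified sketch, not the class the compiler generates (the real one also implements IEnumerable and IDisposable and uses additional states such as the -2 passed to the constructor in the IL above); RangeStateMachine and its field names are hypothetical.

```csharp
using System;
using System.Collections;

public sealed class RangeStateMachine : IEnumerator
{
    private int _state;       // plays the role of <>1__state
    private object _current;  // plays the role of <>2__current
    private readonly int _min;
    private readonly int _max;
    private int _i;           // the loop variable, hoisted into a field

    public RangeStateMachine(int min, int max)
    {
        _min = min;
        _max = max;
        _state = 0;           // 0 = not started yet
    }

    public object Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: // first call: run the code before the loop
                if (_min >= _max) { _state = -1; return false; } // yield break
                _i = _min;
                break;
            case 1: // resuming right after a 'yield return i'
                _i++;
                break;
            default: // finished
                return false;
        }

        if (_i < _max)
        {
            _current = _i; // yield return i
            _state = 1;
            return true;
        }

        _state = -1;
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
}
```

The loop variable has to live in a field because MoveNext returns to the caller between values, so nothing can be kept in locals from one call to the next.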

Hopefully this has demystified some of the magic behind one of the neater C# features. To really understand more, I highly encourage breaking out ildasm and actually following along with what was generated to allow your C# program to do the cool things it can do.

Friday, June 1st, 2012

In one of my last projects, where Microsoft .NET assemblies could be updated at runtime, we sometimes ran into an exception, “There is an error in XML document”. We would see this error even when nothing had changed in the code dealing with the object being serialized, or in the serialization code itself.

I sat down to try and figure out why.

The exception was happening inside the Microsoft-generated dynamic DLLs for XmlSerialization. You can get the DLL to stick around if you follow Hanselman’s instructions. Once we can step into the DLLs, we see the error is happening on a line like this:


if ((object)(a_0_0) == null) Reader.Skip(); else method9_Add.Invoke( ...

The error is a null reference exception because method9_Add is null.  So looking at what method9_Add  actually is:


System.Reflection.MethodInfo method9_Add = type2_Item.GetMethod("Add" ...

So they are using reflection. The method they are trying to get is the Add method on a generic collection. In particular, they are trying to get the Add method for a List<MyBizObject>. This fails because GetMethod’s execution context doesn’t know anything about MyBizObject.

This raises the question of why it works normally. Doing the same steps without updating the DLL at runtime:


if ((object)(a_0_0) == null) Reader.Skip(); else a_0_0.Add( ...

Ah ha! Normally XmlSerialization doesn’t use reflection; a_0_0 is a real object. After doing some digging in Reflector we find:

[Image: screenshot of the decompiled code in Reflector]

A null or empty assembly.Location will mark the type as dynamic, and a dynamic assembly triggers XmlSerialization to generate reflection-based member access. When a DLL is loaded using Assembly.Load, assembly.Location ends up being an empty string, which caused our issue.
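The empty Location is easy to check for yourself. The sketch below assumes the DLL is loaded from raw bytes with Assembly.Load(byte[]), which is one way an assembly ends up with an empty Location; the post does not say which overload the project used. LocationCheck and the assemblyPath parameter are hypothetical names.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class LocationCheck
{
    // Loads the assembly at assemblyPath from its raw bytes and reports
    // whether the loaded copy has a null or empty Location.
    public static bool LoadedFromBytesHasNoLocation(string assemblyPath)
    {
        byte[] raw = File.ReadAllBytes(assemblyPath);
        Assembly loaded = Assembly.Load(raw);
        return string.IsNullOrEmpty(loaded.Location);
    }
}
```

An assembly loaded this way has no file on disk associated with it, so a check on Location like the one described above would treat its types as dynamic.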

In this particular case the fix was to stop using a strongly typed List<MyBizObject> and just use an ArrayList. This way the reflection code generated by Microsoft worked fine.