My Journey with the StackExchange API

On May 23, Jeff Atwood posted an announcement to the StackOverflow blog.  The StackOverflow team was getting to work on version 1.0 of the StackExchange API.  As you know from my previous post, I’m something of a StackExchange fan.  Needless to say (though I’ll say it anyway) I was pumped (did that rhyme?)

Two months later, I’ve created three new open-source projects based on the StackExchange API:

StackLINQ: A LINQ Provider to the StackExchange API
StackedDeck:  The TweetDeck of StackExchange.  Watch questions from any and all StackExchange sites stream in by tag.
StackWatcher:   A system tray application that watches in real-time for changes in your rep, favorites, and badge awards, and notifies you with some popup toast.

This effort has been an intensive and interesting experience.  I learned a TON about LINQ and Windows Presentation Foundation, WPF being the basis for StackedDeck and StackWatcher.  I won’t lie and say I did it just for the fun of it. I would love to win cool stuff for my efforts, but that’s truly not the whole deal.

LINQ, WPF, and Silverlight are three of my favorite Microsoft technologies.  I’ve learned a lot more by writing the StackLINQ library and the WPF apps, and I’ll be blogging and demoing my experience over the next several months, starting with StackLINQ.  StackedDeck is now version 0.9 beta.  It will be getting a major refactoring as I refine it to take better advantage of binding and MVVM.  I will probably roll StackWatcher’s functionality into StackedDeck, and will definitely be updating StackedDeck when the StackExchange API v2 is released, so you can ask and answer questions too!

This process will be “televised”- that is, blogged and probably made into videos.  All three projects are on BitBucket for your perusal at the links above.  Enjoy, and watch this space to follow my progress.

Entity Framework 4: Then and Now


  • What is OR/M?
    • A Little History
    • The Impedance Mismatch
    • OR/M to the rescue
  • Entity Framework 1.0
    • Entity Data Models
    • Code Generation
    • Generated Classes
    • Entity Framework 1.0 FTW!…?
  • The Controversy
    • Persistence Ignorance
  • Entity Framework 4
    • New Features
    • Persistence Ignorance and POCO
    • T4 Templates
    • T4 Generated Files
    • Model-First Development
    • FK Associations
    • Code-Only Entities

What is Object Relational Mapping?

A Little History

Over ten years ago, I was working a contract developing legacy ASP websites with a SQL Server 6.5 backend (on stone tablets, in the driving snow, uphill both ways…)

Set oConn = Server.CreateObject("ADODB.Connection")  oConn.Open "DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" & Server.MapPath("DB.mdb")
Set rsUsers = Server.CreateObject("ADODB.Recordset")
rsUsers.Open("SELECT UserID FROM Users";, oConn,1,3)

--SQL CREATE PROCEDURE usp_GetCustomers () AS

Anyone who did this ADO development in VB6 or ASP probably noticed a trend:  at least 50% of your code was database-related.   Then you got to stop what you were doing writing code, move over to Query Analyzer, and write scads of stored procedures.   Bummer.

Then, in 2001, we got .NET.  Whoohoo!

SqlDataAdapter da = new SqlDataAdapter(“usp_GetCustomer;", myConnection);
da.TableMappings.Add("Customers1", "Orders");
DataSet ds = new DataSet();
da.Fill(ds, "Customers");

--SQL  CREATE PROCEDURE usp_GetCustomers ()  AS

OK, so it’s strongly typed.  Great… but wait a minute.  The story hasn’t really changed here.  I’m still writing tons of Connections, Commands, and Parameters.  I’m still spending half my life (and my budget) hacking together repetitive SQL scripts.  Things got incrementally better with the Enterprise Library and Data Access Application Block, but the fundamental problem remained.

What is the problem?  The Impedance Mismatch.

“Impedance Mismatch” the clever term used to describe the fact that the Data/Persistence tier of n-tier applications was just not connecting cleanly to “upper” tiers.  Part of the reason for this was, as I’ve said, we had to work in two different languages to get anything done.  This code mismatch increased the amount of code we had to write, increased development costs, and was just no fun.

The other part of the impedance mismatch is the Modeling Mismatch. Relational databases are great for storing and retrieving data, but relational schemas don’t always make good Entity Relationship Models.  We needed too much code contortionism to get the relational, two-dimensional, tabular database return formats to return data in a useful, hierarchical, entity-relational manner.

Object Relational Mapping to the rescue!

About five years ago, some smart folks (I’m looking at you, Ayende, Rob Conery, and Frans Bouma) decided they’d had enough.  They said,

I should be able to get my data by writing my queries in C#, VB.NET, or IronPython, for Pete’s sake.  I should not have to write another line of SQL.  I should be able to represent and query against Entities that contain data in the shape I need it without regard the database schema.

Thus was born NHibernate, Subsonic, and LLBLGenPro.  These OR/M projects were among the first to be introduced to the .NET platform.  Through a variety of approaches, these projects:

  • Provided the ability to abstract the database schema to a set of objects (aka Entities) whose shape did not have to match 1:1 with the tables in the database.
  • Allowed developers to write queries in the .NET language of their choice by cross-compiling queries written in C#/VB.NET directly to dynamic SQL.

Not to be left out, Microsoft joined the party in 2008.  Entity Framework 1.0 was introduced with Visual Studio 2008 SP1.  In the box was:

  • System.Data.Entity
  • A new designer for editing Entity Data Models
  • LINQ to Entities
    Let’s take a quick look at Entity Framework 1.0 before we dive into EF 4.

Entity Framework 1.0

The center of the story for EF 1.0 is the Entity Data Model.

edmAn Entity Data Model (EDM) is an XML-formatted file with an .edmx extension.  EDMs contain three sections: the Storage Model, Conceptual Model, and Mapping Model.

  • The Storage model is simply an XML representation of a database schema.
  • The Conceptual model is the representation of the Entities in the EDM.
  • The Mapping Model is the map between columns in the database tables and properties of the Entities.

The Entities in an EDM may contain data fields from a single table (Table per Type) or could be packed together in a single table (Table per Hierarchy.)  It’s also possible to split an Entity between tables.  EDMs in Entity Framework 1.0 are created as Table per Type Entities by default, which means there is one Entity per table.  It’s possible to get crazy with other mapping strategies, but it requires manually editing the .edmx file, which isn’t fun.

Code Generation

Creating the EDM with the designer is just one half of the excitement- the other half is the magic worked behind the scenes by the EntityModelCodeGenerator. This is a class in the Entity Framework that is used by Visual Studio to parse the .edmx file, and generates classes that reflect the shape of all the Entities in an EDM.  It uses internal logic and CodeDOM voodoo, but more on that later.

Generated Classes

If you pop open the “codebehind” of a standard EF 1.0 EDM file, you’ll see an autogenerated file containing one class: the Context.

public partial class AdventureWorksProdEntities : ObjectContext
       #region Constructors
       /// <summary>
       /// Initializes a new AdventureWorksProdEntities object using the connection string found in the 'AdventureWorksProdEntities' section of the application configuration file.
       /// </summary>
       public AdventureWorksProdEntities() : base("name=AdventureWorksProdEntities", "AdventureWorksProdEntities")
           this.ContextOptions.LazyLoadingEnabled = true;

The Context is your gateway into the Entities in the EDM.  It has a few constructor overloads for accepting various connection strings or objects (the EntityConnection object.)  Then, for each Entity, a special type of property:

[global::System.CodeDom.Compiler.GeneratedCode("System.Data.Entity.Design.EntityClassGenerator", "")]
        public global::System.Data.Objects.ObjectQuery<Product> Products
                if ((this._Products== null))
                    this._Products= base.CreateQuery<Product>("[Product]");
                return this._Product;

These are not collection properties.  They are query properties that derive from ObjectQuery.  The Entity Framework relies on some new coolness in the System.Expressions namespace to manipulate expression trees. Expression trees allow you to represent code statements as objects.  Once you can do that, you can translate those trees into something new, like SQL.

Using an Entity Data Model to do all the old CRUD is super-simple:

using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())

      var products = from p in ctx.Products
                              select p;


We now have a query ready to go.  Nothing has hit the database yet.  To do that, we:


Now the query represented by expression in the LINQ statement above gets reworked by the LINQ to Entities Provider and sent to the database.  Voila, we have a populated collection!  We can get fancier:

using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
    var products = ctx.Products.Include(“ProductSubCategory”).Select(x => x).ToList();


This is called eager loading.  The Include() extension method tells the Entity Framework to go ahead and get all the ProductSubCategory Entities and populate them as properties of the Product.  The “ProductSubCategory” string refers to a Navigation Property of Product.  Navigation Properties represent the relationships between Entities.

If the Include() call were not there, this would throw an exception:

int id = products.First().ProductSubCategory.Id;

That code would throw a NullReferenceException because none of the ProductSubCategory entities are loaded.  If we don’t use Include(), we have to do this to avoid the exception:


That line explicitly loads the ProductSubCategory related to the first Product in the list.

Making and persisting changes to Entities in EF 1.0 requires just a few lines of code:

using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
     //Assigning a different ProductSubCategory
     Product p = products.First();
     ProductSubCategory psc = ctx.ProductSubCategory.First();
     p.ProductSubCategory = psc;

How many lines would this take without an OR/M?  Commands, connections, parameter objects, and all those SQL stored procedures would probably run to over 100 lines, easily.

Entity Framework 1.0 FTW! Right?

So, as we’ve seen, Entity Framework 1.0 is a big step forward.  What could be better?

A lot, actually.

Consider the code above.  Let’s assume the the code above reassigns one Product’s related ProductSubCategory properties to a different one that is already in the database.  We have to query for the new one and assign it to the Product, then call SaveChanges() on the Context.  Suppose we already know that the ID for the ProductSubCategory we want to assign is 16.  Why not just set the value directly and be done?  Entity Framework 1.0 just doesn’t do foreign key references that easily.

Difficult foreign key references, “black-box” code generation, and the limited choice between eager and explicit loading were just a few of the perceived quirks of EF 1.0.

The Controversy

In fact, Entity Framework 1.0 turned out to be quite controversial among the OR/M cognoscenti.  There were several features that many considered critical to a good OR/M that EF 1.0 just didn’t have.  Concern ran so high amongst these developers, that Microsoft might impose (by their estimation) a substandard platform on an unsuspecting community, that the Entity Framework Vote of No Confidence was drafted and posted online.  Over 800 developers, many of them Microsoft MVPs, signed on.  It appeared that Microsoft had work to do.

To be sure, the Vote of No Confidence reads like a doctoral dissertation.  It’s great for insomnia.  However, it did raise several legitimate points.  I won’t recapitulate the entire thing here, but I will call out the issue that is probably most important.

Persistence Ignorance

What the bleep is Persistence Ignorance?  I’m going to use a real-world example to explain.

Suppose your boss comes to you and says “You, I need you to create a console app that will go to the web, read the page at the URL in the first argument, find some keywords, and save them to a file.  It will be used internally.  We’re not selling this thing, but we’ll probably need to add to it, so make sure it’s designed well enough.”

OK, no problem.  File-New Project, Console app.  Add reference- System.Web…

Bothered yet?  You should be.  I’m about to add a static reference to System.Web in my console app. Bad foo.  I’m violating a few important things here, like Single Responsibility Principle and Separation of Concerns.  What should I be doing?  I should add a new class project, right?  Right.  My business logic belongs in its own assembly.  That assembly should get the reference to System.Web.

Entity Framework 1.0 had a similar problem.  Peep this:

public partial class Product: EntityObject

This tiny little excerpt from the Context class of an EF 1.0 Entity Data Model exposes a major issue with EF 1.0.  All the Entity classes generated by an EF 1.0 EDM derive from EntityObject.  This means that my Entities are tightly, statically coupled to their persistence implementation.


Enterprise Architectures and Service-Oriented Architectures consider this type of coupling to be an antipattern.  Entities are really just Data Transfer Objects (DTOs.)  Good architectural design dictates that DTOs must not have any knowledge of (i.e. direct coupling to) their persistence implementation(s).  They must be Persistence Ignorant.

Suppose your boss came back to you two weeks after your keyword-finding app was out in the wild, used by your colleagues every day:

Great job!  Now we need it feed directly into an Oracle database, a SQL database, a MySQL database, and post out to a WCF Service.

Now we see where Persistence Ignorance offers a huge benefit.  If your app tier is bound directly to the persistence layer, you now get to rewrite, refactor, and retest.  If, however, you had based the persistence operations on some interfaces, you can write each persistence implementation separately, test them, and plug them in with a Dependency Injection framework like Ninject or Spring.NET.  Done.

By many accounts, Entity Framework 4 goes a long way to address the concerns voiced in the Entity Framework Vote of No Confidence.  There is certainly disagreement as to whether the new features go far enough, but it’s safe to say that no one can reasonably claim Microsoft wasn’t listening.  So, without further ado:

Entity Framework 4

The new features of Entity Framework 4 include:

  • New Features
    • Persistence Ignorance
    • POCO (Plain Old CLR Object)
    • T4 Code Generation
    • Self-Tracking Entities
    • Model-First Development
    • FK Associations
    • Code-only

    Let’s dig in.

Persistence Ignorance and POCO

POCO objects, or “Plain-Old CLR Objects” are objects with no dependencies outside the basics, like System and System.Collections.  Here’s an example:

    using System;
    using System.Collections;
    using System.Collections.Generic;

    public class Person
    public int Id {get; set;}
    public string FirstName {get; set;}
    public string LastName {get; set;}
    List<Product> Purchases {get; set;}


There’s no data-layer baggage here, just a few simple properties and a collection property of the Products this Person bought.

Entity Framework 4 will not only allow us to use a simple class like this, it will help us make one:


The context menu on the right comes up when you right-click whitespace in an EDM.  The new Code Generation Item option raises this dialog:


We have three options listed:

  • ADO.NET EntityObject Generator
  • ADO.NET POCO Entity Generator
  • ADO.NET Self-Tracking Entity Generator

These three options will generate classes for us to support our interaction with an EDM, but the mechanism is very different from what it was in EF 1.0.

T4 Templates:  The Coolest VS2010 Feature You Never Heard About

Instead of the EntityModelCodeGenerator, Entity Framework 4 uses T4 Templates.  T4 stands for Text Template Transformation Toolkit. This is an incredibly powerful feature of VS2010 that you get for free. T4 is used in ASP.NET MVC, Entity Framework, and other project types, and is the new preferred way to do code generation.  Details on T4 templates are way out of scope here, but I strongly recommend you look into them.  They’re the unsung hero of VS2010.

The important point about T4 templates is that they are just text files, so you can make them your own.  You can modify the ones that come with VS2010, or write new ones from scratch.  This puts you in complete control of the code generation process for Entity classes, or any other type of text-based file, for that matter.

The three options in the dialog above are T4 templates.  The EntityObject Generator creates classes in the same style as EF 1.0, with entities deriving from EntityObject.  The POCO Entity Generator and Self-Tracking Entity Generator is where things get interesting and new.

T4 Generated Files

If you select ADO.NET POCO Entity Generator from the list of available templates, you’ll see this in Solution Explorer:


Note: To get the EDM to generate using T4 templates instead of the EntityModelCodeGenerator, you must right-click the .edmx file in Solution Explorer, select Properties and clear the “Custom Tool” setting in the Properties pane.

The files with the .tt extension are the templates.  The POCO generator has two: one for the Context, which does have to keep a reference to the Entity Framework, and another that generates a separate class file for each Entity, plus an additional class that defines FixupCollection<T>.  FixupCollection<T> is used as the basis for one-to-many relationship properties, such as between Customers and Orders.  The FixupCollection<T> base class ensures that modifications to one side of a relationship stay in synch with the other end.  For example, if you delete an Order, the Customers’ Orders collection is updated as well.

Here’s a look inside the generated Context:

    public partial class AdvWorksHREntities : ObjectContext
    public const string ConnectionString = “name=AdvWorksHREntities”;
    public const string ContainerName = “AdvWorksHREntities”;
    #region Constructors
    public AdvWorksHREntities()
    : base(ConnectionString, ContainerName)
    this.ContextOptions.LazyLoadingEnabled = true;
    public AdvWorksHREntities(string connectionString)
    : base(connectionString, ContainerName)
    this.ContextOptions.LazyLoadingEnabled = true;
    public AdvWorksHREntities(EntityConnection connection)
    : base(connection, ContainerName)
    this.ContextOptions.LazyLoadingEnabled = true;
    #region ObjectSet Properties
    public ObjectSet<Department> Departments
    get { return _departments  ?? (_departments = CreateObjectSet<Department>(“Departments”)); }
    private ObjectSet<Department> _departments;
    public ObjectSet<Employee> Employees
    get { return _employees  ?? (_employees = CreateObjectSet<Employee>(“Employees”)); }


This Context looks similar to one generated by EF 1.0, with a few key differences.  Firstly, this context class contains only constructors and Entity query properties, but not the definition for the Entity classes themselves.  Separating these classes into their own files allows the Entity classes to live in an assembly that has no binding to System.Data.Entity.

Another key difference is that these query properties derive from ObjectSet instead of ObjectQuery.  ObjectSet derives from ObjectQuery, but also implements IObjectSet.  IObjectSet contains methods that were previously implemented by the EF 1.0 context object, such as AddObject and DeleteObject.  The upside of having these methods implemented as part of an interface is testability and composability:

  • You can mock out the methods of the interface and test the Entity classes without having to hit a database.
  • You can use the interfaces to glue together your persistence and business logic layers with a DI framework like Unity or Spring.NET.

Now let’s peek inside an Entity class:

public partial class Employee
#region Primitive Properties
public virtual int EmployeeID
public virtual string NationalIDNumber
public virtual int ContactID


#region Navigation Properties
public virtual ICollection<EmployeeAddress> EmployeeAddresses
if (_employeeAddresses == null)
var newCollection = new FixupCollection<EmployeeAddress>();
newCollection.CollectionChanged += FixupEmployeeAddresses;
_employeeAddresses = newCollection;
return _employeeAddresses;
if (!ReferenceEquals(_employeeAddresses, value))
var previousValue = _employeeAddresses as FixupCollection<EmployeeAddress>;
if (previousValue != null)
previousValue.CollectionChanged -= FixupEmployeeAddresses;
_employeeAddresses = value;
var newValue = value as FixupCollection<EmployeeAddress>;
if (newValue != null)
newValue.CollectionChanged += FixupEmployeeAddresses;
private ICollection<EmployeeAddress> _employeeAddresses;

Here we have some simple, primitive properties like EmployeeID.  We also see the definitions for the associative Navigation Properties, implemented as ICollections.  The getters and setters of these Navigation Properties include the association fixup logic.  FixupCollection<T> actually derives from ObservableCollection<T>, which has events that fire when the members of the collection change.  This is used in the POCO classes to fixup both ends of related Entities when objects are added or removed from one end or the other.

Convention Over Configuration

So how does the Entity Data Model magically “know” how to handle POCO classes?  There’s no magic, just convention.  Entity Framework expects POCO classes to follow conventions, such as the ones for property names.  The property names of your POCO classes must match exactly to the names defined in the EDM.  If the conventions are followed to the letter, the CRUD “just works.”

Lazy Loading

Now is a good time to talk about Lazy Loading.  Remember our earlier example of eager loading:

    using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
    var products = ctx.Products.Include(“ProductSubCategory”).Select(x => x).ToList();

This query retrieves all Products and their related ProductSubCategory in a single query.  EF 4 introduces the ability to load related Entities “on demand”:

    using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
    var products = ctx.Products.ToList();
    int id = products.First().ProductSubCategory.Id;

Without lazy loading, the code above would throw a NullReferenceException because the ProductSubCategory was neither pulled in with Include(), nor loaded explicitly with Load().  EF 4 allows the code above to “automatically” query for the related ProductSubCategory if it is not already loaded.  Lazy loading is “on” by default.  Here is one of the constructors in a generated POCO context class:

public AdvWorksHREntities()
: base(ConnectionString, ContainerName)
this.ContextOptions.LazyLoadingEnabled = true;

This and all other constructors in a POCO context class has the LazyLoadingEnabled ContextOption set to true.  Accessing a related entity will automatically send a query to the database to retrieve the entity.

There’s a caveat here.  Take a gander:

    using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
    var products = ctx.Products.ToList();
    foreach (Product p in products)

This is a case where lazy loading can get you into trouble.  Suppose there were 100 Products in the products List<Product>.  The code above would result in 100 additional queries, one for each Product in the list.  This can get even worse in the case of one-to-many relationships.  The point to remember is that you must be very conscious in your use of lazy loading.  It can be very handy if used carefully, but it’s easy to abuse it.  This can have negative consequences for your application performance.  This is true for any OR/M, of course, but it can catch a developer upgrading from EF 1.0 unawares.

Self-Tracking Entities

We’ve seen how Entity Framework 4 is better suited to Enterprise and Service-Oriented architectures due to POCO and Persistence Ignorance.  Entity Framework 4 introduces another improvement with Self-Tracking Entities.

Service-Oriented architectures rely on serialization to move objects across service boundaries.  If we want to send an EF 1.0 Entity across a WCF service, it must be Detached from its Context using the Detach() method.  If that Entity is modified and returned back to the service layer to be persisted, it must be attached to a Context using Attach().  Unfortunately, it doesn’t end there.  The state of the Entity must be checked and the ApplyPropertyChanges() method must also be used to synchronize the modified Entity with the existing Entity in the database.  SaveChanges() can then be used to update the database.  This all amounts to a very manual, tedious process.

The Self-Tracking Entity (STE) template offers an improvement to this process by including extension methods, classes, and interfaces that track an Entity’s state after it is transmitted over a service.  There is a catch however: self-tracking entities only work over WCF, and only if the destination client is running .NET 4.  This is due to the fact that the classes and interfaces that support the tracking must be available at the destination as well as the source.  So, STEs are useful in homogenous (.NET) environments but they don’t support heterogeneous environments where the destination service based on a platform other than .NET.

Model-First Development

If you’ve used EF 1.0, you’ve seen this dialog:


There hasn’t been much use for the “Empty model” until now.  EF 4 supports designing the model first, then using that model along with POCO classes.  It also supports database generation.  If you create your EDM, add Entities to it, then right-click on an empty area of the design surface:


Select “Generate Database from Model” and you’ll get a SQL script that will generate all the tables required by your EDM.

FK Associations

One of the big headaches with EF 1.0 was foreign key references.  Remember our code sample:

using (AdventureWorksProdEntities ctx = new AdventureWorksProdEntities())
//Assigning a different ProductSubCategory
Product p = products.First();
ProductSubCategory psc = ctx.ProductSubCategory.First();
p.ProductSubCategory = psc;

All we’re doing here is reassigning the ProductSubCategory of a Product.  We should be able to do that just by setting the ID of the new ProductSubCategory foreign-key value on the Product, right?  Not so in EF 1.0.  As we see above, we actually have to go get an instance of the one we want to reassign to by querying.

This is no longer necessary in EF 4.


If you leave “Include foreign key columns in the model” checked, the foreign-key IDs become primitive properties of your entities, available for reassignment without hassle.  These properties are called “FK Associations” and brings zen back to the process of reassigning entity relationships.  EF 1.0 Associations are still included and are referred to as “Independent Associations”.

Code-only Entities

The Code-Only option is currently available only in CTP.  The CTP includes the assembly Microsoft.Data.Entity.CTP, which provides classes that support a “zero-model” method for representing Entities.  The ADO.NET team has provided a great walk-through that takes you through the process of setting up a sample project.

Code-only development is similar in style to NHibernate and other OR/M tools that focus on code-based implementations. The idea behind “code-only” or “code-first” is that the Entity Model is expressed exclusively in code, with no .edmx file in the middle.  POCO classes represent the Entities.  Relationships and other facets are coded using EntityConfiguration<T>.  Once your object model is designed, the ContextBuilder<T> can use your Context class to generate a database for your model.  ContextBuilder<T> and EntityConfiguration<T> provide complete control over the mapping model used to generate the tables.  The Entity classes can be separated from the EF-dependent classes in true POCO style.


The major new features of EF 4 include:

  • Persistence Ignorance
  • POCO support
  • T4 Code Generation
  • Model-First Development
  • FK Associations
  • Code-Only Development

The debate rages on as to whether Entity Framework 4  matches its competitors.  It does seem safe to say that EF 4 is ready for prime-time, and deserves evaluation alongside alternatives such as NHibernate and LLBLGenPro.  I’ve been using EF 1.0 since it was released and I can personally say that the improvements in EF 4 save time and promote better architecture over the previous version.


POCO in the Entity Framework: Part 1 – The Experience:

What’s New and Cool in Entity Framework 4.0:

Model First with the Entity Framework 4:

Transparent Lazy Loading for Entity Framework – part 1:

Attaching/Detaching Entities in EF 1.0:

LINQ Aggregate Queries: Multiple Group By Columns

I recently had the need to SUM a column grouped by two other columns. Here’s the view:

SELECT SolutionID, EventName, EventDuration
FROM Events

I needed this:

SELECT SolutionID, EventName, SUM(EventDuration) EventTotal
FROM Events
GROUP BY SolutionID, EventName

Here’s the LINQ:

var events = from e in summary
   group e by new { e.SolutionID, e.EventDetail } into g
   let TotalMinutes = g.Sum(x => x.EventDuration)
   orderby TotalMinutes descending
   select new EventSummary
          SolutionID = g.Key.SolutionID,
          Name = g.Key.EventDetail,
          Seconds = TotalMinutes,
          Minutes = TotalMinutes / 60.0,
          Hours = (TotalMinutes / 60.0) / 60.0                           

The magic is in line 2:

   group e by new { e.SolutionID, e.EventDetail } into g

You can group by as many columns as you need with that little nugget. Using the anonymous type feature in C# 3.0 you can group on as many columns as you need. Enjoy!

LINQ and the Entity Framework: An Overview

kick it on

Table of Contents

  1. Building the Sample
  2. A Brief Introduction to LINQ
  3. LINQ-to-[Provider]
  4. LINQ to Entities
  5. Creating an Entity Data Model
  6. EDM XML Format
  7. Object Services
  8. Retrieving Entity Data
  9. Related Entities
  10. Entity SQL
  11. Stored Procedures
  12. Updating Data
  13. Summary
  14. References

Building the Sample

Download the Sample

The sample code requires Visual Studio 2008 SP1 and the AdventureWorks sample database.

The sample is an ASP.NET application. Several basic examples of querying and updating using LINQ and the Entity Framework are included.

You may have to modify the web config depending on whether you are using SQL 2005 Express or the full SQL Server. There are two connection strings defined, one for SQL Server Express and another for SQL Server. Uncomment the one that applies to your environment.

A Brief Introduction to LINQ

There are lots of great articles out there on LINQ (see the References section,) so I will give just a general overview to provide a background.

LINQ stands for Language INtegrated Query. Its purpose is to introduce a powerful collection-querying capability to CLR-based languages.

LINQ consists of a collection of new goodies provided by parts of both .NET 3.0 and 3.5:

  • Implicitly Typed Local Variables (var keyword)
  • Lambda Expressions
  • Extension Methods
  • Object Initializers
  • Query Expressions
  • New reserved words: Select, Join, OrderBy, Where, Take, Skip, etc.
  • Anonymous Types
  • Expression Trees
  • Partial Methods

This stew of new features combines to provide us something new: queries written exclusively in your favorite .NET language They are all substantial and interesting topics unto themselves; see the References section for a deeper dive into each.

Here is a very simple demonstration of LINQ:

List<int> numbers = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int i = (from n in numbers
where n == 1
select n).Single();

This is what LINQ is all about- querying collections. Where do those collections come from? They can be built statically and deterministically like the sample above, or they can come from a LINQ Provider.


LINQ handles returning data from a collection based on a query. LINQ Providers handle instantiating the collection to be queried from any of several persistence sources. Let’s compare traditional ADO.NET data access with the model offered by a LINQ implementation.

The traditional ADO.NET code model for accessing data from a database is:

  1. Create a database-dependent Connection
  2. Create a Command
  3. Create an Adapter
  4. Set parameters on the Command
  5. Execute the Command and use the Adapter to fill a DataSet

We’ve all used this model, and it has worked… more or less. Unfortunately, it does not really support the ideal of domain-focused development. Many development hours are burned up writing, debugging, testing, and maintaining storage dependent “plumbing” code. We try to mitigate this with code generators and loosely-coupled data access layers, but it’s still additional work and a larger codebase.

Let’s also consider all the SQL queries and stored procedures required to support application development. All those lines of SQL have to be written, documented (hopefully) and maintained along with the application code.

Now let’s look at a trivial example comparing traditional ADO.NET data access with a LINQ to SQL query:


//For this example we’re sending a text query, this could (and in production, should) be a stored procedure call
SqlConnection conn = new SqlConnection(@"Data Source=localhost\SQLEXPRESS;User ID=advworks;Password=advworks");
SqlCommand cmd = new SqlCommand("SELECT * FROM Person.Contact WHERE LastName LIKE 'A%'", conn);
DataSet ds = new DataSet("Contacts");
SqlDataAdapter adapter = new SqlDataAdapter(cmd);
cmd.CommandType = System.Data.CommandType.Text;
GridView1.DataSource = ds;


     //LINQ to SQL provides a basic set of Object-Relational Mapping (OR/M) capabilities.
     //LINQ to SQL is useful in relatively simple applications where a 1:1 mapping between classes and tables is        sufficient.
     AdventureWorksDataContext ctx = new AdventureWorksDataContext();
     var contacts= from s in ctx.Contacts
                    where s.LastName.StartsWith("A")
                    select s;
     GridView1.DataSource = contacts;

The examples above both demonstrate the same simple task: retrieval of all Contacts in the AdventureWorks database whose last names start with “A”, then binding that data to a GridView. The important thing to notice about the LINQ example is that there is no trace of database-dependent SQL. The data store has been abstracted away by, in this case, the SQL Server LINQ Provider (LINQ to SQL.) We can also see that the LINQ code is much more concise and readable.

LINQ cares not what your underlying data store looks like. Its purpose is to allow you to query against the in-memory collections produced by the storage-dependent provider of your choice. It is important to remember the distinction between LINQ and a LINQ Provider: LINQ represents only the language feature enhancements that support the “native” collection-querying capabilities. LINQ Providers bridge the gap between application code and storage implementation.

There are several LINQ providers available:

  • LINQ to SQL
  • LINQ to DataSet
  • LINQ to XML
  • LINQ to Objects
  • LINQ to YourData (write your own!)

And, last but certainly not least…

LINQ to Entities

LINQ to Entities combines LINQ with the ADO.NET Entity Framework.

When dealing with data, there is another restriction in the traditional data access model: we’re forced to deal directly with relational data. Relational data is optimized for storage and retrieval. Object-Oriented Design often prefers a very different representation of the Objects in an Object Model than we find in the relational data schema. This problem is sometimes described as the “impedance mismatch” between relational and object-oriented representations.

This mismatch is addressed through Object Relational Mapping, or OR/M. OR/M tools such as LLBLGenPro, NHibernate, and indeed the Entity Framework, serve to create an abstraction layer between an Entity Model and the database storage schema. The Objects exposed by an OR/M are strongly-typed objects that can be instantiated, manipulated, and then committed back to the storage medium by working with the Objects directly. OR/M tools allow developers to work with data as Objects without having to be concerned about the storage details. This is generally known as Persistence Ignorance[14].

The Entity Framework uses an Entity Data Model (EDM) to represent the Entities stored in a database. An EDM is an XML file that defines a set of Entities (the Conceptual Model) and mappings (the Mapping model) to the tables (the Storage Model) in which the Entity data are stored. EDMs expose a set of strongly-typed Entities to the application, abstracting the relational storage.

[Figure 1: AdventureWorksModel.edmx]

Visual Studio 2008 SP1 includes a designer for Entity Data Models. Figure 1 displays an EDM created in Visual Studio that targets the AdventureWorks sample database. The Entity Data Model Wizard assists in creating new EDMs based on an existing database:

Figure 2: Adding a new Entity Data Model

Once a database connection is established, the wizard will present a list of objects from the database (tables, views and stored procedures) that may be selected for inclusion in the new EDM.

Creating an Entity Data Model

The EDM in Figure 1 appears to be a simple database diagram, but there is an important difference: Entities in the EDM do not have to map directly to database tables. A single Entity may span several normalized tables[11]. Conversely, several related Entity types may be stored in a single table. The EDM exposes the Entities to the business layer of an application as a set of types.

True to object-oriented convention, EDMs also support the concept of inheritance. There are three types of inheritance we can model:

  • Table per Hierarchy
    • All derived types are stored in a single table, distinguished by a single column called the discriminator.
  • Table per Type
    • The data common to all derived types (the superclass) is stored in one table, and the data for each derived type is stored in its own table
  • Table per Concrete Type
    • All the data for an Entity type is stored in a single table.

Each of these inheritance types is modeled in the CustomerModel.edmx found in the sample.

EDM XML Format

The EDM Designer is a great tool, but as always it’s helpful to understand the underlying XML format. Let’s crack open the XML of AdventureWorks.edmx.

<Schema Namespace="AdventureWorksModel.Store" Alias="Self" Provider="System.Data.SqlClient" ProviderManifestToken="2005" xmlns:store="" xmlns="">
<EntityContainer Name="Person">
<EntitySet Name="Address" EntityType="AdventureWorksModel.Store.Address" store:Type="Tables" />
<EntitySet Name="Contact" EntityType="AdventureWorksModel.Store.Contact" store:Type="Tables" />
<EntitySet Name="Product" EntityType="AdventureWorksModel.Store.Product" store:Type="Tables" Schema="Production" />
<EntitySet Name="SalesOrderDetail" EntityType="AdventureWorksModel.Store.SalesOrderDetail" store:Type="Tables" Schema="Sales" />
<EntitySet Name="SalesOrderHeader" EntityType="AdventureWorksModel.Store.SalesOrderHeader" store:Type="Tables" Schema="Sales" />
<AssociationSet Name="FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID" Association="AdventureWorksModel.Store.FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID">
<End Role="SalesOrderHeader" EntitySet="SalesOrderHeader" />
<End Role="SalesOrderDetail" EntitySet="SalesOrderDetail" />

The first major node under Runtime is the StorageModels node. This node contains a representation of the physical tables from which the Entities retrieve their data. The EntityContainer node represents a database schema (i.e. ‘dbo’). An EntitySet relates directly to a table. AssociationSets represent foreign key relationships.

The next major node contains the Conceptual model:

<Schema Namespace="AdventureWorksModel" Alias="Self" xmlns="">
<EntityContainer Name="AdventureWorksEntities">
<EntitySet Name="Address" EntityType="AdventureWorksModel.Address" />
<EntitySet Name="Contact" EntityType="AdventureWorksModel.Contact" />
<EntitySet Name="Product" EntityType="AdventureWorksModel.Product" />
<EntitySet Name="SalesOrderDetail" EntityType="AdventureWorksModel.SalesOrderDetail" />
<EntitySet Name="SalesOrderHeader" EntityType="AdventureWorksModel.SalesOrderHeader" />
<AssociationSet Name="FK_SalesOrderHeader_Address_BillToAddressID" Association="AdventureWorksModel.FK_SalesOrderHeader_Address_BillToAddressID">
<End Role="Address" EntitySet="Address" />
<End Role="SalesOrderHeader" EntitySet="SalesOrderHeader" />
<AssociationSet Name="FK_SalesOrderHeader_Address_ShipToAddressID" Association="AdventureWorksModel.FK_SalesOrderHeader_Address_ShipToAddressID">
<End Role="Address" EntitySet="Address" />
<End Role="SalesOrderHeader" EntitySet="SalesOrderHeader" />
<AssociationSet Name="FK_SalesOrderHeader_Contact_ContactID" Association="AdventureWorksModel.FK_SalesOrderHeader_Contact_ContactID">
<End Role="Contact" EntitySet="Contact" />
<End Role="SalesOrderHeader" EntitySet="SalesOrderHeader" />
<AssociationSet Name="FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID" Association="AdventureWorksModel.FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID">
<End Role="SalesOrderHeader" EntitySet="SalesOrderHeader" />
<End Role="SalesOrderDetail" EntitySet="SalesOrderDetail" />
<EntityType Name="Address">
<PropertyRef Name="AddressID" />
<Property Name="AddressID" Type="Int32" Nullable="false" />
<Property Name="AddressLine1" Type="String" Nullable="false" MaxLength="60" Unicode="true" FixedLength="false" />
<Property Name="AddressLine2" Type="String" MaxLength="60" Unicode="true" FixedLength="false" />
<Property Name="City" Type="String" Nullable="false" MaxLength="30" Unicode="true" FixedLength="false" />
<Property Name="StateProvinceID" Type="Int32" Nullable="false" />
<Property Name="PostalCode" Type="String" Nullable="false" MaxLength="15" Unicode="true" FixedLength="false" />
<Property Name="rowguid" Type="Guid" Nullable="false" />
<Property Name="ModifiedDate" Type="DateTime" Nullable="false" />
<NavigationProperty Name="SalesOrderHeader" Relationship="AdventureWorksModel.FK_SalesOrderHeader_Address_BillToAddressID" FromRole="Address" ToRole="SalesOrderHeader" />
<NavigationProperty Name="SalesOrderHeader1" Relationship="AdventureWorksModel.FK_SalesOrderHeader_Address_ShipToAddressID" FromRole="Address" ToRole="SalesOrderHeader" />

Object Services

Object Services is the component of the Entity Framework that supports query, insert, update, and deletion of Entity types. The Entity Framework uses EDMs to auto-generate classes based on Object Services.

Let’s look into the code that is generated from the AdventureWorksModel EDM (AdventureWorks.Designer.cs):

    public partial class AdventureWorksEntities : global::System.Data.Objects.ObjectContext
    { /// <summary>
      /// Initializes a new AdventureWorksEntities object using the connection string found in the 'AdventureWorksEntities' section of the application configuration file.
     /// </summary>
     public AdventureWorksEntities() : base("name=AdventureWorksEntities", "AdventureWorksEntities")
     { this.OnContextCreated(); }
     /// <summary>
     /// Initialize a new AdventureWorksEntities object.
     /// </summary>
     public AdventureWorksEntities(string connectionString) : base(connectionString, "AdventureWorksEntities")
     { this.OnContextCreated(); }
     /// <summary>
     /// Initialize a new AdventureWorksEntities object.
     /// </summary>
     public AdventureWorksEntities(global::System.Data.EntityClient.EntityConnection connection) : base(connection, "AdventureWorksEntities")
     { this.OnContextCreated(); }
     public global::System.Data.Objects.ObjectQuery<Address> Address
             if ((this._Address == null))
             { this._Address = base.CreateQuery<Address>("[Address]"); }
             return this._Address; }

Entity Framework code generation creates classes in a Façade-like pattern that defines the connection to the EDM and exposes strongly-typed collection classes that represent queries. By default the class generated will be [ModelName]Entities, so in this case we have AdventureWorksEntities.

The AdventureWorksEntities class derives from System.Data.Objects.ObjectContext. There are three constructors: one that gets the connection string from the configuration file by its key name, one that accepts the connection string as a parameter, and one that accepts an already-instantiated System.Data.EntityClient.EntityConnection object.

The ObjectContext class maintains the Entity metadata and tracks changes to Entity objects using an internal object graph. The change tracking supports the generation of the dynamic SQL required to perform automatic updates on Entity-typed objects.

The generated AdventureWorksEntities class exposes a public member of type System.Data.Objects.ObjectQuery for each Entity in the EDM. The ObjectQuery generic class represents a collection of strongly-typed Entity objects. It implements interfaces such as IEnumberable, which supports foreach iteration, IListSource, for databinding, and IQueryable, which supports LINQ querying.

Retrieving Entity Data

The ADO.NET Framework exposes EDMs as a datasource. In fact, if you look in the Web.config of the sample, you will find an entry in the section that looks like a connection string. This is the connection string used by the Entity Framework to connect to the AdventureWorksModel EDM in the sample project.

<add name="AdventureWorksEntities"
provider connection string=&quot;
Data Source=localhost;Initial Catalog=AdventureWorks;
Integrated Security=True;MultipleActiveResultSets=True&quot;"

Note the “res://” entries in this string. The Conceptual, Storage, and Mapping models (.csdl, .ssdl, and .msl respectively) are embedded as resources into the compiled assembly. These three models are each sections of the EDM file, Adventureworks.edmx.

Code Sample:

     using (AdventureWorksEntities adv = new AdventureWorksEntities())
          var orderHeader = (from o in adv.SalesOrderHeader
                              where o.DueDate < System.DateTime.Now
                              select o);

This example retrieves a collection of SalesOrderHeader Entities from the datastore where the DueDate has passed. What happens when this code is executed? The orderHeader var becomes an object of type IQueryable. That’s it- the database is not actually queried until:

     DataGrid1.DataSource = orderHeader;


     foreach (SalesOrderHeader so in orderHeader)
     { //do something }

These routines actually iterate the IQueryable, which causes the Entity Framework to execute the represented query against the database and return the materialized Entities. IQueryable methods that cause the database to be queried include First(), FirstOrDefault(), Last(), and LastOrDefault().

Related Entities

Entity relationships in an EDM are represented by Navigation Properties. Take a look at the SalesOrderHeader entity in the AdventureWorks EDM. At the bottom of the property list, in a separate section, you will see a group of properties that take on the names of the related entities, such as Address. The Entity Framework creates these navigation properties by reading the foreign-key relationships defined in the data schema.

When a LINQ query is executed and an Entity or EntityCollection is retrieved, what happens with those related entities? By default, the Entity Framework will only load related entities if this is done explicitly in the query. This is by design, to limit the size of queries generated by a query. Depending on the organization of a particular schema, some entities may have several one-to-many relations with others. A query for a single entity could easily load hundreds or thousands of related entities. Therefore, loading of related entities is left up to the developer to handle at design time, so that the queries can be optimized to retreive only needed data for a given functional scenario. Here’s an example of loading related entities explicitly:

                             var completeHeader = (from o in adv.SalesOrderHeader.Include("SalesOrderDetail")
                             where o.DueDate < System.DateTime.Now
                             select o).First();

This is called “eager loading,” where we load the related set of entities in the original query. In this example, we’re getting the first SalesOrderHeader from the database, including all of that header’s SalesOrderDetails. We then print the first CarrierTrackingNumber of the first SalesOrderDetail in that collection. Note the Include(“SalesOrderDetail”) called on the SalesOrderHeader ObjectQuery. If it was not part of the query, the Entity Framework would throw an exception when it attempted to access the related SalesOrderDetail Entity in line 4.

It is possible to find out if a related collection is loaded by checking the IsLoaded property of any ObjectQuery (navigation properties are of type ObjectQuery). If the collection is not loaded, the Load() method can be used to execute a query against the data store to load the collection:

    SalesOrderHeader so = (from o in adv.SalesOrderHeader
                              select o).First();
     if (!so.SalesOrderDetail.IsLoaded)
     { so.SalesOrderDetail.Load(); }

This is referred to as “lazy loading.” The related entity is loaded after the original query that returned the SalesOrderHeader has executed.

Entity SQL

The auto-generated AdventureWorksEntities class is not the only way to get Entity data from an EDM. Remember, an EDM is exposed as a data source. The Entity Framework includes a SQL-derived dialect called Entity SQL, or ESQL. ESQL looks like T-SQL but ESQL queries against the Entities in an EDM, so it is “aware” of the data as Entities, not as tables.

We can get a single “row” of data as an anonymous, structured record using the ROW keyword:

    SELECT VALUE row (product.ProductID as ProductID, product.Name
    as ProductName) FROM AdventureWorksEntities.Product AS product

We can return an Entity or Complex type directly from ESQL using a Named Type Constructor. Here is a query that returns a materialized object of type SalesOrderDetail:

SELECT VALUE AdventureWorksModel.SalesOrderDetail (o.SalesOrderDetailID, o.CarrierTrackingNumber, o.OrderQty,
    o.ProductID, o.SpecialOfferID, o.UnitPrice, o.UnitPriceDiscount,
    o.rowguid, o.ModifiedDate) FROM AdventureWorksEntities.SalesOrderDetail
    AS o

We can even set references to Entities in ESQL and get their property values with REF:

    AdventureWorksEntities.Product as p

Once we have written an ESQL query, we can execute it against an EDM in much the same way as a traditional ADO execution. In the following example, the EntityConnection, EntityCommand, and EntityDataReader (sound familiar?) are used to return a set of strongly-typed Products from the AdventureWorks database.

   using (EntityConnection conn = new EntityConnection(connectionString))
        string esqlQuery = "SELECT VALUE Product FROM AdventureWorksEntities.Product AS Product";
        // Create an EntityCommand.
        using (EntityCommand cmd = conn.CreateCommand())
            cmd.CommandText = esqlQuery;
            // Execute the command.
            using (EntityDataReader rdr =
                // Start reading results.
                while (rdr.Read())
                    StructuralTypeVisitRecord(rdr as IExtendedDataRecord);

        Console.WriteLine("Just testing the connection.");

The EntityDataReader contains a collection of Product Entities, instead of DbDataRow(s) as we’d get with IDbDataReader.

Stored Procedures

If an application already uses stored procedures, those procedures can be added to an EDM as functions. This is done by adding a Function Import to the EDM. Stored Procedures can be used to return Entity types, as custom logic for insert, update, and delete, and to modify many-to-many relationships. Stored procedures imported as functions can accept parameters, as one would expect.

Suppose we had a stored procedure in our AdventureWorks database called “MyCustomerOrderHeaderProc.” If we add a function import to the EDM, we can call that stored procedure as if it were a function of the ObjectContext:

    using (AdventureWorksEntities adv = new AdventureWorksEntities())
          var orderHeaders = from o in adv.MyCustomOrderHeaderProc()
                                select o;

In this case, the function import is set up to return Entities of type SalesOrderHeader. Function imports can also return nothing, or a scalar value (scalar values are broken in EF v1 but will work in the next version.)

Updating Data

If you’re familiar with ADO.NET, you may remember our good old friend the SqlCommandBuilder, which allows us to automagically update a database from changes made to an in-memory Dataset. Entity Framework expands on this concept:

    using (AdventureWorksEntities adv = new AdventureWorksEntities())
          var completeHeader = (from o in adv.SalesOrderHeader.Include("SalesOrderDetail")
                                  where o.DueDate > System.DateTime.Now
                                  select o).First();
          completeHeader.ShipDate = System.DateTime.Now;
          var details = completeHeader.SalesOrderDetail.Where(x => x.UnitPrice > 10.0m);
          foreach (SalesOrderDetail d in details)
               d.UnitPriceDiscount += 5.0m;

Here we have retrieved all the SalesOrderHeader records where the DueDate has passed. The ShipDate is set to the current time (this might be part of an order fullfilment routine.) Then, on line 7, the Where function is used with a predicate to return all OrderDetails where the price is over $10. The discount on those items is increased by 5% (a customer-specific discount, perhaps?)

Whew! That’s a lot of changes! Now I have to trudge over and update the database by… calling SaveChanges() on AdventureWorksEntities. Oh my poor typing fingers!

The AdventureWorksEntities ObjectContext-derived class tracks the changes we make to those objects, then generates the SQL necessary to effect those changes in the persistence store when we call SaveChanges().


LINQ and the ADO.NET Framework combine collection querying and OR/M to provide a powerful data access platform as part of .NET 3.5 and .NET 4.0.

LINQ is useful in any scenario where a collection needs searching, whether that collection is a simple array of primitive values, or a generic List. LINQ Providers allow LINQ to query against those collections while hiding the details of their persistence.

The Entity Framework is an OR/M that uses Entity Data Models, which are XML files, to generate an Object Services class. This generated class exposes strongly-typed Data Objects referred to as Entities. Entities are queryable via LINQ, track changes to themselves, and automatically generate the SQL code required to retrieve, insert, update, and delete themselves. Relationships between Entities are automatically created by reading foreign key constraints in the database and are represented as Navigation Properties. The developer has control over how and when these related entities are loaded, through eager or lazy loading.

Future versions of Entity Framework will introduce new features:

  1. POCO (Plain-Old CLR Object) Support
  2. Model First
  3. Visual Studio EDM designer improvements
  4. FK Assocations / FK Properties
  5. “CodeOnly” Entity Data Models

I’ll be discussing these new features in future posts, stay tuned! Thanks for reading!


  1. LINQ on MSDN
  2. Implicitly Typed Local Variables
  3. Lambda Expressions
  4. Extension Methods
  5. Object Initializers
  6. Query Expressions
  7. LINQ keywords
  8. Anonymous Types
  9. Expression Trees
  10. Partial Methods
  11. How to: Define a Model with Single Entity Mapped to Two Tables
  12. Entity SQL
  13. Entity Data Models
  14. Persistence Ignorance


Get every new post delivered to your Inbox.