Hacking Bangkok Blog

Tech and Living in Thailand's City of Angels

The Hacking Bangkok blog covers I.T. and technology in general, and my experiences working and living in the Kingdom of Thailand. Bangkok has a very long Thai name, which starts with Krung thep - City of Angels.
Bangkok sunset from my bedroom balcony

Tuesday, March 31, 2009

Adventures With Linq-to-SQL: The Update Dilemma


I haven't written a post that goes into much (any?) detail on what I am working on over here, so this is that post. Since December, I've been working on a set of applications for a recruiting company, to replace the apps that they're using now (which I also wrote, about three years ago mostly). The old version, while it works, has some design flaws (some of which I inherited by way of using the database that held the previous never-finished attempt to create the app, built by some local programmers), some of which were inherent in using Windows .Net forms, and some of which resulted in my (over-) reliance on DataSets for storing object data.

So the new version of the app is an almost-complete rewrite; it uses SQL 2008 (or SQL 2005) rather than SQL 2000, but the bigger changes are that I'm using WPF for the UI, and Linq-to-SQL for almost 100% of the data access layer. Linq is beguilingly easy to use. Instead of writing code along the lines of, "If Not IsDBNull(ContactDataSet.Tables(i).Rows(j).Item("Name"))", I can just write, "myContact.Name" and get a string value. Linq objects are strongly-typed (granted, you can create strongly-typed datasets as well), and once you lean the very simple to use Linq query syntax, you can fetch objects out of the database with very little code, and no t-sql at all. Inserting new objects into the database is equally easy, and I love that columns with default values in the database can be set to automatically-sync their values back to the object on inserts or updates.

It's only when you go to update an existing object that you start finding out why people have so much trouble with Linq: because if you fetched an object with one "DataContext", which you shoud promptly discard (the typical pattern of usage is, "Using myDC as New MyDataContextClassname() ... CRUD... End Using), and then update it, and then create a new instance of your DataContext, trying to save it becomes annoying. By default, for the new DataContext instance to save changes to your item, you need to attach it, with a call to the context's .Attach() method. And there the fun starts!

Trying to attach an object you loaded from a different context will, unless you do some extra work, throw an exception. You'll get an error along the likes of "Attempt to attach an object that is not new, or loaded by a different context." Now, my app uses a lot the "deferred loading" in Linq, which means that a linq object has properties that represent other linq objects, or collections of those objects, which aren't loaded until they're requested by something. So, a contact object like MyContact (an instance of Contact, which is pulled from a Contacts table, for example) might have properties called "Company_Id" (from FK column of the same name), and another property called "Company". The "Company" property is actually of type EntityRef(Of Company). Or, if the relationship from Contacts -> Companies in your database is one-to-many, then MyContact probably has a property called "Companies", which returns (natch!) a collection of the Company objects to which it's related via the PK-FK relationship. The "Companies" property of MyContact is actually of type EntitySet(Of Company) in this case. If you reference MyContact.Company, you will get a Company object if there is one, and so on, so you probably don't even see the EntityRef stuff until later. When you try to update your contact!

Either way, either in using WPF databinding or in code-behind, this design makes it really easy to get to related objects and collections of them. So in WPF, if I have a bunch of elements bound to MyContact, I might have an element whose content property (or whatever property) is set to "{Binding Path=Company.CompanyName}", or even "{Binding Path=Company.Country.CountryName}". During binding, the "Company" property is called automatically, which then loads (if it's not already loaded) the Company object, or for the second example there, it will actually fetch a Country object that's linked to the Company object that is in turn linked to MyContact. All with no code! See ma, no hands!!

The problem is that now, MyContact has its "Company" property set. If you dig a little, you'll see that in the .dbml file (the Linq-to-sql class and mapping file), that the Company property is, as I mentioned, of type EntityRef(Of Company), and that it's just a public property wrapper for the private member _Company. And variables of type EntityRef have a property called .HasLoadedOrAssignedValue, which is boolean, and .Entity, which essentially refers to some object (of type Company, for this example) in the same DataContext. You see, DataContext has one "object" for each table that you've added to your .dbml file, which represents a table of those objects. When my XAML (or code-behind) calls the .Company propert of MyContact, the DataContext that MyContact belongs to duly loads that company object into it's Companies table, and points the private member MyContact._Company.Entity so that it refers to that company.

Now - after binding is done, and I've disposed of my datacontext like a good boy - I still have access to that Company object, via MyContact. I can even create a new object, "MyCompany", and set it equal to MyContact.Company, and then have my way with it. But when I update MyContact, instantiate a new datacontext (again, "Using NewDc as New MyDataContextName().."), and call NewDc.Attach(MyContact), it will fail. MyContact has those EntityRefs that refer to objects (like Company) that are in a datacontext I already disposed of. Even though the object is there, and accessible. I'm going to confess that I'm not sure why, fundamentally, the Linq team couldn't have come up with a built-in solution, something like "MyContact.Detach" (to clear out it's entity-refs and entity-sets). But anyway.

There are multiple ways around the dilemma. First, keep in mind that, in order to perform concurrency checks (that is, to ensure one user isn't overwriting another user's changes), Linq will either check the old values against the current ones, or (if you're using a rowversion/timestamp column), it will just check that. For simplicity, I added rowversion columns to all my tables containing editable records.

Okay, I said there were ways to save your data.

(1) You can pass the .Attach method both the current state of your object, plus it's original state. Of course, that original state is only available from the already-disposed datacontext you used to load it (if it wasn't disposed, you could call, "myDC.Contacts.GetOriginalEntityState(MyContact)" which returns the Contact object with all the properties set to their original values, when you pulled it from the database). You could also just keep a copy of every object that you plan to change, so you have the old copy to pass to .Attach() - I did play around with this, and didn't have much success. Also, it's annoying (to me) to have to manually keep an extra copy of objects I'm about to change, just to update them.

(2) You could turn off DeferredLoading on your datacontext, thus killing one of the great benefits (in my opinion) using linq in the first place. I never really considered mucking with changing the deferred loading on the fly, but that might be viable for some people. You can also set ObjectTrackingEnabled to False, but you can read on Rick Strahl's blog why this is usually pointless (in short, you can't get to linked objects at all).

(3) You could do what I did when prototyping this project, which was to keep a single DataContext in an object, and open, insert and update every object using that datacontext (this turns out to be the opposite of what the MSDN documentation recommends, maybe because it's keeping a database connection open, I'm not sure). Anyway - the datacontext objects are meant to be used then disposed.

(4) You could use some elegant feature or method in Linq that I don't know of yet - and if you do know of it, please leave me a comment below!!!

Or (5), you can write you own .Detach methods, which is what I did.

This might not be an ideal solution for many people (possibly even for me!), but it has the advantage that (a) it works, and (b) I already did it. Linq's classes are all generated as partial classes, so you can extend them willy-nilly in another partial class (normally in the file that's created if you right-click the .dbml file in VS solution explorer, and choose "View Code"). I created a new public method, "Save()", for each of the types of objects that users will edit (i.e., Contact, Company, Appointment, and so on). I also wrote two private methods for each, "Detach()" and "RestoreContext()".

If the property is not an EntitySet, but just an singular EntityRef (so, one company, not a collection), you can just assign null/nothing to property's .Entity propety, like so: Me._Company.Entity = Nothing. Now, in a few places in my app, I might want to get at that object again without having to load it (again) from the database. In fact, if I call my .Save() method, which first calls my .Detach() , then any WPF fields that were bound to properties of the company (like in the binding examples I gave above) will suddenly go blank after the save. You could set the binding mode to one-time, but maybe I want the user to be able to actually see more properties of the company in a pop-up box or something. So my solution was pretty straight-forward: I added private variable declarations in my partial classes. I declare two kinds of things, depending on my object:

Private CompanyEntity As Company
Private HobbyEntitySet As EntitySet(Of Hobby)

Note that I don't really have a list of hobbies, just using it as an example. So anyway, in my Detach() method, I put:
If Me._Company.HasLoadedOrAssignedValue() Then CompanyEntity = Me._Company.Entity

And finally, the public .RestoreContext() method I wrote just reverses the process - it sets Me._Company = CompanyEntity, then sets CompanyEntity back to nothing/null. I don't always need to restore the context - if the user clicked, "Save & Close", for example, I just call the .Save method on my object. If I'm saving some changes, but keeping the object (MyContact) around, I just call .Save, and then .RestoreContext().

If the property IS an EntitySet, I use the same principle.
The auto-generated linq classes do already have (private) methods for these EntitySet(Of T) properties. These methods are in the main .rbml file, and are called "detach_xxxxx" and "attach_xxxxx" which are called whenever you do something like, "MyContact.Companies.Remove(SomeCompany)" or "MyContact.Companies.Add(SomeCompany)" (assuming a one-to-many with Companies, obviously). When you instantiate a new Company - which happens whenever you load one from the DB, or attach it to a context, or declare one (eg., "Dim c As New Contact) - the generated linq code assigns attach_xxxxx and detach_xxxxx as the functions called whenever you add or remove a Company, say, to or from the .Companies list. And so I've seen posts on the web where people wrote .Detach() methods in a partial class (which is where I got the idea!), and they loop through each attached object, calling .Remove on each, like so: "For each c As Company in Me.Companies.... Me.Remove(c)... Next", or the equivalent in C#. Maybe that's what I should have done, but....

Being a bit lazy, I went for the rather quicker and dirtier one-line solution:
Me._Hobbies = New EntitySet(Of Hobby)

You can optionally specify those attach_xxxxx and detach_xxxxx functions, like this:
Me._Hobbies = New EntitySet(Of Hobby)(AddressOf Me.attach_Hobbies, AddressOf Me.detach_Hobbies)

If you know you're about to dispose of the object, or if you know you won't be calling "MyContact.Hobbies.Add(SomeNewHobby)", then you don't really need to worry about specifying the callbacks. I only needed to do that for one object, the rest never have things added/removed like that.

And again, because I sometimes want to keep those collections of objects in the EntitySets around (maybe I have a ComboBox bound to it or something), I use private vars to hold the original values, and assign them back again in .RestoreContext.

So, for the Company and Hobbies example, the total code in the partial "Contact" class might look like this:

Private CompanyEntity As Company
Private HobbyEntitySet As EntitySet(Of Hobby)

Private Sub Detach()
If Me._Company.HasLoadedOrAssignedValue() Then CompanyEntity = Me._Company.Entity
Me._Company.Entity = Nothing
HobbyEntitySet = Me._Hobbies
Me._Hobbies = New EntitySet(Of Hobby)
End Sub

Public Sub Save(ByVal User As String)
Using db As New MyDataContext
Me.Detach()
If Me.ContactID = 0 Then
Me.CreatedBy = User
db.Contacts.InsertOnSubmit(Me)
Else
db.Contacts.Attach(Me, True)
Me.recmodifiedBy = User
Me.recmodifiedDate = Date.Now
End If Try
db.SubmitChanges()
Catch e As ChangeConflictException
'handle concurrency conflict here, and use .Refresh and .SubmitChanges() as needed.
End Try
End Using
End Sub

Public Sub RestoreContext()
Me._Hobbies = HobbyEntitySet
Me._Company.Entity = CompanyEntity
HobbyEntitySet = Nothing
CompanyEntity = Nothing
End Sub

For now, this is working for me, although if I run into any wierdness, I'll post about that later. Many thanks to all the great Linq-to-SQL blogs I've read, which led me to ultimately implement this hack-like solution. Like I said - if you know of a more elegent solution, let me know, thanks! Also, apologies for any typos in the code above that I've missed...

So hopefully most of my posts won't be this techie, unless I get a lot of positive feedback (which seems unlikely!!). Do people prefer the pictures of cows maybe?

0 comments: