Friday, December 4, 2009

WeakReference for lazy loading and releasing

One thing that has always bothered me when working with NHibernate and lazy objects is the following problem:
NHibernate will automatically generate a wrapper proxy for your objects and fill them only when their data is requested. So far so good, but what about when I want to get rid of them?

Let's start at the beginning.
Many applications need to process large datasets, and these datasets are often far too large to fit in memory.
It is also common that an application does not need all of the records all the time.
Because of this, ORMs like NHibernate support lazy loading: you pay with an extra fetch for the benefits of a faster initial query and lower memory usage.
When a proxied object is accessed, its data is loaded into memory and remains there until some explicit action is taken.
In any case, a proxied object will remain in memory as long as its session is active and the object is attached to it.

Now, let's look at the case of an application like the one I am currently implementing. I have a very large dataset that I wish to present to my users.
They can only see and use a small portion of the dataset at a time, but they need to be able to access all of it.
Our interface of choice is a single data table, and our record counts are in the hundreds of thousands for the average case and in the millions for the larger sets.

Loading 100k records into memory, while possible, takes about 24 seconds on my machine and consumes about 100 MB of RAM.
Obviously 24 seconds is too long for a user to wait for their data (there are several sets of 100k, which we switch between), and that much memory is way too much, as it doesn't scale to our worst case on an average user's machine.

It's not hard to imagine that many other applications have similar requirements - handling large volumes of data without requiring an insane amount of memory or ages to load.

The classical solution to this problem is pagination - we divide our dataset into segments and execute a query to load the required page.
This solution works pretty well in many cases, but in my particular case, unless the pages are very small, page loading will be visible (and annoying) to the user.

So the problem is this - how do I load data only when it's required, and also allow it to be released when it's not currently being used?
I would also like the solution to be as simple as possible and to require as little management as possible.

So, after thinking about it for a while, I came up with this:
Create an object that holds some way of retrieving the full record (non-lazy), such as its id.
Hold a WeakReference in that object, pointing to the actual record.
Expose a property that checks the reference and returns the instance, or reloads it if it's no longer there.

The solution looks like this:


//the "class" constraint is needed for the "as" cast below
class WeakWrapper<WrappedData> where WrappedData : class
{
    //the persistent id of the wrapped record
    long id;
    WeakReference reference;

    public WrappedData Data
    {
        get
        {
            WrappedData result;
            //check if something is set in our reference
            if (reference != null)
            {
                //we have - lets see if it is still alive
                result = reference.Target as WrappedData;
                if (result != null)
                    return result; //return the data.
            }
            //no live reference, load our data from the db.
            result = LoadData();
            //set the reference
            reference = new WeakReference(result);
            //return the result
            return result;
        }
    }

    /// <summary>
    /// This method loads the data from an nhibernate session
    /// using the record id.
    /// </summary>
    /// <returns></returns>
    private WrappedData LoadData()
    {
        ISession session = ActiveRecordMediator
            .GetSessionFactoryHolder()
            .CreateSession(typeof(WrappedData));
        try
        {
            //load the entity that matches our id
            IList entities =
                session.CreateCriteria(typeof(WrappedData))
                    .Add(Expression.Eq("Id", id))
                    .List();

            return (WrappedData)entities[0];
        }
        finally
        {
            //release the session, even if the query failed
            ActiveRecordMediator
                .GetSessionFactoryHolder()
                .ReleaseSession(session);
        }
    }
}
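
If you want to play with the idea without a database, here is a minimal sketch of the same pattern. It is not the code I use in the application: the NHibernate query is replaced by a plain delegate, and the names WeakCache, LoadCount and Demo are made up for the illustration. It only shows that the loader is not called again while someone still holds a strong reference to the data; when exactly the garbage collector clears the weak reference is up to the runtime.

```csharp
using System;

// Hypothetical stand-in for the wrapper above: the database fetch is
// replaced by a delegate so the sketch runs on its own.
public class WeakCache<T> where T : class
{
    private readonly Func<T> _loader;   // assumption: any "fetch by id" call
    private WeakReference _reference;

    // how many times the loader was actually invoked
    public int LoadCount { get; private set; }

    public WeakCache(Func<T> loader)
    {
        _loader = loader;
    }

    public T Data
    {
        get
        {
            // Target is null once the GC has collected the instance
            T result = _reference != null ? _reference.Target as T : null;
            if (result == null)
            {
                // nothing alive, load again and remember it weakly
                result = _loader();
                LoadCount++;
                _reference = new WeakReference(result);
            }
            return result;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var cache = new WeakCache<string[]>(() => new[] { "record", "42" });

        string[] first = cache.Data;   // loads
        string[] second = cache.Data;  // still strongly referenced: no reload

        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(cache.LoadCount);                // 1
        GC.KeepAlive(first); // keep the instance alive up to this point
    }
}
```

Once nothing else references the data, the garbage collector is free to reclaim it, and the next access to Data simply calls the loader again.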



This wrapper works relatively well, but when the DataGrid that displays the data is scrolled down, it takes a noticeable amount of time for elements to load.
The main reason for this is that elements are loaded individually.

To avoid loading them individually, we group the calls and then load all the elements in a single statement.
To achieve this I used a nice trick I found on Tomer Shamam's blog.
Tomer implemented a similar mechanism using a custom collection and a caching mechanism.
He used the Dispatcher's prioritized BeginInvoke method to defer the invocation of the fetch until several calls have been made, and only then load the data.
This method works much better than fetching one record at a time.
My version, using the WeakReference mechanism, is implemented via a modified version of the WeakWrapper and a bulk loader that fetches the records via the deferred loading operation.

My particular implementation uses NHibernate and an "int" id.
These can easily be changed to any kind of fetch (in the loader's load method) and any kind of "id".

The modified wrapper looks like this:

/// <summary>
/// A class that holds a weak reference to an entity.
/// It requires an entity class that implements IEntity
/// (which contains a single property int Id).
/// </summary>
/// <typeparam name="EntityType"></typeparam>
public class WeakEntityWrapper<EntityType> : INotifyPropertyChanged
    //"class" is needed for the "as" cast and for returning null
    where EntityType : class, IEntity
{
    private static BulkEntityLoader<EntityType> loader
        = new BulkEntityLoader<EntityType>();
    public event PropertyChangedEventHandler PropertyChanged;
    private WeakReference _objectReference = null;
    private int _objectID;

    /// <summary>
    /// create a new wrapper with the provided id.
    /// </summary>
    /// <param name="objectID"></param>
    public WeakEntityWrapper(int objectID)
    {
        this._objectID = objectID;
    }

    public WeakEntityWrapper(EntityType entity)
    {
        //get the id
        this._objectID = entity.Id;
        //save the reference
        this._objectReference = new WeakReference(entity);
    }

    public EntityType Entity
    {
        get
        {
            //check if we have an initialized reference
            if (_objectReference != null)
            {
                EntityType result =
                    _objectReference.Target as EntityType;
                //check if we have a valid target
                if (result != null)
                    return result; //return the target.
            }
            //nothing alive, ask the loader to fetch it later
            loader.LoadEntity(this);
            return null; //nothing for now, but soon! muahahahah
        }
        set
        {
            //the loader got back to us -
            //set the entity and tell whoever is listening
            _objectReference = new WeakReference(value);
            //notify anyone who is listening
            if (PropertyChanged != null)
                PropertyChanged
                    .Invoke(this, new PropertyChangedEventArgs("Entity"));
        }
    }

    /// <summary>
    /// the persistent id of the object.
    /// </summary>
    public int ObjectId
    {
        get { return _objectID; }
        set { _objectID = value; }
    }
}


As you can see, the main difference is that the Load method has been externalized
and that we now have a "set" accessor (which will be called by the loader).
The setter notifies the viewer that it needs to update itself again, and this time it will get a non-null reference.

The BulkEntityLoader uses ActiveRecord and NHibernate and looks like this:




/// <summary>
/// uses deferred actions to group loading operations.
/// this class is NOT thread safe.
/// it expects to be called by a single (UI) thread.
/// </summary>
/// <typeparam name="EntityType"></typeparam>
class BulkEntityLoader<EntityType>
    where EntityType : class, IEntity
{
    protected static readonly ILog Logger =
        LogManager.GetLogger(typeof(BulkEntityLoader<EntityType>).Name);
    //here we save all the deferred wrappers for later.
    //the key is an int, the same type as the entity's Id,
    //so it can be passed to the "in" query as is.
    private Dictionary<int, WeakEntityWrapper<EntityType>> deferredWrappers
        = new Dictionary<int, WeakEntityWrapper<EntityType>>();

    //flag to mark that the deferred action should only be queued once
    private volatile bool _isDeferred = false;

    /// <summary>
    /// NOT THREAD SAFE. FOR UI THREAD ONLY
    ///
    /// mark an entity for future loading.
    /// </summary>
    /// <param name="wrapper"></param>
    public void LoadEntity(WeakEntityWrapper<EntityType> wrapper)
    {
        //check if a deferred action was already set in place
        if (!_isDeferred)
        {
            //mark our flag
            _isDeferred = true;
            //no deferred action, create a new one.
            Dispatcher.CurrentDispatcher.BeginInvoke(
                DispatcherPriority.Render,
                (Action)LoadEntities);
        }
        //check if the wrapper is in our collection, if not - add it.
        if (!deferredWrappers.ContainsKey(wrapper.ObjectId))
            deferredWrappers.Add(wrapper.ObjectId, wrapper);
    }

    /// <summary>
    /// Load all the entities from the wrapper dictionary.
    /// </summary>
    public void LoadEntities()
    {
        Logger.Debug("Starting deferred action.\n"
            + " deferring " + deferredWrappers.Count + " elements");
        try
        {
            using (new SessionScope(FlushAction.Never))
            {
                //open a session
                ISession session = ActiveRecordMediator
                    .GetSessionFactoryHolder()
                    .CreateSession(typeof(EntityType));
                //load all the entities
                IList entities =
                    session.CreateCriteria(typeof(EntityType))
                        .Add(Expression.In("Id", deferredWrappers.Keys))
                        .List();
                //release the session
                ActiveRecordMediator
                    .GetSessionFactoryHolder()
                    .ReleaseSession(session);

                foreach (EntityType entity in entities)
                {
                    //set the values
                    deferredWrappers[entity.Id].Entity = entity;
                    //remove all entities that were set
                    deferredWrappers.Remove(entity.Id);
                }
            }
        }
        finally
        {
            //mark the end of the deferred action, even if the query
            //failed, so that later requests can queue a new one.
            _isDeferred = false;
        }
    }
}


So basically we collect all the calls we get until the Dispatcher decides to invoke us.
Once the Dispatcher has time to invoke us, we process all the calls in a single query and set the property for everyone that requested an update.
Once set, the property fires a PropertyChanged event, which informs the UI that it needs to re-read that property.
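
The batching idea itself does not depend on WPF or NHibernate, so here is a small standalone sketch of just that part. Everything in it is made up for the illustration (RequestBatcher, Request, Flush, FetchCount, BatchDemo): the bulk query is a delegate, the PropertyChanged notification is a plain callback, and Flush is called by hand where the real loader would be invoked by the Dispatcher. Unlike my loader, it keeps a list of callbacks per id instead of a single wrapper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the batching idea only: requests are collected
// by id and then served by one bulk fetch.
public class RequestBatcher<T>
{
    private readonly Dictionary<int, List<Action<T>>> _pending =
        new Dictionary<int, List<Action<T>>>();
    // assumption: stands in for the single "in" query
    private readonly Func<ICollection<int>, IDictionary<int, T>> _bulkFetch;

    // how many bulk fetches were actually executed
    public int FetchCount { get; private set; }

    public RequestBatcher(Func<ICollection<int>, IDictionary<int, T>> bulkFetch)
    {
        _bulkFetch = bulkFetch;
    }

    // Remember who wants which id; duplicate ids share one fetch.
    public void Request(int id, Action<T> onLoaded)
    {
        List<Action<T>> callbacks;
        if (!_pending.TryGetValue(id, out callbacks))
        {
            callbacks = new List<Action<T>>();
            _pending.Add(id, callbacks);
        }
        callbacks.Add(onLoaded);
    }

    // Serve every pending request with a single bulk fetch.
    public void Flush()
    {
        if (_pending.Count == 0) return;

        // copy and clear first, so callbacks may queue new requests
        var batch = new Dictionary<int, List<Action<T>>>(_pending);
        _pending.Clear();

        IDictionary<int, T> loaded = _bulkFetch(batch.Keys);
        FetchCount++;

        foreach (KeyValuePair<int, List<Action<T>>> entry in batch)
        {
            T value;
            if (!loaded.TryGetValue(entry.Key, out value)) continue; // id not found
            foreach (Action<T> callback in entry.Value) callback(value);
        }
    }
}

public static class BatchDemo
{
    public static void Main()
    {
        var batcher = new RequestBatcher<string>(ids =>
        {
            // pretend this is the database
            var result = new Dictionary<int, string>();
            foreach (int id in ids) result[id] = "row " + id;
            return result;
        });

        var seen = new List<string>();
        batcher.Request(1, seen.Add);
        batcher.Request(2, seen.Add);
        batcher.Request(1, seen.Add); // same id again, no extra fetch
        batcher.Flush();

        Console.WriteLine(batcher.FetchCount); // 1
        Console.WriteLine(seen.Count);         // 3
    }
}
```

Three requests end up as one fetch for two distinct ids, which is the same effect the deferred Dispatcher call gives the DataGrid while it scrolls.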

So now all that remains is loading a list of ids from the database, and setting the collection in a DataGrid.