Saturday, October 24, 2009

Pfz.Caching - ViewIds instead of ViewStates

This is a framework for caching serializable objects in general. Its generic Cache class caches any object in memory and on disk: if an object is no longer in memory, it is read back from disk.
It also has a CacheDictionary that does the same but is optimized for small data, so that many small "buffer files" are not generated.
And, finally, the best of all: by adding the App_Browsers folder to your project, you can make all ViewStates use this technology, so only a small ViewId is sent to the client instead of the full ViewState. It also deletes unused files after some time and reuses identical files, so disk space is not wasted.

I never really had many problems working with the web. I started programming when computers had only 1MB or 2MB of memory. But I see that people in general are a bit lost about when to store data. The documentation on ViewState and Session is also confusing: the two look interchangeable when they are not.
I first created a caching technology for paginating recordsets.
Then, I created the caching technology for any kind of data.
And, finally, the module that does the same for ViewStates. It has been working for over a year without problems.

How It Works (Basic)
The implementation can be very complex, as the code is thread-safe and you need to manage garbage collection properly, but the principle is very simple:
  1. Every object is serialized into a "buffer", and a hash code (checksum) of that buffer is computed. If a directory for that hash code already exists, I look for a stored buffer with the same length and, if there is one, check whether it is identical; if it is, I reuse it instead of creating a new one. Otherwise, I create a new unique ID and save the file. Either way, the ID and hash code of the buffer are returned. Buffers are shared among all sessions but, as the data is read-only, this causes no problem.
  2. The ID from item 1 is the BufferId, not the ViewId. With the BufferId, I look in memory for ViewState information in the current session that already contains that BufferId. If none does, a new ViewId is generated. If one exists, the ID of the old one is returned. Every time I reuse a file, I do a keep-alive (on disk, by updating the date/time of the file, and in memory, by calling GCUtils.KeepAlive).
  3. Every buffer is also kept in memory through weak references but, to keep recent buffers from being collected, I call GCUtils.KeepAlive. GCUtils.KeepAlive (not GC.KeepAlive) guarantees that an object will survive the next collection.
  4. ViewIds are created per session ID, so there is no risk of one user getting another user's ViewState, even if the internal buffer is the same (in that case each user has a different ViewId, but they all point to the same buffer). Also, every page generates a NEW ViewState, so I try to get the ID of an existing ViewState if possible, or I create a new file.
  5. By default, FileCachePersister runs a cleanup every 30 minutes, deleting files that are more than 4 hours old. This has nothing to do with session-expiration times.
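
The buffer reuse described in item 1 can be sketched roughly as follows. This is only an illustration written in Python for brevity, not the framework's C# code: the function names, the directory layout (one directory per hash code, one file per buffer ID) and the use of SHA-256 as the checksum are all assumptions.

```python
import hashlib
import os
import uuid
from typing import Optional, Tuple


def save_buffer(root: str, buffer: bytes) -> Tuple[str, str]:
    """Store buffer under root, reusing an identical file if one exists.

    Returns (hash_code, buffer_id). Names and layout are hypothetical.
    """
    # Assumption: SHA-256 stands in for whatever checksum the framework uses.
    hash_code = hashlib.sha256(buffer).hexdigest()[:16]
    directory = os.path.join(root, hash_code)
    os.makedirs(directory, exist_ok=True)

    for buffer_id in os.listdir(directory):
        path = os.path.join(directory, buffer_id)
        # Cheap check first: only a file of the same length can be identical.
        if os.path.getsize(path) != len(buffer):
            continue
        with open(path, "rb") as f:
            if f.read() == buffer:
                os.utime(path)  # keep-alive: refresh the file's date/time
                return hash_code, buffer_id

    # No identical buffer found: create a new unique id and save the file.
    buffer_id = uuid.uuid4().hex
    with open(os.path.join(directory, buffer_id), "wb") as f:
        f.write(buffer)
    return hash_code, buffer_id


def load_buffer(root: str, hash_code: str, buffer_id: str) -> Optional[bytes]:
    """Return the stored bytes, or None if the file no longer exists."""
    path = os.path.join(root, hash_code, buffer_id)
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    os.utime(path)  # reading also counts as use, so keep the file alive
    return data
```

Saving the same bytes twice returns the same (hash code, ID) pair and leaves a single file on disk, which is what lets many sessions share one buffer.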

How It Works (Advanced)
For now, I will not explain the details of the CacheDictionary (which is effectively a dictionary of Cache objects, with some optimizations), nor will I explain the WeakDictionary, because explaining it would take more space than this entire article. The important thing to know is that it is a dictionary that allows its items to be collected by the garbage collector, but keeps its recently used values alive.
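
To give an idea of that behavior, here is a minimal sketch in Python. It is not the WeakDictionary implementation: the class names are made up, and instead of GCUtils.KeepAlive (which protects an object until the next collection) it simply holds strong references to the last few values used.

```python
import weakref
from collections import deque


class Buffer:
    """Wrapper, because plain bytes objects cannot be weakly referenced."""

    def __init__(self, data: bytes):
        self.data = data


class RecentWeakDict:
    """Dictionary with weak values that keeps recently used values alive."""

    def __init__(self, keep_recent: int = 2):
        # Values here do not prevent garbage collection.
        self._items = weakref.WeakValueDictionary()
        # Strong references to the most recently used values only.
        self._recent = deque(maxlen=keep_recent)

    def __setitem__(self, key, value):
        self._items[key] = value
        self._recent.append(value)

    def get(self, key):
        value = self._items.get(key)
        if value is not None:
            # Using a value counts as a keep-alive.
            self._recent.append(value)
        return value
```

With keep_recent=2, adding a third value lets the first one be collected, while the two most recent ones remain reachable.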
I will start by explaining the CacheManager. The CacheManager class is responsible for loading and saving the buffers (bytes), as it knows nothing about the real type of the object. The most important thing it has is a WeakDictionary where the keys are the hash codes of the buffers and the values are dictionaries of IDs and serialized bytes. Its internal functions try to find a value in memory using the hash code and the buffer ID and, if none is found, ask the persister to load it and then store the loaded value (if any) in these weak dictionaries, doing a KeepAlive on it. The Save function follows a similar process: it tries to find a compatible buffer in memory in order to reuse its ID or, if none is found, calls the persister to save the buffer and returns the generated ID.
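
The load and save flow just described could look something like the sketch below. It is written in Python, uses plain dictionaries where the real class uses weak dictionaries, and the persister interface (save and load methods) and the DictPersister stand-in are assumptions made for the example.

```python
import hashlib
import itertools
from typing import Dict, Optional, Tuple


class DictPersister:
    """Stand-in persister that keeps buffers in a dict (hypothetical)."""

    def __init__(self):
        self.stored: Dict[Tuple[str, int], bytes] = {}
        self.loads = 0  # counts how often the persister had to be asked
        self._ids = itertools.count(1)

    def save(self, hash_code: str, buffer: bytes) -> int:
        buffer_id = next(self._ids)
        self.stored[(hash_code, buffer_id)] = buffer
        return buffer_id

    def load(self, hash_code: str, buffer_id: int) -> Optional[bytes]:
        self.loads += 1
        return self.stored.get((hash_code, buffer_id))


class CacheManager:
    def __init__(self, persister):
        self._persister = persister
        # hash code -> {buffer id -> bytes}
        self._memory: Dict[str, Dict[int, bytes]] = {}

    def save(self, buffer: bytes) -> Tuple[str, int]:
        hash_code = hashlib.sha256(buffer).hexdigest()
        by_id = self._memory.setdefault(hash_code, {})
        # Reuse the id of an identical buffer that is already in memory.
        for buffer_id, known in by_id.items():
            if known == buffer:
                return hash_code, buffer_id
        # Otherwise let the persister save it and generate the id.
        buffer_id = self._persister.save(hash_code, buffer)
        by_id[buffer_id] = buffer
        return hash_code, buffer_id

    def load(self, hash_code: str, buffer_id: int) -> Optional[bytes]:
        by_id = self._memory.get(hash_code, {})
        if buffer_id in by_id:
            return by_id[buffer_id]
        # Not in memory: ask the persister and remember the result, if any.
        buffer = self._persister.load(hash_code, buffer_id)
        if buffer is not None:
            self._memory.setdefault(hash_code, {})[buffer_id] = buffer
        return buffer
```

Note that the manager only ever sees bytes; which object those bytes represent is the concern of whoever serialized them.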
The generic Cache class is the one that can serialize and deserialize buffers. It uses the CacheManager to read or write these buffers, but the Cache itself has its own WeakReference to the actual object. This is done because, when deserializing a Cache object, only the ID of the object is needed, not the real object. If too many "identical" objects are put in the cache, all the Cache objects can have their actual objects collected while the buffer needed to regenerate them is still in memory. It can look redundant but, in my experience, it is not.
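
A rough sketch of that idea, in Python: the cache object keeps only the buffer ID and a weak reference to the real object, and rebuilds the object from the buffer when the weak reference is dead. The State class, the JSON serialization and the plain dict standing in for the CacheManager are assumptions made to keep the example self-contained.

```python
import json
import weakref


class State(dict):
    """A dict subclass, since plain dicts cannot be weakly referenced."""


class Cache:
    def __init__(self, target: State, buffers: dict):
        self._buffers = buffers  # stands in for the CacheManager
        buffer = json.dumps(target, sort_keys=True).encode("utf-8")
        self.buffer_id = self._save(buffer)
        self._ref = weakref.ref(target)  # does not keep the target alive

    def _save(self, buffer: bytes) -> int:
        # Reuse the id of an identical buffer, as the manager would.
        for buffer_id, known in self._buffers.items():
            if known == buffer:
                return buffer_id
        buffer_id = len(self._buffers) + 1
        self._buffers[buffer_id] = buffer
        return buffer_id

    @property
    def target(self) -> State:
        obj = self._ref()
        if obj is None:
            # The object was collected: regenerate it from its buffer.
            obj = State(json.loads(self._buffers[self.buffer_id]))
            self._ref = weakref.ref(obj)
        return obj
```

Two Cache objects created from equal data end up with the same buffer ID, so the serialized bytes exist only once even though each cache has its own weak reference.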
Well, so, you create a Cache for an object. The cache serializes the object and calls the CacheManager, which will try to reuse the ID of some identical buffer or will ask for it to be saved... but where will it be saved?
That's the job of the persister. Within the framework, the only persister that exists and is ready to use is the FileCachePersister. When loading, it simply receives the parameters and tries to load the file, if one exists, or returns null. When saving, it receives the name of the file and tries to update its date/time to keep it alive, or saves the file. It is done this way so that you can easily create a persister that stores the data in a database or on a remote server, which can have its own caching and avoids concurrent processes accessing the same files at the same time. This is important: the FileCachePersister works very well with many threads, but only ONE process may use the directory.
OK, in the FileCachePersister there is also a thread that deletes the old files, but that is nothing really complex.
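
For illustration, such a cleanup thread might look like the Python sketch below. The function names are hypothetical; the 30-minute interval and 4-hour maximum age are the defaults mentioned earlier, and "age" is assumed to be measured from the file's last-modified time, which the keep-alive refreshes.

```python
import os
import threading
import time
from typing import Optional


def delete_old_files(root: str, max_age_seconds: float,
                     now: Optional[float] = None) -> int:
    """Delete files under root not touched for max_age_seconds; return count."""
    now = time.time() if now is None else now
    deleted = 0
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    deleted += 1
            except FileNotFoundError:
                pass  # the file disappeared in the meantime
    return deleted


def start_cleaner(root: str, interval_seconds: float = 30 * 60,
                  max_age_seconds: float = 4 * 60 * 60) -> threading.Event:
    """Run delete_old_files periodically; set the returned event to stop."""
    stop = threading.Event()

    def run():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            delete_old_files(root, max_age_seconds)

    threading.Thread(target=run, daemon=True).start()
    return stop
```

Because every reuse updates the file's date/time, a buffer that is still in use never reaches the age limit, whatever the session timeout is.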

The ViewState
The ViewState solution is very similar to the Cache solution, but it also carries additional security information. The class responsible for loading and saving ViewStates is the PfzPageStatePersister. Similar to the CacheManager class, it has a dictionary keyed by SessionId, so ViewIds are exclusive to the current session. The values of that dictionary are dictionaries keyed by ViewId, and each of their values holds the type of the page that generated the ViewState (so copying a ViewId to another page will not work) and the actual ViewState information.
Or, better, a cache of that information. Why? Because if you go from one page to another and identical ViewStates are generated, only a new "reference" to the buffer is created, while the buffer itself, which can be very large, stays the same. It looks a little more complicated because it uses a Pair, but that is because of the way PageStatePersisters work in general: they only generate two objects, whose only purpose is to be serialized. Not very friendly, to be honest.
But, then, the idea is the same:
  1. Look if the ViewState is in memory. If it is not, ask the persister to load it.
  2. When saving, search for an identical ViewState in memory, or create a new one, calling the persister to save it.
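
The session and page-type checks can be sketched as below. This is a simplified Python illustration, not the PfzPageStatePersister: the class and method names are invented, the buffer is represented only by its ID, and matching on the page type when reusing a ViewId is an assumption.

```python
import uuid
from typing import Dict, Optional, Tuple


class ViewStateStore:
    def __init__(self):
        # session id -> {view id -> (page type, buffer id)}
        self._sessions: Dict[str, Dict[str, Tuple[str, int]]] = {}

    def save(self, session_id: str, page_type: str, buffer_id: int) -> str:
        views = self._sessions.setdefault(session_id, {})
        # Reuse the ViewId if this session already has an identical entry.
        for view_id, entry in views.items():
            if entry == (page_type, buffer_id):
                return view_id
        view_id = uuid.uuid4().hex
        views[view_id] = (page_type, buffer_id)
        return view_id

    def load(self, session_id: str, page_type: str,
             view_id: str) -> Optional[int]:
        entry = self._sessions.get(session_id, {}).get(view_id)
        # Reject ViewIds from another session or another page type.
        if entry is None or entry[0] != page_type:
            return None
        return entry[1]
```

Two users showing the same data get different ViewIds that point to the same buffer ID, and a ViewId copied into another session or another page simply fails to load.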

Using the Code
The CacheManager class is where it all starts and, if you use the framework only to save ViewStates in files, it is also where it ends. In Global.asax, put the following (or something similar):

See full details: