Tuesday, January 11, 2011

The Curious Case of Vanishing Data - Notes Document Locking Fail!

Today I checked my Facebook updates, and there were some 78 Farmville requests! With the prices of vegetables soaring as they are, farming’s got to be the next big thing. Onion prices have driven people crazy. Rumor has it that many housewives have pledged to their husbands an hour of silence every day for a month if they can get them onions. No kidding!

OK, enough of crying over cut onions and back to business. Here I am going to share with you a curious case where users reported that data went missing after they entered it in a particular Notes database. To make things worse, such cases were sporadic (making it difficult to pinpoint any psycho serial killer) and the data were extremely important. So, enter the brave detective: Me, Moi, Myself!

Let me explain the scenario. (The actual names have been changed due to a very perceivable security threat, and any resemblance to any application or server, in service or decommissioned, is purely coincidental. Not that I give a heck, anyway.) Say, there is a Notes database called Matrix on a server named Reloaded, and its replica resides on its cluster server - Revolutions. There is a form Neo, with which the users create documents and continually add data to the existing documents. However, at times, and without any comprehensible reason, the data simply vanished from the documents! A considerable gloom and panic had settled among the users of Matrix. What was the reason, the motive? Was this a conspiracy? Were the aliens behind the data hijack?

My first suspect was simultaneous editing by multiple users. However, the form had code in its QueryModeChange event that allowed only one person to edit the document at any time. So data getting lost because several people were editing at the same instant was not the answer. Sabotage was ruled out too. Only a few users had Editor access to Matrix, and they knew better than to delete such important data. The needle of suspicion started moving towards Revolutions: surely the replica in the cluster had something to do with this? I tried recreating the crime scene (the error), without success. The replica-in-the-cluster theory somehow just did not add up. What do I do now?

I did what any great and famous investigator would do to nail the guilty: I sat back, closed my eyes and thought long and hard about how to get through this month with my empty wallet... er, I mean, about the issue. And suddenly, a spark! Yes, that’s it! Bull’s eye! A classic case of back-end and front-end not tying their loose ends and failing to make their ends meet!

Here is what was happening. Consider two users of Matrix, Tom and Harry (of Tom, Dick and Harry fame). One fine day, both decide to edit one particular Neo document. So both open that document on their respective screens (or UIs, as you may like to call them). Tom, always the first to get anywhere, does a Ctrl+E immediately. Since nobody else is editing that document, QueryModeChange allows Tom to go into edit mode. And he furiously types in all the data that he so laboriously collected. But when Harry tries to edit the document, QueryModeChange tells him “Sorry dude, you’re a bit late!”

“What the heck!” says Harry and he leaves to grab a mug of coffee. However, he keeps the document open on his screen. By the time he returns, Tom is done with his work in that document, has closed it and is whistling softly to a tune. Harry does a Ctrl+E in his already open document and voila! He is able to edit it now. But here lies the catch. The snapshot of the document that Harry has open on his screen does not have the changes made by Tom. So when he edits and saves the document, all the changes that Tom had made are gone - vanished into thin air!
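
For the curious, here is a tiny reconstruction of the crime in Python. It is only a simulation, not Notes code: the dictionary `backend`, the helpers `open_document` and `save_document`, and the field names are all made up for illustration, and it assumes a save writes the whole snapshot back over the stored document.

```python
import copy

# Hypothetical stand-in for the Neo document stored on the server.
backend = {"fields": {"notes": "original"}, "last_modified": 1}

def open_document(store):
    """Return a private snapshot, like a document window on a user's screen."""
    return copy.deepcopy(store)

def save_document(store, snapshot, clock):
    """Assumption: a save writes the whole snapshot over the stored copy."""
    store["fields"] = copy.deepcopy(snapshot["fields"])
    store["last_modified"] = clock

# Tom and Harry both open the document before anyone edits it.
tom = open_document(backend)
harry = open_document(backend)

# Tom gets into edit mode first, types in his data and saves.
tom["fields"]["notes"] = "original + Tom's data"
save_document(backend, tom, clock=2)

# Harry comes back from his coffee and edits the snapshot he already had open.
harry["fields"]["status"] = "reviewed"
save_document(backend, harry, clock=3)

# Harry's stale snapshot has overwritten Tom's work.
print(backend["fields"])
```

Harry never touched Tom's field, yet Tom's data is gone, because Harry saved an old picture of the whole document.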

With this theory, the error could be re-created as many times as one liked. To solve this, the time the document was last modified was captured in the QueryOpen event of the form (Date1). This gives the last modified time of the document as of the moment Harry opens it on his screen. In QueryModeChange, the current last modified time of the document was looked up from the back-end using the document UNID (Date2). If Tom has modified the document since Harry opened it, Date1 will not be equal to Date2. With this simple check, Harry is advised that he needs to close the document and then open it again in order to edit it.
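
The Date1/Date2 check can be sketched in Python as below. Again, this is a simulation under stated assumptions and not the actual LotusScript from Matrix: `open_document` plays the part of QueryOpen, `request_edit_mode` plays the part of QueryModeChange, and `StaleDocumentError`, `backend` and the field names are invented for the example. The existing one-editor-at-a-time check is left out to keep it short.

```python
import copy

class StaleDocumentError(Exception):
    """Raised when the stored document changed after the snapshot was taken."""

# Hypothetical stand-in for the Neo document stored on the server.
backend = {"fields": {"notes": "original"}, "last_modified": 1}

def open_document(store):
    """Stand-in for QueryOpen: take a snapshot and remember Date1."""
    snapshot = copy.deepcopy(store)
    snapshot["date1"] = store["last_modified"]
    return snapshot

def request_edit_mode(store, snapshot):
    """Stand-in for QueryModeChange: look up Date2 and compare with Date1."""
    date2 = store["last_modified"]
    if snapshot["date1"] != date2:
        raise StaleDocumentError("Please close the document and open it again to edit.")

def save_document(store, snapshot, clock):
    """Assumption: a save writes the whole snapshot over the stored copy."""
    store["fields"] = copy.deepcopy(snapshot["fields"])
    store["last_modified"] = clock

tom = open_document(backend)
harry = open_document(backend)

# Tom edits first: Date1 equals Date2, so he is let in.
request_edit_mode(backend, tom)
tom["fields"]["notes"] = "original + Tom's data"
save_document(backend, tom, clock=2)

# Harry tries with his old window: Date1 (1) no longer equals Date2 (2).
try:
    request_edit_mode(backend, harry)
    harry_blocked = False
except StaleDocumentError:
    harry_blocked = True

# Harry closes, reopens and edits a fresh snapshot that includes Tom's data.
harry = open_document(backend)
request_edit_mode(backend, harry)
harry["fields"]["status"] = "reviewed"
save_document(backend, harry, clock=3)

print(harry_blocked, backend["fields"])
```

This time Harry is turned away until he reopens the document, so his save lands on top of Tom's changes instead of wiping them out.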

Hallelujah! Mystery solved. Guilty convicted and put behind bars. I flung my coat over my shoulders, grabbed my hat and walked out amidst thunderous applause and blinding camera flashes. Ha! Dark Comedy! Nothing happened. Life went back to normal. Nobody cared. It was BAU - Business As Usual. Sob! Sob!

Moral of the story: just any Tom, Dick or Harry can mess up the Matrix big time, and it needs someone really smart and clever to put things back in order. And however smart a person is, nobody can give full attention to work when the pockets are empty.

3 comments:

  1. Trust me this is a daring situation for any individual .. great going choudhary .. bring on more ...

  2. This made for an interesting read. The way you described solving an issue like solving a murder mystery was really fresh and fascinating - Tim
