In preparation for upgrading our Domino environment to Release 9.0.x, we determined it would be a great idea to know how many Notes IDs were missing from our ID Vault. After a little Google searching, I discovered that two people at IBM had created an application, ID Vault Database Scanner.
After some fitful attempts at getting it to run, this past weekend I scheduled it to start at 1 AM and run for 20 hours (I adjusted Agent Manager to allow agents to run for 1,200 minutes). I felt that would be more than adequate time to loop through 22,000 Person Documents and the ID Vault.
After 20 hours, the agent stopped and had looped through . . . 1,598 Person Documents.
If my math is good, that is 1.33 Person Documents per minute. And, extrapolating from that data point, it will take, by my estimation, 502 hours (or 21 days) to completely scan both the Domino Directory and the ID Vault.
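For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The 1,598 documents and 1,200 minutes come from the run above. The helper names are made up for illustration. Working backwards, 502 hours at this rate corresponds to roughly 40,000 documents, which would be consistent with counting the Domino Directory and the ID Vault together. That reading is an assumption, not something the application reports.

```python
# Figures from the weekend run described above.
docs_processed = 1_598
minutes_allowed = 1_200  # 20 hours

# Observed throughput of the agent, in Person Documents per minute.
rate_per_minute = docs_processed / minutes_allowed


def projected_hours(total_docs: int) -> float:
    """Hours needed to process total_docs at the observed rate (linear extrapolation)."""
    return total_docs / rate_per_minute / 60


def docs_implied_by(hours: float) -> float:
    """Number of documents the agent would get through in the given hours."""
    return hours * 60 * rate_per_minute


if __name__ == "__main__":
    print(f"rate: {rate_per_minute:.2f} docs/min")
    print(f"22,000 Person Documents alone: {projected_hours(22_000):.0f} hours")
    print(f"502 hours covers about {docs_implied_by(502):,.0f} documents")
```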
That is a lot of time for what I see as a “simple” task.
While it could be asked of me to run the application locally, I have tried that to no avail. In fact, within the documentation of the application, it does not allow for the application to be run locally. And, even if it did, I don’t think that tasking one workstation for 20 days on one task is a good use of an asset.
If you are a developer and you take a look at the code on the website, is there anything you see that can be done more efficiently? Something that will allow this application to complete within 20 hours? Yes, I did read the comments where it was mentioned that this application is slow. However, I didn’t think it would be this slow. I was wrong.
If you know of an application that can compare Person Documents with entries in the ID Vault and return a list of missing Notes IDs, I would love to hear about it.
20 thoughts on “Not My Code: Advice on Speeding Up a Domino Application”
I think the issue is in the code in Listing 3. It has a loop inside a loop which both scan collections/views. There is some halfhearted attempt to terminate the inner loop when a match is found, but it’s borked.
“… but it’s borked.” That’s some really developer-centric wording there, isn’t it? Thanks for the laugh, I appreciated it. I did write to the authors of the database, but I think I have a better shot of becoming Queen of England than hearing from them.
I have a similar app which runs quite fast in large environments. I can send it to you if you give me your mail address, then you can give it a try…
Thanks for the very kind offer, Stephan. I have a similar app, provided to me by another developer, that looks like it will do what the IBM application cannot do (fast comparisons). I will be testing this morning and expect that it will be done within 30 minutes. But if it doesn’t work, I will be contacting you. Thank you.
Very quick and dirty solution: in Listing 3, this line should be outside of the while loop:
set curdb2view2entrycollection2 = curdb2view2.allentries
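To show why moving that one line matters, here is a rough Python sketch rather than the original LotusScript. FakeView is a hypothetical stand-in for a Notes view, not part of any Notes API. It only counts how many times the entry collection gets built.

```python
class FakeView:
    """Hypothetical stand-in for a view; counts how often its entries are fetched."""

    def __init__(self, names):
        self._names = list(names)
        self.builds = 0

    def all_entries(self):
        # Pretend this is the expensive call (like fetching all entries of a view).
        self.builds += 1
        return list(self._names)


def scan_rebuilding(people, view):
    """Fetches the collection inside the loop, once per person."""
    missing = []
    for person in people:
        entries = view.all_entries()  # rebuilt on every pass
        if person not in entries:
            missing.append(person)
    return missing


def scan_hoisted(people, view):
    """Fetches the collection once, before the loop starts."""
    entries = view.all_entries()  # built a single time
    return [person for person in people if person not in entries]
```

Both functions return the same list of missing names. The first builds the collection once per Person Document, the second builds it once in total.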
My first thoughts while examining the code: ‘Welcome to the asylum’
prompted by finding several GoTo to a label on the next line…
please allow for some time, as I will need to recover…
If the view localtempidvaultdocuments is sorted on the fldidowner field, the entire inner loop in Listing 3 can be replaced with a GetEntryByKey, which should be even faster than terminating the inner loop once a match is found. If the view isn’t sorted as required, you could just create one that is.
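The idea of a keyed lookup against a sorted view can be sketched in Python with a sorted list and a binary search. This is an analogy, not the Notes API: in_sorted_view and find_missing are hypothetical names, and the sorted list merely plays the role of a view sorted on the owner field.

```python
from bisect import bisect_left


def in_sorted_view(sorted_owners, key):
    """Binary search for key in a sorted list, instead of walking every entry."""
    i = bisect_left(sorted_owners, key)
    return i < len(sorted_owners) and sorted_owners[i] == key


def find_missing(person_names, vault_owners):
    """Return the person names that have no matching owner in the vault."""
    # Sorting once up front corresponds to having a view sorted on the key column.
    sorted_owners = sorted(vault_owners)
    return [name for name in person_names if not in_sorted_view(sorted_owners, name)]
```

Each lookup then costs on the order of log(n) comparisons rather than a walk through all n entries, which is where the speedup over the inner loop would come from.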
“Shankar Venkatachalam is a Software Engineer at IBM’s India Software Labs. He works on crash, core, and performance issues for Notes/Domino servers and more recently has been working with the IBM SmartCloud Notes L3/Development team. You can reach him at firstname.lastname@example.org.
Ranjit S Rai is a Software Advisory Team Engineer and a member of the APAC Software Advisory Team (APAC SWAT) for Lotus Domino. He has worked with IBM and Lotus products for more than 10 years. You can reach him at email@example.com.”
I. can’t. even…
Let me know if you’d like me to remove case-sensitivity as well, since it really shouldn’t be case-sensitive.
P.S.: I don’t have a vaulted server on which to test this, but it should have minimal impact on any system since it doesn’t do writes to track the vaulted user names.
Sent you a much improved version via your address, which I saw on LinkedIn. Runtime with approx 2000 person records on the server NAB & Vault, and this db local: 15 seconds. It spits out as many result docs as needed (the original would have failed if the single text field had grown too much). Since the original created a lot of temp docs and then deleted them, I decided to forego that step altogether and use lists of custom objects, as Bill Buchan has taught like forever. Would love to hear how it performs in your environment. Ours is puny next to yours.
One caveat: it currently reads in the Active users from the IDVault, and then steps through the People view in the NAB and reports differences.
So it might report misses where the ID is available, only marked inactive. Just say the word, and I’ll whip up a new version.
New version sent, now compares to both active and inactive users in the ID Vault. So you know which IDs are just marked inactive instead of missing…
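For readers who haven’t seen the approach, the in-memory version can be sketched roughly like this in Python. This is not the code that was sent. VaultEntry, build_vault_index and classify are hypothetical names, a dict stands in for a LotusScript list of custom objects, and lower-casing the names is an assumption borrowed from the case-sensitivity remark earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class VaultEntry:
    """One ID Vault record: the owner's name and whether the ID is active."""
    name: str
    active: bool


def build_vault_index(entries):
    """Read the vault once and key every entry by its lower-cased name."""
    return {entry.name.lower(): entry for entry in entries}


def classify(person_names, vault_index):
    """Sort each person into active, inactive, or missing with one lookup apiece."""
    report = {"active": [], "inactive": [], "missing": []}
    for name in person_names:
        entry = vault_index.get(name.lower())
        if entry is None:
            report["missing"].append(name)
        elif entry.active:
            report["active"].append(name)
        else:
            report["inactive"].append(name)
    return report
```

Nothing is written to disk while comparing, so there are no temp docs to create, index, and delete afterwards.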
Shaved some time, now reports are done in 9 seconds.
Sounds like Lars and I made similar modifications to the approach. 🙂
Hi Nathan! Did you build lists too instead of the unholy creating and deleting documents? And the requirement of the db being located on the server is bogus too, just remove the leading / from the vault path and away we go…
What an utterly bad example it is, rife with bad programming, oversights and faults. Yeuch. Niklas must have had a very bad day when he okayed this into OpenNTF. It should be purged, replaced by either of our versions.
So there’s a variety of problems with the original code, but I’d like to apply a little math to them.
The strategy taken by the authors is to 1) walk the contents of the ID vaults and create a local document for each one containing the name on the ID; 2) then walk all the Documents in the People view in the NAB; 3) for each Document in the NAB, walk all the documents created locally to see if there’s a match; 4) if there is a match, note that there’s a match but continue walking the view until you reach the end; 5) if there is no match and you’ve reached the end of the view, append the name to the list and save.
Now, in an environment with 22,000 people in the NAB, let’s say you have 21,500 in the vault. So in order to find the 500 that aren’t vaulted, you first have to create 21,500 local documents, then iterate over those 21,500 documents 22,000 times, for a total of 22,000 document writes and 473,043,500 document reads, along with 500 refreshes of a view containing 21,500 documents.
Opening half-a-billion documents and refreshing a view 500 times? Yeah, not really surprised that this had an unholy runtime.
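Here is that math as a small Python sketch. The breakdown is one reading of how the totals add up, and it assumes everyone in the vault also has a Person Document. original_cost and single_pass_cost are hypothetical helper names. With 22,000 people and 21,500 vaulted, the read count comes to 473,043,500, which is the "half-a-billion" above.

```python
def original_cost(nab_people: int, vaulted: int):
    """Approximate work done by the nested-loop strategy described above."""
    missing = nab_people - vaulted
    # One temp document per vault entry, plus one save per missing name.
    writes = vaulted + missing
    # One walk of the vault, one walk of the NAB, then the full temp view
    # walked again for every single person.
    reads = vaulted + nab_people + nab_people * vaulted
    # The temp view gets refreshed after each save of a missing name.
    refreshes = missing
    return writes, reads, refreshes


def single_pass_cost(nab_people: int, vaulted: int) -> int:
    """Reads needed when the vault names are held in memory: one pass over each."""
    return vaulted + nab_people


if __name__ == "__main__":
    writes, reads, refreshes = original_cost(22_000, 21_500)
    print(f"original: {writes:,} writes, {reads:,} reads, {refreshes:,} refreshes")
    print(f"single pass: {single_pass_cost(22_000, 21_500):,} reads")
```

The nested approach grows with the product of the two counts, while a single pass over each source grows with their sum.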
Grossly irresponsible. It’s really astounding how often experienced and accredited IBMers will deliver embarrassing results. Sometimes I wonder how they make it through the day.
Time for QA at OpenNTF to step in.
Updated the original article in Domino Wiki.
Burning curiosity: how quick is it with 22,000 person docs?
Lars: Started the agent at 0914 this morning. Let it run for 30 minutes and when it didn’t seem to have stopped, broke out of the agent. In that time, the database had 17,258 documents, which all look the same to me. The header of an example document showed:
Found 19678 active IDs, 2 inactive IDs, 19680 IDs with 20336 names
Found 22832 person documents, of which 19469 have Active IDs, 0 have inactive IDs, 3363 have no record in the IDVault
People from the NAB who have an inactive ID in the IDVault:
Looking at several of the documents in the database, it appears that the last Person Document it gathered was one whose last name begins with “Fe.”
I like that it found 3363 documents (I was expecting around 2000), but it didn’t finish all of the documents in 30 minutes. Maybe I am being too impatient?
found and fixed, DominoWiki article updated with version 1.4.
Bonus version: 1.5, only outputs one result doc per run, using rich text fields for output