New applications using VirtualBox released
Author | Message |
---|---|
Send message Joined: 13 Feb 10 Posts: 10 Credit: 45,332 RAC: 0 |
We just released new application versions for Windows, Mac and Linux of the cmsearch VM app that uses VirtualBox for checkpointing. Here is how you can participate:
|
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
RNA World: A huge THANK YOU for implementing checkpointing support via the VirtualBox VM application. I have now re-enabled RNA World on my main machine, and am presently processing 2 of the VM tasks. So far so good. It is a bit curious, though, that both tasks constantly say "98.765% Progress", with the Progress % never updating. I traced it to 2 possible sources: - the slots\slot#\shared\progress.txt file, whose contents are: 0.98765 - the slots\slot#\boinc_task_state.xml file, which has: <fraction_done>0.987650</fraction_done> ... Either way, it looks like the value is hardcoded and not updating. Also, I see "Remaining (estimated)" decreasing with every second of run time, and the total estimated times for my 2 VMs are radically different, presumably because each one has a different "Estimated computation size" value. Here is the status of my 2 VMs: - [1]: 13,724,158 GFLOPs size, 9.5 hours complete, 17.25 hours remaining, 98.765% progress. - [2]: 24,969,548 GFLOPs size, 3.5 hours complete, 31.75 hours remaining, 98.765% progress. So, my questions are: - How accurate are the "Remaining (estimated)" values for each VM? - Can anything be done to show actual Progress % in BOINC, with smooth updates to the % value? Thanks again for your work, I'm happy to be crunching for your project again! --Jacob Klein |
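For anyone who wants to check the same value on their own host, here is a minimal Python sketch that pulls `<fraction_done>` out of the text of a `boinc_task_state.xml` file and tests whether it sits at the 0.98765 figure reported above. The function names and the tolerance are made up for illustration, and a regular expression is used so that nothing is assumed about the rest of the file's layout.

```python
import re
from typing import Optional

# Matches the element quoted in the post, e.g. <fraction_done>0.987650</fraction_done>
FRACTION_RE = re.compile(r"<fraction_done>\s*([0-9.]+)\s*</fraction_done>")

def fraction_done_from_state(xml_text: str) -> Optional[float]:
    """Return the fraction_done value found in the text, or None if absent."""
    match = FRACTION_RE.search(xml_text)
    return float(match.group(1)) if match else None

def looks_capped(fraction: float, cap: float = 0.98765, tol: float = 1e-6) -> bool:
    """True if the fraction sits at the cap value (hypothetical tolerance)."""
    return abs(fraction - cap) < tol

# Sample text copied from the post; a real file contains more than this one element.
sample = "<fraction_done>0.987650</fraction_done>"
value = fraction_done_from_state(sample)
print(value, looks_capped(value))
```

To use it on a real slot, read the file's contents into a string first and pass that string to `fraction_done_from_state`.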
Send message Joined: 13 Feb 10 Posts: 10 Credit: 45,332 RAC: 0 |
"So, my questions are:" The "Remaining (estimated)" values are calculated from the FLOPS estimate made on the server and are not very accurate. The actual Progress % is calculated on each host using the same function as on the server. It should update smoothly and be capped at 98.765% if the estimate was too low. For the current long-running tasks the cap seems to be reached very quickly, which indicates a bug in the calculation logic. This does not affect the actual scientific calculation; it is just a cosmetic issue. I will look into it for the next version of the application (for which there is no ETA). |
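The capping described in this reply can be pictured with a small Python sketch. This is only an assumption about the general shape of such a calculation, not the project's actual function: `capped_progress`, `work_done` and `estimated_total` are hypothetical names, and only the cap value comes from the thread.

```python
CAP = 0.98765  # cap value quoted in the thread

def capped_progress(work_done: float, estimated_total: float, cap: float = CAP) -> float:
    """Fraction of the estimated work completed, never reported above the cap."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return min(work_done / estimated_total, cap)

# If the estimate is far too low, the ratio passes the cap early and the
# reported value then stays at the cap for the rest of the run.
print(capped_progress(50.0, 100.0))
print(capped_progress(200.0, 100.0))
```

Under this reading, a task whose estimate is much smaller than its real workload would show the capped value for most of its run time, which matches the symptom reported above.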
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
Okay, thank you. For your reference, my tasks still count down every second in "Remaining (estimated)", but I think that value gets reset to the full value every once in a while. For instance, I had posted that my VM statuses were: - [1]: 13,724,158 GFLOPs size, 9.5 hours complete, 17.25 hours remaining, 98.765% progress. - [2]: 24,969,548 GFLOPs size, 3.5 hours complete, 31.75 hours remaining, 98.765% progress. Right now (7.5 hours later), my VM statuses are: - [1]: 13,724,158 GFLOPs size, 17 hours complete, 17.25 hours remaining, 98.765% progress. - [2]: 24,969,548 GFLOPs size, 10.5 hours complete, 31.75 hours remaining, 98.765% progress. So, I believe this means that I currently have no means of knowing the real progress at all. :( For all I know, the VMs might be stuck in some sort of infinite loop (are they? What's a realistic completion time for these?) I hope you are able to significantly improve the indication of progress level in the next version. Thanks, Jacob |
Send message Joined: 13 Feb 10 Posts: 10 Credit: 45,332 RAC: 0 |
"So, I believe this means that I currently have no means of knowing the real progress at all. :( For all I know, the VMs might be stuck in some sort of infinite loop (are they? What's a realistic completion time for these?)" That's right, nobody can know the real progress because the cmsearch algorithm is non-deterministic. I tried to design the control script so that an infinite loop is not possible. What I may do in a future version is let the progress "flicker" a little bit: switching between 98.0% and 99.0% could indicate that the VM is still alive. |
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
I downloaded a couple more RNA World VMs, and noticed that I get considerable Windows UI contention (where paints/refreshes are lagged/delayed) the more RNA World VMs I run at the same time. I reported the issue to Rom via the BOINC Alpha list. Anyway, I understand T4T limits tasks to 1 VM per host. I was wondering if you guys might implement an RNA World project setting that, if set by the user, would limit the number of VMs to x per host. I'd prefer to make sure I don't get more than 2 of these non-deterministic VMs at a time. Regards, Jacob |
Send message Joined: 2 Oct 13 Posts: 2 Credit: 31,326 RAC: 0 |
This issue has already been discussed in the German part of the RNA World forum. ChristianB told us that he can only set one fixed value on the server side for all users. You can use an app_config.xml file to limit the number of concurrent workunits (see the Berkeley link here: http://boinc.berkeley.edu/trac/wiki/ClientAppConfig). Just create this file with your favourite text editor and drop it into the project data directory, which would be \ProgramData\BOINC\projects\www.rnaworld.de_rnaworld\ for a typical Windows PC. Content: <app_config> <app> <name>cmsearch3</name> <max_concurrent>1</max_concurrent> </app> </app_config> Notes:
|
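Since the XML in the post above lost its line breaks, here is a small Python sketch that rebuilds the same `app_config.xml` content with indentation. The element names and the `cmsearch3` / `1` values come from the post; the helper name `build_app_config` is made up for illustration, and `ET.indent` requires Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting one app to max_concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="    ")  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")

# Values taken from the post: app name cmsearch3, at most 1 concurrent task.
print(build_app_config("cmsearch3", 1))
```

The printed text can be saved as app_config.xml in the project data directory mentioned in the post; raising the second argument allows more concurrent tasks.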
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
Yup, I've implemented the app_config.xml file, thanks. However, I worked with Rom Walton and David Anderson, and we think we have discovered the cause of my problem and a fix for it. They have supposedly fixed it, but may or may not include the fix in the upcoming public release. The issue is that, when calculating how much memory each RNA World VM is using, BOINC used Windows measurements and came up with ~250 MB, even though the VM "Base Memory" size was 4 GB. The docs (link below) indicate that the 4 GB has to be available as free RAM and gets fully reserved for the client VM, per the "Base Memory" documentation in link 2 below. What made matters worse was that Task Manager and Process Explorer both do not show how much host memory a VM has consumed, as evidenced in link 3 below. I happened to have multiple RNA World VMs running on my 12 GB machine. When I brought in the 4th VM, the Windows UI started becoming erratically non-responsive. Most likely, Windows was paging memory in and out to give the VMs all 12 GB of my memory. Task Manager and Process Explorer were not reporting it as being used; however, the UI responsiveness felt terrible at times. See: http://superuser.com/questions/66842/how-does-virtualboxs-memory-usage-work http://www.virtualbox.org/manual/ch03.html#settings-system ("Base Memory" section) http://forum.sysinternals.com/pe-is-not-showing-all-memory-used-by-virtualbox_topic23886.html So, David Anderson checked in some code that changes how "running memory" is calculated for a VM task: instead of querying Windows for the memory usage of the processes, BOINC will now use the <rsc_memory_bound> value as "how much is this VM task using". That way, if I have BOINC set up to use no more than 60% of my 12 GB (which is 7.2 GB), BOINC will not allow a 2nd RNA World VM to launch (since each one will now "constantly consume 4 GB" according to how BOINC calculates running memory for those VM tasks). I hope that makes sense. 
I think we solved that client BOINC bug, but I'm not sure if it will be included in the upcoming public release or not. Thanks, Jacob |
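The arithmetic in this post can be checked with a short Python sketch. It is a deliberate simplification, not BOINC's real scheduling logic: `max_vm_tasks` is a hypothetical helper that only divides the memory budget by a per-task figure, using the 12 GB, 60%, 4 GB and ~250 MB numbers from the post.

```python
def max_vm_tasks(total_ram_mb: int, limit_percent: int, per_task_mb: int) -> int:
    """How many tasks fit in the memory budget if each is charged per_task_mb."""
    budget_mb = total_ram_mb * limit_percent // 100
    return budget_mb // per_task_mb

TOTAL_RAM_MB = 12 * 1024  # the 12 GB machine from the post
LIMIT_PERCENT = 60        # "use no more than 60%"

# Charging each VM its 4 GB bound (the behaviour after the fix).
after_fix = max_vm_tasks(TOTAL_RAM_MB, LIMIT_PERCENT, 4 * 1024)

# Charging each VM the ~250 MB that Windows reported (the behaviour before the fix).
before_fix = max_vm_tasks(TOTAL_RAM_MB, LIMIT_PERCENT, 250)

print(after_fix, before_fix)
```

With the 4 GB figure only one VM fits in the 60% budget, which agrees with the statement that a 2nd VM will not be launched; with the ~250 MB figure, memory alone would never have stopped the 4th VM from starting.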
Send message Joined: 25 May 09 Posts: 155 Credit: 4,855,406 RAC: 0 |
"I think we solved that client BOINC bug, but I'm not sure if it will be included in the upcoming public release or not." For our project, that fix is essential. Please try to make this clear to David, and thank you for your comments. Michael. Rechenkraft.net e.V. - Association for the promotion of education, research and science through the use of networked computers. |
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
I passed on your message, and will try to reply back here when I know more. There are actually 2 workarounds for this problem: 1: Use an app_config.xml file and set max_concurrent to a reasonable value. However, this can lead to a work fetch issue that results in an idle CPU. 2: If you only want to run x RNA World VMs, allow work fetch to get x+1 tasks (perhaps by temporarily suspending other projects), and then suspend all but x (and resume all the projects). When an RNA World VM task completes, repeat. Sure, option 2 is micromanaging a bit, but I'll be using it for the time being, as a workaround that also ensures my CPUs are all kept busy. Regards, Jacob |
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
David and Rom have decided to include the fix, and we are doing additional testing on it. So far, my testing indicates that 7.2.24+ does indeed use the "4 GB" value when considering memory, and thus won't overcommit beyond the user's specified memory limit. We haven't publicly released yet, but 7.2.24+ fixes the problem. And so now, I don't have to use either of the 2 workarounds. Regards, Jacob |
Send message Joined: 9 Aug 10 Posts: 17 Credit: 7,442,398 RAC: 0 |
BOINC v7.2.28 has been released to the public today, and is now the recommended version. I'm quite excited about it! Note: It DID include the VM-task "memory accounting" fix, where BOINC now uses <rsc_memory_bound> when determining how much memory a VirtualBox VM task is consuming. So, it considers 4GB of memory allocated, for each RNA World VM task. Oh, and the task I was working on? It's still processing, 182 hours [~1 week] done, with an 8-week estimate on a comparable system :) Still crossing my fingers for no errors! Good luck! Jacob |
Send message Joined: 2 Oct 13 Posts: 2 Credit: 31,326 RAC: 0 |
Thanks for the info. I had been using 7.2.24 since the end of October on one PC, and it showed some odd behaviour when tasks were in high priority: it did not use all of the available threads. With RNA World you can easily get into high priority when workunits are extended on the server side but the client BM still has the 'old' due date. Updated today to 7.2.28, let's see how it works. :) Btw, I just noticed there are new VM apps available since yesterday (I guess they include the new wrapper?). However, apparently Christian has not switched workunit creation back on yet. I currently have 50 days of unfinished VM tasks, plus one finished and credited. :) Good luck everybody! |