1)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1932)
Posted 23 Jan 2023 by marmot Post: "SiDock has a beta test server that one of my machines is still waiting on tasks from." Trying to clarify: so the WUs that ran for 3-4 days, and may have hung, were sent intentionally, and the application bug you were trying to uncover was found through the results returned by the BOINC community running SiDock last week? And the beta server is down, so you now beta test only in-house and then send new apps straight out?
2)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1931)
Posted 23 Jan 2023 by marmot Post: "After a special 'restart test' under Ubuntu, I did the same test for Windows 10 + BOINC 7.16.11. About 1 hour before the tasks completed, I restarted a Windows VM. The first task, for workunit 49655115, is complete, and as you can see, CPU time was not lost." Given these results, could you please comment on my post about checkpointing and the WUs that still refuse to checkpoint here: https://www.sidock.si/sidock/forum_thread.php?id=231
3)
Message boards :
Number crunching :
checkpointing
(Message 1930)
Posted 23 Jan 2023 by marmot Post: Most of the newer WUs are checkpointing within 10 minutes. Some still checkpoint once when first loaded into RAM and then never again. Workunit 49653004, for example, has been running for 2d 19h on my 2700X, with 3d 17h reported to go and only one checkpoint, at the first second. (I'm going to abort it, since restarting the client could lose credit on several more WUs.) The WU reports this as its first send to a BOINCer, not a resend. So are these from the data set prior to https://www.sidock.si/sidock/forum_thread.php?id=225&postid=1913#1913, where some app routines were dropped? Will these WUs that refuse to checkpoint be exhausted or deprecated soon? EDIT: (Looking through the WUs by time received, the answer may emerge. All the ones sent to a Xeon server at 22 Jan 2023, 6:25:11 UTC refuse to checkpoint. Ones sent later to my Intel laptops and AMD desktops are almost all checkpointing.)
4)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1929)
Posted 23 Jan 2023 by marmot Post: Folks, it's some improvement. If the tasks report checkpointing every 10 minutes or less, those are the 'good' WUs. The ones that only create a checkpoint at the first second, and never again, are risky and will lose you credit if you restart to get them moving again. I'm not sure how many more of these non-checkpointing WUs are to come (I'm posting a question about it in the Number crunching forum). You could abort every WU that fails to checkpoint within 11 minutes of starting. Otherwise they still have a high chance of completing, as long as you do not pause the client (my electric company has peak hours at 9x pricing that I avoid). I'm going to abort 1 of the roughly 40 received on my 2700X in the last day.
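The 11-minute rule of thumb above can be written as a small check. This is a minimal Python sketch, not a SiDock or BOINC tool: the function name `looks_non_checkpointing` and both thresholds are assumptions taken from the post, and the two inputs are the "CPU time" and "CPU time at last checkpoint" values shown in a task's properties in BOINC Manager.

```python
# Hypothetical helper (not part of BOINC or SiDock): flags a task whose only
# checkpoint is the one written in its first second, once it has run past a
# grace period. Both thresholds are assumptions based on the post's rule of thumb.
GRACE_SECONDS = 11 * 60        # newer WUs were observed to checkpoint within ~10 minutes
FIRST_CHECKPOINT_MAX = 1.0     # "a checkpoint at the 1st second"


def looks_non_checkpointing(cpu_time: float, checkpoint_cpu_time: float) -> bool:
    """True if the task has used at least GRACE_SECONDS of CPU time but its
    last checkpoint is still at (or before) the first second."""
    if cpu_time < GRACE_SECONDS:
        return False  # too early to judge
    return checkpoint_cpu_time <= FIRST_CHECKPOINT_MAX


# 2d 19h of CPU time with the only checkpoint at 1 s: abort candidate.
print(looks_non_checkpointing(2 * 86400 + 19 * 3600, 1.0))
# 20 minutes of CPU time, last checkpoint at 12 minutes: checkpointing normally.
print(looks_non_checkpointing(20 * 60, 12 * 60))
```

This only catches the "checkpoint once, never again" pattern; a task that checkpoints normally and stalls later needs a different test, such as watching the gap between the two values grow.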
5)
Message boards :
Number crunching :
how long can "long tasks" be?
(Message 1919)
Posted 21 Jan 2023 by marmot Post:
I reset the BOINC.exe controlling the last WU above, and it completed in minutes. Here is the result:
Task 77561932 (computer 44887): sent 15 Jan 2023, 18:36:22 UTC; reported 21 Jan 2023, 1:48:59 UTC; Completed and validated; run time 309,214.12 s, CPU time 298,826.10 s, credit 1,129.15; CurieMarieDock 0.2.0 long tasks v2.00 windows_x86_64
At least it's not 10 credit. It's not what you'd expect for 309k seconds, but it's over 1k. Thanks for the strategy above of checking run time and checkpoint time in the task properties. I'm going to restart all the remaining BOINC clients with WUs at 2d+ of CPU time and see what happens.
6)
Message boards :
Number crunching :
how long can "long tasks" be?
(Message 1915)
Posted 21 Jan 2023 by marmot Post: "But if you see tasks with estimates like 'days', you can check their properties in BOINC Manager: 'CPU time at last checkpoint' and 'CPU time'. If the values differ greatly (by 1 hour, for example), the task is hung and needs a restart." One of my tasks is at 4 days 5 hours of run time, 99.4% complete, and stalled. If I follow your direction, then according to Mad_Max it will get credit for the last 30 minutes. Actually, all of the newer WUs (Feb 3rd deadline) that are progressing show a "CPU time at last checkpoint" less than 30 minutes behind their CPU time. The WU above, at 99.4% complete, shows 1d 5h since its last checkpoint. So it seems that if the last checkpoint was more than 60 minutes ago, the task is hung. Or is that only because these new WUs checkpoint and the old ones didn't, so the run time should equal the checkpoint time?
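The quoted advice (compare "CPU time at last checkpoint" with "CPU time" and treat a gap of about an hour as hung) could also be checked from a script instead of clicking through BOINC Manager. The sketch below parses text in the shape printed by `boinccmd --get_tasks`; the exact field labels (`name:`, `CPU time at last checkpoint:`, `current CPU time:`) and the `1) -----------` task separators are assumptions about the client's output, so compare them against your own client first. The one-hour threshold comes from the quoted post, `find_stalled` is a hypothetical name, and the sample values are illustrative, not real task output.

```python
import re

# Assumed field labels and separators, based on the text layout of
# `boinccmd --get_tasks`; verify against your own client's output.
TASK_SEPARATOR = re.compile(r"^\s*\d+\) -+\s*$", re.MULTILINE)
NAME_RE = re.compile(r"^\s*name:\s*(\S+)", re.MULTILINE)
CHECKPOINT_RE = re.compile(r"CPU time at last checkpoint:\s*([\d.]+)")
CURRENT_RE = re.compile(r"current CPU time:\s*([\d.]+)")

HUNG_GAP_SECONDS = 3600  # "for 1 hour, for example" in the quoted advice


def find_stalled(get_tasks_output: str) -> list:
    """Return (task name, gap in seconds) for every task whose current CPU
    time is more than HUNG_GAP_SECONDS past its last checkpoint."""
    stalled = []
    for block in TASK_SEPARATOR.split(get_tasks_output):
        name = NAME_RE.search(block)
        checkpoint = CHECKPOINT_RE.search(block)
        current = CURRENT_RE.search(block)
        if not (name and checkpoint and current):
            continue  # header text, or a task without these fields
        gap = float(current.group(1)) - float(checkpoint.group(1))
        if gap > HUNG_GAP_SECONDS:
            stalled.append((name.group(1), gap))
    return stalled


# Illustrative input in the assumed format (made-up values).
SAMPLE = """
======== Tasks ========
1) -----------
   name: example_task_a
   WU name: example_wu_a
   CPU time at last checkpoint: 1.000000
   current CPU time: 105300.000000
2) -----------
   name: example_task_b
   WU name: example_wu_b
   CPU time at last checkpoint: 1200.000000
   current CPU time: 1500.000000
"""

print(find_stalled(SAMPLE))  # [('example_task_a', 105299.0)]
```

Against a live client, the text could be captured with `subprocess.run(["boinccmd", "--get_tasks"], capture_output=True, text=True).stdout` and passed to `find_stalled`.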
7)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1914)
Posted 21 Jan 2023 by marmot Post: "I suggest you move on to a different project (like I and others) or simply stop until the issues are resolved. It's not worth getting bent out of shape over." I still have WUs in progress that I'm spending an hour or more per day babysitting, and I want to keep working on this project. They need to hear how much time we are spending managing their project's WUs. We're not directly being paid for this work, but they are. COVID research is highly paid research, and the vaccine companies have made record profits over the last 2 years. I'm already watching top contributors leave this project, and I will shortly follow if there is no improvement. Greger, our team's top contributor at 230k RAC, has pulled out. To put this in perspective: SiDock has a beta test server that one of my machines is still waiting on tasks from. They have beta test capability, with volunteers willing to accept the risks of beta WUs, and it could have been used to prevent what happened last week.
8)
Message boards :
Number crunching :
how long can "long tasks" be?
(Message 1903)
Posted 20 Jan 2023 by marmot Post: "but slower PCs estimate up to 9 days run time." One of my machines had 20 WUs with over 7 days left, and the time left kept growing by 5 seconds every second. Pausing/unpausing didn't help, so I gave up on them as they appeared to be hung. Anyway, you might as well download fresh WUs, as they will have more realistic deadlines, and the credit gained from a 9-day task won't be worth it.
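The symptom described here (time left growing about 5 seconds every second) can be quantified from two readings of a task's remaining-time estimate taken a known interval apart. A minimal Python sketch; the readings are assumed to be copied by hand from BOINC Manager, the function names are hypothetical, and a positive drift is a warning sign rather than a diagnosis.

```python
def remaining_drift(remaining_before: float, remaining_after: float,
                    wall_seconds: float) -> float:
    """Change in the remaining-time estimate per wall-clock second.
    Roughly -1.0 means the estimate is shrinking in step with the clock;
    a positive value means the estimate is growing."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return (remaining_after - remaining_before) / wall_seconds


def looks_hung(drift: float) -> bool:
    # Assumption: any growth of the estimate over the window is treated as a stall.
    return drift > 0


# Two readings one minute apart on a task showing 7 days left,
# growing by 5 seconds every second as described in the post.
d = remaining_drift(7 * 86400, 7 * 86400 + 300, 60)
print(d, looks_hung(d))  # 5.0 True
```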
9)
Message boards :
Number crunching :
how long can "long tasks" be?
(Message 1902)
Posted 20 Jan 2023 by marmot Post: "My KabyLake 14nm Laptop shows 3d 4 hours left to complete which I'm not sure it can make by Jan 21." That laptop was running an Einstein Intel GPU task, which lowered the effective CPU frequency from 2400 MHz to 1300 MHz (a far more severe drain than I realized). I stopped all new Einstein work, and all but 2 tasks are going to make the deadline. I wish I'd known about the ability to ask for deadline extensions: "would be very kind, if you can also extend the deadline for this 2 WUs". I could use another 2 days to complete the rest of these unaborted tasks on that one machine. I'm user 279. Thank you.
10)
Message boards :
Number crunching :
Tasks hanging -
(Message 1901)
Posted 20 Jan 2023 by marmot Post: Task 77522270 (WU 49580149): sent 15 Jan 2023, 3:32:33 UTC; reported 17 Jan 2023, 22:39:26 UTC; Completed and validated; run time 241,613.00 s, CPU time 3,915,919.00 s, credit 1,647.69; CurieMarieDock 0.2.0 long tasks v2.00. "Mistakes can occur at different stages: on the machine, in sending, or in processing on the server. The known anomalies usually relate to Windows hosts. Perhaps antivirus, Defender, etc. has some influence." This machine is dedicated to BOINC 20/7, and any service or third-party app that could drain resources is disabled: no antivirus, no scheduled tasks, no Workstation service, no local DNS server, only basic IPv4 networking. The third-party task scheduler was only added yesterday and couldn't be the cause. I am very careful to examine all new projects' WUs and have never seen this kind of run time reporting. If the machine's local clock had been off by 5 minutes on a WU that ran only 5 minutes, maybe a negative time would be misreported as 3,915,919.00 seconds, but the local clock isn't off by 235k seconds...
11)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1900)
Posted 20 Jan 2023 by marmot Post: You could make managing these huge jobs easier if you multithreaded the problem and set run limits. Look at the Amicable Numbers user settings: we get to choose the number of threads and the run time length. I've spent about 12 hours over the last 3 days babysitting these WUs, and I foresee more management hours to come.
12)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1899)
Posted 20 Jan 2023 by marmot Post: This has been a bad SiDock day for me. 31 tasks completed successfully across 9 computers. On the one server running dual BOINC installs (10 hours morning / 10 hours night, to avoid peak electric rate hours), 27 of the 32 WUs on the daytime installation have an estimated time remaining that grows 3 seconds every second. I paused them, but they are set NOT to leave RAM, because leaving RAM loses credit. They already have 35+ hours of runtime and are looking at another 50+ hours (some said 5-9 days left), so they can't beat their deadline and I aborted them. The nighttime BOINC install appears to be working, but another 5 WUs appear unable to meet the deadline. *Why did 29 of 32 SiDock WUs stop advancing on a machine that pauses 2x a day for 2 hours each period?* I am switching to a single BOINC install, but it will still need to pause 2x a day with a cron job: boinccmd --set_run_mode never 7320. So I need assurances that these WUs won't keep stalling out because they were paused. I'm also still getting a few ending in error after very long runs. So today I'm at 31 successes and 23 failures. A 42% failure rate is abominable!
(All rows: CurieMarieDock 0.2.0 long tasks v2.00, windows_x86_64; credit: ---)
Task | Work unit | Computer | Sent (UTC) | Reported (UTC) | Status | Run time (s) | CPU time (s)
77279883 | 49343723 | 44965 | 11 Jan 2023, 23:44:08 | 18 Jan 2023, 16:16:43 | Aborted | 493391.9 | 486944.8
77337873 | 49399835 | 44888 | 12 Jan 2023, 19:26:44 | 16 Jan 2023, 17:47:08 | Error while computing | 13412.96 | 13299.98
77337845 | 49399809 | 44888 | 12 Jan 2023, 19:26:44 | 16 Jan 2023, 17:16:13 | Error while computing | 11594.1 | 11437.84
77398977 | 49459298 | 44888 | 13 Jan 2023, 15:36:16 | 16 Jan 2023, 16:29:53 | Error while computing | 8851.1 | 8775.33
77398985 | 49459314 | 44888 | 13 Jan 2023, 15:36:16 | 16 Jan 2023, 16:53:02 | Error while computing | 10180.91 | 10069.2
77399065 | 49459325 | 44888 | 13 Jan 2023, 15:38:10 | 16 Jan 2023, 17:00:55 | Error while computing | 10682.11 | 10552.55
77440386 | 49499316 | 44898 | 14 Jan 2023, 3:44:08 | 19 Jan 2023, 2:10:12 | Aborted | 180865.64 | 161821.8
77440796 | 49499732 | 44900 | 14 Jan 2023, 3:50:29 | 15 Jan 2023, 12:52:36 | Error while computing | 238.91 | 209.97
77446164 | 49505088 | 44965 | 14 Jan 2023, 5:27:09 | 18 Jan 2023, 16:16:43 | Aborted | 331879.66 | 327318.9
77557978 | 49611032 | 44887 | 15 Jan 2023, 16:19:56 | 20 Jan 2023, 13:30:06 | Aborted | 278329.64 | 268281.6
77558574 | 49611655 | 44886 | 15 Jan 2023, 16:39:03 | 17 Jan 2023, 14:04:34 | Error while computing | 91007.37 | 80447.98
77561448 | 49614522 | 44886 | 15 Jan 2023, 18:16:18 | 18 Jan 2023, 2:09:25 | Error while computing | 95553.32 | 85968.58
77561513 | 49614582 | 44886 | 15 Jan 2023, 18:17:09 | 20 Jan 2023, 2:23:25 | Aborted | 140384.21 | 122982.7
77575916 | 49628894 | 44888 | 16 Jan 2023, 14:00:54 | 20 Jan 2023, 14:54:31 | Aborted | 160172.45 | 70500.13
77575917 | 49628899 | 44888 | 16 Jan 2023, 14:00:54 | 20 Jan 2023, 14:54:31 | Aborted | 164651.93 | 74874.86
77575927 | 49628909 | 44888 | 16 Jan 2023, 14:00:55 | 17 Jan 2023, 18:37:29 | Error while computing | 4871.7 | 4784.16
77575864 | 49628849 | 44888 | 16 Jan 2023, 14:01:56 | 20 Jan 2023, 14:46:51 | Aborted | 140432.11 | 35749.41
77576808 | 49629786 | 44888 | 16 Jan 2023, 16:29:53 | 20 Jan 2023, 14:54:31 | Aborted | 142096.72 | 37286.89
77576812 | 49629790 | 44888 | 16 Jan 2023, 16:29:53 | 20 Jan 2023, 14:54:31 | Aborted | 152900.31 | 73801.61
77576971 | 49629949 | 44888 | 16 Jan 2023, 16:53:02 | 20 Jan 2023, 14:54:31 | Aborted | 151144.96 | 70712.69
77576972 | 49629952 | 44888 | 16 Jan 2023, 16:53:02 | 20 Jan 2023, 14:46:51 | Aborted | 152046.29 | 71703.56
77576991 | 49629969 | 44888 | 16 Jan 2023, 17:00:55 | 20 Jan 2023, 14:54:31 | Aborted | 148436.82 | 68058.72
77576993 | 49629971 | 44888 | 16 Jan 2023, 17:00:55 | 20 Jan 2023, 14:46:51 | Aborted | 149894.3 | 45036
77577026 | 49630004 | 44888 | 16 Jan 2023, 17:16:13 | 20 Jan 2023, 14:54:31 | Aborted | 146550.13 | 66754.64
77577052 | 49630030 | 44888 | 16 Jan 2023, 17:16:13 | 20 Jan 2023, 14:54:31 | Aborted | 144242.85 | 65149.5
77577165 | 49630145 | 44888 | 16 Jan 2023, 17:39:47 | 20 Jan 2023, 14:54:31 | Aborted | 146132.22 | 65806.09
77577227 | 49630209 | 44888 | 16 Jan 2023, 17:47:08 | 20 Jan 2023, 14:54:31 | Aborted | 145234.13 | 64319.44
77577289 | 49630271 | 44888 | 16 Jan 2023, 17:55:21 | 20 Jan 2023, 14:55:11 | Aborted | 144869.21 | 64534.8
77577296 | 49630270 | 44888 | 16 Jan 2023, 17:55:21 | 20 Jan 2023, 14:55:11 | Aborted | 144408.69 | 63322.3
77577308 | 49630289 | 44888 | 16 Jan 2023, 17:58:26 | 20 Jan 2023, 14:54:31 | Aborted | 144100.1 | 62643.67
77577263 | 49630238 | 44888 | 16 Jan 2023, 17:59:20 | 20 Jan 2023, 14:54:31 | Aborted | 144041.11 | 63361.22
77577321 | 49630296 | 44888 | 16 Jan 2023, 18:00:14 | 20 Jan 2023, 14:54:31 | Aborted | 143210.83 | 63137.41
77585195 | 49638144 | 13277 | 17 Jan 2023, 17:47:15 | 19 Jan 2023, 2:09:39 | Aborted | 10668.42 | 10534.66
77585527 | 49638492 | 44888 | 17 Jan 2023, 18:37:29 | 20 Jan 2023, 14:54:31 | Aborted | 142162.91 | 52501.22
77580116 | 49633069 | 44893 | 17 Jan 2023, 3:07:02 | 20 Jan 2023, 14:29:05 | Aborted | 192815.89 | 179682.8
77580105 | 49633104 | 44895 | 17 Jan 2023, 3:09:24 | 20 Jan 2023, 14:29:18 | Aborted | 185638.03 | 180320.3
77581948 | 49634901 | 13277 | 17 Jan 2023, 8:53:04 | 19 Jan 2023, 2:09:39 | Aborted | 28511.6 | 26879.02
77588447 | 49641410 | 44886 | 18 Jan 2023, 4:06:39 | 20 Jan 2023, 4:28:35 | Error while computing | 128506.93 | 112809.9
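One pattern stands out in these results: many of the aborted tasks on computer 44888 accumulated far less CPU time than run time (task 77575864, for example, shows 35749.41 s of CPU time against 140432.11 s of run time). A minimal Python sketch of that comparison, using four rows copied from the results above; the 0.9 cutoff and the function names are assumptions, and a low ratio only hints that a single-threaded task spent wall-clock time without computing (stalled or starved), it does not prove the cause.

```python
# (task ID, run time s, CPU time s) copied from the posted results
ROWS = [
    (77279883, 493391.9, 486944.8),
    (77575864, 140432.11, 35749.41),
    (77577052, 144242.85, 65149.5),
    (77337873, 13412.96, 13299.98),
]

CUTOFF = 0.9  # assumed: a busy single-threaded task should sit near 1.0


def cpu_efficiency(run_time: float, cpu_time: float) -> float:
    """CPU seconds used per second of run time."""
    return cpu_time / run_time if run_time > 0 else 0.0


def low_efficiency(rows, cutoff=CUTOFF):
    """Return (task ID, ratio rounded to 2 places) for rows below the cutoff."""
    return [(task, round(cpu_efficiency(run, cpu), 2))
            for task, run, cpu in rows
            if cpu_efficiency(run, cpu) < cutoff]


for task, ratio in low_efficiency(ROWS):
    print(task, ratio)  # 77575864 0.25, then 77577052 0.45
```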
13)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1873)
Posted 18 Jan 2023 by marmot Post: Look at this. This machine ran the hottest-running LLR SRBase WUs for weeks without a single failure:
Task 77561448 (WU 49614522, computer 44886): sent 15 Jan 2023, 18:16:18 UTC; reported 18 Jan 2023, 2:09:25 UTC; Error while computing; run time 95,553.32 s, CPU time 85,968.58 s, credit ---; CurieMarieDock 0.2.0 long tasks v2.00 windows_x86_64
Task 77558574 (WU 49611655, computer 44886): sent 15 Jan 2023, 16:39:03 UTC; reported 17 Jan 2023, 14:04:34 UTC; Error while computing; run time 91,007.37 s, CPU time 80,447.98 s, credit ---; CurieMarieDock 0.2.0 long tasks v2.00 windows_x86_64
That's an entire wasted day for each of those cores. Some of these WUs are going to run 2-3 days and end in errors without any hardware cause? This is not acceptable. And the results are being purged far too quickly, so we cannot evaluate the run results and find the issues. 6 tasks all report less than 5% complete after running for 12-15 hours, and somehow they are going to complete in under 3 days (according to BOINC, which uses past WU run time data)? 15 hours for 5% extrapolates to 12.5 days in total (about 11.9 more), and the one at 1.67% after 13 hours would need roughly 32-33 days in total. And the failure rate was 19% per day on my 5 machines. Maybe this is the result of the percentages being inaccurate after a BOINC restart, as Mad Max found; but Mad Max reported that the displayed percentages corrected themselves after checkpointing. I'm concerned.
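The extrapolation in this post is a straight-line projection: total time ≈ elapsed time / fraction done. A small Python sketch, assuming progress advances at a constant rate (which, as the post notes, may not hold right after a restart); `projected_hours` is a hypothetical name.

```python
def projected_hours(elapsed_hours: float, fraction_done: float) -> tuple:
    """Return (total hours, remaining hours), assuming linear progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


total, remaining = projected_hours(15, 0.05)   # 15 hours at 5%
print(f"total {total / 24:.1f} d, remaining {remaining / 24:.1f} d")

total2, _ = projected_hours(13, 0.0167)        # 13 hours at 1.67%
print(f"total {total2 / 24:.1f} d")
```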
14)
Message boards :
Number crunching :
Tasks hanging -
(Message 1871)
Posted 18 Jan 2023 by marmot Post:
"Did you see any results reporting 3,915,919 seconds?" (Oh no! All my valid results have been purged! There was another one, and I was trying to check whether it was an identical 3,915,919. It was over 3.9 million seconds.) So, I have to go and edit all my machines' BOINC settings to retain apps in RAM.... :sigh:
Agreed, and I made that point several times in several messages.
15)
Message boards :
News :
СmDock "long" and "short" tasks applications
(Message 1868)
Posted 18 Jan 2023 by marmot Post: "This new app loses CPU/elapsed time stats if restarted (full restarts, without leaving in memory), and so loses points/credits as well." So we are in a catch-22. Hoar Frost says we need to restart the stuck tasks to get them to work, but if we do, we lose all the credit earned so far. I have to pause my BOINC from 6-8am and 6-8pm every weekday because the electric company charges 9x normal rates during those periods, so the SiDock WUs have been removed from RAM 2x per day. I've had 15 WUs fail and 7 not validate out of 113 total: a 19.5% failure rate. These WUs are not ready for prime time... Too many unresolved issues. Use the SiDock test server for this, and let the issues get worked out by people who know it's a beta test.
16)
Message boards :
Number crunching :
how long can "long tasks" be?
(Message 1867)
Posted 18 Jan 2023 by marmot Post: All of my tasks are this long, which Bryn Mawr said was the intent. My KabyLake 14nm laptop shows 3d 4 hours left to complete, which I'm not sure it can make by Jan 21. The electric company introduced 4 hours of high-cost peak periods per day, during which I have to pause the machines. BOINC does not support 2 pauses per day; only through a task scheduler can boinccmd be used to pause BOINC twice. The laptop is low power, so I pause it for only one period, but the servers got a dual BOINC install, so half the SiDock long tasks run for 10 hours in daylight and the other half for 10 hours at night. I'm not sure any of those can complete by the 21st running only 10 hours a day. They are server Xeons, but 4th gen and older. They easily completed even the longest SRBase tasks last month on the 10-hour/10-hour dual plan. SRBase provided a multithread option we could set up in an app_config.xml to ensure we'd meet the deadlines. Is multithreading planned here?
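Whether a task can beat its deadline when the client only computes part of each day comes down to one division: wall-clock days needed ≈ remaining compute hours / compute hours per day. A minimal Python sketch, assuming BOINC's remaining-time estimate is accurate and the machine's speed stays constant; `days_needed` is a hypothetical name. The 76-hour figure is the laptop's "3d 4 hours" estimate; applying it to a 10-hour-per-day install is purely illustrative.

```python
def days_needed(remaining_hours: float, compute_hours_per_day: float) -> float:
    """Wall-clock days to finish if the client computes only part of each day."""
    if compute_hours_per_day <= 0:
        raise ValueError("compute_hours_per_day must be positive")
    return remaining_hours / compute_hours_per_day


remaining = 3 * 24 + 4  # "3d 4 hours left" = 76 compute hours
# Laptop paused for one 2-hour peak window per day: 22 compute hours/day.
print(f"{days_needed(remaining, 24 - 2):.2f} days")  # about 3.45
# Illustrative only: the same estimate on one half of a 10 h/10 h dual install.
print(f"{days_needed(remaining, 10):.2f} days")
```

With the post dated 18 Jan and a 21 Jan deadline, roughly 3 days remained, so even the laptop case falls short under these assumptions.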
17)
Message boards :
Number crunching :
can this result be correct?
(Message 1865)
Posted 18 Jan 2023 by marmot Post: There were 2 results with that impossible reported time of over 45 days, returned within 1 day on a single thread. Also, 15 ended in error states yesterday and 7 couldn't validate. Given that 91 completed, that's a 19.5% failure rate.
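The 19.5% figure counts errors and validation failures against every task returned (15 + 7 + 91 = 113). A trivial Python sketch of that arithmetic; the function name is hypothetical, and "completed" here means the tasks that completed and validated.

```python
def failure_rate(errors: int, invalid: int, completed: int) -> float:
    """Fraction of returned tasks that errored or failed validation."""
    total = errors + invalid + completed
    return (errors + invalid) / total if total else 0.0


# Figures from the post: 15 errors, 7 invalid, 91 completed (113 tasks in all).
print(f"{failure_rate(15, 7, 91):.1%}")  # 19.5%
```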
18)
Message boards :
Number crunching :
Tasks hanging -
(Message 1862)
Posted 18 Jan 2023 by marmot Post: "I haven't looked at the logs, but almost all my WUs are showing 2d+ left until completion, and the returned credit at Free-DC for this project has taken a sharp nosedive today, implying it's a systemic problem in the WUs." Except for two fake results like this one from my machines:
Task 77522270 (WU 49580149): sent 15 Jan 2023, 3:32:33 UTC; reported 17 Jan 2023, 22:39:26 UTC; Completed and validated; run time 241,613.00 s, CPU time 3,915,919.00 s, credit 1,647.69; CurieMarieDock 0.2.0 long tasks v2.00 windows_x86_64
This reported time is IMPOSSIBLE, because the task was running on a single thread and the machine returned 48 other WUs in the same period. Something is wrong with these work units. 15 ended in error states yesterday and 7 couldn't validate. Given that 91 completed, that's a 19.5% failure rate. Also, the deadline is too close. Our local electric company forced new rate programs and meters on us: 6-8am and 6-8pm cost 31 cents per kWh, and the rest of the day is 4 cents. BOINC doesn't support 2 pause periods, so I moved to dual installs: one runs 10 hours in the day, the other 10 hours at night. These new peak/off-peak programs are a paradigm shift for USA electric power companies, so others crunching BOINC will have to face this soon. An 8th gen laptop should be able to complete one of these before the deadline, but with 4 hours lost per day to the rate plan, and showing 3 days 4 hours until a Jan 21 deadline, it looks unlikely to finish. We'll need longer deadlines or a switch to multithreaded WUs.
19)
Message boards :
Number crunching :
Tasks hanging -
(Message 1859)
Posted 18 Jan 2023 by marmot Post: I haven't looked at the logs, but almost all my WUs are showing 2d+ left until completion, and the returned credit at Free-DC for this project has taken a sharp nosedive today, implying a systemic problem in the WUs.
20)
Message boards :
Number crunching :
can this result be correct?
(Message 1858)
Posted 18 Jan 2023 by marmot Post: This machine was running multiple WUs, none of them taking up all the cores, and 40+ other WUs finished today, so how can this WU have 45 days of run time on a single core in 2 days?
Task 77522270 (WU 49580149): sent 15 Jan 2023, 3:32:33 UTC; reported 17 Jan 2023, 22:39:26 UTC; Completed and validated; run time 241,613.00 s, CPU time 3,915,919.00 s, credit 1,647.69; CurieMarieDock 0.2.0 long tasks v2.00
Are the WUs multithreaded? Even so, 45 other WUs completed on the same day, so I'm not sure where it found cores to multithread to.
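The objection in this post can be turned into a sanity check: a task cannot accumulate more CPU time than its run time multiplied by the number of threads it actually used. A minimal Python sketch using the figures from task 77522270; the 5% tolerance and the function name are assumptions, not a BOINC validator rule.

```python
def cpu_time_plausible(run_time_s: float, cpu_time_s: float,
                       threads: int = 1, tolerance: float = 1.05) -> bool:
    """CPU time should not exceed run time x threads used; the tolerance
    allows for small accounting differences (an assumed 5%)."""
    return cpu_time_s <= run_time_s * threads * tolerance


run, cpu = 241_613.00, 3_915_919.00  # task 77522270: run time, CPU time (s)
print(round(cpu / 86_400, 1))        # CPU time in days: 45.3
print(round(cpu / run, 1))           # implied threads busy per wall-clock second: 16.2
print(cpu_time_plausible(run, cpu))  # False for a single-threaded task
```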
©2024 SiDock@home Team