Posts by Brian Nixon

1) Message boards : Number crunching : 141 days remaining on a WU (Message 2220)
Posted 9 Mar 2024 by Brian Nixon
Post:
Note: all other deadlines are listed in EDT
That’s because they fall after your clocks go forward tomorrow morning.

I **assume** that the checkpoints for Linux SiDock applications work correctly
AFAIK, CmDock on Linux doesn’t have the same checkpointing issue as on Windows (which, incidentally, is not that checkpoints aren’t made; just that BOINC doesn’t know about them) – but it does still have the restarting problem where (even with successful checkpointing) tasks that get stopped and restarted sometimes go back to the beginning. In this latter case BOINC’s measurement of the task’s progress rate will get confused (because it is tracking CPU time already used, but the task has gone back to 0% complete), which is most likely what leads to the enormous predicted time remaining.

I was going to let BOINC or the SI server issue an abort/timeout
AFAIK, neither your local BOINC nor the SiDock server will abort an already-started task just because it is predicted to miss (or even if it has missed) its deadline. You can still get credit for reporting it late (even, I think, if a resend goes out and gets back before yours completes). But if you want to abort the suspect task yourself and let the server resend the work unit to another host, don’t worry about it. It is just one of the millions that SiDock is running; the project will cope… :-⁠)

PS. Days remaining shifted to 101
The longer the task runs, the better BOINC’s estimate of its ‘speed’ gets. The predicted time remaining will gradually correct itself.
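
As a rough illustration of why a reset to 0% produces such huge predictions, here is a sketch in Python. It assumes a purely proportional model (remaining = elapsed × (1 − fraction) / fraction). The real BOINC client blends several estimates, so the function name and the numbers below are hypothetical and only show the direction of the effect.

```python
def naive_remaining_seconds(elapsed_cpu_s: float, fraction_done: float) -> float:
    """Proportional estimate: assumes all of elapsed_cpu_s produced fraction_done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_cpu_s * (1.0 - fraction_done) / fraction_done


# Hypothetical task with 10 hours of CPU time already counted.
elapsed = 10 * 3600

# Healthy case: the task really is 50% done.
healthy = naive_remaining_seconds(elapsed, 0.50)

# After going back to the beginning: same CPU time counted, but only 0.3% done.
confused = naive_remaining_seconds(elapsed, 0.003)

print(f"healthy:  {healthy / 3600:.1f} hours remaining")
print(f"confused: {confused / 86400:.1f} days remaining")
```

As the task runs on, the fraction done grows much faster than the elapsed time does in relative terms, which is why the prediction shrinks again.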
2) Message boards : Number crunching : WU become longer and longer (Message 2216)
Posted 9 Mar 2024 by Brian Nixon
Post:
There are a couple of known bugs in the CmDock science app:

  • On Windows, BOINC doesn’t know about its checkpoints. After a restart, the percentage complete usually recovers after a few minutes, but the previous elapsed CPU time is lost and so the BOINC credit will be lower. The problem has been fixed at source, though the project hasn’t updated its application yet.

  • If a task resumes from a checkpoint but gets restarted before it has saved another checkpoint, it might restart from the beginning or report completion and fail validation. (Also mentioned in threads here and here.) The bug has been reported.

3) Questions and Answers : Unix/Linux : Exit status 195 (0x000000C3) EXIT_CHILD_FAILED (Message 2193)
Posted 8 Feb 2024 by Brian Nixon
Post:
Code 195 comes from the BOINC wrapper reporting that the science app (CmDock) exited with an error. The clue to that error is in the task output app exit status: 0x8b, indicating it exited due to a SIGSEGV. If this is the only machine experiencing this issue, it most likely points to a hardware problem such as faulty memory or unstable overclocking.
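
For anyone wondering how 0x8b maps to SIGSEGV: 0x8b is 139, which is 128 + 11, and signal 11 is SIGSEGV on Linux. A small Python sketch of that decoding, assuming the common "128 + N means killed by signal N" convention (the helper name is my own, not part of BOINC):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the 128 + N 'killed by signal N' convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


print(describe_exit_status(0x8B))
```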
4) Message boards : Number crunching : WU error (Message 2153)
Posted 26 Dec 2023 by Brian Nixon
Post:
This is a message from Windows Error Reporting after detecting memory corruption in the BOINC wrapper application. It’s uncommon for that to crash, as it doesn’t do much itself – it just starts the science app (CmDock) and takes care of communication between that and the rest of BOINC.

Do you also see errors in your SiDock tasks? (You should be able to check here, but nobody else can see your results unless you choose to show your computers in your preferences.)

The problem might be unrelated to BOINC/​SiDock specifically. Have you had other unexplained crashes on that machine? Are you overclocking?
5) Message boards : Number crunching : remaining time frozen (Message 2150)
Posted 25 Nov 2023 by Brian Nixon
Post:
This is normal. The CmDock application only reports its ‘Progress’ statistic once every few minutes, so between those updates the ‘Remaining’ time estimated by the BOINC client can remain frozen.
6) Message boards : Number crunching : low credit awarded on restarted task (Message 2070)
Posted 15 May 2023 by Brian Nixon
Post:
Yes; it’s a bug. It was discussed in this thread and is now fixed at source, though the project hasn’t updated its application yet.
7) Message boards : News : CmDock source code (Message 2068)
Posted 9 May 2023 by Brian Nixon
Post:
CmDock does checkpoint (typically every 5–⁠15 minutes), but there are a couple of bugs:

  • On Windows, BOINC doesn’t know about the checkpoints. The issue is discussed in this thread. After a restart, the percentage complete usually recovers after a few minutes, but the previous elapsed CPU time is lost and so the BOINC credit will be lower. The problem has been fixed at source, though the project hasn’t updated its application yet.

  • If a task resumes from a checkpoint but gets restarted before it has saved another checkpoint, it might restart from the beginning or report completion and fail validation. (Mentioned in threads here and here.) The bug has been reported.

8) Questions and Answers : Windows : Project communication failed: attempting access to reference site (Message 2063)
Posted 2 May 2023 by Brian Nixon
Post:
Also reported on the main BOINC forums here. Not sure it ever got resolved.
9) Questions and Answers : Windows : Project communication failed: attempting access to reference site (Message 2061)
Posted 2 May 2023 by Brian Nixon
Post:
Try adding the http_debug flag (Manager, Advanced View » Options » Event Log options) and then looking in the BOINC event log for entries containing HTTP error:, which should give additional detail about the problem.
10) Message boards : Number crunching : Potential error (Message 2059)
Posted 18 Apr 2023 by Brian Nixon
Post:
TL;DR: I believe this is harmless.

It is the BOINC wrapper that has crashed here, not the science app (CmDock). My best guess for why that did not cause the task to fail with an error is that it happened while the client was asking the task to exit (either at shutdown, or when suspending activity with the preference ‘Leave non-GPU tasks in memory while suspended’ unchecked). If that is the case, this looks like a race condition while the wrapper is exiting (which would be a bug on BOINC’s part, not SiDock’s) – but it seems to be benign because the science app had exited before the wrapper crashed. That means that when the client resumed the task later, it was able to restart correctly from the last checkpoint.
11) Message boards : Number crunching : "Bestätigungsfehler" - I got this out of the most of my finished tasks (Message 2029)
Posted 9 Mar 2023 by Brian Nixon
Post:
Did those tasks get started, stopped and restarted in quick succession?

(I have seen “Bestätigungsfehler” (“Validate error”) in those circumstances. Shortly after the second start, the task reports it has completed, but the results are invalid.)
12) Message boards : Number crunching : i7-8700K performance (Message 2026)
Posted 6 Mar 2023 by Brian Nixon
Post:
The i7 hasn’t completed anywhere near as many tasks as the Ryzens, so its measured processing rate for the new “long tasks” application version is still lagging (4.1 GFLOPS vs. 7.3, at the time of writing). The moving average from which credit is calculated is quite heavily filtered, so it will take several weeks to stabilise.

As a wise person counselled in the other thread about credit consistency:
People just need to exercise a bit of patience.
13) Questions and Answers : Windows : Cannot add host error ! (Message 2019)
Posted 2 Mar 2023 by Brian Nixon
Post:
Since Windows 7 is no longer supported by Microsoft, it doesn't receive any updates, including the ones with root certificates.
Automatic root certificate updates are still working on Windows 7, even if it’s out of official support.

@Tim: Is the Cryptographic Services service running on the machine that’s failing to connect?
14) Message boards : Number crunching : Most projects stuck on uploading for ever (Message 2017)
Posted 1 Mar 2023 by Brian Nixon
Post:
The BOINC client does back off very rapidly after a couple of failed upload attempts, sometimes waiting hours before trying again. You can try to kick it back to life with the Advanced view menu option Tools » Retry pending transfers.
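
To show how a retry delay can reach hours after only a handful of failures, here is a generic exponential backoff sketch in Python. This is not BOINC's actual code: the base delay, the cap and the jitter range are hypothetical values chosen for illustration.

```python
import random


def backoff_delays(attempts, base_s=60.0, cap_s=4 * 3600.0, seed=None):
    """Return one randomised retry delay (in seconds) per failed attempt."""
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        # The ceiling doubles with every failure until it hits the cap.
        ceiling = min(cap_s, base_s * 2 ** n)
        # Jitter: pick a delay somewhere in the upper half of the window.
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


for attempt, delay in enumerate(backoff_delays(10, seed=1), start=1):
    print(f"failure {attempt:2d}: wait about {delay / 60:6.1f} minutes")
```

With these made-up parameters the delay passes the two-hour mark by the ninth failure, which is why manually retrying the transfers can be worthwhile.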
15) Message boards : News : СmDock "long" and "short" tasks applications (Message 2015)
Posted 26 Feb 2023 by Brian Nixon
Post:
Thinking about it some more: this is probably only part of the story. It explains why BOINC gets “CPU time since checkpoint” wrong, but not why tasks’ elapsed CPU time resets to zero after being stopped and restarted.
---
Edit: Having tested it just now, I can confirm Mad_Max’s findings: forcibly updating the last-write time of docking_out.chk does cause BOINC to update all the related internal stats – including “Elapsed CPU time at last checkpoint”, which is saved for use after a restart to initialise the CPU time. And looking at the BOINC client code, it is clear that this only gets done when the last-checkpoint time reported by the wrapper changes.
16) Message boards : Cafe : I wonder how SiDock (distributed computing) compares with supercomputing (Message 2013)
Posted 26 Feb 2023 by Brian Nixon
Post:
With widespread participation distributed computing can do much more than all supercomputers from TOP500 list. :)

In principle, maybe. In practice, it comes nowhere close. (The TOP500 (4,864,000 TFlop/s) can do in 4 minutes what SiDock@home (36 TFlop/s) does in 1 year.)
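
The arithmetic behind that comparison, as a quick Python check (using only the two throughput figures quoted above and a 365-day year):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

top500_tflops = 4_864_000  # combined TOP500 throughput, TFlop/s
sidock_tflops = 36         # SiDock@home throughput, TFlop/s

# Total work SiDock@home does in one year, in TFlop.
work_tflop = sidock_tflops * SECONDS_PER_YEAR

# Time the TOP500 would need for the same amount of work.
top500_seconds = work_tflop / top500_tflops

print(f"{top500_seconds / 60:.1f} minutes")
```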

David Anderson analyses this missed opportunity at length in his BOINC in Retrospect. (TL;DR: Needs more promotion, understanding, trust, …)
17) Message boards : News : СmDock "long" and "short" tasks applications (Message 2012)
Posted 26 Feb 2023 by Brian Nixon
Post:
Yeah – just seen that: #20. Thanks!
18) Message boards : News : СmDock "long" and "short" tasks applications (Message 2008)
Posted 25 Feb 2023 by Brian Nixon
Post:
I did some debugging too and have found part of the problem with checkpoints and losing CPU/elapsed time counters after each WU/BOINC/computer restart.
It may indeed be OS-related, but not caused by the OS itself (as all other projects running on the same computer and in the same BOINC installation do not have such problems). I see it on all of my computers, but they all use the same OS version (Win7 Pro x64).
Maybe it is something like a new OS API call which was added only in the latest Windows version and does not work properly (partial support) on older versions?

I did some monitoring of the files which a running SiDock WU writes to disk during a checkpoint save in its working "/slots/" folder.
And I found something very interesting: while the checkpoint files are written to disk OK, the write of the file metadata is missed. After the app modifies these files:
docking_out.progress
docking_out.chk

the "last modified" file timestamps do not change and always stay the same as the timestamp of the initial file creation at the WU's first start-up.

I get this all the time. It seems to be a Windows thing. From the documentation of the WriteFile function:
When writing to a file, the last write time is not fully updated until all handles used for writing have been closed. Therefore, to ensure an accurate last write time, close the file handle immediately after writing to the file.

So AFAICT what is happening is that CmDock is writing the checkpoint to docking_out.chk, but not closing the handle – so the wrapper (which is polling the last-write time of that file to report the last-checkpoint time back to BOINC) does not see any change.
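
A minimal Python sketch of that polling idea, to make the failure mode concrete. This is not the wrapper's actual code, and the helper name is my own. It only shows that a poller comparing last-write times sees nothing until the timestamp moves, and that forcibly bumping the timestamp is enough to make it notice. It does not reproduce the Windows open-handle behaviour itself.

```python
import os
import tempfile


def checkpoint_changed(path: str, last_seen_mtime_ns: int) -> tuple[bool, int]:
    """Return (changed, current_mtime_ns) by comparing the file's last-write time."""
    current = os.stat(path).st_mtime_ns
    return current != last_seen_mtime_ns, current


with tempfile.TemporaryDirectory() as tmp:
    chk = os.path.join(tmp, "docking_out.chk")
    with open(chk, "w") as f:
        f.write("checkpoint 1")

    # First poll just records the current last-write time.
    _, seen = checkpoint_changed(chk, 0)

    # Timestamp has not moved: the poller reports no new checkpoint.
    changed_before, seen = checkpoint_changed(chk, seen)

    # Forcibly move the last-write time forward by 10 seconds.
    bumped = seen + 10 * 1_000_000_000
    os.utime(chk, ns=(bumped, bumped))
    changed_after, seen = checkpoint_changed(chk, seen)

print(changed_before, changed_after)
```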
19) Questions and Answers : Windows : Cannot add host error ! (Message 1999)
Posted 20 Feb 2023 by Brian Nixon
Post:
SSL connect error
This machine might be missing a security update. Can you compare the versions of the file %SystemRoot%\system32\schannel.dll (in Explorer, right-click » Properties » Details tab » File version) on your two machines?
20) Questions and Answers : Windows : Cannot add host error ! (Message 1994)
Posted 19 Feb 2023 by Brian Nixon
Post:
Add the http_debug event log flag. It will give more detail about the error.



©2024 SiDock@home Team