Posts by pschoefer

1) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 1256)
Posted 1 Oct 2021 by pschoefer
Post:
The file is stored locally on the client side in BOINC's program folder, so this is not a server issue.

There appears to be a server-side workaround, however: renewing the Let's Encrypt certificate with the valid certificate chain as the preferred chain (i.e., certbot --preferred-chain "ISRG Root X1").
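Something along these lines should do it (untested from my side, and the exact invocation depends on how the certificate was originally obtained):

certbot renew --force-renewal --preferred-chain "ISRG Root X1"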
2) Message boards : News : SiDock@home September Sailing (Message 1229)
Posted 23 Sep 2021 by pschoefer
Post:
Thanks to the team for organizing this event. :-)
And special thanks to hoarfrost for all the work put into this.

+1

It was a bit stormy in the beginning, but almost smooth sailing after that feeder bottleneck was fixed. Unfortunately, the longer Eprot tasks not only helped to weather the storm, but also led to CreditNew showing its ugly face. In the end, I have seen credit values between 4 and 200 Cobblestones for 3CLpro tasks with similar run times. The top 3 teams were probably far enough apart, but the very close race for 4th might well have been decided by pure luck instead of computing power. I would be delighted to participate in another challenge with a less random credit distribution some time in the future. There are not many other projects out there that put so much effort into events like this.
3) Message boards : News : SiDock@home September Sailing (Message 1218)
Posted 20 Sep 2021 by pschoefer
Post:
According to the application details for that host, it has already completed 1125 tasks in the last 8 days (the host was created on 12 Sep), but a lot of those tasks were already purged from the database. To me, it looks like it just downloaded way too much work before the challenge, aborted the tasks that it could not finish before the deadline, and should be able to complete most of the remaining tasks before the end of the challenge.
4) Message boards : Number crunching : Point Drop per Work Unit. (Message 1209)
Posted 19 Sep 2021 by pschoefer
Post:
looks like Intel has paid for this credit system.

This is a nice conspiracy theory, but it really is not an Intel vs AMD (or Windows vs Linux) issue. It is just CreditNew turning things into a lottery, as it is prone to do.

Basically, for CreditNew to work smoothly, the following two assumptions have to be met:

  1. task run time scales reasonably well with the estimated FLOP count (definitely not true here so far, as both the short 3CLpro and the long Eprot tasks had the same FLOP count estimate; I don't know yet how things will work out once hoarfrost's recent changes take effect)
  2. computation speed of each computer is constant (a very bold assumption, especially with modern CPUs and GPUs that can adjust their clock frequencies on the fly based on temperature or power draw)


As soon as reality deviates from these assumptions, CreditNew does all sorts of weird things that may or may not average out in the long term (and definitely not in the short term, e.g. over the duration of a typical competition like the one going on right now). I have two screenshots to illustrate the complete mess CreditNew created here:


  • A set of 3CLpro_v5 tasks. The longest task took ~50% longer than the shortest, but credit varies by a factor of 4.3 and there is no correlation between run time and credit. And this is just a small subset; I have also seen 3CLpro_v5 tasks with credit as low as ~20 and as high as ~210, i.e. 10 times the credit for roughly the same amount of work.
  • A set of Eprot tasks. In this case, the longest task took only ~10% longer than the shortest, but again, credit varies far more than the run time and shows no correlation with it.


Three years ago, even David Anderson came to realise that CreditNew is not working all that well and now recommends it only in cases where no better option is available. Let's take a look at the other options:

Pre-assigned credit: I think this is the best option available. There could be a fixed amount of credit based on the target, e.g. 30 Cobblestones for 3CLpro tasks, 450 Cobblestones for Eprot tasks. Yes, as shown above, there is some variation in task run time even for the same target, but a ~50% difference in credit per second is much better than the ~1000% difference we see with CreditNew. It is also cheat-proof and device-neutral, and it immediately rewards a CPU switching to turbo mode rather than punishing it. The downside is, of course, that it takes a bit more work when preparing a new target, as the right amount of credit needs to be known in advance.
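Just to sketch the idea (this is not actual SiDock@home server code, and the workunit naming is only my guess; the credit values are the example numbers from above), the server-side part could be as simple as a lookup from target to credit:

#include <map>
#include <string>

// Hypothetical sketch only: fixed credit per target, to be looked up by the
// validator. Target names and values are just the examples from this post.
static const std::map<std::string, double> credit_per_target = {
    {"3CLpro_v5", 30.0},
    {"Eprot", 450.0},
};

// Return the pre-assigned credit for a workunit whose name starts with one of
// the known target identifiers; fall back to a conservative default otherwise.
double preassigned_credit(const std::string& wu_name) {
    for (const auto& entry : credit_per_target) {
        if (wu_name.rfind(entry.first, 0) == 0) {  // wu_name starts with the target name
            return entry.second;
        }
    }
    return 30.0;
}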

Post-assigned credit: AFAIK, the run time variations come from the docking simulation taking a different amount of time for each ligand, while the number of ligands processed per task is constant. So this option would require implementing some sort of FLOP counting in the application, which I don't think is feasible.

Runtime-based credit: In theory, this sounds like a good choice for this project, as long as there is no GPU application. In reality, however, it is a complete nightmare that really should not be used by any project any more. It was already a nightmare back when this was the standard credit system, because some users ran clients that reported inflated benchmark values. Nowadays, the benchmark does not really mirror the true CPU performance even without those "optimisations", because it runs on only one CPU core, so the CPU may run at a much higher frequency during the benchmark than during the actual computations.

Ironically, that leaves adaptive credit aka CreditNew as the recommended option for exactly those cases where it performs the worst. I would argue that even in those cases, some fixed amount of credit per task would be much less of a lottery than the mess created by CreditNew.

5) Message boards : News : SiDock@home September Sailing (Message 1167)
Posted 15 Sep 2021 by pschoefer
Post:
xii5ku wrote:
Michael H.W. Weber wrote:
Please take a look at these guidelines which my team colleague Yoyo has written down
This guide is about keeping the server responsive, not so much about keeping the hosts utilized.

Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem.

The guide addresses this as well: how to reduce the client request frequency.

The underlying problem is that tasks are not sent out efficiently enough. If a client did not have to ask for tasks several times before receiving any, there would be far fewer requests. Of course, as soon as the work supply becomes shaky, the more enthusiastic participants will take measures to avoid running dry (i.e. forcing work requests as frequently as possible, setting higher buffers, etc.), thereby creating (most of) the server problems your guide is trying to mitigate.
6) Message boards : News : SiDock@home September Sailing (Message 1161)
Posted 15 Sep 2021 by pschoefer
Post:
xii5ku wrote:
Michael H.W. Weber wrote:
Please take a look at these guidelines which my team colleague Yoyo has written down
This guide is about keeping the server responsive, not so much about keeping the hosts utilized.

Database optimisation can help under the right circumstances, but usually, when many hosts request work at the same time, the bottleneck is the scheduler queue. I fully agree that the other points in that guide are just about mitigating the impact on the server (with sometimes debatable success), not about solving the underlying problem.
7) Message boards : Number crunching : The new record in SiDock@HOME project (Message 895)
Posted 9 May 2021 by pschoefer
Post:
That number is estimated from the total RAC of all accounts, and the RAC is roughly the credit per day averaged over the last 14 days, so I think it is a safe bet that this will continue to rise in the next 10 days. :)
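For anyone wondering where the 14 days come from: RAC is a decaying average of granted credit with (as far as I remember) a one-week half-life, so the last two weeks carry most of the weight. A simplified sketch of the update, not the exact BOINC code:

#include <cmath>

// Simplified sketch of a BOINC-style decaying credit average (RAC), assuming
// the usual one-week half-life; not the exact client/server implementation.
const double HALF_LIFE_SECONDS = 7 * 86400.0;

// Decay the old average according to the time since the last update, then
// blend in the newly granted credit, expressed as credit per day.
double update_rac(double rac, double granted_credit, double seconds_since_last_update) {
    double weight = std::exp(-seconds_since_last_update * std::log(2.0) / HALF_LIFE_SECONDS);
    double credit_per_day = granted_credit * 86400.0 / seconds_since_last_update;
    return weight * rac + (1.0 - weight) * credit_per_day;
}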
8) Message boards : Number crunching : Suggestion for better task progress bar (Message 892)
Posted 8 May 2021 by pschoefer
Post:
With checkpoints already appearing on the horizon, I think it is time to revisit another possible improvement of the application from the volunteers' point of view.

Right now, the shown task progress is just estimated by a standard BOINC routine based on the initial runtime estimate. If the runtime is overestimated, tasks jump to 100% from much smaller percentages; if it is underestimated, the progress crawls asymptotically toward 100%. Both behaviours can be confusing, for new volunteers in particular, and while the runtime estimate may settle at a realistic value in the long term, it is most likely off for newly attached hosts.

CMDock reports a more useful progress estimate in its logfile, and I guess that reading that estimate and converting it into a fraction the wrapper can read was the purpose of the script that was removed before I joined this project. As a more stable way to get the progress estimate into a wrapper-readable format, I suggest a small modification to CMDock.

As a proof of concept, walli and I have come up with the following patch:
diff --git a/src/exe/cmdock.cxx b/src/exe/cmdock.cxx
index 62adbe8..689ea5e 100644
--- a/src/exe/cmdock.cxx
+++ b/src/exe/cmdock.cxx
@@ -647,13 +647,19 @@ int main(int argc, char *argv[]) {
         std::cout << "Ligand docking duration:      " << recordDuration.count()
                   << " second(s)" << std::endl;
         totalDuration += recordDuration;
+        std::size_t estNumRecords = spMdlFileSource->GetEstimatedNumRecords();
+        // set "fraction_done" after every record
+        double progress = (double)nRec/(double)estNumRecords;
+        std::fstream fraction_done_filename;
+        fraction_done_filename.open("fraction_done", std::ios::out | std::ios::trunc);
+        fraction_done_filename << progress << std::endl;
+        fraction_done_filename.close();
         // report average every 10th record starting from the 1st
         if (nRec % 10 == 1) {
           std::cout << std::endl
                     << "Average duration per ligand:  "
                     << totalDuration.count() / static_cast<double>(nRec)
                     << " second(s)" << std::endl;
-          std::size_t estNumRecords = spMdlFileSource->GetEstimatedNumRecords();
           if (estNumRecords > 0) {
             std::chrono::duration<double> estimatedTimeRemaining =
                 estNumRecords * (totalDuration / static_cast<double>(nRec));

With this patch, CMDock writes its progress as a simple fraction to a fraction_done file, which can then be specified as fraction_done_filename in the job.xml.
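For reference, the relevant part of the job.xml would look roughly like this (the application name and the other task options are just placeholders for whatever the project actually uses):

<job_desc>
    <task>
        <application>cmdock</application>
        <!-- command line, stdout redirection etc. omitted -->
        <fraction_done_filename>fraction_done</fraction_done_filename>
    </task>
</job_desc>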

I successfully finished two tasks using the patched CMDock on this host, and am also testing it on a Raspberry Pi right now. While the reported progress is not completely linear (because some ligands take more time than others), it is much better than BOINC's progress estimate. I even set the fraction_done_exact option, so that the remaining runtime is estimated based on the reported progress. While the estimated remaining runtime jumps a lot in the beginning because the progress report is not very fine-grained, it becomes a very realistic estimate after the first few ligands.
9) Message boards : News : Half a year, the BOINC Workshop and a move to the new server (Message 813)
Posted 23 Apr 2021 by pschoefer
Post:

Society. The BOINC Pentathlon team selected SiDock@home as this year's Marathon project. The Marathon discipline is announced on 30 April and runs for two weeks from 05 through 18 May.

It should have been announced on 30 April, but you announced it today. It's a pity that you did not keep the secret and guarantee a fair challenge for every team.

Sorry, this was a miscommunication on my side; I forgot to say explicitly that we would like this to remain secret until it is announced on the Pentathlon website.

However, even though the genie is out of the bottle, there is no real damage done. Any tasks downloaded now will expire well before the Marathon, and the project will be out of work for some time anyway because of the server move, so this pre-announcement won't cause overly aggressive pre-bunkering.



