FAH GPU Tracker V2

FAH GPU Tracker V2 is a Folding@Home Client tracking and control program


    Tracker adapting if encountered bad WU

    B2K24
    BETA Team Member

    Posts : 102
    Join date : 2010-06-09

    Tracker adapting if encountered bad WU

    Post by B2K24 on Thu Oct 14, 2010 9:27 pm

    Today I had four hours of downtime because the client downloaded a bad WU.
    Project: 10632 (Run 64, Clone 54, Gen 2)
    http://img547.imageshack.us/img547/1219/badwu.jpg

    It failed 56 times in 4 hours, and of course stopping and restarting didn't help, because the Tracker (or rather the GPU3 client) won't download a different WU until certain steps are taken.

    When I investigated what the hell had happened, I discovered for the first time that the Tracker creates a WU Failures folder with logs, so everything is in .txt files. (Nicely done, jedi. I had no idea the Tracker did this, probably because I had never hit a bad WU before.)

    By doing these steps manually, I was able to get folding again with a different WU.

    Code:
    Stop the client. Delete the work folder and the queue.dat file. Open client.cfg with Notepad and change the machine ID to a different number.
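The manual steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `reset_client` is hypothetical, the work folder is assumed to be named `work`, client.cfg is assumed to contain a line of the form `machineid=<number>`, and the client must already be stopped before it runs.

```python
import re
import shutil
from pathlib import Path


def reset_client(client_dir: Path, new_machine_id: int) -> None:
    """Discard the current WU and switch to a new machine ID.

    Hypothetical helper: the client must already be stopped.
    Requires Python 3.8+ for unlink(missing_ok=True).
    """
    # Delete the work folder and queue.dat so the client forgets the bad WU.
    shutil.rmtree(client_dir / "work", ignore_errors=True)
    (client_dir / "queue.dat").unlink(missing_ok=True)

    # Rewrite only the machineid line; every other line is left untouched.
    # Assumption: the line looks like "machineid=1".
    cfg = client_dir / "client.cfg"
    text = cfg.read_text()
    new_text, count = re.subn(
        r"(?im)^([ \t]*machineid[ \t]*=[ \t]*)\d+",
        rf"\g<1>{new_machine_id}",
        text,
    )
    if count == 0:
        raise ValueError("no machineid line found in client.cfg")
    cfg.write_text(new_text)
```

A call such as `reset_client(Path("gpu0"), 2)` would be issued between stopping and restarting the client; the folder name `gpu0` is only an example.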

    So my request is: can the Tracker automate these steps after X UNSTABLE_MACHINE errors, or after X failures of the same work unit or project, so that everything folds again without user intervention?


    Thank you, jedi, for everything and all the work you put into this software :) It's very much appreciated.


    jedi95
    Dev Team Member

    Posts : 307
    Join date : 2010-05-26
    Job/hobbies : FAH GPU Tracker V2 Developer

    Re: Tracker adapting if encountered bad WU

    Post by jedi95 on Thu Oct 14, 2010 10:46 pm

    Are you sure all of those WUs were the same project/run/clone/gen (PRCG)? If not, then getting the same WU wasn't the problem. If all of those WUs were the exact same PRCG, then I will definitely consider adding this feature.

    EDIT: Any reason you are still running 3.21? Or was this just an old screenshot?


    B2K24
    BETA Team Member

    Posts : 102
    Join date : 2010-06-09

    Re: Tracker adapting if encountered bad WU

    Post by B2K24 on Thu Oct 14, 2010 11:32 pm

    The 56 fails were all the exact same Project: 10632 (Run 64, Clone 54, Gen 2)

    In this case, nothing I tried gave me a different project.

    I tried all of the following, and each time F@H still downloaded and attempted the exact same
    Project: 10632 (Run 64, Clone 54, Gen 2)

    - start/stop tracker multiple times
    - client/gpu0/delete WU
    - shutdown/restart windows

    Only the steps in the Code box of my last post got me a different run/clone/gen.

    You are correct about 3.21; I will update now. Even the newest version would make no difference in this instance, because I'm 100% certain the WU is bad.

    jedi95
    Dev Team Member

    Posts : 307
    Join date : 2010-05-26
    Job/hobbies : FAH GPU Tracker V2 Developer

    Re: Tracker adapting if encountered bad WU

    Post by jedi95 on Fri Oct 15, 2010 12:21 am

    True, but 3.25 does fix a large number of bugs and other minor issues.

    The problem with coding this feature is that currently the Tracker doesn't mess with machine IDs. It only sets them in the initial config and then it assumes that they will always be correct after that.

    There are a few technical obstacles to overcome if I wanted to add such a feature:
    1. Adding code to read the current machine ID from client.cfg (easy)
    2. Adding code to keep track of which clients have which machine IDs (somewhat easy)
    3. Adding a new reconfig system to allow changing the machine ID in client.cfg after the initial config (moderately difficult)
    4. Finding some reasonable way of handling cases where a client fails a WU and all machine IDs have been tried already. (moderately difficult)
    5. Making sure 2 clients don't get the same WU because of machine ID changes (difficult)
    6. Figuring out what to do to change a machine ID if a client is running (moderately difficult)
    7. Adding code to check for failing the SAME WU multiple times in a row (somewhat easy)
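Items 2, 4, and 5 in the list above amount to bookkeeping over which client holds which machine ID. A minimal sketch of that bookkeeping, not the Tracker's actual code: the class `MachineIdRegistry`, its method names, and the default upper bound of 16 are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MachineIdRegistry:
    """Hypothetical record of which client holds which machine ID."""

    max_id: int = 16  # assumption: highest machine ID the client accepts
    in_use: dict = field(default_factory=dict)  # client name -> current ID
    tried: dict = field(default_factory=dict)   # client name -> set of IDs used

    def assign(self, client: str, machine_id: int) -> None:
        # Item 5: never let two clients share a machine ID.
        for other, other_id in self.in_use.items():
            if other != client and other_id == machine_id:
                raise ValueError(f"machine ID {machine_id} is held by {other}")
        self.in_use[client] = machine_id
        self.tried.setdefault(client, set()).add(machine_id)

    def next_free_id(self, client: str) -> Optional[int]:
        # Lowest ID that no other client holds and this client has not tried.
        taken = {i for c, i in self.in_use.items() if c != client}
        already = self.tried.get(client, set())
        for candidate in range(1, self.max_id + 1):
            if candidate not in taken and candidate not in already:
                return candidate
        return None  # Item 4: every ID is exhausted; the caller must decide
```

Returning `None` when nothing is left keeps the policy decision for item 4 (give up, pause the client, or clear the history) outside the registry.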

    There are a TON of things to consider when writing code to do something like this. If the Tracker only supported a single client, this would be much easier, since I could just have it try the next machine ID each time it failed the same WU X times in a row.
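For the single-client case, the logic could look something like this sketch. The names `PRCG`, `FailureTracker`, and `next_machine_id` are hypothetical, and the wrap-around at 16 is an assumption about the valid machine ID range.

```python
from typing import NamedTuple, Optional


class PRCG(NamedTuple):
    """Identifies one work unit: project, run, clone, gen."""
    project: int
    run: int
    clone: int
    gen: int


class FailureTracker:
    """Counts consecutive failures of the same WU (hypothetical helper)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.last: Optional[PRCG] = None
        self.count = 0

    def record_failure(self, wu: PRCG) -> bool:
        """Return True once the same WU has failed `threshold` times in a row."""
        if wu == self.last:
            self.count += 1
        else:
            # A different WU failed, so the streak starts over.
            self.last = wu
            self.count = 1
        return self.count >= self.threshold

    def reset(self) -> None:
        """Call after a completed WU or after switching machine ID."""
        self.last = None
        self.count = 0


def next_machine_id(current: int, max_id: int = 16) -> int:
    # Assumption: valid IDs run from 1 to max_id and wrap back to 1.
    return current % max_id + 1
```

When `record_failure` returns True, the caller would stop the client, switch to `next_machine_id(current)`, and call `reset()`.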

    Needless to say, this isn't exactly at the top of my to-do list thanks to the complexity. Failing the same WU like that doesn't happen often enough to make this feature a priority.


    B2K24
    BETA Team Member

    Posts : 102
    Join date : 2010-06-09

    Re: Tracker adapting if encountered bad WU

    Post by B2K24 on Fri Oct 15, 2010 10:26 am

    Thanks for the extensive explanation and consideration. This is obviously an F@H client issue rather than a Tracker issue, and I don't want extra bloat added to the FAH GPU Tracker V2 when it's unnecessary.

    I hadn't taken into consideration the many different clients, or the fact that people could be running many GPU clients simultaneously, so implementing this request could be rather difficult.

    What's really puzzling is that with SMP WUs, simply deleting the WU always makes the client fetch a different run/clone/gen, and sometimes even a different project number, yet yesterday the GPU3 client automatically downloaded the exact same project/run/clone/gen 56 times. I suppose this behavior depends on exactly which error is generated when running the WU.

    I regret not testing whether deleting ONLY the work folder, unitinfo.txt, and queue.dat would have fetched a different WU without touching the machine ID :(

    If interested, here is a live link to the 56 failures I experienced.
    http://dl.dropbox.com/u/10573028/WU%20Failures.rar
