Title: how to schedule processors in a cluster of work stations?
Post by: tillwemeetagain on May 15, 2012, 07:35:39 AM
I am working on a cluster of 4 workstations, each with 12 cores and 24 GB of memory. The computation is very large, so I want to run it in parallel. At first I set ncpus = 0 and treated the local machine as one of the remote servers; the code was something like:
ncpus = 0
job_server = pp.Server(ncpus,ppservers=ppservers)
On each workstation I started ppserver.py with -w 12, but the program never seemed to finish, so I cancelled it. Instead I used the local machine as the local server, with this code:
#ncpus = 0
job_server = pp.Server(ppservers=ppservers)
I did not start ppserver.py on the local machine; it was started only on the remote machine, without the -w parameter. I got the following output:
Job execution statistics:
job count | % of all jobs | job time sum | time per job | job server
24 | 50.00 | 0.0000 | 0.000000 | node2:60000
24 | 50.00 | 2375.0645 | 98.961023 | local
Time elapsed since server creation 233.92422986
WARNING: statistics provided above is not accurate due to job rescheduling.
It looks as if the remote server did not do its half of the jobs: node2 is credited with 24 jobs but zero job time.
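A quick sanity check on the figures in the statistics above, as a rough sketch only: dividing the local "job time sum" by the elapsed time estimates how many jobs were running at once on the local machine. The variable names are made up for illustration, and since pp itself warns that the statistics are inaccurate after rescheduling, treat the result as indicative rather than exact.

```python
# Figures copied from the "local" row and the elapsed-time line above.
local_jobs = 24
local_job_time_sum = 2375.0645  # seconds
elapsed = 233.92422986          # seconds since server creation

# Average duration of one job on the local machine.
time_per_job = local_job_time_sum / local_jobs

# Summed job time divided by wall-clock time approximates the average
# number of jobs that were running concurrently on the local machine.
avg_concurrency = local_job_time_sum / elapsed

print("time per job: %.2f s" % time_per_job)
print("average local concurrency: %.1f" % avg_concurrency)
```

An average of roughly ten concurrent jobs fits a 12-core machine doing all of the measured work by itself, while node2 contributed no measured job time.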
So my question is: how should I set the parameters so that every core does its share of the jobs?
Title: Re: how to schedule processors in a cluster of work stations?
Post by: Vitalii on June 02, 2012, 12:20:17 AM
If your jobs are large, you might need to increase TRANSPORT_SOCKET_TIMEOUT.
You can also run ppserver.py with the -d flag to get a detailed log.
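For reference, here is a minimal sketch of the setup described in this thread: local cores autodetected, plus ppserver.py running on the other workstations. The host names node3 and node4 and the helpers make_ppservers and run_on_cluster are hypothetical; only node2 and port 60000 appear in the statistics above. It does not touch TRANSPORT_SOCKET_TIMEOUT.

```python
def make_ppservers(hosts, port=60000):
    """Build the tuple of "host:port" strings passed as ppservers."""
    return tuple("%s:%d" % (host, port) for host in hosts)


def run_on_cluster(func, inputs, ppservers, modules=()):
    """Submit func(x) for every x in inputs and return the results in order."""
    import pp  # imported here so make_ppservers works without pp installed

    # ncpus is left at its default ("autodetect"), so local cores work too.
    job_server = pp.Server(ppservers=ppservers)
    jobs = [job_server.submit(func, (x,), (), modules) for x in inputs]
    results = [job() for job in jobs]  # calling a job blocks until it is done
    job_server.print_stats()
    return results


# Hypothetical host names for the three remote workstations.
REMOTE_HOSTS = ["node2", "node3", "node4"]
ppservers = make_ppservers(REMOTE_HOSTS)
print(ppservers)
```

run_on_cluster is meant to be called on the machine that submits the jobs, after ppserver.py has been started on each remote workstation. If a node shows zero job time in the printed statistics, the -d log on that node is the first place to look.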