Title: how to schedule processors in a cluster of work stations?
Post by: tillwemeetagain on May 15, 2012, 07:35:39 AM

Hi all,
I am working on a cluster of 4 workstations, each with 12 cores and 24 GB of memory. The computation is very large, so I want to run it in parallel.

First I set ncpus = 0, so that nothing runs in the local process, and treated the local machine as one of the remote servers. The code is something like:

    ncpus = 0
    ppservers = ("node1", "node2", "node3", "node4")
    job_server = pp.Server(ncpus, ppservers=ppservers)

On each workstation I started ppserver.py with -w 12, but the program never seemed to finish, so I cancelled it.

Then I let the local machine act as the local server instead:

    # ncpus = 0
    ppservers = ("node1", "node2")
    job_server = pp.Server(ppservers=ppservers)

This time I did not start ppserver.py on the local machine; ppserver.py was started only on the remote machines, without the -w parameter. I got the following output:

    Job execution statistics:
     job count | % of all jobs | job time sum | time per job | job server
            24 |         50.00 |       0.0000 |     0.000000 | node2:60000
            24 |         50.00 |    2375.0645 |    98.961023 | local
    Time elapsed since server creation 233.92422986
    WARNING: statistics provided above is not accurate due to job rescheduling.

node2 was assigned 24 jobs, but its job time sum is 0, so it looks as if the remote server did not actually do its half of the work. So my question is: how should I set the parameters so that every core does its share of the jobs?

Title: Re: how to schedule processors in a cluster of work stations?
Post by: Vitalii on June 02, 2012, 12:20:17 AM

If your jobs are large, you might need to increase TRANSPORT_SOCKET_TIMEOUT.
You can also run ppserver.py with the -d flag to get a detailed log.
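A minimal sketch of the first setup from the question (ncpus = 0, all four workstations running ppserver.py -w 12 on the default port 60000) with a larger socket timeout. It assumes a pp release whose pp.Server accepts a socket_timeout argument (in the releases that have it, the docstring describes it as also limiting how long a remote job may run); with an older release, drop that argument and raise TRANSPORT_SOCKET_TIMEOUT as suggested in the reply. heavy_job, total_workers, the four-hour timeout and the host names are hypothetical, for illustration only.

```python
# Sketch only: remote-only pp setup with a larger socket timeout.
# ASSUMPTIONS: ppserver.py runs on node1..node4 with "-w 12" on the
# default port 60000, and the installed pp release accepts the
# socket_timeout argument of pp.Server. heavy_job is hypothetical.

DEFAULT_PORT = 60000
HOSTS = ("node1", "node2", "node3", "node4")
WORKERS_PER_HOST = 12  # matches "ppserver.py -w 12" on every node


def heavy_job(n):
    """Hypothetical CPU-bound job: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def make_ppservers(hosts, port=DEFAULT_PORT):
    """Explicit "host:port" entries, the form pp prints in its statistics."""
    return tuple("%s:%d" % (host, port) for host in hosts)


def total_workers(hosts, workers_per_host=WORKERS_PER_HOST, local_ncpus=0):
    """Worker slots available: local workers plus each remote -w count."""
    return local_ncpus + len(hosts) * workers_per_host


def run_on_cluster(inputs, hosts=HOSTS, socket_timeout=4 * 3600):
    """Submit every input to the remote ppservers only (ncpus=0)."""
    import pp  # imported lazily so the helpers above work without pp

    # If this pp version rejects socket_timeout, remove the argument and
    # raise TRANSPORT_SOCKET_TIMEOUT instead, as the reply suggests.
    job_server = pp.Server(0, ppservers=make_ppservers(hosts),
                           socket_timeout=socket_timeout)
    try:
        jobs = [job_server.submit(heavy_job, (n,)) for n in inputs]
        results = [job() for job in jobs]  # each call waits for its job
        job_server.print_stats()
        return results
    finally:
        job_server.destroy()
```

For example, run_on_cluster([10 ** 7] * total_workers(HOSTS)) submits 48 jobs, one per remote worker slot; pp does not promise an exact even split, but no server should end up with a job time sum of 0. If one still does, restart the ppservers with -d and read their logs.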