Parallel Python Community Forums

Python Forums => Parallel Python Forum => Topic started by: kfrancoi on December 17, 2008, 02:18:32 AM



Title: pp and SQLAlchemy
Post by: kfrancoi on December 17, 2008, 02:18:32 AM
Hello everyone,

First, I would like to thank the people who have been working on the pp module: it only takes 5 minutes to get started, and it seems to work perfectly well!

I'm trying to use pp to run some simple popen() commands, retrieve the results, and populate my database with them (using SQLAlchemy). Since each command takes different parameters, the work looks easy to parallelise!

So here is the test code that reproduces my problem:
Code:
import os
import sys
import pp
import time
from sqlalchemy import create_engine
from elixir import *

metadata.bind = "sqlite:///test_pp.db"
metadata.bind.echo = False

class data(Entity):

    myString = Field(String(20))
    def __init__(self, myString):
        self.myString = myString

def cmdLaunch(letter):

    metadata.bind = "sqlite:///test_pp.db"
    metadata.bind.echo = False
    setup_all()
    print 'Hello World'

    d = data(letter)
    session.add(d)
    session.commit()

def main():

    setup_all()
    create_all()
    #Tuple of all parallel python servers to connect with
    ppservers = ()
    #Create jobserver with automatically detected number of workers
    job_server = pp.Server(ppservers=ppservers)
    print "Starting pp risland with", job_server.get_ncpus(), "workers"

    start_time = time.time()

    for letter in ['A','B','C','D','E','F','G','H']:
        #cmdLaunch(letter) #This is OK
        job_server.submit(cmdLaunch,(letter,), (data,), ("os","elixir.session", "elixir.metadata", "elixir.setup_all",)) #This is not

    print 'Done.'
    print "Time elapsed: ", time.time() - start_time, "s"
    job_server.print_stats()

main()

Calling cmdLaunch() directly, outside pp, works fine. But when I call it via pp, it doesn't seem to do anything, and there is no error either!

Has anyone already used pp and SQLAlchemy together?

Thank you very much

Kevin


Title: Re: pp and popen()
Post by: kfrancoi on December 19, 2008, 07:01:45 AM
The SQLAlchemy question is resolved for me now: I was simply using the Elixir module incorrectly.

But I still have a question about my os.popen() command. When I try to parallelize a very simple function, the elapsed time doesn't look parallel (30 sec in both cases):

Code:
import os
import sys
import pp
import time

def cmdLaunch(letter):

    #Launch command
    res = os.popen("sleep 4")
    #Wait and read the empty line ;-)
    res.readline()
    return letter

def main():

    result = []
    #Tuple of all parallel python servers to connect with
    ppservers = ()
    ncpus = 8
    #Create jobserver with an explicit number of workers (ncpus)
    job_server = pp.Server(ncpus, ppservers=ppservers)
    print "Starting pp risland with", job_server.get_ncpus(), "workers"

    start_time = time.time()

    for letter in ['A','B','C','D','E','F','G','H']:
        job = job_server.submit(cmdLaunch,(letter,), (), ("os",))
        result.append(job())

    for r in result:
        print r

    print 'Done.'
    print "Time elapsed: ", time.time() - start_time, "s"
    job_server.print_stats()

main()

Is it possible to parallelize the popen() command with pp?

Thanks


Title: Re: pp and SQLAlchemy
Post by: Vitalii on December 24, 2008, 07:46:45 AM
Calling job() requests the result of the job, and that call blocks until the job has finished.
You need to submit all the jobs first, and only then request the results:
Code: (python)
import os
import sys
import pp
import time

def cmdLaunch(letter):

    #Launch command
    res = os.popen("sleep 4")
    #Wait and read the empty line ;-)
    res.readline()
    return letter

def main():

    result = []
    #Tuple of all parallel python servers to connect with
    ppservers = ()
    ncpus = 8
    #Create jobserver with an explicit number of workers (ncpus)
    job_server = pp.Server(ncpus, ppservers=ppservers)
    print "Starting pp risland with", job_server.get_ncpus(), "workers"

    start_time = time.time()

    for letter in ['A','B','C','D','E','F','G','H']:
        job = job_server.submit(cmdLaunch,(letter,), (), ("os",))
        #Keep the job handle; do not call it yet
        result.append(job)

    for r in result:
        #Calling the handle blocks until that job's result is ready
        print r()

    print 'Done.'
    print "Time elapsed: ", time.time() - start_time, "s"
    job_server.print_stats()

main()
 
And now it works as expected:
Quote
Starting pp risland with 8 workers
A
B
C
D
E
F
G
H
Done.
Time elapsed:  4.03840208054 s
Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
         8 |        100.00 |      28.1278 |     3.515970 | local
Time elapsed since server creation 4.03878498077
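
For readers without pp installed, the same "submit everything first, collect afterwards" idea can be sketched with the standard library. This is only an illustration under stated assumptions: it swaps pp for concurrent.futures.ThreadPoolExecutor, replaces the "sleep 4" command with a short time.sleep(), and the names cmd_launch, run_blocking, run_parallel and DELAY are made up for this example rather than taken from the thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: a short delay stands in for the 4-second "sleep 4" command.
DELAY = 0.2


def cmd_launch(letter):
    # Simulate a slow external command, then return the input unchanged.
    time.sleep(DELAY)
    return letter


def run_blocking(letters, workers):
    # Anti-pattern: asking for each result right after submitting it
    # waits for that job to finish before the next one is submitted.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.time()
        results = [pool.submit(cmd_launch, l).result() for l in letters]
        return results, time.time() - start


def run_parallel(letters, workers):
    # Submit all jobs first, then collect the results in a second pass.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.time()
        futures = [pool.submit(cmd_launch, l) for l in letters]
        results = [f.result() for f in futures]
        return results, time.time() - start


if __name__ == "__main__":
    letters = list("ABCDEFGH")
    for fn in (run_blocking, run_parallel):
        results, elapsed = fn(letters, workers=8)
        print(fn.__name__, results, "%.2f s" % elapsed)
```

With 8 workers and 8 jobs, the blocking variant should take roughly 8 × DELAY because the jobs run one after another, while the submit-then-collect variant should take about one DELAY. That mirrors the roughly 30 s versus 4 s difference reported in the thread.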