|
If you have serial code, e.g. a Matlab .m file, and an 8-core workstation sitting in your lab, I bet you are already looking for a way to speed up your work. Here, I show you how to use threading in Python as an easy alternative to other methods such as Nvidia's CUDA.
Let's get started.
Let's say you have 250x8 jobs to do, for example segmenting a series of TIFF images. With simple threading, you can start 8 threads at a time, one per core, and repeat that 250 times. This method has a problem: it is not flexible or robust enough for scientific computing (or online data extraction). What if one of the threads is stuck waiting for a response, or the number of repeats is not known in advance?
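To make the fixed-batch idea concrete, here is a minimal sketch of it. The names run_in_batches and fake_segment are my own stand-ins (the second one replaces real segmentation), so treat this as an illustration rather than a recommended design.

```python
from threading import Thread

def run_in_batches(jobs, batch_size, func):
    """Run func(job) for every job, batch_size threads at a time."""
    results = [None] * len(jobs)

    def run_one(index):
        results[index] = func(jobs[index])

    for start in range(0, len(jobs), batch_size):
        stop = min(start + batch_size, len(jobs))
        batch = [Thread(target=run_one, args=(k,)) for k in range(start, stop)]
        for t in batch:
            t.start()
        # The whole batch waits here for its slowest thread.
        for t in batch:
            t.join()
    return results

def fake_segment(job):
    # hypothetical stand-in for real image segmentation
    return job * 2

results = run_in_batches(list(range(250 * 8)), 8, fake_segment)
print("processed %d jobs" % len(results))
```

The second inner loop is the weak spot: no new job starts until all 8 threads of the current batch have returned, so one slow job leaves the other 7 threads idle.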
Instead, I will use Queue to get around the problems mentioned above. A pool of worker threads pulls tasks from the Queue, and each worker takes the next task as soon as it has finished the previous one. Think of a team of 5 workers with 2000 boxes to load into a truck: once a worker has put a box inside the truck, they are handed the next box to carry.
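The box-loading picture translates almost word for word into code. This is only a sketch: carry_boxes, truck and loaded_by are names I made up for the analogy, and the try/except import is there so it runs on both Python 2 and Python 3.

```python
from threading import Thread, Lock
try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2, as in the rest of this post

NUM_WORKERS = 5
NUM_BOXES = 2000

box_queue = Queue()
truck = []                      # boxes that have been loaded
loaded_by = [0] * NUM_WORKERS   # how many boxes each worker carried
truck_lock = Lock()

def carry_boxes(worker_id, q):
    while True:
        box = q.get()           # blocks until a box is available
        with truck_lock:
            truck.append(box)
            loaded_by[worker_id] += 1
        q.task_done()           # this box is done, ready for the next one

for w in range(NUM_WORKERS):
    t = Thread(target=carry_boxes, args=(w, box_queue))
    t.daemon = True             # let the program exit while workers wait
    t.start()

for box in range(NUM_BOXES):
    box_queue.put(box)

box_queue.join()                # wait until every box has been loaded
print("boxes in truck: %d" % len(truck))
```

How the boxes are split between workers varies from run to run, because whoever is free takes the next one. Keep in mind that CPython runs Python bytecode in one thread at a time, so this pattern pays off mostly when the jobs wait on I/O or spend their time in C extensions that release the global interpreter lock.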
Problem definition: we have 120 TIFF images, each containing 250 slices. We want to apply a simple filter to each slice.
import os
from Queue import Queue  # Python 2 module name (renamed to "queue" in Python 3)
from threading import Thread
import time
Next, we initialize some variables:
startIndex = 0
endIndex = 120
# Set up max number of threads
num_fetch_threads = 10
enclosure_queue = Queue()
Then, we initialize some worker threads and let them wait for jobs.
os.system('clear')
# Set up some threads to fetch the enclosures
for i in range(num_fetch_threads):
    worker = Thread(target=doMyTaskFunction, args=(i, enclosure_queue,))
    worker.setDaemon(True)
    worker.start()
We start the outer loop over the 120 images:
for i in range(startIndex, endIndex):
    print "\n Parsing image no: %d\n" % (i+1)
Then, inside the outer loop, we go over the slices in each image and put one job per slice on the Queue; the waiting worker threads pick the jobs up and apply the filter function. Whatever you put on the Queue (here, something that identifies the file and the slice) is what the worker function receives:
    # still inside the outer loop over i
    for j in range(0, 250):
        # put the job on the queue; a free worker thread picks it up
        enclosure_queue.put(yourWhatEverVariableToPassToFunction(i,j))
print '*** Main thread waiting'
enclosure_queue.join()
print '*** The page parsing is done ***'
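The placeholder yourWhatEverVariableToPassToFunction(i,j) can be anything the worker needs to find its slice. One simple choice, sketched below, is a (filename, slice index) tuple. The helper enqueue_slices and the image_001.tif naming scheme are my assumptions for illustration, not part of the original script.

```python
try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2

def enqueue_slices(q, start_index, end_index, slices_per_image):
    """Put one (filename, slice_index) job per slice on q; return the job count."""
    count = 0
    for i in range(start_index, end_index):
        filename = "image_%03d.tif" % (i + 1)   # hypothetical naming scheme
        for j in range(slices_per_image):
            q.put((filename, j))
            count += 1
    return count

job_queue = Queue()
total_jobs = enqueue_slices(job_queue, 0, 120, 250)
first_job = job_queue.get()   # what a worker would receive from q.get()
print("queued %d jobs, first job: %r" % (total_jobs, first_job))
```

With 120 images of 250 slices each, this puts 120 x 250 = 30000 jobs on the queue.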
Now, the function that will do the calculations:
def doMyTaskFunction(i, q):
    while True:
        sliceNumber = q.get()
        print 'Thread #%d: Processing %s...\n' % (i+1, sliceNumber)
        # do processing stuff here:
        # apply your filters, or whatever other processing you want
        # once everything is finished, tell the Queue this job is done,
        # then loop around to get the next job
        q.task_done()
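Here is one way the "processing stuff" could be filled in, as a sketch only. The threshold_filter function and the shape of the job (a slice number plus a list of pixel values) are assumptions of mine. The try/finally is the part worth copying: if a job raises and task_done() is never called, join() in the main thread waits forever.

```python
from threading import Thread, Lock
try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2

results = {}
errors = []
results_lock = Lock()

def threshold_filter(pixels, cutoff=128):
    # hypothetical "simple filter": 1 where pixel >= cutoff, else 0
    return [1 if p >= cutoff else 0 for p in pixels]

def filter_worker(i, q):
    while True:
        slice_number, pixels = q.get()
        try:
            filtered = threshold_filter(pixels)
            with results_lock:
                results[slice_number] = filtered
        except Exception as exc:
            # record the failure instead of letting the thread die
            with results_lock:
                errors.append((slice_number, exc))
        finally:
            # always report completion, or q.join() would block forever
            q.task_done()

slice_queue = Queue()
for i in range(4):
    worker = Thread(target=filter_worker, args=(i, slice_queue))
    worker.daemon = True
    worker.start()

slice_queue.put((0, [10, 200, 128]))
slice_queue.put((1, [255, 0, 90]))
slice_queue.put((2, None))   # bad input: iterating None raises TypeError
slice_queue.join()
print("filtered %d slices, %d failed" % (len(results), len(errors)))
```

The bad third job ends up in errors while the other two are filtered normally, and the main thread still gets past join().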
In the final Python file, the order is: first the variable definitions, then the worker function, and finally the thread startup and the loops over the images.
Here is the complete Python file. Enjoy.
#!/usr/bin/python
import os
from Queue import Queue
from threading import Thread
import time

startIndex = 0
endIndex = 120
# Set up max number of threads
num_fetch_threads = 10
enclosure_queue = Queue()

def doMyTaskFunction(i, q):
    while True:
        myvar = q.get()
        print 'Thread #%d: Processing %s...\n' % (i+1, myvar)
        # do processing stuff here
        # (useMYVARforsomething is a placeholder for your own function)
        page = useMYVARforsomething(myvar)
        # tell the Queue this job is done; the thread then loops for the next one
        q.task_done()

#########################
#### Main program #######
#########################
os.system('clear')
# Set up some threads to fetch the enclosures
for i in range(num_fetch_threads):
    worker = Thread(target=doMyTaskFunction, args=(i, enclosure_queue,))
    worker.setDaemon(True)
    worker.start()

for i in range(startIndex, endIndex):
    print "\n Parsing image no: %d\n" % (i+1)
    for j in range(0, 250):
        # sendSomeVariableToFunction is a placeholder for the job data
        enclosure_queue.put(sendSomeVariableToFunction)

print '*** Main thread waiting'
enclosure_queue.join()
print '*** The job is done! ***'
|