Many-Task Computing with Python

Monte Lunacek
Thomas Hauser

University of Colorado Boulder

https://github.com/mlunacek/data_science_meetup_2013
https://bit.ly/17KaVl3

Outline

  1. High Performance Computing
    • Types of applications
    • Defining many-task computing
  2. Why Python for HPC?
    • Key packages for scientific computing
    • Abstraction
  3. Many-task computing with Python
    • Scaling
  4. Conclusions and questions

Goals

  1. Understand the landscape of HPC
  2. Links to key packages
  3. Tools
    • Notebook
    • Easy parallelism
  4. Lots of examples
    • Learn something useful
    • Download and try it!

High Performance Computing

Using applications to solve problems

  1. Size
    • Solve problems that can't fit on a laptop
    • Need more than a few GB of RAM
    • Need more than a few hundred GB of disk
  2. Speed
    • Solve the same problem, faster
    • Make a bigger problem feasible

(Henry Neeman, Supercomputing in Plain English)

Supercomputing

[Image: InfiniBand cluster interconnect]
  • Definition changes daily!
  • Cluster of computers linked together
  • 100x bigger, faster, better than a PC

Janus vs. MacBook

[Image: the Janus supercomputer]

          MacBook            Janus (per node)     Janus (total, 1360 nodes)
CPU       2.4 GHz Intel x2   2.8 GHz Intel x12    ~8000x the cores
RAM       8 GB               24 GB                ~4000x the RAM
Cache     3 MB               12 MB

Parallel Computing

Traditional
  • Shared memory: OpenMP
  • Distributed: MPI
  • Accelerator: OpenACC, CUDA
  • Hybrid combinations of the above

New
  • MapReduce
  • Message brokers: AMQP, ZeroMQ

The newer approaches:
  • Solve a different problem
  • Offer a different set of characteristics

Landscape of Applications

Ioan Raicu, Many-Task Computing: Bridging the Gap between High Throughput Computing and High Performance Computing

Comparing MTC and HTC

"Applications that are communication-intensive but are not naturally expressed in MPI." (Ioan Raicu)


Python for High Performance Computing

Efficiency
  • PyPy
  • numba
  • f2py
  • cython

Parallel
  • IPython Parallel
  • celery
  • scoop
  • and many more...

(Andy Terrel, Getting Started with Python in HPC)

Success Stories

~500,000 simulations on ~7,000 cores with mpi4py

Parameter optimization on ~100 cores with Scoop and DEAP

Improved biological workflow with IPython Parallel

Wrapped an engineering simulation with f2py and IPython Parallel

Packaged multiple MPI tasks with Jinja2

Benchmarking: mpi4py, pandas, Jinja2, Django

Working on MapReduce with Disco and Spark

Working on workflows with NetworkX, IPython Parallel, and Scoop

Why Python?

NumPy (built against MKL) and PyTables

import sys
import argparse
import numpy as np
import tables as tb

def get_args(argv):
    # the two HDF5 input paths
    parser = argparse.ArgumentParser()
    parser.add_argument('--matrixA')
    parser.add_argument('--matrixB')
    return parser.parse_args(argv)

def read_h5(filename):
    # read the matrix stored at /x in an HDF5 file
    h5 = tb.openFile(filename, mode="r")   # tb.open_file in newer PyTables
    X = h5.root.x.read()
    h5.close()
    return X

def write_h5(filename, C):
    # write the result matrix to /c
    h5 = tb.openFile(filename, mode="w")
    h5.createArray(h5.root, 'c', C)        # tb.create_array in newer PyTables
    h5.close()

if __name__ == '__main__':
    args = get_args(sys.argv[1:])
    A = read_h5(args.matrixA)
    B = read_h5(args.matrixB)
    C = np.dot(A, B)                       # MKL-backed matrix multiply
    write_h5('matrixC.h5', C)              # illustrative output path

2 reads, multiply, 1 write

mpi4py

# broadcast a Python object from rank 0 to every rank
# run with e.g.: mpiexec -n 4 python bcast.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # only the root rank builds the data
    data = {'key1': [7, 2.72, 3.2],
            'key2': ('abc', 'xyz')}
else:
    data = None

# every rank returns with its own copy of data
data = comm.bcast(data, root=0)
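
Broadcast is a collective operation; for many independent tasks, mpi4py can also serve as a simple task farm. The sketch below is hypothetical (it is not the benchmark code from the paper): rank 0 scatters one chunk of task parameters to each rank, every rank works through its chunk, and rank 0 gathers the results.

# task_farm.py -- a minimal, hypothetical mpi4py task farm
# run with e.g.: mpiexec -n 4 python task_farm.py
from mpi4py import MPI
import time

def work(x):
    # stand-in task: sleep for x seconds
    time.sleep(x)
    return x

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    tasks = [0.1 * i for i in range(100)]           # illustrative parameters
    chunks = [tasks[i::size] for i in range(size)]  # one chunk per rank
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)   # each rank receives its chunk
local = [work(x) for x in chunk]       # run the tasks locally
results = comm.gather(local, root=0)   # collect all chunks on rank 0

if rank == 0:
    flat = [r for c in results for r in c]
    print("%d tasks completed" % len(flat))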

Many-Task Computing

#!/usr/bin/env python
import argparse, sys, time

def get_args(argv):
    parser = argparse.ArgumentParser()  
    parser.add_argument('-t','--time', help='e.g. --time=50')
    return parser.parse_args(argv)

def work(x):
    # simulate a task that takes x seconds
    time.sleep(x)
    return x

if __name__ == '__main__':

    args = get_args(sys.argv[1:])
    work(float(args.time))

What is the most efficient way to execute work() a thousand times?

python work.py --time=10
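
On a single node, the standard library's multiprocessing module is the simplest answer (it appears in the comparison later as the single-node option). A minimal sketch; the pool size and task list are illustrative:

# run work() a thousand times in parallel on one node
from multiprocessing import Pool
import time

def work(x):
    # stand-in task: sleep for x seconds
    time.sleep(x)
    return x

if __name__ == '__main__':
    tasks = [0.1] * 1000            # a thousand small tasks
    pool = Pool(processes=12)       # e.g. one worker per core
    results = pool.map(work, tasks)
    pool.close()
    pool.join()
    print("%d tasks completed" % len(results))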

Approach

  • Scheduler: Condor / Moab / SLURM
  • Bash with pbsdsh: a painful example
  • MPI (mpi4py): not always what we want
  • Python message queues (celery, IPython Parallel)
  • and many, many more...

Bash/pbsdsh

#!/bin/bash
# wrapper.sh: runs on every core that pbsdsh reaches;
# PBS_VNODENUM plus the offset in $1 gives each copy a unique trial id
PATH=$PBS_O_WORKDIR:$PBS_O_PATH
TRIAL=$(($PBS_VNODENUM + $1))
python work.py --time=5

#!/bin/bash
# launch script: one pbsdsh round per iteration, advancing the
# offset by the 12 cores used in each round
count=0
for i in $(seq 1 $N)    # N rounds of tasks
do
  pbsdsh wrapper.sh $count
  count=$(($count + 12))
done

A little painful

A little inefficient

Message queues

Trade-offs: elasticity, memory, fault-tolerance

Examples: celery (AMQP) and IPython Parallel (ZeroMQ), sketched below.
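
As a concrete illustration, here is a minimal IPython Parallel sketch using the 2013-era IPython.parallel API (the package is called ipyparallel in current releases). It assumes a cluster was already launched, e.g. with ipcluster start -n 12:

# distribute work() over a running IPython cluster
from IPython.parallel import Client    # `ipyparallel` in newer releases

def work(x):
    import time     # import on the engine, not just the client
    time.sleep(x)
    return x

rc = Client()                      # connect to the controller
view = rc.load_balanced_view()     # dynamic, fault-tolerant scheduling
result = view.map(work, [0.1] * 1000)
print("%d tasks completed" % len(result.get()))   # block until done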

Examples

Weak Scaling

  • Compare mpi4py, IPython Parallel, and Celery
  • What recommendations can we offer?
  • Best-case scenario
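
For comparison, a Celery version of the same task might look like the sketch below; the broker URL is illustrative, and an AMQP broker (e.g. RabbitMQ) must already be running. Workers are started with something like celery -A tasks worker.

# tasks.py -- a minimal, hypothetical Celery task
from celery import Celery
import time

app = Celery('tasks',
             broker='amqp://localhost//',   # illustrative broker URL
             backend='amqp')                # store results in the broker

@app.task
def work(x):
    # stand-in task: sleep for x seconds
    time.sleep(x)
    return x

# a client then submits tasks asynchronously and collects results:
#   results = [work.delay(0.1) for _ in range(1000)]
#   print(sum(r.get() for r in results))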

Weak scaling holds the work per core fixed, so ideally the total time stays flat as cores are added and efficiency T(1)/T(N) stays near 1.

[Figure: initialization time]

[Figure: weak scaling time]

[Figure: weak scaling efficiency]
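
A one-line helper makes that efficiency definition concrete (the function name is illustrative):

def weak_scaling_efficiency(t1, tn):
    # E(N) = T(1) / T(N): near 1.0 when adding cores does not
    # slow the fixed per-core workload down
    return t1 / tn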

Compare

mpi4py            many cores, not fault-tolerant
IPython Parallel  ~100 cores, fault-tolerant
IPython Parallel  many cores, fault-tolerant, consistent times
Celery            many cores, fault-tolerant, variable times
multiprocessing   single node only

  • IPython Parallel and Celery require launching workers
  • All provide user-level abstraction

Conclusions

Python is an excellent way to manage MTC jobs

Python provides great abstraction

IPython Parallel and Celery are both solid choices

Moving forward

References

Paper:
  Scaling of Many-Task Computing Approaches in Python on Cluster Supercomputers.
  Monte Lunacek et al., IEEE Cluster 2013.

Tutorials:
  University of Colorado Computational Science and Engineering

Slides and code:
  https://github.com/mlunacek/data_science_meetup_2013

Thanks!