Monte Lunacek
Thomas Hauser
University of Colorado Boulder
https://github.com/mlunacek/data_science_meetup_2013

Henry Neeman, Supercomputing in Plain English: https://bit.ly/17KaVl3
| MacBook | Janus | Ratio |
|---|---|---|
| 2.4 GHz Intel × 2 cores | 2.8 GHz Intel × 12 cores × 1360 nodes | ~8000× |
| 8 GB RAM | 24 GB RAM per node | ~4000× |
| 3 MB cache | 12 MB cache | |
Traditional approaches:
- Shared memory: OpenMP
- Distributed memory: MPI
- Accelerators: OpenACC, CUDA
- Hybrid combinations of the above

New approaches:
- MapReduce
- Message brokers: AMQP, ZeroMQ (sketched below)
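Of the new approaches, the message-broker style is easy to sketch with pyzmq: a minimal PUSH/PULL pair in a single process (the port and payload here are arbitrary choices, not from the talk):

```python
import zmq

ctx = zmq.Context()
sender = ctx.socket(zmq.PUSH)        # distributes tasks downstream
sender.bind("tcp://127.0.0.1:5557")  # port is an arbitrary choice
receiver = ctx.socket(zmq.PULL)      # collects tasks
receiver.connect("tcp://127.0.0.1:5557")

sender.send_json({"task": 1, "time": 5})
print(receiver.recv_json())          # {'task': 1, 'time': 5}
```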
Ioan Raicu, Many-Task Computing: Bridging the Gap between High Throughput Computing and High Performance Computing

"Applications that are communication-intensive but are not naturally expressed in MPI." (Ioan Raicu)
Andy Terrel, Getting Started with Python in HPC
- ~500,000 simulations on ~7,000 cores with mpi4py
- Parameter optimization on ~100 cores with Scoop and DEAP
- Improved a biological workflow with IPython Parallel (sketched after this list)
- Wrapped an engineering simulation with f2py and IPython Parallel
- Packaged multiple MPI tasks with Jinja2
- Benchmarking: mpi4py, pandas, Jinja2, Django
- Working on MapReduce with Disco and Spark
- Working on workflows with NetworkX, IPython Parallel, and Scoop
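Several of these projects used the same IPython Parallel pattern. A minimal sketch with the 2013-era API (it assumes a cluster was started first, e.g. with `ipcluster start -n 4`; the package is now called ipyparallel):

```python
from IPython.parallel import Client

def work(x):
    import time
    time.sleep(x)
    return x

rc = Client()                    # connect to the running ipcluster
view = rc.load_balanced_view()   # dynamic, fault-tolerant scheduling
results = view.map_sync(work, [5, 3, 8, 2])
print(results)                   # [5, 3, 8, 2]
```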
```python
import sys
import argparse
import numpy as np
import tables as tb

def get_args(argv):
    # Parse paths to the two input matrices (flag names assumed;
    # the original slide did not show get_args for this script)
    parser = argparse.ArgumentParser()
    parser.add_argument('-a', '--matrixA', help='HDF5 file holding matrix A')
    parser.add_argument('-b', '--matrixB', help='HDF5 file holding matrix B')
    return parser.parse_args(argv)

def read_h5(filename):
    # Read the array stored at node 'x' (PyTables 2.x API)
    h5 = tb.openFile(filename, mode="r")
    X = h5.root.x.read()
    h5.close()
    return X

if __name__ == '__main__':
    args = get_args(sys.argv[1:])
    A = read_h5(args.matrixA)
    B = read_h5(args.matrixB)
    C = np.dot(A, B)
```
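To exercise the script end to end, the two HDF5 inputs have to exist. A hedged companion sketch (write_h5, A.h5, and B.h5 are illustrative names, using the same PyTables 2.x API as read_h5):

```python
import numpy as np
import tables as tb

def write_h5(filename, X):
    # Store X at node 'x', where read_h5 above expects to find it
    h5 = tb.openFile(filename, mode="w")
    h5.createArray(h5.root, 'x', X)
    h5.close()

write_h5('A.h5', np.random.rand(500, 500))
write_h5('B.h5', np.random.rand(500, 500))
```

The multiply script can then be run with `--matrixA A.h5 --matrixB B.h5`.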
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Only the root rank builds the object
    data = {'key1': [7, 2.72, 3.2],
            'key2': ('abc', 'xyz')}
else:
    data = None

# Pickle-broadcast the object from rank 0 to every rank
data = comm.bcast(data, root=0)
```
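To launch the broadcast on several ranks, something like the following (the filename bcast.py is hypothetical):

```bash
mpiexec -n 4 python bcast.py
```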
```python
#!/usr/bin/env python
# work.py: a stand-in task that sleeps for a given number of seconds
import argparse
import sys
import time

def get_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-t', '--time', help='e.g. --time=50')
    return parser.parse_args(argv)

def work(x):
    time.sleep(x)   # placeholder for real computation
    return x

if __name__ == '__main__':
    args = get_args(sys.argv[1:])
    work(float(args.time))
```
What is the most efficient way to execute work a thousand times?

```bash
python work.py --time=10
```
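On a single node, one simple answer is a multiprocessing pool. A minimal sketch, assuming work.py above sits in the working directory (12 processes mirrors one Janus node's core count; any count works):

```python
from multiprocessing import Pool
from work import work          # the work() function defined above

if __name__ == '__main__':
    pool = Pool(processes=12)              # one worker per core
    results = pool.map(work, [10] * 1000)  # run the task 1000 times
    pool.close()
    pool.join()
```

On a cluster, the traditional answer is pbsdsh with a wrapper script: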
```bash
#!/bin/bash
# wrapper.sh: executed once per core by pbsdsh
PATH=$PBS_O_WORKDIR:$PBS_O_PATH
TRIAL=$(($PBS_VNODENUM + $1))   # unique task index for this core
python work.py --time=5
```

```bash
# Job script: launch rounds of tasks, 12 (one node's cores) at a time
count=0
for i in $(seq 1 $N)
do
    pbsdsh wrapper.sh $count
    count=$(($count + 12))
done
```
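Hand-writing these wrapper and job scripts is one reason Jinja2 appears in the experience list above. A minimal sketch of templating a PBS script (the template text itself is illustrative, not from the talk):

```python
from jinja2 import Template

pbs = Template("""#!/bin/bash
#PBS -l nodes={{ nodes }}:ppn=12
#PBS -l walltime={{ walltime }}
cd $PBS_O_WORKDIR
python work.py --time={{ time }}
""")

print(pbs.render(nodes=2, walltime='00:10:00', time=5))
```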
All of this shell plumbing is a little painful.

Weak-scaling comparison of the options:
| Tool | Notes |
|---|---|
| mpi4py | many cores, not fault-tolerant |
| IPython | 100 cores, fault-tolerant |
| IPython | many cores, fault-tolerant, consistent task times |
| Celery | many cores, fault-tolerant, variable task times |
| multiprocessing | single node only |
| IPython, Celery | launching |
| all | user abstraction |
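The Celery row above, recommended for variable task times, looks roughly like this. A hedged sketch: the broker URL, the module name tasks.py, and the result handling are assumptions, not details from the talk:

```python
# tasks.py -- start workers with: celery -A tasks worker
import time
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def work(x):
    time.sleep(x)   # placeholder for real computation
    return x

# From a client that can reach the broker:
#   from tasks import work
#   r = work.delay(5)   # queue the task on the broker
#   r.get()             # -> 5 (needs a result backend configured)
```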
- Python is an excellent way to manage MTC jobs
- Python provides great abstraction
- IPython and Celery are good options moving forward
Monte Lunacek et al., "Scaling of Many-Task Computing Approaches in Python on Cluster Supercomputers," IEEE Cluster 2013.

Tutorials: University of Colorado Computational Science and Engineering
Slides and code: https://github.com/mlunacek/data_science_meetup_2013