Python: Optimization Facilities
Optimization Principles
- TBP: Target, Benchmark, Profile
- Have target numbers in mind (e.g. SLAs that need be met)
- Don’t optimize without benchmarking
- Ensure benchmarks are taken against realistic data
- Don’t optimize without profiling
Tools
Profiling
strace / dtruss
Profile system calls. dtrace is a Mac alternative to strace. dtruss wraps dtrace.
HTTP Server Stressing
wrk
Simple, provides useful baseline data about app performance
wrk --connections=3 --threads=3 --script=wrk-geo.lua https://localhost:8080/dist
Alternatives
- Apache ab
- vegeta
- Locust
Caching
memcached, Redis
Bloom filter
Used to determine if an item is part of a set or not. A space-efficient, probabilistic data structure. Provides a “definitely not” or “probably yes” response.
Motivation
Common problem in streaming applications: Handling of the same event more than once. “Exactly once” delivery is difficult to achieve. Most systems settle on “at least once” delivery.
pybloom-live
Provides a BloomFilter
implementation
Distributing Work: Concurrency and Parallelism
Limits to parallelism: Amdahl’s Law
Formula for the theoretical limits of speed increase or latency as a result of the parallel computing.
\[ S(N) = \frac{1}{(1-P) + (P/N)} \]
cPython’s Global Interpreter Lock (GIL)
Implication is that a single Python process can only use a single core.
I/O-bound vs CPU-bound programs
I/O-bound programs
Performance-constrained by input/output, such as network or disk Not significantly impacted by GIL (why?)
CPU-bound programs
Performance-constrained by expensive computations. Significantly impacted by GIL (why?). Means we have to reach across the process boundary to overcome CPU constraints on performance and get good parallelism.
Usage / Rules of Thumb
Threads
- Use when I/O-bound. Traditional drivers and networking.
Processes
- Use when CPU-bound. Not a lot of comms between processes.
Examples
ETL (Extract, Transform, Load)
Service gets HTTP server logs, stores them in a database. Service extracts some fields from logs and inserts the lines into the database.
celery
Uses a job queue called a broker Supports Redis and RabbitMQ
Writing Servers
C10K Problem
Context-switching is expensive
Async I/O
Great for I/O-bound programs. Handle several tasks with one thread.
Tools / Implementations
asyncio
module
- Twisted
- Tornado web server