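1. Introduction to Concurrent Programming
In modern computer systems, concurrency is key to building efficient, high-performance programs. Put simply, it means making progress on several tasks (processes or threads) during overlapping periods of time. In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at any moment. Multithreaded Python code therefore runs concurrently, with threads taking turns, rather than in parallel. Threads still help with I/O-bound work, because the GIL is released while a thread waits on blocking I/O. For true parallelism on CPU-bound work, use multiple processes, for example through the multiprocessing module or concurrent.futures.ProcessPoolExecutor. Both are part of the standard library.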
```python
import concurrent.futures

import requests


def download_image(url: str) -> str:
    # timeout stops a stalled connection from blocking the worker forever
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # I/O-bound work: while one thread waits on the network, others can run
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)
```
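1.1. Multithreading vs. Multiprocessing
The GIL in CPython ensures that only one thread executes Python bytecode at a time. As a result, threads cannot speed up CPU-bound Python code; their benefit is overlapping waits such as network or disk I/O. To use multiple CPU cores, we can use multiple processes instead, each with its own interpreter and its own GIL. The standard library's concurrent.futures.ProcessPoolExecutor makes this a one-line change from the thread-pool version. Note that downloading images is I/O-bound, so the example below mainly illustrates the API rather than a CPU speedup: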
```python
import concurrent.futures

import requests


# Defined at module level so it can be pickled and sent to worker processes
def download_image(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":  # required when workers are started with "spawn"
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)
```
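Because each process has its own interpreter (and its own GIL), processes can run CPU-bound tasks truly in parallel on multiple cores, which is where they outperform threads. For I/O-bound tasks like these downloads, threads are usually just as fast and cheaper. Processes also cost more than threads: starting them is more expensive, and arguments and results must be serialized (pickled) to move between processes. So avoid creating more processes than you need; by default, ProcessPoolExecutor uses one worker per processor on the machine.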
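Because downloads are I/O-bound, the example above does not really show where processes shine. The sketch below uses a hypothetical CPU-bound function, `count_primes` (naive trial division, chosen only to burn CPU time), and compares a serial run with a ProcessPoolExecutor run. The helper names `run_serial` and `run_parallel` are illustrative, and the actual speedup depends on the number of cores and on process startup cost.

```python
import concurrent.futures
import time


def count_primes(limit: int) -> int:
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def run_serial(limits: list[int]) -> list[int]:
    # One task after another in the current process (one core)
    return [count_primes(limit) for limit in limits]


def run_parallel(limits: list[int]) -> list[int]:
    # Each task runs in a worker process with its own interpreter and GIL
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [100_000] * 4
    start = time.perf_counter()
    serial = run_serial(limits)
    print(f"serial:   {time.perf_counter() - start:.2f}s")
    start = time.perf_counter()
    parallel = run_parallel(limits)
    print(f"parallel: {time.perf_counter() - start:.2f}s")
    assert serial == parallel  # same answers, computed on different cores
```

On a multi-core machine the parallel run is typically faster; with threads instead of processes, the same code would gain little because of the GIL.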
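1.2. Coroutines and Asynchronous I/O
Coroutines and asynchronous I/O are two other important tools for concurrent programming in Python. A coroutine (defined with async def) is a function that can suspend itself at an await and resume later; an event loop switches between coroutines within a single thread. Coroutines are often described as lightweight threads: they use very little memory and are cheap to create and destroy, so a program can run very large numbers of them. They perform well under high concurrency, particularly when most tasks spend their time waiting on I/O. CPU-heavy code inside a coroutine, by contrast, blocks the whole event loop.
Below is an example that downloads the images with asyncio and the third-party aiohttp library: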
```python
import asyncio

import aiohttp


async def download_image(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as resp:
        resp.raise_for_status()  # raise on 4xx/5xx responses
        data = await resp.read()  # read the whole response body
    filename = url.split("/")[-1]
    # This file write is a blocking call; acceptable for small files
    with open(filename, "wb") as file:
        file.write(data)
    return f"{filename} downloaded!"


async def main(urls: list[str]) -> list[str]:
    # Reuse one ClientSession for all requests instead of one per request
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(download_image(session, url) for url in urls))


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # asyncio.run creates the event loop, runs main() and closes the loop
    results = asyncio.run(main(urls))
    for result in results:
        print(result)
```
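Asynchronous I/O is Python's go-to tool for I/O-bound tasks. With traditional synchronous I/O, the calling thread blocks until the operation completes. With asynchronous I/O, a coroutine that is waiting on I/O hands control back to the event loop, which runs other tasks in the meantime. A single thread can therefore keep many I/O operations in flight instead of sitting idle.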
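A network-free sketch of this behavior: `fake_io` is a hypothetical stand-in for an I/O call that uses asyncio.sleep to simulate waiting. Because each coroutine yields to the event loop while it awaits, three 0.2-second waits overlap in one thread, so the total time should be close to 0.2 s rather than 0.6 s (exact timing depends on the machine).

```python
import asyncio
import time


async def fake_io(name: str, delay: float) -> str:
    # Hypothetical stand-in for a network or disk call; awaiting asyncio.sleep
    # suspends this coroutine and lets the event loop run the others.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather runs the coroutines concurrently and returns results in argument order
    results = await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results)
    print(f"{elapsed:.2f}s")
```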
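2. Thread Pools and Process Pools
2.1. Thread Pools
A thread pool reuses a set of worker threads, which avoids the overhead of repeatedly creating and destroying threads and so improves performance. The standard library's concurrent.futures module provides the ThreadPoolExecutor class for this. Below is an example that downloads images with multiple threads using ThreadPoolExecutor: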
```python
import concurrent.futures

import requests


def download_image(url: str) -> str:
    # timeout stops a stalled connection from blocking the worker forever
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() returns results in the same order as urls
        results = executor.map(download_image, urls)
        for result in results:
            print(result)
```
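ThreadPoolExecutor creates a pool of worker threads; if max_workers is not given, a default based on the number of CPUs is used. Each worker takes submitted tasks from a queue and runs them one at a time. Tasks are added to the pool with executor.map or executor.submit and run concurrently across the worker threads.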
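executor.submit returns a Future for each task. Combined with concurrent.futures.as_completed, it lets results be handled in completion order rather than submission order, and lets exceptions be caught per task. A minimal sketch, where `fake_download` and `run` are hypothetical helpers that sleep instead of touching the network:

```python
import concurrent.futures
import time


def fake_download(name: str, delay: float) -> str:
    # Hypothetical stand-in for a blocking I/O call; time.sleep releases the GIL,
    # so other worker threads can run while this one waits.
    time.sleep(delay)
    return f"{name} downloaded!"


def run(tasks: dict[str, float]) -> list[str]:
    finished = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # submit() schedules one call and returns a Future immediately
        future_to_name = {
            executor.submit(fake_download, name, delay): name
            for name, delay in tasks.items()
        }
        # as_completed() yields each Future as soon as it finishes
        for future in concurrent.futures.as_completed(future_to_name):
            name = future_to_name[future]
            try:
                finished.append(future.result())
            except Exception as exc:  # an exception raised in the worker is re-raised here
                finished.append(f"{name} failed: {exc}")
    return finished


if __name__ == "__main__":
    # The fastest task usually finishes first, regardless of submission order
    print(run({"slow.jpg": 0.3, "fast.jpg": 0.05, "medium.jpg": 0.15}))
```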
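2.2. Process Pools
Like a thread pool, a process pool reuses worker processes, reducing the overhead of creating and destroying processes and so improving performance. The concurrent.futures module provides the ProcessPoolExecutor class for this. Below is an example that downloads images with multiple processes using ProcessPoolExecutor: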
```python
import concurrent.futures

import requests


# Defined at module level so it can be pickled and sent to worker processes
def download_image(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":  # required when workers are started with "spawn"
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)
```
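Like ThreadPoolExecutor, ProcessPoolExecutor creates a pool, here of worker processes, each of which runs one task at a time. Because the function and its arguments are sent to the workers by pickling, the function must be defined at module top level. The if __name__ == "__main__": guard is also required on platforms that start workers with spawn, which is the default on Windows and macOS.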
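To see why the function must live at module top level, note that ProcessPoolExecutor pickles the callable, its arguments, and its return value to move them between processes. The check below calls pickle directly to illustrate this; `square` and `is_picklable` are illustrative names, and `is_picklable` is only an approximation of what the executor will accept.

```python
import pickle


def square(x: int) -> int:
    # Module-level function: pickled by reference (module + qualified name)
    return x * x


def is_picklable(obj: object) -> bool:
    # Approximate check for whether obj could be sent to a worker process
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(square))
    print(is_picklable(lambda x: x * x))  # a lambda cannot be pickled by reference
```

A lambda or a function nested inside another function fails this check, and submitting one to a ProcessPoolExecutor fails in the same way.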
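3. Locks
A lock is a synchronization primitive that coordinates access to shared resources by multiple threads or processes. The standard library's threading module provides the Lock class for threads; the multiprocessing module offers multiprocessing.Lock for processes. Below is an example that uses Lock to keep threads in sync: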
```python
import threading

total = 0
lock = threading.Lock()


def add() -> None:
    global total
    for _ in range(1_000_000):
        # `with lock:` acquires the lock and always releases it, even on error
        with lock:
            total += 1


def sub() -> None:
    global total
    for _ in range(1_000_000):
        with lock:
            total -= 1


if __name__ == "__main__":
    # 5 threads increment and 5 threads decrement the shared counter
    threads = [threading.Thread(target=add) for _ in range(5)] + [
        threading.Thread(target=sub) for _ in range(5)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(total)  # 0: every increment is matched by a decrement and no update is lost
```
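Lock guarantees mutually exclusive access to shared resources in a multithreaded program: at most one thread at a time can hold the lock and operate on the resource. Once a thread has acquired the lock, other threads that try to acquire it wait until it is released. Without the lock, total += 1 is a read-modify-write sequence, so threads could interleave and lose updates, leaving a final value other than 0.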
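A common way to package the same idea is to keep the lock next to the data it protects, instead of using a module-level global. The sketch below uses a hypothetical `Counter` class (and a hypothetical `hammer` helper) whose methods take the lock with a `with` statement, so every read and update of the value happens under the lock.

```python
import threading


class Counter:
    """Hypothetical thread-safe counter: every access goes through one lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def add(self, n: int = 1) -> None:
        with self._lock:  # acquire on entry, release on exit (even on error)
            self._value += n

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: Counter, times: int) -> None:
    # Hypothetical worker: increment the shared counter many times
    for _ in range(times):
        counter.add()


if __name__ == "__main__":
    counter = Counter()
    threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)  # 80000
```

Callers never touch the lock directly, which makes it harder to forget to acquire it.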
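4. Summary
Python offers several ways to write concurrent programs, including threads, processes, coroutines, and asynchronous I/O. Thread pools and process pools run multiple tasks across different threads or processes: threads suit I/O-bound work, while processes provide true parallelism for CPU-bound work. Coroutines with asyncio handle large numbers of concurrent I/O operations in a single thread. Locks ensure mutually exclusive access to shared resources in multithreaded code. When writing concurrent code, choose the approach that fits the workload.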