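Python Concurrent Programming

1. Introduction to Concurrent Programming

On modern computer systems, concurrency is key to writing efficient, high-performance programs. Put simply, concurrency means making progress on multiple tasks (processes or threads) during the same period of time. In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at any given moment, so a multithreaded Python program does not run its Python code truly in parallel; its threads take turns instead. Blocking I/O calls release the GIL while they wait, which is why threads still help with I/O-bound work. For true parallelism on CPU-bound work, use multiple processes through the multiprocessing module or the ProcessPoolExecutor class in concurrent.futures; both are part of the standard library, not third-party packages. The following example downloads several images concurrently with a thread pool: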

import concurrent.futures

import requests


def download_image(url: str) -> str:
    # Fetch the image; the timeout keeps a stalled server from blocking forever.
    response = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of saving an error page as an image.
    response.raise_for_status()
    # Use the last path segment of the URL as the local file name.
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # Each URL is handled by a worker thread from the pool.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)
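To see why threads help with I/O-bound work even though only one of them runs Python code at a time, the following sketch replaces the network request with time.sleep, which releases the GIL while waiting much as a blocking socket read does. The names fake_download, run_sequential, and run_threaded are made up for this illustration, and the 0.2-second delay is an arbitrary assumption.

```python
import concurrent.futures
import time


def fake_download(task_id: int) -> int:
    """Hypothetical stand-in for a network request; sleeping releases the GIL."""
    time.sleep(0.2)
    return task_id


def run_sequential(task_ids: list[int]) -> float:
    """Run the tasks one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for task_id in task_ids:
        fake_download(task_id)
    return time.perf_counter() - start


def run_threaded(task_ids: list[int]) -> float:
    """Run the tasks on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(task_ids)) as executor:
        # list() drains the iterator so every task has finished before timing stops.
        list(executor.map(fake_download, task_ids))
    return time.perf_counter() - start


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    print(f"sequential: {run_sequential(ids):.2f}s")
    print(f"threaded:   {run_threaded(ids):.2f}s")
```

The sequential run waits for each task in turn, while the threaded run overlaps the waits, so it should finish in roughly the time of a single task. Replacing the sleep with a pure-Python computation would largely remove that advantage.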

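1.1. Multithreading and Multiprocessing

The GIL (Global Interpreter Lock) in CPython ensures that only one thread executes Python bytecode at any given moment. Threads therefore help mainly by overlapping waiting time, such as blocking I/O, rather than by running Python code on several cores at once. To get around this for CPU-bound work, we can use multiple processes, for example through the standard-library multiprocessing module or concurrent.futures.ProcessPoolExecutor. Multiple processes achieve true parallelism, as the following code shows. It reuses the image download task only to keep the examples comparable; downloads are I/O-bound, so the benefit of processes shows up mainly in CPU-bound work.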

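import concurrent.futures

import requests


def download_image(url: str) -> str:
    # Fetch the image; the timeout keeps a stalled server from blocking forever.
    response = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of saving an error page as an image.
    response.raise_for_status()
    # Use the last path segment of the URL as the local file name.
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


# The __main__ guard matters here: worker processes may re-import this module.
if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # Each URL is handled by a worker process from the pool.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)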

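Because each process has its own interpreter and its own GIL, processes can execute tasks truly in parallel, which is more efficient than multithreading for CPU-bound work. For I/O-bound tasks such as the downloads above, threads usually perform about as well at a lower cost. Processes are also more expensive to create and to switch between than threads, and arguments and results must be pickled to cross process boundaries, so avoid creating too many of them.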
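As a rough illustration of where processes pay off, here is a minimal sketch of a CPU-bound job spread over a process pool whose size is capped at the number of CPU cores. The functions count_primes and pool_size and the chosen limits are assumptions made for this example, not part of the original article.

```python
import concurrent.futures
import os


def count_primes(limit: int) -> int:
    """Count the primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def pool_size(task_count: int) -> int:
    """Use at most one worker per task and at most one worker per CPU core."""
    return max(1, min(task_count, os.cpu_count() or 1))


# The guard keeps worker processes from re-running the pool setup on import.
if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    workers = pool_size(len(limits))
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        # map() returns results in the same order as the inputs.
        for limit, primes in zip(limits, executor.map(count_primes, limits)):
            print(f"{primes} primes below {limit}")
```

Capping max_workers this way is one simple policy; more workers than cores would add switching overhead without adding parallelism for this kind of task.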

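1.2. Coroutines and Asynchronous I/O

Coroutines and asynchronous I/O are two more important tools for concurrent programming in Python. A coroutine is often described as a lightweight thread: it lets a single thread schedule many tasks cooperatively, with each task yielding control at its await points. Coroutines use very little memory and are cheap to create and destroy, which makes them well suited to large numbers of short, I/O-bound tasks and lets them perform well under high concurrency.

Here is an example that uses asyncio coroutines: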

import asyncio

import aiohttp


async def download_image(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as resp:
        resp.raise_for_status()
        # Awaiting the body lets other downloads make progress in the meantime.
        data = await resp.read()
    filename = url.split("/")[-1]
    # This file write is an ordinary blocking call, kept for simplicity.
    with open(filename, "wb") as file:
        file.write(data)
    return f"{filename} downloaded!"


async def main(urls: list[str]) -> list[str]:
    # Share one session across all requests so connections can be reused.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(download_image(session, url) for url in urls))


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # asyncio.run() creates the event loop, runs main(), and closes the loop.
    for result in asyncio.run(main(urls)):
        print(result)

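Asynchronous I/O is a powerful tool for I/O-bound tasks in Python. With traditional synchronous I/O, the calling thread blocks until the I/O operation completes. With asynchronous I/O, the program keeps working on other tasks while the I/O operation is in flight, which improves CPU utilization.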
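The overlap described here can be observed without any network access. In the sketch below, asyncio.sleep stands in for a slow I/O operation; fake_request and its delays are hypothetical names and values chosen only for the demonstration.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network call; await hands control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # gather() runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_request("a", 0.3),
        fake_request("b", 0.2),
        fake_request("c", 0.1),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

All three waits run on a single thread, yet the total time should be close to the longest delay rather than the sum of the delays, because the event loop switches to another coroutine whenever one is waiting.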

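2. Thread and Process Pools

2.1. Thread Pools

A thread pool reuses a fixed set of worker threads, which avoids the overhead of creating and destroying a thread for every task and so improves performance. The standard-library concurrent.futures module provides the ThreadPoolExecutor class for this purpose. Here is an example that uses ThreadPoolExecutor to download images with multiple threads: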

import concurrent.futures

import requests


def download_image(url: str) -> str:
    # Fetch the image; the timeout keeps a stalled server from blocking forever.
    response = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of saving an error page as an image.
    response.raise_for_status()
    # Use the last path segment of the URL as the local file name.
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # The with-block shuts the pool down once all submitted tasks have finished.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)

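ThreadPoolExecutor creates a pool of worker threads, each of which handles one task at a time. Tasks added through executor.map or executor.submit run concurrently on different threads; executor.map yields results in input order, while executor.submit returns a Future object for each task.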
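The examples in this article use executor.map. For comparison, here is a minimal sketch of the executor.submit route combined with concurrent.futures.as_completed, which also makes it easy to handle a failure in one task without losing the others. The process function, its artificial delays, and run_all are assumptions made up for this illustration.

```python
import concurrent.futures
import time


def process(item: int) -> int:
    """Hypothetical task: rejects negatives, sleeps briefly, returns the square."""
    if item < 0:
        raise ValueError(f"negative input: {item}")
    # Smaller items sleep longer, so completion order differs from submission order.
    time.sleep(0.05 * max(0, 3 - item))
    return item * item


def run_all(items: list[int]) -> tuple[dict[int, int], dict[int, str]]:
    """Return (results by item, error messages by item)."""
    results: dict[int, int] = {}
    errors: dict[int, str] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future immediately; keep a map back to the input.
        future_to_item = {executor.submit(process, item): item for item in items}
        # as_completed() yields each Future as soon as it finishes.
        for future in concurrent.futures.as_completed(future_to_item):
            item = future_to_item[future]
            try:
                # result() re-raises any exception raised inside the worker.
                results[item] = future.result()
            except ValueError as exc:
                errors[item] = str(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = run_all([0, 1, 2, -1])
    print(results)
    print(errors)
```

With executor.map, an exception surfaces when iteration reaches the failing item and cuts the loop short; with submit, each Future can be inspected on its own.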

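2.2. Process Pools

Like a thread pool, a process pool reuses its workers, which avoids the overhead of creating and destroying a process for every task and so improves performance. The standard-library concurrent.futures module provides the ProcessPoolExecutor class for this purpose. Here is an example that uses ProcessPoolExecutor to download images with multiple processes: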

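import concurrent.futures

import requests


def download_image(url: str) -> str:
    # Fetch the image; the timeout keeps a stalled server from blocking forever.
    response = requests.get(url, timeout=30)
    # Fail loudly on HTTP errors instead of saving an error page as an image.
    response.raise_for_status()
    # Use the last path segment of the URL as the local file name.
    filename = url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(response.content)
    return f"{filename} downloaded!"


# The __main__ guard matters here: worker processes may re-import this module.
if __name__ == "__main__":
    urls = [
        "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885__480.jpg",
        "https://cdn.pixabay.com/photo/2019/06/02/00/54/sunset-4242011__480.jpg",
        "https://cdn.pixabay.com/photo/2016/12/07/15/50/bleak-1882077__480.jpg",
        "https://cdn.pixabay.com/photo/2016/06/29/08/42/wheat-field-1487863__480.jpg",
    ]
    # The with-block shuts the pool down once all submitted tasks have finished.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(download_image, urls)
        for result in results:
            print(result)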

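Like the thread pool, ProcessPoolExecutor creates a pool of worker processes, each of which handles one task at a time.

3. Locks

A lock is a synchronization primitive that coordinates access to a shared resource by multiple processes or threads. The standard-library threading module provides the Lock class for thread locks. Here is an example that uses Lock to keep threads synchronized: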

import threading

total = 0
lock = threading.Lock()


def add():
    global total
    for _ in range(1000000):
        # The with-statement acquires the lock and always releases it, even on error.
        with lock:
            total += 1


def sub():
    global total
    for _ in range(1000000):
        with lock:
            total -= 1


if __name__ == "__main__":
    # Five threads increment and five decrement, so the final total should be 0.
    threads = [threading.Thread(target=add) for _ in range(5)] + [
        threading.Thread(target=sub) for _ in range(5)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(total)

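Using Lock guarantees mutually exclusive access to shared resources in a multithreaded program: at any given moment, at most one thread can operate on the shared resource. Once a thread has acquired the lock, other threads must wait until that thread releases it.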
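One common way to apply this rule is to keep the lock next to the data it protects, so that callers cannot forget to acquire it. The sketch below is one possible arrangement; the SafeCounter class and the run_workers helper are hypothetical names introduced for this example.

```python
import threading


class SafeCounter:
    """A counter whose updates are serialized by an internal lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        # Only one thread at a time can execute the body of this with-block.
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SafeCounter, workers: int, steps: int) -> int:
    """Start `workers` threads that each add 1 to the counter `steps` times."""

    def work() -> None:
        for _ in range(steps):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(SafeCounter(), workers=4, steps=10_000))
```

Because every read and write goes through the same lock, the final value equals workers times steps regardless of how the threads are interleaved.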

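4. Summary

Python offers several ways to write concurrent programs, including threads, processes, coroutines, and asynchronous I/O. Thread pools and process pools run multiple tasks at the same time across threads or processes, which improves performance. Locks guarantee mutually exclusive access to shared resources in multithreaded code. Choose the approach that fits the workload: threads or asyncio usually suit I/O-bound tasks, and processes suit CPU-bound tasks.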
