Calculating BLEU Scores for Neural Machine Translation with Python - 猿码集

Neural Machine Translation and BLEU Scores

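Natural language processing (NLP) is now used in many real-world applications, and machine translation is one of the most popular of them. Neural machine translation (NMT) is an approach to machine translation that uses neural networks to produce translations. BLEU is a widely used metric for evaluating the quality of automatic machine translation. In this article, we will learn how to compute BLEU scores for neural machine translation with Python.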

What is the BLEU score?

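BLEU rests on one assumption: the closer a machine translation is to a human (reference) translation, the better its quality. The BLEU score puts a number on that closeness. BLEU stands for Bilingual Evaluation Understudy. It is a metric based on n-gram matching. It measures how many of the candidate translation's n-grams also appear in one or more reference translations, and it penalizes candidates that are too short.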

What steps does computing a BLEU score involve?

Word counting

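Count how many times each word appears in the translation. Then find the maximum number of times that word appears in any single reference translation. A candidate word is credited at most that many times, a step called clipping. The counting can be done with the Counter class from Python's collections module. Here is the Python code for counting words: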


from collections import Counter
# Reference translation
ref_sentence = 'the cat is on the mat'
ref_counts = Counter(ref_sentence.split())
# Machine translation output
translation = 'the cat is on the mat'
trans_counts = Counter(translation.split())
print(ref_counts)    # Counter({'the': 2, 'cat': 1, 'is': 1, 'on': 1, 'mat': 1})
print(trans_counts)  # Counter({'the': 2, 'cat': 1, 'is': 1, 'on': 1, 'mat': 1})
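The example above compares two identical sentences, so clipping never comes into play. Below is a minimal sketch of clipped (modified) unigram precision. It uses a degenerate candidate that repeats "the" seven times. Without clipping, that candidate would score a perfect precision of 7/7. The function name `clipped_unigram_precision` is our own illustrative choice and is not part of any library.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Modified unigram precision: each candidate word is credited at most as
    many times as it appears in any single reference."""
    cand_counts = Counter(candidate.split())
    # For every word, the maximum count over all references
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # Clip each candidate count by the reference maximum, then divide by the candidate length
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


refs = ["the cat is on the mat", "there is a cat on the mat"]
print(clipped_unigram_precision("the the the the the the the", refs))  # 2/7, about 0.2857
```

"the" occurs at most twice in a single reference, so only 2 of the 7 candidate words are credited.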

n-gram counting

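In this step we count the n-grams in the translation. For each one, we also find the maximum number of times it appears in a reference translation, and we clip the translation's counts to those maxima. Counter again does the counting. The following code counts trigrams (n = 3). It zips the token list with copies of itself shifted by one and by two positions: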


# Reference translation: trigram counts
ref_counts = Counter(zip(ref_sentence.split(), ref_sentence.split()[1:], ref_sentence.split()[2:]))
# Machine translation output: trigram counts
trans_counts = Counter(zip(translation.split(), translation.split()[1:], translation.split()[2:]))
print(ref_counts)    # Counter({('the', 'cat', 'is'): 1, ('cat', 'is', 'on'): 1, ('is', 'on', 'the'): 1, ('on', 'the', 'mat'): 1})
print(trans_counts)  # Counter({('the', 'cat', 'is'): 1, ('cat', 'is', 'on'): 1, ('is', 'on', 'the'): 1, ('on', 'the', 'mat'): 1})
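The zip trick above is hard-wired to trigrams. The sketch below generalizes it to any order n and clips against several references. It returns the two quantities BLEU needs for each order: the clipped match count and the total number of candidate n-grams. `ngram_counts` and `clipped_ngram_matches` are illustrative helper names of our own, not NLTK functions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_ngram_matches(candidate, references, n):
    """Return (clipped matches, total candidate n-grams) for order n."""
    cand = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        max_ref |= ngram_counts(ref, n)  # Counter union keeps the per-n-gram maximum
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped, sum(cand.values())


hyp = "the cat is on the carpet".split()
refs = ["the cat is on the mat".split()]
for n in range(1, 5):
    print(n, clipped_ngram_matches(hyp, refs, n))
# 1 (5, 6)
# 2 (4, 5)
# 3 (3, 4)
# 4 (2, 3)
```

Counter's `|` operator takes the element-wise maximum. That makes it a convenient fit for "maximum count over all references".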

BLEU formula

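The final step combines the counts into a BLEU score. For each n-gram order n = 1, …, N (usually N = 4), we compute a clipped precision p_n. BLEU is the weighted geometric mean of these precisions, multiplied by a brevity penalty:


  $$BLEU = \exp\left(\min\left(0,\ 1 - \frac{ref\_len}{trans\_len}\right)\right) \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad p_n = \frac{match\_counts_n + smooth}{trans\_counts_n + smooth}$$

where:

ref_len: the length of the reference translation. With several references, the reference length closest to trans_len is normally used.

trans_len: the length of the translation

match_counts_n: the number of order-n n-grams in the translation that also appear in the reference translations, after clipping

trans_counts_n: the total number of order-n n-grams in the translation

w_n: the weight of order n, typically 1/N each (0.25 for N = 4)

smooth: a small smoothing term that keeps p_n from being 0 (whose logarithm is undefined) when an order has no matches. Unsmoothed BLEU uses smooth = 0.

The first factor is the brevity penalty. It equals 1 when the translation is at least as long as the reference and drops below 1 when the translation is shorter.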
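To make the formula concrete, here is a from-scratch sketch of sentence-level BLEU. It computes a brevity penalty from ref_len and trans_len, then multiplies it by the weighted geometric mean of clipped n-gram precisions built from the match and translation counts, with the optional smooth term. The sketch makes three assumptions. First, inputs are already tokenized. Second, ref_len is the reference length closest to trans_len, with ties going to the shorter one. Third, smooth is a plain add-a-constant term. NLTK's SmoothingFunction methods use different schemes, so smoothed scores will not always match NLTK. `sentence_bleu_sketch` is a hypothetical name.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(candidate, references, weights=(0.25, 0.25, 0.25, 0.25), smooth=0.0):
    """BLEU for one tokenized candidate against a list of tokenized references."""
    trans_len = len(candidate)
    if trans_len == 0:
        return 0.0
    # ref_len: reference length closest to the candidate length (ties -> shorter)
    ref_len = min((abs(len(r) - trans_len), len(r)) for r in references)[1]
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue  # orders with zero weight do not contribute
        cand = ngram_counts(candidate, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)
        match = sum(min(c, max_ref[g]) for g, c in cand.items())  # clipped matches
        total = sum(cand.values())                                 # candidate n-grams
        p_n = (match + smooth) / (total + smooth) if total + smooth > 0 else 0.0
        if p_n == 0:
            return 0.0  # log(0) is undefined: unsmoothed BLEU collapses to zero
        log_sum += w * math.log(p_n)
    brevity_penalty = math.exp(min(0.0, 1 - ref_len / trans_len))
    return brevity_penalty * math.exp(log_sum)


hyp = "the cat is on the carpet".split()
refs = ["the cat is on the mat".split()]
print(round(sentence_bleu_sketch(hyp, refs), 4))                          # 0.7598
print(round(sentence_bleu_sketch(hyp, refs, weights=(1, 0, 0, 0)), 4))   # 0.8333
```

For this pair the clipped precisions are 5/6, 4/5, 3/4 and 2/3, and both sentences have six tokens, so the brevity penalty is 1. BLEU-4 is therefore (1/3)^(1/4), about 0.7598.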

How do you compute BLEU scores in Python?

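In Python, we can compute BLEU scores with the NLTK (Natural Language Toolkit) library. Before using it, install nltk and download punkt, NLTK's pretrained tokenizer data. punkt is needed by nltk.word_tokenize. The BLEU functions themselves work on text that is already tokenized.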


!pip install nltk
import nltk
nltk.download('punkt')

Next, we will use the nltk library to compute BLEU scores.

Importing the library and data

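We will use data from the European Parliament proceedings corpus (Europarl) to demonstrate the process. First, let's look at the two files in the dataset: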

europarl-v7.de-en.de

europarl-v7.de-en.en

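The .de file contains the German text. The .en file contains the corresponding English text, aligned with the German file line by line. The English sentences come from the English version of the same parliamentary proceedings, not from a machine translation system. That means they can serve as references when scoring a German-to-English system. To keep things short, the example below does not read these files and uses a few toy English sentences instead. We will compute the BLEU score with the bleu_score submodule of nltk, importing corpus_bleu and the other functions we need. Let's write the example code: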


import nltk
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Reference translations: for each source sentence, a list of acceptable references
actual_translation = [["the cat is on the mat", "the cat is on the carpet"],
                      ["the dog is in the yard", "the dog is in the yard"],
                      ["the bird is flying in the sky", "the bird is flying in the air"]]
# Machine translation output: one hypothesis per source sentence
predicted_translation = ["the cat is on the carpet", "the dog is in the yard", "the bird is flying in the sky"]

# corpus_bleu works on token lists, not raw strings, so split every sentence into words
references = [[ref.split() for ref in refs] for refs in actual_translation]
hypotheses = [hyp.split() for hyp in predicted_translation]

# Compute the BLEU score (unigram precision only)
score = corpus_bleu(references, hypotheses, weights=(1, 0, 0, 0),
                    smoothing_function=SmoothingFunction().method3)
print(f"BLEU Score: {score}")

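The weights parameter sets the weight w_n of each n-gram order, and therefore also decides which orders are used. NLTK's default is (0.25, 0.25, 0.25, 0.25), which is standard BLEU-4. The smoothing_function parameter handles orders with no matching n-grams, which would otherwise drive the score to 0. In the example above we used weights=(1, 0, 0, 0), so only unigram precision counts. To take bigrams into account as well, set weights to (0.5, 0.5, 0, 0).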
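The sketch below shows how the choice of weights changes the result. It evaluates only the precision part of BLEU, exp(Σ w_n log p_n), and leaves out the brevity penalty. The four precisions are the clipped n-gram precisions of "the cat is on the carpet" against "the cat is on the mat", worked out by hand. `weighted_geometric_mean` is a hypothetical helper name.

```python
import math


def weighted_geometric_mean(precisions, weights):
    """exp(sum of w_n * log p_n): the precision part of BLEU (brevity penalty omitted)."""
    total = 0.0
    for p, w in zip(precisions, weights):
        if w == 0:
            continue  # zero-weight orders are ignored, even if their precision is 0
        if p == 0:
            return 0.0
        total += w * math.log(p)
    return math.exp(total)


# Clipped precisions p1..p4 for "the cat is on the carpet" vs. "the cat is on the mat"
precisions = [5 / 6, 4 / 5, 3 / 4, 2 / 3]
for weights in [(1, 0, 0, 0), (0.5, 0.5, 0, 0), (0.25, 0.25, 0.25, 0.25)]:
    print(weights, round(weighted_geometric_mean(precisions, weights), 4))
# (1, 0, 0, 0) 0.8333
# (0.5, 0.5, 0, 0) 0.8165
# (0.25, 0.25, 0.25, 0.25) 0.7598
```

Higher-order precisions are usually lower than unigram precision. Spreading the weight over more orders therefore tends to lower the score, and it rewards translations that also get the word order right.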

Using neural machine translation with BLEU scores

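We can also use neural models alongside BLEU scores to help improve the quality of translation output. InferSent is a sentence-embedding model from Facebook Research: it encodes each sentence as a vector, and it is not itself a machine translation library. We can use InferSent to find the reference sentence that is semantically closest to each translation, and then compute BLEU scores with different weights against those references.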


# The InferSent code is published in the facebookresearch/InferSent GitHub repository.
# This example assumes its models.py is importable and that the encoder weights
# (https://dl.fbaipublicfiles.com/infersent/infersent1.pkl) are saved under encoder/.
import numpy as np
import torch
import nltk
from nltk.cluster.util import cosine_distance
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from models import InferSent  # models.py from the InferSent repository

nltk.download('punkt')  # tokenize=True makes InferSent tokenize with nltk.word_tokenize

# Load the pretrained InferSent encoder (version 1 was trained with GloVe vectors)
MODEL_PATH = 'encoder/infersent1.pkl'
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': 1}
model = InferSent(params_model)
model.load_state_dict(torch.load(MODEL_PATH))
# set_w2v_path expects a local word-vector file (assumed location here), not the model URL
model.set_w2v_path('GloVe/glove.840B.300d.txt')

# Reference sentences and machine translation output
references = ['the cat is on the mat', 'the dog is in the yard', 'the bird is flying in the sky']
translations = ['the cat is on the carpet', 'the dog is in the yard', 'the bird is flying in the sky']

# Build the vocabulary over all sentences, then embed references and translations
model.build_vocab(references + translations, tokenize=True)
ref_emb = model.encode(references, tokenize=True)
hyp_emb = model.encode(translations, tokenize=True)

# Cosine distance matrix: rows = translations, columns = references
distances = np.array([[cosine_distance(h, r) for r in ref_emb] for h in hyp_emb])
# Index of the closest reference for each translation
closest_refs = np.argmin(distances, axis=1)

# corpus_bleu expects tokens: one list of references per hypothesis
list_of_references = [[references[i].split()] for i in closest_refs]
hypotheses = [t.split() for t in translations]

# Compute the BLEU score for different weights
smoothing = SmoothingFunction().method3
weights = [(1, 0, 0, 0), (0.5, 0.5, 0, 0), (0.33, 0.33, 0.33, 0), (0.25, 0.25, 0.25, 0.25)]
for weight in weights:
    score = corpus_bleu(list_of_references, hypotheses, weights=weight,
                        smoothing_function=smoothing)
    print(f"BLEU Score (weights={weight}): {score:.4f}")
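Running the InferSent code above requires downloading the model weights and word vectors. The matching logic itself needs neither. The dependency-free sketch below uses made-up 3-dimensional vectors as stand-ins for InferSent sentence embeddings; this data is hypothetical. It computes cosine distance in the same way as nltk.cluster.util.cosine_distance (1 minus cosine similarity) and picks, for each translation, the reference with the smallest distance. `closest_reference` is an illustrative name of our own.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)


def closest_reference(hyp_vecs, ref_vecs):
    """For each hypothesis vector, return the index of the nearest reference vector."""
    return [min(range(len(ref_vecs)), key=lambda j: cosine_distance(h, ref_vecs[j]))
            for h in hyp_vecs]


# Hypothetical "sentence embeddings" standing in for InferSent output
ref_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hyp_vecs = [[0.1, 0.9, 0.0], [0.9, 0.2, 0.1]]
print(closest_reference(hyp_vecs, ref_vecs))  # [1, 0]
```

The search runs over references separately for each translation, which corresponds to taking the argmin along the reference axis of a translations-by-references distance matrix.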

Summary

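BLEU is one of the most commonly used quality metrics for machine translation, and it measures how similar an automatic translation is to human translations. In this article, we covered the steps needed to compute a BLEU score and gave examples of computing it with Python. We also showed how BLEU scores can be used alongside neural machine translation to assess and improve translation quality. Finally, we showed how the InferSent sentence-embedding model can be used to compare different translations. We hope this article helps you learn about neural machine translation and BLEU scores.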
