Adding Chain-of-Thought Helps GPT-4 Handle Complex Tasks Better

Published 2024-01-31 22:22:43


The previous article explored GPT-4's ability to process multiple images and showed that GPT-4 can perform template matching.


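Image registration is a more common image-processing problem. At its core, registration is also a form of template matching, but it has a wider range of applications and is more complex.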

This article examines GPT-4's image registration capabilities and adds Chain-of-Thought (CoT) prompting so that GPT-4 can handle this more complex task better.

The images used for this test:

Source: https://www.mathworks.com/help/images/registering-an-aerial-photo-to-an-orthophoto.html

Chain-of-Thought

1. Chain-of-Thought Prompting

First, upload the two images: the first is the reference image and the second is the image to be registered. The prompt is as follows:

Note the chain-of-thought cue here: the sentence "Let's think step by step."

With it, GPT-4 analyzes the problem step by step, breaking a complex problem into smaller ones and solving them one at a time.

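When providing multiple images, it is best to state each image's type clearly. For example, here the first image is an 8-bit image and the second is an RGB image.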

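Besides adding the chain-of-thought cue, you can also tell GPT-4 not to jump to conclusions and to do only the first step, loading the images.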
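The original prompt was shown as a screenshot and is not reproduced here. As a rough illustration of the advice above (state each image's role and type, hold GPT-4to the first step, append the CoT cue), the Python sketch below assembles such a prompt as plain text. The helper `build_registration_prompt` and all of its wording are hypothetical; this is not the author's actual prompt.

```python
def build_registration_prompt(images, task, first_step=None):
    """Assemble a chain-of-thought prompt for a multi-image task.

    Hypothetical helper for illustration only.
    images: list of (role, image_type) pairs in upload order.
    """
    lines = []
    for i, (role, image_type) in enumerate(images, start=1):
        # Spell out each image's role and type (e.g. 8-bit vs RGB).
        lines.append(f"Image {i} is the {role} image ({image_type}).")
    lines.append(task)
    if first_step is not None:
        # Keep the model from jumping straight to a conclusion.
        lines.append(f"Do not jump to conclusions; start with step one: {first_step}.")
    # The chain-of-thought cue itself.
    lines.append("Let's think step by step.")
    return "\n".join(lines)


prompt = build_registration_prompt(
    [("reference", "8-bit"), ("to-be-registered", "RGB")],
    "Register the second image to the first image.",
    first_step="load both images",
)
print(prompt)
```

The resulting text can be pasted into the chat together with the two uploaded images.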

GPT-4's first response:

With the chain-of-thought cue, GPT-4 first analyzed the sizes of the input images and then asked me which image to resize.

Without the cue, GPT-4 might have made a mistake with the image sizes at this point.

I told GPT-4 to resize the second image:

GPT-4's second response:

With both images loaded successfully, the next step is registration.

Because of the chain-of-thought cue, GPT-4 first asked whether it should continue and whether I wanted to specify a particular algorithm.

I told GPT-4 to continue with the registration, without specifying an algorithm:

GPT-4's third response:

GPT-4 first listed the common steps of image registration and named some commonly used registration algorithms, such as SIFT.

Since I had not specified an algorithm, GPT-4 decided to try a simple one first:

After running into two errors, GPT-4 successfully transformed the image using SIFT.

Next, you can ask GPT-4 to plot the registered image together with the reference image to check the quality of the registration:

As the plot shows, the registration result is fairly good.
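If you want to run this check yourself, one common option is a false-color composite: put the reference image in the red and blue channels and the registered image in the green channel. Where the two agree the pixel is gray; misaligned edges show up as magenta or green fringes. The sketch below assumes both inputs are 8-bit grayscale arrays of the same size (the warped output in the article is RGB, so it would need converting to grayscale first). The function name `false_color_overlay` is hypothetical, and this is not necessarily how GPT-4 drew its plot.

```python
import numpy as np


def false_color_overlay(reference, registered):
    """Combine two same-sized 8-bit grayscale images into one RGB composite.

    Reference -> red and blue channels, registered -> green channel.
    Equal intensities give gray; differences show up as magenta or green.
    """
    if reference.shape != registered.shape:
        raise ValueError("both images must have the same shape")
    if reference.dtype != np.uint8 or registered.dtype != np.uint8:
        raise TypeError("expected uint8 (8-bit) images")
    return np.dstack([reference, registered, reference])


# Toy check: a bright square, and the same square shifted one pixel right.
ref = np.zeros((4, 4), dtype=np.uint8)
ref[1:3, 1:3] = 200
shifted = np.roll(ref, 1, axis=1)
overlay = false_color_overlay(ref, shifted)
print(overlay[1].tolist())  # [[0, 0, 0], [200, 0, 200], [200, 200, 200], [0, 200, 0]]
```

In the toy row, the overlapping pixel is gray, while the pixels covered by only one of the two squares come out magenta (reference only) or green (registered only).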

Code Analysis

GPT-4 used the SIFT algorithm to extract features and a brute-force matcher to match them, yielding pairs of matched points.
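To make the matching step concrete, here is a NumPy-only sketch of what brute-force k-nearest-neighbour matching (k = 2) followed by Lowe's ratio test does, using Euclidean distance as `cv2.BFMatcher()` does by default. The function name and the toy 2-D descriptors are illustrative assumptions: real SIFT descriptors are 128-dimensional, and this all-pairs version is only meant for small inputs.

```python
import numpy as np


def knn2_ratio_test(desc1, desc2, ratio=0.75):
    """Brute-force 2-nearest-neighbour matching plus Lowe's ratio test.

    desc1: (N, D) query descriptors; desc2: (M, D) train descriptors, M >= 2.
    Returns (query_index, train_index) pairs that pass the test.
    """
    # All pairwise Euclidean distances, shape (N, M).
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    # For each query, the indices of its nearest and second-nearest neighbours.
    nearest_two = np.argsort(dist, axis=1)[:, :2]
    good = []
    for q, (best, second) in enumerate(nearest_two):
        # Keep a match only if it is clearly better than the runner-up.
        if dist[q, best] < ratio * dist[q, second]:
            good.append((q, int(best)))
    return good


train = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([[0.5, 0.0],   # close to train[0] only: distinctive
                  [5.0, 5.0]])  # equidistant from all three: ambiguous
print(knn2_ratio_test(query, train))  # [(0, 0)]
```

The ambiguous query is dropped because its best and second-best distances are equal, which is exactly the kind of unreliable match the 0.75 ratio is meant to filter out.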

These matched points are then used to estimate the transformation matrix.
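As a worked illustration of how matched points determine the transformation, the sketch below uses the textbook Direct Linear Transform (DLT). A homography maps (x, y) to (u, v) with u = (h11 x + h12 y + h13) / (h31 x + h32 y + h33) and likewise for v, so each match gives two linear equations in the nine entries of H, and the solution is the null vector of the stacked system. This assumes exact, outlier-free matches and omits the coordinate normalization a production implementation would add; it is not a description of OpenCV's internal code.

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ [x, y, 1] from N >= 4 point pairs (DLT).

    src, dst: (N, 2) arrays of matched (x, y) coordinates.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u * (h31 x + h32 y + h33) = h11 x + h12 y + h13, likewise for v.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h = right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1


def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


H_true = np.array([[1.0, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 3]], dtype=float)
dst = apply_homography(H_true, src)
H_est = homography_dlt(src, dst)
print(np.allclose(H_est, H_true, atol=1e-6))  # True
```

In the article's code the roles match `cv2.findHomography(points2, points1, ...)`: points from the image being registered play the part of `src`, and reference points play `dst`.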

First, load the images and convert them to NumPy arrays:

from PIL import Image
import numpy as np
import cv2
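# Load the images (ref_image_path and reg_image_path are the file paths of the
# reference image and of the image to be registered; set them beforehand)
ref_image = Image.open(ref_image_path)
reg_image = Image.open(reg_image_path)
# Resize the image to be registered to the reference image's dimensions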
resized_reg_image = reg_image.resize(ref_image.size)
# Convert images to numpy arrays
ref_array = np.array(ref_image)
resized_reg_array = np.array(resized_reg_image)
# Convert the registered image to grayscale (if the reference image is grayscale)
resized_reg_gray = cv2.cvtColor(resized_reg_array, cv2.COLOR_RGB2GRAY)
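SIFT works on a single intensity channel, which is why the RGB image is converted to grayscale above. For reference, OpenCV documents the RGB-to-gray conversion as Y = 0.299 R + 0.587 G + 0.114 B. The NumPy sketch below applies that formula in floating point; the function name is hypothetical, and because OpenCV computes 8-bit conversions with integer arithmetic, individual pixels may differ slightly from this version.

```python
import numpy as np


def rgb_to_gray(rgb):
    """Convert an (H, W, 3) uint8 RGB array to an (H, W) uint8 grayscale array.

    Uses Y = 0.299 R + 0.587 G + 0.114 B, the weights OpenCV documents for
    COLOR_RGB2GRAY. This is a floating-point sketch, not OpenCV's own code.
    """
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


pixels = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(rgb_to_gray(pixels).tolist())  # [[76, 150], [29, 255]]
```

Green dominates the result, so a pure-green pixel ends up almost twice as bright as a pure-red one.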

Next, use SIFT to extract features, match them, and estimate the homography:

# Initialize SIFT detector
sift = cv2.SIFT_create()
# Detect SIFT features and compute descriptors for both images
keypoints1, descriptors1 = sift.detectAndCompute(ref_array, None)
keypoints2, descriptors2 = sift.detectAndCompute(resized_reg_gray, None)
# Initialize Matcher and match descriptors
bf = cv2.BFMatcher()
matches = bf.knnMatch(descriptors1, descriptors2, k=2)
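# Apply Lowe's ratio test to keep only distinctive matches; knnMatch can
# return fewer than two neighbours for a descriptor, so check the length first
good_matches = [pair[0] for pair in matches
                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]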
# Extract location of good matches
points1 = np.float32([keypoints1[m.queryIdx].pt for m in good_matches])
points2 = np.float32([keypoints2[m.trainIdx].pt for m in good_matches])
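# Estimate the homography mapping image 2 onto image 1; RANSAC discards
# outlier matches, and at least 4 matches are required
if len(good_matches) < 4:
    raise RuntimeError("Not enough good matches to estimate a homography")
H, _ = cv2.findHomography(points2, points1, cv2.RANSAC)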
# Warp the (resized) image to be registered into the reference frame;
# dsize is given as (width, height)
registered_transformed = cv2.warpPerspective(resized_reg_array, H, (ref_array.shape[1], ref_array.shape[0]))
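`cv2.findHomography(..., cv2.RANSAC)` does more than fit a matrix: some matches survive the ratio test but are still wrong, and RANSAC rejects them. The NumPy sketch below shows the basic RANSAC idea on synthetic data, reusing the textbook DLT fit. It assumes a fixed iteration count and no coordinate normalization; the 3.0-pixel threshold mirrors the default `ransacReprojThreshold` of `cv2.findHomography`. OpenCV's own implementation differs in its details (for example, it adapts the number of iterations), and all function names here are hypothetical.

```python
import numpy as np


def homography_dlt(src, dst):
    """Least-squares DLT estimate of H from N >= 4 pairs (dst ~ H @ src)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project(H, pts):
    """Map (N, 2) points through H in homogeneous coordinates."""
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def ransac_homography(src, dst, thresh=3.0, iters=500, seed=0):
    """Fit dst ~ H @ src while ignoring wrong matches (basic RANSAC)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    # Degenerate samples can yield inf/nan; such candidates simply get no inliers.
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iters):
            # 1. Fit a candidate H to a random minimal sample of 4 matches.
            idx = rng.choice(len(src), size=4, replace=False)
            H = homography_dlt(src[idx], dst[idx])
            # 2. Count matches whose reprojection error is below the threshold.
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh
            # 3. Keep the candidate with the largest consensus set.
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        raise ValueError("too few inliers to estimate a homography")
    # 4. Refit on all inliers of the best candidate.
    return homography_dlt(src[best], dst[best]), best


rng = np.random.default_rng(1)
H_true = np.array([[1.02, 0.05, 8.0],
                   [-0.03, 0.98, -5.0],
                   [1e-4, -5e-5, 1.0]])
src = rng.uniform(0, 100, size=(30, 2))
dst = project(H_true, src)
dst[:6] += rng.uniform(20, 40, size=(6, 2))  # corrupt 6 matches: outliers
H_est, inliers = ransac_homography(src, dst)
print(inliers[:6].any(), inliers[6:].all())  # False True
```

The six corrupted matches are pushed at least 20 pixels off along each axis, far beyond the threshold, so they end up outside the consensus set and do not influence the final fit.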


Summary of GPT-4's Image Registration Capabilities

SUMMARY

To sum up, GPT-4 shows the following characteristics when performing image registration: