Python 实现Excel对比

工作当中,发现用户在测试过程当中都是手工检查两个Excel文件的数据是否一致,有的都有好几万行,非常耗时也可能会遗漏一些数据。

中间也试了Spreadsheet Compare这个工具,有时候对比达不到想要的效果,就想着用Python写一个小程序来实现两个Excel的数据对比,思路:

  1. 用户不懂技术,实现一个简单界面让用户去选择文件
  2. 对比数据,把结果写到一个新Excel里,并标注不同的数据

工具实现

通过tkinter和pandas来实现,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
from tkinter import *
from tkinter.filedialog import *
import pandas as pd

root = Tk()
root.title("Compare Excel")
root.geometry('600x250')


def select_compare_file():
name = askopenfilename()
file1_text.set(name)

Button(root, text="Select Compare File", command=select_compare_file).pack()
file1_text = StringVar()
file1 = Entry(root, textvariable=file1_text, width=150)
file1_text.set(" ")
file1.pack()


Label(root, text="").pack()


def select_to_file():
a = askopenfilename()
file2_text.set(a)
Button(root, text="Select To File", command = select_to_file).pack()
file2_text = StringVar()
file2 = Entry(root, textvariable=file2_text, width=150)
file2_text.set(" ")
file2.pack()


Label(root, text="").pack()

def compare_excel(compare, to):
f1 = pd.read_excel(compare)
f2 = pd.read_excel(to)

# Perform Diff
dfDiff = f1.copy()
for row in range(dfDiff.shape[0]):
for col in range(dfDiff.shape[1]):
value_OLD = f1.iloc[row, col]
try:
value_NEW = f2.iloc[row, col]
if value_OLD == value_NEW:
dfDiff.iloc[row, col] = f2.iloc[row, col]
else:
dfDiff.iloc[row, col] = ('{}→{}').format(value_OLD, value_NEW)
except:
dfDiff.iloc[row, col] = ('{}→{}').format(value_OLD, 'NaN')

# Save output and format
writer = pd.ExcelWriter('compare_result.xlsx', engine='xlsxwriter')

dfDiff.to_excel(writer, sheet_name='DIFF', index=False)

# get xlsxwriter objects
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)

# define formats
highlight_fmt = workbook.add_format({'font_color': '#FF0000', 'bg_color': '#6A93B0'})

# set format over range
## highlight changed cells
worksheet.conditional_format('A1:ZZ1000', {'type': 'text',
'criteria': 'containing',
'value': '→',
'format': highlight_fmt})
# save
writer.save()
print('Compare Done.')


def on_click():
f1 = file1_text.get()
f2 = file2_text.get()

compare_excel(f1, f2)

Label(root, text="Compare done, please see the result: 'compare_result.xlsx'").pack()


Button(root, text="Start Compare", command=on_click).pack()
root.mainloop()

验证

  1. 准备对比文件

    • 文件1

      1
      2
      3
      4
      5
      学号	姓名	年龄	年级
      1 tom 12 1
      2 jim 33 8
      3 Green 21 2
      4 Hello 11 1
    • 文件2

      1
      2
      3
      4
      5
      No.	Name	Age	Grade
      1 Tom 12 1
      2 Jimmy 23 8
      3 Green 21 2
      4 Hello 11 1
  2. 运行程序,选择文件,对比

  3. 查看结果

进一步

用户不会直接用命令行运行脚本,把脚本封装成一个bat文件,直接双击使用。

唐胡璐 wechat
欢迎您扫一扫上面的微信公众号,订阅我的博客!
分享创造价值,您的支持将鼓励我继续前行!