2012年5月31日 星期四

[Python]讀寫csv檔教學

Python自從2.3版之後就有提供csv module讓使用者去讀寫csv格式的檔案。簡單的說,csv(comma-separated values)就是一種data之間用特殊符號隔開的資料表示格式,通常是用逗號(,)隔開。
比如說下列是從證交所網站抓下來的csv檔範例:
日期,成交股數,成交金額,成交筆數,發行量加權股價指數,漲跌點數
 101/01/02,"2,104,640,545","50,472,201,894","497,889","6,952.21",-119.87
 101/01/03,"2,640,781,030","64,138,886,194","623,192","7,053.38",101.17
 101/01/04,"2,983,756,821","75,845,629,353","728,417","7,082.97",29.59
 101/01/05,"3,062,757,248","76,850,752,698","737,780","7,130.86",47.89
 101/01/06,"3,314,147,435","88,101,921,548","823,810","7,120.51",-10.35
 101/01/09,"2,610,433,477","71,323,370,369","634,137","7,093.04",-27.47
 101/01/10,"3,595,611,551","97,168,542,017","869,327","7,178.87",85.83
 101/01/11,"3,322,390,467","88,786,671,892","802,539","7,188.21",9.34
 101/01/12,"2,969,248,375","78,947,910,460","731,328","7,186.58",-1.63
 101/01/13,"3,497,838,901","99,286,437,370","819,762","7,181.54",-5.04
 101/01/16,"3,184,795,667","76,842,611,338","719,725","7,103.62",-77.92
 101/01/17,"3,720,277,205","101,133,309,290","899,597","7,221.08",117.46
 101/01/18,"4,088,756,925","113,988,641,382","1,020,420","7,233.69",12.61
 101/01/30,"4,540,373,544","143,900,658,474","1,113,727","7,407.41",173.72
 101/01/31,"5,486,734,180","162,361,181,834","1,283,951","7,517.08",109.67
這種資料格式被大量用在試算表以及資料庫。例如大多數的資料庫都可以讓使用者把table用csv格式匯出。這些csv檔可以很輕易的用Python內建的csv module來做讀寫。
以下是一個範例程式示範如何讀寫上面範例中的大盤指數csv檔:
# -*- coding: utf-8 -*-
import csv
f = open('example.csv', 'r')
for row in csv.reader(f):
    print row
f.close()
[' 101/01/02', '2,104,640,545', '50,472,201,894', '497,889', '6,952.21', '-119.87']
[' 101/01/03', '2,640,781,030', '64,138,886,194', '623,192', '7,053.38', '101.17']
[' 101/01/04', '2,983,756,821', '75,845,629,353', '728,417', '7,082.97', '29.59']
[' 101/01/05', '3,062,757,248', '76,850,752,698', '737,780', '7,130.86', '47.89']
[' 101/01/06', '3,314,147,435', '88,101,921,548', '823,810', '7,120.51', '-10.35']
[' 101/01/09', '2,610,433,477', '71,323,370,369', '634,137', '7,093.04', '-27.47']
[' 101/01/10', '3,595,611,551', '97,168,542,017', '869,327', '7,178.87', '85.83']
[' 101/01/11', '3,322,390,467', '88,786,671,892', '802,539', '7,188.21', '9.34']
[' 101/01/12', '2,969,248,375', '78,947,910,460', '731,328', '7,186.58', '-1.63']
[' 101/01/13', '3,497,838,901', '99,286,437,370', '819,762', '7,181.54', '-5.04']
[' 101/01/16', '3,184,795,667', '76,842,611,338', '719,725', '7,103.62', '-77.92']
[' 101/01/17', '3,720,277,205', '101,133,309,290', '899,597', '7,221.08', '117.46']
[' 101/01/18', '4,088,756,925', '113,988,641,382', '1,020,420', '7,233.69', '12.61']
[' 101/01/30', '4,540,373,544', '143,900,658,474', '1,113,727', '7,407.41', '173.72']
[' 101/01/31', '5,486,734,180', '162,361,181,834', '1,283,951', '7,517.08', '109.67']
此程式會先把csv檔打開,之後透過csv.reader()把每一行的內容用逗號切開,回傳一個list。其實有點像是split,但是cvs.reader()會幫你把" "處理掉。這是單純用split沒辦法做到的事情。
csv module還有提供其他好用的功能。比如說可以幫你把資料parsing成dictionary的格式,使用第一列當作dictionary的key。
# -*- coding: utf-8 -*- 
import csv
f = open('example.csv', 'r')
for row in csv.DictReader(f):
    print row['成交金額']
f.close()
50,472,201,894
64,138,886,194
75,845,629,353
76,850,752,698
88,101,921,548
71,323,370,369
97,168,542,017
88,786,671,892
78,947,910,460
99,286,437,370
76,842,611,338
101,133,309,290
113,988,641,382
143,900,658,474
162,361,181,834
也可以自己指定key的名稱:
# -*- coding: utf-8 -*-
import csv
f = open('example.csv', 'r')
for row in csv.DictReader(f, ["日期", "成交股數", "成交金額", "成交筆數", "指數", "漲跌點數"]):
    print row['指數']
發行量加權股價指數
6,952.21
7,053.38
7,082.97
7,130.86
7,120.51
7,093.04
7,178.87
7,188.21
7,186.58
7,181.54
7,103.62
7,221.08
7,233.69
7,407.41
7,517.08
發行量加權股價指數的資料被對應到"指數"這個我們自行命名的key,由此可知csv.DictReader會根據你所提供list的順序來自己對應到資料中。
要寫把資料存成csv格式可以使用csv.writer(),以下是個簡單的範例:
data = [
  [' 101/01/31', '5,486,734,180', '162,361,181,834', '1,283,951', '7,517.08', '109.67'],
  [' 101/01/13', '3,497,838,901', '99,286,437,370', '819,762', '7,181.54', '-5.04'],
]
f = open("stock.csv","w")
w = csv.writer(f)
w.writerows(data)
f.close()

4 則留言:

  1. cool post
    useful to me~

    回覆刪除
  2. 請問最後writerows是兩個list要怎麼寫呢 並且要有欄位名稱
    例如
    label=[1,2,3,4,5]
    num=[10,20,30,40,50]
    輸出成
    label num
    1 10
    2 20
    3 30
    .....
    謝謝

    回覆刪除
  3. 請問讀寫中文資料該怎麼寫呢?
    我都會出現亂碼
    存成CSV黨都會亂碼

    回覆刪除