A Worked Example of Extracting Specific Data from a Web Page

2019-11-14 09:19:32


Contributed by: a reader

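BeautifulSoup lets us locate the specific data we want in a web page by way of the page's tags. This example traces, step by step, how the HTML file is transformed into the data we want. The Python program is as follows: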

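from bs4 import BeautifulSoup
import requests

url = 'http://new.cpc.com.tw/division/mb/oil-more4.aspx'
html = requests.get(url).text

# Wrap the raw HTML in an object that can be parsed
bs = BeautifulSoup(html, 'html.parser')
#print(bs)

# Find the <span> whose id is 'Showtd'; it holds the data table
data = bs.find_all('span', {'id': 'Showtd'})
#print(data)

# Collect every table row inside that span
rows = data[0].find_all('tr')
#print(rows)

prices = list()
for i, row in enumerate(rows):
    if i < 16:
        print(row)  # show the first 16 raw rows
    cols = row.find_all("td")
    # Skip rows with fewer than four cells (e.g. header rows) to avoid an IndexError,
    # and rows whose second cell is empty
    if len(cols) >= 4 and len(cols[1].text) > 0:
        item = [cols[0].text, cols[1].text, cols[2].text, cols[3].text]
        prices.append(item)

# Show the first 16 extracted records
for p in prices[:16]:
    print(p)

Now follow how the contents of the variables change to see the steps involved in extracting specific data: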
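1. BeautifulSoup(html, 'html.parser') wraps the HTML file in an object that can be parsed. Uncommenting print(bs) displays the text that this object corresponds to.
2. Working on the parsable object bs, find_all('span', {'id': 'Showtd'}) finds the content of the <span></span> tag with that id, which holds the data table. Uncommenting print(data) displays it.
3. From that data table, find_all('tr') picks out the items marked by <tr></tr> tags, giving the list of table rows stored in rows.
4. Each row's <td></td> cells are then extracted in turn, which yields the final data: the lists collected in prices.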
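The steps above can also be tried without network access. The following is a minimal sketch, assuming a small hand-written HTML snippet in place of the live page: the markup, the column names, and every value in SAMPLE_HTML are made up for illustration, and the real page's structure may differ. The helper name extract_prices is likewise hypothetical and not part of the original program.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
# All dates and numbers below are invented for illustration only.
SAMPLE_HTML = """
<html><body>
<span id="Showtd">
  <table>
    <tr><th>Date</th><th>Product A</th><th>Product B</th><th>Product C</th></tr>
    <tr><td>2019/11/11</td><td>26.1</td><td>27.6</td><td>29.6</td></tr>
    <tr><td>2019/11/04</td><td></td><td>27.8</td><td>29.8</td></tr>
    <tr><td>2019/10/28</td><td>26.5</td><td>28.0</td><td>30.0</td></tr>
  </table>
</span>
</body></html>
"""


def extract_prices(html):
    """Return a list of four-item rows taken from the span with id 'Showtd'."""
    # Step 1: wrap the HTML text in a parsable object
    bs = BeautifulSoup(html, 'html.parser')
    # Step 2: find the <span> that holds the data table
    data = bs.find_all('span', {'id': 'Showtd'})
    if not data:
        return []
    # Step 3: collect the <tr> rows inside that span
    rows = data[0].find_all('tr')
    # Step 4: pull the <td> cells out of each row
    prices = []
    for row in rows:
        cols = row.find_all('td')
        # Header rows contain <th> rather than <td>, so cols is empty for them;
        # rows whose second cell is blank are skipped as in the original program
        if len(cols) >= 4 and len(cols[1].text.strip()) > 0:
            prices.append([c.text.strip() for c in cols[:4]])
    return prices


prices = extract_prices(SAMPLE_HTML)
for p in prices:
    print(p)
```

With this sample, the header row and the row with a blank second cell are dropped, so only two records are printed. Returning an empty list when the span is missing is a design choice of this sketch; the original program would raise an IndexError on data[0] in that case.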