python - Parsing HTML Tables with BeautifulSoup -


I have used BeautifulSoup in the past, but I'm up against something new: incredibly generic/minimal HTML table markup. The goal is to grab each value and its label (each in their own td) and print them out. They can be merged, I don't care; I just want to make sure each label is applied to the correct value. Here is an example table:

<tbody>
<tr>
  <td class="labels">dawn:</td>
  <td class="site_data" style="text-align: left;">07:01</td>
  <td class="labels">sunrise:</td>
  <td class="site_data" style="text-align: left;">07:26</td>
  <td class="labels">moonrise:</td>
  <td class="site_data" style="text-align: left;">14:29</td>
  <td rowspan="3"><img src="images/moon.bmp" alt="moon" width="64" align="left" border="0" height="64" style="margin: 0px 10px" /></td>
</tr>
<tr>
  <td class="labels">dusk:</td>
  <td class="site_data" style="text-align: left;">18:27</td>
  <td class="labels">sunset:&nbsp;</td>
  <td class="site_data" style="text-align: left;">18:02</td>
  <td class="labels">moonset:</td>
  <td class="site_data" style="text-align: left;">01:55</td>
</tr>
<tr>
  <td class="labels">daylight:</td>
  <td class="site_data" style="text-align: left;">11:26</td>
  <td class="labels">day length:</td>
  <td class="site_data" style="text-align: left;">10:36</td>
  <td class="labels">moon phase:</td>
  <td class="site_data" style="text-align: left;">waxing gibbous</td>
</tr>
</tbody>

I know how to grab all of these values:

    # There's more than 1 table on the page, so take the first one.
    for td in soup.findAll('table')[0].findAll('td'):
        print td.renderContents().strip()

But that gives me:

    'dawn:'
    '07:01'
    'sunrise:'
    '07:26'
    'moonrise:'
    '14:29'
    '<img src="images/moon.bmp" alt="moon" width="64" align="left" border="0" height="64" style="margin: 0px 10px" />'
    'dusk:'
    '18:27'
    'sunset:&nbsp;'
    '18:02'
    'moonset:'
    '01:55'
    'daylight:'
    '11:26'
    'day length:'
    '10:36'
    'moon phase:'
    'waxing gibbous'

I guess I could grab onto the class values "labels" and "site_data", but how do I make sure the labels and data are grouped correctly?

The following should be simpler and easier to follow:

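    import pprint
    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup(doctxt)  # doctxt holds the HTML source
    groupeddata = []
    for row in soup.findAll("tr"):
        data = {}
        alltds = row.findAll("td")
        # Cells alternate label, value; step by two and pair each label
        # with the cell after it. A trailing unpaired cell (the image) is skipped.
        for x in range(0, len(alltds)-1, 2):
            data[alltds[x].renderContents().strip()] = alltds[x+1].renderContents().strip()
        groupeddata.append(data)

    pprint.pprint(groupeddata)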

Output:

    [{'dawn:': '07:01', 'moonrise:': '14:29', 'sunrise:': '07:26'},
     {'dusk:': '18:27', 'moonset:': '01:55', 'sunset:&nbsp;': '18:02'},
     {'day length:': '10:36',
      'daylight:': '11:26',
      'moon phase:': 'waxing gibbous'}]
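
The pairing step in the loop above can also be looked at on its own. The following is only a rough sketch, not part of the original answer: `pair_cells` is a hypothetical helper name, and it assumes the cell contents of one row have already been extracted into a flat list of strings. It shows why stepping by two up to `len(cells) - 1` skips the extra image cell in the first row, and it additionally tidies the labels (trailing colon, `&nbsp;`), which the answer's code does not do.

```python
def pair_cells(cells):
    """Pair a flat list of cell texts laid out as [label, value, label, value, ...].

    A trailing unpaired cell (such as the image cell in the first row) is
    ignored, mirroring range(0, len(alltds)-1, 2) in the answer.
    pair_cells is a hypothetical helper, not a BeautifulSoup function.
    """
    pairs = {}
    for i in range(0, len(cells) - 1, 2):
        # Tidy the label: turn non-breaking spaces (raw entity or decoded
        # character) into plain spaces, trim whitespace, drop the colon.
        label = cells[i].replace("&nbsp;", " ").replace("\xa0", " ").strip().rstrip(":")
        pairs[label] = cells[i + 1].strip()
    return pairs


# Cell contents of the first two rows of the example table.
# The first row has seven cells because it ends with the image cell.
row1 = ["dawn:", "07:01", "sunrise:", "07:26", "moonrise:", "14:29",
        '<img src="images/moon.bmp" />']
row2 = ["dusk:", "18:27", "sunset:&nbsp;", "18:02", "moonset:", "01:55"]

print(pair_cells(row1))
print(pair_cells(row2))
```

With the newer bs4 package, the list for each row could presumably be built with something like `[td.get_text() for td in row.find_all("td")]`. In that case `&nbsp;` arrives already decoded as `\xa0`, which is why the sketch handles both forms.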
