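HTML parsing is a common task in Python, typically used for web scraping and for extracting data from web pages. This section shows how to select elements with CSS selectors in BeautifulSoup and PyQuery, and how to locate text that follows an element.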
Using CSS selectors in BeautifulSoup
BeautifulSoup supports a subset of CSS selectors that covers the most commonly used ones. Use the select() method to find multiple elements and select_one() to find a single element.
Basic example:
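from bs4 import BeautifulSoup

# Sample markup: an unordered list whose items share the class "item"
data = """
<ul>
    <li class="item">item1</li>
    <li class="item">item2</li>
    <li class="item">item3</li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# select() returns a list of every element matching the CSS selector
for item in soup.select("li.item"):
    print(item.get_text())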
Prints:
item1
item2
item3
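The select_one() method mentioned above can be sketched in the same way. This is a minimal example, assuming the same list markup as the basic example; the selector "li.missing" is a made-up one that matches nothing, used only to show the result when no element is found.

```python
from bs4 import BeautifulSoup

# Assumed markup, mirroring the basic example above
data = """
<ul>
    <li class="item">item1</li>
    <li class="item">item2</li>
    <li class="item">item3</li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# select_one() returns only the first matching element
first = soup.select_one("li.item")
first_text = first.get_text()

# When nothing matches, select_one() returns None instead of a list
missing = soup.select_one("li.missing")

print(first_text)
print(missing)
```

Because select_one() may return None, it is worth checking the result before calling methods such as get_text() on it.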
PyQuery
pyquery is a jQuery-like library for Python. It has very good support for CSS selectors.
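from pyquery import PyQuery

# Sample markup: a heading followed by a two-column table
html = """
<h1>Sales</h1>
<table id="table">
    <tr><td>Lorem</td><td>46</td></tr>
    <tr><td>Ipsum</td><td>12</td></tr>
    <tr><td>Dolor</td><td>27</td></tr>
    <tr><td>Sit</td><td>90</td></tr>
</table>
"""

doc = PyQuery(html)

# Text of the first <h1> element
title = doc('h1').text()
print(title)

# Iterate over the table rows and print the two cells of each row
rows = doc('#table > tr')
for row in rows:
    name = PyQuery(row).find('td').eq(0).text()
    value = PyQuery(row).find('td').eq(1).text()
    print("%s\t %s" % (name, value))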
Locating text after an element in BeautifulSoup
Imagine you have the following HTML: <label>Name:</label> John Smith
And you need to locate the text “John Smith” after the label element.
In this case, you can locate the label element by its text and then use the .next_sibling property:
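from bs4 import BeautifulSoup

# Sample markup: a label followed directly by a text node
data = """<label>Name:</label> John Smith
"""

soup = BeautifulSoup(data, "html.parser")

# Find the label by its exact text; string= replaces the deprecated text= argument
label = soup.find("label", string="Name:")

# next_sibling is the text node right after the label; strip() trims whitespace
print(label.next_sibling.strip())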
Prints John Smith.
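The .next_sibling approach relies on the text node coming immediately after the label. If another tag sits in between, .next_sibling is a Tag and the call to strip() fails. A more defensive variant might walk .next_siblings until it finds non-empty text. This is only a sketch: the helper name text_after, the wrapping div elements, and the second "City:" row are hypothetical, added to illustrate the case with an intervening tag.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: the second label is followed by a <br> before its text
data = """
<div><label>Name:</label> John Smith</div>
<div><label>City:</label><br> Springfield</div>
"""

soup = BeautifulSoup(data, "html.parser")


def text_after(soup, label_text):
    """Return the first non-empty text node after the matching label, or None."""
    label = soup.find("label", string=label_text)
    if label is None:
        return None
    for sibling in label.next_siblings:
        # Skip tags such as <br>; keep only text nodes with visible content
        if isinstance(sibling, NavigableString) and sibling.strip():
            return sibling.strip()
    return None


print(text_after(soup, "Name:"))
print(text_after(soup, "City:"))
```

Returning None for a missing label or missing text lets the caller decide how to handle incomplete pages.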