Scraping from a price element which has multiple classes (spaces between class names)
If the price HTML element you find has multiple classes in its class="" attribute, you need to understand a bit more about how CSS selectors work.
Spaces mean one thing inside an HTML class attribute and something quite different inside a CSS selector used to match elements.
If you’re looking at HTML then spaces in the class attribute mean this element has multiple classes.
If you’re looking at CSS then spaces in the CSS selectors indicate hierarchies.
Here’s an example:
<span class="ProductMeta__Price Price Text--subdued u-h4">£15</span>
The class attribute in the raw HTML above contains spaces, which means the element has four separate classes (ProductMeta__Price, Price, Text--subdued and u-h4), not one big single class. Spaces are not allowed in class names.
So – you might be tempted to use CSS selectors to identify your prices like this:
.ProductMeta__Price Price Text--subdued u-h4
Don’t do that – what the above technically means is:
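Match elements with the .ProductMeta__Price class, then <Price> elements within those, then <Text--subdued> elements within those, then <u-h4> elements within those. Because the last three names have no leading dot, CSS treats them as tag names rather than classes. No such tags exist on the page, so that will never work.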
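Instead, you could use this:
.ProductMeta__Price.Price.Text--subdued.u-h4
The above means: match elements that have the .ProductMeta__Price class and the .Price class and the .Text--subdued class and the .u-h4 class. Each class name gets its own leading dot, and there are no spaces between them.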
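If you copy class attributes out of page source often, the conversion from attribute value to selector can be scripted. Here is a minimal Python sketch using only the standard library. The function name class_attr_to_selector is made up for this illustration and is not part of any scraping tool. It also assumes the class names contain only letters, digits, hyphens and underscores; names with other characters (a colon, for example) would need CSS escaping, which this sketch does not do.

```python
def class_attr_to_selector(class_attr: str) -> str:
    """Turn the value of an HTML class attribute into a compound CSS selector."""
    # In HTML, whitespace separates class names, so split() gives one item per class.
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class attribute is empty")
    # In CSS, chaining .a.b with no spaces means "has class a AND class b".
    return "".join("." + name for name in class_names)


selector = class_attr_to_selector("ProductMeta__Price Price Text--subdued u-h4")
print(selector)  # .ProductMeta__Price.Price.Text--subdued.u-h4
```

The key design point is that the class names are joined with dots and nothing else: adding a space anywhere in the result would turn it back into a hierarchy selector.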
When you’re using CSS to match elements, spaces instead mean hierarchy. For example:
.products .price
Would mean: match elements with the .products class (e.g. <div class="products">), then within those elements match elements with the .price class (e.g. <div class="price">).
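To make the hierarchy rule concrete, here is a rough Python sketch of what a selector with a space between two classes does: it only accepts a .price element when some element above it in the page has the .products class. This is an illustration, not how a real scraper or browser implements CSS. The DescendantFinder class and the sample HTML are invented for the example, it handles only the two-class case, and it assumes the HTML has no unclosed tags such as <br> or <img>.

```python
from html.parser import HTMLParser


class DescendantFinder(HTMLParser):
    """Collect the text of elements with inner_class nested inside outer_class."""

    def __init__(self, outer_class: str, inner_class: str):
        super().__init__()
        self.outer_class = outer_class
        self.inner_class = inner_class
        self.stack = []    # one (class_names, is_match) pair per open element
        self.matches = []  # text found inside each matching element

    def handle_starttag(self, tag, attrs):
        class_names = (dict(attrs).get("class") or "").split()
        # The space in ".outer .inner" means: some ancestor has the outer class.
        has_outer_ancestor = any(self.outer_class in names for names, _ in self.stack)
        is_match = self.inner_class in class_names and has_outer_ancestor
        self.stack.append((class_names, is_match))
        if is_match:
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text that sits inside a matching element.
        if any(is_match for _, is_match in self.stack):
            self.matches[-1] += data


sample_html = """
<div class="products">
  <div class="price">£15</div>
</div>
<div class="price">£99</div>
"""

finder = DescendantFinder("products", "price")
finder.feed(sample_html)
finder.close()
print(finder.matches)  # ['£15']
```

The £99 price is skipped because its element has the price class but is not inside anything with the products class.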