Web scraping with BeautifulSoup: How I got my favorite soccer players' stats from fotmob

Using web scraping to get relevant information quickly

I'm taking a Data Engineering professional certificate path offered by IBM and Coursera. During the Python for Data Science, AI & Development course, I learned about web scraping with the Beautiful Soup library.

What is BeautifulSoup?

Years ago, I did some web scraping with Excel and VBA, but I had never tried web scraping in Python with the BeautifulSoup library. From the Beautiful Soup docs: "Beautiful Soup is a library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work."

To get an instance of BeautifulSoup, you pass an HTML document and the name of a parser as inputs.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
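As a minimal sketch of what the soup object gives you, the example below parses a small inline HTML string and pulls a few values out of it. The markup, the class names "team" and "player", and the player names are made up for illustration and are not taken from fotmob.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; it is not taken from fotmob.
html_doc = """
<html>
  <head><title>Squad</title></head>
  <body>
    <h1 class="team">Venezuela</h1>
    <ul>
      <li class="player">Player A</li>
      <li class="player">Player B</li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Navigate by tag name: the text inside <title>
page_title = soup.title.string

# Search for the first matching element by tag and CSS class
team_name = soup.find("h1", class_="team").get_text()

# Search for every matching element
player_names = [li.get_text() for li in soup.find_all("li", class_="player")]

print(page_title, team_name, player_names)
```

Here find returns the first match (or None if nothing matches), while find_all returns a list of every match.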

The HTML doc I got from fotmob is shown in the image below. The soup instance contains the full HTML structure of the page.

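import requests
# Download the team page and keep its HTML as text
html_text = requests.get('https://www.fotmob.com/teams/5800/squad/venezuela/').text
soup = BeautifulSoup(html_text, 'lxml')  # the 'lxml' parser requires the lxml package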

The Beautiful Soup documentation explains how to navigate the tree structure once the instance is created, but I mainly focused on getting information from elements with specific CSS classes.

# These class names look auto-generated, so they may change when the site is updated
player_stats = player_soup.find_all('div', class_='css-1v73fp6-StatItemCSS e1uibvo10')

To find the class that contained the information I was looking for, I right-clicked the page and chose "Inspect" to jump to the part of the source code where the information was. I stored all the stats in the player_stats variable and then iterated over them to build a dictionary with an entry for each player on the team.
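The iteration step could look roughly like the sketch below. It assumes each stat item holds a title element and a value element; the class names "stat-item", "stat-title", and "stat-value" and the helper extract_stats are hypothetical stand-ins, since the real fotmob markup is different and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a player's page; the real fotmob
# structure and class names are different.
player_html = """
<div class="stat-item"><span class="stat-title">Goals</span><span class="stat-value">3</span></div>
<div class="stat-item"><span class="stat-title">Assists</span><span class="stat-value">0</span></div>
<div class="stat-item"><span class="stat-title">Matches</span><span class="stat-value">4</span></div>
"""


def extract_stats(html):
    """Return {stat name: stat value} for every stat item found in the HTML."""
    player_soup = BeautifulSoup(html, "html.parser")
    stats = {}
    for item in player_soup.find_all("div", class_="stat-item"):
        title = item.find("span", class_="stat-title")
        value = item.find("span", class_="stat-value")
        if title is None or value is None:
            continue  # skip items that do not match the assumed layout
        stats[title.get_text(strip=True)] = value.get_text(strip=True)
    return stats


# One entry per player, keyed by player name
team_stats = {"Jhonder Cadiz": extract_stats(player_html)}
print(team_stats)
```

The values stay as strings because that is how they appear in the HTML; convert them with int() if you need to do arithmetic on them.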

I also learned that fotmob has a page for each player, so I collected each player's link from the team page and followed a similar process, instantiating a soup object to explore each player's page.
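Collecting the player links might be sketched as follows. The inline markup, the "/players/" path pattern, and the helper player_links are assumptions made for illustration; check the real page with "Inspect" to see how its links are actually built.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.fotmob.com"

# Hypothetical squad markup; the "/players/" path pattern is an assumption.
team_html = """
<a href="/players/101/player-a">Player A</a>
<a href="/players/102/player-b">Player B</a>
<a href="/teams/5800/overview/venezuela">Overview</a>
"""


def player_links(html, base_url=BASE_URL):
    """Map each player's name to the absolute URL of that player's page."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.startswith("/players/"):
            # Turn the relative link into a full URL
            links[anchor.get_text(strip=True)] = urljoin(base_url, href)
    return links


links = player_links(team_html)
print(links)
```

Each URL in the result can then be downloaded with requests.get and passed to BeautifulSoup, the same way as the team page.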

The result is a beautiful dictionary that contains all the relevant stats per player so far in the season.

{
'Jhonder Cadiz': 
    {
        'Goals': '3', 
        'Assists': '0', 
        'Started': '3',     
        'Matches': '4', 
        'Minutes played': '315'
    }
}

What businesses can benefit from web scraping?

  • Pricing strategies: Maybe you need to find the current price of the top 10 products in your catalog on five different websites. Web scraping can do this much faster than clicking through each website and manually getting the prices.

  • Sports followers: A web scraping project with the BeautifulSoup library can summarize your team's performance by player.

  • News: Scraping different news portals for certain words to find specific stories can be helpful for reporters or social media analysts.
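To make the news use case concrete, here is a small sketch that keeps only the headlines mentioning a keyword. The markup, the "headline" class, the placeholder headlines, and the helper headlines_mentioning are all invented for illustration; every news site structures its pages differently.

```python
from bs4 import BeautifulSoup

# Placeholder headlines in made-up markup; not taken from any real site.
news_html = """
<h2 class="headline">Local team announces new squad</h2>
<h2 class="headline">Weather update for the weekend</h2>
<h2 class="headline">Squad injuries ahead of the derby</h2>
"""


def headlines_mentioning(html, keyword):
    """Return the headlines whose text contains the keyword, ignoring case."""
    soup = BeautifulSoup(html, "html.parser")
    keyword = keyword.lower()
    matches = []
    for heading in soup.find_all("h2", class_="headline"):
        text = heading.get_text(strip=True)
        if keyword in text.lower():
            matches.append(text)
    return matches


squad_news = headlines_mentioning(news_html, "squad")
print(squad_news)
```

The same pattern applies to the pricing case: find the elements that hold the prices, read their text, and compare the values across sites.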

What are some disadvantages of the scraping strategy?

  • Scale: The more information you need, the slower the process. Depending on the amount of data required, it can take seconds, minutes, or hours.

  • Legal: Scraping publicly available data is often legal, but it depends on the website's terms of use and on local law, and one has to be careful with what they do with the data after scraping it.
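The scale point can be seen in a sketch like the one below: pages are fetched one after another, so the total time grows with the number of pages. The helper fetch_all, the pause between requests, and the stand-in fake_fetch function are assumptions added for illustration; in a real run, fetch could be something like lambda url: requests.get(url, timeout=10).text.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any function that takes a URL and returns the page HTML.
    The pause is a courtesy to the website and is an assumption of this
    sketch, not something the site is known to require.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


def fake_fetch(url):
    """Stand-in for a real download so the example runs without a network."""
    return f"<html><body>{url}</body></html>"


urls = ["https://example.com/a", "https://example.com/b"]
pages = fetch_all(urls, fake_fetch, delay_seconds=0.0)
print(len(pages), "pages fetched")
```

With a one-second pause, 25 player pages take at least 24 seconds on top of the download time, which is why large scraping jobs can run for minutes or hours.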

As I continue with the data engineering courses, I will publish more articles on what I learn about Python libraries and Data Science.