Tuesday, 23 April 2019

Python - I'm trying to create a simple web scraper for the steam market






I'm in school and have just finished my introductory Python class, so I decided to use my skills to make something useful: a script that scrapes a Steam market webpage and notifies me when an item is listed at or below a desired price. I'm stuck and would appreciate any tips. I am using urllib2 and BeautifulSoup.



from bs4 import BeautifulSoup
from urllib2 import urlopen
import time



item = str(raw_input('Please enter the item you are looking for(Exact URL): '))

price = str(raw_input('Please enter the price you want to buy the item at: '))

print('Searching for item at that price....\n' + item)

market = urlopen(item)

def getPrices(market,desiredPrice):
    while True:
        soup = BeautifulSoup(market)
        prices = soup.findAll('span',{'class':'market_listing_price market_listing_price_with_fee'})

        """
        So now my logic assumed I should do something like;

        if desiredPrice in prices:
            print('found item at the desired price!')
            return link_to_item

        """

        print('Searching...')
        time.sleep(20)


getPrices(market, price)


For testing I am using this steam market link: https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29



And the span that contains the prices of each item on the front page is class='market_listing_price market_listing_price_with_fee'




Bottom line problem:
I cannot seem to get just the data from within each span tag. I want to grab the prices as floats and put them into a list that I can sort. Then I would be able to compare them to the desired price and find anything below it.


Answer



Those spans contain more than just the number: surrounding whitespace, a currency sign, and entries that only say 'Sold!'. If you filter that out you should be fine.



>>> [i.text.strip() for i in prices]
[u'Sold!', u'\xa5 33.69', u'\xa5 33.69', u'Sold!', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69', u'\xa5 33.69']


There is a yen sign (\xa5) in there; you could take that out too, unless you need currency information.
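If you do want to keep the currency information, one option is to split each string into a symbol and an amount instead of discarding the sign. This is only a minimal sketch in Python 3 (the question uses Python 2), and the names PRICE_RE and split_price are made up for illustration. It assumes the amount looks like 33.69, with a dot as the decimal separator and no thousands separators.

```python
import re

# Hypothetical pattern: optional non-digit symbol, optional space, then an amount.
# Assumes amounts like "33.69" (dot decimal, no thousands separators).
PRICE_RE = re.compile(r'^\s*(?P<symbol>[^\d\s]*)\s*(?P<amount>\d+(?:\.\d+)?)')


def split_price(text):
    """Return (symbol, amount) for a price string, or None if it has no number."""
    match = PRICE_RE.match(text)
    if match is None:
        return None  # e.g. 'Sold!' contains no digits
    return match.group('symbol'), float(match.group('amount'))


# Sample strings taken from the output shown above.
texts = ['Sold!', '\xa5 33.69', '\xa5 33.69']
parsed = [p for p in map(split_price, texts) if p is not None]
```

Here parsed keeps one (symbol, amount) pair per listing that still has a price, and the 'Sold!' entries are dropped.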




To get only numbers I'd do:



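prices = [i.text.strip() for i in prices]
# keep only digits and the decimal point; entries like 'Sold!' become '' and are skipped
digits = [''.join(c for c in p if c in '0123456789.') for p in prices]
prices = [float(d) for d in digits if d]
if prices and min(prices) <= desiredPrice:
    print('found item at the desired price!')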


Bear in mind that you need to convert the desired price with float(desiredPrice) first, and make sure you read the web data inside the loop. Currently urlopen is called once, before the loop starts, so you will check exactly the same data every 20 seconds!
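Both points can be sketched as a loop that fetches fresh prices on every pass. This is a rough Python 3 outline rather than a finished scraper: poll_for_price, fetch_prices, max_polls and the sleep parameter are hypothetical names introduced here, and fetch_prices stands in for whatever function downloads the page and returns the prices as floats.

```python
import time


def poll_for_price(fetch_prices, desired_price, interval=20, max_polls=None,
                   sleep=time.sleep):
    """Call fetch_prices() on every pass; return the lowest price at or below
    desired_price, or None if max_polls passes go by without a match."""
    desired_price = float(desired_price)  # user input arrives as a string
    polls = 0
    while max_polls is None or polls < max_polls:
        prices = fetch_prices()  # fresh data on each iteration
        matches = [p for p in prices if p <= desired_price]
        if matches:
            return min(matches)
        polls += 1
        sleep(interval)
    return None


# Stand-in for two successive page loads, so the sketch runs without a network.
pages = iter([[33.69, 35.00], [29.99, 33.69]])
result = poll_for_price(lambda: next(pages), '30', interval=0, max_polls=5)
```

In the real script, fetch_prices would call urlopen on the item URL and rebuild the soup each time it is invoked, so every pass sees the current listings.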

