Sunday 5 November 2017

python - How to run my webscraper with a Flask REST API without waiting for it to finish before returning a response?

I have a RESTful API that I built with Flask, which runs a web scraper to download files.


I want to host it on a Linux EC2 instance and serve it using NGINX and Gunicorn.


I have been testing my API with Postman, but because the scraper takes about 10 minutes to finish, Postman hangs waiting for a response.


My Flask app looks something like this:


    import subprocess

    from flask import Flask, jsonify, request

    application = Flask(__name__)


    @application.route('/scraper/run', methods=['POST'])
    def init_scrape():
        data = request.json
        command = './web_scrape.py -us "{0}" -p "{1}" -url "{2}"'.format(
            data['username'], data['password'], data['url'])
        # This takes about 10 minutes; check_output blocks until the scraper exits
        output = subprocess.check_output(['bash', '-c', command])
        return jsonify({'Scraping this site: ': data['url']}), 201


    if __name__ == '__main__':
        application.run(host="0.0.0.0", port=8080)

Is there a way I can run my scraper without having to wait for it to finish before returning some data to Postman?
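
One possible direction, offered as a minimal sketch rather than a definitive answer: start the scraper with `subprocess.Popen`, which returns as soon as the child process has been launched, instead of `subprocess.check_output`, which blocks until the child exits. The helper name `start_scrape` is hypothetical, and the short sleeping child process below only stands in for the real `web_scrape.py` run.

```python
import subprocess
import sys
import time


def start_scrape(args):
    """Launch the scraper as a child process and return without waiting.

    `args` is the full argument list, for example
    ['./web_scrape.py', '-us', username, '-p', password, '-url', url].
    Passing a list (rather than a 'bash -c' string) avoids shell quoting issues.
    """
    # Popen does not wait for the child to exit, unlike check_output.
    return subprocess.Popen(
        args,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


if __name__ == "__main__":
    # Stand-in for the 10-minute scraper: a child process that sleeps briefly.
    started = time.monotonic()
    proc = start_scrape([sys.executable, "-c", "import time; time.sleep(2)"])
    elapsed = time.monotonic() - started
    print("returned after {:.2f}s, still running: {}".format(
        elapsed, proc.poll() is None))
    proc.wait()  # only so this demo exits cleanly
```

In the Flask view, a call like this could take the place of `check_output`, with the route returning right away (for instance with status 202 Accepted). The scraper's output is discarded here, so a caller that needs the result would have to collect it some other way, such as a log file or a task queue.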

