Due to the growing number of publications, crawl-time has increased dramatically. I spent the day experimenting with using multi-threading on the current implementation and using the Python Twisted library to crawl asynchronously. The latter results in a far greater speed-up, but would require a lot of code refactoring to implement.
No comments:
Post a comment