- Creating a separate RabbitMQ queue for every section and polling each queue at its extraction interval.
- Collecting the initial spiders over socket connections.
- Handling sites that rely on AJAX, onclick events, or authentication.
- Checking index pages every 10 minutes to discover new spiders.
- Extracting PDF/DOC/XML content from both regular URLs and download links.
- Ensuring timely delivery of content without losing data, despite website configuration changes.
- Identifying each website's threshold limit in order to alert system administrators.
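
The interval-based checking of per-section queues listed above could be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the `SectionQueue` class, the queue names, the intervals, and the `fetch` callable are all hypothetical. In a real setup `fetch` would presumably wrap a broker call such as pika's `channel.basic_get(queue=name)`.

```python
import time
from dataclasses import dataclass


@dataclass
class SectionQueue:
    """One queue per section (hypothetical structure for illustration)."""
    name: str                             # assumed queue name, e.g. "news"
    interval_s: float                     # assumed extraction interval in seconds
    last_checked: float = float("-inf")   # never checked yet


def due_queues(queues, now):
    """Return the queues whose interval has elapsed since the last check."""
    return [q for q in queues if now - q.last_checked >= q.interval_s]


def poll_once(queues, fetch, now=None):
    """Check every due queue once and record when it was checked.

    `fetch` is a stand-in for the real broker call; it receives the
    queue name and returns whatever was read from that queue.
    """
    now = time.monotonic() if now is None else now
    results = {}
    for q in due_queues(queues, now):
        results[q.name] = fetch(q.name)
        q.last_checked = now
    return results
```

Keeping the "which queues are due" decision separate from the broker call makes the schedule easy to test without a running RabbitMQ instance, for example by passing explicit `now` values and a dummy `fetch`.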