Our page monitoring and extraction solution relies on a RabbitMQ and Redis queue system to identify newly added or updated content on websites. It then exposes the extracted content through a structured API, with metadata such as the initial, current, and previous revisions.
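To make the revision metadata concrete, here is a minimal sketch of how a record could keep its initial, current, and previous revisions. The type and function names (`Revision`, `PageRecord`, `createRecord`, `applyUpdate`) and the field names are illustrative assumptions, not the actual API schema.

```typescript
// Hypothetical record shape; the real API fields may differ.
interface Revision {
  content: string;
  extractedAt: string; // ISO 8601 timestamp
}

interface PageRecord {
  url: string;
  initialRevision: Revision;
  currentRevision: Revision;
  previousRevision: Revision | null; // null until the first update
}

// First extraction: the initial and current revisions are the same.
function createRecord(url: string, content: string, extractedAt: string): PageRecord {
  const first: Revision = { content, extractedAt };
  return { url, initialRevision: first, currentRevision: first, previousRevision: null };
}

// Later extraction: rotate current into previous only if the content changed.
function applyUpdate(record: PageRecord, content: string, extractedAt: string): PageRecord {
  if (content === record.currentRevision.content) {
    return record; // unchanged content keeps the existing revisions
  }
  return {
    url: record.url,
    initialRevision: record.initialRevision,
    previousRevision: record.currentRevision,
    currentRevision: { content, extractedAt },
  };
}
```

In this sketch the initial revision never changes after creation, so a consumer can always compare the latest content against both the original and the immediately preceding version.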

As soon as a website is added to our Node.js platform with the required configuration, the extractor system's crawlers (spiders) carefully crawl the site to identify all of its potential links and store them in Elasticsearch.
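The link discovery step can be sketched roughly as follows. This is a simplified illustration that assumes links are plain `<a href>` attributes and that only same-origin links count as part of the website. The function name `extractLinks` is hypothetical, and a production crawler would normally use a real HTML parser rather than a regular expression.

```typescript
// Collect unique same-origin links from one page's HTML.
// Illustrative only: a regex is a rough stand-in for an HTML parser.
function extractLinks(html: string, baseUrl: string): string[] {
  const base = new URL(baseUrl);
  const found = new Set<string>();
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;

  let match: RegExpExecArray | null = hrefPattern.exec(html);
  while (match !== null) {
    try {
      const resolved = new URL(match[1], base); // resolve relative links
      resolved.hash = ""; // fragments point at the same page
      if (resolved.origin === base.origin) {
        found.add(resolved.href);
      }
    } catch (err) {
      // Skip hrefs that cannot be parsed as URLs.
    }
    match = hrefPattern.exec(html);
  }
  return Array.from(found);
}
```

Each discovered URL could then be indexed as one document in Elasticsearch, keyed by the URL so that repeated crawls do not create duplicates.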

At the configured intervals, the extractor system checks the website for new or updated content, scrapes it, and updates the API, which can be embedded into any platform, such as mobile or web.
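One common way to decide whether a page is new or updated is to compare a hash of its content against the hash stored from the previous check. The sketch below assumes that approach. The names `fingerprint` and `detectChange` are hypothetical, and the in-memory `Map` stands in for whatever store the platform actually uses.

```typescript
import { createHash } from "crypto";

type ChangeStatus = "new" | "updated" | "unchanged";

// Hash the content after collapsing whitespace, so that purely
// cosmetic whitespace changes do not count as updates.
function fingerprint(content: string): string {
  const normalized = content.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// Compare against the last stored fingerprint for this URL and
// record the new fingerprint when the content differs.
function detectChange(seen: Map<string, string>, url: string, content: string): ChangeStatus {
  const next = fingerprint(content);
  const previous = seen.get(url);
  if (previous === next) {
    return "unchanged";
  }
  seen.set(url, next);
  return previous === undefined ? "new" : "updated";
}
```

A scheduler (for example `setInterval`, or a delayed job on the queue) could call `detectChange` for each stored link at the configured interval and push only the "new" and "updated" pages on to the API.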