Our page monitoring and extraction solution relies on RabbitMQ and Redis queue systems to identify newly added or updated content on websites. It then exposes that content through a structured API, with metadata such as the initial revision, current revision, and previous revision.
Derivedata is a robust, hub-powered platform that continuously monitors different kinds of websites (HTML, RSS, AJAX, Angular, React, etc.) and delivers structured information through an API.
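
To make the idea of revision tracking concrete, here is a minimal sketch of how added and updated content could be detected and tagged with initial, current, and previous revisions. It is an illustration under stated assumptions, not Derivedata's actual implementation: the names `PageRecord`, `ChangeTracker`, and `revision_id` are hypothetical, revisions are assumed to be content hashes, and an in-process `queue.Queue` stands in for the RabbitMQ/Redis queues.

```python
import hashlib
from dataclasses import dataclass
from queue import Queue
from typing import Dict, Optional


@dataclass
class PageRecord:
    """Revision metadata for one monitored URL (hypothetical shape)."""
    url: str
    initial_revision: str
    current_revision: str
    previous_revision: Optional[str] = None


def revision_id(content: str) -> str:
    """Derive a short revision id from the content (assumption: SHA-256 hash)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]


class ChangeTracker:
    """Tracks revisions per URL and queues change events."""

    def __init__(self) -> None:
        self.records: Dict[str, PageRecord] = {}
        # Stand-in for a message broker queue; a real system might
        # publish these events to RabbitMQ or Redis instead.
        self.changes: Queue = Queue()

    def observe(self, url: str, content: str) -> str:
        """Record a fetch result and return 'added', 'updated', or 'unchanged'."""
        rev = revision_id(content)
        record = self.records.get(url)
        if record is None:
            # First time this URL is seen: initial and current revision match.
            record = PageRecord(url, initial_revision=rev, current_revision=rev)
            self.records[url] = record
            self.changes.put(("added", record))
            return "added"
        if record.current_revision == rev:
            return "unchanged"
        # Content changed: shift current to previous, keep initial as-is.
        record.previous_revision = record.current_revision
        record.current_revision = rev
        self.changes.put(("updated", record))
        return "updated"


if __name__ == "__main__":
    tracker = ChangeTracker()
    url = "https://example.com/news"  # placeholder URL
    print(tracker.observe(url, "<h1>First headline</h1>"))
    print(tracker.observe(url, "<h1>First headline</h1>"))
    print(tracker.observe(url, "<h1>Second headline</h1>"))
    print(tracker.records[url])
```

Hashing the content keeps the comparison cheap and avoids storing full page bodies just to detect a change; a production system would likely normalize the page first (for example, stripping timestamps or ads) so that cosmetic differences do not register as updates.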