Derivedata

success-derivedata

 
 

THE SUCCESS

Our solution on page monitoring and extraction rely on the powerful technique of rabbitmq and reds queue system to identify newly added contents / updated contents from the websites. It then extracts this into structured API with metadata like initial revision, current revision and previous revision.

services-derivedata

THE CHALLENGE

  • Creating separate queues for every section in rabbitmq and checking the queues based on the time interval for extracting.
  • Initial spider collecting with sockets connection.
  • Working with Ajax/Onclick/Authentication sites.
  • Checking index pages for every 10 mins and discovering new spiders.
  • Extracting PDF/DOC/XML contents from normal URL link / downloading link.

about-derivedata

 

THE REQUIREMENT

To design a page monitoring and extraction system which can monitor and scrape data from websites, providing useful information.

The platform should be able to deliver information instantly as and when its published in the websites.

partner-derivedata

Users are provided with value added information in the form of content metadata within the API.

The platform has a simple dashboard which enables system administrators to receive email alerts and statistics for rendering reports and charts.

derivedata-hero

Case Study
 

THE OVERVIEW

Derivedata is the hub empowered robust platform which does continuous monitoring of different kinds of websites (HTML, RSS, AJAX, Angular, React etc.) and delivers structured information in the form of API.