This deliverable reports on the task, led by SINDICE, of providing a distributed semantic indexing infrastructure, ultimately leading to the software that will serve the use cases, specifically the ARPA and TRAGSA use cases. The platform provides ETL workflows that transform the environmental data produced by the partners according to the vocabulary defined in WP3. The transformed data is then indexed by Sindice's distributed technology “Siren”, which is underpinned by Solr or Elasticsearch, and made available as a cloud solution for the exploitation plan. The platform's unique point is the integration of large-scale structured data handling (e.g. databases) with powerful textual capabilities, allowing the integration of all sorts of unstructured or semi-structured sources (articles, reports, legislation, social content) alongside the datasets. Finally, the task goes end to end and demonstrates a first prototype of a Software-as-a-Service platform for search-based data analytics, which will then be used in the Tragsa and ARPA case studies.
D4.1 - Distributed semantic indexing infrastructure
Type of document: