How's the progress with SemWIQ, AndyL?

Drupal says RDFStats was released 46 weeks and 3 days ago... Well, many things have changed since the first release. For about two years now I've been evaluating different approaches to information integration with Semantic Technologies, playing around and implementing my own stuff.

The current tools I'm about to contribute are the following:

  • RDFStats v2.0, which generates histograms and provides estimation functions. It can sit on top of an endpoint or run inside the semwiq-mediator to fetch statistics remotely.
  • A patch for ARQ-2.7 which adds extended query capabilities such as
  • A patch for Joseki 3.0 which supports voiD descriptions and integrates RDFStats.
  • semwiq-mediator, which provides a virtual, mediated dataset compatible with the Jena assembler and supports federated SPARQL queries over multiple endpoints. It is based on the patched versions of ARQ and Joseki, includes a data source registry and a multi-threaded monitor, and integrates RDFStats. It can be used inside other Jena-based Semantic Web applications via its API. Additionally, the semwiq-mediator package contains a simple Swing GUI, which is mainly used for testing.
  • semwiq-endpoint, which is a base configuration consisting of the patched Joseki, Pubby for serving linked data, and Snorql, a pretty query GUI known from D2R-Server. The package also contains a daemon which can be started and controlled via RMI by semwiq-controller (a Swing GUI mainly for testing distributed query processing).
    Of course, any other SPARQL endpoint can be used with semwiq-mediator. However, if the RDFStats statistics have to be generated remotely, this is not the best solution, because it is too slow for large datasets.
  • semwiq-webapp, which is a JSF-based Web application developed by Thomas Leitner as part of a student project. It provides a Web-based GUI and a SPARQL endpoint on top of the semwiq-mediator.
  • XLWrap, a wrapper for all kinds of spreadsheets, will integrate RDFStats as well and will therefore allow data from collections of Excel and CSV files to be integrated with databases and native RDF sources. Initially, I wanted to demonstrate how new wrappers can be developed based on ARQ and the Graph.find() interface, so I started the XLWrap/XLWrap-Server project in June 2009.
  • Finally, I will provide a patched D2R-Server 0.7 with RDFStats integrated.
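RDFStats generates histograms and uses them for estimation. To give a rough idea of how a histogram can answer such questions, here is a generic sketch, not RDFStats code: a hypothetical equi-width histogram over the numeric object values of one property, estimating how many values satisfy a range condition such as FILTER(?age >= 25 && ?age <= 35) under the assumption that values are spread uniformly inside each bucket. The class and method names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EquiWidthHistogram:
    """Equi-width histogram over numeric object values of one property.

    Hypothetical sketch for illustration only; this is not the RDFStats
    data model or API.
    """
    lo: float
    hi: float
    buckets: int
    counts: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.counts:
            self.counts = [0] * self.buckets

    @property
    def width(self) -> float:
        return (self.hi - self.lo) / self.buckets

    def add(self, value: float) -> None:
        """Count one value; the maximum value goes into the last bucket."""
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} outside [{self.lo}, {self.hi}]")
        i = min(int((value - self.lo) / self.width), self.buckets - 1)
        self.counts[i] += 1

    def estimate_range(self, a: float, b: float) -> float:
        """Estimate how many values fall into [a, b], assuming a uniform
        distribution of values inside each bucket."""
        total = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = min(b, b_hi) - max(a, b_lo)
            if overlap > 0:
                # Take the fraction of the bucket covered by [a, b].
                total += c * overlap / self.width
        return total
```

With 100 values spread evenly over [0, 100] and 10 buckets, a query for [25, 35] touches half of two buckets and yields an estimate of 10 matching values without scanning any data. The uniformity assumption is what keeps this cheap; skewed data would need finer or differently shaped buckets.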

All of this is currently work in progress. RDFStats2, semwiq-endpoint, and semwiq-controller are finished. I'm currently working on semwiq-mediator. Some refactoring of the semwiq-webapp will be required. Stay tuned!

When I started with the semwiq-mediator, I had rather specific requirements (properties of instances weren't fragmented over data sources, and every resource had at least one type), which made my solution incompatible with LOD applications. In the future, SemWIQ will feature at least two different federators:

  • the InstanceBasedFederator (the traditional one)
  • the TripleBasedFederator (the new LOD-compatible one)

The triple-based federator is similar to the DARQ approach by Bastian Quilitz. Its optimizer, however, is based on RDFStats, which provides more efficient data source selection and more accurate estimates for plan optimization.
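To illustrate what statistics-based data source selection can mean for a single triple pattern, here is a minimal sketch. It is not the SemWIQ, RDFStats, or DARQ implementation; it assumes each endpoint publishes simple hypothetical statistics (triple counts per predicate and instance counts per class), and `SourceStats`, `select_sources`, and the example endpoint URLs are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A triple pattern (s, p, o); None marks a variable.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


@dataclass
class SourceStats:
    """Hypothetical per-endpoint statistics summary (not the RDFStats model)."""
    endpoint: str
    triples_per_predicate: Dict[str, int] = field(default_factory=dict)
    instances_per_class: Dict[str, int] = field(default_factory=dict)
    total_triples: int = 0

    def estimate(self, p: Optional[str], o: Optional[str]) -> int:
        """Upper-bound estimate of matching triples; the subject is ignored."""
        if p is None:
            return self.total_triples
        if p == RDF_TYPE and o is not None:
            return self.instances_per_class.get(o, 0)
        return self.triples_per_predicate.get(p, 0)


def select_sources(pattern: Pattern,
                   sources: List[SourceStats]) -> List[Tuple[str, int]]:
    """Return (endpoint, estimate) for every source that may contribute to
    the pattern, largest estimate first; sources estimated at 0 are dropped."""
    _, p, o = pattern
    ranked = [(src.endpoint, src.estimate(p, o)) for src in sources]
    return sorted((r for r in ranked if r[1] > 0), key=lambda r: -r[1])
```

For a pattern such as `?s foaf:mbox ?o`, only endpoints reporting `foaf:mbox` triples remain candidates, so no subquery is sent to the others, and the estimates can feed a cost-based join ordering. Finer statistics such as histograms would tighten these upper bounds.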