virtuoso | Vanir Systems Blog

Tag: virtuoso

Importing Linked Data into a Spreadsheet - Part 2

Thursday, July 7th, 2011 | codesnippets, technical | daniel

Note: This guide is part two of the previous blog post on Importing Linked Data into a Spreadsheet.

Introduction and Theory:

Say you don’t want your data in Google Spreadsheet, but would prefer it in Excel, OpenOffice, LibreOffice or some kind of standalone desktop application on your computer. There is still potential to work with dynamic Linked Data - via the powers of WebDAV (which is a technology allowing the establishment of an “online hard drive” over the protocols that power the world-wide-web).

A WebDAV URL is also a Data Source Name (aka an “address”), you see it is capable of being linked to as it is a URL - it is still Linked Data, and yet it can be treated as a store. This is the one of the many powers of the Linked Data Web.

Once the Method Part One is done there are two options for the tutorial, the first “Part 2″ is dealing with the data in LibreOffice (and I presume that the process is very similar in contemporary versions of OpenOffice), the second “Part 2″ is for dealing with the data in Excel (I’ve used version 2010 on Windows 7).

Prerequisites:

You will need a copy of Virtuoso on your machine up and running (the enterprise edition and the open source edition should both work). You must also have administrative access to it.
A new-ish version of LibreOffice, OpenOffice or Microsoft Excel
An operating system that can cope with WebDAV (which seems to be most of them these days - to varying degrees of success)

An Example Query:

Ideally this method is ideal for fast-paced data, the data that changes often - such as statistics or locations of crime etc. However, for now I’ve just used a simple lat-long search of those “areas” that touch my local area of “Long Ashton and Wraxall”.

SELECT DISTINCT ?TouchesAreaURI, ?TouchesName, ?TouchesAreaLat, ?TouchesAreaLong
WHERE {
<https://data.ordnancesurvey.co.uk/id/7000000000000770>  <https://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches>  ?TouchesAreaURI .
GRAPH ?TouchesAreaURI {
?TouchesAreaURI <https://www.w3.org/2000/01/rdf-schema#label> ?TouchesName;
<https://www.w3.org/2003/01/geo/wgs84_pos#lat> ?TouchesAreaLat;
<https://www.w3.org/2003/01/geo/wgs84_pos#long> ?TouchesAreaLong
}
}

Method Part One: Generic

As mentioned in the prerequisites - you will need administrative access to Virtuoso in order to fully run through this tutorial. This is because we need to create folders which need to be attached to the SPARQL user in order for the /sparql endpoint to save to WebDAV.

Administrative Setup:
1. Login to Conductor
2. Go to System Admin > User Accounts
3. Click “Edit” next to the SPARQL user
4. Change the following:
  - DAV Home Path: /DAV/home/QL/ (you could call the “QL” folder whatever you like - just remember what you’ve changed it to)
  - DAV Home Path “create”: Checked
  - Default Permissions: all checked
  - User Type: SQL/ODBC and WebDAV
5. Click Save.
6. Go to Web Application Server > Content Management > Repository
7. Navigate the WebDAV to: DAV/home/QL (or whatever you named “QL”)
8. Click the New Folder Icon (it looks like a folder with an orange splodge on the top-left)
9. Make a new folder:
  - Name: saved-sparql-results (must not be different!)
  - Permissions: all checked
  - Folder Type: Dynamic Resources
10. Click Create
Query and Data Setup:
1. Hit https://<server>:<port-usually-8890>/sparql
2. Enter the SPARQL query (e.g. The Example Query above)
3. Change the following:
  - Change to a grab everything type query - i.e. “Try to download all referenced resources (this may be very slow and inefficient)”. Or one of the other options - dependent on the locations, data and the query.
  - Format Results as: Spreadsheet (or CSV)
  - “Display the result and do not save” change this to “Save the result to the DAV and refresh it periodically with the specified name:”
  - Add a filename (with file extension). For example testspreadsheet.xls (or testspreadsheet.csv)
4. Click “Run Query”
5. You’ll know see a “Done” screen, with the URI of the result, this is a WebDAV accessible URL. Please take note of the URL of the “saved-sparql-results”, it should look a little like this: https://<server>:<port-usually-8890>DAV/home/QL/saved-sparql-results

Method Part Two A: For LibreOffice and OpenOffice users

You will need to do the following in order to connect to a WebDAV folder:

Tools > Options > LibreOffice/OpenOffice > General
and ensure that “Use LibreOffice/OpenOffice dialogue boxes” is turned on.

You will be linking dynamically from your spreadsheet to the resource on your WebDAV instance:

Start a new spreadsheet, or load up a spreadsheet where you want the resource to go.
Go to Insert > Link to External Data
Click the “…” button
Enter your “saved-sparql-results” URL (not including the filename itself!), and press enter
You should now see your “saved-sparql-results” WebDAV directory. Select the file, and click insert. The program will then probably ask you for your dav login details (you may want to make the program remember the details), it may also ask you about the format of the file - just follow that through how you would normally when importing/opening a file. You may also have to select “HTML_all” if you chose the “Spreadsheet option” in the sparql interface.
Check the “Update every” box, and change the time to a suitable time based on the data.
Finally, press the “OK” button… and you’ll see your lovely Linked Data inside your spreadsheet. Then you’ll be able to do whatever you want to your data (e.g. create a graph, do some calculations etc etc) - and everything will update when the data is updated. Funky!

Method Part Two B: For Excel users

OK, so I’m not a native Windows user (I used Mac OS and Amiga OS in my childhood, before moving to Unix and Linux based operating systems in about 2001). What I have found is that Windows 7 and Excel go a little strange with WebDAV, they like certain configurations - so I’ll be showing you a reasonably bodgy way of doing this :-P

Prerequisite: In step 2.c of Generic Method One - save the results as HTML, and make sure the file extension is also .html
Open Excel (I’m using Excel 2010)
Click on the Data menu
Click “From Web”
In the address bar enter your “saved-sparql-queries” URL, press enter - this will probably ask you to enter your dav username and password
Click on your <filename>.html file - you should then see the HTML Table
Press the Import button
A dialogue will pop up asking about where you would like to place your data - for ease I use the default.
You’ll see the data! The important thing to note is that this is Linked Data - however, it is not quite self-updating yet. In order to do that we need to set the connection properties… so…
Select the imported data
Click “properties” which is in the “Connections” subpanel of the “Data” menu
Change the “Refresh Every”, and/or check the “Refresh data when opening the file”. Click ok.
Self-updating Excel spreadsheets from Linked Data. Funky!

Documentation Resources

Importing Linked Data into a Spreadsheet Part 1 (Google Spreadsheets)
LibreOffice: “Opening a Document Using WebDAV over HTTPS”
OpenLink Software Documentation of the SPARQL implementation in Virtuoso and its endpoint

Software Resources

OpenLink Virtuoso
Virtuoso Universal Server - Proprietary Edition
Virtuoso Open Source Edition
LibreOffice
OpenOffice
Microsoft Office

I hope that all works for me, and feel free to share any ideas or findings!

Tags: excel, libreoffice, linked data, openlink, openoffice, semantic web, virtuoso, webdav

RESTful WebID Verification

Friday, July 1st, 2011 | codesnippets, technical | daniel

What?

This article goes through the details of verifying a WebID certificate using REST built in a PHP client. It will connect to an OpenLink Virtuoso service for WebID verification.

WebID is a new technology for carrying your identity with you, essentially you store your identity in the form of a certificate in your browser, and this certificate can be verified against a WebID service. WebID is a combination of technologies (notably FOAF (Friend of a Friend) and SSL (Secure Socket Layer)). If you haven’t got yourself a WebID yet, then you can pick one up at any ODS installation (for instance: https://id.myopenlink.net/ods ) and you can find them under then Security tab when editing your profile. To learn more about the WebID standard please visit: https://www.w3.org/wiki/WebID For more information about generating a WebID through ODS please see: https://www.openlinksw.com/wiki/main/ODS/ODSX509GenerateWindows

REST (or Representational State Transfer) is a technique for dealing with a resource at a location. The technologies used are usually HTTP (HyperText Transfer Protocol), the resource is usually in some standardised format (such as XML or JSON) and the location is specified by a URL (Uniform Resource Locator). These are pretty standardised and contemporary tools and techniques that are used on the World Wide Web.

PHP is a programming/scripting language usually used for server-side development. It is a very flexible language due to its dynamic-weak typing and its capability of doing both object-oriented and proceedural programming. Its server-side usage is often “served” using hosting software such as Apache HTTP Server or OpenLink Virtuoso Universal Server. To learn more about PHP visit: https://www.php.net/

Virtuoso is a “Universal Server” - it contains within it, amongst other things, a database server, a web hosting server and a semantic data triple store. It is capable of working with all of the technologies above - REST, PHP and WebID - along with other related technologies (e.g. hosting other server-side languages, dealing with SQL and SPARQL, providing WebDAV etc etc). It comes in two forms: an enterprise edition and an open source edition, and is installable anywhere (including cloud-based servers such as Amazon EC2). To learn more about Virtuoso please visit: https://virtuoso.openlinksw.com/

ODS (OpenLink Data Spaces) is a linked data web application for hosting and manipulating personal, social and business data. It holds within it packages for profiling, webdav file storage, feed reading, address book storage, calendar, bookmarking, photo gallery and many other functions that you would expect from a social website. ODS is built on top of Virtuoso. To learn more about ODS please visit: https://ods.openlinksw.com/wiki/ODS/

Why?

Identity is an important issue for trust on the web, and it comes from two perspectives:

When a user accesses a website they want to know that their identity remains theirs, and that they can log in easily without duplicating effort.
When a developer builds a web application they want to know that the users accessing their site are who they say they are.

WebID handles this through interlinking using URIs over HTTP, profiling using the FOAF standard, and security using the SSL standard. From a development point of view it is necessary to verify a user, and this is the reason for writing this article.

How?

To make things a lot easier OpenLink Software have created a service built into their ODS Framework which verifies a certificate provider with an issued certificate. The URL for the web service is: https://id.myopenlink.net/ods/webid_verify.vsp

This webservice takes the following HTTP Get Parameter:

callback

string

The callback is the URL that you want the success/failure information to be returned to. The cleverness actually comes from the fact that the service also tests your SSL certificate information which is stored in the header information that the browser sends across, this is a three agent system. The three agent system could be shown a bit like this:

So we can start to build up a picture of how a “Verification Requester” might look like:

First Page: Send user to the “Verifier” with the relevant Callback URL
Callback Page: Receive details from the verifier - details will be found in the HTTP Parameters.
1. If a WebID URI is returned then you know everything is ok
2. If an error is returned then the WebID has not been verified

Lets build something then, we shall build a simple single page script which does different things based on whether it has in the first pass through or the second….

(example code based on code written by OpenLink Software Ltd)…

<?php
  function apiURL()
  {
    $pageURL = $_SERVER['HTTPS'] == 'on' ? 'https://' : 'https://';
    $pageURL .= $_SERVER['SERVER_PORT'] <> '80' ? $_SERVER['SERVER_NAME'] . ':' . $_SERVER['SERVER_PORT'] : $_SERVER['SERVER_NAME'];
    return $pageURL . '/ods/webid_demo.php';
  }
	$_webid = isset ($_REQUEST['webid']) ? $_REQUEST['webid'] : '';
	$_error = isset ($_REQUEST['error']) ? $_REQUEST['error'] : '';
	$_action = isset ($_REQUEST['go']) ? $_REQUEST['go'] : '';
  if (($_webid == '') && ($_error == ''))
  {
    if ($_action <> '')
    {
      if ($_SERVER['HTTPS'] <> 'on')
      {
        $_error = 'No certificate';
      }
      else
      {
        $_callback = apiURL();
        $_url = sprintf ('https://id.myopenlink.net/ods/webid_verify.vsp?callback=%s', urlencode($_callback));
        header (sprintf ('Location: %s', $_url));
        return;
      }
    }
  }
?>

This first bit of code (above) simply deals with redirecting the user process to the Verifier service with the relevant (dynamic) Callback URL. You will notice that it only redirects when the “go” request is set - this is for demonstration purposes. We shall continue….

<html>
  <head>
    <title>WebID Verification Demo - PHP</title>
  </head>
  <body>
    <h1>WebID Verification Demo</h1>
    <div>
      This will check your X.509 Certificate's WebID  watermark. <br/>Also note this service supports ldap, http, mailto, acct scheme based WebIDs.
    </div>
    <br/>
    <br/>
    <div>
      <form method="get">
        <input type="submit" name="go" value="Check"/>
      </form>
    </div>
    <?php
      if (($_webid <> '') || ($_error <> ''))
      {
    ?>
      <div>
      	The return values are:
  	    <ul>
          <?php
            if ($_webid <> '')
            {
          ?>
  	      <li>WebID -  <?php print ($_webid); ?></li>
  	      <li>Timestamp in ISO 8601 format - <?php print ($_REQUEST['ts']); ?></li>
          <?php
            }
            if ($_error <> '')
            {
          ?>
  	      <li>Error - <?php print ($_error); ?></li>
          <?php
            }
          ?>
  	    </ul>
      </div>
    <?php
      }
    ?>
  </body>
</html>

This second part of the code is twofold:

Firstly, it displays a simple form with a “go” button - this is simply to demonstrate the “redirection” part of the code
Secondly, this is where we print out the results from what we’ve callback’d. You’ll see that we try to print out the WebID URI, the Timestamp and any Error message.

What is great about the above code is that this can be run on any server that has PHP installed, it doesn’t need to be installed specifically on Apache HTTP Server, nor on OpenLink Virtuoso - it could be installed on any HTTP server with PHP hosting. It could even be adapted to be the Ruby programming language, Python, Perl, ASP or any server-side language, scripting language (including Javascript), or standalone programming language.

Thing is this not only works with http: WebIDs it can work with ldap:, mailto:, or acct: WebIDs too! Kingsley Idehen demonstrates this to us in his twitpic

Grab the full code here

Tags: foaf, openlink, REST, ssl, virtuoso, webid

If you like my blog and my work then do feel free help me survive by giving a little money. No worries if you can't or don't want to.