Note: This guide is part two of the previous blog post on Importing Linked Data into a Spreadsheet.
Introduction and Theory:
Say you don’t want your data in Google Spreadsheet, but would prefer it in Excel, OpenOffice, LibreOffice or some kind of standalone desktop application on your computer. There is still potential to work with dynamic Linked Data - via the powers of WebDAV (which is a technology allowing the establishment of an “online hard drive” over the protocols that power the world-wide-web).
A WebDAV URL is also a Data Source Name (aka an “address”): because it is a URL it can be linked to, so it is still Linked Data, and yet it can also be treated as a store. This is one of the many powers of the Linked Data Web.
Once Method Part One is done, there are two options for the rest of the tutorial: Part Two A deals with the data in LibreOffice (and I presume that the process is very similar in contemporary versions of OpenOffice), and Part Two B deals with the data in Excel (I’ve used version 2010 on Windows 7).
Prerequisites:
- You will need a copy of Virtuoso on your machine up and running (the enterprise edition and the open source edition should both work). You must also have administrative access to it.
- A new-ish version of LibreOffice, OpenOffice or Microsoft Excel
- An operating system that can cope with WebDAV (which seems to be most of them these days - to varying degrees of success)
An Example Query:
This method is ideal for fast-paced data, the kind that changes often, such as statistics or locations of crime. However, for now I’ve just used a simple lat-long search of those “areas” that touch my local area of “Long Ashton and Wraxall”.
SELECT DISTINCT ?TouchesAreaURI ?TouchesName ?TouchesAreaLat ?TouchesAreaLong
WHERE {
  <http://data.ordnancesurvey.co.uk/id/7000000000000770> <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?TouchesAreaURI .
  GRAPH ?TouchesAreaURI {
    ?TouchesAreaURI <http://www.w3.org/2000/01/rdf-schema#label> ?TouchesName ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?TouchesAreaLat ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?TouchesAreaLong
  }
}
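If you would rather test a query from a script before going near the web form, here is a minimal sketch of how a request URL for a SPARQL endpoint could be built. The endpoint address (localhost on port 8890) and the short stand-in query are assumptions for illustration, and I am assuming the endpoint honours a "format" parameter for the result type; only the "query" parameter comes from the SPARQL Protocol itself.

```python
from urllib.parse import urlencode


def build_query_url(endpoint: str, query: str, result_format: str = "text/csv") -> str:
    """Return a GET URL that asks a SPARQL endpoint to run `query`."""
    # "query" is the standard SPARQL Protocol parameter name.
    # "format" is an assumption: it is the parameter I expect Virtuoso's
    # /sparql endpoint to read for the result type.
    params = {"query": query, "format": result_format}
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local endpoint and a deliberately tiny stand-in query.
    print(build_query_url(
        "http://localhost:8890/sparql",
        "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
    ))
```

Paste the printed URL into a browser (or pass it to any HTTP client) to check that the query runs before saving it to WebDAV.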
Method Part One: Generic
As mentioned in the prerequisites - you will need administrative access to Virtuoso in order to fully run through this tutorial. This is because we need to create folders which need to be attached to the SPARQL user in order for the /sparql endpoint to save to WebDAV.
- Administrative Setup:
  - Log in to Conductor
  - Go to System Admin > User Accounts
  - Click “Edit” next to the SPARQL user
  - Change the following:
    - DAV Home Path: /DAV/home/QL/ (you could call the “QL” folder whatever you like - just remember what you’ve changed it to)
    - DAV Home Path “create”: Checked
    - Default Permissions: all checked
    - User Type: SQL/ODBC and WebDAV
  - Click Save.
  - Go to Web Application Server > Content Management > Repository
  - Navigate the WebDAV to: DAV/home/QL (or whatever you named “QL”)
  - Click the New Folder icon (it looks like a folder with an orange splodge on the top-left)
  - Make a new folder:
    - Name: saved-sparql-results (it must be exactly this name!)
    - Permissions: all checked
    - Folder Type: Dynamic Resources
  - Click Create
- Query and Data Setup:
  - Go to http://<server>:<port-usually-8890>/sparql
  - Enter the SPARQL query (e.g. the Example Query above)
  - Change the following:
    - Change to a grab-everything type of query, i.e. “Try to download all referenced resources (this may be very slow and inefficient)”, or one of the other options, depending on the locations, data and the query.
    - Format Results as: Spreadsheet (or CSV)
    - Change “Display the result and do not save” to “Save the result to the DAV and refresh it periodically with the specified name:”
    - Add a filename (with file extension). For example testspreadsheet.xls (or testspreadsheet.csv)
  - Click “Run Query”
  - You’ll now see a “Done” screen with the URI of the result; this is a WebDAV accessible URL. Please take note of the URL of the “saved-sparql-results” folder, it should look a little like this: http://<server>:<port-usually-8890>/DAV/home/QL/saved-sparql-results
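Before moving on to a spreadsheet program, it can be worth checking that the saved result really is reachable over WebDAV. The following is a rough sketch rather than anything official: the host, port, plain-http scheme and credentials are placeholders, I am assuming the result was saved as CSV and is UTF-8 encoded, and since I am not certain which authentication scheme your server will ask for, the opener is set up to answer either a Basic or a Digest challenge.

```python
import csv
import io
import urllib.request


def saved_result_url(server, port, filename, home="/DAV/home/QL", scheme="http"):
    """Build the URL of a file inside the saved-sparql-results folder."""
    # `home` must match the DAV Home Path given to the SPARQL user.
    return f"{scheme}://{server}:{port}{home.rstrip('/')}/saved-sparql-results/{filename}"


def fetch_text(url, user, password):
    """Download the saved result, answering a Basic or Digest auth challenge."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager),
        urllib.request.HTTPDigestAuthHandler(manager),
    )
    with opener.open(url, timeout=30) as response:
        # Assumption: the saved file is UTF-8 text.
        return response.read().decode("utf-8")


def parse_rows(csv_text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Example use (placeholders, not run here):
#   url = saved_result_url("localhost", 8890, "testspreadsheet.csv")
#   rows = parse_rows(fetch_text(url, "dav-user", "dav-password"))
```

If the rows come back with the column names from your SELECT clause, the WebDAV side is working and any remaining trouble is on the spreadsheet side.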
Method Part Two A: For LibreOffice and OpenOffice users
You will need to do the following in order to connect to a WebDAV folder:
- Go to Tools > Options > LibreOffice/OpenOffice > General and ensure that “Use LibreOffice/OpenOffice dialogue boxes” is turned on.
You will be linking dynamically from your spreadsheet to the resource on your WebDAV instance:
- Start a new spreadsheet, or load up a spreadsheet where you want the resource to go.
- Go to Insert > Link to External Data
- Click the “…” button
- Enter your “saved-sparql-results” URL (not including the filename itself!), and press enter
- You should now see your “saved-sparql-results” WebDAV directory. Select the file, and click Insert. The program will then probably ask you for your DAV login details (you may want to make the program remember them). It may also ask you about the format of the file; just follow that through as you normally would when importing or opening a file. You may also have to select “HTML_all” if you chose the “Spreadsheet” option in the SPARQL interface.
- Check the “Update every” box, and change the time to a suitable time based on the data.
- Finally, press the “OK” button… and you’ll see your lovely Linked Data inside your spreadsheet. Then you’ll be able to do whatever you want to your data (e.g. create a graph, do some calculations etc etc) - and everything will update when the data is updated. Funky!
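To give a feel for what a timed refresh like “Update every” amounts to, here is a small sketch of a polling loop that re-reads a resource and only reacts when its content has changed. This is purely an illustration and not how LibreOffice implements the feature; the function name `watch` and its parameters are made up for this example.

```python
import hashlib
import time


def watch(fetch, on_change, interval_seconds, rounds, sleep=time.sleep):
    """Call fetch() `rounds` times and call on_change(body) when the body differs.

    `fetch` is any zero-argument callable returning the current text of the
    resource (for instance a function that downloads the WebDAV file).
    """
    last_digest = None
    for attempt in range(rounds):
        body = fetch()
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest != last_digest:
            # First read, or the saved result was regenerated with new data.
            on_change(body)
            last_digest = digest
        if attempt < rounds - 1:
            sleep(interval_seconds)
```

Passing the sleep function in as a parameter is only there so the loop can be tried out without actually waiting.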
Method Part Two B: For Excel users
OK, so I’m not a native Windows user (I used Mac OS and Amiga OS in my childhood, before moving to Unix and Linux based operating systems in about 2001). What I have found is that Windows 7 and Excel go a little strange with WebDAV, they like certain configurations - so I’ll be showing you a reasonably bodgy way of doing this :-P
- Prerequisite: In the “Format Results as” step of Method Part One, save the results as HTML, and make sure the file extension is also .html
- Open Excel (I’m using Excel 2010)
- Click on the Data menu
- Click “From Web”
- In the address bar enter your “saved-sparql-results” URL and press enter - this will probably ask you to enter your DAV username and password
- Click on your <filename>.html file - you should then see the HTML Table
- Press the Import button
- A dialogue will pop up asking about where you would like to place your data - for ease I use the default.
- You’ll see the data! The important thing to note is that this is Linked Data - however, it is not quite self-updating yet. In order to do that we need to set the connection properties… so…
- Select the imported data
- Click “properties” which is in the “Connections” subpanel of the “Data” menu
- Change the “Refresh Every”, and/or check the “Refresh data when opening the file”. Click ok.
- Self-updating Excel spreadsheets from Linked Data. Funky!
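If you are curious what an import of an HTML table boils down to, the sketch below pulls the rows out of an HTML table using only the Python standard library. It is an illustration under assumptions, not a description of what Excel does internally: I am assuming the saved .html file contains an ordinary table made of tr, th and td elements, and the class and function names are my own.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td> or </th>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def table_rows(html_text):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The first row returned should hold the column names from your query, with the data rows after it.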
Documentation Resources
- Importing Linked Data into a Spreadsheet Part 1 (Google Spreadsheets)
- LibreOffice: “Opening a Document Using WebDAV over HTTPS”
- OpenLink Software Documentation of the SPARQL implementation in Virtuoso and its endpoint
Software Resources
- OpenLink Virtuoso
- Virtuoso Universal Server - Proprietary Edition
- Virtuoso Open Source Edition
- LibreOffice
- OpenOffice
- Microsoft Office
I hope that all works for you, and feel free to share any ideas or findings!