I’ve recently come into contact with the usefulness of the ImportXML() function (and the related ImportHTML() and ImportFeed() functions) found in Google Docs Spreadsheet app [1], and their usefulness is to do with Uniform Resource Locators (URLs). This was partly thanks to Kingsley Idehens example Google Spreadsheet on the “Patriots Players” (and his tweet dated 22nd June at 10:11am), and so I wanted to see how it was done and maybe make something that wasn’t about American Football.
URLs are the actual addresses for Data, they are the “Data Source/Location Names” (DSNs). With the ImportHtml() (et al) functions you just have to plug in a URL and your data is displayed in the spreadsheet. Of course this is dependent on the structure of the data at that address, but you get the picture.
Let me show you an example based on a SPARQL query I created many moons ago:
SELECT DISTINCT ?NewspaperURI ?Newspaper ?Stance WHERE {
?NewspaperURI rdf:type dbpedia-owl:Newspaper ;
rdfs:label ?Newspaper ;
dcterms:subject <https://dbpedia.org/resource/Category:Newspapers_published_in_the_United_Kingdom>;
<https://dbpedia.org/property/political> ?Stance .
FILTER (lang(?Stance) = "en") .
FILTER (lang(?Newspaper) = "en")
}
ORDER BY ?Stance
Briefly, what the above does is shows a Newspaper URI, a Newspaper name and the Newspapers political stance - and these are limited to just those newspapers published in the United Kingdom. The result is a little messy as not all of the newspapers have the same style of label, and some of the newspapers stances will be hidden behind a further set of URIs - but this is just as an example.
Now we can plug this in to a sparql endpoint such as:
- https://dbpedia.org/sparql or
- https://lod.openlinksw.com/sparql
We shall use the 2nd for now - if you hit that in your browser than you’ll be able to plug it into a nicely made HTML form and fire that off to generate an HTML webpage. [2]
However, it isn’t entirely useful as we want to get in into a Google Spreadsheet! So, we need to modify the URL that the form creates. Firstly, copy the URL as it is… for instance…
https://lod.openlinksw.com/sparql?default-graph-uri=&should-sponge=&query=SELECT+DISTINCT+%3FNewspaperURI+%3FNewspaper+%3FStance+WHERE+{%0D%0A%3FNewspaperURI+rdf%3Atype+dbpedia-owl%3ANewspaper+%3B%0D%0A++rdfs%3Alabel+%3FNewspaper+%3B%0D%0A++dcterms%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ANewspapers_published_in_the_United_Kingdom%3E%3B%0D%0A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fpolitical%3E+%3FStance+.+%0D%0AFILTER+%28lang%28%3FStance%29+%3D+%22en%22%29+.+%0D%0AFILTER+%28lang%28%3FNewspaper%29+%3D+%22en%22%29%0D%0A}%0D%0AORDER+BY+%3FStance&debug=on&timeout=&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&save=display&fname=
Then where it says “text%2Fhtml
” (which means that it is of text/html type) change it to “application%2Fvnd.ms-excel
” (which means that it is of application/vnd.ms-excel type - in other words a spreadsheet-friendly table). In our example this would make the URL look like…
We then need to plug this into our Google Spreadsheet - so open a new spreadsheet, or create a new sheet, or go to where you want to place the data on that sheet.
Then click on one of the cells, and enter the following function:
=ImportHtml("https://lod.openlinksw.com/sparql?default-graph-uri=&should-sponge=&query=SELECT+DISTINCT+%3FNewspaperURI+%3FNewspaper+%3FStance+WHERE+{%0D%0A%3FNewspaperURI+rdf%3Atype+dbpedia-owl%3ANewspaper+%3B%0D%0A++rdfs%3Alabel+%3FNewspaper+%3B%0D%0A++dcterms%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ANewspapers_published_in_the_United_Kingdom%3E%3B%0D%0A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fpolitical%3E+%3FStance+.+%0D%0AFILTER+%28lang%28%3FStance%29+%3D+%22en%22%29+.+%0D%0AFILTER+%28lang%28%3FNewspaper%29+%3D+%22en%22%29%0D%0A}%0D%0AORDER+BY+%3FStance&debug=on&timeout=&format=application%2Fvnd.ms-excel&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&save=display&fname=", "table", 1)
The first parameter is our query URL, the second is for pulling our the table elements, and the third is to grab the first of those table elements.
The spreadsheet will populate with all the data that we asked for. Pretty neat!
Now that the data is in a Google Spreadsheet you can do all kinds of things that spreadsheets are good at - one based on our example might be statistics of political stance, or if you modify the query a bit to pull out more than those in the UK then the statistics of those stances based on UK and how that relates to the political parties currently in power.
It also doesn’t have to specifically be data found in DBPedia, it could be anything (business data, science data, personal data etc etc). The key to all of this is the power of URLs, and how they allow the dynamic linking to hyperdata! This is the killer-power of Linked Data.
If you do the above tutorial then let us know if you find anything of interest, and share any experiences you may have.
Resources
- Google Documentation of using ImportXml, ImportHtml and ImportFeed functions in Google Spreadsheets: https://docs.google.com/support/bin/answer.py?answer=75507
- OpenLink Software Documentation of the SPARQL implementation in Virtuoso and its endpoint: https://docs.openlinksw.com/virtuoso/rdfsparql.html
- Kingsley has been adding to demos on his bookmarks page: https://www.delicious.com/kidehen/linked_data_spreadsheet_demo
Back in 2005 when I was first learning about “Agent Oriented Development” I was taught that agents must Perform in some Environment which it manipulates using Actuators and perceives through its Sensors (this is known as PEAS).
What reminded me about this was Kingsley Idehens 2006 blog post on the Dimensions of the Web… which is very much still a valid concept. What came to me though was that if Intelligent Agents are to be truly Autonomous on a Linked Data Web they will probably have to be mobile - and I don’t mean mobile as in developed for mobile devices, what I mean is that they should probably be capable of moving themselves from one computer to another…. otherwise they will just become some kind of “clever” web service which is stored in one place and gets its data from other web services (i.e. we revert back to Dimension 2 in Kingsleys post).
But there is a problem with this model… it sounds a bit like a virus, and something initially quite good could potentially become a bad thing with an evolutionary technique such as Genetic Programming. An intelligent agent capable of adapting to its environment through genetic programming techniques is incredibly powerful, but with great power comes great responsibility.
I have mixed feelings about Singularity philosophy, and I am particularly wary about the Singularity University, however maybe we should be thinking about the ethical/security/identity implications of Autonomous Intelligent Agents on the Linked Data Web.
Some food for thought… on this lovely Monday morning… ;-)