Just a quick example of a category-based search on DBpedia. It finds all "Latin words and phrases" on DBpedia; the results are the DBpedia URIs, the labels and the Wikipedia URIs.
SELECT DISTINCT ?uri ?label ?WikipediaURI
WHERE {
  ?uri dcterms:subject category:Latin_words_and_phrases ;
       rdfs:label ?label ;
       foaf:page ?WikipediaURI .
  FILTER (lang(?label) = "en")
}
ORDER BY ?label
To try it out, plug it into the DBpedia SPARQL endpoint.
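If you would rather fetch the results from the command line than through the endpoint's HTML form, something like the following should work (a sketch: -G sends the query as an HTTP GET, the format parameter is the same one the endpoint's own form sets, and the dcterms:, category:, rdfs: and foaf: prefixes are assumed to be predefined on the endpoint):

curl -G "https://dbpedia.org/sparql" \
  --data-urlencode format=text/html \
  --data-urlencode query='SELECT DISTINCT ?uri ?label ?WikipediaURI WHERE { ?uri dcterms:subject category:Latin_words_and_phrases ; rdfs:label ?label ; foaf:page ?WikipediaURI . FILTER (lang(?label) = "en") } ORDER BY ?label'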
Hopefully that's nice and easy to understand - it's nothing complicated at all. Enjoy, and feel free to comment!
Note: This guide is part two of the previous blog post on Importing Linked Data into a Spreadsheet.
Introduction and Theory:
Say you don't want your data in a Google Spreadsheet, but would prefer it in Excel, OpenOffice, LibreOffice or some kind of standalone desktop application on your computer. There is still potential to work with dynamic Linked Data - via the powers of WebDAV (a technology that establishes an "online hard drive" over the protocols that power the World Wide Web).
A WebDAV URL is also a Data Source Name (aka an "address"): because it is a URL it can be linked to - it is still Linked Data - and yet it can be treated as a store. This is one of the many powers of the Linked Data Web.
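To give a flavour of what that means in practice: on a Linux box with the davfs2 package installed, you can mount a WebDAV folder as though it were an ordinary drive (a sketch - the server, port and mount point are placeholders, and the mount point must already exist):

sudo mount -t davfs http://<server>:<port-usually-8890>/DAV/ /mnt/webdav

Windows and Mac OS X have their own built-in ways of mapping a WebDAV folder as a network drive.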
Once Method Part One is done there are two options for the tutorial: the first "Part Two" deals with the data in LibreOffice (and I presume that the process is very similar in contemporary versions of OpenOffice); the second "Part Two" deals with the data in Excel (I've used version 2010 on Windows 7).
Prerequisites:
- You will need a copy of Virtuoso on your machine up and running (the enterprise edition and the open source edition should both work). You must also have administrative access to it.
- A new-ish version of LibreOffice, OpenOffice or Microsoft Excel
- An operating system that can cope with WebDAV (which seems to be most of them these days - to varying degrees of success)
An Example Query:
This method is ideal for fast-paced data - data that changes often, such as statistics or locations of crime. However, for now I've just used a simple lat-long search of the "areas" that touch my local area of "Long Ashton and Wraxall".
SELECT DISTINCT ?TouchesAreaURI ?TouchesName ?TouchesAreaLat ?TouchesAreaLong
WHERE {
  <http://data.ordnancesurvey.co.uk/id/7000000000000770>
    <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/touches> ?TouchesAreaURI .
  GRAPH ?TouchesAreaURI {
    ?TouchesAreaURI <http://www.w3.org/2000/01/rdf-schema#label> ?TouchesName ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?TouchesAreaLat ;
      <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?TouchesAreaLong
  }
}
Method Part One: Generic
As mentioned in the prerequisites, you will need administrative access to Virtuoso in order to fully run through this tutorial. This is because we need to create folders attached to the SPARQL user so that the /sparql endpoint can save to WebDAV.
- Administrative Setup:
- Login to Conductor
- Go to System Admin > User Accounts
- Click “Edit” next to the SPARQL user
- Change the following:
- DAV Home Path: /DAV/home/QL/ (you could call the “QL” folder whatever you like - just remember what you’ve changed it to)
- DAV Home Path “create”: Checked
- Default Permissions: all checked
- User Type: SQL/ODBC and WebDAV
- Click Save.
- Go to Web Application Server > Content Management > Repository
- Navigate the WebDAV tree to DAV/home/QL (or whatever you named "QL")
- Click the New Folder Icon (it looks like a folder with an orange splodge on the top-left)
- Make a new folder:
- Name: saved-sparql-results (must not be different!)
- Permissions: all checked
- Folder Type: Dynamic Resources
- Click Create
- Query and Data Setup:
- Hit http://<server>:<port-usually-8890>/sparql
- Enter the SPARQL query (e.g. The Example Query above)
- Change the following:
- Change the retrieval option to a grab-everything type - i.e. "Try to download all referenced resources (this may be very slow and inefficient)" - or one of the other options, depending on the locations, the data and the query.
- Format Results as: Spreadsheet (or CSV)
- Change "Display the result and do not save" to "Save the result to the DAV and refresh it periodically with the specified name:"
- Add a filename (with file extension). For example testspreadsheet.xls (or testspreadsheet.csv)
- Click “Run Query”
- You'll now see a "Done" screen with the URI of the result; this is a WebDAV-accessible URL. Please take note of the URL of the "saved-sparql-results" folder - it should look a little like this: http://<server>:<port-usually-8890>/DAV/home/QL/saved-sparql-results
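Before moving on, you can sanity-check that the saved result is reachable like any other web resource - for instance with curl (substituting your own DAV username, password, server and filename):

curl -u davuser:davpassword "http://<server>:<port-usually-8890>/DAV/home/QL/saved-sparql-results/testspreadsheet.csv"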
Method Part Two A: For LibreOffice and OpenOffice users
You will need to do the following in order to connect to a WebDAV folder:
- Tools > Options > LibreOffice/OpenOffice > General
- and ensure that “Use LibreOffice/OpenOffice dialogue boxes” is turned on.
You will be linking dynamically from your spreadsheet to the resource on your WebDAV instance:
- Start a new spreadsheet, or load up a spreadsheet where you want the resource to go.
- Go to Insert > Link to External Data
- Click the “…” button
- Enter your “saved-sparql-results” URL (not including the filename itself!), and press enter
- You should now see your "saved-sparql-results" WebDAV directory. Select the file and click Insert. The program will then probably ask for your DAV login details (you may want to have it remember them); it may also ask about the format of the file - just follow that through as you would normally when importing/opening a file. You may also have to select "HTML_all" if you chose the "Spreadsheet" option in the SPARQL interface.
- Check the “Update every” box, and change the time to a suitable time based on the data.
- Finally, press the “OK” button… and you’ll see your lovely Linked Data inside your spreadsheet. Then you’ll be able to do whatever you want to your data (e.g. create a graph, do some calculations etc etc) - and everything will update when the data is updated. Funky!
Method Part Two B: For Excel users
OK, so I'm not a native Windows user (I used Mac OS and Amiga OS in my childhood, before moving to Unix and Linux based operating systems in about 2001). What I have found is that Windows 7 and Excel go a little strange with WebDAV - they like certain configurations - so I'll be showing you a reasonably bodgy way of doing this :-P
- Prerequisite: in the "Format Results as" step of Method Part One, save the results as HTML, and make sure the file extension is also .html
- Open Excel (I’m using Excel 2010)
- Click on the Data menu
- Click “From Web”
- In the address bar enter your "saved-sparql-results" URL and press enter - this will probably ask you to enter your DAV username and password
- Click on your <filename>.html file - you should then see the HTML Table
- Press the Import button
- A dialogue will pop up asking about where you would like to place your data - for ease I use the default.
- You’ll see the data! The important thing to note is that this is Linked Data - however, it is not quite self-updating yet. In order to do that we need to set the connection properties… so…
- Select the imported data
- Click “properties” which is in the “Connections” subpanel of the “Data” menu
- Change the “Refresh Every”, and/or check the “Refresh data when opening the file”. Click ok.
- Self-updating Excel spreadsheets from Linked Data. Funky!
Documentation Resources
- Importing Linked Data into a Spreadsheet Part 1 (Google Spreadsheets)
- LibreOffice: “Opening a Document Using WebDAV over HTTPS”
- OpenLink Software Documentation of the SPARQL implementation in Virtuoso and its endpoint
Software Resources
- OpenLink Virtuoso
- Virtuoso Universal Server - Proprietary Edition
- Virtuoso Open Source Edition
- LibreOffice
- OpenOffice
- Microsoft Office
I hope that all works for you, and feel free to share any ideas or findings!
What?
This article goes through the details of verifying a WebID certificate over REST from a PHP client. It will connect to an OpenLink Virtuoso service for WebID verification.
WebID is a new technology for carrying your identity with you: essentially you store your identity in the form of a certificate in your browser, and this certificate can be verified against a WebID service. WebID is a combination of technologies (notably FOAF (Friend of a Friend) and SSL (Secure Socket Layer)). If you haven't got yourself a WebID yet, then you can pick one up at any ODS installation (for instance: https://id.myopenlink.net/ods ) - you can find them under the Security tab when editing your profile. To learn more about the WebID standard please visit: https://www.w3.org/wiki/WebID For more information about generating a WebID through ODS please see: https://www.openlinksw.com/wiki/main/ODS/ODSX509GenerateWindows
REST (or Representational State Transfer) is a technique for dealing with a resource at a location. The technologies used are usually HTTP (HyperText Transfer Protocol), the resource is usually in some standardised format (such as XML or JSON) and the location is specified by a URL (Uniform Resource Locator). These are pretty standardised and contemporary tools and techniques that are used on the World Wide Web.
PHP is a programming/scripting language usually used for server-side development. It is a very flexible language due to its dynamic-weak typing and its capability of doing both object-oriented and procedural programming. Its server-side usage is often "served" using hosting software such as Apache HTTP Server or OpenLink Virtuoso Universal Server. To learn more about PHP visit: https://www.php.net/
Virtuoso is a “Universal Server” - it contains within it, amongst other things, a database server, a web hosting server and a semantic data triple store. It is capable of working with all of the technologies above - REST, PHP and WebID - along with other related technologies (e.g. hosting other server-side languages, dealing with SQL and SPARQL, providing WebDAV etc etc). It comes in two forms: an enterprise edition and an open source edition, and is installable anywhere (including cloud-based servers such as Amazon EC2). To learn more about Virtuoso please visit: https://virtuoso.openlinksw.com/
ODS (OpenLink Data Spaces) is a linked data web application for hosting and manipulating personal, social and business data. It holds within it packages for profiling, webdav file storage, feed reading, address book storage, calendar, bookmarking, photo gallery and many other functions that you would expect from a social website. ODS is built on top of Virtuoso. To learn more about ODS please visit: https://ods.openlinksw.com/wiki/ODS/
Why?
Identity is an important issue for trust on the web, and it comes from two perspectives:
- When a user accesses a website they want to know that their identity remains theirs, and that they can log in easily without duplicating effort.
- When a developer builds a web application they want to know that the users accessing their site are who they say they are.
WebID handles this through interlinking using URIs over HTTP, profiling using the FOAF standard, and security using the SSL standard. From a development point of view it is necessary to verify a user, and this is the reason for writing this article.
How?
To make things a lot easier OpenLink Software have created a service built into their ODS Framework which verifies a certificate provider with an issued certificate. The URL for the web service is: https://id.myopenlink.net/ods/webid_verify.vsp
This web service takes the following HTTP GET parameter:
- callback (string) - the URL that you want the success/failure information to be returned to.
The cleverness actually comes from the fact that the service also inspects your SSL certificate, which travels in the connection information that the browser sends across. This makes it a three-agent system: your browser (holding the certificate), the verifier service, and your application's callback page.
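If you have your WebID certificate and private key to hand, you can even play the browser's part from the command line with curl (a sketch - mycert.pem, mykey.pem and the callback URL are placeholders; the -i flag lets you see the redirect back to the callback carrying the webid/ts or error parameters):

curl -i --cert mycert.pem --key mykey.pem "https://id.myopenlink.net/ods/webid_verify.vsp?callback=https%3A%2F%2Fexample.com%2Fcallback.php"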
So we can start to build up a picture of how a "Verification Requester" might look:
- First Page: Send user to the “Verifier” with the relevant Callback URL
- Callback Page: Receive details from the verifier - details will be found in the HTTP Parameters.
- If a WebID URI is returned then you know everything is ok
- If an error is returned then the WebID has not been verified
Let's build something then: a simple single-page script which does different things based on whether it is in the first pass through or the second…
(example code based on code written by OpenLink Software Ltd)…
<?php
function apiURL()
{
    // Work out this script's absolute URL so the verifier can redirect back to it.
    $pageURL = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 'https://' : 'http://';
    $pageURL .= $_SERVER['SERVER_PORT'] <> '80'
        ? $_SERVER['SERVER_NAME'] . ':' . $_SERVER['SERVER_PORT']
        : $_SERVER['SERVER_NAME'];
    return $pageURL . '/ods/webid_demo.php';
}

$_webid  = isset($_REQUEST['webid']) ? $_REQUEST['webid'] : '';
$_error  = isset($_REQUEST['error']) ? $_REQUEST['error'] : '';
$_action = isset($_REQUEST['go'])    ? $_REQUEST['go']    : '';

// First pass: no result parameters yet, so (on demand) bounce the user to the verifier.
if (($_webid == '') && ($_error == ''))
{
    if ($_action <> '')
    {
        if (!isset($_SERVER['HTTPS']) || $_SERVER['HTTPS'] <> 'on')
        {
            // Without SSL there is no client certificate to verify.
            $_error = 'No certificate';
        }
        else
        {
            $_callback = apiURL();
            $_url = sprintf('https://id.myopenlink.net/ods/webid_verify.vsp?callback=%s', urlencode($_callback));
            header(sprintf('Location: %s', $_url));
            return;
        }
    }
}
?>
This first bit of code (above) simply deals with redirecting the user process to the Verifier service with the relevant (dynamic) Callback URL. You will notice that it only redirects when the “go” request is set - this is for demonstration purposes. We shall continue….
<html>
<head>
  <title>WebID Verification Demo - PHP</title>
</head>
<body>
  <h1>WebID Verification Demo</h1>
  <div>
    This will check your X.509 Certificate's WebID watermark.
    <br/>Also note this service supports ldap, http, mailto, acct scheme based WebIDs.
  </div>
  <br/>
  <br/>
  <div>
    <form method="get">
      <input type="submit" name="go" value="Check"/>
    </form>
  </div>
<?php if (($_webid <> '') || ($_error <> '')) { ?>
  <div>
    The return values are:
    <ul>
<?php if ($_webid <> '') { ?>
      <li>WebID - <?php print ($_webid); ?></li>
      <li>Timestamp in ISO 8601 format - <?php print ($_REQUEST['ts']); ?></li>
<?php } if ($_error <> '') { ?>
      <li>Error - <?php print ($_error); ?></li>
<?php } ?>
    </ul>
  </div>
<?php } ?>
</body>
</html>
This second part of the code is twofold:
- Firstly, it displays a simple form with a “go” button - this is simply to demonstrate the “redirection” part of the code
- Secondly, this is where we print out the results from what we’ve callback’d. You’ll see that we try to print out the WebID URI, the Timestamp and any Error message.
What is great about the above code is that it can be run on any server that has PHP installed - it doesn't need to be installed specifically on Apache HTTP Server, nor on OpenLink Virtuoso; any HTTP server with PHP hosting will do. It could even be adapted to Ruby, Python, Perl, ASP or any other server-side language, scripting language (including JavaScript), or standalone programming language.
The thing is, this not only works with http: WebIDs - it works with ldap:, mailto:, or acct: WebIDs too! Kingsley Idehen demonstrates this to us in his twitpic
Grab the full code here
I've recently come into contact with the usefulness of the ImportXML() function (and the related ImportHTML() and ImportFeed() functions) found in the Google Docs Spreadsheet app [1], and their usefulness is to do with Uniform Resource Locators (URLs). This was partly thanks to Kingsley Idehen's example Google Spreadsheet on the "Patriots Players" (and his tweet dated 22nd June at 10:11am), and so I wanted to see how it was done and maybe make something that wasn't about American Football.
URLs are the actual addresses for data; they are the "Data Source/Location Names" (DSNs). With the ImportHtml() (et al) functions you just have to plug in a URL and your data is displayed in the spreadsheet. Of course this is dependent on the structure of the data at that address, but you get the picture.
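The general shape of the function is =ImportHtml(url, queryType, index), where queryType is either "table" or "list" and index says which table or list on the page you want. So, to pull the first HTML table out of some hypothetical page, you would write:

=ImportHtml("https://example.com/page-with-tables", "table", 1)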
Let me show you an example based on a SPARQL query I created many moons ago:
SELECT DISTINCT ?NewspaperURI ?Newspaper ?Stance WHERE {
?NewspaperURI rdf:type dbpedia-owl:Newspaper ;
rdfs:label ?Newspaper ;
  dcterms:subject <http://dbpedia.org/resource/Category:Newspapers_published_in_the_United_Kingdom> ;
  <http://dbpedia.org/property/political> ?Stance .
FILTER (lang(?Stance) = "en") .
FILTER (lang(?Newspaper) = "en")
}
ORDER BY ?Stance
Briefly, what the above does is show a newspaper URI, a newspaper name and the newspaper's political stance - limited to just those newspapers published in the United Kingdom. The result is a little messy, as not all of the newspapers have the same style of label and some of the newspapers' stances are hidden behind a further set of URIs - but this is just an example.
Now we can plug this into a SPARQL endpoint such as:
- https://dbpedia.org/sparql or
- https://lod.openlinksw.com/sparql
We shall use the second for now - if you hit that in your browser then you'll be able to plug the query into a nicely made HTML form and fire it off to generate an HTML webpage. [2]
However, that isn't entirely useful, as we want to get it into a Google Spreadsheet! So we need to modify the URL that the form creates. First, copy the URL as it is… for instance…
https://lod.openlinksw.com/sparql?default-graph-uri=&should-sponge=&query=SELECT+DISTINCT+%3FNewspaperURI+%3FNewspaper+%3FStance+WHERE+{%0D%0A%3FNewspaperURI+rdf%3Atype+dbpedia-owl%3ANewspaper+%3B%0D%0A++rdfs%3Alabel+%3FNewspaper+%3B%0D%0A++dcterms%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ANewspapers_published_in_the_United_Kingdom%3E%3B%0D%0A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fpolitical%3E+%3FStance+.+%0D%0AFILTER+%28lang%28%3FStance%29+%3D+%22en%22%29+.+%0D%0AFILTER+%28lang%28%3FNewspaper%29+%3D+%22en%22%29%0D%0A}%0D%0AORDER+BY+%3FStance&debug=on&timeout=&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&save=display&fname=
Then where it says "text%2Fhtml" (which means that it is of text/html type), change it to "application%2Fvnd.ms-excel" (which means that it is of application/vnd.ms-excel type - in other words, a spreadsheet-friendly table). In our example this produces the long URL you'll see inside the ImportHtml call below.
We then need to plug this into our Google Spreadsheet - so open a new spreadsheet, or create a new sheet, or go to where you want to place the data on that sheet.
Then click on one of the cells, and enter the following function:
=ImportHtml("https://lod.openlinksw.com/sparql?default-graph-uri=&should-sponge=&query=SELECT+DISTINCT+%3FNewspaperURI+%3FNewspaper+%3FStance+WHERE+{%0D%0A%3FNewspaperURI+rdf%3Atype+dbpedia-owl%3ANewspaper+%3B%0D%0A++rdfs%3Alabel+%3FNewspaper+%3B%0D%0A++dcterms%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ANewspapers_published_in_the_United_Kingdom%3E%3B%0D%0A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fpolitical%3E+%3FStance+.+%0D%0AFILTER+%28lang%28%3FStance%29+%3D+%22en%22%29+.+%0D%0AFILTER+%28lang%28%3FNewspaper%29+%3D+%22en%22%29%0D%0A}%0D%0AORDER+BY+%3FStance&debug=on&timeout=&format=application%2Fvnd.ms-excel&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&save=display&fname=", "table", 1)
The first parameter is our query URL, the second is for pulling out the table elements, and the third says to grab the first of those tables.
The spreadsheet will populate with all the data that we asked for. Pretty neat!
Now that the data is in a Google Spreadsheet you can do all kinds of things that spreadsheets are good at. One based on our example might be statistics of political stance; or, if you modify the query a bit to pull out newspapers beyond the UK, you could compare stances across countries and see how they relate to the political parties currently in power.
It also doesn't have to be data found in DBpedia specifically - it could be anything (business data, science data, personal data etc etc). The key to all of this is the power of URLs, and how they allow dynamic linking to hyperdata! This is the killer-power of Linked Data.
If you do the above tutorial then let us know if you find anything of interest, and share any experiences you may have.
Resources
- Google Documentation of using ImportXml, ImportHtml and ImportFeed functions in Google Spreadsheets: https://docs.google.com/support/bin/answer.py?answer=75507
- OpenLink Software Documentation of the SPARQL implementation in Virtuoso and its endpoint: https://docs.openlinksw.com/virtuoso/rdfsparql.html
- Kingsley has been adding to demos on his bookmarks page: https://www.delicious.com/kidehen/linked_data_spreadsheet_demo
I just wanted to share with the world something I have just created for a project that I am working on: a form for inputting an address. I decided that it would be quite good to make it as accessible as possible, so I turned to that great site accessify.com and their wonderful Quick Form Builder. I used the address elements as described in the vCard format, which are also microformat (hCard / adr) friendly. I then used the W3C XHTML Validator and the W3C CSS Validator to make sure that all is in order with XHTML 1.0 Strict and CSS validation. I then further enhanced accessibility by running very rigorous accessibility checks with achecker.ca and adding any elements and attributes that it recommended.
The resultant code looks a bit like this:
<form id="address" action="#YOUR-ACTION-GOES-HERE" method="post"> <div><label title="Post Office Box (if applicable)" for="txt_PostOfficeBox">Post Office Box</label> <input id="txt_PostOfficeBox" name="txt_PostOfficeBox" type="text" tabindex="1" /></div> <div><label title="Street Address (first line)" for="txt_StreetAddress">Street Address</label> <input id="txt_StreetAddress" name="txt_StreetAddress" type="text" tabindex="2" /></div> <div><label title="Extended Address (second line)" for="txt_ExtendedAddress">Extended Address </label> <input id="txt_ExtendedAddress" name="txt_ExtendedAddress" type="text" tabindex="3" /></div> <div><label title="Locality, City, Town or Village" for="txt_Locality">Locality</label> <input id="txt_Locality" name="txt_Locality" type="text" tabindex="4" /></div> <div><label title="Region, Province, State or County" for="txt_Region">Region</label> <input id="txt_Region" name="txt_Region" type="text" tabindex="5" /></div> <div><label title="Post Code or Zip Code" for="txt_PostalCode">Postal Code</label> <input id="txt_PostalCode" name="txt_PostalCode" type="text" tabindex="6" /></div> <div><label title="Country or Nation" for="txt_CountryName">Country Name</label> <input id="txt_CountryName" name="txt_CountryName" type="text" tabindex="7" /></div> <div><input type="submit" value="Submit form" tabindex="8" /></div> </form>
It is:
- XHTML 1.0 Strict Valid
- CSS Valid (and unobtrusive - as all styling is in a separate CSS file)
- Accessible to WCAG 2 AA standards, and probably to a few other standards (it partially adheres to Section 508, because I’ve decided not to set the value attributes)
- Semantically enhanced through Microformat (adr) class names (i.e. it is therefore <acronym title="Plain Old Semantic (X)HTML">POSH</acronym>) - a sketch of the adr markup follows below
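For reference, here is a minimal sketch of what those adr class names look like when marking up a displayed address (the values are, of course, made up):

<div class="adr">
  <div class="post-office-box">PO Box 123</div>
  <div class="street-address">1 Example Street</div>
  <div class="locality">Bristol</div>
  <div class="region">Somerset</div>
  <div class="postal-code">BS1 1AA</div>
  <div class="country-name">United Kingdom</div>
</div>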
I’ve got an independent page showing this example available:
Of course if you’re making the address form as part of a larger form then I would recommend surrounding the address sub-form with the HTML “fieldset” tag.
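A minimal sketch of that wrapping (reusing the labelled inputs from the form above):

<form id="address" action="#YOUR-ACTION-GOES-HERE" method="post">
  <fieldset>
    <legend>Address</legend>
    <!-- the labelled address inputs from the form above go here -->
  </fieldset>
</form>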
If we all used address code such as the above then we would be able to ensure conformance with standards (code, accessibility and location standards); it would allow our websites to become machine readable and malleable, and would enhance the user's experience. That is, of course, my opinion, but it does sound quite useful to me. Of course my above form elements could probably be enhanced in several ways, but this was good enough for me and for the project that I am working on.
Oh - on a separate note, it's almost Christmas. So if I don't make a blog post before then: a Great Yule, a Merry Christmas, plenty of Seasonal Greetings and a Happy New Year to all!
Daniel
I’ve just found a wonderful blog post which is a tutorial (with downloadable source code) about producing valid XHTML 1.0 Strict using TinyMCE. Very useful indeed for those people like me who love standardisation.
Go get it here:
“How to produce XHTML 1.0 Strict markup with TinyMCE” - https://www.gethifi.com/blog/how-to-produce-xhtml-10-strict-markup-with-tinymce (written by Josh Lockhart of HiFi fame)
I have recently used this for something that I am working on and thought that I'd share my happiness with the world about getting XHTML Strict code out of a Javascript-based HTML editor such as TinyMCE. Of course we also have WYMeditor, which is quite good and provides XHTML Strict straight out of the box, but it needs a bit more work to be production-ready.
I’ve just needed to do the following on the linux command line:
- Search through files within a directory for a certain bit of text
- Display the unique files with that bit of text in there
This required a bit of jiggery-pokery. First I'll show how to do it using three commands, then I'll show you the shortcut method which I used - which really displays the power of linux/unix command-line workflows.
First, search the directory recursively (-r), case-insensitively (-i) and with line numbers (-n), saving the matches:
grep -rin keyword(s) directory > outputfile
Then we have to get rid of the non-useful bits that grep outputs - everything from the first colon onwards - leaving just the filenames. (Note that you can't redirect sed's output back into the very file it is reading, as the shell truncates it first, so write to a new file:)
sed 's/:.*//g' outputfile > outputfile2
Finally we have to reduce those to the unique filenames:
uniq outputfile2 > outputfile3
And now the shortcut - all three commands piped together into one line:
grep -rin keyword . | sed 's/:.*//g' | uniq > outputfile.csv
Here we see I've piped the output of grep into sed and then into uniq, which outputs its result into a CSV file… so none of that working-on-and-saving-files-multiple-times malarkey. All three steps are consolidated into one easily manageable line :-)
Hopefully that's quite a neat example of piping, grepping, sedding and uniqing… feel free to use it, and feel free to comment!
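As an aside, grep can actually do the whole job by itself: the -l ("files with matches") flag prints each matching filename just once. So the following achieves the same end - though it does rob us of an excuse to pipe things around:

grep -ril keyword directory > outputfile.csv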
There are some really annoying times when you just forget where a column in a database is, or, possibly due to architectural reasons, you know what an unlinked foreign key is called but don't know where it lives. Well, you can find out using a rather simple SQL statement in MySQL (since version 5.0). See here:
SELECT * FROM `information_schema`.`COLUMNS` WHERE TABLE_SCHEMA = "databasename" and COLUMN_NAME = "columnname";
Yes, there is a hidden database schema called "information_schema", which contains all the metadata from all of your database schemas on that MySQL server. Within information_schema is a table called COLUMNS which has details about the columns of each table. By matching TABLE_SCHEMA with your database name and COLUMN_NAME with the name that you think it is, you'll find the tables that you're looking for. This could also quite easily be a LIKE comparison - see the variant after the example below. Yay for metadata analysis!
To give you an example: in Drupal, each node (essentially an object of any type, to put it in Object-Oriented terminology) has an nid (roughly equivalent to a universal object ID, in OO terms again). The column is called nid in all tables, but there are no explicit foreign key declarations. So we can find all occurrences of the "nid" column like this:
SELECT * FROM `information_schema`.`COLUMNS` WHERE TABLE_SCHEMA = "drupal" and COLUMN_NAME = "nid";
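And if you can only half-remember the column name, the LIKE variant mentioned above would look something like this (it matches any column whose name contains "nid", so it may also catch unrelated columns):

SELECT TABLE_NAME, COLUMN_NAME FROM `information_schema`.`COLUMNS` WHERE TABLE_SCHEMA = "drupal" AND COLUMN_NAME LIKE "%nid%";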
You know those annoying times when you see a list of emails like this:
“John Smith” <[email protected]>
Harry Bo <[email protected]>
Lillian Frog <[email protected]>
<[email protected]>
And your task is to get rid of everything that isn’t the email itself?
Well look no more! I have a simple sed command which removes everything up to and including the left angle bracket on each line. Simply run this linux/unix command on a filename, give it an output filename, press enter, and Bob's your uncle - you have a new file with no names at the start.
sed 's/.*<//g' filename.txt > filename.done.txt
You will still have to open up the file and do a find and replace on the right angle bracket, but it saves some time! I did this for a file containing lots of email addresses a few months ago, and thought that I'd share it with the world. Hope it helps others out there.
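That said, you can strip the right angle bracket in the same pass by chaining a second substitution onto the sed command - a small sketch, so test it on a copy of your data first:

sed 's/.*<//;s/>.*//' filename.txt > filename.done.txt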
Welcome to the New Vanir Systems Blog.
It is true that I used to have a blog called "Daniel's Blog". I hadn't posted for a while, and it seemed to become a bit redundant. It will still continue to run in the background, but I'm treating it a bit like an archive now. It will be replaced with this blog, the Vanir Systems Blog.
The mission of this blog is:
- To update everybody interested in the work done by Vanir Systems, using the "vanirsystems" category.
- To highlight interesting bits of technical news, along with my opinion, using the “technical” category. This may be on the topic of the Semantic Web / Linked Data, or Artificial Intelligence, Programming or Web Development.
- To highlight interesting bits of general news, along with my opinion, using the “interesting” category. This may be on the topic of religion, spirituality, theology, philosophy, psychology, or a mixture of science and technology.
- To provide code snippets in various programming languages, using the “codesnippets” category.
- To provide reviews or previews of products (hardware, software, etc), using the "reviews" category. If you have anything that you would like me to (p)review then please get in touch, and I am sure that we can work out a deal.
It will attempt to:
- Use relevant inbound tags
- Use outbound links to relevant information
- Be 100% valid XHTML that is reasonably POSH - wish me luck ;-) (maybe with sneaky bits of RDFa or Microformats)
Thank you for reading, and I hope you join me here again soon.
Daniel