[GSoC] Useful info from dereferenceable URIs


Posted by myr | Posted in GSoC_2009 | Posted on 26-06-2009

In this post I’ll refer to my last weekly status report post.

I discovered that the properties I called ‘fromFile’ or ‘fromSource’ are already included in every sensor data instance, in the ‘Resource’ field, LOL.
However, in some cases such properties should still be included explicitly because they are useful, for example when a sensor records a developer’s focus moving from one source to another. So I’ll still check whether those properties are available.

Today I made my Restlet server provide richer RDF data when someone looks up a resource of type ‘File’. This took a little time, but I should be faster with the other resource types because the mechanism is similar. I’ll explain a key part of it here.
When I have a statement to add to the model, I check whether a statement with the same subject and predicate has already been added. If so, I add the new one (replacing the previous one) only if the old statement’s collection time (the timestamp of its related sensor data) is equal to or earlier than the collection time of the new one.
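
The replacement rule described above can be sketched in plain Java. This is only an illustration under my own assumptions: the class `LatestStatementStore`, its methods and the string-based key are hypothetical, and the real code works on a Jena model rather than on a map.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Keeps at most one object per (subject, predicate), preferring the most recently collected one. */
final class LatestStatementStore {

  /** An object value together with the timestamp of the sensor data it came from. */
  static final class TimedValue {
    final String object;
    final Instant collectedAt;

    TimedValue(String object, Instant collectedAt) {
      this.object = object;
      this.collectedAt = collectedAt;
    }
  }

  // Key is "subject predicate"; URIs contain no spaces, so the key is unambiguous.
  private final Map<String, TimedValue> statements = new HashMap<>();

  /**
   * Adds the statement. An existing statement with the same subject and predicate
   * is replaced only if it is as old as, or older than, the new one.
   * Returns true if the store changed.
   */
  boolean addOrReplace(String subject, String predicate, String object, Instant collectedAt) {
    String key = subject + " " + predicate;
    TimedValue old = statements.get(key);
    if (old != null && old.collectedAt.isAfter(collectedAt)) {
      return false; // the stored statement is more recent: keep it
    }
    statements.put(key, new TimedValue(object, collectedAt));
    return true;
  }

  /** Returns the current object for the subject/predicate pair, or null if none is stored. */
  String get(String subject, String predicate) {
    TimedValue value = statements.get(subject + " " + predicate);
    return value == null ? null : value.object;
  }
}
```

With this rule, a late-arriving statement built from older sensor data never overwrites a value that came from more recent sensor data.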

Now every URI is dereferenceable, but the information the server provides for each URI is very poor, except for the resource types project, user, sensor data type, sensor data and file. I still have to enrich the information provided for all the other resource types.

Example of the information returned when asking for the resource URI:
http://localhost:9875/linkedservicedata/source/file/__home__myrtill__Hackystat_linkedData__mysqlProva__workspace__hackystat-linked-service-data__src__org__hackystat__linkedservicedata__resource__sensordata__SensorDataResource.java

(every slash within the ‘fullPath’ field is replaced with two consecutive underscores)
(note that I manually sent very little sensor data and set only a small subset of the possible properties, so this server answer contains only a small part of the information that could be provided)
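
The slash substitution can be sketched as follows. The class and method names are hypothetical, and the base URI is simply the one that appears in the example above.

```java
/** Builds the resource URI of a file from its full path. */
final class FileUriEncoder {

  // Base URI copied from the example in this post; a real deployment would configure it.
  static final String BASE = "http://localhost:9875/linkedservicedata/source/file/";

  /** Replaces every '/' in the full path with two underscores and appends the result to the base URI. */
  static String toResourceUri(String fullPath) {
    return BASE + fullPath.replace("/", "__");
  }
}
```

One thing to keep in mind with this scheme: a path that already contains two consecutive underscores cannot be decoded back unambiguously.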

(I’ve deleted all the ‘<’ and ‘>’ characters because of WordPress conflicts)


http://localhost:9875/linkedservicedata/projects/myrpandemon@yahoo.it/Default
    a http://usefulinc.com/ns/doap#Project ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/ended "2010-01-01T23:59:59.999-10:00" ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/modified "2009-05-24T18:02:33.401-10:00" ;
    http://usefulinc.com/ns/doap#created "2000-01-01T00:00:00.000-10:00" ;
    http://usefulinc.com/ns/doap#description "The default Project" ;
    http://usefulinc.com/ns/doap#maintainer "myrpandemon@yahoo.it" ;
    http://usefulinc.com/ns/doap#name "Default" .

http://localhost:9875/linkedservicedata/source/file/__home__myrtill__Hackystat_linkedData__mysqlProva__workspace__hackystat-linked-service-data__src__org__hackystat__linkedservicedata__resource__sensordata__SensorDataResource.java
    a http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/File ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/ClassFileName "SensorDataResource" ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/classCount "1" ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/functionCount "42" ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/project http://localhost:9875/linkedservicedata/projects/myrpandemon@yahoo.it/Default ;
    http://dasha.ics.hawaii.edu:9875/linkedservicedata/vocab/totLines "1300" .

[GSoC] The LinkedServiceData Project


Posted by myr | Posted in GSoC_2009 | Posted on 11-06-2009

As stated in my last weekly status report post, I gave up on mapping files and relational databases. Today I finished a class that provides RDF models including all the data retrievable from the sensorbase (I used the SensorBaseClient), namely Project, Sensor and User data. Every time, it checks whether the logged-in user has the rights to access the data required to construct a proper model. In particular, it’s possible to get:

  • all the sensor data for a given sensor data type and user (which must be the same as the logged-in user)
  • all the user data for a given user (which must be the same as the logged-in user or must have the role of ‘Project Manager’)
  • all the project data for a given user (which must be the owner, a spectator or a member of that project)

Additionally, all the RDF models available in the cache that represent data accessible to the given user are merged with the model containing the requested information described above. The cache is the UriCache provided by the Hackystat Utilities component (which uses Apache JCS), and models are grouped by user name. When a request for a model containing a certain kind of information arrives, the system first checks whether the URI obtained in one of the following ways

  • for user data: userName (with ‘@’ replaced by ‘_at_’ to avoid possible conflicts with the operating system’s rules for file names)
  • for project data: userName (as above)/projectName
  • for sensor data: userName (as above)/projectName/sensordatatype

is stored in the cache group identified by the logged-in user name.
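
The three key formats listed above can be sketched like this. `CacheKeys` and its method names are hypothetical helpers for illustration only; they are not part of the UriCache API, and the lookup in the cache itself is left out.

```java
/** Builds the cache keys for user, project and sensor data models. */
final class CacheKeys {

  /** Replaces '@' with "_at_" so that the user name is safe to use in a file name. */
  static String safeUser(String userName) {
    return userName.replace("@", "_at_");
  }

  /** Key for user data: userName. */
  static String forUser(String userName) {
    return safeUser(userName);
  }

  /** Key for project data: userName/projectName. */
  static String forProject(String userName, String projectName) {
    return safeUser(userName) + "/" + projectName;
  }

  /** Key for sensor data: userName/projectName/sensordatatype. */
  static String forSensorData(String userName, String projectName, String sensorDataType) {
    return forProject(userName, projectName) + "/" + sensorDataType;
  }
}
```

For example, `CacheKeys.forProject("myrpandemon@yahoo.it", "Default")` gives `myrpandemon_at_yahoo.it/Default`.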
I know there are many additional filters that could be specified to retrieve sensor and project data, such as the timestamp, but for the moment I prefer to limit myself to these, because I’m imagining a situation in which a user explores RDF data through a graph visualized in the GUI. The user first sees only the nodes representing classes in that graph; then, by clicking on one class (such as a sensor data type, or the project or user class), expands it to view all the related data plus all the other data available in the cache. That way, even if it’s not possible to visualize the whole graph for performance reasons, the user isn’t limited to the selected portion of the graph but can also view other small portions.

I also know that Apache JCS is not thread safe, and I’m going to handle this issue at the application level as the DPD service already does, that is, through a HashMap from user names to client instances (there should be only one client instance per user name). In the case of LinkedServiceData (the name I finally chose for my component) there would be as many user/client maps as there are services from which useful RDF data can be retrieved (currently only the sensorbase, but I’m going to implement RDF representations for the DPD, Telemetry and maybe Tickertape data, too).
I’m going to organize all the packages etc. following the DPD project’s conventions.
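
A minimal sketch of the “one client instance per user name” idea follows. It is not the DPD code: `ClientRegistry` is a hypothetical name, the client type is left generic, and I use a `ConcurrentHashMap` instead of a plain HashMap so that the sketch is safe without extra locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hands out exactly one client instance per user name. */
final class ClientRegistry<C> {

  private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();
  private final Function<String, C> factory;

  /** The factory creates a new client for a user name the first time that name is seen. */
  ClientRegistry(Function<String, C> factory) {
    this.factory = factory;
  }

  /** Returns the client for this user, creating it atomically on first use. */
  C clientFor(String userName) {
    return clients.computeIfAbsent(userName, factory);
  }
}
```

Under this sketch there would be one registry per service (sensorbase, DPD, Telemetry and so on), each with its own factory.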

[GSoC] D2RQ mapping and the subClassOf relation


Posted by myr | Posted in GSoC_2009 | Posted on 04-06-2009

In the previous post I described the problem I was trying to solve, which consists in using the sensordata table’s ‘sdt’ attribute to refer to the proper sdt class.
Today I found that:

  1. Within the sensorbase db, the sensordata table’s ‘sdt’ attribute is not a foreign key referencing the sdt table’s primary key ‘name’. So if I have to retrieve the sdt info stored in the sdt table for a given sensordata instance, I have two choices:
    1. change the db schema so that the sdt attribute becomes a foreign key referencing the sdtName attribute
    2. every time there’s a need to retrieve info for a particular sdt, forward the request to the sensorbaseclient or create an ad-hoc SPARQL query that refers directly to the sdt table, without going through the sensordata table. Then I could insert the retrieved info into the Jena model.
  2. When planning the RDF schema, I thought of arranging the existing SDTs in a hierarchy according to their meaning. If new SDTs are created in the future, they could be represented in the hierarchy as direct children of the first common parent of all the sdts, which currently is the ‘skill’ class.
  3. The relationship planned between the classes representing the existing sdts and rawsensordata is an rdfs:subClassOf relation. RawSensorData is the superclass and every sdt is a subclass (a particular kind) of it, inheriting all its properties and adding new ones. However, this relation is hard to represent in the mapping file, because the sensorbase db has no table for each distinct sdt and because, last but not least, D2RQ does not support the rdfs:subClassOf predicate. It does not support it because, according to one of its creators:

    “as long as there are no reasoners around that can handle the big data
    sets onto which D2R Server is aiming, we won’t integrate a reasoner into D2R
    Server and first try to get the data integration issues right. Maybe
    reasoning is also not a task for a RDF data server, but more a task for
    client applications, which want to do smart stuff with the data.”

    A trick is suggested to work around the problem:

    “A hack that we frequently use to handle basic subsumption hierarchies is to
    add additional rdf:type property bridges to the mapping for all desired
    superclasses.”

    that is:

    map:PersonsClassMap a d2rq:ClassMap ;
    d2rq:uriColumn "Persons.URI" ;
    d2rq:dataStorage map:Database1 .

    map:PersonsType a d2rq:PropertyBridge ;
    d2rq:property rdf:type ;
    d2rq:pattern "http://annotation.semanticweb.org/iswc/iswc.daml#@@Persons.Type@@" ;
    d2rq:belongsToClassMap map:PersonsClassMap .

    in the mapping file, when subclasses of the person class, such as professor, employee etc., are declared in the schema.

Tomorrow I’ll try to better understand how the trick used to avoid the subClassOf issue works, and I’ll try to query the sensorbase client to get sdt info for a given sd instance. I’ll also try to simulate other subClassOf relationships from code (I would have to manually add to the Jena model some classes that are not present in the db).
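
Simulating subClassOf from code, in the spirit of the rdf:type trick quoted above, could look roughly like this: for each instance, emit one rdf:type statement for its direct class and one for every ancestor. The sketch is plain Java with a hypothetical `TypeExpander` class; it assumes every class has at most one direct parent, and adding the resulting types to the Jena model is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Expands a class into the list of all types an instance of it should be given. */
final class TypeExpander {

  // Direct parent of each class; classes without an entry are roots.
  private final Map<String, String> parentOf = new HashMap<>();

  /** Records that subClass is a direct subclass of superClass. */
  void declareSubClass(String subClass, String superClass) {
    parentOf.put(subClass, superClass);
  }

  /** Returns the class itself followed by all its ancestors, nearest first. */
  List<String> typesFor(String directClass) {
    List<String> types = new ArrayList<>();
    String current = directClass;
    // The contains() check stops the walk if the declarations accidentally form a cycle.
    while (current != null && !types.contains(current)) {
      types.add(current);
      current = parentOf.get(current);
    }
    return types;
  }
}
```

With an sdt declared as a subclass of RawSensorData, an instance of that sdt would then get two rdf:type statements, one for the sdt class and one for RawSensorData, so no reasoner is needed on the client side.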