Linking Bug novelties - update 1

Posted by myr | Posted in GSoC_2009 | Posted on 18-07-2009

Well,
now I have published a SPARQL endpoint restricted to the resources of interest, which are “users”, “projects” and “issues”! ^__^

Linking Bug novelties

Posted by myr | Posted in GSoC_2009 | Posted on 17-07-2009

Hiiiii everyooneee!
there are lots of novelties here!

Thanks to Daniel on the Baetle mailing list I found Launchpad: a code hosting platform which offers lots of functionality to developers (more than Google’s code hosting system, although it’s inexplicably less widespread). For example, it links a bug with the projects it affects, and with other bugs if a user marks it as already fixed elsewhere. Even if they don’t use any automatic algorithm to link bugs, this is the only existing attempt at bug linkage and it offers lots of suggestions for my work. Moreover, since no other dataset available online lets clients browse its RDF bug data through a SPARQL endpoint or dereferenceable URIs, I can link Hackystat bugs not only to bugs on other Hackystat servers but also to bugs on Launchpad using its REST API (even if that data is neither RDF nor Linked Data it’s better than nothing, and Launchpad has a veeeeery “large” dataset).

Moreover I’ve also read an interesting article on the well-known “Web of Data Discovery” issue, published in Nodalities Magazine. It says:

As we have learned from the Web of Documents and SEO, the basic idea is to let the owner of a linked dataset do the ‘annotation part’ (i.e. telling about the content and the features of her dataset) and then let a search engine - or semantic indexer, for what it’s worth - do the dirty job of crawling these

This annotation part can be performed using the Vocabulary of Interlinked Datasets (voiD), a vocabulary meant to bridge data publishers and data users, so that users can find the right data for their tasks more easily from the voiD description of a linked dataset. By discovery of datasets we mean the identification of datasets given certain attributes, trying to answer the question: given a set of attributes, which available resources match them and where are they located? They’re tackling exactly the problem I’m encountering right now: how can I reach the other datasets available online that are related to mine?

As soon as the search engine has the RDF triples from the voiD description in its index, it can answer arbitrarily complex queries such as:

SELECT ?dataset
FROM <http://sw.joanneum.at/void/demo/demo3.rdf>
{
  ?dataset a void:Dataset ;
    dc:subject <http://dbpedia.org/resource/Proceedings> .
  ?datasetSrc a void:Dataset ;
    foaf:homepage <http://dbpedia.org/> ;
    void:containsLinks ?linkset .
  ?linkset void:target ?dataset .
}

which will return all datasets that have data about proceedings and which are linked from DBpedia.
In this way my doubt about how to dynamically find all the other datasets relatable to mine is solved!
According to that article, to get my dataset indexed by semantic search engines I have to publish a sitemap, using a particular sitemap extension, that contains a reference to the document describing my dataset with the voiD vocabulary.
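
To make the querying step a bit more concrete, here is a minimal sketch of how a client could send a voiD query like the one quoted above to a SPARQL endpoint over HTTP. The endpoint URL is a placeholder I made up, the query is a simplified one (it only lists voiD datasets), and the helper names are hypothetical; the only things taken from the SPARQL Protocol are the `query` parameter and the JSON results media type.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint, an assumption for illustration only.
ENDPOINT = "http://example.org/sparql"

# Simplified query: list every resource typed as a voiD dataset.
QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
SELECT ?dataset {
  ?dataset a void:Dataset .
}
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def run_query(endpoint, query):
    """Send the query and return the parsed JSON results.

    Needs network access and a live endpoint, so it is not called here.
    """
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Only the URL construction runs by default; `run_query` would have to be pointed at a real endpoint before it returns anything.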

More-more-over I’ve found good suggestions about content negotiation and the definition of Cool URIs.

I finally have a better-defined idea of a Bug Report (thanks to Launchpad), and a mock-up is coming soon, at least as far as Issues are concerned.

Well, today I’m going to make Hackystat LiSeD servers let clients browse Hackystat data (I’ll try through a SPARQL endpoint; otherwise I’ll use a URI lookup endpoint).

stay tuned ;)
myriam

P.S.
To summarize, the main objectives to reach in as little time as possible are:

  • find projects having the same tags
  • find users with a specific level of Karma (equal to, greater than or less than a value) and/or “knows” something
  • find issues marked as ‘duplicate’ by users or having the same tags
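
As a rough illustration of these three lookups, here is a small in-memory sketch. The record types and field names (`tags`, `karma`, `knows`, `duplicate_of`) are my own assumptions, not the actual Hackystat or Launchpad data model, and I read “same tags” as “at least one tag in common”.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


# Hypothetical record types, for illustration only.
@dataclass
class Project:
    name: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    karma: int
    knows: Set[str] = field(default_factory=set)


@dataclass
class Issue:
    id: str
    tags: Set[str] = field(default_factory=set)
    duplicate_of: Optional[str] = None


def projects_sharing_tags(project, candidates):
    """Projects that share at least one tag with `project`."""
    return [p for p in candidates
            if p is not project and p.tags & project.tags]


def users_matching(users, min_karma=None, max_karma=None, knows=None):
    """Users inside the karma bounds who also know `knows`, when given."""
    result = []
    for user in users:
        if min_karma is not None and user.karma < min_karma:
            continue
        if max_karma is not None and user.karma > max_karma:
            continue
        if knows is not None and knows not in user.knows:
            continue
        result.append(user)
    return result


def related_issues(issue, candidates):
    """Issues in a duplicate relation with `issue` or sharing a tag with it."""
    return [c for c in candidates
            if c is not issue and (c.duplicate_of == issue.id
                                   or issue.duplicate_of == c.id
                                   or bool(c.tags & issue.tags))]
```

Passing the same value as `min_karma` and `max_karma` gives the “equal to” case; the filters are combined with AND, so the “or” variant would need two calls.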

Linked Bug Data

Posted by myr | Posted in GSoC_2009 | Posted on 16-07-2009

I’m trying to understand:

  1. how I can link bug data coming from a Hackystat server with bug data coming from external datasets
  2. which these datasets are and how I can find their hosts
  3. how I can find out the proper URIs of their SPARQL endpoints (needed to browse the data they store)
  4. how I can let at least Hackystat servers browse each other’s bug data, i.e. how I can implement a SPARQL endpoint, or use an existing tool, even though I don’t have a database (there are lots of tools out there that provide SPARQL endpoints, but only on top of a database)

Do you know how bugs can be linked? And tests? And development phases? No one has linked them yet and no one knows how. I have to search for linkable data which doesn’t adhere to a simple pattern such as the one used for books, for example. I need a SPARQL endpoint, which generally requires a database (all the tools providing one require a database), relational or not, in which the data is stored and with which the endpoint communicates directly or through a mapping. But I was discouraged from using a database. Consequently, though I hope it won’t come to that, I may have to handle the SPARQL syntax myself, translating queries into SensorBase API calls.
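
Just to give an idea of what “translating queries into API calls” could look like in the simplest possible case, here is a toy sketch. It only understands queries of the form `SELECT ?x WHERE { ?x a some:Class }`, and both the `hacky:` class names and the REST paths are placeholders I invented; they are not the real SensorBase API.

```python
import re

# Hypothetical mapping from RDF classes to REST collection paths.
# These names are placeholders, NOT the actual SensorBase API.
CLASS_TO_PATH = {
    "hacky:Issue": "/issues",
    "hacky:User": "/users",
    "hacky:Project": "/projects",
}

# Accepts only: SELECT ?x WHERE { ?x a prefix:Class . }
SIMPLE_QUERY = re.compile(
    r"SELECT\s+(\?\w+)\s+WHERE\s*\{\s*(\?\w+)\s+a\s+([\w:]+)\s*\.?\s*\}",
    re.IGNORECASE,
)


def translate(query):
    """Map a single 'list all instances of a class' query to a REST path."""
    match = SIMPLE_QUERY.fullmatch(query.strip())
    if match is None:
        raise ValueError("unsupported query shape")
    selected, subject, rdf_class = match.groups()
    if selected != subject:
        raise ValueError("selected variable must be the pattern subject")
    if rdf_class not in CLASS_TO_PATH:
        raise ValueError("no API mapping for class " + rdf_class)
    return CLASS_TO_PATH[rdf_class]
```

Anything beyond this one pattern (filters, joins, several triple patterns) would need a real SPARQL parser, which is exactly why doing it by hand is the option I’d like to avoid.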

Moreover I need to have linked data about at least bugs and tests implemented by the 25th of July; then I should also have linked data about users and projects by the end of July. While implementing them I’m also going to refine the API so that the server follows the official rules for constructing URIs and publishing ontologies stated at http://www.w3.org/TR/chips/ and at http://www.w3.org/TR/swbp-vocab-pub/ .

HOWEVER, talking about more concrete things, this is the point I have currently reached while trying to clarify the doubts listed above:
So far, the only way to link a bug with an external one consists in searching external datasets for bug descriptions containing the URI of the bug that you want to link within their ‘summary’ or ‘description’ fields.
I’m searching for an alternative way to determine whether two bugs are linkable, because this one implies that external developers know the URI of my bug a priori (otherwise they couldn’t insert it in the description of their bugs), and I find this requirement a little too constraining.
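
The URI-mention mechanism described above can be sketched in a few lines. The bug records here are plain dictionaries with ‘summary’ and ‘description’ keys, which is an assumption about how the external data would look once fetched; the function name is hypothetical too.

```python
import re


def find_linking_bugs(bug_uri, external_bugs):
    """Return external bugs whose summary or description mentions bug_uri.

    The lookahead stops a URI ending in /1 from matching /12 or /1/x,
    while still accepting trailing punctuation such as a full stop.
    """
    mention = re.compile(re.escape(bug_uri) + r"(?![\w/-])")
    linked = []
    for bug in external_bugs:
        text = bug.get("summary", "") + "\n" + bug.get("description", "")
        if mention.search(text):
            linked.append(bug)
    return linked


if __name__ == "__main__":
    # Made-up records, for illustration only.
    bugs = [
        {"id": "a", "summary": "Same as http://example.org/issues/1."},
        {"id": "b", "description": "See http://example.org/issues/12"},
    ]
    for bug in find_linking_bugs("http://example.org/issues/1", bugs):
        print(bug["id"])
```

It shows the limitation quite clearly: nothing is found unless somebody has already pasted my URI into their bug.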
However, even if in the end I use this mechanism to search for linkable external bugs, I need a mechanism to retrieve all those external datasets. Even retrieving the other Hackystat servers could become a problem. I’d like to avoid having a static list of datasets to crawl... I’d like a more dynamic way to list them: a list which grows as new datasets become available and which is itself available to everyone. What I’d like to have is a framework such as Umbel that provides subject concepts of which datasets are instances according to their topic. Then, instead of searching for a list of datasets related to the ‘Bug’ topic, I could simply declare myself an instance of the Umbel subject concept ‘Bug’ and then crawl the other instances of that same concept. But I don’t know if this is actually feasible; I’ve asked about it on their mailing list and I’m still waiting for an answer.
If this turns out not to be feasible, I’ll need a static list of hosts, inviting at least the users of other Hackystat servers to add themselves to this list.

Then I have to better focus on ways to link tests.

I’m really busy.
cheers & cheers
myriam