Current Items
Tasks are listed in intended order of execution,
given relative need, size and dependencies.
Generated SPARQL Search Pages
Create generated SPARQL search pages for the info-filter, so that they
can be directly linked to from the trac.
Update Search Link
Update the search link to the real RDS/WIP rdl endpoint search.
Test StoreManager Changes
Changes made to the storage manager to support the PCA-RDL
conversion need to be tested for the effects on the servlet
deployment.
Templates
Support for templates/relationships and class of relationship
is required in the default presentation XSLs.
Enable Step-by-Step Content for Web Access
Step-by-step content is available via the command line interface, but
not via the web interface - enable it for web.
Copyright
Copyright needs to be declared properly for each fragment,
and we need a uniform policy for copyright on contributions.
Broken Links Tester
Provide an automated means of exploring all of the links - maybe
just use 3rd party tools for this.
Base Model URI
The InfoFilter needs a better way of getting the model and/or we
should make sure that the base URI of the model in Jena/MySQL is
set up properly.
Error/Warning Handling
Errors and warnings currently go to the console or to the
app. server logging mechanism - should intercept errors and warnings
etc. for use by the application for inclusion in reporting (perhaps
pass variable as document node).
SOAP Fault
SOAP implementation requires SOAP Fault handling for exceptions
not related to output stream.
MySQL Driver Stability
The stability of the MySQL driver is an open question -
it's prone to catastrophic and permanent failure (until VM restart)
if it blows its heap during a query. Why the heap is blown is unknown -
it could be the Joseki/Jena infrastructure, or it could be the MySQL
client side state - probably the former.
Jena JDBC Stability
Jena needs better database connection recovery for MySQL -
solution right now is to reboot Jetty when it messes up.
Presentation Performance
Presentation performance is adequate for items with a small number
of associated instances, but is very poor when the number of related
things increases - need to find a way of streamlining the SPARQL
queries to reduce latency. Can also improve the template compilation.
Note that transforms can be done in parallel, so that is a way of
dealing with latency too.
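As a rough illustration of the parallel-transform idea above, the
sketch below runs independent transforms concurrently on a thread
pool. The run_transform function is a hypothetical stand-in for one
SPARQL query plus XSLT step (it only sleeps to simulate latency); it
is not taken from the InfoFilter code.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_transform(name):
    # Hypothetical stand-in for one SPARQL query + XSLT transform.
    time.sleep(0.05)  # simulated endpoint latency
    return f"<fragment name='{name}'/>"


def run_all(names, max_workers=4):
    # pool.map() returns results in input order, so the fragments can
    # still be assembled in sequence even though they ran concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_transform, names))


fragments = run_all(["classification", "relationships", "templates"])
```

With latency-bound steps like these, total wall time approaches that
of the slowest single transform rather than the sum of all of them.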
Clear Model Option
A "Clear Model" button is needed in the GWT UI.
Also need the union/add/replace option on the upload dialog.
Correct Model/Dataset Use
Current implementation blurs models and datasets, and in fact
may well mis-use model names in the API. Need to correct the
uploader to use a dataset with default model rather than
plain model, if that's what's happening.
Random Dataset Support
Currently, the implementation only supports manually mapping
the Joseki services - we need to find a way of mapping them
automatically out of a single database.
Jena MySQL Database Performance
Modify the MySQL specific model setup scripts in the Jena
source to generate the required full text indexing over the
two different model columns (short & expanded).
Consider integrating SDB?
WSDL Support
Provision for reporting a suite of WSDL files concerning
an endpoint to Web Services clients.
DNS Service
Make the server the authoritative source of DNS records for
the ids-adi.org domain by installing a BIND service and
documenting configuration. Will also need to transfer
domain name ownership to IDS-ADI.
Domain Name Mappings
Provide documentation for building mappings such as having
http://dm.rdswip.ids-adi.org/data proxy opaquely to
http://rdswip.ids-adi.org/servlet/endpoint/dm.
Jetty Logging
Configure the logging for Jetty, J2SE1.4, Log4J and all the other
logging components to route to the syslog - currently it's routing
to stdout/stderr and then getting forwarded to the syslog.
Completed Items
Tasks are listed in order of completion.
Load Latest Data (jbourne 2008-07-07 16:30Z)
Load the latest PCA RDL data into the test endpoint.
Loaded data - took approx 1hr - no timeouts or other
hassle - inserts into mysql are the limiting factor and
seem to pro-rate the upload speed, keeping the connection
active.
Union/Add/Replace Upload Option (jbourne 2008-07-07 16:57Z)
The upload form needs a drop down with union/add/replace
options to allow the data to join the existing data,
without creating duplicates, add to the existing data
(duplicate triples allowed) or completely remove and
replace the existing data. Actually though, it looks
like union is what occurs automatically, unlike RAP,
so "add" currently behaves just like "union".
Result Page for Upload Form (jbourne 2008-07-07 17:40Z)
The result page for the upload form is currently empty -
would be nice to say "loaded xxx triples" or some such.
Ended up implementing it so that it reports any triples
removed and the net triples added.
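The three upload options and the result report described above can
be sketched as follows. This is a minimal model over plain Python
lists of triples, not the Jena-backed implementation; the function
name apply_upload and the exact meaning of "net triples added"
(new size minus old size) are assumptions made for illustration.

```python
def apply_upload(existing, uploaded, mode="union"):
    """Return (new_store, removed, net_added) for lists of triples."""
    if mode == "union":
        # Join the existing data without creating duplicates.
        seen = set(existing)
        new_store = list(existing)
        for triple in uploaded:
            if triple not in seen:
                seen.add(triple)
                new_store.append(triple)
        removed = 0
    elif mode == "add":
        # Append everything; duplicate triples are allowed.
        new_store = list(existing) + list(uploaded)
        removed = 0
    elif mode == "replace":
        # Drop the existing data completely, then load the upload.
        new_store = list(uploaded)
        removed = len(existing)
    else:
        raise ValueError(f"unknown upload mode: {mode}")
    # Assumed definition: net change in the size of the store.
    net_added = len(new_store) - len(existing)
    return new_store, removed, net_added


existing = [("s", "p", "o")]
uploaded = [("s", "p", "o"), ("s", "p", "o2")]
union_result = apply_upload(existing, uploaded, "union")
add_result = apply_upload(existing, uploaded, "add")
replace_result = apply_upload(existing, uploaded, "replace")
```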
Integrate Configuration (jbourne 2008-07-07 18:35Z)
Current servlets require individual configuration options -
should merge them into the StoreManager I think. Note:
couldn't merge into StoreManager due to changes in the
servlet API that restrict servlets from finding other
instantiated servlets by name - used context params instead.
SOAP Support (jbourne 2008-07-08 02:19Z)
Automated SOAP support on any SPARQL endpoint. Added the SOAP
support, but didn't test - just used known operable code.
Load Latest Data (jbourne 2008-07-08 08:00Z)
Load the latest PCA RDL data into the test endpoint,
but with changes to use the endpoint as the base URI.
Also modified the SQL tables backing this endpoint to
support searching on the object column.
MySQL Error (jbourne 2008-07-09 11:42Z)
There's a problem with MySQL handling content from Jena
that is flagged poorly with respect to character encoding.
Turns out this was solved by restarting mysqld (and jetty didn't
recover).
SPARQL to Presentation Binding (jbourne 2008-07-08 7:00Z)
Build some infrastructure to take a URI, apply a set of SPARQL
queries to the inherent endpoint and project the results using
XSLT into a general XML format, with final XHTML presentation
via the scripts similar to the IRM ones. All SPARQL & XSLT
should reside in the presentation tree. Working now - very
neat little framework - all SPARQL and XSLT integrated and
control the XML production from the endpoint.
SOAP Testing (rpatil 2008-07-09 13:00Z)
Test SOAP support using standard client.
InfoFilter Standalone Use (jbourne 2008-07-09 17:40Z)
Would be very useful to allow the info filter to run in standalone
mode, as a .jar outside of the server. Done and used in the test
scripts committed to svn.
Performance Improvement (jbourne 2008-07-09 22:00Z)
Improved performance dramatically by optimizing the order
of the statements in a SPARQL join - shouldn't (theoretically)
have any effect, so this is exposing either an issue in the ARQ
Jena implementation or MySQL.
Imported Part2 into PCA RDL (jbourne 2008-07-09 22:41Z)
Imported part2 directly into the PCA RDL for ease of use - in future
this should be managed with named graphs or something similar.
Search Implementation (jbourne 2008-07-10 14:00Z)
Modified implementation so that it is a real, user-oriented
search, not a SPARQL query. Modified the scripts to provide
an entry point for the search.
Upgraded MySQL 5.0.32 (jbourne 2008-07-11 01:00Z)
InfoFilter Messaging/Debug (jbourne 2008-07-12 03:00Z)
Would be useful to have a way of extracting the steps of the filter
as a single streamed zip file, complete with error output, ordered
by step number. Actually implemented this as "application/zip" or
as "multipart/mixed" so that since most of the data is text it can
be delivered either way - very large numbers of "steps" are typical
(eg. I think electric motor gives 240 or so).
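A minimal sketch of the zip form of this output, assuming each step
is available as a (step number, output text, error text) tuple. The
tuple layout and the archive entry names are hypothetical; only the
ordering by step number and the inclusion of error output come from
the description above.

```python
import io
import zipfile


def zip_steps(steps):
    """Pack (step_number, output_text, error_text) tuples into zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Zero-padded names keep entries ordered by step number in
        # both the archive and any alphabetical listing of it.
        for number, output, errors in sorted(steps, key=lambda s: s[0]):
            archive.writestr(f"step-{number:04d}/output.txt", output)
            if errors:
                archive.writestr(f"step-{number:04d}/errors.txt", errors)
    return buffer.getvalue()


payload = zip_steps([(2, "second", ""), (1, "first", "a warning")])
entry_names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
```

Since the steps are mostly text, deflate compression is a natural
fit when the step count runs into the hundreds.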
Cache Implementation (jbourne 2008-07-13 00:15Z)
Extensive VM caching for objects frequently re-used within the
context of a single query (compiled XSLTs). Extensive VM caching
for objects reused between queries (ARQ queries). Extensive
FS caching for SPARQL/XSLT transform results. All caching set up
to be automatically cleared on app. server restart. Considerable
performance gains. FS caching is an incomplete implementation -
it relies on SHA1 too much. All caching needs an external bypass
& flush.
Cache Flush (jbourne 2008-07-13 05:00Z)
Cache flush implemented at /servlets/admin/flush-cache.
Remote Restart (jbourne 2008-07-13 05:30Z)
App. server needs remote restart capability for authorized
security principals implemented at user level (not root).
Implemented - see /presentation/admin/index.html for details.
Cache Collision Handling (jbourne 2008-07-13 17:00Z)
Also need to put hash collision detection
in SHA1 FS caching implementation - done, using simple raw
binary data bucket indexing to store full key. Position of key
determines name of resource.
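One way to read the bucket scheme above is sketched below: each SHA1
digest names a bucket directory, an index file in the bucket stores
the full keys, and the position of a key in that index names the
file holding its value. The class name BucketCache, the file layout
and the length-prefixed index format are all assumptions made for
illustration, not the actual on-disk format.

```python
import hashlib
import os
import tempfile


class BucketCache:
    """Hypothetical filesystem cache: one directory (bucket) per digest."""

    def __init__(self, root, digest=None):
        self.root = root
        # The digest is injectable only so a collision can be forced below.
        self.digest = digest or (lambda key: hashlib.sha1(key).hexdigest())

    def _bucket(self, key):
        return os.path.join(self.root, self.digest(key))

    def _read_keys(self, bucket):
        # The index holds the full keys, each prefixed by a 4-byte length.
        keys = []
        index = os.path.join(bucket, "index")
        if os.path.exists(index):
            with open(index, "rb") as f:
                while True:
                    header = f.read(4)
                    if len(header) < 4:
                        break
                    keys.append(f.read(int.from_bytes(header, "big")))
        return keys

    def put(self, key, value):
        bucket = self._bucket(key)
        os.makedirs(bucket, exist_ok=True)
        keys = self._read_keys(bucket)
        if key in keys:
            position = keys.index(key)
        else:
            position = len(keys)
            with open(os.path.join(bucket, "index"), "ab") as f:
                f.write(len(key).to_bytes(4, "big") + key)
        # The key's position in the index names the resource file.
        with open(os.path.join(bucket, str(position)), "wb") as f:
            f.write(value)

    def get(self, key):
        bucket = self._bucket(key)
        keys = self._read_keys(bucket)
        if key not in keys:
            return None
        with open(os.path.join(bucket, str(keys.index(key))), "rb") as f:
            return f.read()


with tempfile.TemporaryDirectory() as root:
    # Force every key into the same bucket to exercise the collision path.
    cache = BucketCache(root, digest=lambda key: "forced-collision")
    cache.put(b"query-a", b"result-a")
    cache.put(b"query-b", b"result-b")
    first, second = cache.get(b"query-a"), cache.get(b"query-b")
    missing = cache.get(b"query-c")
```

Because lookups compare the full key, two keys that share a digest
land in the same bucket but still resolve to separate resources.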
Cache Control Handling (jbourne 2008-07-13 18:00Z)
Caching system operates, but needs a bypass option
passed through UI. Implemented as a cache parameter that
can take the following words:
- normal: read/write cache
- bypass: do not use or change cache
- fresh: update cache, do not use values from cache
- fetch: use values from cache, do not update cache
Also, Pragma and Cache-Control HTTP headers with a
"no-cache" value will implicitly add a cache=fresh parameter
pair. For most browsers, this means that holding down the
shift button and pressing refresh will have the expected
behaviour (clears any existing cache entries and posts new
cache entries).
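The four cache words and the header rule above can be sketched as a
small lookup table. The function names are hypothetical, and two
behaviours are assumptions rather than documented facts: that an
explicit cache parameter wins over the headers, and that "normal" is
the default when neither is given.

```python
# mode -> (read values from cache, write values to cache)
CACHE_MODES = {
    "normal": (True, True),
    "bypass": (False, False),
    "fresh": (False, True),
    "fetch": (True, False),
}


def resolve_cache_mode(params, headers):
    """Pick the cache mode from request parameters and HTTP headers."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    mode = params.get("cache")
    if mode is None:
        # A "no-cache" Pragma or Cache-Control header implies cache=fresh.
        no_cache = any(
            "no-cache" in lowered.get(name, "")
            for name in ("pragma", "cache-control")
        )
        mode = "fresh" if no_cache else "normal"  # assumed default
    if mode not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    return mode


def cached_call(cache, key, compute, mode):
    """Run compute() through a dict-like cache according to the mode."""
    read, write = CACHE_MODES[mode]
    if read and key in cache:
        return cache[key]
    value = compute()
    if write:
        cache[key] = value
    return value


store = {"k": "old"}
refreshed = cached_call(store, "k", lambda: "new", "fresh")
fetched = cached_call(store, "k", lambda: "newer", "fetch")
bypassed = cached_call(store, "k", lambda: "newest", "bypass")
```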
Idempotency Control (jbourne 2008-07-13 23:30Z)
Added idempotency control over generated content (ability to
suppress meta content about the generation, like dates/times), in
order to provide for controlled testing of the caching system while
using productions that normally include aforementioned meta content.
Presentation Changes (jbourne 2008-07-13 23:30Z)
Modified the main CSS file - table and list content
is looking a little better. Also added (bogus) copyright info and
some generated summary info - could probably add more summary info.
Authentication Framework (jbourne 2008-07-23 04:30Z)
Added a whole bunch of infrastructure to better support authentication
at every level of the OS and services. Basically it is
LDAP integrated with PAM, NSS & Apache and configured with
PHPLDAPAdmin (in Apache). Lots of little config details to keep
track of. Theoretically, this could be opened up for other
purposes (eg. authentication for trac server).
Backup System (jbourne 2008-07-28 01:00Z)
An automated, incremental off-site backup system has been put in place
for the critical content filesystems.
Security Enhancements (jbourne 2008-07-29 01:23Z)
SSL Certificate purchased from Thawte and installed. All
authenticated services moved to https://secure.ids-adi.org,
including exposed repository. Shared structures set up for
consistent security application between vhosts. Repository
authorization rules centrally sourced.
CSS Enhancements (jbourne 2008-07-29 01:23Z)
Added some color; softened the titles, changed the body to a
sans-serif font, made links more obvious.
WSDL Support (jbourne 2008-07-30 02:56Z)
Provision for reporting a suite of WSDL files concerning
an endpoint to Web Services clients. Added - still imperfect - need
a way of mapping the client request URI (I think). Any endpoint
served up by the joseki-soap-lite code should respond to both
?wsdl and /wsdl.
SOAP Failure Fix (jbourne 2008-07-31 03:00Z)
Fixed SOAP failures introduced by WSDL Support.
Added Images (jbourne 2008-07-31 21:00Z)
Added main web page logo image, background image and favourites icon.
Added Heading Fragments (jbourne 2008-08-01 14:00Z)
Fleshed out generated pages with consistent heading fragments,
and other pages with some semi-consistent headings and icons
from the materials above. Altered CSS to suit.
Added SQL Backup (jbourne 2008-08-01 18:00Z)
SQL Server is now backed up off-site daily (in addition to critical
content filesystems as above).
Sun Aug 3 18:21:20 UTC 2008
Repaired Page Generation Error (jbourne 2008-08-03 6:00Z)
Disabled the header fragment in the generated pages since
it seemed to be causing strange server side deadlocks (in Apache).
Added System Layout Page (jbourne 2008-08-03 18:00Z)
Added a page describing the general layout and assembly of
the system as an operating whole.
Added Account Admin Page (jbourne 2008-08-03 18:00Z)
Added a page describing how to administer accounts on the
system, for the four most typical operations: new user,
add to group, remove from group and disable user.
Cache Bleed (jbourne 2008-08-08 21:00Z)
InfoFilter Cache is bleeding queries between endpoints - needs to be
sealed up. Fixed and tested. Results from different
SPARQL query endpoints are independent now; note, however, that
different invoking endpoints share cached results from the
same query endpoints.
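The behaviour described above is consistent with a cache key that is
derived from the query endpoint and the query text, but not from the
invoking endpoint. The sketch below shows that idea only; the
cache_key function and its inputs are hypothetical, and SHA1 is used
simply because the FS cache is described as SHA1-based.

```python
import hashlib


def cache_key(query_endpoint, query_text):
    """Build a key that separates results per query endpoint."""
    digest = hashlib.sha1()
    for part in (query_endpoint, query_text):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


query = "SELECT ?s WHERE { ?s ?p ?o }"
key_one = cache_key("http://example.org/endpoint/one", query)
key_two = cache_key("http://example.org/endpoint/two", query)
key_one_again = cache_key("http://example.org/endpoint/one", query)
```

The example.org URLs are placeholders. Two invoking endpoints that
send the same query to the same query endpoint produce the same key,
which matches the sharing noted above.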
ID Generator Extras (jbourne 2008-08-09 21:00Z)
Some extra functionality requested on the ID generator, including:
- an operating unbind;
- a popup menu for base URIs;
- a config xml that shows the configured base URIs and other info; and
- an info option that shows what has been said about a given ID.
All of these have been implemented now, plus there has been a
change in the permissions for identifiers - only the original
requester can revoke now, and only until they have fixated.
Once they have fixated, then other people can bind.
Completed PCA-RDL to RDS/WIP Converter (jbourne 2008-08-18 03:00Z)
PCA-RDL to RDS/WIP Converter largely completed and entered into
testing phase on clean room system.
Config, Logging, RDF Writer and other Infrastructure (jbourne 2008-08-19 01:00Z)
Infrastructural refactoring for PCA-RDL converter to support config,
logging, RDF writing and non-heap mapping. Solves scalability
issues with Jena and configuration and reporting issues with
the existing IDS-ADI framework.
Converter PCA-RDL to Live RDS/WIP Endpoints (jbourne 2008-08-20 02:20Z)
With the converter code ready and tested on the clean-room
install, ran the converter on the server machine with fresh
data as at 2008-08-18 from PCA. RDS/WIP SPARQL endpoints
are now live for raw HTTP access.
SOAP for RDS/WIP (jbourne 2008-08-20 03:10Z)
Enabled SOAP for RDS/WIP endpoints.
RDS/WIP UI: InfoFilter (jbourne 2008-08-22 01:34Z)
The InfoFilter-based UI (XSLT+SPARQL) is now in place for the RDS/WIP.
There is a rudimentary search functionality too, but performance
is poor - should be able to be improved with DB tweaking.
There is also a lot of "junk" in the data that will have to be
carved out with another pass of the converter at some point.