RDS/WIP Tasks & Changes
Purpose
This page is for keeping track of fine-grained tasks on the RDS/WIP project relating to the configuration and maintenance of this server machine: once changes are made, the task is moved to the end of the page as a change note.
Current Items
Tasks are listed in intended order of execution, given relative need, size and dependencies.
Generated SPARQL Search Pages
Create generated SPARQL search pages for the info-filter, so that they can be directly linked to from the trac.
Update Search Link
Update the search link to the real RDS/WIP rdl endpoint search.
Test StoreManager Changes
Changes made to the storage manager to support the PCA-RDL conversion need to be tested for the effects on the servlet deployment.
Templates
Support for templates/relationships and class of relationship is required in the default presentation XSLs.
Enable Step-by-Step Content for Web Access
Step-by-step content is available via the command line interface, but not via the web interface - enable it for web.
Copyright
Copyright needs to be declared properly for each fragment, and we need a uniform policy for copyright on contributions.
Broken Links Tester
Provide an automated means of exploring all of the links - maybe just use 3rd party tools for this.
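If a 3rd party tool turns out not to fit, a small crawler is one alternative. The following is a minimal Python sketch, not project code: LinkCollector, extract_links, find_broken and the injected fetch_status callable are all hypothetical names, and a real run would pass in a function that performs the HTTP request and returns the status code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href and src attribute values of every tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for every link found in the HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, link) for link in collector.links]


def find_broken(base_url, html, fetch_status):
    """Return the links for which fetch_status(url) reports 400 or above."""
    # fetch_status is an assumed callable: url -> HTTP status code.
    return [url for url in extract_links(base_url, html) if fetch_status(url) >= 400]
```

Injecting fetch_status keeps the link extraction testable without network access.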
Base Model URI
The InfoFilter needs a better way of getting the model and/or we should make sure that the base URI of the model in Jena/MySQL is set up properly.
Error/Warning Handling
Errors and warnings currently go to the console or to the app. server logging mechanism - should intercept errors and warnings etc. for use by the application for inclusion in reporting (perhaps pass variable as document node).
SOAP Fault
SOAP implementation requires SOAP Fault handling for exceptions not related to output stream.
MySQL Driver Stability
The stability of the MySQL driver is a bit of an open question - it's prone to catastrophic and permanent failure (until VM restart) if it blows its heap during a query. Why the heap is blown is unknown - it could be the Joseki/Jena infrastructure, or it could be the MySQL client side state - probably the former.
Jena JDBC Stability
Jena needs better database connection recovery for MySQL - solution right now is to reboot Jetty when it messes up.
Presentation Performance
Presentation performance is adequate for items with a small number of associated instances, but is very poor when the number of related things increases - need to find a way of streamlining the SPARQL queries to reduce latency. Can also improve the template compilation. Note that transforms can be done in parallel, so that is a way of dealing with latency too.
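The parallel option can be sketched as follows. This is a Python illustration only, assuming each query-plus-transform step is independent of the others and can be modelled as a callable taking the subject URI; run_steps_in_parallel and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_steps_in_parallel(steps, subject_uri, max_workers=4):
    """Run independent steps concurrently and return results in step order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if steps finish
        # out of order, so the page can still be assembled predictably.
        return list(pool.map(lambda step: step(subject_uri), steps))
```

Threads suit this case if the steps spend most of their time waiting on the endpoint rather than computing.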
Clear Model Option
A "Clear Model" button is needed in the GWT UI. Also need the union/add/replace option on the upload dialog.
Correct Model/Dataset Use
Current implementation blurs models and datasets, and in fact may well mis-use model names in the API. Need to correct the uploader to use a dataset with default model rather than plain model, if that's what's happening.
Random Dataset Support
Currently, the implementation only supports manually mapping the Joseki services - we need to find a way of mapping them automatically out of a single database.
Jena MySQL Database Performance
Modify the MySQL specific model setup scripts in the Jena source to generate the required full text indexing over the two different model columns (short & expanded). Consider integrating SDB?
WSDL Support
Provision for reporting a suite of WSDL files concerning an endpoint to Web Services clients.
DNS Service
Make the server the authoritative source of DNS records for the ids-adi.org domain by installing a BIND service and documenting configuration. Will also need to transfer domain name ownership to IDS-ADI.
Domain Name Mappings
Provide documentation for building mappings such as having http://dm.rdswip.ids-adi.org/data proxy opaquely to http://rdswip.ids-adi.org/servlet/endpoint/dm.
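The mapping rule in that example can be written down as a small function. This is a Python sketch of the URL rewrite only, not of the proxy configuration itself; backend_url is a hypothetical name, and generalising the "dm" host label to any single label is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Values taken from the example in the text.
PUBLIC_SUFFIX = ".rdswip.ids-adi.org"
BACKEND_HOST = "rdswip.ids-adi.org"


def backend_url(public_url):
    """Map http://<name>.rdswip.ids-adi.org/data to the servlet endpoint URL.

    Returns None when the URL does not match the mapping.
    """
    parts = urlsplit(public_url)
    host = parts.hostname or ""
    if not host.endswith(PUBLIC_SUFFIX):
        return None
    name = host[: -len(PUBLIC_SUFFIX)]
    # Assumption: exactly one extra host label selects the endpoint.
    if not name or "." in name:
        return None
    if parts.path != "/data" and not parts.path.startswith("/data/"):
        return None
    rest = parts.path[len("/data"):]
    path = "/servlet/endpoint/" + name + rest
    return urlunsplit((parts.scheme, BACKEND_HOST, path, parts.query, ""))
```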
Jetty Logging
Configure the logging for Jetty, J2SE1.4, Log4J and all the other logging components to route to the syslog - currently it's routing to stdout/stderr and then getting forwarded to the syslog.
Completed Items
Tasks are listed in order of completion.
Load Latest Data (jbourne 2008-07-07 16:30Z)
Load the latest PCA RDL data into the test endpoint. Loaded data - took approx 1hr - no timeouts or other hassle - inserts into mysql are the limiting factor and seem to pro-rate the upload speed, keeping the connection active.
Union/Add/Replace Upload Option (jbourne 2008-07-07 16:57Z)
The upload form needs a drop down with union/add/replace options to allow the data to join the existing data without creating duplicates (union), add to the existing data with duplicate triples allowed (add), or completely remove and replace the existing data (replace). Actually though, it looks like union is what occurs automatically, unlike RAP, so "add" currently behaves just like "union".
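The intended difference between the three options can be illustrated with plain lists of triples. This is a Python sketch of the semantics as described, not of the uploader; apply_upload is a hypothetical name, and as noted above the store itself currently treats "add" like "union".

```python
def apply_upload(existing, uploaded, mode):
    """Return the new list of triples after an upload in the given mode."""
    if mode == "replace":
        # Discard everything that was there before.
        return list(uploaded)
    if mode == "add":
        # Duplicate triples are allowed.
        return list(existing) + list(uploaded)
    if mode == "union":
        # Only triples not already present are appended.
        result = list(existing)
        seen = set(existing)
        for triple in uploaded:
            if triple not in seen:
                seen.add(triple)
                result.append(triple)
        return result
    raise ValueError("unknown mode: " + mode)
```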
Result Page for Upload Form (jbourne 2008-07-07 17:40Z)
The result page for the upload form is currently empty - would be nice to say "loaded xxx triples" or some such. Ended up implementing it so that it reports any triples removed and the net triples added.
Integrate Configuration (jbourne 2008-07-07 18:35Z)
Current servlets require individual configuration options - should merge them into the StoreManager I think. Note: couldn't merge into StoreManager due to changes in the servlet API that restrict servlets from finding other instantiated servlets by name - used context params instead.
SOAP Support (jbourne 2008-07-08 02:19Z)
Automated SOAP support on any SPARQL endpoint. Added the SOAP support, but didn't test - just used known operable code.
Load Latest Data (jbourne 2008-07-08 08:00Z)
Load the latest PCA RDL data into the test endpoint, but with changes to use the endpoint as the base URI. Also modified the SQL tables backing this endpoint to support searching on the object column.
MySQL Error (jbourne 2008-07-09 11:42Z)
There's a problem with MySQL handling content from Jena that is flagged poorly with respect to character encoding. Turns out this was solved by restarting mysqld (and jetty didn't recover).
SPARQL to Presentation Binding (jbourne 2008-07-08 7:00Z)
Build some infrastructure to take a URI, apply a set of SPARQL queries to the inherent endpoint and project the results using XSLT into a general XML format, with final XHTML presentation via the scripts similar to the IRM ones. All SPARQL & XSLT should reside in the presentation tree. Working now - very neat little framework - all SPARQL and XSLT integrated and control the XML production from the endpoint.
SOAP Testing (rpatil 2008-07-09 13:00Z)
Test SOAP support using standard client.
InfoFilter Standalone Use (jbourne 2008-07-09 17:40Z)
Would be very useful to allow the info filter to run in standalone mode, as a .jar outside of the server. Done and used in the test scripts committed to svn.
Performance Improvement (jbourne 2008-07-09 22:00Z)
Improved performance dramatically by optimizing the order of the statements in a SPARQL join - shouldn't (theoretically) have any effect, so this is exposing either an issue in the ARQ Jena implementation or MySQL.
Imported Part2 into PCA RDL (jbourne 2008-07-09 22:41Z)
Imported part2 directly into the PCA RDL for ease of use - in future this should be managed with named graphs or something similar.
Search Implementation (jbourne 2008-07-10 14:00Z)
Modified implementation so that it is a real, user-oriented search, not a SPARQL query. Modified the scripts to provide an entry point for the search.
Upgraded MySQL 5.0.32 (jbourne 2008-07-11 01:00Z)
InfoFilter Messaging/Debug (jbourne 2008-07-12 03:00Z)
Would be useful to have a way of extracting the steps of the filter as a single streamed zip file, complete with error output, ordered by step number. Actually implemented this as "application/zip" or as "multipart/mixed" so that since most of the data is text it can be delivered either way - very large numbers of "steps" are typical (eg. I think electric motor gives 240 or so).
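The zip form of this output can be sketched in a few lines. The following Python illustration is not the InfoFilter code: steps_to_zip, the tuple layout and the entry names are all assumptions chosen to show steps being ordered by number with error output kept alongside.

```python
import io
import zipfile


def steps_to_zip(steps):
    """Pack (step_number, output_text, error_text) tuples into zip bytes.

    Entries are written in step-number order, with zero-padded names so
    that the order also survives an alphabetical listing.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, output, errors in sorted(steps, key=lambda s: s[0]):
            # Hypothetical entry naming: one folder per step.
            archive.writestr("step-%04d/output.txt" % number, output)
            if errors:
                archive.writestr("step-%04d/errors.txt" % number, errors)
    return buffer.getvalue()
```

Four digits of padding leaves room for the few hundred steps mentioned above.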
Cache Implementation (jbourne 2008-07-13 00:15Z)
Extensive VM caching for objects frequently re-used within the context of a single query (compiled XSLTs). Extensive VM caching for objects reused between queries (ARQ queries). Extensive FS caching for SPARQL/XSLT transform results. All caching set up to be automatically cleared on app. server restart. Considerable performance gains. The FS caching implementation is incomplete - relies on SHA1 too much. All caching needs an external bypass & flush.
Cache Flush (jbourne 2008-07-13 05:00Z)
Cache flush implemented at /servlets/admin/flush-cache.
Remote Restart (jbourne 2008-07-13 05:30Z)
App. server needs remote restart capability for authorized security principals implemented at user level (not root). Implemented - see /presentation/admin/index.html for details.
Cache Collision Handling (jbourne 2008-07-13 17:00Z)
Also need to put hash collision detection in SHA1 FS caching implementation - done, using simple raw binary data bucket indexing to store full key. Position of key determines name of resource.
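The bucket idea can be sketched as follows. This is a Python illustration under stated assumptions, not the actual implementation: BucketCache and the file names are hypothetical, and the bucket index is stored as JSON for readability where the real one uses raw binary data.

```python
import hashlib
import json
from pathlib import Path


class BucketCache:
    """File cache keyed by hash, keeping full keys per bucket to detect collisions."""

    def __init__(self, root, hash_fn=None):
        self.root = Path(root)
        # hash_fn is injectable so that collisions can be forced in tests.
        self.hash_fn = hash_fn or (
            lambda key: hashlib.sha1(key.encode("utf-8")).hexdigest()
        )

    def _bucket(self, key):
        return self.root / self.hash_fn(key)

    def _load_keys(self, bucket):
        index = bucket / "keys.json"
        if not index.exists():
            return []
        return json.loads(index.read_text(encoding="utf-8"))

    def put(self, key, data):
        bucket = self._bucket(key)
        bucket.mkdir(parents=True, exist_ok=True)
        keys = self._load_keys(bucket)
        if key not in keys:
            keys.append(key)
            (bucket / "keys.json").write_text(json.dumps(keys), encoding="utf-8")
        # The key's position in the bucket index names the resource file.
        (bucket / ("%d.bin" % keys.index(key))).write_bytes(data)

    def get(self, key):
        bucket = self._bucket(key)
        keys = self._load_keys(bucket)
        if key not in keys:
            return None  # Nothing cached, or same hash but a different key.
        return (bucket / ("%d.bin" % keys.index(key))).read_bytes()
```

Because the full key is compared on every lookup, two keys that hash to the same bucket never return each other's content.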
Cache Control Handling (jbourne 2008-07-13 18:00Z)
Caching system operates, but needs a bypass option passed through UI. Implemented as a cache parameter that can take the following words: Also, Pragma and Cache-Control HTTP headers with a "no-cache" value will implicitly add a cache=fresh parameter pair. For most browsers, this means that holding down the shift button and pressing refresh will have the expected behaviour (clears any existing cache entries and posts new cache entries).
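The header rule can be sketched as a small function. This is a Python illustration only: effective_params is a hypothetical name, and letting an explicit cache parameter take precedence over the headers is an assumption, not something the notes above state.

```python
def effective_params(params, headers):
    """Return request parameters, adding cache=fresh when headers ask for no-cache."""
    result = dict(params)
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    wants_fresh = any(
        "no-cache" in lowered.get(name, "") for name in ("pragma", "cache-control")
    )
    # Assumption: an explicit cache parameter is left untouched.
    if wants_fresh and "cache" not in result:
        result["cache"] = "fresh"
    return result
```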
Idempotency Control (jbourne 2008-07-13 23:30Z)
Added idempotency control over generated content (ability to suppress meta content about the generation, like dates/times), in order to provide for controlled testing of the caching system while using productions that normally include the aforementioned meta content.
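The effect of such a switch can be shown with a toy generator. This is a Python sketch, not the production code: render_page and its include_meta and now parameters are hypothetical names.

```python
from datetime import datetime, timezone


def render_page(body, include_meta=True, now=None):
    """Render content, optionally followed by a line of generation meta content."""
    lines = [body]
    if include_meta:
        # The timestamp changes on every call unless "now" is pinned.
        moment = now or datetime.now(timezone.utc)
        lines.append("Generated " + moment.strftime("%Y-%m-%d %H:%MZ"))
    return "\n".join(lines)
```

With include_meta=False, repeated calls give byte-identical output, so a cached copy can be compared directly against a fresh one.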
Presentation Changes (jbourne 2008-07-13 23:30Z)
Modified the main CSS file - table and list content is looking a little better. Also added (bogus) copyright info and some generated summary info - could probably add more summary info.
Authentication Framework (jbourne 2008-07-23 04:30Z)
Added a whole bunch of infrastructure to better support authentication at every level of the OS and services. Basically it is LDAP integrated with PAM, NSS & Apache and configured with PHPLDAPAdmin (in Apache). Lots of little config details to keep track of. Theoretically, this could be opened up for other purposes (eg. authentication for trac server).
Backup System (jbourne 2008-07-28 01:00Z)
An automated, incremental off-site backup system has been put in place for the critical content filesystems.
Security Enhancements (jbourne 2008-07-29 01:23Z)
SSL Certificate purchased from Thawte and installed. All authenticated services moved to https://secure.ids-adi.org, including exposed repository. Shared structures set up for consistent security application between vhosts. Repository authorization rules centrally sourced.
CSS Enhancements (jbourne 2008-07-29 01:23Z)
Added some color; softened the titles, changed the body to a sans-serif font, made links more obvious.
WSDL Support (jbourne 2008-07-30 02:56Z)
Provision for reporting a suite of WSDL files concerning an endpoint to Web Services clients. Added - still imperfect - need a way of mapping the client request URI (I think). Any endpoint served up by the joseki-soap-lite code should respond to both ?wsdl and /wsdl.
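Recognising the two request forms can be sketched as below. This is a Python illustration, not the joseki-soap-lite code: is_wsdl_request is a hypothetical name, and matching case-insensitively and tolerating a trailing slash are assumptions.

```python
from urllib.parse import urlsplit


def is_wsdl_request(url):
    """Return True for URLs of the form <endpoint>?wsdl or <endpoint>/wsdl."""
    parts = urlsplit(url)
    # Query form: a bare "wsdl" key, possibly alongside other parameters.
    query_keys = [
        item.split("=", 1)[0].lower() for item in parts.query.split("&") if item
    ]
    if "wsdl" in query_keys:
        return True
    # Path form: the last path segment is "wsdl".
    return parts.path.rstrip("/").lower().endswith("/wsdl")
```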
SOAP Failure Fix (jbourne 2008-07-31 03:00Z)
Fixed SOAP failures introduced by WSDL Support.
Added Images (jbourne 2008-07-31 21:00Z)
Added main web page logo image, background image and favourites icon.
Added Heading Fragments (jbourne 2008-08-01 14:00Z)
Fleshed out generated pages with consistent heading fragments, and other pages with some semi-consistent headings and icons from the materials above. Altered CSS to suit.
Added SQL Backup (jbourne 2008-08-01 18:00Z)
SQL Server is now backed up off-site daily (in addition to critical content filesystems as above).
Sun Aug 3 18:21:20 UTC 2008
Repaired Page Generation Error (jbourne 2008-08-03 6:00Z)
Disabled the header fragment in the generated pages since it seemed to be causing strange server side deadlocks (in Apache).
Added System Layout Page (jbourne 2008-08-03 18:00Z)
Added a page describing the general layout and assembly of the system as an operating whole.
Added Account Admin Page (jbourne 2008-08-03 18:00Z)
Added a page describing how to administer accounts on the system, for the four most typical operations: new user, add to group, remove from group and disable user.
Cache Bleed (jbourne 2008-08-08 21:00Z)
InfoFilter Cache is bleeding queries between endpoints - needs to be sealed up. Fixed and tested. Results from different SPARQL query endpoints are independent now; however, note that different invoking endpoints share cached results from the same query endpoints.
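One way to get that behaviour is to make the query endpoint part of the cache key while leaving the invoking endpoint out. The following is a Python sketch of such a key, assuming SHA1 as used by the FS cache; cache_key is a hypothetical name and the length-prefixed layout is an assumption.

```python
import hashlib


def cache_key(query_endpoint, query_text):
    """Derive a cache key from the query endpoint and the query text.

    The invoking endpoint is deliberately not part of the key, so different
    invokers share results that come from the same query endpoint.
    """
    digest = hashlib.sha1()
    for part in (query_endpoint, query_text):
        data = part.encode("utf-8")
        # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(str(len(data)).encode("ascii") + b":" + data)
    return digest.hexdigest()
```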
ID Generator Extras (jbourne 2008-08-09 21:00Z)
Some extra functionality requested on the ID generator, including: an operating unbind; a popup menu for base URIs; a config xml that shows the configured base URIs and other info; and an info option that shows what has been said about a given ID. All of these have been implemented now, plus there has been a change in the permissions for identifiers - only the original requester can revoke now, and only until they have fixated. Once they have fixated, then other people can bind.
Completed PCA-RDL to RDS/WIP Converter (jbourne 2008-08-18 03:00Z)
PCA-RDL to RDS/WIP Converter largely completed and entered into testing phase on clean room system.
Config, Logging, RDF Writer and other Infrastructure (jbourne 2008-08-19 01:00Z)
Infrastructural refactoring for PCA-RDL converter to support config, logging, RDF writing and non-heap mapping. Solves scalability issues with Jena, and configuration and reporting issues with the existing IDS-ADI framework.
Converter PCA-RDL to Live RDS/WIP Endpoints (jbourne 2008-08-20 02:20Z)
With the converter code ready and tested on the clean-room install, ran the converter on the server machine with fresh data as at 2008-08-18 from PCA. RDS/WIP SPARQL endpoints are now live for raw HTTP access.
SOAP for RDS/WIP (jbourne 2008-08-20 03:10Z)
Enabled SOAP for RDS/WIP endpoints.
RDS/WIP UI: InfoFilter (jbourne 2008-08-22 01:34Z)
The InfoFilter-based UI (XSLT+SPARQL) is now in place for the RDS/WIP. There is a rudimentary search functionality too, but performance is poor - should be able to be improved with DB tweaking. There is also a lot of "junk" in the data that will have to be carved out with another pass of the converter at some point.