Steps to build the Drupal repository
Here's an outline of the steps to build the new Drupal repository and migrate the pp (PastPerfect) data to it:
1. Import pp data into spreadsheets and do a 'phase I' review and manipulation of data. The import has been done (for all but the 'Objects' data set), and is described below. The phase I review is where we decide what fields to keep, which to merge, and do some wholesale changing and cleaning up of the data. More on this later...
2. Load the data into MySQL tables for phase II review and processing. Once the data is in a proper database (MySQL), we can work with it much more flexibly and put it on the web, so that we can easily access subsets of the records and fields from web pages. This will enable other staff at LHC to access the data while we're working on the migration project. (They can also use pp for viewing, just not for making changes.) We can also easily concatenate fields, make substitutions, split data into tags, etc.
3. Create new data structures in Drupal. New content types need to be created for each category of data (accessions, images, books, archives, oral-histories, and objects). The fields for each have to be precisely defined. This first involves a detailed mapping of the fields, which we derive from step 1. We also develop the taxonomy system and menu structure that will let us get to the records.
4. Import data into Drupal. Once the new content types are created, we import the data from each of the MySQL tables into Drupal. At this point, the data is part of the Drupal system and can be accessed, edited, listed, searched, etc.
5. Create the file system and linkages to connect the Drupal records to the underlying file objects (scans, oral-history files, archive finding aid PDFs, etc.).
6. Create views, printouts, access control, etc. This will be an ongoing process as we refine the system.
7. Develop workflow processes and train the staff.
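As a rough illustration of the field manipulations mentioned in step 2, here is a minimal Python sketch of making substitutions and splitting a field into tags. The function names, the substitution table, and the sample value are all assumptions made up for illustration; none of them come from the pp data.

```python
import re


def apply_substitutions(value, substitutions):
    """Replace each whole-word occurrence of a key with its replacement."""
    for old, new in substitutions.items():
        value = re.sub(r"\b" + re.escape(old) + r"\b", new, value)
    return value


def split_tags(value, delimiters=";,"):
    """Split a delimited string into a list of trimmed tags, dropping repeats."""
    parts = re.split("[" + re.escape(delimiters) + "]", value)
    tags = []
    for part in parts:
        tag = part.strip()
        # Duplicate detection is case-sensitive in this sketch.
        if tag and tag not in tags:
            tags.append(tag)
    return tags


# Hypothetical substitution table and field value (illustrative only).
substitutions = {"Mass": "Massachusetts", "photo": "photograph"}
raw_subjects = "Lawrence Mass; mills, photo; mills"

cleaned = apply_substitutions(raw_subjects, substitutions)
tags = split_tags(cleaned)
```

In practice the same kind of cleanup could be done with SQL once the data is in MySQL; the Python form is used here only because it is easy to try out on a handful of values.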
It's very important that we don't go backwards in this progression, as each step is very time-consuming. Now that the data has been moved out of pp and into spreadsheets, any further changes made in pp can no longer be brought into the migration. Likewise, once we import the spreadsheet data into MySQL, we can't make further changes to the spreadsheets.
I've imported the pp data to spreadsheets in order to do the 'phase I' review of the data. Below are links to the spreadsheets. Download these into a new folder on your machine so you have a local copy to play with.
The accessions file lists all of the accessions. This data will be split into two spreadsheets: contact info, which will go into CiviCRM, and accession records, which will go into Drupal as a new content type "Accessions". We need to decide what goes where.
The archives file contains everything from the 'Archives' set in pp, except the oral histories, which have been moved into their own file. There will be two new content types in Drupal, one for archives and one for oral histories. If we are likely to collect other audio besides oral histories (e.g., speeches, recordings of events, factory sounds, etc.), then maybe we should broaden this new content type to 'audio'.
The images file contains everything from the 'Photos' set in pp. I've changed the name to images to reflect the fact that images other than photos belong in this new content type, like maps, drawings, etc.
Last, the books file contains everything from the 'Library' set in pp. I've changed the name to books because I thought 'library' was too general. In a sense, everything is part of the library. If you think 'books' is too specific (because it includes pamphlets) then perhaps we should change it back to library, or something else.
Here are the spreadsheets; they're all version 1 (v1):
http://lawrencehistory.org/files/misc/db/accessions-v1.xls
http://lawrencehistory.org/files/misc/db/archives-v1.xls
http://lawrencehistory.org/files/misc/db/oral-histories-v1.xls
http://lawrencehistory.org/files/misc/db/images-v1.xls
http://lawrencehistory.org/files/misc/db/books-v1.xls
Start by just looking at the data - don't change anything. I've done some initial processing by deleting many fields (columns) which were not being used. Also, if every record (row) contained the same data for a field, I deleted that field. For example, all records in the oral history collection had the same name for 'collection', so I deleted the field. We can still add it back to the final database, but it's simpler to work with the data if we remove fields that don't contain distinguishing information. If a field had entries for only a few records, I moved that data into another field and deleted the original field (column).
The spreadsheet representation has the advantage of letting us see all of the data and navigate around it easily: we can scroll up and down to see how a particular field is being used, and left and right to see how a particular record uses fields.
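The initial processing described above was done by hand in the spreadsheets. For anyone who wants to repeat that kind of check on a local copy, here is a minimal Python sketch that flags columns that are empty, constant, or filled in for only a few records. It assumes the sheet has been exported to CSV; the column names, the sample rows, and the `sparse_threshold` value are hypothetical and chosen only to make the example small.

```python
import csv
import io


def audit_columns(rows, sparse_threshold=0.05):
    """Label each column 'empty', 'constant', 'sparse', or 'keep'."""
    if not rows:
        return {}
    total = len(rows)
    report = {}
    for name in rows[0].keys():
        values = [(row.get(name) or "").strip() for row in rows]
        filled = [v for v in values if v]
        if not filled:
            report[name] = "empty"        # never used
        elif len(filled) == total and len(set(filled)) == 1:
            report[name] = "constant"     # same value in every record
        elif len(filled) / total <= sparse_threshold:
            report[name] = "sparse"       # used by only a few records
        else:
            report[name] = "keep"
    return report


# Made-up sample standing in for a CSV export of one spreadsheet.
SAMPLE_CSV = (
    "id,collection,narrator,legal_status,restrictions\n"
    "1,Oral History,Ann Smith,,\n"
    "2,Oral History,Bob Jones,,none\n"
    "3,Oral History,Cara Lee,,\n"
    "4,Oral History,Dan Roy,,\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# A high threshold is used here only because the sample has four rows.
report = audit_columns(rows, sparse_threshold=0.25)
```

The report only suggests candidates. Whether a sparse or constant field is actually dropped or merged is still a decision for the review.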
Next, we'll move the data into MySQL and make it available on the web. We'll also nail down the final field mapping and I'll create the new Drupal content types (steps 2 and 3). This schedule could be thrown off if we hit snags, so expect that there may well be delays.
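To give a feel for what the move into a database involves, here is a minimal Python sketch that loads spreadsheet rows into a table and then pulls out a subset of records and fields. It is not the actual import: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, it assumes the sheet has been exported to CSV, and the table name, column names, and sample rows are made up. A real MySQL load would need a MySQL driver and connection details, and most MySQL drivers use `%s` rather than `?` as the parameter placeholder.

```python
import csv
import io
import re
import sqlite3


def load_rows(connection, table, rows):
    """Create a text-only table from the rows' columns and insert every row."""
    # Turn spreadsheet headings such as "Object ID" into names like object_id.
    columns = [re.sub(r"\W+", "_", name.strip()).lower() for name in rows[0].keys()]
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    connection.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.values()) for row in rows],
    )
    connection.commit()
    return columns


# Made-up sample standing in for a CSV export of one spreadsheet.
SAMPLE_CSV = (
    "Object ID,Title,Year\n"
    "1998.1.1,Ayer Mill clock tower,1910\n"
    "1998.1.2,Essex Street parade,1912\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
connection = sqlite3.connect(":memory:")  # in-memory stand-in for MySQL
columns = load_rows(connection, "images", rows)

# Access a subset of the records and fields, as a web page might.
subset = connection.execute(
    "SELECT object_id, title FROM images WHERE year = ?", ("1912",)
).fetchall()
```

Every column is stored as text in this sketch; choosing proper column types is part of the field mapping work.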
One of the most critical parts of the review at this stage is to determine what fields we're going to use, and how we're going to blend fields.
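One way to write down the field decisions is as a simple mapping from each new field to the old field or fields that feed it. The Python sketch below shows the idea. The field names in `FIELD_MAP`, the separator, and the sample row are hypothetical; the real mapping is what the review has to produce.

```python
# Hypothetical mapping: new field name -> list of old spreadsheet columns.
# A list with more than one entry means those columns are blended together.
FIELD_MAP = {
    "title": ["Title"],
    "description": ["Description", "Notes"],
    "date": ["Date"],
}


def apply_field_map(source_row, field_map, separator=" | "):
    """Build a new record, joining the non-empty values of blended columns."""
    target = {}
    for target_field, source_fields in field_map.items():
        values = [(source_row.get(name) or "").strip() for name in source_fields]
        target[target_field] = separator.join(v for v in values if v)
    return target


def unmapped_columns(source_row, field_map):
    """List the old columns that the mapping does not use at all."""
    used = {name for names in field_map.values() for name in names}
    return sorted(name for name in source_row if name not in used)


# Made-up spreadsheet row for illustration.
row = {
    "Title": "Ayer Mill clock",
    "Description": "Tower clock",
    "Notes": "Restored in 1991",
    "Date": "1910",
    "Old Code": "X1",
}

record = apply_field_map(row, FIELD_MAP)
leftover = unmapped_columns(row, FIELD_MAP)
```

Listing the unmapped columns is a cheap way to confirm that every old field has been either kept, blended, or deliberately dropped.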
This is a BIG project, but I think we'll have a killer system when it's done. The entire repository will be accessible from anywhere in the world, and we'll have the ability to allow Lawrencians to explore their history from their web browser. It should be a relatively easy next step to add their photos, artifacts and oral histories to our repository.



