MySQL File Analysis for Audio Database
Audio-v7.txt is the complete dump of the oral-histories-v5.xls spreadsheet (1.4M). This is a much simpler db, so it should be quick to check for completeness, use of fields, etc. Note that, thanks to Amita's machine-like precision in recording the OH status in the notes (F, L, C, R, T), I was easily able to extract it and put the data into new fields of the corresponding names so that we can operate on those fields (filter, sort). In most cases I've been able to remove traces of the status indication from the note field, but some remain, which is why you'll see an occasional letter (T, L, etc.) in the note field.
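For anyone who wants to repeat or check the status extraction, here is a minimal sketch in Python. It is not the script that was actually used. It assumes, hypothetically, that the status letters sit at the start of the note as a run of single capital letters separated by commas or spaces; the function name `split_status` and the sample note are made up for illustration.

```python
import re

STATUS_LETTERS = "FLCRT"  # the status codes named above; their meanings are not assumed here

# Assumed layout (hypothetical): the note begins with status letters separated
# by commas, semicolons, or spaces, e.g. "F, L, T rest of the note".
# The lookahead keeps an initial such as "C." or a word such as "Tape"
# from being mistaken for a status letter.
_LEADING_STATUS = re.compile(r"^\s*((?:[FLCRT](?=[\s,;]|$)[\s,;]*)+)")

def split_status(note):
    """Return (flags, cleaned_note); flags maps each status letter to True or False."""
    flags = {letter: False for letter in STATUS_LETTERS}
    match = _LEADING_STATUS.match(note)
    if not match:
        return flags, note.strip()
    for letter in re.findall(r"[FLCRT]", match.group(1)):
        flags[letter] = True
    return flags, note[match.end():].strip()
```

Notes that do not follow the assumed layout are returned unchanged, which is one way the stray letters mentioned above can be left behind.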
Audio-fielduse.txt lists the fields that have recurring values, along with a count of the occurrences of each value. Use it to decide which fields can be consolidated.
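A field-use listing of this kind can be rebuilt with a short Python sketch like the one below. It assumes the records are already loaded as a list of dicts (for example via `csv.DictReader`); the function name `field_use` and the `min_count` parameter are illustrative, not part of the existing files.

```python
from collections import Counter

def field_use(rows, min_count=2):
    """For each field, list the non-empty values that occur at least
    min_count times, most frequent first, with their counts."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            value = (value or "").strip()
            if value:
                counts.setdefault(field, Counter())[value] += 1
    return {
        field: [(v, n) for v, n in counter.most_common() if n >= min_count]
        for field, counter in counts.items()
    }
```

A field whose list holds only a handful of distinct values is probably a good candidate for consolidation or for a controlled vocabulary.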
Audio-title.txt is a list of the objectid and title of each record in the db. Notice that some titles have an added phrase after the person's name (I've separated it with a semicolon). We should probably move this phrase into another field so that the title field is left with only the person's name.
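Since the added phrase is already separated by a semicolon, moving it to its own field could be as simple as the following Python sketch. The function name `split_title` and the sample titles are hypothetical, and the sketch assumes the first semicolon is always the separator.

```python
def split_title(title):
    """Split a title at the first semicolon into (name, extra).
    extra is an empty string when there is no added phrase."""
    name, _, extra = title.partition(";")
    return name.strip(), extra.strip()

# Example with made-up titles:
# split_title("Jane Doe; second interview") gives ("Jane Doe", "second interview")
# split_title("John Roe") gives ("John Roe", "")
```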
Audio-dates.txt gives a list of the dates from the date field in the spreadsheet (renamed to date_old) alongside a new date field that holds a single date extracted from it. I've done this so that the interviews can be sorted and filtered by date, which requires the field to be in a true date format. The way the date field was being used made this impossible because of inconsistencies in the format (sometimes there were two dates, sometimes there were additional notes, and every date was in a non-sortable format). In order to make this change, we need to move the notes and additional dates into the description field. This file is sorted by date so that you can see the problem records (there aren't many) at the top.
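The idea of pulling one sortable date out of date_old and keeping the remainder for the description field can be sketched in Python as follows. The actual formats in the spreadsheet are not listed here, so the sketch assumes, purely for illustration, dates written as month/day/four-digit year; the function name `extract_date` is also hypothetical.

```python
import re
from datetime import date

# Assumed format (hypothetical): M/D/YYYY, e.g. "3/14/1987".
_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def extract_date(date_old):
    """Return (first_date_or_None, leftover_text).
    leftover_text holds any extra dates or notes, to be moved to the description."""
    match = _DATE.search(date_old)
    if not match:
        return None, date_old.strip()
    month, day, year = (int(part) for part in match.groups())
    try:
        parsed = date(year, month, day)
    except ValueError:
        # Looks like a date but is not a valid one (e.g. month 13).
        return None, date_old.strip()
    leftover = (date_old[:match.start()] + date_old[match.end():]).strip(" ;,")
    return parsed, leftover
```

Records for which no date can be extracted come back with `None`, which corresponds to the problem records that need a manual look.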
Audio-bydate.txt lists all interviews by date. It isn't needed for checking anything, but it is a useful way to see the current state of the db.
We need to decide on the name of this db. I've called it 'audio' rather than 'oral-histories' so that it's a little broader and can accommodate additional resources we might add in the future that aren't just oral histories, such as speeches, music, discussions, factory sounds!, etc. The db and Drupal presentation of this content type will be the same for all of these, so I think it makes sense to choose a broader name for the db than just oral-histories.



