2020-07-07 & 2020-07-13 Meeting notes - ETD processes
Date
Jul 7, 2020, continued July 13, 2020
Participants
@Gabe Galson @Annie Johnson (Unlicensed) @Alicia Pucci @Holly Tomren @Stefanie Ramsay (Unlicensed)
Agenda - 2020-07-07
Define workflow and clarify roles
SWORD Process, technical
Complete! The next batch can be delivered directly to ScholarShare
Formally transition process from FTP to SWORD through ProQuest (Gabe)
Overview- this won’t change anything about Christa’s workflow
Any reason to wait on this?
No
There are always items delivered after the main batch; we’ll have to take this into account before starting the workflow
There are also corrections and conditions placed on certain items ('wait 6 months on this one', for example)
Stefanie has these backed up on the Isilon and has a system for uploading in line with Christa’s conditions. This will require some coordination and further workflow definition down the line.
Define and assign workflow steps- PQ ETD batch delivery processing
Coordinating delivery - who communicates with Christa? (N/A mostly; Alicia if necessary)
Christa will do this in the same way, through the ETD listserv
The list of graduates should also go to Alicia
Discrepancies, and the workflow to identify them (Alicia)
Vetting and removing supplemental files upon delivery (Alicia)
What’s the criteria for selection currently? Who was assigned to this last?
Stefanie does this currently, in consultation with Margery and Holly. Margery would bring this to university counsel if necessary. Stefanie can forward the documentation used.
Note that if copyrighted materials are included, a copyright permission letter is supposed to be included.
Should this happen later in the workflow, given the new structure?
No
Embargoes- who applies them (Alicia)
A new workflow for this is needed
OpenRefine process is used for XML files. This isn’t applicable to the new SWORD process
This workflow will be based on exports from ETD administrator, which will allow easy identification of embargoed items
Gabe will request ETD administrator access for Alicia, as well as ProQuest support center
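Once Alicia has ETD Administrator access, the embargo-identification step could be as simple as filtering the CSV export for rows with an embargo date. A minimal sketch, assuming hypothetical column names ('Title', 'Embargo End Date') that would need to be matched to the actual export headers:

```python
import csv

def find_embargoed(export_path):
    """Return (title, embargo date) pairs for rows in an ETD Administrator
    CSV export that carry an embargo.

    The column names 'Title' and 'Embargo End Date' are assumptions;
    adjust them to the real export before use.
    """
    embargoed = []
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Any non-blank embargo date marks the item as embargoed.
            if row.get("Embargo End Date", "").strip():
                embargoed.append((row.get("Title", ""), row["Embargo End Date"]))
    return embargoed
```

The resulting list could feed directly into whatever embargo-application step replaces the old OpenRefine/XML process.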
Additional QA to conform with MAP and system constraints (Alicia)
Does this need to be a separate step, or can it be bundled with step 4?
Publication (Alicia)
QA and metadata standardization, post-publication (Celio)
How will this workflow work, generally?
Advisor and committee member names are what’s standardized
DSpace’s browse functionality will make this easier
Open question: how can we use a metadata field to allow Celio to isolate the latest load?
Note: Celio will be working remotely for the foreseeable future; we’ll have to think about how to coordinate the training
Will Celio do this in ScholarShare, or through metadata exports from the system?
Any retraining necessary before next batch? If so, who will do this?
We should schedule a training session for Carla and Celio, get their input on this workflow step
Additional documentation needed?
Yes. Carla will be the one to write that up.
Preservation- Triggered when Celio’s ETD work is complete
Generating exports from the system (Alicia? Stefanie?)
Application of retention schedule (is this necessary?)
Moving exports to the Isilon (Alicia? Stefanie?)
Agenda - 2020-07-13
ETD migration from CDM
Plan out and timeline the CDM->ScholarShare ETD migration.
Alicia and Gabe can start ASAP
Conditions
New export from CDM needed (Stefanie)
Celio needs to wrap up the current batch before the load.
Embargoes are clearly documented. Which doc is used for this currently, and is it up-to-date?
Tracked through this sheet: https://docs.google.com/spreadsheets/d/13ty_Qkqes69_bly8lDHzekW2vETP62mcPQdI-CmgkRQ/edit#gid=1242336235
Inappropriate supplemental files removed from prior batches are clearly marked. In which doc?
No in-progress metadata cleanup blocks this. Holly and Stefanie say we’re good to go
Timeline: goal deadline is end of summer
Possible complications
File names are not reflected accurately in the CDM export. This could slow down uploads.
Unknown limits on how many items can be uploaded simultaneously
Further issues with remote access to Alicia’s desktop and/or the shared drive
(Gabe and Alicia already defined a strategy for loads administered remotely with Chin)
Do we need to correct any filename issues we find in CDM?
No
Any metadata cleanup that must, should, or could happen mid-migration in OpenRefine or another program, while Alicia is formatting the data for import?
In previous discussions, we agreed this could happen post-load, so that this doesn’t slow down the migration timeline. Still true?
Still true.
However, if there are obvious issues that are easy to clean up we can pursue this.
Assign a point person or interested group for metadata questions from Alicia and Gabe during this process (as we did with SWORD mapping, systems testing).
Gabe and Alicia will reach out to Holly and Stefanie with questions; they can refer to Michael, Carla, or Celio.
Workflow assignment
1. Export metadata from CDM (Stefanie?)
2. Field reformatting and renaming in OpenRefine; bulk metadata corrections (Alicia, Gabe)
3. Breaking load into discrete batches (all items with supplementals; all embargoed items; all other items, broken into manageable chunks). (Gabe, Alicia)
4. Bulk loads (Alicia)
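Step 3 above could be sketched as a small script. This is only an illustration, assuming hypothetical 'supplemental' and 'embargo' columns and an arbitrary chunk size; the real CDM export fields would be substituted in:

```python
def split_batches(rows, chunk_size=200):
    """Split metadata rows into the load groups discussed: items with
    supplemental files, embargoed items, and everything else in
    manageable chunks.

    The 'supplemental' and 'embargo' column names and the chunk size
    are assumptions for illustration only.
    """
    with_supp = [r for r in rows if r.get("supplemental", "").strip()]
    embargoed = [r for r in rows
                 if r.get("embargo", "").strip() and r not in with_supp]
    rest = [r for r in rows if r not in with_supp and r not in embargoed]
    # Break the remaining items into fixed-size chunks for separate loads.
    chunks = [rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)]
    return {"supplemental": with_supp, "embargoed": embargoed, "other": chunks}
```

Each resulting group would then be exported as its own CSV for a separate bulk load in step 4.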
Next steps
Plan to retest and document the entire ETD workflow in prod, from PQ SWORD import through external delivery via ScholarShare's OAI endpoint.
Holly will be doing this, no need for a discrete step. She’s working with Chad to harvest from the OAI endpoint into Alma, through a similar process used in several other systems. When new MARC records are generated, this will serve as the test described.
Follow up task: update OAI endpoint for other systems that currently harvest our ETDs (NDLTD, etc.)
Holly and Annie will investigate. Point person and workflow TBD.
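A quick way to spot-check what the OAI endpoint exposes (before pointing NDLTD or Alma at it) is to parse a ListIdentifiers response. A minimal sketch of the parsing side; the endpoint URL itself is not shown here because the production ScholarShare address would need to be confirmed:

```python
import xml.etree.ElementTree as ET

# OAI-PMH responses live in this XML namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_identifiers(xml_text):
    """Extract record identifiers from an OAI-PMH ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [h.findtext(f"{OAI_NS}identifier")
            for h in root.iter(f"{OAI_NS}header")]
```

The XML itself would come from a request like `<endpoint>?verb=ListIdentifiers&metadataPrefix=oai_dc`, per the OAI-PMH spec; comparing the returned identifiers against a load manifest would confirm the batch is harvestable.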
Document the workflow and assignments once finalized.
Questions about Carla and Celio’s workflows from Alicia.
Do we absolutely need to be able to isolate the batches via metadata? In DSpace the fields to be cleaned up are not facetable.
Holly: the reason we need them isolatable is so they can be chunked up through the interface for metadata cleanup purposes (for example Celio’s advisor name cleanup). We may want to revisit our front end in terms of the facets.
Gabe: one way to do this is to put a per-batch value in a field not featured through the user view.
Note on workflow order- we tentatively decided Celio would clean up after Alicia publishes, but we’ll reevaluate as we train and talk to him and Carla.
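Gabe's suggestion of a per-batch value in a non-displayed field could be applied while preparing each load's metadata CSV. A sketch under stated assumptions: the field name `local.batchId` is hypothetical and would need to exist in the repository's metadata registry:

```python
import csv

def tag_batch(in_path, out_path, batch_id, field="local.batchId"):
    """Stamp every row of a metadata CSV with a per-batch value so the
    load can later be isolated for cleanup (e.g. Celio's advisor-name
    work) without exposing the field in the user view.

    'local.batchId' is an assumed field name, not an existing one.
    """
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
        fieldnames = list(rows[0].keys()) if rows else []
    if field not in fieldnames:
        fieldnames.append(field)
    for r in rows:
        r[field] = batch_id
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Searching on that field's value would then pull back exactly one batch, sidestepping the non-facetable-fields problem Holly raised.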
Note: 2 other institutions that use OR and PQ:
Augusta University - https://augusta.openrepository.com/
University of Maryland, Baltimore - https://archive.hshsl.umaryland.edu/
If needed we can reach out to them about issues and workflow ideas.