2020-07-07 & 2020-07-13 Meeting notes - ETD processes
Date
Jul 7, 2020, continued July 13, 2020
Participants
@Gabe Galson @Annie Johnson (Unlicensed) @Alicia Pucci @Holly Tomren @Stefanie Ramsay (Unlicensed)
Agenda - 2020-07-07
Define workflow and clarify roles
SWORD Process, technical
Complete! The next batch can be delivered directly to ScholarShare
Formally transition process from FTP to SWORD through ProQuest (Gabe)
Overview- this won’t change anything about Christa’s workflow
Any reason to wait on this?
No
There are always items delivered after the main batch; we’ll have to take this into account before starting the workflow
There are also corrections and conditions placed on certain items ('wait 6 months on this one', for example)
Stefanie has these backed up on the Isilon and has a system for uploading in line with Christa’s conditions. This will require some coordination and further workflow definition down the line.
Define and assign workflow steps- PQ ETD batch delivery processing
Coordinating delivery - who communicates with Christa? (N/A mostly; Alicia if necessary)
Christa will do this in the same way, through the ETD listserv
The list of graduates should also go to Alicia
Discrepancies, and the workflow to identify them (Alicia)
Vetting and removing supplemental files upon delivery (Alicia)
What’s the criteria for selection currently? Who was assigned to this last?
Stefanie does this currently, in consultation with Margery and Holly. Margery would bring this to university counsel if necessary. Stefanie can forward the documentation used.
Note that if copyrighted materials are included, a copyright permission letter is supposed to be included.
Should this happen later in the workflow, given the new structure?
No
Embargoes- who applies them (Alicia)
A new workflow for this is needed
OpenRefine process is used for XML files. This isn’t applicable to the new SWORD process
This workflow will be based on exports from ETD administrator, which will allow easy identification of embargoed items
Gabe will request ETD administrator access for Alicia, as well as ProQuest support center
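Once Alicia has ETD Administrator access, the embargo-identification step could be as simple as filtering the CSV export for rows with an embargo date. A minimal sketch, assuming hypothetical column names ('Title', 'Embargo End Date') that would need to be matched to the actual export headers:

```python
import csv

def find_embargoed(export_path):
    """Return (title, embargo date) pairs for rows in an ETD Administrator
    CSV export that carry an embargo.

    The column names 'Title' and 'Embargo End Date' are assumptions;
    adjust them to the real export before use.
    """
    embargoed = []
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Any non-blank embargo date marks the item as embargoed.
            if row.get("Embargo End Date", "").strip():
                embargoed.append((row.get("Title", ""), row["Embargo End Date"]))
    return embargoed
```

The resulting list could feed directly into whatever embargo-application step replaces the old OpenRefine/XML process.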
Additional QA to conform with MAP and system constraints (Alicia)
Does this need to be a separate step, or can it be bundled with step 4?
Publication (Alicia)
QA and metadata standardization, post-publication (Celio)
How will this workflow work, generally?
Advisor and committee member names are what’s standardized
DSpace’s browse functionality will make this easier
Open question: how can we use a metadata field to allow Celio to isolate the latest load?
Note: Celio will be working remotely for the foreseeable future; we’ll have to think about how to coordinate the training
Will Celio do this in ScholarShare, or through metadata exports from the system?
Any retraining necessary before next batch? If so, who will do this?
We should schedule a training session for Carla and Celio, get their input on this workflow step
Additional documentation needed?
Yes. Carla will be the one to write that up.
Preservation- Triggered when Celio’s ETD work is complete
Generating exports from the system (Alicia? Stefanie?)
Application of retention schedule (is this necessary?)
Moving exports to the Isilon (Alicia? Stefanie?)
Agenda - 2020-07-13
ETD migration from CDM
Plan out and timeline the CDM->ScholarShare ETD migration.
Alicia and Gabe can start ASAP
Conditions
New export from CDM needed (Stefanie)
Celio needs to wrap up the current batch before the load.
Embargoes are clearly documented. Which doc is used for this currently, and is it up-to-date?
Tracked through this sheet: https://docs.google.com/spreadsheets/d/13ty_Qkqes69_bly8lDHzekW2vETP62mcPQdI-CmgkRQ/edit#gid=1242336235
Inappropriate supplemental files removed from prior batches are clearly marked. In which doc?
No in-progress metadata cleanup blocks this. Holly and Stefanie say we’re good to go
Timeline: goal deadline is end of summer
Possible complications
File names are not reflected accurately in the CDM export. This could slow down uploads.
Unknown limits on how many items can be uploaded simultaneously
Further issues with remote access to Alicia’s desktop and/or the shared drive
(Gabe and Alicia already defined a strategy for loads administered remotely with Chin)
Do we need to correct any filename issues we find in CDM?
No
Any metadata cleanup that must, should, or could happen mid-migration in OpenRefine or another program, while Alicia is formatting the data for import?
In previous discussions, we agreed this could happen post-load, so that this doesn’t slow down the migration timeline. Still true?
Still true.
However, if there are obvious issues that are easy to clean up we can pursue this.
Assign a point person or interested group for metadata questions from Alicia and Gabe during this process (as we did with SWORD mapping, systems testing).
Gabe and Alicia will reach out to Holly and Stefanie with questions; they can refer to Michael, Carla, or Celio.
Workflow assignment
1. Export metadata from CDM (Stefanie?)
2. Field reformatting and renaming in OpenRefine; bulk metadata corrections (Alicia, Gabe)
3. Breaking load into discrete batches (all items with supplementals; all embargoed items; all other items, broken into manageable chunks). (Gabe, Alicia)
4. Bulk loads (Alicia)
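Step 3 above could be sketched as a small script. This is only an illustration, assuming hypothetical 'supplemental' and 'embargo' columns and an arbitrary chunk size; the real CDM export fields would be substituted in:

```python
def split_batches(rows, chunk_size=200):
    """Split metadata rows into the load groups discussed: items with
    supplemental files, embargoed items, and everything else in
    manageable chunks.

    The 'supplemental' and 'embargo' column names and the chunk size
    are assumptions for illustration only.
    """
    with_supp = [r for r in rows if r.get("supplemental", "").strip()]
    embargoed = [r for r in rows
                 if r.get("embargo", "").strip() and r not in with_supp]
    rest = [r for r in rows if r not in with_supp and r not in embargoed]
    # Break the remaining items into fixed-size chunks for separate loads.
    chunks = [rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)]
    return {"supplemental": with_supp, "embargoed": embargoed, "other": chunks}
```

Each resulting group would then be exported as its own CSV for a separate bulk load in step 4.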
Next steps
Plan to retest and document the entire ETD workflow in prod, from PQ SWORD import through external delivery via ScholarShare's OAI endpoint.
Holly will be doing this, no need for a discrete step. She’s working with Chad to harvest from the OAI endpoint into Alma, through a similar process used in several other systems. When new MARC records are generated, this will serve as the test described.
Follow up task: update OAI endpoint for other systems that currently harvest our ETDs (NDLTD, etc.)
Holly and Annie will investigate. Point person and workflow TBD.
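A quick way to spot-check what the OAI endpoint exposes (before pointing NDLTD or Alma at it) is to parse a ListIdentifiers response. A minimal sketch of the parsing side; the endpoint URL itself is not shown here because the production ScholarShare address would need to be confirmed:

```python
import xml.etree.ElementTree as ET

# OAI-PMH responses live in this XML namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_identifiers(xml_text):
    """Extract record identifiers from an OAI-PMH ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [h.findtext(f"{OAI_NS}identifier")
            for h in root.iter(f"{OAI_NS}header")]
```

The XML itself would come from a request like `<endpoint>?verb=ListIdentifiers&metadataPrefix=oai_dc`, per the OAI-PMH spec; comparing the returned identifiers against a load manifest would confirm the batch is harvestable.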
Document the workflow and assignments once finalized.
Questions about Carla and Celio’s workflows from Alicia.
Do we absolutely need to be able to isolate the batches via metadata? In DSpace the fields to be cleaned up are not facetable.
Holly: the reason we need them isolatable is so they can be chunked up through the interface for metadata cleanup purposes (for example Celio’s advisor name cleanup). We may want to revisit our front end in terms of the facets.
Gabe: one way to do this is to put a per-batch value in a field not featured through the user view.
Note on workflow order- we tentatively decided Celio would clean up after Alicia publishes, but we’ll reevaluate as we train and talk to him and Carla.
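Gabe's suggestion of a per-batch value in a non-displayed field could be applied while preparing each load's metadata CSV. A sketch under stated assumptions: the field name `local.batchId` is hypothetical and would need to exist in the repository's metadata registry:

```python
import csv

def tag_batch(in_path, out_path, batch_id, field="local.batchId"):
    """Stamp every row of a metadata CSV with a per-batch value so the
    load can later be isolated for cleanup (e.g. Celio's advisor-name
    work) without exposing the field in the user view.

    'local.batchId' is an assumed field name, not an existing one.
    """
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
        fieldnames = list(rows[0].keys()) if rows else []
    if field not in fieldnames:
        fieldnames.append(field)
    for r in rows:
        r[field] = batch_id
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Searching on that field's value would then pull back exactly one batch, sidestepping the non-facetable-fields problem Holly raised.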
Note: 2 other institutions that use OR and PQ:
Augusta University - https://augusta.openrepository.com/
University of Maryland, Baltimore - https://archive.hshsl.umaryland.edu/
If needed we can reach out to them about issues and workflow ideas.