Manuscript submitted May 26; returned to author September 29 for revision; revised manuscript submitted November 23; returned to author for minor revision February 4; accepted for publication March 25.
Most academic theses and dissertations are now born-digital assets. As such, they often coexist with author-supplied metadata that has the potential to be repurposed and enhanced to facilitate discovery and access in an online environment. The authors describe the evolution of the electronic thesis and dissertation (ETD) cataloging workflow at a large research library, from the era of print to the present day, with emphasis on the challenges and opportunities of harvesting author-supplied metadata for cataloging ETDs.
The authors provide detailed explanations of the harvesting process, the creation of code for the metadata transformations, the loading of records, and quality assurance procedures.
In August, the Cataloging and Metadata Services Department of the Pennsylvania State University Libraries created the Digital Access Team in response to the need to devote more resources to the management of metadata for digital assets. The team began looking at repurposing metadata from other platforms for use in The CAT in October. These schemes can be customized for local metadata harvesting.
Examples of MARC records for ETDs before and after the new procedure was implemented are provided, and time savings are quantified on the basis of studies conducted over a twelve-month period (three semesters). The paper also describes in detail the mappings created to harvest the metadata, the customizations made to the XSLT crosswalk, and the steps taken to ensure that the metadata batchloaded into The CAT is of sufficiently high quality.
Literature addressing the harvesting of ETD author-supplied metadata for creating MARC records for online catalogs is somewhat sparse, although efforts date back a number of years. Early harvesting strategies used Perl scripts. McCutcheon et al. Boock and Kunda also described the OSU experience, but focused more on workflow changes and cost savings. Another avenue for acquiring ETD author-supplied metadata was to repurpose data supplied by ProQuest. Although the literature addressed multiple ways to acquire ETD author-supplied metadata, the variable and often substandard quality of this metadata arose as a common theme.
Cataloging of print theses and dissertations (TDs) at Penn State has historically been minimal-level and formulaic. Catalog records generally consisted of the full title, author, date of issuance, a pagination count, degree type, and graduate degree program in a local MARC field. Initially, Library of Congress Subject Headings (LCSH) were added only when a personal name, corporate name, or title of a work was present in the TD title.
Later, full subject analysis was performed and LCSH were assigned only for TDs containing the term Pennsylvania or a local Pennsylvania name (such as a town or county) in the title. This practice has continued to the present. With this workflow, the average thesis required ten to fifteen minutes to catalog, with an additional five to ten minutes per thesis if referred for subject analysis.
Such a relatively minimalist approach was designed primarily to balance providing sufficient access to TDs against minimizing the time spent on complicated subject analysis for what are generally very narrow and specialized subject areas. Penn State University Libraries initiated a pilot project in collaboration with the Graduate School, Information Technology Services, and Digital Library Technologies to investigate the possibility of allowing theses and dissertations to be submitted and archived electronically.
Cataloging of ETDs then began. Because thesis titles in PDF files were sometimes entirely capitalized, copying and pasting a title could be as time-consuming as typing it from scratch. Even so, cataloging an ETD generally took about half as long as cataloging a print thesis, a time savings due primarily to the efficiency gained through copying and pasting data. Starting with a blank template in the local SirsiDynix Symphony ILS, the cataloger used macros line by line to fill in constant fields, including the fixed fields. The cataloger transcribed or copied the title, author, degree type, advisor(s), and thesis department as they appeared on the ETD document.
The cataloger took metadata from the ETD server page when it did not appear in the document, such as keywords. The URL provided in the record led to the splash page for the individual thesis. The cataloger provided local authority control for ETD authors and advisors by searching the local catalog for any previous works by the author or advisor and using the form of name found.
Typically, the cataloger would spend the bulk of a month cataloging ETDs after each semester. With a shrinking staff, competing demands for time, and new priorities (such as the creation of metadata for digital projects), Cataloging and Metadata Services felt the time was right to transition from a largely manual, title-by-title process for cataloging ETDs to a more automated, batch approach that leveraged the power of harvesting author-supplied metadata.
These are not submitted to ProQuest. At this point, the MARC data can be further manipulated as needed for quality assurance. These data elements are internally mapped to DC elements. For the graduate program, the solution was to change the mapping of the Graduate Program ETD element to a DC element not used elsewhere in the data, coverage, and edit the XSLT file accordingly. For the degree, the solution was likewise to map degree to a DC element not used elsewhere in the mappings, relation, and then edit the XSLT file.
The DC element contributor had not been used in the mappings, so Committee was mapped to contributor and the XSLT file was edited to output this data in the corresponding MARC fields. Much of this could be added to the records with MarcEdit following harvesting, but customizing the XSLT transformation allowed us to add this data as part of the harvest itself.
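The local repurposing of DC elements described above can be pictured as a lookup applied to each harvested record. The following is a minimal Python sketch, not the authors' XSLT crosswalk: the sample record, its values, and the label names (`graduate_program`, `degree`, `committee`, `access_restriction`) are hypothetical, and only the pairing of DC elements with local meanings comes from the text.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Local, non-standard meanings given to otherwise unused DC elements.
# The label names on the right are hypothetical, chosen for illustration.
LOCAL_MEANING = {
    "coverage": "graduate_program",
    "relation": "degree",
    "contributor": "committee",
    "rights": "access_restriction",
}

# Hypothetical harvested record; the values are invented for the example.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example thesis title</dc:title>
  <dc:coverage>Kinesiology</dc:coverage>
  <dc:relation>PhD</dc:relation>
  <dc:contributor>Jane Doe; Committee Member</dc:contributor>
</record>"""


def relabel(xml_text):
    """Group DC values under their local meaning (or the DC name itself)."""
    root = ET.fromstring(xml_text)
    out = {}
    for child in root:
        if not child.tag.startswith("{" + DC_NS + "}"):
            continue  # ignore anything outside the DC namespace
        name = child.tag.split("}", 1)[1]
        label = LOCAL_MEANING.get(name, name)
        out.setdefault(label, []).append((child.text or "").strip())
    return out
```

Calling `relabel(SAMPLE)` groups the Kinesiology value under the graduate program label, which is the interpretation a customized crosswalk would need before writing the value to a MARC field.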
One of the authors had prior programming experience, but no experience in XSLT coding.
The first customizations made to the XSLT crosswalk handled the local, non-standard assignments to the DC elements coverage, relation, and rights discussed in the previous section. Coverage contained the name of the graduate degree program, such as Architecture, Aerospace Engineering, or Kinesiology. Similar changes were made to other DC elements.
Because degree types belong in a specific MARC field, the code in the crosswalk was changed to map them to that field. The DC element rights contained access restrictions. These three remaps were also repositioned in the crosswalk so that their output displayed in the correct positions within a MARC record. The MARC 008 field was a special case: positions 00-05 required a computer-generated, six-character numeric string indicating the date the record was created, in the format yymmdd.
A function that could retrieve the current date was required. One such function is current-date(), but this function is from XPath 2.0, not the XPath 1.0 on which an XSLT 1.0 crosswalk is based. This saved coding time, but in the future it may be desirable to convert the entire crosswalk to XSLT 2.0.
The output of this function yielded the date in the format yyyy-mm-dd-hh:mm, where the trailing -hh:mm is a time zone offset. To convert this date into the format needed for MARC 008, three separate substring functions were applied to the output and their results concatenated. The date of the thesis itself was obtained during the harvest from the DC element date.
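The three-substring conversion described above can be illustrated outside XSLT. This is a minimal Python sketch of the same idea, assuming an input string that begins with yyyy-mm-dd; the function name `marc_date_entered` is hypothetical and the code is not taken from the authors' crosswalk.

```python
from datetime import date


def marc_date_entered(xpath_date: str) -> str:
    """Build the six-character yymmdd string from a yyyy-mm-dd... value.

    Three substrings are taken and concatenated, mirroring the approach
    described in the text; anything after the day (such as a time zone
    offset) is ignored.
    """
    yy = xpath_date[2:4]   # last two digits of the year
    mm = xpath_date[5:7]   # month
    dd = xpath_date[8:10]  # day
    return yy + mm + dd


# Usage with today's date; date.isoformat() yields yyyy-mm-dd.
today_code = marc_date_entered(date.today().isoformat())
```

In XSLT the same slicing would be written with `substring()` and `concat()`, where `substring()` counts characters from 1 rather than 0.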
The date was in yyyy-mm-dd format, and with the use of a substring function the first four characters were mapped to all of these MARC21 positions. The department had begun the transition to RDA, but thesis cataloging had not yet made the transition at the time of testing. The degree value was tested against the known degree types. This will be helpful for detecting any future new degree types. Each time a harvest is performed, a visual scan of the records is sufficient to catch these for manual correction and future updating of the crosswalk. Figure 3 shows the XSLT coding for mapping the field.
At Penn State, one field is used to create holdings information for each record during batchload into the catalog. The field contains nine subfields, of which eight are constant data, set by local policy and coded directly into the XSLT crosswalk. For the remaining subfield, the crosswalk guards against an empty value. This will prevent this subfield from being blank and causing a batchload to fail.
Coding was added to the XSLT crosswalk to handle initial articles in thesis titles. When a title begins with an initial article, the nonfiling-characters indicator is set accordingly; in all other cases, it is set to 0. Another challenge was determining where a title ends and a subtitle begins. Sharretts, Shieh, and French noted that they considered anything following a colon as a subtitle. Here, a colon is treated as the boundary only when it is followed by a space. This decision was made in anticipation of unusual usage of colons in acronyms or for artistic or typographical effects.
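The title handling described above can be sketched as two small functions. This is a minimal Python illustration under stated assumptions, not the authors' XSLT: it assumes only the English articles A, An, and The, counts the article plus its trailing space as nonfiling characters, and splits on the first colon followed by a space. The function names are hypothetical.

```python
# Assumption: only English initial articles are considered.
ARTICLES = ("the ", "an ", "a ")


def nonfiling_count(title: str) -> int:
    """Number of characters to skip when filing; 0 if no initial article."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return len(article)  # article plus the following space
    return 0


def split_title(title: str):
    """Split at the first colon followed by a space.

    Returns (main_title, subtitle); subtitle is None when no such
    boundary exists, so a colon inside an acronym or a ratio stays
    in the main title.
    """
    main, sep, sub = title.partition(": ")
    return main.strip(), (sub.strip() if sep else None)
```

For a title such as "DNA:RNA hybrids: a study", the first colon is left alone because no space follows it, and the split happens at the second.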
Our samples showed that a space followed the colon in all cases; future testing will determine whether more elaborate coding is warranted. Functionality for mapping additional authors to MARC was retained even though co-authors were not found among any of the samples tested. Unlike author names, thesis advisor and committee member names were stored in the DC element contributor in direct order. The form that ETD authors used to submit their thesis advisors and committee members was free-format, though there were separate areas for advisors and committee members. In addition to the name, the DC element contributor contains the role the individual played, following the name and separated from it by a semicolon and a space.
Adding to the complexity, names as entered by the ETD author sometimes include prefixes such as Dr. The goal was to get all thesis advisors associated with a thesis mapped to MARC fields with their names in indirect order. This was a particularly challenging and complicated coding task. An Open Archives Initiative (OAI) harvest of theses was used as a sample to determine the variations found in the DC contributor element.
Each variation was noted and an algorithm was developed to address the most common forms and some of the more prevalent problematic forms. While processing the contributor element, any unusual findings were mapped to the MARC 720 field (Added Entry: Uncontrolled Name) for evaluation after the harvest.
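The general shape of such an algorithm can be sketched as follows. This is a simplified Python illustration, not the authors' XSLT: the prefix list is hypothetical, the last word of the name is assumed to be the surname (which fails for compound surnames), and anything that does not fit the common "Name; Role" pattern is flagged for review rather than guessed at.

```python
# Hypothetical list of honorifics to strip; compared without trailing periods.
PREFIXES = {"dr", "prof", "professor"}


def parse_contributor(value: str):
    """Return (name, role, needs_review) for one contributor value.

    The common form is 'Forename Surname; Role' in direct order, which is
    inverted to 'Surname, Forename'. Unusual forms are returned unchanged
    with needs_review=True so they can be evaluated after the harvest.
    """
    name, sep, role = value.partition("; ")
    role = role.strip()
    tokens = name.split()
    # Drop leading honorifics such as 'Dr.'
    while tokens and tokens[0].lower().rstrip(".") in PREFIXES:
        tokens = tokens[1:]
    # Missing role, a single-word name, or an embedded comma is unusual.
    if not sep or len(tokens) < 2 or "," in name:
        return " ".join(tokens) or name.strip(), role, True
    surname = tokens[-1]  # simplifying assumption: last word is the surname
    forenames = " ".join(tokens[:-1])
    return f"{surname}, {forenames}", role, False
```

Returning a review flag instead of forcing an inversion mirrors the idea of routing unusual findings to a separate field for later evaluation.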