The Texas Digital Library (TDL) has created bi-weekly reports for our Texas Data Repository (TDR) institutional liaisons. The TDR hosts 11 Texas institutional dataverses. Researchers at those institutions use the TDR as a platform for publishing and archiving datasets and other data products created by faculty, staff, and students at Texas higher education institutions. The repository is built on Dataverse, an open-source web application.
We collected requirements from our institutional liaisons; one liaison from each institution serves on the TDR Steering Committee. The TDR liaisons wanted to improve the reporting from the TDR. Primarily, they asked to see data from only their own institution, as the reporting we’d originally provided showed statistics from all operating TDR institutions. Additionally, in their institutional reports, they wanted to be able to parse information about the datasets, the dataverses, and the institutional users separately. Finally, they needed us to include both published and unpublished datasets and dataverses in their reports.
With these requirements in mind, TDL Senior Software Engineer Nick Woodward and Deputy Director Courtney Mumma developed a Python 3-based tool to generate and email statistical reports from Dataverse (https://dataverse.org/) using the application’s APIs and database queries. As with Miniverse (https://github.com/IQSS/miniverse), the reports require access to the Dataverse database for information not available via the APIs. For each institution, defined as a top-level dataverse, the reporting tool generates an Excel spreadsheet containing three tabs: dataverses, datasets, and users. Each institutional liaison receives a report for only their own institution. Our next goal is to create a single report covering the entire TDR Dataverse for oversight use by Texas Digital Library staff.
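To illustrate the per-institution grouping described above, here is a minimal sketch in Python. It is not taken from the reporting tool itself: the sample rows, the field names (such as parent_id and dataverse_id), and the functions institution_of and build_reports are all hypothetical, and it assumes that every dataset and user row can be tied to a dataverse. The idea is simply to walk up from any dataverse to its top-level ancestor and file each row under that institution's three tabs.

```python
from collections import defaultdict

# Hypothetical rows standing in for API or database results.
# A parent_id of None marks the root; its direct children are the institutions.
DATAVERSES = [
    {"id": 1, "alias": "root", "parent_id": None},
    {"id": 2, "alias": "inst-a", "parent_id": 1},
    {"id": 3, "alias": "inst-b", "parent_id": 1},
    {"id": 4, "alias": "inst-a-lab", "parent_id": 2},
]
DATASETS = [
    {"id": 10, "title": "Survey data", "dataverse_id": 4, "published": True},
    {"id": 11, "title": "Draft results", "dataverse_id": 3, "published": False},
]
USERS = [
    {"id": 100, "name": "A. Researcher", "dataverse_id": 4},
]


def institution_of(dataverse_id, by_id, root_id):
    """Walk up parent links until reaching a direct child of the root."""
    current = by_id[dataverse_id]
    while current["parent_id"] != root_id:
        if current["parent_id"] is None:
            return None  # the root itself belongs to no institution
        current = by_id[current["parent_id"]]
    return current["alias"]


def build_reports(dataverses, datasets, users):
    """Group rows into one three-tab report per top-level dataverse."""
    by_id = {d["id"]: d for d in dataverses}
    root_id = next(d["id"] for d in dataverses if d["parent_id"] is None)
    reports = defaultdict(lambda: {"dataverses": [], "datasets": [], "users": []})

    for dv in dataverses:
        inst = institution_of(dv["id"], by_id, root_id)
        if inst is not None:
            reports[inst]["dataverses"].append(dv)

    # Published and unpublished datasets are both kept, per the requirements.
    for tab, rows in (("datasets", datasets), ("users", users)):
        for row in rows:
            inst = institution_of(row["dataverse_id"], by_id, root_id)
            if inst is not None:
                reports[inst][tab].append(row)

    return dict(reports)


reports = build_reports(DATAVERSES, DATASETS, USERS)
```

Each entry in reports could then be written out with a spreadsheet library, one sheet per tab, and emailed to the matching liaison.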
After the release of the reports to the TDR institutional liaisons, our Assessment Working Group raised some specific issues, which we will try to prioritize when we have developer resources in 2019. In general, they are interested in creating a data dictionary for the report elements and headers, accessing lists of all users who logged in rather than just dataverse creators, and including the Identifier along with the depositor name for datasets. Inconsistencies in the metadata in the reports have also raised questions about batch updates to metadata.
The TDL will continue to refine and expand the TDR reports in the coming year as time permits. Planned work includes generating charts and graphs that our institutional liaisons can use to present the statistical data, and creating the above-mentioned data dictionary to better define the elements included in the reporting spreadsheet.
See the resources below for more information:
Learn more about Texas Digital Library’s Texas Data Repository at https://www.tdl.org/texas-data-repository/
Visit the TDR at https://dataverse.tdl.org/
Access the Dataverse reports GitHub at https://github.com/TexasDigitalLibrary/dataverse-reports