# Opening Digital Data to an Analog System

The organization for which this proposal is submitted is an archive and association dedicated to serving the historical needs of Norwegian-American immigrants and their descendants. Funded through donations from the public it serves, the association is built around the principles of collecting and preserving information about Norwegian-American history and culture so that it can provide this information to the surrounding public, many of whom are still of Norwegian heritage.

The association’s collections have not kept pace with the evolving nature of information and have fallen severely behind on digital materials. As a result, recent records important to Norwegian-American culture and heritage cannot be accessed through the association’s archives. By working with collaborating organizations that already possess such digital systems, this proposal will give the association the access it needs to bolster its deficient records. Unless it stays current with newer information, the record of Norwegian-American heritage the association wishes to serve will be out of date for any cultural study of today.

The proposed plans include such modifications to existing systems as:

• Increasing archival staff or workhours to accommodate increased catalog needs
• Establishing best practices for a shared catalog
• Creating a dedicated server to handle increased traffic of digital information
• Increasing digital storage space available
• Improving the database management system to optimize flow of information

These initial steps will bring a steady stream of digital information into the association’s collection and help it inform its patrons about more recent events. These steps are crucial to maintaining a collection that stays relevant to the ephemeral nature of culture.

# Site Introduction

The site I have chosen is an archival association. It is closely tied to a private liberal-arts college; however, it is a separate entity with its own board of directors, funding, and a full-time director. The archival association and college in question do not wish to be listed by name. Instead, they will hereafter be referred to as “The Association” and “The College” respectively. A third stakeholder, a special collections library, hereafter “The Library,” is also involved in this program and, similarly, has its own board, director, and budget while being very closely tied to The College. “The Library” should not be confused with the official library of The College, whose archivists and access services will also be considered. To avoid confusion, any reference to the services of the unnamed college library will also use the name “The College.”

The College, Library, and Association are so closely linked because they serve the same patrons. All stakeholders are located on the campus of The College and are open to any of its students. Being specifically designed to serve the students, the official library of The College serves most undergraduate needs. The Library is a world-renowned special collection, and as such draws many scholars. These scholars have access to the other two stakeholders for any materials needed to round out The Library’s resources. The Association has focused its access services on the local community in the surrounding area. This community is not academically affiliated, but is a local population that needs the specific information about the region and its people’s history held within The Association’s catalog. All three maintain open access to one another’s catalogs to give patrons better access to materials. Crossover in access services has created enough of a culture of sharing resources among the three stakeholders that plans are currently being drafted for a dedicated special collections building to house The Library, The Association, and the rare materials of The College in one place.[1]

This project focuses specifically on the needs of The Association. There will be much discussion of the other two stakeholders, as the three can coordinate on a unified plan, but one issue in particular will determine success. The Association has extensive records from before the late 1980s. Thereafter, the comprehensiveness of the records kept by The Association drops dramatically, and records of events in the 21st century are next to nothing. This is because The Association has virtually no systems in place for the collection or analysis of digital information. The affiliated College has many effective techniques and systems for this collection and archiving, but there is no single cohesive catalog through which The Association can assemble its own records from the resources at The College. To bridge this gap, I propose two plans. The first is simply to share the records already accrued by The College with both The Association and The Library. The second is for The Association to tap into the incoming stream of digital data and create its own records and catalog from the raw information.

# Technology Needs

Although the Association is the focus of this project, the College is by far the most established and best funded of the three. Similarly, the College has the most thorough network, and so this project proposes that the Library and the Association adopt the College’s system and be given greater access and rights to that network. The College houses its own servers to host the network, and the Library’s web presence is structured as a section of the College’s website.[1][2]

The Association does maintain its own website, but this site is still registered under the College’s network.[3]  As this de facto connection of technology services already exists, the computer and networking needs of the Association are mostly met already.  What is needed by the Association is a better system by which they can collect, archive, and catalog digital data and materials as they come to the College or Library.  The Association is rather antiquated and does not have a robust system of materials collection.  The system to collect any digital materials is nearly nonexistent.  The bifurcated plan listed in the next section does create two different sets of computing needs through two approaches to solving this problem.  Therefore, the physical needs of the Association for this proposal will be deferred to the technology costs section.

# Technology Plan

Plan A

I explore two parallel plans in this proposal. For simplicity, I will call these plans A and B. Plan A is to centralize the digital archival resources under The College, with The Association and The Library sharing the resulting archived materials. One central archivist specializing in digital materials will be responsible for sifting through all incoming materials. This archivist will work through The College’s official library archive department, for three reasons. First, The College has the best systems and resources for collecting digital materials and is the main repository of digital information for the patrons of the three stakeholders. Second, although The Association is designed to serve the needs of the local population, the reach and reputation of The College far outweigh those of The Association. Any attempt to broaden the reach of digital material collection would therefore have the greatest effect coming from an archivist working within The College. Lastly, The College has the most IT support and owns and maintains the network used by the three stakeholders. Any IT support needed would therefore be most easily and cost-effectively obtained if the digital materials pass through The College. Plan A would also allow the materials to be cataloged once and held on one storage system. This again reduces the overhead of the project by removing the need for The Association or The Library to invest in further terminals and hard drive banks. Although neither The Association nor The Library will own the materials, the collaboration and open catalogs the three stakeholders share would give both organizations access to the materials without increasing their operating budgets.

In addition to requiring a centralized archivist at The College, plan A requires a server dedicated to the digital archive materials. Because digital materials now predominate over physical materials among the modern records entering databases and archives, plan A’s centralized system will require every use of modern records at the three research facilities to pass through one database. This requirement would be very taxing for The College’s current system to handle without augmentation, yet an additional server dedicated to this task alone should be sufficient for the foreseeable future.

Plan B

Should the board of any of the three stakeholders object to the first proposal, plan B is for the existing network of The College to act as a hub that passes raw digital data collected anywhere across the three stakeholders to each organization. This plan still requires The College to act as collector and repository of digital data; however, The College already completes these tasks in its own digital archiving. What this plan removes is the need for The College to bear the brunt of the cost of analyzing and archiving the materials before sending the collections to the other two stakeholders, and of storing and preserving the archived information. Each entity will be responsible for employing its own archivist to analyze the raw materials, create a catalog entry specific to that institution, and create the system for storage and retrieval. Plan B does duplicate much of the labor required of each institution to maintain the materials, yet it removes the need to establish a server dedicated to these materials, and it distributes the cost among the three institutions instead of levying all costs on The College. The other advantage of this second plan is that each institution has the chance to customize its collections and catalogs to meet the specific needs of its intended patrons.

At this time, any plan C in which The Association conducts its own gathering of digital data would be impractical. The organizational overhaul this would require of The Association, together with the likelihood that The College will be willing to share at least its raw digital materials, gives no reason to suspect that The Association needs to plan for such independence.

# Technology Costs and Labor

Plan A’s inclusion of a dedicated server does not necessarily carry a price tag. To mitigate costs, a server can be repurposed should the College find it needs to upgrade a server system. During the upgrade, a server may be found to be insufficient for the college-wide network but adequate for the archiving needs. Should plan A need to buy a server, one can be purchased very easily for $300-$1,000.[1] Most in the lower range would operate just fine, especially when linked with a database management system (DBMS) to provide all the file storage needs. When this server is directly linked to the College’s overall networking system, it would handle one subset of the overall domain. This could minimize the stress on the archiving server, because terminals would be directed to it only when looking under the College’s domain at the archival materials of the three collaborating archives.

The DBMS I would recommend is either Microsoft SQL Server (MsSQL) or MySQL.[2] Both are available at no cost (SQL Server in its Express edition, MySQL in its Community Edition). Beyond this, both can be tuned to favor writes or reads (depending on whether you need to deposit more information or to read the information already there more quickly), or to sit somewhere in between. Thus, if the stakeholders have a project that deposits much more information in the archives, or a summer of scholars all needing materials, MsSQL or MySQL can be optimized to the need at the time.[3] Of course, these systems will require physical hardware to store the information for access to and from the server. For this project, I recommend the Drobo B810i as that platform. Its onboard redundancy program creates automatic backups of the stored information, and it has a very intuitive networking program. Although the price tag of $1,700 is initially daunting, the capacity to handle 64TB of data, not including the backups, does justify the cost.[4] The three Drobo models considered compare as follows. All listed capacities are with single- or dual-disk automatic backup and redundancies.

| Model | Price | Hard drive bays | Capacity | Ports | Network protocols |
|---|---|---|---|---|---|
| Drobo B1200i | $4,000 | 12 | 76TB | 3 4x Gigabit Ethernet ports | iSCSI or CHAP |
| Drobo B810i | $1,700 | 8 | 64TB | 2 2x Gigabit Ethernet ports | iSCSI, CHAP, or MPIO |
| Drobo 5c | $350 | 5 | 36TB | One USB C port | not listed |
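As a rough way to compare the three Drobo units listed above, the quoted prices and capacities can be reduced to a cost per terabyte. The following is a minimal sketch that uses only the figures given in this section; it assumes that the listed price and listed maximum capacity are directly comparable across models, which may not hold once the cost of drives is included.

```python
# Figures copied from the listing above: (model, listed price in USD, listed capacity in TB).
# Assumption: price and capacity are comparable across models as listed.
DROBO_OPTIONS = [
    ("Drobo B1200i", 4000, 76),
    ("Drobo B810i", 1700, 64),
    ("Drobo 5c", 350, 36),
]


def cost_per_tb(price_usd, capacity_tb):
    """Return the listed price divided by the listed capacity."""
    return price_usd / capacity_tb


def rank_by_cost_per_tb(options):
    """Sort (model, price, capacity) tuples from cheapest to most expensive per TB."""
    return sorted(options, key=lambda option: cost_per_tb(option[1], option[2]))


if __name__ == "__main__":
    for model, price, capacity in rank_by_cost_per_tb(DROBO_OPTIONS):
        print(f"{model}: ${cost_per_tb(price, capacity):.2f} per TB")
```

Cost per terabyte is only one consideration; the networked models also differ from the Drobo 5c in how they connect, which matters for plan A's shared server.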

The three stakeholders are already connected to The College’s network and have dedicated modems and routers set up in the offices that need terminal access. Therefore, most of the networking needs are already met within the office of The Association. With plan A, any hardline to the router or Wi-Fi receiver should be sufficient for The Association or The Library to connect to the digital archive server.

The only new cabling required should be for the physical space that houses the server and DBMS terminal and for connecting them to the existing network. Once the server machine is obtained, either through purchase or reuse, The College IT personnel must install an operating system (OS) compatible with the rest of The College’s network. In reality, almost any OS capable of operating as a server would work (Linux is very popular, but Windows and Mac have their own systems as well), but it is good practice to keep server operating systems consistent across the network. The College IT department is not willing to share specifics about its network, so we will assume that it has a mesh topology. Under this assumption, it would be good practice to connect the new server to at least two other servers within the network so that it can still be reached should one connection go down. Lastly, the File Transfer Protocol (FTP) service must be configured with user access controls that determine who will be permitted to make changes to files.
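The redundancy recommendation above can be checked on paper before any cable is run. The sketch below is illustrative only: The College's real topology is not known, so the node names and links are hypothetical. It tests whether the archive server stays reachable from an office after any single link is removed.

```python
from collections import deque


def reachable(links, start, goal):
    """Breadth-first search over undirected links; True if goal can be reached from start."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def survives_single_link_failure(links, start, goal):
    """True if goal stays reachable from start after removing any one link."""
    return all(
        reachable(links[:i] + links[i + 1:], start, goal)
        for i in range(len(links))
    )


# Hypothetical layout: the archive server is connected to two existing servers.
LINKS = [
    ("archive", "server_a"),
    ("archive", "server_b"),
    ("server_a", "server_b"),
    ("server_a", "association_office"),
    ("server_b", "association_office"),
]

if __name__ == "__main__":
    print(survives_single_link_failure(LINKS, "association_office", "archive"))
```

With only one connection to the archive server, the same check fails, which is the situation the two-connection recommendation is meant to avoid.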

With the centralized archiving and catalog of plan A, only one terminal or set of terminals needs access to commit changes to the stored files. This means that the DBMS can be configured to allow many retrievals with few submissions once the initial setup is completed. Thankfully, the Drobo connects to the terminal and/or the server by way of an Ethernet cable, so the DBMS may communicate with the storage system directly without the DBMS terminal becoming the center of a star topology and interfering with the Drobo’s communication with the server. Under this configuration, the archival server’s FTP service allows anyone to read the information stored on the Drobo, while only those granted access through The College archive department may write information onto the Drobo storage system.
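The read and write rules described above can be stated as a small permission check. This is a minimal sketch of the policy only, not The College's actual FTP configuration; the group name `college_archive` and the function names are hypothetical.

```python
# Hypothetical group name for staff in The College archive department.
WRITE_GROUP = "college_archive"


def can_read(user_groups):
    """Under plan A, any patron or staff member may retrieve stored materials."""
    return True


def can_write(user_groups):
    """Only members of The College archive department may commit changes."""
    return WRITE_GROUP in user_groups


def authorize(action, user_groups):
    """Return True if the action ('read' or 'write') is permitted for these groups."""
    if action == "read":
        return can_read(user_groups)
    if action == "write":
        return can_write(user_groups)
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(authorize("read", {"association"}))        # an Association patron reading
    print(authorize("write", {"association"}))       # an Association patron writing
    print(authorize("write", {"college_archive"}))   # the College archivist writing
```

Keeping the write rule in one place reflects the single point of entry that plan A relies on.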

Plan B

Under plan A, The College may have to bear the cost of the project, or The College, Library, and Association could all share the cost of materials. Should The College refuse to shoulder the costs and plan B be chosen, there would be no need for the server. The information could be collected via a similar, free DBMS installed with the College’s archivist. With MySQL or MsSQL, the data brought in could be simultaneously deposited with the Association and Library archivists when it arrives at the College archives.[5] The Association will still need to improve its data storage capacity, and Drobo would still be my recommendation. However, the Drobo 5c holds a much friendlier price tag of $350 and can still handle 37TB with space left for internal backups of the information.[6] The installation of this second plan would come at almost no cost. Simply unpacking the Drobo and plugging it in to the computer hosting the web services for the Association would be sufficient. Then, any digital archive created by The Association could be transferred across the existing network through existing connections.

Because the three stakeholders share a campus, most scholars in one collection who need information from another simply walk down the hall to the desired collection. Thus, creating a server to save the patron ten steps would be impractical for The Association’s operating budget.
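The idea of depositing incoming data with all three archivists at once can be sketched as follows. This is an illustration under stated assumptions rather than a description of MySQL or MsSQL replication: Python's built-in `sqlite3` module stands in for each institution's DBMS, and the table name, column names, and sample record are hypothetical.

```python
import sqlite3


def make_store():
    """Create an in-memory database standing in for one institution's raw-data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_items (item_id TEXT PRIMARY KEY, title TEXT, received TEXT)"
    )
    return conn


def deposit_everywhere(stores, item_id, title, received):
    """Write the same raw record into every institution's store."""
    for conn in stores.values():
        with conn:  # commits on success, rolls back this store's insert on error
            conn.execute(
                "INSERT INTO raw_items (item_id, title, received) VALUES (?, ?, ?)",
                (item_id, title, received),
            )


# Hypothetical stores for the three stakeholders and a made-up sample record.
stores = {name: make_store() for name in ("college", "library", "association")}
deposit_everywhere(stores, "item-0001", "Sample newsletter (hypothetical)", "2017-05-01")

if __name__ == "__main__":
    for name, conn in stores.items():
        count = conn.execute("SELECT COUNT(*) FROM raw_items").fetchone()[0]
        print(name, count)
```

Each store is written separately here, so a failure partway through would leave the stores out of step; a production setup would need to handle that case.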

# Continuing Maintenance

The College houses the IT departments that serve both the Library and the Association; however, the Library and the Association each employ a part-time IT staff member to serve their specific networking needs. Under plan A, the College IT department would be responsible for updates and maintenance. The server and Drobo could have a lifespan of several years up to a decade. The server may need updating or replacing, but the cost-effective method of repurposing an older server from the campus network could be implemented at any time within the current server’s lifespan. The Drobo would be limited only by the fact that its redundancies and backups are programmed to support no more than 64TB of stored information. Yet other hard drives may be linked to the Drobo to provide extra space if these backups are forgone. The staff requirements for network maintenance under plan A would be absorbed by the College’s IT department. However, seeing as most of the actual labor for the entries would have to be done by an archivist, there may be a need for an additional archivist. This archivist’s FTE may be divided among the three stakeholders to spread the cost.
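How long a given capacity will last depends on how quickly digital materials arrive, which this proposal does not measure. The sketch below shows the arithmetic under a loudly stated assumption: intake is constant from year to year, and the intake figure passed in is a hypothetical placeholder rather than data from the stakeholders.

```python
def years_until_full(capacity_tb, annual_intake_tb, already_used_tb=0.0):
    """Estimate years of headroom, assuming a constant (hypothetical) annual intake."""
    if annual_intake_tb <= 0:
        raise ValueError("annual_intake_tb must be positive")
    remaining_tb = max(capacity_tb - already_used_tb, 0.0)
    return remaining_tb / annual_intake_tb


if __name__ == "__main__":
    # 8 TB per year is a placeholder value chosen only to show the calculation.
    print(years_until_full(64, 8))
    print(years_until_full(37, 8))
```

Running the same intake figure against the 64TB and 37TB capacities quoted in this section gives a simple way to compare how soon each unit would need replacing.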

Plan B carries a much greater cost overall, and a greater cost to the Association in particular, despite installation, updating, and repairs all still being handled by the College’s IT department. The 37TB of storage would be sufficient for several years, yet the Association’s space needs would not be much less than if all three entities combined resources. Therefore, the smaller Drobo would become outmoded much sooner. Beyond storage space, the Association’s archivist and IT worker are each employed for only ten hours per week, and should the Association find itself needing to analyze, organize, and store its own entries from the data shared across the network, both IT and archive staff must be increased.[1] Furthermore, requiring the Association to archive and catalogue its own digital materials would fundamentally change its archival process. To date, the Association has mainly focused on physical records and on preserving specific materials as they become available to archive.[2] This bottom-up method shapes the archive and the institution, as the functions of the Association evolve to represent the materials that have individually been chosen for preservation. Conversely, the high-volume production of digital records requires archives to adopt a top-down approach, wherein the institutional mission chosen by the board of directors determines the priorities and practices for creating new digital archive entries.[3] In other words, the flood of digital information must be sifted before archiving instead of after. This issue will place a great deal of stress on the archivist, the board, or both. The cost of adopting plan B may be so high for the Association that it would prefer to keep its records limited to the period before the digital era.
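Sifting before archiving can be pictured as a filter applied to each incoming item before any cataloging effort is spent on it. The sketch below is a simplified illustration: the keyword list is a hypothetical stand-in for priorities the board would set, and a real selection policy would be far richer than substring matching.

```python
# Hypothetical collecting priorities; not taken from The Association's actual mission.
MISSION_KEYWORDS = {"norwegian", "immigration", "heritage"}


def matches_mission(description, keywords=MISSION_KEYWORDS):
    """True if any priority keyword appears in the item's description."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)


def sift(items, keywords=MISSION_KEYWORDS):
    """Split incoming items into (to_archive, set_aside) before any cataloging is done."""
    to_archive, set_aside = [], []
    for item in items:
        if matches_mission(item["description"], keywords):
            to_archive.append(item)
        else:
            set_aside.append(item)
    return to_archive, set_aside


if __name__ == "__main__":
    incoming = [
        {"description": "Photos from a Norwegian-American festival"},
        {"description": "Campus parking notice"},
    ]
    keep, skip = sift(incoming)
    print(len(keep), len(skip))
```

The point of the sketch is the ordering: the mission decides what enters the archive, rather than the archive's contents gradually defining the mission.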

# Conclusion

This proposal is merely the first in a series of steps to help modernize The Association’s collection. The plans outlined here are ways to connect the pre-digital collection of The Association to the plentiful resources available from its neighbor. This step may seem small; however, this simple idea will be a large stride toward bringing The Association into the 21st century.

Continuous growth and evolution will be needed to ensure The Association does not fall back into its absence of modern records. Specific collection practices will be needed to gather materials relevant to Norwegian-American heritage that may fall outside the scope of the collection practices employed by The College. Similarly, research and training for personnel will be needed to catalog the digital materials effectively so that they speak directly to younger and more modern patrons.

The need to modernize, ironically, is nothing new. Archives are rooted in the past but need to embrace the future. This duality is often hardest for those most invested in the archive’s traditions, i.e., the archivists. However, without keeping an eye on the opportunities technology opens to academics, collections will rapidly become outmoded and forgotten.