Thursday, 12 June 2008
Choosing software for the University of Manchester's institutional repository - Part 1
As part of the University of Manchester's Institutional Repository Project, I and other members of the Project Implementation Team are faced with the daunting task of recommending and implementing a suitable software solution. This article and those that follow outline my thoughts and our progress towards that goal.
The only certainty is uncertainty!

Some might say choosing the right software for an institutional repository is like backing the right racing horse. As with horse racing we have to choose from a set of available candidates. Each candidate has good features to varying degrees. To choose a winner you assess each feature for each horse. This may involve examining a horse's past performance based on recorded race results. You add up the good and bad points and hey presto, the winner is chosen!
A standard way to choose software is to undertake a requirements analysis. As with assessing the horses, this involves writing down a (normally very long) list of requirements. Ideally these should be based on what users want. Then you weight these in some form to indicate how important each is and score each software candidate against each requirement. You add up the scores (weighted if necessary) and the candidate that scores the highest is your choice.
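To make the arithmetic concrete, here is a minimal sketch of the weighted scoring described above. The requirement names, weights, scores and candidate names are all made up for illustration; they are not taken from our own analysis or from any real product evaluation.

```python
# Hypothetical requirements, each with a weight from 1 (nice to have)
# to 5 (essential). These are illustrative assumptions only.
weights = {
    "metadata support": 5,
    "ease of installation": 3,
    "OAI-PMH compliance": 4,
}

# Hypothetical scores (0 to 5) for two made-up candidates.
scores = {
    "Candidate A": {
        "metadata support": 4,
        "ease of installation": 2,
        "OAI-PMH compliance": 5,
    },
    "Candidate B": {
        "metadata support": 3,
        "ease of installation": 4,
        "OAI-PMH compliance": 4,
    },
}


def weighted_total(candidate_scores, weights):
    """Sum of (weight x score) over every requirement."""
    return sum(weights[req] * candidate_scores[req] for req in weights)


# Total for each candidate, then the candidate with the highest total.
totals = {name: weighted_total(s, weights) for name, s in scores.items()}
best = max(totals, key=totals.get)

print(totals)
print(best)
```

With these made-up numbers the two totals are only three points apart, so a small change to a single weight or score could reverse the result. The outcome is only as reliable as the weights and scores that go into it.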
For institutional repository software, I see a number of problems with this approach to choosing the software (none of which are new to software engineers).
- it's apples and oranges - repository software comes in a number of very different shapes, sizes and colours; as a consequence, comparing candidates can be difficult because they effectively do the same thing but in different ways
- it can be difficult to know exactly what is around the corner - repository software is evolving rapidly, with developer communities continuously launching new and better features
- it's all still very new - institutional repositories are still a relatively new subject to your average academic user, making it difficult to identify requirements
- you need to see it working - determining if a particular product does what you want is difficult from documentation alone, because documentation is often a work in progress; the only real way to know accurately what a piece of software does is to install, configure and test it
- it's all very time consuming - it can take a lot of time and effort to do requirements analysis, so much so that by the time you have finished, your users' knowledge and expectations have changed and/or a new release of the software has launched, such that you need to restart the process
- we need it now! - it's common that delivering a product like an institutional repository has to be done with limited resources and within a certain timeframe; the more time you take selecting your software, the less time you have to implement and test it
Others have reached similar conclusions to varying degrees. Useful articles in this respect include:
- in April 2004, Jody DeRidder published the article "Choosing Software for an Institutional Repository" in which she argued for consideration of scope and future interoperability
- in August 2004, The Open Society Institute published the "System Feature and Functionality Table" which attempts to explain the relevance of system technical features in the context of a repository's broader planning, design, and policy framework
- in November 2004, Chris Taylor compared a number of OAI-PMH 2.0 compliant software solutions in his paper, "Criteria for choosing repository software"
- in December 2005, Andy Powell published "Notes about technical criteria for evaluating institutional repository (IR) software" in which he outlined a number of issues to be addressed when making a choice
- in August 2006, as part of the Open Access Repositories in New Zealand Project, Richard Wyles published a Technical Evaluation of Research Repositories - probably the most comprehensive comparison of the three main open source repository solutions to date
In summary, I believe it's fair to say that the only certainty about selecting repository software is that there is considerable uncertainty!
What are we to do?

First let's throw away the racing horse analogy. Choosing repository software is not about winning and losing. The end result is not the best, fastest or brightest product; it is the most satisfactory solution to our situation.
To make an informed choice we need to manage uncertainty: uncertainty in our knowledge of the software, both now and in the future. This is a balancing act. We can only improve our knowledge incrementally, and only when we feel confident enough can we make an informed choice.
Of course we can't spend forever choosing the software. So we need to make our choice with the minimum time and effort, leaving more time to implement our preferred solution.
In my next article I'll focus on "our situation" and answer the questions, who are we and what are we trying to achieve?
Go to Part 2.