A scientific repository is a collection of experimental data, together with a collection
of programs that analyse the data to assess its accordance with some theory. Often, the
data must be annotated with additional classifications and parameters to explain its link
with the theory, and these annotations are also accumulated in the data base for subsequent
analyses. There are tools to assist in this annotation. An e-experiment is conducted by
applying a judicious combination of tools to one or more collections of data. Its results
are subjected to the scrutiny of publication before acceptance into the repository.
When a repository is first set up, each collection of experimental data uses a different
format, a different notation, and a different ontology of concepts. The tools are
similarly incompatible, each working on only one data set. The responsibility of a
scientific repository is to assist the experimental scientists in the conduct of
experiments: by curation and homogenisation of the format and the semantics of the data;
by an educational and advisory service for the users of the tools; and by establishment
and implementation of standards for interworking of the tools. There should also be
collaboration with similar repositories in other countries.
I define a Verified Program Repository as consisting of a collection of programs,
assertions, specifications, design trajectories, test sets, etc., together with a set of
tools that analyses them, and helps to formulate or generate any of these that are missing.
The most important of these tools is a program verifier, which analyses a program to assess
and confirm its accordance with some specification, perhaps partial or even total. A
program which passes this analysis is said to be mechanically verified. To begin with,
none of the programs will have this status, because the annotations are inadequate, and a
fully mechanical program verifier does not exist. In the end, we hope they will all be
annotated and verified to some level of specification.
To achieve this, the repository and its users will have to bring such a program verifier
into existence. We can build on the best of existing technology, by integrating and
combining the available tools; we can stimulate and guide their further development by
systematic application and evaluation of toolsets on the challenge codes and other data
in the repository. Ideally, these challenge codes should be real systems that people
actually use in practice. The ultimate criterion of success is that those responsible
for the maintenance and evolution of the codes will begin to use the program verifier for
all further developments. But the project cannot start until there is a critical mass of
experimentalists who want to work on an agreed set of data.
I will describe the analogy between such a project and other large-scale
scientific projects like the recently completed human genome project. I
will describe the reasons why there is some hope that it will be
successful within a foreseeable timescale, say fifteen years. And I
hope to inspire some of the attendants to devote their own research
efforts to this achievement, and the rest to sympathise with those who
do.