In most disciplines, the evolution of knowledge involves learning
by observing, formulating theories, and experimenting. Theory
formulation represents the encapsulation of knowledge and experience.
It is used to create and communicate our basic understanding of
the discipline. Checking that our understanding is correct involves
testing our theories, i.e., experimentation in some form. Analyzing
the results of the experimental study promotes learning and the
ability to change and refine our theories. These steps take time,
which is why the understanding of a discipline, and its research
methods, evolves over time.
The paradigm of encapsulation of knowledge into theories and the
validation and verification of those theories based upon experimentation,
empirical evidence, and experience is used in many fields, e.g.,
physics, medicine, and manufacturing.
What do these fields have in common? They evolved as disciplines
when they began learning by applying the cycle of observation,
theory formulation, and experimentation. In most cases, they began
with observation and the recording of what was observed in theories
or specific models. They then evolved to manipulating the variables
and studying the effects of changes in those variables.
How does the paradigm differ among these fields? The differences
lie in the objects they study, the properties of those objects,
the properties of the system that contains them, and the relationship
of the objects to the system. So differences exist in how theories
are formulated, how models are built, and how studies are performed,
often affecting the details of the research methods.
Software engineering has things in common with each of these other
disciplines, but it also differs from each in several ways.
In physics, there are theorists and experimentalists. The discipline
has progressed because of the interplay between both groups. Theorists
build models (to explain the universe). These models predict the
results of events that can be measured. The models may be based
upon theory from understanding the essential variables and their
interaction or data from prior experiments, or better yet, from
both. Experimentalists observe and measure, i.e., carry out studies
to test or disprove a theory or to explore a new domain. But at
whatever point the cycle is entered, there is a pattern of modeling,
experimenting, learning, and remodeling.
The early Greek model of science held that observation, followed
by logical thought, was sufficient for understanding. It took
Galileo, and his dropping of balls from the tower at Pisa, to demonstrate
the value of experimentation. Eddington's study of the 1919 eclipse
differentiated the domain of applicability of Einstein's theories
from Newton's.
In medicine, we have researchers and practitioners. The researcher
aims at understanding the workings of the human body and the effects
of various variables, e.g., procedures and drugs. The practitioner
aims at applying that knowledge by manipulating those variables
for some purpose, e.g., curing an illness. There is a clear relationship
between the two; knowledge is often built by feedback from the
practitioner to the researcher.
Medicine began as an art form. It evolved as a field when it began
observation and theory formulation. For example, Harvey's controversial
theory about the circulation of blood through the body was the
result of many careful experiments performed while he practiced
medicine in London. Experimentation varies from controlled experiments
to qualitative analysis. Depending on the area of interest, data
may be hard to acquire. Human variance causes problems in interpreting
results. However, our knowledge of the human body has evolved
over time.
The focus in manufacturing is to better understand and control
the relationship between process and product for quality control.
The nature of the discipline is that the same product is generated,
over and over, based upon a set of processes, allowing the building
of models with small tolerances. Manufacturing made tremendous
strides in improving productivity and quality when it began to
focus on observing, building models, experimenting with variations
in the process, measuring their effects on the resulting product,
and building models of what was learned.
This journal is dedicated to the position that, like other disciplines,
software engineering requires the cycle of model building, experimentation,
and learning; the belief that software engineering requires empirical
study as one of its components. There are researchers
and practitioners. Research has an analytic and experimental component.
The role of the researcher is to build models of and understand
the nature of processes, products, and the relationship between
the two in the context of the system in which they live. The practitioner's
role is to build "improved" systems, using the knowledge
available, and to provide feedback. But as in medicine (e.g., Harvey),
the distinction between researcher and practitioner is not absolute;
some people do both at the same time or at different times in
their careers. This mix is especially important in planning empirical
studies and when formulating models and theories.
As in manufacturing, these roles are symbiotic. The researcher
needs laboratories, and these exist only where practitioners build
software systems. The practitioner needs to better understand
how to build systems more productively and profitably; the researcher
can provide the models to help this happen.
Just as the early model of science evolved from learning based
purely on logical thought, to learning via experimentation, so
must software engineering evolve. It has a similar need to move
from simple assertions about the effects of a technique to a scientific
discipline based upon observation, theory formulation, and experimentation.
To understand how model building and empirical studies need to
be tailored to the discipline, we first need to understand the
nature of the discipline. What characterizes the software engineering
discipline? Software is development, not production; in this it
is unlike manufacturing. The technologies of the discipline are human
based. As in medicine, it is hard to build models and verify them
via experiments. As in the other disciplines, there are a large
number of variables that cause differences, and their effects
need to be studied and understood. Currently, there is a lack
of models that allow us to reason about the discipline, a lack
of recognition of the limits of technologies in certain contexts,
and a lack of analysis and experimentation.
There has been empirical analysis and model building in software
engineering, but the studies are often isolated events. For example,
in one of the earliest empirical studies, Belady & Lehman
('72, '76) observed the behavior of OS/360 with respect to releases.
They posed several theories, based upon their observations, concerning
the entropy of systems. The idea of entropy, i.e., that
you might redesign a system rather than continue to change it,
was a revelation. On the other hand, Basili & Turner ('75)
observed that a compiler system being developed using an incremental
development approach gained structure over time. This appears
contradictory. But under what conditions is each phenomenon true?
What were the variables that caused the different effects? Which
variables differed in the second case? Where are the
studies that provide some insight into the effects of such variables
as size, methods, and the nature of the changes? We can hypothesize,
but what evidence do we have to support those hypotheses?
In another area, Walston and Felix ('77) identified 29 variables that had an effect on software productivity in the IBM FSD environment. Boehm ('81) observed that 15 variables seemed sufficient to explain/predict the cost of a project across several environments. Bailey and Basili ('81) identified two composite variables that, when combined with size, were a good predictor of effort in the SEL environment. There were many other cost models at the time. Why were the variables different? What did the data tell us about the relationships among the variables?
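To make the flavor of such cost models concrete, the following minimal Python sketch shows the general form that several of them share: effort estimated as a power function of size, adjusted by a multiplicative factor built from project variables, in the spirit of Boehm's model. The coefficients, driver names, and ratings below are illustrative assumptions, not published values.

    # Sketch of a COCOMO-style effort model: effort grows as a power
    # function of size, scaled by an "effort adjustment factor" (EAF)
    # formed as the product of cost-driver ratings (1.0 = nominal).
    # The constants and driver ratings here are hypothetical.

    def estimate_effort(kloc, drivers, a=3.0, b=1.12):
        """Estimated effort in person-months for a project of `kloc`
        thousand lines of code, given multiplicative driver ratings."""
        eaf = 1.0
        for rating in drivers.values():
            eaf *= rating
        return a * (kloc ** b) * eaf

    # Example: a 32 KLOC project with good tool support (0.91) but a
    # less experienced team (1.13).
    print(round(estimate_effort(32, {"tools": 0.91, "experience": 1.13}), 1))

The questions above amount to asking which drivers belong in such a model, and with what ratings, in a given environment; only empirical study can answer them.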
Clearly, answering these questions requires more empirical studies
that will allow us to evolve our knowledge of the variables of
the discipline and the effects of their interactions.
In our discipline, there is little consensus on terminology; usage
often depends upon whether the ancestry of the researcher is in the
physical sciences, social sciences, medicine, etc. One of the roles
of this journal is to begin to focus on a standard set of definitions.
We tend to use the word experiment broadly, i.e., as a research
strategy in which the researcher has control over some of the
conditions in which the study takes place and over the
independent variables being studied; an operation carried out
under controlled conditions in order to discover an unknown effect
or law, to test or establish a hypothesis, or to illustrate a
known law. This term thus includes quasi-experiments and pre-experimental
designs. We use the term study to mean an act or operation
for the purpose of discovering something unknown or of testing
a hypothesis. This covers various forms of research strategies,
including all forms of experiments, qualitative studies, surveys,
and archival analyses. We reserve the term controlled experiment
to mean an experiment in which the subjects are randomly assigned
to experimental conditions, the researcher manipulates an independent
variable, and the subjects in different experimental conditions
are treated similarly with regard to all variables except the
independent variable.
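To make the random-assignment requirement concrete, here is a minimal sketch in Python; the subject identifiers and condition names are hypothetical, and a real design would also standardize the treatment of the groups on everything except the manipulated variable.

    import random

    # Hypothetical subjects and conditions for a two-group design in
    # which only the independent variable differs between conditions.
    subjects = ["S01", "S02", "S03", "S04", "S05", "S06", "S07", "S08"]
    conditions = ["treatment", "control"]

    rng = random.Random(42)   # fixed seed so the assignment is reproducible
    rng.shuffle(subjects)

    # Balanced random assignment: alternate the shuffled subjects
    # across conditions so that group sizes remain equal.
    assignment = {s: conditions[i % len(conditions)]
                  for i, s in enumerate(subjects)}

    for subject in sorted(assignment):
        print(subject, "->", assignment[subject])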
As a discipline, software engineering, and more particularly its
empirical component, is at a very primitive stage in its development.
We are learning how to build models, how to design experiments,
how to extract useful knowledge from experiments, and how to extrapolate
that knowledge. We believe there is a need for all kinds of studies:
descriptive, correlational, and cause-effect studies; studies on
novices and experts; studies performed in laboratory environments
or on real projects; quantitative and qualitative studies; and
replicated studies.
We would expect that, over time, we will see a maturing of the
empirical component of software engineering. The level of sophistication
of the goals of an experiment, and our ability to understand
interesting things about the discipline, will evolve over time.
We would like to see a pattern of knowledge building from series
of experiments: researchers building on each other's work, combining
experimental results, and replicating studies under similar and
differing conditions.
This journal is a forum for that learning process. In some cases,
our experiments, like those in the early stages of other disciplines,
will be primitive. They will have both internal and external validity
problems. Some of these problems will be based upon the nature
of the discipline, affecting our ability to generate effective
models or effective laboratory environments. These problems will
always be with us, as they are with any discipline as it evolves
and learns about itself. Some problems will be based on our immaturity
in understanding experimentation as a discipline, e.g., not choosing
the best possible experimental design or the best way
to analyze the data. But we can learn from weakly designed experiments
how to design them better. We can learn how to better analyze
the data. This journal encourages people to discuss the weaknesses
in their experiments. We encourage authors to provide their data
to the journal so that other researchers may re-analyze them.
The journal supports the publication of artifacts and laboratory
manuals. For example, in this issue, the paper "The Empirical
Investigation of Perspective-based Reading" has associated
with it a laboratory manual that will be furnished as part of
the ftp site at Kluwer. It contains everything needed to replicate
the experiment, including both the artifacts used and the procedures
for analysis. It is hoped that the papers in this journal will
reflect both successes and failures in experimentation and will
display the problems encountered and the attempts at learning how
to do things better.
At this stage we hope to be open and support the evolution of
the experimental discipline in software engineering.
We ask researchers to critique their own experiments, and we
ask reviewers to evaluate experiments in the context of the current
state of the discipline. Remember that, because of the youth of the
experimental side of our discipline, our expectations cannot yet be
the same as those of the more mature disciplines, such as physics
and medicine. The goal of this journal is to contribute to a better
scientific and engineering basis for software engineering.
References
J. Bailey, V. R. Basili, "A Meta-Model for Software Development
Resource Expenditures," Proceedings of the Fifth International
Conference on Software Engineering, San Diego, USA, pp. 107-116,
March 1981.
V. R. Basili, A. J. Turner, "Iterative Enhancement: A Practical
Technique for Software Development," IEEE Transactions on
Software Engineering, vol. SE-1, no. 4, December 1975.
L. A. Belady and M. M. Lehman, "An Introduction to Growth
Dynamics", Statistical Computer Performance Evaluation, Academic
Press, New York, 1972.
L. A. Belady and M. M. Lehman, "A Model of Large Program
Development," IBM Systems Journal, Vol. 15, No. 3, pp. 225-252,
1976.
B. W. Boehm, "Software Engineering Economics," Prentice-Hall,
Englewood Cliffs, NJ, 1981.
C. Walston and C. Felix, "A Method of Programming Measurement
and Estimation," IBM Systems Journal, Vol. 16, No. 1, pp.
54-73, 1977.
Empirical Software Engineering, vol. 1, no. 2, 1996