	       EVOLUTION OF A PROBABILISTIC EXPERT SYSTEM PROGRAM
			  WITH MONTE CARLO SIMULATION

			     ME 715  Assignment 9.1
				 Brad Rodriguez
				14 December 1989

	I.  INTRODUCTION

	    This project will extend the data structured, backward
	    chaining expert system program from assignment 8.1, to
	    include the concepts of probability.

	    Two different treatments of probability will be employed
	    and compared: a strict mathematical probability, and a
	    Monte Carlo simulation.


	II. PROBABILITY RULES

	    The first program will use a strict mathematical treatment
	    of probability, as given in Chapter 9 of the text.  It will
	    encompass both "type 1" uncertainty (uncertainty about the
	    inputs), and "type 2" uncertainty (uncertainty about the
	    rules).

	    We will handle type 1 uncertainty by assigning to each input
	    an "evidence probability" P(E), the probability that this
	    item of evidence is true.

	    We will handle type 2 uncertainty by assigning to each rule
	    two probability values:

		P[C|E], the probability that the conclusion is true
			(i.e., that the rule fires), given that the
			combination of the evidence is true.

		P[C|~E], the probability that the conclusion is true,
			given that the combination of the evidence is
			NOT true.

	    Given these probabilities, and the "combined" evidence
	    probability P(E), we can calculate the probability of the
	    conclusion, P[C], from Bayes' theorem:

		P[C]   =   P[C|E] P(E)   +   P[C|~E] P(~E)

	    If the result of this rule is the input to another rule,
	    then this P[C] becomes the "evidence probability" of the
	    input.

	    The "combined" evidence probability mentioned above is
	    calculated for a logical combination of independent inputs
	    by the laws of mathematical probability:

		P( not A )    =   1  -  P(A)

		P( A and B )  =   P(A)  *  P(B)

		P( A or B )   =   1  -  [ (1-P(A))  *  (1-P(B)) ]

	    The formulas for "and" and "or" are associative, and so may
	    be extended to larger numbers of inputs.  For example,

		P( A and B and C )  =  P( A and B )  *  P(C)


	III. PROGRAM MODIFICATIONS FOR STRICT PROBABILITY

	    We now consider what changes are necessary to the expert
	    system to support these probability calculations.

	    First, the Forth words which represent rules and inputs must
	    return probabilities, rather than simple yes-or-no truth
	    values as in previous versions of the system.

	    The combinations of evidence which define a rule are no
	    longer simple logical operations; they now involve
	    probability computations.  To evaluate these combinations,
	    we will need a set of math functions which perform the
	    probability computations described above.  Forth allows us
	    to redefine any word to perform a new function, so we will
	    redefine the names AND, OR, and NOT to operate on
	    probability values -- using the formulas given above --
	    rather than Boolean truth values.

	    Since the basic Forth language supports only integer
	    numbers, we shall represent a probability of 1.0 as the
	    integer 10000, and a probability of 0.0 as the integer 0.
	    These are the most convenient values within the 16-bit
	    Forth's range of -32768 to +32767.  We will use the value -1
	    to indicate "not yet evaluated."

	    We will no longer use pruning, since we desire to know the
	    probabilistic effect, however slight, of all of the
	    inputs to each rule.

	    We still wish to evaluate each rule only once, which means
	    we will have to store the result of the evaluation.  The
	    VALUE storage location in the rules table is sized to hold a
	    16-bit integer.

	    Two new data values must be stored with each rule: P[C|E],
	    and P[C|~E].  It would be possible to include these in each
	    rule's evaluator function, but it is expedient to add them
	    to the rules table, and have the common logic for all rules
	    incorporate their effect.

	    Again, inputs will appear structurally as rules.  We could
	    (as in the previous program) define special evaluator
	    functions to return the input probabilities.  However, we
	    would prefer to store the input probabilities in the rules
	    table.  If the evaluator returns P(E)=1 for an input, then
	    the P[C] returned by the input "rule" is

		P[C]   =   P[C|E] * 1   +   P[C|~E] * 0

	    In other words, if the evaluator function for an input
	    always returns 1.0, we can store the input probability in
	    the P[C|E] field, and this value will be produced when the
	    input "rule" is executed.

	    No special treatment is needed for a conclusion rule.  A
	    single word will be defined which evaluates all of the
	    conclusions and prints their probabilities.  This could be
	    an ordinary Forth word, but we will make this word a RULE so
	    that it can be easily re-DEFINEd.

	    The resulting program is given in Listing 1.

	    Screen 39 holds the new definitions for AND, OR, and NOT,
	    which perform probability computations.  It also holds the
	    new, expanded rules table.  The two new fields to hold the
	    type 2 uncertainty are named C|E and C|~E.

	    Screen 40 contains the words which are used to specify the
	    uncertainty of any rule.  LIKELY sets the P[C|E] value of
	    any given rule, and UNLIKELY sets the P[C|~E] value.  (This
	    is in keeping with the Forth tradition of choosing
	    memorable, if not exactly accurate, names.)  Note that --
	    because of the way we've decided to represent inputs -- the
	    word LIKELY is also used to set the probabilities of the
	    inputs.

	    Screen 40 also includes some words to perform "prettier"
	    numeric input and output, and to monitor the progress of
	    the rules evaluation.

	    Screen 41 contains the "meat" of the probabilistic expert
	    system.  Rules are defined as entries in the rules table,
	    the same as in the previous assignment.  The initialization
	    of this table, however, is somewhat different:

		EVALUATOR  is set to a function which returns 1.0000.
		P[C|E]     is set to 1.0000.
		P[C|~E]    is set to 0.0000.

	    These defaults mean that input "rules", for which nothing is
	    DEFINEd, will nonetheless have a valid EVALUATOR function,
	    and that this function will cause the P[C|E] value to be
	    returned as the input's probability (as described above).
	    Also, the default values of P[C|E] and P[C|~E] mean that
	    rules will be treated as "absolutely certain" (no type 2
	    uncertainty) unless otherwise specified.

	    The DOES> clause in screen 41 is what happens when a rule
	    is executed.  If the rule has been previously evaluated,
	    that value is simply returned.  Otherwise, the EVALUATOR
	    function is invoked to give P(E), and then the formula

		P[C]   =   P[C|E] P(E)   +   P[C|~E] P(~E)

	    is computed, and the resulting P[C] is stored and returned.

	    The word DEFINE, to create an EVALUATOR function, is
	    unchanged from the previous version of this program.

	    Screens 42 through 45 declare and define all of the rules of
	    the system.  There are two important differences.  First,
	    the "reverse Polish" AND and OR operators are used instead
	    of the old, "algebraic" & and | operators.  Second, for each
	    rule we now specify two probabilities, P[C|E] with LIKELY,
	    and P[C|~E] with UNLIKELY.  Note that some rules on screen
	    45 are declared to be 1.0000 LIKELY and 0.0000 UNLIKELY;
	    this means that they are considered to be "certain."

	    Screen 46 is the code which tries all of the conclusions;
	    this is changed to cause the probability of each conclusion
	    to be printed.  Note that ALL conclusions are now attempted.

	    Screen 47 shows how the input evidence is given.  The
	    probability for each input is specified with LIKELY.  Then
	    SCRUB is used to erase any old, saved results, and
	    CONCLUSION invokes the expert system.

	    Figure 1 is a test run of this system, with no type 2
	    uncertainty.  That is, all rules are left with their
	    default values of P[C|E]=1 and P[C|~E]=0.  The listing
	    shows the probability of each input and intermediate
	    conclusion, when first evaluated, and the probabilities of
	    all of the conclusions.  The most probable conclusion is
	    "zebra", the correct identification, with a probability of
	    0.9.  The next most probable conclusion (ostrich) has a
	    probability of only about 0.01.

	    Figure 2 is a test run, with arbitrary values for type 2
	    uncertainty as shown in the program listing.  Observe that
	    the effect of the type 2 uncertainty is to reduce the high
	    probabilities, and increase the low probabilities.  We are
	    now less sure that the animal is a zebra (probability 0.58),
	    and also less sure that it is NOT an ostrich (probability
	    0.13).  Also, "giraffe" is now the second most likely
	    conclusion (probability 0.20) -- it has a very uncertain
	    rule (see screen 43).  This "confusion" effect is no doubt
	    exaggerated by the excessive type 2 uncertainties we have
	    chosen, but it illustrates quite well how type 2 uncertainty
	    reduces our confidence in both the "positive" and the
	    "negative" conclusions.


	IV. MONTE CARLO SIMLUATION

	    Monte Carlo simulation does not use the mathematical
	    formulas described above to determine the probability of any
	    particular conclusion.  Instead, a true frequency count is
	    made, over a large number of trials which are carefully
	    randomized to portray the assigned input and rule
	    probabilities.

	    Each input is either true or false, and each rule either
	    fires or does not fire, depending on a "spin of the wheel"
	    (a random number generator), weighted according to their
	    assigned probabilities.  This weighting is done as follows:

		For INPUTS:
		generate a random number R (from 0 to 1).
		if this input's P(E) > R, then the input is true;
		    else, the input is false.

		For RULES:
		evaluate the evidence (a logical combination of true-or-
		    false inputs and intermediate conclusions).
		if the evidence is TRUE:
		    generate a random number R.
		    if P[C|E] > R, the rule is true (fires),
			else, the rule is false.
		if the evidence is FALSE:
		    generate a random number R.
		    if P[C|~E] > R, the rule is true (fires),
			else, the rule is false.

	    If the random numbers are evenly distributed between 0 and
	    1, then the frequency with which an event is made true -- in
	    a very large number of trials -- will exactly equal its
	    assigned probability value.


	V.  PROGRAM MODIFICATIONS FOR MONTE CARLO SIMULATION

	    Only minor changes are needed to the expert system to
	    perform a Monte Carlo simulation.

	    First, each input and rule returns a definite true-or-false
	    result.  The effects of probability appear in the frequency
	    distribution of results, and not in the result values
	    themselves.  So, the Forth words can be re-revised to return
	    1 or 0.

	    AND, OR, and NOT are once again truly logical operations.
	    For simplicity in converting the rules, we will continue to
	    use the "reverse Polish" forms of these operators.

	    In order to maintain frequency counts for each conclusion,
	    we will add yet another field to the rules table.  This
	    field will be a tally of the number of times each rule
	    fires (returns true).  Although we only need this
	    information for the final conclusion rules, it may be
	    instructive to keep this tally for all rules.

	    We still need values of P[C|E] and P[C|~E] for each rule,
	    and P(E) for each input.  Fortunately, an examination of
	    the algorithms given above shows that we can still use
	    the "trick" of representing an input as a rule whose
	    evaluator returns "true", and whose probability is stored
	    in the P[C|E] field.  (In this case, only the "true
	    evidence" branch of the rule algorithm will be executed.
	    Substituting P(E) for P[C|E] in this branch gives exactly
	    the same result as the inputs algorithm.)

	    The resulting program is given in Listing 2.

	    In screen 48 the NOT operator is redefined to work with
	    logical values of 1=true, 0=false.  Forth's normal AND and
	    OR operators do not need redefinition.

	    Screen 49 contains the random number generator for the
	    Monte Carlo simulator; it uses integer multiplication with
	    16 bit overflow to produce a pseudo-random sequence of
	    integers from 0 to 65,535.  These are scaled to integer
	    probability values 0 to 10,000.

	    The algorithm which incorporates this probability is in
	    the DOES> clause on screen 50.  Depending on the truth of
	    the evidence, this routine obtains a random number and
	    compares it with P[C|E] or P[C|~E].  As noted above, this
	    logic also works for inputs, since the evaluator function
	    for an input (ONE) always returns "true".

	    Note that only one random number is "tossed" for each input
	    and for each rule, during a single trial; i.e., each input
	    and each rule is evaluated only once.  This is a convenient
	    byproduct of the logic which prevents a rule from being
	    re-evaluated once its result is known.  To force the inputs
	    and rules to be re-randomized, their stored values must be
	    erased with SCRUB.

	    The type 2 probabilities for the rules, in screens 51
	    through 54, are identical to those used in the probabilistic
	    program.

	    Observe two additions to screen 55: the word TRIALS which
	    causes CONCLUSION to be attempted many times, with a SCRUB
	    before each attempt (as just noted); and the word SHOW
	    which prints out the firing tally of any given rule, as a
	    probability.

	    Figures 3, 4, and 5 are test runs of the Monte Carlo
	    simulator, with 100, 1000, and 10000 trials, respectively.
	    Figure 6 is another 10000-trial test, but with a different
	    "seed" for the random number generator.  All of these trials
	    had the same type 1 and type 2 uncertainty values as the
	    probabilisitic example given in Figure 2.  It is instructive
	    to compare the results:

			    Monte Carlo simulations    probabil-
	    animal        100    1000   10000   10000    istic

	    cheetah     0.1100  0.1110  0.1089  0.1075  0.1059
	    tiger       0.0900  0.1200  0.1227  0.1246  0.1229
	    giraffe     0.2600  0.2120  0.2051  0.2039  0.2049
	    zebra       0.4900  0.5610  0.5818  0.5798  0.5810
	    ostrich     0.1300  0.1110  0.1261  0.1307  0.1307
	    penguin     0.1100  0.1110  0.1159  0.1142  0.1135
	    albatross   0.0000  0.0000  0.0000  0.0000  0.0000

	    The Monte Carlo simulation results would seem to bear out
	    the "rule of thumb" that the accuracy of the method is
	    one-tenth the number of trials; i.e., 100 trials are have an
	    error of roughly 1 part in 10 (0.1), 1000 trials roughly 1
	    part in 100 (0.01), and 10000 trials roughly 1 part in 1000
	    (0.001).

	    For example, rounding the "zebra" results accordingly gives:
	    for 100 trials, 0.5 (should be 0.6); for 1000 trials, 0.56
	    (should be about 0.58), and for 10000 trials, 0.580 to
	    0.582.  Comparing the results of the two 10000 trial runs in
	    this manner, we find variances from 0.000 to 0.004.

	    Notice also how astonishingly close the probabilistic
	    results are to the Monte Carlo simulation results.  Indeed,
	    the variance between the probabilistic result and the
	    simulation result is usually no worse than the variance
	    between different 10000-trial simulations.  This would
	    indicate that our assumption of independent inputs, and
	    independent intermediate conclusions, which was made in the
	    probabilistic formulas for AND and OR, either a) is a valid
	    assumption for this system, or b) leads to errors on the
	    part of 1 in 1000.  It is NOT safe to generalize this
	    latter statistic to probabilistic systems in general,
	    however, as this animal species example may not have
	    presented an opportunity for degenerate effects of
	    dependence to appear.

	    While the probabilistic program may compare favorably to
	    the Monte Carlo program in terms of accuracy, it is far
	    superior in terms of speed:

		Monte Carlo  100 trials     14 seconds
			    1000 trials    130 seconds
			   10000 trials   1299 seconds
			  probabilistic      2 seconds

	    These timings are on a 4.77 MHz IBM PC/XT clone, with a
	    V20 processor upgrade.

	    The Monte Carlo simulation requires about 0.13 seconds per
	    trial (with about 1 second of fixed overhead, which
	    appears in the 100 trial example).  A 10,000-trial test
	    run requires over 20 minutes!  The probabilistic program,
	    on the other hand, requires less than one second: most of
	    the 2 seconds noted above are spent printing the result!
	    Clearly the probabilistic program is to be preferred for
	    "everyday" use.  But since it is possible to create systems
	    for which the probabilistic system will produce degenerate
	    results, it may be desirable to use the Monte Carlo
	    simulation as a "primary standard," used off-line to check
	    the validity of the probabilistic program, which itself is
	    used as the "production" program.


	VI. EXPERTISE EVALUATION

	    To observe the effects of type 2 uncertainty, an "expertise
	    evaluation" can be performed, in which all of the evidence
	    is deterministic.  Since there is no type 1 uncertainty, any
	    uncertainty in the conclusions will be a result of the
	    uncertainty in the rules.

	    Figures 7 and 8 are the probabilistic program and the
	    Monte Carlo program of 10000 trials, respectively, with
	    deterministic evidence for a zebra.  The results are as
	    follows:
			    probabil-   Monte
			     istic      Carlo

		cheetah      0.1000     0.1027
		tiger        0.1000     0.1001
		giraffe      0.2000     0.2009
		zebra        0.6300     0.6332
		ostrich      0.1000     0.0975
		penguin      0.1000     0.1026
		albatross    0.0000     0.0000

	    (Note once again how close the probabilistic result is to
	    the Monte Carlo result of 10000 trials.)

	    The rules contributing to the zebra conclusion are:

		rule        P[C|E]    P[C|~E]

		ZEBRA        0.7        0.2
		UNGULATE     0.9        0.2
		T2           1.0        0.0
		MAMMAL       1.0        0.3
		T3           1.0        0.0

	    As might be expected, the uncertainty in the actual
	    conclusion is the product of the P[C|E] uncertainties in
	    its "chain" of evidence.

	    This was a relatively short chain, and many of the rules
	    were "certain."  Still, one uncertain rule (ZEBRA), and
	    one rule which might be considered "almost certain"
	    (UNGULATE), were sufficient to reduce the probability of
	    the conclusion to an unacceptably low level.

	    It would seem a natural tendency to assign a probability
	    of 90% to anything one is not absolutely sure of.  (Few
	    people think in more than ten intervals; most think in
	    five.)  But the specialist creating an expert system must
	    be cautioned that, if he yields to this tendency too
	    often, the accumulation of "0.9's" will ensure that the
	    expert system will never produce a "confident" result.
