Posted on April 4, 2013 @ 11:49:00 PM by Paul Meagher
In my last blog introducing a classification framework for Bayesian Angel Investing, I discussed a PHP-based software class called ClassifierDiagnostics.php. I showed how to enter bivariate data points into it and the kind of output it displays, but I didn't go into much detail about what that output tells us. Today I will examine the output more closely and begin to explain why it matters if you want to be a successful Bayesian Angel Investor.
One way to formulate the problem of Bayesian Angel Investing is as a classification problem in which an Investor tries to assign a probability to whether a startup belongs to the class of "Successful" (S) companies or "Unsuccessful" (U) companies. One approach would be to rely solely upon the prior odds of a startup being successful or not. You would not conditionalize the probability assignments (e.g., P(S) = θ1, P(U) = θ2) on information about the startup (e.g., P(S|I) = θ3), just on the fact that it is a startup and the historical probabilities that a startup will be successful or unsuccessful. This is more difficult than it sounds, because the success of a startup is already conditionalized insofar as we have to delimit the scope of the concept "startup" in some way in order to measure the probabilities of success. So let us say we will look at startups confined to some region near the Investor's place of residence, using state- or province-level statistics on startup success.
Can you use the startup success statistics, your "priors", to make successful investment decisions? My guess is that the rate of success for startups in your region is below 50%, so if the probability of any given startup being successful is below 50%, it is unlikely you would ever invest. To follow a "priors only" strategy (i.e., P(S) = θ1 AND P(U) = θ2) you would have to invest randomly, and that would produce losses.
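To see why the "priors only" strategy loses money, here is a minimal expected-value sketch. The 30% success rate, stake size, and payoffs are assumptions chosen for illustration, not figures from any real region:

```python
# Expected value of a "priors only" strategy, under assumed numbers:
# a hypothetical 30% regional success rate and illustrative payoffs of
# 2x the stake on a success versus a total loss on a failure.
p_success = 0.30              # assumed prior P(S) for the region
stake = 10_000                # illustrative investment size
gain_if_success = 2 * stake   # illustrative return on a successful startup
loss_if_failure = -stake      # total loss on an unsuccessful startup

expected_value = p_success * gain_if_success + (1 - p_success) * loss_if_failure
# Roughly -$1,000 per investment: a negative expectation, so investing
# randomly on the priors alone loses money on average.
```

Under these assumed payoffs, the prior would have to exceed 1/3 before random investing breaks even, which is the sense in which a below-50% base rate keeps a priors-only investor out of the game.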
To get more leverage on making good angel investments, you will need to incorporate information about the startup into your classification decision regarding its likely success. You will want to identify types of information that have good diagnostic value in classifying startups into bins labelled Successful (S) and Unsuccessful (U). In the example I provided yesterday, I suggested that you could use your evaluation of a startup's business plan as an indicator of whether it might succeed. If the business plan addresses enough of your checklist of concerns, you assign it a "Pass" value (1); otherwise you assign it a "Fail" value (0). The question then becomes whether our pass/fail assignments can be used to distinguish between successful and unsuccessful startups. In other words, how diagnostic is a good business plan of a successful startup?
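The checklist scoring step could be sketched as follows. The checklist items and the passing threshold here are hypothetical, made up purely to illustrate the binary Pass/Fail encoding:

```python
# Hypothetical checklist scorer: the items and threshold are assumptions
# for illustration, not from the original post.
CHECKLIST = ["clear revenue model", "market size estimate",
             "competitive analysis", "financial projections"]

def score_business_plan(items_addressed, threshold=3):
    """Return 1 (Pass) if the plan addresses at least `threshold`
    checklist items, else 0 (Fail)."""
    addressed = sum(1 for item in CHECKLIST if item in items_addressed)
    return 1 if addressed >= threshold else 0
```

Any binary-scored evaluation would serve the same role; what matters for the analysis below is only that each plan ends up coded as 1 or 0.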
In my last blog, I entered observations of 4 startups into my classifier diagnostics program. Each observation consisted of two values: one specifying whether the business plan passed (1) or failed (0), and one specifying whether the startup eventually succeeded (1) or failed (0) in its enterprise. When I entered the data, the program generated the output below. I have removed some of the statistics being reported because I want to focus on the foundational concepts in diagnostic problem solving.
                         Successful Company
                         Yes        No
    Business Plan  Pass  2 (TP)     0 (FP)
                   Fail  1 (FN)     1 (TN)

                         Successful Company
                         Yes        No
    Business Plan  Pass  0.67 (TP)  0.00 (FP)
                   Fail  0.33 (FN)  1.00 (TN)

    Test Sensitivity (TP):  0.67
    False Alarm Rate (FP):  0.00
    Miss Rate (FN):         0.33
    Test Specificity (TN):  1.00
One critical observation to make about this data is that business plan quality is not a perfect test for classifying startups as successful or unsuccessful. The most grievous error is the case where a startup had a failing business plan but ended up being successful (an example of a "miss", or false negative): the test "missed" the correct classification. Because we have such a small sample, 4 startups, this one error shifts our percentages considerably.
What we are looking for in a good test of startup success is one that has high Test Sensitivity and high Test Specificity. Test Sensitivity measures the proportion of actual positives that are correctly identified as such. Test Specificity measures the proportion of actual negatives that are correctly identified as such. In real life, test sensitivity and specificity are seldom 1, so we have to figure out how we will cope with false alarms (negative instances identified as positive) and misses (positive instances identified as negative). Risk-averse angel investors will likely be more worried about false alarms than misses, because in the case of a false alarm you could invest in an unsuccessful company and lose money, whereas in the case of a miss you will not have invested in a successful company but will at least have retained your money.
One way to proceed towards becoming a Bayesian Angel Investor is to do some diagnostic work and figure out which types of tests are best for classifying startups into those likely to succeed or fail. When evaluating tests, you should examine their diagnostic accuracy using the metrics provided above (Test Sensitivity, False Alarm Rate, Miss Rate, Test Specificity). Bayesian Angel Investing likely taps into the same problem-solving skills as a doctor who must diagnose whether a patient has cancer. They will order up a series of tests (often binary scored) and make a diagnosis, or, if matters are still unclear, order up more tests (e.g., scans, probes, incisions, etc.) so that they can achieve more confidence in their decision making.
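The "Bayesian" part of the approach is that sensitivity and false alarm rate serve as the likelihoods P(Pass|S) and P(Pass|U) in Bayes' theorem, letting you update a regional prior once you see a test result. A minimal sketch, assuming a hypothetical 30% prior and the rates from the 4-startup example:

```python
# Bayesian update of an assumed regional prior P(S) using the business
# plan test. The likelihoods come from the example data; the 0.30 prior
# is an assumption for illustration.
prior_s = 0.30            # assumed prior probability of startup success
sensitivity = 2 / 3       # P(Pass | Successful), from the example data
false_alarm_rate = 0.0    # P(Pass | Unsuccessful), from the example data

# P(S | Pass) = P(Pass | S) P(S) / [P(Pass | S) P(S) + P(Pass | U) P(U)]
evidence = sensitivity * prior_s + false_alarm_rate * (1 - prior_s)
posterior_s = sensitivity * prior_s / evidence
# With a false alarm rate of exactly 0, a passing plan is (on this tiny
# sample) a perfect signal and the posterior jumps to 1.0 — a reminder
# that rates estimated from 4 observations should not be taken literally.
```

With a more realistic nonzero false alarm rate, the posterior would land somewhere between the prior and 1, which is the quantity a Bayesian Angel Investor would actually compare against an investment threshold.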