You should apply one or more of the models covered in class for your final project.
If you implement the algorithms yourself, then you do not need to compare more than one model, but you have to document your implementation and append it to the final report (e.g., you implemented a genetic algorithm that selects variables to be used in a neural network model).
If you are using existing software packages (See5.0, Genie, Rosetta, etc.), then I would like you to apply at least two different types of models to your data, preferably more (e.g., classification tree and rough sets), and to justify why you think the data are better suited for these methods (e.g., categorical data are better suited for classification trees than neural networks).
Use medically related data. For those with few medical connections, I will provide published data sets (e.g., trauma patients, extracted from Christensen's textbook on logistic regression). Use one of these sets or another that interests you to compare the performance of several models, or to investigate selection of variables (e.g., comparison of different selection methods) or selection of cases (e.g., detection of outliers). You can use medical data from:
Remember to use the evaluation methods covered in class (ROC curves, calibration curves and indices, etc.) to report performance of your models.
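As a rough illustration of the kind of evaluation meant here, the sketch below computes the area under the ROC curve and a simple binned calibration table in plain Python. The function names (`auc`, `calibration_table`), the choice of two bins, and the toy labels and scores are all made up for this example; they are not taken from the course materials, and any equivalent routine in R or another package would serve just as well.

```python
from typing import List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    labels: List[int], scores: List[float], n_bins: int = 2
) -> List[Tuple[float, float, int]]:
    """Group cases into equal-width probability bins and return, per
    non-empty bin, (mean predicted probability, observed rate, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(s for _, s in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows


# Hypothetical toy data: 1 = event occurred, scores = predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print("AUC:", auc(labels, scores))
for mean_pred, observed, count in calibration_table(labels, scores):
    print(f"mean predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

On this toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. A well-calibrated model would show mean predicted probabilities close to the observed rates in each bin; a real report would use more bins and far more cases.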
The report should be 4 to 6 single-spaced pages in 10-point font. The report can be longer if you really need it to be, but please do not add fluff. Include the usual:
Delivery: Please create a web page for your report. This web page should contain:
If you want to install stuff on your own machine, R is free. Feel free to implement algorithms if you want. Other packages you might consider are:
Please note that some of these are freely available as demo versions only, so there may be limits on number of cases, variables, days available for trial, etc.
If you don't know whether your project would be OK or not, please discuss it with the instructors.
Example final projects are included courtesy of the students listed below. Used with their permission.
Final Report (PDF)
Oral Presentation (PDF - 1.70 MB)
Behavior of Various Machine Learning Models in the Face of Noisy Data
Final Report (PDF)
Computerized Pulmonary Artery Catheter Waveform Interpretation
Final Report (PDF)
Oral Presentation (PDF)
C# .NET Algorithm for Variable Selection Based on the Mallow's Cp Criterion
Final Report (PDF)