Varozhka: Generation of a prediction set (part 4)
By now you should have the Netflix dataset, Varozhka and a compiled test estimator on your machine. If you don't, please check the previous posts.
Before you start the processing, I would recommend closing everything you don't need, because importing and processing are CPU- and memory-intensive tasks.
1. Run Varozhka.UI.exe. If this is the first run, it will ask you for settings:
You should provide:
- Directory with the Netflix dataset.
- Directory where prediction sets should be generated.
- Assembly with an estimator (if you followed the steps from the previous post, it should be in C:\Projects\MyEstimator\bin\Release).
2. Indexing will start automatically if settings are valid:
This is a long operation... you can grab a cup of coffee while it's importing. On my four-year-old P4 it takes around 40 minutes... a good place to optimize ;)
3. The main UI will appear after the indexing is completed. Here you can check the RMSE of the current estimator:
And, if the results are good, generate a prediction set to submit:
When it completes, it will open the Submission page on the Netflix Prize site and an Explorer window with the Output directory (the one you specified in the Settings):
output.txt.gz is the prediction file, and md5.txt contains the MD5 hash string.
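As a reminder of what the RMSE number in step 3 means, here is a minimal Python sketch of the standard root mean squared error formula. It is not Varozhka's own code (how the app computes it internally is not shown in this post), and the function name and the sample ratings are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one rating")
    # Sum of squared differences between each prediction and the real rating
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Mean of the squared errors, then the square root
    return math.sqrt(squared_error / len(predicted))

# Made-up ratings, for illustration only (not from the Netflix dataset)
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
score = rmse(predicted, actual)
print(round(score, 4))  # prints 0.559
```

Lower is better: an estimator that guessed every rating exactly would score 0.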
To submit the generated prediction set:
- Fill in your team info.
- Choose output.txt.gz in the Prediction File field.
- Paste the content of md5.txt into the MD5 Hash field.
- Hit the Submit button.
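If you want a sanity check before hitting Submit, you can recompute the hash yourself and compare it with md5.txt. The Python sketch below is one way to do that. It assumes that md5.txt holds nothing but the hexadecimal MD5 digest of output.txt.gz as plain text; the post does not spell out the exact file layout, so treat that as an assumption. The function names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MB at a time so a large archive does not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_matches(archive_path, md5_path):
    """Compare the archive's digest with the text stored in md5_path.

    Assumption: md5_path contains only the hex digest, possibly with
    surrounding whitespace or a UTF-8 byte order mark.
    """
    with open(md5_path, "r", encoding="utf-8-sig") as f:
        expected = f.read().strip().lower()
    return md5_of_file(archive_path) == expected
```

For example, calling hash_matches("output.txt.gz", "md5.txt") from inside the Output directory should return True if the two files belong together.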
Soon you should receive emails with submission results ;)
There are several things that are not implemented in the current version:
- processing cannot be stopped
- the estimator cannot be reloaded, so you have to quit and restart the app if you change the estimator
Labels: .net, netflix, netflix prize, netflixprize, varozhka