Friday, December 15, 2006

Varozhka: Generation of a prediction set (part 4)

By now you should have the Netflix dataset, Varozhka, and a compiled test estimator on your machine. If you don't, please check the previous posts.

Before you start processing, I recommend closing all unnecessary applications, because importing and processing are CPU- and memory-intensive tasks.

1. Run Varozhka.UI.exe. If this is the first run, it will ask you for settings:

Settings dialog

You should provide:

  • Directory with the Netflix dataset.
  • Directory where prediction sets should be generated.
  • Assembly with an estimator (if you followed the steps from the previous post, it should be in C:\Projects\MyEstimator\bin\Release).

2. Indexing will start automatically if the settings are valid:

Indexing the Netflix dataset

This is a long operation, so you can grab a cup of coffee while it's importing. On my four-year-old P4 it takes around 40 minutes... a good place to optimize ;)

3. The main UI will appear once indexing is complete. Here you can check the RMSE of the current estimator:

RMSE check in progress...
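
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between predicted and actual ratings. Below is a minimal Python sketch of that formula. It is not Varozhka's code; the function name `rmse` and the sample ratings are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one rating")
    # Sum of squared differences, then mean, then square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

# Hypothetical ratings on the 1-5 star scale, for illustration only.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
print(rmse(predicted, actual))
```

A lower value means the estimator's predictions are closer to the real ratings.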

And, if the results are good, generate a prediction set to submit:

Prediction set generation in progress...

On completion, it will open the Submission page on the Netflix Prize site and an Explorer window showing the output directory (the one you specified in the Settings):

Prediction set is ready to submit

output.txt.gz is the prediction file, and md5.txt contains the MD5 hash string.
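
If you want to double-check the hash before submitting, something like the following Python sketch could do it. It assumes that md5.txt holds only the hexadecimal MD5 digest of output.txt.gz (the exact file format is an assumption, not something Varozhka documents here); the helper names `file_md5` and `hash_matches` are hypothetical.

```python
import hashlib
from pathlib import Path

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_matches(output_dir):
    """Compare the digest of output.txt.gz with the contents of md5.txt.

    Assumption: md5.txt contains just the hex digest, possibly with
    surrounding whitespace or a UTF-8 byte order mark.
    """
    output_dir = Path(output_dir)
    expected = (output_dir / "md5.txt").read_text(encoding="utf-8-sig").strip().lower()
    actual = file_md5(output_dir / "output.txt.gz")
    return expected == actual
```

Calling `hash_matches` with the output directory from the Settings returns `True` when the two values agree.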

To submit the generated prediction set:

  1. Fill in your team info.
  2. Choose output.txt.gz in the Prediction File field.
  3. Paste the contents of md5.txt into the MD5 Hash field.
  4. Hit the Submit button.

You should soon receive emails with the submission results ;)

Several things are not implemented in the current version:

  • processing cannot be stopped
  • the estimator cannot be reloaded, so you have to quit and restart the app if you change the estimator

 

 


2 Comments:

Blogger Anastasios said...

Cool little package, but I can't get it to run (I was just trying to gauge my machine's suitability for the task). Without further ado, I run the UI but no matter how I specify the netflix directory it always tells me it isn't! Is the detection logic broken?

3:14 AM  
Blogger digizzle said...

"... but no matter how I specify the netflix directory it always tells me it isn't!"

Could you post the exact error message? It would help me narrow down the problem...

Thanks,
Eugene

3:37 PM  
