Friday, December 15, 2006

Varozhka: Introduction (part 1)

As you may know, Netflix organized a competition for systems predicting user ratings for movies.

I'm sure a lot of bright people have ideas on how to improve those predictions, but do not have the time to spend on it.

This project is a framework that automates most of the dirty work with the dataset, so you can concentrate on the prediction algorithm ;)

Current features:

  • No additional DB engine is required. All indexes are loaded in memory.
  • Abstraction layer for working with the data (this is where you plug in your own code).
  • Data access layer.
  • Easy way to check RMSE against the probe set.
  • Generation of submission dataset.
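For readers new to the metric: RMSE (root mean squared error) is the square root of the average squared difference between predicted and actual ratings. Below is a minimal sketch in Python of what checking against a probe set boils down to. It is not Varozhka's code; the function name and the sample ratings are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


# Hypothetical probe ratings and predictions, for illustration only.
probe_actual = [4, 3, 5, 1]
probe_predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(probe_predicted, probe_actual))
```

Lower is better: a perfect predictor would score 0.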

So, basically, you can download the Netflix dataset, extract it to a directory, start a wizard (which does all the import tasks), implement your own rating estimator, and use a wizard to submit the results to Netflix.
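To give a feel for what "implement your own rating estimator" could mean, here is a rough sketch in Python. The interface and class names (RatingEstimator, MovieMeanEstimator, train, estimate) are assumptions made up for this example and are not Varozhka's actual API. The baseline simply predicts each movie's average rating and falls back to the global average for movies it has not seen.

```python
from abc import ABC, abstractmethod


class RatingEstimator(ABC):
    """Hypothetical plug-in point: the framework would feed it the data."""

    @abstractmethod
    def train(self, ratings):
        """ratings: iterable of (user_id, movie_id, rating) tuples."""

    @abstractmethod
    def estimate(self, user_id, movie_id):
        """Return the predicted rating for the given user and movie."""


class MovieMeanEstimator(RatingEstimator):
    """Baseline: predict the movie's mean rating, else the global mean."""

    def __init__(self):
        self.global_mean = 0.0
        self.movie_means = {}

    def train(self, ratings):
        totals = {}  # movie_id -> sum of ratings
        counts = {}  # movie_id -> number of ratings
        for _user_id, movie_id, rating in ratings:
            totals[movie_id] = totals.get(movie_id, 0.0) + rating
            counts[movie_id] = counts.get(movie_id, 0) + 1
        n = sum(counts.values())
        self.global_mean = sum(totals.values()) / n if n else 0.0
        self.movie_means = {m: totals[m] / counts[m] for m in totals}

    def estimate(self, user_id, movie_id):
        # The user is ignored on purpose; this is only a baseline.
        return self.movie_means.get(movie_id, self.global_mean)


# Made-up training data: (user_id, movie_id, rating).
estimator = MovieMeanEstimator()
estimator.train([(1, 10, 4), (2, 10, 2), (1, 20, 5)])
print(estimator.estimate(3, 10))
```

A real estimator would replace the body of train and estimate with something smarter, while the import and submission steps stay the same.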

The project is named Varozhka (the Belarusian word for "fortune-teller"). It is hosted at Google Code and SourceForge.Net.

This is an introductory post about the project. More details later...

NOTE: The project is under development, and most of the code is not optimized in any way.

 
