Sunday, 7 August 2016

Italian Recipes Revisited

After much trying, the BBC still won't let me use the recipes on their website for this book. That is a shame because the BBC is publicly funded and content should be public wherever possible.


The Italian Cookbook - The Art of Eating Well

Project Gutenberg, as we saw earlier, hosts books that are free to access and use, mostly because they are out of copyright. So I used "The Italian Cookbook - The Art of Eating Well" (1919).


Our Own Small Recipes Corpus

I sampled some of the recipes, 22 of them, to make our own small corpus of recipes. A small corpus will be useful to experiment with, and this one is specialised to a domain - Italian cooking.

I included a range of dishes, except desserts, which would have competed with the savoury dishes in terms of ingredients and processes.

Here are the plain text files on GitHub: https://github.com/makeyourowntextminingtoolkit/makeyourowntextminingtoolkit/tree/master/data_sets/recipes


Italian Recipes Word Cloud

Following the same approach as before, we obtain the following word cloud (stop words removed, lower case, minimum word length 5):

What does this tell us? There's a lot of chopping and olives in Italian cooking ...
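
For anyone wanting to try the counting step behind a word cloud like this, here is a minimal sketch. It is not the toolkit's own code: the function name word_frequencies, the tiny stop-word list and the two sample sentences are all made up for illustration, and I'm assuming "min word length 5" means words of five or more letters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real list would be much longer.
STOP_WORDS = {"about", "after", "their", "there", "these", "those", "until", "which", "while"}


def word_frequencies(texts, min_length=5, stop_words=STOP_WORDS):
    """Count words across texts after lowercasing, dropping short words and stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase first, then keep runs of letters only (punctuation and digits are dropped).
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(
            w for w in words
            if len(w) >= min_length and w not in stop_words
        )
    return counts


# Made-up sentences standing in for the recipe files.
sample = [
    "Chop the onion and the olives, then brown them in butter.",
    "Add the chopped olives to the sauce while stirring.",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

The resulting counts are what a word cloud turns into font sizes, so the most frequent words are drawn largest. Note that nothing here merges "chop" and "chopped"; without stemming they are counted as different words, and "chop" is dropped anyway for being too short.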
