
save to one file vs many files

Posted: Tue Jan 11, 2011 9:43 am
by Isaac
I have a few lists, each with a key column (like user IDs) and some reference columns. Several of these are tied together by a master list, which has one key column and one reference column that the app can quickly look up.

Is it better to keep all of this in one file, or is it faster for Python to have separate files that are read and written during each user's session? Sometimes these sessions run in parallel.

Posted: Tue Jan 11, 2011 9:08 pm
by Jeff250
I don't think I understand the motivating problem, but if you're only reading or changing small bits of the file at a time, then it's generally better to have it spread over multiple files so as to avoid rewriting the rest when you only change a bit.

Having the data spread over multiple files also reduces the chance of file system race conditions, but the problem is still present. Since most Web servers can handle multiple requests simultaneously, even from the same person, you need to take extra precautions to be robust against these. Suppose I click on 'increment.py' simultaneously in two windows. This could be the order of events:

Process A reads i=10 in counter.txt.
Process B reads i=10 in counter.txt.
Process A writes i=11 to counter.txt.
Process B writes i=11 to counter.txt.

I clicked increment twice, but it only incremented once! In general, you can get around this using file locks, but at that point, instead of reinventing a transactional database, why not use one like Postgres or MySQL? (Postgres seems to be more popular in the Python world in my experience.)
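
For reference, here is a minimal sketch of what the file-lock approach could look like. It assumes a Unix-like system (the fcntl module is not available on Windows) and that every process touching the file goes through this same function. The function name increment and the default path are only illustrative; the lock is advisory, so code that ignores it can still clobber the file.

```python
import fcntl
import os


def increment(path="counter.txt"):
    """Add 1 to the integer stored in `path` while holding an exclusive lock."""
    # Open for reading and writing without truncating; create the file if missing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            text = f.read().strip()
            value = (int(text) if text else 0) + 1
            # Rewrite the file in place with the new value.
            f.seek(0)
            f.truncate()
            f.write(str(value))
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return value
```

With the lock held across the whole read-modify-write, Process B cannot read the counter until Process A has written its new value, so two clicks give two increments.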

Posted: Sat Jan 15, 2011 3:52 pm
by Isaac
Cool ok. I'll start reading about the two.

Posted: Sat Jan 15, 2011 4:22 pm
by Xamindar
Also, if your needs are small, check out SQLite as well. You might not need a full-fledged database server installed just for what you are doing.
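
As a rough sketch of the same counter using the sqlite3 module from the Python standard library: the table name counters, the counter name clicks, and the file name counter.db are made up for illustration and are not from the posts above.

```python
import sqlite3


def increment(db_path="counter.db", name="clicks"):
    """Add 1 to a named counter stored in an SQLite file and return the new value."""
    # timeout: how long to wait if another connection has the database locked.
    conn = sqlite3.connect(db_path, timeout=10)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS counters "
                "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
            )
            # Make sure the row exists; does nothing if it is already there.
            conn.execute(
                "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)",
                (name,),
            )
            # The read-modify-write happens inside one SQL statement.
            conn.execute(
                "UPDATE counters SET value = value + 1 WHERE name = ?", (name,)
            )
            row = conn.execute(
                "SELECT value FROM counters WHERE name = ?", (name,)
            ).fetchone()
        return row[0]
    finally:
        conn.close()
```

Here the database does the locking, so there is no server to install and no lock handling to write by hand.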