[p4] Client-side file fragmentation on NTFS
Peter Weldon
peter.weldon at lollotec.com
Thu Dec 6 12:14:11 PST 2007
On 12/5/07, Frank Compagner <frank.compagner at guerrilla-games.com> wrote:
>
> [snip]
Changing the workflow so that daily multi-gigabyte syncs are not needed may
be worthwhile. It is not always possible, but as the number of clients doing
such syncs grows, the change becomes more necessary and more beneficial.
> As you can see, a nice improvement. Now, my question is: has anybody
> else ever noticed this?
Yes, I have observed this problem firsthand and found similar behaviour when
I investigated. The impact goes well beyond sync times: the fragmentation
quickly degrades the performance of any application accessing the synced
files (data / code builds) and leads to fragmentation of non-synced files
being touched on the system.
As you observed, for each new revision of a file retrieved to a workspace,
p4 sync first creates a temporary file. Then, even though the new file size
is known, it does not preallocate but instead appends in 4KB chunks. Once
the file is completely downloaded, it deletes the previous file and renames
the temporary file.
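
To make the alternative concrete, here is a minimal sketch of the same temp-file-then-rename sequence with the size set up front. It is not how p4 is implemented; `write_preallocated` and its parameters are hypothetical names, and it assumes the final size is known before the first chunk arrives. Whether setting the size first actually yields contiguous allocation depends on the filesystem and OS, so treat it as something to measure rather than a guarantee.

```python
import os
import tempfile


def write_preallocated(dest_path, chunks, total_size):
    """Write chunks to a temp file sized up front, then swap it into place.

    dest_path, chunks and total_size are illustrative parameters, not part
    of any Perforce API.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file next to the destination so the final rename
    # stays on the same volume.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            # Set the final size before writing any data, instead of
            # growing the file one small append at a time.
            f.truncate(total_size)
            written = 0
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
            # Trim if the stream turned out shorter than announced.
            f.truncate(written)
        # Replace the previous revision with the completed temp file.
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

The only difference from the append pattern described above is the `truncate(total_size)` call before the first write.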
This is not an atypical process for applications; lots work in a similar way
without causing undue problems. In the p4 case it becomes problematic because
it in effect places a server filesystem workload pattern onto each client.
Think of it as having your whole team edit their files on your workstation.
Fragmentation quickly becomes an issue even for non-gigabyte workspaces.
> If so, has anybody tried to measure the
> performance impact?
Yes. The impact is particularly easy to measure on automated build machines
that run a build process on a daily/hourly/changelist basis. The build
process would typically involve a sync step. Capturing the fragmentation
before each step and the elapsed time for each step makes it possible to
track down the steps causing fragmentation and the impact thereof. I found
syncing to be a large contributor to fragmentation, and its impact on
subsequent steps substantial.
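
A rough sketch of that kind of per-step measurement follows. The names `run_steps` and `measure_fragmentation` are hypothetical; the fragmentation probe is passed in as a callable because how you obtain a number (a defragmenter report, a fragment count for a set of files, and so on) depends on your tooling and is an assumption here, not something specified above.

```python
import time


def run_steps(steps, measure_fragmentation):
    """Run build steps in order, recording time and fragmentation for each.

    steps: list of (name, callable) pairs.
    measure_fragmentation: callable returning a number; supplied by the
    caller, since the probe itself is environment specific.
    """
    results = []
    for name, step in steps:
        before = measure_fragmentation()
        start = time.perf_counter()
        step()
        elapsed = time.perf_counter() - start
        after = measure_fragmentation()
        results.append({
            "step": name,
            "seconds": elapsed,
            "frag_before": before,
            "frag_after": after,
            "frag_delta": after - before,
        })
    return results
```

Logging these records per build makes it easy to see which step raises the fragmentation figure and whether later steps slow down as it grows.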
> Have you spoken to support about this?
No.
> Do you
> agree that improving this behaviour is worthwhile?
Yes, but look a little further than the sync to make sure you do not have
other processes contributing to the fragmentation.
> If so, you might
> consider contacting support to register your interest in the subject,
> which might get the enhancement request somewhat higher up the list.
Sorry, I no longer work in the environment where I encountered this.
> Finally, some details on the p4fs tool
This sounds great. Coincidentally, I recently created something similar
(while working on a t-ntfs project). It does not use the p4api directly,
though, and like yours it needs to be polished. As I noted above, p4 sync
works on a file-by-file basis: retrieving, deleting and then renaming.
One thing I was considering is doing a delete pass first, and then all the
retrieves with preallocation. The retrieves would optionally be
multi-threaded, but this was originally more to do with populating a Perforce
proxy and would most probably not help if you are disk I/O bound.
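
The two-pass idea could look roughly like this. It is only a sketch of the ordering, not my tool and not anything built on the p4api: `two_pass_sync`, `files` and `fetch` are hypothetical names, and it assumes the file sizes are known in advance and the target directories already exist.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def two_pass_sync(files, fetch, workers=1):
    """Delete all stale files first, then retrieve each with its size preset.

    files: dict mapping local path -> expected size in bytes.
    fetch: caller-supplied callable, fetch(path) -> iterable of byte chunks.
    workers: number of retrieve threads; 1 means sequential.
    """
    # Pass 1: remove every outdated copy before anything new is written.
    for path in files:
        if os.path.exists(path):
            os.remove(path)

    def retrieve(item):
        path, size = item
        with open(path, "wb") as f:
            f.truncate(size)  # set the final size before writing data
            for chunk in fetch(path):
                f.write(chunk)
        return path

    # Pass 2: retrieve everything, optionally on several threads.
    if workers <= 1:
        return [retrieve(item) for item in files.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(retrieve, files.items()))
```

Leaving `workers` at 1 is probably the sensible default on a client, since extra threads are unlikely to help when the disk is the bottleneck.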