10-02-2011, 12:16 PM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
Hello joevenzon,
First of all, thank you for your detailed reply. I had not considered locking, and it seems that git does not have an equivalent mechanism:
http://stackoverflow.com/questions/11944...rol-system
I agree that the distributed nature of Git puts a stronger emphasis on this problem, because people might not notice a lock when they work offline. GitHub adds to this, because it encourages spontaneous contributions more than the svn on SourceForge does.
I have never used locking in Subversion myself. The comments on StackOverflow suggest that even though SVN has locking, one can run into the same problems as without it. Was it usual for VDrift to use Subversion's locking in the past? Were your experiences positive?
I am working on a project (8 developers) that until recently used a mailing list for locking and releasing an unmergeable database dump. That worked pretty well; we only had one situation where two people made changes concurrently and one had to redo them on the other's dump.
We did move away from this method because we did not want the overhead of checking/writing emails anymore. Instead we now export our changes in code and automate the import/export of them via scripts. The binary files of VDrift however are not substitutable with more atomic files, as far as I understand.
Therefore I think handling them is only possible with proper communication and by sticking to the necessary editing protocol. I doubt that locking in svn really helps to avoid conflicts better than splitting the data into individual cars/tracks/etc. on GitHub. With more atomic data and easier, and therefore more frequent, commits, the maximum possible damage could actually be lowered.
I would suggest having a lockfile in the root of each repository that needs to be touched before and after working on the data repositories. An obligatory guide on how to fork, lock, edit, commit and release could be given in each repo's readme, too.
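As a rough sketch of such a protocol (the file name LOCK and both function names are made up for illustration, and the lock is purely advisory, since git itself enforces nothing here):

```shell
# Advisory lock kept in a file named LOCK (hypothetical name) at the repo root.

# acquire_lock REPO_DIR OWNER: fail if somebody already holds the lock.
acquire_lock() {
    repo_dir=$1
    owner=$2
    if [ -s "$repo_dir/LOCK" ]; then
        echo "locked by $(cat "$repo_dir/LOCK")" >&2
        return 1
    fi
    printf '%s\n' "$owner" > "$repo_dir/LOCK"
    # In a real repo the next step would be:
    #   git add LOCK && git commit -m "lock" && git push
}

# release_lock REPO_DIR OWNER: only the current holder may release the lock.
release_lock() {
    repo_dir=$1
    owner=$2
    if [ "$(cat "$repo_dir/LOCK" 2>/dev/null)" != "$owner" ]; then
        echo "lock is not held by $owner" >&2
        return 1
    fi
    : > "$repo_dir/LOCK"    # an empty file means "unlocked"
    # In a real repo the next step would be:
    #   git add LOCK && git commit -m "unlock" && git push
}
```

The lock only helps if everybody pulls before calling acquire_lock and pushes right after it; two people working offline can still both take it, which is the weakness discussed above.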
joevenzon Wrote:Other stuff specific to git: I had heard that a git repository size grows much faster with binary file changes than an SVN repository. I had also heard that the way git works on a blob level (versioning the content of files, not the file itself) makes it really slow at scanning for updated files (since it needs to hash everything to determine if it changed) while subversion can cheat by looking at file properties. This info may be out of date. If you set up a test git data repo we can do some experiments.
I would be happy to help testing this. I will deploy something later today or tomorrow.
|
|
10-02-2011, 02:57 PM,
|
|
joevenzon
Administrator
|
Posts: 2,679
Threads: 52
Joined: Jun 2005
|
|
BiGBeN87 Wrote:Was it usual for VDrift to use Subversion's locking in the past? Were your experiences positive?
I've used it at work for other projects where it was vital, but to be honest, I don't think anyone's really used it for VDrift. There are a small number of people working on anything at a given moment, so not many collisions. I think the forum has been used sometimes to communicate before beginning work on something that someone else checked in (although this has failed in the past as well). The problem we run into on VDrift most often is just someone changing a file that someone else had previously changed, and then that person being like "wtf, why did you change that?", but they weren't working on it simultaneously, so that's a different problem with some different solution.
Quote:I would be happy to help testing this. I will deploy something later today or tomorrow.
Some tests:
* making a series of small changes to a .png image file, checking repository size change
* adding and then deleting the equivalent of a track's set of files, checking repository size change
* on a large repository like the entire vdrift data repo, make a single change to a .png file somewhere deep and check the time taken to make a git commit -a or svn ci
* on a large repo, test git workflow for branching, changing a .png file, and merging back in. test time and repository size change
Considerations:
* for the size and time tests, how does running git gc affect the results? would this need to be a regular manual maintenance for a github hosted repo?
|
|
10-05-2011, 05:43 PM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
I just finished cloning the svn and uploading it to github:
https://github.com/bigben87/VDrift-Data
So fork me on GitHub and do your tests! I will go to sleep now, but I have an early measurement already:
1.6 GiB of binary data with 900 revisions weigh in at a 1.7 GiB .git directory. So the size concerns seem unfounded.
joevenzon Wrote:The problem we run into on VDrift most often is just someone changing a file that someone else had previously changed, and then that person being like "wtf, why did you change that?", but they weren't working on it simultaneously, so that's a different problem with some different solution.
I have witnessed this behavior in some other projects, too. I always felt that a forum is the wrong place to talk about code, however. GitHub's pull-request implementation encourages discussion of specific changes throughout the whole development process. Maybe that already provides enough structure in the right place to prevent wtf-commits in the future.
joevenzon Wrote:Some tests:
* making a series of small changes to a .png image file, checking repository size change
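PNGs are already compressed, so even a small edit changes most of the file's bytes and Git cannot store it as a small delta. Therefore each changed revision will add almost the full file size.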
joevenzon Wrote:Considerations:
* for the size and time tests, how does running git gc affect the results? would this need to be a regular manual maintenance for a github hosted repo?
Git automatically invoked gc after cloning the svn: https://gist.github.com/1261544 Therefore I cannot test the effect of running it manually right now.
|
|
10-07-2011, 01:55 PM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
I toyed around with time and git and tried to answer:
joevenzon Wrote:* on a large repository like the entire vdrift data repo, make a single change to a .png file somewhere deep and check the time taken to make a git commit -a or svn ci
Blind test on a fresh repository:
Code: $ time git status
# On branch master
nothing to commit (working directory clean)
real 0m0.177s
user 0m0.060s
sys 0m0.100s
Git does not seem to have trouble with finding a single changed bit deep in the tree:
Code: $ chmod +x cars/FF/interior.png
$ time git commit -a -m 'testing commit -a time'
[master c91cd97] testing commit -a time
1 files changed, 0 insertions(+), 0 deletions(-)
mode change 100644 => 100755 cars/FF/interior.png
real 0m0.266s
user 0m0.100s
sys 0m0.090s
$ time git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
nothing to commit (working directory clean)
real 0m0.139s
user 0m0.080s
sys 0m0.050s
|
|
10-08-2011, 08:18 PM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
Git svn clone should take around 5 h with my internet connection, but I haven't actually timed it. A plain git clone gives us a speed-up by a factor of about 4 in the equivalent case:
Code: $ time git clone https://github.com/bigben87/VDrift-Data.git VDrift-Data
Cloning into VDrift-Data...
remote: Counting objects: 25277, done.
remote: Compressing objects: 100% (16627/16627), done.
remote: Total 25277 (delta 8576), reused 25277 (delta 8576)
Receiving objects: 100% (25277/25277), 1.54 GiB | 306 KiB/s, done.
Resolving deltas: 100% (8576/8576), done.
real 70m30.272s
user 4m16.600s
sys 1m18.090s
The same test with a shallow clone:
Code: $ time git clone --depth 1 https://github.com/bigben87/VDrift-Data.git VDrift-Data
Cloning into VDrift-Data...
remote: Counting objects: 12128, done.
remote: Compressing objects: 100% (11709/11709), done.
remote: Total 12128 (delta 443), reused 11895 (delta 375)
Receiving objects: 100% (12128/12128), 1.25 GiB | 736 KiB/s, done.
Resolving deltas: 100% (443/443), done.
real 39m2.372s
user 3m19.890s
sys 1m4.370s
svn checkout for comparison:
Code: $ time svn checkout -q https://vdrift.svn.sourceforge.net/svnroot/vdrift/vdrift-data VDrift-Data
real 55m15.538s
user 4m18.790s
sys 1m36.670s
So Git/Hub is faster at downloading for end users and patch-only developers, who can use the shallow clone.
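To put numbers on the comparison, the "real" times above can be converted to seconds and divided. A small sketch (the helper names real_to_seconds and ratio are made up for illustration):

```shell
# real_to_seconds 70m30.272s: convert a "real" value printed by time to seconds.
real_to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# ratio A B: print A / B with two decimals.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

svn=$(real_to_seconds 55m15.538s)      # svn checkout
full=$(real_to_seconds 70m30.272s)     # full git clone
shallow=$(real_to_seconds 39m2.372s)   # git clone --depth 1

echo "svn / shallow clone: $(ratio "$svn" "$shallow")"
echo "svn / full clone:    $(ratio "$svn" "$full")"
```

A ratio above 1 means git was faster than the svn checkout. With the timings above that holds for the shallow clone (about 1.42), but not for the full clone (about 0.78), which also downloads the whole history.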
|
|
10-19-2011, 11:45 AM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
I imported the SVN-Tags manually into Git: https://github.com/bigben87/VDrift-Data/tags
I tested downloading them:
Code: $ time wget https://github.com/bigben87/VDrift-Data/tarball/2011-09-01
--2011-10-19 16:02:56-- https://github.com/bigben87/VDrift-Data/tarball/2011-09-01
Resolving github.com... 207.97.227.239
Connecting to github.com|207.97.227.239|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nodeload.github.com/bigben87/VDrift-Data/tarball/2011-09-01 [following]
--2011-10-19 16:02:57-- https://nodeload.github.com/bigben87/VDrift-Data/tarball/2011-09-01
Resolving nodeload.github.com... 207.97.227.252
Connecting to nodeload.github.com|207.97.227.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1442262807 (1.3G) [application/octet-stream]
Saving to: “2011-09-01”
100%[====================================>] 1,442,262,807 382K/s in 52m 51s
2011-10-19 16:55:50 (444 KB/s) - “2011-09-01” saved [1442262807/1442262807]
real 52m53.388s
user 1m16.469s
sys 1m56.087s
|
|
10-23-2011, 05:02 PM,
|
|
BiGBeN87
Junior Member
|
Posts: 20
Threads: 4
Joined: Sep 2011
|
|
joevenzon Wrote:Only 4 tags...?
These 4 were the only ones I found on SourceForge: http://vdrift.svn.sourceforge.net/viewvc/vdrift/tags/
joevenzon Wrote:If we do switch to git for data, the auto-updater needs to be rewritten to use it instead of the sourceforge svn. This may be easier because git has an API, whereas the sourceforge svn code is scraping the html, although having a concept of revision number is handy.
Yes, GitHub has an API that can be accessed via HTTP:
http://developer.github.com/v3/
I would suggest that the updater consider new tags only. Packages are usually updated with new releases, so this would be equivalent to the usual behaviour. There is a function to list the commits of a repository, so the last tag can be identified:
http://developer.github.com/v3/git/commits/
There is a function to get information about a specific tag:
http://developer.github.com/v3/git/tags/
A tag object references a commit object, which in turn references a tree object listing the sha1s of the contained blobs.
These can then be cross-referenced with the hashes returned by the VDriftDataHasher I am working on. I added object hashing that is compatible with git in:
https://github.com/bigben87/VDriftDataHa...0b31e31b87
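For reference, git derives a blob's sha1 from a short header plus the file content, so a compatible hash can be computed without git. A minimal sketch (this is not the VDriftDataHasher code; the function names are made up here, and sha1sum from GNU coreutils is assumed to be installed):

```shell
# git_blob_sha1 FILE: print the sha1 that git assigns to FILE's content as a blob.
# A blob ID is the SHA-1 of the header "blob <size in bytes>\0" followed by the raw bytes.
git_blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# needs_update FILE EXPECTED_SHA1: succeed when the local file is missing
# or its blob hash differs from the one listed in the tree.
needs_update() {
    [ ! -f "$1" ] || [ "$(git_blob_sha1 "$1")" != "$2" ]
}
```

git_blob_sha1 prints the same ID as git hash-object does for the same file, so an updater could call needs_update for each path and sha1 in the tree listing and download only the files for which it succeeds.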
joevenzon Wrote:BiGBeN87 Wrote:So Git/Hub is faster at downloading for end users and patch-only developers, who can use the shallow clone.
1) What's the workflow for a patch-only developer using a shallow clone
patch-only-dev:
- initially, git clone --depth=1
- checkout master
- modify files
- git add them
- git commit them
- git format-patch origin/master..master
- create a gist containing the patch(es)
- create an issue
Examples:
- https://github.com/mootools/mootools-cor...it-Patches
- https://github.com/rakudo/rakudo/wiki/st...te-a-patch
VDrift-maintainer:
- see the issue
- download the raw patch(es) from gist
- git apply/am the patch(es)
- git push origin master
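The git part of both lists can be sketched as three small helpers. The function names are made up for illustration, the branch name master is taken from the steps above, and creating the gist and the issue remains a manual step:

```shell
# shallow_clone REPO_URL WORKDIR: patch-only developer, download the latest revision only.
shallow_clone() {
    git clone --quiet --depth 1 "$1" "$2"
}

# export_patches WORKDIR OUTDIR: write one .patch file into OUTDIR for every
# local commit on master that is not yet on origin/master.
export_patches() {
    git -C "$1" format-patch --quiet -o "$2" origin/master..master
}

# apply_patches REPO_DIR PATCH...: maintainer side; git am recreates the commits,
# keeping the author and message stored in each patch file.
apply_patches() {
    repo=$1
    shift
    git -C "$repo" am --quiet "$@"
}
```

Between shallow_clone and export_patches the developer edits files and commits as usual. After apply_patches the maintainer finishes with git push origin master.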
joevenzon Wrote:2) Is using a fork and pull request a valid workflow for the entire data tree?
I think it is, because the strict hierarchy keeps the commits in different branches separate from each other, even if the fork was made some time ago and the master was not updated. I think splitting the data into individual repos for each car/track would help keep things clear and atomic, too.
joevenzon Wrote:3) What's the workflow for a developer working in master? They must do a full clone, correct?
- git pull
- modify
- git add
- git commit
- git push
I would think this is the recommended workflow for developers/maintainers and contributors alike, because branching/merging/pull-requesting is only possible with non-shallow clones, which are therefore needed for agile and clean development.
For more complex (read: multi-commit) projects, developers should work in branches too, and merge their work into master when they are more or less done.