<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ThotSpots &#187; rsync</title>
	<atom:link href="http://www.thotspots.com/tag/rsync/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thotspots.com</link>
	<description>Agile Software Development</description>
	<lastBuildDate>Wed, 09 Sep 2009 18:13:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Software Archeology Using Rsync</title>
		<link>http://www.thotspots.com/software-archeology-using-rsync/</link>
		<comments>http://www.thotspots.com/software-archeology-using-rsync/#comments</comments>
		<pubDate>Sat, 03 Mar 2007 04:09:03 +0000</pubDate>
		<dc:creator>Craig Jones</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[rsync]]></category>
		<category><![CDATA[searching code]]></category>
		<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://www.thotspots.com/?p=24</guid>
		<description><![CDATA[The most powerful tool in the knapsack of a software archeologist/maintainer is the grep search.  Unfortunately, the signal-to-noise ratio for grep search results can often be quite low.  This happens when the project source files are intermingled with other artifacts such as generated files, raw templates, library/framework documentation files, and examples.

One trick for filtering out the noise is to define a shell script that uses Rsync to create/update a searchable shadow copy of the working folder, and then to search that copy...]]></description>
			<content:encoded><![CDATA[<p>The most powerful tool in the knapsack of a software archeologist/maintainer is the grep search.  Unfortunately, the signal-to-noise ratio for grep search results can often be quite low.  This happens when the project source files are intermingled with other artifacts such as generated files, raw templates, library/framework documentation files, and examples.</p>
<p>One trick for filtering out the noise is to define a shell script that uses Rsync to create/update a searchable shadow copy of the working folder, and then to search that copy&#8230;<span id="more-24"></span></p>
<p>In case you&#8217;re not familiar with Rsync, it is a tool for keeping two directory trees synchronized, whether they live on the same machine or on opposite ends of a network.  Rsync&#8217;s main claim to fame is that it&#8217;s fast because it only transmits the differences, but it is also quite powerful when it comes to specifying exactly which files and folders are to be synchronized and how.  It&#8217;s this secondary feature that allows us to filter out the noise.</p>
<p>There are two parts to this solution: the actual shell script, and a file that lists all of the inclusion and exclusion patterns.  (This example uses Cygwin, running on a Windows box.)</p>
<h3>Here is the (entire) shell script (C:\work\cmd\searchcopy.sh):</h3>
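<pre>
 #!/bin/bash
 # pushd/popd are bash builtins, so run this under bash rather than plain sh
 pushd /cygdrive/c/work || exit 1
 mkdir -p /cygdrive/e/work_search
 # -v verbose, -r recurse, -u skip files that are newer in the copy, -t preserve timestamps
 rsync -vrut --filter='. /cygdrive/c/work/cmd/searchcopy_filelist.txt' alpha bravo charlie /cygdrive/e/work_search
 popd</pre>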
<ul>
<li>/cygdrive/c/work is your working folder (that&#8217;s Cygwin speak for C:\work).</li>
<li>Alpha, bravo, and charlie are the folder names of the projects that you are interested in.</li>
<li>/cygdrive/e/work_search is the name of the searchable shadow copy you want to create/update (over on your E: removable USB drive).</li>
</ul>
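<p>The search step itself is just grep pointed at the shadow copy.  Here is a minimal sketch, assuming the shadow-copy path used above; the function name search_copy and the SEARCH_ROOT variable are made up for illustration and are not part of the original script:</p>

```shell
#!/bin/bash
# Hypothetical helper: grep the shadow copy instead of the noisy working folder.
# SEARCH_ROOT defaults to the shadow-copy path from the script above.
SEARCH_ROOT="${SEARCH_ROOT:-/cygdrive/e/work_search}"

search_copy() {
    # -r recurse, -n show line numbers, -I skip binary files
    grep -rnI -- "$1" "$SEARCH_ROOT"
}
```

<p>Something like search_copy 'SomeClassName' would then list the matching lines, prefixed with file name and line number.</p>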
<h3>Here is (an abbreviated version of) the filter file (C:\work\cmd\searchcopy_filelist.txt), to give you an idea:</h3>
<pre>
 - .svn/
 - bin/
 - build*/
 - deployment/
 - lib/
 - log/
 - .#*
 - *.[ehjstw]ar
 - *.[Bb][Aa][Kk]
 - *.doc
 - *.[Ee][Xx][Ee]
 - *.gif
 - *.httpunit
 - *.ico
 - *.jasper
 - *.jpg
 - *.library
 - *.log
 - *.[Oo][Ll][Dd]
 - *.pdf
 - *.[Zz][Ii][Pp]</pre>
<p>In this case, they are all exclusions (leading minus sign).  Thus, everything in the alpha, bravo, and charlie folders will be copied, except files or subfolders matching these patterns.</p>
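<p>Before trusting a filter file, it can help to preview what Rsync would copy.  The following is only a sketch: it builds a throwaway tree under a temp directory (every file and folder name in it is invented for the demo) and runs Rsync with -n (dry run) so that nothing is actually copied.  The --out-format option needs a reasonably recent Rsync.</p>

```shell
#!/bin/bash
# Build a throwaway project tree (all names are invented for this demo).
demo=$(mktemp -d)
mkdir -p "$demo/src/alpha/bin" "$demo/src/alpha/code" "$demo/dest"
touch "$demo/src/alpha/bin/tool" "$demo/src/alpha/code/Main.java" "$demo/src/alpha/code/old.log"

# A two-rule filter file in the same format as the one above.
printf '%s\n' '- bin/' '- *.log' > "$demo/filters.txt"

# -n = dry run (nothing is copied), -r = recurse,
# --out-format='%n' prints only the name of each item that would be sent.
preview=$(cd "$demo/src" && rsync -nr --out-format='%n' --filter=". $demo/filters.txt" alpha "$demo/dest")
echo "$preview"
rm -rf "$demo"
```

<p>The listing should mention Main.java but neither the bin folder nor old.log.  Once the preview looks right, drop the -n to do the real copy.</p>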
<h3>Tips for using Rsync:</h3>
<ul>
<li>Don&#8217;t waste time with the --include and --exclude switches; they are merely dumbed-down versions of the --filter switch, so just use --filter right from the start.</li>
<li>Avoid the --cvs-exclude switch if you can, and pay close attention to what it ignores if you can&#8217;t.  For example, it ignores any file or folder named &#8220;core&#8221;, and it ignores *.script files; both of which burned me when I tried using it on a certain Tapestry application.</li>
<li>Most implementations of Rsync are case sensitive, including Cygwin&#8217;s!  So if the same filename or extension might show up in more than one casing, you either have to repeat the pattern for each casing or use the square bracket notation:
<pre>
 - *.EXE
 - *.Exe
 - *.exe</pre>
<p>or</p>
<pre>
 - *.[Ee][Xx][Ee]</pre>
</li>
<li>Pay close attention to the man pages that describe other aspects of the pattern matching algorithm.  For example, leading and trailing slashes each have special significance.</li>
</ul>
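<p>Typing the square bracket notation by hand gets tedious.  As a rough sketch, a small helper can generate it; the function name ci_exclude is hypothetical and not part of Rsync:</p>

```shell
#!/bin/bash
# Hypothetical helper: turn an extension into a case-insensitive exclusion rule.
# Letters become [Xx] pairs; digits and other characters pass through unchanged.
ci_exclude() {
    local rest=$1 pattern='' ch upper lower
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}           # everything after it
        upper=$(printf '%s' "$ch" | tr '[:lower:]' '[:upper:]')
        lower=$(printf '%s' "$ch" | tr '[:upper:]' '[:lower:]')
        if [ "$upper" != "$lower" ]; then
            pattern="${pattern}[${upper}${lower}]"
        else
            pattern="${pattern}${ch}"
        fi
    done
    printf -- '- *.%s\n' "$pattern"
}

ci_exclude exe   # prints: - *.[Ee][Xx][Ee]
```

<p>The output can be appended to the filter file, one call per extension.</p>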
]]></content:encoded>
			<wfw:commentRss>http://www.thotspots.com/software-archeology-using-rsync/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
