<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>What You&#039;re Doing Is Rather Desperate</title>
	<atom:link href="http://nsaunders.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://nsaunders.wordpress.com</link>
	<description>Notes from the life of a bioinformatician</description>
	<lastBuildDate>Wed, 15 May 2013 23:18:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='nsaunders.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>What You&#039;re Doing Is Rather Desperate</title>
		<link>http://nsaunders.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://nsaunders.wordpress.com/osd.xml" title="What You&#039;re Doing Is Rather Desperate" />
	<atom:link rel='hub' href='http://nsaunders.wordpress.com/?pushpress=hub'/>
		<item>
		<title>How to: remember that you once knew how to parse KEGG</title>
		<link>http://nsaunders.wordpress.com/2013/04/22/how-to-remember-that-you-once-knew-how-to-parse-kegg/</link>
		<comments>http://nsaunders.wordpress.com/2013/04/22/how-to-remember-that-you-once-knew-how-to-parse-kegg/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 00:06:30 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[biostar]]></category>
		<category><![CDATA[how to]]></category>
		<category><![CDATA[kegg]]></category>
		<category><![CDATA[pathways]]></category>
		<category><![CDATA[rest]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3367</guid>
		<description><![CDATA[Recently, someone asked me if I could generate a list of genes associated with a particular pathway. Sure, I said and hacked together some rather nasty code in R which, given a KEGG pathway identifier, used a combination of the KEGG REST API, DBGET and biomaRt to return HGNC symbols. Coincidentally, someone asked the same [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Recently, someone asked me if I could generate a list of genes associated with a particular pathway. Sure, I said and hacked together some rather nasty code in R which, given a KEGG pathway identifier, used a combination of the KEGG <a href="http://www.kegg.jp/kegg/docs/keggapi.html" target="_blank">REST API</a>, <a href="http://www.genome.jp/dbget/" target="_blank">DBGET</a> and <a href="http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html" target="_blank">biomaRt</a> to return HGNC symbols.</p>
<p>Coincidentally, someone asked <a href="http://www.biostars.org/p/69336/" target="_blank">the same question</a> at Biostar. Pierre recommended the <a href="http://togows.dbcls.jp/site/en/rest.html" target="_blank">TogoWS</a> REST service, which provides an API to multiple biological data sources. An article describing TogoWS <a href="http://www.ncbi.nlm.nih.gov/pubmed/20472643" target="_blank">was published in 2010</a>.</p>
<p>An excellent suggestion &#8211; and one which, I later discovered, I <a href="https://www.diigo.com/user/Neilfws/bioinformatics%20rest%20integration" target="_blank">had bookmarked</a>. Twice. As long ago as 2008. This &#8220;rediscovery of things I once knew&#8221; happens to me with increasing frequency now, which makes me wonder whether (1) we really are drowning in information, (2) my online curation tools/methods require improvement or (3) my mind is not what it was. Perhaps some combination of all three.</p>
<p>Anyway &#8211; using Ruby (1.8.7), a list of HGNC symbols given a KEGG pathway, <em>e.g.</em> <a href="http://www.genome.jp/kegg/pathway/hsa/hsa04010.html" target="_blank">MAPK signaling</a>, is as simple as:</p>
<pre class="brush: ruby; title: ; notranslate">
require 'rubygems'
require 'open-uri'
require 'json/pure'

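# fetch the gene list for pathway hsa04010 as JSON and parse it
j = JSON.parse(open(&quot;http://togows.dbcls.jp/entry/pathway/hsa04010/genes.json&quot;).read)
# each value begins with the gene symbol, terminated by a semicolon
g = j.first.values.map {|v| /^(.*?);/.match(v)[1] }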
# first 5 genes
g[0..4]
# [&quot;MAP3K14&quot;, &quot;FGF17&quot;, &quot;FGF6&quot;, &quot;DUSP9&quot;, &quot;MAP3K6&quot;]
</pre>
<p>This code parses the JSON returned from TogoWS into an array with one element; the element is a hash with key/value pairs of the form:</p>
<pre class="brush: plain; title: ; notranslate">
&quot;9020&quot;=&gt;&quot;MAP3K14; mitogen-activated protein kinase kinase kinase 14 [KO:K04466] [EC:2.7.11.25]&quot;
</pre>
<p>Values for all keys that I&#8217;ve seen to date begin with the HGNC symbol followed by a semicolon, making extraction quite straightforward with a simple regular expression.</p>
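<p>If a value ever turned up without a semicolon, <em>match</em> would return <em>nil</em> and the one-liner above would raise an error. A minimal sketch of a more defensive version follows, run against hand-written sample data rather than the live service. Note the assumptions: <em>extract_symbols</em> is a hypothetical helper, not part of TogoWS or any library, and the second hash entry is invented purely to exercise the no-match case.</p>

```ruby
# Sample data shaped like the parsed TogoWS response described above:
# an array with one element, a hash of key => description.
# The first entry is the example from the post; the second is invented
# (assumption) to show what happens when no semicolon is present.
sample = [
  {
    "9020" => "MAP3K14; mitogen-activated protein kinase kinase kinase 14 [KO:K04466] [EC:2.7.11.25]",
    "0000" => "hypothetical entry without a symbol"
  }
]

# Hypothetical helper: return the text before the first semicolon of each
# value, skipping values that do not match instead of raising.
def extract_symbols(genes)
  genes.values.map { |v|
    m = /^(.*?);/.match(v)
    m ? m[1] : nil
  }.compact
end

symbols = extract_symbols(sample.first)
p symbols
# ["MAP3K14"]
```

<p>Entries that do not match are dropped silently here; logging them instead might be preferable if you want to notice a change in the format.</p>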
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/bioinformatics/'>bioinformatics</a>, <a href='http://nsaunders.wordpress.com/category/programming/'>programming</a>, <a href='http://nsaunders.wordpress.com/category/ruby/'>ruby</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/biostar/'>biostar</a>, <a href='http://nsaunders.wordpress.com/tag/how-to/'>how to</a>, <a href='http://nsaunders.wordpress.com/tag/kegg/'>kegg</a>, <a href='http://nsaunders.wordpress.com/tag/pathways/'>pathways</a>, <a href='http://nsaunders.wordpress.com/tag/rest/'>rest</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/04/22/how-to-remember-that-you-once-knew-how-to-parse-kegg/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>A brief note: R 3.0.0 and bioinformatics</title>
		<link>http://nsaunders.wordpress.com/2013/04/04/a-brief-note-r-3-0-0-and-bioinformatics/</link>
		<comments>http://nsaunders.wordpress.com/2013/04/04/a-brief-note-r-3-0-0-and-bioinformatics/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 22:07:26 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[3.0.0]]></category>
		<category><![CDATA[affymetrix]]></category>
		<category><![CDATA[bioconductor]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3363</guid>
		<description><![CDATA[Today marks the release of R 3.0.0. There will be plenty of commentary and useful information at sites such as R-bloggers (for example, Tal&#8217;s post). Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the simpleaffy package from Bioconductor [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Today marks <a href="https://stat.ethz.ch/pipermail/r-announce/2013/000561.html" target="_blank">the release of R 3.0.0</a>. There will be plenty of commentary and useful information at sites such as <a href="http://www.r-bloggers.com/" target="_blank">R-bloggers</a> (for example, <a href="http://www.r-bloggers.com/r-3-0-0-is-released-whats-new-and-how-to-upgrade/" target="_blank">Tal&#8217;s post</a>).</p>
<p>Version 3.0.0 is great news for bioinformaticians, due to the introduction of long vectors. What does that mean? Well, several months ago, I was using the <a href="http://www.bioconductor.org/packages/release/bioc/html/simpleaffy.html" target="_blank"><em>simpleaffy</em></a> package from Bioconductor to normalize Affymetrix exon microarrays. I began as usual by reading the CEL files:</p>
<pre class="brush: r; title: ; notranslate">
library(simpleaffy)
# match a literal &quot;.CEL.gz&quot; at the end of each file name
f &lt;- list.files(path = &quot;data/affyexon&quot;, pattern = &quot;\\.CEL\\.gz$&quot;, full.names = TRUE, recursive = TRUE)
cel &lt;- ReadAffy(filenames = f)
</pre>
<p>When this happened:</p>
<pre class="brush: plain; title: ; notranslate">
Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  : 
  allocMatrix: too many elements specified
</pre>
<p>I had a relatively large number of samples (337), but figured a 64-bit machine with ~ 100 GB RAM should be able to cope. I was wrong: due to a hard-coded limit to vector length in R, my matrix had become too large regardless of available memory. See <a href="http://blog.revolutionanalytics.com/2012/07/big-vectors-coming-to-r.html" target="_blank">this post</a> and <a href="http://stackoverflow.com/questions/1819418/r-error-allocmatrix" target="_blank">this StackOverflow question</a> for the computational details.</p>
<p>My solution at the time was to resort to <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220870/" target="_blank">Affymetrix Power Tools</a>. Hopefully, the introduction of long vectors will make Bioconductor even more capable and useful.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/bioinformatics/'>bioinformatics</a>, <a href='http://nsaunders.wordpress.com/category/programming/'>programming</a>, <a href='http://nsaunders.wordpress.com/category/statistics/r/'>R</a>, <a href='http://nsaunders.wordpress.com/category/statistics/'>statistics</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/3-0-0/'>3.0.0</a>, <a href='http://nsaunders.wordpress.com/tag/affymetrix/'>affymetrix</a>, <a href='http://nsaunders.wordpress.com/tag/bioconductor/'>bioconductor</a>, <a href='http://nsaunders.wordpress.com/tag/microarray/'>microarray</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/04/04/a-brief-note-r-3-0-0-and-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>Git for bioinformaticians at the Bioinformatics FOAM meeting</title>
		<link>http://nsaunders.wordpress.com/2013/03/27/git-for-bioinformaticians-at-the-bioinformatics-foam-meeting/</link>
		<comments>http://nsaunders.wordpress.com/2013/03/27/git-for-bioinformaticians-at-the-bioinformatics-foam-meeting/#comments</comments>
		<pubDate>Tue, 26 Mar 2013 22:12:17 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[australia]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[computing]]></category>
		<category><![CDATA[meetings]]></category>
		<category><![CDATA[csiro]]></category>
		<category><![CDATA[eresearch]]></category>
		<category><![CDATA[foam]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[ict]]></category>
		<category><![CDATA[slideshare]]></category>
		<category><![CDATA[version control]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3357</guid>
		<description><![CDATA[Last week, I attended the annual Computational and Simulation Sciences and eResearch Conference, hosted by CSIRO in Melbourne. The meeting includes a workshop that we call Bioinformatics FOAM (Focus On Analytical Methods). This year it was run over 2.5 days (up from the previous 1.5 by popular request); one day for internal CSIRO stuff and [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Last week, I attended the annual <a href="http://research.ict.csiro.au/conferences/css" target="_blank">Computational and Simulation Sciences and eResearch Conference</a>, hosted by CSIRO in Melbourne. The meeting includes a workshop that we call Bioinformatics FOAM (Focus On Analytical Methods). This year it was run over 2.5 days (up from the previous 1.5 by popular request); one day for internal CSIRO stuff and the rest <a href="http://australianbioinformatics.net/bioinfo-foam-2013-program/" target="_blank">open to external participants</a>.</p>
<p>I had the pleasure of giving a brief presentation on the use of Git in bioinformatics. Nothing startling; aimed squarely at bioinformaticians who may have heard of version control in general and Git in particular but who are yet to employ either. I&#8217;m excited because for once I am free to share, resulting in my first upload to Slideshare in almost 4.5 years. You can <a href="http://www.slideshare.net/neilfws/version-control-in-bioinformatics-our-experience-using-git" target="_blank">view it here</a>, or at the <a href="http://www.slideshare.net/AustralianBioinformatics/version-control-in-bioinformatics-neil-saunders" target="_blank">Australian Bioinformatics Network Slideshare</a>, or in the embed below.</p>
<p><span id="more-3357"></span><br />
<iframe src='http://www.slideshare.net/slideshow/embed_code/17695884' width='425' height='348'></iframe></p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/australia/'>australia</a>, <a href='http://nsaunders.wordpress.com/category/bioinformatics/'>bioinformatics</a>, <a href='http://nsaunders.wordpress.com/category/computing/'>computing</a>, <a href='http://nsaunders.wordpress.com/category/meetings/'>meetings</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/csiro/'>csiro</a>, <a href='http://nsaunders.wordpress.com/tag/eresearch/'>eresearch</a>, <a href='http://nsaunders.wordpress.com/tag/foam/'>foam</a>, <a href='http://nsaunders.wordpress.com/tag/git/'>git</a>, <a href='http://nsaunders.wordpress.com/tag/ict/'>ict</a>, <a href='http://nsaunders.wordpress.com/tag/slideshare/'>slideshare</a>, <a href='http://nsaunders.wordpress.com/tag/version-control/'>version control</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/03/27/git-for-bioinformaticians-at-the-bioinformatics-foam-meeting/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>The end of Google Reader: a scientist&#8217;s perspective</title>
		<link>http://nsaunders.wordpress.com/2013/03/18/the-end-of-google-reader-a-scientists-perspective/</link>
		<comments>http://nsaunders.wordpress.com/2013/03/18/the-end-of-google-reader-a-scientists-perspective/#comments</comments>
		<pubDate>Mon, 18 Mar 2013 10:53:53 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[google]]></category>
		<category><![CDATA[web resources]]></category>
		<category><![CDATA[google reader]]></category>
		<category><![CDATA[rss]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3343</guid>
		<description><![CDATA[Since 2005, I have started almost every working day by using one Web application &#8211; an application that occupies a permanent browser tab on my work and home desktop machines. That application is Google Reader. If you&#8217;re reading this, you&#8217;re probably aware that Google Reader will cease to exist from July 1 2013. Others have [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Since 2005, I have started almost every working day by using one Web application &#8211; an application that occupies a permanent browser tab on my work and home desktop machines. That application is Google Reader.</p>
<p>If you&#8217;re reading this, you&#8217;re probably aware that Google Reader will <a href="http://googleblog.blogspot.com.au/2013/03/a-second-spring-of-cleaning.html" target="_blank">cease to exist from July 1 2013</a>. Others have ranted, railed against the corporate machine and expressed their sadness. I thought I&#8217;d try to explain why, for this working scientist at least, RSS and feed readers are incredibly useful tools which I think should be valued highly.</p>
<p><span id="more-3343"></span></p>
<div id="attachment_3350" class="wp-caption alignright" style="width: 310px"><a href="http://nsaunders.files.wordpress.com/2013/03/screenshot-from-2013-03-18-213303.png"><img class="size-medium wp-image-3350" alt="GReader" src="http://nsaunders.files.wordpress.com/2013/03/screenshot-from-2013-03-18-213303.png?w=300&#038;h=139" width="300" height="139" /></a><p class="wp-caption-text">Some feeds, yesterday</p></div>
<p><strong>RSS: a primer</strong><br />
When I first discovered the concept of <a href="http://en.wikipedia.org/wiki/RSS" target="_blank">RSS</a>, it was one of those moments that made me think: &#8220;this is so brilliant, simple and obvious &#8211; why isn&#8217;t everyone using this?&#8221;</p>
<p>In fact even today, very few of my immediate peers know what RSS is or why it&#8217;s useful. This may be an issue specific to Australian science, which is not exactly renowned for being at the cutting edge of the web revolution. However, for anyone else struggling with the concept, let&#8217;s spell it out:</p>
<p>The point of RSS is that:</p>
<ul>
<li>you can monitor multiple, diverse sources of information in one location (aggregation)</li>
<li>you don&#8217;t have to visit those sources until their content updates and your feed reader tells you when that happens</li>
</ul>
<p>What are these multiple, diverse sources of information? For a scientist they could include:</p>
<ul>
<li>Tables of contents from journals</li>
<li>Alerts and searches at key research-related websites <em>e.g.</em> NCBI PubMed</li>
<li>Science blogs</li>
<li>Saved job searches</li>
<li>Activity monitoring at personal websites</li>
</ul>
<p>Brilliant, simple, obvious. I wonder how scientists keep up to date in their field <em>without</em> RSS.</p>
<p><strong>The Rise of Google Reader</strong></p>
<p>Soon after launch, Google Reader rose to become the predominant feed reader. Undoubtedly, this was due in part to the brand. However, GReader does boast several key features which I believe contributed to its adoption:</p>
<ul>
<li>It&#8217;s part of the &#8220;Google suite&#8221;; one login, multiple applications; in other words it&#8217;s &#8220;just there&#8221;</li>
<li>It&#8217;s &#8220;in the cloud&#8221; and so available and synched on all your machines; no local setup</li>
<li>No need to read everything immediately; it&#8217;s a searchable archive (they did take their time implementing search though, didn&#8217;t they)</li>
<li>Sharing to multiple networks is relatively easy via the &#8220;send to&#8221; function (forget about the now-defunct sharing button)</li>
<li>Intelligent keyboard shortcuts and simple layout enable rapid click-through, allowing focus on interesting items and discarding of irrelevant ones</li>
</ul>
<p><strong>RSS Is Not Dead (even if you&#8217;d like it to be)</strong></p>
<p>Permit me a brief rant?</p>
<p>I&#8217;m tired of twenty-something hipster data scientists telling me that RSS is dead and has been supplanted by Twitter, Google+ and so on. Or as <a href="http://googlesystem.blogspot.com/2013/03/no-more-google-reader.html?showComment=1363224185416#c3922801478047681229" target="_blank">someone put it</a> at the Google Operating System blog: &#8220;not everyone likes seeing 1% of their news as it scrolls by&#8221;. It seems that there are those who would like RSS to be dead and who believe that if they repeat the phrase often enough, it will come true. They may be right.</p>
<p>I speculate that the popularity of RSS among (enlightened) scientists and librarians is an indication that it&#8217;s a tool for people who like to read things properly, slowly and in-depth.</p>
<p><strong>The Future</strong></p>
<p>Google can do what they like of course; none of us paid for the product and there are alternatives available. I&#8217;m currently trying <a href="http://www.feedly.com/" target="_blank">Feedly</a>, who assure us that their product will continue to work after July 1. My fear is that there are those who equate RSS with Google Reader and who see the demise of the latter as further evidence of the death of the former. And again, they may be right as a self-fulfilling prophecy takes hold.</p>
<p>However the Web evolves, I just hope there will always be tools and protocols to provide information for people with attention spans longer than a gnat&#8217;s and a requirement for serious research.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/google/'>google</a>, <a href='http://nsaunders.wordpress.com/category/web-resources/'>web resources</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/google-reader/'>google reader</a>, <a href='http://nsaunders.wordpress.com/tag/rss/'>rss</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/03/18/the-end-of-google-reader-a-scientists-perspective/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>

		<media:content url="http://nsaunders.files.wordpress.com/2013/03/screenshot-from-2013-03-18-213303.png?w=300" medium="image">
			<media:title type="html">GReader</media:title>
		</media:content>
	</item>
		<item>
		<title>R/ggplot2 tip: aes_string</title>
		<link>http://nsaunders.wordpress.com/2013/02/26/rggplot2-tip-aes_string/</link>
		<comments>http://nsaunders.wordpress.com/2013/02/26/rggplot2-tip-aes_string/#comments</comments>
		<pubDate>Mon, 25 Feb 2013 23:40:44 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[research diary]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[aes]]></category>
		<category><![CDATA[aes_string]]></category>
		<category><![CDATA[ggplot2]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3332</guid>
		<description><![CDATA[I&#8217;m a big fan of ggplot2. Recently, I ran into a situation which called for a useful feature that I had not used previously: aes_string. Imagine that you have data consisting of observations for several variables &#8211; let&#8217;s say A, B, C &#8211; where each observation is from one of two groups &#8211; call them [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m a big fan of <a href="http://ggplot2.org/" target="_blank">ggplot2</a>. Recently, I ran into a situation which called for a useful feature that I had not used previously: <em>aes_string</em>.<br />
<span id="more-3332"></span></p>
<p>Imagine that you have data consisting of observations for several variables &#8211; let&#8217;s say A, B, C &#8211; where each observation is from one of two groups &#8211; call them X and Y:</p>
<pre class="brush: r; title: ; notranslate">
df1 &lt;- data.frame(A = rnorm(50), B = rnorm(50), 
                  C = rnorm(50), group = rep(LETTERS[24:25], 25))
head(df1)
#           A          B           C group
# 1 0.2748922 -0.4805635 -1.80242191     X
# 2 0.0060852 -1.2972077  0.64262069     Y
# 3 0.1994655 -0.4628783  0.07670911     X
# 4 0.5416900  0.3853958  0.50193895     Y
# 5 0.3118773  0.9488503 -0.55855749     X
# 6 2.0924626  0.3027878 -0.03000122     Y
</pre>
<p>If you were interested in the distribution of variable A by group, you might generate a boxplot like so:</p>
<table>
<tr>
<td valign="top">
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
png(&quot;A.png&quot;, width = 800, height = 600)
print(ggplot(df1) + geom_boxplot(aes(group, A, fill = group)) + theme_bw())
dev.off()
</pre>
</td>
<td valign="top">
<div id="attachment_3335" class="wp-caption alignright" style="width: 310px"><a href="http://nsaunders.files.wordpress.com/2013/02/a.png"><img src="http://nsaunders.files.wordpress.com/2013/02/a.png?w=300&#038;h=225" alt="Boxplot of A by group" width="300" height="225" class="size-medium wp-image-3335" /></a><p class="wp-caption-text">Boxplot of A by group</p></div>
</td>
</tr>
</table>
<p>Here, the arguments to <em>aes()</em> are expressions (group, A) which <em>ggplot</em> interprets as column names from the data frame.</p>
<p>What if you wanted to generate plots for each of the variables A, B and C using a loop? You might start like this:</p>
<pre class="brush: r; title: ; notranslate">
for(i in names(df1)[1:3])
# oh wait, these are characters not expressions
# [1] &quot;A&quot; &quot;B&quot; &quot;C&quot;
</pre>
<p>You see the problem. How do we pass the column names which are characters, not expressions, to <em>aes()?</em></p>
<p>The answer: use <em>aes_string()</em> instead.</p>
<pre class="brush: plain; title: ; notranslate">
Description:
     Aesthetic mappings describe how variables in the data are mapped
     to visual properties (aesthetics) of geoms.  Compared to aes this
     function operates on strings rather than expressions.
</pre>
<p>And so:</p>
<pre class="brush: r; title: ; notranslate">
for(i in names(df1)[1:3]) {
  png(paste(i, &quot;png&quot;, sep = &quot;.&quot;), width = 800, height = 600)
  df2 &lt;- df1[, c(i, &quot;group&quot;)]
  print(ggplot(df2) + geom_boxplot(aes_string(x = &quot;group&quot;, y = i, fill = &quot;group&quot;)) + theme_bw())
  dev.off()
}
</pre>
<p>It&#8217;s a little ugly as it stands (better to write a function using one of the <em>apply</em> family). However, the key point is: you can pass data frame column names as expressions to <em>aes()</em> or as characters to <em>aes_string()</em>.</p>
<p>With thanks to Hadley&#8217;s contribution to <a href="http://tolstoy.newcastle.edu.au/R/e3/help/07/12/6372.html" target="_blank">this mailing list thread</a>.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/programming/'>programming</a>, <a href='http://nsaunders.wordpress.com/category/statistics/r/'>R</a>, <a href='http://nsaunders.wordpress.com/category/research-diary/'>research diary</a>, <a href='http://nsaunders.wordpress.com/category/statistics/'>statistics</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/aes/'>aes</a>, <a href='http://nsaunders.wordpress.com/tag/aes_string/'>aes_string</a>, <a href='http://nsaunders.wordpress.com/tag/ggplot2/'>ggplot2</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/02/26/rggplot2-tip-aes_string/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>

		<media:content url="http://nsaunders.files.wordpress.com/2013/02/a.png?w=300" medium="image">
			<media:title type="html">Boxplot of A by group</media:title>
		</media:content>
	</item>
		<item>
		<title>Basic R: rows that contain the maximum value of a variable</title>
		<link>http://nsaunders.wordpress.com/2013/02/13/basic-r-rows-that-contain-the-maximum-value-of-a-variable/</link>
		<comments>http://nsaunders.wordpress.com/2013/02/13/basic-r-rows-that-contain-the-maximum-value-of-a-variable/#comments</comments>
		<pubDate>Wed, 13 Feb 2013 04:08:17 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[research diary]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3323</guid>
		<description><![CDATA[File under &#8220;I keep forgetting how to do this basic, frequently-required task, so I&#8217;m writing it down here.&#8221; Let&#8217;s create a data frame which contains five variables, vars, named A &#8211; E, each of which appears twice, along with some measurements: Now, let&#8217;s say we want only the rows that contain the maximum values of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>File under &#8220;I keep forgetting how to do this basic, frequently-required task, so I&#8217;m writing it down here.&#8221;</p>
<p>Let&#8217;s create a data frame which contains five variables, <em>vars</em>, named A &#8211; E, each of which appears twice, along with some measurements:</p>
<pre class="brush: r; title: ; notranslate">
df.orig &lt;- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))
df.orig
#    vars obs1 obs2
# 1     A    1   11
# 2     B    2   12
# 3     C    3   13
# 4     D    4   14
# 5     E    5   15
# 6     A    6   16
# 7     B    7   17
# 8     C    8   18
# 9     D    9   19
# 10    E   10   20
</pre>
<p>Now, let&#8217;s say we want only the rows that contain the maximum value of <em>obs1</em> for each of A &#8211; E. In bioinformatics, for example, we might be interested in selecting the microarray probeset with the highest sample variance from multiple probesets per gene. The answer is obvious in this trivial example (rows 6 &#8211; 10), but one procedure looks like this:<br />
<span id="more-3323"></span></p>
<pre class="brush: r; title: ; notranslate">
# use aggregate to create new data frame with the maxima
df.agg &lt;- aggregate(obs1 ~ vars, df.orig, max)
# then simply merge with the original
df.max &lt;- merge(df.agg, df.orig)
df.max
#   vars obs1 obs2
# 1    A    6   16
# 2    B    7   17
# 3    C    8   18
# 4    D    9   19
# 5    E   10   20
</pre>
<p>This also works with <em>min()</em> and, I guess, with any function that returns a single value per variable, provided that value is present in the original data frame.</p>
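<p>To make that concrete, here is a minimal sketch (not taken from the mailing list thread) that rebuilds the example data frame so that it runs on its own; <em>df.min</em> and <em>df.med</em> are names made up for this illustration. With <em>min()</em> the merge finds rows 1 &#8211; 5. With <em>median()</em> it finds nothing, because the median of two values (3.5 for A, say) is not itself a value of <em>obs1</em>:</p>
<pre class="brush: r; title: ; notranslate">
# recreate the example data so that this block runs on its own
df.orig &lt;- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))
# same aggregate-then-merge pattern, with min instead of max
df.min &lt;- merge(aggregate(obs1 ~ vars, df.orig, min), df.orig)
df.min
#   vars obs1 obs2
# 1    A    1   11
# 2    B    2   12
# 3    C    3   13
# 4    D    4   14
# 5    E    5   15
# the medians (3.5, 4.5, ...) match no value of obs1, so the merge is empty
df.med &lt;- merge(aggregate(obs1 ~ vars, df.orig, median), df.orig)
nrow(df.med)
# [1] 0
</pre>
<p>One caveat: if the maximum (or minimum) occurs in more than one row for a variable, the merge returns all of those rows.</p>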
<p>With thanks to this <a href="http://grokbase.com/t/r/r-help/126teytan0/r-selecting-rows-by-maximum-value-of-one-variables-in-dataframe-nested-by-another-variable" target="_blank">mailing list thread</a>.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/programming/'>programming</a>, <a href='http://nsaunders.wordpress.com/category/statistics/r/'>R</a>, <a href='http://nsaunders.wordpress.com/category/research-diary/'>research diary</a>, <a href='http://nsaunders.wordpress.com/category/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/02/13/basic-r-rows-that-contain-the-maximum-value-of-a-variable/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>Genes x Samples: please explain</title>
		<link>http://nsaunders.wordpress.com/2013/02/12/genes-x-samples-please-explain/</link>
		<comments>http://nsaunders.wordpress.com/2013/02/12/genes-x-samples-please-explain/#comments</comments>
		<pubDate>Tue, 12 Feb 2013 04:11:44 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[research diary]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3289</guid>
		<description><![CDATA[One of my bioinformatics pet peeves involves statements like this one, from the CNAmet user guide: Inputs to CNAmet are three m x n matrices, where m is the number of genes and n the number samples What we&#8217;re looking at here is the hot, but poorly-defined topic of data integration, in which biological measurements [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One of my bioinformatics <a href="http://en.wikipedia.org/wiki/Pet_peeve" target="_blank">pet peeves</a> involves statements like this one, from the <a href="http://csbi.ltdk.helsinki.fi/CNAmet/index.html" target="_blank">CNAmet</a> user guide:</p>
<blockquote><p>
Inputs to CNAmet are three m x n matrices, where m is the number of genes and n the number samples
</p></blockquote>
<p>What we&#8217;re looking at here is the hot, but poorly-defined topic of <em>data integration</em>, in which biological measurements from two or more different platforms are somehow combined in a way that provides more information than each platform separately. Read any paper on this topic, download the software and you&#8217;ll find example datasets containing two or more matched matrices, with rows where measurements have been summarized to a &#8220;gene&#8221;. What you won&#8217;t find, typically, is a detailed explanation of the summarization procedure that you could implement yourself.</p>
<p><span id="more-3289"></span><br />
To their credit, the authors of CNAmet are quite clear that the procedure used to generate these matrices is not their problem:</p>
<blockquote><p>
Since the three microarray platforms contain non-overlapping probes, the m dimension of the input matrices must match. This is because the problem of mapping measurements (probe to probe mapping) between different array types is not dealt with by CNAmet.
</p></blockquote>
<p>Two problems.</p>
<p>First, let&#8217;s face it, the very concept of an object called a &#8220;gene&#8221; is flawed; what we have in reality are fuzzy locations of transcriptional activity.</p>
<p>Second, some measurements summarize more readily than others. Exon expression arrays, for example, are frequently summarized to &#8220;gene level&#8221; by taking the median measurement of probesets in a transcript cluster. For copy number arrays, we might typically segment the measurements over each chromosome, then assign a number to a &#8220;gene&#8221; by determining overlap between gene and segment. However, something like a methylation array is more difficult; probesets map to different transcript-associated features (islands, shores, shelves) &#8211; which do we use?</p>
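<p>As a rough illustration of the first of those summaries, here is a toy sketch of median summarization in R. The data frame <em>probesets</em>, its values and the gene labels are all invented for this example, and a real exon array pipeline involves considerably more than this:</p>
<pre class="brush: r; title: ; notranslate">
# toy probeset-level data: 6 probesets, 3 samples (values invented for illustration)
probesets &lt;- data.frame(
  gene = c("g1", "g1", "g1", "g2", "g2", "g3"),
  s1   = c(1, 2, 9, 4, 6, 5),
  s2   = c(2, 3, 4, 5, 7, 5),
  s3   = c(3, 1, 2, 8, 6, 5)
)
# median of each sample column within each gene gives a genes x samples table
genes &lt;- aggregate(. ~ gene, probesets, median)
genes
#   gene s1 s2 s3
# 1   g1  2  3  2
# 2   g2  5  6  7
# 3   g3  5  5  5
</pre>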
<p>Our group recently looked at several publications which tried to integrate measurements of methylation and gene expression. We found at least half a dozen ways of generating the &#8220;genes x samples&#8221; matrices, from <a href="http://genome.cshlp.org/content/22/7/1197.long" target="_blank">selecting one probe per gene</a> using particular criteria (<em>e.g.</em> highest variance) to <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2903569/" target="_blank">complex clustering procedures</a> based on chromosome coordinates. In one <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3528691/" target="_blank">horror show of a study</a>, the authors decided that it was fine to combine methylation data from their study with completely-unrelated publicly-available expression data. Why the reviewers and editors agreed is anyone&#8217;s guess.</p>
<p>My second law of bioinformatics, then:</p>
<blockquote><p>
On no account must the data pre-processing steps required to summarize multi-platform measurements to gene level be revealed
</p></blockquote>
<p>Seriously, if you have a great idea about the best way to combine, for example, measurements from the Affymetrix Human Exon 1.0 ST and the Illumina Infinium HumanMethylation450 beadchip &#8211; go for it in the comments.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/bioinformatics/'>bioinformatics</a>, <a href='http://nsaunders.wordpress.com/category/research-diary/'>research diary</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/02/12/genes-x-samples-please-explain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>Lots of &#8220;open goodness&#8221; in the AU/NZ region</title>
		<link>http://nsaunders.wordpress.com/2013/02/06/lots-of-open-goodness-in-the-aunz-region/</link>
		<comments>http://nsaunders.wordpress.com/2013/02/06/lots-of-open-goodness-in-the-aunz-region/#comments</comments>
		<pubDate>Wed, 06 Feb 2013 00:38:20 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[australia]]></category>
		<category><![CDATA[australian news]]></category>
		<category><![CDATA[open access]]></category>
		<category><![CDATA[open science]]></category>
		<category><![CDATA[new zealand]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[tim berners-lee]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3305</guid>
		<description><![CDATA[January/February are exciting months for open [data&#124;research&#124;science&#124;access] proponents in our region &#8211; by which I mean Australia and New Zealand. First, we&#8217;ve enjoyed a speaking tour by Sir Tim Berners-Lee, during which he discussed the benefits of open data several times. I was able to attend two events in Sydney in person and a third, [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>January/February are exciting months for open [data|research|science|access] proponents in our region &#8211; by which I mean Australia and New Zealand.</p>
<p>First, we&#8217;ve enjoyed <a href="http://tbldownunder.org/" target="_blank">a speaking tour</a> by Sir Tim Berners-Lee, during which he discussed the benefits of open data several times. I was able to attend two events in Sydney in person and a third, <a href="http://linux.conf.au/" target="_blank">linux.conf.au</a>, by video stream. The events were the work of many people but in particular, <a href="https://twitter.com/piawaugh" target="_blank">Pia Waugh</a>. Go follow her on Twitter, now.</p>
<p>Next &#8211; I wish I had been able to get to this one &#8211; the <a href="https://sites.google.com/site/nzauopenresearch/" target="_blank">Open Research Conference</a> on February 6-7, University of Auckland. I&#8217;m enjoying the high-quality <a href="http://live.auckland.ac.nz/openresearch_oggb5.htm" target="_blank">live stream</a> right now. Flying the flag for Sydney are <a href="https://twitter.com/MatToddChem" target="_blank">Mat</a> and <a href="https://twitter.com/ceptional" target="_blank">Alex</a>.</p>
<p>Not strictly under the &#8220;open&#8221; umbrella but worth a mention anyway: software carpentry <a href="http://software-carpentry.org/bootcamps/2013-02-macquarie.html" target="_blank">is in town</a>, February 7-8, just up the road from me at Macquarie University. Looking forward to hearing some reports from that.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/australia/'>australia</a>, <a href='http://nsaunders.wordpress.com/category/australian-news/'>australian news</a>, <a href='http://nsaunders.wordpress.com/category/open-access/'>open access</a>, <a href='http://nsaunders.wordpress.com/category/open-science/'>open science</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/australia/'>australia</a>, <a href='http://nsaunders.wordpress.com/tag/new-zealand/'>new zealand</a>, <a href='http://nsaunders.wordpress.com/tag/open-data/'>open data</a>, <a href='http://nsaunders.wordpress.com/tag/tim-berners-lee/'>tim berners-lee</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/02/06/lots-of-open-goodness-in-the-aunz-region/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>It&#8217;s #overlyhonestmethods come to life!</title>
		<link>http://nsaunders.wordpress.com/2013/01/31/its-overlyhonestmethods-come-to-life/</link>
		<comments>http://nsaunders.wordpress.com/2013/01/31/its-overlyhonestmethods-come-to-life/#comments</comments>
		<pubDate>Wed, 30 Jan 2013 21:28:58 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[publications]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[reproducibility]]></category>
		<category><![CDATA[retraction]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3299</guid>
		<description><![CDATA[Retraction Watch reports a study of microarray data sharing. The article, published in Clinical Chemistry, is itself behind a paywall despite trumpeting the virtues of open data. So straight to the Open Access Irony Award group at CiteULike it goes. I was not surprised to learn that the rate of public deposition of data is [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><a href="https://retractionwatch.wordpress.com/2013/01/30/study-links-failure-to-share-data-with-poor-quality-research-and-leads-to-a-plos-one-retraction/" target="_blank">Retraction Watch reports</a> a study of microarray data sharing. The article, published in Clinical Chemistry, is itself behind a paywall despite trumpeting the virtues of open data. So straight to the <a href="http://www.citeulike.org/group/13803" target="_blank">Open Access Irony Award group</a> at CiteULike it goes.</p>
<p>I was not surprised to learn that the rate of public deposition of data is low, nor that most deposited data ignores standards and much of it is of low quality. What did catch my eye, though, was a <a href="http://www.plosone.org/annotation/listThread.action;jsessionid=C9C431B577C95A51FA3637F4E622EC77?root=53705" target="_blank">retraction notice</a> for one of the articles from the study, in which the authors explain the reason for retraction.<br />
<span id="more-3299"></span><br />
Two phrases in particular stand out:</p>
<blockquote><p>
we discovered an error in the data fed into the software
</p></blockquote>
<blockquote><p>
This decision was based on the instructions from the software during the initial data feed process
</p></blockquote>
<p>The language used strongly suggests a process whereby data was blindly &#8220;fed&#8221; into software, with little or no understanding of either how the software worked or the statistical methods employed. To quote Bill in our Twitter discussion:</p>
<blockquote class='twitter-tweet'><p>@<a href="https://twitter.com/neilfws">neilfws</a> &quot;decision was based on the instructions from the software&quot; What the&#8230;? It&#039;s <a href="http://twitter.com/search?q=%23overlyhonestmethods" title="#overlyhonestmethods">#overlyhonestmethods</a> come to life!&mdash; <br />Bill Hooker (@sennoma) <a href='http://twitter.com/#!/sennoma/status/296714247545683968' data-datetime='2013-01-30T20:19:23+00:00'>January 30, 2013</a></p></blockquote>
<p>If you are in this situation, seek help. Talk to a friendly local statistician. Or if there isn&#8217;t one, do your research on the Web before publishing. At the very least, try to ensure that what you&#8217;re doing corresponds broadly with what most other people in the field would consider &#8220;best practice&#8221; &#8211; even if figuring out what that is, from the literature, is not always easy.</p>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/bioinformatics/'>bioinformatics</a>, <a href='http://nsaunders.wordpress.com/category/publications/'>publications</a>, <a href='http://nsaunders.wordpress.com/category/statistics/'>statistics</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/microarray/'>microarray</a>, <a href='http://nsaunders.wordpress.com/tag/reproducibility/'>reproducibility</a>, <a href='http://nsaunders.wordpress.com/tag/retraction/'>retraction</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/01/31/its-overlyhonestmethods-come-to-life/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
		<item>
		<title>The future of science publishing from 1996</title>
		<link>http://nsaunders.wordpress.com/2013/01/11/the-future-of-science-publishing-from-1996/</link>
		<comments>http://nsaunders.wordpress.com/2013/01/11/the-future-of-science-publishing-from-1996/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 22:36:27 +0000</pubDate>
		<dc:creator>nsaunders</dc:creator>
				<category><![CDATA[publications]]></category>
		<category><![CDATA[altmetrics]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[sydney brenner]]></category>
		<category><![CDATA[www]]></category>

		<guid isPermaLink="false">http://nsaunders.wordpress.com/?p=3284</guid>
		<description><![CDATA[Floating by in the Twitter stream, this from @leonidkruglyak. It leads to a light-hearted opinion(ated) piece by Sydney Brenner in Current Biology, 1996. In 1996, you may recall, the Web was just a few years old. Amusingly (sadly?), it seems that Brenner predicted many of the topics in science publishing that we&#8217;re still discussing in [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Floating by in the Twitter stream, <a href="https://twitter.com/leonidkruglyak/status/289478980153798656" target="_blank">this from @leonidkruglyak</a>. It leads to a light-hearted <a href="http://www.sciencedirect.com/science/article/pii/S0960982202005493" target="_blank">opinion(ated) piece</a> by Sydney Brenner in Current Biology, 1996.</p>
<p>In 1996, you may recall, the Web was just a few years old. Amusingly (sadly?), it seems that Brenner predicted many of the topics in science publishing that we&#8217;re still discussing in 2013. It&#8217;s just that he thought they would be implemented in no time at all.</p>
<p>For example, open refereeing: </p>
<blockquote><p>
It is incidents such as this that have led me to question whether the anonymity of referees needs to be guarded so closely
</p></blockquote>
<p>Self-publishing/archiving and post-publication peer review:</p>
<blockquote><p>
The electronic pre-print with open discussion (not refereeing) will soon become commonplace; in fact, labs could go into the publication business by themselves
</p></blockquote>
<p>Demise of the journal impact factor, publishing economics and altmetrics:</p>
<blockquote><p>
We will need something to substitute for the present ratings given to papers appearing in ‘superior, peer-reviewed publications’ (and commercial publishers will find ways of making people pay for this)
</p></blockquote>
<blockquote><p>
Perhaps we should have a readership index; it should not be beyond the wit of man to devise a way of recording whenever a paper is read, hard-copied or cited
</p></blockquote>
<p>As Ethan said:</p>
<blockquote class='twitter-tweet' lang='en'><p>@<a href="https://twitter.com/neilfws">neilfws</a> what&#039;s taking so long?!</p>&mdash; <br />Ethan Perlstein (@eperlste) <a href='http://twitter.com/#!/eperlste/status/289496813747183616' data-datetime='2013-01-10T22:19:53+00:00'>January 10, 2013</a></blockquote>
<br />Filed under: <a href='http://nsaunders.wordpress.com/category/publications/'>publications</a> Tagged: <a href='http://nsaunders.wordpress.com/tag/altmetrics/'>altmetrics</a>, <a href='http://nsaunders.wordpress.com/tag/history/'>history</a>, <a href='http://nsaunders.wordpress.com/tag/publishing/'>publishing</a>, <a href='http://nsaunders.wordpress.com/tag/sydney-brenner/'>sydney brenner</a>, <a href='http://nsaunders.wordpress.com/tag/www/'>www</a>]]></content:encoded>
			<wfw:commentRss>http://nsaunders.wordpress.com/2013/01/11/the-future-of-science-publishing-from-1996/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e41743fa8aee7f5c7d1cd7ebfa77da85?s=96&#38;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">nsaunders</media:title>
		</media:content>
	</item>
	</channel>
</rss>
