Fixing aberrant files using R and the shell: a case study

Once in a while, you embark on what looks like a simple computational procedure only to encounter frustration very early on. “I can’t even read my file into R!” you cry.

Step back, take a deep breath and take note of what the software is trying to tell you. Most of the time, you’ve just missed something very straightforward. Here’s an example.

Update: this post is not about how best to perform the task; it’s about how to cope with frustration. Please stop sending me your solutions :-)

Dropbox tip continued: convert a file tree to HTML

A couple of posts ago, I outlined a small bash script to generate an index.html file, containing links to other files in a directory. This was for generating links to files in a Dropbox public directory.

I had completely forgotten about the very useful UNIX/Linux command named tree. If not installed, it should be in your distribution repository (e.g. sudo apt-get install tree for Ubuntu/Debian). Then simply:

cd Dropbox/Public/mydirectory
tree -H . > index.html

Next, navigate to index.html at the Dropbox website and you should see something like the tree on the right. It’s a little ugly and obviously not as convenient as something like GitHub, but it can be a good quick-and-dirty fix if you need to share a hierarchy of directories and files.
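
If you want the same thing on a machine where tree may not be installed, a small wrapper can fall back to find. This is only a rough sketch: the function name tree_index is made up for illustration, and the fallback produces a flat list of files rather than a nested tree.

```shell
# tree_index DIR: write DIR/index.html listing everything below DIR.
# (tree_index is a hypothetical helper name, not a standard command.)
tree_index() {
    (
        cd "$1" || exit 1
        if command -v tree >/dev/null 2>&1; then
            # -H . switches tree to HTML output, with "." as the base for links.
            tree -H . > index.html
        else
            # Fallback: a flat list of files instead of a nested tree.
            {
                echo "<ul>"
                find . -type f ! -name index.html | sort |
                    while IFS= read -r f; do
                        echo "<li><a href='$f'>$f</a></li>"
                    done
                echo "</ul>"
            } > index.html
        fi
    )
}
```

Usage would be something like tree_index ~/Dropbox/Public/mydirectory. The subshell keeps the cd from changing your current directory.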

A quick Bash tip: add an index.html file to a Dropbox public folder

You know that Dropbox is terrific, of course. No? Go and check it out now.

One issue: files in your Public folder have a public URL that you can send to other people. Unfortunately, directories do not. So how do you share a public directory full of files?

Answer: create an index.html file and share that. Let’s say that your files end in “.txt” and reside in ~/Dropbox/Public/entrez. Do this:

cd ~/Dropbox/Public/entrez
echo "<ol>" > index.html
for i in *.txt; do echo "<li><a href='$i'>$i</a></li>" >> index.html; done
echo "</ol>" >> index.html

Now you can share the link to the index.html, which when clicked will display a list of links to all the other files in the directory.
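
If you do this for more than one directory, the commands above can be wrapped in a function. A minimal sketch, with some assumptions: the name make_index and its two arguments are invented here, and it assumes filenames contain nothing that needs HTML escaping (no quotes, ampersands or angle brackets).

```shell
# make_index DIR EXT: write DIR/index.html linking to every *.EXT file in DIR.
# (make_index is a hypothetical helper name, not part of Dropbox or bash.)
make_index() {
    local dir ext f
    dir=$1
    ext=$2
    (
        cd "$dir" || exit 1
        {
            echo "<ol>"
            # A glob, unlike parsing ls, keeps filenames with spaces intact.
            for f in *."$ext"; do
                [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
                echo "<li><a href='$f'>$f</a></li>"
            done
            echo "</ol>"
        } > index.html
    )
}
```

For the example directory, that would be make_index ~/Dropbox/Public/entrez txt. Writing the whole list through a single redirect also means index.html is replaced in one go rather than appended to line by line.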

Text to fasta and other delights of the shell

One thing I’ve learned in my current job is that some familiarity with Linux tools for processing text files (awk, sed, grep, head/tail, cut/paste and so on) often provides a speedier solution than writing a script in (insert scripting language of choice here). I know this stuff is trivial to shell gurus, but I still get a little buzz out of it. Here are a couple of real-life examples.

Linux tip: sort a tab-delimited file

The Linux command “sort” is both powerful and confusing. The manpage tells us that the “-t” switch can be used to set the field delimiter.

If you’ve tried various combinations of “-t” and “\t” to tell sort that your file is tab-delimited without success, try this (bash shell):

TAB=$(printf '\t')
sort -t"$TAB" myfile

with “-kN” as appropriate, where N is the column on which to sort.
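
To see it working, here is a small sketch that builds a throwaway tab-delimited file and sorts it numerically on its second column. The file name sample.tsv and its contents are made up for illustration; printf is used to create the tab simply because it behaves the same across shells.

```shell
# Build a three-line, tab-delimited sample file (hypothetical data).
printf 'geneA\t30\ngeneB\t4\ngeneC\t100\n' > sample.tsv

# Store a literal tab character in a variable.
TAB=$(printf '\t')

# -k2,2 restricts the sort key to column 2; the trailing n compares numerically.
sort -t"$TAB" -k2,2n sample.tsv
```

This should list geneB, then geneA, then geneC. Drop the n and the second column is compared as text, so 100 comes before 30, which comes before 4.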

Long-winded discussion with much incorrect syntax in this forum; or get straight to the point in this mail archive.