TMTOWTDI
Hong Zheng / 2017-08-30
There’s more than one way to do it. (TMTOWTDI)
Although this is a perl motto, it applies to R, Python, bash, whatever language you name it.
The commands I put together here are very basic data manipulating techniques. I use Linux/Bash and R a lot, but often I forget the specific syntax and have to look it up somewhere. Here is the collection of them:
Calculate the sum of one column 计算一列之和
- R
In R it is pretty simple. Thesum
function does it. - Bash
Suppose the column to be calculated in the 1st column
`awk ‘BEGIN{sum=0}{sum+=$1}END{print sum}’
Calculate the sum of every column 计算每列之和
- R
In R it is still pretty simple. TherowSums
function does it. - Bash
awk ‘{for(n=1;n<=NF;n++)t[n]+=$n}END{for(n=1;n<=NF;n++)printf t[n]” “}’
Transpose (switch rows and columns) 行列转置
- R
Simple. Uset
- Bash
awk -F "\t" '{for(i=1;i<=NF;i++) a[i,NR]=$i}END{for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++) printf a[i,j] "\t";print ""}}'
Subtraction of adjecent rows 邻行相减
- Bash
awk 'NR==1{for(i=1;i<=NF;i++)hash[i]=$i;}NR>1{for(i=1;i<=NF;i++){printf("%d\t",$i-hash[i]);hash[i]=$i;}printf("\n");}'
Order according to row 行排序
- Bash
awk ' {split( $0, a, " " ); asort( a ); for( i = 1; i <= length(a); i++ ) printf( "%s ", a[i] ); printf( "\n" ); }' input >output
Output according to column names 根据列名输出
- R
Very, very simple in R. - Bash
Suppose we want to output columns with names “#CHROM”, “POS”, “REF”, and “ALT”.
awk 'BEGIN{FS="\t";OFS="\t"}NR==1 {for(i=1;i<=NF;i++){c[$i]=i}} NR>1{print$c["#CHROM"],$c["POS"],$c["REF"],$c["ALT"]}' file
Concatenate every two lines 每两行合并
- Bash
sed '$!N;s/\n/\t/g' input
or
sed 'N;s/\n//g' input
Concatenate every three lines 每三行合并
- Bash
sed 'N;N;s/\n/\t/g'
Upper and lower case transform 大小写转换
- Bash
sed 's/[a-z]/\u&/g'
# all lower to upper
sed 's/[A-Z]/\l&/g'
# all upper to lower
sed 's/\b[a-z]/\u&/g'
# First letter of word, lower to upper
Concatenate every line in a file
- Perl
perl -p -e 's/\n/\t/' input
Other BASH tips
Use user-specified seperators in sort
sort -t $'\t' -k1