plink自分用メモ

ちょっと使って見ているけど、機能多すぎ。なので自分用メモ。本家のドキュメントを引用しつつ自分に優しくいじっているので要注意。
とりあえず adaptive permutation(ダメっぽいものはさっさと見切りをつけてしまう)が速くて良いなぁ。
ツールとしての概要が知りたい方は、id:ryamada22氏のコンテンツがあるのでそちらを参照して下さい。


http://pngu.mgh.harvard.edu/~purcell/plink/

http://pngu.mgh.harvard.edu/~purcell/plink/reference.shtml#options

  • PEDファイルフォーマト

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype (-9 missing; 0 missing; 1 unaffected; 2 affected)

#で始めるとコメント扱いできるらしい。

  • MAPファイルフォーマット

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#map

chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Genetic distance (morgans)
Base-pair position (bp units)

コマンドライン引数に--map3をつけるとMAPファイルはカラム3つとして認識されることになる。

plink --file mydata --map3

この場合は、以下のカラムと受け取られるので注意。

In this case, the three columns are expected to be
chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Base-pair position (bp units)

2008/11/01の時点で X/Y/MTはそれぞれ23/24/25と結果には出力されるようになったようだ (GWS6に対応のためかな)。
PED ファイルから適当に生成するスクリプトでとりあえずラン出来るようにしちゃった。

  • (使ってないから分からない)covariate ファイル

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#covar
logistic とかも出来るようだけど、covariate はどう与えるのか?試したくなったら調べる。

  • (使ってないけど便利そう)Remove a subset of individuals

http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#remove

plink --file data --remove mylist.txt
  • (便利)HWE のまとめ算出

http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#hardy

plink --file data --hardy
  • (便利)MAF とかのまとめ算出

http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq

plink --file data --freq
  • (便利)サンプル当たりの call rate でフィルター

http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#miss2

plink --file mydata --mind 0.1

ちなみにデフォルトは「Less than 0.10 Missing rate per individual --mind」だそうな。

ちなみにデフォルトは「Less than 0.10 Missing rate per SNP --geno」だそうな。

  • (便利)MAF でフィルター

http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#maf

plink --file mydata --maf 0.05

ちなみにデフォルトは「Greater than 0.01 Minor allele frequency --maf」だそうな。

  • (便利)SNP 当たりの call rate でフィルター

http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#miss1

plink --file mydata --geno 1
  • (便利)HWE でフィルター

http://pngu.mgh.harvard.edu/~purcell/plink/thresh.shtml#hwd

plink --file mydata --hwe 0.001
  • (便利)特定の SNP をフィルターする

plink --file plink_test_input03 --assoc --noweb --map3 --exclude chr.X.list.txt > removed.hoge

chr.X.list.txt には除きたい SNP の identifier がリストされている。

http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#cc

plink --file mydata --assoc

which generates a file

plink.assoc

which contains the fields:

CHR Chromosome
SNP SNP ID
BP Physical position (base-pair)
A1 Minor allele name (based on whole sample)
F_A Frequency of this allele in cases
F_U Frequency of this allele in controls
A2 Major allele name
CHISQ Basic allelic test chi-square (1df)
P Asymptotic p-value for this test
OR Estimated odds ratio (for A1)

  • Alternate / full model association tests

Cochran-Armitage trend test
Genotypic (2 df) test
Dominant gene action (1df) test
Recessive gene action (1df) test

http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#model

plink --file mydata --model
  • Linear and logistic models

http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#glm

plink --bfile mydaya --logistic

depending on the phenotype/command used. The basic format is:

CHR Chromosome
SNP SNP identifier
TEST Code for the test (see below)
NMISS Number of non-missing individuals included in analysis
BETA Regression coefficient (--linear) or odds ratio (--logistic)
STAT Coefficient t-statistic
P Asymptotic p-value for t-statistic

The TEST column is by default ADD meaning the additive effects of allele dosage. Adding the option

--genotypic

will generate file which will have two extra tests per SNP, corresponding to two extra rows: DOMDEV and GENO_2DF which represent a separate test of the dominance component or a 2 df joint test of both additive and dominance (i.e. corresponding the the general, genotypic model in the --model command). Unlike the dominance model is the --model, DOMDEV refers to a variable coded 0,1,0 for the three genotypes AA,Aa,aa, i.e. representing the dominance deviation from additivity, rather specifying that a particular allele is dominant or recessive. That is, the DOMDEV term is fitted jointly with the ADD term in a single model.

NOTE! The coding PLINK uses with the 2 df --genotypic model involves two variables representing an additive effect and a dominance deviation;

A D
AA 0 0
AB 1 1
BB 2 0

Although the 2df test will be identical, you would not expect to see similar p-values, etc for the two individual terms if instead you used a different version of "genotypic" coding, e.g. in another analysis package, such as using dummy variables to represent genotypes:

G1 G2
AA 0 0
AB 1 0
BB 0 1

That is, although fundamentally the same, in terms of the 2df test, the interpretation of the two individual terms is different in these two cases. To achieve this coding in PLINK (v1.02 onwards), add the --hethom flag as well as --genotypic.

In a related note, you would not always expect the ADD p-value to be the same when entering in the dominance term as it is without it; if in doubt, you are advised to stick to just interpreting the 2 df test if using the --genotypic option.

To specify a model assuming full dominance (or recessive) for the minor allele (i.e. rather than the 2 df model mentioned above), you can specify with either

--dominant

or

--recessive

後ろの方に --covar の例がある

  • Adjustment for multiple testing: Bonferroni, Sidak, FDR, etc

http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#adjust

plink --file mydata --assoc --adjust
  • Permutation いろいろ

http://pngu.mgh.harvard.edu/~purcell/plink/perm.shtml

  • SNP annotation を取ってくる

http://pngu.mgh.harvard.edu/~purcell/plink/psnp.shtml

plink --lookup rs6622
    • SNP simulation routine

http://pngu.mgh.harvard.edu/~purcell/plink/simulate.shtml

http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml

  • テストデータの作成
plink --simulate wgas.sim --recode --out sim1 --simulate-ncases 200 --simulate-ncontrols 250 --noweb
plink --file sim1.recode --assoc --noweb --adjust

http://pngu.mgh.harvard.edu/~purcell/plink/simulate.shtml