GSTAT(v1.0) is a collection of R functions for statistical analysis of genome-wide data,
by Qunyuan Zhang (qunyuan@wustl.edu), DSG.
Updated 05-02-2009

HOW TO USE (currently only available for GSC & DSG users)

TO USE THESE FUNCTIONS, INCLUDE THE LINE BELOW IN YOUR R SCRIPTS
-----------------------------------------
source("/gscmnt/sata180/info/medseq/biodb/shared/CancerGenomicTool/GDAC_statistical_packages/GSTAT/gstat.lib")
-----------------------------------------

######################
MUTATION ANALYSIS
######################

[FUNCTION]
mutation_freq_test()
To test whether the mutation frequencies of genes or protein domains are significant

[USAGE#1]
------------------------------------------------------------------------------------
mutation_freq_test(inf="input_data_file",outf="output_file",backgrd=0.000001)
------------------------------------------------------------------------------------

[DATA FORMAT]
Three columns, tab delimited
(gene name, total mutation number, total bp length)
Example input_data_file "/gscuser/qzhang/gstat/example_data/mutation_data.csv"

[EXAMPLE RUN]
mutation_freq_test(inf="/gscuser/qzhang/gstat/example_data/mutation_data.csv",
outf="my_output_file.csv",backgrd=0.000001)

[NOTE]
You may specify your own backgrd value (i.e., background mutation frequency)

[USAGE#2]
------------------------------------------------------------------------------------
mutation_freq_test(inf="input_data_file",outf="output_file",option="protein")
------------------------------------------------------------------------------------

[DATA FORMAT]
Five columns, tab delimited
(gene name,
total mutation number of a protein domain,
total mutation number of genes related to a protein domain,
total bp length of protein domain,
total bp length of genes related to a protein domain)
Example input_data_file "/gscuser/qzhang/gstat/example_data/fmb.wbroad.csv"

[EXAMPLE RUN]
mutation_freq_test(inf="/gscuser/qzhang/gstat/example_data/fmb.wbroad.csv",
outf="my_output_file.csv",option="protein")

[NOTE]
None

*********************************************************************************

[FUNCTION]
mutation_drive_test()
To test whether a gene's mutation affects the overall mutation rate of all other genes

[USAGE]
-------------------------------------------------------------------------------
mutation_drive_test(inf="input_data_file",outf="output_file_name")
-------------------------------------------------------------------------------

[DATA FORMAT]
A data matrix (or file), multiple columns, tab delimited
(sample_id, mutation numbers of gene 1, gene 2, ...., gene N)
Example input_data_file "/gscuser/qzhang/gstat/example_data/WU_broad_4mutations.matrix.csv"

[EXAMPLE RUN]
mutation_drive_test(inf="/gscuser/qzhang/gstat/example_data/WU_broad_4mutations.matrix.csv",
outf="my_output_file_name.csv")

[NOTE]
You can use the option alt="less"; the default is alt="greater"

#####################
PATHWAY ANALYSIS
#####################

[FUNCTION]
mutation_pathway_test()
To test whether a pathway (or a group of genes) has a significantly higher mutation rate than others

[USAGE]
----------------------------------------------------------------------------------------
mutation_pathway_test(inf_mut="mutation_file",inf_path="pathway_file",outf="output_file")
----------------------------------------------------------------------------------------

[DATA FORMAT]
Example mutation_file "/gscuser/qzhang/gstat/example_data/mutation_data.csv"
Format: three columns, tab delimited
(gene name, total mutation number, total bp length)

Example pathway_file "/gscuser/qzhang/gstat/example_data/gene_path_table_071227.txt"
Format: three (or more) columns, tab delimited
(gene ID, gene name, pathway or gene group name 1, pathway or gene group name 2, pathway or gene group name 3 ...)
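
The two input formats described above can be illustrated with a small, made-up data set. This is only a sketch: the gene names, pathway names, counts, column names, and file names below are hypothetical, and it assumes that a header row is acceptable, which this document does not state.

```r
# Hypothetical toy data; none of these values come from the GSTAT example files.
mut <- data.frame(
  gene      = c("GENE_A", "GENE_B", "GENE_C"),
  mutations = c(5, 0, 2),            # total mutation number per gene
  bp_length = c(12000, 8000, 15000)  # total bp length per gene
)

path <- data.frame(
  gene_id  = c(101, 102, 103),
  gene     = c("GENE_A", "GENE_B", "GENE_C"),
  pathway1 = c("PATH_X", "PATH_X", "PATH_Y"),
  pathway2 = c("PATH_Z", "", "PATH_Z")  # genes may belong to more than one group
)

# Write both tables as tab-delimited text (header row is an assumption).
mut_file  <- file.path(tempdir(), "toy_mutation_data.csv")
path_file <- file.path(tempdir(), "toy_gene_path_table.txt")
write.table(mut,  mut_file,  sep = "\t", quote = FALSE, row.names = FALSE)
write.table(path, path_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the files back to confirm the column layout.
mut_back  <- read.delim(mut_file,  stringsAsFactors = FALSE)
path_back <- read.delim(path_file, stringsAsFactors = FALSE)
```

Files laid out this way could then be supplied as the inf_mut and inf_path arguments.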

[EXAMPLE RUN]
mutation_pathway_test(inf_mut="/gscuser/qzhang/gstat/example_data/mutation_data.csv",
inf_path="/gscuser/qzhang/gstat/example_data/gene_path_table_071227.txt",
outf="example_output.csv")

********************************************************************************

[FUNCTION]
pathway_test()
To perform a pathway enrichment test

[USAGE]
----------------------------------------------------------------------------------------------------------------------
pathway_test(x,gcolname="gene",lev=c("lev1","lev2","lev3"),yn=c("yn1","yn2","yn3"),cutoff=0.5,outf=NULL,sep="")
----------------------------------------------------------------------------------------------------------------------

####################################
DNA/GENE COPY NUMBER ANALYSIS
####################################

cn.log2ratio(sampleinfo,id=NULL,datadir=NULL,outdir="_log2ratio",sep="",des="tumor/normal pair")

cn.swt(x,wsize=30,wby=1,sd.range=5,chr="all",
chr.col="CHR",pos.col="POS",cn.col="cn",
out.file=NULL,out.rdata=NULL,sep="",draw=F)

cn.gene.test.all(x,out.file=NULL,out.rdata=NULL,cn.lim=4,sep="")

cn.smooth.indv(datadir=NULL,id=NULL,outdir="_smooth.indv")

####################################
GENERALIZED CORRELATION TEST
####################################

cor2test(y,x=NULL,method="cor",cutoff=1,sep="",outf=NULL)

#########################
GENOME VISUALIZATION
#########################

plotgenome(tt, y="p",cutoff=NULL,cutline=2,img=NULL,yscale=NULL,draw=TRUE,
chrom=NULL,mbp=NULL,chr.col="chromosome",pos.col="position",tombp=T)

###########################
TWO SAMPLE TEST
###########################

twogrouptest(x,id1,id2,permuN=0)
x: a matrix; rows are samples, columns are variables
id1 & id2: row IDs for samples from group 1 & 2
permuN: number of permutations; 0 means a t test is used
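
To make the argument description above concrete, here is a minimal base-R sketch of a two-group comparison with the same argument layout. It is not the GSTAT implementation of twogrouptest(): the function name two_group_sketch, the use of Welch's t test when permuN is 0, and the absolute difference in means as the permutation statistic are all assumptions made for illustration.

```r
# Illustrative only: NOT the GSTAT code behind twogrouptest().
# x: matrix (rows = samples, columns = variables); id1/id2: row IDs of the two groups.
two_group_sketch <- function(x, id1, id2, permuN = 0) {
  g1 <- x[id1, , drop = FALSE]
  g2 <- x[id2, , drop = FALSE]
  n1 <- nrow(g1)
  pooled <- rbind(g1, g2)
  p <- sapply(seq_len(ncol(x)), function(j) {
    if (permuN == 0) {
      # permuN = 0: fall back to a t test (Welch's version, R's default)
      return(t.test(g1[, j], g2[, j])$p.value)
    }
    v <- pooled[, j]
    obs <- abs(mean(g1[, j]) - mean(g2[, j]))
    # Shuffle group labels permuN times and recompute the statistic
    perm <- replicate(permuN, {
      idx <- sample(length(v), n1)
      abs(mean(v[idx]) - mean(v[-idx]))
    })
    # Add-one correction keeps the permutation p-value above zero
    (sum(perm >= obs) + 1) / (permuN + 1)
  })
  setNames(p, colnames(x))
}

# Simulated data: 20 samples, 3 variables, with group 1 shifted upward in v1.
set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(paste0("s", 1:20), c("v1", "v2", "v3")))
x[1:10, "v1"] <- x[1:10, "v1"] + 3

p_t    <- two_group_sketch(x, id1 = paste0("s", 1:10), id2 = paste0("s", 11:20))
p_perm <- two_group_sketch(x, id1 = paste0("s", 1:10), id2 = paste0("s", 11:20),
                           permuN = 999)
```

With permuN = 999 the smallest attainable permutation p-value in this sketch is 1/1000, so larger permuN values give finer resolution at the cost of run time.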