Cohort Integration


The Cohort Integration method takes as input somatic variants annotated with Evolutionary Action (EA) scores. For each gene, the method determines whether the distribution of EA scores for the somatic single nucleotide variants (SNVs) of that gene differs from the distribution of EA scores for all somatic SNVs in the cohort. The EA distribution of a gene's somatic SNVs may indicate loss of function (sLOF) when skewed toward large EA values, or gain of function (sGOF) when skewed toward intermediate EA values. The method also accounts for the frequency of other types of functionally impactful somatic variants within each gene (i.e., stop loss, in-frame indels, frameshift indels).
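The comparison described above can be illustrated with a small sketch. This README does not state which statistical test the script applies, so the two-sample Kolmogorov-Smirnov (KS) statistic below is only an assumed stand-in for "do these two distributions differ?". The function name ks_statistic and the EA scores are hypothetical and are not taken from the script or the example input file.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return the two-sample KS statistic D for two lists of scores.

    D is the largest absolute gap between the two empirical CDFs:
    0.0 means the samples look identical, 1.0 means they do not overlap.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / float(len(a))
        cdf_b = bisect_right(b, x) / float(len(b))
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical EA scores, made up for illustration only.
gene_ea = [82.1, 91.5, 77.3, 95.0, 88.4]  # SNVs in one gene
background_ea = [12.0, 35.5, 48.2, 61.0, 22.7,
                 82.1, 55.3, 9.8, 70.4, 91.5]  # all somatic SNVs

d = ks_statistic(gene_ea, background_ea)
print("KS statistic: {:.2f}".format(d))
```

A large statistic for a gene whose scores sit at the high end would be consistent with the sLOF pattern described above. A real analysis would also need a p-value and multiple-testing correction across genes; scipy.stats.ks_2samp is one existing function that returns both the statistic and a p-value.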

Installation:

To install the Cohort Integration software:

  1. Install the required Python version (Python 2.7) and the packages listed in cohort_integration.yml.

  2. Download the CohortInteg_SupplementalMaterial.py script.

  3. Download the following supporting files into the directory that contains the CohortInteg_SupplementalMaterial.py script.

Example:

  An example input file can be downloaded here: example_input.ANNOVAR_EA

  This example input file is a subset of variants from TCGA BLCA samples that have been annotated with ANNOVAR and include EA scores, RefSeq NMIDs, variant classifications, and the corresponding amino acid substitutions.

Run:

Create the following directories:

-input_directory: contains the input file (example_input.ANNOVAR_EA)

-output_directory: user-defined location where the output files will be saved

Terminal command:

  python CohortInteg_SupplementalMaterial.py input_directory output_directory
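
The directory setup and command above can also be scripted. The sketch below is a convenience wrapper, not part of the Cohort Integration software: the helper name prepare_run is hypothetical, and it assumes the two directories are simply named input_directory and output_directory under a base folder of your choice. It only builds the command and does not run it.

```python
import os
import shutil
import tempfile


def prepare_run(base_dir, input_file=None):
    """Create the input and output directories and return the command.

    base_dir   -- folder under which both directories are created
    input_file -- optional path to an input file (for example
                  example_input.ANNOVAR_EA) to copy into input_directory
    """
    input_dir = os.path.join(base_dir, "input_directory")
    output_dir = os.path.join(base_dir, "output_directory")
    for path in (input_dir, output_dir):
        if not os.path.isdir(path):
            os.makedirs(path)
    if input_file is not None:
        shutil.copy(input_file, input_dir)
    # Same argument order as the terminal command shown above.
    return ["python", "CohortInteg_SupplementalMaterial.py",
            input_dir, output_dir]


# Demonstration in a throwaway temporary folder.
base = tempfile.mkdtemp()
command = prepare_run(base)
print(" ".join(command))
```

To launch the analysis, run the printed command from the directory that contains CohortInteg_SupplementalMaterial.py, using the Python 2.7 environment described under Installation.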