Documentation


Abbreviations

Populations

AFR  African/African American
AMR  Latino
EAS  East Asian
EUR  European
FIN  Finnish
NFE  Non-Finnish European
SAS  South Asian
OTH  Other

N.B. If a population classification is not used by a database, the value None is given.


Diseases

CM      cardiomyopathy
DCM     dilated cardiomyopathy
HCM     hypertrophic cardiomyopathy
MD      muscle disease
ARVC    arrhythmogenic right ventricular cardiomyopathy
PPCM    peripartum cardiomyopathy
MmD-HD  multi-minicore disease with heart disease
CNM     centronuclear myopathy
LGMD2J  limb-girdle muscular dystrophy type 2J
RCM     restrictive cardiomyopathy
TMD     tibial muscular dystrophy
MFM     myofibrillar myopathy
HMERF   hereditary myopathy with early respiratory failure
MCA     myopathy with cytoplasmic aggregates
IH      inguinal hernia

Data

1000 Genomes variant data [1] was obtained from dbSNP [3]. ExAC variant data was obtained from the ExAC web server [2]. Titin-related disease nsSNVs were obtained from 'A rising titan: TTN review and mutation update' [4]. Disease-associated nsSNVs reported in the literature since the publication of that paper were retrieved by searching PubMed with the terms "("titin"[All Fields]) AND ("snp"[All fields])", "("titin"[All Fields]) AND ("mutation"[All fields])" and "("titin"[All Fields]) AND ("variant"[All fields])".
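The three PubMed search terms follow a single pattern, so they can be generated rather than typed by hand. The sketch below only builds the query strings and the matching search URLs; it does not contact PubMed. The use of the NCBI E-utilities esearch endpoint and the helper names build_queries and esearch_url are assumptions for illustration, and no publication-date filter is applied.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. Using it here is an assumption:
# the documentation only states that PubMed was queried with these terms.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_queries(gene="titin", keywords=("snp", "mutation", "variant")):
    """Return one PubMed query string per keyword, in the form quoted above."""
    template = '("{}"[All Fields]) AND ("{}"[All fields])'
    return [template.format(gene, keyword) for keyword in keywords]


def esearch_url(term):
    """Build (but do not fetch) an esearch URL for a PubMed query string."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term})


queries = build_queries()
urls = [esearch_url(query) for query in queries]
for url in urls:
    print(url)
```

Restricting results to papers published after the review would need an additional date filter, which is left out here because the exact cut-off is not stated.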

Definition of titin domain boundaries

HMMER [5] was used to scan the protein sequence of the titin IC variant (NP_001254479.2, obtained from the RefSeq database [6]) against Pfam seed libraries [7]. Where hits overlapped, the hit with the lowest E-value was accepted. When the lowest E-value for a region was greater than 0.0001, additional evidence was required to accept the hit. This was the case for domain Ig-94, which was identified with an E-value above the 0.0001 threshold but is supported by PDB structures 1WAA, 1TIU and 1TIT. Ig-88, Ig-89, Ig-90 and Ig-98 were also identified with E-values above the threshold; however, when the titin sequence was scanned using an HMM built from an alignment of all (165) other titin Ig domains, these were identified with high significance (5.9E-11, 1.2E-12, 2.9E-10, 7.4E-12).
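The hit-selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the database: the Hit record, the greedy resolution of overlaps and the example hits are all assumptions, and only the 0.0001 threshold and the "lowest E-value wins" rule come from the text.

```python
from typing import List, NamedTuple

E_VALUE_THRESHOLD = 1e-4  # threshold stated in the text


class Hit(NamedTuple):
    """Hypothetical record for one HMM hit on the titin sequence."""
    name: str       # Pfam family name
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    evalue: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def select_hits(hits: List[Hit]) -> List[Hit]:
    """Among overlapping hits keep the one with the lowest E-value.

    Greedy resolution (best hit first) is an assumption about how
    overlaps were handled.
    """
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


def needs_additional_evidence(hit: Hit) -> bool:
    """Hits above the threshold are only accepted with further support."""
    return hit.evalue > E_VALUE_THRESHOLD


# Invented example hits, for illustration only.
hits = [
    Hit("I-set", 10, 95, 1e-20),
    Hit("ig", 12, 90, 1e-8),
    Hit("fn3", 100, 185, 3e-3),
]
selected = select_hits(hits)
flagged = [h for h in selected if needs_additional_evidence(h)]
print([h.name for h in selected], [h.name for h in flagged])
```

Flagged hits would then be checked against other evidence, such as an experimental structure or a scan with a titin-specific HMM.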

Sequence logos created using WebLogo [8] (see Fig 1) for aligned titin Fn3 sequences differ substantially from logos of the Pfam seed alignments, in particular towards the end of the sequence, where conservation drops off gradually. The boundary is therefore not clearly defined from sequence alone. When mapped onto structure, it becomes clear that the Pfam-defined boundaries do not cover the whole domain, so the Pfam Fn3 domain boundaries were judged inappropriate. Instead, Fn3 domains were initially identified using Pfam/HMMER, and the sequences of these domains, including an extra 5 amino acids upstream and 16 amino acids downstream of the Pfam-defined boundary, were aligned using T-coffee [11]. This alignment was trimmed using structural information from available titin Fn3 crystal structures, in particular 3LPW. An HMM was built from the trimmed alignment and titin was scanned again with this HMM to redefine the Fn3 boundaries.
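As an illustration of the boundary extension step, the sketch below pads a Pfam-defined range by 5 residues upstream and 16 residues downstream and clamps the result to the sequence ends. The function names and the example coordinates are hypothetical; clamping at the termini is an assumption, since the text does not say how domains near the ends were handled.

```python
def extend_boundaries(start, end, seq_len, upstream=5, downstream=16):
    """Extend a 1-based inclusive domain range, clamped to the sequence."""
    new_start = max(1, start - upstream)
    new_end = min(seq_len, end + downstream)
    return new_start, new_end


def extract(sequence, start, end):
    """Return the residues for a 1-based inclusive range."""
    return sequence[start - 1:end]


# Invented coordinates on a sequence of length 1000.
print(extend_boundaries(100, 180, 1000))  # (95, 196)
print(extend_boundaries(3, 990, 1000))    # (1, 1000), clamped at both ends
```

The extended sequences are what would be passed to the multiple alignment before trimming against the crystal structures.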

Fig 1. Sequence logos showing aligned titin domains and Pfam seed alignments for Ig and Fn3 domains. The Pfam Fn3 domain definition can be seen mapped onto structure in purple with structure absent from the Pfam definition in turquoise and blue.

Mapping of titin isoforms

Stretcher [9] was used to align all titin isoforms to the IC variant. Isoform sequences were obtained from RefSeq. Positions were mapped according to these alignments.
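Position mapping from a pairwise alignment can be sketched as follows. The function takes the two gapped rows of an alignment (for example, as written by a global aligner such as Stretcher) and returns, for every isoform residue, the matching position in the reference. The function name, the use of None for residues aligned to a gap and the toy sequences are assumptions for illustration.

```python
def map_positions(aligned_isoform, aligned_reference, gap="-"):
    """Map 1-based isoform positions to 1-based reference positions.

    Both arguments are rows of the same pairwise alignment and must have
    equal length. Isoform residues aligned to a gap map to None.
    """
    if len(aligned_isoform) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    iso_pos = 0
    ref_pos = 0
    for iso_char, ref_char in zip(aligned_isoform, aligned_reference):
        if ref_char != gap:
            ref_pos += 1
        if iso_char != gap:
            iso_pos += 1
            mapping[iso_pos] = ref_pos if ref_char != gap else None
    return mapping


# Toy alignment: the isoform lacks two residues present in the reference.
mapping = map_positions("MK--LV", "MKAGLV")
print(mapping)  # {1: 1, 2: 2, 3: 5, 4: 6}
```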

Modelling of titin domains

An automated homology modelling pipeline was set up. The pipeline takes a FASTA file of domain sequences as input and uses only publicly available PDB structures as templates. The overall modelling process is shown in Fig 2A, and a flow diagram detailing the template selection process is shown in Fig 2B. The template search, modelling and model assessment were performed using Modeller [10], the alignment of query and templates was performed using 3DCoffee [11], and the overall pipeline was implemented in Python 2.7.

Fig 2. Flow-diagrams showing A) an outline of the modelling pipeline B) template selection criteria.

The I-TASSER server [12] was used to model Ig-112, as a satisfactory (negative) zDOPE score was not obtained using Modeller.
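The acceptance check on model quality can be sketched as below. The code picks the candidate model with the lowest zDOPE score and reports whether that score is negative, which is the criterion for "satisfactory" given above. The dictionary layout, the function names and the example scores are hypothetical, and the scores are assumed to have been computed beforehand.

```python
def best_model(zdope_by_model):
    """Return (model_name, zdope) for the lowest zDOPE score, or None."""
    if not zdope_by_model:
        return None
    return min(zdope_by_model.items(), key=lambda item: item[1])


def is_satisfactory(zdope):
    """A negative zDOPE score is treated as satisfactory."""
    return zdope < 0.0


def needs_alternative_method(zdope_by_model):
    """True when no candidate model reaches a negative zDOPE score."""
    best = best_model(zdope_by_model)
    return best is None or not is_satisfactory(best[1])


# Invented scores for two hypothetical domains.
good_domain = {"model_1": -0.42, "model_2": -1.10, "model_3": 0.35}
poor_domain = {"model_1": 0.80, "model_2": 0.15}

print(best_model(good_domain), needs_alternative_method(good_domain))
print(best_model(poor_domain), needs_alternative_method(poor_domain))
```

A domain for which needs_alternative_method returns True corresponds to the situation described for Ig-112, where a different modelling method was used.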

In silico assessment of the impact of nsSNVs

The in silico assessment of known nsSNVs occurring within Ig and Fn3 domains was performed using DUET [13]. The impact of all possible single amino acid variants (SAVs) that localise to domain structures was predicted using mCSM [14]. Where experimental structures were available, these were used for the assessment. Where no experimental structure was available, the model with the lowest zDOPE score was used. See the table below for experimental structures used in the in silico assessment of nsSNVs.

domain   PDB structure  method  resolution (Å)
Ig-1     2a38           xray    2.00
Ig-2     2a38           xray    2.00
Ig-10    1g1c           xray    2.10
Ig-18    5jdd           xray    1.53
Ig-19    5jdd           xray    1.53
Ig-20    5jdd           xray    1.53
Ig-84    5joe           xray    2.00
Ig-94    1waa           xray    2.00
Ig-156   3lcy           xray    2.50
Ig-157   3lcy           xray    1.99
Ig-158   2j8h           xray    1.99
Ig-159   2j8h           xray    1.99
Ig-160   2bk8           xray    1.69
Ig-163   3qp3           xray    2.00
Ig-164   1tnn           NMR     NA
Ig-166   3puc           xray    0.96
Ig-169   3knb           xray    1.40
Fn3-3    4o00           xray    1.85
Fn3-62   1bpv           NMR     NA
Fn3-66   3lpw           xray    1.65
Fn3-67   3lpw           xray    1.65
Fn3-132  2nzi           xray    2.9

The assessment of the impact of SNVs on protein-protein interactions was performed using mCSM [14], where experimental binary complexes were available. See the table below for structures used for these calculations.

domain   PDB structure  method  resolution (Å)  interacting protein
Ig-1     1ya5           xray    2.44            TCAP
Ig-2     1ya5           xray    2.44            TCAP
Ig-169   3knb           xray    1.40            OBSL1
Ig-169   4c4k           xray    1.95            OBSCN

Assessment of all nsSNVs was performed using the sequence-based method Condel [15].

Definition of structural elements

Interface and core regions were defined using POPS [16]. Residues with a Q(SASA) (quotient solvent accessible surface area) > 0.3 were defined as being surface residues and those with a Q(SASA) ≤ 0.3 defined as core residues. Here Q(SASA) is defined as the quotient of the SASA (solvent accessible surface area) and Surf (surface area of the isolated residue).
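The surface/core rule amounts to a single ratio and a cut-off, sketched below. The per-residue SASA and Surf values would come from POPS output; the residue labels and numbers used here are invented for illustration, and the function names are hypothetical.

```python
SURFACE_CUTOFF = 0.3  # Q(SASA) cut-off stated in the text


def q_sasa(sasa, surf):
    """Quotient of the residue SASA and the isolated-residue surface area."""
    if surf <= 0:
        raise ValueError("Surf must be positive")
    return sasa / surf


def classify_residue(sasa, surf, cutoff=SURFACE_CUTOFF):
    """Return 'surface' if Q(SASA) > cutoff, otherwise 'core'."""
    return "surface" if q_sasa(sasa, surf) > cutoff else "core"


# Invented (SASA, Surf) pairs in square angstroms.
residues = {
    "A12": (5.0, 110.0),    # Q = 0.045
    "K45": (95.0, 200.0),   # Q = 0.475
    "L78": (60.0, 200.0),   # Q = 0.3, exactly on the cut-off
}
labels = {name: classify_residue(*values) for name, values in residues.items()}
print(labels)
```

Note that a residue sitting exactly on the cut-off is classed as core, because the surface condition is a strict inequality.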

Putative PPI interface regions were predicted using SPPIDER II [17] with a balanced trade-off between sensitivity and specificity (SPPIDER estimates this based on a control data set of 149 protein chains with no sequence homology).

Site annotations, including ligand binding sites and modified residues, were obtained from UniProt [18].

Citing TITINdb

When using this tool in a publication, please cite: Laddach, A., M. Gautel and F. Fraternali (2017). "TITINdb-a computational tool to assess titin's role as a disease gene." Bioinformatics 33(21): 3482-3485.

References

  1. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015.
  2. Exome aggregation consortium (exac). [Online; accessed 9-November-2015].
  3. S. T. Sherry, M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. dbSNP: the NCBI database of genetic variation. Nucleic acids research, 29(1):308–11, 2001.
  4. C. Chauveau, J. Rowell, and A. Ferreiro. A rising titan: TTN review and mutation update. Human Mutation, 35(9):1046–1059, 2014.
  5. R. D. Finn, J. Clements, and S. R. Eddy. HMMER web server: Interactive sequence similarity searching. Nucleic Acids Research, 39(SUPPL. 2):29–37, 2011.
  6. K. D. Pruitt, T. Tatusova, G. R. Brown, and D. R. Maglott. NCBI Reference Sequences (RefSeq): Current status, new features and genome annotation policy. Nucleic Acids Research, 40(D1):130–135, 2012.
  7. R. D. Finn, A. Bateman, J. Clements, P. Coggill, R. Y. Eberhardt, S. R. Eddy, A. Heger, K. Hetherington, L. Holm, J. Mistry, E. L. L. Sonnhammer, J. Tate, and M. Punta. Pfam: the protein families database. Nucleic acids research, 42(Database issue):D222–30, jan 2014.
  8. G. Crooks, G. Hon, J. Chandonia, and S. Brenner. WebLogo: a sequence logo generator. Genome Res, 14:1188–1190, 2004.
  9. E. W. Myers and W. Miller. Optimal alignments in linear space. Computer applications in the biosciences : CABIOS, 4(1):11–17, 1988.
  10. N. Eswar, B. Webb, M. A. Marti-Renom, M. S. Madhusudhan, D. Eramian, M. Shen, U. Pieper, and A. Sali. Comparative Protein Structure Modeling Using Modeller. John Wiley & Sons, Inc., 2002.
  11. C. Notredame, D. G. Higgins, and J. Heringa. T-coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1):205–217, 2000.
  12. Y. Zhang. I-TASSER server for protein 3D structure prediction. BMC bioinformatics, 9(1):40, 2008.
  13. D. E. V. Pires, D. B. Ascher, and T. L. Blundell. DUET: A server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Research, 42(W1):1–6, 2014.
  14. D. E. V. Pires, D. B. Ascher, and T. L. Blundell. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics, 30(3):335–42, 2014.
  15. A. González-Pérez and N. López-Bigas. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. American Journal of Human Genetics, 88(4):440–449, 2011.
  16. L. Cavallo, J. Kleinjung, and F. Fraternali. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Research, 31(13):3364–3366, 2003.
  17. A. Porollo and J. Meller. Prediction-based fingerprints of protein–protein interactions. Proteins: Structure, Function, and Bioinformatics, 66(3):630–645, 2007.
  18. The UniProt Consortium (2017). UniProt: the universal protein knowledgebase. Nucleic Acids Res., 45, D1:D158-D169.