N.B. If a population classification is not used by a database, the value None is given.
|ARVC||arrhythmogenic right ventricular cardiomyopathy|
|MmD-HD||multi-minicore disease with heart disease|
|LGMD2J||limb-girdle muscular dystrophy type 2J|
|TMD||tibial muscular dystrophy|
|HMERF||hereditary myopathy with early respiratory failure|
|MCA||myopathy with cytoplasmic aggregates|
1000 Genomes variant data was obtained from dbSNP. ExAC variant data was obtained from the ExAC web server. Titin-related disease nsSNVs were obtained from 'A rising titan: TTN review and mutation update'. Disease-associated nsSNVs reported in the literature since the publication of that paper were retrieved from PubMed using the queries "("titin"[All Fields]) AND ("snp"[All fields])", "("titin"[All Fields]) AND ("mutation"[All fields])" and "("titin"[All Fields]) AND ("variant"[All fields])".
HMMER was used to scan the protein sequence of the titin IC variant (NP_001254479.2, obtained from the RefSeq database) against Pfam seed libraries. Where hits overlapped, the hit with the lowest E-value was accepted. When the lowest E-value hit for a region was greater than 0.0001, additional evidence was required to accept the hit. This was the case for domain Ig-94, which was identified with an E-value above the 0.0001 threshold but is verified by PDB structures 1WAA, 1TIU and 1TIT. Ig-88, Ig-89, Ig-90 and Ig-98 were also identified with E-values above the threshold; however, when the titin sequence was scanned using an HMM created from an alignment of all (165) other titin Ig domains, these were identified with high significance (5.9E-11, 1.2E-12, 2.9E-10, 7.4E-12).
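The overlap rule and E-value threshold described above can be illustrated with a short sketch. This is not the code used to build the database: the `Hit` record, the function names and the greedy resolution strategy (accept hits in order of increasing E-value, skipping any that overlap an already accepted hit) are assumptions made for illustration. Only the 0.0001 threshold comes from the text. The sketch is written for Python 3.

```python
from typing import List, NamedTuple

E_VALUE_THRESHOLD = 1e-4  # threshold stated in the text


class Hit(NamedTuple):
    """Hypothetical record for one domain hit (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    evalue: float


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_overlaps(hits: List[Hit]) -> List[Hit]:
    """Keep the lowest E-value hit wherever hits overlap (greedy, assumed strategy)."""
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


def needs_additional_evidence(hit: Hit) -> bool:
    """Flag accepted hits whose E-value exceeds the threshold."""
    return hit.evalue > E_VALUE_THRESHOLD


# Illustrative values only; these are not real titin hits.
example_hits = [
    Hit("I-set", 10, 100, 1e-20),
    Hit("Ig_3", 15, 95, 1e-8),
    Hit("fn3", 120, 210, 5e-3),
]
resolved = resolve_overlaps(example_hits)
flagged = [h.name for h in resolved if needs_additional_evidence(h)]
```

In this made-up example the two overlapping hits compete and only the one with the lower E-value survives, while the weak non-overlapping hit is retained but flagged for further verification (for instance against a PDB structure, as was done for Ig-94).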
Sequence logos of aligned titin Fn3 sequences, created using WebLogo (see Fig 1), differ substantially from logos of the Pfam seed alignments, in particular towards the end of the sequence, where the conservation drops off gradually. The domain boundary therefore cannot be clearly defined from sequence alone. When mapped onto structure, it becomes clear that the Pfam-defined boundaries do not cover the whole domain. For this reason, the Pfam Fn3 domain boundaries were judged inappropriate. Fn3 domains were therefore initially identified using Pfam/HMMER, and the sequences of these domains, including an extra 5 amino acids upstream and 16 amino acids downstream of the Pfam-defined boundary, were aligned using T-Coffee. This alignment was trimmed using structural information from available titin Fn3 crystal structures, in particular 3LPW. An HMM was created from this alignment and titin was scanned again using this HMM to redefine the Fn3 boundaries.
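The boundary extension step (5 residues upstream, 16 downstream) amounts to simple coordinate arithmetic, sketched below. The function names and the clamping to the ends of the protein are assumptions for illustration; the text does not say how domains near the termini were handled.

```python
UPSTREAM_EXTENSION = 5     # residues added before the Pfam start (from the text)
DOWNSTREAM_EXTENSION = 16  # residues added after the Pfam end (from the text)


def extend_boundaries(start, end, seq_len,
                      upstream=UPSTREAM_EXTENSION,
                      downstream=DOWNSTREAM_EXTENSION):
    """Extend 1-based inclusive domain boundaries, clamped to the sequence."""
    if not 1 <= start <= end <= seq_len:
        raise ValueError("boundaries must satisfy 1 <= start <= end <= seq_len")
    new_start = max(1, start - upstream)
    new_end = min(seq_len, end + downstream)
    return new_start, new_end


def extract_region(sequence, start, end):
    """Return the subsequence for 1-based inclusive coordinates."""
    return sequence[start - 1:end]


# Illustrative coordinates only; not a real titin Fn3 domain.
extended = extend_boundaries(100, 180, seq_len=1000)
```

The extended sequences would then be passed to the alignment step; trimming against a crystal structure such as 3LPW is a manual, structure-guided decision that this sketch does not attempt to reproduce.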
Stretcher was used to align all titin isoforms to the IC variant. Isoform sequences were obtained from RefSeq. Positions were mapped according to these alignments.
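Mapping positions through a pairwise alignment can be sketched as follows. The sketch assumes the two aligned sequences are already available as equal-length gapped strings (for example, parsed from the aligner's output) and that `-` is the gap character; the function name is hypothetical.

```python
def map_positions(aligned_isoform, aligned_reference, gap="-"):
    """Map 1-based isoform positions to 1-based reference positions.

    Only alignment columns where both sequences carry a residue are mapped;
    isoform residues aligned to a gap have no counterpart and are omitted.
    """
    if len(aligned_isoform) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    iso_pos = 0
    ref_pos = 0
    for iso_char, ref_char in zip(aligned_isoform, aligned_reference):
        if iso_char != gap:
            iso_pos += 1
        if ref_char != gap:
            ref_pos += 1
        if iso_char != gap and ref_char != gap:
            mapping[iso_pos] = ref_pos
    return mapping


# Toy alignment, not real titin sequence.
toy_mapping = map_positions("MK-TLV", "MKAT-V")
```

In the toy alignment, the fourth isoform residue is aligned to a gap and therefore has no entry in the mapping, while the third isoform residue maps to reference position 4 because the reference carries an extra residue before it.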
An automated homology modelling pipeline was set up. The pipeline takes a FASTA file of domain sequences as input and uses only publicly available PDB structures as templates. The overall modelling process is shown in Fig 2A, and a flow diagram detailing the template selection process is shown in Fig 2B. The template search, modelling and model assessment were performed using Modeller; the alignment of query and templates was performed using 3DCoffee; and the overall pipeline was implemented in Python 2.7.
The I-TASSER server was used to model Ig-112, as a satisfactory (negative) zDOPE score was not obtained using Modeller.
The in silico assessment of known nsSNVs occurring within Ig and Fn3 domains was performed using DUET. The prediction of impact for all possible SAVs that localise to domain structures was carried out using mCSM. Where experimental structures were available, these were used for the assessment. Where no experimental structures were available, the model with the lowest zDOPE score was used. See the table below for experimental structures used in the in silico assessment of nsSNVs.
The assessment of the impact of SNVs on protein-protein interactions was performed using mCSM where experimental binary complexes were available. See the table below for structures used for these calculations.
|domain||PDB structure||method||resolution||interacting protein|
Assessment of all nsSNVs was performed using the sequence-based method Condel.
Interface and core regions were defined using POPS. Residues with Q(SASA) > 0.3 were defined as surface residues and those with Q(SASA) ≤ 0.3 as core residues. Here Q(SASA), the quotient solvent accessible surface area, is the ratio of a residue's SASA (solvent accessible surface area) to Surf (the surface area of the isolated residue).
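The surface/core rule can be written out directly. In the minimal sketch below, the function names are hypothetical and the per-residue SASA and Surf values are assumed to have been read from the POPS output beforehand; only the 0.3 cutoff and the definition of Q(SASA) come from the text.

```python
Q_SASA_CUTOFF = 0.3  # cutoff stated in the text


def q_sasa(sasa, surf):
    """Quotient of a residue's SASA and the surface area of the isolated residue."""
    if surf <= 0:
        raise ValueError("Surf must be positive")
    return sasa / surf


def classify_residue(sasa, surf, cutoff=Q_SASA_CUTOFF):
    """Return 'surface' if Q(SASA) > cutoff, otherwise 'core'."""
    return "surface" if q_sasa(sasa, surf) > cutoff else "core"


# Illustrative areas only; not taken from a real structure.
exposed = classify_residue(sasa=60.0, surf=150.0)  # Q(SASA) = 0.4
buried = classify_residue(sasa=15.0, surf=150.0)   # Q(SASA) = 0.1
```

A residue with Q(SASA) exactly equal to 0.3 falls on the core side, matching the "≤ 0.3" wording above.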
Putative PPI interface regions were predicted using SPPIDER II with a balanced trade-off between sensitivity and specificity (SPPIDER estimates this based on a control data set of 149 protein chains with no sequence homology).
Site annotations, including ligand binding sites and modified residues, were obtained from UniProt.
When using this tool in publication, please cite Laddach, A., M. Gautel and F. Fraternali (2017). "TITINdb-a computational tool to assess titin's role as a disease gene." Bioinformatics 33(21): 3482-3485.