We consider repeats with motif lengths of 1 to 6 bp. For example, AAAAA, ACACACACAC, CAGCAGCAGCAG, TATCTATCTATCTATCTATC, TTTTGTTTTGTTTTGTTTTG, and AAAAACAAAAACAAAAAC would all fall under this definition. The set of STRs used by WebSTR was obtained from the HipSTR hg19 reference STR file.
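The motif-length definition above can be illustrated with a short Python sketch. This is not the code used by WebSTR or HipSTR. The function names `smallest_motif` and `is_str` are hypothetical, the minimum of two motif copies is an assumption (no minimum copy number is stated here), and the sketch handles perfect repeats only, whereas reference STR sets may also contain imperfect repeats.

```python
def smallest_motif(seq: str) -> str:
    """Return the shortest unit that, repeated a whole number of times, gives seq."""
    n = len(seq)
    for k in range(1, n + 1):
        # A motif of length k must divide the sequence length evenly.
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k]
    return seq  # only reached for the empty string


def is_str(seq: str, max_motif_len: int = 6, min_copies: int = 2) -> bool:
    """Check a perfect repeat against the 1-6 bp motif definition.

    min_copies is an assumption made for this sketch only.
    """
    seq = seq.upper()
    if not seq:
        return False
    motif = smallest_motif(seq)
    copies = len(seq) // len(motif)
    return len(motif) <= max_motif_len and copies >= min_copies


examples = [
    "AAAAA",
    "ACACACACAC",
    "CAGCAGCAGCAG",
    "TATCTATCTATCTATCTATC",
    "TTTTGTTTTGTTTTGTTTTG",
    "AAAAACAAAAACAAAAAC",
]
motifs = {seq: smallest_motif(seq) for seq in examples}

for seq, motif in motifs.items():
    print(f"{seq}: motif={motif} ({len(motif)} bp), STR={is_str(seq)}")
```

Under these assumptions, the six example sequences resolve to motifs of 1, 2, 3, 4, 5, and 6 bp respectively, while a perfect repeat of a 7 bp unit would be rejected.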
We have helped develop lobSTR, HipSTR, and GangSTR for genome-wide genotyping of STRs from next-generation sequencing data. Earlier work (mutation rates and constraint) is based on genotypes obtained using lobSTR. More recent work (GTEx eSTRs, imputation results) is based on genotypes obtained using HipSTR and GangSTR.
Yes. Where allele frequency statistics are available, they are displayed on the locus-level page for several cohorts of European, African, and East Asian ancestry.
We would love to include your dataset! Contact mgymrek AT ucsd DOT edu to discuss adding summary-level STR statistics or allele frequencies for a different cohort to the site.
Yes, we provide all the necessary data files on request if you would like to do so. Instructions for the frontend and backend tiers of WebSTR can be found on GitHub.