What is our project?
The R-L21 South Irish Project was created to better understand the yDNA results that Dr. Kenneth Nordtvedt had previously identified as the South Irish base haplotype¹. When the CTS4466 SNP was identified, I isolated the STR signature of those who tested 4466+. Remarkably, Dr. Nordtvedt's original STR signature, for both his primary and secondary markers, fell well within the resulting range. Each new 4466+ member's results are reviewed for values that change the primary or secondary marker ranges, and the signature is then updated.
Kathleen Kerwin, FTDNA volunteer admin for the South Irish, Eoganacht septs, Sullivan and TMRCA Case Study projects.
Current 4466 STR Signature:
I currently identify likely 4466+ members by comparing their results against the range of values seen in those who have tested 4466+, for each primary and secondary marker. I took Dr. Nordtvedt's original list of primary and secondary markers and combined them into my current primary set: 390 = 22 to 24; 391 = 10; 385 = 11-11 to 12-15; 439 = 10 to 12; 458 = 16 to 18; 447 = 22 to 25; 449 = 27 to 30; GATA H4 = 10 to 12; 456 = 15 to 16; 442 = 12 to 14; 565 = 9 to 12.
The secondary set: 393 = 13; 19 = 14 to 15; 426 = 12; 388 = 11 to 12; 389i = 12 to 14; 392 = 12 to 13; 389ii = 28 to 31; 459 = 9-9, 9-10, 10-11; 455 = 10 to 12; 454 = 11 to 12; 437 = 14 to 16; 448 = 18 to 19; 464 = 14-15-16-17 to 15-17-17-17 not 13; 460 = 10-12; YCAII = 19:23 to 21:23; 607 = 13 to 16; 576 = 16 to 20; 570 = 16 to 20; CDY = 34-37 to 37-39; 438 = 12; 531 = 11; 578 = 9; DYF395S1 = 13-16 to 16-16; 590 = 8; 537 = 10; 641 = 10 to 11; 472 = 8; DYF406S1 = 10 to 11; 511 = 8 to 12; 425 = 12; 413 = 21:23 to 23:23; 557 = 15 to 17; 594 = 10; 436 = 12; 490 = 12; 534 = 15 to 16; 450 = 8; 444 = 11 to 13; 481 = 18 to 23; 520 = 19 to 21; 446 = 12 to 15; 617 = 12; 568 = 11; 487 = 11 to 13; 572 = 11; 640 = 11; 492 = 12 to 14.
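As a rough illustration of how the ranges above can be applied, the following Python sketch checks a member's single-copy primary markers against the listed ranges and widens a range when a confirmed 4466+ result falls outside it. The function names, the dictionary layout, and the sample values are assumptions made for this example and are not the project's actual tooling. Multi-copy markers such as 385 are left out for simplicity.

```python
# Primary-set ranges for single-copy markers, copied from the list above.
# The multi-copy marker 385 is omitted to keep the sketch short.
PRIMARY_RANGES = {
    "390": (22, 24), "391": (10, 10), "439": (10, 12), "458": (16, 18),
    "447": (22, 25), "449": (27, 30), "GATA H4": (10, 12),
    "456": (15, 16), "442": (12, 14), "565": (9, 12),
}


def out_of_range(haplotype, ranges=PRIMARY_RANGES):
    """Return {marker: value} for tested markers outside the signature range."""
    return {
        marker: haplotype[marker]
        for marker, (low, high) in ranges.items()
        if marker in haplotype and not low <= haplotype[marker] <= high
    }


def widen_ranges(ranges, haplotype):
    """Return a copy of ranges stretched to include a confirmed 4466+ result."""
    updated = dict(ranges)
    for marker, value in haplotype.items():
        if marker in updated:
            low, high = updated[marker]
            updated[marker] = (min(low, value), max(high, value))
    return updated


# Hypothetical member values, invented for illustration only.
member = {"390": 24, "391": 10, "439": 13, "458": 17}
print(out_of_range(member))                         # markers outside the current range
print(widen_ranges(PRIMARY_RANGES, member)["439"])  # range after including this result
```

In this sketch `widen_ranges` should only be called for a member who has actually tested 4466+, since the signature is defined by confirmed results; `out_of_range` can be run on any member as a first screen.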
Short synopsis of research developments:
I overlaid Dr. Nordtvedt's South Irish base haplotype markers onto the Western Atlantic Modal Haplotype (WAMH) and compared our members' yDNA STR markers (STR signature) against this standard. Until the CTS4466 SNP was discovered, I did not know how many mutations from this standard would identify those who belonged in the haplotype.
Many FTDNA members took the CTS4466 SNP test, providing the information needed to identify the 4466 STR signature range. Each new 4466+ result potentially changes the range. Through the work of Dave Reynolds and Chris Morley on the Geno 2.0 SNP phylogenetic tree, additional 4466+ SNPs were discovered: CTS5714, CTS3974, CTS8358, F2517, Z454, PF112, L247, and L270.
Using Dr. Anatole Klyosov's methods for TMRCA calculations², I built phylogenetic trees with sub branches of those whose yDNA STRs were within this 4466+ STR signature. I noticed that the 4466 critical sub SNPs were scattered throughout the tree, whereas I had expected them to cluster automatically according to their phylogenetic order³. Researcher Susan Hedeen provided the key to better organizing the tree: those testing for certain sub SNPs should first be separated into their own phylogenetic sub tree. My first attempt at separating yDNA results by 4466+ critical SNPs showed that the STR signatures of these sub branches were much better ordered by marker patterns. In other words, the members of each sub branch were most likely more closely related and would most likely yield a TMRCA calculation in the reasonable range (within 300 years or better). Previous sub branches, not separated by sub SNPs, showed a much greater combination of marker patterns. I gained much of this knowledge by fully building out at least 10 phylogenetic trees with sub branches, with a TMRCA calculated for each.
At this point I realized that I needed to clearly identify all 4466+ critical sub SNPs in each member who had taken the Geno 2.0 test before I could know for certain what the 4466+ sub SNP phylogenetic hierarchy really looked like.
I scrutinized the FTDNA Geno 2.0 SNP results and found too many 'blanks', which left it entirely unknown whether the member had tested + or - for a specific critical SNP. A review of the Geno 2.0 raw data showed that many of the critical SNPs had not actually been tested: they carried a '-' or no call, which is equivalent to not being successfully tested. Each such '-' on a 4466 critical SNP, for every 4466+ Geno member, needs retesting by an individual SNP test before the 4466 sub SNP hierarchy can be clearly identified and relied upon by our research and by the member himself.
FTDNA is the lab that processes the Geno 2.0 kits. Approximately 12,000 yDNA SNPs are tested. FTDNA accepts a 3% failure rate, which means that up to about 360 SNP tests can fail without being considered an issue. If any of these failed tests fall on critical SNPs identified for a base haplotype, this is a 100% failure for those researching the base haplotype, as well as for the member who is relying on the test outcome. This leaves each member needing to request individual retesting for all 4466+ critical SNPs that show a '-' or 'no call' in his Geno 2.0 raw data results. This is the current bottleneck in our research development.
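The arithmetic and the retest list described above can be sketched in a few lines of Python. The sub SNP names are those listed earlier in this synopsis. The layout of the raw results (a dictionary mapping SNP name to '+' or '-', with a missing entry standing for a no call) and the function name are assumptions for illustration, not the actual Geno 2.0 file format.

```python
TOTAL_SNPS = 12_000   # approximate number of yDNA SNPs tested
FAILURE_PERCENT = 3   # failure rate accepted by the lab

# 3% of roughly 12,000 tests
allowed_failures = TOTAL_SNPS * FAILURE_PERCENT // 100

CRITICAL_SNPS = ["CTS5714", "CTS3974", "CTS8358", "F2517",
                 "Z454", "PF112", "L247", "L270"]


def needs_retest(raw_calls, critical=CRITICAL_SNPS):
    """List critical SNPs whose raw result is '-' or missing (no call).

    Following the text, a '-' in the raw data is treated as untested.
    """
    return [snp for snp in critical if raw_calls.get(snp) != "+"]


# Hypothetical raw results for one member, invented for illustration only.
member_calls = {"CTS5714": "+", "CTS3974": "-", "Z454": "+", "L270": "-"}
print(allowed_failures)
print(needs_retest(member_calls))
```

For the hypothetical member shown, every critical SNP except CTS5714 and Z454 would go on the retest list, because only a '+' counts as a confirmed result here.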
¹ Dr. Ken Nordtvedt: see http://mysite.verizon.net/timdesmond/files/dna_southirish.htm for the list of initial primary and secondary markers.
² Dr. Anatole Klyosov: I found Dr. Klyosov's TMRCA calculations extremely useful because they have checks and balances that provide feedback on whether the yDNA results included are within a close enough range to give a viable TMRCA calculation. He also provides details on building phylogenetic trees to order yDNA results before they are input into the TMRCA calculation. As a result of using his work, I know when I need to increase the quality of the yDNA results to obtain TMRCA values within the reasonable range. When my research list is outside this reasonable range, I go back to the start and re-evaluate. Each time I have re-evaluated and reselected my research list, or built out more knowledge of the list, I have either gotten closer to a viable TMRCA calculation or understood better where the current limits lie. The main inhibitor is not having enough yDNA results with the current 4466+ STR signature.
³ Phylogenetic tree order: I have learned many lessons about using STRs and SNPs by reviewing the research list in phylogenetic order. Every male in the world fits into a single yDNA phylogenetic tree. However, each male's SNPs identify the migration path to which his ancestors belong. Men with similar STR signatures can descend along widely different migration paths and not be closely related.
Counting mutations between these similar yDNA STR signatures may provide inaccurate information, and FTDNA uses statistical analysis to predict whether a mutation count is relevant. FTDNA refers to the results of mutation counting as 'close matches'. I have found that these close matches are really a 'close pool of possible matches'. Mutation counting does not take into consideration the complex combinations of marker patterns or SNPs, so the statistical analysis has a very high hurdle to clear to cover both. After viewing at least 10 phylogenetic trees of 400-600 yDNA results, I have found that results differing by even a single mutation can sit in a neighboring sub branch or even several sub branches away.
I mention counting 'close matches' because it is relevant to understanding the relationship between yDNA results. Before building phylogenetic trees of the research results, one of the first and best tools should be a complete SNP analysis. A SNP occurs randomly in a single male, and all of his male-line descendants will carry it. If brothers took different migration paths, their STR signatures may still look similar. If one or both developed a SNP, their yDNA signatures (STR signature and associated distinguishing SNPs) will be different. When STR signatures in 2 or more yDNA results are very similar but their distinguishing SNPs are dissimilar, this is known as convergence. For this very reason, it is critical to identify all 4466+ sub SNPs so that each member testing for the Geno 2.0 SNPs is first placed in the correct phylogenetic sub tree according to his SNP hierarchy. The principal reason I believe the SNP hierarchy is critical is my experience in building the 4466+ phylogenetic tree and seeing that the 4466+ critical SNPs were scattered throughout the tree. They didn't naturally fall into sub branches based on the sub SNPs and STR patterns. Each member who tests for the full list of 4466+ critical SNPs will be pulled into the appropriate sub branch based on the positive SNP hierarchy prior to building the phylogenetic tree.
After a complete SNP analysis, the research list can then be placed properly into a phylogenetic tree and sub branches.
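To make the SNP-first ordering concrete, here is a minimal Python sketch that groups results by their positive sub SNPs before any STR comparison, and shows how two results one mutation apart can still land in different groups. The member records, the placeholder SNP names `SNP_X` and `SNP_Y`, the marker values, and the simple count of differing markers are all assumptions for illustration. This is not FTDNA's genetic distance calculation, and the grouping uses the set of positive SNPs rather than a confirmed hierarchy.

```python
from collections import defaultdict


def mutation_count(strs_a, strs_b):
    """Count shared markers whose values differ (a simplified distance)."""
    shared = strs_a.keys() & strs_b.keys()
    return sum(1 for marker in shared if strs_a[marker] != strs_b[marker])


def group_by_positive_snps(members):
    """Group member ids by the set of sub SNPs each has tested positive for."""
    groups = defaultdict(list)
    for member_id, record in members.items():
        key = tuple(sorted(record["positive_snps"]))
        groups[key].append(member_id)
    return dict(groups)


# Hypothetical records with placeholder SNP names, invented for illustration.
members = {
    "A": {"positive_snps": {"SNP_X"}, "strs": {"390": 23, "439": 11, "458": 17}},
    "B": {"positive_snps": {"SNP_Y"}, "strs": {"390": 23, "439": 12, "458": 17}},
    "C": {"positive_snps": {"SNP_X"}, "strs": {"390": 24, "439": 11, "458": 16}},
}

print(mutation_count(members["A"]["strs"], members["B"]["strs"]))
print(group_by_positive_snps(members))
```

In this invented data, A and B differ on a single marker yet fall into different SNP groups, while A and C share a group despite differing on two markers. That is the kind of pattern that mutation counting alone would misorder.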