Genetics Home Reference has merged with MedlinePlus. The information on this site should not be used as a substitute for professional medical care or advice. Contact a health care provider if you have questions about your health.
What is noncoding DNA? From Genetics Home Reference. Noncoding DNA contains many types of regulatory elements. For example, promoters provide binding sites for the protein machinery that carries out transcription.
The DNA molecule consists of two strands that wind around one another to form a shape known as a double helix.
Can changes in noncoding DNA affect health and development? From Genetics Home Reference.
This map also enabled the development of tools with unprecedented capacity to survey genomes at population scale. Microarray interrogation of common single-nucleotide polymorphisms (SNPs) revolutionized our understanding of genetic inheritance patterns through the HapMap project [1] (Fig.).
Figure: DNA sequencing costs (blue) over time compared to the number of publications containing specific phrases in PubMed. Some key events in genomics are shown in green.

Furthermore, epistatic interactions, where coinheritance of two or more variants could more adequately explain heritability, continue to be difficult to estimate due to computational limitations [3, 4]. At the same time, next-generation sequencing was poised to further revolutionize approaches not only to survey the genome, but also to redefine the understanding of how the genome behaved and the diverse mechanisms through which genetic disease could manifest.
Exome sequencing, where the coding regions of genes are enriched from DNA and sequenced, has allowed the direct measurement of variance at the genetic level. It has provided clinicians and researchers with base-scale resolution of the coding genome, giving rise to a quantum leap in the resolving power of associating genetic variation directly with altered protein. Its scalability and relatively low cost have resulted in large databases characterizing and annotating the variation in the coding genome, such as ExAC [5].
But like many technological advances, exome sequencing has exposed its own limitations, namely technical artefacts of the DNA capture used and overdependence on extant genome annotations. This latter limitation proved significant, as the emergence of exome sequencing coincided with the observation of the pervasively transcribed genome [6] and the rise of lncRNAs as important functional transcripts, which for the most part were overlooked by the approach.
Importantly, two developments gave rise to many theories regarding the genetic basis of disease, particularly in explaining the source of missing heritability: the understanding that the majority of informative variants identified by GWAS occurred within the noncoding genome, and a shift in how the genome was known to encode function through transcribed noncoding regulatory RNAs.
Broadly, theories explaining missing heritability fell into two areas: (1) that variants in regulatory DNA sequences (such as promoters, enhancers, and structural elements) and in regions encoding regulatory RNAs were responsible, or (2) that large numbers of individual genetic features, potentially with complex interactions, contributed collectively to inherited traits. These technical advantages in analyzing coding regions alone enabled improved diagnostic yield of genome sequencing and have led to growing numbers of whole-genome clinical sequencing services worldwide.
Whole-genome sequencing consortia are producing large databases of genomic variation, such as GnomAD [5], ... genomes [7], and the Million Veterans Project [8] (Fig.).
Therefore, the number of observed noncoding variants, as well as protein-coding variants of unknown significance, is increasing, and there is now potential to advance the use of the noncoding genome to improve clinical diagnostic rates of genetic disease [9].
When the Human Genome Project was completed, the implications of the complexity evident in the noncoding genome were staggering. After more than a decade of research, considerable advances have been made in understanding how the genome instructs the development and function of organisms, and it is increasingly pertinent that this knowledge is harnessed to maximize diagnoses in clinical genomics practice.
This poses the challenge of developing effective variant filtering algorithms that narrow the search space for variants to regions where pathogenicity can be most clearly determined.
Typical approaches for interpreting clinical genomes involve reducing a genome down to rare coding variants with the appropriate inheritance patterns in a gene list of interest.
This approach typically yields a handful of variants for consideration. The proliferation of these tools has led to aggregator services such as VarCards [17] that allow multiple scores for a given variant to be interrogated in one place. A clinical molecular geneticist, molecular genetic pathologist, or other certified professional can interpret these data to assign a likely causal variant (Fig.).
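The filtering step described above (rare coding variants, in a gene list of interest, that fit the expected inheritance pattern) can be sketched roughly as follows. This is a minimal illustration, not a clinical pipeline: the `Variant` record, the consequence labels, the 1% frequency cutoff, and the simplified recessive rule (homozygous only, ignoring compound heterozygotes) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Consequence labels treated as "coding" in this sketch (an assumption,
# not a standard ontology).
CODING_CONSEQUENCES = {"missense", "nonsense", "frameshift", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    allele_frequency: float  # population frequency from a reference database
    zygosity: str            # "het" or "hom"


def filter_candidates(variants, gene_list, max_af=0.01, inheritance="dominant"):
    """Keep rare coding variants in genes of interest that fit the inheritance model."""
    kept = []
    for v in variants:
        if v.gene not in gene_list:
            continue  # outside the gene list of interest
        if v.consequence not in CODING_CONSEQUENCES:
            continue  # noncoding variants are discarded at this step
        if v.allele_frequency > max_af:
            continue  # too common to be a plausible rare-disease cause
        if inheritance == "recessive" and v.zygosity != "hom":
            continue  # simplified: compound heterozygotes are not handled
        kept.append(v)
    return kept
```

Note that the second check is exactly where noncoding variants drop out of consideration, which is the step the rest of this article argues should be revisited.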
If a candidate is not apparent from these approaches even in cases where there is a strong genetic component, a diagnosis becomes difficult since biochemical testing of variants of unknown significance is not feasible in a typical pathology laboratory setting and may not be considered to be cost-effective. Furthermore, although expanding the search space to include more variants increases the number of candidates, there is typically insufficient evidence to associate any particular variant with the phenotype.
Efforts worldwide are attempting to expand the annotation of the genome beyond the purely coding regions and to better understand how variation in these regions can have biological impact, in order to expand the understanding of the genetic basis of disease [18] and thus fully realize the clinical utility of the whole genome.

The interpretation of disease-associated variation at the level of the gene is undergoing a shift in understanding. However, these kinds of mutations have been shown to be relatively common, even in healthy genomes. Furthermore, these variants can be difficult to interpret in a clinical setting if the mutation occurs in a region not previously reported, or in a gene whose function within the context of the disease in question has not been investigated. It is also becoming apparent that mutations that do not affect the encoded amino acid (synonymous mutations) can affect gene products in the context of codon frequency and RNA structure [21]. Furthermore, the concept of multiplicity, where gene expression can be impacted by combinations of genetic alterations [23], is only starting to be addressed.
This implies that even the annotation of coding variants is far from complete. It is also important to note that the coding portion of a gene comprises a small percentage of the genetic information encoded by the locus and that alterations in the noncoding sequence can affect gene function (Fig.).
Variation at imprinted loci can drive the deposition of epigenetic marks responsible for imprinting [26], which can lead to aberrant expression. Introns can similarly contain important genetic information that can be influenced by mutation [29]. Together, these investigations show that a significant proportion of the clinically relevant genetic information elucidated by whole-genome sequencing is not typically interpreted in diagnostic laboratories.
Ever since the first observation of the pervasively transcribed genome more than a decade ago, there has been an explosion in the identification and functional characterization of long noncoding RNA (lncRNA) [31] and other noncoding transcript types. As the vast majority of transcribed species of the genome are noncoding, and little is yet known about them [31], efforts are ongoing to describe noncoding RNA and its regulation in detail.
LncRNAs are of particular interest to the field of clinical genomics, as their exquisite tissue-specific expression and regulatory behavior [34] indicate that a role in disease will become apparent as more is understood about lncRNA biology. The large-scale GTEx project [38] has set out to further understand the genetic drivers of tissue-specific gene expression via expression quantitative trait loci (eQTL) analysis.
Large-scale screens for noncoding RNA function have elucidated functional annotations for thousands of lncRNAs [39], and the development of molecular tools tailored to the unique biology of lncRNAs is ongoing. These efforts have enhanced the understanding of gene transcription and hint at a complexity that requires expanded resolution of functional annotation at the genetic level to inform interpretation in a clinical diagnostic setting.
Traditional indicators of functionality (and thus of potential clinical utility), such as conservation, have thus been challenged by this expanding annotation of the genome. The volume of available data has fueled recent computational efforts to annotate functional parts of the genome without necessarily depending exclusively on the coding genome (Table 1). Newer approaches have used genome-wide data itself to assign functional importance, either through association with DNA binding proteins (Eigen [43]) or through direct measures of resistance to variation (Orion [44]), to provide comprehensive maps of coding and noncoding regions likely to be impacted by variation.
These maps are expanding the pool of potentially clinically relevant variants and continue to evolve with growing interest and innovation. The physical arrangement of the genome is also critical to homeostasis.
Copy-number alterations are associated with many diseases, but can also have no pathogenic effect [45]. The study of disease-associated genomic translocations has typically focused on the generation of gene fusions, which are particularly clinically relevant in cancer. However, studies show that intergenic translocations can also perturb local gene expression, possibly by interrupting chromatin looping and by rearranging regulatory sequence [48, 49]. Moreover, chromatin looping [51] and nucleosome occupancy [52] are also susceptible to alteration by DNA mutation and structural rearrangement.
Indeed, the physical state of the DNA appears to be intimately associated with the process of gene expression [59] and transcription factor binding. Importantly, it was recently shown that disease-associated variations that disrupt G-quadruplex formation in RNA can affect post-transcriptional regulation of genes [27], suggesting that variants in structural features can directly impact cellular function.
The prevalence of intergenic, disease-associated SNPs from GWAS provoked diverse studies into how these variants were contributing to disease, revealing impacts on DNA conformation [51], DNA-protein interactions [61], and epigenetic marks. Recent application of RNA-capture sequencing [63] to haplotype blocks associated with GWAS disease-associated SNPs revealed a multitude of transcripts, of which less than half were in extant transcript databases. Combined with fine mapping of SNPs associated with breast cancer, this approach revealed enhancer alterations affecting novel transcript expression. These studies raise the possibility of direct and indirect impacts of disease-associated SNPs on tissue-specific transcription patterns and illustrate that both the resolution of disease-associated variants and genome annotation remain incomplete.
The ongoing accumulation of whole-genome data worldwide will eventually resolve the exact disease associations, and a greater understanding of the noncoding transcriptome will continue to provide context for elucidating the impact of these variants. In a similar vein, pseudogenes have classically been regarded as nonfunctional byproducts of retrotransposition. With the observation of transcription and evidence of disease linkage [67], pseudogene biology is being revisited; however, consensus as to a generic biological role has not yet been reached [68]. Indeed, the role of retrotransposition itself in shaping the genome is undergoing a renaissance through evidence of gene regulatory roles.

The American College of Medical Genetics and Genomics (ACMG) has described a set of evidence lines that can be used to ascribe degrees of pathogenicity to a particular variant. Importantly, these recommendations sought to distinguish deleterious impacts on a gene from contribution to disease.
Predicting the impact of coding variation is a more mature process, especially in the case of missense and nonsense mutations. Tools like PolyPhen and VEP are commonly used to estimate genic pathogenicity, although the likely impact of the variant can be open to interpretation.
Evidence for disease contribution is usually obtained by cross-referencing rare variants with lists of genes with known roles in the disease of interest, reports in the literature, and clinical databases such as COSMIC and ClinVar. The point at which there is sufficient evidence of a variant causing a disease is becoming refined. However, due to the complexities in the WGS data, interpretation, and phenotyping, associations can be subject to how the data are evaluated by genetic professionals and can still require in vitro testing.
Including non-protein-coding variants in this framework would add complexity, predominantly due to the lack of functional data to support the impact of a particular variant with precision, given the ongoing genome annotation efforts outlined above (Fig.). However, noncoding variants can clearly be clinically relevant, and their inclusion in clinical genomics frameworks is necessary for realizing the full clinical utility of genomic information.
Figure: Assigning variants (red arrows) at a hypothetical locus where protein-coding transcripts (blue), lncRNA (green), and regulatory regions (magenta) are incorporated.

The clinical interpretation of variants typically begins strictly as an informatics exercise where variants are filtered and ranked according to likelihood of clinical trait association. One of the earliest steps is to omit variants that are noncoding, which in the light of the evidence outlined above may miss vital insights into the molecular basis of a disease.
While less data is available for accurately calculating variant frequency in noncoding regions, growing whole-genome reference databases are now available for this purpose. These annotations can then be interpreted alongside existing lines of evidence within the context of disease. The primary paradigm shift required by these additions to clinical genome interpretation workflows will be the expansion of the concept of what part of the genome constitutes a gene.
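As a rough sketch of how variant frequency might be computed from a whole-genome reference database, the example below derives an allele frequency from an allele count and the number of genotyped chromosomes at a site. The function names, the 1% rarity cutoff, and the minimum number of genotyped chromosomes are illustrative assumptions, not values taken from any particular database or guideline.

```python
def allele_frequency(allele_count, allele_number, min_allele_number=1000):
    """Return allele_count / allele_number, or None when too few
    chromosomes were genotyped at the site to trust the estimate."""
    if allele_number < min_allele_number:
        return None
    return allele_count / allele_number


def is_rare(allele_count, allele_number, max_af=0.01, min_allele_number=1000):
    """True only when the frequency is both measurable and at or below max_af.

    A site with too little data returns False here: an unknown frequency
    is treated as missing evidence, not as evidence of rarity (a design
    choice made for this sketch).
    """
    af = allele_frequency(allele_count, allele_number, min_allele_number)
    return af is not None and af <= max_af
```

The explicit check on the number of genotyped chromosomes reflects the point made above: noncoding sites are covered by fewer reference genomes than coding sites are by exomes, so a frequency estimate there may rest on much less data.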
Impacts on a specific gene function can theoretically occur anywhere within the genome. This represents a currently insurmountable computational obstacle for the same reason that epistasis remains an intractable issue in genomics.
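The scale of this obstacle can be illustrated with simple counting. If any combination of variants might jointly affect a gene, the number of combinations to evaluate grows combinatorially with the number of variants. The sketch below uses a hypothetical helper, `interaction_tests`, and an illustrative figure of one million variants; neither comes from the source.

```python
import math


def interaction_tests(n_variants, order=2):
    """Number of distinct combinations of `order` variants drawn from n_variants."""
    return math.comb(n_variants, order)


# With one million candidate variants (an illustrative figure), testing
# every pair already means roughly 5 x 10^11 comparisons.
pairs = interaction_tests(1_000_000)
print(f"{pairs:,}")  # 499,999,500,000
```

Raising `order` to 3 or more increases the count by several further orders of magnitude, which is why restricting the search to regions directly linked to genes is attractive.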
However, splicing and promoter variations are directly linked to genes and are currently well annotated. For this reason, we propose that variants occurring at splice sites and branch points, as well as in promoters annotated by ENCODE, should be included in clinical genomics where they occur in disease-relevant genes. We expect that a more inclusive approach to impacts on gene function will facilitate an improved picture of the clinical landscape, particularly in the case of disease with strong evidence of inheritance where no coding candidate can be found.
For example, a promoter variant may be the second hit at a heterozygous recessive locus, leading to total loss of the gene product.
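One way the proposal above could look in practice is sketched below: splice-site, branch-point, and promoter variants are retained for disease-relevant genes, and genes carrying one heterozygous coding variant plus one heterozygous gene-linked noncoding variant are flagged as possible second-hit cases. The variant class labels, the tuple layout, and the function name are assumptions for illustration, and the sketch does not check phase.

```python
from collections import defaultdict

# Variant classes used in this sketch (labels are assumptions).
CODING = {"missense", "nonsense", "frameshift"}
# Noncoding classes the text proposes retaining in disease-relevant genes.
GENE_LINKED_NONCODING = {"splice_site", "branch_point", "promoter"}


def second_hit_candidates(variants, disease_genes):
    """Flag genes with a heterozygous coding variant plus a heterozygous
    splice-site, branch-point, or promoter variant.

    `variants` is an iterable of (gene, variant_class, zygosity) tuples.
    Phase is not checked: the two variants would need to sit on different
    copies of the gene (in trans) to cause loss of both alleles.
    """
    by_gene = defaultdict(lambda: {"coding": [], "regulatory": []})
    for gene, variant_class, zygosity in variants:
        if gene not in disease_genes or zygosity != "het":
            continue
        if variant_class in CODING:
            by_gene[gene]["coding"].append(variant_class)
        elif variant_class in GENE_LINKED_NONCODING:
            by_gene[gene]["regulatory"].append(variant_class)
    # Keep only genes where both kinds of hit are present.
    return {
        gene: hits
        for gene, hits in by_gene.items()
        if hits["coding"] and hits["regulatory"]
    }
```

A coding-only workflow would see a single heterozygous variant in such a gene and dismiss it for a recessive condition, whereas this grouping surfaces the pair for follow-up such as phasing or parental testing.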