Brief project description:
As part of this project, we generated human cybrid cell lines carrying a mixture of two neutral mtDNA haplotypes that differ at multiple nucleotide positions but harbor no pathogenic variants. The aim of the study was to investigate whether heteroplasmy selection and intermolecular mtDNA recombination occur in human cells in vitro.
To generate cybrids containing a mixture of two mtDNA haplotypes, human 143B.TK- cells devoid of mtDNA (ρ0) were fused with enucleated cells serving as mitochondrial donors (cytoplasts). All six pairwise combinations of four mtDNA haplotypes (A, B, C, and D) were used; the pairs differ in genetic distance, measured as the number of nucleotide differences between the two haplotypes. The nucleotide sequences of the studied mtDNA haplotypes were deposited in GenBank under accession numbers PQ468425 (A), PQ468426 (B), PQ468427 (C), and PQ468428 (D).
Individual cell clones were screened for the presence and level of mtDNA heteroplasmy, which was then monitored over successive passages to determine whether selection occurred and, if so, its direction and rate. Preliminary screening used Sanger sequencing of a selected mtDNA region (a fragment of the non-coding control region with the highest variability). Heteroplasmic cybrid lines were then analyzed in detail by high-throughput whole-mtDNA sequencing on the Illumina MiSeq platform, using a method developed and implemented in our group (https://doi.org/10.1016/j.exer.2018.10.004). Intermolecular mtDNA recombination in human cells in vitro was assessed by direct sequencing of native mtDNA molecules with Oxford Nanopore sequencing technology.
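As an illustration of how a heteroplasmy level and its change over passages can be quantified, here is a minimal Python sketch. It assumes you already have, for each position that distinguishes the two haplotypes of a cybrid line, the number of reads carrying the allele of haplotype X and of haplotype Y (for example, derived from the BAM files or variant lists). The positions, counts, and function names are hypothetical, and the least-squares slope is only one simple summary of direction and rate, not necessarily the analysis used in this study.

```python
from typing import Mapping, Sequence, Tuple


def heteroplasmy_fraction(counts: Mapping[int, Tuple[int, int]]) -> float:
    """Pooled fraction of haplotype X across diagnostic positions.

    counts maps an mtDNA position to (reads with the haplotype-X allele,
    reads with the haplotype-Y allele) observed at that position.
    """
    x_reads = sum(x for x, _ in counts.values())
    total = sum(x + y for x, y in counts.values())
    if total == 0:
        raise ValueError("no informative reads")
    return x_reads / total


def selection_slope(passages: Sequence[float], fractions: Sequence[float]) -> float:
    """Least-squares slope of haplotype-X fraction versus passage number.

    Positive: haplotype X increases per passage; negative: it decreases.
    """
    n = len(passages)
    if n < 2 or n != len(fractions):
        raise ValueError("need at least two paired observations")
    mean_p = sum(passages) / n
    mean_f = sum(fractions) / n
    sxx = sum((p - mean_p) ** 2 for p in passages)
    if sxx == 0:
        raise ValueError("passage numbers must not all be equal")
    sxy = sum((p - mean_p) * (f - mean_f) for p, f in zip(passages, fractions))
    return sxy / sxx


# Hypothetical read counts at two diagnostic control-region positions
example = {16069: (300, 100), 16126: (290, 110)}
print(heteroplasmy_fraction(example))  # 0.7375
# Hypothetical heteroplasmy levels at passages 0, 5 and 10
print(selection_slope([0, 5, 10], [0.50, 0.60, 0.70]))
```

Pooling reads across positions weights each position by its depth; averaging per-position fractions instead is an equally simple alternative when depth varies strongly along the genome.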
The dataset consists of the following zip archives:
Sanger_data.zip – DNA sequencing traces (chromatograms, *.ab1) of the selected mtDNA region for individual cybrid cell clones at different passages. The files can be opened with Sequencher or any similar software for Sanger data analysis.
Illumina_FASTQ_files.zip – raw next-generation sequencing (NGS) data for mtDNA from heteroplasmic cybrid cell lines at different passages, as FASTQ files (*.fastq.gz). The files can be opened with CLC Genomics Workbench, any similar bioinformatics software, or open-source bioinformatics tools for NGS data analysis.
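Besides dedicated NGS software, the FASTQ archives can be inspected with a few lines of code. The sketch below uses only the Python standard library to stream records from a *.fastq.gz file and count reads and bases. It assumes the usual unwrapped four-line FASTQ records; the function names are illustrative rather than part of any existing tool.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def iter_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse unwrapped four-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        # Keep only the read ID, i.e. the text before the first space
        name = header[1:].rstrip("\n").partition(" ")[0]
        yield FastqRecord(name, seq, qual)


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Stream records from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as fh:
        yield from iter_fastq(fh)


def summarize(records: Iterable[FastqRecord]) -> Tuple[int, int]:
    """Return (number of reads, total number of bases)."""
    n_reads = n_bases = 0
    for rec in records:
        n_reads += 1
        n_bases += len(rec.seq)
    return n_reads, n_bases
```

For example, after extracting the archive, `summarize(read_fastq_gz("16_A_C_5_A3_P3_S17_L001_R1_001.fastq.gz"))` returns the read and base counts of that file. Because records are streamed, memory use stays constant regardless of file size.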
Illumina_BAM_files.zip – mapped reads as BAM files with their index files (*.bam and *.bam.bai), obtained by bioinformatics analysis and mapping of the mtDNA NGS data to the human mtDNA reference genome, rCRS (NC_012920). The files can be opened with CLC Genomics Workbench, any similar bioinformatics software, or open-source bioinformatics tools for NGS data analysis.
Illumina_variant_lists.zip – lists of mtDNA variants (SNVs and indels) identified by bioinformatics analysis of the mtDNA NGS data (*.xlsx). The files can be opened with MS Excel or any other spreadsheet software.
ONT_data.zip – raw long-read sequencing data as FASTQ files (*.fastq.gz) from direct sequencing of native mtDNA molecules using Oxford Nanopore Technologies (ONT). The files can be opened with CLC Genomics Workbench, any similar bioinformatics software, or open-source bioinformatics tools for NGS/ONT data analysis.
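The Nanopore file names indicate that reads were filtered by length (longer than 2 kb) and quality score (greater than 10). The filtering tool and the exact definition of read quality are not stated here, so the following is only a sketch under assumptions: 2 kb is taken as 2,000 bp, and per-read quality follows the convention commonly used in ONT tooling, where Phred scores are converted to error probabilities, averaged, and converted back. This gives a lower value than averaging Phred scores directly. The function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, Tuple


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space.

    Each Phred score Q is converted to an error probability 10**(-Q/10);
    the probabilities are averaged and the mean is converted back to Phred.
    """
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    reads: Iterable[Tuple[str, str]],
    min_len: int = 2000,    # assumed: "2kb" means more than 2,000 bp
    min_q: float = 10.0,    # assumed: "q10" means mean quality above 10
) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs that pass both length and quality cut-offs."""
    for seq, qual in reads:
        if len(seq) > min_len and mean_read_quality(qual) > min_q:
            yield seq, qual
```

Records from any FASTQ reader can be passed in as (sequence, quality) pairs; the comparison operators mirror the "longer than" and "greater than" wording of the file-name description.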
File naming conventions:
Sanger files: e.g. F1_A_B_6_P0_M13_uni_21.ab1
- F1_ – Fusion experiment number (1–3)
- A_B_ – mtDNA haplotype pair (A–D), identifying the cybrid cell line
- 6_ – Clone number
- A3_ – Daughter line number (A3, E4, or F6), if applicable; this field is absent from the example above
- P0_ – Passage number
- M13_uni_21 – Universal sequencing primer for Sanger sequencing
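When processing many trace files, the naming scheme above can be parsed automatically. The Python sketch below is one way to do it with a regular expression. It treats the daughter-line field as optional (the example name lacks it) and assumes haplotype letters A–D and daughter labels A3, E4, or F6 exactly as listed; the pattern, class, and function names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

SANGER_NAME = re.compile(
    r"^F(?P<fusion>[1-3])_"                # fusion experiment number (1-3)
    r"(?P<hap1>[A-D])_(?P<hap2>[A-D])_"    # haplotype pair = cybrid cell line
    r"(?P<clone>\d+)_"                     # clone number
    r"(?:(?P<daughter>A3|E4|F6)_)?"        # daughter line, optional
    r"P(?P<passage>\d+)_"                  # passage number
    r"(?P<primer>.+)\.ab1$"                # sequencing primer, e.g. M13_uni_21
)


@dataclass(frozen=True)
class SangerName:
    fusion: int
    haplotypes: Tuple[str, str]
    clone: int
    daughter: Optional[str]
    passage: int
    primer: str


def parse_sanger_name(filename: str) -> SangerName:
    """Split a Sanger trace file name into its documented fields."""
    m = SANGER_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized Sanger file name: {filename!r}")
    return SangerName(
        fusion=int(m["fusion"]),
        haplotypes=(m["hap1"], m["hap2"]),
        clone=int(m["clone"]),
        daughter=m["daughter"],  # None when the field is absent
        passage=int(m["passage"]),
        primer=m["primer"],
    )


print(parse_sanger_name("F1_A_B_6_P0_M13_uni_21.ab1"))
# SangerName(fusion=1, haplotypes=('A', 'B'), clone=6, daughter=None, passage=0, primer='M13_uni_21')
```

Sorting the parsed records by (haplotypes, clone, daughter, passage) then gives the time series of traces for each clone.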
Illumina files: e.g. 16_A_C_5_A3_P3_S17_L001_R1_001.fastq.gz, 16_A_C_5_A3_P3_S17_L001.bam, 16_A_C_5_A3_P3_S17_L001.bam.bai, 16_A_C_5_A3_P3_S17_L001.xlsx
- 16_ – LR-PCR amplicon: 16 (16 kb amplicon) or AB (for details, please see https://doi.org/10.1016/j.exer.2018.10.004)
- A_C_ – mtDNA haplotype pair (A–D), identifying the cybrid cell line
- 5_ – Clone number
- A3_ – Daughter line number (A3, E4, or F6)
- P3_ – Passage number
- S17_ – Sample number based on the order of samples in the sample sheet
- L001_ – Flow cell lane number
- R1_ – Read number in the paired-end sequencing run (1 or 2); FASTQ files only
- 001 – Constant segment; FASTQ files only
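The Illumina names can be parsed the same way for all four file types. In the sketch below, the R1/R2 and 001 segments are optional because the BAM, index, and variant-list examples omit them; the daughter-line field is also treated as optional by analogy with the Sanger names, which is an assumption. The pattern, class, and function names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

ILLUMINA_NAME = re.compile(
    r"^(?P<amplicon>16|AB)_"               # LR-PCR amplicon
    r"(?P<hap1>[A-D])_(?P<hap2>[A-D])_"    # haplotype pair = cybrid cell line
    r"(?P<clone>\d+)_"                     # clone number
    r"(?:(?P<daughter>A3|E4|F6)_)?"        # daughter line (assumed optional)
    r"P(?P<passage>\d+)_"                  # passage number
    r"S(?P<sample>\d+)_"                   # sample-sheet number
    r"L(?P<lane>\d{3})"                    # flow cell lane
    r"(?:_R(?P<read>[12])_001)?"           # read number, FASTQ files only
    r"\.(?P<ext>fastq\.gz|bam\.bai|bam|xlsx)$"
)


@dataclass(frozen=True)
class IlluminaName:
    amplicon: str
    haplotypes: Tuple[str, str]
    clone: int
    daughter: Optional[str]
    passage: int
    sample: int
    lane: int
    read: Optional[int]
    extension: str


def parse_illumina_name(filename: str) -> IlluminaName:
    """Split an Illumina FASTQ/BAM/BAI/XLSX file name into its documented fields."""
    m = ILLUMINA_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized Illumina file name: {filename!r}")
    return IlluminaName(
        amplicon=m["amplicon"],
        haplotypes=(m["hap1"], m["hap2"]),
        clone=int(m["clone"]),
        daughter=m["daughter"],
        passage=int(m["passage"]),
        sample=int(m["sample"]),
        lane=int(m["lane"]),
        read=int(m["read"]) if m["read"] else None,
        extension=m["ext"],
    )
```

Grouping parsed FASTQ names on every field except `read` pairs each R1 file with its R2 mate, and the same key (without `read`) links them to the matching BAM, index, and variant-list files.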
Nanopore files: e.g. mtDNA_A_C_5_A3_ont_2kb_q10.fastq.gz
- mtDNA_ – Mitochondrial DNA
- A_C_ – mtDNA haplotype pair (A–D), identifying the cybrid cell line
- 5_ – Clone number
- A3_ – Daughter line number (A3, E4, or F6)
- ont_ – Oxford Nanopore Technologies
- 2kb_ – Reads were filtered to retain only those longer than 2 kb
- q10 – Reads were filtered to retain only those with a quality score greater than 10
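Unlike the Sanger and Illumina names, the Nanopore names carry no passage field but do encode the filtering thresholds. A minimal Python parser for this scheme could look as follows; as with the other schemes, the daughter-line field is treated as optional, which is an assumption, and all names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

ONT_NAME = re.compile(
    r"^mtDNA_"                             # mitochondrial DNA
    r"(?P<hap1>[A-D])_(?P<hap2>[A-D])_"    # haplotype pair = cybrid cell line
    r"(?P<clone>\d+)_"                     # clone number
    r"(?:(?P<daughter>A3|E4|F6)_)?"        # daughter line (assumed optional)
    r"ont_"                                # Oxford Nanopore Technologies
    r"(?P<min_kb>\d+)kb_"                  # minimum read length, kb
    r"q(?P<min_q>\d+)"                     # minimum quality score
    r"\.fastq\.gz$"
)


@dataclass(frozen=True)
class OntName:
    haplotypes: Tuple[str, str]
    clone: int
    daughter: Optional[str]
    min_length_kb: int
    min_quality: int


def parse_ont_name(filename: str) -> OntName:
    """Split a Nanopore FASTQ file name into its documented fields."""
    m = ONT_NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized ONT file name: {filename!r}")
    return OntName(
        haplotypes=(m["hap1"], m["hap2"]),
        clone=int(m["clone"]),
        daughter=m["daughter"],
        min_length_kb=int(m["min_kb"]),
        min_quality=int(m["min_q"]),
    )
```

The (haplotypes, clone, daughter) triple is shared with the Illumina names, so it can serve as a join key between the short-read and long-read data for the same cybrid line.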