Mapping multimodal phenotypes to perturbations in cells and tissue with CRISPRmap

Cell lines and cell culture

HEK293FT cells (Thermo Fisher Scientific, R70007) were cultured in DMEM (Gibco, 11965092) supplemented with 10% heat-inactivated FBS (American Type Culture Collection (ATCC), 30-2020) and 100 U ml⁻¹ penicillin–streptomycin (Thermo Fisher Scientific, 15140163). MCF7-BE3 cells were cultured in the same medium supplemented with 2 μg ml⁻¹ blasticidin (Thermo Fisher Scientific, A1113903). HT1080–Cas9 AAVS1 (GeneCopoeia, SL512) cells were cultured in the same medium supplemented with 200 μg ml⁻¹ Hygromycin B Gold (InvivoGen, ant-hg-2). OE19–BFP cells were cultured in RPMI 1640 medium (ATCC, 30-2001) supplemented with 10% heat-inactivated FBS, 100 U ml⁻¹ penicillin–streptomycin, 1× GlutaMAX supplement (Thermo Fisher Scientific, 35050079) and 2 μg ml⁻¹ blasticidin. IMR-90 (ATCC, CCL-186) cells were cultured in EMEM (ATCC, 30-2003) supplemented with 10% heat-inactivated FBS and 100 U ml⁻¹ penicillin–streptomycin. Rockefeller University Embryonic Stem Cell Line 2 (RUES2, passages 24–32) cells were maintained on mouse embryonic fibroblasts (MEFs) (Thermo Fisher Scientific, A34180) and plated at 22,500 cells per cm². Cells were cultured in hESC maintenance media (DMEM/F12 (Thermo Fisher Scientific, 11320033), 20% knockout serum (STEMCELL Technologies), 0.2% Primocin (InvivoGen, ant-pm-05), 0.1 mM β-mercaptoethanol (Sigma-Aldrich, M6250), 20 ng ml⁻¹ FGF2 (R&D Systems, 233-FB) and 1% GlutaMAX). The medium was changed daily. hESCs were passaged every 3–4 d with Accutase (Innovative Cell Technologies, AT-104), washed and replated at a dilution of 1:24. Cultures were maintained in a humidified 5% CO₂ atmosphere at 37 °C. Lines were karyotyped and tested for mycoplasma contamination by PCR every 6 months. hESCs were infected with the virus for 24 h in hESC medium supplemented with polybrene and were then selected with puromycin. Before analysis, MEFs were depleted by passaging 5–7 × 10⁵ hESCs onto Matrigel (Corning, 354277, dilution 1:15)-coated flat glass-bottom 96-well plates.
Cells were maintained in hESC medium in a humidified 5% CO₂ atmosphere at 37 °C. For human iPSC and iMN experiments, we used the reference wild-type line KOLF2.1J (ref. ³¹) (a gift from Christopher Ricupero). iPSCs were maintained on Matrigel (Corning, 354277, dilution 1:100)-coated plates in mTeSR Plus media (STEMCELL Technologies, 100-0276), supplemented with Y-27632 (ROCKi, 10 µM, Selleckchem, S1049) during thawing, passaging and viral transduction. Passaging was performed using Accutase (Thermo Fisher Scientific, A1110501). iPSCs were transduced with the CRISPRmap library during passaging by adding viral supernatant to polybrene (8 μg ml⁻¹, Sigma-Aldrich, TR-1003-G)-supplemented media at various dilutions. Forty-eight hours later, transduced cells were selected with puromycin (1 µg ml⁻¹, Thermo Fisher Scientific, A1113802) for 3 d. For optical barcode detection, iPSCs were dissociated into single cells and plated on polyethylenimine (PEI, Sigma-Aldrich, 408719)-coated 96-well plates for imaging in mTeSR Plus with ROCKi at 10,000 cells per well. ROCKi was maintained until fixation to prevent tight colony formation and thereby simplify cell segmentation during image analysis. For coating, 96-well plates were incubated at 37 °C overnight with PEI (250 µg ml⁻¹, 50 µl per well) and then washed at least three times with 200 µl of PBS per well.

iPSC to iMN differentiation was carried out as previously described^32,33. In brief, on day 0, iPSCs were dissociated to single cells with Accutase and resuspended in N2B27 differentiation media (1:1 Advanced DMEM/F12 and Neurobasal media (Life Technologies, 12634010 and 21103049), GlutaMAX (1%, 35050061), β-mercaptoethanol (0.1%, Sigma-Aldrich), N-2 (1%, Thermo Fisher Scientific, 17502048), B-27 (2%, Thermo Fisher Scientific, 17504044) and ascorbic acid (10 μM, Sigma-Aldrich, A4403)), supplemented with ROCKi (10 µM), FGF2 (10 ng ml⁻¹, PeproTech, PHG0263), CHIR99021 (CHIR, 3 µM, Tocris, 4423), SB 431542 hydrate (SB, 20 μM, Sigma-Aldrich, S4317) and LDN193189 (LDN, 100 nM, Stemgent, 04-0074) at a density of 50,000 cells per milliliter on ultra-low adhesion dishes to promote embryoid body (EB) formation. On day 2, media were replaced and supplemented with CHIR (3 µM), SB (20 µM), LDN (100 nM), all-trans retinoic acid (RA, 100 nM, Sigma-Aldrich, R2625) and smoothened agonist (SAG, 500 nM, Millipore, 566660). On day 4, media were replaced and supplemented as on day 2. On day 7, media were replaced and supplemented with RA (100 nM), SAG (500 nM) and BDNF (10 ng ml⁻¹, PeproTech, 450-02). On day 9, media were replaced and supplemented with RA (100 nM), SAG (500 nM), BDNF (10 ng ml⁻¹) and DAPT (10 µM, Selleckchem, S2215). On day 11, media were replaced and supplemented as on day 9. On day 14, media were replaced and supplemented with RA (100 nM), SAG (500 nM), BDNF (10 ng ml⁻¹), DAPT (10 µM) and GDNF (10 ng ml⁻¹, R&D Systems, 212-GD-050). On day 16, EBs were dissociated to single cells by trituration with 0.05% trypsin (Life Technologies, 25300054).
Dissociated MNs were resuspended in hMN maintenance media (Neurobasal media, GlutaMAX (1%), NEAA (1%, Life Technologies, 11140050), β-mercaptoethanol (0.1%), N-2 (1%), B-27 (2%), ascorbic acid (10 μM), BDNF (10 ng ml⁻¹), GDNF (10 ng ml⁻¹), CNTF (10 ng ml⁻¹, PeproTech, 257-NT-050), IGF-1 (10 ng ml⁻¹, PeproTech, 291-G1), RA (1 µM) and adarotene (1 µM, MedChemExpress, HY-14808)) and plated on PEI-coated 96-well plates for imaging at 10,000 cells per well.

After CRISPRmap amplicon generation, cell type validation was performed on iPSCs and iMNs by immunostaining using anti-SOX2 (iPSC marker, 1:200, Thermo Fisher Scientific, 14-9811-82), anti-OCT4 (iPSC marker, 1:200, Cell Signaling Technology, 2840) and anti-NeuN (iMN marker, 1:200, Millipore Sigma, MAB377) antibodies. Likewise, cell type validation was performed on hESCs using anti-SOX2 (hESC marker, 1:200), anti-OCT4 (hESC marker, 1:200) and anti-Nanog (hESC marker, 1:200, Cell Signaling Technology, 4893) antibodies.

Library design and cloning

GFP-targeting guides in the GFP-targeting CRISPRmap knockout screen library (referred to as GFP-pilot) were designed with CRISPick³⁴ by selecting the top five recommended candidates in the CRISPRko mode that target the copGFP sequence. The library also contains five NTC guides that lack targets in the human genome. Each guide was combined with a universal scaffold sequence and a pair of guide-specific CRISPRmap barcode sequences. Universal 5′ and 3′ homology sequences were then added to facilitate NEB HiFi assembly into the expression vector. Full-length GFP-pilot library sequences are shown in Supplementary Table 5. Base editing guides in the DDR screen library (referred to as DDR364) were selected from the base editing screens as previously described⁷. The library contains 162 missense guides and 50 nonsense guides with a single C base in the editing window (4th to 8th base in the guide targeting sequence), 80 splice-donor or splice-acceptor (referred to as splice) guides, 35 guides targeting the AAVS1 safe-harbor site and 37 NTC guides that have minimal targets in the human genome. All the selected missense, nonsense and splice guides have false discovery rate (FDR) < 0.05 in at least one treatment in the previous screen⁷. Similarly, each guide was combined with the scaffold, CRISPRmap barcode and homology sequences. Sequences are shown in Supplementary Table 6. Both libraries were ordered as synthesized oligo pools (Integrated DNA Technologies) and PCR amplified with Q5 DNA polymerase (New England Biolabs, M0492) using an optimized two-round amplification strategy to minimize barcode–sgRNA recombination³⁵. 
In brief, oligo pools were diluted in ultrapure water (Thermo Fisher Scientific, 10-977-023); 1 pg of total DNA was added to each 50-μl Q5 reaction mix to perform the first-round amplification of 15 PCR cycles; and 0.5 μl of PCR product from each 50-μl first-round reaction was then added to each 50-μl Q5 reaction mix for the second-round amplification of 10 cycles. Final PCR product was purified with DNA Clean & Concentrator (Zymo Research, D4013). The primer pairs CRISPRmap-F and CRISPRmap-R in Supplementary Table 7 were used in both rounds. Amplified oligo pools were cloned into a modified CROPseq-puro-v2 (Addgene, 127458) vector that removed the original scaffold sequence (referred to as CRISPRmap-CROPseq) using NEBuilder HiFi DNA Assembly (New England Biolabs, E2621). Next, we electroporated into MegaX DH10B electrocompetent cells (Thermo Fisher Scientific, C640003). An average number of 300 colonies per guide was maintained to preserve the relative abundance of guides in the library. Bacterial colonies were scraped and pooled for plasmid extraction (Zymo Research, D4212).
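The single-C criterion used to select missense and nonsense guides (one C within positions 4 to 8 of the guide targeting sequence) can be sketched as a short check. This is a minimal illustration, not the authors' selection code; the function names and the example sequences are hypothetical, and positions are assumed to be 1-indexed from the 5′ end of the targeting sequence.

```python
from typing import List


def c_positions_in_window(protospacer: str, start: int = 4, end: int = 8) -> List[int]:
    """Return the 1-indexed positions of C bases inside the editing window."""
    seq = protospacer.upper()
    return [
        pos
        for pos in range(start, end + 1)
        if pos <= len(seq) and seq[pos - 1] == "C"
    ]


def has_single_c_in_window(protospacer: str) -> bool:
    """True when exactly one C falls in the editing window (positions 4-8)."""
    return len(c_positions_in_window(protospacer)) == 1


# Hypothetical 20-nt targeting sequences, for illustration only.
single_c_guide = "GATCAGTTACGGATTCAGGA"  # one C at position 4
double_c_guide = "GATCCGTTACGGATTCAGGA"  # Cs at positions 4 and 5

print(c_positions_in_window(single_c_guide), has_single_c_in_window(single_c_guide))
print(c_positions_in_window(double_c_guide), has_single_c_in_window(double_c_guide))
```

Guides with several C bases in the window can produce additional editing outcomes, which is why a check of this kind is useful before assigning one expected variant per guide.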

Lentivirus production and titer determination

293FT cells were seeded into six-well tissue culture–treated plates at a density of 100,000 cells per cm². After 24 hours, cells were transfected with pMD2.G (Addgene, 12259), psPAX2 (Addgene, 12260) and CRISPRmap library plasmid (2:3:4 ratio by mass) using Lipofectamine 3000 (Thermo Fisher Scientific, L3000001) in Opti-MEM (Thermo Fisher Scientific, 31-985-070), supplemented with 5% FBS. Media were exchanged after 6 h and supplemented with 1.5 mM caffeine (Sigma-Aldrich, C0750) to increase viral titer. Viral supernatant was harvested at 24 h and 48 h after transfection, filtered through 0.45-μm cellulose acetate filters (Corning, 431220) and stored in a −80 °C freezer in aliquots.

Lentiviral titer was determined by the colony formation assay to control the MOI in downstream studies. In brief, 10-fold serial dilutions of the lentivirus stock were prepared in complete DMEM containing 8 μg ml⁻¹ polybrene. In total, 10,000 cells were seeded into each well of a six-well plate. A total volume of 1 ml of diluted lentivirus was added to each well for 48 h. Cells were then cultured in complete DMEM supplemented with appropriate antibiotics for 14 d, and media were changed every 3 d. Cells were fixed and stained with 0.1% crystal violet (Sigma-Aldrich, V5265) for 10 min at room temperature and washed three times with PBS. Colonies in each well were counted, and the transduction units per milliliter (TU/ml) were calculated as follows: TU/ml = (number of colonies / total volume in the well (ml)) × dilution factor.
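The titer formula can be written out as a small worked example. This is a sketch only: the colony count, dilution factor and cell number are made-up values, and the second helper assumes the standard definition of MOI as transducing units per cell, which is not spelled out in the text.

```python
def titer_tu_per_ml(colonies: int, volume_ml: float, dilution_factor: float) -> float:
    """TU/ml = (number of colonies / volume in the well (ml)) x dilution factor."""
    if volume_ml <= 0 or dilution_factor <= 0:
        raise ValueError("volume_ml and dilution_factor must be positive")
    return colonies / volume_ml * dilution_factor


def virus_volume_for_moi(n_cells: int, moi: float, titer: float) -> float:
    """Volume of stock (ml) for a target MOI, assuming MOI = TU per cell."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return n_cells * moi / titer


# Hypothetical numbers: 42 colonies in a well that received 1 ml of a
# 1:10,000 dilution of the stock (dilution factor 1e4).
titer = titer_tu_per_ml(colonies=42, volume_ml=1.0, dilution_factor=1e4)
print(f"titer: {titer:.3g} TU/ml")

# Stock volume for 500,000 cells at MOI 0.1 under the assumption above.
print(f"volume: {virus_volume_for_moi(500_000, 0.1, titer):.3f} ml")
```

Counting wells from dilutions that yield well-separated colonies keeps the estimate within the range where one colony corresponds to one transduction event.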

Fluorescence microscopy

All imaging datasets were acquired using a confocal spinning disk microscope (Andor Dragonfly) coupled to a Nikon Ti-2 inverted epifluorescence microscope with automated stage control, Nikon Perfect Focus System and a Zyla PLUS 4.2-megapixel USB3 camera. Illumination was done with 100 mW 405 nm, 50 mW 488 nm, 50 mW 561 nm, 140 mW 640 nm and 100 mW 785 nm solid-state lasers. All hardware was controlled using Andor Fusion software. Lasers, laser powers, exposure times, objectives and experiment-specific acquisition parameters are summarized in Supplementary Tables 5 and 6. Images were acquired with four z-slices at 1.5-μm intervals for the cultured cells and with six z-slices at 1.5-μm intervals for the tissue sections unless otherwise specified.

Oligonucleotide fluorophore conjugation

In each 10-μl reaction, 2 μl of 0.5 mM 5′ amine-modified DNA probes (Integrated DNA Technologies) was mixed with 1 μl of 10 mM ATTO488-NHS ester (ATTO-TEC AD, 488-31), ATTO 643-NHS ester (ATTO-TEC AD, 643-31) or CF568 succinimidyl ester (Sigma-Aldrich, SCJ4600027) in 1× BBS (Thermo Fisher Scientific, 28384), pH 8.5, and incubated at room temperature for 4 h. Fluorophore-conjugated DNA probes were purified with Oligo Clean & Concentrator (Zymo Research, D4060) and diluted to 1 μM in ultrapure water, aliquoted and stored at −20 °C. Oligonucleotide sequences and fluorophores used in the GFP-targeting screen are listed in Supplementary Table 5, and the base editing screens and in vivo CRISPRmap barcode readout are listed in Supplementary Table 6.

Antibody fluorophore conjugation

In each conjugation reaction, 5 μg of antibody in PBS (BSA-free) was mixed with 1 μl of 0.33 mM CF750 Dye SE/TFP esters (Biotium, 92142), Alexa Fluor 647 NHS Ester (Thermo Fisher Scientific, A20006), Alexa Fluor 555 NHS Ester (Thermo Fisher Scientific, A20009) or Alexa Fluor 488 NHS Ester (Thermo Fisher Scientific, A20000) in DMSO and incubated at room temperature for 16 h. Fluorophore-conjugated antibodies were then purified with a 30-kDa Amicon Ultra-0.5 Centrifugal Filter Unit (Millipore Sigma, UFC5030BK). Antibodies used in the base editing screen and in vivo barcode readout are listed in Supplementary Table 6.

CRISPRmap optical pooled CRISPR knockout screen

HT1080–Cas9 AAVS1 cells were seeded into six-well tissue culture–treated plates at a density of 50,000 cells per cm². After 24 h, cells were transduced with the GFP-pilot lentiviral supernatant supplemented with 8 μg ml⁻¹ polybrene at MOI ~ 0.1. At 48 h after infection, viral supernatant was removed, and cells were treated with media containing 2 μg ml⁻¹ puromycin for 48 h and seeded onto 96-well glass-bottom plates (Cellvis, P96-1.5H-N) at 10,000 cells per well as the original seeding density. Cells were seeded at 4,000 cells per well as the sparse density to avoid extensive overlapping among cells. A total reaction volume of 50 μl was used in the following steps unless otherwise specified. After 24 h, cells were fixed in 4% paraformaldehyde (PFA) (Electron Microscopy Sciences, 15710-S) in PBS (Gibco, 10010049) for 10 min at room temperature, followed by two rinses in PBS. Cells were then incubated in 0.1 mg ml⁻¹ wheat germ agglutinin (WGA) CF770 conjugate (Biotium, 29059) and 0.5 μg ml⁻¹ DAPI (Abcam, ab285390) in PBS for 30 min at room temperature and imaged in PBS for membrane, GFP and nuclei signal using the microscope configuration described above. After phenotype imaging, cells were permeabilized with 0.2% Triton X-100 (Sigma-Aldrich, T8787) in PBS for 10 min at room temperature, followed by two rinses in PBS. The permeabilization conditions are to be determined for each new cell type, as it is one of the parameters that determines barcode detection efficiency. For primer and padlock oligo hybridization, cells in each well were incubated in the hybridization mix (GFP-pilot CRISPRmap padlock and primer mix (see Supplementary Table 5 for sequences; each oligo in the mix has a final concentration of 10 nM), 1 mg ml⁻¹ yeast tRNA (Invitrogen, 15401011) and 2× SSC, 20% formamide (v/v) in ultrapure water) for 16 h at 40 °C in a HybEZ oven (ACD, PN 321720). 
After hybridization, cells were first rinsed three times with the hybridization wash buffer (2× SSC, 20% formamide (v/v) in ultrapure water) and then washed three times for 5 min at 40 °C. Cells were then incubated in splint mix (10 nM CRISPRmap GFP-pilot splint mix (see Supplementary Table 5 for sequences; each splint oligo in the mix has a final concentration of 10 nM), 0.1% yeast tRNA, 2× SSC and 15% formamide in ultrapure water) for 30 min at 37 °C in a HybEZ oven, rinsed twice with the formamide wash buffer (2× SSC, 15% (v/v) formamide in ultrapure water) and incubated in 2× SSC in ultrapure water for 15 min at room temperature. For T4 DNA ligation, cells were incubated in ligation mix (1× T4 ligase buffer, 1% (v/v) T4 DNA ligase (Enzymatics, L6030-HC-L) in ultrapure water) for 2 h at 16 °C and then for 1 h at 25 °C in a HybEZ oven, followed by two rinses in PBS. For RCA, cells were incubated in RCA mix (1× QualiPhi buffer, 2% (v/v) QualiPhi DNA Polymerase (4basebio, 510100), dNTP mix (0.25 mM each; Thermo Fisher Scientific, R1122), and 0.02 mM 5-(3-aminoallyl)-dUTP (Thermo Fisher Scientific, AM8439) in ultrapure water) for 6 h at 30 °C. The RCA mix was then removed, and the cells were immediately fixed with 4% PFA in PBS for 10 min at room temperature, followed by three PBS washes. For readout probe hybridization, cells in each well were incubated in readout probe mix (10 nM of each readout probe (see readout probe sequences for each hybridization round in Supplementary Table 5), 2× SSC, 15% formamide in ultrapure water) for 30 min at 37 °C in a HybEZ oven. Cells were then imaged in the imaging buffer (0.5 μg ml⁻¹ DAPI, 10 μg ml⁻¹ Fungin (InvivoGen, ant-fn-1) in PBS) using the microscope configuration described above. After imaging, the cells were incubated in the stripping buffer (2× SSC, 50% formamide (v/v) in ultrapure water) for 20 min at 40 °C in a HybEZ oven and then rinsed once with formamide wash buffer.
A total of four readout hybridization rounds were performed to decode all the CRISPRmap barcodes in the GFP-pilot library. The same CRISPRmap assay and barcode detection protocol was applied to IMR-90 cells, iPSCs, iMNs and hESCs.

To quantify the sensitivity, specificity and precision of the assay, average fluorescence intensity of GFP under nuclei masks was quantified, and a threshold was determined based on the GFP intensity distribution of the cell population to classify cells into GFP⁺ and GFP⁻ categories. Standard CRISPRmap QC was performed for each cell to determine the guide identity. Specifically, we quantified the spot count for the most abundant guide-reporting barcode (max_spot) and the second most abundant guide-reporting barcode (second_max_spot) in each cell, and a purity score was calculated as: Purity = max_spot / (max_spot + second_max_spot). Guide identity was assigned to a cell (that is, the cell passed QC) only when max_spot ≥ 3 and Purity ≥ 0.66. A relaxed QC metric (max_spot ≥ 2 and Purity ≥ 0) was applied to enable a side-by-side comparison between CRISPRmap and conventional OPS. The guide identity reported by the most abundant barcode was assigned to each cell that passed QC. The ratio of cells that passed QC was calculated as the number of cells that passed QC divided by the total number of cells profiled. To calculate sensitivity, specificity and precision, we defined a true positive (TP) as a GFP⁻ cell assigned with one of the GFP-targeting guides in the GFP-pilot library; a true negative (TN) as a GFP⁺ cell assigned with one of the non-targeting guides; a false positive (FP) as a GFP⁺ cell assigned with one of the GFP-targeting guides; and a false negative (FN) as a GFP⁻ cell assigned with one of the non-targeting guides. Specificity = TN / (TN + FP); Sensitivity = TP / (TP + FN); Precision = TP / (TP + FP). QC metrics from loose to tight were applied in Supplementary Fig. 1c–e.
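The QC rule and the three performance metrics can be sketched as follows. This is an illustrative reading of the definitions in this section rather than the analysis code used in the study; the function names, the dictionary input format and the example counts are assumptions.

```python
from typing import Dict, Optional


def assign_guide(
    spot_counts: Dict[str, int],
    min_spots: int = 3,
    min_purity: float = 0.66,
) -> Optional[str]:
    """Return the guide assigned to a cell, or None if the cell fails QC.

    spot_counts maps each guide-reporting barcode to its spot count in one cell.
    Purity = max_spot / (max_spot + second_max_spot).
    """
    ranked = sorted(spot_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no barcode spots detected in this cell
    guide, max_spot = ranked[0]
    second_max_spot = ranked[1][1] if len(ranked) > 1 else 0
    purity = max_spot / (max_spot + second_max_spot)
    if max_spot >= min_spots and purity >= min_purity:
        return guide
    return None


def screen_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and precision from the four cell counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical cells: standard QC versus the relaxed QC used for the OPS comparison.
print(assign_guide({"sgGFP_1": 6, "NTC_2": 2}))              # passes (purity 0.75)
print(assign_guide({"sgGFP_1": 3, "NTC_2": 2}))              # fails (purity 0.60)
print(assign_guide({"sgGFP_1": 2, "NTC_2": 1}, 2, 0.0))      # passes relaxed QC
print(screen_metrics(tp=90, tn=95, fp=5, fn=10))
```

Sweeping `min_spots` and `min_purity` over a grid is one way to produce a loose-to-tight QC series of the kind referred to in this section.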

CRISPRmap barcode readout optimization and quantification

To optimize and quantify the CRISPRmap barcode readout, we created two HT1080 cell lines with fluorescent protein (FP) expression tethered to the CRISPRmap barcode reporting on the FP identity: one cell line expresses GFP, an NTC guide (NT_GFP) and the CRISPRmap padlock and primer hybridization sequence for Padlock003 and Primer003 (GFP_Barcode); the other cell line expresses mTurquoise2, an NTC guide (NT_mTurquoise2) and the CRISPRmap padlock and primer hybridization sequence for Padlock004 and Primer004 (mTurquoise2_Barcode). The sequences are listed in Supplementary Table 5. FPs were introduced to the CRISPRmap-CROPseq vector by replacing the puromycin resistance gene. The sgRNA–barcode (NT_GFP + GFP_Barcode, NT_mTurquoise2 + mTurquoise2_Barcode) sequences were ordered as synthesized double-strand DNA fragments (Integrated DNA Technologies) and cloned into the CRISPRmap-CROPseq vectors carrying GFP and mTurquoise2, respectively. Plasmid-EZ sequencing (Azenta Life Sciences) was performed to confirm that the sgRNA–barcode combination matched the FP expressed, before lentiviral packaging and infection of the HT1080 cells. The two cell lines were sorted by flow cytometry, mixed at a 1:1 ratio and seeded onto 96-well glass-bottom plates for genotype–phenotype mapping. Cells were fixed in 4% PFA in PBS for 10 min at room temperature and then incubated with DRAQ5 fluorescent probes (0.05 mM, Thermo Fisher Scientific, 62251) and WGA-CF770 (10 µg ml⁻¹) for 20 min at room temperature for nucleus and membrane segmentation, respectively. WGA-CF770, DRAQ5, GFP and mTurquoise2 fluorescence signals were imaged with 730-nm, 640-nm, 488-nm and 405-nm lasers, respectively. Cell permeabilization, amplicon generation, barcode readout and image analysis were performed as described in the CRISPRmap pooled CRISPR knockout screen for the GFP-pilot library.
Average fluorescence of GFP and mTurquoise2 signals under nuclei masks was quantified to classify cells into GFP⁺ and mTurquoise2⁺ categories, and the FP identity in each cell was matched to the detected barcodes to evaluate the sensitivity, specificity and precision of the assay (Supplementary Fig. 1c–e and Supplementary Table 2).

To quantify the CRISPRmap barcode readout for potential double-transduced cells, we performed lentiviral infection of the GFP-pilot library at MOIs of 0.9, 0.3, 0.1 and 0.03 on HT1080 cells. Cells were puromycin selected, seeded and profiled for barcode expression, as described above. To identify potential double-transduced cells, we first performed standard QC to identify cells with unique barcodes and then classified the remaining cells as ‘double’ if the spot count for the most abundant guide (max_spot) was ≥3 and that for the second most abundant guide (second_max_spot) was ≥2. The expected ratio of double-transduced cells after antibiotic selection at a given MOI was calculated from the Poisson distribution after removing the proportion of cells with zero infection events. The ratio of double-transduced cells detected optically and the expected ratio at each MOI are shown in Supplementary Fig. 1l and Supplementary Table 2.
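The expected fraction of multiply transduced cells among selected cells follows from a zero-truncated Poisson distribution. The sketch below is one reading of that calculation: it assumes that integration events per cell are Poisson distributed with mean equal to the MOI, that antibiotic selection removes exactly the cells with zero events, and that 'double' means two or more events. These are assumptions for illustration, and the function names are hypothetical.

```python
import math


def expected_multiple_fraction(moi: float) -> float:
    """P(k >= 2 | k >= 1) for k ~ Poisson(moi).

    Cells with k = 0 are assumed to be removed by antibiotic selection.
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = math.exp(-moi)   # probability of no integration event
    p1 = moi * p0         # probability of exactly one event
    return (1.0 - p0 - p1) / (1.0 - p0)


def is_optical_double(max_spot: int, second_max_spot: int) -> bool:
    """Optical 'double' rule, applied to cells that did not pass standard QC."""
    return max_spot >= 3 and second_max_spot >= 2


for moi in (0.9, 0.3, 0.1, 0.03):
    print(f"MOI {moi}: expected fraction {expected_multiple_fraction(moi):.4f}")
```

Under these assumptions the expected fraction falls steeply with MOI, which is consistent with running the screens at MOI of approximately 0.1.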

CRISPRmap multimodal optical pooled base editing screen

MCF7-BE3 cells were transduced with the DDR364 library in the same manner as in the GFP-targeting knockout screen with several modifications to accommodate the multiplexed immunofluorescence and RNAmap. Specifically, after puromycin selection, cells were seeded onto six-well glass-bottom plates (Cellvis, P6-1.5H-N) at a density of 50,000 cells per cm². For the DDR364-irradiation screening, after 48 h, cells were exposed to 10 Gy of ionizing radiation using the Gammacell 40 cesium source irradiator and fixed at 6 h after irradiation. For the DNA-damaging agents (DDR364-chemo) screening, cells were treated with 100 nM CPT (Sigma-Aldrich, C9911), 1 μM OLAP (Selleck Chemicals, S1060), 1 μM CISP (Sigma-Aldrich, P4394), 1 μM ETOP (Sigma-Aldrich, E1383) or untreated and fixed at 24 h after treatment. After fixation, cells were permeabilized with 0.1% Triton X-100 in PBS for 10 min on ice. Cells in each well were incubated in 1 ml of reaction mix or buffers in all steps unless otherwise specified. After permeabilization, cells were incubated in the antibody mix (2 μg ml⁻¹ rat anti-CD326 (BioLegend, 312502), 1 μg ml⁻¹ rabbit anti-RAD51-AF647 (BioAcademia, 70-012), 2 μg ml⁻¹ mouse anti-BRCA1-AF555 (Santa Cruz Biotechnology, sc-6954), 0.5 μg ml⁻¹ rabbit anti-RPA2-AF488 (Bethyl Laboratories, A300-244A) in PBS) for 1 h at room temperature and then rinsed twice with PBS. Cells were incubated in 10 μg ml⁻¹ goat anti-rat-IgG secondary antibody (Thermo Fisher Scientific, SA5-10023) for 30 min at room temperature and rinsed twice with PBS. Cells were fixed in 4% PFA in PBS for 10 min at room temperature to crosslink the antibodies to the cells, followed by two PBS rinses. Cells were then processed with padlock and primer probe hybridization, splint hybridization, ligation and RCA as described above, with the minor difference that 3 nM of each CRISPRmap padlock and primer probes and 3 nM of each RNAmap padlock and primer probes were used in the hybridization mix. 
Probe sequences are listed in Supplementary Table 6. After RCA and fixation, cells were first imaged in the imaging buffer for membrane, nuclei and nuclear foci signal using the microscope configuration described in Supplementary Table 6. After imaging, the antibody signal was bleached with 1 mg ml⁻¹ lithium borohydride (Sigma-Aldrich, 222356) and rinsed twice with PBS, before the incubation of the next round of antibodies. For both the DDR364-irradiation and the DDR364-chemo screening, a total of four antibody incubation–bleaching rounds were performed. After the last round of bleaching, eight rounds of RNAmap readout probe hybridization-stripping rounds were performed, followed by eight rounds of CRISPRmap readout probe hybridization-stripping rounds. Each round was imaged using the microscope configuration described above. Readout probe sequences and conjugated fluorophores are listed in Supplementary Table 6. For the DDR364-irradiation screening, cells were incubated in Vector TrueVIEW Autofluorescence Quenching reagent (Vector Laboratories, SP-8400-15) for 5 min at room temperature to reduce autofluorescence, followed by three rinses in PBS before imaging each CRISPRmap readout round in high DAPI imaging buffer (2.5 μg ml⁻¹ DAPI, 10 μg ml⁻¹ Fungin in PBS).

In vivo CRISPRmap barcode readout and multimodal phenotyping

After lentiviral transduction with pLV-EF1a-TagBFP2 on OE19 cells, fluorescence-activated cell sorting (Sony, MA900) was performed to obtain a BFP-expressing OE19 population. BFP-expressing OE19 (referred to as OE19–BFP) cells were lentiviral transduced with the DDR364 library and puromycin selected as described above and then expanded for 4 d in puromycin-free media. We suspended 5 × 10⁶ cells in a 1:1 mixture of Matrigel and PBS and inoculated the mixture into the flanks of nude mice (JAX, strain no. 002019). Mice were housed with a constant temperature of 21–24 °C, 45–65% humidity and a 12-h light/dark cycle. After 17 d, tumors were harvested and fresh frozen in OCT on dry ice and stored at −80 °C. Frozen tumor samples were sectioned using a cryostat microtome (Leica, CM1510S) at −20 °C into 10-μm-thick sections and deposited onto 12-well glass-bottom plates (Cellvis, P12-1.5H-N) coated with 0.1 mg ml⁻¹ poly-d-lysine (Sigma-Aldrich, A-003-E). CRISPRmap barcode readout and antibody staining were performed as described above with minor modifications. Specifically, 400 μl of reaction mix and buffers was added to each well to fully cover the tissue section. Tissue sections were fixed with 4% PFA in PBS for 15 min at room temperature and permeabilized with 0.5% Triton X-100 in PBS for 15 min at room temperature. Then, 30 nM of each CRISPRmap padlock and primer oligos was used in the hybridization mix. The same set of CRISPRmap padlock and primer probes, splints and readout probes was used as in the DDR364-irradiation screening. Eight CRISPRmap readout cycles were performed before antibody staining and bleaching cycles. The same readout probes were used as in the base editing screens.

Conventional OPS

Conventional OPS on cultured cells (HT1080 cells, IMR-90 cells, iPSCs, hESCs and MNs) was performed in accordance with the published protocol^3,4. In brief, cells were fixed and permeabilized in the same conditions with cells undergoing the CRISPRmap protocol for side-by-side comparisons. Specifically, cells were fixed in 4% PFA in PBS for 10 min at room temperature and permeabilized with 0.2% Triton X-100 in PBS for 10 min at room temperature. Reverse transcription mix (1× RevertAid RT buffer, 250 μM dNTPs, 0.2 mg ml⁻¹ BSA (New England Biolabs, B9000S), 1 μM RT primer (/5AmMC12/A + CT + CG + GT + GC + CA + CT + TTTTCAA, Integrated DNA Technologies), 0.8 U μl⁻¹ Ribolock RNase inhibitor (Thermo Fisher Scientific, EO0384) and 4.8 U μl⁻¹ RevertAid H minus reverse transcriptase (Thermo Fisher Scientific, EP0452)) was added to the cells and incubated for 16 h at 37 °C. Cells were washed five times with PBS-T and fixed with 3% PFA and 0.1% glutaraldehyde in PBS for 30 min at room temperature and then washed with PBS-T five times. Cells were incubated with the gap-fill reaction mix (1× Ampligase buffer, 0.4 U μl⁻¹ RNase H (Enzymatics, Y9220L), 0.2 mg ml⁻¹ BSA, 100 nM padlock probe (/5Phos/GTTTCAGAGCTATGCTCTCCTGTTCGCCAAATTCTACCCACCACCCACTCTCCAaaggacgaaaCACC, Integrated DNA Technologies), 0.02 U μl⁻¹ TaqIT polymerase (Enzymatics, P7620L), 0.5 U μl⁻¹ Ampligase (Lucigen, A3210K) and 50 nM dNTPs) for 5 min at 37 °C and 90 min at 45 °C, washed twice with PBS-T and then incubated with the RCA mix (1× Phi29 buffer, 250 μM dNTPs, 0.2 mg ml⁻¹ BSA, 5% glycerol, 1 U μl⁻¹ Phi29 DNA polymerase (Thermo Fisher Scientific, EP0091)) at 30 °C for 16 h. For in situ sequencing, 1 μM sequencing by synthesis primer (GCCAAATTCTACCCACCACCCACTCTCCAaaggacgaaaCACC, Integrated DNA Technologies) in 2× SSC was added to the cells for 30 min at room temperature. 
Incorporation mix (Illumina, MS-103-1003, MiSeq reagent 1) was added to the cells for 5 min at 60 °C, and the cells were rinsed five times in PR2 and then washed five times for 5 min each at 60 °C. Cells were imaged using illumination from 100 mW 405 nm (DAPI), 50 mW 488 nm (G base), 50 mW 561 nm (C base) and 140 mW 640 nm (A base) lasers. Cells were then incubated in the cleavage mix (Illumina, MS-103-1003, MiSeq reagent 4) at 60 °C for 6 min, followed by three rinses with PR2, one wash with PR2 at 60 °C for 1 min and three rinses with PR2 again, before entering the next incorporation step. Four bases were sequenced to distinguish the guide sequences in the GFP-pilot library. Sensitivity, specificity and precision were calculated based on the barcode identity and GFP expression level in each cell, as described in the CRISPRmap pooled CRISPR knockout screen for the GFP-pilot library.

Variant annotation

The sgRNA category of each guide was annotated as previously described⁷. We grouped the splice-donor and splice-acceptor categories into a ‘splice’ category. All AAVS1-targeting and non-targeting guides are annotated as a ‘control’ category. The ClinVar category was determined by querying each guide in the ClinVar database (version 2023-12-15; https://www.ncbi.nlm.nih.gov/clinvar/; RRID: SCR_006169). Nonsense and missense variants were queried based on the specific amino acid change, whereas splice variants were queried based on the nucleotide change outcomes in the editing window (base C in the 4th to 8th bases in the sgRNA targeting sequences). Note that if multiple C bases exist in the editing window, a splice guide can render other mutational outcomes, such as missense or intron variants. These mutational outcomes were not counted in the annotation of splice variants but listed as ‘Less deleterious variants’ in Supplementary Table 6. The determining criteria of the ClinVar category were established as previously described⁷. In brief, three categories were assigned to non-control guides: (1) benign/likely-benign (B/LB); (2) VUSs; and (3) P/LP. The VUS category also includes variants with conflicting interpretations. If a variant was not documented in the ClinVar database, it was listed as ‘unknown’.

Library QC and quantification by NGS

For oligo pool quantification, the first-round amplification product in the library cloning step was collected, and 0.5 μl of 50-μl PCR product was added to each 50-μl Q5 reaction mix for the second-round amplification of 10 cycles using the primer pairs CRISPRmap-F-ad and CRISPRmap-R-ad in Supplementary Table 7. We amplified 10 pg of plasmid extraction product from the library cloning step with the same two-round strategy as the oligo pool quantification for plasmid-level quantification. Genomic DNA of the cells transduced with the sgRNA library was extracted with Genomic DNA Clean & Concentrator (Zymo Research, D4010). We amplified 100 ng of genomic DNA with the same two-round strategy. We had 5 ng of the final PCR product sequenced with NGS (Azenta Life Sciences, Amplicon-EZ). sgRNA sequences in the library were aligned to the NGS reads to quantify the relative abundance of each guide in the library, and the padlock and primer hybridization sequences (barcodes) were aligned to each NGS read containing a valid sgRNA sequence to evaluate the barcode–sgRNA recombination rate. Each read with a valid sgRNA sequence was classified into ‘matched’ (sgRNA–barcode combination matched the codebook), ‘switched’ (sgRNA–barcode combination does not match the codebook), ‘loss of BC’ (no valid padlock or primer sequences detected) or ‘unallowed BC’ (unallowed padlock and primer combination detected) category. The results are shown in Supplementary Fig. 1j and Supplementary Table 2.
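The four read categories can be expressed as a simple decision rule. The sketch below is illustrative only: it assumes that each read has already been parsed into an sgRNA, a padlock and a primer identifier (None when no valid sequence is found), that a read missing either barcode half counts as 'loss of BC', and that 'switched' means a pair that belongs to a different guide in the codebook. The codebook entries and names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Barcode = Tuple[str, str]  # (padlock ID, primer ID)


def classify_read(
    sgrna: str,
    padlock: Optional[str],
    primer: Optional[str],
    codebook: Dict[str, Barcode],
) -> str:
    """Classify one read that contains a valid sgRNA sequence."""
    if padlock is None or primer is None:
        return "loss of BC"
    pair = (padlock, primer)
    if pair == codebook[sgrna]:
        return "matched"
    if pair in set(codebook.values()):
        return "switched"      # valid pair, but assigned to another guide
    return "unallowed BC"      # pair that no guide in the codebook uses


# Hypothetical two-guide codebook and parsed reads.
codebook = {"sgA": ("Padlock001", "Primer001"), "sgB": ("Padlock002", "Primer002")}
reads = [
    ("sgA", "Padlock001", "Primer001"),
    ("sgA", "Padlock002", "Primer002"),
    ("sgB", "Padlock002", None),
    ("sgB", "Padlock001", "Primer002"),
]
tally = Counter(classify_read(s, pl, pr, codebook) for s, pl, pr in reads)
print(dict(tally))
```

Dividing the 'switched' count by the number of reads with a valid sgRNA gives one estimate of the barcode–sgRNA recombination rate at each stage (oligo pool, plasmid and genomic DNA).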

Base editing screen hits validation

Individual sgRNAs with the same guide and scaffold sequences as used in the base editing screen were ordered as synthesized double-strand DNA fragments (Integrated DNA Technologies) and cloned onto the CRISPRmap-CROPseq vector. As described in the base editing screening, cells transduced with individual sgRNAs were selected for 2 d in puromycin and cultured for 2 d before ionizing radiation. Six hours after irradiation, cells on the glass-bottom plates were fixed for immunostaining of the same panel of nuclear foci imaged in the screen. Cells on tissue culture plates were harvested for immunoblotting. Genomic DNA was extracted from the untreated cells with QuickExtract DNA Extraction Solution (Lucigen, QE09050) at the same timepoint for evaluating base editing efficiency of the individually transduced guides. PCR amplification was performed on the genomic locus of the intended base edit (primer sequences listed in Supplementary Table 7) using Q5 DNA polymerase (New England Biolabs, M0492), followed by Sanger sequencing (Azenta Life Sciences). ICE analysis (Synthego Performance Analysis, ICE Analysis, 2019) was performed on the Sanger sequencing results to quantify the in-window and out-of-window editing efficiency (Supplementary Fig. 12a).

Immunoblotting

Cells transduced with individual sgRNAs were selected for 2 d in puromycin and cultured for 2 d before collection as described in the base editing screening. Cells treated with siRNAs were subjected to reverse siRNA transfection using firefly (FF) siRNA, BRCA1 siRNA or BRCA2 siRNA at 20 nM and Lipofectamine RNAiMAX (Thermo Fisher Scientific, 13778075) as per the manufacturer’s instructions. Cells were trypsinized, washed and resuspended in sample buffer (0.1 M Tris, pH 6.8, 4% SDS, 12% β-mercaptoethanol) at a density of 20,000 cells per microliter. Subsequently, samples were sonicated for 10 s twice and boiled at 95 °C for 5 min before gel electrophoresis. After gel electrophoresis, proteins were transferred onto nitrocellulose membranes. Proteins were detected using the appropriate primary and HRP-conjugated secondary antibodies at a 1:10,000 dilution. Primary antibodies used in this study included mouse anti-BRCA1 (Santa Cruz Biotechnology, sc-6954, 1:100), rabbit anti-phospho-KAP1 (Bethyl Laboratories, A700-013, 1:1,000), rat anti-tubulin (Novus Biologicals, NB 600-506, 1:50,000) and mouse anti-BRCA2 (Millipore, OP95, 1:1,000).

RNA sequencing

Gamma-irradiated and untreated MCF7-BE3 cells were prepared in parallel to the cells profiled in the DDR364-irradiation screen. Six hours after irradiation, total RNA was extracted with a Quick-RNA Microprep Kit (Zymo Research, R1051), and mRNA was isolated with an NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, E7490L). RNA integrity number (RIN) was quantified with an RNA Pico 6000 assay (Agilent, 5067-1513) on a Bioanalyzer (Agilent 2100, G2939BA). DNA libraries for NGS were prepared with an NEBNext Ultra II RNA Library Prep Kit for Illumina (New England Biolabs, E7775) and NEBNext Multiplex Oligos for Illumina (Unique Dual Index UMI Adaptors RNA Set 1) (New England Biolabs, E7416). DNA libraries were quality checked with a DNA 1000 assay (Agilent, 5067-1504) on the Bioanalyzer and then sequenced on a MiSeq platform (Illumina) with a 5% spike-in of PhiX (Azenta Life Sciences, Sequencing-Only). Four replicates were sequenced, and the average transcripts per million (TPM) reads was calculated for the transcripts that we profiled optically with RNAmap.

CRISPRmap and RNAmap primer and padlock probe design

The gene-specific target probes for RNAmap are designed for specificity and minimized off-target binding, conforming to SeqFISH methodologies^28,36, using the FISHprobe R package (version 0.4.1; https://github.com/stevexniu/fishprobe). For gene selection and probe extraction, we selected highly expressed gene isoforms from the Human GTEX V8 (ref. ³⁷) and Mouse ENCODE³⁸ tissue expression datasets for probe design. Probe sequences, 20–30 nucleotides in length, were derived from the coding sequence (CDS) and, where necessary, from the untranslated regions (UTRs). We targeted a GC content range of 45–65% or 30–70% for the targeting probes, excluding those with unsuitable GC content or sequences prone to forming homopolymeric runs (such as G-quadruplexes) to maintain optimal hybridization characteristics.
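As an illustration of the length, GC content and homopolymer criteria, the following minimal sketch filters candidate probe sequences. It is not the FISHprobe implementation: the function names, the default 45–65% GC range and especially the maximum homopolymer run length (`max_run=4`, meaning runs of five or more identical bases are rejected) are assumptions made for the example.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(seq, gc_range=(0.45, 0.65), length_range=(20, 30), max_run=4):
    """Apply length, GC content and homopolymer-run filters to one candidate probe."""
    if not length_range[0] <= len(seq) <= length_range[1]:
        return False
    if not gc_range[0] <= gc_fraction(seq) <= gc_range[1]:
        return False
    # Reject any base repeated more than max_run times in a row (assumed cutoff).
    return re.search(r"(.)\1{%d,}" % max_run, seq.upper()) is None


# Hypothetical candidates: only the first passes all three filters.
candidates = ["ACGTACGTACGTACGTACGT", "AAAAAAAAAATTTTTTTTTT", "ACGTTTTTTACGGCACGGCA"]
kept = [c for c in candidates if passes_filters(c)]
```

Passing `gc_range=(0.30, 0.70)` would correspond to the relaxed GC window mentioned in the text.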

Specificity and off-target minimization: local BLAST³⁹ searches against human and mouse mRNA sequence databases identified probes with off-targets, particularly those with alignments exceeding 10–15 nucleotides with unrelated genes in the transcriptomes and the repetitive DNA using repetitive masks. Tissue-specific expression data from human³⁷ or mouse³⁸ were pivotal in developing a gene copy number table for each tissue type, which informed the exclusion of probes with off-target copy numbers exceeding 15–20 logTPM. For thermal stability and structural integrity, to refine the probe pool by optimizing GC content for enhanced binding affinity, an iterative selection process was employed. Probes were initially ranked in ascending order of their deviation from the target GC content of 55%, starting with the probe exhibiting the greatest deviation. This arrangement continued until no overlapping probes remained. Subsequently, the selection process took into account the calculated melting temperatures (Tm)⁴⁰. For secondary structure predictions, including pseudoknots, the analysis was conducted under specific conditions: a sodium ion concentration of 0.33 M (equivalent to 2× SSC) and 50% formamide at 37 °C⁴⁰. Probes with an equilibrium stability lower than 20% were excluded to ensure the formation of stable and specific duplexes. For final probe set selection, the finalized probe set, consisting of 28–32 probes per gene, was optimized to minimize spatial overlap, allowing a maximum of five nucleotides of overlap between adjacent probes. Probes were subjected to stringent filters for equilibrium, and free energy, to refine the probe library. Local BLAST searches within the probe pool identified and mitigated potential cross-hybridization between the selected probes. Genes with insufficient probe numbers were curated manually using a genome browser to guarantee thorough coverage.

CRISPRmap and RNAmap readout probe, CRISPRmap padlock and primer oligo design

For probe generation and off-target screening, starting with a base of 240,000 25mer probes⁸, we generated all possible 20mer sequences sequentially. Each of these subsequences was subjected to BLAST screening against human and mouse transcriptomes to exclude any probes with off-target complementarity, and the resulting pool was, thus, reduced to only those probes with zero off-target hits. To optimize the performance of the readout probes, we calculated their melting temperatures (Tm) and secondary structure predictions³⁹ to refine our selection further, similar to the aforementioned target probes. This ensures that each probe binds to its intended target with high affinity and that the thermal profiles are suitable for our experimental conditions. In scenarios with high mRNA expression, it is vital to prevent overcrowding within any single fluorescence channel. Additionally, based on expression levels in the targeting tissue^37,38, we curated the probe sets and their corresponding fluorophores and distributed the signal across multiple channels, promoting distinct visualization of each mRNA molecule. To further minimize the risk of cross-hybridization, we conducted an analysis of readout probe sequences for potential overlaps by performing a local BLAST search against the readout probe pool. This effort led to the identification of 226 20mer DNA sequences, as outlined in Supplementary Table 7, which provides details for each probe, including the 20-nucleotide probe sequence, unique identifier, off-target information, melting temperatures (Tm) and secondary structures. For codebook construction: by employing a Hamming distance approach, similar to the HDM4 code used in MERFISH¹⁹, a codebook was constructed with 36 of the aforementioned 226 20mer readout probes. 
This codebook consists of 319 36-bit codes allocated over 12 hybridization rounds across three channels (488 nm, 561 nm and 640 nm), ensuring that each readout probe would have a unique signature, reducing the possibility of channel crosstalk and fluorescence overlap. This approach aims to enable differentiation of probes even in densely labeled samples, where multiple mRNA molecules are in close proximity. The detailed codebook design is provided in Supplementary Table 7, which includes details such as binary code assignments for each hybridization round and optical channel, indices, a conversion table that relates binary codes to specific probes and sequences linked to each code across channels. For CRISPRmap readout probes, we selected 24 20mers from the aforementioned 226 20mer list. This selected set of 24 probes was split into two sets of 12 probes for the detection of padlocks and primers, respectively. Splint sequences consist of the reverse complement of the primer readout sequences, with an additional universal two bases added at the 5′ end (‘GT’) and the 3′ end (‘AC’) in an attempt to avoid ligation efficiency biases between different splint oligos. Two sets of 54 30mer encoding sequences were generated with similar criteria as the 20mer list and used as padlock or primer encoding sequences. The sequences of these oligos are listed in Supplementary Table 7.
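Two of the design rules above lend themselves to a short sketch: building a splint from a primer readout sequence (reverse complement flanked by 'GT' at the 5′ end and 'AC' at the 3′ end) and checking the minimum pairwise Hamming distance of a binary codebook. The function names and the toy inputs are hypothetical; the actual sequences and codes are those listed in Supplementary Table 7.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def make_splint(primer_readout):
    """Splint = 5'-GT + reverse complement of the primer readout sequence + AC-3'."""
    return "GT" + reverse_complement(primer_readout) + "AC"


def hamming(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def min_hamming_distance(codes):
    """Smallest pairwise Hamming distance in a codebook of bit strings."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


# Toy inputs for illustration only.
toy_splint = make_splint("AACG")
toy_distance = min_hamming_distance(["1100", "0011", "1010"])
```

A codebook intended to tolerate single-bit errors, as in the MERFISH HDM4 scheme, would be expected to return a minimum distance of at least 4 from such a check.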

Image processing and analysis

Image storage and stitching

All microscopy images were acquired using Fusion software and saved as IMS files. Each IMS file stores the image as a five-dimensional object in the following order: Resolution, Channel, Z, Y and X. All image montages were stitched using Fusion’s stitching software. For ×60 images, the high-speed setting was used to stitch the image montage and saved as IMS files. For the ×20 images, the high-quality setting using default parameters was used to stitch the image montage and saved as IMS files.

Image rescaling

All images were uploaded to a Google Cloud virtual machine for further image processing and analysis. To read the IMS file and the corresponding metadata, we used the ‘imaris_ims_file_reader’ package. All ×60 montages were analyzed at resolution 3 (1/8 scale of original image), and, for all ×20 images, we used resolution 1 images (1/2 scale of original image). All images were max-projected along the z axis. All max-projected images are three dimensional with the dimensions being Channel, Y and X. The images are in numpy array format and of uint16 data type. Our imaging protocol involves imaging cells at different magnifications based on the resolution of images required. If a particular imaging round was imaged at a different magnification, the images of this round will be of a different size and have a different pixel pitch (pixel-to-micron ratio). To accommodate for this, images were scaled to achieve a consistent pixel pitch using cv2 resize with a bicubic interpolation function. This also ensured that images from all imaging rounds were the same dimensions across X and Y.

Image registration

We register images to a reference image. Across the multiplexed imaging rounds, there are global translational shifts (that is, misalignment of the glass-bottom well plate) as well as local translational shifts (that is, cells slightly shifting between imaging rounds). To finely align the images across all imaging rounds, we calculated the transformation matrices for each round using the TV-L1 (ref. ²¹) implementation of optical flow on binary nuclei masks derived from DAPI stains. Optical flow calculates Y,X vector shifts across the images for every pixel and performs pixel-level registration. The transformation matrix was then applied to all image channels of that imaging cycle. During registration, all images are converted from uint16 to float64. The images are then converted back to uint16 to reduce the memory usage and speed up image processing. All registered images are three dimensional with the dimensions being Channel, Y and X. Registration quality was estimated using cross-correlation. It is expected that the cross-correlation would decrease with increasing montage size. For our 30 × 30 ×60 montages and 10 × 10 ×20 montages, a cross-correlation of greater than 0.75 was considered good.

Segmentation of cell and nuclear boundaries

To assign detected guides to each cell and quantify nuclear antibody stains, we segmented both the cytoplasmic area and the nuclear area of each cell using Cellpose⁴¹. This process was broken into three steps: pre-processing, segmentation and filtering. The EPCAM and DAPI stains were pre-processed by thresholding to maximize the dynamic range of the plasma membrane and nuclear stains in the base editing screening. For other cultured cells, WGA-CF770 (Biotium, 29059) was used for membrane segmentation. For tissue sections, the membrane segmentation was performed on the E-cadherin staining. Typically, this involved setting pixels below the 2nd percentile to 0 and pixels above the 98th percentile to 255.

After pre-processing, images were segmented twice with Cellpose, first to identify cytoplasmic areas and second to identify nuclear areas. The cytoplasmic segmentation run was performed on an image stack containing the EPCAM and DAPI stain and excluded any cytoplasm mask smaller than 5 pixels, whereas the nuclear segmentation was performed only on the DAPI stain with no minimum size requirement. The cell diameter parameter for Cellpose was determined by hand counting and averaging the width and height of 10 randomly sampled cells in pixels. This value was multiplied 1.5× for the cytoplasmic segmentation and 0.75× for the nuclear segmentation.

Once the nuclear and cytoplasmic masks were generated, we filtered out nuclear masks that did not overlap with a cytoplasmic mask and cytoplasmic masks that did not contain a nuclear mask. This ensured that each segmented cytoplasm had one associated segmented nucleus and vice versa. The coordinates for each nuclear and cytoplasmic pair were relabeled with the nuclear ID, which was used as the cell ID from this point onwards. Segmentation quality was validated by both quantifying the percent of proposed cytoplasmic masks retained after filtering for segmented nuclei overlap and visually inspecting the images.

CRISPRmap and RNAmap amplicon detection

To detect amplicons corresponding to CRISPRmap, all registered images corresponding to CRISPRmap readout rounds were processed as follows. Each two-dimensional image (Y and X) underwent contrast stretching to improve the signal-to-noise ratio using the skimage rescale_intensity function. Images were then stored in a list in the order of the readout round and channel (R1-ch1, R1-ch2, R1-ch3, R2-ch1…), with R1 being the first readout round and ch1 being the longest wavelength channel. For each image (readout round and channel combination), spots were identified by using the skimage implementation of the difference of Gaussians method using parameters that maximized the barcode recovery. This implementation outputs an array of coordinates of all spots identified. All the coordinates for spots identified were searched against the cell masks (from cell segmentation), and any spot outside cell masks was discarded from further analysis. Furthermore, if the number of spots within a cell mask was less than 3, then the spots within the cell mask were also discarded from further analysis. This was done to reduce the noise/error in spot detection. All the spots retained for a given round–channel image were stored in an array. This was repeated for each cycle–channel image, and the array of spots retained was stored in a list (with the order being spots for R1-ch1, R1-ch2, R1-ch3, R2-ch1…). Another array was created combining all the retained spots across all imaging round–channel combinations. To eliminate duplicates in the merged array, we used the np.unique function and discarded spots within a radius of 2 pixels. Then, for each spot coordinate in the merged array, we compared the distance of the spot with all the spots detected in a single round–channel combination. If the spot was within a 2-pixel radius, we marked the given round–channel combination as positive. This was done for all rounds and channels, and, by doing so for each spot, a ‘spot code’ was generated. 
A spot code essentially records, for a given spot, which round and channel combinations also contain that spot. Once the spot code was generated for all the spots, the spot code for every spot was compared to the predefined barcode designed for every guide. If a spot code matched a barcode, the spot was assigned to the barcode. If a spot did not map to any barcode corresponding to a guide, then the spot was discarded from further analysis. Spot calling was optimized to maximize spots that are assigned to barcodes of the guide library.
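The spot-code construction and barcode matching can be sketched as follows. This is a simplified illustration rather than the analysis code: spots are assumed to be (y, x) tuples, `spots_per_image` is assumed to be the ordered list of retained spots for each round–channel image, the codebook is assumed to map a binary tuple to a guide name, and 'within a 2-pixel radius' is interpreted as a Euclidean distance of at most 2 pixels.

```python
import math


def spot_code(spot, spots_per_image, radius=2.0):
    """Binary tuple with one entry per round-channel image (1 = a spot lies within radius)."""
    y, x = spot
    return tuple(
        int(any(math.hypot(y - yy, x - xx) <= radius for yy, xx in image_spots))
        for image_spots in spots_per_image
    )


def assign_barcodes(merged_spots, spots_per_image, codebook, radius=2.0):
    """Map each merged spot to a guide; spots whose code matches no barcode are dropped."""
    assigned = {}
    for spot in merged_spots:
        guide = codebook.get(spot_code(spot, spots_per_image, radius))
        if guide is not None:
            assigned[spot] = guide
    return assigned


# Toy example with three round-channel images and one hypothetical guide.
toy_images = [[(10, 10)], [(11, 10)], [(50, 50)]]
toy_merged = [(10, 10), (50, 50)]
toy_codebook = {(1, 1, 0): "guide_A"}
toy_assignment = assign_barcodes(toy_merged, toy_images, toy_codebook)
```

In the toy example, the spot at (10, 10) is seen in the first two images and is assigned to the hypothetical guide, whereas the spot at (50, 50) yields a code that is absent from the codebook and is discarded.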

Finally, each cell was assigned a barcode based on the spot identity underneath the cell mask, according to the standard CRISPRmap QC as described above. The barcode identity of the cell was stored in a dictionary as well as in the format of an image mask.

Foci and micronuclei detection, cell cycle determination and quantification of antibody stains

For each cell, the sum intensity underneath a given cell mask was calculated for a given antibody stain and stored in a dictionary. The sum intensity underneath the nuclear mask was also calculated and stored in the dictionary. The average intensity of each antibody stain was also calculated by dividing the sum intensity by the total number of pixels underneath the cell/nuclear mask. The raw images then underwent contrast stretching using the rescale_intensity function of skimage. After rescaling the images, foci detection for RAD51, BRCA1, RPA2, γH2AX, 53BP1 and RAD18 was done using the skimage difference of Gaussians method. The total number of foci within the cell mask/nuclear mask was also stored in the dictionary.

To detect and quantify the presence of micronuclei, for every cell mask in the image, the nuclear mask was retrieved. If there was a nuclear mask of less than 100 pixels in area, then the nuclear mask was separately annotated as a small nucleus.

For micronuclei detection, we first performed nuclei segmentation on the DAPI stain using Cellpose. Nuclei masks were generated to define the outline of each nucleus in the image. All the nuclei with a size of less than 100 pixels were marked as ‘micronuclei’. This threshold was determined by manually inspecting the micronuclei captured by the nuclei segmentation. We then subtracted the DAPI signal underneath all nuclei masks by changing the intensity value to 0 for all pixels outlined by the nuclei masks. We removed the background DAPI signal by changing the intensity value to 0 for any pixel with an intensity value lower than 110. To completely remove the residual DAPI signal coming from the DAPI staining of the cell nuclei, we dilated each nuclei mask by 2 pixels using cv2 and then changed the intensity value to 0 for all pixels outlined by the dilated nuclei masks. We finally performed spot calling to identify micronuclei. Based on the coordinates of the spots, the number of spots within a cell mask were identified and included in the dictionary as the number of micronuclei within a cell mask.

Multiplexed immunostaining on cell cycle phase marker proteins was performed to distinguish cells in different cell cycle phases (that is, G0, G1, S/G2 and M phases). Three antibodies were used for cell cycle phase classification—0.5 μg ml⁻¹ rabbit monoclonal anti-Ki-67 (Cell Signaling Technology, 34330), 1 μg ml⁻¹ rabbit monoclonal anti-cyclin A2 (Cell Signaling Technology, 29113SF) and 0.5 μg ml⁻¹ rabbit monoclonal anti-phospho-histone H3 (Cell Signaling Technology, 3475)—and 1 μg ml⁻¹ rabbit monoclonal anti-cyclin B1 (Cell Signaling Technology, 65173SF) was included to further validate the specificity of the cyclin A2 staining. In brief, cell cycle marker signals were quantified for each cell by calculating the average fluorescence intensity under the nuclear mask. The distribution of the average nuclear intensity of each marker was plotted, and a threshold was set to divide cells into positive or negative populations for that marker (shown in Supplementary Fig. 6g). First, cells in the Ki-67⁻ population were classified as G0-phase cells, and cells positive in phospho-histone H3 were classified as M-phase cells; and then cells in the cyclin A2⁺ population were classified as S/G2-phase cells, and the remaining cells were classified as G1-phase cells.
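The classification order described above can be written as a short decision rule. The sketch below is illustrative only: the dictionary keys and threshold values are hypothetical, and it assumes that the Ki-67 rule takes precedence when a cell is both Ki-67 negative and phospho-histone H3 positive, a case the text does not specify.

```python
def classify_phase(mean_nuclear_intensity, thresholds):
    """Assign a cell cycle phase from average nuclear marker intensities."""
    positive = {marker: mean_nuclear_intensity[marker] > cutoff
                for marker, cutoff in thresholds.items()}
    if not positive["Ki-67"]:
        return "G0"       # Ki-67 negative (assumed to take precedence)
    if positive["pH3"]:
        return "M"        # phospho-histone H3 positive
    if positive["cyclin A2"]:
        return "S/G2"     # cyclin A2 positive
    return "G1"           # remaining cells


# Hypothetical thresholds and one hypothetical cell.
toy_thresholds = {"Ki-67": 100, "pH3": 200, "cyclin A2": 150}
toy_cell = {"Ki-67": 400, "pH3": 50, "cyclin A2": 300}
toy_phase = classify_phase(toy_cell, toy_thresholds)
```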

Statistical analysis on foci features in the base editing screen

For each foci feature that we acquired under each treatment condition in the base editing screen, we calculated a P value based on the Kolmogorov–Smirnov test between the foci count distribution in cells assigned with a given guide identity and all cells assigned with AAVS1-targeting or NTC guide identities (control cells), and then we calculated the L2FC between the average foci count in cells assigned with a given guide identity and the average foci count in all control cells. The P_adj was calculated by the Benjamini–Hochberg method. Statistical significance in foci number change is defined by P_adj < 0.05 and absolute L2FC > 0.5. Volcano plots were generated to visualize the L2FC and P_adj distribution for a given foci feature under a given treatment (Fig. 4g,h,j,k and Supplementary Fig. 8g,h,j,k). We evaluated the foci optical features that we acquired over the ionizing radiation and DNA-damaging agents treatments for each gene variant in the library and counted in how many features a variant resulted in statistically significant change compared to control cells. We tested the number of significant optical features scored by variants in different ClinVar categories with a two-sided Mann–Whitney test (Supplementary Fig. 8m). We also investigated the guides in our library that can lead to the same intended amino acid change. We compared the Pearson correlation of L2FC across all foci features in guides leading to same amino acid changes and guides leading to different amino acid changes with a two-sided independent t-test (Supplementary Fig. 8n).
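The hit-calling rule can be sketched in a few lines. The example below is a minimal pure-Python illustration, assuming the per-guide Kolmogorov–Smirnov P values have already been computed elsewhere; it shows the L2FC of mean foci counts, a Benjamini–Hochberg adjustment and the significance rule (P_adj < 0.05 and absolute L2FC > 0.5). Function names and input values are hypothetical.

```python
import math


def log2_fold_change(guide_counts, control_counts):
    """L2FC between the mean foci count of a guide and of the control cells."""
    mean_guide = sum(guide_counts) / len(guide_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return math.log2(mean_guide / mean_control)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_hit(p_adj, l2fc, alpha=0.05, min_abs_l2fc=0.5):
    """Significance rule used for foci number changes."""
    return p_adj < alpha and abs(l2fc) > min_abs_l2fc


# Hypothetical inputs.
toy_padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
toy_l2fc = log2_fold_change([2, 4], [1, 2])
```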

Sample size analysis for base editing screen hits

In the DDR364 library, we included 72 control (non-targeting or AAVS1-targeting) guides. In the irradiation dataset, we profiled 20,029 control cells over these 72 control guides, whereas the median cell number of a perturbation guide is 281 cells. Therefore, the control population size is roughly 72-fold of a guide on average. In S/G2-phase cells, the median number of cells with perturbation guides is 99 cells. Here, we selected S/G2-phase cells from the top four screen hits of RAD51 foci and BRCA1 foci, respectively, to perform the random sampling analysis. Given the fact that if we vary the number of cells that we image, the size of the control population will still be roughly 72-fold of a given guide, we performed random sampling on the control cells accordingly to maintain this ratio. For each guide, we randomly sampled with replacement for n = 20, 40, …, 200 cells, whereas we sampled n × 72 cells from the control cells in the S/G2 phase, and the P value was calculated by the Kolmogorov–Smirnov test, with statistical significance determined by P < 0.05. The result is shown in Supplementary Fig. 9b,d.

Statistical analysis on co-localized foci features for base editing screen

All co-localization analysis was performed on the irradiated dataset using images at resolution 1 (1/2 scale of the original image), and foci for the six antibody stains (BRCA1, RAD51, γH2AX, 53BP1, RAD18 and RPA2) were detected using the skimage blob_dog method. To account for minor shifts that may not be corrected by registration, foci were considered to co-localize with other foci if the centroids were within a 4-pixel Euclidean distance. This was performed for all 15 (6 choose 2) pairwise combinations of the six antibody stains, and the number of co-localized foci for each pair was then calculated for each single cell that has been assigned with a guide identity (that is, passed QC) by mapping the co-localized foci to its cell nucleus mask.
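The distance criterion and the enumeration of marker pairs can be sketched as below. This is an illustration under assumptions: foci are represented as (y, x) centroid tuples for a single nucleus, and the number of co-localized foci for a pair is assumed to be the number of foci of the first marker that have at least one focus of the second marker within 4 pixels. The marker label 'gammaH2AX' stands in for γH2AX.

```python
import math
from itertools import combinations

MARKERS = ["BRCA1", "RAD51", "gammaH2AX", "53BP1", "RAD18", "RPA2"]


def count_colocalized(foci_a, foci_b, max_dist=4.0):
    """Number of foci in A with at least one focus in B within max_dist pixels."""
    return sum(
        any(math.hypot(ya - yb, xa - xb) <= max_dist for yb, xb in foci_b)
        for ya, xa in foci_a
    )


def pairwise_colocalization(foci_by_marker, max_dist=4.0):
    """Co-localized foci counts for every unordered pair of markers in one nucleus."""
    return {
        (a, b): count_colocalized(foci_by_marker[a], foci_by_marker[b], max_dist)
        for a, b in combinations(foci_by_marker, 2)
    }


# Six markers give 15 unordered pairs.
n_pairs = len(list(combinations(MARKERS, 2)))

# Hypothetical foci centroids for two markers in one nucleus.
toy_counts = pairwise_colocalization({"BRCA1": [(0, 0), (10, 10)],
                                      "RAD51": [(3, 0), (30, 30)]})
```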

To determine if the number of co-localized foci observed within the nucleus could be by random overlap of two foci markers, we fixed the coordinates of the first foci marker and permuted the foci for the second marker 10,000 times inside the nucleus. Each time, we calculated the number of co-localized foci using the method described above. The results are shown in Supplementary Fig. 16a. To determine if there was any relevance of cell cycle on the co-localization of foci, we calculated the differences in abundance between a co-localized foci at the S/G2 phase and the G1 phase. The results are shown in Supplementary Fig. 16b. Finally, to calculate the proportion of any given foci co-localizing with another foci, we calculated the proportion of foci co-localized using the formula mean(number of co-localized foci A − B / min(number of co-localized foci A, number of co-localized foci B) per cell). The results are shown in Supplementary Fig. 16c,d.

Void analysis of tissue images

Voids were identified by finding the outermost low-intensity contours where the E-cadherin stain intensity was equal to 113. To set a minimum size for voids, contours with more than 100 boundary pixels were retained as tissue voids. For each retained contour, we gathered the intensity values for the anti-mouse Cd31 stain and the anti-human cleaved PARP1 stain within 20 pixels of the edge of the contour (equidistant inside and outside), which we term ‘boundary stains’. We calculated the 90th percentile of the Cd31 and cPARP boundary stains, and we classified voids as mouse vasculature if the Cd31 value was higher than 113 or as cell death if the cPARP value was higher than 104. Voids negative in both were left unclassified, and one void was classified as double positive.
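The void classification rule can be illustrated with a minimal sketch that takes the boundary stain intensities of one void and applies the two cutoffs stated above. The function names are hypothetical, and the percentile is computed with linear interpolation between data points, which is an assumption about how the 90th percentile was obtained.

```python
from statistics import quantiles


def percentile_90(values):
    """90th percentile with linear interpolation (inclusive method)."""
    return quantiles(values, n=10, method="inclusive")[-1]


def classify_void(cd31_boundary, cparp_boundary, cd31_cutoff=113, cparp_cutoff=104):
    """Classify one void from its Cd31 and cleaved PARP1 boundary stains."""
    vascular = percentile_90(cd31_boundary) > cd31_cutoff
    dying = percentile_90(cparp_boundary) > cparp_cutoff
    if vascular and dying:
        return "double positive"
    if vascular:
        return "mouse vasculature"
    if dying:
        return "cell death"
    return "unclassified"


# Hypothetical boundary intensities for one void.
toy_label = classify_void([120] * 50, [50] * 50)
```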

Clonal purity calculations on tissue images

Following recently reported clonality analysis of barcoded cells²⁷, the clonal score was determined in a cell-centric manner by calculating the local clustering coefficient. In brief, we constructed a 10-nearest neighbors graph for each cell and assigned a P value to each neighborhood by comparing the cell’s same guide clustering coefficient to a table of homotypic clustering coefficients from randomly arranged neighborhoods. This P value was corrected using Bonferroni correction. Cells with P_adj < 0.05 were then plotted along with the same guide cells in the 10-nearest neighbor graph to identify clonal regions with significantly higher clustering coefficient.

Bootstrapped Wasserstein distance

We computed a bootstrapped Wasserstein distance to measure the foci expression deviation from perturbation to control guides. We denote $X_{\{g\},j}\in Z^{|g|}$ and $X_{\{c\},j}\in Z^{|c|}$ as cells undergoing a specific perturbation or control, respectively, where $|g|$ and $|c|$ refer to the corresponding number of cells in each condition. For each perturbation $g_i\in [[G]]$ and feature $j\in$ {RAD51, BRCA1, RPA2, γH2AX, 53BP1, RAD18}, we computed $W(g_i,j)=\frac{1}{N}\sum_{n=1}^{N}W_1\left(X_{\{g_i^{(1)},\ldots,g_i^{(S)}\},j},\,X_{\{c\},j}\right)$ as the average 1-Wasserstein distance between guide i and control across $S=50$ samples with $N=200$ iterations, where, in each iteration, $S$ cells $\{g_i^{(1)},\ldots,g_i^{(S)}\}\subseteq\{g_i\}$ under guide i are randomly sampled without replacement. As a baseline control, we also computed $W(c,j)=\frac{1}{N}\sum_{n=1}^{N}W_1\left(X_{\{c^{(1)},\ldots,c^{(S)}\},j},\,X_{\{c\},j}\right)$ as the average 1-Wasserstein distance between randomly subsampled control cells and the full control cell set. The choice of bootstrapping cells is to mitigate the bias introduced by noticeable sample size differences across guides. We report the bootstrapped distances across perturbation guides and the control baseline with violin plots (Supplementary Fig. 3j,k). Highlighted guides were chosen from the aforementioned significant hits with absolute L2FC > 0.5 and Kolmogorov–Smirnov test P_adj < 0.05 under Benjamini–Hochberg correction.
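A minimal sketch of this computation for a single feature is given below. It is written in pure Python for illustration and is not the code used in the study: the 1-Wasserstein distance between two one-dimensional empirical distributions is computed as the integral of the absolute difference between their cumulative distribution functions, and the function names, the fixed random seed and the handling of guides with fewer than S cells (all cells are used) are assumptions.

```python
import random
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two 1D empirical distributions."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect_right(u, left) / len(u)   # empirical CDF of u on [left, right)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def bootstrapped_w1(guide_values, control_values, n_samples=50, n_iter=200, seed=0):
    """Average W1 between subsampled guide cells and the full control set."""
    rng = random.Random(seed)
    k = min(n_samples, len(guide_values))        # assumption for small guides
    distances = [
        wasserstein_1d(rng.sample(guide_values, k), control_values)
        for _ in range(n_iter)                   # sampling without replacement
    ]
    return sum(distances) / n_iter


# Hypothetical foci counts: every guide cell has 3 foci, every control cell has 1.
toy_distance = bootstrapped_w1([3] * 60, [1] * 100)
```

The control baseline $W(c,j)$ follows from the same function by passing the control values as both arguments.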

Beta-binomial test

We define the data as X ∈ Z^{N ×2}, where X_i:= (n_i, k_i), where n_i denotes the number of cells affected by a given guide, g_i, and k_i denotes the number of cells exceeding a fixed threshold for a cell-specific continuous or discrete feature. We refer to these cells as ‘positive’ cells. Each guide, g_i ∈ [[G]], corresponds to either a control guide or a test guide defined by a mapping φ(g_i): [[G]] → {0, 1}, where φ(g_i) = 1 corresponds to a test guide. To construct a plausible null hypothesis, we fit a beta-binomial distribution to the control data, X_ctrl:= {X_i: φ(g_i) = 0}. We use a beta-binomial distribution because we assume the number of positive cells is independently and identically distributed according to a binomial distribution given the number of total cells for a guide. To account for overdispersion attributed to variability between control guides, we place a beta distribution on the success probability of the binomial distribution. For each condition the cells are placed in, we run a separate statistical test because we assume that the rate of positive cells is significantly impacted by the environment, and we would like to test for the significance of specific guides conditioned on the environment.

The model is as follows:

$$p_i \sim \mathrm{Beta}(\alpha, \beta)$$

$$k_i \sim \mathrm{Binomial}(n_i,\, p_i),$$

where the parameters α, β are inferred via a maximum likelihood estimation. Then, over the test data, X_test:= {X_i: φ(g_i) = 1}, we compute P values according to a two-tailed test. These P values are then adjusted using the Benjamini–Hochberg correction procedure to control for the FDR. Finally, we reject the null for any corrected P values falling below the significance level of 0.05. To ensure that the null hypothesis is plausible with respect to the control data, we check to see if the computed P values are uniformly distributed over the [0, 1] interval. The proportions, total number of cells and adjusted P values for each test guide are reported in Supplementary Table 7.
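The test statistic can be illustrated with a short sketch of the beta-binomial probability mass function and a two-tailed P value. The sketch assumes that α and β have already been estimated from the control guides (the maximum likelihood fit is not shown) and that the two-tailed P value is defined as twice the smaller of the two tail probabilities, capped at 1; the text does not state which two-tailed definition was used, so this choice and the function names are assumptions.

```python
import math


def _log_beta(x, y):
    """Logarithm of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_pmf(k, n, alpha, beta):
    """P(k positive cells out of n) under a beta-binomial distribution."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta))


def two_tailed_pvalue(k, n, alpha, beta):
    """Two-tailed P value: twice the smaller tail probability, capped at 1 (assumed)."""
    pmf = [betabinom_pmf(i, n, alpha, beta) for i in range(n + 1)]
    lower_tail = sum(pmf[:k + 1])   # P(K <= k)
    upper_tail = sum(pmf[k:])       # P(K >= k)
    return min(1.0, 2.0 * min(lower_tail, upper_tail))


# With alpha = beta = 1 the beta-binomial is uniform over 0..n (sanity check only).
toy_pvalue = two_tailed_pvalue(0, 10, 1.0, 1.0)
```

The resulting P values for the test guides would then be adjusted with the Benjamini–Hochberg procedure before applying the 0.05 significance level.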

Statistics and reproducibility

Histograms were plotted with the histplot function, boxplots with the boxplot function, scatter plots and volcano plots with the scatterplot function, ECDF plots with the ecdfplot function and violin plots with the violinplot function in the seaborn package (0.11.1) in Python. Two-sided Mann–Whitney U-tests and Studentʼs t-tests were performed to test the difference between the distributions using the statannotations package in Python. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. The difference in foci number distribution was tested by a two-sided Kolmogorov–Smirnov test using the ks.test function in R, and the P values were adjusted by the Benjamini–Hochberg method. *P_adj < 0.05, **P_adj < 0.01, ***P_adj < 0.001, ****P_adj < 0.0001. Guides with P_adj < 0.05 and an absolute L2FC > 0.5 were regarded as statistically significant. Fisherʼs exact test for gene enrichment was performed with the fisher.test function in R, and the P values were adjusted by the Benjamini–Hochberg method. Genes with P_adj < 0.05 were regarded as statistically significant. Heatmaps for optical feature correlations and hierarchical clustering of guides were generated with the clustermap function in the seaborn package (0.11.1) in Python, and the method ‘complete’ was used for clustering. Schematics were generated in BioRender. All representative images shown in this paper were repeated in at least three technical replicates with similar results, unless otherwise specified in the figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
