The pipeline has four parts, each of which provides different but related information on how well in vitro systems match in vivo development.
- Transition mapping to a spatiotemporal atlas of human brain development: This method performs serialized differential expression between any two time points defined in the in vivo atlas (pre-calculated before your submission). Differential expression is also performed on the in vitro system you upload. Genes are ranked by their -log10(p-value) multiplied by the sign of the effect, which distinguishes up-regulated from down-regulated genes. The in vitro and in vivo ranked gene lists are then compared using the rank-rank hypergeometric overlap (RRHO) test. The result can be used to determine how well an in vitro system matches in vivo development, and which in vivo temporal transition best reflects the in vitro transition. The actual developmental time periods that correspond to the labeled periods are given in our paper and were originally defined in Table 1 of the in vivo atlas paper.
- Transition mapping to a laminar atlas of human brain transcriptomes: This method is similar to the one above, except that it uses a laminar atlas of the human brain rather than a temporal atlas.
- Individual sample prediction of regional and temporal identity through the machine learning framework called CoNTExT: This method uses a multi-label, multi-class machine learning algorithm to predict the temporal and spatial identity of each individual sample. All samples, even if they are not brain, will match to some brain region and temporal identity, so interpret any low-matching samples with caution. For a quantification of how much to trust CoNTExT based on transition mapping, see Figure S4 of our paper.
- Preservation of modules defined in a spatiotemporal atlas of human brain development: This method measures how well the in vivo network architecture is preserved within the in vitro system that you upload, using module preservation analysis. Interpretation of the different modules is found in Table 1 and Figure 5 of the manuscript.
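The gene ranking used in transition mapping, and the hypergeometric overlap that RRHO is built on, can be sketched in a few lines. This is a minimal illustration rather than the pipeline's own implementation: the function names, the dictionary inputs, and the choice to evaluate a single pair of rank cutoffs are all assumptions made for the example. A full RRHO analysis scans a grid of cutoffs over both ranked lists.

```python
import math

def signed_scores(pvalues, effects):
    """Score each gene as -log10(p-value) times the sign of its effect.

    Assumes every p-value is strictly greater than 0.
    """
    scores = {}
    for gene, p in pvalues.items():
        sign = 1.0 if effects[gene] >= 0 else -1.0
        scores[gene] = sign * -math.log10(p)
    return scores

def ranked_genes(scores):
    """Order genes from most up-regulated to most down-regulated."""
    return sorted(scores, key=scores.get, reverse=True)

def hypergeom_upper_tail(k, n_total, n_a, n_b):
    """P(overlap >= k) when n_a and n_b genes are drawn from n_total."""
    denom = math.comb(n_total, n_b)
    upper = min(n_a, n_b)
    numer = sum(
        math.comb(n_a, i) * math.comb(n_total - n_a, n_b - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

def overlap_at(rank_a, rank_b, i, j):
    """Overlap of the top i genes of rank_a with the top j genes of rank_b.

    Assumes both rankings contain the same set of genes.
    Returns (overlap count, hypergeometric upper-tail p-value).
    """
    k = len(set(rank_a[:i]) & set(rank_b[:j]))
    return k, hypergeom_upper_tail(k, len(rank_a), i, j)

# Hypothetical toy data: four genes, one in vitro and one in vivo contrast.
in_vitro = signed_scores(
    {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.02},
    {"A": 2.0, "B": 1.0, "C": -0.1, "D": -1.5},
)
in_vivo = signed_scores(
    {"A": 0.01, "B": 0.001, "C": 0.4, "D": 0.03},
    {"A": 1.2, "B": 2.5, "C": -0.2, "D": -1.0},
)
count, pvalue = overlap_at(ranked_genes(in_vitro), ranked_genes(in_vivo), 2, 2)
```

Calling `overlap_at` for every pair of cutoffs `(i, j)` would give the grid of overlap significance values, and the cell with the strongest overlap corresponds to the "point of highest overlap" discussed for transition mapping.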
All four parts of the framework are complementary but provide different information.

Transition mapping provides a global view of transcriptomic matching between an in vitro system and in vivo development. It is mainly driven by differentiation, as evidenced by the Gene Ontology (GO) terms at the point of highest overlap. Although the input data to transition mapping come from cortex, the processes of synaptogenesis, neurogenesis and cell division are not specific to cortex, so transition mapping should not be considered a regional identity tool. For example, using human in vivo cerebellar or striatal data as input to transition mapping still shows a high degree of in vitro matching with the cortical phNPCs.

CoNTExT is a useful framework for individual sample prediction of temporal and regional identity. One strong caveat to its use is that the algorithm must classify a sample into one of the defined regions or time periods, even if it matches none of them (for example, lung tissue). As such, its use should be conditioned on the level of in vivo matching that we define by simulation in Figure S4 of the paper. That said, CoNTExT is a powerful classification system that allows regional and temporal identification at the individual sample level.

Finally, although CoNTExT and transition mapping are good global evaluations, they do not indicate the specific molecular pathways that are similar to or different from in vivo development. Module preservation is a well-validated approach for determining the specific functional processes that are preserved in vitro. For processes that are not preserved, modifying the expression of key hub genes in the process, through exogenous factors like small molecules or endogenous factors like gene expression, may allow better in vivo modeling.
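The forced-classification caveat for CoNTExT can be illustrated with a toy classifier. The sketch below is not the CoNTExT algorithm: it is a hypothetical nearest-centroid classifier using Pearson correlation, and the centroid values, the `min_score` parameter, and the 0.5 cutoff are assumptions chosen only for illustration. The point it shows is that taking the best-scoring class always returns a label, so the match score must be inspected separately.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify(sample, centroids, min_score=None):
    """Return (best label, its score, whether the score passes min_score).

    A label is always returned, even when no class matches well;
    min_score is a hypothetical cutoff for flagging weak matches.
    """
    scores = {label: pearson(sample, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    trusted = min_score is None or scores[best] >= min_score
    return best, scores[best], trusted

# Hypothetical class centroids over four genes.
centroids = {
    "cortex": [1.0, 2.0, 3.0, 4.0],
    "cerebellum": [4.0, 3.0, 2.0, 1.0],
}

# A sample resembling cortex, and one resembling neither class.
brain_like = classify([1.0, 2.0, 3.0, 5.0], centroids, min_score=0.5)
off_target = classify([1.0, 3.0, 2.0, 2.0], centroids, min_score=0.5)
```

In this toy setup both samples are assigned a region, but only the first passes the cutoff. In practice, the level of matching at which predictions can be trusted is the one defined by simulation in Figure S4 of the paper, not an arbitrary threshold like the one used here.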