Genome-wide analyses of empathy and systemizing: correlations with psychiatric conditions, psychological traits, and education

212 words. Article: 3,537 words. Tables: 1. Figures: 2. Supplementary material: 3


Introduction
Empathy is the ability to identify other people's thoughts, intentions, desires, and feelings, and to respond to others' mental states with an appropriate emotion (1). It plays an important role in social interaction and is a key component of both prosocial behaviour and social cognition. Differences in empathy have been found in several psychiatric conditions, including autism (1), bipolar disorder (2), schizophrenia (3-5), and major depressive disorder (2, 6, 7). Empathy is modestly heritable (8-10), and a few candidate gene association studies have investigated the role of various genes in empathy (11-13).
Systemizing is the drive to identify patterns to understand and build rule-based systems (14). Elevated systemizing has been identified in autism (14,15). Individuals with autism, on average, tend to score higher than typical controls on tests of systemizing, including the Systemizing Quotient-Revised (SQ-R) (15). A few studies have investigated systemizing in other psychiatric traits and conditions, including schizotypy (16) and anorexia nervosa (17).
Because empathy and systemizing contribute to strengths and difficulties in several psychiatric conditions, they are important phenotypes for investigation. Understanding the biological networks that underlie these traits may help us understand how they contribute to psychiatric phenotypes, an approach that has been used for other traits such as neuroticism (18), creativity (19), and cognitive ability (20). Both empathy and systemizing show marked sex differences, in opposite directions: there is a male advantage in systemizing (15) and a female advantage in empathy (1). These sex differences are thought to contribute to the high proportion of males diagnosed with autism, although this could also in part reflect diagnostic practice (21). The sex difference in empathy and systemizing may also relate to the higher proportion of males in science, technology, engineering and mathematics (STEM) (which are inanimate object-centred or abstract sciences) and the higher proportion of females in medicine, veterinary sciences and psychology (which are animate- or people-centred sciences) (22). Both traits are correlated in opposite directions with levels of prenatal testosterone (23,24), which itself is produced in higher levels in males, and which changes brain development and gene expression.
In this study, we investigate the genetic correlates of empathy and systemizing and how they contribute to risk for various psychiatric conditions. We performed sex-stratified and non-stratified genome-wide association analyses of empathy and systemizing in 23andMe, a personalized genetics company. We calculated the narrow sense heritability explained by all the SNPs tested, and investigated sex differences. We conducted genetic correlation analyses with five psychiatric conditions, six psychological traits, and measures of educational attainment.

Genome-wide association analyses
To understand the genetic architecture of empathy and systemizing, we collaborated with 23andMe to conduct a genome-wide association study (GWAS) of empathy (n = 46,861) and systemizing (n = 51,564). A flow chart of the study protocol is provided in Supplementary Figure 1. We employed two widely used self-report measures to quantify the traits: the Empathy Quotient (EQ) for empathy (1) and the revised Systemizing Quotient (SQ-R) for systemizing (15). The mean score for all participants was 46.4±13.7 out of a total of 80 on the EQ, and 71±21 out of a total of 150 on the SQ-R. Males scored higher than females on the SQ-R (76.5±20 in males; 65.4±20.6 in females), and vice versa on the EQ (41.9±13.5 in males; 50.4±12.6 in females) (Figure 1a and b). For each trait, we conducted three GWAS analyses: a male-only analysis, a female-only analysis, and a non-stratified analysis, using a linear regression model with age and the first four ancestry principal components as covariates (Online Methods). We corrected for the three different tests for each trait and used a conservative threshold of P = 1.66x10^-8. We did not identify any genome-wide significant SNPs.
(insert Figure 1 here)
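The corrected threshold can be reproduced with a one-line Bonferroni adjustment. The sketch below assumes the starting point is the conventional genome-wide significance level of 5x10^-8 (the text does not state the uncorrected level explicitly at this point); the function name is illustrative only.

```python
# Conventional genome-wide significance level for a single GWAS (assumed here).
GENOME_WIDE_ALPHA = 5e-8


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide the significance level equally across n_tests analyses."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Three GWAS per trait: males only, females only, and non-stratified.
threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, 3)
print(f"{threshold:.3e}")
```

Under this assumption the threshold is approximately 1.67x10^-8, which the text reports truncated to 1.66x10^-8.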

Sex differences
Sex differences in empathy and systemizing (15,21) may reflect genetic as well as non-genetic factors (such as prenatal steroid hormones and postnatal learning) (26). In our dataset, there was a significant female advantage on the EQ (P < 2x10^-16; Cohen's d = 0.65), and a significant male advantage on the SQ-R (P < 2x10^-16; Cohen's d = 0.54) (Figure 1a and b). To investigate the biological basis for the sex difference observed in the traits, we compared the heritability estimates from the sex-stratified GWAS analyses for both traits. Our analyses revealed no significant difference between the heritability in the males-only and the females-only datasets for the two traits (P = 0.48 for the male-female difference in EQ, and P = 0.34 for the male-female difference in SQ-R) (Figure 1c and d; Supplementary Table 6).
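The reported effect sizes can be approximated from the group means and standard deviations given in the previous section. The following is a minimal sketch that assumes an equal-weight pooled standard deviation, because the sex-specific sample sizes are not given in this part of the text; the function name is illustrative.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d with an equal-weight pooled SD (assumes similar group sizes)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported in the text.
d_eq = cohens_d(50.4, 12.6, 41.9, 13.5)  # females minus males on the EQ
d_sq = cohens_d(76.5, 20.0, 65.4, 20.6)  # males minus females on the SQ-R
print(round(d_eq, 2), round(d_sq, 2))
```

With these inputs the sketch gives roughly 0.65 for the EQ and 0.55 for the SQ-R, close to the reported 0.65 and 0.54; the small gap for the SQ-R is presumably due to rounding of the summary statistics and unequal group sizes.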
Additionally, there was a high genetic correlation between the males-only and females-only GWAS for both traits (EQ correlation = 0.82±0.16; P = 2.34x10^-7; SQ-R correlation = 1±0.17; P = 3.91x10^-10), indicating a high degree of similarity in the genetic architecture of the traits in males and females. However, there was significant heterogeneity in the effect estimates of the top SNPs (P < 1x10^-6) (see Supplementary Figures 18 and 19 and Supplementary Table 9).
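The male-female heritability comparison in this section can be sketched as a two-sample Z-test. The code assumes the usual form of the statistic, namely the difference in SNP heritability divided by the square root of the summed squared standard errors, which treats the two estimates as independent. The input values are hypothetical placeholders, not the study's estimates.

```python
import math


def heritability_difference_test(h2_males: float, se_males: float,
                                 h2_females: float, se_females: float):
    """Return (Z, two-tailed P) for the difference between two h2 estimates."""
    z = (h2_males - h2_females) / math.sqrt(se_males ** 2 + se_females ** 2)
    # Two-tailed P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical estimates for illustration only.
z_diff, p_diff = heritability_difference_test(0.12, 0.03, 0.10, 0.03)
print(f"Z = {z_diff:.2f}, P = {p_diff:.2f}")
```

A difference would be called significant under the study's criterion if the two-tailed P-value fell below 0.05.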

Genetic correlations
To investigate how the two traits correlate with psychiatric conditions, psychological traits and educational attainment, we performed genetic correlation analyses (Online Methods) with five psychiatric conditions (autism, ADHD, anorexia nervosa, bipolar disorder, depression (major depressive disorder and the larger depressive symptoms dataset) and schizophrenia), six psychological traits (NEO-extraversion, NEO-openness to experience, NEO-conscientiousness, neuroticism, and subjective wellbeing) and educational attainment (a proxy measure of IQ, measured using years of schooling) (Supplementary Table 7). With psychiatric conditions, three correlations were significant after correcting for multiple comparisons: EQ-schizophrenia (r_g = 0.19±0.04; P_FDR-adjusted = 1.84x10^-4), SQ-R-schizophrenia (r_g = 0.14±0.04; P_FDR-adjusted = 8.64x10^-3) and EQ-anorexia nervosa (r_g = 0.32±0.09; P_FDR-adjusted = 4.05x10^-3). As anorexia nervosa is primarily diagnosed in women, we wanted to test whether the correlation is driven by the higher proportion of female cases in the anorexia nervosa GWAS (68% females and 32% males), which could simply reflect higher EQ in the anorexia dataset rather than pleiotropy. We conducted correlation analyses using the females-only EQ (EQ-F) dataset and the anorexia nervosa dataset. The EQ-F-anorexia correlation was highly significant (r_g = 0.48±0.12; P = 8.46x10^-5), confirming the underlying pleiotropy between empathy and anorexia nervosa. The EQ-M-anorexia correlation was considerably lower and not significant (r_g = 0.15±0.11; P = 0.17).
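As a rough check on the sex-stratified anorexia nervosa results, a P-value can be recovered from a genetic correlation and its standard error with a Wald test. This is a sketch that assumes the P-values come from Z = r_g / SE under a standard normal distribution; because the published estimates are rounded, the recovered values only approximate the reported ones.

```python
import math


def wald_test(estimate: float, standard_error: float):
    """Return (Z, two-tailed P) for an estimate and its standard error."""
    z = estimate / standard_error
    return z, math.erfc(abs(z) / math.sqrt(2))


# Rounded estimates as reported in the text.
z_m, p_m = wald_test(0.15, 0.11)  # males-only EQ with anorexia nervosa
z_f, p_f = wald_test(0.48, 0.12)  # females-only EQ with anorexia nervosa
print(f"EQ-M: Z = {z_m:.2f}, P = {p_m:.2f}")
print(f"EQ-F: Z = {z_f:.2f}, P = {p_f:.1e}")
```

The males-only estimate is less than 1.4 standard errors from zero, giving P of about 0.17, whereas the females-only estimate is four standard errors from zero, giving a P-value of the same order of magnitude as the one reported.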

Discussion
This is the first GWAS to investigate the genetic architecture of empathy and systemizing. We identified several significant genetic correlations between the EQ and SQ-R and psychiatric conditions, psychological traits, and education, providing insights into the shared genetic architecture. Although we did not identify any significant SNPs after correcting for multiple testing, we identified four SNPs with P < 5x10^-8. Males and females perform differently on the tests, and a larger sample size is likely to allow detection of SNPs that contribute in a sex-specific manner to both traits.
We identified significant positive genetic correlations for EQ with schizophrenia and anorexia nervosa, and for SQ-R with schizophrenia. The empirical literature on empathy and schizophrenia generally reports deficits in cognitive empathy (3, 27), but preserved or stronger affective empathy (5, 27) and emotional contagion (5) in individuals with schizophrenia compared to controls. The EQ measures both affective and cognitive components of empathy, and our results suggest that a subset of genetic variants associated with empathy also increase the risk for schizophrenia and anorexia nervosa; the latter remained significant after using the females-only EQ dataset. It is also worth noting that schizophrenia and anorexia nervosa share a significant positive genetic correlation (r_g = 0.23±0.06), and it is possible that the pleiotropy between these two conditions may be mediated by genetic variants that contribute to empathy. A study from our group also identified a significant genetic correlation between cognitive empathy and anorexia nervosa, underscoring the importance of empathy as a genetic risk factor in anorexia nervosa (28). A few studies have identified a correlation between empathy and psychosis (29,30). To our knowledge, no study has identified a correlation between systemizing and schizophrenia.
LD Hub reports a positive genetic overlap between schizophrenia and educational attainment (years of schooling), but a recent study using measures of cognitive aptitude in the UK Biobank identifies a significant negative correlation between schizophrenia and general cognitive aptitude (31).
Investigating using other measures of systemizing such as STEM aptitude will help validate the results.
Investigating genetic correlations with psychological traits and measures of cognition further helped elucidate the genetic architecture of the two traits. The EQ was significantly correlated with subjective wellbeing, extraversion and conscientiousness. Both extraversion and conscientiousness contribute to empathy (32), which, in turn, contributes to subjective wellbeing (33). The SQ-R, on the other hand, had significant positive genetic correlations with different measures of cognition and NEO-openness to experience, which are genetically (r_g = 0.40±0.094; P = 1.79x10^-5) (31,34) and phenotypically correlated with each other (35,36). Our research suggests a common underlying genetic architecture that contributes to the correlation between the three traits. Interestingly, both NEO-openness to experience and educational attainment (r_g = 0.342±0.06; P = 2.49x10^-7) (34) are correlated with autism (r_g = 0.42±0.13; P = 0.002) (34). The positive correlation between SQ-R and autism was not significant, perhaps because the analysis was underpowered.
Both the EQ and the SQ-R are self-report measures, and it is unclear how much of the intrinsic biological variation in these traits is captured by the two tests. The EQ has three primary underlying factors (cognitive empathy, emotional reactivity and social skills) (37), whereas the SQ-R has one underlying factor (38). In this study, we were unable to stratify based on each factor, so the heritabilities and genetic correlations are for the broad trait rather than for specific factors. These subtleties must be kept in mind when interpreting the results. This is also the first study to provide estimates of additive heritability explained by all the SNPs tested for the two traits. Approximately 11% of variance in systemizing was explained by SNPs. To our knowledge, there is no study examining heritability of the SQ-R in twins. One study, investigating the heritability of the reduced EQ (18 items) in 250 twin pairs, identified a heritability of 0.32 (10). The literature on the heritability of empathy and prosociality is inconsistent, with heritability estimates ranging from 0.20 (8) to 0.69 (39), although a meta-analysis of different studies identified a heritability estimate of 0.35 (95% CI: 0.21-0.41) (40). Our analysis therefore suggests that a third of the heritability can be attributed to common genetic variants. Like IQ (41), the heritability of empathy and prosocial behaviour changes with age (8). Our study did not find any significant differences in heritability between males and females. Further, the male-female genetic correlations for both traits were high. However, the most significant SNPs (P < 1x10^-6) might have sex-specific contributions to the traits, since SNPs that are significant in one sex are not significant, or are much less significant, in the other sex (P > 0.05).
In conclusion, the current study provides the first narrow sense heritability for empathy and systemizing. While there is a very significant difference in scores on the two questionnaires between males and females, heritability is similar, with a high genetic correlation between the sexes. We also identified significant genetic correlations between empathy, systemizing and several psychiatric conditions, psychological traits, and measures of cognition. This global view of the genomic architecture of empathy and systemizing will allow us to better understand psychiatric conditions, but also to improve our knowledge of the biological bases of neurodiversity and brain evolution in humans.

Methods
Participants: Participants were customers of 23andMe, Inc., a personal genetics company, and are described in detail elsewhere (42,43). All participants provided informed consent and answered surveys online according to a human subjects research protocol, which was reviewed and approved by Ethical & Independent Review Services, an AAHRPP-accredited private institutional review board (http://www.eandireview.com). All participants completed the online version of the questionnaire on the 23andMe participant portal. The number of participants and participant overlap is provided in Table 1. Only participants who were primarily of European ancestry (97% European ancestry) were selected for the analysis using existing methods (44). Unrelated individuals were selected using a segmental identity-by-descent algorithm (45).
Measures: Two online questionnaires were used in this study. The first, the Empathy Quotient (EQ) (1), is a self-report measure of empathy, and includes items relevant to both cognitive and affective empathy. It comprises 64 questions and has good test-retest reliability (37). In this study, participants scored a maximum of 80 and a minimum of 0. The second measure is the Systemizing Quotient-Revised (SQ-R), which is a self-report measure of systemizing drive, or interest in rule-based patterns (15).
SNPs (including insertions/deletions, or InDels) were genotyped across all platforms.
Genotyped SNPs were filtered for quality control. Imputation was performed using the September 2013 release of the 1000 Genomes Project Phase 1 reference haplotypes phased using ShapeIt2. SNPs present only on platform V1, or on chromosome Y and the mitochondrial chromosome, were excluded due to small sample sizes and unreliable genotype calling respectively. Next, using trio data, where available, SNPs that failed a parent-offspring transmission test were excluded. SNPs were also excluded if they failed the Hardy-Weinberg equilibrium test at P < 10^-20, or had a genotype rate of less than 90%. Phasing was performed by 23andMe using an internal tool called Finch, which implements the Beagle haplotype graph-based phasing algorithm. SNPs were excluded if they were not in Hardy-Weinberg equilibrium (P < 10^-20), had a call rate of less than 95%, or had discrepancies in allele frequency compared to the reference European 1000 Genomes data (chi-squared P < 10^-15).
Imputation was performed using Minimac2 using the September 2013 release of the 1000 Genomes Phase 1 reference haplotypes phased using Beagle4 (V3.3.1). For imputation, chromosomes were divided into segments of 10,000 genotyped SNPs, with overlaps of 200 SNPs, and imputed against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2, using 5 rounds and 200 states for parameter estimation. We restricted the analyses to SNPs that had a minor allele frequency (MAF) of at least 1%. After quality control, 9,955,952 SNPs were analysed. Genotyping, imputation, and preliminary quality control were performed by 23andMe.
Genetic association: We performed a linear regression assuming an additive model of genetic effects. Age and sex along with the first five ancestry principal components were included as covariates. Additionally, for each trait, we performed a male-only and a female-only linear regression analysis to identify sex-specific loci. Since we were performing three tests for each trait (male-only, female-only, and males and females combined with sex as a covariate), we used a threshold of P < 1.66x10^-8 to identify significant SNPs for each trait. Lead SNPs at each locus were identified after pruning for LD (r^2 > 0.8) using SNAP (46). We calculated the variance explained by the top SNPs using the following formula:

\[ \frac{R^2_{g|c}}{1 - R^2_c} = \frac{t^2}{(n - k - 1) + t^2} \times 100 \]

where R^2_{g|c}/(1 − R^2_c) is the proportion of variance explained by the SNP after accounting for the effects of the covariates (four ancestry principal components, age, and, additionally, sex for the non-stratified analyses), t is the t-statistic of the regression coefficient, k is the number of covariates, and n is the sample size.
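The variance-explained formula above can be applied directly to a SNP's regression t-statistic. The following is a minimal sketch; the function name and the example t-statistic are hypothetical, and the covariate count of six (age, sex and four principal components) follows the parenthetical description for the non-stratified analysis.

```python
def variance_explained_percent(t_stat: float, n: int, k: int) -> float:
    """Percent of variance explained by a SNP after adjusting for k covariates.

    Implements t^2 / ((n - k - 1) + t^2) * 100, where n is the sample size.
    """
    if n - k - 1 <= 0:
        raise ValueError("sample size must exceed the number of covariates + 1")
    t_squared = t_stat ** 2
    return t_squared / ((n - k - 1) + t_squared) * 100


# Hypothetical t-statistic with the EQ sample size reported in the text.
pct = variance_explained_percent(t_stat=5.5, n=46861, k=6)
print(f"{pct:.4f}% of variance explained")
```

Even a SNP with a t-statistic near the genome-wide significance level explains well under 0.1% of the variance at this sample size, which illustrates why individual effects are hard to detect for traits of this kind.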
Genomic inflation factor, heritability, and functional enrichment: We used the linkage disequilibrium score regression (LDSC) intercept to quantify genomic inflation due to population stratification (25) (https://github.com/bulik/ldsc). The intercept for the SQ-R GWAS was 0.998 and the intercept for the EQ GWAS was 0.993, indicating that there was no unaccounted population stratification. Heritability and genetic correlations were estimated using extended methods in LDSC (47). The difference in heritability between males and females was quantified using:

\[ Z_{\mathrm{diff}} = \frac{h^2_{\mathrm{males}} - h^2_{\mathrm{females}}}{\sqrt{SE^2_{\mathrm{males}} + SE^2_{\mathrm{females}}}} \]

where Z_diff is the Z score for the difference in heritability for a trait, (h^2_males − h^2_females) is the difference between the SNP heritability estimates in males and females, and SE is the standard error of each heritability estimate. Two-tailed P-values were calculated, and reported as significant if P < 0.05. We identified enrichment in genomic functional elements for the traits using extended methods in LDSC (48).
Genetic correlations: LDSC was also used to calculate genetic correlations. We restricted our analyses to the non-stratified GWAS dataset due to the unavailability of sex-stratified GWAS data for the phenotypes investigated. We calculated initial genetic correlations using LD Hub (34) for schizophrenia (49), autism, ADHD, bipolar disorder, major depressive disorder, depressive symptoms, educational attainment (years of schooling), NEO-openness to experience, NEO-conscientiousness, subjective wellbeing, and neuroticism. For anorexia nervosa (50), we used the latest PGC data freeze (https://www.med.unc.edu/pgc/results-and-downloads), which has more cases than the dataset available on LD Hub. In addition, we conducted the genetic correlation analysis for extraversion (51) separately, as the data were unavailable on LD Hub. For the anorexia and the extraversion analyses, the North West European LD scores were used and the intercepts were not constrained, as the extent of participant overlap was unknown.
We report genetic correlations as significant if they have a Benjamini-Hochberg-based FDR q < 0.05. FDR-based correction was conducted for all the genetic correlation analyses combined (i.e. the genetic correlations for EQ and SQ-R combined) except for the quasi-replication analyses described next. In addition, we conducted quasi-replication analyses of the systemizing-educational attainment genetic correlation using two additional measures of cognition: college completion and childhood cognition. These were conducted using LD Hub and we did not include them in the FDR correction. For anorexia nervosa, we also conducted genetic correlation analyses using the sex-stratified EQ dataset, as the participants in the anorexia nervosa dataset were predominantly female (68% females).
A positive genetic correlation indicates that variants that contribute to either EQ or SQ-R increase the risk for conditions like schizophrenia and autism, whereas a negative genetic correlation indicates a decrease in risk.
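The Benjamini-Hochberg correction applied to the combined set of genetic correlations can be sketched as follows. This is a generic implementation of the step-up procedure written for illustration, not the code used in the study, and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for a combined set of genetic correlation tests.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

Pooling the EQ and SQ-R tests into one list before adjustment, as described above, means the correction reflects the total number of correlations tested rather than the number per trait.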
Gene-based analysis: Gene-based analyses for the non-stratified GWAS were performed using MetaXcan (52) with tissue weights derived from the GTEx project (53). MetaXcan uses summary GWAS data and tissue eQTL information derived from the GTEx project to impute gene expression levels for each gene. This allows for tissue-specific prioritization of genes. For each trait, we ran gene-based analyses for ten neural tissues: anterior cingulate cortex (BA24), caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex (BA9), hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen basal ganglia. We removed genes for which there were 0 SNPs in our dataset, and genes that correlated poorly with predicted models of gene expression (R^2 < 0.01) as implemented in the software. We report significant lists if they have an FDR < 0.05. We confirmed that LD,