Targeting Item-level Nuances Leads to Small but Robust Improvements in Personality Prediction from Digital Footprints
Corresponding Author
Andrew N. Hall
Correspondence to: Andrew N. Hall, Department of Psychology, Northwestern University, Swift Hall 102, 2029 Sheridan Road, Evanston, IL 60208, USA. E-mail: ahall4488@gmail.com; andrewhall@u.northwestern.edu
Sandra C. Matz
Columbia Business School, Columbia University, New York City, NY, USA
Department of Psychology, Northwestern University, Evanston, IL, USA
Abstract
In the past decade, researchers have demonstrated that personality can be accurately predicted from digital footprint data, including Facebook likes, tweets, blog posts, pictures, and transaction records. Such computer-based predictions from digital footprints can complement, and in some circumstances even replace, traditional self-report measures, which suffer from well-known response biases and are difficult to scale. However, these previous studies have focused on the prediction of aggregate trait scores (e.g., a person's extroversion score), which may obscure prediction-relevant information at theoretical levels of the personality hierarchy beneath the Big 5 traits. Specifically, new research has demonstrated that personality may be better represented by so-called personality nuances (item-level representations of personality) and that utilizing these nuances can improve predictive performance. The present work examines the hypothesis that personality predictions from digital footprint data can be improved by first predicting personality nuances and subsequently aggregating them to trait scores, rather than predicting trait scores outright. To examine this hypothesis, we employed least absolute shrinkage and selection operator (LASSO) regression and random forest models to predict both items and traits using out-of-sample cross-validation. In nine out of 10 cases across the two modelling approaches, nuance-based models improved the prediction of personality over the trait-based approaches to a small, but meaningful degree (4.25% or 1.69% on average, depending on method). Implications for personality prediction and personality nuances are discussed. © 2020 European Association of Personality Psychology
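The contrast between the two strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data. The synthetic "like" features, the data-generating weights, the fixed penalty `lam`, the use of Pearson correlation as the accuracy measure, and all function names (`fit_lasso`, `cv_predict`, `compare_approaches`) are assumptions made for illustration. The LASSO here is a plain coordinate-descent implementation in standard-library Python, standing in for whatever software the study used.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the LASSO update for one coefficient)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered features
    resid = [yi - y_mean for yi in y]  # residual when all coefficients are zero
    col_ss = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant feature cannot help prediction
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def cv_predict(X, y, folds, lam):
    """Out-of-sample predictions: each row is scored by a model that never saw it."""
    preds = [0.0] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        b0, beta = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i in test_idx:
            preds[i] = b0 + sum(b * x for b, x in zip(beta, X[i]))
    return preds


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def compare_approaches(n=100, p=12, n_items=4, k=5, lam=0.05, seed=0):
    """Compare trait-based and nuance-based prediction on synthetic data."""
    rng = random.Random(seed)
    # Hypothetical digital footprints: binary indicators (e.g. liked / not liked).
    X = [[float(rng.random() < 0.5) for _ in range(p)] for _ in range(n)]
    # Assumption: each questionnaire item depends on its own subset of features.
    items = []
    for m in range(n_items):
        w = [rng.gauss(0, 1) if j % n_items == m else 0.0 for j in range(p)]
        items.append([sum(wj * xj for wj, xj in zip(w, row)) + rng.gauss(0, 1)
                      for row in X])
    trait = [sum(item[i] for item in items) / n_items for i in range(n)]

    order = list(range(n))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]  # identical folds for both approaches

    # Trait-based: predict the aggregate score directly.
    trait_pred = cv_predict(X, trait, folds, lam)
    # Nuance-based: predict each item, then average the item predictions.
    item_preds = [cv_predict(X, item, folds, lam) for item in items]
    nuance_pred = [sum(ip[i] for ip in item_preds) / n_items for i in range(n)]

    return {
        "trait_pred": trait_pred,
        "nuance_pred": nuance_pred,
        "trait_r": pearson(trait_pred, trait),
        "nuance_r": pearson(nuance_pred, trait),
    }


if __name__ == "__main__":
    result = compare_approaches()
    print("trait-based r: ", round(result["trait_r"], 3))
    print("nuance-based r:", round(result["nuance_r"], 3))
```

With an unpenalized linear model fitted on the same features, the two routes would give identical predictions, because ordinary least squares is linear in the outcome. They can diverge here only because the LASSO penalty shrinks and selects coefficients separately for each item. Which route scores higher on this toy data depends on the seed and on `lam`, so the sketch says nothing about the effect sizes reported in the abstract.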
Open Research
Open Research Badges
This article earned an Open Materials badge through Open Practices Disclosure from the Center for Open Science: https://osf.io/tvyxz/wiki. The materials are permanently and openly accessible at https://osf.io/3rdju/. The authors' disclosure form may also be found in the Supporting Information in the online version.
Supporting Information
Filename | Description |
---|---|
per2253-sup-0001-Supplementary Material.docx (Word 2007 document, 996.5 KB) | Figure S1. Difference in magnitude of Spearman correlations between predicted personality traits (nuance-model and trait-model) from Random Forest models and 11 outcomes. Higher values (blue in the above plot) indicate stronger correlation between item-level personality traits and the outcome than between aggregate-level personality traits and the outcome. Figure S2. Spearman correlations of predicted outcome value scores between self-reported personality models and nuance vs. trait models. Predicted nuance and trait values come from Random Forest models. All predicted values are the result of 5-fold cross-validation using standard multiple regression to predict the outcome variable. Blue points indicate nuance-model predictions correlate more strongly with self-reported predictions, while red points indicate trait-model predictions correlate more strongly with self-reported predictions. A line with slope m = 1 is included for reference, as points on this line would indicate equal prediction, while points above indicate nuance-models outperform and points below indicate trait-models outperform. Table S1. Raw Spearman correlations between predicted personality traits and observed external outcome values for both LASSO and Random Forest results. Self-report column denotes the correlation of observed self-reported personality traits with outcomes. Table S2. RMSE values calculated between the predicted trait scores and actual trait scores for the LASSO (left) and Random Forest (right) models. The final column displays the absolute change between the two model types. |
per2253-sup-0002-Open_Practices_Disclosure_Form.pdf (PDF document, 1 MB) | Supporting info item |
Special Issue: Behavioral personality science in the age of big data
September/October 2020
Pages 873-884