Psychometric and Validity Issues in Machine Learning Approaches to Personality Assessment: A Focus on Social Media Text Mining
Corresponding Author
Louis Tay
Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
Correspondence to: Louis Tay, Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907, USA.
E-mail: stay@purdue.edu
Sang Eun Woo
Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
Louis Hickman
Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
Louis Tay and Sang Eun Woo contributed equally to the paper.
Abstract
In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology
Open Research
Open Research Badges
This article earned Open Data and Open Materials badges through Open Practices Disclosure from the Center for Open Science: https://osf.io/tvyxz/wiki. The data and materials are permanently and openly accessible at https://osf.io/cgpmz/?view_only=4a56e3fb9aa6476bb2b9b27273b4124d. The authors' disclosure form may also be found in the Supporting Information in the online version.
Supporting Information
Filename | Description |
---|---|
per2290-sup-0001-Open_Practices_Disclosure_Form.pdf (PDF document, 670 KB) | Supporting info item |
Special Issue: Behavioral personality science in the age of big data
September/October 2020
Pages 826-844