ISKO Encyclopedia of Knowledge Organization
edited by Birger Hjørland and Claudio Gnoli
Content analysis
1. Introduction

Content analysis (CA) is a research methodology, or a family of methods, used for exploring patterns of words or phrases (or more generally of signs, e.g. in pictures) in a sample of → documents. Roberts (2015, 769) described the field as being about methods that have been “developed to draw inferences about large corpuses, or populations, of texts”. Content analysis can, for example, reveal cultural bias or gender bias in texts, map themes in scholarly domains, or identify the qualifications demanded in job advertisements and thus be used to explore the world outside the documents, e.g., trends in the job market.

The present article is not intended for people interested in applying CA as a research method; it is about distinguishing different ways of analyzing the contents of documents and about clarifying some conceptual confusion in the use of the term content analysis. Readers who are interested in applying CA as a research method may consult standard textbooks such as Krippendorff (2018).

CA is one among many methods used for analyzing messages or documents. Roberts (2015, 769) pointed out that the term does not involve “analysis”, but “measurement”:

“Content analysis is a class of techniques for mapping symbolic data into a data matrix suitable for statistical analysis. These techniques may be applied to any representative sample of cultural artifacts (e.g., books, paintings, technological innovations, etc.), whereby various nonnumeric attributes of these artifacts are mapped into a matrix of statistically manipulable symbols. Thus, content analysis involves measurement, not ‘analysis’ in the usual sense of the word.”

Neuendorf (2017, 11-18) considered it a myth that the term content analysis applies to all ways of examining documents. Instead, Neuendorf stated that the truth is: “The term does not apply to every analysis of messages—only those investigations that meet a particular definition”.
Further (2017, 11):

“There are many forms of analysis—from frivolous to seminal—that may be applied to the human production of messages. Content analysis is only one type, a technique presented by this book as systematic and quantitative. Even in the scholarly literature, some contestation exists as to what may be called a content analysis. On a number of occasions, the term has been applied erroneously.”

Neuendorf found it important to distinguish between CA and other ways of analyzing documents, including qualitative methods such as rhetorical analysis, narrative analysis, discourse analysis, structuralist or semiotic analysis, interpretative analysis, conversation analysis, critical analysis, and normative analysis. It seems that CA started out as a purely quantitative approach, but has developed to include more qualitative and “humanistic” approaches; in this process, however, CA has become more difficult to define in relation to other qualitative approaches, such as those listed above [1]. Scheufele (2015a, 111) wrote:

“In a wider sense, the term ‘qualitative content analysis’ subsumes quite different and various methods and techniques of analyzing text material qualitatively or hermeneutically. Examples are → grounded theory or → discourse analysis. In a more narrow sense, qualitative content analysis is a label for a specific type of qualitative text analysis that was developed by Philipp Mayring (2002).”

It is important to realize, with Neuendorf, that CA “does not apply to every analysis of messages”. This article argues that, for example, CA, subject analysis, concept analysis, literary criticism and other terms should not be confused. Why is this important? Are these terms not all about analysis of some kind of meaningful material? The justifications for not confusing these terms are:
The point is that processes like CA, concept analysis and subject analysis are different processes requiring different methods, although they are sometimes confused, although they may sometimes learn from each other, and although the literatures on approaches such as content analysis, concept analysis, and subject analysis are under discussion and development. This does not imply that these kinds of analysis should be considered closed. The debate about them should be considered open, but this does not justify conceptual confusion. Even if one may find the literature on a topic unsatisfactory, it is important to improve that literature rather than to confuse the terminology (see also Hjørland 2024, Section 7, about developing disciplines rather than confusing them).

CA is a research methodology used to make empirical studies of the contents of documents (whereas subject analysis is not a research method, but a method used to characterize documents in order to facilitate their findability). CA is used in many disciplines, including communication studies, public health, educational research, library and information science (LIS), psychology and business studies (whereas subject analysis is primarily connected to LIS). CA is mostly applied to written materials, but it is considered to be about text in the broad meaning of this term [2]. Krippendorff (2018, 19) thus exemplifies types of documents that may be analyzed: “works of art, images, maps, sounds, signs, symbols, and even numerical records may be included as data—that is, they may be considered as text—provided that they speak to someone about phenomena outside of what can be sensed or observed”.

A deeper analysis of the term content analysis must include an analysis of the term content, but this is only briefly discussed here and in Section 3, awaiting a special article devoted to this concept. See also the discussion of → Table of Contents (Hjørland 2022).
Content is usually understood as something contained in a container, and is often associated with the form-content dualism, but this is a philosophical issue with a rather comprehensive literature. Of relevance for this article is the question whether content can be explored apart from its container or whether, as in McLuhan’s (1964) slogan, “the medium is the message” (see, e.g., Robertson 1967a; 1967b; Harder 1967).

In the next section, we shall discuss some uses of the term CA which we find misleading. Although there are different definitions of this term and different methods associated with it, there is a distinct interdisciplinary literature about it, dominated by communication studies. Among the examples of content analytic studies from LIS, the following can be mentioned: Chu (2015); Tuomaala, Järvelin and Vakkari (2014); White (1999); White and Marsh (2006); Yoon and Schultz (2017).

2. Some confusions of content analysis with other terms

In her book Essential Library of Congress Subject Headings, Broughton (2012) titled Chapter 6 “Content analysis”. In it, she wrote (2012, 65):

“Before you can do that [index or classify a document] it is necessary to decide what the item being catalogued is about. Whatever system of subject headings (or classification scheme or thesaurus) is being used to describe a document, you should try initially to make an independent assessment of what the subject of that document is. In practice, you will almost certainly be unable to represent this exactly using the artificial language of your system, but you should at least begin by deciding objectively what it is you want to express. This process may be called 'subject analysis', or 'document analysis', 'content analysis' or 'concept analysis'. The subject content of items is sometimes also referred to more grandly as 'intellectual content' or 'semantic content', but these are simply other ways of defining what a document is about.”
This quote thus treats the following terms as synonyms: aboutness analysis, subject analysis, document analysis, content analysis, concept analysis, “analysis of intellectual content” or “of semantic content”. As stated in Section 1, we do not consider all these terms synonymous. Below we will consider these terms, as well as additional terms sometimes confused with content analysis. As Broughton’s use of “content analysis” in the quote is here understood as a misnomer for “subject analysis”, let us consider this last term first, followed by the other terms used in the quote, and then by a few additional concepts:
More concepts could be considered, including discourse analysis, genre analysis, media analysis, picture analysis, and text analysis, but these terms will not be discussed here. We conclude this section by stating that although content analysis is a fairly well understood concept in its own literature, in which textbooks such as Krippendorff (2018) provide a fine overview and a good understanding, it is often confused with other terms.

3. Epistemological issues

Epistemological issues are already apparent in definitions of the term content analysis. Krippendorff (2018, 24) suggested the following definition:

“Content analysis is a research technique for making reliable and valid inferences from texts (or other meaningful matter) to the context of its use.”

Krippendorff provided this definition after having considered other conceptions of this term. Berelson (1952, 18) defined content analysis as “a research technique for the objective, systematic and quantitative description of the manifest content of communication”. Krippendorff (2018, 25-26) discussed this definition and argued:
Berelson’s definition may be understood as part of a behaviorist tradition, which Macnamara (2018), following Shoemaker and Reese (1996, 31f), contrasts with a humanistic tradition in CA [11]. Krippendorff (2018, 25) found that three basic kinds of definitions of content analysis have been provided in the literature:
Krippendorff (2018, 27-31) presented the following six statements:
These points are not without implications for the methodologies used to perform CA. As Roberts (2015, 769) argues:

“After having acknowledged that language is not a neutral medium through which ‘content’ is unambiguously transmitted, researchers must make four key decisions: Are words only to be counted or are word relations to be encoded? Are word relations to be depicted as network characteristics or as variants of a semantic grammar? Do researchers presume they know more or less than the texts’ sources regarding the latter’s words? Are the texts under analysis being viewed as windows into sources’ perspectives or, alternatively, into events experienced by sources? Only once these decisions are made, can one identify the types of research questions afforded by content analysis.”

4. Conclusion

CA is first and foremost a research methodology, whereas subject analysis, for example, is a practical activity in library and information contexts, as well as a broader, daily activity implying statements of what meaningful things are about and labeling things in ways that provide clues to their uses. The main message of this article has been to argue against terminological pollution and provide arguments for which terms should be considered synonymous, and which should not. This does not mean that relevant lessons cannot be learned from different concepts. It is striking, for example, that CA and subject analysis have both developed from a phase in which “content” and “subject”, respectively, were something that documents have to a phase in which content and subject are understood as something attributed to documents from certain perspectives and interests.

Endnotes

1.
Scheufele (2015b, 112) wrote: “While qualitative content analysis works rather inductively by summarizing and classifying elements and by assigning labels or categories to them, quantitative content analysis works deductively and measures quantitatively by assigning numeric codes to parts of the material to be coded”.

2. Jensen (2015, 619): “Texts are vehicles of communication. While traditionally reserved for written and other verbal messages, the term refers to any meaningful entity, including images, everyday interaction, and cultural artifacts. Deriving from classical Latin 'texo' (to weave, to construct), texts emphasize the complex process in which ideas are articulated and communicated”.

3. Hutchins (1978, 180), contrasting presupposed knowledge with new knowledge in a text, wrote: “My general conclusion is that in most contexts indexers might do better to work with a concept of 'aboutness' which associates the subject of a document not with some 'summary' of its total content but with the 'presupposed knowledge' of its text”. Hutchins does not provide an example of how traditional subject analysis differs from his proposed “aboutness analysis”. His only example (1978, 181) concludes: “Thus both approaches to indexing would result in the same index entry Industrial archaeology”. What Hutchins associates with traditional subject analysis is indexing as a kind of document summary, which has been termed document-oriented indexing. Hutchins is right in problematizing this approach. Request-oriented indexing, on the other hand, does not intend to summarize the overall contents of documents, but to make documents findable in relation to the questions they may answer for the users. The important point here is that the concepts “subject” and “aboutness” are both associated with both document-oriented indexing and request-oriented indexing, and both views contain the same problems about the objectivity and subjectivity of indexing.
Thus, there is no need for the term aboutness analysis as distinguished from subject analysis.

4. Hanna (1998) wrote “by the end of the 1970s the movement [i.e., conceptual analysis] was widely regarded as defunct”.

5. Short (2016, 3) wrote: “Adding the word ‘critical’ in front of content analysis signals a political stance by the researcher, particularly in searching for and using research tools to examine inequities from multiple perspectives. Researchers who adopt a critical stance focus on locating power in social practices by understanding, uncovering, and transforming conditions of inequity embedded in society”.

6. Concerning deductive CA, see Kyngäs and Kaakinen (2020).

7. Concerning inductive CA, see Kyngäs (2020).

8. It is outside the scope of the present paper to discuss whether the concepts “replicability” and “validity” should form part of the definition of CA. These concepts are discussed in the philosophy of science (see, e.g., Guttinger 2020; Matarese and McCoy 2024; Pownall 2024).

9. Krippendorff (2018, 25) wrote: “His [Berelson’s] requirement that content analysis be 'objective' and 'systematic' is subsumed under the dual requirement of replicability and validity in our definition. For a process to be replicable, it must be governed by rules that are explicitly stated and applied equally to all units of analysis. Berelson argued for 'systematicity' in order to combat the human tendency to read textual material selectively, in support of expectations rather than against them. Our requirement of validity goes further, demanding that the researcher’s process of sampling, reading, and analyzing messages ultimately satisfy external criteria. Replicability is measurable, and validity is testable, but objectivity is neither”.

10.
Krippendorff (2018, 26): “Berelson felt no need to elaborate on the crucial concept of 'content' in his definition, because for him and his contemporaries, at the time of his writing, there seemed to be no doubt about the nature of content—it was believed to reside inside a text”.

11. Macnamara (2018, 3): “Shoemaker and Reese argue that there are two traditions of content analysis—the behaviourist tradition and the humanist tradition. The behaviourist approach to content analysis, pursued by social scientists, is primarily concerned with the effects that content produces. Whereas the behaviourist approach looks forwards from media content to try to identify or predict future effects, the humanist approach looks backwards from media content to try to identify what it says about society and the culture producing it. Humanist media scholars draw on psychoanalysis and cultural anthropology to analyze how media content such as film and television dramas reveal 'truths' about a society—what Shoemaker and Reese term 'the media’s symbolic environment' (1996: 31–32)”.

References

Albrechtsen, Hanne. 1993. “Subject Analysis and Indexing: From Automated Indexing to Domain Analysis”. The Indexer 18, no. 4: 219-224. https://doi.org/10.3828/indexer.1993.18.4.3.

Baxendale, Phyllis B. 1966. “Content Analysis, Specification, and Control”. Annual Review of Information Science and Technology 1: 71-106.

Berelson, Bernard. 1952. Content Analysis in Communication Research. New York: Hafner.

Bowen, Glenn A. 2009. “Document Analysis as a Qualitative Research Method”. Qualitative Research Journal 9, no. 2: 27-40. DOI: 10.3316/QRJ0902027.

Broughton, Vanda. 2012. Essential Library of Congress Subject Headings. London: Facet Publishing.

Cheti, Alberto. 1996. Manuale ipertestuale di analisi concettuale [Hypertextual Manual of Conceptual Analysis]. Università di Bologna, Centro interfacoltà per le biblioteche. http://www2.sba.unibo.it/miac/.

Chu, Heting. 2015.
“Research Methods in Library and Information Science: A Content Analysis”. Library & Information Science Research 37, no. 1: 36-41. DOI: 10.1016/j.lisr.2014.09.003.

Dousa, Thomas M. 2009. “Facts and Frameworks in Paul Otlet’s and Julius Otto Kaiser’s Theories of Knowledge Organization”. Bulletin of the American Society for Information Science and Technology 36, no. 2: 19-25. https://doi.org/10.1002/bult.2010.1720360208.

Gardin, Jean-Claude. 1973. “Document Analysis and Linguistic Theory”. Journal of Documentation 29, no. 2: 137-68.

Guttinger, Stephan. 2020. “The Limits of Replicability”. European Journal for Philosophy of Science 10, no. 2: 1-17. https://doi.org/10.1007/s13194-019-0269-1.

Hanna, Robert. 1998. “Conceptual Analysis”. In Routledge Encyclopedia of Philosophy, Version 1.0. London, UK: Routledge. DOI: 10.4324/9780415249126-U033-1.

Harder, Worth Travis. 1967. “Comment on ‘The Dichotomy of Form and Content’”. College English 28, no. 8: 611-612.

Hjørland, Birger. 2017. “Subject (of Documents)”. Knowledge Organization 44, no. 1: 55-64. Also available in ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli, https://www.isko.org/cyclo/subject.

Hjørland, Birger. 2022. “Table of contents (ToC)”. Knowledge Organization 49, no. 2: 98-120. Also available in ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli, https://www.isko.org/cyclo/toc.

Hjørland, Birger. 2023a. “Description: Its Meaning, Epistemology, and Use with emphasis on Information Science”. Journal of the American Society for Information Science and Technology 74, no. 13: 1532-1549. https://doi.org/10.1002/asi.24834.

Hjørland, Birger. 2023b. “Information”. Knowledge Organization 50, no. 1: 47-78. Also available in ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli, https://www.isko.org/cyclo/information.

Hjørland, Birger. 2024. “Bibliography (Field of Study)”. In Press Knowledge Organization, xxx-yyy.
Also available at ISKO Encyclopedia of Knowledge Organization, eds. Birger Hjørland and Claudio Gnoli, https://www.isko.org/cyclo/bibliography.

Holley, Ralph M. and Daniel N. Joudrey. 2021. “Aboutness and Conceptual Analysis: A Review”. Cataloging & Classification Quarterly 59, nos. 2-3: 159-185. https://doi.org/10.1080/01639374.2020.1856992.

Hutchins, W. John. 1978. “The Concept of ‘Aboutness’ in Subject Indexing”. Aslib Proceedings 30: 172-181.

Jensen, Klaus Bruhn. 2015. “Text and Intertextuality”. In The Concise Encyclopedia of Communication, ed. Wolfgang Donsbach. Malden, MA: John Wiley & Sons, 619-620.

Kent, Allen. 1971. Information Analysis and Retrieval. (Information sciences series). New York, NY: Becker and Hayes.

Krippendorff, Klaus. 2018. Content Analysis: An Introduction to Its Methodology. 4th ed. Los Angeles, CA: SAGE.

Kyngäs, Helvi and Pirjo Kaakinen. 2020. “Deductive Content Analysis”. In The Application of Content Analysis in Nursing Science Research, eds. Helvi Kyngäs, Kristina Mikkonen and Maria Kääriäinen. Cham, Switzerland: Springer, 23-30. DOI: 10.1007/978-3-030-30199-6_3.

Kyngäs, Helvi. 2020. “Inductive Content Analysis”. In The Application of Content Analysis in Nursing Science Research, eds. Helvi Kyngäs, Kristina Mikkonen and Maria Kääriäinen. Cham, Switzerland: Springer, 13-21. DOI: 10.1007/978-3-030-30199-6_2.

Lancaster, Frederick Wilfrid. 2003. Indexing and Abstracting in Theory and Practice. 3rd ed. London, UK: Facet Publishing.

Macnamara, Jim. 2018. “Content Analysis”. In Mediated Communication, ed. Philip M. Napoli. Berlin, Germany: De Gruyter Mouton, 191-212. https://doi.org/10.1515/9783110481129-012.

Matarese, Vera and C. D. McCoy. 2024. “When ‘Replicability’ is More than Just ‘Reliability’: The Hubble Constant Controversy”. Studies in History and Philosophy of Science 107: 1-10. https://doi.org/10.1016/j.shpsa.2024.07.005.

Mayring, Philipp. 2002.
Qualitative Inhaltsanalyse: Grundlagen und Techniken [Qualitative content analysis: Foundations and techniques]. 8th ed. Weinheim, Germany: Beltz.

McLuhan, Marshall. 1964. Understanding Media: The Extensions of Man. New York: McGraw-Hill.

Neuendorf, Kimberley. 2017. The Content Analysis Guidebook. Thousand Oaks, CA: Sage. (The quotes in the text are taken from the 2022 online version of this book; page numbers may vary). https://www.degruyter.com/document/doi/10.1515/9783110481129-012/html.

O’Connor, J. 1980. “Answer-Passage Retrieval by Text Searching”. Journal of the American Society for Information Science 31, no. 4: 227-39. https://doi.org/10.1002/asi.4630310402.

Portella, Tauany Lorena Alves Silva and Gercina Ângela de Lima. 2024. “Subject Analysis, Content Analysis, and Domain Analysis: Concepts, Methods, and Applications”. Canadian Journal of Information and Library Science - La Revue canadienne des sciences de l’information et de bibliothéconomie (CJILS-RCSIB) 47, no. 2: 158-165. DOI: 10.5206/cjils-rcsib.v47i2.17633.

Pownall, M. 2024. “Is Replication Possible in Qualitative Research? A Response to Makel et al. (2022)”. Educational Research and Evaluation 29, nos. 1-2: 104-110. https://doi.org/10.1080/13803611.2024.2314526.

Priss, Uta. 2006. “Formal Concept Analysis in Information Science”. Annual Review of Information Science and Technology 40, no. 1: 521-543. https://doi.org/10.1002/aris.1440400120.

Ranganathan, Shiyali Ramamrita. 1963. Documentation and Its Facets. New York: Asia Publishing House. http://arizona.openrepository.com/arizona/bitstream/10150/105426/3/documen.partb.pdf.

Roberts, Carl W. 1989. “Other than Counting Words: A Linguistic Approach to Content Analysis”. Social Forces 68, no. 1: 147-177. https://doi.org/10.1093/sf/68.1.147.

Roberts, Carl W. 2015. “Content Analysis”. In International Encyclopedia of the Social & Behavioral Sciences, 2nd ed., ed. James D. Wright. Amsterdam, Netherlands: Elsevier, Vol. 4: 769-773.

Robertson, Duncan.
1967a. “The Dichotomy of Form and Content”. College English 28, no. 4: 273-279.

Robertson, Duncan. 1967b. “Rebuttal to Harder (1967)”. College English 28, no. 8: 612.

Salminen, Airi, Katri Kauppinen and Merja Lehtovaara. 1997. “Towards a Methodology for Document Analysis”. Journal of the American Society for Information Science 48, no. 7: 644-55.

Scheufele, Bertram. 2015a. “Content Analysis, Qualitative”. In The Concise Encyclopedia of Communication, ed. Wolfgang Donsbach. Malden, MA: John Wiley & Sons, 111-112.

Scheufele, Bertram. 2015b. “Content Analysis, Quantitative”. In The Concise Encyclopedia of Communication, ed. Wolfgang Donsbach. Malden, MA: John Wiley & Sons, 112-113.

Shoemaker, Pamela and Stephen Reese. 1996. Mediating the Message: Theories of Influences on Mass Media Content. White Plains, NY: Longman.

Short, Kathy G. 2016. “Critical Content Analysis as a Research Methodology”. In Critical Content Analysis of Children’s and Young Adult Literature, eds. Holly Johnson, Janelle Mathis, and Kathy G. Short. New York, NY: Routledge, 1-15.

Short, Kathy G. 2019. “Critical Content Analysis of Visual Images”. In Critical Content Analysis of Visual Images in Books for Young People: Reading Images, eds. Holly Johnson, Janelle Mathis and Kathy G. Short. New York: Routledge, 3-22.

Tuomaala, Otto, Kalervo Järvelin, and Pertti Vakkari. 2014. “Evolution of Library and Information Science, 1965–2005: Content Analysis of Journal Articles”. Journal of the Association for Information Science and Technology 65, no. 7: 1446-1462. https://doi.org/10.1002/asi.23034.

Vickery, Brian C. and Alina Vickery. 2004. Information Science in Theory and Practice. 3rd ed. München, Germany: K. G. Saur.

White, Gary W. 1999. “Academic Subject Specialist Positions in the United States: A Content Analysis of [Job] Announcements from 1990 through 1998”. Journal of Academic Librarianship 25, no. 5: 372-382. DOI: 10.1016/S0099-1333(99)80056-1.

White, Marilyn Domas and Emily E. Marsh. 2006.
“Content Analysis: A Flexible Methodology”. Library Trends 55, no. 1: 22-45.

Yablo, Stephen. 2014. Aboutness. Princeton: Princeton University Press.

Yoon, Ayoung and Teresa Schultz. 2017. “Research Data Management Services in Academic Libraries in the US: A Content Analysis of Libraries' Websites”. College & Research Libraries 78, no. 7: 920-933. DOI: 10.5860/crl.78.7.920.
Version 1.0 published 2025-03-19
Article category: This editorial article is not peer-reviewed and is not being published in the journal Knowledge Organization.
©2025 ISKO. All rights reserved.