
April 23, 2007

Carley, Kathleen M. "Network Text Analysis: The Network Position of Concepts"

Carley, Kathleen M. "Network Text Analysis: The Network Position of Concepts." Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Ed. Carl W. Roberts. Mahwah, NJ: Lawrence Erlbaum Associates, 1997. 79-100. [link to CASOS]


(This is the fifth in a series of publications I've downloaded from CASOS.)

"Mental models can be abstracted from texts as a network of relations between concepts. Within these mental models different concepts play different roles depending on their position in the network. This chapter describes how to locate and empirically describe each concept's position in a mental model and how to construct composite empirical descriptions of the entire mental model" (79).

(This essay adds "lossy-integrated" to the description of the network/map, which is meant to mitigate diachronic changes in it.)

Two relationships important to this study are strength and directionality (81).

Strength: relationships directly stated = 3, relationships inferred from syntax = 2, relationships inferred from tacit social knowledge = 1 (82).
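A minimal sketch of how this strength coding might be applied when building a weighted map. The statements, evidence labels, and helper function are my own illustration, not Carley's notation:

```python
# Carley's strength scheme: directly stated = 3, inferred from
# syntax = 2, inferred from tacit social knowledge = 1 (82).
STRENGTH = {"stated": 3, "syntactic": 2, "tacit": 1}

def code_statement(source, target, evidence):
    """Return a weighted, directed statement (source -> target)."""
    return (source, target, STRENGTH[evidence])

# Hypothetical statements extracted from a text.
statements = [
    code_statement("tutor", "experience", "stated"),     # weight 3
    code_statement("experience", "quality", "syntactic"), # weight 2
    code_statement("tutor", "cost", "tacit"),             # weight 1
]
```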

Additional ideas needed:
Vocabulary: set of concepts in the network
Focal concept: concept whose network position is being measured
Direct link: when two concepts occur together in a single statement
Indirect link: when two concepts are linked through a directed chain of statements
Local network: set of concepts to which focal concept is directly linked
Extended network: network generated for each concept in the local network.
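These definitions translate directly into simple graph operations. A sketch, assuming the map is stored as a set of directed (source, target) statements; the concept names are invented for illustration:

```python
def local_network(statements, focal):
    """Concepts directly linked to or from the focal concept
    (a direct link ignores direction)."""
    return ({t for s, t in statements if s == focal} |
            {s for s, t in statements if t == focal})

def extended_network(statements, focal):
    """Union of the local networks of each concept in the
    focal concept's local network."""
    return set().union(*(local_network(statements, c)
                         for c in local_network(statements, focal)))

statements = {("tutor", "experience"), ("experience", "quality"),
              ("cost", "tutor")}
local_network(statements, "tutor")  # {'experience', 'cost'}
```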

Five connective dimensions: imageability, evokability, density, conductivity, intensity. "The dimensions can be thought of as measuring the connective properties of the concept, that is, as measuring the nature of each concept's connection to other concepts" (85).

Communicative power of a concept: high-density concepts are likely to be used and thought about; high-conductivity concepts connect otherwise disconnected concepts; high-intensity concepts derive power from consensus over their relations (87).

On page 87, Carley turns to the taxonomy first developed here: ordinary concepts, prototypes, buzzwords, factoids, place-holders, stereotypes, emblems, and symbols. (Prototypes replaces allusions, and "pregnant" is dropped from place-holders from the R&P version.)

"The movement from ordinary concepts to symbols is a movement from concepts with very general purpose and highly personal meaning and that are very astructural to concepts that are highly relevant to the task at hand, have strong social meaning, and are highly structured. The movement from ordinary concepts to symbols is a movement from a single conceptual entity to a sociocultural construct whose conceptual handle is relevant and highly embedded" (89-90).

Example/application: choosing a tutor (90-98)

"Texts can be coded as conceptual networks. These networks can, depending on the coding scheme, represent mental models. Coding texts as mental models focuses the researcher on the the analysis of meaning. Coding texts as networks allows the researcher to evaluate the texts in terms of the positional properties of the concepts. Examining concept positions focuses the researcher on the communicative power of the concepts" (99).

"Coding texts as conceptual networks, and then analyzing these networks using the dimensions proposed herein, helps the researcher to examine the constructed nature of meaning, to determine the basis for individual and social differences in meaning, and to examine the relationship between concept usage and action" (100).


Not a lot to add here, except to note that this chapter puts together the taxonomy from earlier articles with a bit more math than the others do.

Network position is defined more as variation from a baseline than it is specific location in a visualization, although perhaps the two could be connected.

April 22, 2007

Carley, Kathleen. "Extracting culture through textual analysis."

Carley, Kathleen. "Extracting culture through textual analysis." Poetics 22 (1994): 291-312. [link to CASOS]


(This is the fourth in a series of publications I've downloaded from CASOS.)

"This paper explores the relative benefits for using content analysis and map analysis for extracting and analyzing culture given a series of texts" (291).

"In this paper, it is demonstrated that textual analysis techniques that consider the distribution across and within texts of concepts and the inter-relationships among concepts can be fruitfully used to examine the role that culture plays in human memory and cognition" (292).

relative benefits of 2 techniques: content analysis (concepts/frequency) and map analysis (concepts, relationships, & frequency) (See Carley 1993 for distinction)

"Note, map analysis subsumes content analysis when the researcher focuses simply on the question 'is the concept or statement present in the text?' Thus the value added of map analysis over content analysis is the extra value of knowing when the 'story told' by statements elaborates on or contradicts that told by only the concepts" (294).

"In this paper, social knowledge is that set of concepts (or statements) such that each concept (or statement) occurs in 50% or more of the texts being analyzed. Second, cultural diversity is measured as the number of concepts (or statements) used in the texts. Third, when map analysis is used cultural density can also be measured. Cultural density, or the degree to which the social knowledge that forms the basis of the culture is interconnected, can be measured as the ratio of statements to concepts. In this case, the higher the number the more interconnected the concepts" (295).

Four datasets: thesaurus entries for comedy and drama; portrayal of robots in science fiction; role of culture in children's recall of stories; role of cultural knowledge in the decision making process.


Over time, portrayal of robots has become more complex. Social knowledge appears to have decreased then returned to original level. Diversity of culture has increased. Shared vision remained comparable.

"In other words, over time the number of concepts used by the majority to discuss concepts has decreased while the degree of interconnection among those concepts has increased. These authors may be using fewer similar words to describe robots but they are using those words in a more similar fashion" (299).

"Clearly, there is less social knowledge about robots across the decades than there is unique knowledge. Nevertheless, sufficient changes pervade the culture that the overall view of robots as portrayed by these authors changed from bad to good. Clearly part of what is unique to each period are technological changes reflecting current science fiction thinking. Another major part of what is unique to each period is the portrayal of how people in the books respond to the robots, the portrayal of the expected culture" (303).

"These examples serve to illustrate the added benefits of map analysis. they demonstrate that by taking a more cognitive approach to coding texts we can gain additional information on culture. In general, it is often most informative to compare the results from a content analysis with those of a map analysis; e.g., to first examine concepts and then to move on to the examination of statements" (310).


Two of the four examples I've seen before, so this essay seemed a little more basic than it would have before I'd begun making my way through the CASOS papers.

Even though the terminology is a little different than previous papers, it's similar enough to provide a nice example of empirically verifying what might otherwise be interpretive hunches. I focus in my notes on the robot example, of course, but this is true of each of these examples.

One point that doesn't emerge in Carley's analysis that seemed particularly appropriate for the robot example is the diachronics of that data set. Particularly in science fiction (or any specialist genre), it's likely that writers in the later periods would have read writers in the earlier ones. What kind of effect that would have on questions of shared knowledge, diversity, and density is something I'd have to think through--there is something, however, to the idea that someone in the 80s writing about robots might consciously wish to break from (or reinscribe), say, Asimov's treatment. Culture changes, yes, but I'm not sure she accounts here for the cumulative and recursive possibilities of that change.

It's less an issue in her other examples, though, and insofar as this article is meant to demonstrate the analyses first and foremost, it's more something for me to ponder than anything else.

Kaufer, David S., and Kathleen M. Carley. "Condensation Symbols: Their Variety and Rhetorical Function in Political Discourse."

Kaufer, David S., and Kathleen M. Carley. "Condensation Symbols: Their Variety and Rhetorical Function in Political Discourse." Philosophy and Rhetoric 26.3 (1993): 201-226. [link to CASOS]


(This is the third in a series of publications I've downloaded from CASOS. There are strong similarities between this essay and "Semantic Connectivity," which was published, I think, while this essay was in press.)

"Rhetorical theorists, however, focus on the stratified impact of words in context, a stratification brought about by the fact that every word has a unique history of usage across populations of audiences who can continue to affect a word's impact in contemporaneous messages.
For the rhetorical theorist, high impact words evince a high degree of connectivity in context" (201).

"what is special about such symbols is not simply that they are networked with other concepts, but that they are (somehow) well-connected in a network of meaning primed by the context....The sense of connectedness at issue involves ties to situational and strategic notions as well, connections between words and specific rhetorical settings" (202).

"Let us say that symbols are well-connected just in case they are at least high in situational conductivity, or situational density, or situational consensus" (202).

"Situational conductivity refers to the capacity of a linguistic concept both to elaborate and to be elaborated by other concepts in a particular context of use" (202).

"Situational density refers to the frequency with which a linguistic item is used in relation to others, within a delineated context and social group" (204).

"Situational consensus refers to the extent to which a concept is elaborated in similar ways across a given population in a given context" (204).

"A condensation symbol differs from ordinary symbols (i.e., ordinary words) by being well-connected in its context of meaning" (205).

Ordinary language and factoids are going to be ignored--uninteresting rhetorically.

Buzzwords (206), Pregnant Placeholders (207)

Better explanation, though: "Pregnant place-holders are words that, like buzzwords, are both high in situational conductivity and low in consensus. Unlike buzzwords, they are also high in density, that is, they are highly connected to other concepts, meaning that as rhetorical devices, they have more staying power than buzzwords. Buzzwords begin to lose their magic with too much elaboration. They are names for words that are, symbolically at least, too hot to handle. The significant property of pregnant place-holders, on the other hand, is that they can sustain a great deal of elaboration in the absence of consensus" (207).

Emblems (208)

Also explained in more detail: "Because emblems are not dense (have few connections with other points of reference in their contexts of use), they function as conductive points of consensus at a distance from other focal points, islands of focal agreement. The clearest example of an emblem, perhaps, is the academic citation. This is especially true of citation in the natural sciences where there is likely to be relative consensus about the content of a contribution (e.g., Einstein 1905)" (208).

Standard Symbols (209), Allusions (209)

Nice analysis of allusions here: "The allusion presents challenges to communication. It lies at the periphery of a focal network and so is not highly conductive. And yet because it is dense, there is the danger that it will draw attention away from the focal concepts it is supposed to be elaborating....When the insider nature of the allusion becomes more important than the message, the symbol becomes a kind of insider jargon or argot, a membership card--in short, some variety of dense expression (a pregnant place-holder, allusion, stereotype, or standard symbol) that significantly thins out in meaning as the audiences for it widen.
On the other hand, allusions are the source of many strategies used to hold together, by a thread, fragile coalitions, even to cultivate a false sense of consensus across diverse constituencies" (210).

Stereotypes (211)

Condensation Symbols and Rhetorical Function: "While quantitative measures can be used to extract distributions of language use associated with each of these types, our aims in this paper are qualitative and designed to show how these categories provide descriptive tools for analyzing categories of political argument" (212).

Analysis of exchange between Miro Todorovich and Howard Glickstein on affirmative action (begins 213).

"Because Todorovich and Glickstein speak with different insider reference groups in mind, we should find that different concepts in their argument have different structural/rhetorical characteristics. Some concepts will be standard symbols, others stereotypes, others buzzwords. and so on. We should also find that a single concept has different structural/rhetorical characteristics depending upon the insider reference group to which it is addressed. A concept functioning, say, as a stereotype when addressed (by Todorovich) within one insider reference group may function as an altogether different stereotype (or some other category) when addressed (by Glickstein) within another, Finally, it is possible that different categories play different rhetorical functions in argument. Some categories may be especially useful for building symbolic bridges across belief systems. Others may be especially useful for building solidarity within one" (213).

"As it happens, there is a thin line between speaking of condensation symbols as "devices" and as "beliefs," Do Glickstein and Todorovich use rhetoric differently in the service of their different beliefs? Or do they simply believe different things and their language simply (and accurately) translates these differences? The first question implies that we are dealing with (artistic) devices driven by beliefs. The second question implies that we are dealing with (inartistic) beliefs channeled into routine linguistic expression. Like active viruses on the border of organic and inorganic life, condensation symbols lie on the border of artistic and inartistic life" (223).


There's not much to add here. The difference in audience for this essay and for "Semantic Connectivity" results in some important distinctions. In this essay, C&K are more interested in the ways that their typology functions rhetorically, and their application is an excellent example of those types in action. The interpretive leverage which is mostly implicit in SC is far more explicit here, and (I think) easily extended to other texts.

I'm definitely interested in pursuing the question of jargon further, and this provides some language for doing so, which is nice.

April 21, 2007

Carley, Kathleen M., and David S. Kaufer. "Semantic Connectivity: An Approach for Analyzing Symbols in Semantic Networks."

Carley, Kathleen M., and David S. Kaufer. "Semantic Connectivity: An Approach for Analyzing Symbols in Semantic Networks." Communication Theory 3.3 (1993): 183-213. [link to CASOS]


(This is the second in a series of publications I've downloaded from CASOS.)

"Density measures have also dominated the analysis of semantic networks. There is now a sizable amount of work on the generation of such networks from linguistic data. The majority of that work locates the network and displays it. When attempts are made to analyze the network, the focus is typically on the density (i.e., the number of links) of particular concepts (which serve as the nodes) within the network and on the inferences that can be made about the communicative prominence of such concepts in light of their density. While density is a useful way of analyzing the communicative “connectivity� of a symbol in a message, it provides only one dimension for analyzing connectivity within a semantic network. In this article we offer two further dimensions - conductivity and consensus - with which to analyze semantic networks for connectivity" (183).

"These techniques generate a semantic network in which concepts are the nodes and the relationships between concepts, the links. The same level of attention, however, has yet to be given to analyzing the resulting network. In general, researchers are content to display the resulting network" (184).

"The literature on the symbol has mainly taken its direction from the analysis of the literary symbol, often the literary metaphor or allegory, which are well known to elicit multiple levels of rich, often imagistic, inference. This linkage has produced a heavy bias in favor of explaining the symbol in terms of interpretive density, the sheer number of continuous connections that the symbol makes available to the understanding (as, for example, in the world is a stage)" (185).

"Density, however, is not the only primitive constitutive of symbolic connectivity. A second and independent primitive is consensus. Some symbols function as such only because they are connected to historical inferences that are widely shared. A symbol like 1492 has relatively low density for a grammar school student but nonetheless performs a symbolic function because it draws on beliefs that are almost universally shared across that population" (185).

"Elliptical expressions combining density with consensus were known in ancient rhetoric as enthymemes" (186). (Also, e.g., slogans, stereotypes, cliches)

"There is a third primitive constitutive of symbolic expression, one that can combine with the primitives of density and consensus but can also stand on its own. We call this primitive conductivity. Conductivity is the capacity of an expression in context to carry (or trigger) information in a two-directional flow. Information flows in two directions when it both triggers and is triggered by other available information in the context....The importance of a word known only for its conductivity (and so lacking in density or consensus) is not the expression itself but rather the flow of ideas it keeps stimulating" (186).

"The primary example of such a purely conductive symbol is the buzzword....The “meaning� of a buzzword lies not in its direct or immediate denotation but in the elaborations that everyday users have come to give it" (186).

Measuring "Three Dimensions of Connectivity in a Semantic Network"

Density: "The density of a focal concept is the number of concepts to which the focal concept is directly linked, regardless of the direction of the link" (187).

Conductivity: "The total conductivity of a focal concept is measured by multiplying the number of concepts directly linked into it by the number of concepts directly linked out from it....Note also that a focal concept acquires density and conductivity in a network at a very different rate. Density grows one concept at a time, additively. Conductivity grows faster than that, multiplicatively" (189).

Consensus: "We can compute consensus by surveying or sampling the agreements of language users about which concepts are connected to which on a pair-by-pair basis. The more highly consented to links to or from the focal concept, the higher its consensus" (189).

consensus measured by threshold: "The use of a threshold does not imply the absence of agreement for links that fall below it; it only implies the absence of "social knowledge." Social knowledge consists of that information that is more or less known by most individuals in the society....Threshold-setting for social knowledge is important because the level of agreement required to achieve social knowledge may vary by context" (190).
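The three measures as quoted can be sketched over a directed concept network. The edges and the consensus survey data below are invented for illustration, not C&K's data:

```python
def density(edges, focal):
    """Number of concepts directly linked to the focal concept,
    regardless of direction (187)."""
    return len({s for s, t in edges if t == focal} |
               {t for s, t in edges if s == focal})

def conductivity(edges, focal):
    """In-links multiplied by out-links (189)."""
    fan_in = len({s for s, t in edges if t == focal})
    fan_out = len({t for s, t in edges if s == focal})
    return fan_in * fan_out

def consensus(link_agreement, focal, threshold=0.5):
    """Count links to/from the focal concept whose surveyed
    agreement meets the social-knowledge threshold (189-190)."""
    return sum(1 for (s, t), share in link_agreement.items()
               if focal in (s, t) and share >= threshold)

edges = {("reform", "policy"), ("voter", "reform"), ("media", "reform")}
density(edges, "reform")       # 3
conductivity(edges, "reform")  # 2 in-links * 1 out-link = 2

agreement = {("reform", "policy"): 0.8, ("voter", "reform"): 0.3}
consensus(agreement, "reform")  # 1: only one link clears 0.5
```

Note that conductivity grows multiplicatively while density grows additively, exactly as the quote on page 189 says.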

A Typology of Semantic Categories Formed by the Intersection of These Dimensions

Assuming that focal concepts can be either "high" or "low" on the 3 dimensions:

Density | Conductivity | Consensus | Language Category
Low     | Low          | Low       | Ordinary Words
High    | Low          | Low       | Allusions
Low     | High         | Low       | Buzzwords
Low     | Low          | High      | Factoids
High    | High         | Low       | Pregnant Place-Holders
High    | Low          | High      | Stereotypes
Low     | High         | High      | Emblems
High    | High         | High      | Standard Symbols

Qualifications -- will vary from community to community, and also over time. None of these dimensions is necessarily stable over the long term or across groups (196).

Applied to 3 semantic environments: residence hall, writing classroom, thesaurus entry (the first two studies are published elsewhere, the 3rd original to the article).

"On occasion, we want to make comparative inferences about connectivity across topical contexts. For example, in comparing scientific disciplines one might wish to test the proposition that disciplines associated with physical as opposed to social reality produce shorter articles in part because the language of such disciplines is less amorphous and there is greater consensus as to what words mean" (199).

Application domains: analyzing argument discourse (207), decision making and voting (207), classroom learning (208), lexical choice (209).


Of all the CASOS essays, this one may provide the best starting point for getting at a full picture of what the network analysis of texts can accomplish. First, the typology outlined above provides a very persuasive account of the spectrum from ordinary language to symbols, one that includes measurable features yet still generates some interpretive leverage.

One thing that occurs to me as I look back through it is the relative thinness of representations like tagclouds, measuring as they do the density of terms in a dataset. This has got me thinking about how the other dimensions might likewise be represented.

Oh, and second, the math here is not overwhelming, although it's still a bit of a challenge for me. I get the impression that I could manage it if I were interested in doing this kind of project. Having three different kinds of environments represented also allows C&K to offer some different examples of hypotheses that they can test given the data here.

Carley, Kathleen, and Michael Palmquist. "Extracting, Representing, and Analyzing Mental Models."

Carley, Kathleen, and Michael Palmquist. "Extracting, Representing, and Analyzing Mental Models." Social Forces 70.3 (Mar 1992): 601-636. [link to CASOS]


(This is the first in a series of publications I've downloaded from the Center for Computational Analysis of Social and Organizational Systems (CASOS) at Carnegie Mellon.)

"This article describes a methodology for representing mental models as maps, extracting those maps from texts, and analyzing and comparing those maps. The methodology employs a set of computer-based tools to analyze written and spoken texts" (601).

"This stance underlies the methodology described in this article and is epitomized by the following claims: (1) mental models are internal representations, (2) language is the key to understanding mental models, that is, mental models can be represented linguistically and those representations can be based on linguistic accounts, (3) mental models can be represented as networks of concepts, (4) the meaning of a concept for an individual is embedded in its relations to other concepts in the individual's mental model, and (5) the social meaning of a concept is not defined in a universal sense but rather through the intersection of individuals' mental models" (602).

3 assumptions: (1) both the cognitive structure and text can be modeled using symbols, (2) the text is a sample of what is known by the individual and hence of the contents of the individual's cognitive structure, (3) the symbolic or verbal structure extracted from the text is a sample of the full symbolic representation of the individual's cognitive structure. Completeness of sample depends on variety of factors (603).

4 steps: the researcher (1) identifies concepts, (2) defines the types of relationships that can exist between concepts, (3) codes texts according to concepts/relations, and (4) displays models graphically or analyzes them statistically (604).

vs. content analysis -- inability to accommodate context, tells us less about structure than content (605).

vs. procedural mapping -- focus on context to the exclusion of content/meaning (605).

cognitive mapping -- broad range of approaches, many limitations. concerns here: lack of automation, lack of procedure for cross-individual comparison, questions of representation (606).

    4 basic objects: (607-608)
  • concept ("ideational kernel")
  • relationship ("tie that links two concepts together" -- directionality, strength, sign, and meaning)
  • statement ("two concepts and the relationship between them")
  • map ("a network formed from statements")
    4-step process: (608)
  • identify the set of concepts that will be used in coding the texts
  • define the types of relationships that can exist among these concepts
  • code information in a text as a set of statements
  • display/analyze maps

Identification: confirmatory or exploratory? (609)
Extraction: representative sample, automated (610)

Defining relationships: specify how strength, sign, directionality, and meaning will be used (611).

Example of map analysis (621-624)


This series of essays from CASOS, which stretches across the 90s, is almost certainly going to be invaluable for me. First, in various ways, they model the kinds of analysis that network studies makes possible, particularly with respect to texts. And second, their bibliographic coverage of various precursors is going to make things much easier for me.

Subsequent essays in the series will get a little more at the rhetorical end of things--the focus of this piece is almost exclusively on extraction. But it does also get at one of the potential weaknesses of network mapping, which is the possible loss of qualitative data. The emphasis on the different qualities of ties/relationships is the trick.

Watts, Duncan. "Is Justin Timberlake a Product of Cumulative Advantage?"

Watts, Duncan. "Is Justin Timberlake a Product of Cumulative Advantage?" New York Times Sunday Magazine. April 15, 2007. [link]

I've been meaning to blog this for almost a week now...


This essay begins with the common-sense position that the culture industry should be better than it actually is at figuring out which books, music, movies, etc., will succeed. But this position assumes that "when people make decisions about what they like, they do so independently of one another. But people almost never make decisions independently."

Instead, they rely on the recommendations of others, in the form of bestseller lists, word of mouth, etc. The social process by which we consume culture leads to "cumulative advantage":

This means that if one object happens to be slightly more popular than another at just the right point, it will tend to become more popular still. As a result, even tiny, random fluctuations can blow up, generating potentially enormous long-run differences among even indistinguishable competitors — a phenomenon that is similar in some ways to the famous "butterfly effect" from chaos theory. Thus, if history were to be somehow rerun many times, seemingly identical universes with the same set of competitors and the same overall market tastes would quickly generate different winners: Madonna would have been popular in this world, but in some other version of history, she would be a nobody, and someone we have never heard of would be in her place.

To test this idea, Watts and collaborators set up a website called "MusicLab" where they made mp3s available for listening and download. Visitors "were asked to listen to, rate and, if they chose, download songs by bands they had never heard of. Some of the participants saw only the names of the songs and bands, while others also saw how many times the songs had been downloaded by previous participants. This second group — in what we called the "social influence" condition — was further split into eight parallel "worlds" such that participants could see the prior downloads of people only in their own world."

And the result was that the social influence worlds varied widely, not only from the control group, but from each other as well.
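The cumulative-advantage dynamic Watts describes can be reproduced with a toy simulation: identical songs, visible download counts, and listeners who pick popular songs more often. Rerunning the same market in several independent "worlds" typically yields different winners. All parameters here are invented for illustration; they are not those of the MusicLab study:

```python
import random

def run_world(n_songs=8, n_listeners=2000, seed=0):
    """One 'world': each listener downloads a song with probability
    proportional to its prior downloads (plus 1, so unknown songs
    still have a chance). All songs are identical in 'quality'."""
    rng = random.Random(seed)
    downloads = [0] * n_songs
    for _ in range(n_listeners):
        weights = [d + 1 for d in downloads]
        choice = rng.choices(range(n_songs), weights=weights)[0]
        downloads[choice] += 1
    return downloads

# Rerun "history" in eight parallel worlds with different random seeds.
winners = []
for w in range(8):
    counts = run_world(seed=w)
    winners.append(counts.index(max(counts)))
# Identical songs, yet the winner usually differs from world to world,
# and early random fluctuations snowball into large final gaps.
```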

"In our artificial market, therefore, social influence played as large a role in determining the market share of successful songs as differences in quality. It’s a simple result to state, but it has a surprisingly deep consequence."

Watts closes with caution about any type of prediction or analysis under such circumstances: "Our desire to believe in an orderly universe leads us to interpret the uncertainty we feel about the future as nothing but a consequence of our current state of ignorance, to be dispelled by greater knowledge or better analysis."


I should probably summarize the piece better, but I may do so once I've downloaded and digested the version of this article that appeared in Science.

The significance of this piece, as far as I can tell, is that it provides a model for empirically testing the cumulative advantage hypothesis, and as Watts notes at the end of the NYT piece, it's a hypothesis which runs against the grain of a great deal of our commonsense thinking when it comes to culture. The idea that publishers, music execs, Hollywood vets, etc., have an "eye for talent" and are capable of predicting success may be even thinner than we suspect.

And of course this has serious implications for any discussion of canonicity, which is rapidly becoming one of the chapter themes as I think about what this book will look like. It's not just that this supports the notion of canonicity, although the CA hypothesis certainly does that. What's really interesting is the degree to which the process of canonicity is an arbitrary one--once the ball is rolling, the rich get richer, but the MusicLab work suggests that there's no necessary "rich," that the initial conditions can vary from "world" to "world."

I'm not sure I'm explaining this particularly well, but it raises questions about the motivations behind canonicity, or the presumed prevalence of such motives.