
This is the research blog of Kim Witten: linguist, internet researcher and graphic designer. My PhD research focuses on the way people in online communities negotiate the pronunciations and social meanings of new words that are derived or primarily used in a text-based medium. Please see the 'About' section for more.

January 10, 2013

Free access to Names – Extended to February 15th

by Kim Witten

Maney Publishing has announced that Names is the January journal of the month. Until February 15th, you can enjoy free access to Names: A Journal of Onomastics. Check out the December issue, all about names and the internet, where my article about the negotiation of “MeFi” appears. Enjoy!

October 28, 2012

“nsfw” vs. “trigger”

by Kim Witten

A fellow MeFite tweeted the following earlier today:

“When did trigger warning/alert become nearly as common as NSFW?”

This got me thinking, so I dug into the MetaFilter Corpus and pulled out the raw counts and parts per million (PPM) for both of these words, as found on “the Blue”, and sorted by year. Here are the results:

I didn’t do a chart by word rank, but I should’ve (it’s a bit labor-intensive with my current setup).

The word frequency gives you a sense of how much these words were actually used (number of instances), whereas the PPM tells you more about how those words measure up to the rest of the corpus for that year.
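To make the difference between raw frequency and PPM concrete, here is a minimal Python sketch (not the tooling actually used on the MetaFilter Corpus). It assumes you already have, for each year, the raw count of a word and the total number of tokens in that year’s corpus. The figures are invented, and the `min_tokens` cut-off is a hypothetical way of leaving out years whose corpus is too small to give reliable PPM values.

```python
from typing import Dict, Tuple


def ppm(count: int, total_tokens: int) -> float:
    """Parts per million: occurrences per one million tokens of running text."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    # Multiply first so that round figures divide exactly.
    return count * 1_000_000 / total_tokens


def ppm_by_year(counts: Dict[int, Tuple[int, int]], min_tokens: int = 0) -> Dict[int, float]:
    """Map year -> PPM for one word.

    counts maps year -> (raw count of the word, total tokens in that year).
    Years whose corpus has fewer than min_tokens tokens are left out entirely
    (min_tokens is a hypothetical reliability cut-off, not a value from the study).
    """
    return {
        year: ppm(count, total)
        for year, (count, total) in sorted(counts.items())
        if total >= min_tokens
    }


# Invented figures: the same raw count in two years of different corpus size.
example = {
    1999: (2, 50_000),         # very small corpus, so its PPM is unreliable
    2009: (300, 10_000_000),
    2010: (300, 15_000_000),
}
print(ppm_by_year(example, min_tokens=1_000_000))  # {2009: 30.0, 2010: 20.0}
```

The word has the same raw frequency (300) in both years, but because the 2010 corpus is larger, the word takes up a smaller share of it, so its PPM drops from 30 to 20.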

“nsfw” (meaning “not suitable for work”, a warning to other readers before they click on links) was coined sometime during 2001, but didn’t hit MeFi consciousness until 2003, when we see a big jump from fewer than 100 instances to just over 300. It spiked again in 2006-2007 and since then seems to have found its place, with a PPM of somewhere between 20 and 35.

“trigger”, which already had another sense apart from the one we are interested in (specifically, “trigger” as a warning to other MeFites that the content may be disturbing for survivors of PTSD or other trauma), was found in the corpus from the very start*. “trigger” sees a spike in 2005, and then again in 2009. However, it has remained relatively stable compared to the rest of the corpus (maintaining a PPM value between 13 and 17). This suggests to me that “trigger” is noticeable to others perhaps not because of its frequency, but rather because of its salience as a new form with a somewhat contentious use and meaning on MetaFilter.

I do find it interesting that in 2010, both word forms are roughly equal in both raw count and PPM. I’d want to look into this further and see just how similar they are to each other in these respects. The word rank would be very useful here as well, as would the 2011 data. Perhaps I’ll ask around and see if that’s been generated yet.

All of this is not very rigorous…I’m just throwing some quick charts up here, describing what I see, and giving some sparse thought on it. If I had more time, I’d love to delve into the qualitative data and see how these words are actually used in context. And run some stats. But for now, it’s getting late and I had very little sleep last night (went on a roller derby zombie recruitment raid in York, as you do).

Any thoughts y’all have on this would be cool. I’d love to hear ‘em.

*The 1999 results for “trigger” were set to zero for that year, as the corpus was very small and therefore the PPM results unreliable. I really should have just left that year out, but I wanted to be thorough.

October 14, 2012

Go forth and people the planet Suck!

by Kim Witten

Speculative Grammarian is the premier scholarly journal featuring research in the neglected field of satirical linguistics.

SpecGram has had a long, rich, and varied history, including notable classics such as Lingua Pranca and Syntactic Structures.

SpecGram has also brought us the tree diagram of love and other poetic observations involving linguistic pain and discovery. In addition, SpecGram has been home to many scientific breakthroughs.

But if this is all a bit much, they tone it down a notch with collected wisdom, Indo-European crosswords, classifieds, and book reviews.

Be sure to check out their current issue, featuring Obama’s denial, re: Romney’s accusation of his plan to eliminate verbs from the English language.

October 1, 2012

Sociophonetic Variation in an Internet Place Name

by Kim Witten

My first journal article is now published in the special issue of Names: A Journal of Onomastics (Maney Publishing) on Names, Naming and the Internet.


This study provides one of the first published accounts of sociophonetic variation in which the speech community under investigation exists online and text-based communication is the dominant mode of interaction. The abbreviated name of the Internet community weblog MetaFilter, MeFi, has at least eight recognized pronunciation variants. Quantitative analysis of surveys from over 2000 MetaFilter members reveals statistically significant variation in the distribution of members’ preferred pronunciations for MeFi across four English-speaking countries. These results reflect dialectal and socio-cultural differences in naming preferences in spite of the fact that the speech channel is limited or non-primary.

Here is the link to the content of Volume 60, Number 4, December 2012; my article is here. The post-print of the article can be viewed by clicking here (automatic PDF download). Please note that the post-print is the version that has been accepted by the journal, prior to Maney’s copyediting, typesetting and proofing process; there will be differences between this PDF and the final published version. If you have institutional access to Names, please view that version.

I hope you enjoy reading this. Please don’t hesitate to send me some feedback. Thanks!

Preferred citation: Witten, K., 2012. Sociophonetic Variation in an Internet Place Name. Names: A Journal of Onomastics, 60(4), pp.220–230.
Some terms and conditions for online journals from Maney Publishing.

September 25, 2012

Chugging and crunching the chars, chais & chi-squares

by Kim Witten

With AVML a success and my Portugal vacation sadly in the recent past, I’ve finally forced myself to start crunching my way through the data during this endless flood (York is turning into a giant waterhole). It was slow going at first, but in the last few hours I’ve really made some progress.

After weeks of tinkering with Google Refine, I finally have the data in an analyzable state. In the database to boot! I started playing around with the variables and ran a few chi-squares. The results are encouraging! Not only are there significant changes between the 2010 and 2012 surveys, but there are some consistencies (that are also significant) as well. I even found a few things that I hadn’t checked on the 2010 data, which I will go back and do.
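For anyone curious what running a chi-square involves, here is a minimal sketch of Pearson’s chi-square test of independence in plain Python (standard library only). It is not the analysis pipeline used for the surveys: the table of counts is invented, and the p-value uses the closed-form chi-square upper tail, which only works for whole-number degrees of freedom (always the case for an r × c table).

```python
import math
from typing import List, Tuple


def chi_square_statistic(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson's chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom). No continuity correction is applied,
    and every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: the dof = 1 tail plus correction terms x^((2j-1)/2) / (1*3*...*(2j-1)).
    p = math.erfc(math.sqrt(x / 2))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (dof - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return p + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


# Hypothetical counts (NOT the survey results): rows are survey years,
# columns are three pronunciation groups.
observed = [
    [120, 80, 50],   # e.g. 2010
    [90, 95, 65],    # e.g. 2012
]
stat, dof = chi_square_statistic(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {chi_square_p_value(stat, dof):.4f}")
```

In practice a statistics package would normally be used for this; the hand-rolled version is only meant to show what the test compares, namely observed counts against the counts expected if survey year and pronunciation were unrelated.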

Gender is still insignificant; age most certainly is not. Geographies have shifted. Heat maps are needed (and thanks to AVML and some helpful MeFites, I know where to start). Also interesting is the relationship between strength of what I call “MeFi conviction” and pronunciation outcomes. Basically, MeFites who feel very strongly about their chosen pronunciation tend to fall more strongly into a few popular pronunciation groups; MeFites who don’t care or are unsure are pretty much all over the place. This may seem obvious or uninteresting, but it is actually quite relevant to the process of enregisterment in that emerging standards and awareness of those standards are aligned with people’s attitudes and investment in their own pronunciation choices.

That’s all I’ll say for now. There’s loads more, but I’m tired, it’s late and I need to have a real think about things. But later.

I’ll probably work on this again a little bit more tomorrow. Then I’m switching gears to work on updating and submitting my paper about dogwhistles to a journal. I want to make some changes, then include data from next week’s presidential debate. Keep your ears open for those pitches, people!

Friday kicks off a MetaFilter meetup weekend with trips to the Superhuman exhibit at the Wellcome Collection, a night of Turkish BBQ and jazz music, a Saturday at the Science Museum, a trip to what I hear is passable Mexican food at Taqueria afterwards, and finally a pub crawl in Clerkenwell. Then I shall spend Sunday at the International Tattoo Convention, adding to what has so far been a 50+-hour investment. Good times.

August 29, 2012

Some results from the MeFi Pronunciation Surveys

by Kim Witten

I summarized some of the early findings from the MeFi pronunciation surveys in this MetaTalk comment. Here they are, repasted below.

  • 2,521 MeFites took the March 2010 survey; 1,920+ took the August 2012 survey.
  • At least 47 different countries were represented in both surveys.
  • All respondents to the 2010 survey were active users, and that survey’s data represented 16% of the active MetaFilter userbase. It’s estimated that the 2012 survey represents at least 10% of the active userbase.
  • From the 2010 survey data, it seems that MetaFilter skews male (62%); 36% of the survey respondents were female. There were statistically significant differences in the male/female ratio between the US and the UK — the UK tended to skew more towards males (72%); 27% of the UK survey respondents were female.
  • Again from the 2010 survey data, the average age of MeFites was 33, with no statistically significant differences between Australia, Canada, UK and US.
  • The most preferred pronunciation was “me-fie”, followed by “meh-fee” and “meh-fie”, where “meh” is pronounced with a vowel similar to the one in “met” (not “may”). The remaining pronunciations, in order of their preference, were “may-fie”, “me-fee”, “may-fee”, “my-fie”.
  • The order of preferred pronunciations stays more or less the same (the last 3 switch around a little bit where there are low counts), but the margin by which any one variant “wins” changes based on several factors, many of which are significant.
  • The “meh-fih” and “my-fie” pronunciations weren’t options on the 2010 survey; the “my-fie” pronunciation was calculated after the fact, based on discussions with individual MeFites and survey comments.
  • The 2012 survey should provide some reliable data on those options, since they were included on this survey. It would also appear (from a very cursory look) that those two options are fairly represented here, meaning that many MeFites chose either of those as their preferred pronunciation.
  • Looking at the 2012 data (briefly and before it’s been thoroughly cleaned up and normalized), it seems that there are some interesting shifts in distribution of preferences between US, UK, Canada, Australia and the other countries. While some geographic regions seem to have gotten more skewed towards a pronunciation, others have gotten more varied. It will be interesting to match this up with how strongly people feel about their pronunciation choice.

I’m trying to find a good way to share this (and other related) info to interested MeFites. I don’t know that another MeTa is appropriate and this seems a bit *too* meta (and unfinished) for Projects. I’m not sure if this is something that people other than me and a few other wordnerds have overwhelming interest in. Anybody have any thoughts?

August 27, 2012

Sociophonetic Variation in an Online Community of Practice

by Kim Witten

The last five days of my final MetaFilter data collection were extremely successful and I can breathe a huge sigh of relief. I have a beautiful, ginormous dataset to work with and a solid plan of what I’d like to do with it. But oh boy, I’ve got my work cut out for me.

Until then, I’d like to enjoy the good feeling of this accomplishment, and bask in the overwhelming support of MeFites and MetaFilter moderators. None of this could have been done without them, and I truly feel blessed to have such help and encouragement every step of the way. Mirroring many of the sentiments in the comments people left in their surveys, MetaFilter is a phenomenal community and truly the best of the web.

After 2+ years of needing to withhold MeFi pronunciation data, I’m finally ready to spill the beans! I’ve attached the full presentation I gave last year at Variation and Language Processing (VaLP 2011). This one covers a little bit of everything I’m working on…sociophonetic variation, enregisterment, corpus comparisons, and a mini-map of my database and the type of data that’s in it.

My first journal article comes out around November and I’m hoping to be able to share a pre-print or some data from it. I’m still working on how I can do that. The article delves into differences in the pronunciation of the first syllable of MeFi across native English speakers in Australia, Canada, the UK and the US. Stay tuned for that.

I’ve only briefly looked at the 2012 survey data so far. I can see some definite patterns going on and there are some rather drastic differences between the data here and the last survey. It’s super exciting and I’ll post a little bit more about that as I can.

For now, here’s a PDF of “Sociophonetic Variation in an Online Community of Practice”, presented at VaLP 2011 and containing some of the pronunciation and other results from the 2010 survey: Witten_MeFi_VaLP_2011-04-12_smll

Also, here’s a quick pronunciation guide for disambiguating IPA and the codes used in my research. NOTE: the numbers are NOT a preference ranking.

August 20, 2012

Advances in Visual Methods for Linguistics

by Kim Witten

I don’t know why I haven’t mentioned it here before, but my supervisors, other colleagues and I are putting together a conference taking place next month. Advances in Visual Methods for Linguistics (AVML) is a 3-day event with workshops, posters and exciting presentations grouped into six sessions ranging from mapping techniques to knowledge, corpora and interaction visualisations. Our conference website has all the info about our keynote speakers, conference programme, travel, accommodation and more.

We’d love to draw as much interest as possible, so if you know a fellow wordnerdy data-cruncher who would like to get in on the latest dataviz techniques, please pass along the info.

Click on the image to download a PDF of our conference poster:

Registration closes in ten days…sign up now or miss out on this datatastic event!

July 25, 2012

Quantify Me and the Frequency of Beans

by Kim Witten

Inspired by this recent MeTa post about quantifying user comment data into frequency tables, I’m uploading some slides from a presentation I gave last year at Variation and Language Processing (VaLP 2011). This isn’t the full presentation, but it contains some fun data about beans and where I’ve been going with corpus research these days.

I’ve actually collected some more frequency tables (over 150 now), which is great…thanks again to all the MeFites who have contributed thus far! Also in the works is one final big data collection. If all goes as planned, MeFites will be able to see their own frequency files as well as participate in some more pronunciation Q&A fun. With the approval of the ethics board and the help of the MeFi mods, I hope to go live with that for 5 days sometime before October. It’s a lot of work, especially for the mods, so I thank them profusely.
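As a rough idea of what a frequency table involves, here is a minimal Python sketch that turns a handful of comments into a ranked word list with raw counts and PPM. The tokenizer (lowercased runs of letters, digits and apostrophes) is an assumption for illustration; it is not how the MeFites’ frequency tables were actually generated.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed tokenizer: lowercased runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def frequency_table(comments: Iterable[str], top_n: int = 10) -> List[Tuple[str, int, float]]:
    """Return (word, raw count, parts per million) for the top_n most frequent words."""
    counts = Counter()
    for comment in comments:
        counts.update(TOKEN_RE.findall(comment.lower()))
    total = sum(counts.values())
    # most_common() ranks words by raw count; PPM is each count relative to all tokens.
    return [(word, n, n * 1_000_000 / total) for word, n in counts.most_common(top_n)]


comments = ["Beans! I love beans.", "Are beans a MeFi thing?"]
for word, count, per_million in frequency_table(comments, top_n=3):
    print(f"{word:<8} {count:>3} {per_million:>12.1f}")
```

Punctuation is simply dropped by the regular expression, so “beans!” and “beans.” are counted as the same word; a real tokenizer would need to make more careful choices about things like hyphens, URLs and site-specific jargon.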

As always, any questions or concerns, feel free to email or MeMail me!

Modified VaLP Presentation – for MeFites

Oh, one more thing! My very first published article should be coming out soon (December 2012) in a special issue of Names: A Journal of Onomastics titled Names, Naming and the Internet. The article title will be “Sociophonetic Variation in an Internet Place Name”. It contains quantitative analysis of surveys from over 2,000 MeFites, revealing statistically significant variation in the distribution of members’ preferred pronunciations for MeFi across four English-speaking countries. In other words, how are people actually saying MeFi out there in the world? Stay tuned and find out.


March 17, 2012

The MeFi Register, Poster and Paper

by Kim Witten

It’s been a very busy time for me, research-wise. Now that I’ve finished with this term’s (also very hectic) teaching schedule, I can give everything here my full attention. News to report comes in 3 parts:


I got ethics approval to proceed with yet another form of data collection. This time I gathered MeFites’ examples of items that might be considered part of the community register. This goes towards a glossary in my dissertation, which will be crucial for the chapters on enregisterment.

The MetaTalk thread was an overwhelming success (300+ comments, woo!) and I think people there had fun with reliving some of the site culture. I learned quite a lot about memes and things that I had no idea even existed before.


I entered this year’s HRC poster competition — an invitation to “create a poster which offers a clear ‘taster encounter’ with your research project for a non-specialist audience and has quickly appreciable dramatic visual impact.”

I was thrilled and honored to find out on Thursday morning that I placed first in the competition! Here’s my winning entry:

(click on the image for a downloadable pdf)


Lastly, I’ve been working on finishing up my very first academic article for publication. This paper explores pronunciation variation of ‘MeFi’ by the geographic location of MeFites who participated in the March 2010 data survey.

I’ve finished the first final draft and have so far received positive feedback from the various people I’ve shared it with for pre-submission review. I’m pleased with what I’ve written and am encouraged by the response so far.

The article is due to the publisher in early April, and should be publicly available in print form around December this year. More details to come soon.

Now that I have more time to devote to all things MeFi, I should be updating this site more frequently. Feel free to leave me any comments or questions or suggestions. I love ‘em.