Elena M. Friot

Map-tism by Fire – Baby Steps in GIS

Disclaimer: This is in no way a tutorial, because when you have three projects to write, you don’t have time for that.  Perhaps later, when I need a break from the ungainly pile of reading that is growing with each book I add to my comps list…

I’m writing a paper for a research seminar, and in the process of getting a grasp of what New Mexico looked like in the build up to World War II, and indeed during the war years, I took a stab at using QGIS.  Our seminar required us to play around with U.S. Census Data, the Social Explorer database, and Sanborn Fire Insurance Maps, which gave me pause as I reconsidered the bounds of my project and the constraints imposed by semester deadlines. (Another disclaimer: I am not yet revealing much about my current research, because it is likely a dissertation in the making, and I’m reserving the right to be greedy).  Using those tools gave me a way to think about New Mexico as a place – economically, socially, culturally – and how I might describe those landscapes in writing.

With those tools in hand, I revisited QGIS in the hopes that it wouldn’t be as difficult as I found it the first time around.  And, much like my other forays into the digital world, using it effectively requires much trial and error.

So, I had a pile of data that I wanted to see in one place to paint a picture of New Mexico that I could use to turn into a narrative for my paper – somehow an image of a place lends itself to storytelling more than columns of numbers and names.  The data I have to hand includes population info – total numbers, density – along with employment info, war casualties, presidential election voting records, locations of Civilian Conservation Corps Camps, National Guard Armories, and World War II air fields.  Every time I learn to do something new in QGIS, I think of another data set that I can add to make a richer picture.

A couple of things I’ve learned to do in a few hours of playing with the program: add vector and raster layers (and create them from scratch); georeference images; create .dbf files to join data to layers (what’s great about .dbf joins is that when you update or add to the .dbf, the joined layer picks up the new data – timesaver!); and create my own .svg files to use as symbols.  And, as luck would have it, I was able to download shapefiles for county boundaries in 1940 from the New Mexico Resource Geographic Information System Program at UNM.  Though not a waste of time, because I learned something, I was distressed when I realized that my 1940 data had nothing for Cibola County – it didn’t exist in 1940! I figured out how to merge parts of a map (a great, useful skill) and then, shortly thereafter, happened upon a collection of shapefiles covering 1850-2000.
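Since the .dbf join is really just a table keyed on a field the shapefile also has, the same idea can be prototyped with a plain CSV (QGIS will join delimited text layers in much the same way). Here is a minimal Python sketch, with made-up column names and placeholder values rather than my actual census figures:

```python
import csv

# Placeholder attribute table for the join; COUNTY must match the name field
# in the shapefile's attribute table exactly, or the join silently fails.
rows = [
    {"COUNTY": "Bernalillo", "POP1940": 69000, "CASUALTY_PCT": 0.0},  # placeholder values
    {"COUNTY": "Harding", "POP1940": 4000, "CASUALTY_PCT": 0.0},      # placeholder values
]

with open("nm_1940_attributes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["COUNTY", "POP1940", "CASUALTY_PCT"])
    writer.writeheader()
    writer.writerows(rows)
```

The join itself then happens under Layer Properties > Joins (at least in the version of QGIS I’m using), with COUNTY as the join field on both sides.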

Here are a couple screenshots of the map I created in QGIS:

NMCasualties

This map shows the percentage of WWII casualties by county. The darker the color, the higher the percentage.

NMDemocraticReturns

This map shows the percentage of voters who voted for Roosevelt in 1940. The lowest percentage on the map is about 43%, while the highest are in the 60-75% range.

NMPopulationAirfields

This map shows the total population by county – Bernalillo County had the largest population at about 69,000, while the smallest county had about 3,700 residents. Also on the map are the locations of World War II air fields (which are today either military bases or municipal airports) and National Guard Armories.

A challenge I’ve come up against and haven’t quite been able to resolve is how to make two data sets show up on the same layer.  For example, I’d like to be able to show, on one layer, those counties that voted Democrat (50.1%+) and those counties that voted Republican (50.1%+).
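One workaround I keep circling back to (untested in my own project, so treat it as a sketch) is to collapse the two data sets into a single field first, and then let one layer’s categorized style do the work. The column names below are hypothetical:

```python
import csv

# Hypothetical county-level returns with Democratic and Republican vote shares.
with open("nm_1940_returns.csv", newline="") as src, \
        open("nm_1940_returns_winner.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["WINNER"])
    writer.writeheader()
    for row in reader:
        dem, rep = float(row["DEM_PCT"]), float(row["REP_PCT"])
        # One field now carries both outcomes, so a single categorized layer
        # can color Democratic and Republican counties differently.
        row["WINNER"] = "Democratic" if dem > rep else "Republican" if rep > dem else "Tie"
        writer.writerow(row)
```

Join the resulting file to the county layer and categorize on WINNER, and both parties should show up on the same layer in two colors.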

On the whole, I’m pretty happy with my first real foray into using QGIS, and have decided that it really makes for an excellent procrastination tool – and I can justify the procrastination because I am technically using the information to help construct my paper.  A final (hopefully temporary) dilemma – how does one cite a map created in QGIS with pilfered coordinates and patchworked information about military installations?

 

You Know You’ve Been a Bad Blogger When…

You can’t remember your user name OR your password.

I have been guilt-tripped into (finally!) writing a blog post after reading the posts of a colleague.  I am shamed by his re-embracing of the blog after a winter-holiday hiatus, and am now assuaging my personal guilt by attempting to re-discover my inner blogger.  Indeed, it is well after midnight, but what is one to do when sleep is elusive? You wouldn’t think that it would be for a near-end-of-coursework grad student, but writing might be the only way to get the half-finished research paper sentences out of my brain and onto a legible surface.

I ended last semester thinking I was going to be a good blogger.  I constructed a list of potential blog topics.  I promised myself I was going to write at least once a week.

Epic fail.

The holiday coma of sleep, meals prepared using a heat source other than the microwave, and the blessed non-existence of deadlines and demands got the better of me.  I was not a good blogger, and, more than letting anyone else down, I feel that I let ME down.  For a brief few months, my blog provided me with a way into the big, bold world of DH, and connected me (if somewhat superficially) to people in the field who I might otherwise have no communication with whatsoever.  Now that I have left the DH-love-fest that was HIST 666 (yes, really, that’s the number) I have to be self-motivated to continue working in the field and using digital tools in my work.  Maybe this blog post is my way back in.

The nature of coursework is that you, for a few short months, rip through fifteen or so texts to show off your analytical brilliance in seminar, all the while digging through some pile of hopefully archival material to cobble together that gem known as the “seminar paper.”  Sometimes you are lucky enough to find something to write about that relates, in any sort of tangential way, to your own particular interests.  You are even luckier if you can find archival documents that don’t require travel, desperate begging, or massive copying costs.  All that being said, it’s not often that seminar projects carry over into the next semester; more often, as is the case for my work on Appleton, they get abandoned in favor of a new project that promises satisfaction and – above all – completion.

I really want to use digital tools to work through my current seminar project.  I am not done with Appleton – but, let’s face it – Minnesota is far away, it’s really cold, and requires more time and financial resources than I have at my disposal as a not-yet-candidatized PhD student.  Though still working in a similar vein (localized commemorative practice) the sources I currently have at my disposal do not yet lend themselves to use with any of the digital tools that I engaged with last semester.  I believe that they will – if this project expands and turns into potential dissertation material – but not without some manipulation and well-funded travel first.

Outside of my DH cocoon I have little confidence in my DH abilities beyond Gephi and MALLET.  I don’t know enough about coding to do something with my data on my own, and I imagine my work might get boring if I use the same tools over and over again.  Reading my articles would be like the Groundhog Day of DH…little tweaks here and there, but generally the same motions over and over again.  Can I be a sometimes-DHer, or is it an all or nothing proposition?

I can certainly think of a way to visualize my current project for readers, but without OCR-ing my PDF/JPEG newspaper articles and creating metadata for images (which I have absolutely no idea how to do), is DH applicability source-dependent? We talked about DH as both a way of thinking and a way of doing…Some GIS would be great for a mapping project in the case of my current research, but some of the data that I need doesn’t exist in an accessible way (by accessible I mean timely, convenient, and cheap).

How do I bridge the thinking-doing gap? I am compiling personnel data on a particular military unit from a variety of sources, and I believe that this will produce some useful data ripe for manipulation and visualization, but I don’t want to fall into the trap of dumping information into an Excel spreadsheet, using it to make a pretty picture, and calling it digital history.

Whatever the opposite of a brain fart is, I just had one.  I have an idea.  I can’t share it.  Yet.  If I do, the IRB (creepily similar-sounding to the IRS, don’t you think) will come calling.  Speaking of which, in a public forum (i.e. a blog) what constraints can the IRB impose on what the creator of the blog does with content, like comments and feedback? Hmmmm.

I think I found my way back into DH.

A Word Cloud Is (Can Be) More Than Just a Pretty Picture

In 2011, Jacob Harris disparaged the now-ubiquitous word cloud and reminded readers that there was a good reason why it is considered the mullet of the Internet.  A word cloud is the most basic form of text-mining/topic modeling there is.  Anyone can create them, with any collection of textual data.  Choose your cloud-maker – Wordle, Tagxedo, Tagul – all are available for free, and quickly generate a pretty picture with words sized according to the frequency with which they appear in your text.

Harris’ gripe with word clouds is that they are generally used as “filler visualizations.”  To make his point, he gives readers two visualization options: a word cloud, or a map.  Both claim to represent a “deadly day in Baghdad.”  I went to the source of the word cloud, and the article by Kit Eaton, of Fast Company, indicates that it was produced using only 1% of the Iraq War Diaries from WikiLeaks.  The word cloud itself is fairly meaningless because 1) it was made with a minuscule portion of the available text, and 2) required no human analysis aside from copying and pasting.  The word cloud could certainly have been made more effective by translating some of the acronyms into full words, but I spent about the same time self-narrating the word cloud as I did the interactive map.  In both cases, the visualization could use some humanizing.  I don’t think “unknown bodies” necessarily imparts more meaning, or conveys more sensitivity, than the inflated words “KIA” and “IED” in a word cloud.  “KIA” and “IED” are loaded terms when a war is as publicized as the Iraq War – who hasn’t seen images of the aftermath on television?

Though I disagree with Harris on the usefulness of word clouds, I do agree that if you can’t help yourself, you should at least deploy the word cloud carefully and with analytical, narrative intent.  In The Historian’s Macroscope, Shawn Graham, Ian Milligan, and Scott Weingart contend that word clouds do indeed hold some sort of scholarly potential.  They give an example by using party platforms from Canada to show how the CCF evolved from the 1930s to the present.

After reading their chapter on word clouds, I tried to think of periods in United States history for which word clouds might serve a useful pedagogical tool.  As Fred Gibbs pointed out in seminar, thinking digitally is just as important (if not more so) as doing digitally.  Could we use word clouds to show changing attitudes toward slavery and abolition in the 1800s? Or, the development of revolutionary forces in the 18th century? The transformation of isolationism to interventionism? The waxing and waning of communist and atomic fears post-1945?

I’m using MALLET to do some topic modeling with my World War II Letters, so I feel absolutely no shame in using those same text files to create word clouds.  If I were to create a website for my project, the word clouds could serve as clickable links to the deeper analysis that they reflect.  In a conference presentation they might serve as a backdrop to a portion of the talk.  They might be useful on a poster (posters are turning up at all sorts of humanities conferences).  I am not suggesting that word clouds replace the more complicated analysis that any historical project requires, but that they do serve a pedagogical purpose and act as a gateway to more complex engagement.

Using Tagxedo, I uploaded my set of text files for each year and produced word cloud image files for 1941, 1942, and 1943:

1941Appleton

1942Appleton

1943Appleton

Quite a few of the words turn up frequently in all three years, but towards 1943, for example, “Dad” becomes more prominent.  When I read the letters, I could see that “Dad” was frequently ill during 1943, and so his name came up regularly in the letters – usually reports on how he was feeling, or his visits to the hospital.  “Write,” “home,” “soon,” pop up in all the letters because the family members were constantly reminding each other to stay in touch, or talking about when they would next be home.  Using MALLET, I could take out all the stop words, or words that appear so regularly they add little to the analysis, and generate new word clouds.  I ran ALL the text files through Tagxedo, and this is what turned up:

LettersAll

Not much spectacularly new or different here, but the word cloud suggests that not much about the war itself made its way into the letters.  I then tried my hand at Voyant, a “web-based reading and analysis environment for digital texts.” I found it annoying that I had to upload each individual text file into the program, so I stopped.  Maybe later.  Instead, I kidnapped the words that MALLET produced in the appleton-state.gz file.  The downside of using those words was that they were no longer in any sort of contextual relationship, and the corpus reader at Voyant showed a too-long list of words.  I didn’t get a word trend graph, and the word cloud didn’t show me anything I didn’t already know from the much faster Tagxedo.  Surely there is a better way!

I did some exploring because none of the help links at Voyant worked, and found out that I could use a zip file to upload multiple documents at once.  I zipped up my 356 text files (more props to WinZip for proving to be a necessity that I will soon have to pay for) and uploaded them into Voyant.

I had all the letters in the reader, but no word cloud (though tastefully named “Cirrus” in Voyant)! Still no word trends! I clicked on the tool option in the cloud generator, and saw that I could remove stopwords.  I did so, and immediately the program started working.  Is this why they’re called stopwords? They happen so frequently (and apparently uselessly) that they jam up the whole operation and prevent progress?

I got some trends, but they didn’t really make sense to me.  And, try as I might, I couldn’t get any hint of a word cloud.  I may have reached the limits of Voyant – does it not lend itself to a 31,000+ word corpus? More learnin’ needs to be done on this one.

Voyant

I did find the “Words in the Entire Corpus” helpful.  I checked out “Dad” to see what it said, and sure enough – there is the spike in him as an important topic of the letters back and forth.  These lovely images are NOT replacements for actual reading of the letters, but do help pick out patterns and convey those patterns quickly to readers.
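For what it’s worth, there is also a purely local route that sidesteps both Tagxedo and Voyant: the third-party Python wordcloud package builds the same kind of image from a folder of text files, custom stopwords included. This is a sketch of how I imagine using it rather than a record of what I actually did, and the folder layout and extra stopwords are my own guesses:

```python
from pathlib import Path

from wordcloud import STOPWORDS, WordCloud  # pip install wordcloud

# Glue all of one year's letters into a single string.
text = " ".join(p.read_text(encoding="utf-8", errors="ignore")
                for p in Path("appleton/1943").glob("*.txt"))

# Built-in stopword list plus salutations and sign-offs (add the family's names too).
stopwords = STOPWORDS | {"dear", "love", "sincerely"}

cloud = WordCloud(width=1200, height=800, background_color="white",
                  stopwords=stopwords).generate(text)
cloud.to_file("1943Appleton_cloud.png")
```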

I’m convinced that word clouds are an easy way to visualize large amounts of text, but equally convinced that they do little on their own and do require some sort of narrative – from choice of texts, to arrangement and display of the word cloud, to accompanying analysis, to consideration of audience and intended purpose.  Used incorrectly, they are just as harmful (see Harris’ article, above) as any other technology, media, or form of writing that lacks integrity and proper transparency.  Historians pride themselves on documentation, strength of argument, historiographical grounding, originality, and faithfulness to the sources.  The tools offered by digital humanities give historians a way to become even more transparent in their work.   All of the tools I have used this semester are freely available downloads, or Internet-based, and if I provide my data in a public document, anyone can replicate my work, or, at the very least, use the tools to arrive at the same data configurations that I have.  They won’t necessarily approach the materials in the same way, come to the same conclusions, or bring the same analytical/conceptual frameworks and historiographical knowledge to the table, but they can see the process by which I got there.

If you want to start your DH-adventure with word clouds, I suggest you go for it.  But I also suggest that you don’t limit yourself to word clouds.  I started out in August as an entirely novice DHer (and I have not progressed all that far) but I am using technology in ways I never thought I would.  I purposely avoided word clouds until I felt confident using more robust forms of textual analysis.  I can enter commands into MALLET like it’s a second language, and I don’t feel that goofing up my Gephi project is the end of the world (as I might have three months ago).  I started with a small question: How and why did the residents of Appleton decide to name their town streets after their war dead of World War II? That question happily coincided with the start of my DH seminar, and provided fertile ground for using the abundance of tools lurking out there on the Internet.  I now have skills and modes of thinking that I can apply to future projects, but am also looking forward to developing those skills and moving beyond the basics.  As a result of my progress with Appleton, I have devised additional research questions for seminar papers/potential dissertation topics (not to be revealed until much later dates) that I most likely would never have come to without the benefit of thinking with my developing DH brain.

DH encouraged me to take intellectual and practical risks (neither I nor my computer has blown up yet) and I am happy I did.  I’m into blogging, whereas I used to cringe at the idea of social networking, and even more surprising is my still-healthy addiction to Twitter.  As I found out about a week ago, it’s a great way to get people reading your work, and it’s an even better way to keep an eye on new research, new ideas, and all the goodies that circulate in the digital community.  Next up: copying and pasting MORE letters to enlarge my corpus, re-running MALLET and Gephi with the bigger data set, writing an essay to present my findings, and creating a robust web experience about Appleton and its commemorative legacy.

 

 

The MALLET behind the Madness: Topic Modeling World War II Letters, Part I

There was a method (and a purpose) behind the madness of spending hours copying and pasting letters into text files.  I wanted to use the letters to do some topic modeling with MALLET, a tool put out by the University of Massachusetts at Amherst.  Shawn Graham and Ian Milligan offer a robust review of MALLET and provide an array of examples of projects done with the tool.  Rob Nelson’s Mining the Dispatch is a rich source if you want to see the products MALLET can produce – timelines, topics, and nifty graphs.  As the authors point out, MALLET gives historians a way to read a large corpus quickly, and highlight themes that might otherwise go undiscovered through traditional close reading.  MALLET, however, does not replace such a close reading – but it does give you a way to read closely for a purpose. Graham and Milligan critique MALLET for its “alienation of novice users,” but I found MALLET easier to maneuver than Gephi.  It helped that I had been forced to do a few lessons at Learn Python the Hard Way, and was comfortable with the command language.

The best resource I found for using MALLET is in fact written by Graham, Milligan, and Scott Weingart at The Programming Historian.  Using their tutorial, I was able to do some topic modeling with a year of letters.  I was thrilled with the outcome, but realized that one year of letters told me little, and over the next few weeks compiled two more years’ worth of letters. Much like Graham and Milligan suggest, and as Nelson points out in his project, topic modeling is not an end in itself.  That probably applies to all the digital tools we have learned about this semester.  If we treat the results of running data through a program as a research product, we eliminate the important interpretive role of the historian from the equation.  MALLET, Gephi, and even the mullets of the internet help us think about our data (sources) in different ways, identify trends and patterns we might not be able to see when treating documents as single entities, and point us to related collections that we might otherwise ignore.  In their introduction to a special edition of the Journal of Digital Humanities, Elijah Meeks and Weingart state that the journal is explicitly not about the tools themselves and point out that contributing authors “critically wrestle with the process” of using the tools and working with the results.  MALLET, and programs like it, become tools for historians that enhance our abilities to glean as much as possible from the sources, rather than standalone machines that do the work for us.

After discussing MALLET in class, and perusing Nelson’s project and Cameron Blevins’s text mining of Martha Ballard’s Diary, I followed the tutorial at The Programming Historian and installed MALLET.  When I first used it, I only had one year of letters ready and so the results were fairly unimpressive.  I now have three years, and am anxious to see the results.

Collecting and Formatting Data

I explained the process of creating text files in my previous post, Go Go Gadget, Gephi! I did that many, many times to compile my current corpus.  I have each year in a separate folder, titled AppletonLetters1941, AppletonLetters1942, and so on, but created a separate folder with all the letters to use in MALLET, to simplify the process.  You can view the text files here.

Running MALLET

I followed The Programming Historian tutorial, so I’m not going to rehash each step here, but am including screenshots and brief descriptions of my particular process.  I had some moments where I wasn’t sure how to replace the tutorial’s commands with mine, so that took some trial and error.  If you compare what I have below with the tutorial, it might be easier to follow in the footsteps.  If you have trouble installing and getting MALLET to run successfully, try troubleshooting it on your own first, make sure you’re typing everything exactly as the tutorial indicates, and then if you are still unsuccessful, ask for help.***

Getting Data into MALLET

To get my data into MALLET, I followed the tutorial’s instructions to input my text file folder (called “appleton”) and to output it as appleton.mallet.    Make sure to scroll to the right for the full command line – I didn’t do this the first few times and couldn’t understand why nothing worked!

AppletonMallet

I didn’t tell it to keep the sequence, as the tutorial does, because I noticed that when I combined the text files into a single folder the dates moved out of order because of how they were named. ***MALLET didn’t like this.  I had to re-run this command and keep the sequence, because that’s what MALLET needed to work.*** I did include the command to take out stopwords.  You can see the list of stopwords by opening up the MALLET folder “Stoplists.”  The “En” folder has the list in English, and takes all words out of your corpus that probably have no impact on the topics.  If I wanted to, I could add words to this list, such as “Dear,” “Love,” and so on.  I could also add in the names of the senders and recipients and see what removing those does to the model.
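If I do go down that road, the list itself is easy to build: start from MALLET’s en.txt, append my own additions, and write the result to a new file. A sketch (the install path and the extra words are mine to adjust):

```python
from pathlib import Path

# MALLET's shipped English stopword list (adjust the path to your install).
mallet_words = Path(r"C:\mallet\stoplists\en.txt").read_text(encoding="utf-8").split()

# My additions: salutations, sign-offs, and the family's names (placeholders here).
custom = ["dear", "love", "sincerely", "beany"]

Path("appleton_stoplist.txt").write_text("\n".join(mallet_words + custom), encoding="utf-8")
```

That file can then be handed to the import step in place of (or on top of) the default list; if I’m reading MALLET’s options correctly, --stoplist-file swaps the list out entirely and --extra-stopwords adds to it.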

Running MALLET and Outputting the Model

Since I’ve already worked with MALLET before, I know that following the tutorial’s first command will run the program and produce a model, but it won’t output any of the data to a usable form.  I want to see the data in an Excel spreadsheet, which will allow me to save it, access it, and use it in other tools and visualizations.  So, I run the second command they provide but change the number of topics to 10.  I know from running the program before that the smaller the corpus, the better it is to keep the set of topics small as well.  Another benefit of the “iterative interaction” of DH is that you can (and should) run the program repeatedly to see how the results vary, so as my corpus enlarges I will increase the number of topics and see what happens.  Here’s what the program spits out with my commands:

CMDPRMT
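For anyone following along without the screenshot, here is roughly what those two steps amount to: the tutorial’s commands with my folder and file names dropped in. I typed them at the command prompt; wrapping them in Python is just a convenience sketch, and it assumes you are sitting in the MALLET directory (C:\mallet, in my case):

```python
import subprocess

# Step 1: import the folder of letters into MALLET's own format.
subprocess.run(
    r"bin\mallet import-dir --input appleton --output appleton.mallet"
    r" --keep-sequence --remove-stopwords",
    shell=True, check=True)

# Step 2: train a 10-topic model and write the three files read in the next section.
subprocess.run(
    r"bin\mallet train-topics --input appleton.mallet --num-topics 10"
    r" --output-state appleton-state.gz --output-topic-keys appleton_keys.txt"
    r" --output-doc-topics appleton_composition.txt",
    shell=True, check=True)
```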

Reading the Output Files

Because I told it to output the results in .gz and .txt files, I can unzip and open those files.  They show up in the MALLET folder:

Results

To view the composition, I open up Excel, then go to File, Open, and select All Files.  I select appleton_composition.txt from the MALLET folder, and after clicking Next a few times, the data imports into the Excel spreadsheet.

AppletonComp

Columns C, E, G, and so on indicate the topics that dominated each letter.  The decimal figure to the right of those columns indicates the weight of that topic within the letter.  The topic IDs and frequent words are in the appleton_keys.txt file, which I open in Word:

appkeys
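Excel and Word get the job done, but both files are plain tab-separated text, so Python can read them too. A sketch, assuming the layout my version of MALLET produced (document number, file name, then alternating topic/weight pairs, which is exactly what the C, E, G columns above reflect):

```python
import csv

# appleton_keys.txt: topic number, Dirichlet weight, then the topic's top words.
with open("appleton_keys.txt", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        topic, weight, words = line.rstrip("\n").split("\t", 2)
        print(f"Topic {topic}: {words}")

# appleton_composition.txt: doc number, file name, then topic/weight pairs.
doc_topics = {}
with open("appleton_composition.txt", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        if not row or row[0].startswith("#"):   # skip the header line
            continue
        _, filename, *pairs = row
        doc_topics[filename] = list(zip(pairs[0::2], pairs[1::2]))

# The strongest topic in each letter.
for filename, topics in sorted(doc_topics.items()):
    topic, weight = max(topics, key=lambda t: float(t[1]))
    print(filename, "-> topic", topic, f"({float(weight):.2f})")
```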

Then I had to deal with the .gz file.  I downloaded WinZip (only a trial, but probably a necessary program if I’m going to use tools like MALLET on a regular basis), and once I did that, the appleton-state file showed up as a WinZip file.  Once I unzipped it, I opened it up in Word and saved it.  The format looked to lend itself to a spreadsheet, so I opened up Excel, and went through a process similar to that for the first spreadsheet.  This time, though, I had to trial and error a few file formats before I got it right.  As it turns out, the file is space-delimited, so I had to select “delimited” and check “space” in the next dialog box.  This gave me an Excel spreadsheet containing 31,000+ words and their assigned topics.

GZ
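WinZip plus Word plus Excel works, but for the record, Python’s gzip module can read the state file directly. In the version I’m using, each non-comment line is space-delimited: document number, source file, word position, word-type index, the word itself, and the topic it was assigned. A sketch:

```python
import gzip
from collections import Counter

topic_words = Counter()

with gzip.open("appleton-state.gz", "rt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):       # the file opens with a few comment lines
            continue
        parts = line.split()
        if len(parts) != 6:            # skip anything that doesn't fit the layout
            continue
        doc, source, pos, typeindex, word, topic = parts
        topic_words[(int(topic), word)] += 1

# The ten most frequent word-to-topic assignments across the whole corpus.
for (topic, word), count in topic_words.most_common(10):
    print(f"topic {topic:>2}  {word:<15} {count}")
```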

Where to Go From Here

You’ll notice that I haven’t included any fancy graphs or actual results, but plan to work more with the corpus to do so for my research and in a future blog post.  Now that I have MALLET at my disposal, I need to do some serious editing of the letters to make them more topic-modeling friendly, and have some other things I’d like to try:

1. Correcting grammatical and spelling errors to unify the text.

2. Adding names and salutations to a stoplist, so that names are not included as potential topics.

3. Fine-tuning the file names so that, when combined, the letters stay in chronological order.

4. Creating a timeline to see what happens to the topics throughout the course of World War II.

5. Running the program to produce fewer, or more, possible topics.

6. Using the Excel data to produce graphic visualizations of topics (see the sketch after this list).

7. Creating sender files (letters from the same person over five years) and running the program.

8. Spending time reading the results and categorizing the topics – family, military life, illness, battle, and so on.
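For item 6 (and, by extension, item 4), this is the kind of thing I have in mind: a minimal matplotlib sketch that charts one topic’s average weight per year, assuming the composition file has been parsed as in the earlier sketch and that every file name starts with its MMDDYYYY date. The values below are placeholders, not my actual results.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

import matplotlib.pyplot as plt

# Parsed composition data: {file name: [(topic, weight), ...]}. Placeholder values.
doc_topics = {"04101944.txt": [("3", "0.41"), ("7", "0.22")],
              "07021943.txt": [("3", "0.18"), ("5", "0.30")]}

TOPIC = 3
by_year = defaultdict(list)
for filename, topics in doc_topics.items():
    year = datetime.strptime(filename[:8], "%m%d%Y").year  # the naming scheme pays off
    weights = {int(t): float(w) for t, w in topics}
    by_year[year].append(weights.get(TOPIC, 0.0))

years = sorted(by_year)
plt.plot(years, [mean(by_year[y]) for y in years], marker="o")
plt.xlabel("Year")
plt.ylabel(f"Average weight of topic {TOPIC}")
plt.title("One topic across the war years")
plt.savefig("topic_over_time.png")
```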

As I devote more time to perfecting my use of MALLET, and reading up on what other historians and digital humanists have done with the tool, I imagine I’ll come up with more ideas as to how I might use this effectively as I conduct my research.  For now, this is a pretty good starting point.

What Does This Do For Me as a Historian?

When I used MALLET for the first time, I only had one year of letters, so my results were consequently limited.  This run used three years of letters, approximately 350 in all.  The final tally, up to 1945, will include over 500 letters.  While this is part of a larger project on commemoration and the construction of living memorials after World War II, I find that my use of digital tools is helping me construct a larger story of a rural community at war.  I need to visit Appleton, MN and get into the town records.  I also need to access the military records of Appleton’s soldiers.  Up until now, my focus has been on those who are memorialized in Appleton’s street names, but there were other Appleton soldiers who returned home, who are also part of the story.  I need to find out more about them and their role in the commemorative activity and post-war life of the town.  These letters, and the topics identified through MALLET, can tell me quite a bit about how Appleton residents viewed and learned about the war, and the role family communications played in that.  Using MALLET does not replace the close reading I still need to do with most of the letters, but it does point me to look for particular trends within them.  MALLET, in combination with Gephi, will hopefully highlight common themes between particular senders and recipients.  The products of these tools, along with the newspaper articles from the Appleton Press, allow me to produce a richer telling of the story of a community at war and form a basis for similar analysis of other towns, cities, and communities.  In sum, the process of using MALLET helped me refine my research, think more carefully about the process, and more effectively analyze the letters on micro and macro levels.

***A second installment will include visualization and analysis of the complete set of letters.

 

***Note to world: I had installed and run MALLET successfully on my netbook.  I followed the same exact installation on my home laptop, and chaos ensued.  Though everything looked identical, right down to the environment variable, I kept getting “path not specified” and some version of “we can’t find this program on your computer.”  What?!? I spent a good hour (or more) uninstalling, reinstalling, and re-commanding MALLET, with no success.  I then noticed that within C:\, I had a MALLET folder, and within that folder, another MALLET folder.  I have no idea how that happened, so I tried copying and pasting all the MALLET sub-folders into the C:\mallet folder, then deleting the additional mallet folder.  Not sure what the techno-reasoning is behind it, but I was finally able to run the program, and now feel slightly triumphant that I troubleshot the problem without begging for help on Twitter.

Go Go Gadget, Gephi! The (Mis)Adventures of a Newbie DHer

The week we discussed data visualizations in seminar, some of our classmates took a look at data visualization programs and reported back to us.  What could they be used for? How did they work? What kind of visualization did they produce?

One of my colleagues perused the Gephi program and worked through the tutorial with a sample data set.  Gephi is “an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.”  In the most basic sense, Gephi has the capability to take the data you give it and produce visual representations of the networks, relationships, and hierarchies between “things.”

So, we were shown a visualization of characters from the novel Les Miserables.  She had followed the tutorial included in the program, which produced a snazzy-looking, incredibly complex network of relationships between the characters.  This is a screenshot of the visualization, from Gephi’s collection:

Pretty nifty, right? Take a look at the spreadsheet to the right of the visualization. THAT is the focus of this blog entry.

We had plenty of questions about the final product, and the process of going from spreadsheet to visualization.  How does the program know who is communicating with who? If characters show up on the same page, how does it distinguish between communication and coincidence? What do all of those categories mean?

The short and long answer is that none of that fabulousness happens without a human being doing a lot of hard work first.  The steps below roughly approximate my first experiment with Gephi, and, though they do not produce anything nearly as complex as the Les Miserables visualization, are enough to get a novice started and thinking more easily in the language and process that is Gephi.  First step – download the latest version of Gephi.  It’s an open-source program, so check for updates every now and then to make sure you’ve got the latest and greatest.  I found my first experiment with Gephi challenging, mostly because I could not figure out how to get my data into Gephi and produce a visualization.  None of the tutorials told me how to create the spreadsheet, so I had to play.  A lot.  For a long time.  It took me about four hours to figure it out, so hopefully the process below will get others started with less frustration:

Data Collection

I thought I might use the Letters from World War II to visualize the frequency of letters between members of the Nelson family.  These letters are available on the website, but each on a separate page.  Each blog entry is accessible as part of a group of five, and I couldn’t simply move from letter to letter.  In a burst of ambition, I had copied and pasted all of the letters from 1941 into a Word document.  Then, I discovered that DH runs in .txt, not in Word.  I had to overcome my addiction to neat margins, nicely formatted paragraphs, and the academic insistence on size 12 Times New Roman font.  It is easy enough to change files from .docx to .txt – all I had to do was select “Save As” and change the file type.

A pretty straightforward operation. I always wondered what those file type options were useful for!
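If I ever have to do that conversion in bulk again, the third-party python-docx package could handle the Save As step in a loop. A sketch, not what I actually did, with made-up folder names:

```python
from pathlib import Path

from docx import Document  # pip install python-docx

src = Path("letters_docx")   # folder of .docx letters
dst = Path("letters_txt")
dst.mkdir(exist_ok=True)

for path in src.glob("*.docx"):
    text = "\n".join(paragraph.text for paragraph in Document(str(path)).paragraphs)
    (dst / (path.stem + ".txt")).write_text(text, encoding="utf-8")
```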

I had a whiny moment in class and bemoaned the fact that I had to spend what felt like an eternity copying and pasting the data from the website into files, but Fred put this process in perspective and contrasted it with an adventure in the archives.  What if I had to photograph, scan, or otherwise obtain this same data from hard copies? The amount of time I spent working with the letters online was significantly less than what I would have had to spend in an archive, and let’s face it – I accessed it for free.  Now, I have a huge corpus that I can use in Gephi AND in Mallet (more on that later).  After my first run of letters, I streamlined the process and could copy, paste, and save as a text file fairly quickly – about two hours per year of letters.

The quick run down:

1. Copy and paste the letter from the website into a Word Document.

Some of the letters are long and some are only a few lines. This one is in the middle, and luckily all the letters include the date (see the tab at the top) so the files are named with just the date.

2. Give the document a name, and save it as Plain Text.  If possible, give it a name that carries data.  In my case, I named each document with the date so that 04101944 = April 10, 1944.  Other programs, like Mallet, will use this information to create a timeline.  This name is more descriptive and useful than Letter1, Letter2 (see the quick sketch after this list for how those dates can be recovered later).  You would think this is obvious, but it was not obvious to me when I first started. Now I name every document with a date.  If two or more pieces of data had the same date, I tack on a letter at the end – 04101944, 04101944b, 04101944c, and so on.  Once you’ve saved a single file as .txt, you can copy and paste other data into the same document, change the name, and not have to bother with the file type.

Notice that I didn’t change the formatting of the letter. I spent too long doing this for the first year of letters, and it was a time-suck. I then realized that plain text doesn’t allow for formatting, so all the irritating italics disappeared. The quotes at the beginning and end still irritate me, but they don’t affect the outcome in the programs I’ve used so far.

After you select a file format and give it a name, you have the option to choose the encoding. I haven’t had reason to change it and keep it at the default. Character substitution replaces characters not identified by the encoding you choose with the closest character to the original. This is why sometimes in documents you see punctuation replaced with question marks – because the original document didn’t translate perfectly to a different encoding language. I don’t check the box, and haven’t had a problem.
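The payoff of that naming scheme, by the way, is that any later program (or a few lines of Python) can recover the date without opening the letter. A quick sketch, with a hypothetical folder name:

```python
from datetime import datetime
from pathlib import Path

for path in sorted(Path("letters_txt").glob("*.txt")):
    # 04101944.txt -> April 10, 1944 (the trailing b/c on same-day letters is ignored)
    date = datetime.strptime(path.stem[:8], "%m%d%Y")
    print(path.name, "->", date.strftime("%B %d, %Y"))
```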

Getting the Data into a Gephi-Friendly Format

Text files are not enough to get into Gephi.  Once the data is collected (I started with a single year just to get my feet wet) I had to transform my letters into a spreadsheet.  I didn’t need to put the letters themselves into the spreadsheet – just the senders and recipients.  So, I did this, and dutifully made a column for “senders” and for “recipients.”

Looks pretty simple, right? Little did I know that this was NOT what Gephi needed.

I wrongly assumed that Gephi would take my data, understand the links between senders and recipients, and make a visualization.  No.  I still had a lot more data manipulation to do to make it work.  I looked for a tutorial to tell me how to get my data in a format that Gephi could use, but didn’t really have any luck.  Looking at the samples in Gephi wasn’t helpful, either, because I only saw the Gephi Project File, which shows the data already manipulated.  I finally found a sample Excel spreadsheet about Twitter and universities, and looked at the columns, the data, and how each was identified in relationship to one another.  Aha! First, I needed to really understand what was meant by

Nodes and Edges

Nodes, in my case, are the senders and recipients.  The people.  A node is essentially a connection point.  Edges are the lines between those connection points – imagine the letters floating through space and time from the sender to the recipient, and that is the edge.  The more letters between two “nodes,” the bigger the edge.  Gephi requires you to use two separate spreadsheets – one that contains the nodes, and another that contains the edges.

Creating the Right Spreadsheet

Creating the spreadsheets for Gephi required some work.  I ascertained, based on looking at other spreadsheets available online, that each “node” needed an ID.  I went through my first spreadsheet (see above) and cleaned up the data.  Nicknames for senders and recipients were replaced with their legal names.  Then, each sender was assigned an ID number.  In my case, 1-8.

Nodes

The “label” is just the names – that’s how the nodes will be labeled in the visualization.  The “type” of node is a “person.”  I don’t know if I really needed this in my spreadsheet because I couldn’t see that it had any impact on the outcome, but for more complicated visualizations it might be helpful.  I had no idea what the x- and y-coordinates were for, so I left the values at zero.  To use this in Gephi, I needed to save it as a .csv, or “Comma Separated Values” file.  Instead of saving the file as a regular Excel workbook file, I selected “Save as Type: CSV.”  I didn’t need to do any special formatting.

Onto the edges: This is where the ID numbers come in handy.  I went through my original Excel spreadsheet, the one that contained all the names, and I replaced each name with its corresponding number.

Edges

The “source” is the sender.  The “target” is the recipient.  I didn’t know what “type” meant, so I left it as “directed,” which is what the sample spreadsheet I was looking at used.  I left ID and Label blank – I assumed that the nodes would be labeled with the names included in the nodes spreadsheet I created earlier.  I also left “weight” at “1” because I didn’t know what the impact would be on the final visualization.  As it turned out, “directed” put an arrow on the end of the edge in the visualization to show the direction of the communication between the nodes.  I wasn’t a fan of that, so I changed them all to “undirected” and this solved the issue.  Trial and error!  I had to save this file as .csv as well.  Now, I was ready to send my files through the Gephi machine!
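Looking back, the nodes-and-edges conversion is mechanical enough that a short script could do it. This is how I would automate it now rather than a record of what I did then; the sender/recipient pairs are placeholders, and the column headers are the ones from the sample spreadsheet I was copying (Gephi may well accept other layouts):

```python
import csv
from collections import Counter

# One (sender, recipient) pair per letter, transcribed from the corpus. Placeholders.
letters = [("Family", "Beany"), ("Beany", "Family"), ("Family", "Beany")]

# Nodes: one ID per unique person.
people = sorted({name for pair in letters for name in pair})
ids = {name: i + 1 for i, name in enumerate(people)}

with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Label"])
    for name, node_id in ids.items():
        writer.writerow([node_id, name])

# Edges: the number of letters between a pair becomes the edge weight,
# which is what Gephi calculated for me from my repeated rows.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (sender, recipient), weight in Counter(letters).items():
        writer.writerow([ids[sender], ids[recipient], "Undirected", weight])
```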

Getting into Gephi and Creating a Project

This is the screen you see when you open up Gephi:

GephiScreen1

First, click on “New Project.”  The next screen provides several options – Overview, Data Laboratory, and Preview.  The only one that is immediately useful is the Data Laboratory.  So, select Data Laboratory.

DataLab

 

I needed to import the data, so I selected “Import Spreadsheet.”  I located my .csv file, named “AppGeph” and then made sure that I selected “comma” as the separator, and “nodes” as the table type.  The program will tell you if you have the wrong type selected by highlighting your file in red, and telling you what your table needs that it doesn’t have.

GephNodes

 

When I click “Next,” I get this screen:

GephNodes2

 

Onto the edges, which follow the same process except that I selected “edges” as the table type in the “Import Spreadsheet” dialog box.  To see what your edges table looks like in Gephi, click on the “Edges” link in the left corner, just under the “Data Table” tab:

GephEdges

 

You can see that Gephi took my spreadsheet and calculated the number of letters between the nodes.  Take a look at the “Weight” column, and you can see, for example, that there were 5 contacts between Source 1 and Target 3, or, “Family” and “Beany.” Now that my data is uploaded into Gephi, I’m ready to see the initial visualization.

The Visualization

I select “Overview” to see what my data has produced, and, given the small sample set (only one year of letters) I know that the visualization will be small.

Vis1

 

Really tiny.  If you run your mouse over the icons to the left of the graph screen, you see that you have options for modifying the graph and making it look more impressive.  Here’s what I did to make my graph presentable and understandable:

1. Center Graph: This centered the graph in the window so it was a bit larger and easier to work with.

2. Show Node Labels: This put the names on all the nodes.

3. Sizer: I did this for the three individuals who sent/received the most letters, so they stand out from the other family members.

4. Dragging: I dragged the nodes and rearranged them so that all the edges could be seen clearly.

5. Painter: I selected a different color for each node, so each family member now shows as a different color.

6. Font Size: I used the font size slider to enlarge the node labels until they were easily readable.

7. Screenshot: I used the nifty screenshot tool in Gephi to capture my visualization.

GephiFinal

 

As of yet, there is no “undo” button in Gephi, at least not one that I have found.  My suggestion is to save your visualization every time you change it, so that you don’t have to worry about not being able to undo some sort of hideousness.  I wanted to see what “generate random graph” did; need I say more?  ***When I went back to the Data Laboratory, I saw that my tables had changed – all sorts of funky numbers and rows were added to my nodes and edges.  I deleted them, left the original data, and my graph returned to normal.  Phew! Sometimes, you learn more without the undo option!

ScaryGephi

 

When you save your work, it will save as a Gephi Project File, and when you open that file, it will open automatically in Gephi.

SavedFiles

 

Conclusions

I was excited to use Gephi because it seemed to offer a way to think about my data, as opposed to just representing it (aka my previously posted Timeline adventure).  Though I am by no means an expert at using Gephi, I can at least make my spreadsheets work  in the program.  I imagine that as I add the data from the next four years of letters, the visualization will become more complex.  Combined with the topic modeling of Mallet, the visualizations from Gephi lend themselves to answering a few particular research questions, and generating others in the future.  Who wrote the most letters? What topics were family members writing about? What kind of information did they share with each other? Did some family members share some things with one member that they didn’t with others? How did communications change based on personal circumstances? Did they sign their names differently or use affectionate nicknames based on the family member to whom they were writing? How did events affect the tone, topic, and frequency of letters?

The more data you have, the richer and more complex your visualization will be.  I started to use Gephi without any understanding of the program, except that it took data and made a network.  I had to spend a lot of time working with my data to get it into the correct form (after spending even more time figuring out what that form was) and had to try out all of the icons to see what they did to the network in front of me.  I won’t reveal how many times I closed everything out entirely and reloaded my data because the lack of an undo button frustrated me.  I once lost my visualization, and found it again (much later) when I (by chance) clicked on “Window” and selected “Graph.”  If you happen to read this, and have some data that’s crying out for a network visualization, give Gephi a shot.  When I originally looked at the sample in class, I thought “I’ll never be able to do this, but I really want to.”  I had ZERO experience with Gephi and aside from downloading it, didn’t know where to start.  Hopefully, this brief and simplified intro helps other novice (non)DHers increase their willingness to take risks and experiment with their data and the tools without being petrified of (inevitable but temporary) failure.

 

Using the Data: Baby Steps into the World of Visualization

When I entered the seminar room for the first time, I was itching to make stuff, about history, on a computer.  That’s what DH is, right? Making something visually pleasing from data that is historically useful.

I could not have been more wrong.  While I was more eager than a couple of my classmates to start tooling around (pun intended), doing DH required a shift in thinking, as well as a shift in doing.  The collaborative text, Digital_Humanities, suggests that we need to embrace “productive failure.” What the authors term “generative humanities” favors process over product and experimentation over finality.  My venture into visualization produced a snazzy product, but the process was ultimately more beneficial to my work as a historian than the interactive timeline I produced.

If you look at my data from my previous post, Thickening the Data, you’ll see that I have tons of information in my spreadsheet.  When we first started prowling the Internet for free visualization tools (let’s face it, we were all intrigued by the possibility of turning our data into pretty pictures) I came upon SHIVA, part of the SHANTI program at the University of Virginia.

Though currently only open to the UVA community, SHIVA may at some point be available for use by the larger public.

I was able to secure a log-in to use the visualization tools, and based on the interface thought it would be a snap.  This is the first place I tested out my visualization chops.  I thought I would start out with a simple timeline.  I assumed that I could upload my Excel Spreadsheet, and that the program would suck out all the relevant information and make something –  a timeline, a chart, a map.

I should have read the directions first.

I tried uploading my Excel spreadsheet, but bad things happened.  Nothing appeared on the page in front of me.  I probably got some sort of error message, but have since blocked the traumatic memory.  I did this several times.  I blamed the program for most of those times, and then thought that perhaps there was a set of directions for a reason.

Here are the directions I should have read first. My repeated ill-fated attempts to use SHIVA are what those in DH like to call productive failure. Sometimes, it’s only productive in hindsight.

I realized I had to use Google Docs to make my spreadsheet, so I had to create a Google account. You can view the spreadsheet here.  I made several spreadsheets before I got it right, and I won’t tell you how many times I started my visualization over from scratch, or re-saved a better version of the spreadsheet until the result was somewhat respectable.  If you want to venture down the DH rabbit hole, or beg for a log-in and give SHIVA (or other viz tools) a shot, here’s what I learned that might be helpful.

1.  It helps if you make your Google Doc public as soon as you start.  Once I created the visualization, I had no idea why nobody could access my timeline.  I made the visualization public, but the trick was to make the DATA public.  Note to world: I didn’t know people couldn’t get access until Fred tweeted that the link didn’t work.  I didn’t realize that because I was signed into Google (I didn’t know that I was – silly me) I could of course open the visualization on my computer.  A shameless plug for Twitter in DH – instant mortification at publicly visible errors can be resolved by immediate notification and quickly remedied.

My advice is to make your document “Public on the Web,” unless it contains super-sensitive information that you don’t want to share. Part of the deal with using SHIVA is that your work is accessible by other users, so don’t be a Scrooge.

2. Follow the directions.  Your laptop won’t explode if you don’t, but you will find it much easier and rewarding if you at least give a nod to the tutorial offered on the SHIVA page.  I created my spreadsheet following these directions (after wondering why SHIVA didn’t seem to like Excel) and included those columns that I thought were most relevant.

I kept it simple and included start and end dates, titles, descriptions, images, and importance. Getting the images to display correctly drove me to the edge of sanity, but the outcome was worth the agony.

3. There is a difference between the url for the webpage on which the image appears, and the actual image url.  I had no idea that images even had their own urls, but if you right click on an image (go ahead – try it on the pic above) you will indeed see an option that says “Copy Image URL.”  I couldn’t figure out what I was doing wrong, because I HAD urls in the image column, but when the visualization ran, nothing appeared.  I randomly right-clicked on an image to save it, thinking if I saved the image in Google Docs I could make it public and then copy the Google link into the spreadsheet, and totally by chance saw the option to get the image url, and gave it a try.  Score! Do that if you want to insert an image.

Those are the helpful hints.  Here’s how I actually made the visualization:

Here is where I started creating my timeline:

This is what the page looks like when you click on “Create Visualization.” You can choose from an array of options, so have a look and think about which visualization best fits your data. You might think differently about what your data means after you see how it can be represented.

1. I copied and pasted the url for my Google Docs spreadsheet into the space next to “Source of Events.”  That told the program where to find all my information.  I gave it a title, and played around with the height and width to see what size best fit my data.  The center date of the timeline turned out to be important – if I kept it at 1787, viewers would have to work hard to get to my tiny set of years, so I set it at 1942.

I played with these options for quite awhile before I settled. This experiment with visualization proves that you have to be willing to take chances and risk failure to have even a modicum of success.

2. I also played around with the zoom levels.  I wanted to make sure users could read the information on the timeline without having to zoom in or out too much, so I set it at a level that showed all of the information at once.  This is where the importance value set in the spreadsheet comes in handy.  The higher the number, the larger the text for that event appears on the timeline.  I played with that a few times, too, and decided to standardize the events so that general World War II events were the same size, and events related to Appleton soldiers were the same size.  The more data you have, the more important this is.  This strategy provides visual unity and makes it easier for users to organize what they are seeing.

3. Making the visualization look like I wanted it to required sampling colors and fonts – there’s no right way or wrong way to go about it.  The visualization updates every time a value is changed, so keep at it until you’re happy with what you see.

4. When you’re ready to share your work, make sure that your document has been made public.  Then, set your visualization so that it can be shared with the public.  There is a share link on the creation page, so you have to decide which format is best for you.  I am not yet a member of the “I Have My Own Domain Name” club, don’t have WordPress downloaded on my computer, and don’t pay for web hosting,  so the best I could do was share the link.  Thus, in the spirit of sharing, here is the link to my first visualization.  (Keep in mind, I created this in September, a mere month after starting my first class in DH).  The visualization is interactive, so you need to slide, click, and scroll your way through it.

This timeline is only a small portion of what could be a large project. Charting the participation of Appleton soldiers in the larger context of World War II is a huge, but entirely possible, undertaking.

To more experienced DHers, this visualization might seem fairly limited (it is) and its output category somewhat difficult to determine. But, I found it really helpful to get my hands dirty in an un-intimidating way.  I got used to frustration and temporarily existing in a sort of digital purgatory.

I like my timeline.  As I continue the larger research project, a better and more complete timeline will certainly enhance the visual impact of my work for readers.  Without a lot of fluff, readers can see a story and (the beginnings of) an argument.

The product is a small achievement, but it was the process of creating this timeline that was the most beneficial.  Creating this timeline forced me to reconsider questions that I thought were tangential to my research.  How did the Appleton soldiers fit into the larger story of World War II? In what theaters of war did they fight? What were they like as soldiers, and as men, while they were engaged in combat? I first found out about Robert Rooney’s Distinguished Service Cross when I was in the midst of making this timeline.  I wanted to find out what his unit did, and I knew he was killed in Tunisia, so robust Internet searching located a partial history of his unit, the 135th Infantry Regiment of the 34th Infantry Division.  Other stories are out there for the Appleton soldiers, and a full timeline will certainly tell a compelling story about their participation in World War II.

Beyond the research, this process taught me how to use Google Docs effectively.  I learned the language of a visualization tool.  I shared my work without any final polish and felt no remorse or intellectual guilt.  My self-congratulatory fist-pump at the first pop-up sighting was a joyous moment indeed (though tamed by immediate and future sightings of typos and inexcusable grammatical errors).  When I started using more complex tools like Mallet and Gephi, I had only to remember the difficulties I had with a (by comparison) simple program and work through the problems in much the same way – trial and error, repeated as many times as necessary until I got it right.

The takeaway from my foray into creating visualizations is that you don’t need any fancy know-how to do it.  You will learn as you go, so don’t expect to know how to do everything at once.  If you already know how to do it all, you might be reading the wrong blog! Finally, start out with something small so you aren’t intimidated by the seeming impossibility of the task before you.

Thickening the Data: How Excel Helped Me Become a Better Historian

The purpose of this post is to show how I started rethinking my approach to historical research – finding sources, reading them, interpreting them, and using them to ask and answer new questions.

For Week 5 of our Digital History seminar, we had to read a few essays about the nature of historical data.  A post by Trevor Owens suggested that as historians, we can think of data in three particular ways – as artifact, as text, and as information.  Regardless of the form, data allows us to glean that stuff we call “evidence” from the stacks (or bytes) parked in front of us.  Owens also teamed up with Fred Gibbs on a collaborative essay on data interpretation and historical writing and they wrote something that really resonated with me as I dove into my own research:

But data does not always have to be used as evidence. It can also help with discovering and framing research questions. Especially as increasing amounts of historical data is provided…playing with data—in all its formats and forms—is more important than ever. This view of iterative interaction with data as a part of the hermeneutic process—especially when explored in graphical form—resonates with some recent theoretical work that describes knowledge from visualizations as not simply “transferred, revealed, or perceived, but…created through a dynamic process.”*  [Paragraph 11, The Hermeneutics of Data and Historical Writing, *Martin Jessop, “Digital Visualization as a Scholarly Activity,” Literary and Linguistic Computing, 23.3 (2008): 281-293, 282.]

That phrase – “iterative interaction with data” – is the one that stuck out to me.  I have done plenty of research in the past, but my standard approach was to read the source, squeeze out the information, and move on to the next piece.  Thinking about working with my research over and over again at first sounded tiresome, but like they say – don’t knock it until you try it!

I added that new gem to what Tricia Wang said about big data and thick data.  After untangling the remaining historiographical cobwebs of Clifford Geertz and thick description, I started thinking about the importance of the story.  Where could I find the story hidden in the data? My growing Excel spreadsheet looked impressive with all its carefully labeled columns, but where was the poignancy? What did service numbers, parents’ names, and regiments tell me about the soldiers and their community? What did all this data have to do with the way these soldiers were remembered by the town?

The steps below outline my foray into treating my research as data, and might serve as a model as you tackle your own research:

Step 1: If You Want to Think of Research as Data, Create an Excel Spreadsheet!

I had never used an Excel spreadsheet to organize my research prior to starting this project.  I stuck to index cards, legal pads, and a whole lot of Post-It Notes.  When it came time to write the paper, I was a mess.  I always pulled it off, but not without a gradual loss of sanity and random moments of anti-social behavior.  When I started thinking about my research as data, I didn’t head for my stash of paper products – I went right for my laptop and fired up Excel.  With Excel, I can add columns every time I come across a different type of data.  I can color-code cells to create categories and connections.  Best of all, my data is digitized and I can save and export the information in a variety of formats for use in different digital tools.
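For what it’s worth, the “export in a variety of formats” step can be a few lines of Python with pandas.  The file name and column names below are hypothetical stand-ins for my own spreadsheet, so take this as a sketch of the workflow rather than a prescription.

```python
# A minimal sketch: pull the research spreadsheet out of Excel and save it in
# formats other digital tools accept. "soldiers.xlsx" and the column names are
# hypothetical stand-ins, not the actual research file.
import pandas as pd

df = pd.read_excel("soldiers.xlsx")

# Columns can be added on the fly as new kinds of information turn up.
df["Has Unit Info"] = df["Unit"].notna()

# Export for downstream tools.
df.to_csv("soldiers.csv", index=False)          # e.g. for timelines or GIS joins
df.to_json("soldiers.json", orient="records")   # e.g. for web-based visualizations
```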

Excel Screenshot

Only a couple columns in this spreadsheet came from the newspaper clippings. The rest were added based on searches completed with the information from those clippings.

Step 2: Venture into the Unknown Wealth of Cyberspace – Add and Fill Columns!

I had A LOT of blank spaces on my spreadsheet when I first started.  My research was helped by the kind staff at the Swift County Historical Society, who sent me an envelope full of newspaper clippings from the 1940s, when the Appleton streets were renamed for their war dead.  Short biographies of the 29 soldiers from Appleton who died during World War II helped fill in some of the categories – name, street name, and date of death.  Information in these bios was inconsistent, so for some of the men I had their unit, date of birth, parents’ names, and location of death.  Empty cells irked me, so I had to start thinking about where I might find that information.  If I had all the time and grant money in the world, I could easily hop on a plane and plant myself in the archives, city records offices, and libraries that have this information.  Because I don’t, I turned to the Internet (and all the FREE stuff I could find) to fill in the blanks.  I found information about these men that I had not anticipated, and as a result of hours of trying various search terms and moaning at the services that promised to find what I needed (but for a price), I was able to fill out most of my table and even add columns that I had not previously thought of, or considered important.  Adding the units they served in gave me a way to find out about their lives as soldiers.  What campaigns were they involved in? Would families back home have read about these in the news? What were they like as soldiers? Did they earn any honors and awards? How did they die? Did their place of death have any bearing on where they were buried? I discovered that PFC Edwin L. Haven was on board the SS Leopoldville, a Belgian troop transport ship, when it was sunk by a German U-Boat on Christmas Eve, 1944.  I unearthed photographs of the ship and information about the efforts of shipwreck-hunters to find it in the English Channel.  This is the thick data that gives meaning to the names, numbers, and dates that pepper my spreadsheet, but I only found this thick data because the spreadsheet screamed at me to be filled.

This portion of the spreadsheet was added after perusing online military records, census data from 1940, and sites like http://www.findagrave.com.

Step 3: Seek Ye the Treasures of “Iterative Interaction” – Search, and Ye Shall Find!

The stories I discovered are fabulous, but I also unearthed a variety of other sources that help me construct a social history of wartime Appleton and its residents.  Every time I opened up the spreadsheet I used the information to cobble together unusual search strings.  One of the gems I discovered was the website of composer Daniel Kallman.  He was asked to write a piece for Appleton and the 34th Infantry Division, the unit in which many of Appleton’s soldiers served (and still serve today!).  Read what he writes about how Streets of Honor came together:

Prior to composing “Streets of Honor,” Kallman visited Appleton, listened to interviews conducted by author Erling Kolke, and read the same newspaper articles I did from the Appleton Press.

I found this because I kept playing with my data.  Kallman’s website led me to the semi-fictitious novel Streets of Honor by Erling H. Kolke.  As historians we sometimes hesitate to include fiction among our sources, but I snapped up this self-published novel and read it immediately.  This “down-home novel about the activities and families of the 135th Infantry Regiment” tells the story of a small Minnesota town with deep communal ties.  Reading this sheds light on why the community chose to honor their war dead by renaming their streets in 1947.

One search term in particular led to a treasure-trove of wartime letters written by one family from roughly 1941-1945.  I typed in “Ole Veum, Appleton” and first found a newspaper article about his exploits in the skies above Tunisia, but when I put in “Ole Veum, Africa” I found the letters.  “Letters from World War II” is a blog maintained by descendants of the Nelson family from Appleton, Minnesota.  The blog contains transcriptions of hundreds of letters, as well as scanned photographs, menus, postcards, and telegrams the family sent back and forth for the duration of the war.  THIS IS A GOLDMINE!

This website is a historian’s jackpot, especially since it contains the transcriptions of the letters instead of scanned images. The text is easy to dump into text-mining and topic modeling tools.
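To give a sense of what “dumping” the transcriptions into a topic-modeling tool might look like, here is a rough sketch using scikit-learn.  It assumes the letters have been saved as plain-text files in a folder called letters/ – an assumption of mine for illustration, not a description of the blog itself – and the choice of five topics is arbitrary.

```python
# A rough sketch of topic modeling over transcribed letters, assuming each
# transcription has been saved as a plain-text file in a "letters/" folder.
from pathlib import Path

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

letters = [p.read_text(encoding="utf-8") for p in Path("letters").glob("*.txt")]

# Turn the letters into word counts, dropping very common and very rare words.
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
counts = vectorizer.fit_transform(letters)

# Fit a small LDA model; five topics is an arbitrary starting point.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(counts)

# Print the ten most heavily weighted words in each topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-10:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```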

The lesson here is that precious evidence is sometimes hidden beyond the first page of search results, and that rethinking and playing with data helped me locate sources I might never have found had I given up after the first unproductive search terms.  Though I feel like I spent too many hours sitting in front of a computer digging for this information, think about the time (and money) I might have had to spend to find similar information in a dusty archive.  Now I have great stories and fabulous data, and I can narrow my archival needs more accurately.
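“Playing with data” can be as low-tech as recombining spreadsheet fields into new search strings – roughly what turned “Ole Veum, Appleton” into “Ole Veum, Africa.”  A minimal sketch, assuming the exported CSV has a Name column and using a handful of extra terms I picked purely for illustration:

```python
# A sketch of iterative searching: pair each soldier's name with a rotating set
# of extra terms to generate fresh search strings. The "Name" column and the
# extra terms are illustrative assumptions, not a fixed recipe.
import csv
from itertools import product

with open("soldiers.csv", newline="") as f:
    soldiers = list(csv.DictReader(f))

extra_terms = ["Appleton", "Minnesota", "Africa", "Tunisia", "135th Infantry", "letters"]

for soldier, term in product(soldiers, extra_terms):
    print(f'"{soldier["Name"]}, {term}"')
```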

Conclusion – So, Why Should You Do This, Too? 

The simple answer is because you can.   The better answer is because actively engaging with my data produced new questions and new sources.  I started with an Excel spreadsheet, and the quantity and quality of my data snowballed from there.  I did this all from the comfort of home (and my slightly overheated university office) without having to pay a penny for travel, photocopying, and morale-busting days in the archive.  I am a more productive and creative researcher as a result.  My data is organized, and because of cloud-sharing made possible by Google, Apple, and Dropbox, I can access my data anywhere and easily manipulate it to upload into various digital tools.  Because I found the letters, I now have materials to use in a text-mining and topic-modeling project.  Don’t worry that you will be any less of a historian because you call your research “data.”  Keep in mind Wang’s recommendation about “thick data” and ponder how your quantitative data can help you uncover the qualitative data that produces the richest historical analyses.

Why Every Grad Student Should Take a Seminar in Digital History

We have been charged with submitting, for all the world to see, a critique of our digital history seminar.  We have further been instructed to be honest in our critique, and not be “nice” simply for the sake of making people feel good.  So, here goes.

I started the semester fairly enthusiastically – I really wanted to take a course in digital history, and I was excited to see something besides standard history seminars in the course offerings.  Plus, a friend and I had snooped around the Internet and perused the course syllabus before enrolling.  We knew what to expect (so we thought) and were intrigued.  I did balk a bit at being coerced into using Twitter and setting up a blog.  Prior to taking this class, I had been entirely anti-social media, and have a Facebook account mostly so I can check in on my family now and then.  I rarely, if ever, post anything of my own.  Twitter struck me as a trendy, superficial everyone-is-doing-it-so-you-should-too fad.  But, bravely onward!  Blogging seemed a waste of time – time I could spend instead reading, doing research, or writing, but I liked the idea of blogging comments rather than sending an email or writing a response paper.  Once I got past the course requirements, which, in all honesty, appeared fairly light in comparison to other seminars (beware the shiny exterior – I probably did more work for this class than any other), I felt ready to tackle the world of digital history.  After a semester of frustration, elation, and at times utter confusion, I’ve concluded that every grad student should take a course in digital humanities.  The top five reasons are…

1.  This course will change the way you think.  The brain game of figuring out digital tools forced me to ask new questions of my topic, and treat my research as data.  I had to alter how I thought about the information I gathered, and manipulate (not distort) it in so many ways to use the tools appropriately that I sought out more and more information, looked in additional places for untapped sources, and pondered more effectively the meaning of the historical stuff I had in front of me.

2.  If you want to be a historian in the 21st century, you need to learn how to use Twitter.  Twitter certainly had some haters, and I more than once bemoaned the lack of conversation in #dh2068.  Even so, I got into Twitter quite a bit, and I still am, though I have unintentionally tapered off my use over the last week. (Have no fear – I plan to resume Twittering after the chaos of finals week is over). Twitter is no fun if it is not used to develop a conversation, and some people assume that Twitter confuses people’s meanings and isn’t suitable for sharing big ideas.  My philosophy? Don’t knock it until you’ve tried it.  Multiply your 140 characters by an infinite number of Tweets, and you can have as long a conversation as you want.  I found that my best experiences with Twitter were the result of commenting, re-tweeting, and contributing to conversations (even with people I don’t know personally) on a regular basis.  Like the DH world in general, Twitter requires collaboration and community, and despite the millions of conversations, can indeed be a lonely place without them.

3.  Blogging is a good way to practice your writing.  I’m not going to lie – I got into blogging, and plan to continue throughout my academic career.  Though we were encouraged to blog for class, I found myself blogging about my research and other topics along the way.  Because blogging was a requirement, I think some classmates were annoyed by the additional alerts they received when I updated my blog and suggested that we could perhaps use a separate blog for class, and another for personal use.  My philosophy is this: if you don’t want to read my entry, delete the email or cancel the alerts.  We were encouraged to push our boundaries and take risks in this course, so I did.

4.  Taking risks pays off.  Because the possibilities of digital tools intrigued me so much, I dove into a research project and didn’t look back.  While I have not had the time to devote to a full-scale research paper, I have uncovered a multitude of sources and compiled a series of questions to guide further research next semester, and am on my way to a more well-defined dissertation topic.  Beyond that, I have figured out how to use digital tools and feel fairly confident in sharing my still-limited know-how with other DH newbies, and am working on a tutorial to submit to The Programming Historian.  I am well-versed enough in the challenges and possibilities of DH to sustain a reasonable conversation with others, and am looking to continue building my DH skills well after the seminar is over.

5.  You get comfortable with failure.  We started out the semester with a reading about productive failure.  I sat in front of my computer staring and clicking with little to no progress.  I made version upon version of visualizations with no visible change.  I attempted Gephi and just about had a panic attack when my screen changed and I couldn’t figure out how to get my pretty network back. (Never mind the repeated attempts it took to get the network in the first place).  Historians don’t like failure – we’re not used to it, because failure to us is a personal inadequacy.  We can dismiss failure in the sciences because experimentation is the name of the game, and usually the successes outweigh the far more numerous failures.  The more comfortable with “failing” I get, the more productive I am as a researcher and a writer, because I take the risk and believe that the potential for success overrides my desire to avoid failure.

Those are the warm fuzzy things.  There are certainly more pros to our DH seminar, but I’ll save them for my final project.

Here are the not-so-warm-fuzzy-things:

1. Grading and evaluation were not quite clear.  I know I earn a certain percentage for my blog posts, my use of Twitter, my leadership of discussion, and my overall engagement in discussion.  I certainly could have asked for a grade estimate at any time, but I feel that the lack of regular reporting on grades contributes to some of the requirements not being taken seriously by all, which impacts the ability of the entire class to engage critically with each other.

2. While I love the freedom of doing a series of blog posts for my final project, I do think that the freedom we have been given means that the objectives are not as clear.  A standard expectation for final projects is a sort of equalizer.  But, we have raised this question more than once in our class.  How do we evaluate digital work? Does time spent count for anything, or is it the final product that matters more? We all take varying amounts of time to write a 25 page research paper, but in the end, we all produce a 25 page paper.  We can count the pages.  How do we do this with digital projects? How do you evaluate risk? Creativity? Effort? Usefulness?

3. We were a small class, and the use of social media means that we all see the work we are doing for the class.  This is more a personal gripe than a reflection on others, but I was frustrated when I saw that others had not done the required work, or engaged critically with the content of the course in the way I thought was expected.  You would think that with social media as our main form of “work” and communication, public visibility would inspire us to adhere clearly to the requirements (we are in grad school, after all), but does it have the opposite effect?  Does a digital forum somehow relieve us of strict deadlines and requirements?

4. Again, we were a small class.  We were each supposed to lead a discussion, and we were all supposed to be in charge of a reading every week, if there were enough to go around.  Some people did a lot.  Some people did not.  If we need to take responsibility for discussion and for readings, we need to get this sorted at the get-go, or at the latest the week prior.  Leaving it to Twitter didn’t work all the time.

Despite some of the frustrations with the mechanics of the class, I learned more than I anticipated and look forward to learning more in the future.  I blog and I tweet – no small achievement, I assure you.  I am more comfortable with making my work transparent, and significantly more prepared to share my work in its early stages rather than in its final form.  And, at every possible opportunity, I will dissuade colleagues from slapping up a PowerPoint and calling it digital history.

Oh no you didn’t……..I am TOO a historian!!!

I was innocently selling books to word-hungry passersby today, when a man approached the table looking for a particular book. So that none of the story is lost in translation, I am providing a rough play-by-play of the conversation below. Text in parentheses indicates action. Text in brackets indicates inner monologue.

Man: (Nosing around books and flipping pages)
Me: (Watching man)
Man: Do you have any books about historical revisionism?
Me: Do you have anything specific in mind?
Man: Yes, historical revisionism. Do you know what that is?
Me: [is this guy kidding?] Well, it’s not my “area of expertise” but historical revisionism is what we do all the time. Historians revise history when they challenge a standard or orthodox interpretation of a particular event, so when we write history we are almost always revising it in some way. But, revisionism got a bad rap when some historians used so-called statistical evidence to claim that the Holocaust never happened. You know, the Holocaust deniers.
Man: That’s what I’m talking about. They came up with the number “six million” to justify the establishment of the State of Israel.
Me: [oh no!] Does the number matter? (Yes, if it’s been purposely distorted – see this page for discussion) What if it had been two million, or one hundred? Don’t the individual lives matter more than the statistics? Whether one person or six million were killed, isn’t the fact that they tried to destroy an entire group of people enough?
Man: This is what I mean. How can you say that the number doesn’t matter? Aren’t you supposed to advocate for the history department? How can you do that if you don’t know what history is?
Me: [This is NOT going to end well]
Man: They don’t have evidence to say that six million were killed, and that’s what they base everything on. I can’t believe historians would say that accuracy doesn’t matter.
Me: I didn’t say that accuracy doesn’t matter.
Man: You did. Aren’t historians supposed to be objective? And only write about the objective truth?
Me: [Thank you, historiography] Actually, historians have generally agreed that achieving objectivity is practically impossible, but objectivity and accuracy aren’t the same thing.
Man: My own family was part of this Holocaust, that’s why I’ve been studying it for so long.
Me: [And you want to tell ME about objectivity?]
Me: If you really want to take a good look at the statistics, you ought to have a look at Martin Gilbert’s work.
Man: Does he say the number is six million?
Me: [I’m fighting a losing battle]
Man: I’m a hard scientist, and I have been for twenty years. I was hoping to have a conversation with a real historian. I’m smarter than most of the professors in your department, and you’re not a historian.
Me: Okay. Thank you, sir.

I may not yet be a professional historian, but I at least believe that I know how to reason and argue like a historian. There are many statements here that rubbed me the wrong way, but my chief complaint was that he appeared to want to engage in some sort of conversation, but was more intent on proving me to be an inept (or not at all) historian. When he threw in the dig about the quality of the professors in our department, I thought about asking him politely to put his books down (which he didn’t end up buying) and leave. It’s because of those professors that I was able to carry on that conversation and feel fairly confident in my ability to engage in a somewhat historical debate. I say “somewhat” because I’m not really sure what he was after.

We spend so much time talking about our audience that I wonder if we really think about who is in that audience. When we say “the public” do we assume that the entire public is open-minded and willing to have their preconceptions challenged? This man was the real-life version of the Internet troll. As historians, I feel that our work should engage the public and challenge them to think differently about the past, but look what happened here! This man was so obsessed with a number that I wonder if he took the time to think about the evil inflicted upon his family and the suffering they might have endured. (I’m not totally sure what his history is.) Does he think the number makes the difference between a genocide and something else, or that a lower (or higher) number makes the causes of the Holocaust any less (or more) pernicious? Does he think that such things can be measured?

And so, unbeknownst to him (and if he knew, probably to his chagrin) he demonstrated beyond a doubt what perhaps separates historians from scientists. We use statistics and data in our work, and in fact have to if we want to claim authority, legitimacy, and all those other things that make our work more accurate, but we also look at the people and processes behind those statistics. We question the statistics, and cannot ever rely on them alone because they only tell part of the story. We turn to words, to photographs, to behaviors and attitudes, and to the material products of society to make our narratives as rich as we can. Not complete, because we know that as much as objectivity is an elusive goal, so too is a finished past. Yes sir, we have revisionism because we constantly encounter new evidence or read it in new ways so that we might tell a slightly different story than another historian did before. And yes, part of the reason why we tell that different story is because we are affected by our political, social, cultural, economic, and religious contexts. The stories we tell often reflect the times in which we live.

If I encounter this man again, I will have to thank him because his ignorance gave me pause, and motivated me to reflect more critically about the role of the historian in modern society. We go on and on about our readers and our audiences, and about letting down our hair from the ivory tower of academia, but who is in that audience? What do they bring to the table? How do we negotiate the impact of long-held beliefs or previously (and maybe erroneously) inculcated versions of history? Do we exclude hard-heads and ignoramuses from our audience and leave them to the wolves? Or, do we challenge ourselves to present our work to them in such a way that they can’t help but engage with it and question their own preconceived/implanted notions?

Maybe I can get this man to help me study for comps. I don’t need him to agree, just to argue.

Thank you, sir. I am a historian.

Scholarly Scholarship and the Perils of Peer Review in the Digital Age

Have you ever thought about, written, or seen a word so many times that it starts to look funny and you question whether or not it is even a word? That’s how I feel about “scholar” after this evening’s provocative discussion on the quality control process of peer review and its application to digital projects.

Our debate grew rather heated as we engaged with the following ideas – scholarly vs. scholarship, knowledge, use value, and sustainability, among others.  Perhaps the most controversial part of the discussion was our exchange on the differences between scholarly work and academic scholarship.  Is there a difference? Do we value one contribution more than the other? Do scholarly contributions count as much as scholarship?  Our point of departure was a series of short readings about guidelines for evaluating digital scholarship, particularly as a contribution to consideration for hiring, promotion, or tenure.  We (I think) decided that none of the guidelines are entirely satisfactory.

The MLA Guidelines propose a sort of “go me!” approach and offer no real methods for critiquing the products and processes of DH.  The work takes a backseat to the scholar’s ability to self-promote, and the message is fairly unambiguous: Advocate, and ye shall receive (a job, tenure, a snazzy CV, props for the digital humanities).  James Smithies takes us further and classifies digital projects into six categories.  Slightly put off by Category 6 and worried that our own attempts at DH might be “rarely seen, and generally politely ignored,” we were on board with the idea that any digital work contributes to the field as a whole, regardless of its category. [Just a side note: Most of us in #dh2068 are fairly confident that we will at least be up for consideration for Category 5 by the end of the semester.]  The University of Nebraska provides a series of guidelines for the techies in all of us, but the suggestions might only make sense to Category 1 and 2 digital scholars, and are the DH equivalent of the Chicago Manual of Style.  Todd Presner gives us a set of evaluative criteria that we found much more helpful, most probably because they replicate our standards of review for written texts – intellectual rigor, authorship, multidimensional applicability, sustainability, and scholarly contribution.

We saw the peer review process in action when we read William G. Thomas III’s process piece on a digital article.  The article, “The Differences Slavery Made: A Close Analysis of Two American Communities”, was born digital and went through several revisions before finding a home on the University of Virginia server.  Two things struck me about the process he and Edward L. Ayers went through to construct the article.  First, the process of review was almost identical to the process currently used by print journals and publishing houses.  Second, though the pair sought to produce a unique reading experience in digital media, the peer review process resulted in the construction of a digital project that bears an uncanny resemblance to traditional print articles.  This article was composed (I feel “written” doesn’t work too well when talking about a digital project – is it writing? construction? creation? production?) in the early 2000s and was therefore fairly exceptional for the time.  Even today I find it a great resource, especially when preparing for comps (just check out the historiography section and you’ll see what I mean!)  But, I can’t help but feel that reviewers tended to blame the form for the inadequacies they perceived in the content.  That is, they assumed it was the structure of the digital media that undermined the strength and clarity of the argument.  Reviewers balked at the presentation as too “gimmicky” and invoked some of the same arguments regarding authorial control that we see in Fitzpatrick’s Planned Obsolescence.  This perception is perhaps not a failure on the part of Thomas and Ayers, but a failure on the part of reviewers to do what Presner suggests – engage with the project on its own terms, respect the medium in which it is published, and privilege form and content equally.

Thomas and Ayers claim that the project was an “attempt to translate the fundamental components of professional scholarship – evidence, engagement with prior scholarship, and a scholarly argument – into forms that take advantage of the possibilities of electronic media.”  [insert lively class discussion here]

Can we, when employing and deploying digital tools in the service of humanities endeavors, simply translate traditional scholarly expectations from one form to the other? I think that’s perhaps the whole point of trying to develop a set of standards that evaluates digital projects on their own terms – that applying the traditional standards of review doesn’t work.  When applied to Thomas and Ayers’s project, traditional review altered the final form such that it became a journal article on a computer screen, with links instead of appendices.

So we come to the contentious question of the night: What is scholarship? We got a bit prickly in class, but found that there might be a difference between “scholarly work” and “scholarship.”  We thought that an argument is an essential feature of scholarship, but what kind of argument? If I put together a digital archive or a collection of primary sources and offer them to the public in a way that these sources were not previously accessible, have I produced a work of scholarship? Or, just something scholarly? I could say that there is an argument embedded in my work – I chose certain sources, I arranged them to tell a story, and I included information garnered from sweaty archival efforts.  Am I less a scholar than someone who publishes a transcribed diary? Future scholars will use my work to produce their own – who’s to say that these efforts shouldn’t be rewarded with a job, a promotion, or tenure?

The heart of this debate, snarking and verbal jabbing aside, seemed to be the intellectual weight of process versus that of product.  What do we, as future producers of knowledge, value most? The end result or how we got there? If the monograph is the accepted standard, or the journal article a signifier of progress and prestige in the field, then the answer is clearly product.  Digital humanities scholars, though, put great stock in the process and methods – so much so that experimentation, risk, and potential failure (and the willingness to confess these!) seem to be the hallmarks of good digital scholarship.

Developing a rigorous review process that satisfies DHers and paper professors alike calls not just for separate standards for digital work, but for a universal set of guidelines that account for the myriad ways in which scholars might present their work, and the changing definition of scholarship that the digital age requires.
