Romping Through Wikipedia
I.
I’ve noticed a healthy amount of vandalism on the Hononegah Wikipedia page.
Last summer I halfheartedly queried “Hononegah” in the Twitter search engine, and found this gem:
Following the link, I found these edits; these beautiful edits.
And so my quest began.
II.
This seemed to be a good starting ground:
2008 and 2009 represent the vast majority of changes (vandal or otherwise), with 30 or fewer edits per year after 2009. The year 2016 already represents a marked increase in anonymous posts from previous years. There are approximately 300 edits total, and nearly 100 are classified as minor. The editor making each change decides whether to mark it as minor.
From the present, working backwards:
2016.
From a mobile phone, late into the night of March 21, 2016:
At 6:22pm on March 1, 2016
“Ross Neir, Student Council Prez. OGG”. I chuckle.
While probably not intentionally vandalism, this occurred at 3:14pm on February 11, 2016.
Sometimes school jokes appear, like this instance at 3:37pm on February 5, 2016
2015.
I am sick of screenshotting. Linking is a much better approach. At 5:55am on December 11, 2015, this happened.
From 12:23am to 12:27am on November 17, 2015:
I chuckle again. Also note that someone made a throwaway account just for this.
Note that this at 2:20am on August 13, 2015 is neither the first, nor the last recurrence of Boylan High School in vandalism to the page. Within 60 seconds a bot automatically reverted the text and temp-banned the IP address from future edits to the page.
We have returned to this post’s foremost example of vandalism from 4:13am, July 15 2015
On 4/20 last year, at nearly 4:20pm, the account SloanKetterling, who probably himself was not in a functional cognitive state, changed the school’s name to memtally challenged, then to mentally challenged, then back to Hononegah. I am genuinely surprised that they caught the spelling mistake. [NOTE: later on the timing will change from 4:20pm to 11:20am]
Many months of peace elapse on the page.
2014.
At 4:26pm on August 5, 2014, a user changes the average ACT from 22.3 to 36 to 69 and then back to 22.3.
At 10:51pm on March 30, 2014 a current student was added to the notable alumni page; thankfully, bhalcom, who probably is Brad Halcom, an IT for the school, caught this a few weeks later because Evan “was not official.” Related: Evan “is the best student they have and is amazing.”
2013.
6:31am, October 2nd 2013: Does Rudy Reynolds have clout? He did, until “Rockton Legend” became “Trash” and HQ became BO just five days later, at 10:41pm on October 7th, 2013.
Then, from 4:55am to 5:02am, April 10, 2013 happened.
Brennan Steines is a male model in New York, then Johann Hayag becomes the first female (?) soccer player to score a goal, but actually that was Joan Rivers, not Johann, actually no, wait, I think it was Jonnah Hill? As a side note, Brennan Steines actually picks up trash on the side of the highway. Then our amorphous formshifting soccer player becomes Eli Trulley. Scott Hamilton transitions from a male figure skater to a Somalian Pirate. Now a bot swoops in to revert the vandalism, and leaves a reference tag; one of the original two IP addresses above breaks the tag, until someone else (probably Nick Welcher) undoes the changes to reinclude the fixed reftag. Now they reinstate their vandalism; Scott Hamilton is a Somalian Pirate again! Nick Welcher redeletes the vandalism. You are a good man, Nick Welcher.
At 4:51pm on March 13, 2013, Bruce Calson won the 1984 International Hula Invitational and Moses split Kelsey Field in half.
At 5:33am, February 3, 2013, John Eckburg will eventually find fame in piano pedagogy.
2012.
2011.
At 12:58pm, September 19, 2011, Hononegah became Homonegah. For a brief eight minutes, the school had been known as Homonigguh, which had a beautiful fight song written by [a student?] whom I do not know. EDIT: I’M SO WRONG ABOUT THAT SONG here
3:39am on September 12, 2011, Hononegah briefly experienced an interesting change in gender identity.
At 4:27am on 17 July, 2011, well, you can caption this yourself. Also, another example of student names explicitly included in the page during an incident of vandalism.
At 2:32pm on January 28, 2011, Hononegah again finds itself second to Boylan on the page.
2010.
9:06pm on November 23 2010, MOSES RETURNS and also some kid won a Twinkie eating contest.
3:04am on November 8, 2010, Katie Henning being hot!!.
4:10pm on August 7, 2010 the twinkie eating kid also was a sports announcer, and at 5:25pm the same editor notes that Flava Flav found his nickname in the school’s cafeteria while eating a sloppy joe.
3:21am on February 25, 2010, a truly exceptional mind notes this.
2009.
10:18pm on November 5, 2009, in Soviet Hononegah, you don’t learn school, school learns you. Also at 10:14pm, twinkie kid was the announcer at football games until he lost his voice in a Sweedish yodeling contest.
Between 9:43 and 9:47pm on October 13, 2009, user Jmoney747 made some almost funny contributions (1, 2, 3). Number 1 mentions my brother.
At 8:15am on September 28, 2009, this was inserted into an otherwise fine section.
5:54am on August 19th, 2009, Homonegah makes another appearance, and then it gets worse, before, two minutes later, this same user corrects it back.
3:35am on Jun 4, 2009, DO YOU WANT TO HEAR A JOKE? Also, this actually made me laugh because of its clever wording and sentence integration.
9:06pm on April 24, 2009 Hononegah becomes known for its boogie dancing program. Also, twinkie kid and Moses both return.
Nic Haab, a graduate of the class of 2008, now teaches Social Studies at the school and was here at 3:59pm on March 17, 2009.
At 5:33pm on January 16, 2009, some affectionate names for the Dome entered the page. Later, at 3:12am on February 20, 2009, someone else entered that “No one calls it that what were they smoking when they wrote this.” Also, my brother’s name gets changed to Turd Fergueson. Additionally a fellow Forensics team member receives this sardonic shoutout.
I literally wrote this one myself, at 5:22pm on January 10, 2009. This was me in the 5th grade. Also, I made the introductory part up. George Kelsey probably didn’t donate 1.6 million dollars, nor would it cost that much to build a stadium. He was a coach in the 60s, not the 40s. The rest was true, to my knowledge, but certainly doesn’t merit a mention on this page, especially one that has remained for seven years. I just wanted to give my brother a shoutout, I swear. Earlier today I tagged the paragraph as dubious and in need of citation: my contrition to the page. (Also interestingly, my IP address has changed, despite my family remaining with the same ISP.)
At 6:35am on January 3, 2009, Jacob Clausen enters the page with his notable journey to Mars, paper football and sandcastle building, and never forget, being the coolest person to ever attend Hononegah.
2008.
5:31am on December 30, 2008. Caption this very obvious example of vandalism if you dare. Slight variations 1 and 2 before they realized that a bot was automatically deleting them.
3:00am on October 10, 2008, Jenbunny from The Hills makes a cameo.
3:30pm on September 10, 2008 a student gets listed under Notable Alumni as a “bowel movement chemist,” which at 3:48pm on September 12th, 2008, receives this addendum.
11:34pm on August 23, 2008, somebody doesn’t like sports.
6:22am on July 29, 2008 actually did not laugh at this.
3:38am on March 25, 2008 David Brown is the best basketball player in the state.
11:43pm on February 17, 2008 Daniel Rohrer, the future president, and at 5:41pm on March 2, 2008 Alexander Killion, his vice-president.
5:29pm on January 28, 2008, the dome falls victim to yet more fetishism. 10:49pm on January 23, 2008, the dome gains its “affectionate” title.
7:50pm on January 11, 2008, user Owenstiffler adds content about a student named Owen Stiffler, which shouldn’t raise any flags whatsoever. The comments about Lane Horcher at 11:27pm on January 10, 2008 here.
2007.
2006.
2005.
I have now reached the creation of the page.
III.
All of the times above are incorrect for my purposes because Wikipedia timestamps edits in UTC, not Central Time. I will have to adjust them all. To test the timing mechanism, I made this irrelevant edit at 4:51pm; Wikipedia registered the edit as 9:51pm. So, I will have to subtract five hours from all the above times. (Strictly, the gap is five hours only during daylight saving time and six hours in winter; I use five throughout.)
Moving over to a spreadsheet for calculations, I enter the data like this:
Column A represents month, B day within month, C year, D hour, E minutes within hour, and F am/pm. Some changes need to happen. Primarily, the hour/minute system will not work (unless Google Sheets can compute the values modulo 12). I switch columns E and F and go to town.
First I add 12 to the hours marked with a PM, bringing them into 24-hour (military) time. Then I subtract 5 from each to adjust for the Wikipedia vs. Central Time problem. When a value drops below 0, I wrap it back around past midnight and knock one day off of column B. I then change any 12ams to 0, and the hours are complete.
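The same hour arithmetic can be sketched in Python instead of a spreadsheet. This is only an illustration under the post's assumption of a fixed five-hour offset; `to_local` and `OFFSET` are hypothetical names, and the two sample timestamps are the 4/20 edit and the July 17, 2011 edit listed above.

```python
from datetime import datetime, timedelta

# ASSUMPTION: a fixed five-hour offset, as measured by the test edit.
OFFSET = timedelta(hours=5)

def to_local(year, month, day, hour12, minute, meridiem):
    """Convert a 12-hour Wikipedia timestamp to local time on a 24-hour clock."""
    # hour12 % 12 turns 12am into 0 and keeps 12pm at 12 once 12 is added for pm.
    hour24 = hour12 % 12 + (12 if meridiem.lower() == "pm" else 0)
    # datetime arithmetic rolls the day, month, and year back when needed.
    return datetime(year, month, day, hour24, minute) - OFFSET

print(to_local(2015, 4, 20, 4, 20, "pm"))  # the 4/20 edit
print(to_local(2011, 7, 17, 4, 27, "am"))  # crosses midnight into the previous day
```

Letting `datetime` handle the subtraction avoids the manual step of knocking a day off when the hour goes negative, including across month and year boundaries.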
Next I standardize the months. Google Sheets may have been able to do modulo 12, but it can’t do modulo 31, then 28 (but sometimes 29), then 31, then 30, and on and on. I google each date to find its ordinal place within the Gregorian calendar, making sure to account for leap years in 2016 and 2008 (note that 2012 had no data).
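For what it is worth, the ordinal day lookup does not need a search engine. A minimal sketch using Python's standard `datetime` module (the helper name `day_of_year` is my own, not something from the spreadsheet):

```python
from datetime import date

def day_of_year(year, month, day):
    """Ordinal position of a date within its year (January 1 is day 1)."""
    return date(year, month, day).timetuple().tm_yday

print(day_of_year(2015, 3, 21))  # ordinary year
print(day_of_year(2016, 3, 21))  # leap year, so one day later
```

Leap years are handled by the calendar logic inside `date`, so 2008 and 2016 need no special casing.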
IV.
I hypothesize that month of the year has no impact on frequency of vandalism.
I perform a chi-squared Goodness of Fit test on the data above with an expected value of 47/12 = 3.917 and degrees of freedom df=11. Note that a chi-squared GOF test is only considered valid when each expected value is at least 5; however, I am testing a population, not a sample, and feel confident that 47/12 = 3.917 is the true expected value. Chi-squared is 15.5532 and p=.15854. There is insufficient evidence to reject my hypothesis given this data set. The twelve contributions to the statistic are {4.257, 3.9167, 0.001773, 1.1, 2.1719, 0.001773, 2.427, 0.2145, 0.001773, 0.2145, 0.2996, 0.9379}.
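The statistic can be recomputed from the twelve values listed above. In this Python sketch the observed counts are an assumption: they are back-solved from those values by treating each one as (O - E)^2 / E with E = 47/12, which pins down one whole-number count per cell. They are not copied from the original table, and the cell order is simply the order of the list.

```python
# ASSUMPTION: observed counts back-solved from the contributions listed in
# the post, in list order; they are not copied from the original table.
observed = [8, 0, 4, 6, 1, 4, 7, 3, 4, 3, 5, 2]

total = sum(observed)             # number of incidents
expected = total / len(observed)  # uniform expectation per cell, 47/12

# Each cell contributes (O - E)^2 / E to the chi-squared statistic.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_squared = sum(contributions)
df = len(observed) - 1

print([round(c, 4) for c in contributions])
print(f"chi-squared = {chi_squared:.4f}, df = {df}")
```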
Bummer. Maybe by grouping the months into seasons I will see an association. After all, some instances of vandalism could have just fallen into the next month. So, grouping should reduce this effect by a factor of 3 (12 months into 4 seasons).
I use the standard distinctions of winter, spring, summer and fall. I hypothesize that season has no impact on frequency of page vandalism. I use another chi-squared GOF test with expected value 47/4 = 11.75 and degrees of freedom df=3. Chi-squared is 2.4468 and p=.485. There is insufficient evidence to reject my hypothesis given this data set.
I have an implicit stereotype of someone who would vandalize their high school’s Wikipedia page: attends that school, is a male, stays up late at night at a sleepover with friends who together think it’s funny to change some things around. Given Wikipedia’s largely anonymous editing, the only thing I can test is the late at night aspect.
For ease of calculation and in hopes of not shearing off my data, I group instances of vandalism into 2 hour blocks, beginning with hours 0 and 1 and ending with hours 22 and 23.
One last chi-squared Goodness of Fit test. I hypothesize that the hour of day has no impact on frequency of page vandalism. Expected value of 47/12 = 3.917 and degrees of freedom df = 11. The test results in Chi-square = 42.617 and p=1.266e-5. There is strong evidence to reject my hypothesis that hour has no impact on vandalism.
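The p-values for all three tests can be checked without a statistics package. A minimal sketch, assuming only Python's standard library: `chi2_sf` is a hypothetical helper name, it evaluates the chi-squared upper tail through the standard recurrence for the regularized incomplete gamma function, and the statistics fed to it are the ones reported above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    z = x / 2.0
    if df % 2:  # odd df: start from the df = 1 tail
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:       # even df: start from the df = 2 tail
        a, q = 1.0, math.exp(-z)
    # Step up two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# (label, reported chi-squared statistic, degrees of freedom)
tests = [("month", 15.5532, 11), ("season", 2.4468, 3), ("hour", 42.617, 11)]
for label, stat, df in tests:
    print(f"{label}: p = {chi2_sf(stat, df):.4g}")
```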
However, I should expect nobody to post on the page, vandalism or otherwise, between 5 and 8 am. Likewise the workday should include fewer posts. Early evening should have the most posts, since I think that most people have their leisure time then. Without concrete data (and believe me, I searched) I have nothing to compare it against. Could there be a stronger association between hour and average sleep/wake patterns than between hour and vandalism?
So I have to resort to non-test measures. 16 of the 47 vandalisms occurred between 10:00pm and 1:59am, which is 34% of posts compared to 17% of the day. Very similarly, a separate 16 of the 47 vandalisms occurred between 10:00am and 1:59pm, which is 34% of posts compared to 17% of the day. So, I can conclude that page vandalism happens about as often late at night as during the lunch hours (which shouldn’t surprise me; when else during the school day do students have extended and unrestricted access to the internet?).
I would logically expect that the data correlate strongly with the number of total posts; since posts of vandalism are included in the number of total posts, I’d be shocked to find otherwise.
A linear regression of the form y = a + bx with y representing vandalisms and x representing total posts gives a=.024 and b=.138, with r-squared = .8301 and r=.9111. So, 83% of the variation in vandalism is explained by variation in the total number of posts on the page.
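For anyone who wants to redo the fit by hand, here is a rough least-squares sketch in plain Python. The helper name `fit_line` is my own, and the numbers passed to it are made-up placeholders, not the per-year counts from this post, so the printed values will not match the a, b, and r-squared above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

# PLACEHOLDER data for illustration only (not the post's figures):
total_posts = [10, 20, 30, 40]
vandalisms = [1, 3, 4, 6]

a, b, r_squared = fit_line(total_posts, vandalisms)
print(f"a = {a:.3f}, b = {b:.3f}, r-squared = {r_squared:.4f}")
```

With a single predictor, r is the square root of r-squared carrying the sign of the slope, which is how .8301 and .9111 relate.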
Now I can test the change in the percent of posts that are vandalism as time progresses.
I can see visually that there is no real change. The %vandalisms column has enough zeros to render the overall data incalculable, but even after removing them, the change is probably just random noise, especially given the substantial fluctuation in sample size (total posts) each year.
Finally, I construct a histogram with bucket size b=40 to visualize the data set. It seems that April 30th to June 9th have very few instances. Ignore the last value since only days 360-365 could possibly fit into the bucket, despite its width of 40.
The asterisk indicates that 2008 and 2016 were leap years, with days in the year being 366. Not many conclusions can be drawn from this histogram, other than that the late spring and the summer months may have nominally fewer instances of vandalism than the school year, which I would expect given that students’ minds are off school more when they are out of school.
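The bucketing behind the histogram can be sketched in a few lines of Python. This assumes buckets start at day 0 (0-39, 40-79, and so on), which is consistent with the April 30th to June 9th bucket and the short 360-365 bucket mentioned above; `bucket_start` is a hypothetical helper, and the three sample dates are a handful taken from the list in section II rather than the full data set.

```python
from collections import Counter
from datetime import date

BUCKET = 40  # bucket width in days, as in the histogram above

def bucket_start(d):
    """First day-of-year value of the 40-day bucket containing date d."""
    return (d.timetuple().tm_yday // BUCKET) * BUCKET

# A few dates from section II, for illustration only (not the full data set).
sample = [date(2009, 1, 10), date(2015, 12, 11), date(2008, 12, 30)]

hist = Counter(bucket_start(d) for d in sample)
for start in sorted(hist):
    print(f"{start:3d}-{start + BUCKET - 1:3d}: {'#' * hist[start]}")
```

December 30, 2008 lands in the final bucket, which is the one that can only ever hold the last few days of the year.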
V.
It seems that the boredom of high school sleepovers does not cause most of the vandalism on the Hononegah Wikipedia page, as far as I can tell. Nor does date. Nor does month. Nor do most of the other quantifiable measures I could establish given the wikidata available. It seems that the rate of page vandalism is either
random
caused by factors not recorded in Wikipedia’s edit meta-data.
And it’s probably the latter, which in retrospect makes this whole quest futile.
VI.
I used the Revision History tab on the Hononegah Community High School page on Wikipedia (revision tab here, page here) to find examples of vandalism. I clicked through every revision from 10:00am today to the page’s creation in 2005. My test for vandalism was simple, and perhaps not rigorous enough: in the style of Jacobellis v. Ohio, I thought that I “would know it when I see it,” and although that isn’t a precise definition, for the purposes of this project it worked well enough.
General edits like spelling, category shifting, introduction of sources, fleshing out of details, and on, were not counted as vandalism. Instances where editors altered common words to curse words, included off-topic references, references to Moses, and posts that generally were too silly to fit the page’s overall professional tone were considered vandalism. There wasn’t much in between.
When counting entries, I counted only one incident of vandalism per day. This keeps high-frequency double- or triple-posters from skewing the time stamp data and creating a correlation where none actually exists.
It is slightly illegitimate to use chi-squared Goodness of Fit tests on population data. However, given that this was an entirely self-motivated project to kill time on my day off, I don’t care. Also, it should work more effectively on population data than sample data, just without the margin of error calculation mattering.
VII.
Bonus Content: on an almost related note, the page saw an immense spike in traffic in December of 2015, the same day as the Dome’s deflation:
although the traffic from my research today could likely match this spike.