An experience in reporting online plagiarism

Over the past couple of days, I've been catching up with a bit of research, concentrating predominantly upon current support for WAI-ARIA amongst popular screen readers. After I'd exhausted what I felt to be the key resources (I may blog about these later) I decided to do something I think most of us end up doing, and that is to type appropriate terms into Google in order to identify any articles or papers I hadn't originally picked up on.

Scrolling through the list of search results, I spotted a link to an article on a web site of a web design agency I'd never heard of, but which dealt with a particular aspect of WAI-ARIA I was interested in. (Before I go on, to protect the innocent - and the guilty - I have decided not to name any of the parties involved, nor the title/details of the article in question). So, I clicked on the link, and began reading. Almost immediately, something didn't feel quite right.

The web site itself was reasonably "up to date" - different backgrounds for particular sections of the page, rolling images, megamenu, fat footer - if not spectacular. However, as I scanned through the article, I was experiencing a growing sense of déjà vu. I was convinced I knew the artwork, peppered through the article, from somewhere else. But it was the text itself that irritated me more. I found myself "completing" sentences in my mind, in the same way that you find yourself singing along to a song you haven't heard for years.

I'd read this article before, hadn't I? No, obviously not. It was the web design agency's blog, and the text at the top of the page read "Written by author". It was posted only a couple of weeks ago. So, it must be new, and I must have mistaken it for a similar article. Maybe they'd simply read a similar article online, and re-written it in their own words, as I used to do as a first year undergraduate (come on, we all did that? Didn't we...?). Not exactly a crime, then. So, giving them the benefit of the doubt, I read the article in full.

And then it appeared.

At the very bottom of the page, just above the footer, but below the references (ha!), was a link - "Article source: http://www...". Huh? "Article source"? I'm reading the article, surely there isn't a "source"? So, I clicked on the link. I recognised the resultant site straight away.

I had read this article before. It was published back in late 2010, which was around the time I had first read it. It was published on a very well respected web site for web designers. I didn't recognise the author's name, but I remember the article being well structured, very well referenced, and obviously written by someone who knew what he was talking about. It contained details of a study the author and his team had carried out (which meant that it was still shown as "we" in the duplicated article, meaning that someone scanning the article may have been under the impression that it was the web design agency who had carried out the study). The article itself was obviously the product of many months' hard work. Yet, there was no mention of his name on the other site, nor of the artist who had put together the rather beautiful illustrations.

So, I figured, there must be an innocent explanation. As an occasional developer, I come across vast amounts of plagiarism, in which someone blogs a particular JavaScript or jQuery technique for (say) a nifty lightbox effect, and suddenly the source code appears everywhere as wannabe coders try to show how clever they are, maybe changing one or two lines maximum. Incidentally, if you're a developer and you want to find out if code has been stolen from elsewhere, take a look at the comments, particularly if they are to do with how the code is structured, or what particular routines mean. Is the author answering them? Are comments disabled? If the author is making an attempt at answering them, the answers should give an indication as to how much he or she knows about the code just posted. If the author is ignoring them, he or she might be "too busy to answer just now", but presumably anyone who knows their own code well enough can spare the five minutes it takes to clarify or answer any queries, particularly having been willing to advertise it to the world in the first place.

Anyway, back on topic. This was a web design agency, providing a commercial service, not an individual programmer in his bedroom writing functions. Maybe there was a link between the site and the original publisher, and some reciprocal agreement between both parties to publish each other's work. That said, I had my suspicions.

So, I decided to pose the following question on Twitter:

I'm reading a blog entry on a web design company's website which is directly lifted from elsewhere - should I inform the original author?

I received three responses, each of which suggested that, yes, I should. So, with a little concern for making a fool of myself, I got in touch with both the author and the publisher. I wasn't really quite sure whether I was doing the right thing - there had to have been a simple explanation, and I was going to make myself look an idiot. "Of course we know about it", they were going to reply, "You are an idiot for bringing it to my attention. I run both sites, and I am very busy. Go away!". So, it was with a little trepidation that I hit the "Send" button to both parties.

A couple of hours later, I received the following (edited) response from the original publisher (at the time of writing, I haven't heard from the author):

Thank you for contacting us about this. I have written to the site owner and expect them to remove the article until permission can be gained from [XXX]. Indeed [YYY] is in violation of copyright. They did not request permission from us. They do not have appropriate attribution, and they're not allowed under any circumstances to reproduce [ZZZ]'s artwork.

Woohoo! I had done my good deed for the day. I felt good. I then revisited the web site to find that this was only one of several articles they had stolen (yes, I'm going to use that word) from the original publisher, even going as far as including the original source within a submenu of the megamenu. The article was exactly the same, in its entirety - all of the images had been reproduced, as had the list of references. Bizarrely, the links to the article translations linked to translations on the original website. Maybe they didn't want to deal with the different language codes.

After I'd congratulated myself several times over (yeah, OK, I'll admit to doing so), I started to feel angry. This was not some first year undergraduate hoping to gain a few extra marks for his essay. This wasn't even a very amateur programmer passing off someone else's code as his own. This was a firm carrying out commercial web design work. To the uninitiated, they seemed to know a lot about WAI-ARIA. Hey, if I worked in a small business looking for a new web site, I think I could trust these guys - look, Mr Boss, they know about accessibility, let's give them a significant amount of our budget to build an accessible site. On the other hand, maybe the "author" was pressed for time - he knew there was the potential for a few big jobs to come in, and wanted to impress his clients, so he thought he would write something on one of these newfangled technologies, but just couldn't find the words (or didn't know them in the first place).

This is no excuse though. While I'm still relatively new to web accessibility, I've worked late nights reading up on latest developments, playing around with NVDA and the Web Accessibility Toolbar into the wee hours, reading paper upon paper, article upon article and so on, so I can understand what it's like to navigate through a site under extraordinary circumstances. I've spent years catching up on latest developments, to achieve a moderate level of understanding of the issues. It's been hard work.

Yet, I very rarely blog my thoughts. There are many other more experienced bloggers working in both the commercial and academic realms. I certainly never copy anyone else's work and pass it off as my own. When I do have the time and inclination to write something interesting, I don't copy and paste - instead, I use that tried and tested routine of linking to the original. There's nothing wrong with links. Linking to someone else doesn't show that you know less than they do and therefore cannot be trusted. Rather, by choosing your links carefully (and assuming you actually read the target resources in detail!), it shows that you are aware of what others are doing in the field. It says, "hey, I know about this stuff, but this guy over here writes about it in more detail". It's certainly much better than lifting the content in its entirety and passing it off as your own, which just makes you and your firm look bad at best, and like clueless incompetents who are not to be trusted at worst.

So, in future, I've decided that I'll always report suspicious content. I would encourage you to do so as well.

UPDATE 28 February 2011: After a quick check this morning, it appears the copied articles have since been removed from the web design company's website. Hopefully that's the end of the matter - and a reason why it's always good to speak up!

UPDATE 23 February 2011: I've since heard from the author of the original piece, who was also completely unaware that his article had been used elsewhere.

"Double Captioning" - my approach to using captions and subtitles as a means of learning a second language

Through my work with DMAG, I have become more and more attracted to the subject of captioning. However, this interest grew not as a result of my work-based interests within accessibility, but through my own personal experiences of using captions as a novel approach to learning a second (and now third) language. In this blog entry, I'll show you how I've used a few open source, or exceptionally cheap, tools to achieve this, and reflect upon how useful this approach has been for me personally. I should point out that I tend to use Mac OS X-based tools so, if you're a Windows or Linux user, you may have to hunt around for alternatives, but I'm sure they exist. Additionally, as I've been learning Danish (and, more recently, German), I'll be using that language as the basis for the discussion, although I'm pretty sure you can apply the techniques I'll demonstrate to any language you're motivated to spend a bit of time immersing yourself in.

Anyone who knows me will be aware that I've been learning Danish for 5-6 years now, albeit at a very slow pace - I guess I'm OK at reading it (with a little dictionary at hand just in case), but I'm much less confident comprehending spoken Danish, and even less confident at speaking it to others. There are several reasons behind this. Firstly, I've had to teach myself the language, as there are no evening classes in my area offering face-to-face tuition. Secondly, I find the whole textbook-based learning approach, with the audio supplements spoken by a jobbing actor in a monotonous drone, a bit dull (in fact, on one of the Danish CDs I bought, I seriously thought the actor in question was ready to top himself - the speaking clock showed more emotion than this guy). And, finally, I'll admit it - I'm a geek. If I can use technology to solve a problem, I will. Even if it means messing around with it myself and trying things out. The approach I have personally found useful is to learn the language through watching Danish DVDs on my iPod, taking advantage of the Danish captions (for the hard-of-hearing Danes) and the equivalent English subtitles (for non-Danes) and, crucially, through displaying both sets of captions and subtitles on the screen at the same time.

So, why take this approach? Firstly, it fosters a much more mobile and portable way of learning - on a long bus/train journey, I can select a film on my iPod and learn the language at the same time. I don't need to be carrying textbooks and CDs around with me in case I have a sudden urge to practise my listening comprehension. Secondly, it's a much more interesting way of learning - Danish DVDs are, of course, aimed at the Danish market, so you get to hear how real Danes speak, talking about real life subjects at a real pace, rather than Mr "I-went-to-RADA-you-know" explaining how one asks for the price of a banana. You never know, you might also enjoy the film - who could fail to be entranced by a film with the English title, "Sunshine Barry and the Disco Worms"? Of course, if you're completely new to the language, this is perhaps not the best approach - while it isn't a particularly motivating process, running through a few lessons in a textbook will help you with the basics of grammar.

I should point out that I have carried out no research to test out whether or not this is more generally a useful approach for others, although it is something I am hoping to follow up academically at some stage (albeit not using this specific approach, but thinking about captions as tools for language learning more generally). Rather, I came across this approach by accident, and by trial and error. About three years ago, I started buying DVDs from online Danish retailers - which, unfortunately, are extremely expensive (GBP20 per DVD, not including postage). Early on, I ordered possibly the most successful (although not really known internationally) Danish film of recent years, "Den Eneste Ene" - as an aside, it was remade in the UK as the literally translated "The One and Only" starring Patsy Kensit, and set in Newcastle rather than Copenhagen but, predictably, flopped. When the DVD dropped through the post, and given that most of the films I ordered had English subtitles (as my Danish was even ropier back then), I took a look at the subtitles offered - Norwegian, Swedish, Finnish, as well as Danish for the hard of hearing. Guh. That'll be twenty quid down the drain then. That was until I found out that there are various websites such as Open Subtitles whereby enthusiastic hobbyists have created subtitles in their own language and uploaded them for others to share. I quickly found English subtitles, hunted around for appropriate conversion and subtitle burning software, and stuck the film onto my iPod. I watched the film several times, finding that I could quickly match the spoken text with the English equivalent. As a result, and whenever I received a DVD through the post, I'd find out ways of working with captions and subtitles as a means of improving my abilities in comprehending the language, until I settled upon the "double subtitles" approach I'll demonstrate here.

In this approach, I display the Danish for the hard of hearing captions on the screen at the same time as the English subtitles.

 

An example of double captioning

The above picture is a still image from the film Den Store Dag ("The Big Day"). What I'd like to focus upon is the double subtitle feature, although if you wish to focus upon the delectable Louise Mieritz instead, I won't hold it against you. The text at the bottom represents the original captions - these are provided on the DVD for Danes who are hard of hearing. The text at the top is taken from the English subtitles, which also appear on the DVD but (like the Danish captions) also appear at the bottom of the screen, so I had to "grab" them from the film and move them to the top. The effect is that the viewer hears the Danish being spoken, but can also read how this Danish appears in text, as well as seeing an English translation, all at the same time. I am aware this might appear quite confusing at first, but with a bit of effort you can get used to watching films in this way.

This effect is a little tricky to achieve, so I will run through the process I take.

Stage 1

Convert the film to MP4 (or equivalent), "burning" the captions into the movie file

I use a free tool called HandBrake to convert the DVD to MP4 format. On slower machines, this can be a bit of a drag - on my old iBook G4 with 1 GB memory, a two hour film took two hours to convert. My new MacBook Pro (4 GB) is much faster, although you still need to give yourself twenty minutes or so to allow for conversion. The key, however, is to make sure you "burn" the Danish captions into the converted file. HandBrake is one of the few tools that will do this for you. Click on the "Audio and Subtitles" tab, then select the captions in the original language - do not select to burn the English subtitles, for reasons I shall explain later. Remember that this might take a while, so go and do something else while you wait for the file to convert.

Burning captions in HandBrake

Stage 2

Source English subtitles

There are several ways you can source the English subtitles. Firstly, if they do not come on the DVD, check out Open Subtitles to see if a hobbyist has created them for you. Normally, these will be provided in .srt or .sub format, in which each subtitle/caption is represented alongside its respective timestamp. The second approach is to "grab" the English subtitles from the DVD itself. For this, I use another free tool called D-Subtitler. D-Subtitler uses optical character recognition (OCR) to identify subtitles on the screen and converts them to text.

Grabbing subtitles in D-Subtitler

To do this, double-click on the DVD icon, and drag the VIDEO_TS folder onto the "Objectif Mac" panel, then select "English" from the available languages. Again, this can take about ten minutes on a decently spec'd MacBook Pro, so give yourself plenty of time. Once the OCR conversion process is complete, you will normally be asked to clarify particular characters that the application has not been able to recognise - again, this can take time, and can be particularly frustrating, as you may have to confirm characters more than once. It will then save the document as a .srt file. Here is where it gets interesting. Despite the best efforts of you and the application, some characters will not have been converted properly - for example, you will find that the number 0 has been replaced by the letter O, the letters I (capital i) and l (lower case L) are muddled, and so on. Thankfully, .srt files can be opened by any bog standard text editor, so you can go through the script and make changes where required. To go back to the previous step, this is why you burn the captions for the target language rather than the English subtitles - it is much easier for you to recognise mistakes and correct them in English than it is in the language you are intending to learn.
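If you'd rather not fix every muddled character by hand, a small script can take care of the most obvious ones first. What follows is only a rough sketch in Python, not something D-Subtitler provides: the function names (fix_ocr_text, clean_srt) are made up for illustration, and the two rules are crude guesses based on the mistakes described above (a capital O sitting next to a digit is probably a zero, and a lower case l standing on its own is probably the word "I"). It leaves the cue numbers and timestamp lines alone and only touches the subtitle text.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def fix_ocr_text(line: str) -> str:
    """Apply two crude, assumed heuristics to one line of subtitle text."""
    # A capital O directly before or after a digit is treated as a zero,
    # e.g. "2O" -> "20" and "1O:3O" -> "10:30".
    line = re.sub(r"(?<=\d)O+|O+(?=\d)", lambda m: "0" * len(m.group()), line)
    # A lower case l standing alone as a word is treated as the pronoun "I",
    # e.g. "l owe" -> "I owe" and "l'm" -> "I'm".
    line = re.sub(r"\bl\b", "I", line)
    return line


def clean_srt(srt_text: str) -> str:
    """Fix subtitle text lines, leaving cue numbers and timings untouched."""
    cleaned = []
    for line in srt_text.split("\n"):
        is_index = line.strip().isdigit()
        is_timing = TIMING.match(line) is not None
        cleaned.append(line if is_index or is_timing else fix_ocr_text(line))
    return "\n".join(cleaned)


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:03,500\n"
    "l owe you 2O kroner.\n"
    "\n"
    "2\n"
    "00:00:04,000 --> 00:00:06,000\n"
    "l'm leaving at 1O:3O.\n"
)
print(clean_srt(sample))
```

This will not catch everything (a capital I wrongly used inside a word such as "Iike" still needs a human eye, and a genuine "O" next to a number would be changed by mistake), so treat it as a first pass before reading through the file yourself.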

A third approach, extremely time consuming but also somewhat rewarding (and one that I have carried out on more than one occasion), is to grab the captions for the hard of hearing and translate each individual caption into English. This isn't particularly difficult, as the timestamps should be included within the original file, but obviously it does take up a bit of time and effort. I generally take this approach as a last resort, if I cannot find English subtitles either online or on the DVD.
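The reason this works is that the timings stay exactly as they are and only the text of each caption changes. As a rough illustration, here is a minimal Python sketch of that idea. The names (Cue, parse_srt, with_translations, to_srt) are hypothetical, the Danish lines are invented examples, and it assumes a well-formed .srt file in which each caption is a number, a timing line and some text, separated from the next caption by a blank line.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int   # caption number
    timing: str  # e.g. "00:00:01,000 --> 00:00:03,500", kept verbatim
    text: str    # caption text (may span several lines)


def parse_srt(srt_text: str) -> List[Cue]:
    """Split a well-formed .srt file into cues."""
    normalised = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # skip anything that is not a full caption
        cues.append(Cue(int(lines[0]), lines[1], "\n".join(lines[2:])))
    return cues


def with_translations(cues: List[Cue], translations: List[str]) -> List[Cue]:
    """Swap in the translated text, keeping every number and timing."""
    if len(cues) != len(translations):
        raise ValueError("need exactly one translation per caption")
    return [cue._replace(text=text) for cue, text in zip(cues, translations)]


def to_srt(cues: List[Cue]) -> str:
    """Write the cues back out in .srt layout."""
    return "\n\n".join(f"{c.index}\n{c.timing}\n{c.text}" for c in cues) + "\n"


danish = (
    "1\n"
    "00:00:01,000 --> 00:00:03,500\n"
    "Hvad koster en banan?\n"
    "\n"
    "2\n"
    "00:00:04,000 --> 00:00:06,000\n"
    "Den koster fem kroner.\n"
)
english_lines = ["How much is a banana?", "It costs five kroner."]
english = to_srt(with_translations(parse_srt(danish), english_lines))
print(english)
```

The translating itself is still down to you, of course. The script only saves you from retyping the timestamps, and it complains if you have missed a caption.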

Stage 3

Merge the English subtitles into the MP4 and export to iPod format

The final tool I use, which is unfortunately not free but extremely useful and which does the job perfectly, is Submerge. For just nine dollars, it's an absolute gem. The website itself explains all the features but, by way of a quick summary, Submerge allows you to burn the English subtitles onto the MP4 file you created earlier, and also to export it to the iPod-friendly .m4v format (as well as many other formats, if you prefer to watch the film on your laptop or other video device). The tool allows me to place the subtitles towards the top of the film, play around with different fonts and colours, and finally render the entire film to my iPod. Submerge effectively works the other way round to D-Subtitler by reconverting the (corrected) English subtitles to images, before burning them onto your MP4 file. Once again, depending upon the spec of your Mac, you should make yourself a tea or a coffee while you wait for Submerge to do its thing.

And that's basically that! You have created a doubly-subtitled movie file that you can stick on your iPod and watch over and over again in an attempt to improve your language skills. I use many more captioning and subtitling tools than I have mentioned here, but the three that I have mentioned are perhaps the most useful. As I stated earlier, I have found this approach a useful way of improving my Danish language skills, albeit after I have put the groundwork into learning about grammar, pronunciation and so on, and I am now at a stage where I'd like to be doing the same thing with German films.

And now, if you'll excuse me, I'm off to drool over Ms Mieritz...er...further improve my Danish.