A question for editors: Could MS Word use my track changes to teach editing to artificial intelligence?


Here we are, in 2019. It’s the last year in which we, the world, are ‘teenagers’ in this century. We’re growing up fast and, if you believe futurist Ray Kurzweil, the singularity will arrive just 26 years from now. At that point, machines will be able to self-improve at such a rate that it will signal the end of the human era as we know it. This idea hasn’t gone uncriticised, but instead of worrying about the end of humanity, let’s narrow the scope somewhat: how is machine learning affecting the work of language professionals such as translators and editors?

Translation software

Last year I attended the SENSE Conference, where the final keynote speaker was Sarah Bawa Mason, Chair of the Institute of Translation and Interpreting (ITI). She talked about the future for language professionals, particularly given how fast machine learning is advancing. Non-translators will be familiar with Google Translate and maybe with DeepL, which produces very good output because it uses high-quality translated texts as its source data, rather than the vast quantity of variable-quality data that Google uses.

Professional translators, however, use software such as Déjà Vu, SDL Trados and memoQ. With these packages, translators can manipulate the text in ways that seem very sophisticated compared with what editors can do with track changes, our main tool. For example, translators can view their source text alongside the target text, rather than toggling between views to show various degrees of change as we do in editing; they can build a dictionary of terms and their translations; they can use predictive typing (I see this has come into Gmail now); they can make project-specific lists of terms and proper nouns; and when translating something that looks like a familiar phrase, they can see all the ways they’ve translated it before.

Editing software

To an editor, this is astonishing. But editing software is also advancing. With PerfectIt, for example, you can already keep your style sheet right in the document instead of in a separate file, and use it to check for consistency. In MS Word, you can already modify the custom dictionaries and use a different dictionary for each document. The editing function in MS Word is getting better all the time; go to File > Options > Proofing, then choose Settings in the ‘When correcting spelling and grammar in Word’ section to see how many options there are. Grammarly checks spelling and grammar as MS Word does, but it also checks consistency, suitability for genre, paragraph length and use of the active voice. Hemingway focuses on readability rather than on catching errors. There are a bunch of similar programs, each highlighting problems so the user can decide what to do. One example of software that does actual editing is WordRake, which goes quickly through a document making tracked changes to simplify wordy text. Note that I ran it over this piece but accepted fewer than 10% of its suggested changes; it simply didn’t pick up on the nuances of the cohesive devices I used between sentences, or on other devices such as parallelism. WordRake is one of a bunch of software packages offering various degrees of textual intervention, including ‘rephrasing’ and ‘contextual spelling’. None of these tools, however, currently offers anything like the level of actual editing service that translation tools already give translators.

Where does all this data go?

All these software packages are processing vast quantities of data, and at the conference Sarah asked if we knew where these translations go. Translators who work for a translation company are creating data that contributes to the company’s databases. The software that freelancers use might also do this. The International Federation of Translators mentions the need for equitable solutions regarding freelancers’ copyright position in relation to computer-aided translation tools. Translators are warned about the client confidentiality risks of using software such as Google Translate, because Google then has a licence to use that data to improve its services.

This already affects editors who use Google Docs as their tool of choice. Microsoft has a similar licence allowing it to use your content to improve its products and services. The software that editors use isn’t anywhere near as sophisticated at showing users how to draw on their previous work to inform their current work, but is that technology coming? All your track changes could be used to teach the program which prepositions you choose for which nouns, how you reduce wordiness, how you move subjects closer to the start of sentences, and where you put commas in or take them out. Look at this patent from 2014, which describes how editing rules can be developed from an editor’s previous changes in order to suggest rewrites of a text.

The developments in non-editing language work continue apace: Microsoft is fast developing artificial intelligence that can create translations with ‘human parity’ between Chinese, German and English, as well as a text-to-speech synthesis system where the voices are almost indistinguishable from recordings of people. These advances will come to editing work too.

What do we do now?

Where does this leave language professionals, particularly editors? Sarah described new markets in pre- and post-editing of machine-translated texts. Machines are still some way from being able to produce edited text and, as with translation software, once they can, it will take some time before they’re any good at it. Even those hilarious ‘scripts’ written by artificial intelligence neural networks are still terrible, and some are actually written by people (for the social media lols). The key thing is that good writing is subtle, and machines are still a blunt instrument for manipulating human language. Editors and writers work together to make sure that writers’ intended meanings are delivered, so far as is possible, cleanly into the minds of readers. But we will all be keeping our still-human eyes trained carefully on the future.


More commas before ‘and’


It’s got to be the topic closest to editors’ hearts: commas. We love arguing about them, and I’m intrigued by how different editors apply comma rules. I’m fond of saying that there are two types of editor: comma putterinnerers and comma takerouterers, which is just another way of saying, ‘I see your comma rules and I raise you anarchy’.

Here I want to talk specifically about the comma before ‘and’. I’ve written about it before, but this post is about the specific examples I see in very polished writing.

Brief comma refresh

The function of commas is to separate elements in a sentence:

1. To separate main clauses linked by a coordinating conjunction such as ‘and’. These are sentences like:

Caffeine can keep coffee drinkers alert, and it may elevate their mood (The Little, Brown Handbook [TLBH], p. 432).

Each element on either side of the <, and> is a sentence that can stand on its own.

2. To set off introductory elements: Yesterday, she was very upset.

3. To set off non-essential elements: She was, understandably, very upset.

4. To separate three or more items of equal importance in a list: We all need air, water, food, and chocolate. (The last comma in this example is a serial comma; it’s not the topic of this post.)

5. To separate coordinate adjectives: The red, blue, yellow and green blanket was very warm. (This sentence has a list that doesn’t use the serial comma.)

In general, then, you don’t use a comma when you’re separating items in a list of two. TLBH says explicitly that you should delete a comma that separates a pair of words, phrases or subordinate clauses joined by a coordinating conjunction (p. 448).

When sentences are short, this is easy. We would all recognise that these commas are wrong:

A. I bought milk, and bread. (comma incorrectly separating two direct objects)

B. I went shopping, and bought groceries. (comma incorrectly separating two verbs dependent on the same subject: I went shopping; I bought groceries)

However, this is more difficult to see in longer sentences where the structural elements are the same, just longer.

Sentences using <, and>

From a number of large documents, I took a sample of sentences that have the structure <, and> and then divided them into different types. A lot of them fit the patterns described in 1 and 4 above, so those commas were fine.

But I could also see some that fit the A and B examples above. (Note that the sample sentences below come from real sentences I find in my work, but I’ve changed key phrases to protect privacy.)

Type A (with two items that come off the stem ‘required to research’ shown in the numbered square brackets)

(a) More work is desperately required to research [1] the best ways to manage these areas, and [2] whether we are able to improve the vegetation density in the short term.

Type B (with two verbs both dependent on the same subject)

(b) After six months I asked to work as a business analyst, and was promoted to Senior Analyst.

(c) The organisation works with the community, local government and industry groups, and fosters good management practices and ethics.

Some reasons authors might do this

I’ve had a look through this dataset, and the sentences do seem to fall into some rough categories.

1. Proliferation of ‘and’: the first list item has ‘and’ as part of it

(d) I have divided it into tools that we use specifically as part of our reading and writing processes, and tools that we use in a broader business context.

In this example, the author wants to separate the two instances of ‘and’ that would otherwise run together in the phrase ‘our reading and writing processes and tools’, even though the parallel structure of ‘tools that we use for X … tools that we use for Y’ makes clear what the two items in the list are. Sentence (c) also fits this pattern.

Sometimes the first list item can contain so many instances of ‘and’ that the comma helps bring the reader back to what the stem is (although the best solution then is usually to break the sentence into two and, in the example below, fix the redundancy).

(e) You will need to revise the climate change section of the website regularly with updates and new information about adaptation planning as well as emergent technology and resilience processes that reduce impacts of climate change and warming, and the lessons being learned from on-ground projects and field trials.

In this very long list, the author is distinguishing between information available from external research and information available from this organisation’s internal research. However, the first item isn’t labelled as external research; instead, it lists the possible things that come under that heading.

The overall issue then seems to be that the sentence is just too long, and the author feels that adding the comma will help break up what appears to be a long list.

2. Using a comma to separate nouns where one is modified but the other not

(f) By the end of the period, to have all agencies, and properties identified in the research program showing improved results in pest management.

If this sentence were just ‘… to have all agencies and properties showing improved results …’, I think the author wouldn’t have used the unnecessary comma. Note this modification is restrictive, in that not all properties are included in the list.

3. Where a prepositional phrase gives more information about the first noun in a list of two nouns

(g) He has been a plant illustrator for books based on theses, and has advised art students on aspects of biological illustration.

(h) Her role includes writing fact sheets about social research, and working with a variety of community stakeholders.

The prepositional phrases here are another kind of modification, but they’re non-restrictive; they just provide additional information.

4. Using a comma when the first list item has a series of commas, and it seems to close that list off before giving the second list item

(i) I left Australia at the end of a long, hot, enervating summer, and began my research at Max Planck Institute feeling like a desiccated leaf.

5. Using a comma to signal a pause in real time

(j) It is the first thing I do when I come into the office each morning, and the last thing I do before I leave each night.

Here the author seems to have inserted the comma to convey the real time that passes between the morning and the evening. Sentence (b) could also fit this pattern, if the author intends to convey a passage of time between asking for the role and being promoted within it.

6. Using a comma to provide emphasis by separating the two list items

(k) He said he WOULD make the call, and would do so immediately.

The capitalised ‘WOULD’ was in the original text, which led me to think that the author wanted each list item to stand a bit more independently than it would without the comma.

I’m very interested to know what other editors think about this. Can all these instances of comma use be gathered together under the umbrella of avoiding ambiguity (even though that is usually the justification for the serial comma, where the list has three or more items)? If you’re an editor and have a source that says this type of comma is fine, please do link it in the comments below!

 

Fowler HR and Aaron JE. 2007. The Little, Brown Handbook. Pearson Longman. New York.

 

All the married ladies: unravelling the puzzle of plural titles


I recently saw a photo of my cousin, her two sisters-in-law and her mother-in-law at what looked to be a lovely evening out with family. The photo was tagged ‘Four Mrs Jenkins’, and my cousin had asked where the apostrophe went. It took me a moment. I knew right away there was no apostrophe, because it was a straight plural, not a possessive. It wasn’t in the same category as ‘All the cats’ whiskers’, which is the full set of whiskers belonging to all the cats, or ‘All the archers’ bows’, the full set of bows belonging to all the archers.

But was it Jenkinses? That didn’t seem right either. For one thing, it would be ‘Four Mrs Jenkinses’ and that sounds like ‘Four Misses Jenkinses’, which was a clue to play around with the title instead of the name. ‘Misses’ is the plural of ‘miss’. It made me wonder what it would have been if we were talking about all the husbands instead of all the wives.

That was easier: they would be the four Misters Jenkins. It seemed right, but it made me think about which was the noun and which the adjective, as nouns get pluralised, but adjectives (at least, in English) don’t. You can use a person’s name as a noun, as in ‘That boy really is a Jenkins!’ which makes it seem as if ‘Jenkins’ is the noun.

But Mrs is a title, which is a noun, and other titles turn into count nouns when you pluralise them. For example, ‘Justice Davies, Justice Smith and Justice Andrews [The justices] have all declared their support.’ ‘Jenkins’ in ‘the four Misters Jenkins’ isn’t taking a plural, which means it’s functioning more as a modifier to the title. If you didn’t have a title getting in the way, you could easily pluralise ‘Jenkins’: ‘The Jenkinses will be coming over for dinner tonight.’

The reason it took a bit of working out is that we still use the French for the plural: Mesdames. So that’s the answer: ‘Four Mesdames Jenkins.’ I hope they all had a wonderful time at their party. Bonsoir!


Provide me food: dependent prepositions


Over the last year or so I’ve noticed some language change happening to the word ‘provide’ and how it’s used with prepositions – the words that tell you the relationship between the nouns (and pronouns).

A verb like ‘provide’ can be transitive, intransitive or ditransitive. That is, it can work:

  1. without an object (intransitive): We hope the state will provide.
  2. with an object (transitive): They provided food.
  3. with an object and an indirect object (ditransitive): They provided her with food (or They provided food to her).

The third example has the preposition ‘with’ or ‘to’, depending on the order you put the nouns in. The rule is that the noun that comes second takes a dependent preposition, and if that second noun is the recipient (the indirect object), the prepositional phrase is optional: They provided food.

‘Provide’ has a few dependent prepositions:

  • Provide (someone) with (something) (from #3 above)
  • Provide (something) to (someone) (also from #3 above)
  • Provide for (something or someone)
  • Provide against (something)

What I’ve been seeing recently is examples where the author clearly meant to use the ditransitive version, but didn’t add the preposition:

They provided her food.

The preposition is now missing. This turns the sentence from a ditransitive example into a transitive one, changes the relationship between the nouns and therefore changes the meaning of the sentence.

The original sentence, ‘They provided her with food’, means that someone has given a woman some food.

If we take the preposition out, ‘her’ acts as a possessive modifying ‘food’. It was her food, not his. Or: ‘The food that is hers was provided by them.’ The focus can vary, but what’s important here is that the food is hers.

Here’s another example:

The restaurant also ensures that students provide customers service that reflects the excellent reputation the owners have built up over many years.

This reads as though it was supposed to say either ‘customer service’ or ‘customers’ service’, a possessive missing its apostrophe. If we put the preposition in, it’s clearer:

The restaurant also ensures that students provide customers with service that reflects the excellent reputation the owners have built up over many years.

The reason I’m uncomfortable with the form that omits the preposition is a question of geography. It’s not acceptable in British or Australian English, but it is acceptable, apparently, in American English. Forms such as ‘provide me food’ are found in the US, which to my Australian ears just sounds like someone being lazy in saying ‘my’.

Here are some ngrams for the phrase ‘provide me (with) food’ in the English, American English and British English corpora. It’s clear that the phrase with the preposition is much more common than the phrase without it. But interestingly, the two forms are closer together in British English than in American English. Ngrams only cover published works, most of which will have been through some kind of editorial process and will therefore conform more closely to the standard.

[Ngram charts, 1950 to 2008: ‘provide me food’ vs ‘provide me with food’ in the English, British English and American English corpora]

I had thought that a straight internet search would turn up more occurrences of the version without the preposition, but “provide me food” returns 11,400 results and “provide me with food” returns 84,000 results: roughly seven times as many for the version with the preposition.

It will be interesting to see how this changes in the coming years, but for now I’m staying with the ‘with’ team.

Dashing through the alphabet


Christmas 2017

Dashing to the end of the year, and what a year it’s been! You can see in the work section of the site what’s been keeping me busy: among other things, I’ve edited reports about the Great Barrier Reef, tourism and bush foods in Australia, contract farming in Ethiopia and aquaculture in Kenya.

Apart from that, I convened a conference committee for Editors Queensland and IPEd for the 8th IPEd National Editors Conference. It was a wonderful experience, during which I worked with many talented people. If you’re an editor in Australia (or want to visit Australia!), make sure you come along to the next one: Melbourne, May 2019. If you’re a writer, know that editors love getting together and talking about the best ways of working with authors. We’re all in the business of making manuscripts sing.

Speaking of singing, to end the year I’ve dashed off this little Ode to the Alphabet (with apologies to ‘Jingle Bells’). Enjoy!

Checking carefully
In a research manuscript
For sense and clarity
Red pen loosely gripped
When suddenly I twitch
Something is not right
A quoted passage has a glitch
Whose is this oversight?

Oh, alphabet, alphabet
Order is the key
You let the reader find the source
By checking A, B, C
Alphabet, alphabet
You make it so easy
For anyone to find a name
In a bibliography.

I need to find out where
This quote has first appeared
Was this error there?
Or is it as I feared?
The author made a slip
Retyping this in haste
Instead of point and click to give
Verbatim copy and paste.

Oh, alphabet, alphabet
Thank goodness I can find
In 50 pages of references
That one particular line
That takes me to the source
Of the quote, and I’ll know quick
I’ll see if I should fix it up
Or merely add a [sic].

I’ve made it to the site
The original of the quote
The error’s there, all right
I have the antidote
I add a little [sic]
It’s just like poetry
How simply with the alphabet
Works your bibliography!

See you in 2018!

 

Oh, the dilemma!

I was recently asked by a friend about the difference between ‘choose’ and ‘decide’. She felt that ‘choose’ implied more personal agency, and that ‘decide’ was what you do when you have less personal agency and are left having to select between someone else’s choices. She asked this specifically with regard to how animals in captivity might be limited in their choice of opportunities, such as where they might like to rest. She told me that animals in the wild have a high degree of agency and choose from a wide range of opportunities, such as exactly how much dappled light to lie in. Confined animals have notably fewer options, often having to decide between sun or shade, inside or outside, the rock or the dirt. I love that these questions of word choice come up in the most interesting fields!

Anyway, my linguistics study, many moons ago, included the topic of semantics, where we had whole assignments on the difference between a cup and a mug, between happiness and joy. For those assignments we needed to look carefully at the evidence of how both these words are used, starting with definitions.

Macquarie has this for ‘decide’:

–verb (t) 1. to determine or settle (a question, controversy, struggle, etc.) by giving victory to one side. 2. to adjust or settle (anything in dispute or doubt). 3. to bring (a person) to a decision: The appearance of the woman decided me at once. –Oliné Keese, 1859. –verb (i) 4. to settle something in dispute or doubt. 5. to pronounce a judgement; come to a conclusion.

and this for ‘choose’:

–verb (t) 1. to select from a number, or in preference to another or other things or persons. 2. to prefer and decide (to do something): she chose to stand for election. 3. to want; desire.

Neither definition mentions agency, although ‘decide’ has more meanings that imply an external party.

After looking around a bit further, I think the word that conveys the lack of freedom of choice is ‘dilemma’. There are a number of different types of dilemma (including, but not limited to, the ethical dilemma), but the main definition is ‘a problem offering two possibilities, neither of which is unambiguously acceptable or preferable’.

The problem with ‘choose’ vs. ‘decide’ is that while ‘choose’ definitely conveys the possibility of two or more options, it can equally be used to describe limited options. We have a lot of expressions for this (from the Wikipedia page on dilemmas): ‘Damned if you do, damned if you don’t’; ‘Lesser of two evils’; ‘Between a rock and a hard place’, since both objects or metaphorical choices are rough; ‘Between the devil and the deep blue sea’.

But none of these tells us whether we would use the verb ‘choose’ or ‘decide’ with them. For that we can go to ngrams, which track word and phrase usage over time. Here is what I found when I ran a couple of these.

  • ‘lesser of’ shows ‘choose between’ on the chart; ‘decide between’ doesn’t even appear.
  • ‘between the devil’ is the same.

If you accept that ‘between’ means there are only two options (and therefore limited choice), you can run an ngram on ‘choose between’ and ‘decide between’, which shows that ‘choose between’ is far more common than ‘decide between’. All this suggests that ‘choose’ can be used even when agency is limited.

The other problem with treating ‘decide’ as the option for when an individual doesn’t have agency is that, while some of its meanings are about external authority, others specifically include personal agency. The word also has positive connotations in English; for instance, you might prefer to read a job application in which the applicant describes themselves as decisive rather than indecisive. In that sense, ‘decide’ carries strong connotations of personal agency.

Also, ‘choose’ and ‘decide’ are in many instances interchangeable, and only in some instances not:

  • ‘humans choose/decide what possibilities are available and the animal must decide/choose which to utilise’
  • ‘animal had agency then it may choose [maybe interchangeable here? but less so] something completely different.’
  • ‘tripe or brains for dinner? I wouldn’t choose [can’t change this one] to eat either … I must decide/choose between what’s available’.

All of this makes me want to spend days gathering a ton of data from various corpora about exactly where these phrases are used and exactly how much agency is implied in every usage and do a comparison over time … but instead I will choose to keep going on my more immediate editorial tasks. Happy new year!

Everyone’s a critic: beware Muphry’s Law

Sometimes you see people taking to the streets, vigilante style, to right wrongs and solve the world’s problems, especially those very important problems like grammatically incorrect graffiti. Sometimes stories like this even make it around the world, as when a couple of grammar pedants in Quito, Ecuador, recently made the news in The Guardian in the UK. They take their correcting canisters and add accents, cut commas and modify misspellings.


I loved this story, even though the vigilantes in question have to carry out their crusade under the protective cover of darkness and behind Twitter handles like Diéresis (Spanish for ‘diaeresis’, the two dots placed over the second of two adjacent vowels to show that it is pronounced as a separate syllable, as in ‘naïve’) rather than their own names. It’s subversive stuff, correcting grammar in Ecuador. Despite the perils, they’re doing a public service, supplying corrected copy for the benefit of all those passers-by and even the original poster, as it were.

It’s one thing to be pointing out errors in graffiti; the public shaming that goes on whenever someone makes a grammatical error on the internet seems another kettle of fish entirely. Judging by the number of memes about this, it appears to have become the fallback position of losing arguers (meme-counting is valid quantitative data collection, isn’t it?).


It can also get revoltingly rabid. Errors often arise from a literacy problem (but does anyone have the stats on how public shaming of poor literacy improves literacy? I thought not). But it’s not always about literacy: it turns out there is a reason why people who know the difference between ‘your’ and ‘you’re’ still type the wrong one. This research describes how our brains take shortcuts to get the job done, sometimes choosing high-frequency routes over the correct route. For example, you might type ‘I’m going, to’ when you mean ‘I’m going, too’, because you’re probably more used to typing ‘I’m going to [the shops/check/be there, etc.]’.

Both of these stories remind me of Muphry’s Law. You probably know Murphy’s Law (anything that can go wrong, will go wrong); Muphry’s Law says that if you criticise the writing/editing/proofreading of a work, there will be writing/editing/proofreading mistakes in your criticism. Some time ago I saw an unfortunate instance of Muphry’s Law in action: taped to the back of a toilet door on a university campus was an ad for an editor offering to help people with their assignments. Unfortunately, the ad itself had an error in it.


This isn’t an exact example of Muphry’s Law, in that the editor wasn’t directly criticising any written work, but obviously there is an implication that student work will have errors and will therefore need editing.

You have only to go to any site about grammar or language use to find that Muphry’s Law is strong. Writing something about the state of grammar teaching in particular will bring out the critics in droves, each of them lamenting the old days when we could all parse a sentence and express our ideas with eloquence and grace, yet somehow failing to do it themselves. In the comments for this article from The Age about teaching grammar in schools I found someone who is not concise in urging writers to be concise:

I’m not a grammar nazi, like so many old fogies who have few other achievements in their lives to hang on to but understanding past perfect participles and wrestling subjunctives into submission, but as someone who wasted years trying to teach writing to uni students who didn’t know a noun from a verb, a subject from a verb, a comma from a hyphen or a sentence from a jumble of clauses or phrases, I am all for enough traditional grammar to enable people to say what they mean with clarity and conciseness.

And then there’s this, from a WBC picket notice:


Muphry’s Law aside (it’s always good for a laugh), shaming people on the internet for poor grammar or spelling is the least effective use of your time there. Instead, why not get some grammar giggles?