Converting Word to LaTeX with Writer2LaTeX

Gerhard Schaden

2022-11-23 09:11

LaTeX is great, but not everybody knows how to use it. So, what can you do if you get a .doc or .docx file and need to transform it to LaTeX?

As far as I know, there is no direct way of conversion (though maybe pandoc could handle this). So, the best bet is to use LibreOffice Writer to open the file, and then use a converter.

There is a dedicated LibreOffice Writer plugin to do Word->LaTeX conversion, which is called Writer2LaTeX (and which you can download at https://writer2latex.sourceforge.net/). It does a decent enough job for simple documents, but I discovered (too late) that there is one thing that can make your life much easier.

When using it, you really want to deactivate the Activate Multilingual Support option (or whatever it is called in English; in French it is Activer le support multilingue, and it is the first thing you can deactivate on the left, see screenshot below). Otherwise, it will pepper your document with useless \selectlanguage{english} commands in front of every paragraph. Unfortunately, it is activated by default, and you absolutely want to switch it off (especially if it was not you who prepared the document).

[Screenshot: the Writer2LaTeX export dialog with the multilingual support option]

If I had known that before, it would have saved me a few hours of my life.
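If you only find out about this option after the export, the commands can also be stripped afterwards. The following is a minimal sketch in R (the language used for the code elsewhere on this site), not a feature of Writer2LaTeX; the function name strip_selectlanguage and the sample lines are made up for illustration. Note that it removes every \selectlanguage command, so it is only appropriate for a document that really is monolingual.

```r
# Hypothetical helper: remove all \selectlanguage{...} commands (and any
# whitespace directly following them) from a vector of LaTeX source lines.
strip_selectlanguage <- function(lines) {
  gsub("\\\\selectlanguage\\{[^}]*\\}\\s*", "", lines, perl = TRUE)
}

# Made-up sample input, imitating exported paragraphs
tex <- c("\\selectlanguage{english} First paragraph.",
         "\\selectlanguage{english}Second paragraph.",
         "A line without the command.")

cleaned <- strip_selectlanguage(tex)
print(cleaned)
```

For a real file, one would read the lines with readLines(), apply the function, and write the result back with writeLines().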

For anything fancy, you will have to work by hand anyway (glossed linguistic examples, trees, etc.), but the nice thing is that Writer2LaTeX will transform elements it has difficulties with into images (and include them automatically).

LibreOffice Writer for Linguists and Linguistics

Gerhard Schaden

2021-02-16 17:22

There are many high quality sites telling you how to use Microsoft Word for your specific linguistic typesetting needs (for instance, Maite Taboada's webpage, or this site by Ng E-Ching). Unfortunately, I have not seen any page specifically dedicated to LibreOffice Writer¹ – which is a pity, since LibreOffice is free software (in both senses), and can be used by anyone free of charge.

Many of the tips for Word are valuable also for LibreOffice (and I would strongly encourage you to have a look), but making numbered examples (with glosses) works quite differently in LibreOffice, and I could not find an easy way of doing it in the (extremely scattered) documentation for LibreOffice. So, this is my attempt at providing some documentation for LibreOffice Writer.

CAVEAT EMPTOR: I have tested this under LibreOffice Writer 6.4.5.2 on OpenSuse Leap 15.2, and also with LibreOffice Writer 7.1 on Windows 10. I assume that these procedures are not very different in other operating systems and versions of LibreOffice, but this is only an assumption.

Creating Numbered Examples

Assume you have just written a paper with 100 examples, and you have written the numbers of the examples as plain text. Unfortunately, you notice that

you really should add another example between examples (9) and (10) in the introduction. This would make things much clearer.
Your section 2.2 (containing examples 44–52) should really become section 4.1.

You have lots of work to do! If only there were a way of adjusting the numbers of the examples automatically (and also the cross-references to them). Do not worry – this is possible, and the following will show you how to do it.

In order to create automatically numbered examples, we need a way of introducing a custom counter for this purpose. Here is how to do this:

Click on INSERT → FIELDS → MORE FIELDS (This may be bound to <Control F2> on your computer, but on my computer, the window manager overrides the keyboard shortcut).
Click on the VARIABLES tab.
Under TYPE, choose "Number Range", and make sure that on the right column, you choose "Arabic (1 2 3)".
Under SELECT, choose "Text" (not sure that this is necessary).
We need to give this new field a name. In the NAME field, write "NumEx" (for Numbered Example) or something like this.
Finally, click on Insert

With this procedure, you should have introduced the numbered field for your numbered example in the file. Henceforth, you should not have to reintroduce this counter manually each time you make a numbered reference; NumEx is a choice that should appear from now on in the middle SELECT column of the VARIABLES tab (as can be seen in the screenshot above).

Referring Back to a Numbered Example

Having a numbered example is excellent, but you also want to be able to refer back to it, and in a way where it does not matter if you swap around text. This is how you can achieve this:

Click on INSERT → FIELDS → MORE FIELDS (This may be bound to <Control F2> on your computer)
Click on the CROSS-REFERENCES tab. You can also go directly to INSERT → CROSS-REFERENCE.
In Type, select NumEx (or whatever you called your field).
In Selection, you can see the instances of your NumEx fields. Choose the target of your cross-reference.
In Insert reference to, you can choose various options. You probably want to choose Category and Number (which does not include the parentheses) or Reference (which does include the parentheses).
Finally, click on Insert

Now, the cross-reference will always refer to your chosen target, no matter where in the text you choose to place it.

Making Glossed Examples

When working with examples, there will often be cases where your reader does not know the language you are talking about (in fact, this might be the case for any language other than English). In this case, you need to make a glossed example.

I assume that you know how to gloss, but have a look at the Leipzig Glossing Rules anyway.

Making Glosses with Tables

If you are sure that you have only very short examples, where you will never need to deal with line breaks, the easiest solution is to use a table. This solution is easy, but if one of your examples becomes too long, you will need to copy and paste parts of the overly wide table by hand. The advantage is that you can control the layout of your gloss without problems (and apply small caps, for instance).

Insert your number field as explained above (do not write those by hand!)
Write your example – separating words by tabulations in the first line; in the second line, write your gloss – separating words by tabulations, and starting with a tabulation. In the third line, write the translation after a tabulation – but do not separate words by tabulations in the sentence.
Select the three lines (including the number field in the beginning). This is what it should look like:
Click on TABLE → CONVERT → TEXT TO TABLE
On the popup-menu, deselect Equal width for all columns

At least at the time of writing of this post, one cannot deselect Equal width for all columns if you do not use tabulations but rather spaces as cell separations.
Click OK. The first two lines should now be properly aligned (more or less), but the translation is stuffed into a single cell, as is shown below.
In order to give more room for the translation, select the translation and the empty cells to its right. Then right-click on the selected area, and choose Merge Cells.
You may need to manually adjust the width of the columns. Either do that with the mouse, moving around the boundaries of the cells, or right-click somewhere in the table, choose Table Properties and then modify the column in Adjust Column Width. The end result will look as follows:

Making Glosses with Rubies

This is the solution that I would recommend,² even though you will not be able to apply small caps. If combined with a proper style, you will not have to worry about long examples, and everything will just work out fine – as long as you can live with just lower and upper case elements in the gloss.

First, you will need to enable Asian Language Support (this is only required once).

Click on TOOLS → OPTIONS
Click on LANGUAGE SETTINGS → LANGUAGES
In Default Languages for Documents, enable Asian. I left the default (simplified Chinese) in the drop down menu; this does not seem to matter.

Once this is done, you will not have to modify this again.

Write out the example in the original language.
Select the example with your mouse – without the number.
Click FORMAT → ASIAN PHONETIC GUIDE.
In the pop-up window, add in the Ruby Text field the gloss for each word. In the Position field, select Bottom. Click Apply, then Close.
Now, your example should look as follows:
By default, your example will probably not look like this, and you will have a much smaller font in your glosses. In order to change this, type <F11>, which will show the Styles menu. Click on the A on top, then right-click on Rubies, and choose Modify.
In the pop-up window, choose the Font tab, and set the Asian Text Font to a style and size of your liking.
Finally, add a translation, and you are done. Do not worry about the grey underlining text. This will not show up in a pdf or a printed version of the paper.

Bibliography Management

If you are at university, you need Zotero. Trust me. There is no reason to make a bibliography in any other way. This page will help you install Zotero and explains how to integrate it into LibreOffice.

Footnotes:

¹ It is true that there is Linguistic Tools, but this LibreOffice add-on seems to require specific software and XML file formats from SIL, so it is not easy to use for people who do not know these, and is probably too much work for most of us.

² This is the solution I found on the LibreOffice help site.

Arbitrarily Typed Traces in the Lambda-Calculator – And How To Bind Them

Gerhard Schaden

2018-06-22 17:16

I find myself (relatively) often in a situation where I need to have arbitrarily typed traces in the Lambda-Calculator – which supports them without any problems. However, it is not easy to find the information on how to do this on the Homepage.

By default, a trace in the Lambda-Calculator is of type e, and it can be written as t_i, where i is some integer. In order to bind that trace, one simply uses i again. However, what can you do if, for some reason, you need a trace of type i or v? There is a solution.

One can simply indicate the type in the trace signature: instead of writing simply t_1, one can write t_<i>1 if one needs a trace of type i. The general recipe is therefore to write t_<TYPE>i, where TYPE is some type in the standard notation, and i again some integer.

This is how it can be done (just copy the content below into a file, and open it with the Lambda-Calculator):

ARBITRARY TYPES IN TRACES AND BINDERS

# Author: Gerhard Schaden (gerhard.schaden@univ-lille.fr),
# based on a file by Lucas Champollion (champollion@nyu.edu)
# This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
# You are free to share — copy and redistribute the material in any medium or format;
# adapt — remix, transform, and build upon the material for any purpose, even commercially. 
# The copyright to the work of the authors cited here remains with them.


# The following directive instructs the program to
# interpret multiple-letter constants and variables
# properly.

multiple letter identifiers

# "use rule" directives are for tree evaluation exercises.
# They indicate which composition rules are available
# at nonterminals.

use rule function application
use rule non-branching nodes
use rule lambda abstraction

####################################################################################################
# Defining types 

#individual entities:
variables of type e: x 
# intervals, times:
variables of type i: i 
variables of type <e,t>: P 
constants of type <e,t>: man 
constants of type <<e,t>,<<e,t>,t>>: every
constants of type <i,<i,t>>: before


####################################################################################################
# Defining a rudimentary lexicon:

define before: Li.Li'.[before(i')(i)]
define every: LP'.LP.Ax.[P'(x)->P(x)]
define man: Lx.man(x)

###################################################################################################

exercise tree

title How to introduce and bind traces of type e

instructions A trace without any indication is taken to be a trace of type e.\\
instructions A trace is written as follows:\\
instructions t_i\\
instructions where i is some integer\\
instructions In order to bind the trace, use the integer.\\
instructions \\
instructions This is only to be taken as an illustration how to do binding and traces.\\
instructions The following is a very silly example, like all the rest;\\
instructions the only aim here is to bind as fast as possible.

[ 1 [ man t_1] ]


####################################################################################################
title How to introduce and bind traces of some other (simple) type

instructions In order to specify the type of a trace, write
instructions \\
instructions t_<TYPE>i\\
instructions where TYPE is the type you want to give the trace, and i is some integer\\
instructions Assuming you want a variable of type i, you need to write\\
instructions t_<i>2\\
instructions Binders will take care of themselves

[ 1 [ 2 [ t_<i>2 [ before t_<i>1 ] ] ] ] 


####################################################################################################
title How to introduce and bind traces of arbitrary complex types

instructions BECAUSE WE CAN!\\
instructions \\
instructions As before, in order to specify the type of a trace, write
instructions t_<TYPE>i\\
instructions where TYPE is the type you want to give the trace, and i is some integer\\
instructions We are being unreasonable, and for some reason, we want to make a trace of type <e,t>\\
instructions Therefore, we can write this as\\
instructions t_<e,t>7\\
instructions Binders will take care of themselves

[ 7 [every t_<e,t>7 ] ] 


####################################################################################################
# EOF

New Website

Gerhard Schaden

2017-07-12 14:57

More and more, when I looked at the design of my homepage, it felt ~~as if it had come out of the Soviet Republic~~ a little bit vintage. So, finally, I decided to make a clean slate, and start anew.

In case nostalgia should haunt me, here is what the old site looked like.

No regrets.

Doing a Linguistic Simulation (Using R)

Gerhard Schaden

2013-07-12 12:23

In this squib, I will explain how and why I did what I did in my paper in the International Review of Pragmatics, focusing not so much on the results (which you can read in the paper), but rather on the decisions I had to make. In brief, it is a sort of blog entry. I basically aim to provide the kind of help I would have liked to have when I started working on the diachronic simulation of present perfects.

The article mainly presents the results, but for lack of space, it does not really make it possible to redo the experiments, and does not specify the underlying technicalities. This is what I am trying to lay out here. (I am presupposing that you know the article at least a bit.) I cannot give a full introduction to GNU R here. I try to explain the code enough so that you understand what is going on even if you do not know the language. But if you want to do simulations yourself, get yourself a good book on R from your local library or check out the web for video tutorials. There are many of them around.

If you have GNU R installed, you can just copy and paste the code snippets into the R command line; otherwise, check out the instructions here.

Disclaimer: I am no programmer, and have only very limited knowledge of statistics. If you find errors, please feel free to enlighten me by mail.

How I came to do a simulation

I had been thinking for quite some time about a method to simulate the aoristic drift of the present perfect. There is of course a lot of literature on game theory for linguistics, and there even is an implementation by Gerhard Jäger of simulation software specifically targeted at linguists: EvolOT.

If you can frame what you want to model in Bidirectional OT, I urge you to check out this link. There is no good reason not to use it, even though the default gnuplot-output can sometimes look ~~pretty shitty~~ not as good as one might hope (but nothing that one could not configure, I guess).

One of the difficulties was that my problem did not seem to make sense from a BiOT approach. Therefore, I had to go for another approach (or so at least I thought), and do the whole thing on my own.

The basic intuition that I tried to explore was that the frequency of use had an effect on the meaning of the present perfect (resp. the simple past); yet I did not know exactly how that could be expressed. At the beginning, I went for a rather straightforward markedness hypothesis (one form is marked, the other one unmarked), but that did not get me very far. It took me some time to notice that - if one is serious about frequency - I would somehow have to incorporate a frequency distribution into my model. For the formal semanticist I am, this looked like a very frightening step. But eventually, I took the blue pill.

So, I assumed that the distribution had something to do with current relevance, and I was also familiar with Merin's work, which allows one to quantify relevance. And then, at some point, I actually drew a probability distribution. I did it on paper, so I cannot show you the result. Initially, I thought that I would play it safe and go for a normal distribution. However, after some research, it turned out that - since I thought of Current Relevance as a percentage (i.e., a value between 0 and 1) - I would need a beta-distribution instead. A beta distribution has predefined upper and lower limits.

I decided to use R for doing the plots I needed, principally because I already knew Baayen's book Analyzing Linguistic Data a little (I know that there are other books covering the use of R for linguists, but I happen to know only this one), and because R

contains most of the functions I needed (and so, it minimises effort);
has excellent graphics capabilities;
has a huge community, with excellent online resources; and finally
did not look too complicated for a non-programmer like me.

R has also at least one big disadvantage: it is not particularly fast. But since my simulation is spectacularly simple, this is nothing I needed to worry about. If you want to make complicated multi-agent simulations, you might want to go for another, faster programming language.

For instance, plotting a beta-distribution is extremely simple:

curve(dbeta(x,5,5),   # the density function of the beta distribution
        xlim=c(0,1))  # sets the lower (resp. upper) limit to 0 (resp. 1)

R ignores everything on a line after a #; that is why we can put comments there. It is always a good idea to comment what you have done. And this is what the above gives as an output:

[Plot: density of the beta distribution with α = β = 5 on the interval from 0 to 1]

Technically, a beta-distribution has two parameters, named α and β, respectively.

dbeta(x, α, β)

You can toy around a bit with these two values to get a feeling for the difference it makes when you vary them; in brief, if α=β (and both are greater than 1), then the distribution will have its highest point (its mode) at 0.5; otherwise, it will lean towards 0 (if α < β) or 1 (if α > β). You can also have a look at the Wikipedia page on the beta distribution, which features a plot of the density for several parameter values.
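To make the effect of the two parameters concrete, here is a small sketch that computes the mode directly. It relies on the standard closed-form expression (α − 1)/(α + β − 2), which only holds when both parameters are greater than 1; the helper names beta_mode and numeric_mode are made up for this illustration.

```r
# Closed-form mode of a beta distribution; only valid for alpha > 1 and beta > 1
beta_mode <- function(alpha, beta) {
  stopifnot(alpha > 1, beta > 1)
  (alpha - 1) / (alpha + beta - 2)
}

# Numerical check: locate the maximum of the density on [0, 1]
numeric_mode <- function(alpha, beta) {
  optimize(function(x) dbeta(x, alpha, beta),
           interval = c(0, 1), maximum = TRUE)$maximum
}

beta_mode(5, 5)     # equal parameters: mode in the middle
beta_mode(2, 5)     # alpha < beta: mode below 0.5
beta_mode(5, 2)     # alpha > beta: mode above 0.5
numeric_mode(2, 5)  # should agree with beta_mode(2, 5) up to numerical tolerance
```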

Making the simulation work

Now I had determined a suitable probability distribution for Current Relevance. So far so good. But the issue of diachronic dynamics is that there must be some (possibly very small) difference between generations, which will have wide-ranging consequences as time goes by. How can we get dynamics into the picture? My idea was that speakers and hearers do not have identical probability distributions, but that speakers overestimate theirs. What this looks like is stated in the paper, so we will concentrate here on how we can make the simulation.

R comes equipped for every distribution with

the probability density function, that is, how much probability mass is to be found at any specific value (in our case, dbeta, which we have already seen above)
the cumulative distribution function (CDF), that is, how much probability mass is to be found below a specific value (in our case, pbeta)
the quantile function, which is the inverse of the CDF (in our case, qbeta)
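The relation between the three functions can be checked directly. The following sketch uses the same parameters as the plot above (α = β = 5); the value 0.3 is an arbitrary choice for illustration.

```r
x <- 0.3

# pbeta is the integral of dbeta from 0 up to x
area <- integrate(function(t) dbeta(t, 5, 5), lower = 0, upper = x)$value
p <- pbeta(x, 5, 5)

# qbeta undoes pbeta: it maps the probability mass back to the value
x_back <- qbeta(p, 5, 5)

print(c(area = area, p = p, x_back = x_back))
```

The first two numbers should coincide up to integration error, and the third should give back the value we started with.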

I needed functions for production and grammar inference, that is, functions that

given a value of current relevance, indicate how much probability mass will be below and above it (this is the production part: which is the percentage of simple pasts vs. present perfects).
given a percentage of values for simple pasts and present perfects, tell us where the threshold value n of current relevance is situated (this is the grammar inference part).

Now, luckily, it happens that in very simple cases (the unimodal ones) the production side corresponds to the cumulative distribution function pbeta, and the inference side to the quantile function qbeta. So we could just take those. But in order to practice, we will see how one can create a custom function in R.

production <- function(x) {
  pbeta(x,5.5,4.5) # values of α and β slightly shifted wrt hearer distribution
}

inference <- function(x) {
  qbeta(x,5,5)	   # same parameters for α and β as we have seen above
}

This indicates that both production and inference are functions that take one argument. It is not strictly necessary to create these functions in this case, but we will need to do so later anyway.

Let us now look at how we can use this in order to do our simulation. The basic principle of Iterated Learning Models is that one learns based on the input provided by the production, and one does this a certain number of times. Suppose that our initial value is 0.999; the speakers will produce given this value, so we will call production(0.999); then the hearers will make an inference based on the output of this function.

# creating intermediate assignments:
x <- production(0.999)
inference(x)

# doing the same in functional style:
inference(production(0.999))

Given our values, calling either version will have the return value of 0.9979835. This is lower than the initial value of 0.999, as we expected. Now, we need to check what happens on a longer run. Therefore, we will iterate this whole procedure 40 times (and check afterwards if this is enough).

# initialize values for loop
n <- .999	# the initial value for n
N <- c(n)	# this is where we will store the values we will plot; we add n
k <- 40		# the number of iterations 

while(k>0){
   n <- inference(production(n)); # make a learning cycle, assign the result to n
   N <- c(N, n);                  # add the new n to the end of the list N
   k <- k-1;                      # decrease the counter by one
}

plot(N)

We start with an initial value of n at 0.999, and then make a first inference-production round. The result will be added to the list we plot, which already contains the initial value 0.999. We then decrease the counter, and then the whole process starts over.

And we get the following result:

[Plot: the values of N over 40 iterations, dropping from 0.999 towards 0]

So, the value for n drops until it hits 0. We could now add all sorts of bells and whistles, like better labels for the axes, and a legend. I will not bother with this here. So let us move on to more complicated stuff, namely the move to multimodal distributions.
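As an aside, the same loop can also be packed into a function with a for-loop and a preallocated result vector. This is only a sketch of an alternative style, not the code used for the paper; the name iterate_learning is made up, and production and inference are repeated here so that the snippet runs on its own.

```r
# Same definitions as in the text above
production <- function(x) pbeta(x, 5.5, 4.5)
inference  <- function(x) qbeta(x, 5, 5)

# Hypothetical helper: run k learning cycles starting from n0
iterate_learning <- function(n0, k) {
  N <- numeric(k + 1)   # preallocate: initial value plus k iterations
  N[1] <- n0
  for (i in seq_len(k)) {
    N[i + 1] <- inference(production(N[i]))
  }
  N
}

N <- iterate_learning(0.999, 40)
head(N, 3)
```

The resulting vector N holds the same values as the one built by the while-loop, so plot(N) works on it in the same way.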

How to do multimodal distributions

The mode is the value that appears most frequently in a data set. In our example above, the mode is at 0.5. Of course, it may happen that two or more values have the same, highest frequency. This is what one calls a multimodal distribution (as opposed to the unimodal distribution displayed above).

Preliminaries

The key to making a multimodal distribution in my simulation is the idea that a multimodal distribution is made up of at least two unimodal distributions, each of which represents a genre. Since they all are distinguished wrt current relevance, each genre can be represented as a beta-distribution, ranging from 0 to 1.

There is only one tiny trick to it: the multimodal distribution needs to be a probability distribution, which means that its total probability mass has to be 1 (or 100%). However, each individual genre - as a beta-distribution - also comes as a probability distribution (and its probability mass also adds up to 1). Therefore, you will have to divide by the number of individual genres composing the multimodal distribution.

Now, we can rewrite our speaker and hearer functions as follows:

# the speaker probability distribution
dmyspeaker <- function(x) {
  (dbeta(x,18.5,3) + dbeta(x,4.5,22.5))/2 # Division by 2 is required
                                          # to keep it a probability
                                          # distribution
}

# the hearer probability distribution
dmyhearer <- function(x) {
  (dbeta(x,20,3.5) + dbeta(x,3.5,20))/2   # Division by 2 is required
                                          # to keep it a probability
                                          # distribution
}

curve(dmyspeaker(x),xlim=c(0,1))
curve(dmyhearer(x),xlim=c(0,1), add=T, # we don't want to overwrite, but to add
     col="red"                         # red colour to distinguish the two plots
     )

[Plot: the speaker density (black) and the hearer density (red), both bimodal]

We can see that the speaker's probability distribution is shifted to the right. In our example, we have two individual beta distributions as ingredients; therefore we divide by 2. If we had three or more, that part would have to be adjusted.
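For more than two genres, the division can be tied to the number of components. The following is a sketch under the assumption that all genres are weighted equally, as above; the helper dmixture and the third genre with parameters (10, 10) are made up for illustration and do not come from the paper.

```r
# Hypothetical helper: equally weighted mixture of beta densities.
# `shapes` is a list of c(alpha, beta) pairs, one per genre.
dmixture <- function(x, shapes) {
  total <- Reduce(`+`, lapply(shapes, function(s) dbeta(x, s[1], s[2])))
  total / length(shapes)   # divide by the number of genres
}

# The two hearer genres from above plus a made-up third one
three_genres <- list(c(20, 3.5), c(3.5, 20), c(10, 10))

# A probability density must integrate to 1 over its support
total_mass <- integrate(function(x) dmixture(x, three_genres), 0, 1)$value
print(total_mass)
```

With only the two hearer genres in the list, dmixture gives the same values as dmyhearer.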

So far, so good. But the density function is not what we need (other than to visualise whether we have chosen the parameters well, such that the speaker overrates his contribution…); we also need a production and an inference function. It turns out that the production function is simple, since we can simply add together the two CDFs and divide the result by 2:

# the production function, based on the speaker probability distribution
production <- function(x) {
  (pbeta(x,18.5,3) + pbeta(x,4.5,22.5))/2
}

The quantile function is less obvious. Helpful persons on the R mailing list pointed me to the distr-package, which contains what I needed, namely the function "AbscontDistribution". The distr package is not included in the standard R install, therefore you will need to install it individually. This is actually very easy: just type the following at your R command line:

install.packages("distr")

Then, you will be prompted for a mirror; choose one close to your location, and everything will be taken care of automagically. Once this is done, you have to load the package with

library(distr)

Finally, we can define our inference function:

## use generating function "AbscontDistribution" from the "distr"
## package. You could put it directly into the inference function, but
## that makes it MUCH slower
D <- AbscontDistribution(d = dmyhearer, low = 0, up = 1, withStand = TRUE)

# the inference (learning) function, based on the hearer function
inference <- function(x){
  q(D)(x)
}
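If you would rather not depend on an extra package, the quantile function can also be approximated in base R by numerically inverting the CDF with uniroot. This is a sketch of an alternative, not what I used for the paper; the name inference_uniroot is made up, and unlike q(D) it handles only one value at a time.

```r
# The hearer's CDF: the equally weighted mixture of the two beta CDFs
pmyhearer <- function(x) {
  (pbeta(x, 20, 3.5) + pbeta(x, 3.5, 20)) / 2
}

# Hypothetical alternative to q(D): find the x with pmyhearer(x) = p
inference_uniroot <- function(p) {
  uniroot(function(x) pmyhearer(x) - p,
          interval = c(0, 1), tol = 1e-10)$root
}

# Round trip: the quantile of a CDF value should give back the value
x <- 0.8
x_back <- inference_uniroot(pmyhearer(x))
print(x_back)
```

Around 0.5, where the hearer's CDF is nearly flat, any numerical inversion of this kind becomes less precise, so results in that region should be treated with some care.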

Now we are set to do the simulation.

The simulation

The actual simulation is pretty much the same as what we had in the unimodal case: we will have a series of inference-production cycles, which we will iterate until something interesting happens (or not).

Therefore, we can basically take the same code we had for the unimodal distribution (but this time, the definitions of production and inference are different).

# initialize values for loop
n <- .999	# the initial value for n
N <- c(n)	# this is where we will store the values we will plot; we add n
k <- 200		# the number of iterations 

while(k>0){
   n <- inference(production(n)); # make a learning cycle, assign the result to n
   N <- c(N, n);                  # add the new n to the end of the list N
   k <- k-1;                      # decrease the counter by one
}

plot(N)

[Plot: the values of N over 200 iterations, starting from 0.999 and levelling off at around 0.6]

Now, we see that - if we start with 0.999 - the curve does not drop all the way down to 0, but seems to converge to something like 0.6. We also see that R by default only shows the area where there are some values. We will see below how we can correct this behaviour.

Another question we may ask is what happens with other values, and what the global behaviour of the system is. We can explore this if we do not only plot one starting value, but several of them at regular intervals - let's say one at 0.1, another at 0.2, another at 0.3, etc., up to 1. The easy - but tedious - way would be to repeat the code snippet above and to start the loop again with another starting value. This is some copy and paste work, but for ten values, this might be possible. However, there is a less verbose way of achieving it, and I will show how it works by making 100 different starting points (which would be quite tiresome to write out).

The key to doing that is the sapply function. It takes a vector as its first argument and a function as its second, and then applies the function to each element of the vector one by one:

sapply(vector, function)
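As a toy illustration of this calling pattern (the names starting_points and square are made up and have nothing to do with the simulation yet):

```r
starting_points <- c(1, 2, 3)
square <- function(x) x^2

# apply square to each element; the results are collected in a vector
squares <- sapply(starting_points, square)
print(squares)

# if the function returns a vector of fixed length, the results are
# collected column by column in a matrix instead
pairs <- sapply(starting_points, function(x) c(x, x + 1))
print(dim(pairs))
```

The second case is the relevant one below: a function that returns a whole trajectory per starting point yields one column per starting point.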

Our function will be the evolutionary while-loop (which we will have to transform into a function), and the vector will be composed of the different starting points. Let's start with the function. We can rewrite the while-loop as follows:

# evol is the function that wraps the evolution of each separate
# starting value of n, and checks what happens through k generations
evol <- function(n) {
  k <- generations;      # the counter, or: the number of generations to be tracked
  N <- c(n);             # the list we will plot
  while(k>0){
      n <- inference(production(n)); # the inference-production cycle
      N <- c(N, n);                  # update N with result
      k <- k-1;                      # decrease counter
  }
  # points() draws on the existing plot as a side effect, so the
  # trajectories are added to what is already there instead of
  # overwriting it; no print() is needed for base graphics
  points(N)
  return(N)
}

The evol function contains the while-loop plus the setting of generations and the vector N we want to plot. It will add the points to the existing plot, and return the vector of values N (in case we want to inspect what happened to the values).

Notice that in the loop, I have not used plot, but points (I could have used curve). The difference is that plot creates a new plotting window, erasing anything that you had before, whereas points (and curve) can add to the existing window.

Let us now look at how we can construct the vector of starting points. Obviously, we could just have written it out:

vector <- c(0, 0.01, 0.02, 0.03) # and so on

But this really is annoying. Imagine you needed to check 1000 starting points! Fortunately, there is an easier way to do that, with the seq function:

seq(start-value, end-value, step)

seq(0, 1, 0.01)

So, the last expression will give us a vector that starts with 0, and then increases by 0.01, adds that result, repeats, until we obtain a vector with 101 members (0, 0.01, 0.02, … 0.98, 0.99, 1). We can now put together the pieces, and write:

generations <- 100

# initialize values for loop
n <- .999		# the initial value for n
N <- c(n)		# this is where we will store the values we will plot; we add n
k <- generations	# the number of iterations 

while(k>0){
   n <- inference(production(n)); # make a learning cycle, assign the result to n
   N <- c(N, n);                  # add the new n to the end of the list N
   k <- k-1;                      # decrease the counter by one
}

plot(N,ylim=c(0,1))	# ylim sets the upper and lower end of the y-axis

sapply(seq(0,1,.01), evol)

And this is what it looks like:

[Plot: trajectories of n over 100 generations for 101 starting values between 0 and 1]

Notice the ugly Moiré pattern, particularly in the upper half. In practice, it is better to make fewer plots (I did 10 in the publication), to get a cleaner general look. We will see below how one can improve the graphics (and the axis labelling). Let us first look at what the plot itself shows us.

In previous plots, the values either dropped or remained stable. In the picture above, some values actually rise. How come? Is there something wrong with our coding? Normally, speaker-overestimation should make the value of n drop, and here, in some places, it rises, even though the speaker overestimates each curve individually, as we saw above. Now, this might be an instance of numerical error. But it might just as well be a genuine property of the system. Can we know which one it is?

Indeed we can. The secret is to calculate the intercept (as Grégoire Winterstein was kind enough to explain to me) between speaker and hearer CDRs. The idea is the following: if the speaker probability distribution has more probability mass above n than the hearer probability distribution, the value of n will go down; if they are equal, n will be stable; if the hearer distribution has more probability mass above n than the speaker distribution, n will rise.

This is once again something we can plot. The CDR (the cumulative distribution function, more commonly abbreviated CDF; in R, for a beta distribution, pbeta) gives us the probability mass up to some point; so we can use this as a basis.

# the speaker's CDR
pmyspeaker <- function(x) {
  (pbeta(x,18.5,3) + pbeta(x,4.5,22.5))/2
}

# the hearer's CDR
pmyhearer <- function(x) {
  (pbeta(x,20,3.5) + pbeta(x,3.5,20))/2
}

# pmyspeaker gives what is below x; I want to know what is the
# probability mass above it, which is 1-pmy{speaker,hearer}
pintercept <- function(x) {
  (1 - pmyspeaker(x)) - (1 - pmyhearer(x)) 
}

curve(pintercept, xlim=c(0,1))	# plot the function, with values for n from 0 to 1

[Figure: plot of pintercept for n from 0 to 1]

We see that, for most values of n, the curve lies above zero, that is, the speaker's distribution has more probability mass above n than the hearer's. However, the plot is not very legible, mainly because the baseline at zero is hard to see. Crucially, we would like to know what happens in the interval where the values rise. We can add a straight line with the following command:

# inserts a horizontal, dotted straight line at y = 0
abline(h=0, lty="dotted")

We are also not interested in the whole area from 0 to 1. The above picture makes it rather clear that the values should drop over wide ranges (as they do, in fact); we want to zoom in on the specific region where they rise, which is roughly between 0.44 and 0.60.

plot(pintercept, .44, .605,		 # gives us the boundaries
     main = "Detail of the plot above",	 # gives the plot a heading
     ylab = "difference",		 # label the y-axis
     xlab = "n")			 # label the x-axis

abline(h=0, lty="dotted")		 # add a dotted baseline at y=0

[Figure: detail of pintercept between 0.44 and 0.605, with a dotted baseline]

So it turns out that in the region where n rose, the speaker's probability mass above n is indeed less than the hearer's, and that therefore n will rise. Here, it comes in handy that R auto-adjusts the axis range to the values: as the y-axis makes clear, the difference between the two CDRs is very small.
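Instead of reading the crossing points off the plot, one can also ask R to locate them. The following is only a sketch: the three functions are copied from above so that the block runs on its own, while the search window from 0.3 to 0.7, the grid step, and the tolerance are my assumptions. The window is restricted because both CDRs approach 0 and 1 at the ends of the interval, where rounding noise could produce spurious sign changes.

```r
# Definitions copied from the text above, so that this block is self-contained
pmyspeaker <- function(x) (pbeta(x, 18.5, 3) + pbeta(x, 4.5, 22.5)) / 2
pmyhearer  <- function(x) (pbeta(x, 20, 3.5) + pbeta(x, 3.5, 20)) / 2
pintercept <- function(x) (1 - pmyspeaker(x)) - (1 - pmyhearer(x))

# Evaluate the difference on a fine grid (window and step are assumptions)
grid  <- seq(0.3, 0.7, by = 0.001)
signs <- sign(pintercept(grid))

# A crossing lies between two neighbouring grid points whose signs differ
flips <- which(diff(signs) != 0)

# Refine each bracketed crossing with uniroot
crossings <- vapply(flips, function(i) {
  uniroot(pintercept, lower = grid[i], upper = grid[i + 1], tol = 1e-10)$root
}, numeric(1))

print(crossings)
```

If the plots are to be trusted, the values printed should lie at the two ends of the region where n rises. Between them, pintercept is negative and n goes up; outside them, it is positive and n goes down.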

We now know what we wanted to know. Let us now see how we can show others in a decent way what we know, and how we can save the result of our work. You are not obliged to work on the R command line; you can also create a file with your favourite text editor, and tell R to execute your commands. For instance, copy and paste the code below into a file that you call "intercept.r". Then, in your R command line, call

source("/path/to/intercept.r")

(where you replace /path/to with the place where the file is located on your computer, e.g., source("C:\\stuff\\intercept.r") or source("C:/stuff/intercept.r") on Windows, or source("/home/gummybear/intercept.r") on Linux)

# we will create an SVG file, called "intercept.svg"
svg("intercept.svg")

# we define the function we will plot (the same as above); it relies on
# pmyspeaker and pmyhearer, which must already be defined in your session
pintercept <- function(x) {
  (1 - pmyspeaker(x)) - (1 - pmyhearer(x)) 
}

# put two plots one below the other
par(mfrow=c(2,1)) 

# plot the function, add a heading and axis-labels
plot(pintercept,
     main = "Subtracting Hearer CDR from Speaker CDR",
     ylab = "difference",
     xlab = "n")

# Add some lines to make clearer how things work
abline(h=0, lty="dotted")
abline(v=0, lty="dotted")
abline(v=1, lty="dotted")
abline(v=0.455, lty="dotted")
abline(v=0.5986, lty="dotted")


# and add some arrows, which make the system behaviour explicit:
arrows(x0=0.82, y0=0.04, x1=.78, y1=0.04, length = 0.05, angle = 25,
        code = 2, col = par("fg"), lty = par("lty"),
        lwd = par("lwd"))
arrows(x0=0.51, y0=0.04, x1=.55, length = 0.05, angle = 25,
       code = 2, col = par("fg"), lty = par("lty"),
       lwd = par("lwd"))
arrows(x0=0.32, y0=0.04, x1=.28, length = 0.05, angle = 25,
       code = 2, col = par("fg"), lty = par("lty"),
       lwd = par("lwd"))
# end of first plot

# start second plot
plot(pintercept, .44, .605,
     main = 'Detail of the first plot',
     ylab = 'difference',
     xlab = 'n')
# note that it does not make any difference if you use "", or '' for
# the text

abline(h=0, lty="dotted")
abline(v=0.455, lty="dotted")
abline(v=0.5986, lty="dotted")

# close the device, in order to make sure the svg gets properly printed:
dev.off()

After having run source on the file, you should obtain an SVG file called intercept.svg, which should look like the following:

[Figure: intercept.svg, with the full plot above and the detail plot below]
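One weak point of the script is that if anything fails between svg and dev.off, the device stays open and the file may be incomplete. A possible way around this is sketched here. The helper with_device is hypothetical (it is not part of base R or of the original script), and I use pdf and a temporary file only so that the example runs everywhere; svg can be passed in the same way where it is available.

```r
# Hypothetical helper: open a graphics device, run the plotting code,
# and close the device again even if the plotting code fails
with_device <- function(open_device, draw) {
  open_device()
  on.exit(dev.off(), add = TRUE)   # runs when the function exits, error or not
  draw()
  invisible(NULL)
}

# Output file in a temporary directory (an assumption for this example)
out <- file.path(tempdir(), "demo.pdf")

with_device(
  function() pdf(out),
  function() {
    # any plotting commands can go here; this curve is just a stand-in
    curve(pbeta(x, 20, 3.5), from = 0, to = 1, xlab = "n", ylab = "probability mass")
    abline(h = 0, lty = "dotted")
  }
)

file.exists(out)
```

To produce the SVG from the script, one would pass function() svg("intercept.svg") as the first argument and wrap the plotting commands in the second.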

Applications to other areas

I assume that the same type of modelling might apply to instances of speaker-hearer conflict in natural languages. One area where this seems to make sense might be sound change.¹ Clear pronunciation would appear to be in the interest of the hearer rather than the speaker (for whom it constitutes a source of articulatory effort). Similarly, homonymy seems (at least to me) to be something that makes life easier for the speaker, at the cost of putting a strain on the hearer.

That's all, folks. I hope that what I have written contributes to understanding how exactly the underlying mechanism in the paper works. Maybe it can even encourage you to do a simulation yourself. It's not difficult, once you know what exactly you want to do (but after all, doing research is principally about finding out what exactly one wants to do).

Footnotes:

¹ But then, I am no phoneticist, nor a phonetician.

Template for publishing a book with LaTeX at L'Harmattan

Gerhard Schaden

2010-07-12 14:14

In 2009, I published a book based on my thesis with L'Harmattan, and I typeset it in LaTeX. Here is a file that satisfies the requirements in force at the time (which may have changed since).

\documentclass[10pt,a4paper,bibtotoc,indextotoc]{scrbook}
\usepackage[frenchb]{babel}

\usepackage{geometry} % to get the right paper size:

\geometry{
  paperwidth=13.5cm,
  paperheight=21.5cm,
  textwidth=10.5cm,
  textheight=18cm,
  lines=38,                % you can go up to 40
  includeheadfoot,
  footskip=10mm,
}

\usepackage[color=white,a4,center]{crop} % to see the edges of the
                                % final page, remove "color=white"

\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage{mathptmx,hieroglf} % L'Harmattan wants Times
\usepackage{ucs}
\usepackage{url}

% \usepackage[pagewise]{lineno} % handy for proofreading
% \linenumbers

\usepackage[splitrule]{footmisc} %% This sets the footnote rule
                   %% as required by L'Harmattan



\title{Titre} \subtitle{Subtitre}
\author{Auteur}
\publishers{\vfill \textnormal{Éditions de
    l'Harmattan}}
\dedication{Bla bla bla\\bla\\
  \emph{\small Pas Moi}} \date{}

\begin{document}
% L'Harmattan asks for a page with only the title
\extratitle{\vspace*{\stretch{2}}\huge \centering Titre
 \vspace*{\stretch{3}}}

\maketitle

% And here goes all the content.


\end{document}

Template for a thesis in LaTeX

Gerhard Schaden

2010-07-12 13:58

This is probably of historical interest only, but it may still be useful to some people, so I wanted to keep it on the site.

But do check the current requirements; maybe it can still pass… so here is the file.

NB: My LaTeX habits have changed somewhat since 2007. Today I would recommend using biblatex instead; it is less complicated to manage, has more features, and is much simpler to configure. I have also stopped using pstricks, and now use tikz instead.

%%% Commented template for a thesis in linguistics (Sciences du langage)
%%% at Université Paris 8. I managed to submit mine like this, so in
%%% principle it should not cause any problems. You can adapt this
%%% template as you like
%%%
%%% Gerhard Schaden, 5 February 2007

\documentclass[12pt,BCOR8mm,a4paper,bibtotoc]{scrbook} %% Paris 8
                                %% requires a font size of at least 12pt;
                                %% BCOR8mm makes the space at the
                                %% binding match the 2.5 cm margin
                                %% required all around; bibtotoc adds the
                                %% bibliography to the table of
                                %% contents

\usepackage[ngerman,english,frenchb]{babel} %% main language:
                                %% French; secondary languages:
                                %% English and German

\AddThinSpaceBeforeFootnotes    %% recommended in French typography

\usepackage[T1]{fontenc}        %% to get precomposed glyphs such as
                                %% "é" (for example), rather than ones
                                %% assembled from a "'" and an "e"

\usepackage[utf8]{inputenc}     %% especially if you need several
                                %% European languages; for more
                                %% exotic languages, perhaps consider
                                %% ucs.sty

\usepackage{url}                %% gives you \url{} typeset in \texttt

%\usepackage[Lenny]{fncychap}   %% if you want "fancy" chapter
                                %% headings
\usepackage{paralist}           %% allows enumerations without
                                %% line breaks

\usepackage{natbib}             %% for a thesis in linguistics...
\bibpunct{(}{)}{;}{a}{,}{,}

\usepackage{multicol,longtable} %% for more complicated tables
\usepackage{array}              %% ditto

\usepackage{amssymb,amsfonts}   %% for maths

\usepackage{qtree}              %% and trees

\usepackage{textcomp,multirow}  %% some special symbols need
                                %% these

\usepackage[safe]{tipa}         %% the classic for phonetic
                                %% transcription

\usepackage{appendix}           %% allows per-chapter appendices
\appendixtitleon                %% prints "Appendix" before the number

\usepackage{wasysym}            %% a few special symbols

\usepackage{pstricks,pst-text,pst-coil,pst-node,calc} %% PSTRICKS. If
                                %% you have more complicated
                                %% diagrams. WARNING: will not work
                                %% if you generate PDF
                                %% directly

\usepackage{soul}               %% to strike through text

\usepackage{changebar}          %% to track changes

\nochangebars                   %% so that the change bars do not show
                                %% in the final document; comment it
                                %% out while you are working on it

\usepackage{setspace}           %% you can keep the default;
                                %% Paris 8 requires "standard"
                                %% spacing. If you do not like it,
                                %% uncomment the following:
%\onehalfspacing

\catcode`? 13                   %% so that ? works with linguex
\usepackage{linguex,cgloss4e,xspace} 
\catcode`?\active               %% Ready to handle French

\sloppy                         %% essential


\deffootnote{2.1em}{1em}{\makebox[2.1em][l]{%
    \thefootnotemark{.}}}       %% redefinition of footnotes. NB:
                                %% the argument of \deffootnote and the
                                %% width of the \makebox must be
                                %% equal. If you do not go beyond
                                %% 100 footnotes per chapter,
                                %% 1.5em will suffice. (stolen from scrguide)
                                %% If you do not like it, use:
%\FrenchFootnotes


\begin{document}
\bibliographystyle{coco}        %% my French bibliography style

\newcommand{\traduc}[1]{\og \emph{#1}\fg} %% a single, uniform way to
                                %% typeset word translations, e.g.:
                                %% "gerade (lit. \traduc{tout droit})"

\let\eachwordone\slshape        %% first line slanted in glossed examples

\newcommand\longpage[1]{\enlargethispage{#1\baselineskip}}
\newcommand\shortpage[1]{\enlargethispage{-#1\baselineskip}} 
% lengthens or shortens the current page by #1 lines. Copied from the
% LaTeX Companion II, p. 234.
% USE BETWEEN TWO PARAGRAPHS!


\begin{titlepage}

  \begin{center}
    Université Paris 8 --- Vincennes-Saint-Denis

     École doctorale Cognition, Langage, Interaction

     U.F.R. Sciences du Langage
  \end{center}

  \vspace*{\stretch{1}}


  \begin{flushright}
    Numéro attribué par la bibliothèque\\
   \vspace{2mm}


   {\small
    \begin{tabular}{|l|l|l|l|l|l|l|l|l|l|}
      \textcolor{white}{a} & \textcolor{white}{a}
      &\textcolor{white}{a} &\textcolor{white}{a}
      &\textcolor{white}{a} &\textcolor{white}{a}
      &\textcolor{white}{a} &\textcolor{white}{a}
      &\textcolor{white}{a} &\textcolor{white}{a}\\
      \hline
    \end{tabular}}
  \end{flushright} % Hack to get a grid for entering the number
                   % assigned by the library

  \vspace*{\stretch{2}}

  \begin{center}
     {\large  {\bfseries\textsc{Thèse}}}

     \vspace{3mm}

   Nouveau régime\\

   \vspace{3mm}
   Pour obtenir le grade de\\
   Docteur en Sciences du Langage\\
   Discipline: Linguistique Générale

   \vspace{3mm}

  Présentée et soutenue publiquement\\
  par

  \vspace{3mm}

  {\large \textbf{\textsc{Bananuphe Tripode}}}

  \vspace{3mm}

  Le 13 Brumaire de l'an \textsc{CCXIV}
  \end{center}

\vspace*{\stretch{6}}

\begin{center}
  {\Huge \textbf{\textsc{Move $\mathbf{\alpha}$ and ur head will follow}}}

  \vspace{4mm}

  {\large \textbf{Étude du mouvement}}

  \vspace{2mm}
  {\large \textbf{dans l'ensemble des langues attestées dans ce bas monde}}
\end{center}

\vspace*{\stretch{6}}

\begin{center}
  Directeur de thèse:
  \vspace{2mm}

  {\large\textbf{\textsc{Docteur Faustroll ('Pataphysicien)}}}

\end{center}

\vspace*{\stretch{4}}

\begin{center}
  \textbf{Composition du jury}:

  \vspace{2mm}

  \begin{tabular}{lll}
   \textcolor{white}{Mme} & X  \textsc{Y} & Université de Z (pré-rapporteur)\\
   \textcolor{white}{Mme} & X  \textsc{Y} & Université de Z \\ 
   \textcolor{white}{Mme} & X  \textsc{Y} & Université de Z \\ 
   \textcolor{white}{Mme} & X  \textsc{Y} & Université de Z (pré-rapporteur)\\

  \end{tabular}


\end{center}

\end{titlepage}

\cleardoubleemptypage % suppresses the page number; and we want to start again
                      % on a right-hand page

\thispagestyle{empty} % no page number
\vspace*{\stretch{1}}

\begin{flushright}
   pour ma maman\\

\vspace{2mm}
 pour la paix dans le monde\\

\vspace{2mm}

  pour démarrer ma carrière universitaire\\
\end{flushright}

\vspace*{\stretch{6}}

\cleardoubleemptypage % suppresses the page number

%% do not put all the text in here; use "\include"


% \frontmatter %  for the front matter: pages numbered in roman numerals

% \include{merci/merci} % acknowledgements

% \tableofcontents     

% \include{struct/abbrev} % abbreviations

% \include{struct/struct} % what I am going to do, and where

% \mainmatter % main part, with the following chapters:


%   \include{intro/intro}
%   \include{depuis/depuis}
%   \include{aspect/aspect}
%   \include{surcomp/surcomp}
%   \cleardoubleemptypage
%   \include{gerade/gerade}

%  \include{conclusio/conclusio}

\begin{singlespace}     % for the bibliography, normal spacing
                        % will do
\bibliography{mabiblio} % with the path to my bibliography file
\end{singlespace}

\end{document}           % congratulations, Doctor!