National Centre for Language Technology

Dublin City University, Ireland


NCLT Seminar Series 2013/2014

The NCLT seminar series usually takes place every second Wednesday from 4 to 5 pm in Room L2.21 (School of Computing).

Presenters will be added to the schedule below as they are confirmed. Please contact John Judge with any queries about the NCLT 2013/2014 Seminar Series.

Time and venue | Speaker(s) | Title(s)
October 16th 2013; 14:00, L2.21 | Teresa Lynn | Working with a small dataset - semi-supervised dependency parsing for Irish
October 25th 2013; 15:00, L2.21 | Xia Lu | Exploring Word Order Universals: a Probabilistic Graphical Model Approach
November 20th 2013; 14:00, CG86 (Henry Grattan Building) | Ahmed Ragheb, IBM | UIMA: the open architecture that helped Watson understand human language.
November 29th 2013; 15:00, CG05 (Henry Grattan Building) | Felipe Sánchez Martínez, Universitat d'Alacant | Generalised alignment templates for the inference of shallow-transfer MT rules from small parallel corpora.
March 13th 2014; 14:00, L2.21 | Tommi Pirinen | Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical NLP systems

Working with a small dataset - semi-supervised dependency parsing for Irish (SPMRL Long Paper)

Teresa Lynn (joint work with Jennifer Foster, Josef van Genabith, and Mark Dras)

We present a number of semi-supervised parsing experiments on Irish, carried out using a small seed set of manually parsed trees and a larger, yet still relatively small, set of unlabelled sentences. We take two popular dependency parsers, one graph-based and one transition-based, and compare results for both. Results show that semi-supervised learning in the form of self-training and co-training yields only very modest improvements in parsing accuracy. We also try to use morphological information in a targeted way but see no improvement.
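To make the two semi-supervised set-ups concrete, here is a minimal Python sketch of self-training and co-training. It is not the authors' code: the `train` and `parse` callables are hypothetical stand-ins for a real graph-based or transition-based parser, and the sketch assumes every automatically parsed sentence is added to the training data (real set-ups often keep only a subset).

```python
def self_train(train, parse, seed, unlabelled, iterations=1):
    """Self-training: a parser is retrained on its own output.

    train(list of (sentence, tree)) -> model   (hypothetical interface)
    parse(model, sentence) -> tree             (hypothetical interface)
    """
    model = train(seed)
    for _ in range(iterations):
        # Parse the unlabelled sentences with the current model ...
        auto = [(s, parse(model, s)) for s in unlabelled]
        # ... and retrain on the gold seed trees plus the automatic parses.
        model = train(seed + auto)
    return model


def co_train(train_a, parse_a, train_b, parse_b, seed, unlabelled, iterations=1):
    """Co-training: each parser is retrained on the other parser's output."""
    model_a, model_b = train_a(seed), train_b(seed)
    for _ in range(iterations):
        # Both parsers label the data using their models from the previous round.
        auto_a = [(s, parse_a(model_a, s)) for s in unlabelled]
        auto_b = [(s, parse_b(model_b, s)) for s in unlabelled]
        # Swap: parser A learns from B's parses and vice versa.
        model_a = train_a(seed + auto_b)
        model_b = train_b(seed + auto_a)
    return model_a, model_b
```

In this sketch each retraining round starts again from the seed set plus the latest automatic parses rather than accumulating parses across rounds; that is a choice made for the sketch, not a claim about the experiments.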


Exploring Word Order Universals: a Probabilistic Graphical Model Approach

Xia Lu (University at Buffalo)

In this work we propose a probabilistic graphical model as an innovative framework for studying typological universals. We view language as a system and linguistic features as its components, whose relationships are encoded in a directed acyclic graph (DAG). Treating the discovery of word order universals as a knowledge discovery task, we learn the graphical representation of a word order sub-system, which reveals finer structure such as direct and indirect dependencies among word order features. Probabilistic inference then shows the strength of these relationships: given the observed value of one feature (or a combination of features), the probabilities of the values of other features can be calculated. Our model is not restricted to features with only two values. Using imputation and the EM algorithm, it handles missing values well, and model averaging addresses the problem of limited data. In addition, an incremental, divide-and-conquer method addresses areal and genetic effects simultaneously rather than separately, as Daumé III and Campbell (2007) do.
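The inference step described above (observe one feature, compute the distribution over another) can be illustrated with a tiny hand-built DAG. This Python sketch assumes a hypothetical chain VO → Adp → Gen; the feature names, value labels and all conditional probabilities are invented for illustration and are neither learned from typological data nor taken from the talk. The leaf feature has three values to echo the point that features need not be binary. Inference is by brute-force enumeration, which is only feasible for very small networks.

```python
from itertools import product

# Hypothetical word order features and their possible values.
VALUES = {
    "VO": ["OV", "VO"],
    "Adp": ["Postp", "Prep"],
    "Gen": ["GenN", "NGen", "NoDom"],  # three values: not restricted to binary
}
# DAG structure: each feature's parents (invented chain VO -> Adp -> Gen).
PARENTS = {"VO": [], "Adp": ["VO"], "Gen": ["Adp"]}
# Conditional probability tables, keyed by the tuple of parent values.
# All numbers are made up for illustration.
CPT = {
    "VO": {(): {"OV": 0.5, "VO": 0.5}},
    "Adp": {("OV",): {"Postp": 0.9, "Prep": 0.1},
            ("VO",): {"Postp": 0.2, "Prep": 0.8}},
    "Gen": {("Postp",): {"GenN": 0.8, "NGen": 0.1, "NoDom": 0.1},
            ("Prep",): {"GenN": 0.2, "NGen": 0.7, "NoDom": 0.1}},
}


def joint(assignment):
    """P(full assignment) as the product of each node given its parents."""
    p = 1.0
    for var in VALUES:
        parent_vals = tuple(assignment[q] for q in PARENTS[var])
        p *= CPT[var][parent_vals][assignment[var]]
    return p


def query(target, evidence):
    """P(target | evidence) by summing the joint over consistent assignments."""
    names = list(VALUES)
    scores = {v: 0.0 for v in VALUES[target]}
    for combo in product(*(VALUES[n] for n in names)):
        a = dict(zip(names, combo))
        if all(a[k] == v for k, v in evidence.items()):
            scores[a[target]] += joint(a)
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}
```

For example, `query("Gen", {"VO": "OV"})` gives the distribution over genitive order for an OV language, and `query("VO", {"Gen": "NGen"})` runs the inference in the opposite direction, from an observed leaf back to its indirect ancestor.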


UIMA: the open architecture that helped Watson understand human language.

Ahmed Ragheb, IBM

The open source Unstructured Information Management Architecture (Apache UIMA), which IBM Research donated to the Apache Software Foundation in 2006, is what makes Watson’s hundreds of independent algorithms work together. UIMA is an architectural and software framework that supports the creation, discovery, composition, and deployment of a broad range of analysis capabilities. It provides a run-time environment in which developers can plug in and run their own UIMA component implementations alongside other independently developed components, and with which they can build and deploy unstructured information management (UIM) applications. This talk introduces the framework and what it offers researchers and engineers in the natural language processing community.
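The core idea of the architecture, independently written components that communicate only through a shared analysis structure of stand-off annotations, can be sketched in a few lines. This is a conceptual Python illustration, not the UIMA API (UIMA itself is implemented in Java and C++): `AnalysisStructure` is a hypothetical stand-in for UIMA's CAS, and the two annotators are toy components.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: a type name plus character offsets into the text."""
    type_name: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class AnalysisStructure:
    """Hypothetical stand-in for UIMA's CAS: the text plus all annotations on it."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, ann):
        self.annotations.append(ann)

    def select(self, type_name):
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


class Annotator:
    """Common interface: every component reads from and writes to the shared structure."""
    def process(self, cas):
        raise NotImplementedError


class WhitespaceTokenizer(Annotator):
    def process(self, cas):
        pos = 0
        for tok in cas.text.split():
            start = cas.text.index(tok, pos)
            cas.add(Annotation("Token", start, start + len(tok)))
            pos = start + len(tok)


class CapitalizedWordTagger(Annotator):
    """Depends only on Token annotations, not on which component produced them."""
    def process(self, cas):
        for tok in cas.select("Token"):
            if cas.covered_text(tok)[:1].isupper():
                cas.add(Annotation("Capitalized", tok.begin, tok.end))


def run_pipeline(text, annotators):
    """Run independently developed annotators in sequence over one shared structure."""
    cas = AnalysisStructure(text)
    for annotator in annotators:
        annotator.process(cas)
    return cas
```

Because the second annotator only reads `Token` annotations, the tokenizer could be replaced by any other component that produces them without changing the tagger, which is the kind of composition the abstract describes.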


Generalised alignment templates for the inference of shallow-transfer MT rules from small parallel corpora.

Felipe Sánchez Martínez, Universitat d'Alacant

Rule-based machine translation (MT) is the paradigm of choice when the bilingual resources available are not large enough to train a full-fledged statistical MT system. Building a rule-based MT system usually requires a considerable investment in the development of linguistic resources. However, even when bilingual parallel corpora are scarce, structural transfer rules can be inferred from them automatically. In this talk I will present the current developments at Universitat d'Alacant aimed at learning shallow-transfer MT rules from small parallel corpora for use by the shallow-transfer MT platform Apertium. Building on the work of Sánchez-Martínez & Forcada (2009), we use alignment templates (ATs), like those used in statistical MT, and overcome the main limitations of their approach: the inability to find the appropriate level of generalisation for the ATs from which rules are generated; the inability to perform context-dependent lexicalisations, which would allow words that are incorrectly translated by more general ATs to be treated differently; and the deficient selection of the sequences of lexical categories for which transfer rules are generated. Preliminary experiments show that translation quality improves compared to the method of Sánchez-Martínez & Forcada (2009), and that the number of inferred rules is considerably smaller.
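Alignment templates of the kind mentioned above are built from bilingual phrase pairs that are consistent with a word alignment, with the words then replaced by their lexical categories. The Python sketch below shows that basic extraction on an invented Spanish-English example (`la casa blanca` / `the white house`); the tag names, the `max_len` limit and the helper names are assumptions, unaligned words are not handled, and the generalisation and rule-selection improvements described in the talk are not implemented.

```python
# Invented example: word-aligned sentence pair with lexical-category tags.
SRC_TAGS = ["det", "noun", "adj"]   # la casa blanca
TGT_TAGS = ["det", "adj", "noun"]   # the white house
ALIGN = {(0, 0), (1, 2), (2, 1)}    # (source index, target index) links


def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=3):
    """Return inclusive spans (i1, i2, j1, j2) consistent with the alignment."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            tgt_points = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_points:
                continue  # sketch skips source spans with no aligned words
            j1, j2 = min(tgt_points), max(tgt_points)
            if j2 >= tgt_len or j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be aligned
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.append((i1, i2, j1, j2))
    return pairs


def to_alignment_template(src_tags, tgt_tags, alignment, span):
    """Generalise a phrase pair to categories, keeping span-relative links."""
    i1, i2, j1, j2 = span
    src = tuple(src_tags[i1:i2 + 1])
    tgt = tuple(tgt_tags[j1:j2 + 1])
    links = tuple(sorted((i - i1, j - j1) for (i, j) in alignment
                         if i1 <= i <= i2 and j1 <= j <= j2))
    return src, tgt, links
```

The template for span `(1, 2, 1, 2)` captures noun-adjective reordering at the level of lexical categories rather than specific words, which is what allows a rule generated from it to apply to word pairs other than the ones it was extracted from.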


Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical NLP systems

Tommi Pirinen

As some of you may know, the University of Helsinki is best known for its strictly rule-based approach to computational linguistics, with major contributions such as Prof. Koskenniemi's two-level (TWOL) system in 1983 and Prof. Karlsson's Constraint Grammar (CG) system in 1995. In my doctoral dissertation I experimented with some basic approaches to combining statistical information with weighted finite-state models of language (cf. OpenFst and Mohri's academic papers), especially for morphologically complex languages with limited resources (e.g. Greenlandic). The presentation will consist of some slides from my FSMNLP 2012 tutorial and parts of the lectio praecursoria for my PhD.
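One simple way to attach statistical information to a finite-state model is to put negative log probabilities on its paths and read results in the tropical semiring, where weights along a path are added and competing paths are resolved by taking the minimum. The Python sketch below is a toy weighted transducer under that assumption; the `WeightedTransducer` class, the analysis tags and the corpus counts are hypothetical, and a real system would use a toolkit such as OpenFst rather than a hand-written lookup.

```python
import math
from collections import defaultdict


class WeightedTransducer:
    """Toy weighted transducer: no epsilon arcs, one input symbol per arc."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(in_sym, out_str, weight, next)]
        self.finals = {}               # final state -> final weight
        self.n_states = 1              # state 0 is the start state

    def _new_state(self):
        s = self.n_states
        self.n_states += 1
        return s

    def add_path(self, surface, analysis, weight):
        """Add one surface:analysis path; the first arc emits the analysis."""
        state = 0
        for k, ch in enumerate(surface):
            nxt = self._new_state()
            self.arcs[state].append((ch, analysis if k == 0 else "", 0.0, nxt))
            state = nxt
        self.finals[state] = weight

    def lookup(self, word):
        """All analyses of word, best first (tropical: add along a path, min wins)."""
        results = []

        def walk(state, pos, out, weight):
            if pos == len(word):
                if state in self.finals:
                    results.append((out, weight + self.finals[state]))
                return
            for in_sym, out_str, w, nxt in self.arcs.get(state, []):
                if in_sym == word[pos]:
                    walk(nxt, pos + 1, out + out_str, weight + w)

        walk(0, 0, "", 0.0)
        return sorted(results, key=lambda r: r[1])


def tropical_weight(count, total):
    """Negative log relative frequency: lower weight means more probable."""
    return -math.log(count / total)


# Hypothetical corpus counts for two analyses of an ambiguous word form.
counts = {("saw", "see+V+Past"): 30, ("saw", "saw+N"): 10}
total = sum(counts.values())
fst = WeightedTransducer()
for (surface, analysis), c in counts.items():
    fst.add_path(surface, analysis, tropical_weight(c, total))
```

Here `fst.lookup("saw")` returns both analyses with the more frequent one first. An unweighted rule-based analyser corresponds to giving every path weight 0; adding corpus-derived weights keeps the same set of analyses but ranks them, which is one way to read the "bridge" in the title.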


Dublin City University   Last update: 10th March 2014