NCLT / CNGL Workshop
Wednesday 23 to Thursday 24 July 2008, 1:30-5pm, rooms N209 (23rd) and L221 (24th)

| Time | Wednesday (N209) | Thursday (L221) |
| 1:30 | Andy | Josef |
| 1:45 | Yanjun | Sara |
| 2:00 | Ventzi: Transfer talk + DOT + OpenMP | Sergio |
| 2:15 | Ventzi | Ankit |
| 2:30 | John: Transfer talk | Tsuyoshi: Previous exposure to MT |
| 2:45 | John | Stephen |
| | Coffee break! | Coffee break! |
| 3:30 | Sylwia: Syntax for SMT Phrase Extraction [slides] | Yvette: Probabilistic Transfer-based MT [slides] |
| 3:45 | Sara | Jennifer: Automatic grammaticality judgements [slides] |
| 4:00 | Patrik | Deirdre: Joint project with Jennifer |
| 4:15 | Patrik | Joachim |
| 4:30 | Jinhua | Joachim |
| 4:45 | Jinhua | Andy |

| Wednesday 23 July, room N209 |
| First session (responsible: Jinhua) |
| Andy Way | Introduction Talk | In this talk I'll give an overview of the MT research we aim to carry out in the NGL CSET, together with pointers to related work that has been done here in the NCLT. |
| Yanjun Ma | Word Alignment | First, I will give a brief overview of the task of word alignment and relate it to statistical machine translation. Second, I will explore two factors that influence the performance of a word aligner: word segmentation (tokenisation) and syntax. I will briefly introduce two novel approaches: word packing, which investigates the role of segmentation in word alignment, and syntax-enhanced word alignment, which justifies the use of syntax. I will then say a few words about integrating these approaches into the MaTrEx system and using them in MT evaluation. Finally, I will point out some future work in this line of research. |
| Ventsislav Zhechev | Tree-to-Tree Alignment and Data-Oriented Translation | The talk will be about the tree-to-tree alignment system I developed, but will also include a short overview of Data-Oriented Translation (DOT). If time permits, I will also give a presentation on using the OpenMP API in C++ to parallelise existing software. This part will focus mainly on pointing to resources on the topic, but I will also give an example of how I used this technology. |
| John Tinsley | Transfer talk | Following Ventsi's discussion of the aligner, I will describe how I have used the resulting treebanks as training data in phrase-based SMT and mention some outstanding issues with this approach. I will then discuss how treebanks can be further exploited in syntax-aware MT. |
| John Tinsley | The MaTrEx Translation System | I will give an overview of our MT system, MaTrEx, describing its general functionality and capabilities. I will then discuss the MT wiki as it relates to MaTrEx and talk about some work that could be done with the system. |

| Second session (responsible: Andy) |
| Sylwia Ozdowska | Constituency and Dependency Representations for SMT Phrase Extraction | The talk will focus on the value of replacing and/or combining string-based methods with syntax-based methods for PB-SMT, and on the relative merits of using constituency-annotated vs. dependency-annotated training data. |
| Sara Morrissey | Data-Driven Machine Translation for Sign Languages | My thesis explores the application of data-driven machine translation (MT) to sign languages (SLs) to facilitate communication between Deaf and hearing people by translating information into the native and preferred language of the individual. In this talk, I will first give an overview of sign languages and outline previous approaches and the problems that arise. I will then describe the experiments performed translating both to and from SLs, together with their automatic evaluation scores. I will finish by describing the SL animation process and the manual evaluations performed on this task. |
| Patrik Lambert | The N-gram-based Machine Translation System | I will describe the N-gram-based machine translation framework developed in the TALP group at UPC, Barcelona. In this approach, the joint translation probability is modelled via a log-linear combination of a bilingual N-gram model and additional feature functions. |
| Patrik Lambert | Exploiting Lexical Information and Discriminative Alignment Training in SMT | My thesis work focused mainly on three aspects of statistical machine translation: the use of lexical information such as basic lexical models and multi-word expressions, minimum error rate training strategies, and word alignment models. We proposed a novel framework for the discriminative training of alignment models, with automated translation metrics as the maximisation criterion. |
| Jinhua Du | Ph.D. Thesis | This talk focuses on one topic: multiple system combination. We proposed an improved combination framework that uses MBR decoding, the GIZA-TER alignment metric and confusion network decoding to generate an optimal translation hypothesis. |
| Jinhua Du | The CASIA Translation System | This talk will describe the MT system that participates in many international and domestic MT evaluations on behalf of the Institute of Automation, Chinese Academy of Sciences (CASIA). The system is a complete MT platform, comprising modules for automatic preprocessing, word alignment and phrase generation, decoding and MER training, and multiple system combination. I'll focus on the key modules and the system configuration. |

| Thursday 24 July, room L221 (responsible: Patrik) |
| Josef van Genabith | Previous MT at DCU, LFG Parsing and Generation Technologies | Two topics will be covered: (i) previous MT work at DCU (TransBooster, MT evaluation), and (ii) GramLab LFG parsing and generation technologies for English, Spanish, Chinese, German, French, Japanese and Arabic. |
| Sara Morrissey | |
| Sergio Penkale | Genetic Algorithms for Syntactic Parsing | Traditional syntactic parsers define probabilistic models that allow them to explore the search space exhaustively in reasonable time. In this work we propose and evaluate a search method that performs a non-exhaustive search using heuristics. |
| Ankit Srivastava | Learning a Translation Lexicon from Non-Parallel Corpora | This project evaluates the performance of syntactic context windows against positional context windows in extracting word translations from non-parallel English and German newswire corpora. |
| Tsuyoshi Okita | Distance in SMT | This study is about incorporating syntactic information into n-gram-based MT, focusing on monolingual symmetries. The central question is how to integrate distances across these two spaces, which are distinct in nature. |
| Stephen Doherty | Current Research | This talk will give an overview of the research proposal for my PhD, which will compare the readability and comprehensibility of RBMT and SMT output for controlled and uncontrolled input. The focus will be on readability and comprehensibility, controlled language, and eye tracking. |

| Second session (responsible: Sylwia) |
| Yvette Graham | Probabilistic Transfer-based Machine Translation | Probabilistic Transfer-based Machine Translation involves automatically inducing transfer rules from parsed bilingual corpora. In my work, I use Lexical Functional Grammar (LFG) f-structures as the intermediate representation for transfer. In this talk I describe an algorithm for automatically inducing transfer rules from the f-structures of an LFG-parsed corpus. The transfer rule induction algorithm uses an efficient packed representation that stores multiple rules (up to O(2^n)) in a single structure of size O(n). I present recent experimental results on the German-English Europarl corpus showing a vast reduction in the resources the rule induction algorithm requires. I also briefly describe a chart-based decoder that translates unseen sentences using the transfer rules induced by the rule induction algorithm. |
| Jennifer Foster | Automatic Grammaticality Judgements | Joint work with Joachim on our method for automatic grammaticality judgements and how it might be useful for ranking MT output. |
| Deirdre Hogan | New Project | A talk about the new project carried out jointly with Jennifer and how it might link in with MT work. |
| Joachim Wagner | Computing Resources | MT research often demands resources not available on a single desktop PC. Training models can be demanding in both RAM and disk space, and decoding requires a lot of CPU time. In this talk I will give an overview of the existing MT group cluster, ICHEC resources, and plans for new machines. |
| Joachim Wagner | PBS Job and Task-Farming Example | If many users share the same machines for their experiments without any job management, resource conflicts arise: for example, two processes competing for RAM cause heavy "swapping" and slow each other down almost to a standstill. To address these problems, five of the MT group's machines have been organised into a cluster. A PBS job management system manages the resources centrally and grants experiments exclusive access to machines. I will show some examples of how to use it. |

| Andy Way | Conclusion Talk | Having heard about the MT research under way or about to begin in the NCLT and the NGL CSET, we'll attempt to identify trends, convergences, and any gaps that need filling. This will hopefully provide strong pointers to the future direction of our research, at least in the short to medium term. |

| Last update: August 6 2008 |
| Related Sites: NCLT | School of Computing | School of Applied Languages and Intercultural Studies | Dublin City University |