Jan Buys
Senior Lecturer
Computer Science Building Room 304.2
Department of Computer Science
University of Cape Town
South Africa
jbuys -at- cs.uct.ac.za
Google Scholar | GitHub
About
I am a Senior Lecturer (≈ Assistant Professor) in the Department of Computer Science at the University of Cape Town. My research area is Natural Language Processing and Machine Learning. My current research focusses on text generation, linguistic structure prediction, and low-resource language processing. Previously, I was a postdoctoral researcher at the University of Washington, working with Yejin Choi. I completed my PhD at the University of Oxford, supervised by Phil Blunsom. Before that, I obtained a Master's degree and undergraduate degrees in Computer Science at the University of Stellenbosch in South Africa.
Recent Reviewing (2024): TACL, ACL ARR, SACAIR, LREC-COLING, ICLR (AC), CoLM (AC).
UCT Teaching (2024): CSC2001F Data Structures; CSC4025Z Artificial Intelligence; CSC5035Z Natural Language Processing.
Research Group
Information for Prospective Students
News
- February 2024: I have been awarded a P rating by the National Research Foundation of South Africa.
- October 2023: I have been promoted to Senior Lecturer at UCT (effective January 2024).
- July 2023: I am spending my sabbatical from July to December 2023 at the HPI at the University of Potsdam, Germany, visiting Gerard de Melo's research group.
- August 2022: I am attending the Deep Learning Indaba in Tunis.
- February 2022: I attended the Hundzula Natural Language Processing and Linguistics Retreat at the University of Pretoria.
- September 2021: I gave an invited talk at IndabaX Sudan.
- August 2021: Applications for PhD Scholarships offered by the HPI Research School at the University of Cape Town close on 15 August.
- April 2021: Two papers accepted for presentation at the AfricaNLP workshop at EACL 2021.
- February 2021: I gave a Tutorial on Deep Learning of Natural Language Processing at SACAIR 2020.
- January 2021: I have a funded position available for a full-time Masters by Dissertation student, starting in March 2021. See this page for details on how to apply.
- January 2021: I have been awarded a Thuthuka grant from the South African National Research Foundation.
Publications
-
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation.
Francois Meyer and Jan Buys.
NAACL Findings 2024.
-
NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages.
Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka.
LREC-COLING 2024. [Data]
-
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation.
Francois Meyer and Jan Buys.
LREC-COLING 2024. [Data]
-
Neural Machine Translation between Low-Resource Languages with Synthetic Pivoting.
Khalid N. Elmadani and Jan Buys.
LREC-COLING 2024.
-
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation.
Francois Meyer and Jan Buys.
ACL Findings 2023. [Code]
-
Data Augmentation for Low Resource Neural Machine Translation for Sotho-Tswana Languages.
Maxwell Mojapelo and Jan Buys.
SACAIR 2023.
-
Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments.
Edan Toledo, Jan Buys and Jonathan Shock.
EACL 2023. [Code]
-
Subword Segmental Language Modelling for Nguni Languages.
Francois Meyer and Jan Buys.
EMNLP Findings 2022. [Code]
-
University of Cape Town’s WMT22 System: Multilingual Machine Translation for Southern African Languages.
Khalid N. Elmadani, Francois Meyer and Jan Buys.
WMT 2022. [Model]
-
Self-Supervised Text Style Transfer with Rationale Prediction and Pretrained Transformers.
Neil Sinclair and Jan Buys.
SACAIR 2022 (CCIS). [Version of Record]
-
From GNNs to Sparse Transformers: Graph-based architectures for Multi-hop Question Answering.
Shane Acton and Jan Buys.
SACAIR 2022 (CCIS). [Version of Record]
-
Generic Overgeneralization in Pre-trained Language Models.
Sello Ralethe and Jan Buys.
COLING 2022. [Data]
-
A Sequence Modelling Approach to Question Answering in Text-Based Games.
Greg Furman, Edan Toledo, Jonathan Shock and Jan Buys.
Wordplay 2022. [Code]
-
Canonical and Surface Morphological Segmentation for Nguni Languages.
Tumi Moeng, Sheldon Reay, Aaron Daniels and Jan Buys.
SACAIR 2021 (CCIS). [Version of Record] [Code]
-
Low-Resource Language Modelling of South African Languages.
Stuart Mesham, Luc Hayward, Jared Shapiro and Jan Buys.
SACAIR 2021. [Code]
-
RepGraph: Visualising and Analysing Meaning Representation Graphs.
Jaron Cohen, Roy Cohen, Edan Toledo and Jan Buys.
EMNLP 2021 System Demonstrations. [Demo] [Code]
-
Discourse Understanding and Factual Consistency in Abstractive Summarization.
Saadia Gabriel, Antoine Bosselut, Jeff Da, Ari Holtzman, Jan Buys, Kyle Lo, Asli Celikyilmaz and Yejin Choi.
EACL 2021.
-
The Curious Case of Neural Text Degeneration.
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes and Yejin Choi.
ICLR 2020. [Code]
-
BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle.
Peter West, Ari Holtzman, Jan Buys and Yejin Choi.
EMNLP 2019. [Code]
-
Neural Text Generation from Rich Semantic Representations.
Valerie Hajdik, Jan Buys, Michael Wayne Goodman and Emily M. Bender.
NAACL 2019. [Code]
-
Benchmarking Hierarchical Script Knowledge.
Yonatan Bisk, Jan Buys, Karl Pichotta and Yejin Choi.
NAACL 2019. [Code]
-
Bridging HMMs and RNNs through Architectural Transformations.
Jan Buys, Yonatan Bisk and Yejin Choi.
IRASL NeurIPS Workshop 2018. [Code]
-
Learning to Write with Cooperative Discriminators.
Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub and Yejin Choi.
ACL 2018. [Code]
-
Neural Syntactic Generative Models with Exact Marginalization.
Jan Buys and Phil Blunsom.
NAACL 2018. [Code]
-
Robust Incremental Neural Semantic Graph Parsing.
Jan Buys and Phil Blunsom.
ACL 2017. [Code]
-
Oxford at SemEval-2017 Task 9: Neural AMR Parsing with Pointer-Augmented Attention.
Jan Buys and Phil Blunsom.
SemEval 2017 Shared Task.
-
Online Segment to Segment Neural Transduction.
Lei Yu, Jan Buys and Phil Blunsom.
EMNLP 2016.
-
Cross-Lingual Morphological Tagging for Low-Resource Languages.
Jan Buys and Jan Botha.
ACL 2016.
-
Generative Incremental Dependency Parsing with Neural Networks.
Jan Buys and Phil Blunsom.
ACL 2015. [Code]
-
A Bayesian Model for Generative Transition-based Dependency Parsing.
Jan Buys and Phil Blunsom.
Depling 2015. [Code]
-
A Tree Transducer Model for Grammatical Error Correction.
Jan Buys and Brink van der Merwe.
CoNLL 2013 Shared Task.
-
Chorale Harmonization with Weighted Finite-state Transducers.
Jan Buys and Brink van der Merwe.
PRASA 2012.
-
Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution.
Ben Murrell, Thomas Weighill, Jan Buys, Robert Ketteringham, Sasha Moola, Gerdus Benade, Lise du Buisson, Daniel Kaliski, Tristan Hands and Konrad Scheffler.
PLoS ONE 2011.
Theses
-
Incremental Generative Models for Syntactic and Semantic Natural Language Processing.
DPhil thesis, University of Oxford, 2018.
-
Probabilistic Tree Transducers for Grammatical Error Correction.
MSc thesis, University of Stellenbosch, 2013.
-
Generative Models of Music for Style Imitation and Composer Recognition.
Honours project report, University of Stellenbosch, 2011.