
FAJRI KOTO
MBZUAI
I am a Postdoctoral Research Fellow at MBZUAI. Previously, I completed my PhD at Unimelb, where I was fortunate to be supervised by Prof. Timothy Baldwin and Dr. Jey Han Lau, sponsored by an Australia Awards Scholarship. I was also an applied scientist intern at Amazon.
- Fajri Koto
- Research Fellow
- Natural Language Processing
- Abu Dhabi, UAE
- fajri.koto91@gmail.com
- fajri91
What's New?
-
2023-10-08: Paper accepted to EMNLP 2023
2023-09-05: Paper accepted to AACL 2023
2023-08-30: We release JAIS and JAIS-chat, the largest Arabic LLM
2023-05-10: Paper accepted to ACL 2023
2023-05-06: We won the Outstanding Paper Award for EACL 2023
2023-01-22: Paper accepted to EACL 2023
2022-12-06: I will attend EMNLP in person
2022-10-22: I join MBZUAI as a new Postdoctoral Research Fellow
2022-09-15: Invited talks at the University of Toronto, the University of Queensland, and Binus University
2022-09-15: 1st Place of ALTA 2022 shared task
2022-09-07: Paper accepted to CODI at COLING 2022
2022-08-16: Paper accepted to COLING 2022
2022-06-28: I'll be one of the keynote panelists at ACL 2022, Dublin, Ireland
2022-05-14: We won the Best Paper Award for CSRR 2022 and I got married on the same day
2022-04-04: Paper accepted to ECNLP at ACL 2022
2022-03-28: Paper accepted to CSRR at ACL 2022
2022-01-23: Paper accepted to ACL 2022
2021-12-14: Paper accepted to JAIR 2022
2021-10-15: 2nd Place of ALTA 2021 shared task
2021-09-26: Paper accepted to EMNLP 2021
2021-08-21: I'll be a speaker at DaTalk, Jakarta Artificial Intelligence Research
2021-08-02: I'll join Amazon as Applied Scientist Intern
2021-05-06: Paper accepted to Findings of ACL 2021
2021-03-28: Selected as a nominee for Data Researcher, Data Science Indonesia Award
2021-03-10: Paper accepted to NAACL 2021
2021-01-12: Paper accepted to EACL 2021
2021-01-08: I'll be a speaker at INACL Webinar BEDAH PAPER #15
2020-12-16: I'll be a speaker at IR-NLP Talk, Fasilkom, Universitas Indonesia
2020-09-30: Paper accepted to COLING 2020
2020-09-11: Paper accepted to AACL-IJCNLP 2020
2020-09-05: Paper accepted to PACLIC 2020
2019-12-03: I'll be a speaker at NLP Sydney Meetup
2019-11-20: Paper accepted to ALTA 2019
2019-07-31: I passed my confirmation and am officially a PhD Candidate
2018-07-22: I officially started my PhD
Academic History
-
PhD of Computer Science, 2018 - 2022
The University of Melbourne
Fully funded by an Australia Awards Scholarship. PhD thesis: "From Discourse and Keyphrases, to Language Modeling in Automatic Summarization". Advisors: Prof. Timothy Baldwin and Jey Han Lau, Ph.D.
-
Master of Computer Science, 2013 - 2014
Universitas Indonesia
Graduated with Cum Laude (first class honor). Final thesis: "A Comparative Study over Twitter Sentiment Analysis: Which Features are Good?" Advisor: Mirna Adriani, Ph.D.
-
Bachelor of Computer Science, 2009 - 2013
Universitas Indonesia
Graduated with Cum Laude (first class honor). Final thesis: "Touch Sensor based Keyboard Driver using AVR ATxmega 256 A3BU". Advisor: Bob Hardian, Ph.D.
Working Experience
-
Postdoctoral Research Fellow, 2022 - present
MBZUAI, UAE
Engaged in Natural Language Processing research with Prof. Timothy Baldwin and Prof. Iryna Gurevych (TU Darmstadt, Germany). I am part of the core evaluation team for large language model initiatives, in collaboration with Prof. Preslav Nakov, Prof. Eric Xing, Dr. Haonan Li, Dr. Zhengzhong Liu (Petuum Inc), and Dr. Willie Neiswanger (Stanford).
-
Applied Scientist Intern, 2021 - 2022
Amazon, AUSTRALIA
Working on NLP and Computer Vision. Projects: 1) multimodal language generation system; 2) information extraction system. Advisors: Prof. Chunhua Shen and Prof. Anton van den Hengel.
-
Tutor, 2020 - 2021
School of CIS, University of Melbourne, AUSTRALIA
a. Natural Language Processing COMP90042 (Semester 1, 2020) - Dr. Jey Han Lau
b. Natural Language Processing COMP90042 (Semester 1, 2021) - Dr. Jey Han Lau
-
Data Scientist, 2016 - 2017
PT KMK-Labs / EMTEK Group, INDONESIA
Mainly worked on a spam detection system integrated into BBM, Vidio, and Liputan6. Other responsibilities included data migration, tracking systems, and log extraction in AWS and GCP. Manager: Hafiz Badrie Lubis.
-
Research Engineer, 2014 - 2016
Samsung Research Institute INDONESIA
Delivering 3 global (US) and 1 local (ID) patents. Manager: Agus Kurniawan
-
Research Intern, Summer 2013
Nara Institute of Science and Technology (NAIST), JAPAN
Working on Speech Technology at AHC Labs, with research topics: 1) Speech Summarization and 2) Quote Detection on Speech. Advisors: Dr. Sakriani Sakti, Dr. Graham Neubig, Prof. Tomoki Toda, and Prof. Satoshi Nakamura.
-
Teaching Assistant, 2010 - 2013
Faculty of Computer Science, University of Indonesia
a. Calculus 1 (Fall 2010) - Dr. Kasiyah
b. Private Tutor (Spring 2011) - NA
c. Database (Fall 2011) - Dr. Ika Alfina
d. Discrete Math 1 (Spring 2012) - Prof. Belawati H. Widjaja, Ph.D.
e. Private Tutor (Spring 2012) - NA
f. Statistics and Probability (Fall 2012) - Dr. Ika Alfina
g. Theory of Language and Automata (Spring 2013) - Dr. Dina Cahyati
h. Statistics and Probability (Fall 2013) - Dr. Ika Alfina
-
Android Developer, Summer 2012
PT Astra International, INDONESIA
Awards & Grants
-
2023: Secured a research grant with a total value of USD 900K (3 years), "Brain Science and Neural Network", serving as Co-Investigator (Co-I) with Dr. A. Shelmanov, Prof. T. Baldwin, and Prof. M. Tsodyks
2023: Outstanding paper award of EACL 2023
2022: 1st place in the ALTA 2022 shared task
2022: Best paper award of CSRR 2022 at ACL
2021: 2nd place in the ALTA 2021 shared task
2021: Data Science Indonesia Award: Nominee for Data Researcher
2021: FEIT Unimelb conference travel scholarship for attending EACL, NAACL, ACL, EMNLP as presenter
2020: MSE Unimelb conference travel scholarship for attending AACL, COLING as presenter
2017: Australia Awards Scholarship (out of 5300+ applicants), estimated total of awards for PhD: A$358,000
2014: Best session presenter at ICACSIS 2014 (International Conference on Advanced Computer Science and Information Systems)
2014: Cum Laude Award (Top 5), Graduation of Master Degree in Faculty of CS, Universitas Indonesia
2013: Cum Laude Award (Top 5), Graduation of Bachelor Degree in Faculty of CS, Universitas Indonesia
2013: Awardee of Japan Student Services Organization (JASSO), Summer research internship at NAIST, Japan
2012: Awardee of Fast track DIKTI (Minister of Higher Education) scholarship, Bachelor + Master degree at Universitas Indonesia
2009: PPKB award for high-achiever high school students to get admitted to Universitas Indonesia without national exam
2008: Rank 7th (out of 1000+), High School Math competition, West Sumatra province, Indonesia
2007: Semi-finalist, Junior High School Physics competition, West Sumatra province, Indonesia
2006: Top 200 (0.5%) National (out of 40,000+), Junior High School Math competition, PASIAD, Indonesia
Invited Talks
-
10-2022: Binus University (Guest Lecture), "NLP for Indonesian Languages: The Current States and Future Works"
09-2022: University of Toronto, "Domain-Adaptive Pretraining in Indonesian Languages: The Current State, Challenges, and Opportunities"
09-2022: University of Queensland, "Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?"
05-2022: Keynote Panel of ACL 2022, "Supporting Linguistic Diversity"
06-2022: Indonesian Association for Computational Linguistics, "Scientific Article Writing"
11-2021: University of Indonesia (Guest Lecture), "Indonesian NLP with Pretrained Language Models: State of the Art"
08-2021: DaTalk, Jakarta Artificial Intelligence Research, "Natural Language Understanding Benchmark across Languages"
01-2021: Indonesian Association for Computational Linguistics, "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP"
12-2020: University of Indonesia (IR-Lab), "Document Summarization in Indonesian Text: Resources and Benchmark Model"
10-2020: Data Science Indonesia, "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP"
12-2019: NLP MeetUp at Sydney, Australia, "Improved Document Modelling with a Neural Discourse Parser"
12-2016: Wrangle Conference, Malaysia, "Spam Text and Video Detection System at Indonesian Media Company"
Academic Services
-
2023: Mentor of ACL-SRW and AACL-SRW; primary reviewer of EACL, ACL, AACL, EMNLP, NLPCC, ACL-SRW, AACL-SRW, AI Open, Transactions on Audio, Speech and Language Processing, PRICAI, and ALTA
2022: Virtual Poster Session Chair of EMNLP (Language and Resource); mentor of ACL-SRW; primary reviewer of ICLR, ACL (ARR), NAACL (ARR), EMNLP, NAACL-SRW, LREC, ALTA, NLPCC, ICONIP, and ICAICTA
2021: Primary reviewer of ACL; EMNLP; ALTA; ICONIP
2019: Secondary reviewer of NAACL
2017: Primary reviewer of Knowledge-Based Systems (Elsevier journal)
Student Supervision
-
Andrew Shen, BSc Student, 2021: Discourse Analysis - co-supervised with Prof. Timothy Baldwin and Dr. Jey Han Lau. Currently a Master's student at CMU.
Link: ResearchGate and Google Scholar. * indicates equal contribution.
-
2023
Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU. In Proceedings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Fajri Koto*, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, and Pascale Fung. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing (AACL 2023), Bali, Indonesia.
Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, and Iryna Gurevych. Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings. Preprint. [paper]
Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Alham Fikri Aji, Zhengzhong Liu, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Preslav Nakov, Timothy Baldwin, and Eric Xing. Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models. Technical Report. [paper] [model] [website]
Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, and Timothy Baldwin. CMMLU: Measuring massive multitask language understanding in Chinese. Preprint. [paper] [code]
Haonan Li*, Fajri Koto*, Minghao Wu, Alham Fikri Aji, and Timothy Baldwin. Bactrian-X: A Multilingual Replicable Instruction-Following Model with Low-Rank Adaptation. Preprint. [paper] [code]
Samuel Cahyawijaya*, Holy Lovenia*, Alham Fikri Aji*, Genta Indra Winata*, Bryan Wilie*, Fajri Koto*, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Muhammad Satrio Wicaksono, Ivan Halim Parmonangan, Ika Alfina, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh Dhole, Arie Suryani, Rifki Afina Putri, Dan Su, Keith David Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius Hadiwijaya, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Inastra Damapuspita, Haryo Akbarianto Wibowo, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Pascale Fung, Herry Sujaini, Sakriani Sakti, and Ayu Purwarianti. NusaCrowd: Open Source Initiative for Indonesian NLP Resources. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada.
Genta Indra Winata*, Alham Fikri Aji*, Samuel Cahyawijaya*, Rahmad Mahendra*, Fajri Koto*, Ade Romadhony*, Kemal Kurniawan*, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder. NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia. [paper] [code] (Outstanding Paper Award)
-
2022
Fajri Koto. From Discourse and Keyphrases, to Language Modeling in Automatic Summarization. Ph.D. Thesis, The University of Melbourne, 2022. [thesis]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. FFCI: A Framework for Interpretable Automatic Evaluation of Summarization. Journal of Artificial Intelligence Research (JAIR 2022). [paper] [code]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Andrew Shen, Fajri Koto, Jey Han Lau, and Timothy Baldwin. Easy-First Bottom-Up Discourse Parsing via Sequence Labelling. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse (CODI at COLING 2022), Gyeongju, Republic of Korea. [paper] [code]
Alham Fikri Aji*, Genta Indra Winata*, Fajri Koto*, Samuel Cahyawijaya*, Ade Romadhony*, Rahmad Mahendra*, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, and Sebastian Ruder. One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions? In Proceedings of the 5th Workshop on e-Commerce and NLP (ECNLP at ACL 2022), Dublin, Ireland. [paper]
Fajri Koto, Timothy Baldwin, and Jey Han Lau. Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian. In Proceedings of Commonsense Representation and Reasoning Workshop 2022 (CSRR at ACL 2022), Dublin, Ireland. [paper] [data] (Best Paper Award)
Biaoyan Fang* and Fajri Koto*. Context-Aware Sentence Classification in Evidence-Based Medicine. In Proceedings of the Australasian Language Technology Association Workshop 2022 (ALTA 2022), Adelaide, Australia. (1st place in the shared task)
2021
Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Dominican Republic (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of the Association for Computational Linguistics: ACL 2021, Bangkok (virtual). [paper] [data]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 20th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), Mexico (virtual). [paper] [code]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Top-down Discourse Parsing via Sequence Labelling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Greece (virtual). [paper] [code]
Fajri Koto* and Biaoyan Fang*. Handling Variance of Pretrained Language Models in Grading Evidence in the Medical Literature. In Proceedings of the Australasian Language Technology Association Workshop 2021 (ALTA 2021), Australia (virtual). (2nd place in the shared task) [paper]
2020
Fajri Koto, Afshin Rahimi, Jey Han Lau, and Timothy Baldwin. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Spain (virtual). [paper] [code] [website]
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Liputan6: A Large-scale Indonesian Dataset for Text Summarization. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL 2020), China (virtual). [paper] [code]
Fajri Koto and Ikhwan Koto. Towards Computational Linguistics in Minangkabau Language: Studies on Sentiment Analysis and Machine Translation. In Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC 2020), Vietnam (virtual). [paper] [code]
2019
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Improved Document Modelling with a Neural Discourse Parser. In Proceedings of the 17th Australasian Language Technology Workshop (ALTA 2019), Sydney, Australia. [paper] [code]
2017
Fajri Koto and Gemala Y. Rahmaningtyas. InSet Lexicon: Evaluation of a Word List for Indonesian Sentiment Analysis in Microblogs. In Proceedings of the 21st International Conference on Asian Language Processing. IEEE. (IALP 2017), Singapore. [paper] [data]
2016
Fajri Koto. A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Automatic Detection of Memorable Spoken Quotes. In the 2016 Spring Meeting of the Acoustical Society of Japan (ASJ 2016), Yokohama, Japan. [paper]
Fajri Koto and Omar Abdillah. Automatic Advisor for Detecting Summarizable Chat Conversations in Online Instant Messages. In Proceedings of the 12th International Conference on Computing and Information Technology. Springer. (IC2IT 2016), Thailand. [paper]
2015
Fajri Koto and Mirna Adriani. HBE: Hashtag-Based Emotion Lexicons for Twitter Sentiment Analysis. In Proceedings of the 6th Forum for Information Retrieval. ACM. (FIRE 2015), Gandhinagar, India. [paper]
Fajri Koto and Mirna Adriani. A Comparative Study on Twitter Sentiment Analysis: Which Features are Good? In Proceedings of the 20th International Conference on Applications of Natural Language To Information Systems. Springer. (NLDB 2015), Passau, Germany. [paper]
Fajri Koto and Mirna Adriani. The Use of POS Sequence for Analyzing Sentence Pattern in Twitter Sentiment Analysis. In Proceedings of the 8th International Symposium on Mining and Web (joint with the 29th AINA Conference). IEEE. (MAW-WAINA 2015), Gwangju, Korea. [paper]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. A Study On Natural Expressive Speech: Automatic Memorable Spoken Quote Detection. In Proceedings of the 6th International Workshop on Spoken Dialog Systems. Springer. (IWSDS 2015), Busan, Korea. [paper]
2014
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. The Use of Semantic and Acoustic Features for Open-Domain TED Talk Summarization. In Proceedings of the 6th Asia Pacific Signal and Information Processing Association. IEEE. (APSIPA 2014), Siem Reap, Cambodia. [paper]
Fajri Koto. SMOTE-Out, SMOTE-Cosine, and Selected-SMOTE: An Enhancement Strategy to Handle Imbalance in Data Level. In Proceedings of the 6th International Conference on Advanced Computer Science and Information Systems. IEEE. (ICACSIS 2014), Jakarta, Indonesia. [paper] [code]
Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, and Satoshi Nakamura. Memorable Spoken Quote Corpora of TED Public Speaking. In Proceedings of the 17th Oriental COCOSDA Conference. IEEE. (OCOCOSDA 2014), Phuket, Thailand. [paper]
Patents
-
Patent United States US 2020/0082699 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri. "Personal safety device and operating method therefor". Issued March 12, 2020. [Patent]
Patent WO/2018/124584 A1 - Gilang Kusuma Jati, Agus Kurniawan, Fajri. "Personal safety device and operating method therefor". Issued May 7, 2018. [Patent]
Patent United States US 2017/0177797 A1 - Agus Kurniawan, Fajri, Omar Abdillah. "Apparatus and method for sharing personal electronic-data of health". Issued June 22, 2017. [Patent]
Patent United States US 2016/0147387 A1 - Yanuar Rahman, Omar Abdillah, Fajri. "Method And Apparatus For Displaying Summarized Data". Issued November 20, 2015. [Patent]
Books
-
Agus Kurniawan, Fajri Koto, Gilang Kusuma Jati, "Panduan Dasar Pemrograman Tizen". Published by Samsung Research Indonesia. Jakarta, 2016. [Book]
-