

SBMI Datathon 2021 for Stroke Prediction

Registration Deadline: March 25th, 2021

Competition Date: March 27th-28th, 2021

Shayan Shams, Yejin Kim, Xiaoqian Jiang, Sean Savitz

About the Datathon

In 2017, 7.8 million adults in the U.S. reported having survived a stroke. While deaths attributable to stroke have declined, stroke remains a leading cause of morbidity and disability. By 2030, stroke-related costs are expected to reach $183 billion. Despite early treatment, stroke survivors often have severe long-term disabilities, including both physical and cognitive issues, that require constant monitoring and care from the community. Rehabilitation is essential to recovery and begins soon after the injury, when the brain is especially receptive to processes that can enhance repair. The quantity, quality, and timing of rehabilitation therapy needed to optimize outcomes and remedy disabilities effectively remain unknown. An accurate prediction of functional and cognitive outcomes at the acute stage of stroke is important for planning personalized rehabilitation and for improving communication among patients, families, and clinicians regarding possible outcomes and expectations.


In this Datathon, participants compete to develop algorithms that predict changes in cognitive and Functional Independence Measure (FIM) scores (18 subcategories) during inpatient rehabilitation, that is, the difference between the admission and discharge FIM scores for each subcategory. The FIM score is used extensively across North America to measure disability. It includes eighteen assessment items (subcategories), grouped into six sections. The FIM assesses both motor and cognitive functions: an increasing FIM score implies functional improvement, while a decreasing score implies a decline in the patient's functional status.

FIM scores for each subcategory range from 1 to 7, where:

7  Complete Independence
6  Modified Independence
5  Supervision
4  Minimal Assistance
3  Moderate Assistance
2  Maximal Assistance
1  Total Assistance or not Testable
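As an illustration of the prediction target described above, here is a minimal sketch that computes per-subcategory changes from admission and discharge scores. The sign convention (discharge minus admission, so that a positive value means improvement), the function name `fim_changes`, and the sample scores are assumptions made for this example; the column naming follows the "Eating-Change" style of the example output.

```python
def fim_changes(admission, discharge):
    """Return the change in each FIM subcategory as discharge minus admission.

    Both arguments map a subcategory name to a score on the 1-7 scale.
    The sign convention is an assumption: positive means improvement.
    """
    changes = {}
    for item, admission_score in admission.items():
        discharge_score = discharge[item]
        # Each FIM subcategory is scored from 1 (total assistance) to 7.
        for score in (admission_score, discharge_score):
            if not 1 <= score <= 7:
                raise ValueError(f"FIM score out of range for {item}: {score}")
        changes[f"{item}-Change"] = discharge_score - admission_score
    return changes


# Hypothetical scores for three of the eighteen subcategories.
example_admission = {"Eating": 3, "Bathing": 2, "Memory": 5}
example_discharge = {"Eating": 5, "Bathing": 2, "Memory": 4}
example_changes = fim_changes(example_admission, example_discharge)
```

With these made-up scores, eating improves by two points, bathing is unchanged, and memory declines by one point.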


The participants are expected to develop algorithms to jointly predict changes in FIM score during inpatient rehabilitation in each subcategory from admission to discharge.

Predictive variables

The predictive variables consist of both continuous and categorical variables. While a great deal of effort has been invested in organizing and cleaning the dataset, participants are expected to be able to use novel strategies to deal with missing values in predictive variables.
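As a starting point for handling missing values, the sketch below fills gaps with the median for continuous variables and the most frequent value for categorical ones. This is only a simple baseline, not the novel strategy the organizers ask for; the function name `impute_column`, the use of `None` to mark a missing entry, and the sample columns are assumptions for illustration.

```python
from collections import Counter
from statistics import median


def impute_column(values, kind):
    """Fill missing entries (None) in a single column.

    kind="continuous" fills with the median of the observed values;
    kind="categorical" fills with the most frequent observed value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if kind == "continuous":
        fill = median(observed)
    elif kind == "categorical":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown variable kind: {kind}")
    return [fill if v is None else v for v in values]


# Hypothetical columns; None marks a missing entry.
ages = impute_column([61.0, None, 75.0], "continuous")
sides = impute_column(["left", None, "right", "left"], "categorical")
```

To avoid leaking information, the fill values would normally be computed on the training data only and then reused for any held-out data.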


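In this machine learning challenge, we ask participants to build models (in a justifiable manner). Final performance is evaluated based on the L1 (Manhattan) distance between the actual and predicted changes in FIM score (i.e., across the P subcategories). In the event of a tie, additional consideration will be given to model interpretability and the identification of predictor variable importance.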
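For reference, the L1 (Manhattan) distance between two vectors is the sum of the absolute differences of their components. The sketch below computes it for one patient's actual and predicted changes. How the leaderboard aggregates the distance across patients is not stated here, so the averaging in `mean_l1_distance` is an assumption, as are the function names and the sample values.

```python
def l1_distance(actual, predicted):
    """Sum of absolute differences between two equal-length vectors."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def mean_l1_distance(actual_rows, predicted_rows):
    """Average per-patient L1 distance (the aggregation is an assumption)."""
    if len(actual_rows) != len(predicted_rows) or not actual_rows:
        raise ValueError("need the same non-zero number of rows in both inputs")
    distances = [l1_distance(a, p) for a, p in zip(actual_rows, predicted_rows)]
    return sum(distances) / len(distances)


# Hypothetical changes for two patients over three subcategories.
actual_changes = [[2, 1, 0], [1, 0, 0]]
predicted_changes = [[1, 1, 2], [1, 1, 0]]
first_patient_error = l1_distance(actual_changes[0], predicted_changes[0])
average_error = mean_l1_distance(actual_changes, predicted_changes)
```

A lower distance means the predicted changes are closer to the observed ones, and a distance of zero means every subcategory was predicted exactly.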

Example of final output:

ID Eating-Change Bathing-Change Memory-Change
100 5 7 1 ... 3 2
101 2 7 3 2 5
102 4 3 1 1 2

Data Description


Image of the Train.CSV file



  • Participants are required to submit self-contained source code (e.g., a Jupyter Notebook).
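  • Downloading data from our server and saving it locally for use after the competition is strictly prohibited.
  • Privately sharing data outside the environment we provide is not allowed during the competition.
  • Participants must use algorithmic methods to make predictions. Any changes to a method must be made in an automated manner so that the method can be generalized to new subjects.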
  • Use of external data is permitted.
  • All participants are asked to prepare summary slides that will describe their models.
  • The top three participants will be asked to give a short presentation and the top ten participants on the leaderboard may have an opportunity to publish their results in the special issue of a journal (under negotiation).
  • Participants may submit up to 10 entries. The best-performing entry will be used for final judging.


A total of $1,500 sponsored by UTHealth

  • First place: $1,000
  • Second place: $300
  • Third place: $200



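Undergraduate students and graduate students currently enrolled in the first or second year of a master's program, or in the first two years of a Ph.D. program, at institutions within the Gulf Coast Consortia (including UTHealth, MDACC, UH, Rice, TAMU, UTMB, IBT, and Baylor, etc.). In addition, eligible students from the Houston area (e.g., HBU, SHSU, TSU, PVAMU, University of St. Thomas, UH-Clear Lake, UH-Sugar Land, and UH-Victoria, etc.) are encouraged to apply.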
No, this competition is free.
No, due to the COVID-19 pandemic, the competition will be held remotely. Our team will provide you with the required VPN and access to the coding environment.
This is a coding datathon. You are expected to have mastered basic programming skills and to have knowledge of machine learning.
Our panel of experts is composed of faculty members from UTHealth School of Biomedical Informatics. Your project will be judged via an automated leaderboard program; each contestant can only submit 10 times. The top 3 contestants will be asked to make a short presentation on their solution at the end of the event.
