Exploring Multimodal AI: A Model-Centric Approach

  • Type: Master's thesis
  • Date: Immediately
  • Supervisor:

    Nidhi Mishra

  • Background:

    In the rapidly evolving field of artificial intelligence, multimodal AI stands out as a critical area of research. It involves integrating and processing multiple data types, such as text, images, audio, and sensory data, to develop more robust and intelligent systems. This project aims to explore the model-centric facets of multimodal AI, focusing on developing and optimizing AI models that effectively process and integrate multiple data modalities. We are seeking motivated and passionate students at the bachelor's or master's level to join us in this exciting journey of discovery and innovation.

     

    Research Goal:

    The overarching goal of this research is to advance the understanding and application of multimodal AI by investigating model-centric approaches. By the end of this project, we aim to develop innovative AI models that set new standards in multimodal data processing. Specifically, the research will strive to achieve the following:

    1. Develop and optimize AI models that effectively integrate multiple data modalities.
    2. Implement advanced techniques such as cross-modal embeddings and attention mechanisms.
    3. Evaluate the performance of these models in real-world applications.
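    To illustrate the kind of technique named in goal 2, the following is a minimal NumPy sketch of scaled dot-product cross-modal attention, in which text-token embeddings act as queries over image-patch embeddings in a shared feature space. It is an illustrative example only, not the project's prescribed method; all shapes and values are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Each text token (query) attends over all image patches (keys/values).

    text_feats:  (n_tokens, d) text embeddings
    image_feats: (n_patches, d) image-patch embeddings in the same space
    Returns fused (n_tokens, d) features and (n_tokens, n_patches) weights.
    """
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # similarity logits
    weights = softmax(scores, axis=-1)                # each row sums to 1
    fused = weights @ image_feats                     # patch-weighted mixture
    return fused, weights

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 64))    # 4 text tokens, 64-d embeddings
image = rng.normal(size=(9, 64))   # 9 image patches, same space
fused, weights = cross_modal_attention(text, image)
print(fused.shape)  # (4, 64)
```

    In a full model, the query/key/value inputs would pass through learned projections; the sketch omits them to keep the attention mechanism itself visible.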

     

    Working on this Thesis You Will:

    Students participating in this project will engage in the following activities:

    • Model Development:
      • Design and implement AI models that integrate multiple data modalities using state-of-the-art techniques.
      • Example: developing a model that combines visual and textual data to improve image captioning accuracy.
    • Performance Evaluation:
      • Conduct rigorous testing and evaluation of the developed models on various multimodal datasets.
      • Example: evaluating the model's performance on a benchmark dataset for image-text retrieval tasks.
    • Expected Outcomes — by the end of this thesis project, students will have:
      • A deep understanding of multimodal AI and its applications.
      • Practical experience in developing and optimizing AI models for multimodal data.
      • Insights into the performance and limitations of different model architectures.
      • A comprehensive thesis document that contributes to the academic and practical knowledge of multimodal AI.
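    The image-text retrieval evaluation mentioned above is commonly scored with Recall@K: the fraction of queries whose ground-truth match appears among the model's top-K ranked results. A minimal sketch, using a hypothetical similarity matrix and assuming text query i matches image i:

```python
import numpy as np

def recall_at_k(similarity, k):
    """Recall@K for retrieval: similarity[i, j] scores text query i
    against image j; ground-truth pairs lie on the diagonal."""
    n = similarity.shape[0]
    topk = np.argsort(-similarity, axis=1)[:, :k]  # top-k image indices per query
    hits = sum(i in topk[i] for i in range(n))     # queries whose match is in top-k
    return hits / n

# toy scores: queries 0 and 1 rank their match first; query 2 does not
sim = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.8, 0.1],
                [0.6, 0.4, 0.5]])
print(recall_at_k(sim, 1))  # 2 of 3 queries hit at K=1
print(recall_at_k(sim, 2))  # all 3 hit within the top 2
```

    Standard image-text retrieval benchmarks typically report Recall@1, Recall@5, and Recall@10 in both text-to-image and image-to-text directions.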

     

    We Look Forward to Receiving Your Application Because You:

    • Are currently enrolled in a KIT bachelor's or master's program.
    • Have a strong foundation in machine learning and artificial intelligence.
    • Are proficient in programming languages such as Python and have experience with AI frameworks like TensorFlow or PyTorch.
    • Are passionate about AI research and eager to explore new frontiers in multimodal AI.
    • Possess excellent analytical, problem-solving, and communication skills.

     

    Details:

    Start: Immediately

    Duration: 6 months

    Language: English

    Location: Flexible

     

    How to Apply:

    Interested students should submit the following:

    • A resume or CV highlighting relevant coursework, projects, and skills.
    • A brief statement of interest explaining why you are interested in this project and how your background and skills make you a suitable candidate.
    • Any relevant academic transcripts or references.

     

    Contact Information:

    For more information or to submit your application, please contact us at:

    nidhi.mishra@kit.edu 

    Join us in pushing the boundaries of artificial intelligence and making significant contributions to the field of multimodal AI. We look forward to working with talented and driven students who are ready to take on this exciting challenge.

     

    References:

    Liang, Paul Pu, Amir Zadeh, and Louis-Philippe Morency. "Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions." ACM Computing Surveys (2023).