UTAR Institutional Repository

Deep learning for scene visualization and sentence-based image synthesis

Beh, Teck Sian (2023) Deep learning for scene visualization and sentence-based image synthesis. Final Year Project, UTAR.


    Abstract

    Deep learning is a subset of machine learning, and this project is grounded mainly in deep learning and data mining. The research question is how to apply deep learning to scene visualization and sentence-based image synthesis through image classification and image captioning, implemented in Python with Anaconda Navigator. Image classification, one part of the project, has many practical applications, including object recognition, medical imaging, content moderation, and quality control. The image captioning generator takes an image and tries to generate a caption that matches the gist of that image as closely as possible, condensing the meaning of a whole picture into a single sentence, which saves time. Image captioning sits between natural language processing (NLP) and computer vision, which work in coordination to make captioning possible, with the attention mechanism playing a key role. The project follows a research-based methodology. The tools used were Python and Anaconda Navigator, which launches Jupyter Notebook, together with Google Colab. The dataset was the Flickr8k dataset, obtained from Kaggle. Jupyter Notebook and Google Colab were used to run the code on the dataset and produce output for judging the validity and generality of the results. The project's image processing contributes to computer vision applications such as object detection, classification, and tracking. Scene visualization allows a computer to understand objects and their environment, and sentence-based image synthesis enables computers to generate images from textual descriptions. A user can submit an arbitrary picture, and the system will return a suitable text description of it. The project can be used to generate visual instructions for robots to perform tasks and to create more realistic and immersive gaming environments.
    In advertising and marketing, these techniques can be used to generate personalized ads or product recommendations based on customer preferences. For example, sentence-based image synthesis can be used to create custom product images from user input or social media data. The neural networks involved try to mimic how the human brain functions. Using a public dataset as training data, a deep learning method, the convolutional neural network (CNN), is used to detect and segment multiple targets in two-dimensional (2D) elemental images for an integral imaging system. In tandem with advances in computer graphics, natural language processing, and computing power, a range of applications are embracing these techniques to build virtual scenes from verbal descriptions. Image captioning starts with an image, which is passed through a model pre-trained on ImageNet, such as Inception v3, to produce output feature vectors. Inception v3 is a large network with many pooling, convolution, and fully connected layers that achieves high accuracy on the ImageNet dataset; reusing the output of its layers in this way is known as transfer learning.
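
The abstract mentions an attention mechanism operating on image feature vectors but does not say which form the project uses. As a minimal sketch, assuming plain dot-product soft attention (an assumption, not confirmed by the abstract), the step can be illustrated in pure Python. The names `attend`, `region_features`, and `decoder_state` are hypothetical, and the toy vectors stand in for real CNN feature vectors such as those produced by Inception v3.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, query):
    """Dot-product soft attention (assumed form, for illustration only).

    features: list of feature vectors, one per image region
    query:    current decoder state, same length as each feature vector
    Returns the attention weights and the weighted-average context vector.
    """
    # Score each region by its dot product with the decoder state.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Toy 2-dimensional "features" for three image regions (hypothetical values).
region_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(region_features, decoder_state)
print(weights)
print(context)
```

In this toy case the first and third regions align equally well with the decoder state, so they receive equal weight, and the second region receives less. In a captioning model the context vector would be fed to the language decoder at each word-generation step.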

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: T Technology > T Technology (General)
    T Technology > TA Engineering (General). Civil engineering (General)
    T Technology > TD Environmental technology. Sanitary engineering
    T Technology > TN Mining engineering. Metallurgy
    Divisions: Faculty of Information and Communication Technology > Bachelor of Information Systems (Honours) Information Systems Engineering
    Depositing User: ML Main Library
    Date Deposited: 02 Jan 2024 23:20
    Last Modified: 02 Jan 2024 23:20
    URI: http://eprints.utar.edu.my/id/eprint/5988
