Hybrid Synchronous Course | Summer 2026 (June 9 – August 4, 2026)

1. Instructor

Dr. Sonya Zhang
E-mail: xszhang@cpp.edu 

  • If you email me, please use “IBM 6600” in the subject line, spell your full name in the email body, and send it from your cpp.edu account.
  • Please do not submit any homework to me via email – it will not be accepted or graded. All homework should be submitted to either Canvas or Cengage as instructed.
  • I typically aim to reply to emails within 24 hours. However, please note that this timeframe does not include weekends. Therefore, I recommend emailing on Thursday or early Friday if you would like to receive a response before the weekend.
  • Office hours and the instructor’s Zoom link are provided in Canvas – Modules – Welcome to Class.

2. Class meetings

This hybrid course includes four face-to-face meetings (6/5, 6/12, 6/26, 7/24) at Bldg 163 Rm 1004 from 5:30 p.m. to 9:45 p.m. and four synchronous Zoom meetings (7/10, 7/17, 7/31, 8/7). All learning materials and activities will be posted online in Canvas.

Holidays – No Class Meetings: June 19 (Juneteenth) and July 3 (Independence Day observed).

3. Course Description

Data collection, preparation, visualization, and analysis with a focus on text mining. Topics include data collection and cleaning, exploratory data analysis and visualization, Natural Language Processing (NLP), classification, clustering, topic modeling, sentiment analysis, Large Language Models (LLMs), and Generative AI. The Python programming language will be used.

4. Learning Objectives

Students successfully completing this course should have acquired the ability to:

  1. Describe and visualize data from social media sites/apps for further exploration and analysis. This is aligned with the MSBA program learning objective 1, storytelling.
  2. Collect and pre-process text by removing noise, parsing, normalizing formats, tokenizing, and applying stemming or lemmatization. Additionally, conduct spell checking, Part-of-Speech (PoS) tagging, and Named Entity Recognition (NER). Perform feature engineering, such as vectorization, to represent text data numerically. This is aligned with MSBA program learning objectives 2, 3, and 4.
  3. Perform text mining techniques, such as classification, clustering, topic modeling, sentiment analysis, and text summarization. Evaluate the model performance, analyze the outcomes, and offer actionable business insights and solutions. This is aligned with MSBA program learning objectives 2, 3, and 4.
  4. Discuss and apply advanced techniques, including deep learning, LLM, and Gen-AI, for text generation to develop business applications. This is aligned with MSBA program learning objectives 2, 3, and 4.

5. Textbook and Software

Required Software

  • Google Colab, Anaconda, Jupyter Notebook, or another Python IDE
  • Access to a computer and the Internet

Useful Resources

Useful Data Sources

Recommended books:

6. Projects and Assignments

  • Quizzes: Students will take quizzes on text-mining knowledge and skills based on lectures and exercises.
  • Individual Literature Review: Each student will conduct a literature review on a theme related to their group project.
  • Group Project: Students will collaborate in teams of 3–4 to analyze a real-world text dataset, applying the full range of knowledge and skills acquired throughout the course. Each group will submit a research paper and deliver an in-class presentation as final deliverables. Groups will be formed after the add/drop period; students may form their own teams or request instructor assignment.

Make-up policy: Make-up exams are not offered except for serious and compelling reasons substantiated by formal and authoritative documents.

Late submission: Late assignments or projects will incur a 50% penalty if submitted within 24 hours after the deadline, and no late work will be accepted afterward.

Plagiarism: Students’ written assignments will be checked for plagiarism using AI software and Turnitin, which compare submissions against not only Internet sources but also previous students’ work submitted to Canvas. Instances of plagiarism will be reported to the university’s Student Conduct Office.

AI Use Policy: Generative AI tools such as ChatGPT, Gemini, Claude, Perplexity, and Copilot may be used for brainstorming, code, image or video generation, creating study materials, and text editing. However, you must clearly indicate what the AI produced and what you contributed, and you must disclose your use of AI in the assignment (e.g., in a note or footnote). I expect you to be the author of all work you turn in. If I ask about your assignments or projects, you should be able to explain them in depth and demonstrate mastery of the material without assistance. AI tools may support your analysis, but they cannot replace your own reasoning or interpretation. You are responsible for explaining results and showing your understanding. Reflections on readings and assignments must be written entirely by you without AI assistance. These are meant to demonstrate the quality of your own ideas and the personal nature of your reflection.

Student Responsibilities:

  • Each student is responsible for completing and submitting all assignments and projects. Corrupted files or incomplete submissions will not be credited. Students are also responsible for keeping a backup copy of each submission.
  • To ensure fairness, the instructor will NOT review, debug, or fix problems in student assignments and projects BEFORE grading the entire class. The instructor will, however, help students understand expectations, clarify requirements, provide guidance, help students gain knowledge and skills in analysis, design, and problem-solving, and answer specific questions on course topics.
  • Students must have spent a significant and reasonable amount of time and effort researching and working on the issue independently BEFORE asking for help.

7. Grading

Grade    Percentage
A        93.00-100.00
A-       90.00-92.99
B+       87.00-89.99
B        83.00-86.99
B-       80.00-82.99
C+       77.00-79.99
C        73.00-76.99
C-       70.00-72.99
D+       67.00-69.99
D        63.00-66.99
D-       60.00-62.99
F        0-59.99

Item                            Percentage
Quizzes                         25
Individual Literature Review    25
Group Project                   50
Total                           100
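
As an informal illustration of how the item weights and the grade scale above combine, here is a minimal Python sketch. The weights and cutoffs are copied from the tables; the function names, the assumption that each item is scored on a 0-100 scale, and the use of each range's lower bound as the cutoff (with no rounding) are illustrative assumptions, not official grading policy.

```python
# Weights and cutoffs come from the grading tables in this syllabus.
# The names below and the 0-100 per-item scale are assumptions.
WEIGHTS = {"quizzes": 25, "literature_review": 25, "group_project": 50}

# (lower bound, letter) pairs, highest first; anything below 60 is an F.
CUTOFFS = [
    (93.0, "A"), (90.0, "A-"), (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"), (67.0, "D+"), (63.0, "D"),
    (60.0, "D-"),
]


def course_percentage(scores):
    """Weighted average of per-item scores, each on a 0-100 scale."""
    return sum(scores[item] * weight for item, weight in WEIGHTS.items()) / 100


def letter_grade(percentage):
    """Map a course percentage to a letter using the lower-bound cutoffs."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


# Hypothetical scores for one student.
example = {"quizzes": 88, "literature_review": 92, "group_project": 85}
total = course_percentage(example)
print(total, letter_grade(total))  # 87.5 B+
```

Because the Group Project carries half of the total weight, a change in that score moves the course percentage twice as much as the same change in either of the other two items.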

8. Course Schedule

Module 1
Topics:
Introduction to Text Analytics and Natural Language Processing (NLP)
Activities:
  • Post a self-introduction on the discussion board
  • Quiz & Group Project

Module 2
Topics:
Data Collection and Pre-processing
– Popular Python libraries for scraping
Natural Language Processing (NLP)
– NLTK, spaCy
– Basic text cleaning: regular expressions (RegEx), text normalization, stopword removal, stemming/lemmatization, tokenization, sentence segmentation
– Part-of-Speech (PoS) tagging
– Named Entity Recognition (NER)
Exploratory data analysis (EDA) and visualization
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 3
Topics:
Text Representation and Embeddings
– Text vectorization: Bag of Words (BoW), TF-IDF
– Word embeddings: Word2Vec, GloVe, fastText
– Contextual embeddings: BERT/GPT embeddings (for representation only, not full modeling)
– Semantic representations
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 4
Topics:
Text Classification
– Text feature extraction: CountVectorizer, TfidfTransformer, TfidfVectorizer
– Text classification (supervised learning) algorithms: Logistic Regression, Naive Bayes, KNN, SVM, Decision Tree
– Sample balancing
– Model performance evaluation
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 5
Topics:
Topic Modeling
– Bag-of-words-based: LDA (Latent Dirichlet Allocation), NMF (Non-Negative Matrix Factorization)
– Transformer-based: BERTopic
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 6
Topics:
Sentiment Analysis
– Lexicon-based: TextBlob, VADER (Valence Aware Dictionary and sEntiment Reasoner)
– Transformer-based: BERT, FinBERT
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 7
Topics:
Modern NLP and Generative AI
– Transformers
– Large Language Models (LLMs)
– Vector databases (concepts)
– Retrieval-Augmented Generation (RAG)
– Prompt engineering
– Generative AI (GPT, T5)
Activities:
  • Selected individual literature review presentations
  • Quiz & Group Project

Module 8
Topics:
Group project presentation
Activities:
  • Group project presentations and deliverables

9. University Policies

Accessibility: Cal Poly Pomona is committed to student success as a learning-centered university. Students with disabilities are encouraged to contact the instructor privately or to visit the Disability Resource Center to coordinate course accommodations.

Computing Resources: At Cal Poly Pomona, computers and communications links to remote resources are recognized as being integral to the education and research experience. Every student must have access to a computer with all the required software for this course. Contact I&IT if you need help.

Academic Integrity: The University is committed to maintaining academic integrity throughout the university community. Academic dishonesty is a serious offense that can diminish the quality of scholarship, the educational environment, the academic reputation, and the quality of a Cal Poly Pomona degree. Plagiarism or cheating will not be tolerated in this course. 

Copyright Policy: Copyright laws and fair use policies protect the rights of those who have produced the material. The copy in this course has been provided for private study, scholarship, or research. Other uses may require permission from the copyright holder. The user of this work is responsible for adhering to the copyright law of the U.S. (Title 17, U.S. Code). The course website contains material protected by copyrights held by the instructor, other individuals, or institutions. Such material is used for educational purposes in accordance with copyright law and/or with permission given by the owners of the original material. Students may download one copy of the materials on any single computer for non-commercial, personal, or educational purposes only, provided that they (1) do not modify it, (2) use it only for the duration of this course, and (3) include both this notice and any copyright notice originally included with the material. Beyond this use, no material from the course website may be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way without the original copyright holder’s permission. The instructor assumes no responsibility for individuals who improperly use copyrighted material placed on the website.