Hybrid Synchronous Course | Summer 2026 (June 9 – August 4, 2026)
1. Instructor
Dr. Sonya Zhang
E-mail: xszhang@cpp.edu
- If you email me, please use “CIS 1010” in the subject line, spell your full name in the email body, and send it from your cpp.edu account.
- Please do not submit any homework to me via email – it will not be accepted or graded. All homework should be submitted to either Canvas or Cengage as instructed.
- I typically aim to reply to emails within 24 hours, excluding weekends. If you would like a response before the weekend, I recommend emailing on Thursday or early Friday.
- Office hours and the instructor’s Zoom link are provided in Canvas – Modules – Welcome to Class.
2. Class meetings
This hybrid course includes four face-to-face meetings (6/9, 6/16, 6/23, 7/7) in Bldg 163 Rm 1005 from 5:30 p.m. to 9:45 p.m. and five synchronous Zoom meetings (6/30, 7/14, 7/21, 7/28, 8/4). All learning materials and activities will be posted online in Canvas.
3. Course Description
Data collection, preparation, visualization, and analysis with a focus on text mining. Topics include data collection and cleaning, exploratory data analysis and visualization, Natural Language Processing (NLP), classification, clustering, topic modeling, sentiment analysis, Large Language Models (LLMs), and generative AI. The Python programming language will be used.
4. Learning Objectives
Students successfully completing this course should have acquired the ability to:
- Describe and visualize data from social media sites/apps for further exploration and analysis. This is aligned with MSBA program learning objective 1 (storytelling).
- Collect and pre-process text by removing noise, parsing, normalizing formats, tokenizing, and applying stemming or lemmatization. Additionally, conduct spell checking, Part of Speech (PoS) tagging, and Named Entity Recognition (NER). Perform feature engineering such as vectorization to represent text data numerically. This is aligned with MSBA program learning objectives 2, 3 and 4.
- Perform text mining techniques, such as classification, clustering, topic modeling, sentiment analysis, and text summarization. Evaluate the model performance, analyze the outcomes, and offer actionable business insights and solutions. This is aligned with MSBA program learning objectives 2, 3, and 4.
- Discuss and apply advanced techniques, including deep learning, LLMs, and generative AI (Gen-AI), for text generation to develop business applications. This is aligned with MSBA program learning objectives 2, 3, and 4.
5. Textbook and Software
Required Software
- Google Colab, Anaconda, Jupyter Notebook, or another Python IDE
- Access to a computer and the Internet
Useful Resources
- Stackoverflow.com
- Github.com
- Kaggle.com
- TowardsDataScience.com
- Geeksforgeeks.org
- Medium.com
- W3Schools Python tutorial
Useful Data Sources
- Google Dataset Search (a search engine for datasets)
- Kaggle Datasets
- Data.gov
- Congressional and Federal Government Web Harvest
- Digest of Education Statistics
- English Corpora
- UC Irvine Machine Learning Repository
- Project Gutenberg
- Inside Airbnb
- Yelp Data Challenge
- IMDB
- https://www.innovatiana.com/en/post/best-datasets-for-text-classification
Recommended books:
- Mastering Text Analytics – A Hands-on Guide to NLP Using Python
- Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing, 2nd Edition, by Dipanjan Sarkar (ISBN 9781484243534) https://www.apress.com/gp/book/9781484243534
- Natural Language Processing with Python https://www.nltk.org/book/
- Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning 1st Edition https://www.amazon.com/Applied-Text-Analysis-Python-Language-Aware-ebook/dp/B07DNKHJL8/
- Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras (Vital Source available) https://www.amazon.com/Natural-Language-Processing-Computational-Linguistics-ebook/dp/B07BWH779J/
- An Introduction to Text Mining: Research Design, Data Collection, and Analysis (Vital Source available) https://www.amazon.com/Introduction-Text-Mining-Research-Collection-ebook/dp/B075Z3Q4PV/
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow-ebook/dp/B07XGF2G87/
- Natural Language Processing in Action: Understanding, analyzing, and generating text with Python https://www.amazon.com/Natural-Language-Processing-Action-Understanding-ebook/dp/B092J9WH4R/
6. Projects and Assignments
- Quizzes: Students will take quizzes on text-mining knowledge and skills based on lectures and exercises.
- Individual Literature Review: Each student will conduct a literature review on a theme related to their group project.
- Group Project: Students will collaborate in teams of 3-4 to analyze a real-world text dataset, applying the full range of knowledge and skills acquired throughout the course. Each group will submit a research paper and deliver an in-class presentation as final deliverables. Groups will be formed after the add/drop period; students may form their own teams or request instructor assignment.
Make-up policy: Make-up exams are offered only for serious and compelling reasons supported by official documentation.
Late submission: Late assignments or projects will incur a 50% penalty if submitted within 24 hours after the deadline, and no late work will be accepted afterward.
Plagiarism: Students’ written assignments will be checked for plagiarism using AI-based software and Turnitin, which compares submissions not only against Internet sources but also against previous students’ work submitted to Canvas. Plagiarism will be reported to the university’s Student Conduct Office.
AI Use Policy: Generative AI tools such as ChatGPT, Gemini, Claude, Perplexity, and Copilot may be used for brainstorming, code generation, image or video generation, creating study materials, and text editing. However, you must clearly indicate what the AI produced and what you contributed, and you must disclose your use of AI in the assignment (e.g., in a note or footnote). I expect you to be the author of all work you turn in. If I ask about your assignments or projects, you should be able to explain them in depth and demonstrate mastery of the material without assistance. AI tools may support your analysis, but they cannot replace your own reasoning or interpretation. You are responsible for explaining results and showing your understanding. Reflections on readings and assignments must be written entirely by you without AI assistance. These are meant to demonstrate the quality of your own ideas and the personal nature of your reflection.
Student Responsibilities:
- Each student is responsible for completing and submitting all assignments and projects. Corrupted files or incomplete submissions will not be credited. Students are also responsible for keeping a backup copy of each submission.
- To ensure fairness, the instructor will NOT review, debug, or fix problems in student assignments and projects BEFORE grading the entire class. The instructor will, however, help students understand expectations, clarify requirements, provide guidance, help students gain knowledge and skills in analysis, design, and problem-solving, and answer specific questions on course topics.
- Students are expected to spend a significant and reasonable amount of time and effort researching and working on an issue independently BEFORE asking for help.
7. Grading
Grading Scale
| Grade | Percentage Range (%) |
| --- | --- |
| A | 93.00-100.00 |
| A- | 90.00-92.99 |
| B+ | 87.00-89.99 |
| B | 83.00-86.99 |
| B- | 80.00-82.99 |
| C+ | 77.00-79.99 |
| C | 73.00-76.99 |
| C- | 70.00-72.99 |
| D+ | 67.00-69.99 |
| D | 63.00-66.99 |
| D- | 60.00-62.99 |
| F | 0.00-59.99 |
Grade Weights
| Item | Weight (%) |
| --- | --- |
| Quizzes | 25 |
| Individual Literature Review | 25 |
| Group Project | 50 |
| Total | 100 |
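The final course percentage is a weighted average of the three graded items, which is then mapped onto the letter-grade scale. The Python sketch below illustrates that arithmetic; it is an unofficial illustration, not the instructor's gradebook. It assumes each item is scored as a percentage from 0 to 100, applies no rounding, and treats any value at or above a grade's lower bound as that grade (so a value such as 92.995, which falls between the listed ranges, maps to A-). The syllabus does not say how such in-between values are handled. The names WEIGHTS, CUTOFFS, course_percentage, and letter_grade are hypothetical.

```python
# Unofficial sketch: weights and cutoffs are copied from the grading tables;
# all names are hypothetical and no rounding is applied (an assumption).

WEIGHTS = {
    "Quizzes": 25,
    "Individual Literature Review": 25,
    "Group Project": 50,
}

# (lower bound in %, letter), checked from highest to lowest.
CUTOFFS = [
    (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (63.0, "D"), (60.0, "D-"),
    (0.0, "F"),
]


def course_percentage(scores: dict[str, float]) -> float:
    """Weighted average of item scores, each given as a percentage (0-100).

    Raises KeyError if an item from WEIGHTS is missing in scores.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[item] * scores[item] for item in WEIGHTS)
    return weighted / total_weight


def letter_grade(percent: float) -> str:
    """Return the first letter whose lower bound the percentage meets."""
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"Quizzes": 88, "Individual Literature Review": 92, "Group Project": 90}
    pct = course_percentage(example)
    print(f"{pct:.2f}% -> {letter_grade(pct)}")  # 90.00% -> A-
```

Because the Group Project carries half of the total weight, a change of 10 points there moves the final percentage by 5 points, while the same change in either other item moves it by 2.5 points.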
8. Course Schedule
| Module/Week | Topics | Activities |
| --- | --- | --- |
| Module 1 | Introduction to Text Analytics and Natural Language Processing (NLP) | Post a self-introduction on the discussion board; Quiz & Group Project |
| Module 2 | Data Collection and Pre-processing: popular Python libraries for scraping; NLP libraries (NLTK, spaCy); basic text cleaning (regular expressions (RegEx), text normalization, stopword removal, stemming/lemmatization, tokenization, sentence segmentation); Part-of-Speech (PoS) tagging; Named Entity Recognition (NER); exploratory data analysis (EDA) and visualization | Selected individual literature review presentations; Quiz & Group Project |
| Module 3 | Text Representation and Embeddings: text vectorization (Bag of Words (BoW), TF-IDF); word embeddings (Word2Vec, GloVe, fastText); contextual embeddings (BERT/GPT embeddings, for representation only, not full modeling); semantic representations | Selected individual literature review presentations; Quiz & Group Project |
| Module 4 | Text Classification: text feature extraction (CountVectorizer, TfidfTransformer, TfidfVectorizer); supervised text classification algorithms (logistic regression, Naive Bayes, KNN, SVM, decision tree); sample balancing; model performance evaluation | Selected individual literature review presentations; Quiz & Group Project |
| Module 5 | Text Similarity, Retrieval, and Clustering: cosine similarity; nearest-neighbor retrieval; semantic search; embedding-based clustering; K-means | Selected individual literature review presentations; Quiz & Group Project |
| Module 6 | Topic Modeling: classical bag-of-words models (Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF)); transformer-based (BERTopic) | Selected individual literature review presentations; Quiz & Group Project |
| Module 7 | Sentiment Analysis: lexicon-based (TextBlob, VADER (Valence Aware Dictionary and sEntiment Reasoner)); transformer-based (BERT, FinBERT) | Selected individual literature review presentations; Quiz & Group Project |
| Module 8 | Modern NLP and Generative AI: Transformers; Large Language Models (LLMs); vector databases (concepts); Retrieval-Augmented Generation (RAG); prompt engineering; generative AI (GPT, T5) | Selected individual literature review presentations; Quiz & Group Project |
| Module 9 | Group project presentations | Group project presentations and deliverables |
9. University Policies
Accessibility: Cal Poly Pomona is committed to student success as a learning-centered university. Students with disabilities are encouraged to contact the instructor privately or to visit the Disability Resource Center to coordinate course accommodations.
Computing Resources: At Cal Poly Pomona, computers and communications links to remote resources are recognized as being integral to the education and research experience. Every student must have access to a computer with all the required software for this course. Contact I&IT if you need help.
Academic Integrity: The University is committed to maintaining academic integrity throughout the university community. Academic dishonesty is a serious offense that can diminish the quality of scholarship, the educational environment, the academic reputation, and the quality of a Cal Poly Pomona degree. Plagiarism or cheating will not be tolerated in this course.
Copyright Policy: Copyright laws and fair use policies protect the rights of those who have produced the material. The copy in this course has been provided for private study, scholarship, or research. Other uses may require permission from the copyright holder. The user of this work is responsible for adhering to the copyright law of the U.S. (Title 17, U.S. Code). The course website contains material protected by copyrights held by the instructor, other individuals, or institutions. Such material is used for educational purposes in accordance with copyright law and/or with permission given by the owners of the original material. Students may download one copy of the materials on any single computer for non-commercial, personal, or educational purposes only, provided that they (1) do not modify it, (2) use it only for the duration of this course, and (3) include both this notice and any copyright notice originally included with the material. Beyond this use, no material from the course website may be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way without the original copyright holder’s permission. The instructor assumes no responsibility for individuals who improperly use copyrighted material placed on the website.