Generative AI/Data
- Data Collection Pipeline Explained Through the LLM Twin Project 2025.02.06
- Summary of Data Synthesis Methods for Fine-Tuning 2025.02.04
- NVIDIA: Curating Trillion-Token Datasets: Introducing NVIDIA NeMo Data Curator 2025.01.26
- NVIDIA: Synthetic Data Generation 2025.01.25
- What Makes Good Data For Alignment? A Comprehensive Study of Automatic Data Selection In Instruction Tuning 2025.01.25
- AlpaGasus: Training A Better Alpaca with Fewer Data 2025.01.23
- Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning 2025.01.21
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions 2025.01.20
- Enhancing Chat Language Models by Scaling High-quality Instructional Conversations 2025.01.20
- GENIE: Achieving Human Parity In Content-Grounded Datasets Generation 2025.01.19