Text Annotator Schema

Schema specific to the text annotator

Supported Tools

Tool Code	Description
`view`	View tool
`named_entity`	Named entity recognition (NER)
`relation`	Relation between entities
`edit`	Edit tool
`script-generation`	Script generation
`annotation-tag`	Annotation tag
`relation-arrow`	Relation arrow

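Text-specific metadata (`extra`)

Example fields (extend freely as needed)

Key	Type	Purpose
textContent	string	Original text to be annotated
language	string	ISO 639-1 language code (e.g. "ko", "en")
charCount	number	Total number of characters in the original text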

annotationsData structure

{
  "id": "Xu0ft1vX1a",
  "tool": "named_entity",
  "ranges": [
    {
      "start": "/p[1]", // XPath of the start node
      "end": "/p[1]", // XPath of the end node
      "startOffset": 7, // start character offset (0-based)
      "endOffset": 12 // end character offset (exclusive)
    }
  ],
  "content": "출근 하는" // extracted entity text
}
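As a rough illustration of how these offsets might be consumed, the Python sketch below slices a paragraph by `startOffset` and `endOffset`. It assumes that each XPath resolves to one paragraph whose plain text is already available in a dict keyed by XPath, that a range starts and ends in the same node, and that offsets count Unicode code points (which is how Python indexes strings). The `paragraphs` mapping and both helper functions are hypothetical and not part of the schema.

```python
from typing import Dict, List


def extract_range_text(paragraphs: Dict[str, str], rng: dict) -> str:
    """Return the text covered by one range (0-based start, exclusive end)."""
    if rng["start"] != rng["end"]:
        # Cross-node ranges would need a real DOM walk; out of scope here.
        raise ValueError("this sketch only handles ranges within one node")
    text = paragraphs[rng["start"]]
    return text[rng["startOffset"]:rng["endOffset"]]


def extract_annotation_text(paragraphs: Dict[str, str], item: dict) -> List[str]:
    """Return the covered text of every range in an annotationsData item."""
    return [extract_range_text(paragraphs, r) for r in item.get("ranges", [])]


# Hypothetical input: the plain text of /p[1] and one annotationsData item.
paragraphs = {
    "/p[1]": "삼성전자 이재용 회장이 서울 강남구 본사에서 2024년 1분기 실적을 발표했다."
}
item = {
    "id": "ner_001",
    "tool": "named_entity",
    "ranges": [
        {"start": "/p[1]", "end": "/p[1]", "startOffset": 0, "endOffset": 4}
    ],
    "content": "삼성전자",
}

spans = extract_annotation_text(paragraphs, item)
print(spans)  # ['삼성전자']
```

Comparing the sliced text with the stored `content` is a cheap way to catch off-by-one offsets. If the annotator counts UTF-16 code units instead, characters outside the Basic Multilingual Plane would need an offset conversion first.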

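Tool-specific annotation data

Tool	Required / tool-specific field	Type	Description
named_entity	start	number	Entity start offset (character index)
	end	number	Entity end offset (exclusive)
	text (optional)	string	Original text of the extracted entity

Every annotation object inherits from the **common AnnotationBase**, so it carries the common properties id, tool, classification, isLocked, isVisible, and so on.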

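`annotations` vs `annotationsData` pattern

The annotations collection stores only meta information (classification, lock state, and so on).
The annotationsData item with the same ID holds the position information (start, end, and so on) and, optionally, the original text.
This two-part structure keeps metadata and data separate, even for large texts.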
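The ID-based pairing described above can be sketched in Python as a simple join. This is only an illustration: `merge_by_id` is a hypothetical helper, and it assumes both collections are plain lists of dicts that each carry an `id` key.

```python
from typing import List


def merge_by_id(annotations: List[dict], annotations_data: List[dict]) -> List[dict]:
    """Combine each annotations entry with the annotationsData entry sharing its id."""
    data_by_id = {d["id"]: d for d in annotations_data}
    merged = []
    for meta in annotations:
        # An annotation without position data is kept with its meta fields only.
        data = data_by_id.get(meta["id"], {})
        merged.append({**data, **meta})
    return merged


annotations = [
    {
        "id": "ner_001",
        "tool": "named_entity",
        "isLocked": False,
        "classification": {"class": "기관", "entity_type": "ORG"},
    }
]
annotations_data = [
    {
        "id": "ner_001",
        "tool": "named_entity",
        "ranges": [
            {"start": "/p[1]", "end": "/p[1]", "startOffset": 0, "endOffset": 4}
        ],
        "content": "삼성전자",
    }
]

merged = merge_by_id(annotations, annotations_data)
print(merged[0]["classification"]["entity_type"], merged[0]["content"])  # ORG 삼성전자
```

Building the lookup dict once keeps the join linear in the number of entries, which matters for long documents with many annotations.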

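Example uses of relations (`Relations`)

When several named_entity annotations refer to the same entity, link them in relations with the coreference type.
Other kinds of text relation tagging, such as dependencies between sentences or event-to-entity links, are implemented the same way.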
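To make the coreference case concrete, the Python sketch below groups annotation IDs that are linked, directly or transitively, by coreference relations. It is a sketch under assumptions: the text does not say where the type is stored, so the code assumes it appears as `classification.relation_type == "coreference"`, and the IDs in the example are made up. The grouping uses a small union-find structure.

```python
from collections import defaultdict
from typing import Dict, List


def coreference_clusters(relations: List[dict]) -> List[List[str]]:
    """Group annotation ids connected by coreference relations."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for rel in relations:
        # Assumed location of the relation type; adjust to the real schema.
        if rel.get("classification", {}).get("relation_type") != "coreference":
            continue
        a = find(rel["annotationId"])
        b = find(rel["targetAnnotationId"])
        if a != b:
            parent[b] = a

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical relations: two coreference links and one unrelated link.
relations = [
    {"id": "r1", "annotationId": "ner_010", "targetAnnotationId": "ner_011",
     "classification": {"relation_type": "coreference"}},
    {"id": "r2", "annotationId": "ner_011", "targetAnnotationId": "ner_012",
     "classification": {"relation_type": "coreference"}},
    {"id": "r3", "annotationId": "ner_020", "targetAnnotationId": "ner_021",
     "classification": {"relation_type": "affiliation"}},
]

clusters = coreference_clusters(relations)
print(clusters)  # [['ner_010', 'ner_011', 'ner_012']]
```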

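Versioning & extension guide

When a new text tool (e.g. sentence_span, event) is needed, add a row to the table above and describe only its tool-specific fields.
Language-specific needs (e.g. morpheme offsets, token IDs) are covered by extending extra or the tool-specific fields.
If the common conventions do not change, editing this document alone is sufficient.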

Sample

An example that includes named entity recognition (NER) and relation extraction:

{
  "extra": {
    "text_1": {
      "originalText": "삼성전자 이재용 회장이 서울 강남구 본사에서 2024년 1분기 실적을 발표했다.",
      "language": "ko",
      "documentType": "news"
    }
  },
  "relations": {
    "text_1": [
      {
        "id": "ner_001ner_002",
        "tool": "relation",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "annotationId": "ner_001",
        "targetAnnotationId": "ner_002",
        "classification": {
          "class": "소속",
          "relation_type": "affiliation"
        },
        "label": ["소속"]
      },
      {
        "id": "ner_002ner_003",
        "tool": "relation",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "annotationId": "ner_002",
        "targetAnnotationId": "ner_003",
        "classification": {
          "class": "위치",
          "relation_type": "located_at"
        },
        "label": ["위치"]
      }
    ]
  },
  "annotations": {
    "text_1": [
      {
        "id": "ner_001",
        "tool": "named_entity",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "classification": {
          "class": "기관",
          "entity_type": "ORG",
          "sub_type": "company"
        },
        "label": ["기관", "ORG"]
      },
      {
        "id": "ner_002",
        "tool": "named_entity",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "classification": {
          "class": "인물",
          "entity_type": "PER",
          "role": "executive"
        },
        "label": ["인물", "PER"]
      },
      {
        "id": "ner_003",
        "tool": "named_entity",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "classification": {
          "class": "장소",
          "entity_type": "LOC",
          "loc_type": "address"
        },
        "label": ["장소", "LOC"]
      },
      {
        "id": "ner_004",
        "tool": "named_entity",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "classification": {
          "class": "날짜",
          "entity_type": "DATE",
          "date_type": "quarter"
        },
        "label": ["날짜", "DATE"]
      },
      {
        "id": "cls_001",
        "tool": "classification",
        "isLocked": false,
        "isVisible": true,
        "isValid": true,
        "classification": {
          "class": "문서_분류",
          "category": "economy",
          "sentiment": "neutral"
        },
        "label": ["문서_분류"]
      }
    ]
  },
  "annotationsData": {
    "text_1": [
      {
        "id": "ner_001",
        "tool": "named_entity",
        "ranges": [
          {
            "start": "/p[1]",
            "end": "/p[1]",
            "startOffset": 0,
            "endOffset": 4
          }
        ],
        "content": "삼성전자"
      },
      {
        "id": "ner_002",
        "tool": "named_entity",
        "ranges": [
          {
            "start": "/p[1]",
            "end": "/p[1]",
            "startOffset": 5,
            "endOffset": 11
          }
        ],
        "content": "이재용 회장"
      },
      {
        "id": "ner_003",
        "tool": "named_entity",
        "ranges": [
          {
            "start": "/p[1]",
            "end": "/p[1]",
            "startOffset": 13,
            "endOffset": 22
          }
        ],
        "content": "서울 강남구 본사"
      },
      {
        "id": "ner_004",
        "tool": "named_entity",
        "ranges": [
          {
            "start": "/p[1]",
            "end": "/p[1]",
            "startOffset": 25,
            "endOffset": 34
          }
        ],
        "content": "2024년 1분기"
      },
      {
        "id": "cls_001"
      }
    ]
  },
  "annotationGroups": {
    "text_1": []
  },
  "assignmentId": 5001
}
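A document shaped like the sample can be checked for dangling IDs with a few lines of Python. The sketch below assumes the top-level layout shown in the sample (`annotations`, `annotationsData`, and `relations`, each keyed by a text key such as `text_1`); `check_references` and the trimmed example document are hypothetical.

```python
from typing import List


def check_references(doc: dict, key: str) -> List[str]:
    """Report ids that do not line up across the three collections."""
    ann_ids = {a["id"] for a in doc.get("annotations", {}).get(key, [])}
    data_ids = {d["id"] for d in doc.get("annotationsData", {}).get(key, [])}
    problems = []
    for missing in sorted(ann_ids - data_ids):
        problems.append(f"{missing}: no annotationsData entry")
    for orphan in sorted(data_ids - ann_ids):
        problems.append(f"{orphan}: no annotations entry")
    for rel in doc.get("relations", {}).get(key, []):
        for field in ("annotationId", "targetAnnotationId"):
            if rel[field] not in ann_ids:
                problems.append(f"{rel['id']}: {field} {rel[field]} not found")
    return problems


# Trimmed, deliberately inconsistent document for illustration.
doc = {
    "annotations": {"text_1": [{"id": "ner_001"}, {"id": "ner_002"}]},
    "annotationsData": {"text_1": [{"id": "ner_001"}]},
    "relations": {
        "text_1": [
            {"id": "r1", "annotationId": "ner_001", "targetAnnotationId": "ner_003"}
        ]
    },
}

problems = check_references(doc, "text_1")
for line in problems:
    print(line)
# ner_002: no annotationsData entry
# r1: targetAnnotationId ner_003 not found
```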
