[Alg] NP Problem

Two approaches

1) develope a more efficient algorithm.

2) try to prove that more efficient algorithm is NOТ possible

==> Quit/Stop.

1) try to find a more efficient algorithm using NEW design

: O(n^3) -> O(n^2) -> O(nlogn)

2) try to find a greater(better) lower bound using computational analysis.

Example

1. Sorting: nlogn (comparison)

-> omega(nlogn): lower bound already found, algoritms already found/used

2. Search

-> binary search (TC) is as good as lower bound

-> omega(logN)

=> upper bound를 점점 낮추는 것이 알고리즘 developement

=> lower bound를 점점 높여서 만나는 지점에서, 알고리즘 개발을 멈출 수 있을 것

NP Theory

Solving a problem (for Normal/Classic Problems)

1. Design algorithms (for optimal/exact solution)

2. Analysis: time complexity, proofs, lower bound

IF we can NOT find efficient algorithms? (NP문제)

1. very slow algorithm for optimal solution

2. optimal solution 포기-> approximate solution 채택 (suboptimal, near-optimal solutions)

===> Approximation algorithm (NEW) 설계/분석

Problem Classification

"P <= NP <= EXP <= Solvable <= Countable"

Q1: 0~10 사이 정수	9개	Finite, Countable	P
Q2: 0~10 사이 실수	무한개	Uncountable	Not countable / unsolvable as counting
Q3: 0~∞ 사이 정수	무한개	Infinite, Countable	Solvable (정의 가능), Not finite
Q4: 0~1 실수 > 0~10 정수?	예	Uncountable vs Finite	P (간단한 비교)

Q5. Sort 문제(P)는 KS(NP)보다 쉬운가? 증명하시오
→ 문제의 난이도 클래스 확인 → proper design/analysis

- 자연수 집합과 일대일 대응이 가능하면 countable

- 자연수 집합과 일대일 대응이 불가능하면 uncountable

- 문제가 애초에 어떤 정답이 있는지조차 컴퓨터로 항상 판단할 수 없는 경우면 unsolvable

Easy problem vs intractable problem

1) easy problem: good algorithm 존재, polynomial TC.

2) intractable problem: difficult to treat or work

(NO Polynomial-time algorithm), NP(X)

P-class:

1. the class of decision problems that are polynomially bounded

2. T(n) = polynomial function of 'n'

3. problems solvable in polynomial time

EXP-class:

problems solvable in Exponential time

me (N-queens, KS, ...)

polynomial 알고리즘을 찾지 못했다고 해서 그 문제가 p-class가 아니라고 단정지을 수 없다

exp 알고리즘을 찾았다고 해서 그 문제가 exp-class라고 단정지을 수도 없다

=> 언젠가 polynomial 알고리즘을 찾을 수도 있다 ==> 수학적 정의 필요!

3 categories:

1. Problem proven to be in P-class:

sort, search, CMM, SP, Dijkstra, ... -> easy, efficient, fast, optimal, tractable.

2. Problem proven to be NOT in P-class: "a few"

"intractable"

: O(n^2) X, O(n^4) X O(n!) O

: determine ALL Hamiltonian paths -> O(n!) only BF

: Non-polynomial amount of output

"halting problem": given a computer program(algorithm), does it stop/halt? YES/NO.
: NOT in P-class임이 증명
: famous because it was the FIRST problem proven to be uncomputable/undecidable.

halting problem은 unsolvable

3. Problem proven "neither" to be in P "nor" to be intractable.
== neither (1) nor (2)
: Knapsack, Graph coloring, TSP, Subset sum, ....

심증 : ( 1 ) 난이도 << ( 3 ) 난 이 도
(3) could be (1) in the future
(3) 문제 범주 = NP-class 문제

NP Definition (loosely speaking):
NP is the class of decision problems for which a given
solution can be checked quickly (in polynomial time) to see if it is true solution

- Solution을 빠르게 계산하는 nice algorithm을 찾지 못했을 때,
- 일단 가정: Solution이 주어졌다고 가정하고 출발

Optimization Problem vs Decision Problem

TSP (Optimization problem → optimization prob ver, decision prob ver)

- Optimization problem: find optimal tour

- Decision problem: Is there a TSP tour with cost < 200? → Yes / No

Knapsack

- Optimization problem: maximize total profit

- Decision problem: Is there a set AA with profit > 100? → Yes / No

Graph Coloring

- Optimization problem(version): minimize the number of colors

- Decision problem: Is the graph m-colorable? → Yes or No (e.g., via backtracking)

Clique Problem

- Clique: 완전 부분 그래프 (예: Bioinformatics, SNS 등에서 활용)

- Clique size 예시: size = 4 (파란색), size = 3 (초록색)

- Optimization Problem: 최대 clique를 찾아라 (find max. clique)

- Decision Problem: 그래프에 크기 k 이상의 clique가 존재 하는가? → Yes or No

Q: TSP는 P인가? NP인가?
A: Optimization version TSP는 NP-Hard, Decision version은 NP-Complete

Q: Sorting 문제는 P인가? NP인가?:

A: Sorting문제는 P문제

Knapsack_Optimization ver : NP (x) NP-Hard (o)

Item set A가 주어졌을 때, 최대 500 profit이 maximum인지 P-time에서 확인 가능한가?

→ 2ⁿ개의 조합을 비교하여 real max, optimal 여부를 판별할 수 있다.

Knapsack_Decision ver : NP (o)

Deterministic (결정적)

어떤 입력을 넣으면 항상 똑같은 출력이 나오는 시스템

Stochastic (확률적, random, probabilistic)

무작위(randomness)를 기반으로 가능한 선택지 중 하나를 랜덤하게 고르고,

그 결과가 성공인지 실패인지는 운에 따라 결정됨.

동일한 인풋에서 아웃풋이 살짝 살짝씩 변한다

Nondeterministic

마치 모든 가능한 경우를 동시에 시도해보고, 그 중에서 정답인 경우만 딱 골라서 결과를 내는 것처럼 동작.

이론적 개념(현재의 폰노이만 아키텍쳐에서는 구현 불가능)

Magic algorithm = Nondeterministic algorithm

Nondeterministic Algorithm Machine

1. Nondeterministic Guess (step 1)
: given a problem instance, guess a correct solution

in unit time if exists. (without trying all options)

2. Deterministic Verification (step 2)
: verify the gussed solution

NP의 정의
1. Decision problems in P.time via a magic algorithm
2. Decision problems with solutions that can be checked(verifiable) in P time

3. set of all problems solvable by polynomial time Nondeterministic algorithm

4. solvable in polynomial time by Nondeterministic algorithm

=> Nondeterministic 알고리즘이 Polynomial time 안에 풀 수 있는 문제

= set of decision problems solvable in polynomial time by nondeterministic algorithm

P == NP (미해결 문제, probably no)

P⊆NP (Yes)

Almost all important problems in CS are in NP.

Too many problems cannot be proven to be in P.

Only one of NP-C is in P ⇒ All of them are in P.

NP 중에서 제일 대표적이고 어려운, core, 완전한 문제 → NP-Complete 문제

only one of NP-C problems is polynomial time solvable, then so are all of them

(하나의 NP-C라도 다항 시간 내에 풀 수 있다는 것이 밝혀지면 다른 모든 NP 문제도 다항 시간 내에 풀 수 있는 것이 증명되므로)

(P == NP임이 증명된다, 현재는 P가 NP에 포함된다는 것만 밝혀진 상태: P이면 NP이다)

- A문제가 B문제로 바뀔 수 있다 (transform, reduce)

- A is transfromed into B

- A is reduced into B

Def: NP-Complete

Problem B is NP-Complete if

[Definition 1]

Problem B is NP-Complete
if 1) B is in NP and
2) for all A in NP, A => B (transform, reduce).

(B가 NP이고, 모든 NP 문제를 polynomial 시간 안에 B로 환원할 수 있어야 한다)

(그러나 수학적으로 2번을 증명하기가 너무 어려움)

[Definition 2]

Problem C is NP-Complete
if 1) C is NP and
2) for NP-Complete B, B = > C. (transform, reduce)
NP-C인 B를 알 수 없기 때문에, C가 NP-C임을 증명할 수도 없다

(C가 NP이고, 모든 NP-C 문제를 polynomial 시간 안에 B로 환원할 수 있어야 한다)

(NP-C인 B를 알 수 없기 때문에, C가 NP-C로 임을 증명할 수도 없었는데..)

NP-Complete 문제란

NP에 속하는 문제이면서,
다른 모든 NP 문제(좁은 정의에서는 이미 알려진 하나의 NP-Complete 문제)가 다항시간 내에 환원될 수 있는 문제

Cook's Theorem (1971)
"If CNF problem is P, then P==NP."

CNF problem는 NP-C임이 증명된 최초의 문제

- CNF를 해결하는 polynomial 알고리즘이 하나 존재한다면, 다른 모든 np

CNF-Satisfiability (Boolean SAT) Problem
CNF: Conjunctive Normal Form (논리곱 표준형)

= SAT (Satisfactory Problem, 논리식 만족 문제)
ex. (x1 or x2) and (x2 or not x3) and not x2 = F (bool expr)
CNF (DP ver): is there a truth assignment for x1, x2, x3

- true로 만드는 x1, x2, x3에 대한 진리표 할당이 있을 수 있느냐 (Decision Problem: 대답 예/아니요)

Theorem: If Decision Problem B is P-class and A =>p B, then A is also P

SAT는 Decision Problem, transform 가능한 문제 역시 DP 형태를 가져야함

SAT(Decision Problem ver) ⇒ p(나의 문제, Decision Problem)

Motivation: 주어진 어려운 문제를 NP-C로 해결할 수 있다

SAT =>p (My Problem)

: NP-C 중에 하나가 해결되면, 다른 NP-C도 변환해서 해결할 수 있다

Hamiltonian Circuit

간선에 가중치가 없음.

순수하게 경로의 존재 유무만 판단.

본질적으로 Decision Problem임

NP-Complete 문제.

TSP Decision Problem

간선에 가중치가 있음.

역시 NP-Complete 문제.

(SAT ≤p HC)

(HC ≤p TSP-DP)

1) TSP의 Decision Problem은 NP이다: 정답이 주어졌을 때, O(n)만에 판별 가능

2) HC가 NC-Complete이고, HC를 polynomail time 안에 TSP Decision Problem으로 환원할 수 있다

[HC → TSP Decision Problem 환원]

- HC에 있던 엣지는 TSP로 옮기면서 웨이트를 1로 주고

- HC에 없던 엣지는 TSP로 옮기면서 기존에 있던 값보다 훨씬 큰 값을 넣어준다

- 2가 할당되어 있는 엣지는 절대 정답에 포함될 수 없으므로, TSP로 답을 풀면 HC 문제의 답을 구할 수 있다

전제: Subset sum 문제가 NP-C이다

1) 0/1 Knapsack은 NP문제이다: 특정 해가 주어졌을 때, O(n)만에 검증할 수 있으므로 NP임을 증명 가능

2) Subset sum 문제가 NP-C이고, subset sum을 polynomial time 안에 0/1 KS로 환원할 수 있다

따라서 0/1 KS도 NP-C이다

[Subset Sum → 0/1 Knapsack 환원]

Subset Sum 문제:

- 입력: 정수 집합 S={a1,a2,…,an}, 목표합 T

- 질문: 의 어떤 부분집합의 합이 정확히 T가 되는가?

이걸 0/1 Knapsack 문제로 바꾸는 방법:

아이템	원소 a_i	아이템 i
무게 wi	a_i	a_i
가치 vi	없음	a_i ← 무게와 동일하게 설정
배낭 용량 W	목표합 T	T

1) Clique의 Decision Problem은 NP이고

2) SAT가 polynomial time안에 Clique의 DP로 환원될 수 있기 때문에, Clique의 DP도 NP-Complete이다

TSP는 NP-Complete? => No

TSP-DP는 NP-Complete? => Yes

Subset Sum, HC, N-Queens는 본질적으로 Decision Problem

Knapsack, TSP는 본질적으로 Optimization Problem

Optimization 버전으로 제시된 문제는 ==> NP-Hard category

[Definition]

A is NP-Hard

if for NP-Complete인 B에 대해서, B =>p A

(A가 NP 문제일 것을 요하지 않는다)

HC-DP: 100개 도시를 1시간 내에 다 돌고 돌아올 수 있느냐? => O(n)에 검증 가능 => NP

TSP-DP: “비용이 k 이하인 해가 존재하는가? ”해답(경로)을 주면 그 해답을 다항 시간 내에 검증 가능 => NP

TSP-OP: 최소 경로가 주어졌을 때, 이 경로가 optimal이라는 것 검증 => O(n!) 필요하다 => NP에 속하지 않음!

Knapsack (default OP) => NP-Hard

=> given sollution: O(2^n) verified required

대응하는 결정 문제가 NP-Complete라면
그 최적화 문제는 적어도 NP-Hard임이 보장

[Approximation Algorithm: TSP]

: MST만든 다음에 pre-order 순회

MST: short edge 연결, no cycle

Approximation design:
1. determine a MST (using Prim's alg) : ranking (=> 가장 dominant한 것은 mst 만드는 것: O(n^2))
2. Create a path around the MST (using preorder visit)

Approximation analysis:

1. Time complexity: poly.time O(n^2)

2. Boundness (theorm, proof)

: approximate sol < opt sol (관계성 보여주는 것)

Boundness: approxiate path (p) < 2*optimal path (p*)

cost(MST.path) < cost(p*)
cost(Full walk on MST) = 2 * cost(MST.path)
-> cost(Full walk on MST) < 2 * cost(p*)
cost(p) < cost(Full walk on MST) (by triangular inequality)
-> cost(p) < 2 * cost(p*)

(여기서 p*는 optimal이고, p는 mst를 preorder로 순회한 것)

[Approximation: Bin Packing]

n items with sizes: s1, s2, ... sn (0 < si <= 1)

--> find min # bins (unit-cap bins)

Bin Packing Problem은 NP-Hard 문제

Subset Sum 문제를 Bin Packing 문제로 환원

Subset Sum의 목표 합 t를 bin의 크기 1로 정규화(normalization)

각각의 정수 aᵢ를 bin packing의 item으로 바꾸고, sᵢ = aᵢ / t 로 정규화

subset sum은 NP-Complete이고, 다항 시간 안에 bin packing 문제로 환원될 수 있지만

bin packing 문제가 NP가 아니기 때문에(Optimization), bin packing은 NP-Hard에 속한다

Example: n=8, item size= 0.85, 0.1, 0.5, 0.2, 0.4, 0.4, 0.3, 0.2

Approximation design: NFF(Nonincreasing First-Fit)

1. sort the items in nonincreasing order (greedy ranking)

2. pack into as close bin as possible in First Fit manner

=> 이 방식으로 위의 예시를 풀면 4개가 필요하지만, Optimal은 3개이다

Approximation analysis:

1. time complexity: T(n) = O(nlogn) : 이 알고리즘의 dominant part는 정렬하는 부분

2. bound theorm (qaulity)

---> approximate bin < optimal bin * 1.5 (theorm)

---> "any items placed by NFF in an extra bin has size <= 1/3" (lemma)를 이용하여 위의 관계 증명 가능

티스토리툴바